Mark Sharp is a Senior Linguist at Corpora Software, which he joined in 2002. His work involves researching and designing the linguistic strategies used in Corpora's products. He majored in Linguistics (with a minor in Italian) at the University of Reading and then went straight on to a Master's degree in Computational Linguistics at the University of Essex. Before university, Mark spent several years in the British Civil Service.
How Innovative Document Analysis Technologies Can Improve Information Capture
Mark Sharp, Senior Linguist, Corpora Software, 4 Stirling House, Stirling Road, Guildford, Surrey, GU2 7RF, United Kingdom
This paper discusses recent innovative approaches to handling unstructured information. Over the past few years there has been growing interest in using linguistic techniques, namely Natural Language Processing (NLP), to recover context and meaning from unstructured documents. I discuss several linguistic strategies for analysing these documents automatically, including the use of taxonomies and ontologies for information mining and categorisation.
I show that it is now possible to automate what was once an entirely manual task. Fact databases can be built from streams of incoming unstructured data, so organisations no longer have to depend on form filling to structure their information. Using these technologies, it is now possible to move earlier on developing a new compound, to make a call on a particular line of research, and to identify potential Adverse Drug Reactions (ADRs).
I discuss how today's changing workforce is challenging us to organise our organisations' knowledge better and to be more creative in retrieving it through search. Which search strategies help us find the information we need quickly and effortlessly? I show how taxonomies, used appropriately, can simplify the user's search experience.