How Can Co-Occurrence Complement Semantics?

Atanas Kiryakov and Borislav Popov, Ontotext

Text analysis is an obvious route to semantic annotation and to the extraction of structured knowledge. A basic task is recognizing references to entities (people, locations, organizations, etc.). The next step is relation extraction, e.g. identifying that an organization is located in a particular city. Automatic extraction of such relations is a hard linguistic problem: existing solutions are either limited in coverage, expensive to implement, or slow. Yet relationships are crucial if the extracted knowledge is to be useful for navigation and search.
We demonstrate how efficient co-occurrence analysis, performed on top of semantic annotation, can serve several purposes: relation extraction, faceted search, and popularity timelines. The faceted search interface offers an easy way to augment full-text search with entity references, derived through co-occurrence profiling and semantic relationships. Although these analytics can be applied in virtually any domain, their development within the KIM platform was driven by the requirements of news analysis and research. We demonstrate these interfaces on top of 1 million news articles: a corpus of the major international news from the last five years.
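The faceted search and popularity timeline mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not KIM's implementation: it assumes each article has already been semantically annotated with a set of entity IDs, and the names `Doc`, `faceted_search`, `entity_facets` and `popularity_timeline`, as well as the sample articles and entity IDs, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Doc:
    """A news article reduced to its date, text and the entity IDs annotated in it."""
    doc_id: str
    published: date
    text: str
    entities: frozenset[str]


def faceted_search(docs, keywords=(), selected_entities=()):
    """Keep docs whose text contains every keyword AND that mention every selected entity."""
    words = [k.lower() for k in keywords]
    required = set(selected_entities)
    return [
        d for d in docs
        if all(w in d.text.lower() for w in words) and required <= d.entities
    ]


def entity_facets(hits, selected_entities=(), top_n=10):
    """For the current hits, count in how many documents each other entity co-occurs."""
    counts = Counter()
    for d in hits:
        counts.update(d.entities - set(selected_entities))
    return counts.most_common(top_n)


def popularity_timeline(docs, entity):
    """Number of documents per (year, month) that mention the given entity."""
    timeline = Counter(
        (d.published.year, d.published.month) for d in docs if entity in d.entities
    )
    return dict(sorted(timeline.items()))


if __name__ == "__main__":
    # Hypothetical, hand-made sample data for illustration only.
    docs = [
        Doc("d1", date(2007, 1, 5), "Merger talks in London",
            frozenset({"org:Acme", "loc:London"})),
        Doc("d2", date(2007, 1, 20), "Acme opens office in Paris",
            frozenset({"org:Acme", "loc:Paris"})),
        Doc("d3", date(2007, 2, 2), "Acme expands in London",
            frozenset({"org:Acme", "loc:London", "per:J_Smith"})),
    ]
    hits = faceted_search(docs, selected_entities=["org:Acme"])
    print(entity_facets(hits, ["org:Acme"]))
    print(popularity_timeline(docs, "org:Acme"))
```

Selecting a facet simply adds its entity ID to `selected_entities` and re-runs the search, so the facet counts always describe the current result set. Frequently co-occurring pairs (here `org:Acme` and `loc:London`) could also be surfaced as candidate relations for review, which is one cheap way co-occurrence might stand in for full linguistic relation extraction.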
This sort of co-occurrence analysis could also aid identity resolution, which is recognized as a crucial problem for several tasks: cross-document co-reference resolution, record linkage, object linking, and data integration.
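One way co-occurrence could support identity resolution is to compare the co-occurrence profiles of two references: if two distinct entity IDs keep appearing alongside the same organizations and places, they may denote the same real-world entity. The sketch below is an assumption for illustration, not a method described in the talk. It reduces each document to a set of entity IDs, compares profiles with cosine similarity, and uses an arbitrary threshold of 0.5; the function names, the threshold and the sample IDs are all hypothetical.

```python
import math
from collections import Counter


def cooccurrence_profile(annotated_docs, entity):
    """Counter of the other entities that appear in documents mentioning `entity`."""
    profile = Counter()
    for entities in annotated_docs:
        if entity in entities:
            profile.update(e for e in entities if e != entity)
    return profile


def cosine(p, q):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * q[key] for key, count in p.items())  # missing keys count as 0
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def likely_same_entity(annotated_docs, a, b, threshold=0.5):
    """Hypothetical heuristic: flag two references as merge candidates when
    their co-occurrence profiles are similar enough. Returns (flag, similarity)."""
    sim = cosine(cooccurrence_profile(annotated_docs, a),
                 cooccurrence_profile(annotated_docs, b))
    return sim >= threshold, sim


if __name__ == "__main__":
    # Hypothetical sample: three "Smith" references seen in different contexts.
    docs = [
        {"per:Smith_1", "org:Acme", "loc:London"},
        {"per:Smith_2", "org:Acme", "loc:London"},
        {"per:Smith_3", "org:Bank", "loc:Tokyo"},
    ]
    print(likely_same_entity(docs, "per:Smith_1", "per:Smith_2"))
    print(likely_same_entity(docs, "per:Smith_1", "per:Smith_3"))
```

A high similarity is only evidence, not proof: in practice such a score would be combined with name matching and semantic constraints before two references are actually merged.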

Atanas Kiryakov is head of Ontotext Lab (part of Sirma Group), an outstanding Semantic Web technology provider; the lab is involved in research projects worth more than 80 MEuro. Kiryakov joined Sirma as a software engineer in 1993, later became a partner and board member, and founded Ontotext in 2000. His current research interests are semantic annotation and search, large-scale semantic repositories and reasoning, ontology design, and information extraction. He is the author of more than 20 articles and book chapters.

Borislav Popov joined Ontotext shortly after its founding and has participated in several projects related to semantic annotation. He was in charge of Ontotext's participation in several projects, including PrestoSpace and MediaCampaign, concerning A/V archive management and media research. He is the author of more than 10 articles in areas ranging from Hidden Markov Models to semantic annotation. Popov has led the development of the KIM semantic annotation platform since 2002. His current research interests are massive-scale semantic annotation, web mining, and innovative approaches to search.




