Recent years have seen a rapid proliferation of information sources e.g., on the World-wide Web, in virtually every area of human endeavor. Such autonomous information sources are based on different ontologies, i.e., conceptualizations of the entities, properties, and relationships in the respective domains of discourse. However, practical applications (e.g., building predictive models from disparate data sources, assembling composite web services using components from multiple repositories) call for mechanisms that bridge the semantic gaps between disparate ontologies using mappings that express terms (concepts, properties, and relationships) in a target ontology in terms of those in one or more source ontologies. Such mappings may be established by domain experts or automatically using tools designed to discover such mappings from data. In either case, it is necessary to check whether the resulting mappings are consistent, and if necessary, make them consistent by eliminating a subset of the mappings. We consider the problem of identifying the largest (maximum) subset of mappings in the restricted, yet practically important setting of hierarchical ontologies. Specifically, we consider mappings that assert that a concept in one ontology is a subconcept, superconcept, or equivalent concept of a concept in another ontology. We model the problem of identifying the largest consistent subset of such mappings between hierarchical ontologies as the problem of identifying the minimum feedback arc set in a directed acyclic graph (DAG). Because identifying minimum feedback arc set is known to be NP-hard, it follows that identifying the maximum subset of consistent mappings between hierarchical ontologies is NP-hard. We then explore several polynomial time algorithms for finding suboptimal solutions including a heuristic algorithm for (weighted) minimum feedback arc set problem in DAGs. We compare the performance of the various algorithms on several synthetic as well as real-world ontologies and mappings.
|School||IOWA STATE UNIVERSITY|
|Subjects||Artificial intelligence; Computer science|
About ProQuest Dissertations & Theses
With nearly 4 million records, the ProQuest Dissertations & Theses (PQDT) Global database is the most comprehensive collection of dissertations and theses in the world. It is the database of record for graduate research.
PQDT Global combines content from a range of the world's premier universities - from the Ivy League to the Russell Group. Of the nearly 4 million graduate works included in the database, ProQuest offers more than 2.5 million in full text formats. Of those, over 1.7 million are available in PDF format. More than 90,000 dissertations and theses are added to the database each year.