Knowledge is commonly modeled as a network of interrelated concepts. The Semantic Web and Linked Data arcs of research have generated networks of 25 billion nodes built on this linked-concept foundation. Substantial research effort goes into expanding the size and breadth of these data sets. Other work has focused on building systems that use the data sets, and still other research treats the data sets themselves as objects of study. This study explores the topology of a large-scale knowledge network to identify clusters of connected concepts, known in network-theoretic terms as community structure.
Specifically, this study extends current research by investigating domains as community structures within ontologies. First, I frame ontologies as operationalizations of knowledge. This orientation reflects the extensive scholarship showing that modeling knowledge as a network of unweighted relations between pairs of concepts provides a useful, if imperfect, framework.
I hypothesize that domains emerge as clusters of tightly connected concepts within large-scale ontologies, and I adopt the community structure paradigm for identifying those clusters within the YAGO ontology and several variants. I demonstrate that standard network-theoretic community structure methods can identify domains within an ontology.
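To make the community structure paradigm concrete, the following is a minimal sketch of modularity, the quality score that many standard community detection methods try to maximize. It is illustrative only and is not the procedure used in this study: the concept names and edges are hypothetical toy data, not drawn from YAGO, and the function `modularity` is written here for the example. For an undirected, unweighted graph with m edges, the score of a partition is the sum over communities of (internal edges / m) minus (total degree / 2m) squared.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    community_of = {node: i for i, group in enumerate(communities) for node in group}
    internal = defaultdict(int)    # edges with both endpoints inside community i
    degree_sum = defaultdict(int)  # summed degree of the nodes in community i
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical toy ontology: two tightly connected "domains" joined by one relation.
EDGES = [
    ("physics", "quantum_mechanics"),
    ("quantum_mechanics", "particle"),
    ("physics", "particle"),
    ("music", "symphony"),
    ("symphony", "orchestra"),
    ("music", "orchestra"),
    ("physics", "music"),  # the single cross-domain relation
]

by_domain = [
    {"physics", "quantum_mechanics", "particle"},
    {"music", "symphony", "orchestra"},
]
one_cluster = [{node for edge in EDGES for node in edge}]

q_by_domain = modularity(EDGES, by_domain)
q_one_cluster = modularity(EDGES, one_cluster)
```

In this toy case the two-domain partition scores higher than the partition that lumps every concept together, which is the sense in which tightly connected clusters register as community structure.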
Quantitatively, this study provides evidence that domains can be iteratively extracted from large-scale ontologies, suggesting a hierarchy of domains of varying granularity. Further, the study provides evidence for structural features such as domains, facets, and upper ontologies.
Qualitatively, the study captures the bias in the YAGO ontology toward logico-deductive assertions in contrast to associative relations between concepts. The YAGO dataset consists of factual assertions and extracted hyperlinks, which omit some actual relations between concepts and oversimplify the strength of others.
Methodologically, community structure algorithms are shown to produce varying results when identifying clusters of concepts, even when modularity-optimizing techniques are used. This variation may indicate a need for a more robust model of ontology structure, for the alternative community structure identification approaches now emerging from network theory, or for both.
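One simple way to quantify how much two algorithms disagree is the Rand index: the fraction of concept pairs that both partitions treat the same way, either grouping the pair together or keeping it apart. The sketch below is an assumption-laden illustration rather than the comparison used in this study. The function `rand_index` is written for this example, and the two partitions stand in for hypothetical outputs of two different community detection runs over the same six concepts.

```python
from itertools import combinations


def rand_index(partition_a, partition_b):
    """Fraction of node pairs on which two partitions of the same nodes agree."""
    label_a = {node: i for i, group in enumerate(partition_a) for node in group}
    label_b = {node: i for i, group in enumerate(partition_b) for node in group}
    if label_a.keys() != label_b.keys():
        raise ValueError("both partitions must cover exactly the same nodes")
    pairs = list(combinations(sorted(label_a), 2))
    if not pairs:  # fewer than two nodes: nothing to disagree about
        return 1.0
    agreements = sum(
        (label_a[u] == label_a[v]) == (label_b[u] == label_b[v]) for u, v in pairs
    )
    return agreements / len(pairs)


# Hypothetical outputs of two runs over the same six concepts (placeholder labels).
run_1 = [{"c1", "c2", "c3"}, {"c4", "c5", "c6"}]
run_2 = [{"c1", "c2"}, {"c3", "c4"}, {"c5", "c6"}]

agreement = rand_index(run_1, run_2)
self_agreement = rand_index(run_1, run_1)
```

A value of 1.0 means the two partitions are identical, and lower values indicate that the runs split the concepts differently. Chance-corrected variants such as the adjusted Rand index are often preferred in practice, since the raw index tends to look high when there are many small clusters.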