Automatic conflation of digital gazetteer data: Summary of research and demonstration
by Hastings, Jordan Towner, Ph.D., UNIVERSITY OF CALIFORNIA, SANTA BARBARA, 2009, 167 pages; 3379473

Abstract:

Basic knowledge of the whereabouts and whatabouts of places on Earth was organized historically in hardcopy gazetteers. With the advent of computers, gazetteers are reëmerging in digital form, rapidly expanding in number and sophistication. To take full advantage of digital gazetteers – to validate contents, fill gaps in coverage, etc. – it is often necessary to consult and combine results from multiple sources of gazetteer data, which is tedious for humans and currently not done by machines. This dissertation defines a conceptual data structure together with operational procedures for addressing these problems.

Conflating – meaningfully combining – the contents of digital gazetteers entails matching up their entries in such a way that the identity of places is preserved. People have relatively little difficulty in deciding if two places are (probably) the same. We look first at location: places that are not collocated are unlikely to be the same. Almost concurrently, we require proximity between the places’ type classifications, notwithstanding vagueness in the typing schemes. Finally, we examine the places’ names, making wide allowances for spelling, abbreviations, directional suffixes, etc. Names that are sufficiently similar may support our assessment that a common place exists; dissimilar names do not necessarily dissuade us if other evidence is solid.

In this work, automated conflation of digital gazetteer data is accomplished using a troika of geospatial, geotaxial, and geonomial metrics that mimic the human cognitive process. The major aspects of place identification that are qualitative for humans are made explicit and quantitative in computations with these metrics. The approach is demonstrated for major hydrographic features in the Lake Tahoe basin, between California and Nevada. Overall, more than 85 per cent of the duplicate features identified by the automated system are correctly matched, as judged by human review.
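To make the three-metric idea concrete, here is a minimal Python sketch of how a location score, a type score, and a name score might be combined into one match score for a pair of gazetteer entries. It is not the dissertation's method: the abstract does not give the actual metrics, so the distance decay, the type-similarity table, the abbreviation list, the weights, and all names (`Entry`, `spatial_score`, `type_score`, `name_score`, `match_score`) are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Entry:
    """One gazetteer entry (hypothetical minimal schema)."""
    name: str
    lat: float
    lon: float
    ftype: str  # feature type label, e.g. "lake" or "stream"


def haversine_km(a: Entry, b: Entry) -> float:
    """Great-circle distance between two entries in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def spatial_score(a: Entry, b: Entry, scale_km: float = 2.0) -> float:
    """1.0 when collocated, decaying toward 0 with distance (assumed form)."""
    return math.exp(-haversine_km(a, b) / scale_km)


# Assumed similarities between non-identical types; unlisted pairs score 0.
TYPE_SIMILARITY = {
    frozenset({"lake", "reservoir"}): 0.7,
    frozenset({"stream", "river"}): 0.8,
}


def type_score(a: Entry, b: Entry) -> float:
    """1.0 for identical types, a table value for related ones, else 0."""
    if a.ftype == b.ftype:
        return 1.0
    return TYPE_SIMILARITY.get(frozenset({a.ftype, b.ftype}), 0.0)


# Assumed abbreviation expansions applied before comparing names.
ABBREVIATIONS = {"lk": "lake", "ck": "creek", "cr": "creek", "mt": "mount"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and expand known abbreviations."""
    tokens = name.lower().replace(".", "").replace(",", " ").split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def name_score(a: Entry, b: Entry) -> float:
    """String similarity of the normalized names, in [0, 1]."""
    return SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()


def match_score(a: Entry, b: Entry, weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the three scores; the weights are illustrative only."""
    ws, wt, wn = weights
    return ws * spatial_score(a, b) + wt * type_score(a, b) + wn * name_score(a, b)
```

With made-up entries such as `Entry("Fallen Leaf Lake", 38.90, -120.06, "lake")` and `Entry("Fallen Leaf Lk.", 38.90, -120.06, "lake")`, all three scores agree and the pair ranks as a likely duplicate, while a distant stream entry scores low on every metric. A weighted sum is only one way to combine the evidence; it lets strong location and type agreement outweigh a weak name match, in line with the remark above that dissimilar names need not rule out a match when other evidence is solid.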

 
Adviser: Michael F. Goodchild
School: UNIVERSITY OF CALIFORNIA, SANTA BARBARA
Source: DAI/A 70-11, p. , Dec 2009
Source Type: Dissertation
Subjects: Geography; Information science
Publication Number: 3379473
