On the data mapping problem
by Fletcher, George H. L., Ph.D., INDIANA UNIVERSITY, 2007, 144 pages; 3276692

Abstract:

The emerging networked world promises new possibilities for information sharing and collaboration between autonomous data sources. Facilitating technologies, however, have not successfully addressed the most difficult forms of data heterogeneity that arise in these collaborations, such as differences in the structuring of data and semantic pluralism in the interpretation of data. At the heart of overcoming data heterogeneity is the data mapping problem: automating the discovery of effective mappings between autonomous structured data sources. The data mapping problem is one of the longest-standing issues in data management. Fully automating the discovery of mappings is generally recognized as an "AI-complete" problem, in the sense that it is as hard as the hardest problems of Artificial Intelligence. Consequently, data mapping solutions have typically focused on discovering restricted types of mappings. More robust solutions must also facilitate discovery of the richer structural and semantic transformations that inevitably arise in coordinating heterogeneous information systems.

In this dissertation, we make the following contributions towards a better understanding of the data mapping problem. (1) We give a novel formal statement of the general data mapping problem and of the important special case of mapping between relational data sources. (2) We propose a generic architecture for data mapping systems and describe an instantiation of this framework in the Tupelo system. Treating mapping discovery as example-driven search in a space of transformations, Tupelo generates queries encompassing the full range of structural and semantic heterogeneities for relational databases. (3) We present theoretical results on several fundamental questions regarding example-driven mapping discovery in systems such as Tupelo. (4) We present a new declarative formalism for expressing dynamic relational transformations as a tool for further investigations into the relational data-metadata mapping space.
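To make the idea of "example-driven search in a space of transformations" concrete, the following is a minimal sketch, not Tupelo's actual algorithm or operator set. It assumes a toy model in which a relation is a set of rows, the only operators are attribute rename and attribute drop, and the search is a plain breadth-first search from a source example instance to a target example instance. All function names (`relation`, `rename`, `drop`, `discover_mapping`, `apply_ops`) are hypothetical and introduced only for illustration.

```python
from collections import deque


def relation(rows):
    """Build a hashable relation from a list of dicts (set semantics)."""
    return frozenset(frozenset(r.items()) for r in rows)


def attributes(rel):
    """Return the set of attribute names used anywhere in the relation."""
    return {a for row in rel for a, _ in row}


def rename(rel, old, new):
    """Rename attribute `old` to `new` in every row."""
    return frozenset(
        frozenset((new if a == old else a, v) for a, v in row) for row in rel
    )


def drop(rel, attr):
    """Remove attribute `attr` from every row."""
    return frozenset(
        frozenset((a, v) for a, v in row if a != attr) for row in rel
    )


def successors(rel, target_attrs):
    """Yield (operation, resulting relation) pairs reachable in one step."""
    attrs = attributes(rel)
    for old in sorted(attrs):
        # Assumption: only rename towards names that the target example uses.
        for new in sorted(target_attrs - attrs):
            yield ("rename", old, new), rename(rel, old, new)
        yield ("drop", old), drop(rel, old)


def discover_mapping(source, target, max_depth=4):
    """Breadth-first search for an operator sequence turning source into target.

    Returns a list of operations, or None if none is found within max_depth.
    """
    target_attrs = attributes(target)
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        rel, path = queue.popleft()
        if rel == target:
            return path
        if len(path) >= max_depth:
            continue
        for op, nxt in successors(rel, target_attrs):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [op]))
    return None


def apply_ops(rel, ops):
    """Replay a discovered operator sequence on a relation."""
    for op in ops:
        rel = rename(rel, op[1], op[2]) if op[0] == "rename" else drop(rel, op[1])
    return rel
```

A usage example under the same assumptions: given the source example `relation([{"name": "Ada", "yr": 1815, "tmp": 0}])` and the target example `relation([{"fullname": "Ada", "year": 1815}])`, `discover_mapping` returns a sequence of two renames and one drop, and `apply_ops` replays that sequence on the source. A realistic system would need a much richer operator set (including data-metadata restructuring) and search heuristics, since the uninformed search shown here grows quickly with the number of attributes.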

 
School: INDIANA UNIVERSITY
Source: DAI/B 68-08, p. , Nov 2007
Source Type: Dissertation
Subjects: Artificial intelligence; Computer science
Publication Number: 3276692
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3276692
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one, and access a 24 page preview for free (if available).
