A semantic analysis of XML schema matching for B2B systems integration
by Kim, Jaewook, Ph.D., UNIVERSITY OF MARYLAND, BALTIMORE COUNTY, 2011, 141 pages; 3459936

Abstract:

One of the most critical steps in integrating heterogeneous e-Business applications that use different XML schemas is schema matching, which is known to be costly and error-prone. Many automatic schema matching approaches have been proposed, but the challenge remains daunting because of the complexity of schemas and the immaturity of technologies for semantic representation, measurement, and reasoning.

The dissertation focuses on three challenging problems in schema matching. First, existing approaches have often failed to sufficiently investigate and utilize the semantic information embedded in the hierarchical structure of XML schemas. Second, because of synonymy and polysemy in natural language, the meaning of a data node in a schema cannot be determined solely by the words in its label. Third, it is difficult to correctly identify the best set of matching pairs for all data nodes between two schemas.

To overcome these problems, we propose new approaches for XML schema matching that are particularly applicable to XML schema integration and data transformation between heterogeneous e-Business systems. Our research supports two different tasks: an integration task between two different component schemas, and a transformation task between two business documents that conform to different document schemas.

For the integration task, we propose an approximate approach that produces the best matching candidates between global type components of two schemas, using their layer-specific semantic similarities. For the transformation task, we propose another approximate approach that produces the best sets of matching pairs for all atomic nodes between two schemas, based on their linguistic and structural semantic similarities. We evaluate our approaches with state-of-the-art evaluation metrics and sample schema sets obtained from several e-Business standards organizations and e-Business system vendors. A variety of computer experiments have been conducted, with encouraging results showing that the proposed approaches are valuable for addressing difficulties in XML schema matching.
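
As a rough illustration of the general idea of matching atomic nodes by combining label similarity with structural context, the following minimal sketch scores node paths and then picks one-to-one pairs greedily. It is not the dissertation's algorithm: the function names, the token-overlap (Jaccard) measure, the 0.7 label weight, the 0.3 threshold, and the sample paths are all assumptions chosen for the example.

```python
import re
from itertools import product


def tokens(label):
    """Split camelCase / snake_case text into a set of lowercase word tokens."""
    return {t.lower() for t in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", label)}


def jaccard(a, b):
    """Token-set overlap in [0, 1]; two empty sets score 0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(path_a, path_b, w_label=0.7):
    """Blend leaf-label similarity with ancestor-path (structural) similarity.

    Paths are '/'-separated node names; w_label is an assumed weight.
    """
    *anc_a, leaf_a = path_a.split("/")
    *anc_b, leaf_b = path_b.split("/")
    label = jaccard(tokens(leaf_a), tokens(leaf_b))
    context = jaccard(tokens(" ".join(anc_a)), tokens(" ".join(anc_b)))
    return w_label * label + (1 - w_label) * context


def match(source, target, threshold=0.3):
    """Greedy one-to-one matching: take the highest-scoring unused pair first."""
    scored = sorted(
        ((similarity(s, t), s, t) for s, t in product(source, target)),
        key=lambda item: item[0],
        reverse=True,
    )
    used_s, used_t, pairs = set(), set(), {}
    for score, s, t in scored:
        if score < threshold:
            break  # remaining candidates are all below the cutoff
        if s not in used_s and t not in used_t:
            pairs[s] = t
            used_s.add(s)
            used_t.add(t)
    return pairs


# Hypothetical atomic-node paths from two schemas.
source = ["Order/BillTo/PostalCode", "Order/ShipTo/PostalCode"]
target = ["PurchaseOrder/ShipTo/Postal_Code", "PurchaseOrder/BillTo/Postal_Code"]
result = match(source, target)
```

In this example both source leaves carry the same label, so label similarity alone cannot separate them; the ancestor context is what pairs each `BillTo` node with its `BillTo` counterpart. A greedy selection is only one simple way to choose a matching set, and plain token overlap does not handle synonyms such as "PostalCode" and "ZipCode".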

Adviser: Yun Peng
School: UNIVERSITY OF MARYLAND, BALTIMORE COUNTY
Source: DAI/B 72-09, p. , Jul 2011
Source Type: Dissertation
Subjects: Computer engineering; Information technology; Computer science
Publication Number: 3459936
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3459936