Supporting knowledge discovery in data stream management systems
by Thakkar, Hetal M., Ph.D., UNIVERSITY OF CALIFORNIA, LOS ANGELES, 2008, 126 pages; 3354423

Abstract:

A growing number of applications, including network traffic monitoring and highway congestion analysis, continuously generate massive data streams. Managing these streams presents many new research challenges, including Quality of Service (QoS) guarantees, windows and other synopses. Many research projects have therefore focused on building Data Stream Management Systems (DSMSs) to address these challenges [ACC03, ABW03, CCD03]. However, all of these systems are limited to simple continuous queries over data streams; they do not support advanced applications such as data stream mining. Yet such applications are critical in many real-world scenarios, including web click-stream analysis, market basket data mining, and credit card fraud detection. The importance of data stream mining is further illustrated by research projects that focus on devising fast and light algorithms for online mining [CWY04, JQS03, CZ04, WFY03, EKS98, FOR06, MTZ08]. Beyond devising such algorithms, however, deploying online data stream mining methods presents many difficult challenges. In particular, data stream mining methods must be deployed with all the essentials that DSMSs provide for simpler applications, including QoS, load shedding, and synopses. Thus, in this dissertation we extend a DSMS into an online data mining workbench through the following research advances: (1) the power of our DSMS, namely Stream Mill, is extended to support more advanced queries, such as online mining and sequence queries, by extending its query language (namely SQL); (2) a suite of online mining algorithms is integrated into the DSMS to provide advanced mining techniques, such as ensemble-based methods [WFY03, CZ04, FOR06]; and (3) data mining models and workflows are introduced to support specification of the complete mining process.
This improves ease of use, since users can simply invoke an existing workflow rather than recreating the flow themselves. The framework also allows experts to add new mining algorithms. We demonstrate that the resulting data stream mining workbench achieves performance and extensibility that are unmatched even by static mining workbenches.

 
Adviser: Carlo Zaniolo
School: UNIVERSITY OF CALIFORNIA, LOS ANGELES
Source: DAI/B 70-04, p. , Aug 2009
Source Type: Dissertation
Subjects: Computer science
Publication Number: 3354423
Access the complete dissertation:
 

» Find an electronic copy at your library.
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3354423
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one and to access a 24-page preview for free (if available).
