Tree-based methods to model dependent data
by Mendez, Guillermo, Ph.D., ARIZONA STATE UNIVERSITY, 2008, 151 pages; 3304864

Abstract:

It is well known that observations gathered in the real world are not perfectly independent, as many data analyses assume. When the clustering structure is known, a linear mixed model (LMM) is typically used and provides a powerful tool. One application of LMMs is small area estimation, that is, predicting population means of a variable of interest when only a few observations are sampled from each population. In contrast to LMMs, most supervised learning methods do not take into account whether the data have a clustered structure. The random forest, one such method that combines many randomized decision trees, is a popular algorithm for modeling large, complex data sets because it tends to produce accurate predictions.

In this dissertation, two estimators of residual variance based on random forests are proposed and studied through simulations. A robust modeling technique for mixed-effects data, called Mixed Random Forest (MRF), is then proposed; it uses regression trees and accounts for the clustered structure of the data. The performance of the MRF algorithm is compared to that of LMMs for different underlying functions and different values of the variance components. The MRF method is shown to perform better in terms of mean squared prediction error (MSPE) when the underlying function is complex, for example conditionally linear. The theoretical MSPE of the predicted group mean is also derived, and an estimator of the MSPE is proposed. The performance of the MSPE estimator is investigated via simulations, and the results support the theoretical derivation. Finally, the MRF method is applied to data from the American Community Survey in the small area estimation context.
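
For readers unfamiliar with the small area estimation setting that the abstract refers to, the following is a minimal sketch of the standard random-intercept (LMM) shrinkage predictor of a group mean, assuming the variance components are known. It is not the dissertation's MRF algorithm, which the abstract does not specify; the function name, argument names, and toy data are illustrative assumptions.

```python
from statistics import mean


def shrinkage_group_means(groups, sigma2_u, sigma2_e):
    """Predict group means under y_ij = mu + u_i + e_ij.

    groups   : dict mapping a group label to a list of observations
    sigma2_u : between-group variance (assumed known and > 0)
    sigma2_e : residual variance (assumed known and >= 0)
    """
    if sigma2_u <= 0 or sigma2_e < 0:
        raise ValueError("sigma2_u must be > 0 and sigma2_e must be >= 0")

    # Shrinkage weight for each group: larger samples are shrunk less.
    gamma = {
        g: sigma2_u / (sigma2_u + sigma2_e / len(ys))
        for g, ys in groups.items()
    }
    ybar = {g: mean(ys) for g, ys in groups.items()}

    # Generalized least squares estimate of the overall mean: group means
    # weighted by 1 / Var(ybar_i), which is proportional to gamma_i.
    mu_hat = sum(gamma[g] * ybar[g] for g in groups) / sum(gamma.values())

    # Each predicted group mean is pulled from its sample mean toward mu_hat.
    return {g: mu_hat + gamma[g] * (ybar[g] - mu_hat) for g in groups}


# Toy data (made up for illustration): group "b" has a single observation,
# so its prediction is shrunk more strongly toward the overall mean.
toy_groups = {"a": [1.0, 3.0], "b": [10.0]}
predictions = shrinkage_group_means(toy_groups, sigma2_u=1.0, sigma2_e=1.0)
print(predictions)
```

With few observations per group, the weight on the group's own sample mean is small, which is why model-based predictors of this kind are used when each population contributes only a handful of sampled units.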

 
Advisor:
School: ARIZONA STATE UNIVERSITY
Source: DAI/B 69-03, p. , Jun 2008
Source Type: Dissertation
Subjects: Mathematics; Statistics; Computer science
Publication Number: 3304864
Access the complete dissertation:
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3304864
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one and to access a 24-page preview for free (if available).
