Free factories: From the quantum coreworld to the Personal Genome Project
by Zaranek, Alexander Wait, Ph.D., HARVARD UNIVERSITY, 2009, 132 pages; 3351021

Abstract:

This dissertation develops technical and governance infrastructure for a "free factory" by building on parallels with free and open source software and related communities. By viewing varied technologies and people as comprising free factories—or a federation of co-operating and competing factories with certain common ideals and infrastructure—I argue many scientific questions become easier to answer.

In the first chapter, I briefly summarize the dissertation. I then describe the hardware, staff and other resources required to implement the computational aspects of a free factory with reasonable economies of scale. In the next chapter, I use the infrastructure to search for DNA and RNA editing events in more than 600 million genomic traces from ten organisms at NCBI. I find numerous examples of traces that support the existence of these phenomena and set the stage for a more comprehensive investigation. The subsequent chapter uses the same tools to analyze four individual human genomes for variants of clinical interest. This work demonstrates such analyses need not lead to costly or harmful medical workup. In the last chapter, I describe the initial data release of the Personal Genome Project. The release is derived from two gigabases of targeted sequence data from ten individuals. I investigate the quality of the data by comparison with Affymetrix 500K SNPs and discuss one variant of clinical interest. This data release—linking scientists, physicians and members of the general public—demonstrates the utility of free factories for advancing the state-of-the-art in personalized, genomic medicine.

In Appendix A, I indicate how the Quantum Coreworld—earlier work on a digital evolution system consistent with the rules of quantum information processing—could efficiently use free factories. Such projects could allow free factories to fully utilize idle resources. Finally, in Appendix B, a novel, open-source primary data analysis pipeline is used to reprocess 100 gigabytes of image data derived from the exome of a Personal Genome Project participant. This approach demonstrates a 14% increase in placeable reads, on the PGP sample, over the vendor's pipeline.

 
Adviser: George M. Church
School: HARVARD UNIVERSITY
Source: DAI/B 70-03, p. , May 2009
Source Type: Dissertation
Subjects: Bioinformatics; Biophysics; Computer science
Publication Number: 3351021
Access the complete dissertation:

Find an electronic copy at your library.
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3351021
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase a copy and, if available, to access a 24-page preview for free.
