The Composite Endpoint Protocol (CEP): High-performance partial content distribution
by Weigle, Eric Haynes, Ph.D., UNIVERSITY OF CALIFORNIA, SAN DIEGO, 2007, 201 pages; 3268351

Abstract:

This dissertation introduces the Composite Endpoint Protocol (CEP), which solves two related problems: large-scale high performance transfers and partial content distribution. Achieving high performance in large-scale networks, with speeds above 1 Gbps and latency up to 200 ms, is difficult; individual machines cannot fully exploit overall system capacity, and existing protocols (e.g., TCP) have well-known problems. Similarly, while whole-file content distribution is well studied, new techniques are required when individual clients each desire different parts of a file. The core algorithms and abstractions needed to exploit large-scale networks or provide sub-file distribution semantics do not exist.

The underlying problem is fundamental: transfer scheduling. Given a set of heterogeneous nodes which have data and nodes which need some subset of that data, perform transfers to best satisfy all nodes’ demands. No strong semantics are implied here; subsets of this data may be replicated or missing, may not fall on block/word boundaries, etc. The solution is a transfer scheduler which implicitly or explicitly specifies which nodes transfer what data and when.
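
As a rough illustration of the problem statement only, the sketch below sets up a toy instance in Python and applies a simple greedy heuristic: each requested byte range is cut wherever a source's holdings begin or end, and each piece goes to the holder that would finish it earliest. This is not one of the dissertation's algorithms. The names `Source`, `schedule`, `rate`, and `max_piece`, and the assumption that per-node send rates are known in advance, are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    rate: float                     # assumed-known send rate, bytes per second
    ranges: list                    # half-open byte ranges [start, end) held
    busy_until: float = 0.0         # seconds of sending already assigned

def schedule(sources, requests, max_piece=4):
    """Greedy toy scheduler (hypothetical, not CEP's).

    requests: list of (receiver, start, end) half-open byte ranges.
    Returns a list of (source_name_or_None, receiver, start, end).
    """
    plan = []
    for receiver, start, end in requests:
        # Cut at every source-range boundary inside the request, so each
        # piece is either wholly held by a source or not held by it at all.
        cuts = {start, end}
        for s in sources:
            for a, b in s.ranges:
                cuts.update(p for p in (a, b) if start < p < end)
        # Also cap piece size so work can be spread across replicas.
        cuts.update(range(start + max_piece, end, max_piece))
        points = sorted(cuts)
        for lo, hi in zip(points, points[1:]):
            holders = [s for s in sources
                       if any(a <= lo and hi <= b for a, b in s.ranges)]
            if not holders:
                plan.append((None, receiver, lo, hi))   # data is missing
                continue
            # Pick the holder with the earliest estimated finish time.
            best = min(holders, key=lambda s: s.busy_until + (hi - lo) / s.rate)
            best.busy_until += (hi - lo) / best.rate
            plan.append((best.name, receiver, lo, hi))
    return plan

# Two sources with overlapping (replicated) and missing data.
sources = [Source("A", 2.0, [(0, 8)]), Source("B", 1.0, [(4, 12)])]
plan = schedule(sources, [("r1", 1, 9), ("r2", 10, 14)])
for entry in plan:
    print(entry)
```

In this toy instance, bytes 4 to 8 are replicated on both sources, so the heuristic alternates between them according to their assumed rates, while bytes 12 to 14 are held by no source and are reported with `None` as the sender. A real scheduler would also have to react to changing system state, which this static sketch ignores.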

CEP solves the transfer scheduling problem using minimal centralization for metadata/scheduling and infrastructure for fully distributed data transmission. Hybrid centralized/distributed algorithms and heuristics dynamically generate the most desirable transfers as system state evolves. In this way, CEP both enables large-scale high performance transfers and provides rich partial content distribution semantics. The dissertation includes the following contributions: (1) An efficient mechanism for multiple heterogeneous nodes/processes (a composite endpoint) to take part in a single logical connection, where core algorithms run in O(n log n) for the common case; (2) Simple, flexible interfaces for describing data layouts and composite endpoint communication, backed by a general mathematical abstraction; (3) Multiple transfer scheduling algorithms which produce high performance (over 10 Gbps), high resolution, and, when possible, provably optimal output, with detailed analysis of each; (4) A scalable and robust composite endpoint architecture which supports tens of thousands of participants and transparently survives server failures.

We describe the theoretical and real-world underpinnings of this problem, including in-depth analysis of the algorithms involved, discuss two implementations of the Composite Endpoint Protocol, and provide an empirical evaluation showing the benefits of CEP under a variety of conditions: over 10× faster than Apache, BitTorrent, DHTs, or uniform striping techniques.

 
Adviser: Andrew A. Chien
School: UNIVERSITY OF CALIFORNIA, SAN DIEGO
Source: DAI/B 68-05, p. , Nov 2007
Source Type: Dissertation
Subjects: Computer science
Publication Number: 3268351

  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3268351