Support for dynamic management of parallelism in chip multiprocessors
by Contreras Salas, Gilberto, Ph.D., PRINCETON UNIVERSITY, 2008, 147 pages; 3323178

Abstract:

In recent years, the microprocessor industry has been revolutionized by the introduction of the chip multiprocessor (CMP). Created as an alternative to single-core designs, CMPs promise to mitigate two of the most serious challenges of modern high-performance single-core processors: design complexity and power consumption [63][65][75].

Workloads that rely on throughput are likely to benefit from CMP architectures with modest effort. However, extending the performance potential of CMPs broadly to sequential applications remains a difficult problem. Conventional compiler approaches have largely failed to extract sufficient thread-level parallelism from single-threaded applications to take advantage of many cores [80][90], leaving it to the programmer to extract cost-effective parallelism.

To make parallel applications easier to develop, industry and academia have built parallel runtime systems and libraries that let programmers focus on identifying parallelism rather than on how that parallelism is managed and mapped to the underlying architecture [15][38][39][42][56][76][81]. Dynamic management of parallelism, the ability to take created parallelism and dynamically assign it to available execution resources, is used by many runtime libraries, such as OpenMP and the Intel Threading Building Blocks runtime library, to improve performance. While parallel runtime libraries make it easier for programmers to develop parallel code, software-based dynamic management of parallelism imposes a performance cost on parallel applications, because the runtime library must be called to make runtime decisions. For aggressively annotated parallel code, software-based runtime libraries can expose these management overheads, which at significant levels render the existing parallelization approach cost-ineffective. Moreover, because parallelism management cost grows with core count, the performance portability of applications across large core counts is severely affected.
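As a rough illustration of what "dynamically assign it to available execution resources" means, the sketch below has worker threads pull tasks from a shared queue whenever they become free. It is not taken from the dissertation and is not how OpenMP or Threading Building Blocks are implemented; the names `run_dynamic`, `worker`, and `handled` are hypothetical. Each pull from the queue is a synchronized operation, which stands in for the per-task management cost described above. In CPython the global interpreter lock prevents a real speedup for CPU-bound tasks, so the sketch shows only the assignment policy, not the performance.

```python
import queue
import threading


def run_dynamic(tasks, num_workers):
    """Run zero-argument callables on num_workers threads, assigning work on demand.

    Hypothetical helper for illustration only.
    """
    work = queue.Queue()
    for index, task in enumerate(tasks):
        work.put((index, task))

    results = [None] * len(tasks)
    handled = [0] * num_workers  # number of tasks each worker ended up taking

    def worker(worker_id):
        while True:
            try:
                # Every pull is a synchronized runtime decision; this models
                # the software management overhead paid once per task.
                index, task = work.get_nowait()
            except queue.Empty:
                return  # all tasks were enqueued up front, so empty means done
            results[index] = task()
            handled[worker_id] += 1

    threads = [
        threading.Thread(target=worker, args=(worker_id,))
        for worker_id in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, handled


if __name__ == "__main__":
    # Tasks of uneven size: a free worker simply takes the next one.
    tasks = [lambda n=n: sum(range(n)) for n in (10, 100_000, 20, 50_000, 30)]
    results, handled = run_dynamic(tasks, num_workers=2)
    print(results)
    print(handled)  # how the tasks were split varies from run to run
```

The finer the tasks, the more often the shared queue is touched, which is why aggressively annotated code exposes the management overhead more readily.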

This dissertation proposes a low-overhead, low-latency dynamic parallelism management solution aimed at improving parallelism performance. The proposed solution not only allows parallel applications to make effective use of large core counts, but it also allows them to gracefully adapt to dynamic changes in system characteristics such as core-speed and core-count variations. To this end, this work sets forth four overarching goals: (1) perform an in-depth characterization of two popular parallel runtime libraries with the goal of identifying some of the benefits and shortcomings in their dynamic management of parallelism; (2) provide a detailed study of how software-based approaches are able to, or fail to, mitigate performance heterogeneity caused by technology variations; (3) develop parallelism redistribution policies that utilize global information with the aim of improving load balancing and performance scalability; and (4) describe Squadron, a comprehensive framework aimed at providing superior performance through low-overhead, low-latency dynamic management of parallelism capable of achieving performance improvements ranging from 18% to 13X over existing software-based solutions.

The end result of this dissertation is a detailed study of dynamic management of parallelism in software, as well as its performance potential under hardware support. The characterization results presented in this work can help runtime system designers create better designs by offering insight into some of the major sources of overhead currently limiting the scalability of software solutions. Squadron serves as the first step in the development of an attractive solution for future CMP architectures looking to offer superior parallelism performance through specialized hardware support.

 
Adviser: Margaret R. Martonosi
School: PRINCETON UNIVERSITY
Source: DAI/B 69-08, p. , Nov 2008
Source Type: Dissertation
Subjects: Electrical engineering
Publication Number: 3323178
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3323178