Modeling execution and predicting performance in multi-GPU environments
by Schaa, Dana, M.S., NORTHEASTERN UNIVERSITY, 2009, 76 pages; 1468036

Abstract:

Graphics processing units (GPUs) have become widely accepted as the computing platform of choice in many high-performance computing domains because, for many parallel applications, a single GPU can approach or exceed the performance of a large cluster of CPUs. Obtaining high performance on a single GPU has been widely researched, and researchers typically present speedups on the order of 10-100X for applications that map well to the GPU programming model and architecture. Progressing further, we now wish to use multiple GPUs, either to obtain still larger speedups or to allow applications to work with more data or finer-grained data.

Although existing work has used multiple GPUs as parallel accelerators, a study of the overheads and benefits of doing so has been lacking. Because the overheads affecting GPU execution are not as obvious or well known as those affecting CPUs, developers may be reluctant to invest the time to create a multiple-GPU implementation, or to invest in additional hardware, without knowing whether execution will benefit. This thesis investigates the major factors of multi-GPU execution and creates models that allow them to be analyzed. The ultimate goal of our analysis is to allow developers to easily determine how a given application will scale across multiple GPUs.

Using the scalability models (including communication models) presented in this thesis, a developer is able to predict the performance of an application with a high degree of accuracy. For the applications evaluated in this work, we saw an 11% average difference and a 40% maximum difference between predicted and actual execution times. The models can represent various numbers and configurations of GPUs, as well as various data sizes, all without having to purchase hardware or fully implement a multiple-GPU version of the application. The performance predictions can then be used to select the optimal cost-performance point, allowing the appropriate hardware to be purchased for the given application's needs.
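
As an illustration of how average and maximum differences between predicted and actual execution times can be computed, here is a minimal sketch. It assumes the difference is measured as a percentage of the actual time, which the abstract does not state. The function names and the timing values are hypothetical and are not taken from the thesis.

```python
def percent_difference(predicted, actual):
    """Absolute prediction error as a percentage of the actual time.

    Assumption: error is normalized by the actual (measured) time;
    the abstract does not say which normalization the thesis uses.
    """
    return abs(predicted - actual) / actual * 100.0


def summarize(pairs):
    """Return (average, maximum) percent difference over (predicted, actual) pairs."""
    diffs = [percent_difference(p, a) for p, a in pairs]
    return sum(diffs) / len(diffs), max(diffs)


# Hypothetical (predicted, actual) execution times in seconds, for illustration only.
runs = [(9.0, 10.0), (21.0, 20.0), (4.0, 5.0)]
average, maximum = summarize(runs)
print(f"average difference: {average:.1f}%, maximum difference: {maximum:.1f}%")
```

Reporting both figures is useful because the average describes typical accuracy while the maximum shows the worst-case prediction among the evaluated runs.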

 
Adviser: David Kaeli
School: NORTHEASTERN UNIVERSITY
Source: MAI/ 48-01, p. , Oct 2009
Source Type: Thesis
Subjects: Electrical engineering
Publication Number: 1468036
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:1468036
