Off-chip bandwidth for multicore processors: Managing the next big wall
by Ahsan, Bushra, Ph.D., CITY UNIVERSITY OF NEW YORK, 2010, 101 pages; 3408483

Abstract:

As we approach a billion transistors on a chip, the number of on-chip cores is skyrocketing, and so is the traffic those cores generate. Recent studies have shown that this surge of traffic in multicores is bad news for supercomputing design, because applications running on multiple cores contend for off-chip bandwidth.

Traffic in a multicore system is divided into on-chip traffic (traffic amongst cores) and off-chip traffic (traffic between the chip and memory). The off-chip traffic is mainly generated by the on-chip cache hierarchy and is divided into traffic towards memory, due to writebacks, and traffic from memory, due to read misses. There is a huge body of research on managing cache hierarchies, improving their performance and hence reducing the number of cache misses. In that work, bandwidth requirements have always been of secondary importance. In the multicore and many-core era, this is no longer the case. The cache hierarchy designer must take into account both cache performance and the traffic generated by the cache, so as not to put pressure on the available bandwidth. If off-chip bandwidth is not managed, a 16-core machine will not give much performance benefit over a dual-core machine.

In most processor architectures, the cache hierarchy consists of several private caches per core, followed by a shared Last-Level Cache (LLC). The LLC is the last wall before going off-chip and is the source of the off-chip traffic towards memory, i.e. the writebacks. The LLC, therefore, is a highly important factor in off-chip traffic generation. We manage the LLC in order to attain overall off-chip bandwidth management in a multicore system. In this thesis, various methods to improve bandwidth by reducing traffic towards memory are proposed. We present hardware and hybrid techniques of varying complexities that work in concert to manage bandwidth for multicores. All of the proposed techniques require very little overhead and reduce off-chip traffic considerably while not affecting overall performance. Managing bandwidth brings us closer to the ultimate goal of a supercomputer on a chip.
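The two directions of off-chip traffic named in the abstract (writebacks towards memory, fills from memory on misses) can be illustrated with a toy model. The sketch below is not one of the thesis's techniques; it is a minimal, hypothetical write-back cache standing in for an LLC, assuming full associativity, LRU replacement, write-allocate on write misses, and 64-byte lines. The class name `WriteBackLLC` and all parameters are invented for illustration.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed, not from the thesis)


class WriteBackLLC:
    """Toy fully associative, LRU, write-back, write-allocate cache."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # line number -> dirty flag, LRU first
        self.fills = 0              # lines fetched from memory on a miss
        self.writebacks = 0         # dirty lines written to memory on eviction

    def access(self, address, is_write):
        line = address // LINE_SIZE
        if line in self.lines:
            # Hit: no off-chip traffic; refresh LRU position.
            self.lines.move_to_end(line)
            if is_write:
                self.lines[line] = True
            return
        # Miss: the line is fetched from memory (traffic from memory).
        self.fills += 1
        if len(self.lines) >= self.num_lines:
            _, dirty = self.lines.popitem(last=False)  # evict the LRU line
            if dirty:
                # Dirty victim: traffic towards memory.
                self.writebacks += 1
        self.lines[line] = is_write

    def off_chip_bytes(self):
        return (self.fills + self.writebacks) * LINE_SIZE


# A two-line cache: two writes fill it, two further misses evict dirty lines.
llc = WriteBackLLC(num_lines=2)
for address, is_write in [(0, True), (64, True), (128, False), (0, False)]:
    llc.access(address, is_write)
print(llc.fills, llc.writebacks, llc.off_chip_bytes())
```

In this trace every access misses, so four lines are fetched, and the two evicted lines are both dirty, so two writebacks go towards memory. A policy that lowers the writeback count without raising the miss count would reduce off-chip traffic in the sense the abstract describes.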

 
Advisers: Mohamed Zahran; Tarek N. Saadawi
School: CITY UNIVERSITY OF NEW YORK
Source: DAI/B 71-07, p. , Jul 2010
Source Type: Dissertation
Subjects: Computer engineering; Electrical engineering
Publication Number: 3408483
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3408483