Complementary compiler and architecture features for embedded VLIW processors
by Stotzer, Eric J., Ph.D., UNIVERSITY OF HOUSTON, 2010, 166 pages; 3470430

Abstract:

High performance embedded computing is characterized by data-intensive loop kernels and abundant parallelism. As the computational requirements and complexity for such systems continue to increase, technologies are adapted from the general-purpose and scientific computing domains while still meeting strict embedded system cost, performance, and power constraints. To meet this challenge, embedded processor designers rely extensively on architectural parallelism to increase system performance and compact instruction encodings to reduce program code size.

Instruction-level parallelism (ILP) is a combination of compilation techniques and architectural features that exploit the fine-grain parallelism present at the machine instruction level. Very long instruction word (VLIW) processors are designed to exploit ILP and have multiple functional units partitioned into clusters with local register files. VLIW processors require optimizing compilers that statically schedule resources before program execution. Due to limits in the scalability of single-processor systems and improvements in the transistor density of integrated circuits, multiple VLIW processors are now connected together along with specialized accelerators on single-chip systems. This situation creates new challenges and opportunities for compilers to exploit both coarse-grain thread-level parallelism (TLP) executing on multiple processors and fine-grain ILP within each processor.

In this dissertation, we first present novel complementary compiler and architecture technologies that improve the power and performance efficiency of an existing embedded VLIW processor. This is accomplished by reducing program code size and improving the performance of software-pipelined loops. We then present a model for prototyping and programming tightly coupled accelerators. Finally, we show the initial results of implementing OpenMP for an embedded multicore VLIW processor. The experimental results show that the combination of architecture enhancements and compiler optimizations dramatically improves the efficiency of embedded application code.

 
School: UNIVERSITY OF HOUSTON
Source: DAI/B 71-08, p. , Aug 2010
Source Type: Dissertation
Subjects: Electrical engineering; Computer science
Publication Number: 3470430
