Chip multiprocessors (CMPs) are becoming increasingly popular as the performance improvements brought by increasing clock frequency alone approach their limits. Other factors, such as the easier verification and validation of individual cores (compared to complex single-core architectures) and the ability to exploit both thread-level (coarse-grain) and instruction-level (fine-grain) parallelism, also drive the trend toward chip multiprocessing.
While CMPs have already made their way into the commercial market, software support for CMPs is still in its infancy and is expected to be the main roadblock to their effective use. As the number of processor cores keeps increasing, fully utilizing the abundant computing resources on CMPs becomes a critical issue. CMPs can be exploited in two ways: a single-application scenario and a multiple-application scenario. In the first case, the entire CMP is dedicated to one application at a time. This option can be effectively exploited by applications that are growing increasingly complex and data intensive (particularly large codes from the scientific computing, database, and embedded image/video processing domains), at least until the number of cores on the chip grows beyond what a single application can utilize. In the multiple-application scenario, also called multi-tasking, multiple (independent) applications execute on the CMP at the same time. This is expected to be an important alternative, at least in the short term, especially for applications with limited parallelism.
Parallelization has been studied for many years, since the invention of the first parallel machine. Nevertheless, many open problems remain, such as compiler-based automatic parallelization. The multicore era brings new challenges to research on parallel architectures and applications. For example, on-chip resources such as caches and interconnects are now shared by multiple processing elements, unlike in traditional parallel or distributed systems.
Motivated by these observations, this dissertation focuses on how to adapt application execution dynamically to improve energy efficiency and quality of service by exploiting application-level characteristics in resource management. There are two major reasons to study dynamic adaptation of applications. First, an application can have different characteristics and computing requirements during different phases of its computation. Second, future computer architectures will exhibit parameter variation and heterogeneity, whether from process variation or from deliberately heterogeneous designs. My work investigates dynamic application adaptation, the partitioning of processor cores among multiple applications, and different thread scheduling schemes, both for threads of the same application and for threads across concurrently running independent applications.
Performance has historically been the most important metric in computing. This has changed recently, as optimization metrics other than performance have become increasingly important, including, among others, availability and energy efficiency. In many execution scenarios involving CMPs, satisfying multiple metrics (e.g., achieving high availability and low energy consumption) can be critical. In the first half of this dissertation, several approaches are studied for adapting application execution at runtime to improve energy efficiency. Different metrics are used, with an emphasis on the tradeoff between performance and energy consumption. Helper threads are proposed to collect characteristics of the application's execution threads and to determine the ideal resource allocation schemes.
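To make the helper-thread idea concrete, the following is a minimal sketch, not the dissertation's actual implementation: a monitor thread periodically samples per-application progress counters (standing in for hardware performance counters such as instructions retired) and proposes a core allocation proportional to each application's observed demand. The counter dictionary, the sampling interval, and the largest-remainder allocation policy are all illustrative assumptions.

```python
import threading
import time

def propose_allocation(rates, total_cores):
    """Policy sketch: give each application at least one core, and split
    the remaining cores in proportion to its observed demand."""
    apps = list(rates)
    alloc = {a: 1 for a in apps}                      # one-core minimum
    spare = total_cores - len(apps)
    total = sum(rates.values()) or 1
    shares = {a: spare * rates[a] / total for a in apps}
    for a in apps:
        alloc[a] += int(shares[a])                    # integer part first
    leftover = total_cores - sum(alloc.values())
    # hand out any remaining cores by largest fractional remainder
    for a in sorted(apps, key=lambda a: shares[a] - int(shares[a]),
                    reverse=True)[:leftover]:
        alloc[a] += 1
    return alloc

class HelperThread(threading.Thread):
    """Samples per-application progress counters at a fixed interval and
    records a suggested core allocation after each sampling round."""
    def __init__(self, counters, total_cores, interval=0.05, rounds=3):
        super().__init__(daemon=True)
        self.counters, self.total_cores = counters, total_cores
        self.interval, self.rounds = interval, rounds
        self.allocation = {}

    def run(self):
        prev = dict(self.counters)
        for _ in range(self.rounds):
            time.sleep(self.interval)
            rates = {a: self.counters[a] - prev[a] for a in self.counters}
            prev = dict(self.counters)
            self.allocation = propose_allocation(rates, self.total_cores)

if __name__ == "__main__":
    # Simulated workload: application "A" makes progress 3x faster than "B".
    counters = {"A": 0, "B": 0}
    helper = HelperThread(counters, total_cores=8)
    helper.start()
    for _ in range(30):
        counters["A"] += 3
        counters["B"] += 1
        time.sleep(0.01)
    helper.join()
    print(helper.allocation)
```

In a real system the counters would come from hardware performance monitoring units, and the allocation decision would be enacted by pinning threads to cores; here the helper merely records its suggestion.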
Managing quality of service (QoS) and providing service differentiation have attracted considerable research interest recently, especially as CMPs become prevalent and virtualization environments are widely deployed. The lack of quality assurance on CMPs has become a major concern because concurrently executing applications, or even threads of the same application, can compete arbitrarily for shared resources such as cores, shared caches, and off-chip memory bandwidth. Various hardware resource partitioning and reservation schemes have been discussed in the literature. In the second half of this dissertation, a software approach to QoS is investigated that dynamically tunes the time-slices of multi-application workloads in time-sharing operating systems. To satisfy the QoS requirements specified by users, a formal feedback control framework is built that dynamically tunes the process nice values and associated CPU time-slices of simultaneously running applications. Experimental results show that the proposed framework can successfully track quality-of-service targets and provide service differentiation.
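The feedback-control idea can be illustrated with a small sketch, which is an assumption-laden simplification rather than the dissertation's framework: a proportional controller compares an application's measured CPU share against its QoS target and nudges its nice value accordingly. The weight model (each nice step scaling a task's scheduler weight by roughly 1.25, as in Linux CFS), the analytic share measurement, and the controller gain are all illustrative; a real controller would sample actual progress and apply the new nice value via setpriority/renice.

```python
def cfs_weight(nice):
    """Approximate proportional-share weight: each nice step changes a
    task's weight by roughly a factor of 1.25 (Linux CFS convention)."""
    return 1024.0 * (1.25 ** (-nice))

def measured_share(nices, app):
    """CPU time-slice share the app receives under proportional-share
    scheduling; an analytic stand-in for measuring one sampling period."""
    weights = {a: cfs_weight(n) for a, n in nices.items()}
    return weights[app] / sum(weights.values())

def control_step(nices, app, target, gain=8.0):
    """One proportional feedback step: if the app's measured share is
    below its QoS target, lower its nice value (raise its priority),
    and vice versa. Nice values are clamped to the Linux range."""
    error = target - measured_share(nices, app)
    nices[app] = max(-20.0, min(19.0, nices[app] - gain * error))
    return error

# Drive application "A" toward 60% of the CPU against a background app "B".
nices = {"A": 0.0, "B": 0.0}
for _ in range(50):
    control_step(nices, "A", target=0.60)
```

The proportional gain trades convergence speed against stability; the formal framework in the dissertation would derive such parameters from a system model rather than picking them by hand.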