Software is becoming a larger fraction of engineering effort. Aggressive compiler optimization, in particular those that address loops, can significantly improve the performance of the software, thus justifying the additional compilation time requirements. This is in particular true in the embedded system domain where software has become a key element of the design process and performance/energy is of a critical concern. A particular area in software optimization, where little work has been established, is control flow performance and energy optimization, which is the main focus of this thesis.
Arithmetic expressions are the fundamental building blocks of hardware and software systems. In this thesis we first propose a method for solving the arithmetic expression equivalence problem using partial evaluation. In our method, we use interval analysis to substantially prune the domain space of arithmetic expressions and limit the evaluation effort to a sufficiently limited set of subspaces. Using this technique, we present three novel control flow optimization techniques to improve performance and energy in software.
In the first technique, we present a short-circuit code transformation method to optimize conditional blocks in high-level programs. Specifically, the transformation takes advantage of the fact that the Boolean value of the conditional expression, determining the true/false paths, can be statically analyzed to determine cases when one or the other of the true/false paths are guaranteed to execute. In such cases, code is generated to bypass the evaluation of the conditional expression. In instances when the bypass code is faster to evaluate than the conditional expression, a net performance gain is obtained.
In the second technique, we present a loop transformation method to optimize loops containing nested conditional blocks. Results from conditional expression evaluation combined with loop dependency information are used to partition the iteration space of the nested loop. As a result, the loop nest is decomposed such as to eliminate the conditional test, thus substantially reducing the execution time. Our technique completely eliminates the conditional from the loops (unlike previous techniques) thus further facilitating the application of traditional optimizations and improving the overall speedup.
In the third technique, we present a method to lower the software energy consumption using the technique mentioned above for loop transformation. In essence, each of the smaller loops can be executed at a lower voltage/frequency setting, yielding overall energy reduction.
Applying the proposed transformation techniques on kernels taken from different application domains, we measured a substantial performance and energy improvement.