We propose a new approach that automatically parallelizes Java programs. The approach collects trace information online during program execution and dynamically recompiles methods that can be executed in parallel. We also describe a cost/benefit model for making parallelization decisions, as well as a parallel execution environment for running the parallelized code. We implement these techniques in Jikes RVM. Finally, we evaluate our approach by parallelizing sequential benchmarks and comparing their performance to manually parallelized versions of those benchmarks. The experimental results show that our approach incurs low overhead and achieves speedups competitive with manually parallelized code.

1 Introduction

Multiprocessing has become mainstream in both personal computers and servers. Even in embedded devices, CPUs with two or more cores are increasingly common. However, software development is failing to keep pace with hardware: designing programs for multiprocessors is still a difficult task that requires considerable experience and skill. Furthermore, a large number of legacy programs designed for single-processor computers are still in use and need to be parallelized for better performance. All of these facts call for a good approach to parallelizing programs. A compiler that automatically translates source code into executable code for multiprocessor computers is a promising solution to this challenge. Much research has been conducted in this field, demonstrating the capability of compiler-based parallelization, especially for scientific and numerical applications. However, these approaches have several limitations due to the lack of dynamic [...]

[...]

To address this problem, we compact a section of contiguous addresses into a single data record, which in our experiments saves a substantial amount of memory.
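The compaction idea above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class and method names (`AddressCompactor`, `record`, `Range`) are hypothetical, and it assumes addresses arrive in sorted order so that a run of contiguous addresses can be folded into one start/end record.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: fold a sorted stream of recorded memory
 * addresses into [start, end] range records, so that a run of N
 * contiguous addresses costs one record instead of N entries.
 */
public class AddressCompactor {
    /** One compacted record covering addresses start..end (inclusive). */
    static final class Range {
        final long start;
        long end;
        Range(long start, long end) { this.start = start; this.end = end; }
        long size() { return end - start + 1; }
    }

    private final List<Range> ranges = new ArrayList<>();

    /** Record one accessed address; extends the last range when contiguous. */
    public void record(long addr) {
        if (!ranges.isEmpty()) {
            Range last = ranges.get(ranges.size() - 1);
            if (addr == last.end + 1) { last.end = addr; return; } // extend run
            if (addr >= last.start && addr <= last.end) return;    // already covered
        }
        ranges.add(new Range(addr, addr)); // start a new record
    }

    /** Number of compacted records stored. */
    public int recordCount() { return ranges.size(); }

    /** Total number of addresses represented by all records. */
    public long addressCount() {
        long n = 0;
        for (Range r : ranges) n += r.size();
        return n;
    }
}
```

Recording 100 consecutive addresses followed by one distant address yields two records covering 101 addresses, which is where the memory saving comes from.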
We also simplify dependency analysis by introducing the dependent section: the contiguous section of a trace that contains all instructions carrying dependences on another trace. For example, suppose a loop body has 100 instructions, and only the 80th through 90th instructions carry dependences between iterations of the loop. After dependency analysis, the dependent section of the loop-body trace spans instructions 80 to 90. The dependent section is motivated by the observation that usually only a small section of the instructions in a trace carries dependences. Furthermore, using a single dependent section per trace significantly reduces the synchronization/blocking overhead of the busy-waiting mode used in our parallel execution model.