URN: etd-0813108-143807
Statistics: This thesis has been viewed 2776 times and downloaded 1449 times.
Author: Ming-Yuan Zhong
Author's Email Address: Not public
Department: Computer Science and Engineering
Year: 2007
Semester: 2
Degree: Master
Type of Document: Master's Thesis
Language: English
Page Count: 46
Title: Power Improvement Using Block-Based Loop Buffer with Innermost Loop Control
Keywords: Basic block, Trace cache, Innermost loop, Loop buffer

Abstract
A loop buffer is a small memory located between the CPU and the level-one instruction cache (hereafter IL1). Unlike the instruction cache, the loop buffer stores instructions only in sequential order, so it is smaller and faster than the main cache. The instruction fetch unit benefits most when the loop buffer is large enough to hold every instruction in a loop: the instructions are fetched from the cache only once, and the buffer then delivers them to the CPU core at very low energy cost.
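The energy argument above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the thesis's power model: the function name `fetch_energy` and the per-fetch costs `e_cache` and `e_buffer` are hypothetical, expressed in arbitrary units, and the cost of writing instructions into the buffer is ignored.

```python
def fetch_energy(loop_len, iterations, buffer_size, e_cache=1.0, e_buffer=0.1):
    """Return (energy_without_buffer, energy_with_buffer) in arbitrary units.

    e_cache and e_buffer are assumed per-instruction fetch costs for IL1
    and the loop buffer; they are illustrative, not measured values.
    """
    baseline = loop_len * iterations * e_cache

    # A loop that does not fit in the buffer (or runs only once) gains nothing.
    if loop_len > buffer_size or iterations < 2:
        return baseline, baseline

    # The first iteration is fetched from IL1; every later iteration is
    # served from the loop buffer at the lower per-fetch cost.
    with_buffer = loop_len * e_cache + loop_len * (iterations - 1) * e_buffer
    return baseline, with_buffer


if __name__ == "__main__":
    base, buffered = fetch_energy(loop_len=8, iterations=100, buffer_size=16)
    print(f"without buffer: {base:.1f}, with buffer: {buffered:.1f}")
```

In this model the saving grows with the iteration count and disappears as soon as the loop body exceeds the buffer capacity, which is the situation the abstract describes.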
In previous research, the controller begins detecting the innermost loop at the fetch stage, where whether a branch is treated as taken or not taken depends mainly on the branch predictor. Once a backward or forward branch in the loop is mispredicted, the controller has to flush the instructions in the buffer, then detect and refill a new loop from the main cache. Forward branches in particular are so unstable that the predictor is of little use for them, and the result is more wasted fetch power. We therefore try to bring the concept of a trace cache, which is quite bulky and complicated, into the architecture of the loop buffer. Using a trace cache as a loop buffer does save energy, but it degrades overall performance because of the long latency at the fetch stage. We propose two methods: (1) detecting the innermost loop at the commit stage while filling and activating the buffer at the fetch stage; and (2) helping the loop buffer store innermost loops that contain forward branches by packing the instructions captured from the instruction cache into basic blocks. With these modifications, we aim to strengthen the loop buffer so that it improves performance and further reduces power.
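The two proposed methods can be sketched in software as follows. This is a minimal illustration, not the hardware controller described in the thesis: the names `Committed`, `CommitLoopDetector`, and `split_basic_blocks` are hypothetical, instructions are assumed to be a fixed 4 bytes wide, and a loop is treated as innermost when the same taken backward branch commits twice with no other taken backward branch in between.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Committed:
    """One committed instruction (hypothetical record; 4-byte instructions assumed)."""
    pc: int
    is_branch: bool = False
    taken: bool = False
    target: int = 0


class CommitLoopDetector:
    """Detects an innermost loop from the committed (non-speculative) stream."""

    def __init__(self, buffer_size: int):
        self.buffer_size = buffer_size                    # capacity in instructions
        self.candidate: Optional[Tuple[int, int]] = None  # (start_pc, end_pc)

    def observe(self, ins: Committed) -> Optional[Tuple[int, int]]:
        """Return (start_pc, end_pc) once an innermost loop is confirmed, else None."""
        if ins.is_branch and ins.taken and ins.target <= ins.pc:
            loop = (ins.target, ins.pc)
            # Confirmed only if this same backward branch was the last one seen.
            confirmed = loop == self.candidate
            self.candidate = loop
            length = (ins.pc - ins.target) // 4 + 1
            if confirmed and length <= self.buffer_size:
                return loop
            return None
        # Committing an instruction outside the candidate range cancels it.
        if self.candidate and not (self.candidate[0] <= ins.pc <= self.candidate[1]):
            self.candidate = None
        return None


def split_basic_blocks(body: List[Committed]) -> List[List[int]]:
    """Pack a loop body into blocks of PCs, cutting after every branch.

    Simplification: a fuller version would also cut at branch targets.
    """
    blocks: List[List[int]] = []
    current: List[int] = []
    for ins in body:
        current.append(ins.pc)
        if ins.is_branch:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks
```

Because the detector only sees committed instructions, a mispredicted branch never reaches it, so a misprediction cannot trigger a flush and refill of the buffer. Storing the body as basic blocks is what lets a loop containing a forward branch stay in the buffer whichever way that branch goes.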
Results with SPEC2000 show reductions in instruction fetch power of up to 45% (integer benchmarks) and 55% (floating-point benchmarks) compared with a design without a loop buffer. Furthermore, we obtain a power improvement of 3% (integer benchmarks) and 2% (floating-point benchmarks) over a loop buffer design that handles loops at the fetch stage.
Advisor Committee:
Jong-Jiann Shieh - advisor
Chia-Ming Chang - co-chair
Rung-Bin Lin - co-chair
Date of Defense: 2008-06-26
Date of Submission: 2008-08-13