We evaluated the performance, power consumption, and resource consumption of Clockhands, STRAIGHT, and existing RISC. We also developed a Clockhands soft-core processor written in SystemVerilog and used it for the hardware evaluation.
We used a cycle-accurate simulator, Onikiri2 [45], for the performance evaluation and McPAT [20] for the power consumption evaluation. Onikiri2 is an execution-driven simulator similar to gem5 [3], but it can simulate more detailed pipeline behavior, including various speculations and replays. We implemented a Clockhands 32-bit 166-instruction RV64G-compatible ISA on Onikiri2. We also extended Onikiri2 to simulate the Clockhands pipeline behavior accurately. The parameters of the processors used in the evaluation are listed in Table 2. The parameters of the 6-fetch model are derived from those of the Apple M1 processor [14]. In the larger models, we aggressively enlarged the ROB because, in the current mainstream architecture, it does not have complex functions such as associative search. In contrast, we enlarged the scheduler and the load-store queue conservatively because of their complex structure and because their expandability is debatable.
The benchmark programs used for our evaluation were bzip2, mcf_s, lbm_s, and xz_s, included in SPEC2006/2017 [40, 41], and CoreMark [8]. We use these benchmarks, which are written entirely in C, because we have so far been able to develop only a C compiler; C++ and Fortran compilers are very complex and require a great deal of effort to develop. For each program, we used the representative regions used in a previous STRAIGHT study [17]. We modified them so that they contain more than 50M instructions for the SPEC benchmarks.
The benchmark programs were compiled using LLVM [19]. Our compiler was built on top of LLVM version 12.0.1 and implements the algorithms described in Section 6. The compiler for RISC-V is the same version of LLVM, and the compiler for STRAIGHT was obtained from the authors of the existing study [13].
7.2 Results
1) Performance: Fig. 13 shows the performance of each model. The figure plots the inverse of the number of cycles needed to run the benchmark, normalized to the value for RISC-V. The result indicates that the performance of Clockhands is almost the same as that of RISC-V, while Clockhands has the advantage of not needing renaming. In the 6-fetch and larger models, the performance improvement continues up to 16-fetch, even though we used configurations with the same back-end complexity. The performance of Clockhands is 97.9%, 97.3%, 98.9%, 100.0%, and 101.6% of that of RISC-V in the 4-fetch, 6-fetch, 8-fetch, 12-fetch, and 16-fetch models, respectively. The performance of Clockhands is 9.9%, 7.6%, 6.6%, 6.5%, and 7.2% higher than that of STRAIGHT in the 4-fetch, 6-fetch, 8-fetch, 12-fetch, and 16-fetch models, respectively.
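The normalization used for Fig. 13, and the relation between the two series of percentages above, can be sketched as follows. This is a minimal illustration and not the evaluation script used for the paper: the function names are hypothetical, and the STRAIGHT-to-RISC-V ratio is derived here from the two reported series rather than stated in the text, so it is only approximate if the reported averages are not multiplicative (for example, not geometric means).

```python
FETCH_WIDTHS = [4, 6, 8, 12, 16]

# Reported in the text: Clockhands performance as a percentage of RISC-V.
CLOCKHANDS_VS_RISCV = [97.9, 97.3, 98.9, 100.0, 101.6]
# Reported in the text: Clockhands improvement over STRAIGHT, in percent.
CLOCKHANDS_OVER_STRAIGHT = [9.9, 7.6, 6.6, 6.5, 7.2]


def normalized_performance(cycles, baseline_cycles):
    """Performance is 1 / cycles, normalized so that the baseline equals 1.0."""
    if cycles <= 0 or baseline_cycles <= 0:
        raise ValueError("cycle counts must be positive")
    return (1.0 / cycles) / (1.0 / baseline_cycles)


def implied_straight_vs_riscv():
    """Derive STRAIGHT performance relative to RISC-V (an assumption-based estimate).

    If Clockhands = a * RISC-V and Clockhands = (1 + b) * STRAIGHT,
    then STRAIGHT = a / (1 + b) * RISC-V.
    """
    result = {}
    for width, a, b in zip(FETCH_WIDTHS, CLOCKHANDS_VS_RISCV, CLOCKHANDS_OVER_STRAIGHT):
        result[width] = (a / 100.0) / (1.0 + b / 100.0)
    return result


if __name__ == "__main__":
    for width, ratio in implied_straight_vs_riscv().items():
        print(f"{width}-fetch: STRAIGHT is roughly {ratio:.3f}x RISC-V")
```

Under these assumptions, the derived ratio stays below 1.0 for every fetch width, which is consistent with STRAIGHT trailing both RISC-V and Clockhands on average.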
Clockhands shows performance equal to or better than STRAIGHT in all the benchmarks. In CoreMark, Clockhands shows higher performance than RISC-V because of faster recovery from branch mispredictions, similar to STRAIGHT. In bzip2, Clockhands shows performance equal to or better than RISC-V, again because of faster recovery from branch mispredictions. Although STRAIGHT has the same property, its performance degradation due to the increased instruction count is larger. In mcf_s, Clockhands shows lower performance than RISC-V because it still executes more instructions than RISC-V, although the number of instructions is greatly reduced compared with STRAIGHT, as described below. In lbm_s, as described below, Clockhands, unlike STRAIGHT, succeeded in handling long-life values and was able to reduce the number of mv and load instructions, so its performance is about the same as that of RISC-V. In xz_s, STRAIGHT and Clockhands show performance degradation because distance adjustment results in an instruction execution order different from that of RISC-V. This is because xz_s is a program that uses up the integer arithmetic units, so the instruction order greatly affects the latency.
2) Energy Consumption: Fig. 14 shows the energy comparison. Compared with the RISC-V processor, the Clockhands processor saved 7.4% in the 8-fetch model, 17.5% in the 12-fetch model, and 24.4% in the 16-fetch model, owing to the elimination of the renaming process. The adoption of distance expressions has eliminated the need for renaming, and the number of instructions has hardly increased, resulting in a significant reduction in power consumption.
3) Instruction Breakdown: Fig. 15 shows a breakdown of the types of instructions executed. The number of instructions executed in Clockhands was reduced by greatly decreasing the number of mv and nop instructions. In addition, the number of load and store instructions, which tended to increase in STRAIGHT, was reduced. As a result, the number of instructions executed in Clockhands was successfully reduced to the same level as RISC-V. Our compiler is still immature, and we expect further improvements to reduce the number of instructions even more.
4) Hand Usage: Fig. 16 shows the distribution of which hand was written to. As mentioned in Section 4.3, the t hand, where temporary values are written, is the most commonly used. The v hand, which holds loop constants, is written less often but read more often, which is consistent with what would be expected from the nature of loop constants. The s hand is written extremely few times but read many times; this is because it holds values that are referenced many times, such as SP and arguments. In mcf_s, where there are many function calls, the s hand is often used to pass arguments, as described in Section 4.4.
5) Register Lifetime: Fig. 17 shows the register lifetime. In STRAIGHT, the distribution ends at 127, the maximum reference distance. RISC-V and Clockhands have similar distributions, which indicates that Clockhands successfully handles long-life values. Comparing RISC-V and Clockhands, Clockhands has longer vertical and horizontal lines, especially in lbm_s. This is because multiple variables co-located in one hand will have similar lifetimes.
To further clarify why the Clockhands ISA was able to handle long-life values, we review the lifetime for each hand. Fig. 18 shows the register lifetime for each hand. The lifetime of registers in the t hand was as short as about 100 because temporary values are written to it, as described in Section 4.3. The lifetime of registers in the u hand, where values with longer lifetimes are written, was longer than that of the t hand. The lifetime of registers in the v hand, where loop constants are written, was even longer. The lifetime of registers in the s hand, where SP and function arguments are written, had different properties from the others: it is very short in mcf_s and very long in the other benchmarks. This is due to the frequent function calls in mcf_s. In general, SP and function arguments have a long lifetime, but this is not the case when function calls are frequent. The Clockhands ISA can deal with long-life values because the hands are used in this way.
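The per-hand measurement discussed above can be illustrated with a small sketch. It is not the instrumentation used for Fig. 17 and Fig. 18: the trace format is hypothetical, and lifetime is assumed here to be the number of instructions between the write of a value and its last read, which may differ from the definition used in the evaluation.

```python
from collections import defaultdict


def lifetimes_per_hand(trace):
    """Return {hand: [lifetime, ...]} for every value that is read at least once.

    `trace` is a list of (dest, sources) tuples in execution order, where
    `dest` is a (hand, value_id) pair or None, and `sources` is a list of
    such pairs. This format is hypothetical and chosen only for illustration.
    Lifetime is assumed to be the instruction count between the write of a
    value and its last read.
    """
    written_at = {}
    last_read_at = {}
    for index, (dest, sources) in enumerate(trace):
        # Record reads first so that they refer to previously written values.
        for src in sources:
            if src in written_at:
                last_read_at[src] = index
        if dest is not None:
            written_at[dest] = index

    result = defaultdict(list)
    for value, write_index in written_at.items():
        if value in last_read_at:
            hand = value[0]
            result[hand].append(last_read_at[value] - write_index)
    return dict(result)


if __name__ == "__main__":
    # Toy trace: one SP-like value in s, one loop constant in v, temporaries in t.
    toy_trace = [
        (("s", 0), []),
        (("t", 0), [("s", 0)]),
        (("t", 1), [("t", 0)]),
        (("v", 0), []),
        (("t", 2), [("t", 1), ("v", 0)]),
        (None, [("t", 2), ("s", 0), ("v", 0)]),
    ]
    print(lifetimes_per_hand(toy_trace))
```

In the toy trace, the values in the t hand die within one or two instructions, while the s-hand value stays live for the whole trace, mirroring the qualitative difference between hands described above.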
Table 3: Resource usage of soft processors.

| Width | Architecture | Look-up tables | Flip-flops | LUTs | FFs |
|---|---|---|---|---|---|
| 4-way | RISC-V | 2310 | 998 | 101483 | 31081 |
| 4-way | STRAIGHT | 442 | 572 | 96631 | 28769 |
| 4-way | Clockhands | 401 | 560 | 99913 | 30968 |
| 8-way | RISC-V | 12309 | 7521 | 190380 | 45708 |
| 8-way | STRAIGHT | 787 | 1092 | 188118 | 43928 |
| 8-way | Clockhands | 761 | 1086 | 185701 | 42254 |
| 16-way | RISC-V | 30230 | 14938 | 350377 | 63338 |
| 16-way | STRAIGHT | 1641 | 2132 | 354105 | 57214 |
| 16-way | Clockhands | 1432 | 2162 | 349074 | 55220 |
6) Hardware Complexity: The Clockhands architecture does not complicate hardware. The resource usage of the (a) RISC-V, (b) STRAIGHT, and (c) Clockhands processors on an FPGA is summarized in Table 3. For our evaluation, we used RSD [22], an RV32IM-compatible, FPGA-optimized out-of-order soft processor, as the baseline, with modifications for each architecture. We evaluated three front-end widths: 4, 8, and 16. We confirmed that the CoreMark [8] program runs correctly and that the soft processor runs on a Xilinx Virtex UltraScale FPGA XCVU440. The table shows that a Clockhands processor can be built with equal or fewer resources than a RISC-V processor. Because of the distance representation, a lightweight physical register allocation is realized. This property holds regardless of fetch width.
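The comparison drawn from Table 3 can be checked with a short script. The numbers are copied from Table 3; the dictionary layout and function names are hypothetical, and the four values per row are kept in the order in which they appear in the table (first LUT/FF pair, then second LUT/FF pair).

```python
# Values from Table 3, per front-end width and architecture.
# Each tuple is (first-pair LUTs, first-pair FFs, second-pair LUTs, second-pair FFs).
TABLE3 = {
    4: {
        "RISC-V": (2310, 998, 101483, 31081),
        "STRAIGHT": (442, 572, 96631, 28769),
        "Clockhands": (401, 560, 99913, 30968),
    },
    8: {
        "RISC-V": (12309, 7521, 190380, 45708),
        "STRAIGHT": (787, 1092, 188118, 43928),
        "Clockhands": (761, 1086, 185701, 42254),
    },
    16: {
        "RISC-V": (30230, 14938, 350377, 63338),
        "STRAIGHT": (1641, 2132, 354105, 57214),
        "Clockhands": (1432, 2162, 349074, 55220),
    },
}


def ratio_to_riscv(width, arch):
    """Return each of the four resource counts as a fraction of the RISC-V count."""
    base = TABLE3[width]["RISC-V"]
    row = TABLE3[width][arch]
    return tuple(value / ref for value, ref in zip(row, base))


def clockhands_never_exceeds_riscv():
    """Check the claim that Clockhands uses equal or fewer resources than RISC-V."""
    return all(
        c <= r
        for width in TABLE3
        for c, r in zip(TABLE3[width]["Clockhands"], TABLE3[width]["RISC-V"])
    )


if __name__ == "__main__":
    print("Clockhands <= RISC-V in every column:", clockhands_never_exceeds_riscv())
    for width in TABLE3:
        ratios = ratio_to_riscv(width, "Clockhands")
        print(f"{width}-way:", ", ".join(f"{x:.3f}" for x in ratios))
```

The ratios for the first column pair shrink as the width grows, while those for the second pair stay close to 1.0, which matches the statement that the saving comes from a lightweight register allocation.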