The topic of my master's thesis is virtual memory. More specifically, it is about a low-power Translation Lookaside Buffer (TLB) design. You can look at my previous posts for an introduction.
We have shown that more than 97% of virtual address reads hit the same page as the previous read. So putting the TLB in low-power mode is a good idea, provided the cost of changing the power mode is not too high.
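As a rough illustration of how such a same-page ratio can be measured, here is a minimal sketch. It is not the code used for the thesis: the function name same_page_hits is hypothetical, and the 4 KiB page size (12 offset bits) is an assumption, since the post does not state the page size.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size: 4 KiB, i.e. 12 offset bits (not stated in the post). */
#define PAGE_SHIFT 12

/* Count the accesses that fall on the same virtual page as the access
 * immediately before them. The first access has no predecessor, so it
 * is never counted. */
size_t same_page_hits(const uint32_t *addrs, size_t n)
{
    size_t hits = 0;
    for (size_t i = 1; i < n; ++i) {
        if ((addrs[i] >> PAGE_SHIFT) == (addrs[i - 1] >> PAGE_SHIFT))
            ++hits;
    }
    return hits;
}
```

Dividing the returned count by the number of accesses gives the same-page fraction for a trace.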
The motivation part is OK. What I need now is to calculate the latency caused by putting the TLB in low-power mode. After that, I need to add a mini-TLB and simulate that configuration. If the result is not better, I'll conclude that the mini-TLB is not required.
Note: VirtualBox -> Settings -> Display -> Enable 3D Acceleration
For every fetched instruction, we check whether the power mode changes:
    if (itlb) {
        /* Same page as the previous fetch: the TLB can be in low-power mode. */
        if (CACHE_TAGSET(itlb, IACOMPRESS(fetch_regs_PC)) == itlb->last_tagset) {
            power_mode_changed = (low_power == 0);  /* entering low-power mode */
            low_power = 1;
            ++low_power_count;
        } else {
            power_mode_changed = (low_power == 1);  /* leaving low-power mode */
            low_power = 0;
            ++normal_power_count;
        }
        tlb_lat = cache_access(itlb, Read, IACOMPRESS(fetch_regs_PC),
                               NULL, ISCOMPRESS(sizeof(md_inst_t)), sim_cycle, NULL, NULL);
        /* Charge the switching latency whenever the mode changed. */
        if (power_mode_changed) {
            tlb_lat += 1;
        }
    }
So if the power mode changes, we add 1 clock cycle to the latency, because according to Zhao et al. the switch costs approximately 1 cycle. [1]
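To look at the switching rule in isolation, here is a minimal standalone sketch. It is not the SimpleScalar code itself: the struct tlb_power_state, the function tlb_power_step and the 4 KiB page size are assumptions made for illustration, and the penalty is the 1-cycle estimate taken from [1].

```c
#include <stdint.h>

#define PAGE_SHIFT     12  /* assumed 4 KiB pages (not stated in the post) */
#define SWITCH_PENALTY 1   /* cycles per mode change, estimate from [1] */

/* Hypothetical state for the model; zero-initialize before first use. */
struct tlb_power_state {
    uint32_t last_page;      /* page number of the previous fetch */
    int low_power;           /* 1 while the TLB is in low-power mode */
    int valid;               /* 0 until the first fetch has been seen */
    unsigned long switches;  /* number of mode changes so far */
};

/* Process one fetch address and return the extra latency in cycles
 * caused by a power-mode change (0 if the mode stays the same). */
unsigned tlb_power_step(struct tlb_power_state *s, uint32_t pc)
{
    uint32_t page = pc >> PAGE_SHIFT;
    /* Low-power mode is possible only when the page repeats. */
    int want_low = s->valid && page == s->last_page;
    unsigned extra = 0;

    if (want_low != s->low_power) {
        ++s->switches;
        extra = SWITCH_PENALTY;
    }
    s->low_power = want_low;
    s->last_page = page;
    s->valid = 1;
    return extra;
}
```

In this model a single page change costs two switches: one to leave low-power mode on the first fetch from the new page, and one to re-enter it on the next fetch from that page.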
After running SPEC2000 programs with and without power-mode changes, I calculated the cost of changing the power mode. The increase in total cycles depends on the latency caused by switching the power mode, so I repeated the runs with switching latencies from 1 to 100 cycles.
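The percentage increases listed for each benchmark are relative to the run without power-mode changes. A small helper makes the arithmetic explicit; the function name pct_increase is hypothetical and not part of the simulator.

```c
/* Relative increase in total cycles, in percent:
 * (power_mode - normal) / normal * 100. */
double pct_increase(double normal, double power_mode)
{
    return (power_mode - normal) / normal * 100.0;
}
```

For example, pct_increase(706604, 873556) gives roughly 23.6, which is the MESA figure at a 10-cycle latency once the fraction is dropped.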
GZIP
Latency (cycles)   Normal       Power mode    Increment
1                  580713144    626491951     7%
10                 580713144    907701677     56%
25                 580713144    1388998717    139%
75                 580713144    2994729275    415%
100                580713144    3797723125    553%
CRAFTY
Latency (cycles)   Total cycles
0                  800535108
1                  864566362
10                 1256704284
25                 1963406203
75                 4325626411
100                5507522361
PARSER
Latency (cycles)   Total cycles
0                  651830989
1                  682037147
10                 964905768
25                 1483275240
75                 3228328486
100                4101332934
MESA
Latency (cycles)   Normal    Power mode   Increment
1                  706604    710210       0.5%
10                 706604    873556       23%
25                 706604    1153888      63%
75                 706604    2092983      196%
100                706604    2562917      262%
8000               706570    151051283    21278%
MCF
Latency (cycles)   Normal   Power mode   Increment
1                  16270    17236        5%
10                 16270    21796        33%
25                 16270    29826        83%
75                 16270    58668        260%
100                16270    73243        350%
BZIP2
Latency (cycles)   Normal    Power mode   Increment
1                  562851    564804       0.34%
10                 562851    573788       1.9%
25                 562851    590123       4.8%
75                 562851    647554       15%
100                562851    676534       20%
PARSER
Latency (cycles)   Normal   Power mode   Increment
1                  15593    16596        6%
10                 15593    20507        31%
25                 15593    27416        75%
75                 15593    52106        234%
100                15593    64581        314%
VPR
Latency (cycles)   Normal   Power mode   Increment
1                  19343    20829        7%
10                 19343    27955        44%
25                 19343    40560        109%
75                 19343    83936        333%
100                19343    105786       446%
EQUAKE
Latency (cycles)   Normal   Power mode   Increment
1                  22668    24636        8%
10                 22668    34961        54%
25                 22668    52939        133%
75                 22668    114399       404%
100                22668    145374       541%
[1] Zhao, L., et al. "A leakage efficient instruction TLB design for embedded processors." IEICE Transactions on Information and Systems 94.8 (2011): 1565-1574.