
Returning to SimpleScalar and TLB

The topic of my master's degree thesis is Virtual Memory. More specifically, it is about a low-power Translation Lookaside Buffer design. You can look at my previous posts for an introduction.  

We have shown that more than 97% of virtual address reads fall on the same page as the previous read. So putting the TLB in low-power mode is well motivated, provided the cost of changing the power mode is not too high.
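A minimal sketch of how such a same-page fraction could be measured over an address trace. The 4 KiB page size (`PAGE_SHIFT`) and the function name `same_page_fraction` are assumptions for illustration; this is not the code used to produce the 97% figure.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size: 4 KiB, i.e. 12 offset bits (hypothetical value). */
#define PAGE_SHIFT 12

/* Returns the fraction of accesses (from the second one onward) whose
 * virtual page number equals that of the immediately preceding access.
 * Returns 0.0 when the trace has fewer than two entries. */
double same_page_fraction(const uint64_t *addrs, size_t n)
{
    if (n < 2)
        return 0.0;

    size_t same = 0;
    for (size_t i = 1; i < n; ++i) {
        if ((addrs[i] >> PAGE_SHIFT) == (addrs[i - 1] >> PAGE_SHIFT))
            ++same;
    }
    return (double)same / (double)(n - 1);
}
```

For a trace such as 0x1000, 0x1004, 0x1008, 0x2000, 0x2004, three of the four consecutive pairs stay on the same page, so the function returns 0.75.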

The motivation part is settled. What I need now is to calculate the latency caused by putting the TLB in low-power mode. After that, I need to add a mini-TLB and simulate that configuration. If the results are not better, I will conclude that the mini-TLB is not required.

Note: VirtualBox -> Settings -> Display -> Enable 3D Acceleration

For every fetched instruction, we check whether the power mode changes:

    if (itlb) {
        /* Same tag/set as the previous I-TLB access: stay in, or switch
           to, low-power mode. */
        if (CACHE_TAGSET(itlb, IACOMPRESS(fetch_regs_PC)) == itlb->last_tagset) {
            /* A switch happens only if we were in normal mode before. */
            power_mode_changed = (low_power == 0);
            low_power = 1;
            ++low_power_count;
        } else {
            /* Different tag/set: the TLB must run in normal mode. */
            power_mode_changed = (low_power == 1);
            low_power = 0;
            ++normal_power_count;
        }

        tlb_lat = cache_access(itlb, Read, IACOMPRESS(fetch_regs_PC),
                               NULL, ISCOMPRESS(sizeof(md_inst_t)), sim_cycle,
                               NULL, NULL);

        /* Charge the mode-switch penalty on top of the TLB latency. */
        if (power_mode_changed) {
            tlb_lat += 1;
        }
    }

So if the power mode changes, we add 1 clock cycle to the latency, because according to Zhao et al. a switch costs approximately 1 cycle [1].
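The same idea can be tried outside the simulator. The sketch below is a standalone model, not SimpleScalar code: the struct `tlb_power_model`, the function `model_fetch`, and the 4 KiB page size are all assumptions made for illustration. It takes the switch penalty as a parameter so that values other than 1 cycle can be tested.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size: 4 KiB (hypothetical value). */
#define PAGE_SHIFT 12

/* Hypothetical bookkeeping for the power-mode model.
 * Zero-initialize before use: the model starts in normal mode. */
struct tlb_power_model {
    uint64_t last_page;          /* page number of the previous fetch */
    int      have_last;          /* 0 until the first fetch is seen */
    int      low_power;          /* 1 if currently in low-power mode */
    uint64_t low_power_count;    /* fetches served in low-power mode */
    uint64_t normal_power_count; /* fetches served in normal mode */
    uint64_t switch_count;       /* number of mode changes */
};

/* Processes one fetch address and returns the extra latency (in cycles)
 * charged for a power-mode change: switch_penalty if the mode changed,
 * 0 otherwise. */
unsigned model_fetch(struct tlb_power_model *m, uint64_t pc,
                     unsigned switch_penalty)
{
    uint64_t page = pc >> PAGE_SHIFT;

    /* Low-power mode is possible only when the page repeats. */
    int want_low = m->have_last && page == m->last_page;
    int changed = (want_low != m->low_power);

    m->low_power = want_low;
    if (want_low)
        ++m->low_power_count;
    else
        ++m->normal_power_count;
    if (changed)
        ++m->switch_count;

    m->last_page = page;
    m->have_last = 1;

    return changed ? switch_penalty : 0;
}
```

Feeding the fetch addresses 0x1000, 0x1004, 0x1008, 0x2000, 0x2004 with a penalty of 1 gives three mode changes (normal to low, low to normal, normal to low), so three extra cycles in total.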


After running SPEC2000 programs with and without power-mode changes, I calculated the cost of changing the power mode. The increase in total cycles depends on the latency caused by switching the power mode.
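The "increment" values in the results are the relative increase in total cycles, that is (power mode cycles - normal cycles) / normal cycles, expressed as a percentage. A small helper for that arithmetic might look like this; the function name `cycle_increase_percent` is an assumption for illustration.

```c
#include <stdint.h>

/* Relative increase of power_mode_cycles over normal_cycles, in percent.
 * Returns 0.0 if normal_cycles is zero, to avoid dividing by zero. */
double cycle_increase_percent(uint64_t normal_cycles,
                              uint64_t power_mode_cycles)
{
    if (normal_cycles == 0)
        return 0.0;

    return 100.0 * ((double)power_mode_cycles - (double)normal_cycles)
                 / (double)normal_cycles;
}
```

For example, the MCF run with a 1-cycle latency (16270 cycles normally, 17236 with power-mode changes) comes out at about 5.9%.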


GZIP

Latency: 1 cycle
normal: 580713144
power mode: 626491951
increment: 7.9%

Latency: 10 cycles
normal: 580713144
power mode: 907701677
increment: 56%

Latency: 25 cycles
normal: 580713144
power mode: 1388998717
increment: 139%

Latency: 75 cycles
normal: 580713144
power mode: 2994729275
increment: 416%

Latency: 100 cycles
normal: 580713144
power mode: 3797723125
increment: 554%



CRAFTY

latency=0    cycles: 800535108
latency=1    cycles: 864566362
latency=10   cycles: 1256704284
latency=25   cycles: 1963406203
latency=75   cycles: 4325626411
latency=100  cycles: 5507522361

PARSER

latency=0    cycles: 651830989
latency=1    cycles: 682037147
latency=10   cycles: 964905768
latency=25   cycles: 1483275240
latency=75   cycles: 3228328486
latency=100  cycles: 4101332934


MESA

Latency: 1 cycle
normal: 706604
power mode: 710210
increment: 0.5%

Latency: 10 cycles
normal: 706604
power mode: 873556
increment: 23%

Latency: 25 cycles
normal: 706604
power mode: 1153888
increment: 63%

Latency: 75 cycles
normal: 706604
power mode: 2092983
increment: 196%

Latency: 100 cycles
normal: 706604
power mode: 2562917
increment: 262%

Latency: 8000 cycles
normal: 706570
power mode: 151051283
increment: 21278% (about 212x)

MCF

Latency: 1 cycle
normal: 16270
power mode: 17236
increment: 5%

Latency: 10 cycles
normal: 16270
power mode: 21796
increment: 33%

Latency: 25 cycles
normal: 16270
power mode: 29826
increment: 83%

Latency: 75 cycles
normal: 16270
power mode: 58668
increment: 260%

Latency: 100 cycles
normal: 16270
power mode: 73243
increment: 350%

BZIP2

Latency: 1 cycle
normal: 562851
power mode: 564804
increment: 0.34%

Latency: 10 cycles
normal: 562851
power mode: 573788
increment: 1.9%

Latency: 25 cycles
normal: 562851
power mode: 590123
increment: 4.8%

Latency: 75 cycles
normal: 562851
power mode: 647554
increment: 15%

Latency: 100 cycles
normal: 562851
power mode: 676534
increment: 20%



PARSER

Latency: 1 cycle
normal: 15593
power mode: 16596
increment: 6%

Latency: 10 cycles
normal: 15593
power mode: 20507
increment: 31%

Latency: 25 cycles
normal: 15593
power mode: 27416
increment: 75%

Latency: 75 cycles
normal: 15593
power mode: 52106
increment: 234%

Latency: 100 cycles
normal: 15593
power mode: 64581
increment: 314%


VPR

Latency: 1 cycle
normal: 19343
power mode: 20829
increment: 7%

Latency: 10 cycles
normal: 19343
power mode: 27955
increment: 44%

Latency: 25 cycles
normal: 19343
power mode: 40560
increment: 109%

Latency: 75 cycles
normal: 19343
power mode: 83936
increment: 333%

Latency: 100 cycles
normal: 19343
power mode: 105786
increment: 446%

EQUAKE

Latency: 1 cycle
normal: 22668
power mode: 24636
increment: 8%

Latency: 10 cycles
normal: 22668
power mode: 34961
increment: 54%

Latency: 25 cycles
normal: 22668
power mode: 52939
increment: 133%

Latency: 75 cycles
normal: 22668
power mode: 114399
increment: 404%

Latency: 100 cycles
normal: 22668
power mode: 145374
increment: 541%



[1] Zhao, L., et al. "A leakage efficient instruction TLB design for embedded processors." IEICE Transactions on Information and Systems 94.8 (2011): 1565-1574.
