sim-outorder implements a very detailed out-of-order issue superscalar processor with a two-level memory system and speculative execution support. This simulator is a performance simulator, tracking the latency of all pipeline operations.
Let's look at the SimpleScalar simulator's code related to the iTLB (Instruction Translation Lookaside Buffer).
First, there are simulator options related to iTLB:
/* register simulator-specific options */
void
sim_reg_options(struct opt_odb_t *odb)
{
  /* TLB options */
  opt_reg_string(odb, "-tlb:itlb",
                 "instruction TLB config, i.e., {<config>|none}",
                 &itlb_opt, "itlb:16:4096:4:l", /* print */TRUE, NULL);
  opt_reg_int(odb, "-tlb:lat",
              "inst/data TLB miss latency (in cycles)",
              &tlb_miss_lat, /* default */30,
              /* print */TRUE, /* format */NULL);
}
/* check simulator-specific option values */
void sim_check_options(...)
{
  /* sim-outorder.c line: 1098 */
  /* use an I-TLB? */
  if (!mystricmp(itlb_opt, "none"))
    itlb = NULL;
  else
    {
      if (sscanf(itlb_opt, "%[^:]:%d:%d:%d:%c",
                 name, &nsets, &bsize, &assoc, &c) != 5)
        fatal("bad TLB parms: <name>:<nsets>:<page_size>:<assoc>:<repl>");
      itlb = cache_create(name, nsets, bsize, /* balloc */FALSE,
                          /* usize */sizeof(md_addr_t), assoc,
                          cache_char2policy(c), itlb_access_fn,
                          /* hit latency */1);
    }
}
/* register simulator-specific statistics */
void
sim_reg_stats(struct stat_sdb_t *sdb) /* stats database */
{
  /* sim-outorder.c line: 1311 */
  if (itlb)
    cache_reg_stats(itlb, sdb);
}
/* sim-outorder.c line: 4241 */
if (itlb)
  {
    /* access the I-TLB, NOTE: this code will initiate
       speculative TLB misses */
    tlb_lat =
      cache_access(itlb, Read, IACOMPRESS(fetch_regs_PC),
                   NULL, ISCOMPRESS(sizeof(md_inst_t)), sim_cycle,
                   NULL, NULL);
    if (tlb_lat > 1)
      last_inst_tmissed = TRUE;
    /* I-cache/I-TLB accesses occur in parallel */
    lat = MAX(tlb_lat, lat);
  }
/* I-cache/I-TLB miss? assumes I-cache hit >= I-TLB hit */
if (lat != cache_il1_lat)
  {
    /* I-cache miss, block fetch until it is resolved */
    ruu_fetch_issue_delay += lat - 1;
    break;
  }
Roadmap:
1) Run sim-outorder with different iTLB options and look at the related statistics.
2) Run SPEC benchmarks with iTLB options.
Running with iTLB option:
export IDIR=/home/bahadir/simplescalar
$IDIR/simplesim-3.0/sim-outorder -tlb:itlb itlb:16:4096:4:l hello
itlb.accesses 9169 # total number of accesses
itlb.hits 9159 # total number of hits
itlb.misses 10 # total number of misses
itlb.replacements 0 # total number of replacements
itlb.writebacks 0 # total number of writebacks
itlb.invalidations 0 # total number of invalidations
itlb.miss_rate 0.0011 # miss rate (i.e., misses/ref)
itlb.repl_rate 0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate 0.0000 # invalidation rate (i.e., invs/ref)
iTLB related code in cache.h
unsigned int /* latency of access in cycles */
cache_access(struct cache_t *cp, /* cache to access */
enum mem_cmd cmd, /* access type, Read or Write */
md_addr_t addr, /* address of access */
void *vp, /* ptr to buffer for input/output */
int nbytes, /* number of bytes to access */
tick_t now, /* time of access */
byte_t **udata, /* for return of user data ptr */
md_addr_t *repl_addr); /* for address of replaced block */
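Question: Does cache_access happen on every fetch? To check, let's disable the iTLB access in ruu_fetch() and confirm that the access count drops to zero:
if (0) /* TEST ! */
  {
    tlb_lat =
      cache_access(itlb, Read, IACOMPRESS(fetch_regs_PC),
                   NULL, ISCOMPRESS(sizeof(md_inst_t)), sim_cycle,
                   NULL, NULL);
    ...
  }
Then rebuild and run the same command again:
cd $IDIR/simplesim-3.0/
make config-pisa
make
With the access disabled, all iTLB counters are zero: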
itlb.accesses 0 # total number of accesses
itlb.hits 0 # total number of hits
itlb.misses 0 # total number of misses
itlb.replacements 0 # total number of replacements
itlb.writebacks 0 # total number of writebacks
itlb.invalidations 0 # total number of invalidations
The iTLB access is inside this function (sim-outorder.c line 4201):
/* fetch up as many instruction as one branch prediction and one cache line
acess will support without overflowing the IFETCH -> DISPATCH QUEUE */
static void ruu_fetch(void)
Where is it called? Inside the main loop:
for (;;)
  {
    /* commit entries from RUU/LSQ to architected register file */
    ruu_commit();
    /* service function unit release events */
    ruu_release_fu();
    /* ==> inserts operations into ready queue --> register deps resolved */
    ruu_writeback();
    /* decode and dispatch new operations */
    ruu_dispatch();
    /* call instruction fetch unit if it is not blocked */
    if (!ruu_fetch_issue_delay)
      ruu_fetch();
    else
      ruu_fetch_issue_delay--;
    /* go to next cycle */
    sim_cycle++;
  }
Resource: Computer Architecture: A Quantitative Approach
Question: How will we measure the energy consumption? (We won't decrease the TLB hit count, right?)
Reread the book to understand what is going on. (What is the RUU? How are instructions fetched?)
We are interested in instructions that access memory.
Let's say an instruction has read a location in a page. We used the TLB to get the physical address.
And we loaded the page into memory. Now, our next instruction may or may not be in the same page. How do we know? We only care about instructions, right? (That is, what is in the code segment of the executable file.)
Paper says iTLB access pattern is different from dTLB access pattern. What's the access pattern?
Paper says we need locality analysis of iTLB references. How?
Question: The iTLB is usually on the critical path, so using slow transistors in it may decrease the maximum clock frequency of a processor. (Did they turn to dynamic leakage control instead?)
Then the paper says there is no room for dynamic leakage control either, because iTLB is a very active component.
NOTE: They used the QEMU emulator with in-order execution on the MIPS architecture.
NOTE: Lei et al.: There is high locality in the instruction stream: instructions are fetched in program order, conditional jumps tend to jump close by, and loops repeat the same code.
Page transitions are due to function calls and long-distance jumps.
When a program enters a physical page, same-page instruction fetching tends to continue for a long time.
How to detect same-page-hit iTLB references?
Lei et al. classify page-crossing references into two categories:
1) Two successive instructions that sit on different pages. (WALKING)
2) Branch or jump instructions whose target address is located on a different page. (JUMPING)
The compiler is used to detect all WALKING references, and an explicit bit is added to the instruction set to inform the iTLB.