CPSC 330 - Fall 2008 Homework 4 Due on Friday, Oct. 24 (10/23/08: corrected cycle calculations in third example) Example: Consider a computer system that contains a cache with 1-cycle access time and a memory with 10-cycle access time. If the hit rate is 75% and a cache miss requires both a single cache access and a single memory access, what is the average memory access time (AMAT)? AMAT = .75*(1 cycle) + .25*(1 cycle + 10 cycles) = 3.5 cycles 1. Question 1.6 of your textbook, p. 23. [paraphrasing] (a) Assume that execution time is proportional to instruction fetch time and that an access to the icache is 20 times faster than an access to main memory. Let the hit rate be 96% and the hit time (i.e., cache access time) be t_c. The miss time is t_c + t_m. (Note: the question reads as if it should be t_c + t_m + t_c, but ignore that interpretation.) Compute the speedup factor for the cache as the ratio of the MAT (t_m, the memory access time without cache) to the AMAT (average memoy access time). (b) If we double the icache size with the result that the miss rate is cut in half, calculate the resulting speedup. Example: Consider a 16 MB main memory with a 32 KB direct-mapped cache and 8 bytes per line. (a) How many total lines are there in cache? # lines = 32K bytes / 8 bytes per line = 4K lines (b) Identify the main memory address fields and sizes. # memory address bits = log_base_2( 16 M ) = 24 # index bits = log_base_2( 4 K ) = 12 # line offset bits = log_base_2( 8 ) = 3 # tag bits = 24 - ( 12 + 3 ) = 9 23 15 14 3 2 1 0 +------------------+------------------------+------+ | 9-bit tag | 12-bit index |3-bit | | | |offset| +------------------+------------------------+------+ (c) How many comparators must operate in parallel to perform tag-matching? There is only one comparator in a direct-mapped cache. 2. Consider a 4 GB byte-addressable main memory with a three-way set-associative cache of 768 KB and 32 bytes per line. (a) How many total lines are there in cache? (not just per bank) (b) Identify the main memory address fields and sizes. (c) How many comparators must operate in parallel to perform tag-matching? Example: Consider a word-addressed computer system with one-word instructions. If a program has an initial sequential execution section and then a loop with instruction addresses as follows .-----. | 0-10| `-----' .---->| | .-----. | |11-30| three iterations | `-----' `-----' the instruction fetch address stream is 0,1,...,9,10, 11,12,...,29,30, 11,12,...,29,30, 11,12,...,29,30 \iteration 1/ \iteration 2/ \iteration 3/ Consider a 16-word direct-mapped instruction cache with 4 words per line. Determine the number of cache hits and misses when executing the program above. index _________________ memory block mapping 0 |v|tag|__/__/__/__| 0- 3, 16-19, ... 1 |v|tag|__/__/__/__| 4- 7, 20-23, ... 2 |v|tag|__/__/__/__| 8-11, 24-27, ... 3 |v|tag|__/__/__/__| 12-15, 28-31, ... cache reference analysis cache block first second third start address iteration iteration iteration 0 .-----. 0 miss (1) | | 1 hit | 0-10| 2 hit | | 3 hit 4 | | 4 miss (2) | | 5 hit | | 6 hit | | 7 hit 8 | | 8 miss (3) | | 9 hit `-----' 10 hit .---->| | .-----. 11 hit 11 miss (9) 11 miss (13) 12 | | | 12 miss (4) 12 miss (10) 12 miss (14) | |11-30| 13 hit 13 hit 13 hit | | | 14 hit 14 hit 14 hit | | | 15 hit 15 hit 15 hit 16 | | | 16 miss (5) 16 hit 16 hit | | | 17 hit 17 hit 17 hit | | | 18 hit 18 hit 18 hit | | | 19 hit 19 hit 19 hit 20 | | | 20 miss (6) 20 hit 20 hit | | | 21 hit 21 hit 21 hit | | | 22 hit 22 hit 22 hit | | | 23 hit 23 hit 23 hit 24 | | | 24 miss (7) 24 miss (11) 24 miss (15) | | | 25 hit 25 hit 25 hit | | | 26 hit 26 hit 26 hit | | | 27 hit 27 hit 27 hit 28 | | | 28 miss (8) 28 miss (12) 28 miss (16) | | | 29 hit 29 hit 29 hit | `-----' 30 hit 30 hit 30 hit `-----' miss 1 - 0- 3 fills empty line 0 miss 2 - 4- 7 fills empty line 1 miss 3 - 8-11 fills empty line 2 miss 4 - 12-15 fills empty line 3 miss 5 - 16-19 replaces 0- 3 in line 0 miss 6 - 20-23 replaces 4- 7 in line 1 miss 7 - 24-27 replaces 8-11 in line 2 miss 8 - 28-31 replaces 12-15 in line 3 miss 9 - 8-11 replaces 24-27 in line 2 miss 10 - 12-15 replaces 28-31 in line 3 miss 11 - 24-27 replaces 8-11 in line 2 miss 12 - 28-31 replaces 12-15 in line 3 miss 13 - 8-11 replaces 24-27 in line 2 miss 14 - 12-15 replaces 28-31 in line 3 miss 15 - 24-27 replaces 8-11 in line 2 miss 16 - 28-31 replaces 12-15 in line 3 Thus there are 55 hits and 16 misses, for a total of 71 references. Assume for a moment that instruction execution time is based solely on the instruction fetch timiing. If the instruction cache access time is 1 cycle, the memory access time is 5 cycles, and cache misses require 1 cycle to miss and 5 cycles per word to refill, what is the instruction fetch timing? 55 cache hits @ 1 cycle = 55 cycles 16 cache misses @ 21 cycles (1+5*4)= 336 cycles total = 391 cycles If supported in the hardware, a burst transfer from memory could speed up the timing. If a 5-1-1-1 burst is available, what is the timing? 55 cache hits @ 1 cycle = 55 cycles 16 cache misses @ 9 cycles (1+5+1+1+1) = 144 cycles total = 199 cycles 3. Consider a word-addressed computer system with one-word instructions. Consider a program with an initial sequential execution section, followed by a loop, and ending with a sequential execution section. If the instruction addresses are as follows .-----. | 0-5 | `-----' .---->| | .-----. | | 6-11| three iterations | `-----' `-----' | .-----. |12-15| `-----' the instruction fetch address stream is 0,1,2,3,4,5, 6,7,8,9,10,11, 6,7,8,9,10,11, 6,7,8,9,10,11, 12,13,14,15 \iteration 1/ \iteration 2/ \iteration 3/ (a) Consider a 4-word direct-mapped instruction cache with one word per line. Determine the number of cache hits and misses when executing the program above. index ________ memory block mapping 0 |v|tag|__| 0, 4, ... 1 |v|tag|__| 1, 5, ... 2 |v|tag|__| 2, 6, ... 3 |v|tag|__| 3, 7, ... What is the total number of cycles spent in execution if a cache hit requires 1 cycle and a cache miss requires 4 total cycles (i.e., 1 cycle to miss plus 3 cycles to perform a memory read). (b) Consider a 4-word direct-mapped instruction cache with 2 words per line. Determine the number of cache hits and misses when executing the program above. index ___________ memory block mapping 0 |v|tag|__/__| 0-1, 4-5, ... 1 |v|tag|__/__| 2-3, 6-7, ... What is the total number of cycles spent in execution if a cache hit requires 1 cycle and a cache miss requires 1 cycle to miss plus a 3-1 burst transfer for cache refills? (c) Consider an 8-word direct-mapped instruction cache with one word per line. Determine the number of cache hits and misses when executing the program above. index ________ memory block mapping 0 |v|tag|__| 0, 8, ... 1 |v|tag|__| 1, 9, ... 2 |v|tag|__| 2, 10, ... 3 |v|tag|__| 3, 11, ... 4 |v|tag|__| 4, 12, ... 5 |v|tag|__| 5, 13, ... 6 |v|tag|__| 6, 14, ... 7 |v|tag|__| 7, 15, ... What is the total number of cycles spent in execution if a cache hit requires 1 cycle and a cache miss requires 4 total cycles (i.e., 1 cycle to miss plus 3 cycles to perform a memory read). (d) Consider an 8-word direct-mapped instruction cache with 2 words per line. Determine the number of cache hits and misses when executing the program above. index ___________ memory block mapping 0 |v|tag|__/__| 0-1, 8-9, ... 1 |v|tag|__/__| 2-3, 10-11, ... 2 |v|tag|__/__| 4-5, 12-13, ... 3 |v|tag|__/__| 6-7, 14-15, ... What is the total number of cycles spent in execution if a cache hit requires 1 cycle and a cache miss requires 1 cycle to miss plus a 3-1 burst transfer for cache refills?