CPSC 330 - Spring 2009 - Exam 2 Name: ___________________ No calculators. 1. Consider the MIPS store word (sw) instruction as implemented on the datapath given in Figure 4.2. sw Rt, Offset(Rs) // Mem[ Reg[Rs] + sign_ext(Offset) ] <- Reg[Rt] Circle the correct value 0 or 1 for the control signals (a-d) and circle whether each of the three muxes (e-f) selects its upper input, lower input, or don't care. For the ALU operation (h) circle one of the function names. (The Zero condition signal will be assumed to be 0.) (8 pts.) (a) Branch = 0 1 (b) MemRead = 0 1 (c) RegWrite = 0 1 (d) MemWrite = 0 1 (e) Mux1 (upper left; output to PC) = upper, lower, don't care (f) Mux2 (upper middle; output to Regs) = upper, lower, don't care (g) Mux3 (lower middle; output to ALU) = upper, lower, don't care (h) ALU operation = and, or, add, subtract, set-on-less-than, nor 2. For the MIPS instruction sequence below, draw the data dependency graph and identify any data dependencies as RAW, WAR, or WAW. (Destination registers are listed first except in the case of sw.) (6 pts.) i1: lw $1, 0($5) // Reg[1] <- Memory[ Reg[5] + 0 ] i2: lw $2, 4($5) // Reg[2] <- Memory[ Reg[5] + 4 ] i3: add $3, $1, $2 // Reg[3] <- Reg[1] + Reg[2] i4: sw $3, 8($5) // Memory[ Reg[5] + 8 ] <- Reg[3] 3. For the MIPS instruction sequence given in question 2, show the pipeline cycle diagrams for the standard 5-stage pipeline without forwarding. Assume register file writes occur in the first half cycle and reads in the second half cycle. (4 pts.) i1: lw $1, 0($5) // Reg[1] <- Memory[ Reg[5] + 0 ] i2: lw $2, 4($5) // Reg[2] <- Memory[ Reg[5] + 4 ] i3: add $3, $1, $2 // Reg[3] <- Reg[1] + Reg[2] i4: sw $3, 8($5) // Memory[ Reg[5] + 8 ] <- Reg[3] 4. For the MIPS instruction sequence given in question 2, show the pipeline cycle diagrams for the standard 5-stage pipeline with forwarding. (4 pts.) 5. Describe how delayed branches work in the standard 5-stage pipeline. (5 pts.) 6. Find the average CPI for the following instruction set workload running on the standard 5-stage pipeline we studied in class but instead of delayed branches the pipeline uses a branch prediction scheme that has a 50% accuracy and the branch outcome (i.e., both the BTA and the decision) is available in the EX stage of the pipeline. (5 pts.) type | freq -------+------ alu | 0.5 branch | 0.2 ld/st | 0.3 Associate each term or statement below with aspects of branching. Circle A or P, for Address or Prediction, respectively. Note that some questions may require both to be circled. (2 pts. each) 7. A P BTAC 8. A P BHT 9. A P BTB 10. A P gshare 11. Which term in the CPU time equation is most influenced by a multiple- issue processor (superscalar or VLIW)? Explain your answer. (5 pts.) 12. Explain what the term "false dependency" means and how it can impact performance in an out-of-order superscalar processor. What can be done in hardware to eliminate false dependencies? (5 pts.) Extra Credit. (2 pts. each for identifying a processor or processor family. Additional 2 pts. each for giving the correct instruction execution rate, but note that the rate may be model-dependent for processor families so rate should be given for a specific model.) XC-A. Identify a superscalar processor. What is its peak execution rate? XC-B. Identify a VLIW processor. What is its peak execution rate? XC-C. Identify an EPIC processor. What is its peak execution rate? 13. Identify at least three difference between DRAM and SRAM. (6 pts.) 14. Consider a memory burst of 5-2-2-2 that transfers a total of 16 bytes. If the bus is clocked at 200 MHz and there is no bus arbitration overhead, what is the bus bandwidth? (5 pts.) 15. Explain the difference between temporal locality and spatial locality. (6 pts.) 16. Consider a computer system that contains a cache with 1-cycle access time and a memory with 10-cycle access time. If the hit rate is 75% and a cache miss requires both a single cache access and a single memory access, what is the average memory access time (AMAT)? (5 pts.) 17. Consider a 4 GB byte-addressable main memory with a level 1 data cache that is two-way set-associative, 1 MB in size, has an 8-byte line size. a) How many total lines are there in cache? (not just per bank) (2 pts.) b) How many lines are there in bank? (2 pts.) c) Show how the main memory address is partitioned into fields for the cache access and give the bit lengths of those fields. (6 pts.) 18. Assume a 256-byte main memory and a four-line cache with four bytes per line. The cache is initially empty. For the byte address reference stream (reads) given below circle which of the references are hits for the different cache placement schemes. Also, show the final contents of the cache. (The byte addresses are in decimal.) a) direct-mapped (6 pts.) 4, 20, 5, 35, 6, 36, 7, 21, 8, 22 b) two-way set associative with LRU replacement (6 pts.) 4, 20, 5, 35, 6, 36, 7, 21, 8, 22 c) fully-associative with FIFO replacement (6 pts.) 4, 20, 5, 35, 6, 36, 7, 21, 8, 22