branch prediction in the IF stage .---------------------------------------- correct PC value | | .------------------------------------- misprediction signal | | (also flushes pipeline) | | | | | | +-+ | | IF stage | | ID stage | | | | V V | | `.---. +------+ +--------+ | | next PC |mux|-->| PC |-+-->| icache |-->| | .---' +------+ | +--------+ | | / | | | .-^-. | | | .-->|mux| | | | | `---' .----. | | | | ' `--<---| +4 |<--+ | | | | `----' | | | | | | | | | | BTA +------+ | | | | `---<----| BTAC |<--+ | | | +------+ | | | o<- no match in BTAC | | | ^ | | | pipeline | | | | latch | prediction +------+ | | | between `-----<------| BHT |<--' | | IF and ID +------+ +-+ let the branch being predicted reside at "X" let the branch target address be "Y" predict untaken X: bz Y IF ID ALU MEM WB X+4: sub r3,r1,r2 IF ID ALU MEM WB X+8: ... IF ID ALU MEM WB .---.X+4+------+ +--------+ |.--|-->| X |-+-->| icache |-> "bz Y" X+4.---' +------+ | +--------+ .---. | .-->| \| | | `---' X+4 .----. X | | ' `------| +4 |<--+ | | `----' | | Y +------+ X | | ` - < - -| BTAC |<--+ BTAC has address Y matching branch address X | +------+ | |pred untaken+------+ X | `-----<------| BHT |<--' BHT predicts untaken for branch address X +------+ (or no match for address X in BTAC) predict taken X: bz Y IF ID ALU MEM WB Y: add r6,r4,r5 IF ID ALU MEM WB Y+4: ... IF ID ALU MEM WB .---. Y +------+ +--------+ |.--|-->| X |-+-->| icache |-> "bz Y" Y .---' +------+ | +--------+ .---. | .-->|/ | | | `---' X+4 .----. X | | ' ` - - -| +4 |<--+ | | `----' | | | Y +------+ X | | `---<----| BTAC |<--+ BTAC has address Y matching branch address X | +------+ | | pred taken +------+ X | `-----<------| BHT |<--' BHT predicts taken for branch address X +------+ dynamic branch prediction 1. hardware keeps branch history in BHT (branch history table) a. one-bit history reflects last occurrence b. two-bit saturating counter helps with loops c. BHT is usually indexed by low-order bits of PC 2. hardware keeps branch target addresses in BTAC (branch target address cache) 3. BHT and BTAC may be combined into a single BTB (branch target buffer) 4. hardware return address stack deals with returns to different call sites (invisible to software) two-level adaptive prediction 1. branch history shift register (BHSR) keeps taken/not-taken pattern of last set of branches encountered 2. pattern history table (PHT) keeps history for each specific pattern from first-level 3. Intel Pentium II/III uses a BTB indexed by the PC, entries contain a branch target address and a 4-bit BHSR which is used to index into one of four 16-entry PHTs 4. gshare - McFarling (DEC 1993) - xor of global BHSR and branch address is used to access BHT speculative execution 1. not just fetch down predicted path, but start execution 2. write results to buffers but not to registers or cache until you are sure that the branch was correctly predicted