False-sharing example

False sharing occurs when two variables that are not shared (each is used by only one processor) are nevertheless allocated in the same cache line. This example assumes a two-processor, snoopy, cache-coherent SMP system. The caches use two-word (8-byte) lines, are write-back and write-invalidate, and use the MSI cache-coherence protocol.

The bus transactions for the system are:

  READ - read a cache line from memory or from another cache
  RIM  - read with intent to modify: read a cache line in preparation for writing bytes held within it
  INV  - invalidate a cache line

Each bus transaction above involves a complete cache line. Other bus transactions may include:

  WB   - write back (complete line)
  WT   - write through (single word)

MSI protocol self-transitions (based on an action by this processor)

  current state x action => new state (bus transaction)

  in invalid state        in shared state         in modified state
  ---------------------   ---------------------   ---------------------
  I x read  => S (READ)   S x read  => S (none)   M x read  => M (none)
  I x write => M (RIM)    S x write => M (INV)    M x write => M (none)

Bus-driven transitions (based on an action by another processor)

  current state x bus transaction => new state [*]
  * == this cache preempts the memory read and writes the line back

  in invalid state   in shared state   in modified state
  ----------------   ---------------   -----------------
  I x READ => I      S x READ => S     M x READ => S *
  I x RIM  => I      S x RIM  => I     M x RIM  => I *
  I x INV  => I      S x INV  => I     M x INV  => I

The same transitions and actions expressed in a different table format:

  bus   - bus action by the initiating processor
  self  - current state and transition in that processor's cache
  pre   - preemptive WB on the bus by another cache holding the same block in state M
  other - current state and transition in the other processor's cache

                read                              write
       -----------------------           -----------------------
       bus   self  pre   other           bus   self  pre   other
       ----  ----  ----  -----           ----  ----  ----  -----
  miss READ  I->S        I          miss RIM   I->M        I
       READ  I->S        S               RIM   I->M        S->I
       READ  I->S  WB    M->S            RIM   I->M  WB    M->I
       ----  ----  ----  -----           ----  ----  ----  -----
  hit  none  S           I          hit  INV   S->M        I
       none  S           S               INV   S->M        S->I
       none  M           I               none  M           I
       ----  ----  ----  -----           ----  ----  ----  -----

Notes:
  (1) Not shown: on a miss, if the line to be replaced is in state M, the current processor must first write it back (WB).
  (2) The three (self, other) combinations (S,M), (M,S), and (M,M) are not allowed; the caches can never enter these states.
  (3) A fourth state, exclusive, would eliminate the invalidate traffic for a write hit on a line held in only one cache.

Let processor 0 use the 4-byte word at address 0x100, and let processor 1 use the 4-byte word at address 0x104. The caches are initially empty. In the trace, addr is the address of the cached line, wrd0 and wrd1 are its words at 0x100 and 0x104, init marks the initial memory contents, pNwK marks a value written by processor N (K = 0 for its first write, 1 for its second), and RD/WB is a READ during which the cache holding the line in M writes it back.

                   processor 0                      processor 1
  bus      ----------------------------     ----------------------------
  action   action       cache contents      action       cache contents
                        addr  wrd0 wrd1                  addr  wrd0 wrd1
                      I ----- ---- ----                I ----- ---- ----
  READ     read 100   S 0x100 init init                I ----- ---- ----
  INV      write 100  M 0x100 p0w0 init                I ----- ---- ----
  RD/WB               S 0x100 p0w0 init     read 104   S 0x100 p0w0 init
  INV                 I ----- ---- ----     write 104  M 0x100 p0w0 p1w0
  RD/WB    read 100   S 0x100 p0w0 p1w0                S 0x100 p0w0 p1w0
  (none)              S 0x100 p0w0 p1w0     read 104   S 0x100 p0w0 p1w0
  INV      write 100  M 0x100 p0w1 p1w0                I ----- ---- ----
  RIM/WB              I ----- ---- ----     write 104  M 0x100 p0w1 p1w1

The shared line will continue to ping-pong from cache to cache as each processor writes to its own word, even though neither processor ever accesses the other's word. There would be much less bus traffic if the words were allocated in separate cache lines: after each processor's first read miss (READ) and first write (INV), every later access would hit in state M with no bus transaction.
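To check the trace mechanically, here is a minimal Python sketch of the two-cache MSI system described above. It assumes no capacity evictions (so note (1) never applies), tracks each cache's state per 8-byte line, and does not model data values. The names `line_of`, `MSISystem`, `run`, `TRACE`, and `SEPARATE_TRACE` are hypothetical, introduced only for this illustration. A `READ/WB` or `RIM/WB` result marks a bus transaction during which the other cache held the line in M and wrote it back, corresponding to RD/WB and RIM/WB in the trace.

```python
"""Minimal two-processor MSI simulator for the false-sharing trace.

Assumptions (not from the notes): no capacity evictions, data values
are not modeled, and only bus actions are recorded.
"""

LINE_SIZE = 8  # two 4-byte words per cache line


def line_of(addr: int) -> int:
    """Return the address of the cache line holding byte address addr."""
    return addr - addr % LINE_SIZE


class MSISystem:
    def __init__(self, n_procs: int = 2) -> None:
        # states[p] maps line address -> "S" or "M"; a missing line is "I".
        self.states: list[dict[int, str]] = [{} for _ in range(n_procs)]

    def _snoop(self, requester: int, line: int, txn: str) -> bool:
        """Apply bus-driven transitions in the other caches.

        Returns True if some cache held the line in M and wrote it back.
        """
        wrote_back = False
        for p, cache in enumerate(self.states):
            if p == requester or line not in cache:
                continue  # the requester itself, or state I: no change
            if cache[line] == "M" and txn in ("READ", "RIM"):
                wrote_back = True  # M cache preempts memory and writes back
            if txn == "READ":
                cache[line] = "S"  # S x READ => S, M x READ => S
            else:
                del cache[line]  # RIM or INV: S => I, M => I
        return wrote_back

    def access(self, p: int, op: str, addr: int) -> str:
        """Perform a read or write by processor p; return the bus action."""
        line = line_of(addr)
        state = self.states[p].get(line, "I")
        if op == "read":
            txn, new_state = ("READ", "S") if state == "I" else (None, state)
        elif op == "write":
            txn = {"I": "RIM", "S": "INV", "M": None}[state]
            new_state = "M"
        else:
            raise ValueError(f"unknown operation: {op}")
        if txn is None:
            action = "(none)"  # hit: no bus transaction
        else:
            action = txn + "/WB" if self._snoop(p, line, txn) else txn
        self.states[p][line] = new_state
        return action


def run(trace: list[tuple[int, str, int]]) -> list[str]:
    """Replay (processor, op, address) accesses and return the bus actions."""
    system = MSISystem()
    return [system.access(p, op, addr) for p, op, addr in trace]


# Access sequence from the example: p0 uses 0x100, p1 uses 0x104.
TRACE = [
    (0, "read", 0x100), (0, "write", 0x100),
    (1, "read", 0x104), (1, "write", 0x104),
    (0, "read", 0x100), (1, "read", 0x104),
    (0, "write", 0x100), (1, "write", 0x104),
]

# Same accesses with p1's word moved to 0x108, a separate cache line.
SEPARATE_TRACE = [(p, op, 0x108 if a == 0x104 else a) for p, op, a in TRACE]

if __name__ == "__main__":
    for name, trace in (("same line", TRACE), ("separate lines", SEPARATE_TRACE)):
        actions = run(trace)
        busy = sum(a != "(none)" for a in actions)
        print(f"{name}: {actions} ({busy} bus transactions)")
```

Replaying `TRACE` yields READ, INV, READ/WB, INV, READ/WB, (none), INV, RIM/WB, which matches the bus column of the trace. Moving processor 1's word to 0x108 (`SEPARATE_TRACE`) leaves only four bus transactions (READ, INV, READ, INV), after which every access hits in M.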