Clemson University -- CPSC 231 -- Fall 2009

I/O - input/output

system components: CPU, memory, and bus -- now add I/O controllers and
peripheral devices

+-----+       CPU must perform all transfers to/from a simple controller,
| CPU |       e.g., the CPU reads a byte from a buffer in memory, stores
+-----+       it in the controller's data register, and then stores a
|cache|       write-to-device command in the controller's command register
+-----+
   |
+==+================+=================+=====+ bus
   |                |                 |
+--------+    +-----------+     +-----------+     a simple controller will
| memory |    |controller |     |controller |     respond to bus signals and
|        |    |       +--+|     |       +--+|     set its status register
|+------+|    |   data|  ||     |   data|  ||     for the CPU to check later
|| I/O  ||    |       +--+|     |       +--+|
||buffer||    | status|  ||     | status|  ||
|+------+|    |       +--+|     |       +--+|
|        |    |command|  ||     |command|  ||
|        |    |       +--+|     |       +--+|
+--------+    +-----------+     +-----------+
                    |                 |
              +-----------+     +-----------+
              |  device   |     |  device   |
              +-----------+     +-----------+

controller registers
  - data register - holds the data byte going to/from the device
  - status register - holds bits indicating whether the device is ready,
    whether an error occurred, etc.
  - command register - bit for read, bit for write, etc.
    (may be combined with the status register)

access to controller registers is either by:
  - memory-mapped I/O - the registers respond to main memory addresses
    (typically high memory), so normal load/store instructions can access
    them
  - isolated I/O - special instructions (e.g., IN and OUT on the Pentium)
    are required, and port numbers serve as the addresses of the controller
    registers

programmed I/O - the CPU is involved in sending/receiving every byte, and
the CPU must busy wait until the device is ready for each send/receive

  ! write bytes from memory buffer to device
  !
  ! pseudo-code
  ! |
  ! | int count = N;
  ! | char *addr = memory_buffer;
  ! | char byte;
  ! |
  ! | do{ byte = *addr;
  ! |
  ! |     while( io_device_status != READY ) /* busy wait */ ;
  ! |
  ! |     io_device_data = byte;
  ! |     io_device_command = WRITE;
  ! |
  ! |     addr++;
  ! |     count--;
  ! |
  ! | }while( count > 0 );
  !
  ! (defines may differ for real device)

  define(DEVICE_REGS_BASE, 0xffff0000)
  define(DEVICE_READY, 0x80)  ! mask that defines ready bit in status reg.
  define(WRITE_CMD, 0x1)      ! mask to set write command in command reg.
  define(DATA, 0)             ! offset of data reg.
  define(STATUS, 4)           ! offset of status reg.
  define(CMD, 8)              ! offset of command reg.
          set   buffer, %address_r
          set   buffer_size, %count_r
          set   DEVICE_REGS_BASE, %device_reg_ptr_r   ! memory-mapped i/o
          mov   WRITE_CMD, %cmd_r
  loop:
          ldub  [%address_r], %byte_r                 ! get data byte from buffer
  busy_wait:
          ldub  [%device_reg_ptr_r + STATUS], %status_r
          btst  DEVICE_READY, %status_r               ! andcc to test ready bit
          bz    busy_wait                             ! branch if zero (not ready)
          nop
  ready:
          stb   %byte_r, [%device_reg_ptr_r + DATA]   ! send data byte
          stb   %cmd_r, [%device_reg_ptr_r + CMD]     ! send write command
          inc   %address_r
          dec   %count_r
          cmp   %count_r, 0
          bgt   loop
          nop

consider a 6 pages-per-minute (ppm) printer with 5,000 characters per page:
  6 pages/min x 5,000 chars/page = 30,000 chars/min = 500 chars/sec
  = 0.002 sec/char = 2 ms/char

for a 500 MHz processor (= 500 M cycles/sec), the cycle time is 2 ns, so
there are 2 ms / 2 ns = 1,000,000 cycles between characters

if the busy-wait loop takes 100 cycles per iteration (most of which is the
latency of the load instruction that accesses the device status register),
the busy-wait loop runs 1,000,000 / 100 = 10,000 iterations between
characters

1 success in 10,000 iterations => not an efficient use of the CPU (the CPU
spends 99.99% of its time waiting)

  ! read bytes from device to memory buffer
  !
  ! pseudo-code
  ! |
  ! | int count = N;
  ! | char *addr = memory_buffer;
  ! | char byte;
  ! |
  ! | do{ io_device_command = READ;
  ! |
  ! |     while( io_device_status != READY ) /* busy wait */ ;
  ! |
  ! |     byte = io_device_data;
  ! |     *addr = byte;
  ! |
  ! |     addr++;
  ! |     count--;
  ! |
  ! | }while( count > 0 );
  !
  ! (defines may differ for real device)

  define(DEVICE_REGS_BASE, 0xffff0000)
  define(DEVICE_READY, 0x80)  ! mask that defines ready bit in status reg.
  define(READ_CMD, 0x2)       ! mask to set read command in command reg.
  define(DATA, 0)             ! offset of data reg.
  define(STATUS, 4)           ! offset of status reg.
  define(CMD, 8)              ! offset of command reg.
          set   buffer, %address_r
          set   buffer_size, %count_r
          set   DEVICE_REGS_BASE, %device_reg_ptr_r   ! memory-mapped i/o
          mov   READ_CMD, %cmd_r
  loop:
          stb   %cmd_r, [%device_reg_ptr_r + CMD]     ! send read command
  busy_wait:
          ldub  [%device_reg_ptr_r + STATUS], %status_r
          btst  DEVICE_READY, %status_r               ! andcc to test ready bit
          bz    busy_wait                             ! branch if zero (not ready)
          nop
  ready:
          ldub  [%device_reg_ptr_r + DATA], %byte_r   ! get data byte
          stb   %byte_r, [%address_r]                 ! put in buffer
          inc   %address_r
          dec   %count_r
          cmp   %count_r, 0
          bgt   loop
          nop

interrupt-driven I/O - the CPU can do something else while the controller
and device are busy; the controller grabs the CPU's attention when needed
by causing what is essentially an unplanned procedure call

+-----+
| CPU |-----.
+-----+<--. |
|cache|   | |
+-----+   | |
   |      | |
+==+======|=|====+=================+=========+ bus
   |      | |    |                 |
   |      +-|----|----+------------|----+     interrupt request line (INTR)
   |        +----|----|-+----------|----|-+   interrupt acknowledge line (INTA)
   |             |    | v          |    | v
+--------+    +-----------+     +-----------+     controllers that can
| memory |    |controller |     |controller |     interrupt raise a request
|+------+|    |       +--+|     |       +--+|     signal on the bus
||buffer||    |   data|  ||     |   data|  ||
|+------+|    |       +--+|     |       +--+|     when the CPU responds with
|        |    | status|  ||     | status|  ||     an acknowledgement, the
|+------+|    |       +--+|     |       +--+|     controller places some
|| ISR  ||    |command|  ||     |command|  ||     type of identification
|+------+|    |       +--+|     |       +--+|     on the bus
|        |    +-----------+     +-----------+
|+------+|          |                 |
|| int. ||    +-----------+     +-----------+
||vector||    |  device   |     |  device   |
||table ||    +-----------+     +-----------+
|+------+|
+--------+

we rely on an external interrupt from the controller to signal that the
device is ready (i.e., that the previous I/O operation is complete); this
causes the currently executing program to stop and the processor to enter
the OS and start executing an interrupt service routine (ISR), sometimes
called an interrupt handler (IH)

there are also internal interrupts (sometimes called exceptions) for divide
by zero, unaligned memory accesses, memory protection errors, etc.

moreover, to protect the OS, calls to the OS must be made by a special
instruction that causes an interrupt - called SVC (supervisor call) on IBM
mainframes, INT on x86, and trap on SPARC

an interrupt must save a return address and information on the processor
state so that the interrupted program can be resumed later => save the
program counter (PC) and processor state register (PSR)

an interrupt switches execution to an OS-only mode by changing a mode bit
(or bits) in the PSR

there are typically interrupt control bits in the controller's command
register, and interrupt enable bits (either a priority level or a bit mask)
in the PSR - the processor typically disables interrupts (at least at that
level and lower) whenever an ISR starts

the entry point address of the interrupt service routine (ISR) is typically
provided by a table of such addresses in low memory; for I/O, the entry is
chosen according to the interrupt code placed on the bus by the controller

     +------------------------+                \
   0 | addr of ISR for type 0 |-------------.  |
     +------------------------+             |  |
   4 | addr of ISR for type 1 |----------.  |  |  interrupt vector
     +------------------------+          |  |  |  table (IVT)
   8 | addr of ISR for type 2 |-------.  |  |  |
     +------------------------+       |  |  |  |
   c | addr of ISR for type 3 |----.  v  v  v  |
     +------------------------+    |           /
     |          ...           |    |
     +------------------------+    |           \
     |  code for type 3 int   |<---'           |  interrupt
     |          ...           |                |  service
     | return from interrupt  |                |  routine (ISR)
     +------------------------+                /
     |          ...           |

a special return-from-interrupt instruction at the end of the ISR switches
back to the previous processor state and restores the saved PC

the fetch-execute cycle is extended to check for interrupts after each
instruction - the hardware response to an interrupt acts like a procedure
call if an interrupt is requested by a device and the CPU has interrupts
enabled

note that the ISR is a software routine - its instructions are fetched,
decoded, and executed just like those of any other program

                   +-------+
  .--------------->| fetch |
  |                +-------+
  |                    |
  |                    v
  |               .--------.
  |              <  decode  >
  |               `--------'
  |                    |
  |       +------------+------------+-------- ... ---------+
  |       v            v            v                      v
  |  +---------+  +---------+  +---------+     +-----------------------+
  |  | execute |  | execute |  | execute | ... | return from interrupt |
  |  |  load   |  |   add   |  |  store  |     | (restore PC and PSR)  |
  |  +---------+  +---------+  +---------+     +-----------------------+
  |       |            |            |                      |
  |       +------------+------------+-------- ... ---------+
  |                    |
  |                    v
  |    .-------------------------------------------------.
  |   < if interrupt requested and interrupts are enabled >
  | no `-------------------------------------------------'
  +<---'                        | yes
  ^                             v
  |   +--------------------------------------------------+
  |   | 1) save PC and PSR                               |
  |   | 2) switch execution mode to kernel (OS-only)     |
  |   | 3) disable/restrict further interrupts           |
  |   | 4) load new PC from IVT (interrupt vector table) |
  |   +--------------------------------------------------+
  |                             |
  `-----------------------------'

nested interrupts

if the PC and PSR are saved on a stack (or in a set of registers), a
high-priority device can interrupt the execution of the ISR for a
lower-priority device

                            .----------.
                            | disk ISR |
                            `----------|
              .-------------^          v-------------.
              | printer ISR |          | printer ISR |
              `-------------|          `-------------|
--------------^             |                        v--------------
 user program |             |                        |user program
--------------|             |                        `--------------
              ^             |          |             |
              |             |         rti           rti
           printer        disk
          interrupt     interrupt

otherwise, the second interrupt is held pending until the first ISR
finishes and executes its rti instruction; at that point, the rti briefly
reestablishes user mode with interrupts enabled, but the highest-priority
pending interrupt is accepted immediately

              .--------------------------..----------.
              |       printer ISR        || disk ISR |
              `--------------------------|`----------|
--------------^                          |^          v--------------
 user program |                          ||          |user program
--------------|                          v|          `--------------
              ^             |           rti         rti
              |             |
              |             ^^^^^^^^^^^^^*
           printer        disk
          interrupt     interrupt

x86 specifics
  FR (flags register, also sometimes called PSW for processor status word)
    contains the flags (condition codes) and the interrupt enable bit
  IP (instruction pointer) is Intel's name for the program counter
  INT is the software interrupt instruction used to invoke the OS
  IP and FR (along with the CS segment register) are saved on the memory
    stack when an interrupt is accepted, and the interrupt enable bit is
    turned off
  the ISR can push registers on the stack to save them before servicing
    the interrupt, and then pop them back off before returning
  the IRET instruction pops IP, CS, and FR back off the stack

SPARC v7 specifics

  PSR
  +---------------+-+-+-+-+-------------+-+-------+-+-+-+---------+
  |. . . . . . . .|N|Z|V|C|. . . . . . .|F|. PIL .|S|P|E|. .CWP. .|
  +---------------+-+-+-+-+-------------+-+-------+-+-+-+---------+
   31  29  27  25  23  21  19  17  15  13  11  9 8 7 6 5 4 3 2 1 0

  NZVC - integer condition codes
  EF   - enable floating point (F in the figure)
  PIL  - processor interrupt level
  S    - supervisor mode
  PS   - previous supervisor mode (P in the figure)
  ET   - enable traps (E in the figure)
  CWP  - current window pointer

  (see /usr/include/v7/sys/psr.h)

  #define PSR_CWP   0x0000001F  /* current window pointer */
  #define PSR_ET    0x00000020  /* enable traps */
  #define PSR_PS    0x00000040  /* previous supervisor mode */
  #define PSR_S     0x00000080  /* supervisor mode */
  #define PSR_PIL   0x00000F00  /* processor interrupt level */
  #define PSR_EF    0x00001000  /* enable floating point unit */
  #define PSR_EC    0x00002000  /* enable coprocessor */
  #define PSR_RSV   0x000FC000  /* reserved */
  #define PSR_ICC   0x00F00000  /* integer condition codes */
  #define PSR_C     0x00100000  /* carry bit */
  #define PSR_V     0x00200000  /* overflow bit */
  #define PSR_Z     0x00400000  /* zero bit */
  #define PSR_N     0x00800000  /* negative bit */
  #define PSR_VER   0x0F000000  /* mask version */
  #define PSR_IMPL  0xF0000000  /* implementation */

  the UltraSPARC (v9) uses a different layout in the PSTATE register

  interrupts are called traps
    256 trap types, half software and half hardware
    invoke the OS by a trap instruction
    trap enable bit (ET) and 4-bit interrupt level (PIL) in the PSR
      - a synchronous trap is accepted only if the ET bit is set (a
        synchronous trap that occurs while ET is clear halts the processor
        in error mode)
      - an external interrupt is accepted only if the ET bit is set and the
        priority level of the interrupt is greater than the current PIL

  TBR - trap base register, with the leading 20 bits set by the OS, an
  8-bit field supplied internally based on the trap type or externally from
  the interrupting device, and four zeros

  trap sequence
    1. ET bit in PSR cleared so that further traps are disabled
    2. S bit saved into PS bit and S bit set => places the processor in
       supervisor mode
    3. CWP decremented (modulo the number of windows, as by a save
       instruction) to give the trap handler a new set of local registers
       (you can use only the eight local registers, since the out registers
       are not guaranteed to be mapped to physical registers without window
       overflow processing; also, the in registers and global registers
       might be in use by the interrupted process)
    4. pc and npc are saved into %l1 and %l2 (caveat: the hyperSPARC manual
       differs in register assignment from the textbook)
    5. trap type field placed into TBR, and pc = TBR, npc = TBR+4

  the IVT / trap table in SPARC consists of blocks of four instructions
  (16 bytes), enough to contain a branch to the trap handling routine and
  to load the PSR

          sethi %hi(trap_handler), %l3      ! load trap handler
          jmp   [%l3+%lo(trap_handler)]     ! address and jump
          mov   %psr, %l0                   ! (delay slot) load %psr into local reg.
          nop                               ! pads the block to four instructions

  the UltraSPARC (v9) changes this to blocks of eight instructions, with
  four blocks (32 instructions) each for the window underflow and overflow
  traps, so that the window-related trap handlers need no branch

  rett - return from trap instruction, restores the previous processor
  state (caveat: the hyperSPARC manual differs in register assignment from
  the textbook)

          ! re-execute the trapping instruction
          jmpl  %l1, %g0      ! saved pc
          rett  %l2           ! saved npc

          ! return to the instruction after the trapping instruction
          jmpl  %l2, %g0      ! saved npc
          rett  %l2+4         ! saved npc + 4

  the UltraSPARC (v9) supports nested traps in hardware, with per-trap-level
  copies of the saved pc, npc, and state (TPC, TNPC, TSTATE) and multiple
  sets of global registers, one per trap level; thus a high-priority I/O
  device is not locked out during the servicing of a lower-priority I/O
  device

DMA - direct memory access - extra registers and logic in the controller
allow it to transfer a whole block of bytes without CPU involvement; the
controller interrupts the CPU after completion or after an error
  - address register - address of the buffer in main memory; the
    controller increments it
  - count register - length of the block; the controller decrements it

+-----+      INTA
| CPU |----------------------.
+-----+<-----------------.  |
|cache|      INTR         |  |
+-----+                   |  |
   |                      |  |
+==+=============+========|==|======+ bus
   |             |        |  |
   |             |        |  v
+--------+   +----------------------+    a DMA controller can be a bus master,
| memory |   |    DMA controller    |    so there must be arbitration for the
|        |   |       +-------------+|    bus among the DMA controllers and CPU
|+------+<-. |   data|             ||
|| I/O  || | |       +-------------+|    extra logic in the controller
||buffer|| | | status|             ||    implements the loop:
|+------+| | |       +-------------+|
|        | | |command|             ||      while( count > 0 ){
|        | | |       +-------------+|        transfer byte at address;
|        | `--address| & buffer    ||        address++;
|        |   |       +-------------+|        count--;
|        |   |  count| # in buffer ||      }
|        |   |       +-------------+|
+--------+   +----------------------+    interrupt once at end of buffer
                         |
                   +-----------+
                   |  device   |
                   +-----------+

I/O methods             CPU involvement             # interrupts
--------------------    ------------------------    --------------------
programmed I/O          completely dedicated        none
                        to the transfer
interrupt-driven I/O    transfers each byte         after each byte
DMA I/O                 initially loads the         one, at end of block
                        address and count
                        registers and gives
                        the transfer command

effect of offloading the CPU

    do{ byte = *addr;                                       \
        while( status != READY );  <= interrupt-driven I/O  |  complete block
        io_device_data = byte;        relieves CPU of       |  transfer is
        io_device_command = WRITE;    busy-wait loop        |  offloaded onto
        addr++;                                             |  the DMA controller
        count--;                                            |
    }while( count > 0 );                                    /

further offloading of I/O from the CPU

mainframe channel - provides for transfers of multiple blocks by traversing
linked-list-like channel programs, and interrupts only when the end of the
channel program is reached (or on error)

a channel has the equivalent of a program counter

+---------+                                            each channel instruction is
| channel |                                            called a "channel command word" (CCW)
| +----+  |    +-----+---------+-------+------+        each CCW has
| | pc ------->| r/w | address | count | next --.      address and
| +----+  |    +-----+---------+-------+------+ |      count fields
+---------+                                     |      to support a
                  .-----------------------------'      block transfer
                  v
               +-----+---------+-------+------+        additional
               | r/w | address | count | next --.      fields indicate
               +-----+---------+-------+------+ |      end of physical
                                                v      block, etc.

CCWs can also provide scatter-gather
  scatter - read data from a single physical block on an I/O device and
    send different parts to multiple, non-contiguous I/O buffers in memory
  gather - read data from multiple, non-contiguous I/O buffers in memory
    and write them as a single physical block to the device

I/O processors - offload I/O conversion, editing, etc., e.g., I2O
("intelligent I/O"), where an Intel i960 runs its own real-time OS
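The programmed I/O write and read loops from earlier in these notes can be sketched in C against a simulated controller. This is a minimal sketch, not driver code. `device_regs` mirrors the DATA/STATUS/CMD offsets 0/4/8 from the defines. The names `sim_device`, `sim_init`, `read_status`, `write_command`, `pio_write`, and `pio_read` are hypothetical. No real device is attached, so `read_status` and `write_command` play the controller's side of each load and store, and the device is assumed to become ready again after `latency` status polls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEVICE_READY 0x80u  /* ready bit in the status register */
#define WRITE_CMD    0x1u   /* write command */
#define READ_CMD     0x2u   /* read command */

/* Data/status/command at offsets 0, 4, 8, as in the defines. On real
   hardware a pointer to this struct would hold the controller's
   memory-mapped base address (e.g., DEVICE_REGS_BASE). */
typedef struct {
    volatile uint32_t data;
    volatile uint32_t status;
    volatile uint32_t command;
} device_regs;

/* Hypothetical simulated device wrapped around the register block. */
typedef struct {
    device_regs regs;
    int latency;              /* polls until ready again after a command */
    int countdown;            /* polls remaining before ready bit is set */
    unsigned char out[64];    /* bytes the device has received */
    size_t out_len;
    const unsigned char *in;  /* bytes the device supplies on reads */
    size_t in_pos;
    long polls;               /* status reads = busy-wait iterations */
} sim_device;

static void sim_init(sim_device *d, int latency, const unsigned char *in) {
    assert(latency >= 1);
    /* idle device starts out ready; other fields are zeroed */
    *d = (sim_device){ .regs.status = DEVICE_READY, .latency = latency, .in = in };
}

/* Models a load from the status register. */
static uint32_t read_status(sim_device *d) {
    d->polls++;
    if (d->countdown > 0 && --d->countdown == 0)
        d->regs.status |= DEVICE_READY;
    return d->regs.status;
}

/* Models a store to the command register and the device's reaction. */
static void write_command(sim_device *d, uint32_t cmd) {
    d->regs.command = cmd;
    if (cmd == WRITE_CMD) {
        assert(d->out_len < sizeof d->out);
        d->out[d->out_len++] = (unsigned char)d->regs.data;
    } else if (cmd == READ_CMD) {
        assert(d->in != NULL);
        d->regs.data = d->in[d->in_pos++];
    }
    d->regs.status &= ~DEVICE_READY;   /* busy until countdown expires */
    d->countdown = d->latency;
}

/* Write loop from the pseudocode (do-while, so n must be >= 1). */
static void pio_write(sim_device *d, const unsigned char *buf, size_t n) {
    assert(n > 0);
    const unsigned char *addr = buf;
    size_t count = n;
    do {
        unsigned char byte = *addr;
        while ((read_status(d) & DEVICE_READY) == 0)
            ;                                   /* busy wait */
        d->regs.data = byte;
        write_command(d, WRITE_CMD);
        addr++;
        count--;
    } while (count > 0);
}

/* Read loop from the pseudocode. */
static void pio_read(sim_device *d, unsigned char *buf, size_t n) {
    assert(n > 0);
    unsigned char *addr = buf;
    size_t count = n;
    do {
        write_command(d, READ_CMD);
        while ((read_status(d) & DEVICE_READY) == 0)
            ;                                   /* busy wait */
        *addr = (unsigned char)d->regs.data;
        addr++;
        count--;
    } while (count > 0);
}
```

With a latency of 3 polls, writing 4 bytes costs 1 + 3 + 3 + 3 = 10 status reads, because the idle device is ready for the first byte. The poll count grows with the device's slowness, not with useful work, which is the waste the printer calculation quantifies.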
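The printer busy-wait arithmetic from the programmed I/O discussion can be checked with a few lines of C. The function names are hypothetical. The sketch assumes, as the notes do, that exactly one poll per character finds the device ready.

```c
#include <assert.h>

/* Busy-wait iterations between characters: a character is due every
   clock_hz / chars_per_sec cycles, and each poll costs cycles_per_iter. */
static long long busy_wait_iterations(long long pages_per_min,
                                      long long chars_per_page,
                                      long long clock_hz,
                                      long long cycles_per_iter) {
    long long chars_per_sec = pages_per_min * chars_per_page / 60;
    assert(chars_per_sec > 0 && cycles_per_iter > 0);
    long long cycles_per_char = clock_hz / chars_per_sec;
    return cycles_per_char / cycles_per_iter;
}

/* Fraction of polls that find the device not ready
   (only one poll per character succeeds). */
static double wasted_fraction(long long iterations) {
    assert(iterations > 0);
    return 1.0 - 1.0 / (double)iterations;
}
```

For the notes' numbers, `busy_wait_iterations(6, 5000, 500000000LL, 100)` gives 10,000 iterations. `wasted_fraction` of that is 0.9999, so 99.99% of the polls are wasted.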
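The interrupt vector table lookup described earlier amounts to indexing an array of code addresses. In this sketch the table is a C array of function pointers, and the call through the table stands in for the hardware loading the new PC. The ISR names, `dispatch`, and `vector_offset` are hypothetical.

```c
#include <assert.h>

typedef void (*isr_t)(void);   /* assumption: ISRs take no arguments */

enum { NUM_TYPES = 4 };

static int last_serviced = -1;   /* records which ISR ran, for illustration */

static void isr_type0(void) { last_serviced = 0; }
static void isr_type1(void) { last_serviced = 1; }
static void isr_type2(void) { last_serviced = 2; }
static void isr_type3(void) { last_serviced = 3; }

/* Entry i holds the address of the ISR for interrupt type i. */
static isr_t ivt[NUM_TYPES] = { isr_type0, isr_type1, isr_type2, isr_type3 };

/* What the hardware does with the type code the controller placed on
   the bus: index the table and transfer control to that ISR. */
static void dispatch(unsigned type_code) {
    assert(type_code < NUM_TYPES);
    ivt[type_code]();
}

/* Byte offset of entry i, assuming 4-byte addresses as in the figure
   (0, 4, 8, c); on a 64-bit host the C array itself has 8-byte entries. */
static unsigned vector_offset(unsigned type_code) {
    return type_code * 4u;
}
```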
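The extended fetch-execute flowchart can be simulated with a toy CPU. This sketch assumes a made-up two-instruction machine (`OP_NOP`, `OP_RTI`) and made-up PSR bits `PSR_KERNEL` and `PSR_IE`. The PC and PSR are saved into a single pair of registers rather than on a stack. As a result, a request that arrives while the ISR runs is held pending and is accepted right after the rti, as in the non-nested timeline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PSR bits for the toy machine. */
#define PSR_KERNEL 0x1u   /* OS-only (kernel) mode */
#define PSR_IE     0x2u   /* interrupts enabled */

enum op { OP_NOP, OP_RTI };   /* just enough instructions for the demo */

typedef struct {
    unsigned pc, psr;
    unsigned saved_pc, saved_psr;   /* where the hardware saves PC and PSR */
    bool irq_pending;               /* a device is requesting an interrupt */
    unsigned irq_type;              /* type code the controller put on the bus */
    const unsigned *ivt;            /* interrupt vector table: type -> ISR address */
    const enum op *mem;             /* program memory, one instruction per address */
    size_t mem_len;
} cpu;

/* One trip around the extended fetch-execute cycle. */
static void step(cpu *c) {
    assert(c->pc < c->mem_len);
    enum op instr = c->mem[c->pc];          /* fetch and decode */

    if (instr == OP_RTI) {                  /* execute: return from interrupt */
        c->pc = c->saved_pc;                /*   restore PC and PSR */
        c->psr = c->saved_psr;
    } else {                                /* execute: ordinary instruction */
        c->pc++;
    }

    /* if interrupt requested and interrupts are enabled */
    if (c->irq_pending && (c->psr & PSR_IE)) {
        c->saved_pc = c->pc;                /* 1) save PC and PSR */
        c->saved_psr = c->psr;
        c->psr |= PSR_KERNEL;               /* 2) switch to kernel mode */
        c->psr &= ~PSR_IE;                  /* 3) disable further interrupts */
        c->pc = c->ivt[c->irq_type];        /* 4) load new PC from the IVT */
        c->irq_pending = false;             /* request acknowledged */
    }
}
```

In the test sequence, the first interrupt is taken after the instruction at address 1. A second request made during the ISR waits because `PSR_IE` is clear. The rti at address 11 re-enables interrupts, and the pending request is accepted immediately.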
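For the nested-interrupt case, where saved PC/PSR pairs go on a stack, nesting follows from comparing priorities. The sketch below tracks only priority levels, with each pushed level standing in for a saved PC and PSR. Level 0 is the user program. The type and function names, and the printer = 2 and disk = 5 levels used in the test, are hypothetical.

```c
#include <assert.h>

#define MAX_NEST 8

/* Hypothetical model: each saved entry stands in for a saved PC/PSR pair. */
typedef struct {
    int level;              /* priority of the code now running */
    int saved[MAX_NEST];    /* stack of interrupted levels */
    int depth;
} intr_state;

/* Accept an interrupt only if it outranks the code now running; the
   interrupted context is pushed so it can be resumed later. */
static int try_interrupt(intr_state *s, int priority) {
    if (priority <= s->level)
        return 0;                       /* held pending */
    assert(s->depth < MAX_NEST);
    s->saved[s->depth++] = s->level;
    s->level = priority;
    return 1;
}

/* rti: pop the most recently saved context. */
static void rti(intr_state *s) {
    assert(s->depth > 0);
    s->level = s->saved[--s->depth];
}
```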
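The SPARC v7 PSR masks from the psr.h excerpt, the external-interrupt acceptance rule, and the TBR layout can be exercised directly in C. The helper names are hypothetical. `psr_on_trap` applies only steps 1 and 2 of the trap sequence (ET cleared, S copied into PS, S set) and leaves the window, pc, and npc handling aside. `interrupt_accepted` follows the rule exactly as stated in the notes.

```c
#include <assert.h>
#include <stdint.h>

/* Masks copied from the psr.h excerpt. */
#define PSR_CWP 0x0000001F
#define PSR_ET  0x00000020
#define PSR_PS  0x00000040
#define PSR_S   0x00000080
#define PSR_PIL 0x00000F00

static unsigned psr_pil(uint32_t psr) { return (psr & PSR_PIL) >> 8; }
static unsigned psr_cwp(uint32_t psr) { return psr & PSR_CWP; }

/* External interrupt: ET set and request level greater than PIL. */
static int interrupt_accepted(uint32_t psr, unsigned level) {
    return (psr & PSR_ET) && level > psr_pil(psr);
}

/* Steps 1 and 2 of the trap sequence: clear ET, copy S into PS, set S. */
static uint32_t psr_on_trap(uint32_t psr) {
    uint32_t ps = (psr & PSR_S) ? PSR_PS : 0;
    psr &= ~(uint32_t)(PSR_ET | PSR_PS);
    return psr | ps | PSR_S;
}

/* Trap table entry: 20-bit trap base from the OS, 8-bit trap type in
   bits 11..4, and four zero bits, so entries are 16 bytes (4 instructions). */
static uint32_t trap_vector(uint32_t tbr, unsigned tt) {
    assert(tt <= 0xFF);
    return (tbr & 0xFFFFF000u) | ((uint32_t)tt << 4);
}
```

With a trap base of 0x40000000, trap type 0x82 lands at 0x40000820. Consecutive trap types are 16 bytes, or four instructions, apart.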
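The loop that the DMA controller's extra logic implements can be modeled in C for contrast with the programmed I/O loop. The CPU only loads the address and count registers, and one interrupt is counted at the end of the block. `dma_controller`, `dma_start`, `dma_run`, and `demo_device` are hypothetical names. The function pointer stands in for the device side of each transfer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the DMA controller's extra registers. */
typedef struct {
    unsigned char *address;   /* address register: next location in the buffer */
    size_t count;             /* count register: bytes left in the block */
    int interrupts;           /* completion interrupts raised so far */
} dma_controller;

/* CPU side: load the address and count registers (the transfer command
   is modeled by the later call to dma_run). */
static void dma_start(dma_controller *dma, unsigned char *buf, size_t n) {
    assert(buf != NULL || n == 0);
    dma->address = buf;
    dma->count = n;
}

/* Controller side: no CPU involvement until the single interrupt. */
static void dma_run(dma_controller *dma, unsigned char (*next_byte)(void)) {
    while (dma->count > 0) {
        *dma->address = next_byte();   /* transfer byte at address */
        dma->address++;
        dma->count--;
    }
    dma->interrupts++;                 /* interrupt once at end of buffer */
}

/* A hypothetical device that delivers 'a', 'b', 'c', ... */
static unsigned char demo_device(void) {
    static unsigned char next = 'a';
    return next++;
}
```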
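A channel program can be sketched as a linked list of CCWs, each with the r/w, address, count, and next fields shown in the figure. A channel "pc" walks the list until it reaches the end. This is a simplified model: real CCWs have a fixed binary format defined by the mainframe architecture. `ccw`, `run_channel`, and the assumption that one physical block is shared by the whole program are introduced here only to show scatter (reads) and gather (writes).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CCW modeled on the figure. */
enum ccw_op { CCW_READ, CCW_WRITE };

typedef struct ccw {
    enum ccw_op op;
    unsigned char *address;    /* memory buffer for this piece */
    size_t count;              /* bytes in this piece */
    const struct ccw *next;    /* NULL ends the channel program */
} ccw;

/* Run a channel program against one physical block of device data:
   reads scatter the block across the CCW buffers, writes gather the
   buffers into the block. Returns the number of interrupts raised (1). */
static int run_channel(const ccw *pc, unsigned char *block, size_t block_len) {
    size_t pos = 0;
    for (; pc != NULL; pc = pc->next) {
        assert(pos + pc->count <= block_len);
        if (pc->op == CCW_READ)
            memcpy(pc->address, block + pos, pc->count);   /* scatter */
        else
            memcpy(block + pos, pc->address, pc->count);   /* gather */
        pos += pc->count;
    }
    return 1;   /* interrupt only at the end of the channel program */
}
```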