Clemson University -- CPSC 231 -- Fall 2009

Separate assembly allows a program to be built from modules rather than
  a single source file

                     assembler                     linker
    source file 1    --------->  object module 1  --
    (assembly lang.)             (machine code)     \
                                                     \
    source file 2    --------->  object module 2  -----> executable file
    (assembly lang.)             (machine code)      /   or "load module"
                                                    /    or "binary" / "bin"
                             prewritten library  --      (machine code)
                             (machine code)

  advantages

    separate source files provide a nice structure for dividing a project
      between several team members

    separate source files allow separate testing (and thus better
      isolation and detection of bugs)

    separate source files provide for easy reuse

    separate source files minimize the work of reassembly and relinking
      needed whenever a change occurs in only one source file

    prewritten libraries provide machine extensions (higher-level
      functions like square root, if not available as a machine
      instruction)


Program testing

  example programs linked from web page illustrate subroutines exercised
    by test drivers

  individual testing of modules like this is often not done (e.g., we are
    lazy or we foolishly think we will save time by skipping testing) or
    testing is done poorly (e.g., we neglect important test cases)

  "write all the code, bolt it all together, and hope for the best"?
    - bad idea

  it is easier to detect errors and debug when we test each module
    individually

  then, when we combine previously tested and debugged modules, the
    remaining errors will probably stem from misunderstandings about the
    interface specifications (i.e., the number, ordering, and data types
    of the actual parameters in the subroutine calls)

  bottom-up development: write and test the lowest-level (leaf)
    subroutines first, then write the modules or program that uses them;
    requires extra effort of writing test drivers

  top-down development: write the highest-level modules/program first and
    test the logic; requires extra effort of writing "stub" routines that
    are simply place holders for lower-level subroutines that will be
    developed later; these stubs can return fixed values if necessary and
    are often useful just to print that the call occurred (and optionally
    print what parameters were passed)

  often mix the two approaches

  top-down typically offers the ability to quickly and easily prototype a
    program and obtain feedback from the end user

  see this essay by Paul Graham on "Programming Bottom-Up" in which he
    argues that bottom-up programs are usually smaller and easier to read:
    http://www.paulgraham.com/progbot.html


Linking

  object files

    compiler or assembler produces object file (.o) -- using -c flag
      for gcc

    linker (unfortunately named ld since ln already used as a file system
      command) yields an executable file (default name is a.out)

  in file p1.s

            .global main
            .global x              ! x is defined in p2.s
    main:
            save    %sp,-96,%sp
    prt_addr:
            set     x,%o1          ! second printf argument = address of x
            set     fmt1,%o0
            call    printf
            nop                    ! delay slot
    prt_value:
            set     x,%o0
            ld      [%o0],%o1      ! second printf argument = value of x
            set     fmt2,%o0
            call    printf
            nop                    ! delay slot
    return:
            ret
            restore

            .section ".rodata"
    fmt1:   .asciz  "the address of x is %p\n"
    fmt2:   .asciz  "the value of x is %d\n"

  in file p2.s

            .global x
            .section ".data"
    x:      .word   55
    y:      .word   66

  assembling and linking p1.s by itself fails since x is not defined

  % gcc p1.s
  /var/tmp/ccIcF4mA.o: In function `prt_addr':
  /var/tmp/ccIcF4mA.o(.text+0x4): undefined reference to `x'
  /var/tmp/ccIcF4mA.o(.text+0x8): undefined reference to `x'
  /var/tmp/ccIcF4mA.o: In function `prt_value':
  /var/tmp/ccIcF4mA.o(.text+0x1c): undefined reference to `x'
  /var/tmp/ccIcF4mA.o(.text+0x20): undefined reference to `x'
  collect2: ld returned 1 exit status

  % gcc -c p1.s
  % nm p1.o                                 nm prints symbols in object
  0000000000000000 a *ABS*                    file or executable file
  0000000000000000 r fmt1                   r = read only data (addresses are
  0000000000000018 r fmt2                     relative to .rodata section)
  0000000000000000 T main                   T = text (addresses are
                                              relative to text section)
                   U printf                 U = undefined
  0000000000000004 t prt_addr               t = text (lower case type code =>
  000000000000001c t prt_value                private, upper case type code
  0000000000000038 t return                   => global)
                   U x                      U = undefined

  % gcc -c p2.s
  % nm p2.o
  0000000000000000 D x                      D = data (lower case type code =>
  0000000000000004 d y                        private, upper case type code
                                              => global)

  % gcc p1.o p2.o                           instead could use ld p1.o p2.o
  % nm a.out                                first column is relocated address
  ...
  000000000001061c r fmt1
  0000000000010634 r fmt2
  ...
  0000000000010560 T main
  ...
                   U printf@@SYSVABI_1.3    U = undefined since default is
  ...
                                              dynamic linking to shared
                                              object for printf, see below
  0000000000010564 t prt_addr
  000000000001057c t prt_value
  0000000000010598 t return
  0000000000020654 D x
  0000000000020658 d y

  % a.out
  the address of x is 20654
  the value of x is 55

  the elfdump command gives more information, including relocation
    information for p1.o

    Relocation Section:  .rela.text
      type               offset   addend  section      with respect to
      R_SPARC_HI22          0x4        0  .rela.text   x
      R_SPARC_LO10          0x8        0  .rela.text   x
      R_SPARC_HI22          0xc        0  .rela.text   .rodata (section)
      R_SPARC_LO10         0x10        0  .rela.text   .rodata (section)
      R_SPARC_WDISP30      0x14        0  .rela.text   printf
      R_SPARC_HI22         0x1c        0  .rela.text   x
      R_SPARC_LO10         0x20        0  .rela.text   x
      R_SPARC_HI22         0x28     0x18  .rela.text   .rodata (section)
      R_SPARC_LO10         0x2c     0x18  .rela.text   .rodata (section)
      R_SPARC_WDISP30      0x30        0  .rela.text   printf

  the linker resolves external references between .o (simple object
    files), .a (libraries/archives), and .so (shared objects)

  the linker also performs storage management to assign regions within
    the executable file to each program section; the linker resolves any
    external references and also performs relocation, that is, it fixes
    addresses within the program sections relative to each other

  simple object files and libraries/archives are combined using static
    linking to make a self-contained executable (i.e., all parts needed
    for execution are contained in the executable); however, for common
    library routines like printf, this wastes disk space, since every
    executable that uses the common routine has to store its own copy

            assembler          linker              loader
    source  - - - >  object  - - - - >  executable  - >  memory image

    myprog.s  --->  myprog.o ----
                                 \                       | . . . |
                                  \                      +-------+
                                   >->  myprog  ---->    | myprog|
                                  /     printf           | printf|
                                 /                       +-------+
                    libc:printf -                        | . . . |

  shared objects use dynamic linking (at run time) to save disk space
    (since the program doesn't need to keep a copy of the shared object
    inside its executable file) and memory space (since many programs can
    share a single memory-resident copy of the shared object); e.g.,
    using static linking (gcc -g -static) on the capitalization program
    in 00.17 resulted in an executable file size of 300732 bytes, which
    was reduced to 24928 bytes when the default dynamic linking was used
    (gcc -g)

            assembler          linker              loader
    source  - - - >  object  - - - - >  executable  - >  memory image

    myprog.s  --->  myprog.o ----
                                 \                       | . . . |
                                  \                      +-------+
                                   >->  myprog  ---->    | myprog|
                                  /     stub             | stub--+--.
                                 /                       +-------+  |  run-time linking
                    printf stub -                        | . . . |  |
                                                         +-------+  |  (stub invokes
                                                         | printf|<-'   the OS to find
                                                         +-------+      shared copy
                                                         | . . . |      of printf)

    (dynamic link libraries (DLLs) in Windows systems are similar to
    shared objects; however, some early Windows systems would link DLL
    files into a complete executable memory image at load time rather
    than run time)

  static linking advantages

    executable is self-contained
    no run-time overhead

  dynamic linking advantages

    reduced disk space for executable
    only one copy of shared routine needs to be in memory, thus reduced
      memory space across several currently executing programs
    will get latest version of shared object

  symbol tables are carried into the executable by the -g flag for help
    in debugging; for production code that has been debugged and tested,
    you can use the strip command to remove the extra tables; e.g., using
    gcc -g on the capitalization program in 00.17 resulted in an
    executable file size of 24928 bytes, which was reduced to 8320 bytes
    after stripping

  linking example

  first source file - a.s:                   object file - a.o:

        .global main,sub1,y    <- "main" and "y" are public symbols defined
                                  in this object module; "sub1" is an
                                  external symbol used in this object module
                                  but defined elsewhere (so linker resolves)

     0  main: save  %sp,-96,%sp               0:
     4        sethi %hi(x),%o0                4: hi(__0___)
     8        or    %o0,%lo(x),%o0            8: lo(   "  )
     c        call  sub1                      c: (______)    <- assembler
    10        nop                            10:                can't fill
    14        ret                            14:                in address;
    18        restore                        18:                entry below
              .section ".data"                                  tells linker
     0  x:    .word 1                         0: 1              to fix it
     4  y:    .word 2                         4: 2

                                             -- external symbols and relocation --
                                             text / 0 / def / "main"
    note that "x" is not public     --->     text / 4 / --- / ---    / hi-22-bits
    or extern, so don't need its             text / 8 / --- / ---    / lo-10-bits
    name in the table                        text / c / use / "sub1" / pc-relative
                                             data / 4 / def / "y"
                                             -- relocation needed for text 4, 8 --

  second source file - b.s:                  object file - b.o:

        .global sub1,sub2

     0  sub1: save  %sp,-96,%sp               0:
     4        add   %i0,4,%o0                 4:
     8        call  sub2                      8: (______)    <- entry below
     c        nop                             c:                tells linker
    10        ret                            10:                to fill in
    14        restore                        14:

                                             -- external symbols and relocation --
                                             text / 0 / def / "sub1"
                                             text / 8 / use / "sub2" / pc-relative
                                             -- no relocation needed --

  third source file - c.s:                   object file - c.o:

        .global sub2,y

     0  sub2: save  %sp,-96,%sp               0:
     4        sethi %hi(y),%o0                4: hi(______)  <- entries below
     8        or    %o0,%lo(y),%o0            8: lo(______)  <- tell linker
     c        ld    [%o0],%l0                 c:                to fill in
    10        inc   %l0                      10:
    14        st    %l0,[%i0]                14:
    18        ret                            18:
    1c        restore                        1c:

                                             -- external symbols and relocation --
                                             text / 0 / def / "sub2"
                                             text / 4 / use / "y" / hi-22-bits
                                             text / 8 / use / "y" / lo-10-bits
                                             -- relocation needed for text 4, 8 --

  % ld a.o b.o c.o         (or use gcc a.o b.o c.o)

  assume linker places text sections one after another and then data
    section starts at next doubleword aligned address

         --------------------------------------------
      0 |  0 main                     .              |
      4 |  4                          .              |
    ... |...   a's text section       .              |
     18 | 18                          .              |
        | . . . . . . . . . . . . . . .              |
     1c |  0 sub1                     .  combined    |
     20 |  4                          .  text        |
    ... |...   b's text section       .  section     |
     30 | 14                          .              |
        | . . . . . . . . . . . . . . .              |
     34 |  0 sub2                     .              |
     38 |  4                          .              |
    ... |...   c's text section       .              |
     50 | 1c                          .              |
         --------------------------------------------
     54 |  skipped for doubleword alignment          |
         --------------------------------------------
     58 |  0       data section                      |
     5c |  4 y     (y is global)                     |
         --------------------------------------------

  linker symbol table

    symbol    address in executable
    ------    ---------------------
    _main_    ___0__
    _sub1_    __1c__   <- text of b.o starts after a.o, so adjust by 0x1c
    _sub2_    __34__   <- text of c.o starts after a.o and b.o, so adjust
    _y____    __5c__        by 0x34

    (x is not in linker's table since it is not global)

  executable - a.out:

     0:                  \    linker places a.o first
     4: hi(__58__)       |  <--- linker adjusts address of x according
     8: lo(__58__)       |  <-'    to starting address of data section
     c: (__10__)         |  <- linker fills in pc-relative displacement
    10:       |          |       of 0x1c - 0xc = 0x10
    14:       |          |
    18: - - - | - - - -  /
    1c: <-----'          \    linker places b.o second so addresses
    20:                  |      in b.o adjusted (relocated) by 0x1c
    24: (__10__)         |  <- linker fills in pc-relative displacement
    28:       |          |       of 0x34 - 0x24 = 0x10
    2c:       |          |
    30: _ _ _ | _ _ _ _  /
    34: <-----'          \    linker places c.o third, adjust by 0x34
    38: hi(__5c__)       |  <--- linker fills in address of y
    3c: lo(__5c__)       |  <-'
    40:                  |
    44:                  |
    48:                  |
    4c:                  |
    50: - - - - - - - -  /
    ...
    58: 1                \    linker places data section on doubleword
    5c: 2 _ _ _ _ _ _ _  /      aligned boundary

    -- relocation information --
     4 / hi-22-bits
     8 / lo-10-bits
    38 / hi-22-bits
    3c / lo-10-bits


Loading

  running a program == actual _loading_ into memory and then _branching_
    to the entry point address

  loader usually performs address relocation as words containing absolute
    addresses are loaded into memory; relocation is required in both the
    linker and the loader so that the program will run correctly (with
    the correct addresses)

  loading example

    executable - a.out:        memory image:

                               the program is loaded at, say, 1060 (hex),
                               so all absolute addresses must be adjusted
                               (relocated) by adding 1060

     0:                        1060:
     4: hi(__58__)             1064: hi(_10b8_)  <- hi-22 and
     8: lo(__58__)             1068: lo(_10b8_)  <- lo-10 adj.
     c: (__10__)               106c: (__10__)    [pc-rel ok
    10:       |                1070:       |      since
    14:       |                1074:       |      0x107c -
    18:       |                1078:       |      0x106c =
    1c: <-----'                107c: <-----'      0x0010]
    20:                        1080:
    24: (__10__)               1084: (__10__)    [pc-rel ok]
    28:       |                1088:       |
    2c:       |                108c:       |
    30:       |                1090:       |
    34: <-----'                1094: <-----'
    38: hi(__5c__)             1098: hi(_10bc_)  <- hi-22 and
    3c: lo(__5c__)             109c: lo(_10bc_)  <- lo-10 adj.
    40:                        10a0:
    44:                        10a4:
    48:                        10a8:
    4c:                        10ac:
    50:                        10b0:
    ...
    58: 1                      10b8: 1
    5c: 2                      10bc: 2

    -- relocation information --
     4 / hi-22-bits       <- loader uses these entries
     8 / lo-10-bits          in the executable file
    38 / hi-22-bits          to relocate absolute addrs
    3c / lo-10-bits

    -- entry point (main) is 0 --     <- loader branches to 0+1060


Summary

                                  assembler    linker        loading
                                  ---------    ------        -------
  PC-relative offset
    * caller and subroutine       bound        --            --
      in same source file
    * caller and subroutine       --           bound         --
      in different files

  absolute address
    * definition and use          bound        may need      may need
      in same source file                      relocation    relocation
    * definition and use          --           bound         may need
      in different files                                     relocation
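
The arithmetic in the linking and loading examples can be checked with a
small Python model. This is only a sketch under stated assumptions, not how
ld is implemented: the names MODULES, align_up, link, and load and the
dictionary layout are hypothetical, chosen for illustration. The section
sizes and offsets are copied from the a.o, b.o, and c.o listings, and
displacements are kept as byte values, as drawn in the diagrams.

```python
# Illustrative model only: MODULES, align_up, link, and load are hypothetical
# names. Sizes and offsets are copied from the a.o / b.o / c.o listings.

def align_up(addr, boundary):
    """Round addr up to the next multiple of boundary."""
    return (addr + boundary - 1) // boundary * boundary

MODULES = [
    {"name": "a.o", "text_size": 0x1c, "data_size": 8,
     "defs": {"main": ("text", 0x0), "y": ("data", 0x4)},   # public symbols
     "local": {"x": ("data", 0x0)},                         # private symbol
     "calls": [(0xc, "sub1")],                              # pc-relative uses
     "abs_uses": [(0x4, "x")]},                             # sethi/or at 4, 8
    {"name": "b.o", "text_size": 0x18, "data_size": 0,
     "defs": {"sub1": ("text", 0x0)}, "local": {},
     "calls": [(0x8, "sub2")], "abs_uses": []},
    {"name": "c.o", "text_size": 0x20, "data_size": 0,
     "defs": {"sub2": ("text", 0x0)}, "local": {},
     "calls": [], "abs_uses": [(0x4, "y")]},
]

def link(modules):
    """Return (symbol table, pc-relative displacements, absolute fixups)."""
    # Storage management: text sections back to back, then the data
    # sections starting at the next doubleword (8-byte) boundary.
    text_base, data_base, addr = {}, {}, 0
    for m in modules:
        text_base[m["name"]] = addr
        addr += m["text_size"]
    addr = align_up(addr, 8)
    for m in modules:
        data_base[m["name"]] = addr
        addr += m["data_size"]

    def place(m, section, offset):
        base = text_base if section == "text" else data_base
        return base[m["name"]] + offset

    # Global symbol table: only public definitions are entered.
    symtab = {sym: place(m, sec, off)
              for m in modules for sym, (sec, off) in m["defs"].items()}

    displacements, absolute = {}, {}
    for m in modules:
        scope = dict(symtab)                 # globals plus this module's locals
        for sym, (sec, off) in m["local"].items():
            scope[sym] = place(m, sec, off)
        for off, sym in m["calls"]:
            site = place(m, "text", off)
            displacements[site] = scope[sym] - site   # target minus call site
        for off, sym in m["abs_uses"]:
            site = place(m, "text", off)
            absolute[site] = scope[sym]      # hi-22-bits word
            absolute[site + 4] = scope[sym]  # lo-10-bits word
    return symtab, displacements, absolute

def load(absolute, base):
    """Loader relocation: shift each fixup site and its target by base."""
    return {site + base: target + base for site, target in absolute.items()}

if __name__ == "__main__":
    symtab, displacements, absolute = link(MODULES)
    for name, addr in sorted(symtab.items(), key=lambda kv: kv[1]):
        print(f"{name:5} {addr:#x}")
    for site, disp in displacements.items():
        print(f"call at {site:#x}: displacement {disp:#x}")
    for site, target in load(absolute, 0x1060).items():
        print(f"word at {site:#x} refers to {target:#x}")
```

The pc-relative displacements come out the same no matter what base is
passed to load, which is why only the hi-22-bits and lo-10-bits entries
appear in the executable's relocation information.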