Peripherals and File Design
· Disks are slow!
Access to RAM: 80 ns (0.000000080 seconds)
Access to disk: 12 ms (0.012 seconds), roughly 150,000 times slower
· Disks are large and cheap
· Disks are non-volatile
· A file structure is a combination of representations for data in files and operations for accessing the data
· Goals of file processing
· Get all the information needed in one access to the disk
· If not one, then as few accesses as possible
· When access is successful, get all the data needed with one I/O
· Keep the disk positioned so as to minimize the time required for the next I/O
· Maintaining these goals is difficult when files grow or shrink and multiple users share the disk drive simultaneously
· Tapes
· Access is sequential
· Large files required long access times
· Media was cheap
· Disk
· Access is direct
· Cost was initially high and capacity was initially low
· Indexes were added, but accessing data through them was often slow
· AVL trees were developed but still required many accesses
· B-Trees were developed to balance trees and minimize I/Os
· Sequential access with B-Trees was slow
· B+-Trees were developed to provide sequential access
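· Access to data is O(log_k N), where N is the number of entries in the file and k is the number of entries indexed in a single block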
· In a file with several million records, a single specific record can be found in no more than 3 or 4 accesses.
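The "3 or 4 accesses" figure can be checked with a short calculation. The sketch below counts how many levels a tree needs before k^levels reaches N. The function name and the fan-out of 100 entries per block are assumptions chosen for illustration, not values from these notes.

```cpp
#include <cstdint>

// Smallest number of levels h such that k^h >= n, i.e. ceil(log_k(n)),
// computed with integer arithmetic to avoid floating-point rounding.
// n = number of records, k = entries per block (an assumed fan-out).
int levels_needed(std::uint64_t n, std::uint64_t k) {
    if (k < 2) return -1;            // a fan-out below 2 is not a tree
    int levels = 0;
    std::uint64_t reach = 1;         // records reachable with `levels` accesses
    while (reach < n) {
        if (reach > n / k) {         // reach * k would exceed n (and might overflow)
            ++levels;
            break;
        }
        reach *= k;
        ++levels;
    }
    return levels;
}
```

With these assumptions, levels_needed(5000000, 100) is 4, while the same file indexed with a fan-out of 2 (a perfectly balanced binary tree) gives levels_needed(5000000, 2) = 23.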
· Hashing
· Allows access to large volumes of data with one I/O
· Did not initially allow for dynamic, volatile files that change greatly in size
· Extendible, dynamic hashing now permits access to a file of any size with one or, at most, two disk accesses
· Key concepts in disk file structures
· Collect data into buffers, blocks or buckets
· When blocks become too large, split them into multiple blocks
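A minimal sketch of the splitting idea, assuming a block is a sorted list of integer keys held in memory and that an overfull block is divided roughly in half. The function name, the int key type, and the even split are illustrative assumptions; a real file structure would also write both blocks back to disk and update an index.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Insert `key` into a sorted block. If the block then holds more than
// `capacity` keys, move its upper half into a new block and return it;
// otherwise return an empty vector.
std::vector<int> insert_and_split(std::vector<int>& block, int key,
                                  std::size_t capacity) {
    block.insert(std::upper_bound(block.begin(), block.end(), key), key);
    std::vector<int> new_block;
    if (block.size() > capacity) {
        auto middle = block.begin() +
                      static_cast<std::ptrdiff_t>(block.size() / 2);
        new_block.assign(middle, block.end());   // upper half moves out
        block.erase(middle, block.end());        // lower half stays
    }
    return new_block;
}
```

For example, inserting 25 into the full block {10, 20, 30, 40} with capacity 4 leaves {10, 20} in the original block and returns {25, 30, 40}.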
· Read sections 1.3 – 1.5 to review simple C++ object ideas.
· A file is a collection of bytes on some external device
· A disk may contain thousands of physical files
· An application generally relies on part of the operating system to assist with the I/O to files – hence the term logical file
· Some mechanism must exist to map the logical file name to the physical file name on a device. e.g.
· C++:
· fd = open(filename, flags [, pmode]);
· Example:
fd = open("input.dat", O_RDONLY);
pmode is required if flags includes O_CREAT
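A short sketch of both forms of the call, assuming a Unix-like system where open is declared in fcntl.h. The helper names and the permission value 0644 are assumptions made for illustration.

```cpp
#include <fcntl.h>      // open and the O_* flags
#include <sys/stat.h>   // permission bits / mode_t
#include <unistd.h>     // close

// Create (or truncate) a file for writing. Because O_CREAT is among the
// flags, the third argument (pmode) must be supplied. 0644 is an assumed
// choice: owner may read and write, everyone else may only read.
int create_for_writing(const char* filename) {
    return open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}

// Open an existing file read-only. No pmode is needed without O_CREAT.
int open_for_reading(const char* filename) {
    return open(filename, O_RDONLY);
}
```

Both helpers return a non-negative file descriptor on success and -1 on failure, so the caller should test the result before using it and call close(fd) when finished.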
· Files must often be closed to ensure all data has been written to the external device
· I/O is often blocked (data is buffered and transferred a block at a time) to increase performance
· Some operating systems automatically close files
· Read(Source_file, Destination_addr, Size)
· Write(Destination_file, Source_addr, Size)
· Some of these parameters may not be needed for all languages
· End-of-file is a signal that no more data is available in the file for reading
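The generic Read and Write operations correspond to the Unix read and write calls. The sketch below copies one file to another and stops when read reports end-of-file by returning 0. The function name, the 4096-byte buffer, and the 0644 permissions are assumptions for illustration.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Copy src to dst with the Unix read/write calls.
// Returns the number of bytes copied, or -1 on any error.
long copy_file(const char* src, const char* dst) {
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return -1; }

    char buffer[4096];               // assumed buffer size
    long total = 0;
    ssize_t n;
    // read returns 0 at end-of-file and -1 on error
    while ((n = read(in, buffer, sizeof buffer)) > 0) {
        ssize_t written = 0;
        while (written < n) {        // write may accept fewer bytes than asked
            ssize_t w = write(out, buffer + written, n - written);
            if (w < 0) { close(in); close(out); return -1; }
            written += w;
        }
        total += n;
    }
    close(in);
    // closing the output file ensures buffered data reaches the device
    if (close(out) != 0 || n < 0) return -1;
    return total;
}
```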
· SEEK(Source_file, Offset) enables positioning to a specific place within a file
· C positions to a specific byte offset within the file. e.g.
pos = lseek(fd, byte_offset, origin);
origin is SEEK_SET (start of file), SEEK_CUR (current position) or SEEK_END (end of file)
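One common use of seeking is direct access to a fixed-length record: the byte offset is the record number times the record size. The sketch below assumes 16-byte records and an already open file descriptor. RECORD_SIZE and the function names are hypothetical.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Assumed record layout for illustration: every record is 16 bytes.
const int RECORD_SIZE = 16;

// Read record number `index` (0-based) into `record`.
// Returns false if the seek fails or a full record cannot be read.
bool read_record(int fd, long index, char* record) {
    off_t offset = static_cast<off_t>(index) * RECORD_SIZE;
    if (lseek(fd, offset, SEEK_SET) == static_cast<off_t>(-1)) return false;
    return read(fd, record, RECORD_SIZE) == RECORD_SIZE;
}

// Size of the file in bytes: seek to offset 0 measured from the end.
off_t file_size(int fd) {
    return lseek(fd, 0, SEEK_END);
}
```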
· Files with C Streams and C++ Stream Classes
· C Streams use the stdio.h header file
· C++ Streams use iostream.h and fstream.h header files (named <iostream> and <fstream> in standard C++)
· Examples of C and C++ Stream I/O
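As one possible pair of examples, the sketch below reads an entire file twice: once with C streams (fopen, fgetc, fclose) and once with the C++ stream classes (ifstream, get). The function names are made up for illustration and the standard header names are used.

```cpp
#include <cstdio>     // C streams (stdio.h)
#include <fstream>    // C++ file stream classes
#include <string>

// Read a whole file with C streams: fopen / fgetc / fclose.
// Returns an empty string if the file cannot be opened.
std::string read_with_c_streams(const char* filename) {
    std::string contents;
    std::FILE* file = std::fopen(filename, "rb");
    if (!file) return contents;
    int ch;                                   // int, so EOF is distinguishable
    while ((ch = std::fgetc(file)) != EOF) contents += static_cast<char>(ch);
    std::fclose(file);
    return contents;
}

// Read a whole file with C++ stream classes: ifstream / get.
// The stream is closed automatically when `file` goes out of scope.
std::string read_with_cpp_streams(const char* filename) {
    std::string contents;
    std::ifstream file(filename, std::ios::in | std::ios::binary);
    char ch;
    while (file.get(ch)) contents += ch;      // loop ends at end-of-file
    return contents;
}
```

The C version must test for EOF and close the file explicitly. The C++ version relies on the stream's state after get and on the destructor to close the file.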
· Special characters cause problems in some file systems (e.g. carriage returns or line feeds).
· Directory structures are often trees
· Some systems allow for I/O redirection and pipes
· Read Chapter 2 as a good review of C++ I/O capabilities
· Disks
· Direct access storage devices (DASDs) as opposed to serial devices
· Hard disks vs floppy disks
· Optical disks, WORM disks, etc.
· Physical structure of disks
· Platters
· Tracks
· Sectors (unit of addressing on disk)
· Cylinders
· Disk drive
· Capacity
· Sector capacity = # bytes (e.g. 512)
· Track capacity = #sectors/track (e.g. 40) * sector capacity
· Cylinder capacity = #tracks/cylinder (e.g. 11) * track capacity
· Drive capacity = #cylinders/drive (e.g. 1331) * cylinder capacity
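The capacity formulas chain together, so they can be written as three small functions. The struct and function names below are assumptions for illustration. The numbers used in the note that follows are the example values from the list above.

```cpp
#include <cstdint>

// Geometry of a drive, using the quantities named in the notes.
struct DiskGeometry {
    std::uint64_t bytes_per_sector;
    std::uint64_t sectors_per_track;
    std::uint64_t tracks_per_cylinder;
    std::uint64_t cylinders_per_drive;
};

// Track capacity = sectors per track * sector capacity
std::uint64_t track_capacity(const DiskGeometry& g) {
    return g.sectors_per_track * g.bytes_per_sector;
}

// Cylinder capacity = tracks per cylinder * track capacity
std::uint64_t cylinder_capacity(const DiskGeometry& g) {
    return g.tracks_per_cylinder * track_capacity(g);
}

// Drive capacity = cylinders per drive * cylinder capacity
std::uint64_t drive_capacity(const DiskGeometry& g) {
    return g.cylinders_per_drive * cylinder_capacity(g);
}
```

With the example values (512 bytes per sector, 40 sectors per track, 11 tracks per cylinder, 1331 cylinders), a track holds 20,480 bytes, a cylinder holds 225,280 bytes, and the drive holds 299,847,680 bytes, or about 300 MB.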
For next time use the Web to:
1. Find out typical capacities for sectors, tracks, cylinders and drives.
2. Find out the cost/byte of disk storage.
Hand in an MS Word file with your findings by placing it in the turn-in box for this course, found by starting at www.cle.clemson.edu. Be sure to cite your sources of information.