Computer Science 360

Peripherals and File Design

Lecture 1

Chapter 1

 

·        Disks are slow!

          Access to RAM                  80 ns   (0.000000080 seconds)

          Access to Disk                  12 ms  (0.012 seconds)

 

·        Disks are large and cheap

·        Disks are non-volatile
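To put the two access times above side by side, the following minimal sketch converts both to nanoseconds and computes how many RAM accesses fit into one disk access. The constant and function names are illustrative assumptions, not part of the course material.

```cpp
#include <cstdint>

// Access times from the notes, expressed in nanoseconds.
constexpr std::int64_t kRamAccessNs  = 80;        // 80 ns
constexpr std::int64_t kDiskAccessNs = 12000000;  // 12 ms = 12,000,000 ns

// Number of RAM accesses that fit into the time of one disk access.
constexpr std::int64_t ramAccessesPerDiskAccess() {
    return kDiskAccessNs / kRamAccessNs;
}
```

With these figures, ramAccessesPerDiskAccess() evaluates to 150,000, which is why the goals of file processing below focus on reducing the number of disk accesses.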

 

·        A file structure is a combination of representations for data in files and operations for accessing the data

·        Goals of file processing

·        Get all the information needed in one access to the disk

·        If not one, then as few accesses as possible

·        When access is successful, get all the data needed with one I/O

·        Keep the disk positioned so as to minimize the time required for the next I/O

·        Maintaining these goals is difficult when files grow or shrink and multiple users share the disk drive simultaneously

 

·        Tapes

·        Access is sequential

·        Large files required long access times

·        Media was cheap

 

·        Disk

·        Access is direct

·        Cost was initially high and capacity was initially low

·        Indexes were added, but searching them was often slow

·        AVL trees were developed but still required many accesses

·        B-Trees were developed to balance trees and minimize I/Os

·        Sequential access with B-Trees was slow

·        B+-Trees were developed to provide sequential access

·        Access to data is O(log_k N), where N is the number of records and k is the number of entries per B-Tree block

·        In a file with several million records, a single specific record can be found in no more than 3 or 4 accesses.
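The "3 or 4 accesses" figure above follows from the logarithmic bound. As a rough sketch, the function below counts the smallest number of tree levels whose combined fan-out covers N records, which equals the ceiling of log_k N. The function name and the fan-out values used in the note after the code are assumptions chosen for illustration.

```cpp
#include <cstdint>

// Smallest number of levels such that k^levels >= n, i.e. ceil(log_k(n)).
// Assumes k >= 2 and n >= 1; uses integer arithmetic to avoid rounding issues.
int levelsNeeded(std::uint64_t n, std::uint64_t k) {
    int levels = 0;
    std::uint64_t reachable = 1;  // records reachable through `levels` levels
    while (reachable < n) {
        reachable *= k;
        ++levels;
    }
    return levels;
}
```

Under these assumptions, levelsNeeded(5000000, 100) is 4 and levelsNeeded(5000000, 200) is 3, so a file of several million records needs only a handful of accesses when each block holds on the order of a hundred entries.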

 

·        Hashing

·        Allows access to large volumes of data with one I/O

·        Did not initially allow for dynamic, volatile files that change greatly in size

·        Extendible, dynamic hashing now permits access to a file of any size with one or, at most, two disk accesses
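Hashing reaches a record in one I/O because the record's location is computed from its key rather than searched for. The sketch below shows the idea for a static file of fixed-size buckets. The function names, the multiplier 31, and the fixed bucket size are all illustrative assumptions; this is not extendible hashing, which additionally maintains a directory that grows with the file.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: map a record key to a bucket number in [0, numBuckets).
// The multiplier 31 is an arbitrary illustrative choice.
std::size_t bucketFor(const std::string& key, std::size_t numBuckets) {
    std::size_t h = 0;
    for (unsigned char c : key) {
        h = h * 31 + c;
    }
    return h % numBuckets;
}

// Byte offset of a bucket in a file made of fixed-size buckets.
std::size_t bucketOffset(std::size_t bucket, std::size_t bucketSize) {
    return bucket * bucketSize;
}
```

A lookup would compute bucketOffset(bucketFor(key, numBuckets), bucketSize), seek to that offset, and read one bucket.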

 

·        Key concepts in disk file structures

·        Collect data into buffers, blocks or buckets

·        When blocks become too large, split them into multiple blocks

·        Read sections 1.3 – 1.5 to review simple C++ object ideas.
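The block-splitting idea listed under the key concepts above can be sketched in a few lines. This is a simplified illustration that assumes a block is an in-memory vector of sorted integer keys; the function name is hypothetical, and a real file structure would also update the index or directory that points at the blocks.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Split an overfull block of sorted keys into two blocks of roughly equal size.
// The lower half stays in the first block; the upper half moves to the second.
std::pair<std::vector<int>, std::vector<int>>
splitBlock(const std::vector<int>& block) {
    const std::ptrdiff_t mid = static_cast<std::ptrdiff_t>(block.size() / 2);
    std::vector<int> lower(block.begin(), block.begin() + mid);
    std::vector<int> upper(block.begin() + mid, block.end());
    return std::make_pair(lower, upper);
}
```

For example, splitting a block holding 1, 2, 3, 4, 5 yields one block with 1, 2 and another with 3, 4, 5.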

Chapter 2

 

·        A file is a collection of bytes on some external device

·        A disk may contain thousands of physical files

 

·        An application generally relies on part of the operating system to assist with the I/O to files – hence the term logical file

·        Some mechanism must exist to map the logical file name to the physical file name on a device.  e.g.

 

·        C++:

 

·        fd = open(filename, flags [, pmode]);

 

·        Example:

 

fd = open("input.dat", O_RDONLY);

pmode is required if flags includes O_CREAT
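A slightly fuller sketch of the open call above, assuming a POSIX system. The two wrapper function names are hypothetical; open, the O_ flags, and the S_I permission constants are the standard POSIX ones.

```cpp
#include <fcntl.h>     // open, O_RDONLY, O_WRONLY, O_CREAT, O_TRUNC
#include <sys/stat.h>  // S_IRUSR, S_IWUSR
#include <unistd.h>    // close, unlink

// Open an existing file for reading. Returns a file descriptor, or -1 on failure.
int openForReading(const char* filename) {
    return open(filename, O_RDONLY);
}

// Create (or truncate) a file for writing. Because O_CREAT is among the flags,
// the third argument (pmode) must supply the permission bits for the new file.
int createForWriting(const char* filename) {
    return open(filename, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
}
```

The returned descriptor is the logical file the program works with; the operating system maps it to the physical file named by filename.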

 

·        Files must often be closed to ensure all data has been written to the external device

·        I/O is often blocked (data is collected in a buffer and transferred a block at a time) to increase performance

·        Some operating systems automatically close files

·        Read(Source_file, Destination_addr, Size)

·        Write(Destination_file, Source_addr, Size)

·        Some of these parameters may not be needed for all languages

·        End-of-file is a signal that no more data is available in the file for reading
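The generic Read, Write, close, and end-of-file operations above correspond closely to the POSIX read, write, and close calls. The following is a minimal sketch of a file copy built from them, assuming a POSIX system; the function name copyFile and the 4096-byte buffer size are illustrative choices.

```cpp
#include <fcntl.h>     // open and the O_ flags
#include <sys/stat.h>  // S_IRUSR, S_IWUSR
#include <unistd.h>    // read, write, close, unlink

// Copy src to dst one buffer at a time.
// Returns the number of bytes copied, or -1 on any error.
long copyFile(const char* src, const char* dst) {
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (out < 0) { close(in); return -1; }

    char buffer[4096];
    long total = 0;
    ssize_t n;
    // read returns 0 at end-of-file and -1 on error.
    while ((n = read(in, buffer, sizeof buffer)) > 0) {
        ssize_t written = 0;
        while (written < n) {  // write may transfer fewer bytes than requested
            ssize_t w = write(out, buffer + written,
                              static_cast<size_t>(n - written));
            if (w < 0) { close(in); close(out); return -1; }
            written += w;
        }
        total += n;
    }
    close(in);
    close(out);  // release the descriptors once the copy is finished
    return n < 0 ? -1 : total;
}
```

Note how the parameters line up with the generic forms: the file descriptor is the source or destination file, buffer is the address, and the last argument is the size.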

 

·        SEEK(Source_file, Offset) enables positioning to a specific place within a file

·        C positions to a specific byte offset within the file.  e.g.

 

pos = lseek(fd, byte_offset, origin);
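As a hedged illustration of the lseek call above, the sketch below uses the two most common origins on a POSIX system: SEEK_SET (offset measured from the start of the file) and SEEK_END (offset measured from the end). The helper names byteAt and fileSize are hypothetical.

```cpp
#include <fcntl.h>      // open, O_RDONLY
#include <sys/types.h>  // off_t
#include <unistd.h>     // lseek, read, close, SEEK_SET, SEEK_END

// Return the byte stored at `offset` (0..255), or -1 if it cannot be read,
// for example because the offset lies at or beyond end-of-file.
int byteAt(const char* filename, off_t offset) {
    if (offset < 0) return -1;
    int fd = open(filename, O_RDONLY);
    if (fd < 0) return -1;
    int result = -1;
    unsigned char c;
    // Position directly at the requested byte, then read exactly one byte.
    if (lseek(fd, offset, SEEK_SET) == offset && read(fd, &c, 1) == 1) {
        result = c;
    }
    close(fd);
    return result;
}

// Return the file size in bytes by seeking to the end, or -1 on error.
off_t fileSize(const char* filename) {
    int fd = open(filename, O_RDONLY);
    if (fd < 0) return -1;
    off_t size = lseek(fd, 0, SEEK_END);
    close(fd);
    return size;
}
```

For fixed-length records, the byte offset of record i would be i times the record size, which is what makes direct access to a single record possible.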

 

·        Files with C Streams and C++ Stream Classes

·        C Streams use the stdio.h header file

·        C++ Streams use the iostream.h and fstream.h header files (named <iostream> and <fstream> in standard C++)

·        Examples of C and C++ Stream I/O
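One possible pair of examples, shown side by side: the same task (reading the first line of a text file) done once with C streams and once with C++ stream classes. The function names are hypothetical, the 256-byte buffer is an arbitrary choice, and the standard header names <cstdio> and <fstream> are used here.

```cpp
#include <cstdio>   // C streams: std::fopen, std::fgets, std::fclose
#include <fstream>  // C++ stream classes: std::ifstream, std::ofstream
#include <string>

// C streams: read the first line (up to 255 characters) without its newline.
// Returns an empty string if the file cannot be opened or is empty.
std::string firstLineC(const char* filename) {
    std::FILE* fp = std::fopen(filename, "r");
    if (!fp) return "";
    char buf[256];
    std::string line;
    if (std::fgets(buf, sizeof buf, fp)) line = buf;
    std::fclose(fp);
    if (!line.empty() && line.back() == '\n') line.pop_back();
    return line;
}

// C++ stream classes: the same task with std::ifstream and std::getline.
// The file is closed automatically when `in` goes out of scope.
std::string firstLineCpp(const char* filename) {
    std::ifstream in(filename);
    std::string line;
    std::getline(in, line);  // leaves `line` empty if the read fails
    return line;
}
```

Both versions go through a buffered stream layer rather than calling the operating system's read directly.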

 

·        Special characters cause problems in some file systems (e.g. carriage returns or line feeds).

·        Directory structures are often trees

·        Some systems allow for I/O redirection and pipes

·        Read Chapter 2 as a good review of C++ I/O capabilities

 

Chapter 3

 

·        Disks

·        Direct access storage devices (DASDs) as opposed to serial devices

·        Hard disks vs floppy disks

·        Optical disks, WORM disks, etc.

·        Physical structure of disks

·        Platters

·        Tracks

·        Sectors (unit of addressing on disk)

·        Cylinders

·        Disk drive

·        Capacity

·        Sector capacity = # bytes (e.g. 512)

·        Track capacity = #sectors/track (e.g. 40) * sector capacity

·        Cylinder capacity = #tracks/cylinder (e.g. 11) * track capacity

·        Drive capacity = #cylinders/drive (e.g. 1331) * cylinder capacity
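The capacity formulas above can be checked with a short calculation. The sketch below plugs in the example figures from the notes (512 bytes per sector, 40 sectors per track, 11 tracks per cylinder, 1331 cylinders per drive); the struct and function names are illustrative assumptions.

```cpp
#include <cstdint>

// Geometry of a drive, using the quantities named in the notes.
struct DiskGeometry {
    std::uint64_t bytesPerSector;
    std::uint64_t sectorsPerTrack;
    std::uint64_t tracksPerCylinder;
    std::uint64_t cylindersPerDrive;
};

// Track capacity = sectors per track * sector capacity.
std::uint64_t trackCapacity(const DiskGeometry& g) {
    return g.sectorsPerTrack * g.bytesPerSector;
}

// Cylinder capacity = tracks per cylinder * track capacity.
std::uint64_t cylinderCapacity(const DiskGeometry& g) {
    return g.tracksPerCylinder * trackCapacity(g);
}

// Drive capacity = cylinders per drive * cylinder capacity.
std::uint64_t driveCapacity(const DiskGeometry& g) {
    return g.cylindersPerDrive * cylinderCapacity(g);
}
```

With the example values, a track holds 20,480 bytes, a cylinder 225,280 bytes, and the drive 299,847,680 bytes (roughly 300 MB).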

 

For next time use the Web to:

 

1.                 Find out typical capacities for sectors, tracks, cylinders and drives.

2.                 Find out the cost/byte of disk storage

 

Hand in an MS Word file with your findings by placing it into the turn-in box for this course, which can be found by starting at the address www.cle.clemson.edu. Be sure to cite your sources of information.