Lecture 6

 

Chapter 4 (continued)

 

·        Sequential Search

·        Finding a particular record in a file based upon a primary key

·        Performance

·        Usually expressed as a function of the number of comparisons

·        With files this is often a poor indicator – number of I/O operations is usually better

·        Searching unblocked records (n records in the file) requires an average of n/2 I/O operations

·        Searching blocked records (b records per block) requires an average of n/(2*b) I/O operations

·        Blocked reads also require fewer seeks – big savings

·        Both are O(n) in time complexity; blocking only reduces the constant factor
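
The I/O counts above can be checked with a small simulation. This is a minimal sketch, not part of the original notes: the 16-byte record layout, the in-memory file, and the function names are all assumptions made for illustration, and each call to read() is counted as one I/O operation.

```python
import io
import struct

# Hypothetical fixed-length record: 4-byte key + 12-byte payload (assumed layout).
RECORD = struct.Struct("<I12s")

def build_file(n):
    """Create an in-memory file holding n records with keys 0..n-1."""
    return io.BytesIO(b"".join(RECORD.pack(k, b"payload") for k in range(n)))

def sequential_search(f, key, b):
    """Scan from the start, reading b records per read call.
    Returns (found, number_of_reads)."""
    f.seek(0)
    reads = 0
    while True:
        block = f.read(RECORD.size * b)
        if not block:
            return False, reads
        reads += 1
        for k, _payload in RECORD.iter_unpack(block):
            if k == key:
                return True, reads

def average_reads(n, b):
    """Average reads over one successful search for every key in the file."""
    f = build_file(n)
    return sum(sequential_search(f, key, b)[1] for key in range(n)) / n

n = 200
print(average_reads(n, 1))   # 100.5, close to n/2
print(average_reads(n, 20))  # 5.5, close to n/(2*b)
```

The small excess over n/2 and n/(2*b) comes from counting the read that finds the record; the notes use the simpler approximation.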

·        Sequential search is a good choice for many applications, e.g.:

·        When searching for a particular pattern (and there is no specific record key)

·        Very small files (e.g. 30 records)

·        Files that only rarely need searching

·        Files in which searches yield a large number of hits

·        Common search utilities

·        wc  (word count in UNIX)

·        counts the number of lines, words and characters in a file

·        grep (named after the ed command g/re/p: globally search for a regular expression and print)

·        searches a file for all lines that contain a particular string or regular expression
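
Both utilities are sequential scans over the whole file. The following sketch mimics their basic behavior on a string; it is an illustration only, not how the real tools are implemented, and the sample data is made up.

```python
import re

def wc(text):
    """Return (lines, words, characters); lines are counted as newline characters."""
    return text.count("\n"), len(text.split()), len(text)

def grep(pattern, text):
    """Return every line that contains a match for the regular expression."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]

# Hypothetical sample data.
sample = "Ames Mary 123 Maple\nMason Alan 90 Eastgate\n"

print(wc(sample))             # (2, 8, 43)
print(grep("Mason", sample))  # ['Mason Alan 90 Eastgate']
```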

·        Skip sequential search

·        Assume a sorted file has n records blocked b records per block.  What is the average number of blocks that must be read in order to find a record?  What is the average number of comparisons that must be made?  What is the optimal blocking factor required to minimize the number of comparisons?
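
One way to explore these questions is to fix a cost model and count. The sketch below is an assumption, not the course's official answer: it charges one comparison per block examined (against the last key in the block) and then one comparison per record scanned inside the block that holds the target. Only blocking factors that divide n evenly are tried, so every block is full.

```python
def comparisons(i, b):
    """Comparisons needed to find the record at sorted position i (0-based)."""
    blocks_examined = i // b + 1      # one comparison against each block's last key
    scan_within_block = i % b + 1     # sequential scan inside the matching block
    return blocks_examined + scan_within_block

def average_comparisons(n, b):
    return sum(comparisons(i, b) for i in range(n)) / n

def average_blocks_read(n, b):
    return sum(i // b + 1 for i in range(n)) / n

n = 400
candidates = [b for b in range(1, n + 1) if n % b == 0]
best = min(candidates, key=lambda b: average_comparisons(n, b))

print(best)                          # 20
print(average_comparisons(n, best))  # 21.0
print(average_blocks_read(n, best))  # 10.5
```

Under this model the average number of blocks read is about n/(2*b), the average number of comparisons is about n/(2*b) + b/2, and the latter is smallest when b is close to the square root of n.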

 

·        Direct Access

·        Can go to any specific record (using relative record number, RRN, in the file)

·        If record lengths are fixed, then the RRN can be used to compute the record's physical address (cylinder, head, record: CCHHR) in the file

·        Cannot be used if records are of varying length

·        Can go to any specific byte (using relative byte offset, RBO, in the file)

·        Can go to a record with a specific key value

·        Requires an index to determine the RBO in the file
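
The three forms of direct access can be sketched as follows. This is a minimal illustration under assumed conditions: a hypothetical 16-byte fixed-length record, no file header, an in-memory file, and a plain dictionary standing in for the index. With fixed-length records, the byte offset of a record is simply RRN times the record length.

```python
import io
import struct

# Hypothetical fixed-length record: 4-byte key + 12-byte name (assumed layout).
RECORD = struct.Struct("<I12s")

names = [b"Ames", b"Mason", b"Smith"]
data_file = io.BytesIO(
    b"".join(RECORD.pack(100 + i, name) for i, name in enumerate(names))
)

def read_at_rbo(f, rbo):
    """Seek to a relative byte offset and decode the record stored there."""
    f.seek(rbo)
    key, name = RECORD.unpack(f.read(RECORD.size))
    return key, name.rstrip(b"\0").decode("ascii")

def read_by_rrn(f, rrn):
    """Fixed-length records: byte offset = RRN * record length."""
    return read_at_rbo(f, rrn * RECORD.size)

def build_index(f, n_records):
    """Map each key to the RBO of its record (a stand-in for a real index)."""
    return {read_by_rrn(f, rrn)[0]: rrn * RECORD.size for rrn in range(n_records)}

index = build_index(data_file, len(names))

print(read_by_rrn(data_file, 2))           # (102, 'Smith')
print(read_at_rbo(data_file, index[101]))  # (101, 'Mason')
```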

·        File Organization vs File Access

·        File organization choices can be made independent of access

·        File Access choices depend on what choices have been made for file organization

·        Abstract Data Models

·        Application’s view of the data

·        Removed from the physical organization & device specific issues

·        Headers and Self-Describing Files

·        Information about the physical characteristics of a file is kept in the file itself.  e.g.

·        Names of fields, offset and length of each field, fields per record, records per block

·        Programs (or access methods) must interpret the header at run time instead of relying on a compiled-in layout, and thus may experience poorer performance
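
A self-describing file might look like the sketch below. The header format (a 4-byte length followed by JSON listing field names and lengths) and the function names are assumptions chosen for brevity, not a standard. The reader knows nothing about the record layout in advance and works it out from the header, which is the interpretive step mentioned above.

```python
import io
import json
import struct

def write_file(f, fields, rows):
    """fields: list of (name, length) pairs; values are stored as fixed-length ASCII."""
    header = json.dumps({"fields": fields}).encode("ascii")
    f.write(struct.pack("<I", len(header)))   # 4-byte header length
    f.write(header)                           # the description of the records
    for row in rows:
        for (_name, length), value in zip(fields, row):
            f.write(value.encode("ascii").ljust(length)[:length])

def read_file(f):
    """Read the header first, then use it to interpret every record."""
    (header_len,) = struct.unpack("<I", f.read(4))
    fields = json.loads(f.read(header_len))["fields"]
    record_len = sum(length for _name, length in fields)
    records = []
    while (data := f.read(record_len)):
        record, pos = {}, 0
        for name, length in fields:
            record[name] = data[pos:pos + length].decode("ascii").rstrip()
            pos += length
        records.append(record)
    return records

f = io.BytesIO()
write_file(f, [("last", 10), ("first", 8)], [("Ames", "Mary"), ("Mason", "Alan")])
f.seek(0)
print(read_file(f))
```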

·        Metadata

·        Data that describes the primary data being studied

·        Can be stored within the file in special locations or formats

·        Examples: scaling factor, offset, source, date, etc.

·        Indexes are often maintained to locate primary data with certain metadata characteristics

·        Standard conventions concerning the format of commonly stored data often reduce the amount of metadata required in a file. 

·        Often a reference code is stored in the file to denote a class of metadata (standard type) and the details are assumed to be available from other sources.

·        Object-oriented File Access

·        Programs can process data as though they were always stored in RAM

·        An object-oriented file system performs the transformation from external formats (in files) to internal formats (in RAM).

·        Pointers must change (a RAM address is meaningless in a file, and a file offset is meaningless in RAM)

·        Standards in representation may change

·        Programmers should not be responsible for the transformations

 

·        Portability and Standardization

·        Factors affecting portability

·        Operating System Differences

·        Language Differences

·        Machine Architecture Differences

·        Support Library Differences

·        Version Differences

·        Achieving Portability

·        Stick with a Standard for Physical Record Format

·        Use a Standard Binary Encoding for Data Elements

·        When conversion is required, convert through a common standard

·        File Structure Conversion

·        File System Differences
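
As an illustration of why a standard binary encoding matters, the sketch below packs the same integer with two byte orders and shows what happens when the reader assumes the wrong one. The record layout (a 4-byte key and an 8-byte IEEE 754 double in big-endian order) is a hypothetical choice, not a format prescribed by the notes.

```python
import struct

value = 1025
little = struct.pack("<I", value)   # b'\x01\x04\x00\x00'
big = struct.pack(">I", value)      # b'\x00\x00\x04\x01'

# Reading big-endian bytes as little-endian silently yields a different number.
misread = struct.unpack("<I", big)[0]
print(misread)                      # 17039360

def encode_record(key, balance):
    """Hypothetical portable record: explicit big-endian order, no padding."""
    return struct.pack(">Id", key, balance)

def decode_record(data):
    return struct.unpack(">Id", data)

encoded = encode_record(7, 12.5)
print(len(encoded))                 # 12 on every platform
print(decode_record(encoded))       # (7, 12.5)
```

Because the format string fixes the byte order and field sizes explicitly, the file does not depend on the architecture of the machine that wrote it.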