CpSc 360

Lecture 10

 

Chapter 5 (continued)

 

·      Reclaiming Space in Files

 

·       Placement Strategies

·       First-Fit:       Go down the Avail Stack until a large enough slot is found

·       Best-Fit:      Keep list in Ascending order by size and use First-Fit.

·       Residual after finding fit makes another smaller slot in list

·       Search time can be significant for adds and deletes

·       Worst-Fit:    Keep list in Descending order by size and use First-Fit.

·       Finding an open slot is fast!

·       Residual after selection is large and won’t usually create much external fragmentation

·       Conclusions:

·       Fixed-length records do not require any special placement strategies.

·       Variable-length records in volatile files require careful placement.

·       If space is lost due to internal fragmentation then consider first-fit or best-fit.

·       If space is lost due to external fragmentation then consider worst-fit.
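The three placement strategies can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the notes: the avail list is modeled as an in-memory list of hypothetical (offset, size) pairs, and best-fit and worst-fit scan for the smallest or largest slot on each call instead of keeping the list physically sorted.

```python
def first_fit(avail, need):
    """Index of the first slot with size >= need, or None."""
    for i, (_offset, size) in enumerate(avail):
        if size >= need:
            return i
    return None

def best_fit(avail, need):
    """Smallest slot that fits (same result as first-fit on an ascending list)."""
    for i in sorted(range(len(avail)), key=lambda j: avail[j][1]):
        if avail[i][1] >= need:
            return i
    return None

def worst_fit(avail, need):
    """Largest slot (same result as checking the head of a descending list)."""
    if not avail:
        return None
    i = max(range(len(avail)), key=lambda j: avail[j][1])
    return i if avail[i][1] >= need else None

def allocate(avail, need, strategy):
    """Take `need` bytes from the chosen slot; put any residual back on the list."""
    i = strategy(avail, need)
    if i is None:
        return None                      # caller would append at end of file
    offset, size = avail.pop(i)
    if size > need:
        avail.append((offset + need, size - need))
    return offset
```

With slots of 50, 20 and 80 bytes and an 18-byte record, first-fit leaves a 32-byte residual, best-fit leaves a 2-byte sliver that is probably unusable, and worst-fit leaves a 62-byte slot that can still hold another record.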

 

·      Finding Things Quickly

·       Sequential lookup is expensive

·       Direct access with RRN or RBO requires that we know a mapping from user data to RRN or RBO

·       Binary searches are attractive for finding records by a key

·       Problems with binary searches

·       Require a sorted file

·       Sorting is expensive – O(n log₂ n)

·       External sorts require many seeks

·       Internal sorts can help

·       Requires about log₂ n + ½ accesses (average) to find a record – too many!

·       Keeping a file sorted is very expensive

·       Internal sorts only work on small files
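To make the cost of a binary search concrete, here is a minimal sketch that searches a sorted file of fixed-length records by RRN and counts the seeks. The record layout (16-byte records with an 8-byte space-padded key at the front) and the function name are assumptions chosen for the example, not something the notes specify.

```python
import io

RECORD_SIZE = 16   # assumed fixed record length in bytes
KEY_SIZE = 8       # assumed: key is the first 8 bytes, space padded

def binary_search_file(f, n_records, key):
    """Return (rrn, seeks) for key, or (None, seeks) if the key is absent."""
    target = key.ljust(KEY_SIZE).encode("ascii")
    low, high, seeks = 0, n_records - 1, 0
    while low <= high:
        mid = (low + high) // 2
        f.seek(mid * RECORD_SIZE)        # every probe costs one seek
        seeks += 1
        probe = f.read(RECORD_SIZE)[:KEY_SIZE]
        if probe == target:
            return mid, seeks
        if probe < target:
            low = mid + 1
        else:
            high = mid - 1
    return None, seeks

def make_sorted_file(keys):
    """Build an in-memory stand-in for a sorted data file."""
    records = (k.ljust(KEY_SIZE).encode("ascii").ljust(RECORD_SIZE, b".")
               for k in sorted(keys))
    return io.BytesIO(b"".join(records))
```

For a file of 1,000 records the loop never needs more than 10 probes, which is small in comparison terms but large when each probe is a disk seek.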

 

·      Keysorting

 

·       Don’t sort the entire record but rather sort only the keys and addresses of the records [key, RRN] or [key, RBO] constructed during a sequential pass through the data file.

·       Use RAM as much as possible to contain sort data

·       Rearrange records in the file according to key sequence after sorting

 

For i := 1 to NumberOfRecords do

          Seek to the record in the input file whose RRN is SortKey[i].RRN

          Read this record into a buffer in RAM

          Write the contents of the buffer to the output file

EndFor
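The whole keysort, including the sequential pass that collects the keys, might look like the following Python sketch. The fixed record layout (16-byte records, 8-byte key at the front) and the name keysort are assumptions made for the example.

```python
RECORD_SIZE = 16   # assumed fixed record length in bytes
KEY_SIZE = 8       # assumed: key is the first 8 bytes of each record

def keysort(infile, outfile, n_records):
    """Write the records of infile to outfile in key order; return the sorted pairs."""
    # Pass 1: read sequentially, keeping only [key, RRN] pairs in RAM.
    infile.seek(0)
    sort_keys = []
    for rrn in range(n_records):
        record = infile.read(RECORD_SIZE)
        sort_keys.append((record[:KEY_SIZE], rrn))

    # Sort the small pairs in memory instead of the full records.
    sort_keys.sort()

    # Pass 2: one random seek per record, in key order.
    for _key, rrn in sort_keys:
        infile.seek(rrn * RECORD_SIZE)
        outfile.write(infile.read(RECORD_SIZE))
    return sort_keys
```

The second loop is the pseudocode above. It reads every record a second time and seeks once per record.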

 

·       Problems with this technique

·       Records must be read twice

·       Many random seeks to disk on second pass through records

·       If records have pointers to other records in them (i.e., pinned records) then the rearrangement of the records may invalidate the pointers.

·       Solution

·       Don’t rearrange the records into sorted order

·       Write the [key, RRN] back to a second (and smaller) file

·       This file is actually an index to the original file

·       A binary search can be performed on the index file much faster than on the original data file
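As a rough sketch of this solution, the code below writes the sorted [key, RRN] pairs to a second file and leaves the data file untouched. The entry layout (an 8-byte key followed by a 4-byte RRN, packed with Python's struct module), the 64-byte data record size and the name build_index are all assumptions made for illustration.

```python
import struct

RECORD_SIZE = 64                  # assumed data record length in bytes
KEY_SIZE = 8                      # assumed: key is the first 8 bytes of a record
ENTRY = struct.Struct("<8sI")     # assumed index entry: 8-byte key + 4-byte RRN

def build_index(datafile, indexfile, n_records):
    """Write sorted [key, RRN] entries to indexfile; return the index size in bytes."""
    datafile.seek(0)
    entries = []
    for rrn in range(n_records):              # one sequential pass over the data
        record = datafile.read(RECORD_SIZE)
        entries.append((record[:KEY_SIZE], rrn))
    entries.sort()                             # only the small entries are sorted
    for key, rrn in entries:
        indexfile.write(ENTRY.pack(key, rrn))
    return len(entries) * ENTRY.size
```

Under these assumptions each index entry is 12 bytes against 64 bytes per data record, so a binary search over the index touches far less data, and records that are pinned stay where they are.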

 

Chapter 6 – Indexing

 

·       Advantages of indexes

·       Enables one to impose an order on a file without actually rearranging the file

·       Provides multiple access paths to a file (by multiple sort fields)

·       Enables keyed access to variable-length record files

·       The index is sorted

·       The index will contain fixed length entries

·       The index can be searched with a binary search

·       The RBO data in the index entry can be used to access the target record

·       The data file is called an entry-sequenced file

·       Indexes are often small enough to be completely contained in memory

·       Binary searches thus require no seeks and are extremely fast
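The sketch below shows how an in-memory index gives keyed access to an entry-sequenced file of variable-length records. The record format (a 2-byte length prefix followed by "|"-delimited fields, with the first field as a unique key) and all function names are assumptions made for the example. The index is held as two parallel sorted lists searched with Python's bisect module.

```python
import bisect
import io

def append_record(datafile, fields):
    """Append a variable-length record at end of file; return its RBO."""
    body = "|".join(fields).encode("utf-8")
    datafile.seek(0, io.SEEK_END)
    rbo = datafile.tell()
    datafile.write(len(body).to_bytes(2, "little") + body)
    return rbo

def build(datafile, records):
    """Load records in entry sequence; return the sorted index (keys, rbos)."""
    pairs = sorted((fields[0], append_record(datafile, fields)) for fields in records)
    return [k for k, _ in pairs], [r for _, r in pairs]

def lookup(datafile, keys, rbos, key):
    """Binary-search the index in RAM (no seeks), then do one seek for the record."""
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None
    datafile.seek(rbos[i])
    length = int.from_bytes(datafile.read(2), "little")
    return datafile.read(length).decode("utf-8").split("|")
```

The data file is never reordered. Only the fixed-length [key, RBO] pairs are kept in sorted order, which is what makes a binary search possible even though the records differ in length.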

·       Algorithms required to use index files (assuming they will fit into RAM):

·       Create the original empty index and data files

·       Create empty data structures on disk

·       Load the index file into memory before using it

·       Simply read from disk into memory data structures

·       Check the condition of the index and reconstruct it from the data file if it may be damaged or out of date

·       Rewrite the index file from memory after using it

·       Watch for power failures, etc.

·       Post a status flag to note condition of index and protect against reading an out-of-date index

·       Add records to the data file and index file

·       Rearrangement of index entries into sorted order is likely

·       Delete records from the data file

·       Index entries must be deleted and the index “burped” (remaining entries shifted up to close the gap)

·       Update the records in the data file

·       Updating non-key data

·       Updating key data – must rearrange index entries to keep them sorted

·       Retrieve records from the data file using the index file
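The maintenance operations listed above might be sketched as a small class like the one below. Everything in it is hypothetical: the class name SimpleIndex, the JSON file format, and the "clean" status flag are stand-ins chosen to keep the example short, not the format the course or textbook uses. Keys are assumed to be unique.

```python
import bisect
import json

class SimpleIndex:
    """In-memory index of sorted [key, RBO] pairs (illustrative only)."""

    def __init__(self):
        self.keys, self.rbos = [], []

    def add(self, key, rbo):
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)            # later entries shift to keep order
        self.rbos.insert(i, rbo)

    def delete(self, key):
        """Remove the entry and close the gap; return its RBO or None."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            del self.keys[i]
            return self.rbos.pop(i)
        return None

    def change_key(self, old, new):
        """A key update is a delete plus an add, so the entry moves."""
        rbo = self.delete(old)
        if rbo is not None:
            self.add(new, rbo)
        return rbo is not None

    def save(self, path):
        """Rewrite the index file from memory and mark it clean."""
        with open(path, "w") as f:
            json.dump({"clean": True, "keys": self.keys, "rbos": self.rbos}, f)

    @classmethod
    def load(cls, path):
        """Load the index, refusing one that was not saved cleanly."""
        with open(path) as f:
            state = json.load(f)
        if not state["clean"]:
            raise ValueError("index may be out of date; rebuild it from the data file")
        state["clean"] = False              # flag stays dirty on disk while in use
        with open(path, "w") as f:
            json.dump(state, f)
        index = cls()
        index.keys, index.rbos = state["keys"], state["rbos"]
        return index
```

If the program dies between load and save (a power failure, for example), the flag on disk is still dirty, so the next load raises and the caller knows to reconstruct the index from the data file. Updating non-key data needs no index change unless the record has to move to a new RBO.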