CpSc 360
Chapter 5 (continued)
· Reclaiming Space in Files
· Placement Strategies
· First-Fit: Go down the avail list until a large enough slot is found
· Best-Fit: Keep the avail list in ascending order by size and use First-Fit
· The residual left after finding a fit becomes another, smaller slot on the list (often too small to reuse)
· Search time can be significant for adds and deletes, since the list must be kept in order
· Worst-Fit: Keep the avail list in descending order by size and use First-Fit
· Finding an open slot is fast: only the first (largest) slot needs to be checked!
· The residual after selection is large and usually won't create much external fragmentation
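The three placement strategies can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the avail list is an in-memory Python list of hypothetical `Slot(offset, size)` entries (a real file keeps it on disk, linked by offsets), and the record is assumed to take the front of the chosen slot, leaving the tail on the list as the residual slot. `Slot`, `first_fit`, `worst_fit`, and `allocate` are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class Slot:
    offset: int  # byte offset of the free slot in the data file
    size: int    # length of the free slot in bytes

def first_fit(avail, need):
    """Walk down the avail list; return the index of the first slot that fits."""
    for i, slot in enumerate(avail):
        if slot.size >= need:
            return i
    return None

def worst_fit(avail, need):
    """List is kept in descending size order, so only the head must be checked."""
    return 0 if avail and avail[0].size >= need else None

def allocate(avail, need, strategy):
    """Take space for `need` bytes from the avail list and return its offset.

    strategy: "first" (unsorted list), "best" (list kept in ascending size
    order and searched with first_fit) or "worst" (list kept in descending
    order). Returns None if no slot fits; the record would go at end of file.
    """
    i = worst_fit(avail, need) if strategy == "worst" else first_fit(avail, need)
    if i is None:
        return None
    slot = avail[i]
    residual = slot.size - need
    if residual == 0:
        del avail[i]
    else:
        # Assumption: the record uses the front of the slot; the tail stays
        # on the list as a new, smaller slot.
        avail[i] = Slot(slot.offset + need, residual)
        if strategy == "best":
            avail.sort(key=lambda s: s.size)                # restore ascending order
        elif strategy == "worst":
            avail.sort(key=lambda s: s.size, reverse=True)  # restore descending order
    return slot.offset
```

For a 20-byte record, best-fit on slots of 25, 40 and 70 bytes picks the 25-byte slot and leaves a 5-byte residual that is unlikely to be reused. Worst-fit picks the 70-byte slot and leaves a 50-byte residual.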
· Conclusions:
· Fixed-length records do not require any special placement strategies.
· Variable-length records in volatile files require careful placement.
· If space is lost due to internal fragmentation, consider first-fit or best-fit.
· If space is lost due to external fragmentation, consider worst-fit.
· Finding Things Quickly
· Sequential lookup is expensive (on average, n/2 reads for a file of n records)
· Direct access with an RRN or RBO requires that we know a mapping from user data (e.g., a key) to the RRN or RBO
· Binary searches are attractive for finding records by a key
· Problems with binary searches
· Require a sorted file
· Sorting is expensive – O(n log₂ n)
· External sorts require many seeks
· Internal sorts can help
· Requires about log₂ n + ½ accesses (average) to find a record – too many seeks for a file on disk!
· Keeping a file sorted is very expensive
· Internal sorts only work on small files
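To make the cost of a binary search concrete, here is a small Python sketch that counts how many records it reads. It assumes a Python list as a stand-in for a file of fixed-length records sorted by key, where each probe would be a seek on disk. `RECORD_LEN`, `rbo_of`, and `binary_search` are hypothetical names used for illustration. For comparison, a sequential lookup in the same 1,000-record file would read about 500 records on average.

```python
import math

RECORD_LEN = 64  # hypothetical fixed record length in bytes

def rbo_of(rrn, record_len=RECORD_LEN):
    """Direct access: the byte offset of a fixed-length record is RRN * record length."""
    return rrn * record_len

def binary_search(keys, target):
    """Binary search of a file sorted by key.

    keys[rrn] stands in for reading the record at rbo_of(rrn); on disk every
    probe costs a seek. Returns (rrn or None, number of records read).
    """
    lo, hi, reads = 0, len(keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        reads += 1                      # one seek + read of record `mid`
        if keys[mid] == target:
            return mid, reads
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, reads

# Usage: 1,000 sorted records need at most floor(log2 1000) + 1 = 10 reads.
keys = list(range(0, 2000, 2))
bound = math.floor(math.log2(len(keys))) + 1
worst = max(binary_search(keys, k)[1] for k in keys)
```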
· Keysorting
· Don’t sort the entire record; instead, sort only the keys and addresses of the records, [key, RRN] or [key, RBO], constructed during a sequential pass through the data file.
· Use RAM as much as possible to hold the sort data
· Rearrange the records in the file according to key sequence after sorting
For i := 1 to NumberOfRecords do
    Seek to record SortKey[i].RRN in the input file
    Read this record into a buffer in RAM
    Write the contents of the buffer to the output file
EndFor
· Problems with this technique
· Records must be read twice
· Many random seeks to disk on the second pass through the records
· If other records or files hold pointers to a record (i.e., it is a pinned record), rearranging the records invalidates those pointers.
· Solution
· Don’t rearrange the records into sorted order
· Write the sorted [key, RRN] pairs to a second (and smaller) file
· This file is actually an index to the original file
· A binary search on the index file is much faster than one on the original file
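Both keysort variants can be sketched in Python as follows. The sketch assumes fixed-length records whose key is the first `KEY_LEN` bytes, which is a hypothetical layout, and uses `io.BytesIO` objects as stand-ins for disk files. It needs Python 3.8+ for `:=`. `rearrange` follows the pseudocode's second pass. `write_index` is the alternative: it leaves the records in place and writes the sorted [key, RRN] pairs to a small index file, which `search_index` binary-searches without touching the data file. All function names are illustrative.

```python
import io
import struct
from typing import BinaryIO

KEY_LEN = 3                               # hypothetical: key = first 3 bytes
ENTRY = struct.Struct(f"<{KEY_LEN}sI")    # fixed-length index entry [key, RRN]

def sorted_keys(src: BinaryIO, record_len: int):
    """Pass 1: read records sequentially, keep only sorted [key, RRN] pairs in RAM."""
    src.seek(0)
    pairs, rrn = [], 0
    while len(rec := src.read(record_len)) == record_len:
        pairs.append((rec[:KEY_LEN], rrn))
        rrn += 1
    return sorted(pairs)

def rearrange(src: BinaryIO, dst: BinaryIO, record_len: int) -> None:
    """Pass 2 of the pseudocode: seek to each record in key order and copy it."""
    for _, rrn in sorted_keys(src, record_len):
        src.seek(rrn * record_len)        # one random seek per record
        dst.write(src.read(record_len))

def write_index(src: BinaryIO, idx: BinaryIO, record_len: int) -> None:
    """Leave the records alone; write the sorted [key, RRN] pairs to an index file."""
    for key, rrn in sorted_keys(src, record_len):
        idx.write(ENTRY.pack(key, rrn))

def search_index(idx: bytes, key: bytes):
    """Binary search of the fixed-length index entries; returns the RRN or None."""
    lo, hi = 0, len(idx) // ENTRY.size - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        k, rrn = ENTRY.unpack_from(idx, mid * ENTRY.size)
        if k == key:
            return rrn
        if k < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

# Usage with three 8-byte records held in an in-memory "file".
data = io.BytesIO(b"CCC-row3AAA-row1BBB-row2")
out, idx = io.BytesIO(), io.BytesIO()
rearrange(data, out, 8)      # out holds the records in key order
write_index(data, idx, 8)    # data is unchanged; idx holds 3 sorted entries
```

The index entries are fixed-length (7 bytes here), which is what makes a binary search over the index file possible even if the data records were not.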
Chapter 6 – Indexing
· Advantages of indexes
· Enables one to impose an order on a file without actually rearranging the file
· Provides multiple access paths to a file (one index per sort field)
· Enables keyed access to variable-length record files
· The index is sorted
· The index contains fixed-length entries
· The index can be searched with a binary search
· The RBO stored in an index entry is used to access the target record
· The data file is called an entry-sequenced file (records are stored in the order they were entered)
· Indexes are often small enough to be held entirely in memory
· Binary searches on an in-memory index thus require no seeks and are extremely fast
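Here is a minimal Python sketch of an index over an entry-sequenced file of variable-length records. The record layout is an assumption for illustration: a 2-byte big-endian length followed by `key|rest` as UTF-8. The names `load_records` and `fetch` are also hypothetical. The data file is a `bytes` object, and the index is a sorted in-memory list of (key, RBO) pairs searched with the standard `bisect` module. A real index file would pad keys to a fixed length.

```python
import bisect

def load_records(records):
    """Write variable-length records in entry order (an entry-sequenced file)
    and build a sorted in-memory index of (key, RBO) pairs.

    Hypothetical layout: 2-byte big-endian length, then b"key|rest".
    """
    data, index = bytearray(), []
    for key, rest in records:
        body = f"{key}|{rest}".encode()
        index.append((key, len(data)))              # RBO = where this record starts
        data += len(body).to_bytes(2, "big") + body
    index.sort()                                    # the order exists only in the index
    return bytes(data), index

def fetch(data, index, key):
    """Binary search the index in RAM (no seeks), then jump to the record's RBO."""
    i = bisect.bisect_left(index, (key,))           # (key,) sorts before (key, rbo)
    if i == len(index) or index[i][0] != key:
        return None
    rbo = index[i][1]
    length = int.from_bytes(data[rbo:rbo + 2], "big")
    return data[rbo + 2 : rbo + 2 + length].decode()
```

For example, `load_records([("LEE", "Ada, Boston"), ("ADAMS", "Sam, Rye"), ("KIM", "Jo")])` leaves the records in entry order in the data file, while the index lists ADAMS, KIM, LEE with their byte offsets.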
· Algorithms required to use index files (assuming they fit in RAM):
· Create the original, empty index and data files
· Create empty data structures on disk
· Load the index file into memory before using it
· Simply read the index from disk into in-memory data structures
· Check the condition of the index and rebuild it from the data file if it may be damaged or out of date
· Rewrite the index file from memory after using it
· Watch for power failures, etc.
· Post a status flag in the index file to note its condition and protect against reading an out-of-date index
· Add records to the data file and index file
· New index entries must be inserted in sorted position, so rearranging existing entries is likely
· Delete records from the data file
· The corresponding index entries must be deleted and the index burped (compacted)
· Update records in the data file
· Updating non-key data: if the record changes size and must move, its RBO in the index changes
· Updating key data – must rearrange index entries to keep them sorted
· Retrieve records from the data file using the index file
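The operations above can be sketched as a small, hypothetical `IndexedFile` class in Python. Every layout detail is an assumption for illustration, not a prescribed format:

- records are `key|rest` lines;
- deletion marks a record by overwriting its first byte with `*`;
- an update is done as a delete plus an append;
- the index file is JSON carrying a `clean`/`dirty` status flag.

The flag is written as dirty whenever the index is loaded for use and reset to clean only by `close()`. As a result, an index that was never rewritten (for example after a power failure) is rebuilt from the data file by a sequential pass.

```python
import bisect
import json
import os
import tempfile

class IndexedFile:
    """Hypothetical entry-sequenced data file (base.dat) with a [key, RBO] index in RAM."""

    def __init__(self, base):
        self.data_path, self.index_path = base + ".dat", base + ".idx"
        if not os.path.exists(self.data_path):
            open(self.data_path, "wb").close()          # create the empty data file
        self.index = self._load()

    def _write_index(self, status, entries):
        with open(self.index_path, "w") as f:
            json.dump({"status": status, "entries": entries}, f)

    def _load(self):
        saved = {"status": "dirty"}                      # a missing index counts as stale
        if os.path.exists(self.index_path):
            with open(self.index_path) as f:
                saved = json.load(f)
        if saved["status"] == "clean":
            index = [tuple(e) for e in saved["entries"]]
        else:
            index = self._rebuild()
        self._write_index("dirty", index)                # stale on disk until close()
        return index

    def _rebuild(self):
        """Reconstruct the index with one sequential pass over the data file."""
        index, rbo = [], 0
        with open(self.data_path, "rb") as f:
            for line in f:
                key = line.split(b"|", 1)[0].decode()
                if not key.startswith("*"):              # skip deleted records
                    index.append((key, rbo))
                rbo += len(line)
        return sorted(index)

    def close(self):
        self._write_index("clean", self.index)           # rewrite the index from RAM

    def _find(self, key):
        i = bisect.bisect_left(self.index, (key,))
        return i if i < len(self.index) and self.index[i][0] == key else None

    def add(self, key, rest):
        with open(self.data_path, "ab") as f:
            rbo = f.seek(0, os.SEEK_END)                 # new record goes at end of file
            f.write(f"{key}|{rest}\n".encode())
        bisect.insort(self.index, (key, rbo))            # keep index entries sorted

    def retrieve(self, key):
        i = self._find(key)
        if i is None:
            return None
        with open(self.data_path, "rb") as f:
            f.seek(self.index[i][1])                     # one seek to the record's RBO
            return f.readline().decode().rstrip("\n").split("|", 1)[1]

    def delete(self, key):
        i = self._find(key)
        if i is None:
            return False
        with open(self.data_path, "r+b") as f:
            f.seek(self.index[i][1])
            f.write(b"*")                                # mark the record as deleted
        del self.index[i]                                # later entries shift up ("burp")
        return True

    def update(self, key, rest, new_key=None):
        """Simplified update: delete the old record, append the new version."""
        if self.delete(key):
            self.add(new_key if new_key is not None else key, rest)

def demo():
    with tempfile.TemporaryDirectory() as d:
        base = os.path.join(d, "people")
        f = IndexedFile(base)
        f.add("LEE", "Ada")
        f.add("ADAMS", "Sam")
        f.add("KIM", "Jo")
        f.delete("LEE")
        f.update("KIM", "Jo Park")
        f.close()                                        # index rewritten, flag clean
        reopened = IndexedFile(base)                     # clean flag: index read from disk
        crashed = IndexedFile(base)                      # `reopened` never closed: rebuilt
        return reopened.index, crashed.index, crashed.retrieve("KIM"), crashed.retrieve("LEE")
```

In `demo()`, the second `IndexedFile` is never closed, so the third one finds the dirty flag and rebuilds the index from the data file. The rebuilt index matches the saved one.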