Chapter 5 (continued)
· Reclaiming Space in Files
· The problem revolves around
what to do if we want to add, update or delete records in an existing file
· Record adds don’t cause
problems if we are allowed to add at the end of an existing file
· Record updates can be viewed
as a record deletion followed by a record addition
· Record deletions are the
real problem!
· Records can be recognized as
deleted by placing a special character in a known position of the record
(perhaps the first byte)
· Programs must have special
logic to understand that certain records are deleted (unless objects manage the
I/O, in which case the object method would worry about this problem)
· Space can be reclaimed
periodically (when more space is needed or on a fixed schedule) by simply
copying the file to a new location and not writing the deleted records out.
· Space can also be reclaimed
by compressing the file in place.
· Which of the two techniques
above is faster?
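A minimal sketch of the copy-to-a-new-location technique. It assumes fixed-length 16-byte records and a '*' in the first byte as the deletion marker; both choices, and the names compact and make_record, are illustrative and not taken from the text. In-memory byte streams stand in for the old and new files.

```python
import io

RECORD_SIZE = 16       # assumed fixed record length in bytes
DELETED_MARK = b"*"    # assumed marker stored in the first byte of a deleted record


def make_record(text, record_size=RECORD_SIZE):
    """Pad a short ASCII string with blanks to the fixed record length."""
    return text.encode("ascii").ljust(record_size, b" ")


def compact(src, dst, record_size=RECORD_SIZE):
    """Copy every live record from src to dst; return how many were dropped."""
    dropped = 0
    while True:
        record = src.read(record_size)
        if len(record) < record_size:    # end of file (or a trailing partial record)
            break
        if record[:1] == DELETED_MARK:   # deleted: do not write it out
            dropped += 1
            continue
        dst.write(record)
    return dropped


old_file = io.BytesIO(
    make_record("Ames") + make_record("*Mason") + make_record("Brown")
)
new_file = io.BytesIO()
dropped = compact(old_file, new_file)
print(dropped, len(new_file.getvalue()))
```

Compressing in place would use the same loop with separate read and write positions in a single file, followed by truncating the file at the final write position.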
· Deleting fixed-length
records and reclaiming space dynamically
· Used when we want to use the
deleted space as soon as possible
· Mark deleted records as
above
· How do we find the deleted
records so we can reuse the space?
· Searching sequentially through
the file for a marked (deleted) record is too slow
· Linked lists of deleted
records with a sentinel on the last record (Available record list)
· Stacks are generally simpler
to use
· Pointers are usually the
Relative Record Number (RRN) in the file
· One can quickly determine if
a file contains empty space and locate the space with one I/O
· The stack can contain the
actual data records with embedded pointers (RRNs) or can be a separate file
itself (what are the tradeoffs?)
· The pointer to the first deleted
record in the file can be kept in a header record in the file. If null, then there are no deleted records
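The Available list for fixed-length records can be sketched as a stack threaded through the deleted records themselves. The layout below is an assumption made for illustration: a 4-byte header holding the RRN of the first deleted record, -1 as the null sentinel, 16-byte records, and a deleted record stored as '*' followed by the RRN of the next deleted record. The function names are hypothetical.

```python
import io
import struct

HEADER_SIZE = 4    # header holds one signed 32-bit RRN: the top of the avail stack
RECORD_SIZE = 16   # assumed fixed record length
NULL_RRN = -1      # sentinel: no (more) deleted records


def _offset(rrn):
    return HEADER_SIZE + rrn * RECORD_SIZE


def create(f):
    f.seek(0)
    f.write(struct.pack("<i", NULL_RRN))


def read_head(f):
    f.seek(0)
    return struct.unpack("<i", f.read(HEADER_SIZE))[0]


def write_head(f, rrn):
    f.seek(0)
    f.write(struct.pack("<i", rrn))


def delete(f, rrn):
    """Push record rrn onto the avail stack."""
    head = read_head(f)
    f.seek(_offset(rrn))
    f.write((b"*" + struct.pack("<i", head)).ljust(RECORD_SIZE, b" "))
    write_head(f, rrn)


def add(f, data):
    """Reuse the top deleted slot if there is one, else append; return the RRN."""
    record = data.ljust(RECORD_SIZE, b" ")[:RECORD_SIZE]
    head = read_head(f)
    if head == NULL_RRN:
        f.seek(0, io.SEEK_END)                      # no free slot: add at the end
        rrn = (f.tell() - HEADER_SIZE) // RECORD_SIZE
    else:
        rrn = head
        f.seek(_offset(rrn) + 1)                    # skip the '*' marker
        next_rrn = struct.unpack("<i", f.read(4))[0]
        write_head(f, next_rrn)                     # pop the stack
        f.seek(_offset(rrn))
    f.write(record)
    return rrn


f = io.BytesIO()
create(f)
rrns = [add(f, name) for name in (b"Ames", b"Mason", b"Brown", b"Folk")]
delete(f, 1)
delete(f, 3)
reused_first = add(f, b"Zoellick")   # takes the most recently deleted slot
reused_second = add(f, b"Riccardi")
appended = add(f, b"Stevens")        # stack is empty again, so this is appended
```

Finding a free slot costs one read of the header plus one read of the slot itself, which is why the stack is attractive here.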
· Deleting variable-length
records
· We can still use an
Available list of deleted records
· Since records are of
differing lengths, we cannot use a RRN as a link between records
· We must instead use a
relative byte offset in the file as a link address
· Re-using space reclaimed
from deleted variable-length records
· We can’t simply take the
first record on the stack since the deleted record slot may be too small to
contain the new record to be added
· We can’t use a simple stack (LIFO)
data structure to manage the deleted records
· We may not have a slot in
the stack large enough to contain a new record
· We may need to add
especially large records to the end of the file
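A sketch of an Available list for variable-length records, linked by relative byte offsets instead of RRNs. The file layout is assumed for illustration only: a 4-byte header with the offset of the first deleted record, a 2-byte size prefix before each record body, and a deleted body that starts with '*' followed by a 4-byte link. It also assumes every body is at least 5 bytes long so the marker and link fit. The list is searched for the first slot that is large enough, and the record goes at the end of the file when none is.

```python
import io
import struct

NULL = -1
HEADER = struct.Struct("<i")   # byte offset of the first deleted record, or NULL
LENGTH = struct.Struct("<H")   # 2-byte size prefix before each record body
LINK = struct.Struct("<i")     # link stored after the '*' in a deleted body


def create(f):
    f.seek(0)
    f.write(HEADER.pack(NULL))


def head(f):
    f.seek(0)
    return HEADER.unpack(f.read(HEADER.size))[0]


def _read_slot(f, offset):
    """Return (body_size, next_offset) for the deleted record at offset."""
    f.seek(offset)
    size = LENGTH.unpack(f.read(LENGTH.size))[0]
    f.seek(1, io.SEEK_CUR)                    # skip the '*' marker
    return size, LINK.unpack(f.read(LINK.size))[0]


def _set_link(f, prev, target):
    """Point prev (or the header, when prev is NULL) at target."""
    if prev == NULL:
        f.seek(0)
        f.write(HEADER.pack(target))
    else:
        f.seek(prev + LENGTH.size + 1)
        f.write(LINK.pack(target))


def delete(f, offset):
    """Mark the record at offset as deleted and put it at the front of the list."""
    old_head = head(f)
    f.seek(offset + LENGTH.size)
    f.write(b"*" + LINK.pack(old_head))
    _set_link(f, NULL, offset)


def add(f, body):
    """Reuse the first slot that is big enough, else append; return the offset."""
    prev, cur = NULL, head(f)
    while cur != NULL:
        size, nxt = _read_slot(f, cur)
        if size >= len(body):
            _set_link(f, prev, nxt)           # unlink the slot from the list
            f.seek(cur + LENGTH.size)
            f.write(body.ljust(size, b" "))   # pad: the slot keeps its old size
            return cur
        prev, cur = cur, nxt
    f.seek(0, io.SEEK_END)                    # nothing fits: add at the end
    offset = f.tell()
    f.write(LENGTH.pack(len(body)) + body)
    return offset


f = io.BytesIO()
create(f)
a = add(f, b"Ames|Mary|OK")        # 12-byte body
b = add(f, b"Mason|Al")            # 8-byte body
c = add(f, b"Brown|Martha|TX")     # 15-byte body
delete(f, a)
delete(f, b)                       # list is now: b (8 bytes) -> a (12 bytes)
reused = add(f, b"Folk|Mike")      # 9 bytes: skips b, fits in a
appended = add(f, b"x" * 20)       # too big for any slot, goes at the end
```

Padding the reused slot, as done here, is the simplest choice; returning the unused part to the Available list is the alternative discussed under Storage Fragmentation.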
· Storage Fragmentation
· If we force all records to
be fixed length (i.e. to be of maximum possible length) then we often waste
space at the end of records that are less than maximum length. We must pad the unused space in these
records
· Wasted space within a record
is called internal fragmentation
· Variable length records
eliminate internal fragmentation
· Deletion and replacement of
longer variable length records with shorter records still results in internal fragmentation
(not quite but the author describes it this way)
· The part of the record not
used should go on the Available list
· If we choose to use a slot
on the Available list that is too large for the record to be added, we can put
the unused space back on the Available list
· After considerable file activity
(deletes and adds) many of the slots on the Available list will be too small to
be reused. We could simply compact the
file (as described earlier) or we could attempt to coalesce the holes (consolidate
available slots) that are physically adjacent.
· The problem with coalescing
holes is that they are not necessarily adjacent on the Available list!
· Smart placement strategies
can help reduce this problem
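Coalescing can be sketched on an in-memory copy of the Available list. The sketch assumes each hole is described by a (byte offset, size) pair whose size covers the whole slot including any length prefix; that representation and the name coalesce are illustrative. Because holes that are physically adjacent need not be adjacent on the list, the list is sorted by offset first.

```python
def coalesce(holes):
    """Merge physically adjacent holes given as (offset, size) pairs."""
    merged = []
    for offset, size in sorted(holes):        # order by position in the file
        if merged and merged[-1][0] + merged[-1][1] == offset:
            prev_offset, prev_size = merged[-1]
            merged[-1] = (prev_offset, prev_size + size)   # extend the previous hole
        else:
            merged.append((offset, size))
    return merged


avail = [(120, 10), (40, 20), (60, 15), (200, 8), (75, 5)]
print(coalesce(avail))   # [(40, 40), (120, 10), (200, 8)]
```

The three holes at offsets 40, 60 and 75 touch each other and become one 40-byte hole; the other two stay as they are.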
· Placement Strategies
· First-Fit: Go down the Avail Stack until a large
enough slot is found
· Best-Fit:
· Keep list in Ascending order
by size and use First-Fit.
· Residual after finding fit
makes another smaller slot in list
· Search time can be
significant for adds and deletes
· Worst-Fit: Keep list in Descending order by size and
use First-Fit.
· Finding an open slot is
fast!
· Residual after selection is
large and won’t usually create much external fragmentation
· Conclusions:
· Fixed-length records do not
require any special placement strategies.
· Variable length records in
volatile files require careful placement.
· If space is lost due to internal fragmentation then consider first-fit
or best-fit.
· If space is lost due to external fragmentation then consider
worst-fit.
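The three placement strategies can be compared with a small sketch that tracks only slot sizes. It follows the description above: best-fit and worst-fit are first-fit applied to a list kept in ascending or descending order. The function names and the sample sizes are made up for illustration, and re-sorting on every call is a simplification; a real Available list would be kept in order as slots are inserted.

```python
def first_fit(avail, need):
    """Index of the first slot that is large enough, or None."""
    for i, size in enumerate(avail):
        if size >= need:
            return i
    return None


def place(avail, need, order=None):
    """Take a slot for a record of `need` bytes and return the residual size.

    order is None for first-fit, "asc" for best-fit, "desc" for worst-fit.
    Returns None when no slot fits (the caller would append to the file).
    """
    if order == "asc":
        avail.sort()
    elif order == "desc":
        avail.sort(reverse=True)
    i = first_fit(avail, need)
    if i is None:
        return None
    residual = avail.pop(i) - need
    if residual > 0:
        avail.append(residual)    # unused part goes back on the list
    return residual


slots = [30, 12, 70, 18]
results = {
    "first": place(list(slots), 15),
    "best": place(list(slots), 15, order="asc"),
    "worst": place(list(slots), 15, order="desc"),
}
print(results)   # {'first': 15, 'best': 3, 'worst': 55}
```

Best-fit leaves a 3-byte residual that is unlikely to be reused, while worst-fit leaves a 55-byte residual that still can be.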
· Finding Things Quickly
· Sequential lookup is
expensive
· Direct access by RRN or
relative byte offset (RBO) requires that we know a mapping from user data to RRN or RBO
· Binary searches are
attractive for finding records by a key
· Problems with binary
searches
· Require a sorted file
· Sorting is expensive: O(n log2 n)
· External sorts require many
seeks
· Internal sorts can help
· Requires about log2 n + ½
accesses (average) to find a record: too many!
· Keeping a file sorted is
very expensive
· Internal sorts only work on
small files
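A sketch of a binary search over a sorted file of fixed-length records that counts how many records it reads. The 8-byte record with a 4-byte zero-padded key is an assumed layout for illustration, and an in-memory byte stream stands in for the file; on disk, each read could cost a seek.

```python
import io

RECORD_SIZE = 8   # assumed record length
KEY_SIZE = 4      # assumed key: first 4 bytes, zero-padded digits


def binary_search(f, key, count):
    """Search `count` sorted records for key; return (rrn or None, accesses)."""
    low, high, accesses = 0, count - 1, 0
    while low <= high:
        mid = (low + high) // 2
        f.seek(mid * RECORD_SIZE)         # direct access by RRN
        record = f.read(RECORD_SIZE)
        accesses += 1
        record_key = record[:KEY_SIZE]
        if record_key == key:
            return mid, accesses
        if record_key < key:
            low = mid + 1
        else:
            high = mid - 1
    return None, accesses


# 1000 records with even keys 0000, 0002, ..., 1998, already in sorted order.
keys = [f"{n:04d}".encode("ascii") for n in range(0, 2000, 2)]
data_file = io.BytesIO(b"".join(k + b"data" for k in keys))

found_rrn, found_accesses = binary_search(data_file, b"0500", len(keys))
missing_rrn, missing_accesses = binary_search(data_file, b"0501", len(keys))
print(found_rrn, found_accesses, missing_rrn, missing_accesses)
```

With 1000 records no search reads more than 10 records, against an average of about 500 for a sequential lookup, but each of those reads may still be a separate seek.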