Chapter 5
· Data Compression: makes files smaller
· Advantages/Disadvantages
  · Uses less external storage
  · Read/write time to the device is shorter with a smaller file
  · Fewer seeks and less rotational latency, since the file spans fewer tracks
  · Requires more CPU time to compress/decompress
· Compact notation: redundancy reduction
  · Example
    · State names require about 14 bytes (max)
    · State abbreviations require 2 bytes
    · State sequence numbers require 6 bits (50 states fit within 2^6 = 64 codes)
  · Costs
    · Binary codes can't be read (easily) by humans
    · Encoding/decoding costs time
    · Must include encode/decode logic in all programs requiring state names
  · Is the cost worth it? It depends!
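The state example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the STATES table is truncated and hypothetical (a real table would list all 50 states), and the names encode_state, decode_state, pack_codes and unpack_codes are made up for this sketch. It shows both the saving (four states in 3 bytes instead of 8) and the cost (every program needs the table and the encode/decode logic).

```python
# Hypothetical, truncated lookup table; a real one would hold all 50 states.
STATES = ["AL", "AK", "AZ", "AR", "CA"]
CODE_BITS = 6  # 2**6 = 64 codes, enough for 50 states


def encode_state(abbrev):
    """Return the 0-based sequence number for a state abbreviation."""
    return STATES.index(abbrev)


def decode_state(code):
    """Return the abbreviation for a sequence number."""
    return STATES[code]


def pack_codes(codes):
    """Pack 6-bit codes back to back into bytes (zero-padded at the end)."""
    bits, nbits, out = 0, 0, bytearray()
    for c in codes:
        bits = (bits << CODE_BITS) | c
        nbits += CODE_BITS
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
        bits &= (1 << nbits) - 1  # keep only the bits not yet written
    if nbits:
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_codes(data, count):
    """Recover `count` 6-bit codes from packed bytes."""
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    mask = (1 << CODE_BITS) - 1
    return [(bits >> (total - CODE_BITS * (i + 1))) & mask for i in range(count)]


names = ["CA", "AK", "AZ", "AL"]
packed = pack_codes([encode_state(n) for n in names])
restored = [decode_state(c) for c in unpack_codes(packed, len(names))]
```

Here four 2-byte abbreviations (8 bytes) shrink to 3 packed bytes, but the packed bytes are meaningless without the table.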
· Suppressing Repeating Sequences
  · Run-length encoding
    · Include a run-length indicator, a length value, and the target byte
    · Can result in a longer string than the original (if not careful)
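A minimal run-length encoding sketch follows. The details are assumptions chosen for illustration, not a standard format: the indicator byte is 0xFF, each run is stored as indicator, length, target byte (the order listed above), runs are capped at 255, and only runs of 4 or more are replaced. The function names rle_encode and rle_decode are hypothetical.

```python
INDICATOR = 0xFF  # assumed run-length indicator byte


def rle_encode(data, min_run=4):
    """Replace runs of min_run or more equal bytes with a 3-byte triple."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == b and run < 255:
            run += 1
        # A literal indicator byte must always be escaped as a triple,
        # even for a run of 1, or the decoder would misread it.
        if run >= min_run or b == INDICATOR:
            out += bytes([INDICATOR, run, b])
        else:
            out += bytes([b]) * run
        i += run
    return bytes(out)


def rle_decode(data):
    """Expand (indicator, length, byte) triples back into runs."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == INDICATOR:
            out += bytes([data[i + 2]]) * data[i + 1]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


sample = bytes([1, 2, 2, 2, 2, 2, 3])
encoded = rle_encode(sample)  # 1, then (0xFF, 5, 2), then 3
```

The "if not careful" case shows up with the indicator byte itself: a single 0xFF in the input becomes three bytes in the output, so data full of isolated 0xFF bytes grows instead of shrinking.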
· Variable-length Codes
  · Use the shortest encodings for the most frequently occurring characters
  · E.g., Morse code: e and t get . and - respectively
· Huffman Codes
  · Instantaneous code: you know you are at the end of a coded character without having to examine the next character
  · No code is a prefix of another code
  · The characters with the highest probabilities of occurrence are assigned the shortest codes
  · Codes can be built using a binary tree (see if you can figure out the algorithm!)
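For readers who have tried the exercise and want to compare, here is one possible sketch of the tree-building approach, using Python's heapq as the priority queue. The function names huffman_codes and huffman_decode are hypothetical, and character counts from the input text stand in for the probabilities of occurrence. The idea is to repeatedly merge the two least frequent subtrees, then read each code off the path from the root (0 for left, 1 for right).

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a prefix-free code table from character frequencies in text."""
    freq = Counter(text)
    # Heap entries are (weight, tiebreak, tree). A tree is either a single
    # character (leaf) or a (left, right) tuple. The unique tiebreak keeps
    # Python from ever comparing two trees directly.
    heap = [(w, i, ch) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes


def huffman_decode(bits, codes):
    """Decode bit by bit; no lookahead is needed (instantaneous code)."""
    inverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


text = "abracadabra"
codes = huffman_codes(text)
bits = "".join(codes[ch] for ch in text)
```

Because no code is a prefix of another, the decoder can emit a character the moment the accumulated bits match a table entry.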
· Irreversible Compression Techniques
  · Some information is lost, so the original cannot be accurately reconstructed
  · Used for voice, pictures, etc.
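One simple way to see the information loss is quantization: keeping fewer bits per sample. The sketch below is an illustrative assumption, not a technique these notes prescribe; the names quantize and reconstruct and the step size of 16 are made up for the example. With a step of 16, an 8-bit sample (0 to 255) fits in 4 bits, but the reconstruction is only approximate.

```python
def quantize(samples, step=16):
    """Keep only which step-sized bucket each sample falls into."""
    return [s // step for s in samples]


def reconstruct(levels, step=16):
    """Map each bucket back to its midpoint; the exact value is gone."""
    return [v * step + step // 2 for v in levels]


samples = [0, 5, 17, 100, 255]
levels = quantize(samples)        # each level fits in 4 bits
approx = reconstruct(levels)      # close to, but not equal to, samples
```

Every reconstructed value lies within half a step of the original, which may be acceptable for audio or images but not for data that must be restored exactly.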
· The Master File Update Problem
  · Given:
    · a sorted old master file
    · a sorted transaction file with adds, changes and deletes
  · Output a sorted new master with the transactions applied to the old master file
  · Write the pseudo-code for this algorithm
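One possible shape for the answer, written as runnable Python rather than pseudo-code, is sketched below. The record layouts are assumptions made for illustration: master records are (key, data) tuples, transactions are (key, op, data) tuples with op being "A" (add), "C" (change) or "D" (delete), and both inputs are sorted by key. The function name update_master is hypothetical. Transactions that cannot be applied (adding an existing key, changing or deleting a missing one) are collected as errors instead of being applied.

```python
def update_master(old_master, transactions):
    """Merge sorted transactions into a sorted master in one pass."""
    new_master, errors = [], []
    m, t = iter(old_master), iter(transactions)
    master, trans = next(m, None), next(t, None)

    while master is not None or trans is not None:
        # Process the smallest key still waiting in either file.
        key = min(r[0] for r in (master, trans) if r is not None)

        # Start from the old master record if one exists for this key.
        if master is not None and master[0] == key:
            current, exists = master[1], True
            master = next(m, None)
        else:
            current, exists = None, False

        # Apply every transaction for this key, in order.
        while trans is not None and trans[0] == key:
            _, op, data = trans
            if op == "A" and not exists:
                current, exists = data, True
            elif op == "C" and exists:
                current = data
            elif op == "D" and exists:
                current, exists = None, False
            else:
                errors.append(trans)  # e.g. delete of a missing key
            trans = next(t, None)

        if exists:
            new_master.append((key, current))

    return new_master, errors


old = [(1, "a"), (3, "c"), (5, "e")]
txns = [(2, "A", "b"), (3, "C", "C"), (5, "D", None), (7, "D", None)]
new, bad = update_master(old, txns)
```

Because both inputs are sorted, each file is read once from start to finish and the new master comes out already sorted, with no searching or re-sorting.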