CpSc 360
Chapter 8
· The Invention of the B-Tree
· Introduced in 1972 by R. Bayer and E. McCreight of Boeing Corporation
· B-Tree DOES NOT stand for binary tree. Bayer and McCreight never said what the B stands for; “Bayer”, “Boeing”, “balanced”, and “broad” have all been suggested.
· Statement of the Problem
· Binary searching requires too many seeks
· It can be very expensive to keep the index in sorted order so that we can perform a binary search
· Binary Search Trees as a Solution
· In a binary search tree, more than half of the link fields are null: a tree of N nodes has 2N link fields, and N + 1 of them are null (most of them in the leaves)
· We can add a new node by searching down the tree and attaching the new element as a leaf at the point where the search ends
· After many insertions, the tree begins to get unbalanced, i.e. the distance from the root to some leaf nodes is substantially longer than the distance from the root to other leaf nodes
· The number of seeks needed to reach a node is equal to the length of the path from the root to that node, so an unbalanced tree makes some searches much more expensive than others
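The insertion procedure and the effect of arrival order on path length can be sketched as follows. This is a minimal in-memory sketch, not a disk-based index: the names `Node`, `insert`, `nodes_visited`, and `build` are illustrative, and each node visited stands in for one seek.

```python
class Node:
    """One binary search tree node; a None link plays the role of a null link field."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Search down from the root and attach the key where the search falls off the tree."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root  # duplicate key: nothing to add


def nodes_visited(root, key):
    """Count the nodes read on the path from the root to key (one seek each on disk)."""
    count = 0
    node = root
    while node is not None:
        count += 1
        if key == node.key:
            return count
        node = node.left if key < node.key else node.right
    return None  # key is not in the tree


def build(keys):
    """Insert the keys one at a time, in the order given."""
    root = None
    for k in keys:
        root = insert(root, k)
    return root


mixed = build([4, 2, 6, 1, 3, 5, 7])      # an arrival order that happens to stay balanced
sorted_in = build([1, 2, 3, 4, 5, 6, 7])  # sorted arrival order degenerates into a chain

print(nodes_visited(mixed, 7))      # 3
print(nodes_visited(sorted_in, 7))  # 7
```

The same seven keys cost three node reads in one tree and seven in the other, which is the imbalance problem described above.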
· AVL Trees as a Solution
· Named for a pair of Russian mathematicians, G. M. Adel’son-Vel’skii and E. M. Landis
· An AVL tree is a height-balanced tree, i.e. there is a limit placed on the amount of difference allowed between the heights of any two subtrees sharing a common root. With an AVL tree the maximum allowed difference is 1.
· An AVL tree is called a height-balanced 1-tree or HB(1) tree.
· Two important features of AVL trees
· By setting a maximum allowable difference in the height of any two subtrees, AVL trees guarantee a certain minimum level of performance in searching
· Maintaining a tree in AVL form as new nodes are inserted involves the use of one of a set of four possible rotations. Each rotation is confined to a single, local area of the tree. The most complex rotation requires only five pointer reassignments
· AVL trees are still not a particularly good solution for an index kept on disk because, like other binary trees, they have too many levels and thus require too many seeks
· AVL trees approximate completely balanced trees. The worst-case number of seeks for a completely balanced tree of N keys is
log_2(N+1)
and for an AVL tree the worst-case number of seeks is
1.44 * log_2(N+2)
· Even with balanced trees, reaching a record in a large file (e.g. 1,000,000 records) can take on the order of 20 seeks in a completely balanced tree and up to about 28 in an AVL tree, which is totally unacceptable.
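The 20 and 28 figures can be reproduced directly from the two formulas. The sketch below assumes the commonly cited worst-case AVL bound of 1.44 * log_2(N+2); for a file this large, using N+1 instead changes nothing visible. The function names are illustrative.

```python
import math


def balanced_levels(n):
    """Worst-case levels searched in a completely balanced binary tree of n keys."""
    return math.log2(n + 1)


def avl_levels(n):
    """Worst-case bound on levels searched in an AVL tree of n keys (assumed form)."""
    return 1.44 * math.log2(n + 2)


n = 1_000_000
print(round(balanced_levels(n), 1))  # 19.9
print(round(avl_levels(n), 1))       # 28.7
```

Since a search cannot visit a fraction of a level, these correspond to roughly 20 and 28 seeks respectively.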
· Paged Binary Trees
· A page is a block of information (e.g. a collection of records) stored in a location that can be read with a single seek and read
· For a completely full, balanced paged tree whose pages each hold k records, the worst-case number of seeks is
log_{k+1}(N+1)
· If we have 134,217,727 records, then the numbers of seeks required for a balanced binary tree and for a paged tree with 511 records per node are, respectively:
log_2(134,217,727+1) = 27 seeks for a balanced binary tree
log_{511+1}(134,217,727+1) = 3 seeks for a paged tree with 511 records per node
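The two seek counts above can be checked with exact integer arithmetic, which avoids floating-point rounding in the logarithm. This is a small sketch; `levels_needed` is an illustrative name, and it assumes every page is completely full.

```python
def levels_needed(n_records, keys_per_node):
    """Smallest number of levels h for which a full tree holds at least n_records keys.

    A full tree with k keys per node and h levels holds (k + 1)**h - 1 keys,
    so this is the ceiling of log_{k+1}(n_records + 1).
    """
    fanout = keys_per_node + 1
    levels = 0
    capacity_plus_one = 1  # equals fanout**levels
    while capacity_plus_one < n_records + 1:
        capacity_plus_one *= fanout
        levels += 1
    return levels


n = 134_217_727
print(levels_needed(n, 1))    # 27  (binary tree: one key per node)
print(levels_needed(n, 511))  # 3   (paged tree: 511 keys per page)
```

Both results follow from 134,217,728 = 2^27 = 512^3.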
· Top-down Construction of Paged Trees – Big Problems!
· If we have all the keys in advance, we can sort them, start in the middle of the list (which becomes the root), and build the tree from there
· In general, however, we receive the keys over an extended period of time and add them in a random order
· Trees built through insertion techniques tend to be unbalanced
· Keeping the tree balanced usually means reorganizing the tree, a messy and time-consuming activity
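When every key is known in advance, the middle-of-the-list construction is short to write down. The sketch below uses a binary tree of nested `(left, key, right)` tuples rather than pages, purely to keep it small; `build_balanced` and `height` are illustrative names.

```python
import string


def build_balanced(sorted_keys):
    """Build a tree from sorted keys by making the middle key the root of each subtree."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return (build_balanced(sorted_keys[:mid]),
            sorted_keys[mid],
            build_balanced(sorted_keys[mid + 1:]))


def height(tree):
    """Number of levels in the tree (0 for an empty tree)."""
    if tree is None:
        return 0
    left, _, right = tree
    return 1 + max(height(left), height(right))


keys = list(string.ascii_uppercase)  # 26 keys, already in sorted order
tree = build_balanced(keys)
print(tree[1])       # N
print(height(tree))  # 5
```

Twenty-six keys fit in five levels this way, but the approach only works because the whole key set was available before the tree was built.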
· The problems:
· How do we ensure that the keys in the root page turn out to be good separator keys, dividing up the set of other keys more or less evenly?
· How do we avoid grouping keys, such as C, D, and E, that should not share a page?
· How can we guarantee that each of the pages contains at least some minimum number of keys? If we are working with a larger page size, such as 8,191 keys per page, we want to avoid situations in which a large number of pages each contain only a few dozen keys.
· How can we organize the tree so that it can be traversed in sorted order without going back up and down branches to reach logically adjacent nodes?
· B-Tree Definition
· A B-Tree is a restricted growth multiway search tree. A B-Tree of order m (page size m) is a tree that satisfies the following properties:
· Every node has ≤ m children
· Every node, except the root and the terminal nodes, has ≥ ⌈m/2⌉ children
· The root has at least 2 children unless it is a terminal node
· All terminal nodes appear on the same level
· An internal node with k children contains k–1 key values
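The properties above translate into a small structural check. This is a sketch under stated assumptions: `BNode` is a hypothetical in-memory node, terminal nodes are treated as exempt from the minimum-children rule (they have no children at all), and neither key ordering nor the number of keys in a terminal node is checked, since the listed properties do not mention them.

```python
import math


class BNode:
    """Hypothetical B-Tree node: a key list plus child links (empty for a terminal node)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def is_valid_btree(root, m):
    """Return True if the tree satisfies the listed order-m properties."""
    min_children = math.ceil(m / 2)
    leaf_levels = set()

    def check(node, level, is_root):
        n_children = len(node.children)
        if n_children == 0:                       # terminal node: record its level
            leaf_levels.add(level)
            return True
        if n_children > m:                        # at most m children
            return False
        if is_root and n_children < 2:            # non-terminal root needs >= 2 children
            return False
        if not is_root and n_children < min_children:  # at least ceil(m/2) children
            return False
        if len(node.keys) != n_children - 1:      # k children -> k - 1 keys
            return False
        return all(check(child, level + 1, False) for child in node.children)

    # All terminal nodes must sit on one level.
    return check(root, 0, True) and len(leaf_levels) == 1


good = BNode(["M"], [BNode(["D", "H"]), BNode(["P", "T"])])
bad = BNode(["M"], [BNode(["D"]),
                    BNode(["T"], [BNode(["P"]), BNode(["W"])])])

print(is_valid_btree(good, 4))  # True
print(is_valid_btree(bad, 4))   # False: terminal nodes on two different levels
```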
[Figure: Example B-Tree]
[Figure: B-Tree while inserting P (node labeled TOOBIG)]
[Figure: B-Tree while inserting D (two nodes labeled TOOBIG)]
[Figure: B-Tree after inserting D]
[Figure: Example B-Tree prior to delete]
[Figure: B-Tree after deleting J]
[Figure: B-Tree while deleting M]
[Figure: B-Tree while deleting R (nodes labeled TOOBIGNODE and TOOSMALL)]
[Figure: B-Tree after deleting R]
[Figure: B-Tree while deleting H]
[Figure: B-Tree after deleting H]
[Figure: B-Tree while deleting B]
[Figure: B-Tree after deleting B]