CpSc 360

Lecture 15

 

Chapter 8

 

·      The Invention of the B-Tree

 

·       Invented in 1972 by R. Bayer and E. McCreight of Boeing Corporation

·       B-Tree DOES NOT stand for binary tree.  Bayer and McCreight never explained what the “B” stands for; “Bayer”, “Boeing”, and “balanced” have all been suggested.

 

·      Statement of the Problem

 

·       Binary searching requires too many seeks

·       It can be very expensive to keep the index in sorted order so we can perform a binary search

 

·      Binary Search Trees as a Solution

 

·       In a binary tree, over ½ of the nodes have null link fields (the leaves)

·       We can add a new node by searching down the tree and spawning a new subtree where the new element is to be placed

·       After many insertions, the tree begins to get unbalanced,  i.e. the distance from the root to some leaf nodes is substantially longer than the distance from the root to other leaf nodes.

·       The number of seeks needed to reach a node is equal to the length of the path from the root to that node, so an unbalanced tree makes some searches much more expensive than others
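To make the balance problem concrete, here is a minimal Python sketch. The names `Node`, `insert`, `height`, `build`, and `median_first` are illustrative and do not come from the lecture. It builds a binary search tree from the same 127 keys in two arrival orders and compares the heights, which bound the number of seeks.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key into an unbalanced binary search tree; return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right

def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    levels = 0
    frontier = [] if root is None else [root]
    while frontier:
        levels += 1
        frontier = [child for node in frontier
                    for child in (node.left, node.right)
                    if child is not None]
    return levels

def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root

def median_first(keys):
    """Reorder sorted keys so that each middle key is inserted first."""
    if not keys:
        return []
    mid = len(keys) // 2
    return [keys[mid]] + median_first(keys[:mid]) + median_first(keys[mid + 1:])

keys = list(range(1, 128))                          # 127 keys, already sorted
sorted_height = height(build(keys))                 # 127: the tree is a chain
balanced_height = height(build(median_first(keys))) # 7: a perfectly balanced tree
```

With sorted arrival the tree degenerates into a linked list, so the deepest key costs 127 seeks instead of 7.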

 

·      AVL Trees as a Solution

 

·       Named for a pair of Russian mathematicians, G. M. Adel’son-Vel’skii and E. M. Landis

·       An AVL tree is a height-balanced tree, i.e. there is a limit placed on the amount of difference allowed between the heights of any two subtrees sharing a common root.  For an AVL tree the maximum allowed difference is 1.

·       An AVL tree is called a height-balanced 1-tree or HB(1) tree.

·       Two important features of AVL trees

·       By setting a maximum allowable difference in the height of any two subtrees, AVL trees guarantee a certain minimum level of performance in searching

·       Maintaining a tree in AVL form as new nodes are inserted involves the use of one of a set of four possible rotations.  Each rotation is confined to a single, local area of the tree.  The most complex rotation requires only five pointer reassignments

·       AVL trees are still a poor fit for an index kept on disk because, like other binary trees, they have too many levels and thus require too many seeks

·       AVL trees approximate completely balanced trees.  The worst-case number of seeks for a completely balanced tree is

 

            $\log_2(N+1)$

 

            whereas for an AVL tree the worst-case number of seeks is

 

            $1.44 \log_2(N+2)$

 

·       Even with AVL trees, the number of seeks to reach any record in a large file (e.g. 1,000,000 records) is on the order of 20 to 28, which is unacceptable when every seek is a disk access.
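The 20 to 28 range can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and the AVL figure uses the usual worst-case bound of 1.44 log2(N + 2) (at this file size, adding 1 or 2 to N makes no practical difference).

```python
import math

def balanced_seeks(n):
    """Worst-case seeks in a completely balanced binary tree of n records."""
    return math.log2(n + 1)

def avl_seeks(n):
    """Worst-case seeks in an AVL tree of n records (1.44 * log2(n + 2))."""
    return 1.44 * math.log2(n + 2)

n = 1_000_000
best = balanced_seeks(n)   # roughly 19.9, i.e. 20 levels
worst = avl_seeks(n)       # roughly 28.7
```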

 

·      Paged Binary Trees

 

·       A page is a block of information (e.g. a collection of records) stored contiguously so that the whole block can be fetched with a single seek and read

·       The worst-case number of seeks required to search a paged tree whose pages each hold k records is

 

            $\log_{k+1}(N+1)$

 

·       If we have 134,217,727 records, then the numbers of seeks required for a balanced binary tree and for a paged tree with 511 records per node, respectively, are:

 

                                    $\log_2(134{,}217{,}727 + 1) = 27$ seeks for a balanced binary tree

 

                                    $\log_{511+1}(134{,}217{,}727 + 1) = 3$ seeks for a paged tree with 511 records per node
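The two figures above can be reproduced with exact integer arithmetic. The sketch below assumes a completely full tree in which every page has `fanout` children and `fanout - 1` records, so a tree with h levels holds `fanout**h - 1` records; the function name `levels_needed` is illustrative only.

```python
def levels_needed(n_records, fanout):
    """Smallest number of levels h such that fanout**h - 1 >= n_records."""
    levels = 0
    capacity = 1              # equals fanout**levels
    while capacity - 1 < n_records:
        capacity *= fanout
        levels += 1
    return levels

n = 134_217_727
binary_seeks = levels_needed(n, 2)      # 27, since 2**27 - 1 == n
paged_seeks = levels_needed(n, 512)     # 3, since 512**3 == 2**27
```

Integer multiplication is used instead of a floating-point logarithm so that exact powers such as 2**27 do not suffer from rounding.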

 

·      Top-down Construction of Paged Trees – Big Problems!

 

·       If we have all the keys in advance, we can sort them, start in the middle of the list (which becomes the root), and build the tree

·       In general, we receive the keys over an extended period of time and add them in a random order

·       Trees built through insertion techniques tend to be unbalanced

·       Keeping the tree balanced usually means reorganizing the tree, a messy and time-consuming activity

·       The problems:

·       How do we ensure that the keys in the root page turn out to be good separator keys, dividing up the set of other keys more or less evenly?

·       How do we avoid grouping keys, such as C, D, and E, that should not share a page?

·       How can we guarantee that each of the pages contains at least some minimum number of keys?  If we are working with a larger page size, such as 8,191 keys per page, we want to avoid situations in which a large number of pages each contain only a few dozen keys.

·       How can we organize the tree so that it can be traversed in a sorted order without going back up and down branches to reach logically adjacent nodes?

 

·      B-Tree Definition

 

·       A B-Tree is a restricted growth multiway search tree.  A B-tree of order m (page size m) is a tree that satisfies the following properties:

·       Every node has ≤ m children

·       Every node, except the root and the terminal nodes, has ≥ ⌈m/2⌉ children.

·       The root has at least 2 children unless it is a terminal node

·       All terminal nodes appear on the same level

·       An internal node with k children contains k–1  key values.
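The properties above can be expressed as a small checker. This is a sketch under stated assumptions, not the course's implementation: `BTreeNode` and `check_btree` are hypothetical names, a node with an empty `children` list is treated as a terminal node, and the ordering of keys between a node and its subtrees is not verified.

```python
import math

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty list: terminal node

def check_btree(root, m):
    """Return True if the tree satisfies the order-m structural properties."""
    min_children = math.ceil(m / 2)
    leaf_depths = set()

    def visit(node, depth, is_root):
        if not node.children:                  # terminal node
            leaf_depths.add(depth)
            return True
        k = len(node.children)
        if k > m:                              # at most m children
            return False
        if k < (2 if is_root else min_children):
            return False                       # root: >= 2, others: >= ceil(m/2)
        if len(node.keys) != k - 1:            # k children, k - 1 keys
            return False
        return all(visit(child, depth + 1, False) for child in node.children)

    # All terminal nodes must appear on the same level.
    return visit(root, 0, True) and len(leaf_depths) == 1

good = BTreeNode([10, 20], [BTreeNode([1, 5]),
                            BTreeNode([12, 15]),
                            BTreeNode([25, 30])])
lopsided = BTreeNode([10], [BTreeNode([1]),
                            BTreeNode([20], [BTreeNode([15]), BTreeNode([25])])])
```

Here `check_btree(good, 4)` holds, while `lopsided` fails because its terminal nodes sit on two different levels.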


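[Figure: Example B-Tree]

[Figure: B-Tree after inserting M]

[Figure: B-Tree while inserting J (overfull node labeled TOOBIG)]

[Figure: B-Tree after inserting J]

[Figure: B-Tree while inserting P (overfull node labeled TOOBIG)]

[Figure: Split Interior Node (overfull node labeled TOOBIG)]

[Figure: B-Tree after inserting P]

[Figure: B-Tree while inserting D (overfull node labeled TOOBIG)]

[Figure: B-Tree after splitting interior node (overfull node labeled TOOBIG)]

[Figure: B-Tree while splitting root (overfull node labeled TOOBIG)]

[Figure: B-Tree after inserting D]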
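The insertion sequence above (insert into a terminal node, split any node that becomes too big, promote the middle key, and split the root if necessary) can be sketched in Python as follows. This is a minimal illustration, not the lecture's code: `Node`, `insert`, `inorder`, and `leaf_depths` are hypothetical names, `max_keys` is an assumed per-node key limit (m - 1 for a tree of order m), and keys are assumed to be distinct.

```python
import bisect

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []     # empty list: terminal node

def _insert(node, key, max_keys):
    """Insert key below node. Return (separator, right_node) if node split."""
    i = bisect.bisect_left(node.keys, key)
    if node.children:
        promoted = _insert(node.children[i], key, max_keys)
        if promoted is None:
            return None
        separator, right = promoted        # child split: absorb promoted key
        node.keys.insert(i, separator)
        node.children.insert(i + 1, right)
    else:
        node.keys.insert(i, key)
    if len(node.keys) <= max_keys:
        return None
    # Node is too big: split around the middle key and promote that key.
    mid = len(node.keys) // 2
    separator = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return separator, right

def insert(root, key, max_keys):
    """Insert key and return the (possibly new) root."""
    promoted = _insert(root, key, max_keys)
    if promoted is None:
        return root
    separator, right = promoted            # root split: tree grows one level
    return Node([separator], [root, right])

def inorder(node):
    """Yield every key in sorted order."""
    if not node.children:
        yield from node.keys
        return
    for i, child in enumerate(node.children):
        yield from inorder(child)
        if i < len(node.keys):
            yield node.keys[i]

def leaf_depths(node, depth=0):
    """Set of depths at which terminal nodes occur."""
    if not node.children:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))

root = Node()
for key in "CSDTAMPIBWNGURKEHOLJYQZFXV":
    root = insert(root, key, max_keys=3)
```

Because the tree only ever grows by splitting the root, every terminal node stays on the same level no matter what order the keys arrive in.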


 

 

 

 


[Figure: Example B-Tree prior to delete]

[Figure: B-Tree after deleting J]

[Figure: B-Tree while deleting M]

[Figure: B-Tree while deleting R (nodes labeled TOOBIGNODE and TOOSMALL)]

[Figure: B-Tree after deleting R]

[Figure: B-Tree while deleting H]

[Figure: B-Tree after deleting H]

[Figure: B-Tree while deleting B (node labeled Too small)]

[Figure: B-Tree while deleting B]

[Figure: B-Tree after deleting B]