DB-Ch18
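Search tree constraints and B-tree constraint
Search tree constraints:
1. Within each node, K1 < K2 < ... < Kn-1.
2. For all values X in the subtree pointed to by Pi, Ki-1 < X < Ki (the first subtree has no lower bound and the last subtree has no upper bound).
B-tree constraint: the tree must stay balanced, with all leaf nodes at the same level, and the space wasted by deletions must never become excessive.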
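The two search tree constraints can be checked mechanically. The sketch below is a minimal illustration that assumes a hypothetical `Node` class holding integer keys and child pointers (an empty child list means a leaf). It checks only the two ordering constraints, not the B-tree balance constraint.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Hypothetical node layout: keys K1 < ... < Kn-1, children P1 ... Pn (empty for a leaf)
    keys: list[int]
    children: list["Node"] = field(default_factory=list)


def satisfies_search_tree(node: Node, low: Optional[int] = None, high: Optional[int] = None) -> bool:
    """Check both search tree constraints for every node in the subtree.

    low/high are the bounds Ki-1 and Ki inherited from the parent;
    None means that side is unbounded (first or last subtree).
    """
    keys = node.keys
    # Constraint 1: keys inside a node are strictly increasing.
    if any(a >= b for a, b in zip(keys, keys[1:])):
        return False
    # Constraint 2 (seen from the parent): every key lies strictly between the bounds.
    if any((low is not None and k <= low) or (high is not None and k >= high) for k in keys):
        return False
    if not node.children:
        return True
    if len(node.children) != len(keys) + 1:
        return False
    # Subtree Pi is bounded by Ki-1 and Ki; the outer bounds come from the parent.
    bounds = [low] + keys + [high]
    return all(
        satisfies_search_tree(child, bounds[i], bounds[i + 1])
        for i, child in enumerate(node.children)
    )


good = Node([10, 20], [Node([5]), Node([15]), Node([25])])
bad = Node([10, 20], [Node([5]), Node([25]), Node([15])])  # 25 is not < 20
```

Here `satisfies_search_tree(good)` holds, while `bad` fails because the middle subtree contains 25, which violates K1 < X < K2.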
Types of Single-level Ordered Indexes
1. Primary indexes
2. Clustering indexes
3. Secondary indexes
Primary file organizations
1. Unordered
2. Ordered
3. Hashed
Indexes as Access Paths
A single-level index is an auxiliary ordered file that makes searching for records in the data file more efficient. It is specified on one field of the file (the indexing field) and is called an access path on that field. Its entries are much smaller than the data records, so the index occupies fewer disk blocks. A binary search on the index yields a pointer to the file record.
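A binary search over a single-level index can be sketched with Python's `bisect` module. The index below is hypothetical: sorted indexing-field values paired with record pointers, modelled as (block number, slot) tuples purely for illustration.

```python
import bisect

# Hypothetical single-level index: sorted field values and their record pointers.
index_keys = [3, 8, 15, 21, 42]
index_ptrs = [(0, 1), (0, 4), (1, 0), (2, 2), (3, 1)]


def lookup(keys, ptrs, key):
    """Binary-search the index; return the record pointer, or None if absent."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return ptrs[i]
    return None


ptr = lookup(index_keys, index_ptrs, 21)
```

Because the search runs over the small index rather than the data file, it touches roughly log2 of the number of index blocks, and then one more block to fetch the record itself.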
B-Tree or B+-Tree
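B-tree: a multilevel access structure organized as a balanced tree in which every node except the root is at least half full. In a B-tree of order p, each node holds at most p tree pointers and at most p - 1 search values. B+-tree: a dynamic multilevel index that is a variation of the B-tree in which data pointers are stored only at the leaf nodes. Tree terminology: a tree is formed of nodes, and every node except the root has exactly one parent node. A node with no child nodes is a leaf node; a non-leaf node is an internal node.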
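Because a B+-tree keeps data pointers only in its leaves, a search always descends from the root to a leaf. The sketch below hand-builds a tiny two-level B+-tree; the `Leaf`/`Internal` classes and the string "data pointers" are hypothetical, and it assumes the convention that subtree Pi of an internal node holds values X with Ki-1 < X <= Ki.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Leaf:
    keys: list[int]
    data_ptrs: list[str]  # data pointers exist only at the leaf level


@dataclass
class Internal:
    keys: list[int]
    children: list  # tree pointers only, no data pointers


def search(node, key):
    """Descend to the leaf that may hold `key`; return its data pointer or None."""
    while isinstance(node, Internal):
        # Assumed convention: subtree Pi holds Ki-1 < X <= Ki.
        node = node.children[bisect.bisect_left(node.keys, key)]
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.data_ptrs[i]
    return None


root = Internal(
    [8, 15],
    [Leaf([5, 8], ["r5", "r8"]), Leaf([12, 15], ["r12", "r15"]), Leaf([20, 30], ["r20", "r30"])],
)
```

Note that 8 and 15 appear both in the root and in the leaves: internal nodes only guide the search, while the leaves carry the data pointers.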
Multi-Level Indexes
Because a single-level index is itself an ordered file, we can create a primary index to the index itself. In this case, the original index file is called the first-level index and the index to the index is called the second-level index. We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk block. A multi-level index can be created for any type of first-level index (primary, secondary, clustering) as long as the first-level index occupies more than one disk block.
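The level-building process above can be computed directly. Assuming every index block holds exactly `fan_out` entries (the blocking factor of the index), each new level needs ceil(entries / fan_out) blocks and has one entry per block of the level below it. The numbers in the example are illustrative.

```python
import math


def multilevel_index_levels(first_level_entries: int, fan_out: int) -> list[int]:
    """Return the number of blocks at each index level, first level first.

    Levels are added until the top level fits in a single disk block.
    """
    if first_level_entries < 1:
        raise ValueError("the first level needs at least one entry")
    if fan_out < 2:
        raise ValueError("fan-out must be at least 2")
    blocks = []
    entries = first_level_entries
    while True:
        level_blocks = math.ceil(entries / fan_out)
        blocks.append(level_blocks)
        if level_blocks == 1:
            return blocks
        entries = level_blocks  # next level: one entry per block of this level


levels = multilevel_index_levels(30000, 68)  # 30,000 first-level entries, 68 per block
```

Here the first level needs 442 blocks, the second 7, and the third fits in one block, so a lookup reads one block per level plus the data block.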
Clustering Index
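Defined on an ordered data file whose ordering field is a non-key field, so several records can share the same value (the clustering field). The index is an ordered file with two fields: the first field has the same data type as the clustering field of the data file, and the second field is a disk block pointer. It is an example of a non-dense (sparse) index: it has one entry for each distinct value of the clustering field, pointing to the first block that contains a record with that value.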
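Building a clustering index amounts to scanning the ordered data file and recording the first block in which each distinct clustering-field value appears. The sketch assumes a hypothetical data file represented as a list of blocks, each block a list of record tuples, already ordered on the field at position `field`.

```python
def build_clustering_index(blocks, field):
    """Return sorted (distinct field value, first block number) entries.

    Assumes the file is ordered on `field`, so equal values are contiguous.
    """
    index = []
    for block_no, block in enumerate(blocks):
        for record in block:
            value = record[field]
            if not index or index[-1][0] != value:
                index.append((value, block_no))  # first block holding this value
    return index


# Hypothetical records (dept_no, name), ordered on the non-key field dept_no.
data_blocks = [
    [(1, "A"), (1, "B")],
    [(1, "C"), (2, "D")],
    [(3, "E"), (3, "F")],
]
```

For this file the index has three entries, one per distinct `dept_no`, even though there are six records: that is what makes it non-dense.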
Difference between B-tree and B+-tree
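In a B-tree, pointers to data records exist at all levels of the tree. In a B+-tree, all pointers to data records exist at the leaf-level nodes. Because its internal nodes store no data pointers, a B+-tree fits more search values into each internal node, giving it a higher fan-out, so a B+-tree can have fewer levels than the corresponding B-tree.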
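The fan-out difference can be made concrete by computing the node order p that fits in one disk block. Assuming a node occupies exactly one block with no header, a B-tree node needs p*P + (p - 1)*(V + Pr) <= B, while a B+-tree internal node needs only p*P + (p - 1)*V <= B, where B is the block size, V the key size, P a block pointer and Pr a data pointer. The byte sizes in the example are illustrative.

```python
def btree_order(block_size: int, key_size: int, block_ptr: int, data_ptr: int) -> int:
    # B-tree node: p tree pointers plus p-1 (key, data pointer) pairs.
    # p*P + (p-1)*(V+Pr) <= B  rearranges to  p <= (B + V + Pr) / (P + V + Pr)
    per_key = key_size + data_ptr
    return (block_size + per_key) // (block_ptr + per_key)


def bplus_internal_order(block_size: int, key_size: int, block_ptr: int) -> int:
    # B+-tree internal node: p tree pointers plus p-1 keys, no data pointers.
    # p*P + (p-1)*V <= B  rearranges to  p <= (B + V) / (P + V)
    return (block_size + key_size) // (block_ptr + key_size)


# Illustrative sizes: 512-byte block, 9-byte key, 6-byte block pointer, 7-byte data pointer.
p_btree = btree_order(512, 9, 6, 7)
p_bplus = bplus_internal_order(512, 9, 6)
```

With these sizes a B-tree node holds at most 24 pointers, while a B+-tree internal node holds 34, so the B+-tree reaches the same number of leaves with fewer levels.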
Dynamic Multilevel Indexes Using B-Trees and B+-Trees
Most multi-level indexes use B-tree or B+-tree data structures to solve the insertion and deletion problem of static multi-level indexes. These structures leave space in each tree node (disk block) to allow for new index entries. They are variations of search trees that allow efficient insertion of new search values and deletion of existing ones. Each node corresponds to a disk block and is kept between half-full and completely full. If a node is full, an insertion causes it to split into two nodes, and splitting may propagate to other tree levels. A deletion is quite efficient as long as the node does not become less than half full. If a deletion causes a node to become less than half full, it must be merged with neighboring nodes.
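The split step can be sketched for a single leaf. This is a simplified illustration, not a full B+-tree: a node is a sorted Python list with a hypothetical `capacity`, and propagating the separator into the parent (which may split in turn) is left out. It follows the common B+-tree convention of keeping the first ceil((capacity + 1) / 2) keys in the left node and copying the last of them up as the separator.

```python
import bisect


def insert_into_leaf(keys: list[int], new_key: int, capacity: int):
    """Insert new_key into a sorted leaf holding at most `capacity` keys.

    Returns (left, right, separator); right and separator are None when
    no split is needed. On overflow both halves are at least half full.
    """
    keys = keys.copy()
    bisect.insort(keys, new_key)
    if len(keys) <= capacity:
        return keys, None, None
    mid = (len(keys) + 1) // 2  # left keeps ceil((capacity + 1) / 2) keys
    left, right = keys[:mid], keys[mid:]
    return left, right, left[-1]  # separator copied up into the parent


no_split = insert_into_leaf([3, 7], 5, capacity=3)
split = insert_into_leaf([3, 7, 9], 5, capacity=3)
```

Inserting 5 into a leaf with room simply keeps it sorted; inserting into a full leaf yields `[3, 5]` and `[7, 9]` with separator 5 for the parent.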
Multi-Level Indexes
Such a multi-level index is a form of search tree. However, inserting new index entries and deleting existing ones is a severe problem, because every level of the index is an ordered file that must be kept in order.
Block anchor or anchor record
Each index entry holds the key field value of the first record in a data block; that first record is called the block anchor (anchor record). The primary index is a non-dense (sparse) index because it includes one entry for each disk block of the data file, keyed by that block's anchor record, rather than one entry for every record.
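A sparse primary index and its lookup can be sketched as follows, assuming a hypothetical ordered data file given as a list of blocks, each block a sorted list of record keys. The index stores one (anchor key, block number) entry per block; a search finds the last anchor that is less than or equal to the key and then scans only that block.

```python
import bisect


def build_primary_index(blocks):
    """One entry per data block: (anchor key, block number)."""
    return [(block[0], block_no) for block_no, block in enumerate(blocks)]


def find_block(index, key):
    """Return the only block number that could contain `key`, or None."""
    anchors = [anchor for anchor, _ in index]
    i = bisect.bisect_right(anchors, key) - 1  # last anchor <= key
    return index[i][1] if i >= 0 else None


def contains(blocks, index, key):
    b = find_block(index, key)
    return b is not None and key in blocks[b]


# Hypothetical ordered data file: record keys grouped into blocks.
file_blocks = [[2, 5, 8], [11, 14, 17], [20, 23]]
primary_index = build_primary_index(file_blocks)
```

Eight records need only three index entries here, one per block anchor.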
Indexes are characterized
as dense or sparse
Dense index
has an index entry for every search key value in the data file.
Access structures are called
indexes. They provide secondary access paths, giving access to records based on the values of indexing fields.
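Problem with a primary index
Inserting and deleting records moves records between blocks, so it can change the anchor records of some blocks and force the corresponding index entries to be updated. This is the same problem as inserting into any ordered file, and it can be eased in the same way, for example with an unordered overflow file. Record deletion is handled with deletion markers.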
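The anchor problem can be seen by simulating an insertion into a fully packed ordered file. This sketch assumes a hypothetical file with a fixed blocking factor `bfr`, no free space in its blocks and no overflow file, so every record after the insertion point shifts by one position.

```python
import bisect


def pack(sorted_keys, bfr):
    """Split an ordered file's keys into blocks of `bfr` records each."""
    return [sorted_keys[i:i + bfr] for i in range(0, len(sorted_keys), bfr)]


def anchors_after_insert(sorted_keys, new_key, bfr):
    """Return the block anchors before and after inserting new_key."""
    before = [block[0] for block in pack(sorted_keys, bfr)]
    keys = sorted_keys.copy()
    bisect.insort(keys, new_key)  # records after new_key shift right
    after = [block[0] for block in pack(keys, bfr)]
    return before, after


before, after = anchors_after_insert([2, 5, 8, 11, 14, 17, 20, 23], 6, bfr=3)
```

A single insertion into the first block changes the anchors of the second and third blocks (11 becomes 8, 20 becomes 17), so their primary index entries must change too.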
Primary Index
is defined on the ordering key field of an ordered file. The ordering key field is used to physically order the file records on disk, and every record has a unique value for that field.
A sparse (or non-dense) index
on the other hand, has index entries for only some of the search values (for example, one entry per data block).
Secondary Indexes
provide a secondary means of accessing a data file for which some primary access already exists. The index has two fields:
1. The indexing field, with the same data type as some non-ordering field of the data file.
2. A block pointer or a record pointer.
There can be many secondary indexes for the same file. A secondary index on a key field includes one entry for each record in the data file; hence it is a dense index. Because it has more entries, it needs more storage and a longer search time than a primary index.
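A dense secondary index on a key field can be sketched as one (field value, record pointer) entry per record, sorted by value because the data file is not ordered on that field. The data layout is hypothetical: blocks are lists of record tuples, and record pointers are modelled as (block number, slot) pairs.

```python
import bisect


def build_secondary_index(blocks, field):
    """Dense index: one (field value, (block, slot)) entry per record, sorted by value."""
    entries = [
        (record[field], (block_no, slot))
        for block_no, block in enumerate(blocks)
        for slot, record in enumerate(block)
    ]
    entries.sort()
    return entries


def find_record(index, value):
    """Binary-search the secondary index; return the record pointer or None."""
    i = bisect.bisect_left(index, (value,))  # (value,) sorts before (value, ptr)
    if i < len(index) and index[i][0] == value:
        return index[i][1]
    return None


# Hypothetical records (emp_id, name): ordered on emp_id, indexed here on name.
emp_blocks = [[(1, "kim"), (2, "ada")], [(3, "lee"), (4, "bob")]]
name_index = build_secondary_index(emp_blocks, 1)
```

The index has as many entries as the file has records, which is why a secondary index is dense and larger than a primary index on the same file.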