DBMS CHAPTER 9 (OVERVIEW OF STORAGE AND INDEXING)
Search with range selection example
"Find all Students records with name alphabetically after 'Smith.' "
Search with Equality Selection example
"Find the Students record for the student with sid = 23."
Properties of Alternative 1,2,3
1) At most ONE INDEX on a given collection of data records can use Alternative 1, otherwise, data records are duplicated leading to update issues 2) Alternative 2 and 3 are faster with large data records 3) Alternative 3 is more COMPACT than Alternative 2, but leads to VARIABLE SIZED DATA entries even if search keys are of FIXED LENGTH.
Give the situations where the three file organizations would be best used:
1) Heap - best when file scan and retrieving ALL RECORDS 2) Sorted - best when ORDER important or need a RANGE 3) Clustered - best when doing RANGE search
What operations do we look at when determining an appropriate file organization?
1) Scan 2) Search with equality selection 3) Search with range selection 4) Insert/Delete/Update
What are the three types of file organizations:
1) files of randomly ordered records (Heap Files) 2) files sorted on some field (Sorted Files) 3) files that are hashed on some fields (Clustered) 4) files that are indexed
Three Alternatives for Data Entries in an Index
1. A data entry k∗ is an actual data record (with search key value k). 2. A data entry is a <k, rid> pair, where rid is the record id of a data record with search key value k. 3. A data entry is a <k, rid-list> pair, where rid-list is a list of record ids of data records with search key value k.
Advantage and disadvantage of Clustered index
Advantage: low I/O cost Disadvantage: Expensive to maintain
Primary Index
An index on a set of fields that includes the primary key is called
Secondary Index
An index that is not a primary index is called a secondary index
Alternative 1 always leads to dense index. Why?
Because in Alt 1, only each search key has one corresponding data record. And in Dense indexing, each index has on data record.
When are Sorted Files best used
Best if records must be retrieved in some ORDER, or only a `RANGE' of records is needed. Storage efficient but fast INSERT and DELETE
Alternatives 3 (By reference):
COMPACT version of Alternative 2 where it is a <k, and list of rids with matching data records> This works well for when you have many data records with the SAME KEY value.
Alternative 1 (By Value):
Data records are stored in the actual index file. The key-value pairs aren't separated. If data records are very large, # of pages containing data entries is high. (Not efficient for large data records)
Every sparse index is clustered! Why?
In a sparse index, only some indexes point to the records. Since the data records are sorted, and the indexes are sorted, this means the sorting of the indexes are the same (or close to ) the sorting of the data records.
The flow structure of a Search Key in an index
Index File [Search Key -> Data Entry (k*)] -> Data File [Data Record]
Difference between sorted and hashed files?
Sorted files have continuous pages and continuous records while hashed files do not have continuous pages but have continuous records
Sparse indexes are smaller. why?
Sparse indexes are smaller because there are less indexes being used. Since an index points to a group of records, compared to every index pointing to a data record.
Clustered and Unclustered Example
Suppose that Students records are sorted by age; an index on age that stores data entries in sorted order by age is a clustered index. If in addition we have an index on the gpa field, the latter must be an unclustered index.
A data file can be clustered on at most one search key (T/F)
True;
Alternatives 2 (By reference):
a <k, rid> pair where k is the search k value, and the rid is the record id of where to find the record in the data file.
An index file contains ....
a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k
Dense Index
a data record appears for every index
What is a Search Key
a sequence of FIELDS on which a file is SORTED or HASHED EX: salary, age
What is and index on a file?
an auxiliary structure designed to SPEED up operations
A sparse index must be ...
clustered
You can have at most only one _____________ index for a specific data record
clustered;
Hash based indexes are good for __________ and cannot support ________________.
equality based searches; range based searches
Sparse Index
only some indexes point to data records
When are heap files best used
suitable when typical access is a FILE SCAN retrieving ALL records. This is storage efficient and a SLOW SEARCH.
When are clustered (hashed) Files best used?
suited for range search with index but NOT STORAGE efficient but INSERT and DELETE are fast
Clustered Index
the ordering of data RECORDS is the same as or close to the ordering of data ENTRIES (Alternative 1 is clustered by definition)
Unclustered Index
the ordering of data records is NOT the same as or close to the ordering of data entries (we can have several unclustered indexes on a data file.)
A dense index must be ...
unclustered