DBMS CHAPTER 9 (OVERVIEW OF STORAGE AND INDEXING)

Ace your homework & exams now with Quizwiz!

Search with range selection example

"Find all Students records with name alphabetically after 'Smith.' "

Search with Equality Selection example

"Find the Students record for the student with sid = 23."

Properties of Alternative 1,2,3

1) At most ONE INDEX on a given collection of data records can use Alternative 1, otherwise, data records are duplicated leading to update issues 2) Alternative 2 and 3 are faster with large data records 3) Alternative 3 is more COMPACT than Alternative 2, but leads to VARIABLE SIZED DATA entries even if search keys are of FIXED LENGTH.

Give the situations where the three file organizations would be best used:

1) Heap - best when file scan and retrieving ALL RECORDS 2) Sorted - best when ORDER important or need a RANGE 3) Clustered - best when doing RANGE search

What operations do we look at when determining an appropriate file organization?

1) Scan 2) Search with equality selection 3) Search with range selection 4) Insert/Delete/Update

What are the three types of file organizations:

1) files of randomly ordered records (Heap Files) 2) files sorted on some field (Sorted Files) 3) files that are hashed on some fields (Clustered) 4) files that are indexed

Three Alternatives for Data Entries in an Index

1. A data entry k∗ is an actual data record (with search key value k). 2. A data entry is a <k, rid> pair, where rid is the record id of a data record with search key value k. 3. A data entry is a <k, rid-list> pair, where rid-list is a list of record ids of data records with search key value k.

Advantage and disadvantage of Clustered index

Advantage: low I/O cost Disadvantage: Expensive to maintain

Primary Index

An index on a set of fields that includes the primary key is called

Secondary Index

An index that is not a primary index is called a secondary index

Alternative 1 always leads to dense index. Why?

Because in Alt 1, only each search key has one corresponding data record. And in Dense indexing, each index has on data record.

When are Sorted Files best used

Best if records must be retrieved in some ORDER, or only a `RANGE' of records is needed. Storage efficient but fast INSERT and DELETE

Alternatives 3 (By reference):

COMPACT version of Alternative 2 where it is a <k, and list of rids with matching data records> This works well for when you have many data records with the SAME KEY value.

Alternative 1 (By Value):

Data records are stored in the actual index file. The key-value pairs aren't separated. If data records are very large, # of pages containing data entries is high. (Not efficient for large data records)

Every sparse index is clustered! Why?

In a sparse index, only some indexes point to the records. Since the data records are sorted, and the indexes are sorted, this means the sorting of the indexes are the same (or close to ) the sorting of the data records.

The flow structure of a Search Key in an index

Index File [Search Key -> Data Entry (k*)] -> Data File [Data Record]

Difference between sorted and hashed files?

Sorted files have continuous pages and continuous records while hashed files do not have continuous pages but have continuous records

Sparse indexes are smaller. why?

Sparse indexes are smaller because there are less indexes being used. Since an index points to a group of records, compared to every index pointing to a data record.

Clustered and Unclustered Example

Suppose that Students records are sorted by age; an index on age that stores data entries in sorted order by age is a clustered index. If in addition we have an index on the gpa field, the latter must be an unclustered index.

A data file can be clustered on at most one search key (T/F)

True;

Alternatives 2 (By reference):

a <k, rid> pair where k is the search k value, and the rid is the record id of where to find the record in the data file.

An index file contains ....

a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k

Dense Index

a data record appears for every index

What is a Search Key

a sequence of FIELDS on which a file is SORTED or HASHED EX: salary, age

What is and index on a file?

an auxiliary structure designed to SPEED up operations

A sparse index must be ...

clustered

You can have at most only one _____________ index for a specific data record

clustered;

Hash based indexes are good for __________ and cannot support ________________.

equality based searches; range based searches

Sparse Index

only some indexes point to data records

When are heap files best used

suitable when typical access is a FILE SCAN retrieving ALL records. This is storage efficient and a SLOW SEARCH.

When are clustered (hashed) Files best used?

suited for range search with index but NOT STORAGE efficient but INSERT and DELETE are fast

Clustered Index

the ordering of data RECORDS is the same as or close to the ordering of data ENTRIES (Alternative 1 is clustered by definition)

Unclustered Index

the ordering of data records is NOT the same as or close to the ordering of data entries (we can have several unclustered indexes on a data file.)

A dense index must be ...

unclustered


Related study sets

SOCIOLOGY EXAM 2: Chapter 7: Social Class: The Structure of Inequality

View Set

Chapter 40 Management of patients with gastric and duodenal disorders

View Set

PN Pediatric Nursing Online Practice 2023 B

View Set

Grondslagen van de Chemie: definities

View Set

Spanish 2 - Lección 6 - Estructura: 6.2 Autoevaluación

View Set