Hashing


In a successful search

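With separate chaining, the number of nodes to examine is about 1 + Lambda/2 on average: the node holding the key plus roughly half of the other nodes in its chain.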

In an unsuccessful search

The number of nodes to examine is Lambda on average.

General Ideas when Hashing?

- Each key is mapped into some number in the range 0 to TableSize − 1 and placed in the appropriate cell.
- The mapping is called the hash function.
- It should be simple to compute.
- Ideally, it should ensure that any two distinct keys get different cells.

Example: insert {5, 15, 6, 3, 27, 8} into a table with TableSize = 10. If position h(key) = key mod TableSize is occupied, apply linear probing: the i-th probe is (h(key) + i) % TableSize, for i = 1, 2, 3, 4, ...

Index:   0   1   2   3   4   5   6   7   8   9
Value:   -   -   -   3   -   5  15   6  27   8
Cells 5 to 9 form the primary cluster.
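
The insertions above can be replayed with a short sketch. It assumes integer keys, a table of 10 cells (inferred from the indexes 0 to 9), and a hypothetical helper named `linear_probe_insert`; none of these names come from the original card.

```python
def linear_probe_insert(table, key):
    """Insert key using h(key) = key % len(table) and linear probing."""
    size = len(table)
    home = key % size
    for i in range(size):              # i = 0 is the home cell, then i = 1, 2, ...
        slot = (home + i) % size
        if table[slot] is None:        # first free cell wins
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


table = [None] * 10                    # assumed TableSize = 10
slots = [linear_probe_insert(table, k) for k in (5, 15, 6, 3, 27, 8)]
print(slots)   # [5, 6, 7, 3, 8, 9]
print(table)   # [None, None, None, 3, None, 5, 15, 6, 27, 8]
```

Only 5 and 3 land in their home cells; 15, 6, 27, and 8 are each pushed one cell to the right, which is how the run of occupied cells from 5 to 9 builds up.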

Define Quadratic probing

A collision resolution method that eliminates the primary clustering problem of linear probing

Compare: AVL Tree vs. Hash Table Average Complexity? Find min/max? Items in a range? Sorted input?

                     AVL Tree                    Hash Table
Average complexity   O(log N)                    O(1)
Find min/max         Yes                         No
Items in a range     Yes                         No
Sorted input         Very bad (many rotations)   No problems

Advantages and Disadvantages of Separate Chaining

Advantages:
- Simple to implement.
- The hash table never fills up; we can always add more elements to a chain.
Disadvantages:
- Parts of the table/array might never be used.
- Uses extra space for links.
- As chains get longer, search time increases, up to O(n) in the worst case.

Advantages and Disadvantages of Open Addressing

Advantages:
- All items are stored in the hash table itself. There is no need for another data structure.
Disadvantages:
- The keys of the objects to be hashed must be distinct.
- Dependent on choosing a proper table size.
- Requires the use of a three-state (EMPTY, OCCUPIED, DELETED) flag in each cell.

For Collision Resolution what is Separate Chaining?

All keys that map to the same table location are kept in a linked list
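
A minimal sketch of separate chaining, under some loud assumptions: keys are integers hashed with key mod TableSize, and each chain is a Python list standing in for the linked list. The class name `ChainedHashTable` and its methods are illustrative, not taken from the source.

```python
class ChainedHashTable:
    """Separate chaining: each cell holds the chain of keys that hash to it."""

    def __init__(self, table_size=11):
        # One chain per cell; a Python list stands in for a linked list here.
        self.buckets = [[] for _ in range(table_size)]
        self.count = 0

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]   # assumes integer keys

    def insert(self, key):
        bucket = self._bucket(key)
        if key not in bucket:          # traverse the chain to avoid duplicates
            bucket.append(key)
            self.count += 1

    def contains(self, key):
        return key in self._bucket(key)    # O(1) hash plus chain traversal

    def remove(self, key):
        bucket = self._bucket(key)
        if key in bucket:
            bucket.remove(key)
            self.count -= 1

    def load_factor(self):
        return self.count / len(self.buckets)


table = ChainedHashTable(table_size=10)
for key in (5, 15, 25, 6):
    table.insert(key)
print(table.buckets[5])      # [5, 15, 25]
print(table.load_factor())   # 0.4
```

The keys 5, 15, and 25 all map to cell 5 and simply share one chain, so the table never "fills up", but a lookup in that cell has to walk the chain.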

Define a collision

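A collision occurs when two distinct keys "hash" to the same value, that is, to the same table cell. Designing a hash table means choosing a hash function and deciding what to do when a collision happens.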

What is the bad news for quadratic probing?

With quadratic probing there is NO guarantee of finding an empty cell once the table gets more than half full, or even before the table gets half full if the table size is not prime. Theorem: if the table is half empty (Lambda < 1/2) and TableSize is prime, then we are always guaranteed to be able to insert a new element.
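
To see why the table size matters, the sketch below lists which cells quadratic probing can ever reach from one home position. It assumes the probe function c(i) = i * i and a hypothetical helper `quadratic_cells`; the table sizes 16 and 11 are chosen only for illustration.

```python
def quadratic_cells(home, table_size):
    """Distinct cells reached by (home + i*i) % table_size for i = 0, 1, 2, ..."""
    return sorted({(home + i * i) % table_size for i in range(table_size)})


print(quadratic_cells(0, 16))   # [0, 1, 4, 9]
print(quadratic_cells(0, 11))   # [0, 1, 3, 4, 5, 9]
```

With 16 cells (not prime) only 4 cells are reachable, so an insert can fail while the table is mostly empty. With 11 cells (prime) 6 of the 11 cells are reachable, so as long as fewer than half of the cells are occupied, at least one reachable cell must be empty.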

Define Double Hashing

General idea:
- Given two good hash functions u and v, it is very unlikely that for some key, u(key) == v(key).
- So make the probe function f(i) = i * v(key).
Detail: make sure v(key) can never be 0, otherwise every probe lands on the same cell.
Formula, writing h1 for u and h2 for v: the i-th probe is (h1(key) + i * h2(key)) mod TableSize.
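
A small sketch of the formula, assuming integer keys, h1(key) = key mod TableSize, and a second hash h2(key) = R − (key mod R) with R a prime smaller than TableSize. That choice of h2 is a common textbook one and is an assumption here, as are the function name and the values 11 and 7.

```python
def double_hash_probes(key, table_size, r):
    """Probe sequence (h1(key) + i * h2(key)) % table_size for i = 0, 1, 2, ..."""
    h1 = key % table_size
    h2 = r - (key % r)                 # assumed second hash; always in 1..r, never 0
    return [(h1 + i * h2) % table_size for i in range(table_size)]


print(double_hash_probes(49, 11, 7)[:4])   # [5, 1, 8, 4]
print(double_hash_probes(27, 11, 7)[:4])   # [5, 6, 7, 8]
```

Both keys start at cell 5, but their step sizes differ (7 and 1), so they follow different paths after the first collision. Because 11 is prime, every step size from 1 to 7 eventually visits all 11 cells.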

Which collision resolution strategy is the best choice for hash tables?

Gonnet and Baeza-Yates compare several hashing strategies; their results suggest that quadratic probing is the fastest method.

Define the load factor (lambda) of a hash table.

Lambda = N / TableSize, where N is the number of items in the table.

Important consideration when picking the table size.

The choice of hash function needs to be carefully considered. If the table size is 10 and the keys all end in zero, the hash function Key mod TableSize is a bad choice: every key lands in cell 0.
- It is a good idea to ensure that the table size is prime. Why? Real-life data tends to have a pattern.
- "Multiples of 61" are probably less likely than "multiples of 60".
- If the input keys are random integers, then the function Key mod TableSize is very simple to compute and distributes the keys evenly.
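
The effect is easy to check with a quick count. This sketch assumes 20 made-up keys that all end in zero and compares a table of 10 cells with a prime table of 11 cells; `cell_counts` is a hypothetical helper name.

```python
from collections import Counter


def cell_counts(keys, table_size):
    """How many keys land in each cell under h(key) = key % table_size."""
    return Counter(key % table_size for key in keys)


keys = range(0, 200, 10)               # 20 illustrative keys that all end in zero
print(cell_counts(keys, 10))           # Counter({0: 20})
print(len(cell_counts(keys, 11)))      # 11
```

With TableSize = 10 all 20 keys collide in cell 0. With TableSize = 11 the same keys spread over all 11 cells, with at most 2 keys per cell.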

Explain Open Addressing

Important points:
- All items are stored in the hash table itself.
- In addition to the cell data (if any), each cell keeps one of three states: EMPTY, OCCUPIED, DELETED.
- While inserting, if a collision occurs, alternative cells are tried until an empty cell is found.
- Deletion (lazy deletion): when a key is deleted, the slot is marked as DELETED.
- Probe sequence: the sequence of array indexes that is followed in searching for an empty cell during an insertion, or in searching for a key during find or delete operations.

Quadratic probing is better than linear probing because it eliminates primary clustering; however, what is a possible drawback?

It may result in secondary clustering: if h(k1) = h(k2), the probing sequences for k1 and k2 are exactly the same. This sequence of locations is called a secondary cluster.
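
A short sketch makes the point concrete. It assumes h(k) = k mod TableSize, the probe function c(i) = i * i, and a table of 11 cells; the helper name `quadratic_probes` and the keys 5 and 27 are illustrative.

```python
def quadratic_probes(key, table_size, count):
    """First `count` cells probed: (h(key) + i*i) % table_size."""
    home = key % table_size
    return [(home + i * i) % table_size for i in range(count)]


print(quadratic_probes(5, 11, 4))    # [5, 6, 9, 3]
print(quadratic_probes(27, 11, 4))   # [5, 6, 9, 3]
```

The keys 5 and 27 share the home cell 5, so they compete for exactly the same cells in the same order.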

Given integer values, what is the hash function?

Key mod TableSize

Define primary clustering

With linear probing, keys tend to cluster around the table locations that they originally hash to, forming long runs of occupied cells.

Given the load factor Lambda, what is the general rule for separate chaining?

Make the table size about as large as the number of elements expected, so that Lambda ~= 1.

Can we eliminate collisions?

No. We can reduce the number of collisions and we have ways of handling them, but we cannot remove the possibility of collisions.

What is the time complexity of a search with separate chaining?

O(1) to evaluate the hash function + the time to traverse the list.

Running time complexity of hash table operations

On average, a hash table with a good hash function achieves O(1) inserts, searches, and removes, but in the worst case an operation may require O(N).

Is there a way to use the "unused" space in the table/array instead of using chains to make more space?

Yes: open addressing. Main idea: use the empty space in the table itself.

Three types of Collision Resolutions

- Separate chaining
- Quadratic probing
- Double hashing

What is one main issue when it comes to hashing? What limitations do we have?

There are a finite number of cells and a virtually infinite supply of keys, so it is impossible to guarantee that any two distinct keys get different cells. A computer cannot provide enough memory for one cell per possible key.

When should Hash-Table be used?

Use a hash table if there is any suspicion of SORTED input and NO ordering information is required.

How does the probe function c(i) = i change for quadratic probing?

We can avoid primary clustering by replacing the linear probe function c(i) = i with the quadratic c(i) = i^2, or more generally: bucket = (Hash(item->key) + c1 * i + c2 * i * i) % N

Given the following table (linear probing, h(key) = key mod 10):

Index:     0    1    2    3    4    5    6    7    8    9
Element:   8  109   10    -    -    -    -    -   38   19

find(109) = ? find(58) = ? delete(38) = ? find(8) = ?

find(109) = 1 (h(109) = 9: T[9] = 19 ≠ 109, T[0] = 8 ≠ 109, T[1] = 109, found).
find(58) = null (T[8], T[9], T[0], T[1], and T[2] ≠ 58, T[3] = null).
delete(38): T[8] is marked DELETED ("no data, don't stop").
find(8): T[8] is DELETED (no data), move to next; T[9] = 19 ≠ 8, move to next; T[0] = 8, 8 = 8, YES: find(8) = 0.
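
The trace above can be reproduced with a small open addressing table that uses lazy deletion. This is a sketch under stated assumptions: integer keys, h(key) = key mod 10, linear probing, distinct keys, and an insertion order (38, 19, 8, 109, 10) that was picked because it produces the table in the question; the original card does not give the order. The class name `LinearProbingTable` is hypothetical.

```python
EMPTY, OCCUPIED, DELETED = "EMPTY", "OCCUPIED", "DELETED"


class LinearProbingTable:
    """Open addressing with linear probing and three-state cells."""

    def __init__(self, table_size):
        self.keys = [None] * table_size
        self.state = [EMPTY] * table_size

    def _probe(self, key):
        size = len(self.keys)
        home = key % size
        for i in range(size):
            yield (home + i) % size

    def insert(self, key):
        # Assumes key is not already stored (keys must be distinct).
        for slot in self._probe(key):
            if self.state[slot] != OCCUPIED:     # EMPTY or DELETED cells are reusable
                self.keys[slot] = key
                self.state[slot] = OCCUPIED
                return slot
        raise RuntimeError("table is full")

    def find(self, key):
        for slot in self._probe(key):
            if self.state[slot] == EMPTY:        # a never-used cell ends the search
                return None
            if self.state[slot] == OCCUPIED and self.keys[slot] == key:
                return slot
            # DELETED cell or a different key: keep probing
        return None

    def delete(self, key):
        slot = self.find(key)
        if slot is not None:
            self.state[slot] = DELETED           # lazy deletion: mark, do not empty
        return slot


table = LinearProbingTable(10)
for key in (38, 19, 8, 109, 10):                 # assumed insertion order
    table.insert(key)
print(table.find(109))    # 1
print(table.find(58))     # None
print(table.delete(38))   # 8
print(table.find(8))      # 0
```

If delete simply reset T[8] to EMPTY, find(8) would stop at its home cell and wrongly report that 8 is missing; the DELETED state is what keeps the probe going.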

Common probe sequences are of the form

h_i(key) = (h(key) + c(i)) mod TableSize, where i = 0, 1, ..., TableSize − 1 and c(0) = 0. The function c(i) is used to resolve collisions.
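
The general form can be written once and specialised by passing in c(i). The sketch assumes integer keys with h(key) = key mod TableSize; `probe_sequence` is a hypothetical name, and key 27 with TableSize 10 is just a sample input.

```python
def probe_sequence(key, table_size, c):
    """Cells h_i(key) = (h(key) + c(i)) % table_size for i = 0 .. table_size - 1."""
    home = key % table_size            # h(key) = key mod TableSize
    return [(home + c(i)) % table_size for i in range(table_size)]


linear = probe_sequence(27, 10, lambda i: i)           # c(i) = i
quadratic = probe_sequence(27, 10, lambda i: i * i)    # c(i) = i^2
print(linear[:5])      # [7, 8, 9, 0, 1]
print(quadratic[:5])   # [7, 8, 1, 6, 3]
```

Both sequences start at the home cell 7 because c(0) = 0; they differ only in how far each later probe jumps.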

