Hashing

observations on hashing

1. Each element in the table can potentially be stored and accessed in O(1) time.
2. Hashing can map a large key space into a relatively small range of integers, which are used as the indices of the hash table. In the example, the key space [6-189] is mapped to [0-10] by h(k) = k mod 11.
3. Hashing can be very efficient in terms of both time and space complexity.

Collision

A collision or clash occurs when two different inputs to a function, typically one used to compress large data items into a smaller or fixed size, produce the same output, called (depending on the application) a hash value, checksum, fingerprint, or digest.

Extra work for retrieval process

A hash function that produces a hashtable without any collisions is called a perfect hash function. A perfect hash function requires good prior knowledge of the set of potential keys. Two states are sufficient for each cell of the hashtable, namely empty and occupied.
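
As a minimal sketch of the definition above, the following Python checks whether a hash function is perfect for a known set of keys. The name is_perfect and the choice h(k) = k mod 11 are assumptions made for illustration only.

```python
def is_perfect(h, keys):
    """Return True if h maps every key in `keys` to a distinct hash code."""
    codes = [h(k) for k in keys]
    return len(set(codes)) == len(codes)


def h(k):
    # Illustrative hash function (an assumption for this sketch).
    return k % 11


print(is_perfect(h, [7, 31, 159, 189, 23, 6]))  # True: codes 7, 9, 5, 2, 1, 6
print(is_perfect(h, [29, 93, 31, 159, 51]))     # False: 93 and 159 both give 5
```

The check only works when the key set is known in advance, which is why a perfect hash function needs prior knowledge of the keys.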

Example 2.21 Suppose that the hash function h(k) = k mod 11 is used again and the hash table is empty initially. Show the content of the hash table after inserting the data (29, 93, 31, 159, 51, 189, 27, 23, 17, 9).

Again we compute the hash codes first and get (7, 5, 9, 5, 7, 2, 5, 1, 6, 9). There are duplicate hash codes: two 7s, three 5s and two 9s. This means that some data are allocated to the same address: 93, 159 and 27 all map to address 5, 29 and 51 both map to address 7, and 31 and 9 both map to address 9. We say, for example, that 'data 93 and 159 collide', meaning that they are both mapped to the same address (5 in this example). This is the so-called collision problem in hashing.
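
The hash codes and the colliding groups in this example can be reproduced with a short Python sketch. The helper name find_collisions is hypothetical; only h(k) = k mod 11 and the data come from the example.

```python
from collections import defaultdict


def h(k, size=11):
    """Hash function from the example: h(k) = k mod 11."""
    return k % size


def find_collisions(keys, size=11):
    """Group keys by hash code; keep only codes shared by two or more keys."""
    groups = defaultdict(list)
    for k in keys:
        groups[h(k, size)].append(k)
    return {code: ks for code, ks in groups.items() if len(ks) > 1}


keys = [29, 93, 31, 159, 51, 189, 27, 23, 17, 9]
print([h(k) for k in keys])   # [7, 5, 9, 5, 7, 2, 5, 1, 6, 9]
print(find_collisions(keys))  # {7: [29, 51], 5: [93, 159, 27], 9: [31, 9]}
```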

why use double hashing

As we can see, double hashing is more efficient than linear probing in this example. After one rehashing, only 9 still collides with 31. This means that one probe is not always sufficient with double hashing. In this case, we need to take another approach to resolve the collision for 9. For example, we can apply linear probing to either of the hash functions, or apply an alternative rehash function. We shall leave you to solve the collision for 9.

what is closed address hashing

Closed address hashing is one easy solution for collisions. The method chains the collided data together, using an array of linked lists. The size of the hash table remains the same during closed address hashing.

collisions and hashing

Collisions are unavoidable in general because it is possible for two keys to have an identical hash code

hash table example

Hash table example: here, we construct a hash table for storing and retrieving data about the citizens of a county, using each citizen's social security number as the key. Let's assume that the table size is 12; the hash function is then h(key) = key mod 12, which gives the index into the array implementation.

what is a hash table?

A hash table is the result of storing the data in a smaller table, which uses the hash function to decide where each item goes.

what is hashing

Hashing is another technique of storing and retrieving data.

what is hashing?

Hashing is the process of mapping a large number of data items to a smaller table with the help of a hash function.

rehashing

It is possible that a sequence of alternative hashing addresses will be allocated before a free hash cell is found. If a hash cell is occupied, a new hash code will be generated. If the new location is occupied again, a new hash code will be generated again. The process of computing the alternative addresses is called rehashing or probing.

Linear probing

Linear probing simply allocates the collided datum to the next available location. For example, if the first hash location h1 is occupied, then h1 + 1 is offered if it is empty; otherwise, the next location h1 + 2 is offered, and so on.

double hashing goal

To avoid secondary rehashing, we introduce a rehashing technique called double hashing

Example 2.23 Suppose the hash function is h(k) = k mod 11, and the rehash function is rh(k) = (k + 1) mod 11. The hash table is empty initially. Show the content of the hash table after inserting the data (29, 93, 31, 159, 51, 189, 27, 23, 17, 9).

Solution Again we compute the hash codes first, using the hash function h(k) = k mod 11, and get (7, 5, 9, 5, 7, 2, 5, 1, 6, 9). Having found collisions for 159, 51, 27, 17 and 9 at hash cell H[i], we rehash each of them by probing the next hash cell H[i + 1]. This process continues until each of them can be placed in a free hash cell. For example, since cell H[5] is occupied, we probe cell H[6] and find it available, so we place 159 in cell H[6]. Similarly, 51 is placed in cell H[8] after a collision is found in cell H[7]. There are 5 linear probes, at cell locations 6, 7, 8, 9 and 10, before 27 is placed in cell H[10]. There are 5 probes, at cell locations 7, 8, 9, 10 and 0, before 17 is placed in cell H[0]. Finally, there are 5 probes, at cell locations 10, 0, 1, 2 and 3, before 9 is placed in cell H[3].

The process can be described precisely as rehashing 159, 51, 27, 17 and 9 as follows:
rh(159) = (159 + 1) mod 11 = 160 mod 11 = 6 (1 probe)
rh(51) = (51 + 1) mod 11 = 52 mod 11 = 8 (1 probe)
rh1(27) = (27 + 1) mod 11 = 6, rh2(27) = (27 + 2) mod 11 = 7, rh3(27) = (27 + 3) mod 11 = 8, rh4(27) = (27 + 4) mod 11 = 9, rh5(27) = (27 + 5) mod 11 = 10 (5 probes)
rh1(17) · · · rh5(17) = (17 + 5) mod 11 = 0 (5 probes)
rh1(9) · · · rh5(9) = (9 + 5) mod 11 = 14 mod 11 = 3 (5 probes)

The content of the hashtable is (rehashed data are marked with *, and '-' is an empty cell):
i     0    1    2    3    4    5    6    7    8    9    10
H[i]  17*  23   189  9*   -    93   159* 29   51*  31   27*
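
The linear probing procedure in this example can be sketched in Python as follows. The function name insert_linear and the probes dictionary are hypothetical; the i-th probe for key k inspects cell (k + i) mod 11, as in the rehash formulas above.

```python
def insert_linear(keys, size=11):
    """Insert keys with h(k) = k mod size, resolving collisions by linear probing.

    Returns the table and, for each key, the number of rehash probes it needed.
    """
    table = [None] * size
    probes = {}
    for k in keys:
        for i in range(size):
            cell = (k + i) % size  # i = 0 is the home cell h(k)
            if table[cell] is None:
                table[cell] = k
                probes[k] = i
                break
        else:
            raise RuntimeError("hash table is full")
    return table, probes


table, probes = insert_linear([29, 93, 31, 159, 51, 189, 27, 23, 17, 9])
print(table)   # [17, 23, 189, 9, None, 93, 159, 29, 51, 31, 27]
print({k: n for k, n in probes.items() if n > 0})
# {159: 1, 51: 1, 27: 5, 17: 5, 9: 5}
```

The probe counts show how collided data cluster together: once cells 5 to 10 fill up, each later collision has to walk past the whole run.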

Example 2.22 Suppose the hash function is h(k) = k mod 11, and collisions are resolved by closed address hashing. The hash table is empty initially. Show the content of the hash table after inserting the data (29, 93, 31, 159, 51, 189, 27, 23, 17, 9).

Solution Again we compute the hash codes first and get (7, 5, 9, 5, 7, 2, 5, 1, 6, 9). Collisions occur since 159 and 27 have the same hash code as 93, 51 has the same hash code as 29, and 9 has the same hash code as 31. We therefore link 93, 159 and 27 together, link 29 and 51 together, and link 31 and 9 together as follows. The content of the hashtable is (where the symbol '↓' represents a link and '-' is an empty cell):

i     0    1    2    3    4    5    6    7    8    9    10
H[i]  -    23   189  -    -    93   17   29   -    31   -
                               ↓         ↓         ↓
                               159       51        9
                               ↓
                               27

During the retrieval process, not only each hash cell but also each linked list is searched. As we can see, the addresses of the hashtable remain the same, despite the extra storage space required for the linked lists.
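
A minimal Python sketch of closed address hashing for this data is shown below. It uses Python lists in place of linked lists, which is a simplification; the names insert_chained and contains are hypothetical.

```python
def insert_chained(keys, size=11):
    """Build a table of chains; each cell holds every key hashing to it."""
    table = [[] for _ in range(size)]
    for k in keys:
        table[k % size].append(k)  # h(k) = k mod size
    return table


def contains(table, k):
    """Retrieval searches only the chain at cell h(k)."""
    return k in table[k % len(table)]


table = insert_chained([29, 93, 31, 159, 51, 189, 27, 23, 17, 9])
print(table[5])             # [93, 159, 27]
print(table[7], table[9])   # [29, 51] [31, 9]
print(contains(table, 27))  # True
print(contains(table, 16))  # False
```

The table keeps its 11 addresses however many keys are inserted; only the chains grow.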

Example of double hashing 2.24 Suppose the hash function is h(k) = k mod 11, and the rehash function is rh(k) = k mod 13. The hash table is empty initially. Show the content of the hash table after inserting the data (29, 93, 31, 159, 51, 189, 27, 23, 17, 9).

Solution Again we compute the hash codes first and get (7, 5, 9, 5, 7, 2, 5, 1, 6, 9). Rehash 159, 51, 27, 23 and 9:
rh(159) = 159 mod 13 = 3
rh(51) = 51 mod 13 = 12
rh(27) = 27 mod 13 = 1
rh(23) = 23 mod 13 = 10
rh(9) = 9 mod 13 = 9
Note that 23 has to be rehashed because its first cell H[1] has already been taken by the rehashed 27.

The content of the hashtable becomes (rehashed data are marked with *, and '-' is an empty cell; 9 is not yet placed because cell H[9] is still occupied by 31):
i     0    1    2    3    4    5    6    7    8    9    10   11   12
H[i]  -    27*  189  159* -    93   17   29   -    31   23*  -    51*
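
The scheme used in this example (one hash function, then a single alternative hash function) can be sketched in Python as below. The function name insert_double is hypothetical, and the table size of 13 is an assumption based on the cell indices 0 to 12 in the solution. Keys that still collide after the one rehash are returned separately rather than resolved.

```python
def insert_double(keys, size=13):
    """Try h(k) = k mod 11 first, then rh(k) = k mod 13 on a collision.

    Returns the table and the keys that are still unplaced after one rehash.
    """
    table = [None] * size
    unplaced = []
    for k in keys:
        for cell in (k % 11, k % 13):  # first hash, then the rehash
            if table[cell] is None:
                table[cell] = k
                break
        else:
            unplaced.append(k)  # both candidate cells are occupied
    return table, unplaced


table, unplaced = insert_double([29, 93, 31, 159, 51, 189, 27, 23, 17, 9])
print(table)
# [None, 27, 189, 159, None, 93, 17, 29, None, 31, 23, None, 51]
print(unplaced)  # [9]
```

Only 9 remains unplaced, since both of its candidate cells are H[9].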

what does hash function do?

The hash function is primarily responsible for mapping each original data item to a location in the smaller table.

Collision resolving

Collisions can be resolved in various ways. Of course, the hash function can be adjusted, but this is not easy. Collisions are caused by the attempt to map a large key space to a limited hash table range. A natural solution is hence to re-allocate the collided data elsewhere.

hash table

The data structure involved is essentially a one-dimensional array called a hash table.

essence of hashing

The essence of hashing is to provide a faster way of searching than linear or binary search: the location of a datum is computed from its key rather than found by comparing keys.

Open address hashing

This approach is to store all the elements right in the hashtable array without using any extra linked lists. The collided data need to be reallocated to other available cells. Thus additional cells may be required and the original hashtable may be extended.

calculation in linear probing

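This is equivalent to using a similar hash function for rehashing when there is a collision: rh(k) = (k + 1) mod n, where n is the size of the hash table (11 in the examples). If that cell is also occupied, the i-th probe uses (k + i) mod n.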

Double hashing

Double hashing applies an alternative hash function for probing. If the first hashing is unsuccessful, the second hash function is used to resolve the collision.

Example 2.18 Assume that a hash function h(k) = k mod 11 is used and the hash table is empty initially. Show the content of the hash table after inserting the data (7, 31, 159, 189, 23, 6).

We first compute the hash code for each datum using the hash function: h(7) = 7 mod 11 = 7. The given hash function is a modular function. To compute 7 mod 11, we first compute 7 div 11 = 0 and then calculate the remainder 7 − 0 × 11 = 7. Similarly, we have
h(31) = 31 mod 11 = 9
h(159) = 159 mod 11 = 5
h(189) = 189 mod 11 = 2
h(23) = 23 mod 11 = 1
h(6) = 6 mod 11 = 6
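
Since all six hash codes are distinct, each datum goes straight into its own cell. A minimal Python sketch of the insertion, assuming a table of 11 cells and a hypothetical helper named build_table, is:

```python
def build_table(keys, size=11):
    """Place each key at cell h(k) = k mod size; fail loudly on a collision."""
    table = [None] * size
    for k in keys:
        i = k % size
        if table[i] is not None:
            raise ValueError(f"collision at cell {i}: {table[i]} and {k}")
        table[i] = k
    return table


print(build_table([7, 31, 159, 189, 23, 6]))
# [None, 23, 189, None, None, 159, 6, 7, None, 31, None]
```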

approaches for collision resolving

We look at some simple approaches here, namely:
1. closed address hashing
2. open address hashing
   - linear probing
   - double hashing
The first approach is called closed address hashing, for it does not consume any extra addresses of the hashtable. The number of addresses of the hashtable remains the same. The second approach is called open address hashing, for the number of addresses of the hashtable may be increased.

the main problem of the hashing technique

Collisions

what is hash code

Given a hash function h and a datum key k, the value of h(k) is called a hash code.

hash function and hashing

The index location of each datum depends on the value of its own key and is calculated by a hash function. We denote the hash function by h(k), where k is the key of a datum.

