Hashing


what is quadratic probing?

similar to linear probing, except it examines cells 1, 4, 9, and so on away from the original probe point, because the probe sequence uses i^2
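
The probe sequence above can be sketched as a small C function. This is a minimal illustration, assuming the table is an array of `capacity` slots and `home` is the index returned by the hash function; the name `quadratic_probe` is hypothetical.

```c
#include <stddef.h>

/* Return the slot examined on the i-th probe (i = 0 is the home location).
 * The offsets from home are 0, 1, 4, 9, ... because the step is i squared. */
size_t quadratic_probe(size_t home, size_t i, size_t capacity)
{
    return (home + i * i) % capacity;
}
```

With home = 3 and capacity = 11, probes 0 through 3 examine slots 3, 4, 7, and 1.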

what is a hash table?

a list that uses hashing for storing/accessing elements

what is hashing?

a technique used for ordering and accessing elements in a list in a relatively constant amount of time by manipulating the key to identify the location of the element in the list

what is a collision-handling algorithm?

a way to deal with collisions, when they occur

what is a probe sequence?

a way to determine desirable alternative locations within a hash table, starting with the home location

algorithms are useful for ______ data

accessing

what are the 3 steps to retrieving an element from a hash table? (using linear probing for collision resolution)

apply the hash function on the key, compare the desired key to the actual key at that location, if they don't match, use linear probing to look at next slot in hash table
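
The three retrieval steps can be sketched in C as follows. This is only an illustration under stated assumptions: keys are non-negative integers, the hash function is a simple division hash, and `CAPACITY`, `EMPTY_KEY`, and `find` are hypothetical names.

```c
#include <stddef.h>

#define CAPACITY 11
#define EMPTY_KEY (-1)   /* assumed sentinel: keys are non-negative */

/* Division hash, assumed for illustration. */
static size_t hash(int key) { return (size_t)key % CAPACITY; }

/* Return the index holding key, or -1 if it is not in the table. */
int find(const int table[CAPACITY], int key)
{
    size_t location = hash(key);                 /* step 1: apply the hash function */
    for (size_t probes = 0; probes < CAPACITY; probes++) {
        if (table[location] == key)              /* step 2: compare keys */
            return (int)location;
        if (table[location] == EMPTY_KEY)        /* empty slot: key was never stored */
            return -1;
        location = (location + 1) % CAPACITY;    /* step 3: linear probe, wrapping */
    }
    return -1;                                   /* every slot was examined */
}
```

The probe counter bounds the loop so that a completely full table cannot cause an endless search.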

what does it mean to store data items sequentially?

arrange data items in the order in which they arrive

what are 2 data structures that can store items in sorted order?

arrays & BST

what are 2 data structures that can store items sequentially?

arrays & linked lists

what is the biggest challenge in designing a good hash function?

avoiding collisions

what is an example of a sorted searching algorithm?

binary search

open hashing is also called

chained hashing/separate chaining

what time complexity should you shoot for with a hash table?

constant time

why is collision bad?

creates more work during storing and access

increasing table size ____________ collision

decreases

what 4 qualities make a WINNING hash function?

determines hash value fully from key, uses entire key, distributes keys uniformly, gives very different hash values for similar keys

hashing is inherently __________

deterministic

a good hash function results in a uniform _____________________________ throughout the array's index range

distribution of indexes

what 2 rules disqualify this hash function from being a winning candidate?
int hash(char *str, int table_size)
{
    int sum = 0;
    if (str == NULL)
        return -1;
    for ( ; *str; str++)
        sum += *str;            // sum up all characters
    return sum % table_size;    // return the sum modulo the table size
}

it does not distribute the data uniformly, and it does not give very different hash values for similar keys (anagrams hash to the same value)
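
One common way to address the anagram problem is to make each character's contribution depend on its position, for example by multiplying the running value by a small constant before adding the next character. The sketch below illustrates that idea; it is not taken from the card above, and the multiplier 31 and the name `hash_weighted` are assumptions.

```c
#include <stddef.h>

/* Position-sensitive string hash: the running value is multiplied by 31
 * before each character is added, so "abc" and "cba" are treated differently.
 * Returns -1 for a NULL string or a non-positive table size. */
int hash_weighted(const char *str, int table_size)
{
    unsigned long h = 0;
    if (str == NULL || table_size <= 0)
        return -1;
    for (; *str; str++)
        h = h * 31 + (unsigned char)*str;   /* unsigned arithmetic wraps safely */
    return (int)(h % (unsigned long)table_size);
}
```

Because the order of characters now affects the result, anagrams are no longer guaranteed to collide, although any two keys can still collide by chance.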

what is the most common technique to reduce clustering?

double hashing

what's the most common way to avoid primary clustering?

double hashing

what is the most common implementation of chaining?

each component in the hash table is a head pointer to a linked list
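
A minimal C sketch of this layout, assuming non-negative integer keys and a division hash. The names `TABLE_SIZE`, `chain_insert`, and `chain_contains` are hypothetical.

```c
#include <stdlib.h>

#define TABLE_SIZE 7

struct node {
    int key;
    struct node *next;
};

/* Each table component is the head pointer of a linked list (NULL when empty). */
struct node *table[TABLE_SIZE];

static size_t hash(int key) { return (size_t)key % TABLE_SIZE; }

/* Insert key at the front of its chain. Returns 0 on success, -1 if out of memory. */
int chain_insert(int key)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    size_t i = hash(key);
    n->key = key;
    n->next = table[i];
    table[i] = n;
    return 0;
}

/* Return 1 if key is in the table, 0 otherwise. */
int chain_contains(int key)
{
    for (const struct node *n = table[hash(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}
```

Colliding keys simply share a chain, so no probing is needed; the cost of a search grows with the length of the chain.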

what is an advantage of a binary search tree?

efficiency for insertions and deletions

what is a hash function?

function used to manipulate the key of an element in a list to identify its location in the list

what is the hash function for the "perfect" case? (ie storing employee number 0's data in spot [0] of an array, and employee number 1's data in spot [1] of an array, etc)

h(key) = key

what would be the hash function in the following scenario: given an array of 100 elements, use the last two digits of idNum to determine where to store/access each employee's information. For instance, information of employee with idNum 533*74* is contained in empArray[74] information of employee with idNum 812*35* is contained in empArray[35]

h(key) = key%100
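
In C, this division hash might look like the following sketch; the function name `hash_last_two_digits` is hypothetical.

```c
/* Division hash for a 100-slot array: keeps the last two decimal digits. */
unsigned hash_last_two_digits(unsigned id_num)
{
    return id_num % 100u;
}
```

For idNum 53374 the function returns 74, so the record lives in empArray[74]; for idNum 81235 it returns 35.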

4 main ideas that affect collision

hash table size, hash function, nature of input, collision resolution strategy

what 2 qualities should every hash function have

the range of hash values must cover the entire hash table, and the function must be cheap/fast to calculate

why is it important that the hash function uniformly distributes the data across the entire set of possible hash values?

if it doesn't, a large number of collisions will result, cutting down on the efficiency of the hash table

why is it important that the hash function use all the input data?

if it doesn't, then slight variations in the input data would produce an inappropriate number of similar hash values, resulting in too many collisions

why is it important that the hash value be fully determined by the data being hashed?

if something other than the input data is used to determine the hash, then the hash value is not as dependent upon the input data, thus allowing for a worse distribution of hash values

what does relatively prime mean?

two numbers are relatively prime if the largest number that divides both of them evenly is 1 (e.g., 100 and 3)
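
This definition can be checked with Euclid's algorithm for the greatest common divisor. The sketch below uses the hypothetical names `gcd` and `relatively_prime`.

```c
/* Euclid's algorithm: greatest common divisor of a and b. */
unsigned gcd(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Two numbers are relatively prime when their gcd is 1. */
int relatively_prime(unsigned a, unsigned b)
{
    return gcd(a, b) == 1;
}
```

Here 100 and 3 are relatively prime, while 100 and 10 are not, because 10 divides both.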

what is a disadvantage of a binary search tree?

it may become very unbalanced, leading to a search speed of O(n)

what is the multiplicative hash function?

key is multiplied by a constant less than one and the hash function returns the first few digits of the fractional part of the result
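
A rough C sketch of this scheme follows. The constant A = 0.6180339887, roughly (sqrt(5) - 1) / 2, is a commonly used choice and is an assumption here, as is the name `multiplicative_hash`. Scaling the fractional part by the table size is one way to take its leading digits: with a table size of 100 the result is the first two decimal digits of the fraction.

```c
#include <stddef.h>

/* Multiplicative hash sketch.
 * A is an assumed constant between 0 and 1. */
size_t multiplicative_hash(unsigned key, size_t table_size)
{
    const double A = 0.6180339887;
    double product = (double)key * A;
    double fraction = product - (double)(unsigned long long)product; /* fractional part */
    return (size_t)(fraction * (double)table_size);  /* scale to 0 .. table_size - 1 */
}
```

For key 1 and a table size of 100, the product is 0.6180339887, so the function returns 61.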

what is a mid-square hash function?

key is multiplied by itself and the hash function returns some middle digits of the result
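
A small C sketch of the idea. Which digits count as the "middle" is an assumption: here the last two decimal digits of the square are dropped and the next two are kept, giving a value from 0 to 99. The name `mid_square_hash` is hypothetical.

```c
/* Mid-square hash sketch: square the key, then keep two middle decimal digits.
 * The last two digits of the square are dropped and the next two are kept. */
unsigned mid_square_hash(unsigned key)
{
    unsigned long long square = (unsigned long long)key * key;
    return (unsigned)((square / 100) % 100);
}
```

For key 4567 the square is 20857489; dropping the last two digits leaves 208574, and the final two digits of that give 74.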

what are 3 examples of sequential searching algorithms?

linear, sequential, and serial search

what is linear probing

locate and use the next available hash table location (wrap around when necessary)

what is the code for linear probing (considering your table size is CAPACITY and the location will be saved in a variable called location)

location = (location + 1) % CAPACITY
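
To show that update in context, here is a minimal insertion routine using linear probing. It assumes non-negative integer keys, a division hash, and an `EMPTY_KEY` sentinel; `insert` and `EMPTY_KEY` are hypothetical names.

```c
#include <stddef.h>

#define CAPACITY 11
#define EMPTY_KEY (-1)   /* assumed sentinel: keys are non-negative */

/* Store key in the first free slot at or after its home location.
 * Returns the slot used, or -1 if the table is full. */
int insert(int table[CAPACITY], int key)
{
    size_t location = (size_t)key % CAPACITY;    /* home location (division hash assumed) */
    for (size_t probes = 0; probes < CAPACITY; probes++) {
        if (table[location] == EMPTY_KEY) {
            table[location] = key;
            return (int)location;
        }
        location = (location + 1) % CAPACITY;    /* next slot, wrapping around */
    }
    return -1;
}
```

Inserting 10, 21, and 32 into an empty table places them in slots 10, 0, and 1: all three have home location 10, and the modulo makes the search wrap to the start of the array.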

what do you need to do when deleting an item from a hash table w/ collision?

make a distinction between a location that contains an EmptyItem vs a DeletedItem
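
The reason for the distinction is easier to see in code. In the sketch below each slot carries a state, and a search stops only at a truly empty slot, never at a deleted one. The type and function names (`slot_state`, `find`, `remove_key`) are hypothetical, and linear probing with a division hash is assumed.

```c
#include <stddef.h>

#define CAPACITY 11

enum slot_state { EMPTY, OCCUPIED, DELETED };

struct slot {
    enum slot_state state;
    int key;
};

/* Return the index holding key, or -1. A DELETED slot does not stop the
 * search, because the key may have been stored past it before the deletion. */
int find(const struct slot table[CAPACITY], int key)
{
    size_t location = (size_t)key % CAPACITY;
    for (size_t probes = 0; probes < CAPACITY; probes++) {
        if (table[location].state == EMPTY)
            return -1;
        if (table[location].state == OCCUPIED && table[location].key == key)
            return (int)location;
        location = (location + 1) % CAPACITY;
    }
    return -1;
}

/* Mark the slot as DELETED rather than EMPTY so later searches keep probing.
 * Returns 1 if the key was removed, 0 if it was not found. */
int remove_key(struct slot table[CAPACITY], int key)
{
    int location = find(table, key);
    if (location < 0)
        return 0;
    table[location].state = DELETED;
    return 1;
}
```

If the removed slot were marked EMPTY instead, a key that had collided and been placed after it would become unreachable.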

hashing is not good for these 3 types of searches and this type of visit:

min-max search, range search, rough match search, and ordered visits

in practice, the goal is to ________ collision

minimize

why shouldn't you use a random-number generator as a hash function?

it is not deterministic; you need to be able to reproduce results when accessing elements in the table

closed hashing is also called

open addressing

4 types of collision resolution strategies

overflow area, bucket size larger than one AND overflow area, open hashing, closed hashing

storing employee number 0's data in spot [0] of an array, and employee number 1's data in spot [1] of an array, etc, is an example of...

perfect hashing

hashing is best for ________ search/insert/update/remove involving large amounts of data

plain (exact match based on key)

what is the main problem with linear probing?

primary clustering

hash table size should be.....

prime

with a division hash function, what is a good choice for table size?

prime number (of the form 4k+3)

what are 2 problems with quadratic probing?

probe sequence usually does not cover the whole hash table, and suffers from secondary clustering

one way to reduce collision is to increase the ________ of the hash table

range

what is it called when you adjust hash table size?

rehash

what are 2 advantages of hashing?

search speed is O(1) and doesn't require elements to be sorted

what are 2 disadvantages of hashing?

search speed only applies to elements actually in the table, and the search speed depends on the number of collisions that occur

data structures are useful for ________ data

storing

the hash function has 2 uses: as a method for ___________ and __________________

storing and accessing

hashing is useful in what sort of real world situation?

storing/accessing data based on ID (or key)

in double hashing, which 2 things should be prime

table size and step size

what is a disadvantage of binary search on an array?

the array must be sorted

what is linear probing?

the colliding element is stored in the next available location

what is the home location?

the location given by the hash function

what is the load factor of a hash table?

the ratio of the number of elements divided by the table size
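
As a one-line C sketch (the name `load_factor` is hypothetical):

```c
#include <stddef.h>

/* Load factor = number of stored elements / table size. */
double load_factor(size_t num_elements, size_t table_size)
{
    return (double)num_elements / (double)table_size;
}
```

A table of size 100 holding 50 elements has a load factor of 0.5.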

what is an advantage of binary search on an array?

the search speed is O(log n)

the ADT that was (probably) the driving force for hashing development

the table

what is clustering?

the tendency of elements to become unevenly distributed in the hash table (many elements clustering around a single hash location)

why is it important that the hash function generates very different hash values for similar data items?

to make sure similar elements are distributed across the hash table

to minimize collision, you should aim for a _________________ distribution of ________________ over the _____________ hash table

uniform, probability, entire

how does double hashing work?

use the first hash key (from h1) to try to perform the task. if collision occurs, use the second hash key (from h2) and use the result to determine how far forward to move through the array in looking for an unused spot
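
The procedure can be sketched in C as below. The particular second hash function, `STEP_PRIME - (key % STEP_PRIME)`, is a commonly used form and is an assumption here, as are the names `h1`, `h2`, `STEP_PRIME`, and `double_hash_probe`. The step it produces is never zero, so the probe always moves.

```c
#include <stddef.h>

#define CAPACITY 11     /* prime table size */
#define STEP_PRIME 7    /* assumed: a prime smaller than CAPACITY */

/* First hash: the home location. */
static size_t h1(unsigned key) { return key % CAPACITY; }

/* Second hash: the step size, in the range 1..STEP_PRIME (never 0). */
static size_t h2(unsigned key) { return STEP_PRIME - (key % STEP_PRIME); }

/* Slot examined on the i-th probe for key (i = 0 is the home location). */
size_t double_hash_probe(unsigned key, size_t i)
{
    return (h1(key) + i * h2(key)) % CAPACITY;
}
```

Keys 15 and 26 both have home location 4, but their step sizes are 6 and 2, so their probe sequences diverge: 4, 10, 5 for key 15 and 4, 6, 8 for key 26.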

what is double hashing?

uses a second hash function to determine how we move through the hash table to resolve a collision

what is pseudo-random hashing?

using a pseudo-random number generator as a hash function

what is collision?

when 2 or more keys produce the same hashing location

what is chained hashing/chaining?

where each component of a hash table can hold more than one entry

a division hash function uses which mathematical symbol?

%

hashing is an approach to...

data storage/access

probe sequence should cover...

the entire hash table

in double hashing, what are the 2 hash functions used for?

the first one is to determine the hash value, the second is to determine the step size

what is a hash scheme?

the hash function and the collision-handling algorithm

what is meant by "hashing should be randomly deterministic"?

the hash function must yield the same hash value every time a specific key is given, but hash values should be as well scattered as possible

with perfect hashing, what is the consideration given to space, and what is the consideration given to time/efficiency?

0% to space, 100% to time/efficiency

