Hashing
what is quadratic probing?
similar to linear probing, except that it examines the cells 1, 4, 9, and so on away from the original probe point, because the i-th probe uses an offset of i^2
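A minimal sketch of the quadratic probe sequence, assuming the i-th probe lands at (home + i^2) mod table size. The name `quadratic_probe` and its parameters are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: slot examined on the i-th probe (i = 0, 1, 2, ...)
   for a key whose home location is `home`, in a table of `capacity` slots.
   Assumes capacity > 0. */
size_t quadratic_probe(size_t home, size_t i, size_t capacity)
{
    return (home + i * i) % capacity;   /* offsets 0, 1, 4, 9, ... with wrap-around */
}
```

With a home location of 3 in an 11-slot table, the first four probes visit slots 3, 4, 7, and 1.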
what is a hash table?
a list that uses hashing for storing/accessing elements
what is hashing?
a technique used for ordering and accessing elements in a list in a relatively constant amount of time by manipulating the key to identify the location of the element in the list
what is a collision-handling algorithm?
a way to deal with collisions, when they occur
what is a probe sequence?
a way to determine desirable alternative locations within a hash table, starting with the home location
algorithms are useful for ______ data
accessing
what are the 3 steps to retrieving an element from a hash table? (using linear probing for collision resolution)
(1) apply the hash function to the key, (2) compare the desired key to the actual key at that location, (3) if they don't match, use linear probing to look at the next slot in the hash table
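The three retrieval steps can be sketched roughly as follows. This assumes a table of non-negative int keys, a division hash, and a marker value `EMPTY` for never-used slots; `home_location` and `find` are hypothetical names.

```c
#include <stddef.h>

#define CAPACITY 11
#define EMPTY (-1)          /* assumed marker for a slot that was never used */

/* Assumed division hash for non-negative integer keys. */
static size_t home_location(int key) { return (size_t)key % CAPACITY; }

/* Returns the index where key is stored, or -1 if it is not in the table. */
int find(const int table[CAPACITY], int key)
{
    size_t location = home_location(key);            /* step 1: hash the key */
    for (size_t probes = 0; probes < CAPACITY; probes++) {
        if (table[location] == key)                  /* step 2: compare keys */
            return (int)location;
        if (table[location] == EMPTY)                /* a never-used slot ends the search */
            return -1;
        location = (location + 1) % CAPACITY;        /* step 3: linear probe to the next slot */
    }
    return -1;                                       /* every slot was examined */
}
```

The probe counter is a design choice: it stops the loop when the table is completely full and the key is absent.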
what does it mean to store data items sequentially?
arrange data items in the order in which they arrive
what are 2 data structures that can store items in sorted order?
arrays & BST
what are 2 data structures that can store items sequentially?
arrays & linked lists
what is the biggest challenge in designing a good hash function?
avoiding collisions
what is an example of a sorted searching algorithm?
binary search
open hashing is also called
chained hashing/separate chaining
what time complexity should you shoot for with a hash table?
constant time
why is collision bad?
creates more work during storing and access
increasing table size ____________ collision
decreases
what 4 qualities make a WINNING hash function?
determines hash value fully from key, uses entire key, distributes keys uniformly, gives very different hash values for similar keys
hashing is inherently __________
deterministic
a good hash function results in a uniform _____________________________ throughout the array's index range
distribution of indexes
what 2 rules disqualify this hash function from being a winning candidate? int hash(char *str, int table_size) { int sum = 0; if (str == NULL) return -1; for ( ; *str; str++) sum += *str; /* sum up all characters */ return sum % table_size; /* reduce the sum to a table index */ }
does not uniformly distribute the data, and does not produce different hash values for anagrams
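One possible fix, shown only as a sketch: a polynomial hash weights each character by its position, so anagrams usually land in different slots. The multiplier 31 and the name `hash_poly` are assumptions, not part of the original card.

```c
#include <stddef.h>

/* Position-sensitive string hash (a swapped-in technique, not the card's function).
   Returns -1 for a NULL string or a non-positive table size. */
int hash_poly(const char *str, int table_size)
{
    if (str == NULL || table_size <= 0)
        return -1;

    unsigned long h = 0;
    for (; *str; str++)
        h = h * 31 + (unsigned char)*str;   /* earlier characters get larger weights */

    return (int)(h % (unsigned long)table_size);
}
```

For a table of size 101, "ab" hashes to 75 and "ba" hashes to 4, whereas the additive version gives both strings the same value.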
what is the most common technique to reduce clustering?
double hashing
what's the most common way to avoid primary clustering?
double hashing
what is the most common implementation of chaining?
each component in the hash table is a head pointer to a linked list
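A minimal sketch of that layout, assuming int keys, a division hash, and a table size of 7. All names (`struct node`, `chain_insert`, `chain_contains`) are hypothetical.

```c
#include <stdlib.h>

#define TABLE_SIZE 7        /* assumed small prime, for illustration only */

struct node {
    int key;
    struct node *next;
};

/* Assumed division hash for non-negative integer keys. */
static size_t chain_hash(int key) { return (size_t)key % TABLE_SIZE; }

/* Pushes key onto the front of its chain. Returns 0 on success, -1 if malloc fails. */
int chain_insert(struct node *table[TABLE_SIZE], int key)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    size_t slot = chain_hash(key);
    n->key = key;
    n->next = table[slot];      /* old head (possibly NULL) follows the new node */
    table[slot] = n;            /* the table slot is the head pointer */
    return 0;
}

/* Returns 1 if key is somewhere in its chain, 0 otherwise. */
int chain_contains(struct node *table[TABLE_SIZE], int key)
{
    for (const struct node *n = table[chain_hash(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}
```

The table should start with every head pointer set to NULL, for example `struct node *table[TABLE_SIZE] = {0};`. Freeing the chains is left out to keep the sketch short.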
what is an advantage of a binary search tree?
efficiency for insertions and deletions
what is a hash function?
function used to manipulate the key of an element in a list to identify its location in the list
what is the hash function for the "perfect" case? (ie storing employee number 0's data in spot [0] of an array, and employee number 1's data in spot [1] of an array, etc)
h(key) = key
what would be the hash function in the following scenario: given an array of 100 elements, use the last two digits of idNum to determine where to store/access each employee's information. For instance, information of employee with idNum 533*74* is contained in empArray[74] information of employee with idNum 812*35* is contained in empArray[35]
h(key) = key%100
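As a quick check of that formula, a sketch with a hypothetical function name:

```c
/* Index into a 100-element array taken from the last two digits of the ID. */
unsigned last_two_digits(unsigned long id_num)
{
    return (unsigned)(id_num % 100);
}
```

With the IDs from the question, 53374 maps to index 74 and 81235 maps to index 35.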
4 main ideas that affect collision
hash table size, hash function, nature of input, collision resolution strategy
what 2 qualities should every hash function have
hash value range must cover entire hash value table, and must be cheap/fast to calculate
why is it important that the hash function uniformly distributes the data across the entire set of possible hash values?
if it doesn't, a large number of collisions will result, cutting down on the efficiency of the hash table
why is it important that the hash function use all the input data?
if it doesn't, then slight variations in the input data would produce an inappropriate number of similar hash values, resulting in too many collisions
why is it important that the hash value be fully determined by the data being hashed?
if something other than the input data is used to determine the hash, then the hash value is not as dependent upon the input data, thus allowing for a worse distribution of hash values
what does relatively prime mean?
two numbers are relatively prime if the largest number that divides both of them evenly is 1 (ex 100 and 3)
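This can be checked with Euclid's algorithm for the greatest common divisor. The function names below are hypothetical.

```c
/* Greatest common divisor by Euclid's algorithm. */
unsigned gcd(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Two numbers are relatively prime when their only common divisor is 1. */
int relatively_prime(unsigned a, unsigned b)
{
    return gcd(a, b) == 1;
}
```

For the example on the card, `relatively_prime(100, 3)` is true, while 100 and 8 share the divisor 4.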
what is a disadvantage of a binary search tree?
it may become very unbalanced, leading to a search speed of O(n)
what is the multiplicative hash function?
key is multiplied by a constant less than one and the hash function returns the first few digits of the fractional part of the result
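A rough sketch of the idea, assuming the constant 0.6180339887 (a commonly suggested value, not given on the card) and scaling the fractional part by the table size rather than reading off decimal digits. The name `mult_hash` is hypothetical.

```c
#include <stddef.h>

/* Multiplicative hash: multiply by a constant A with 0 < A < 1,
   keep the fractional part, and scale it to a table index. */
size_t mult_hash(unsigned long key, size_t table_size)
{
    const double A = 0.6180339887;                               /* assumed constant */
    double product = (double)key * A;
    double fraction = product - (double)(unsigned long)product;  /* fractional part */
    return (size_t)(fraction * (double)table_size);
}
```

When the table size is 100, the result is exactly the first two digits of the fractional part: key 1 gives 0.6180339887, so the index is 61.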
what is a mid-square hash function?
key is multiplied by itself and the hash function returns some middle digits of the result
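A small sketch, assuming a 100-slot table and assuming "middle digits" means the two decimal digits left after dropping the last three digits of the square. The name `mid_square` is hypothetical.

```c
/* Mid-square hash for a 100-slot table (assumed digit positions). */
unsigned mid_square(unsigned key)
{
    unsigned long long squared = (unsigned long long)key * key;
    return (unsigned)((squared / 1000) % 100);   /* drop 3 low digits, keep the next 2 */
}
```

For key 4567 the square is 20857489, and the two middle digits give index 57.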
what are 3 examples of sequential searching algorithms?
linear, sequential, and serial search
what is linear probing
locate and use the next available hash table location (wrap around when necessary)
what is the code for linear probing? (assuming your table size is CAPACITY and the location is saved in a variable called location)
location = (location + 1) % CAPACITY
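Placed in context, that one-line step might be used for insertion like this. The sketch assumes non-negative int keys, a division hash, and a marker `EMPTY_SLOT` for unused slots; `lp_home` and `lp_insert` are hypothetical names.

```c
#include <stddef.h>

#define CAPACITY 11
#define EMPTY_SLOT (-1)     /* assumed marker for an unused slot */

/* Assumed division hash for non-negative integer keys. */
static size_t lp_home(int key) { return (size_t)key % CAPACITY; }

/* Stores key in the first free slot at or after its home location.
   Returns the slot used, or -1 if the table is full. */
int lp_insert(int table[CAPACITY], int key)
{
    size_t location = lp_home(key);
    for (size_t probes = 0; probes < CAPACITY; probes++) {
        if (table[location] == EMPTY_SLOT) {
            table[location] = key;
            return (int)location;
        }
        location = (location + 1) % CAPACITY;   /* next slot, wrapping around */
    }
    return -1;
}
```

Inserting 12, 23, and 1 into an empty table fills slots 1, 2, and 3, because all three keys share home location 1.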
what do you need to do when deleting an item from a hash table w/ collision?
make a distinction between a location that contains an EmptyItem vs a DeletedItem
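A sketch of why the distinction matters, assuming linear probing over non-negative int keys with two reserved marker values. The names `EMPTY_ITEM`, `DELETED_ITEM`, `tomb_find`, and `tomb_remove` are illustrative assumptions.

```c
#include <stddef.h>

#define CAPACITY 11
#define EMPTY_ITEM   (-1)   /* slot has never held a key */
#define DELETED_ITEM (-2)   /* slot held a key that was later removed */

/* Assumed division hash for non-negative integer keys. */
static size_t home_of(int key) { return (size_t)key % CAPACITY; }

/* Returns the index of key, or -1 if it is absent. */
int tomb_find(const int table[CAPACITY], int key)
{
    size_t location = home_of(key);
    for (size_t probes = 0; probes < CAPACITY; probes++) {
        if (table[location] == EMPTY_ITEM)
            return -1;                       /* nothing was ever stored past this point */
        if (table[location] == key)
            return (int)location;
        /* DELETED_ITEM or a different key: keep probing */
        location = (location + 1) % CAPACITY;
    }
    return -1;
}

/* Marks the key's slot as deleted. Returns 1 if the key was found, 0 otherwise. */
int tomb_remove(int table[CAPACITY], int key)
{
    int location = tomb_find(table, key);
    if (location < 0)
        return 0;
    table[location] = DELETED_ITEM;          /* not EMPTY_ITEM, so later probes continue */
    return 1;
}
```

If the removed slot were reset to `EMPTY_ITEM` instead, a search for a key that had collided and been stored further along would stop too early.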
hashing is not good for these 3 types of searches and this type of visit:
min-max search, range search, rough match search, and ordered visits
in practice, the goal is to ________ collision
minimize
why shouldn't you use a random-number generator as a hash function?
not deterministic. need to be able to reproduce results when accessing elements in the table
closed hashing is also called
open addressing
4 types of collision resolution strategies
overflow area, bucket size larger than one AND overflow area, open hashing, closed hashing
storing employee number 0's data in spot [0] of an array, and employee number 1's data in spot [1] of an array, etc, is an example of...
perfect hashing
hashing is best for ________ search/insert/update/remove involving large amounts of data
plain (exact match based on key)
what is the main problem with linear probing?
primary clustering
hash table size should be.....
prime
with a division hash function, what is a good choice for table size?
prime number (of the form 4k+3)
what are 2 problems with quadratic probing?
probe sequence usually does not cover the whole hash table, and suffers from secondary clustering
one way to reduce collision is to increase the ________ of the hash table
range
what is it called when you adjust hash table size?
rehash
what are 2 advantages of hashing?
search speed is O(1) on average and doesn't require elements to be sorted
what are 2 disadvantages of hashing?
search speed only applies to elements actually in the table, and the search speed depends on the number of collisions that occur
data structures are useful for ________ data
storing
the hash function has 2 uses: as a method for ___________ and __________________
storing and accessing
hashing is useful in what sort of real world situation?
storing/accessing data based on ID (or key)
in double hashing, which 2 things should be prime
table size and step size
what is a disadvantage of binary search on an array?
the array must be sorted
what is linear probing?
the colliding element is stored in the next available location
what is the home location?
the location given by the hash function
what is the load factor of a hash table?
the ratio of the number of elements divided by the table size
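Written as code, with a hypothetical function name:

```c
#include <stddef.h>

/* Load factor = number of stored elements / table size. Assumes table_size > 0. */
double load_factor(size_t num_elements, size_t table_size)
{
    return (double)num_elements / (double)table_size;
}
```

A table of size 10 holding 5 elements has a load factor of 0.5.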
what is an advantage of binary search on an array?
the search speed is O(log n)
the ADT that was (probably) the driving force for hashing development
the table
what is clustering?
the tendency of elements to become unevenly distributed in the hash table (many elements clustering around a single hash location)
why is it important that the hash function generates very different hash values for similar data items?
to make sure similar elements are distributed across the hash table
to minimize collision, you should aim for a _________________ distribution of ________________ over the _____________ hash table
uniform, probability, entire
how does double hashing work?
use the first hash key (from h1) to try to perform the task. if collision occurs, use the second hash key (from h2) and use the result to determine how far forward to move through the array in looking for an unused spot
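The description above might look like this for insertion. It is a sketch under assumptions: non-negative int keys, a prime table size of 11, and a second hash of the form 1 + key mod (size - 1) so that the step is never zero. The names `h1`, `h2`, `dh_insert`, and `DH_EMPTY` are illustrative.

```c
#include <stddef.h>

#define DH_CAPACITY 11      /* assumed prime table size */
#define DH_EMPTY (-1)       /* assumed marker for an unused slot */

/* First hash: the home location. */
static size_t h1(int key) { return (size_t)key % DH_CAPACITY; }

/* Second hash: the step size, always in 1 .. DH_CAPACITY - 1 (assumed form). */
static size_t h2(int key) { return 1 + (size_t)key % (DH_CAPACITY - 1); }

/* Stores key, stepping by h2(key) after each collision.
   Returns the slot used, or -1 if no free slot was found. */
int dh_insert(int table[DH_CAPACITY], int key)
{
    size_t location = h1(key);
    size_t step = h2(key);
    for (size_t probes = 0; probes < DH_CAPACITY; probes++) {
        if (table[location] == DH_EMPTY) {
            table[location] = key;
            return (int)location;
        }
        location = (location + step) % DH_CAPACITY;   /* move forward by the key's own step */
    }
    return -1;
}
```

Keys 12, 23, and 34 all have home location 1, but their steps are 3, 4, and 5, so they end up in slots 1, 5, and 6 instead of forming one contiguous cluster.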
what is double hashing?
uses a second hash function to determine how we move through the hash table to resolve a collision
what is pseudo-random hashing?
using a pseudo-random number generator as a hash function
what is collision?
when 2 or more keys produce the same hashing location
what is chained hashing/chaining?
where each component of a hash table can hold more than one entry
a division hash function uses which mathematical symbol?
%
hashing is an approach to...
data storage/access
probe sequence should cover...
the entire hash table
in double hashing, what are the 2 hash functions used for?
the first one is to determine the hash value, the second is to determine the step size
what is a hash scheme?
the hash function and the collision-handling algorithm
what is meant by "hashing should be randomly deterministic"?
the hash function must yield the same hash value every time a specific key is given, but hash values should be as well scattered as possible
with perfect hashing, what is the consideration given to space, and what is the consideration given to time/efficiency?
0% to space, 100% to time/efficiency