Sets, Maps, and Hash Tables
What is a map?
A collection of key-value pairs that do not contain duplicate keys
What is a set?
A collection that contains no duplicate elements
What is the implementation of std::map?
A self-balancing BST (typically a red-black tree)
What is the implementation of std::set?
A self-balancing BST (typically a red-black tree)
how are unordered sets implemented?
A hash table; operations are average-case O(1) (plus O(k) to compute the hash of a key of length k)
What are common hash functions?
Key modulo table size, Mid square, Multiplicative string hash
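As an illustration, here is a minimal sketch of two of these policies in C++ (the function names, the multiplier 31, and the tableSize parameter are illustrative assumptions, not a prescribed implementation):

```cpp
#include <cstddef>
#include <string>

// Key modulo table size: map an integer key directly to a bucket index.
std::size_t moduloHash(std::size_t key, std::size_t tableSize) {
    return key % tableSize;
}

// Multiplicative string hash: accumulate the characters with a small prime
// multiplier (31 here), then reduce modulo the table size.
std::size_t multiplicativeStringHash(const std::string& key, std::size_t tableSize) {
    std::size_t h = 0;
    for (char c : key) {
        h = h * 31 + static_cast<unsigned char>(c);
    }
    return h % tableSize;
}
```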
When you remove an item from a hash table should you delete it?
No; with open addressing you should mark the slot with a tombstone so that probing knows to keep searching, since other elements may have been placed after it
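A minimal sketch of how a tombstone might be represented in an open-addressing table (the SlotState enum and field names are hypothetical, not from any particular library):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical slot layout: a slot is EMPTY (probing may stop), OCCUPIED,
// or DELETED (a tombstone - probing must continue past it, but a later
// insert may reuse the slot).
enum class SlotState { EMPTY, OCCUPIED, DELETED };

struct Slot {
    SlotState state = SlotState::EMPTY;
    int key = 0;
};

// Removing marks the slot DELETED instead of EMPTY, so lookups keep probing
// for keys that were placed after this one in the probe sequence.
void removeAt(std::vector<Slot>& table, std::size_t index) {
    table[index].state = SlotState::DELETED;
}
```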
What are hash tables?
Tables that take in a key and use a hash function to compute an index (a hash code), which maps to a "bucket" containing the value
what will this output for a map? table['b'] = 30; table['a'] = 10; table['c'] = 50; table['a'] = 40
a : 40 b : 30 c : 50 *prints the keys in ascending order*
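A runnable version of that snippet, assuming table is a std::map<char, int>:

```cpp
#include <iostream>
#include <map>

int main() {
    std::map<char, int> table;
    table['b'] = 30;
    table['a'] = 10;
    table['c'] = 50;
    table['a'] = 40;  // overwrites the earlier value for 'a'

    // std::map iterates in ascending key order, so this prints:
    // a : 40, b : 30, c : 50
    for (const auto& [key, value] : table) {
        std::cout << key << " : " << value << '\n';
    }
}
```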
what is the separate chaining collision resolution policy?
each bucket stores a linked list; collisions are simply appended to the end of the list
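A minimal sketch of separate chaining, assuming int keys and values and a simple key % bucketCount hash (the class and member names are illustrative):

```cpp
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

class ChainedTable {
public:
    explicit ChainedTable(std::size_t bucketCount) : buckets_(bucketCount) {}

    void insert(int key, int value) {
        auto& chain = buckets_[static_cast<std::size_t>(key) % buckets_.size()];
        for (auto& entry : chain) {
            if (entry.first == key) { entry.second = value; return; }  // key exists: update
        }
        chain.push_back({key, value});  // collision: append to this bucket's list
    }

private:
    std::vector<std::list<std::pair<int, int>>> buckets_;  // one linked list per bucket
};
```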
What are the set operations?
find, insert, remove, union, intersection, difference, subset
characteristics of good and bad hash functions
good:
- Evenly distributes data (therefore minimizing the potential for collisions)
- Is easy and very fast to compute
bad:
- Produces different outputs for the same input
- Takes a lot of time to compute
- Results in a high potential for collisions
What is the implementation of std::unordered_set?
hash table
What is the implementation of std::unordered_map?
hash table
how are maps implemented and what is their time complexity?
Implemented as a self-balancing BST; insert = O(log n), operator[] = O(log n)
how are unordered maps implemented and what is their time complexity?
Implemented as a hash table; average-case insert = O(1), average-case operator[] = O(1)
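A short usage sketch of std::unordered_map (the key/value types and data are arbitrary examples):

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> counts;
    counts["apple"] = 3;          // average O(1) insert via operator[]
    counts.insert({"pear", 5});   // average O(1) insert
    counts["apple"] += 1;         // average O(1) lookup + update

    // Iteration order is unspecified for an unordered_map.
    for (const auto& [fruit, n] : counts) {
        std::cout << fruit << " : " << n << '\n';
    }
}
```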
what are the methods for maps? insert, map[key], erase(key), find(key), count(key), size(), empty()
insert(key, value) - if the key already exists in the map, returns false; otherwise inserts a new entry with the key-value pair
map[key] = value - if the key already exists in the map, overwrites it with the new value
erase(key) - deletes the key from the map
find(key) - searches for the key in the map and returns an iterator to it if found; otherwise returns an iterator to map::end()
count(key) - returns 1 if the key is found in the map, 0 otherwise
size() - gives the number of elements in the map
empty() - returns whether the map is empty or not
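A small runnable example exercising these std::map methods (the keys and values are arbitrary):

```cpp
#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<std::string, int> ages;

    auto result = ages.insert({"ada", 36});
    std::cout << std::boolalpha << result.second << '\n';  // true: new key inserted
    result = ages.insert({"ada", 99});
    std::cout << result.second << '\n';                    // false: key exists, value unchanged
    ages["ada"] = 37;                                      // operator[] overwrites the value

    std::cout << ages.count("ada") << '\n';                // 1
    if (ages.find("grace") == ages.end()) {
        std::cout << "grace not found\n";
    }

    ages.erase("ada");
    std::cout << ages.size() << ' ' << ages.empty() << '\n';  // 0 true
}
```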
describe the following methods used by both sets and unordered sets: insert(element), erase(element), find(element), count(element), size(), empty()
insert(element) - adds the element
erase(element) - removes the element
find(element) - returns an iterator to the element if it is found; otherwise returns an iterator to the set's end()
count(element) - returns 1 if the element is found, 0 otherwise
size() - gives the number of elements in the set
empty() - returns whether the set is empty or not
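A small runnable example exercising these methods on std::set (the same calls work on std::unordered_set):

```cpp
#include <iostream>
#include <set>

int main() {
    std::set<int> s;
    s.insert(3);
    s.insert(1);
    s.insert(3);                       // duplicate: ignored

    std::cout << s.count(3) << '\n';   // 1
    if (s.find(7) == s.end()) {
        std::cout << "7 not in set\n";
    }

    s.erase(1);
    std::cout << s.size() << ' ' << std::boolalpha << s.empty() << '\n';  // 1 false
}
```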
What are the differences between lists and sets in terms of implementation?
Lists can be implemented as array-based or with a linked list. Sets can be implemented as array-based or tree-based.
What is the open addressing collision resolution policy?
Look for other available slots.
1) Linear probing: each bucket stores only one entry; if you try to add an entry and there is a collision, move the "problem" entry (one bucket at a time) to the next available free bucket and put it there.
2) Quadratic probing: same as linear probing, except you move the "problem" entry by 1 bucket, then by 4 buckets, then by 9 buckets, then by 16 buckets, etc.
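A sketch of the two probe sequences only (table layout omitted; the function names and the home/i parameters are illustrative assumptions):

```cpp
#include <cstddef>

// For a key whose home bucket is `home` and attempt number i = 0, 1, 2, ...:

// Linear probing: home, home+1, home+2, ...
std::size_t linearProbe(std::size_t home, std::size_t i, std::size_t tableSize) {
    return (home + i) % tableSize;
}

// Quadratic probing: home, home+1, home+4, home+9, home+16, ...
std::size_t quadraticProbe(std::size_t home, std::size_t i, std::size_t tableSize) {
    return (home + i * i) % tableSize;
}
```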
what type of relationships do maps have?
many-to-one
Do sets reveal the order of insertion of items?
no
Are sets indexed?
No; set[index] is NOT allowed!
What is load factor in a hash table?
number of elements in the hash table / size of the array
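A one-line helper showing the calculation (the names are illustrative):

```cpp
#include <cstddef>

// Load factor = (number of stored elements) / (capacity of the array).
double loadFactor(std::size_t elementCount, std::size_t capacity) {
    return static_cast<double>(elementCount) / static_cast<double>(capacity);
}
// e.g. 10 elements in a capacity-20 table gives a load factor of 0.5.
```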
how are sets implemented?
Red-black trees, so the complexity for set operations is O(log n)
As the load factor grows, does the hash table become faster, slower, or stay the same?
Slower; a higher load factor means more collisions
What is chaining/bucket hashing in hash tables?
At each index, store a linked list of all the keys that map to that index
what should we do when the load factor becomes too large? (i.e. table becomes too full)
We should dynamically resize the table and rehash our values to reduce the load factor, making our table more time efficient
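A sketch of resizing and rehashing a separate-chaining table (doubling the bucket count is an assumption; any larger capacity works):

```cpp
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

using Buckets = std::vector<std::list<std::pair<int, int>>>;

// Double the bucket count and re-insert every entry, because
// key % bucketCount changes when bucketCount changes.
Buckets rehash(const Buckets& old) {
    Buckets grown(old.size() * 2);
    for (const auto& chain : old) {
        for (const auto& [key, value] : chain) {
            grown[static_cast<std::size_t>(key) % grown.size()].push_back({key, value});
        }
    }
    return grown;
}
```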
If you want to maintain a load factor of 0.5, the current capacity is 20, when does the capacity increase?
when the array has 10 elements (0.5 × 20 = 10)
Can you remove an element from a set without shifting other elements around?
yes