Test 2 CSC 221
Perfect Balance
- Could require restructuring the tree into a complete tree after every operation
AVL Trees
- Height-balanced binary search trees
- Used to bound the worst-case performance of binary search tree operations to O(log N)
- They are always balanced
- A balance factor is calculated at every node; the heights of a node's left and right subtrees can differ by no more than 1
Trie
- Ordered tree data structure used to store an associative array where the keys are usually strings
- Specialized tree for word searches, spell checking, spelling correction, and predictive typing
- No node in the trie stores the key associated with that node; the node's position in the tree shows its key
- All descendants of a node have a common prefix
- Values are normally not associated with every node
Map implementation
- Use a binary search tree or a hash table as the internal storage container
- Hash map implementation: keys are not in sorted order
- Tree map implementation: a balanced search tree (e.g. an AVL tree) implements the map and stores the keys in sorted order
Trie implementation
2D array, linked list, or binary search tree
Maximum Number of Nodes in a Perfect Binary Tree
2^(h+1) - 1, where h is the height of the tree
Full binary tree
A binary tree in which each node has exactly 0 or 2 children
Complete binary tree
A binary tree in which every level, except possibly the deepest, is completely filled. At the deepest level (depth = height of the tree), all nodes are as far left as possible
Perfect Binary tree
A binary tree with all leaf nodes at the same depth. All internal nodes have exactly two children. Has 2^(n+1) - 1 nodes, where n is the height
Binary Heap
A binary tree with a structure property (it is a complete binary tree) and an ordering property (heap order). REMEMBER: the bottom level is filled from left to right
Imperfect Hash function
A hash function that does not have a one-to-one mapping of keys to hash values
Priority Queue
A queue that allows line jumping: a collection of data in which each item has a priority
Binary Search Tree
A tree where each node has at most 2 children, every node's left subtree holds values less than the node's value, and every node's right subtree holds values greater than the node's value
- An in-order traversal produces the elements in ascending order
- Average case of O(log N) for add, access, and remove
- Worst case of O(N) per operation, when the tree degenerates into a chain (e.g. keys inserted in sorted order)
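A minimal Java sketch of the ordering property (IntBST is a hypothetical name, and silently ignoring duplicate keys is an assumption the notes do not make). It also shows the worst case: inserting keys in sorted order builds a chain.

```java
// Minimal BST of ints: smaller keys go left, larger keys go right.
// Duplicates are ignored (an assumption; the notes do not say how to handle them).
class IntBST {
    private static final class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Walk down from the root, going left or right, until an empty spot is found.
    void add(int key) { root = add(root, key); }

    private Node add(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = add(n.left, key);
        else if (key > n.key) n.right = add(n.right, key);
        return n; // key already present: nothing to do
    }

    // Each comparison discards one subtree: O(log N) on a balanced tree,
    // O(N) when the tree has degenerated into a chain.
    boolean contains(int key) {
        Node n = root;
        while (n != null) {
            if (key == n.key) return true;
            n = (key < n.key) ? n.left : n.right;
        }
        return false;
    }

    // Height in edges; an empty tree is -1 so a single node has height 0.
    int height() { return height(root); }

    private int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }
}
```

Adding 1 through 5 in order gives a chain of height 4, so a search for 5 visits every node.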
Height balanced
A tree where, at every node, the difference between the heights of the left and right subtrees is not more than 1
Binary tree
A tree with at most 2 children for each node
Tree
Abstract data type
- The only entry point is the root
- Every node is either a leaf or an internal node
Shifting
An alternative to folding that uses << to shift bit values left by some amount. It is a good (but more complicated) alternative to folding because the position of each character matters: if each letter were simply given a value and the values were added, "God" and "Dog" would hash to the same value, but with shifting they get different values
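A sketch contrasting plain addition of character codes with a shift-based hash (both methods and the 5-bit shift are illustrative assumptions, not java.lang.String's own hashCode). With addition, "God" and "Dog" both sum to 282; with shifting, each character's position changes its contribution.

```java
// Compares a naive additive string hash with a shift-based one.
// Both methods are hypothetical illustrations, not java.lang.String.hashCode().
class ShiftHash {
    // Adds up character codes: anagrams such as "God" and "Dog" collide
    // because addition ignores the order of the letters.
    static int additive(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) h += s.charAt(i);
        return h;
    }

    // Shifts the running value left by 5 bits before adding the next character,
    // so each letter's contribution depends on its position.
    // (Long strings overflow and wrap around; the result is later reduced mod the table size.)
    static int shifted(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) h = (h << 5) + s.charAt(i);
        return h;
    }
}
```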
Overcrowding when Hashing
An insert using probing techniques cannot work with a load factor of 1 or more
- Quadratic probing can fail if λ > ½
- Linear probing and double hashing slow down if λ > ½
- Separate chaining becomes slow once λ > 1
- To relieve the pressure on the hash table, REHASH
Ancestors
Any node for which this node is a descendant
Descendants of a node
Any node that can be reached from this node by following one or more edges downward
Implementation of Binary Heap
Can be represented using an array. This is better than using pointers because it takes up less space, multiplying and dividing by 2 are easy to do, and there is a simple way to find parents from children and vice versa
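The index arithmetic behind the array representation, sketched in Java under the common textbook convention (an assumption here) that slot 0 is left empty and the root lives at index 1. With a 0-based array, the formulas become 2i + 1, 2i + 2, and (i - 1) / 2. HeapIndex is a hypothetical name.

```java
// Index arithmetic for a binary heap stored in an array, assuming the
// convention of leaving slot 0 unused so the root sits at index 1.
class HeapIndex {
    static int parent(int i)     { return i / 2; }     // integer division rounds down
    static int leftChild(int i)  { return 2 * i; }     // multiply by 2
    static int rightChild(int i) { return 2 * i + 1; } // multiply by 2, add 1
}
```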
How to Rehash
Create a larger hash table, then hash every current value into the larger table (each index must be recomputed because it depends on the table size)
Chaining
Each element of the hash table can be another data structure
- e.g. a linked list or a balanced binary tree
- More space, but somewhat easier
- Every key goes in its own home bucket (no probing)
- Resize at a given load factor or when any chain reaches some size limit
Properties of a trie
- Each internal node has between 1 and k children, where k is the size of the alphabet
- Each link of the tree is labeled with a character
- Each leaf node corresponds to a complete word, which can be read off the path from the root to that leaf
- Uses O(N) space
- Search/insert/delete take O(d × m) time, where d is the length of the parameter string and m is the size of the alphabet
Heap operations
- findMin
- insert(val): percolate up
- deleteMin: percolate down
Problem with Linked List
Finding an item takes O(N). Using a balanced binary search tree reduces access to O(log N)
Ways to Build Heaps
Floyd's method: add all elements in arbitrary order to form a complete tree. Then fix the heap-order property by percolating nodes down, starting from the last internal node and working back to the root. The heap can be constructed in O(N) time
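A sketch of Floyd's method on a 0-based int array for a min-heap (FloydBuildHeap and its methods are hypothetical names). Leaves are already one-element heaps, so the loop starts at the last internal node and percolates each node down.

```java
// Floyd's bottom-up build-heap for a min-heap stored in a 0-based array.
// Class and method names are hypothetical.
class FloydBuildHeap {
    // Leaves are already valid one-element heaps, so start at the last
    // internal node (index n/2 - 1) and percolate each node down.
    static void buildHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            percolateDown(a, i, a.length);
        }
    }

    // Swap a[i] with its smaller child until neither child is smaller.
    static void percolateDown(int[] a, int i, int size) {
        while (2 * i + 1 < size) {
            int child = 2 * i + 1;                          // left child
            if (child + 1 < size && a[child + 1] < a[child]) {
                child++;                                    // right child is smaller
            }
            if (a[child] >= a[i]) break;                    // heap order holds
            int tmp = a[i]; a[i] = a[child]; a[child] = tmp;
            i = child;
        }
    }

    // Checks the heap-order property: every parent <= its children.
    static boolean isMinHeap(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[(i - 1) / 2] > a[i]) return false;
        }
        return true;
    }
}
```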
Load factor in double hashing
For any λ < 1, double hashing will find an empty slot (given an appropriate table size and hash2)
- Search cost approaches optimal
- Costly as λ nears 1
Load Factor in Linear Probing
For any λ < 1, linear probing will find an empty slot. Performance degrades quickly for any λ > ½
Binary Heap order property
For every non-root node X, the value in the parent of X is less than or equal to the value in X (min-heap)
Tries Applications
- Full-text search
- Storing word lists
- Search engine indices
- Kept in memory, therefore fast
- Biological applications (DNA, genome sequencing)
- Game applications (Boggle)
Internal node
Has 1 or more children; is the parent of its child nodes
Leaf
Has no children
Ways to Use Priority Queues and Heaps
Heapsort: add all items to a heap, then remove them one at a time with deleteMin; they come out in sorted order
To find the median: add the N elements into an array, apply the build-heap algorithm to the array, then perform ⌈N/2⌉ deleteMin operations. The last item extracted from the heap is the median
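Both ideas sketched with java.util.PriorityQueue, which is a binary min-heap by default (add plays the role of insert and poll the role of deleteMin; HeapUses is a hypothetical name). Inserting one value at a time costs O(N log N) rather than the O(N) of a true build-heap, which is fine for illustration.

```java
import java.util.PriorityQueue;

// Two uses of a min-heap, sketched with java.util.PriorityQueue
// (a binary min-heap by default). Class name is hypothetical.
class HeapUses {
    // Heapsort idea: insert everything, then deleteMin (poll) N times.
    static int[] heapSort(int[] values) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : values) heap.add(v);
        int[] sorted = new int[values.length];
        for (int i = 0; i < sorted.length; i++) sorted[i] = heap.poll();
        return sorted;
    }

    // Median idea: put all N values in a heap, then perform ceil(N/2)
    // deleteMin operations; the last value removed is the median
    // (the lower median when N is even). Assumes values is non-empty.
    static int median(int[] values) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : values) heap.add(v);
        int last = 0;
        for (int i = 0; i < (values.length + 1) / 2; i++) last = heap.poll();
        return last;
    }
}
```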
Balance factor
Height of left subtree - height of right subtree
Insert in Heap (Percolate Up)
Idea: put val at the next available leaf position, then percolate up by repeatedly swapping it with its parent until the parent is no longer larger
Probing
If a bucket isn't empty, search forward or backward for an open space
Linear probing:
- Move forward one spot and check, then the next spot and check, and so on
- When deleting/removing, leave a marker: null if the slot was never occupied, a "blank" (deleted marker) if it was once occupied, so later searches keep probing past it
Quadratic probing:
- Check offsets 1, 4, 9, 16, ... (i^2) from the home bucket
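A fixed-size sketch of linear probing with deletion markers (LinearProbingSet and the DELETED sentinel are hypothetical; a real table would also rehash as λ grows). A null slot means "never occupied" and stops a search; a DELETED slot means "once occupied" and the search keeps probing past it.

```java
// Open-addressing set of ints using linear probing, with a "deleted" marker
// (tombstone) for slots that were once occupied. Fixed capacity, no rehashing;
// the class and its names are hypothetical.
class LinearProbingSet {
    private static final Object DELETED = new Object(); // "blank, once occupied"
    private final Object[] table;                       // null = never occupied

    LinearProbingSet(int capacity) { table = new Object[capacity]; }

    private int home(int key) { return Math.floorMod(key, table.length); }

    // Probe home, home+1, home+2, ... (wrapping) until the key or a never-used slot.
    boolean contains(int key) {
        int i = home(key);
        for (int step = 0; step < table.length; step++) {
            Object slot = table[i];
            if (slot == null) return false;                 // never occupied: stop
            if (slot != DELETED && (Integer) slot == key) return true;
            i = (i + 1) % table.length;
        }
        return false;
    }

    // Reuses the first free or deleted slot, but only after confirming the key is absent.
    boolean add(int key) {
        if (contains(key)) return false;
        int i = home(key);
        for (int step = 0; step < table.length; step++) {
            if (table[i] == null || table[i] == DELETED) {
                table[i] = key;
                return true;
            }
            i = (i + 1) % table.length;
        }
        throw new IllegalStateException("table is full");
    }

    // Leave a tombstone so later searches keep probing past this slot.
    boolean remove(int key) {
        int i = home(key);
        for (int step = 0; step < table.length; step++) {
            Object slot = table[i];
            if (slot == null) return false;
            if (slot != DELETED && (Integer) slot == key) {
                table[i] = DELETED;
                return true;
            }
            i = (i + 1) % table.length;
        }
        return false;
    }
}
```

With capacity 7, the keys 3, 10, and 17 all hash to 3 and end up in slots 3, 4, and 5; removing 10 leaves a tombstone in slot 4, so 17 is still found.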
Load factor in quadratic probing
If the table size is prime and λ ≤ ½, quadratic probing will always find an empty slot; for greater λ, it may not
Priority Queue Operations
- insert
- deleteMin (using the minimum is an arbitrary choice)
Word matching tree
Insert the words of the text into a trie; each leaf stores the occurrences of its word in the text
Mapping
Works on integer values, or things that can easily be converted to integer values
- Transforms the hashed key value into a legal index in the hash table
- Place in the table by taking the result of hashFunction(key) and taking the remainder of dividing this result by the size of the table
- Prime table sizes work best
- The hash table uses an array as its underlying storage container (the array is like a series of buckets)
See Ex. under hash functions
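A sketch of the mapping step in Java (IndexMapping is a hypothetical name). Math.floorMod is used instead of % because hashCode() can be negative in Java, and % would then give a negative index.

```java
// Step 2 of hashing ("mapping"): turn an integer hash value into a legal
// index by taking the remainder modulo the table size. Names are hypothetical.
class IndexMapping {
    static int toIndex(int hash, int tableSize) {
        // Math.floorMod keeps the result in [0, tableSize) even when the
        // hash is negative (Java's % operator can return negative values).
        return Math.floorMod(hash, tableSize);
    }

    // Step 1 (key -> integer) via hashCode(), then step 2 (integer -> index).
    static int indexFor(Object key, int tableSize) {
        return toIndex(key.hashCode(), tableSize);
    }
}
```

With a table size of 17, keys 18 and 35 both map to index 1, which is exactly the collision case described under Collisions.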
Map as a Dictionary
- Key is a word; each key is unique
- Value is the definition
- Together, they form a pair stored in the map
- One value may be represented by many keys (i.e. synonyms)
Formula for Leaves
L = n(k-1) + 1, where L = number of leaves, n = number of internal nodes, and the tree is a full k-ary tree (every internal node has exactly k children)
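A quick worked check of the leaf formula on two small full trees (both trees are made-up examples): a full binary tree whose root has two internal children, and a full ternary tree whose root has one internal child.

```latex
\begin{aligned}
&\text{Full binary tree } (k = 2),\ n = 3: && L = 3(2-1) + 1 = 4 \\
&\text{Full ternary tree } (k = 3),\ n = 2: && L = 2(3-1) + 1 = 5
\end{aligned}
```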
Collision resolution function (Probing)
Linear: f(i) = i; quadratic: f(i) = i^2 (the i-th probe checks (hash(key) + f(i)) mod TableSize)
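The first few probe positions for each function, sketched in Java (ProbeSequence is a hypothetical name; it assumes the home bucket is already a valid index). With a made-up home bucket of 3 and table size 11, linear probing visits 3, 4, 5, 6, 7 while quadratic probing visits 3, 4, 7, 1, 8.

```java
// Probe sequences for open addressing: the i-th probe for a key whose home
// bucket is h goes to (h + f(i)) mod tableSize. Names are hypothetical.
class ProbeSequence {
    static int[] linear(int home, int tableSize, int probes) {
        int[] seq = new int[probes];
        for (int i = 0; i < probes; i++) seq[i] = (home + i) % tableSize;     // f(i) = i
        return seq;
    }

    static int[] quadratic(int home, int tableSize, int probes) {
        int[] seq = new int[probes];
        for (int i = 0; i < probes; i++) seq[i] = (home + i * i) % tableSize; // f(i) = i^2
        return seq;
    }
}
```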
Advantages of tries
- Looking up keys is faster: O(m) to find a key of length m vs. O(log N) for a BST with N elements
- Lookups depend on the depth of the keys (their length)
- Can require less space when they contain a large number of short strings, because common prefixes are shared
- Faster in the worst case, O(m), than an imperfect hash table, which may have collisions; there are no collisions in a trie
Quadratic probing
Main idea: spread out the search for an empty slot by incrementing by i^2 instead of i
Calculating Rolling/Running Medians
Maintain 2 heaps, a max heap and a min heap
- Values <= the median are stored in the max heap
- Values >= the median are stored in the min heap
- Maintain balance: the sizes of the two heaps can differ by at most 1
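A sketch of the two-heap running median using java.util.PriorityQueue (RunningMedian is a hypothetical name; letting the max heap hold the extra element when the count is odd is one convention among several). Each new value is placed in one heap, then the heaps are rebalanced, so the median is always at the tops.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Running (online) median with two heaps.
// Class name is hypothetical; median() assumes at least one value was added.
class RunningMedian {
    private final PriorityQueue<Integer> low = new PriorityQueue<>(Collections.reverseOrder()); // max heap
    private final PriorityQueue<Integer> high = new PriorityQueue<>();                         // min heap

    void add(int x) {
        // Values <= the current max of the low half go to the max heap.
        if (low.isEmpty() || x <= low.peek()) low.add(x);
        else high.add(x);
        // Rebalance so the sizes differ by at most 1 (low may hold the extra one).
        if (low.size() > high.size() + 1) high.add(low.poll());
        else if (high.size() > low.size()) low.add(high.poll());
    }

    // Odd count: top of the max heap. Even count: average of both tops.
    double median() {
        if (low.size() > high.size()) return low.peek();
        return (low.peek() + high.peek()) / 2.0;
    }
}
```

Feeding 5, 15, 1, 3 gives running medians of 5, 10, 5, and 4.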
Hashing techniques
Mapping, folding, shifting
Big O of priority Queue
- Unsorted list: O(1) to insert, O(n) to delete
- Sorted list: O(n) to insert, O(1) to delete
Hash Code Contract
Objects that are equal must have the same hash code within a running process
Does not imply either of these misconceptions:
- Unequal objects will have different hash codes
- Objects with the same hash code must be equal
Leads to these guidelines:
- Whenever you implement equals, you should also implement hashCode
- Don't use hashCode directly as a key
- Don't use hashCode in distributed applications
Collisions
Occurs when two different keys hash to the same index
- e.g. keys 18 and 35 both map to 1 with table size 17 (18 mod 17 = 35 mod 17 = 1)
Offline vs. Online Algorithms
Offline: algorithms that compute a property of a static collection of values
Online: algorithms that compute some property of a changing sequence of values
Folding
Partition the key into several parts and combine the integer values of the parts
- The parts may be hashed first
- Combine using addition, multiplication, or shifting
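A sketch of folding for a numeric key (FoldingHash is a hypothetical name; splitting into 3-digit groups from the right and combining by addition are assumptions). The key 123456789 folds to 789 + 456 + 123 = 1368, which is then reduced modulo the table size.

```java
// Folding: split a numeric key into fixed-width digit groups, add the groups,
// then map the sum into the table. Group width and table size are hypothetical choices.
class FoldingHash {
    static int fold(long key, int digitsPerPart, int tableSize) {
        long divisor = 1;
        for (int i = 0; i < digitsPerPart; i++) divisor *= 10;
        long sum = 0;
        long rest = Math.abs(key);   // ignores the sign (Long.MIN_VALUE not handled)
        while (rest > 0) {
            sum += rest % divisor;   // take the lowest group of digits
            rest /= divisor;         // drop that group
        }
        return (int) (sum % tableSize);
    }
}
```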
Depth of a node
Path length from the root to a specific node
Binary tree traversals
- Preorder traversal
- In-order traversal
- Postorder traversal
- Level-order traversal
To determine the result by hand, trace a path around the outside of the tree
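The four traversals sketched in Java on a small letter tree (the Traversals class and its Node type are hypothetical). For the tree with root D, children B and F, B's children A and C, and F's right child G, preorder gives D B A C F G, in-order gives A B C D F G, postorder gives A C B G F D, and level order gives D B F A C G.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// The four traversals on a binary tree of chars. Node is a hypothetical helper.
class Traversals {
    static final class Node {
        final char val;
        final Node left, right;
        Node(char val, Node left, Node right) { this.val = val; this.left = left; this.right = right; }
    }

    static void preorder(Node n, List<Character> out) {   // root, left, right
        if (n == null) return;
        out.add(n.val);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    static void inorder(Node n, List<Character> out) {    // left, root, right
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.val);
        inorder(n.right, out);
    }

    static void postorder(Node n, List<Character> out) {  // left, right, root
        if (n == null) return;
        postorder(n.left, out);
        postorder(n.right, out);
        out.add(n.val);
    }

    // Level order uses a FIFO queue: visit a node, then enqueue its children.
    static List<Character> levelOrder(Node root) {
        List<Character> out = new ArrayList<>();
        Queue<Node> q = new ArrayDeque<>();
        if (root != null) q.add(root);
        while (!q.isEmpty()) {
            Node n = q.remove();
            out.add(n.val);
            if (n.left != null) q.add(n.left);
            if (n.right != null) q.add(n.right);
        }
        return out;
    }
}
```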
Applications of PQueue
- Print jobs in order of decreasing length
- Select the most frequent symbols for compression
Methods to Resolve Collisions
- Probing (closed hashing)
- Separate chaining (open hashing)
In order traversal
Process the LEFT subtree, then the ROOT, then the RIGHT subtree. Always gives the elements of a BST in increasing order
Preorder traversal
Process ROOT, then sub trees from LEFT to RIGHT
Post order traversal
Process the LEFT subtree, then RIGHT subtree, then ROOT
Hashing with Chaining
Put a pointer (to a chain) at each table entry
- Choose the chain type as appropriate
- The common chain is an unordered linked list
Properties:
- Performance degrades with the length of the chains
- λ can be greater than 1
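A sketch of separate chaining with rehashing (ChainedHashSet is a hypothetical name; the λ ≤ 1 threshold and growing to 2 × size + 1, which is not guaranteed to be prime, are assumptions). Rehashing must re-insert every key because each index depends on the table size.

```java
import java.util.LinkedList;

// Separate chaining: each bucket holds an unordered linked list of keys.
// When the load factor would exceed 1 (a hypothetical threshold), rehash into a
// table roughly twice as large. Names are hypothetical.
class ChainedHashSet<T> {
    private LinkedList<T>[] buckets;
    private int size;

    ChainedHashSet(int capacity) { buckets = newTable(capacity); }

    @SuppressWarnings("unchecked")
    private static <E> LinkedList<E>[] newTable(int capacity) {
        LinkedList<E>[] t = (LinkedList<E>[]) new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) t[i] = new LinkedList<>();
        return t;
    }

    private static int index(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    boolean contains(T key) {
        return buckets[index(key, buckets.length)].contains(key); // scans one chain
    }

    boolean add(T key) {
        if (contains(key)) return false;
        if (size + 1 > buckets.length) rehash();                  // keep λ <= 1
        buckets[index(key, buckets.length)].add(key);
        size++;
        return true;
    }

    // Build a larger table and re-insert every key: indices depend on table size.
    private void rehash() {
        LinkedList<T>[] old = buckets;
        buckets = newTable(2 * old.length + 1);
        for (LinkedList<T> chain : old)
            for (T key : chain) buckets[index(key, buckets.length)].add(key);
    }

    double loadFactor() { return (double) size / buckets.length; }
    int tableSize() { return buckets.length; }
}
```

Starting from 3 buckets, adding the keys 0 through 9 rehashes twice (3 to 7 to 15 buckets), leaving λ = 10/15.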
Delete in Heap (Percolate Down)
- Remove the root
- Put the last leaf at the root
- Find the smallest child of that node
- Swap the node with its smallest child if the child is smaller
- Repeat the swap until no swaps are needed
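A sketch of a min-heap backed by an ArrayList with 0-based indices (MinHeap is a hypothetical name), implementing insert by percolating up and deleteMin by the steps listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Array-backed binary min-heap (0-based: children of i are 2i+1 and 2i+2,
// parent is (i-1)/2). Class name is hypothetical.
class MinHeap {
    private final List<Integer> a = new ArrayList<>();

    int size() { return a.size(); }

    // Insert: place at the next free leaf, then percolate up.
    void insert(int val) {
        a.add(val);
        int i = a.size() - 1;
        while (i > 0 && a.get((i - 1) / 2) > a.get(i)) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // deleteMin: remove the root, move the last leaf to the root, percolate down.
    int deleteMin() {
        if (a.isEmpty()) throw new NoSuchElementException("heap is empty");
        int min = a.get(0);
        int last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);
            int i = 0;
            while (2 * i + 1 < a.size()) {
                int child = 2 * i + 1;
                if (child + 1 < a.size() && a.get(child + 1) < a.get(child)) child++; // smaller child
                if (a.get(child) >= a.get(i)) break;   // no swap needed
                swap(i, child);
                i = child;
            }
        }
        return min;
    }

    private void swap(int i, int j) {
        int tmp = a.get(i);
        a.set(i, a.get(j));
        a.set(j, tmp);
    }
}
```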
Load factor with separate chaining
Search cost:
- Unsuccessful search: the whole chain; the average length of a chain is λ
- Successful search: about half a chain; the average cost is 1 + λ/2
Optimal load factor:
- Zero! But between ½ and 1 is fast and makes good use of memory
Maps (Map <K,V>)
Also called a search table, dictionary, or associative array
- Data structure optimized for a specific kind of search/access
- Access by asking "Give me the value associated with a key"
- Keys are unique, but one value may be represented by multiple keys
Drawbacks of tries
- Slower in some cases than hash tables
- Not all keys are easy to represent as strings, such as numbers
- Tries are less space efficient than hash tables
Double Hashing (Probing)
Spread out the search for an empty slot by using a second hash function: f(i) = i · hash2(key)
Near optimal when the load factor is near ½
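A sketch of a double-hashing probe sequence (DoubleHashing is a hypothetical name). The second hash hash2(x) = R - (x mod R), with R a prime smaller than the table size, is one common textbook choice assumed here; it never returns 0, so every probe moves to a new slot. With x = 49, table size 11, and R = 7, hash1 = 5 and hash2 = 7, so the probes are 5, 1, 8, 4.

```java
// Double hashing: the i-th probe is (hash1(x) + i * hash2(x)) mod tableSize.
// hash2(x) = R - (x mod R) with R prime and R < tableSize is an assumed choice.
class DoubleHashing {
    static int hash1(int x, int tableSize) { return Math.floorMod(x, tableSize); }

    // Returns a value in 1..r, never 0, so the probe sequence always moves.
    static int hash2(int x, int r) { return r - Math.floorMod(x, r); }

    static int[] probes(int x, int tableSize, int r, int count) {
        int[] seq = new int[count];
        int h1 = hash1(x, tableSize), h2 = hash2(x, r);
        for (int i = 0; i < count; i++) seq[i] = (h1 + i * h2) % tableSize; // f(i) = i * hash2
        return seq;
    }
}
```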
Level order traversal
Starting from the root of a tree, process all nodes at the same depth from left to right, then proceed to the nodes at the next depth
Height of AVL Trees
The height of an AVL tree storing n keys is O(log n)
Patricia Trie
Substitute a chain of one-child nodes with a single edge labeled with the string of characters along that chain
- Each non-leaf node except the root has at least 2 children
Hash Functions
Takes a large piece of data and reduces it to a smaller piece of data, usually an integer. There are different types of hash functions. Ex: take the character code of the 3rd letter of a name, divide by 6, and take the remainder.
Normally a 2-step process:
- Transform the key into an integer value
- Map the integer into a valid index in the hash table
Locations in a hash table are often called buckets
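The toy example above, sketched literally in Java (ToyNameHash is a hypothetical name; it assumes the name has at least 3 characters). It also shows why such a function is weak: "Sarah" and "Mary" share the 3rd letter 'r' (code 114), so both land in bucket 0.

```java
// Toy two-step hash: character code of the 3rd letter, then remainder mod 6,
// giving a bucket in 0..5. Deliberately weak; names are hypothetical.
class ToyNameHash {
    static int bucket(String name) {
        int code = name.charAt(2);   // step 1: key -> integer ('r' is 114)
        return code % 6;             // step 2: integer -> index 0..5
    }
}
```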
Edge
The link from one node to another
Height of a node
The maximum distance (path length) from this node to any leaf below it
- A leaf has a height of 0
- The "height of a tree" is the height of the root of the tree
Path length
The number of edges that must be traversed to get from one node to another
Search and Insertion in Tries
The search algorithm follows the path from the root toward a leaf, one character per level, and ends with the word either found or not found. Inserting a new string checks, level by level starting from the root, whether the current character already labels a branch at that level. If yes, it proceeds down that branch; if not, it inserts a new branch at that level
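A minimal trie sketch in Java (Trie is a hypothetical name; storing children in a HashMap per node, rather than an array of size k, is an assumption). A flag marks nodes where a stored word ends, so "car" can be found while "ca" is only a prefix.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal trie: each node maps a character to a child; a flag marks nodes
// where a stored word ends. The HashMap-per-node layout is an assumption.
class Trie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean endOfWord;
    }

    private final Node root = new Node();

    // Follow existing branches; create a new branch where a character is missing.
    void insert(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.endOfWord = true;
    }

    boolean contains(String word) {
        Node n = walk(word);
        return n != null && n.endOfWord;
    }

    boolean startsWith(String prefix) {   // does any stored word share this prefix?
        return walk(prefix) != null;
    }

    // Walk down one level per character; null means "not found".
    private Node walk(String s) {
        Node n = root;
        for (char c : s.toCharArray()) {
            n = n.children.get(c);
            if (n == null) return null;
        }
        return n;
    }
}
```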
Rotations
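These can be used to rebalance AVL trees. Outside cases require a single rotation; inside cases require a double rotation:
- Left-left case (outside: single right rotation)
- Right-right case (outside: single left rotation)
- Right-left case (inside: double rotation)
- Left-right case (inside: double rotation)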
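A hedged Java sketch of AVL insertion (AvlTree and its helpers are hypothetical names; ignoring duplicates is an assumption) that handles the four cases above using the balance factor, height of left subtree minus height of right subtree.

```java
// AVL insertion sketch for int keys: after each insert, recompute heights and
// fix any node whose balance factor (left height - right height) is +2 or -2.
class AvlTree {
    static final class Node {
        int key, height;          // height of a leaf is 0
        Node left, right;
        Node(int key) { this.key = key; }
    }

    Node root;

    static int height(Node n) { return n == null ? -1 : n.height; }

    static int balance(Node n) { return height(n.left) - height(n.right); }

    static void update(Node n) { n.height = 1 + Math.max(height(n.left), height(n.right)); }

    // Single right rotation: the left child b becomes the new subtree root.
    static Node rotateRight(Node c) {
        Node b = c.left;
        c.left = b.right;         // c takes b's right child as its left child
        b.right = c;
        update(c);
        update(b);
        return b;
    }

    // Single left rotation: the right child b becomes the new subtree root.
    static Node rotateLeft(Node a) {
        Node b = a.right;
        a.right = b.left;         // a takes b's left child as its right child
        b.left = a;
        update(a);
        update(b);
        return b;
    }

    void insert(int key) { root = insert(root, key); }

    private static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        else return n;                                               // duplicate: ignore
        update(n);
        int bf = balance(n);
        if (bf > 1) {                                                // left-heavy
            if (balance(n.left) < 0) n.left = rotateLeft(n.left);    // left-right: double
            return rotateRight(n);                                   // left-left: single
        }
        if (bf < -1) {                                               // right-heavy
            if (balance(n.right) > 0) n.right = rotateRight(n.right); // right-left: double
            return rotateLeft(n);                                    // right-right: single
        }
        return n;
    }
}
```

Inserting 1 through 7 in sorted order, which would build a chain in a plain BST, yields a perfectly balanced tree of height 2 rooted at 4.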
Siblings
Two nodes that have the same parent
Hash Tables
Uses a hash function to compute an index into an array, which identifies where the respective element is stored in the table. The locations where elements are stored are often called buckets
Drawbacks of Linear Probing
Works until the array is full, but as the number of items N approaches TableSize (as λ gets close to 1), access time approaches O(N)
Very prone to clustering:
- If a key hashes anywhere into a cluster, finding a free cell involves going through the entire cluster, and the insert makes the cluster grow
- Often the table ends up mostly empty except for a few large clusters, so keys are not distributed uniformly
Perfect hash functions
Yield a one-to-one mapping of keys to hash values
Left rotation (right-right case)
a
  \
    b
      \
        c
b becomes the new root. a takes ownership of b's left child as its right child (in this case, null). b takes ownership of a as its left child:
  b
 / \
a   c
Right Left Double rotation
a
  \
    c
   /
  b
First, perform a right rotation on the right subtree. After performing a rotation on our right subtree, we have prepared our root to be rotated left. Here is our tree now:
a
  \
    b
      \
        c
Looks like we're ready for a left rotation. Let's do that:
  b
 / \
a   c
Left right Double rotation
    c
   /
  a
   \
    b
First, make our left subtree left-heavy. We do this by performing a left rotation on our left subtree. Doing so leaves us with this situation:
        c
      /
    b
  /
a
This is a tree which can now be balanced using a single right rotation. We can now perform our right rotation rooted at c. The result:
  b
 / \
a   c
Right rotation (left-left case)
        c
      /
    b
  /
a
b becomes the new root. c takes ownership of b's right child as its left child (in this case, null). b takes ownership of c as its right child:
  b
 / \
a   c
Hash Tables in Java
hashCode is a method in Java (defined on Object)
hashCode and equals:
- If two objects are equal according to the equals(Object) method, then calling hashCode on each of the two objects must produce the same integer result
- If a class overrides equals, it should also override hashCode
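A sketch of a class that follows the contract (GridPoint is a hypothetical class). java.util.Objects.hash builds the hash code from the same fields that equals compares, so equal points always get equal hash codes and work correctly as keys in a HashSet or HashMap.

```java
import java.util.Objects;

// A value class that overrides equals and hashCode together, so equal objects
// always produce the same hash code. The class is hypothetical.
final class GridPoint {
    private final int x, y;

    GridPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    // Built from the same fields equals() compares.
    @Override
    public int hashCode() { return Objects.hash(x, y); }
}
```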
Load factor of Hash Table
Denoted by the symbol λ. The ratio of the number of values in the hash table to the table size (λ = N / TableSize). A way to measure the efficiency of hash table implementations