CS310 Final


Null Pointer Exception

Occurs when you dereference a node that does not exist, that is, when you use the dot operator on a variable of type Node that doesn't point to a valid node. It always occurs because the dot operator is used on a reference that doesn't point to anything (null). ***You have to carefully consider all possible cases for the operation to be performed.

Arrays vs Linked Lists

Array: faster performance, requires less memory, BUT has a fixed size. Linked List: Unlimited Capacity, BUT slower performance and requires extra memory for links.

Chaining

Chaining provides a BETTER solution to the collision problem for tables that require many insertions and deletions. Simple to implement. We create an array of linked lists, and insert each data item into the linked list at the index it hashes to. Thus if we have a collision, we store multiple items in the linked list at that index. No need to create ordered LL for your implementation. SEPARATE CHAINING = OPEN HASHING
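The description above can be sketched as a small table of String keys. This is only an illustration: the class name ChainedTable and its methods are hypothetical, java.util.LinkedList stands in for the course's own linked list, and the sign-bit mask is one common way to turn a hash code into a valid index.

```java
import java.util.LinkedList;

// Hypothetical minimal chained hash table of String keys (illustrative only).
class ChainedTable {
    private final LinkedList<String>[] list; // one linked list (chain) per index

    @SuppressWarnings("unchecked")
    ChainedTable(int tableSize) {
        list = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++)
            list[i] = new LinkedList<>();
    }

    // Map the hash code to a valid index; the mask clears the sign bit.
    private int getIndex(String key) {
        return (key.hashCode() & 0x7FFFFFFF) % list.length;
    }

    boolean add(String key) {
        LinkedList<String> chain = list[getIndex(key)];
        if (chain.contains(key))
            return false;      // no duplicate keys
        chain.addFirst(key);   // chain is unordered, so insert at the front
        return true;
    }

    boolean contains(String key) {
        return list[getIndex(key)].contains(key);
    }

    boolean delete(String key) {
        return list[getIndex(key)].remove(key);
    }
}
```

With a table size of 1 every key collides, and all items simply share the single chain.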

GetKey (HashTable)

O(n)
public K getKey(V value) {
    // have to traverse the entire HT and find the value equal to n.value
    for(int i = 0; i < tableSize; i++)
        for(DictionaryNode n : list[i])
            if(((Comparable<V>)value).compareTo((V)n.value) == 0)
                return (K)n.key;
    return null;
}

Graphs of complexity algorithms

O(n) : everything on line n or under (takes less time) o(n) : everything under line n Θ(n) : only on line n Ω(n) : everything on or over line n (takes more time) ω(n) : everything over line n

Slow Algorithms

O(n^2) algorithms (bubble, selection, insertion) have an outer loop that runs n times, and an inner loop that runs n-1 times. They are easy to code but also the worst performers.

Selection Sort

Best/Avg/Worst: O(n^2) Stable? No In Place? Yes
How it works: Find the largest element in the array. Swap this element with the element in last place. Then find the largest element in the section of the array minus the end. Swap that max element with the next-to-last one. Continue this process. With each pass of the loop, the largest element in the unsorted section is swapped with the element in last place in the unsorted partition. The unsorted partition becomes one smaller and the sorted partition one larger.
Characteristics: Eliminates extra swaps from bubble sort. Still a poor performer.
Example: 19-2-23-16-33-4-27-1
Process:
1. Scans down the array looking for the largest element: 33. Best so far: 19 -> 23 -> 33. Index: [0] -> [2] -> [4]
2. Swaps 33 with the 1: 19-2-23-16-1-4-27-33
After 1st Pass: 19-2-23-16-1-4-27-33
After 2nd Pass: 19-2-23-16-1-4-27-33 (nothing happened, 27 was already in place)
After 3rd Pass: 19-2-4-16-1-23-27-33
Still n(n-1)/2 behavior = O(n^2)

Heap Sort

Best/Avg/Worst: O(nlogn) Stable? No In Place? Yes
How it works: First the array must be turned into a heap: starting at index[1] (NOT ZERO), each element is trickled up to satisfy the heap ordering property. Then the max element is repeatedly removed from the root position at index[0] and placed in the space freed at the end of the array.
Characteristics: Must be max heap.
Example: 50-42-13-33-7
Heap (drawn level by level, like a binary tree): 50 / 42, 13 / 33, 7
Process:
1. Remove the 50 (max element); the 7 replaces it. Heap: 7 / 42, 13 / 33. The 7 trickles down to the bottom and the next largest element (42) moves to the root.
2. Remove the next max, 42; the 7 replaces it. Heap: 7 / 33, 13. The 7 trickles down and 33 moves to the top.
After 1st Pass: 42-33-13-7-------50
After 2nd Pass: 33-7-13-------42-50

Complexity

Complexity analysis refers to how much time an algorithm may take to complete as the input size grows. When we talk about fast or slow algorithms, we're referring to the rate of growth as the input size increases. Thus, faster rate of growth means slower algorithm. Linear Behavior: doing twice as much takes twice as long

Hash Iterators

Disadvantage of Hash Tables: data is not ordered by the key, but rather by the hashcode. The tables don't store things in sorted order. The keyIterator must make a copy of the data and sort it on the keys. You should use a private sort method in your iterator class.

Suppose a programmer has to write a spell checker application. The words in the dictionary will be stored in a hashtable. For the hash method, the programmer decides to sum the ascii codes of the characters for each word. He reasons that since each word is distinct (no dups), this method will generate a unique hash for each word. Is his reasoning correct?

No. This is a bad idea. Many words will generate the same hash code. For instance, "ate" and "eat" generate the same code because they contain the same letters.
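A quick check of that claim, as a minimal sketch. The class and method names are hypothetical; the method just sums the character codes as the question describes.

```java
// Hypothetical helper that hashes a word by summing its character codes.
class AsciiSumHash {
    static int hash(String word) {
        int sum = 0;
        for (int i = 0; i < word.length(); i++)
            sum += word.charAt(i); // char is widened to its numeric code
        return sum;
    }

    public static void main(String[] args) {
        // Anagrams contain the same characters, so their sums are identical.
        System.out.println(hash("ate")); // 97 + 116 + 101 = 314
        System.out.println(hash("eat")); // 101 + 97 + 116 = 314
    }
}
```

Any two anagrams collide under this method, which is why positional information (for example multiplying by a constant before adding each character) is normally mixed in.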

Big Oh (O)

O(g) is the set of functions f such that for some constant c>0 and some n0: f(n) <= c g(n) for all n>n0 f=O(n) : F grows at a rate equal to or less than n. It may never grow at a rate faster than n, but may grow slower than n. THE ALGORITHM IS N OR FASTER.

Open/Closed Addressing

Open addressing, or closed hashing, is a method of collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternate locations in the array (the probe sequence) until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table.
Open hashing defines each slot in the hash table to be the head of a linked list. All records that hash to a particular slot are placed on that slot's linked list.

Hashtables do not keep data ordered by keys. This makes some operations more difficult. Suppose that you are required to write the following method for a hashtable that uses chaining public Object [] getRange(String first, String last) This method returns an ordered array of all keys in the table that fall within the range first..last inclusive. Your method must be as efficient in terms of time and storage complexity as possible. Describe how you might implement the method.

This can be done with the table's key iterator, though that is not the most efficient way, because the iterator copies and sorts every key in the table. The most expensive part of the process is the sort. Therefore, the most efficient way is to go through the table, collect only the keys that fall within the required range, and then sort just those.
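The collect-then-sort approach might look like the following sketch. It assumes the table's chains are available as a list of java.util.LinkedList<String> objects; that parameter, and the class name RangeQuery, are hypothetical stand-ins for the table's internal array.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch: range query over the chains of a chained hash table.
class RangeQuery {
    static Object[] getRange(List<LinkedList<String>> chains, String first, String last) {
        List<String> hits = new ArrayList<>();
        // One pass over every chain: keep only the keys inside first..last.
        for (LinkedList<String> chain : chains)
            for (String key : chain)
                if (key.compareTo(first) >= 0 && key.compareTo(last) <= 0)
                    hits.add(key);
        // Sort only the keys that matched, not the whole table.
        Collections.sort(hits);
        return hits.toArray();
    }
}
```

If k keys fall in the range, the cost is O(n) for the scan plus O(k log k) for the sort, and only k keys of extra storage are needed.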

Priority Queues are almost always implemented with an ordered data structure. Why?

To speed up dequeue operations. If the data structure is unordered, then at dequeue time the element to be removed must first be found. With both arrays and linked lists that are unordered, this is an O(n) operation. Ordering the data moves that cost to the insert, which is the better trade: if a job is going into a priority queue, it will have to wait a certain amount of time in any case, so a slow insert is not really an issue. However, when a job is removed from the priority queue, it should run at once. A delay on dequeue is usually unacceptable.

Heap insert/remove

referred to as trickle up/trickle down. Max number of comparisons in a trickle down operation is # of levels - 1.

Double hashing

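A second hash function is used. The key is transformed by the first hash function into an index. If that slot is taken, the second hash function produces a step size, and the probe sequence advances by that step each time. Because keys that collide at the same index usually get different step sizes, this minimizes clustering.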
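As a sketch of how a probe sequence can be generated: the version below applies the second hash to the key itself, which is one common variant and an assumption here, not necessarily the course's method. The step formula prime - (key mod prime) and all names are hypothetical; the step is never zero, so the probe always moves.

```java
// Hypothetical sketch of double hashing probe sequences for int keys.
class DoubleHashing {
    // Second hash: a step size in the range 1..prime, never zero.
    static int step(int key, int prime) {
        return prime - Math.floorMod(key, prime);
    }

    // Returns the first `count` indices probed for `key`.
    static int[] probeSequence(int key, int tableSize, int prime, int count) {
        int[] seq = new int[count];
        int index = Math.floorMod(key, tableSize); // first hash: the home index
        int step = step(key, prime);
        for (int i = 0; i < count; i++) {
            seq[i] = index;
            index = (index + step) % tableSize;    // advance by the key's own step
        }
        return seq;
    }
}
```

With tableSize 11 and prime 7, keys 14 and 25 both start at index 3 but then diverge (3, 10, 6, 2 versus 3, 6, 9, 1), which is the behavior that breaks up clusters.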

Theta (Θ)

Θ(g) is the set of functions f such that for some constants c1>0, c2>0 and some n0: c1 g(n) <= f(n) <= c2 g(n) for all n>n0 f=Θ(n) : f grows at a rate exactly equal to n. It does not grow more slowly, nor faster than n. IT GROWS AT EXACTLY THE SAME RATE AS N.

N0 (n subzero)

the point far enough from 0 where from that point, behavior is consistent

Visit

to use a reference to the node to do something with it. This term is generally used in conjunction with iterating over the tree

Max Number of nodes in an AVL tree

T(n) = 2^(height+1) - 1

Analyzing an algorithm (Rules)

1. FASTER RATE OF GROWTH MEANS SLOWER ALGORITHM. 2. The equal sign indicates that a function belongs to a certain complexity class. Indicating set membership. 3. Discard all coefficients and lower order terms. 4. All logs grow at the same rate, regardless of the base 5. All exponential functions (n in the exponent) grow at a faster rate than any polynomial function 6. O(f) + O(g) = O(max {f,g}) Ex: O(n^2) + O(n) = O(n^2) 7. O(f) x O(g) = O(fxg)

Powers of 2

2----4----8----16----32----64----128----256----512----1024----2048----4096

Linked Lists

A collection of nodes stored on the heap, each of which contains a data element and a reference to the next node in the collection. The address field of the final node contains a null value, which signals the end of the list. Access to the data in the list is done by sequentially traversing the nodes in the list, one at a time. Addresses are addresses on the heap; the nodes are not in any specific order on the heap. (Addresses are written in hexadecimal.) What it looks like: (head) [-] [-] [-] [-]

Leaf

A node at the bottom of the tree - a node that has zero children

Node (BST)

A wrapper class that holds 2 references, one to the left subtree and the other to the right subtree. The node also contains data.

A certain linear algorithm takes 2 seconds to process an input of size n=1000 on an old desktop PC. A) How long would it take if n=2000? B) How long would it take for n=2000 if the algorithm were O(n^2) instead of linear? C) How long would it take for n=2000 if the algorithm were O(n^3) instead of linear?

A) Linear: doubling n doubles the time. 2*2 = 4 seconds B) O(n^2): doubling n multiplies the time by 2^2 = 4. 2*4 = 8 seconds C) O(n^3): doubling n multiplies the time by 2^3 = 8. 2*8 = 16 seconds

Both Hashtables and balanced trees are often used for Dictionary ADT structures. What are the advantages and disadvantages of each for this purpose?

Advantages Hash Table: O(1) performance for search, insert, and delete. Balanced Tree: Data is ordered by key Disadvantages Hash Table: Data is not stored in key order Balanced Tree: Slower than hash tables

Complete tree

All levels in the tree are full. All leaf nodes are at the same level, and all nodes have zero or 2 children

AVL Tree

Balanced Condition: For each node in the tree, the heights of the left and right subtrees may differ by no more than one. After each insertion: must compare left height vs right height of each node. After insertion or deletion, we must backtrack to look for a node that violates the balanced condition.

Expected Runtime

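Time = C * X, where X = n or logn or n^2 etc. Find the X and C that fit the measured times.
Ex: n = 1, 2, 4, 8, 16
Time = 2, 8, 32, 128, 512
Doubling n multiplies the time by 4, so X = n^2 and C = 2.
Check: 2(2^2) = 8, 2(4^2) = 32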

Insert (HashTable)

Check if it is full, or if the hash table already contains it. public boolean add(K key, V value) { if(isFull()) return false; if(contains(key)) return false; DictionaryNode<K,V> newNode = new DictionaryNode<K,V>(key,value); //make new dictionary node to insert list[getIndex(key)].insert(newNode); //calling getIndex(key) to give the index associated with the //key being passed in, and then calling insert method in my //LL to insert the new node modCounter++; currentSize++; return true; }

Subtree

Each node in the tree may be thought of as a tree itself

Process of analyzing an algorithm

Examine the code and identify the amount of work done in the terms of input size. Look for loops where the stopping condition of the loop changes as the input size changes

Limits (Rules)

Take the limit of f/g as n goes to infinity.
lim f/g = 0: g grows at a faster rate than f. g > f (g is the slower algorithm).
lim f/g = c (a nonzero constant): f and g grow at the same rate. g = f.
lim f/g = infinity: f grows at a faster rate than g. f > g (f is the slower algorithm).

Complexity Classes (in order)

FASTER(less time)---1---logn---sqrt(n)---n---nlogn---n^2---n^3---n^c---c^n---n!---SLOWER(more time)

Queue

FIFO behavior, like a line at a store. when dequeue() operation is performed, item that has been in the queue the longest is removed and returned. For a singly LL, use both a head and tail pointer, then insert at the tail and remove from the head. YOU CANNOT REMOVE FROM THE TAIL OF A SINGLY LL. But you CAN insert at the tail. For arrays, create a logical front and rear index that moves as elements are enqueued and dequeued, so that the front is at any index instead of just storage[0].

What is a hash function? Describe at least three commonly used hash methods.

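A hash function transforms a key into an integer, which is then mapped to an index in the table. Commonly used methods: folding, multiplication, addition, xor, exponentiation, shifting.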

Because deletion from a balanced binary tree can be computationally intensive, sometimes lazy deletion is used. Describe this technique.

Nodes are marked as deleted, but not removed from the tree.

Priority Queues are often implemented using binary heaps. What advantage does this have over an ordered array implementation?

Heaps provide O(log n) enqueue and dequeue operations. With arrays, either the enqueue or dequeue operation must be O(n), depending upon whether or not an ordered array is used. This is because of the shifting that must be done after an insertion or deletion.

inner node

Not a leaf node. A node with one or 2 children

If a hashtable requires integer keys, what hash algorithm would you choose? Write Java code for your hash algorithm.

If the key can be any integer, not some range of integers, then the key itself may be used as the hash code. The following code is from the Java API for class Integer: private final int value; ... public int hashCode() { return value; }

To sort an array in almost sorted order, which algorithm would you choose? Why?

Insertion sort. It is O(n) when the array is already sorted.

Height

The number of edges in the longest path from root to leaf. The text defines this term as the number of nodes rather than edges. Counted in edges, it is the number of levels - 1.

Level

The number of levels is the height+1

If you were required to write an in place HeapSort algorithm, would you use a min heap or a max heap? Explain your choice.

Max heap. If you remove MAX from the heap, then it goes at the end of the array. As the heap shrinks as elements are removed, this creates a space at the end of the array.
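A minimal sketch of that idea for an int array. The names are hypothetical, and the heap is built here by trickling down from the last parent, which is one standard way to build it and not necessarily the one used in the course.

```java
// Hypothetical in-place heap sort using a max heap stored in the array itself.
class HeapSortSketch {
    static void heapSort(int[] a) {
        int n = a.length;
        // Build a max heap: trickle down every node that has children.
        for (int i = n / 2 - 1; i >= 0; i--)
            trickleDown(a, i, n);
        // Repeatedly move the max (root) into the space freed at the end.
        for (int end = n - 1; end > 0; end--) {
            int tmp = a[0];
            a[0] = a[end];
            a[end] = tmp;
            trickleDown(a, 0, end); // heap now occupies a[0..end-1]
        }
    }

    // Restore the max heap property for the subtree rooted at i.
    static void trickleDown(int[] a, int i, int size) {
        while (true) {
            int left = 2 * i + 1, right = left + 1, largest = i;
            if (left < size && a[left] > a[largest]) largest = left;
            if (right < size && a[right] > a[largest]) largest = right;
            if (largest == i) return;
            int tmp = a[i];
            a[i] = a[largest];
            a[largest] = tmp;
            i = largest;
        }
    }
}
```

Because each removed max lands in the slot the shrinking heap just gave up, the array ends up in ascending order with no auxiliary storage.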

Binary Heaps

Max heap: root is maximum number Min heap: root is minimum number Heaps most often used to implement priority queues

If you were required to sort a very large file that would not fit in memory, what algorithm would you choose? Describe the steps you would perform to sort the file.

Merge sort. Read from the file as many elements as will fit in memory. Sort the elements and then write to a temporary file Repeat until the whole file has been processed. Merge the temporary files back and overwrite the original file.

Red/Black Tree

Offers advantages over AVL trees: 1. Only a single additional field, the color, must be stored in each node. 2. We do not need to backtrack and retrace the insertion path, and do not have to update height info. 3. Fewer rotations required.
RULES: 1. Every node is either black or red. 2. Root node is always black. 3. New insertions are always colored red. 4. No path can have 2 red nodes in a row. 5. Null children are always black.
AFTER EACH INSERTION: check to make sure every path has the same number of black nodes.
CONDITIONS:
Rotation: if it has a black aunt, then rotate. After the rotation the parent is black and its two children are red (B over R, R). Always rotate the grandparent node.
ColorFlip: if it has a red aunt, then color flip. After the flip the parent is red and its two children are black (R over B, B).

Although they are difficult to code, red/black trees are often the implementation of choice for balanced trees. Why are red/black trees chosen instead of AVL trees?

Red/Black trees have a looser balanced condition which means that fewer adjustments to the structure are required, while still preserving O(log n) behavior. AVL trees also have more overhead. The left and right height for each node must either be stored in the node, and maintained, or calculated on the fly for each node along the insertion path.

Min Number of nodes in an AVL tree

S(h) = S(h-1) + S(h-2) + 1 S(0) = 1 S(1) = 2
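The recurrence can be evaluated iteratively, as in this small sketch (class and method names are hypothetical):

```java
// Hypothetical helper: minimum number of nodes in an AVL tree of height h,
// using S(h) = S(h-1) + S(h-2) + 1 with S(0) = 1 and S(1) = 2.
class AvlMinNodes {
    static long minNodes(int h) {
        if (h == 0) return 1;
        if (h == 1) return 2;
        long prev2 = 1; // S(h-2)
        long prev1 = 2; // S(h-1)
        for (int i = 2; i <= h; i++) {
            long cur = prev1 + prev2 + 1;
            prev2 = prev1;
            prev1 = cur;
        }
        return prev1;
    }
}
```

The first few values are S(2) = 4, S(3) = 7, S(4) = 12, S(5) = 20. The sequence grows like the Fibonacci numbers, which is why the height of an AVL tree stays O(log n).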

List the structural and ordering properties for red/black trees.

The Red/Black tree is a balanced binary search tree. Each node in the tree contains zero, one or two children. For each node n in the tree, all entries to the left of n must be smaller than n, and all entries to the right must be larger than n. The balanced condition of the tree is dictated by the following rules:
● Every node is colored either red or black.
● The root node is always black.
● New insertions (except for root) are always colored red.
● Every path from root to leaf must contain exactly the same number of black nodes.
● No path can have two red nodes in a row. That is, if a node is red, it cannot have a red child or a red parent.
● Null children are always black.
A red violation (two reds in a row along the insertion path) indicates that the tree is out of balance and must be adjusted. Adjustments are always made to the grandparent of the node that caused the violation according to the following two rules:
● If the node that caused the violation has a red aunt, then perform a color flip.
● If the node that caused the violation has a black aunt, then rotate.

Parent

The ancestor node directly above a given node (one edge up)

Child

The descendant node directly below a given node (one edge down)

Balanced Tree

The height of the tree grows at a rate O(logn). So the best/avg/worst case for search/insert/delete will always be O(logn). In every case, after an insertion or deletion the state of the tree's balance is checked, and the tree is transformed in some way if the balanced condition is violated.

Edge

The link or connection between a parent and child node.

Path

The sequence of edges between 2 nodes

Root

The top node in the tree, similar to the head node in the linked list. This is the single entry point to the tree.

If a queue is implemented using a singly linked list with a head and tail pointer, you should always insert at the tail and remove from the head. Explain why this is so.

You cannot remove from the tail of a singly linked list, even with a tail pointer, in O(1) time. Because the links only go one way, you must iterate over the entire list to get a reference to the next-to-last node in order to do the deletion. This operation is O(n). If you remove from the head, and insert at the tail, then both operations are O(1).
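A minimal sketch of such a queue, with hypothetical names. Both operations touch only the head or tail reference, so neither needs to traverse the list.

```java
// Hypothetical singly linked queue: insert at the tail, remove from the head.
class LinkedQueue<E> {
    private static class Node<E> {
        E data;
        Node<E> next;
        Node(E data) { this.data = data; }
    }

    private Node<E> head, tail;

    // O(1): the tail pointer gives direct access to the last node.
    void enqueue(E obj) {
        Node<E> n = new Node<>(obj);
        if (tail == null) {
            head = tail = n;   // first element in an empty queue
        } else {
            tail.next = n;
            tail = n;
        }
    }

    // O(1): removing the head only needs head.next.
    E dequeue() {
        if (head == null)
            return null;       // empty queue
        E data = head.data;
        head = head.next;
        if (head == null)
            tail = null;       // queue became empty, so reset the tail too
        return data;
    }
}
```

Removing from the tail instead would require a reference to the next-to-last node, which a singly linked list can only reach by walking from the head.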

Recursion

a method where the solution to a problem depends on solutions to smaller instances of the same problem (as opposed to iteration)

Priority Queue

An abstract data type which is like a regular queue or stack data structure, but where additionally each element has a "priority" associated with it. In a priority queue, an element with high priority is served before an element with low priority. With an ordered array or linked list, removal is always O(1) and insertion is always O(n).

Red/Black trees require that some additional information, beyond what is required for standard binary trees, be stored in the Node class.

class RBNode<K,V> {
    private K key;
    private V value;
    private RBNode<K,V> leftChild;
    private RBNode<K,V> rightChild;
    private boolean isRed; // the additional field: the node's color
    public RBNode(K k, V v) {
        key = k;
        value = v;
        isRed = true; // new insertions are colored red
    }
}

Push/Pop (syntax)

class Stack<E> {
    private int head;
    private int maxSize;
    private Object [] stack;
    public Stack(int size) {
        maxSize = size;
        head = -1;
        stack = new Object[maxSize];
    }
    public void push(E data) {
        // first check if the stack is full -> throw new RTE
        if(head == maxSize-1)
            throw new RuntimeException("Error, full stack");
        // then head+1 gets data
        stack[++head] = data;
    }
    @SuppressWarnings("unchecked")
    public E pop() {
        if(head == -1)
            throw new RuntimeException("Error, cannot pop off of an empty stack");
        return (E) stack[head--]; // cast needed because the storage is Object[]
    }
}

Derivatives

d/dn (ln n) = 1/n d/dn (n^2) = 2n

Logs

logbase2(n) = ln(n)/ln(2) Approximate values: logbase2(1,000,000) ≈ 20 logbase2(2,000,000) ≈ 21 logbase2(4,000,000) ≈ 22 logbase2(8,000,000) ≈ 23 logbase2(1 bil) ≈ 30

Little Oh (o)

o(g) is the set of functions f such that for every constant c>0 there is some n0 with: f(n) < c g(n) for all n>n0 f=o(n) : f grows at a rate strictly less than n. It may not be equal to n. THIS ALGORITHM IS FASTER THAN N.

getKey (BST)

public K getKey(V value) { tmp = null; findValue(value,root); return tmp; } private void findValue(V value, Node<K,V> n) { if(n == null) return; findValue(value,n.left); if(((Comparable<V>)value).compareTo(n.value) == 0) if(tmp == null) tmp = n.key; findValue(value,n.right); }

GetValue (BST)

public V getValue(K key) { return findKey(key,root); } private V findKey(K key, Node<K,V> n) { if(n == null) return null; int comp = ((Comparable<K>)key).compareTo(n.key); if(comp < 0) return findKey(key,n.left); else if(comp > 0) return findKey(key,n.right); else return (V) n.value; }

Rewrite the following Java code for InsertionSort so that it will sort any Object that implements the Comparable interface: public static int[] insertionSort(int array[]) { int [] n = array; int in, out, temp; for(out = 1; out < n.length; out++) { temp = n[out]; in = out; while(in > 0 && n[in-1] >= temp) { n[in] = n[in-1]; in--; } n[in] = temp; } return n; }

public static <E> E[] insertionSort(E[] array) {
    E[] on = array;
    int in, out;
    E temp;
    for(out = 1; out < on.length; out++) {
        temp = on[out];
        in = out;
        // was: while(in > 0 && on[in-1] > temp)
        while(in > 0 && ((Comparable<E>)on[in-1]).compareTo(temp) > 0) {
            on[in] = on[in-1];
            in--;
        }
        on[in] = temp;
    }
    return on;
}

deleteFirst (LL)

public void deleteFirst() {
    if(head == null) // checking if it's empty
        return;
    head = head.next;
}

Load Factor

λ (lambda): the fraction of the table that is occupied. As the table load factor approaches 1, the probe time increases.
Linear Probing, expected number of probes as a function of λ:
Unsuccessful search: (1/2)(1 + 1/(1-λ)^2)
Successful search: (1/2)(1 + 1/(1-λ))

Reverse Nodes in the list

// Reverses the order of the nodes in the list, i.e. if the list contains:
// HEAD->A->B->C->D
// then the method modifies the list so that it becomes:
// HEAD->D->C->B->A
// The method does not create a new list, but reverses the nodes by manipulating the links in the existing list.
public void reverseList() {
    Node tmp = null, previous = null, current = head;
    while(current != null) {
        tmp = current.next;
        current.next = previous;
        previous = current;
        current = tmp;
    }
    head = previous;
}

DeleteLastInstance (LL)

// deletes the last instance of the key in the list
public void deleteLastInstance(int key) { // INTEGER
    Node previous = null, current = head;
    Node previousWhere = null, where = null;
    while(current != null) {
        if(current.data == key) {
            previousWhere = previous;
            where = current;
        }
        previous = current;
        current = current.next;
    } // end while
    if(where == null) // not found or empty list
        return;
    if(where == head) // deleting head
        head = head.next;
    else
        previousWhere.next = where.next; // deleting where
}

deleteLast (LL)

// removes the last element in the list (there is no tail pointer).
public void deleteLast() {
    if(head == null) // empty list
        return;
    Node previous = null, current = head;
    while(current.next != null) { // finding last element
        previous = current;
        current = current.next;
    }
    if(previous == null) // only one item in the list
        head = null;
    else
        previous.next = current.next; // current.next is null, so previous becomes the last node
}

Node

A node has 2 fields: the data element, and the address of the next element in the list. Looks like: [-] Top has the data, bottom has the address of the next node. head = A head.next = The next field in the A node [-] which is B (bottom part) head.data = field 'data' in the first Node head.next.next = the field 'next' in the second node [-] bottom part of node B that is pointing to C head.next.next.data : The node is head.next.next (Node C), The data field is the data in node C All Node<E> classes should be inner classes.

Radix Sort

Best/Avg/Worst: O(n) Stable? Yes In Place? No
How it works: Splits ints into individual digits and places them in the auxiliary arrays based on the individual digit. Begins with the least significant digit and finishes with the most significant. OR create buckets (0-9) and put each number in a bucket, then take them out of the buckets and put them back in the array.
Characteristics: Linear algorithm. Requires twice as much auxiliary storage as the original array. It can only be used on character and int arrays. Its use predates computers. Linear, but usually slower than O(nlogn) algs. Stored in the computer in binary.
Example: 207-412-993-621-047-336-802-116
Process:
1. Make buckets and insert numbers based on last digit. Put back in array in that order.
0:
1: 621
2: 412 802
3: 993
4:
5:
6: 336 116
7: 207 047
8:
9:
2. Then again based on 2nd digit, then 3rd.
After 1st Pass: 621-412-802-993-336-116-207-047
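The bucket version described above can be sketched as follows for non-negative ints. The class name is hypothetical, and java.util lists are used as the buckets purely for brevity; an array-based version would follow the same steps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical least-significant-digit radix sort for non-negative ints.
class RadixSortSketch {
    static void radixSort(int[] a) {
        int max = 0;
        for (int v : a)
            max = Math.max(max, v);
        // Ten buckets, one per decimal digit 0-9.
        List<List<Integer>> buckets = new ArrayList<>();
        for (int d = 0; d < 10; d++)
            buckets.add(new ArrayList<>());
        // One pass per digit: ones, tens, hundreds, ...
        for (long place = 1; max / place > 0; place *= 10) {
            for (int v : a)
                buckets.get((int) ((v / place) % 10)).add(v);
            // Copy back in bucket order; insertion order inside a bucket
            // is preserved, which is what keeps the sort stable.
            int i = 0;
            for (List<Integer> bucket : buckets) {
                for (int v : bucket)
                    a[i++] = v;
                bucket.clear();
            }
        }
    }
}
```

On the card's example, the array after the first pass (last digit) is 621-412-802-993-336-116-207-047, and two more passes leave it fully sorted.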

Bubble Sort

Best/Avg/Worst: O(n^2) Stable? Yes In Place? Yes How it works: This algorithm starts at the beginning of the array, and compares the first 2 elements. If they are out of order(the first element is larger than the second in ascending order) then the 2 elements are swapped. Then the alg shifts down to the next pair and compares them, swapping if they are out of order. Once the algorithm has reached the end of the array, the largest element will be in last position. With each pass of the loop, the largest element remaining in the unsorted section bubbles down to the end of the array. Characteristics: Easy to code, but WORST performer, should never be used. Sorted Partition of the array increases by 1 every time. Example: 19-2-23-16-33-4-27-1 Process: 1. Compares 19 & 2, swaps them 2-19-23-16-33-4-27-1 2. Compares 19 & 23, no swaps 2-19-23-16-33-4-27-1 3. Compares 23 & 16, swaps them 2-19-16-23-33-4-27-1 etc After 1st Pass: 2-19-16-23-4-27-1-33 Took 7 comparisons, then will take 6, then 5...etc n(n-1)/2 comparisons = O(n^2)

Merge Sort

Best/Avg/Worst: O(nlogn) Stable? Yes In Place? No
How it works: Divide and conquer algorithm. It doesn't actually sort, but merges already sorted partitions. Sorted lists can be merged in O(n) time. Partitions the array into n logical sections of size=1, each of which is considered sorted. These sections are merged with their neighbors to form partitions of size=2, then 4, etc. till sorted.
Characteristics: Fast algorithm, no undesirable behaviors, except that it requires auxiliary storage. The only fast sorting alg that is stable. An aux array of the same size as the original is required, thus not in place. Best choice for large arrays that will not fit in memory. Recursive algorithm.
Example: 19-2-23-16-33-4-27-1
Process:
1. Breaks up into smaller arrays until we have: [19] [2] [23] [16] [33] [4] [27] [1]
2. Now merges each array with its neighbor, inserting in correct order (compares 2 and 19, inserts 2, then whatever is left): [2-19] [16-23] [4-33] [1-27]
3. Then merges again: compares 2 and 16, 2 is smaller so takes 2; then compares 16 with 19, 16 is smaller so takes 16; then compares 19 and 23, takes 19; 23 comes automatically.

Quick Sort

Best/Avg: O(nlogn) Worst: O(n^2) Stable? No In Place? Yes
How it works: Recursively partitions the array into smaller pieces like Merge Sort. Chooses a pivot (usually the last element), and the elements are rearranged so that everything left of the pivot is smaller and everything right of it is larger, by swapping elements that are in the wrong area.
Characteristics: One of the fastest sorting algorithms, recursive algorithm. Fastest sorting alg available based on timing tests, but tricky to code. Worst case behavior: already sorted array. Can fix by swapping the rightmost element with the middle element, then choosing the pivot. Must fix worst case behavior or it is likely to crash.
Example: 24-93-50-82-16-37-21-89-46
Process:
1. Pivot = 46, pointers at 24 and 89
2. Is 24 < pivot? Yes, doesn't move, pointer moves to 93
3. Is 89 > pivot? Yes, doesn't move, pointer moves to 21
4. Is 93 < pivot? No, now must find element on right side to swap
5. Is 21 > pivot? No, swap 93 with 21
6. Go until pointers meet in the middle, then swap pivot with first element on the right side (46 with 82)
7. Now pivot on left side: 16, pivot on right side: 82
8. Sort each half
Original: 24-93-50-82-16-37-21-89-46
After 1st Swap: 24-21-50-82-16-37-93-89-46

Insertion Sort

Best: O(n) Avg/Worst: O(n^2) Stable? Yes In Place? Yes
How it works: The array is partitioned into a sorted and an unsorted section. With each pass of the outer loop, the sorted partition will grow by one and the unsorted will shrink by 1. At the beginning, the sorted section is the 1st element at index [0] and the unsorted is the rest of the array. Begin with index [1], copy the element at the current index into a tmp variable, then shift elements in the sorted section over to the right to make room for tmp. We stop when we arrive at where the current element will go.
Characteristics: Best of the O(n^2) algorithms. Easy to code. Alg of choice when the number of elements to be sorted is small (arrays with <1000 elements). Minimizes the number of comparisons, because if the array is already sorted the inner loop does not run, making it O(n). Modest improvement: adding binary search to find the final position of the element to be inserted.
Example: 19-2-23-16-33-4-27-1
Process:
1. Start at index 1, tmp is 2.
2. Copy 2 and override it: 19-19-23-16-33-4-27-1
3. Shift things down till we find where it belongs
After 1st Pass: 2-19-23-16-33-4-27-1 Sorted: 2-19 Unsorted: 23-16-33-4-27-1
After 2nd Pass: 2-19-23-16-33-4-27-1 Was already in sorted order - didn't go into inner loop
After 3rd Pass: 2-16-19-23-33-4-27-1 Sorted: 2-16-19-23 Unsorted: 33-4-27-1

Shell Sort

Best: O(n^(7/6)) Avg: O(n^(5/4)) Worst: O(n^(3/2)) Stable? No In Place? Yes
How it works: Uses Insertion sort to repeatedly sort small sections of the array. These small sections span the entire array and are comprised of every nth element. The distance between elements in these logical arrays shrinks as the sort progresses. This has the effect of quickly moving elements into a position close to where they will eventually be in the final sorted array.
Characteristics: Variant of Insertion Sort capitalizing on the best case behavior of that alg. This alg attempts to preprocess the array to get the elements in almost sorted order, then uses insertion sort because of its O(n) behavior. Performance acceptable even for large arrays.
Example: 25-72-16-8-34-99-17-26-53-62-87-55-40-82-14-29-70
Process:
1. Calculate h value to get the gap between bands: while(h < size/3) h=h*3+1 Gaps: 1,4,13,40 Size: 17, 17/3=5, while 4<5 -> start with 13, which only gives 1 element.. so go back to 4
2. Every 4th element: 8,26,55,29 Sort: 8,26,29,55
3. Now gaps move 1 to the right: 34,53,40,70
4. After gap reaches end, now increments of 1 -> use insertion sort.
After 1st Pass: 25-72-16-8*-34-99-17-26*-53-62-87-29*-40-82-14-55*-70
After 2nd Pass: 25-72-16-8-34*-99-17-26-40*-62-87-29-53*-82-14-55-70*

Enqueue & Dequeue (syntax)

Both are O(1) operations. Front and rear are incremented as you enqueue and dequeue, and both wrap around. Enqueue: insert at the end. Dequeue: remove from the front.
public class Queue<E> {
    private int maxSize;
    private int currentSize;
    private Object [] storage;
    private int front, rear;
    public Queue(int size) {
        maxSize = size;
        currentSize = 0;
        storage = new Object[maxSize];
        front = rear = 0;
    }
    public void enqueue(E obj) {
        // have to check if it's full -> throw new RTE
        if(isFull())
            throw new RuntimeException("ERROR, attempt to insert in full queue");
        if(++rear == maxSize)
            rear = 0;
        storage[rear] = obj;
        currentSize++;
    }
    @SuppressWarnings("unchecked")
    public E dequeue() {
        if(isEmpty())
            return null;
        if(++front == maxSize)
            front = 0;
        currentSize--;
        return (E) storage[front];
    }
    public boolean isFull() { return currentSize == maxSize; }
    public boolean isEmpty() { return currentSize == 0; }
}

Delete (HashTable)

Check if its empty, and have case for if it doesn't find it. public boolean delete(K key) { if(isEmpty()) { return false; } if(list[getIndex(key)].remove(new DictionaryNode<K,V>(key,null)) == null) //calling the LL remove method, if it returns null, then the HT //does not contain the key return false; //will remove key if !null modCounter++; currentSize--; return true; }

Dictionaries

Dictionaries store key=value pairs. The value is a record which may contain data fields. The key is a unique identifier used to add, edit, delete, or find this record. This type of Abstract Data Type is also referred to as a map, or associative memory. The Java API refers to Dictionary ADTs as maps. You may not have duplicate keys but you may have duplicate values. The Dictionary ADT does not specify how the key=value pairs are ordered, only that the dictionary must be organized so that search, insertion, and deletion operations are done based on the key. Dictionaries generally have 2 iterators, one for the keys and the other for the values; iterators normally return data sorted by the key. LL are not used because the search, insertion, and deletion times are O(n).

Clustering

Happens with linear probing: when we transform keys to indices, the generated indices tend to be close together, so occupied slots cluster. Primary clustering is one of two major failure modes of open-addressing hash tables, especially those using linear probing. It occurs after a hash collision causes two records to hash to the same position, so that one of the records is moved to the next location in its probe sequence. Secondary clustering occurs more generally with open-addressing schemes, including linear probing and quadratic probing, in which the probe sequence is independent of the key, as well as in hash chaining. In this phenomenon, a low-quality hash function may cause many keys to hash to the same location, after which they all follow the same probe sequence or are placed in the same hash chain, giving them slow access times.

Stack

LIFO behavior: the pop operation will always remove and return the most recently inserted element, like a stack of books. The most important application for stacks is subroutine calls; you can call any number of subroutines in any order. *A stack is often used for backtracking. Push and pop should always be done at the end of the array, never the beginning, to avoid shifting!! Both have O(1) behavior, unless you push and pop from storage[0], which would make them O(n) because shifting would be necessary.
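
A minimal sketch of the array version, assuming a fixed capacity. The class name ArrayStack and its fields are hypothetical, not from the course code. The top of the stack is the end of the used part of the array, so neither operation shifts anything.

```java
import java.util.NoSuchElementException;

// Hypothetical array-backed stack; the top of the stack is the END of the array.
class ArrayStack<E> {
    private final Object[] storage;
    private int currentSize = 0; // also the index of the next free slot

    ArrayStack(int maxSize) {
        storage = new Object[maxSize];
    }

    // O(1): write into the next free slot at the end, no shifting.
    void push(E obj) {
        if (currentSize == storage.length)
            throw new RuntimeException("ERROR, attempt to push onto a full stack");
        storage[currentSize++] = obj;
    }

    // O(1): remove and return the most recently pushed element.
    @SuppressWarnings("unchecked")
    E pop() {
        if (currentSize == 0)
            throw new NoSuchElementException("stack is empty");
        E top = (E) storage[--currentSize];
        storage[currentSize] = null; // drop the reference so it can be collected
        return top;
    }

    boolean isEmpty() {
        return currentSize == 0;
    }
}
```

Pushing a, b, c and then popping returns c, b, a: last in, first out.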

Probing

Linear probing: if we want to store data at index i, and index i is not available because there is already data there, then we try i+1, i+2, ..., i+n-1 (wrapping around) until we find an open spot. This will always work as long as the hash table is not full. In the worst case, when the table is almost full, we might have to traverse the entire array to find an available slot, which would mean an O(1) operation degrades to an O(n) operation. This is why tables must be 30% larger than the max capacity. LINEAR PROBING = OPEN ADDRESSING = CLOSED HASHING. Quadratic probing: rather than sequentially moving down the array to find an available slot, you move down the array in the sequence i+1^2, i+2^2, i+3^2, ... until an available slot is found. Secondary clustering can still occur, and the table size must be a prime number so that when we reach the end of the table and wrap around to the beginning, we land on a different sequence of slots each time. There are 3 states for each array index: occupied; empty and null (never used); empty and not null (the index has held data, but it was deleted).
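
A minimal sketch of linear probing with the three slot states, assuming int keys and a fixed table size. LinearProbingSet and the DELETED marker are hypothetical names chosen for illustration, not the course implementation.

```java
// Hypothetical open-addressing table for int keys, using linear probing.
class LinearProbingSet {
    // Marker for "empty and not null": the slot held data that was deleted.
    private static final Object DELETED = new Object();
    // null = never used, DELETED = deleted, anything else = an Integer key.
    private final Object[] slots;

    LinearProbingSet(int tableSize) {
        slots = new Object[tableSize];
    }

    private int indexOf(int key) {
        return (Integer.hashCode(key) & 0x7FFFFFFF) % slots.length;
    }

    // Try i, i+1, i+2, ... (wrapping around) until a free slot is found.
    boolean insert(int key) {
        if (contains(key)) return false; // no duplicate keys
        int i = indexOf(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null || slots[i] == DELETED) {
                slots[i] = key;
                return true;
            }
            if (++i == slots.length) i = 0; // wrap around
        }
        return false; // table is full
    }

    boolean contains(int key) {
        return find(key) >= 0;
    }

    // Mark the slot DELETED instead of null so later probes keep going past it.
    boolean delete(int key) {
        int i = find(key);
        if (i < 0) return false;
        slots[i] = DELETED;
        return true;
    }

    // Returns the slot index holding key, or -1 if it is not in the table.
    private int find(int key) {
        int i = indexOf(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null) return -1; // a never-used slot ends the probe sequence
            if (slots[i] != DELETED && slots[i].equals(key)) return i;
            if (++i == slots.length) i = 0; // wrap around
        }
        return -1;
    }
}
```

With a table of size 7, the keys 1, 8, and 15 all hash to index 1 and end up in slots 1, 2, and 3. Deleting 8 leaves a DELETED marker in slot 2, which is why 15 can still be found afterwards.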

Binary Search Trees

Linked structures that are designed to address some of the drawbacks of LLs. We can take advantage of ordering to speed up search, insertion, and deletion. On average, all are O(log n). A BST gives an ordered data structure that is faster than arrays or LLs for insertion and deletion. It has a root node (like a head pointer) with a left child and a right child; all data to the left is smaller than the root and all data to the right is larger. Search always begins at the root: compare the data with the data in the node, and if they are not equal, go either left or right depending on how the data compares. We move down the tree using temp variables to hold node references, as with LLs.
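
The search described above can be sketched as follows. This is a minimal illustration assuming int keys; IntBst and its Node class are hypothetical names, and add is included only so the example is self-contained.

```java
// Hypothetical minimal BST over int keys, used only to illustrate search.
class IntBst {
    private static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Iterative insert: walk down until an empty child position is found.
    void add(int key) {
        if (root == null) { root = new Node(key); return; }
        Node current = root;
        while (true) {
            if (key == current.key) return; // no duplicate keys
            if (key < current.key) {
                if (current.left == null) { current.left = new Node(key); return; }
                current = current.left;
            } else {
                if (current.right == null) { current.right = new Node(key); return; }
                current = current.right;
            }
        }
    }

    // Start at the root and go left or right until the key is found
    // or we fall off the tree.
    boolean contains(int key) {
        Node current = root; // temp variable holding a node reference
        while (current != null) {
            if (key == current.key) return true;
            current = (key < current.key) ? current.left : current.right;
        }
        return false;
    }
}
```

Each comparison discards one subtree, which is where the O(log n) average comes from when the tree is reasonably balanced.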

Delete (BST)

Most difficult operation because of the number of cases that must be handled. Possible cases: empty tree; only 1 node in the tree, so root must be set to null; tree doesn't contain the node; node has 0, 1, or 2 children; node to be deleted could be the root, so a new node must be designated as the root. Appropriate references must be set.
public boolean delete(K key) {
    // lots of cases; always delete from the parent
    if(!delete(key,root,null,false))
        return false;
    currentSize--;
    modCounter++;
    return true;
}
private boolean delete(K key, Node<K,V> n, Node<K,V> parent, boolean wentLeft) {
    if(n == null) return false; // empty tree, or key not in the tree
    int cmp = ((Comparable<K>)key).compareTo(n.key); // compare once, use the result twice
    if(cmp < 0) // moves left
        return delete(key, n.left, n, true);
    else if(cmp > 0) // moves right
        return delete(key, n.right, n, false);
    else {
        if(n.left == null && n.right == null) { // no children
            if(parent == null) root = null;
            else if(wentLeft) parent.left = null;
            else parent.right = null;
        } else if(n.left == null) { // 1 child - right
            if(parent == null) root = n.right;
            else if(wentLeft) parent.left = n.right;
            else parent.right = n.right;
        } else if(n.right == null) { // 1 child - left
            if(parent == null) root = n.left;
            else if(wentLeft) parent.left = n.left;
            else parent.right = n.left;
        } else { // 2 children
            // getSuccessor is not shown on this card. The code assumes it detaches and
            // returns the in-order successor, or returns null when n.right is the successor.
            Node<K,V> successor = getSuccessor(n.right);
            if(successor != null) { // successor takes over both subtrees of n
                successor.right = n.right;
                successor.left = n.left;
            } else { // n.right is the successor; it adopts n's left subtree
                successor = n.right;
                successor.left = n.left;
            }
            if(parent == null) root = successor;
            else if(wentLeft) parent.left = successor;
            else parent.right = successor;
        }
        return true;
    }
}

Insert (BST)

New elements are always inserted at a leaf position.
public boolean add(K key, V value) {
    if(root == null) // if empty tree
        root = new Node<K,V>(key,value);
    else if(!insert(key,value,root,null,false))
        return false; // duplicate key
    currentSize++;
    modCounter++;
    return true;
}
private boolean insert(K key, V value, Node<K,V> n, Node<K,V> parent, boolean wasLeft) {
    if(n == null) { // fell off the tree: attach the new node to the parent
        if(wasLeft) parent.left = new Node<K,V>(key,value);
        else parent.right = new Node<K,V>(key,value);
        return true;
    }
    else if(((Comparable<K>)key).compareTo(n.key) == 0)
        return false; // no duplicate keys
    else if(((Comparable<K>)key).compareTo((K)n.key) < 0)
        return insert(key,value,n.left,n,true);
    else
        return insert(key,value,n.right,n,false);
}

GetValue (HashTable)

O(1)
public V getValue(K key) {
    // make a temporary node holding the key, to search with
    DictionaryNode<K,V> tmp = new DictionaryNode<K,V>(key,null);
    // call find from the LL at the index the key hashes to
    DictionaryNode<K,V> returnValue = list[getIndex(key)].find(tmp);
    if(returnValue == null) return null;
    return returnValue.value;
}

HashTables

Often used for Dictionary implementation because they provide outstanding performance. They provide the fastest access of any data structure, with O(1) times for insert, delete, and search operations. Often chosen for search operations on very large data sets because if you know where something is, retrieval is instantaneous. Always array-based structures.

InsertFirst ( Unordered LL)

Order doesn't matter, so you can always insert at the head. Create a new node, then insert it by changing the references in the list. Also have to check if the list is empty, because then head will become the new node. Possible cases: list empty.
public void insertFirst(E data) {
    Node<E> newNode = new Node<E>(data);
    if(head == null)
        head = newNode; // assignment (=), not comparison (==)
    else {
        newNode.next = head;
        head = newNode;
    }
}

Insert/Delete Complexity

Ordered Array: insert = O(n), remove = O(1)
Unordered Array: insert = O(1), remove = O(n)
Ordered LL: insert = O(n), remove = O(1)
Unordered LL: insert = O(1), remove = O(n)

Push & Pop

Pop: O(1) complexity. Will always remove and return the most recently inserted element. Push: O(1) complexity. Array: push and pop should always be done on the END of the array when implementing with an array, to avoid shifting, which would make them O(n). LL: push and pop should always be done at the head of the list; this is O(1) because there is no need to traverse the list.

Insert (Ordered LL)

Possible cases: list empty, inserting at the head, inserting at the end, inserting in the middle. Conditions: head==null, previous==null, current==null, previous!=null && current!=null. *Be sure to compare the data to insert with the data in a node, not the node itself. Must make a while loop that allows us to traverse the list to find the insertion point. We must avoid dereferencing the next field of the last node in the list (because it is always null in the last node). Must set previous to null because previous must be the node before current, and current must always start as the head.
public void insert(E data) {
    Node<E> newNode = new Node<E>(data); // node to insert
    Node<E> previous = null, current = head;
    // while you haven't reached the end and the data to insert
    // is greater than the data in current
    while(current != null && ((Comparable<E>)data).compareTo(current.data) > 0) {
        previous = current;
        current = current.next; // traverse down the list
    }
    if(previous == null) { // it goes at the head (also covers the empty list)
        newNode.next = head;
        head = newNode; // must remember to set new head
    } else { // middle or end; at the end current is null, so newNode points to null
        previous.next = newNode;
        newNode.next = current;
    }
}

Delete (LL)

Possible cases: empty list, only has 1 node, deleting from the front, deleting from the end, deleting from the middle, list doesn't contain the element. ORDER IS IMPORTANT!
public boolean remove(E obj) { // OBJECT version
    Node<E> previous = null, current = head;
    // while the obj is not equal to the data in the current node
    while(current != null && ((Comparable<E>)obj).compareTo(current.data) != 0) {
        previous = current;
        current = current.next; // traverse down the list
    }
    if(current == null) // empty list or couldn't find obj
        return false;
    if(current == head) // need to remove first node
        head = head.next;
    else // remove from the middle or the end
        previous.next = current.next;
    if(current == tail) // if it has a tail, need to reset tail
        tail = previous; // previous is null if we just removed the only node
    return true;
}
// removes the given key from the list if it exists, otherwise does nothing
public void delete(int key) { // INTEGER version
    Node previous = null, current = head;
    while(current != null && current.data != key) {
        previous = current;
        current = current.next;
    }
    if(current == null)
        return; // not found or empty list
    if(previous == null)
        head = head.next; // delete from first position
    else
        previous.next = current.next;
}

Complexity Example O(n)

public int findIndexInArray(int key) {
    // array.length = n because the input size is the size of the array
    for(int i=0; i<array.length; i++)
        if(array[i] == key)
            return i;
    return -1;
}

Complexity Example O(nlogn)

public void doSomethingElse(int n) {
    int j;
    for(int i=0; i<n; i++) {
        j = n;
        while(j > 0) {
            System.out.println();
            j = j/2;
        }
    }
}
The inner loop does the same number of iterations regardless of the value of the index of the outer loop. The inner loop runs log n + 1 times, and the outer loop runs n times, so f(n) = n * (log n + 1) = O(n log n).

Sorting Table

Sort       Best        Avg         Worst       Stable  In Place
Bubble     O(n^2)      O(n^2)      O(n^2)      Yes     Yes
Selection  O(n^2)      O(n^2)      O(n^2)      No      Yes
Insertion  O(n)        O(n^2)      O(n^2)      Yes     Yes
Shell      O(n^(7/6))  O(n^(5/4))  O(n^(3/2))  No      Yes
Quick      O(nlogn)    O(nlogn)    O(n^2)      No      Yes
Merge      O(nlogn)    O(nlogn)    O(nlogn)    Yes     No
Radix      O(n)        O(n)        O(n)        Yes     No
Heap       O(nlogn)    O(nlogn)    O(nlogn)    No      Yes

Stable/In Place

Stable: The ordering of duplicates is preserved, the one that occurs first in the unsorted array also occurs first in the sorted array. In Place: The algorithm does not require extra storage IN TERMS OF THE INPUT SIZE N.
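
A small sketch of both properties using insertion sort, which the sorting table lists as stable and in place. KeyedItem and StableInsertionSort are hypothetical names; the label field exists only to make the original order of duplicates visible.

```java
// Hypothetical record type: sorted by key only; label shows the original order.
class KeyedItem {
    final int key;
    final String label;
    KeyedItem(int key, String label) {
        this.key = key;
        this.label = label;
    }
}

class StableInsertionSort {
    // In place: only a few local variables, no auxiliary array that grows with n.
    // Stable: the strict '>' test never moves an element past an equal key.
    static void sort(KeyedItem[] a) {
        for (int i = 1; i < a.length; i++) {
            KeyedItem tmp = a[i];
            int j = i - 1;
            while (j >= 0 && a[j].key > tmp.key) {
                a[j + 1] = a[j]; // shift right to open a slot
                j--;
            }
            a[j + 1] = tmp;
        }
    }
}
```

Sorting (5,a) (3,b) (5,c) (3,d) gives (3,b) (3,d) (5,a) (5,c): b still comes before d, and a still comes before c. Changing the test to >= would still sort correctly but would no longer be stable.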

Some circular queue implementations use the mod operator % in enqueue and dequeue operations. Explain why this is inefficient.

The MOD operator is hardware division. Hardware multiplication and division are usually the most expensive (in terms of time) operations on a computer. An 'if' test just performs a simple subtraction, which is one of the least expensive operations.
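
The two ways of wrapping an index, side by side. WrapDemo and its method names are hypothetical. The sketch only shows that both produce the same next index; it does not measure speed, and the actual cost difference depends on the hardware and compiler.

```java
class WrapDemo {
    // Wrap using an 'if' test: increment, compare, and reset to 0.
    static int nextWithIf(int index, int maxSize) {
        if (++index == maxSize) index = 0;
        return index;
    }

    // Wrap using the mod operator: same result, but it requires a division.
    static int nextWithMod(int index, int maxSize) {
        return (index + 1) % maxSize;
    }
}
```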

Of all the sorting algorithms we have studied, bubble sort is consistently the worst performer. What makes it the worst of the O(n^2) algorithms?

The excessive number of swaps makes Bubble Sort the worst performer.

Why can't a random number generator be used in a hash function?

The hash code must be repeatable. Every time you call hashCode() on an object, it must return the same value.

What is a Priority Queue, and how does it differ from a standard queue?

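A priority queue removes elements in priority order rather than strictly in order of insertion. The primary key is the priority. The order of insertion is a secondary key in the ordering, so elements with equal priority still come out in FIFO order. A standard queue uses order of insertion alone.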
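
A minimal sketch of the two-key ordering, built on java.util.PriorityQueue. FifoPriorityQueue, Entry, and the convention that a lower number means higher priority are all assumptions made for illustration.

```java
import java.util.PriorityQueue;

// Hypothetical priority queue: lower number = higher priority (an assumption),
// with order of insertion as the secondary key.
class FifoPriorityQueue<E> {
    private static class Entry<T> {
        final int priority;  // primary key
        final long sequence; // secondary key: order of insertion
        final T data;
        Entry(int priority, long sequence, T data) {
            this.priority = priority;
            this.sequence = sequence;
            this.data = data;
        }
    }

    private long nextSequence = 0;

    // Compare by priority first; break ties with the insertion sequence number.
    private final PriorityQueue<Entry<E>> heap = new PriorityQueue<>((a, b) ->
            a.priority != b.priority
                    ? Integer.compare(a.priority, b.priority)
                    : Long.compare(a.sequence, b.sequence));

    void insert(int priority, E data) {
        heap.add(new Entry<>(priority, nextSequence++, data));
    }

    // Removes the highest-priority element; ties come out in insertion order.
    E remove() {
        Entry<E> e = heap.poll();
        return e == null ? null : e.data;
    }

    boolean isEmpty() {
        return heap.isEmpty();
    }
}
```

Inserting (2,x), (1,a), (2,y), (1,b) and removing everything yields a, b, x, y. Without the sequence number, a heap gives no guarantee about the order of equal priorities.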

The standard quick sort algorithm is O(n^2) in the worst case. What is the worst case? What modifications can be made to the algorithm to provide better behavior in this case?

The worst case is an array that is already in sorted or reverse sorted order. In these cases, the partitions have sizes 1 and n-1, so you partition about n times at O(n) each, which means n*n = O(n^2). In the worst case, standard quick sort usually runs out of stack space and crashes. The reason is that if the array is already in sorted order, the pivot will be the largest element in the section of the array being processed. During the partitioning phase, you will end up with one partition holding one element, and the other partition holding section size-1 elements. When processing the array recursively, you will end up partitioning the array n-1 times. Strategies for dealing with this problem vary, but always involve making sure that the largest (or smallest) element in the section is not chosen for the pivot. Any strategy must be efficient: doing a lot of processing to determine the pivot will slow the algorithm down, which is not desirable. A useful strategy is to swap the element at the right end with the one in the middle. This does not guarantee that you will avoid worst case behavior, but it does handle the case of sorted (or reverse sorted) arrays quickly.
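
A sketch of the middle-swap strategy, assuming the pivot is taken from the right end of each section. MiddlePivotQuickSort is a hypothetical name, and the partitioning loop here is one simple scheme chosen for brevity; the course version may partition differently.

```java
class MiddlePivotQuickSort {
    static void sort(int[] a) {
        quickSort(a, 0, a.length - 1);
    }

    private static void quickSort(int[] a, int left, int right) {
        if (left >= right) return; // sections of size 0 or 1 are already sorted
        // Swap the middle element into the right end, so a sorted (or reverse
        // sorted) section does not use its largest or smallest element as pivot.
        swap(a, left + (right - left) / 2, right);
        int pivot = a[right];
        int boundary = left; // everything before boundary is < pivot
        for (int i = left; i < right; i++)
            if (a[i] < pivot)
                swap(a, i, boundary++);
        swap(a, boundary, right); // pivot lands in its final position
        quickSort(a, left, boundary - 1);
        quickSort(a, boundary + 1, right);
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

On an already sorted array the middle element is the median of the section, so the two partitions come out nearly equal in size instead of 1 and n-1.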

Why is it not practical to run timing tests on algorithms?

There are other factors affecting the time we observe. Modern computers and operating systems are multitasking, and the other processes consume some CPU resources. A timing test therefore measures the algorithm plus everything else the computer is doing during the test.

Hashtables are generally made somewhat larger than the maximum number of elements to be inserted. Why?

This is done to minimize collisions. We use more space than is needed to optimize performance.

Hashing/Hashcode

Transformation from key to array index: the key (usually a String) is turned into a 32-bit integer. We don't know in advance how large the hash table's array will be, so we MOD by the array size at runtime. 4 characteristics of a good hash function: 1. The alg distributes keys across the entire range of 32-bit integers without bias. 2. The alg is consistent. 3. The alg is efficient. 4. Inputs that are equal as defined by the compareTo method must return the same hash code. There is a hashCode() method in class Object, but you should override this to obtain best performance. The code for hashing goes in THE OBJECT THAT'S STORED in the Dictionary ADT. Several methods: 1. Addition 2. Multiply 3. Folding 4. Exponentiation 5. Shifting 6. Java native
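
A sketch of a key class that overrides hashCode() using a multiply-and-add scheme. StudentId is a hypothetical class invented for illustration. The multiplier 31 is the same one documented for java.lang.String.hashCode(), so this particular loop produces the same value as the wrapped String would.

```java
// Hypothetical key class stored in the dictionary; the hashing code lives here.
class StudentId implements Comparable<StudentId> {
    private final String id;

    StudentId(String id) {
        this.id = id;
    }

    // Multiply-and-add over the characters. Overflow simply wraps around in
    // 32-bit int arithmetic, so the result may be negative.
    @Override
    public int hashCode() {
        int hash = 0;
        for (int i = 0; i < id.length(); i++)
            hash = 31 * hash + id.charAt(i);
        return hash;
    }

    // Keys that are equal must produce the same hash code.
    @Override
    public boolean equals(Object other) {
        return other instanceof StudentId && id.equals(((StudentId) other).id);
    }

    @Override
    public int compareTo(StudentId other) {
        return id.compareTo(other.id);
    }
}
```

The result is consistent (no randomness, no dependence on memory address), cheap to compute, and identical for any two keys that compare as equal.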

Contains (LL, from midterm)

We must write a loop that scans down the list checking for the presence of the object, using a tmp variable. We do not create a new node; the variable contains only an address, and we initialize tmp to head.
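
A minimal sketch of that loop. SimpleList is a hypothetical class, and it bounds E by Comparable<E> instead of casting inside the method; insertFirst is included only so the example can be run.

```java
// Hypothetical singly linked list, just large enough to show contains.
class SimpleList<E extends Comparable<E>> {
    private static class Node<T> {
        final T data;
        Node<T> next;
        Node(T data) { this.data = data; }
    }

    private Node<E> head;

    void insertFirst(E data) {
        Node<E> newNode = new Node<>(data);
        newNode.next = head;
        head = newNode;
    }

    boolean contains(E obj) {
        Node<E> tmp = head; // tmp holds only an address; no new node is created
        while (tmp != null) {
            if (obj.compareTo(tmp.data) == 0) // compare the data, not the node
                return true;
            tmp = tmp.next; // move to the node that follows
        }
        return false; // reached the end (or the list was empty)
    }
}
```

Because the loop condition tests tmp itself rather than tmp.next, the empty list needs no special case.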

List Traversal

We traverse a LL by dereferencing the next field in each node to get the one that follows. Dereferencing is done with the dot operator. Must start at the head pointer, and traverse the list by using 1 or 2 temporary pointers.

Collisions

When 2 or more keys hash to the same array index. This happens because we are mapping a key space to a much smaller index space. To cope with collisions, we must design our hash tables so that they accommodate duplicate indices, and we need to hash keys to indices in a way that minimizes collisions. The 2 ways of handling collisions are probing and chaining.

For each scenario given below, indicate which sorting algorithm you would choose, and why. The machine has sufficient memory to hold approximately 10,000,000 array elements.
1. You know that the array to be sorted cannot have more than 10 elements.
2. A number of arrays have to be sorted. Most are quite small (n < 100), but a few have as many as 1000 elements.
3. The array to be sorted has 8,000,000 elements, and nothing is known about the initial arrangement of the elements.
4. The array to be sorted has 1,000,000 elements. The array is in almost perfect non-ascending order (only a few elements are out of place). You must sort it in ascending order. There are no duplicate keys in the array.
5. You must sort an array of 50,000 elements. The array has many duplicate keys and you must preserve the ordering of duplicate keys.
6. An array of 8,000,000 elements contains only integer keys in the range 1..50,000 inclusive. You need not preserve ordering of duplicates. Your algorithm must run in O(n) time.
7. You must write a sort routine that will be included in a library that other programmers will use. You have no information about the size or ordering of arrays that your routine will be called to sort.

1. You know that the array to be sorted cannot have more than 10 elements.
   It doesn't really matter, though avoid Bubble Sort.
2. A number of arrays have to be sorted. Most are quite small (n < 100), but a few have as many as 1000 elements.
   Insertion Sort.
3. The array to be sorted has 8,000,000 elements, and nothing is known about the initial arrangement of the elements.
   Modified quick sort, or heap sort.
4. The array to be sorted has 1,000,000 elements. The array is in almost perfect non-ascending order (only a few elements are out of place). You must sort it in ascending order. There are no duplicate keys in the array.
   Reverse the array in a loop, which is O(n), and then use Insertion Sort, which is almost O(n) in this case.
5. You must sort an array of 50,000 elements. The array has many duplicate keys and you must preserve the ordering of duplicate keys.
   Merge Sort.
6. An array of 8,000,000 elements contains only integer keys in the range 1..50,000 inclusive. You need not preserve ordering of duplicates. Your algorithm must run in O(n) time.
   This is a bit tricky. Make an array of size 50,000. Then loop through the original array and count the number of times each integer occurs. Rewrite the original array using the counts in your auxiliary array. n + n = O(n).
7. You must write a sort routine that will be included in a library that other programmers will use. You have no information about the size or ordering of arrays that your routine will be called to sort.
   No mention is made of the need for stability. It is always a good idea to use stable algorithms for default implementations. The only fast algorithm that is stable is merge sort. If stability is unimportant, the modified quick sort is the fastest algorithm.
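
The counting approach from the scenario with integer keys in the range 1..50,000 can be sketched as below. CountingSort and the maxKey parameter are hypothetical names. The auxiliary array here has maxKey + 1 slots so that a key can be used directly as an index, which leaves index 0 unused.

```java
class CountingSort {
    // Sorts keys known to lie in 1..maxKey by counting occurrences.
    // The original elements are overwritten, so the ordering of duplicates
    // is not preserved in any meaningful sense.
    static void sort(int[] a, int maxKey) {
        int[] counts = new int[maxKey + 1]; // index 0 is unused
        for (int key : a)                   // first pass: n steps
            counts[key]++;
        int out = 0;
        for (int key = 1; key <= maxKey; key++)       // walk the counts in key order
            for (int c = counts[key]; c > 0; c--)     // n writes in total
                a[out++] = key;
    }
}
```

Calling CountingSort.sort(a, 50000) does one pass to count and one pass to rewrite. The walk over the counts adds a term proportional to the key range, which is a constant 50,000 in this scenario.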

Insertion sort can be improved by using binary search to find the next insertion point. However, this does not change the overall complexity of the algorithm. Why?

You must still shift elements to insert. This is O(n). The shifting dominates. This is O(log n) + O(n) = O(n). We do this for each element in the array, and thus, with binary search the cost is ( O(log n) + O(n) ) * O(n) = O(n^2).
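
A sketch of insertion sort with binary search, to show where the two costs sit. BinaryInsertionSort is a hypothetical name. The search finds the first position holding an element greater than the one being inserted, which keeps the sort stable.

```java
class BinaryInsertionSort {
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int tmp = a[i];
            // Binary search in the sorted section a[0..i-1]: O(log n).
            int lo = 0, hi = i;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (a[mid] <= tmp) lo = mid + 1;
                else hi = mid;
            }
            // Shifting a[lo..i-1] one place right is still O(n) and dominates.
            System.arraycopy(a, lo, a, lo + 1, i - lo);
            a[lo] = tmp;
        }
    }
}
```

The binary search reduces the number of comparisons, but every insertion still moves up to i elements, so the total stays O(n^2).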

How to force the leading bit of the hash code to zero, making the number non-negative.

int index = (entry.hashCode() & 0x7FFFFFFF) % tableSize;
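
A small sketch that wraps the expression above in a method so it can be tried on negative hash codes. HashIndex and indexFor are hypothetical names. The example relies on Integer.hashCode() returning the int value itself.

```java
class HashIndex {
    // Clear the sign bit, then MOD by the table size to get a valid index.
    static int indexFor(Object entry, int tableSize) {
        return (entry.hashCode() & 0x7FFFFFFF) % tableSize;
    }
}
```

For Integer.valueOf(-7) and a table size of 10, the mask turns the hash -7 into 2147483641, so the index is 1. Without the mask, -7 % 10 is -7 in Java, which is not a valid array index. Math.abs is not a safe substitute, because Math.abs(Integer.MIN_VALUE) is still negative.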

InsertAfter (LL)

// inserts key1 after the last node holding key2; appends at the end if key2 is not in the list
public void insertAfter(int key1, int key2) {
    Node newNode = new Node(key1);
    Node where=null, previous=null, current=head;
    while(current != null) {
        if(current.data == key2)
            where = current;
        previous = current;
        current = current.next;
    } // end while
    if(previous == null) // empty list
        head = newNode;
    else if(where == null) // no key2 in list, insert last
        previous.next = newNode; // previous is the last node because current is null
    else {
        newNode.next = where.next;
        where.next = newNode;
    }
}

InsertLast (LL)

public void insertLast(int key) {
    Node newNode = new Node(key);
    if(head == null) { // test so we can use current.next in the while loop
        head = newNode;
        return;
    }
    Node current = head;
    while(current.next != null)
        current = current.next;
    // when current.next is null, we have reached the last node
    current.next = newNode;
}

Big Omega (Ω)

Ω(g) is the set of functions f such that for some constant c>0 and some n0: f(n) >= c g(n) for all n>n0. f = Ω(n): f grows at a rate equal to or greater than n. It may never grow slower than n, but may be equal or faster. THIS ALGORITHM IS N OR SLOWER.

Little Omega(ω)

ω(g) is the set of functions f such that for every constant c>0 there is some n0 with: f(n) > c g(n) for all n>n0. f = ω(n): f grows at a rate strictly greater than n. It may not be equal to n. THIS ALGORITHM IS SLOWER THAN N.

