Algorithms Midterm 2
Applications of Sorting: Searching
Once a set is sorted, searching it becomes extremely fast (e.g., binary search).
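A minimal sketch of binary search on a sorted Python list (function name and style are illustrative, not from the notes):

```python
def binary_search(a, key):
    """Return the index of key in sorted list a, or -1 if absent. O(lg n)."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2        # probe the middle of the remaining range
        if a[mid] == key:
            return mid
        if a[mid] < key:
            lo = mid + 1            # key can only be in the right half
        else:
            hi = mid - 1            # key can only be in the left half
    return -1
```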
Applications of Sorting: Closest Pair
Once a set is sorted, the closest pair of numbers must be adjacent, so the closest pair can be found with a linear scan.
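A small sketch of this idea (the helper name is made up for illustration); sorting dominates the cost, and the scan over adjacent pairs is linear:

```python
def closest_pair(nums):
    """Return the closest pair of numbers by sorting, then scanning neighbors."""
    s = sorted(nums)                                  # O(n log n)
    # the closest pair must be adjacent in sorted order: O(n) scan
    best = min(range(len(s) - 1), key=lambda i: s[i + 1] - s[i])
    return s[best], s[best + 1]
```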
Merge Sort
A nice recursive approach to sorting involves partitioning the elements into two groups, sorting each of the smaller groups recursively, and then interleaving the two sorted lists to totally order the elements.
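The partition/recurse/interleave description above can be sketched as a minimal Python merge sort (returning a new list rather than sorting in place, for brevity):

```python
def merge_sort(a):
    """Split in half, sort each half recursively, then merge the halves."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # repeatedly take the smaller head
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]      # append whichever half remains
```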
Pragmatics of Sorting: Non-Numerical Values
Alphabetizing is the sorting of strings. Libraries have very complete and complicated rules concerning the collating sequence of characters and punctuation.
Quicksort
The fastest internal sorting algorithm in practice. It uses partitioning about a pivot as its main idea; the pivot can be a specific element or randomly selected.
Pointer Based Dictionaries: Doubly Linked Lists
- We gain extra flexibility on predecessor queries, at the cost of doubling the number of pointers, by using doubly-linked lists.
- The extra big-Oh cost of doubly-linked lists is zero.
Importance of Sorting
1. Computers spend more time on sorting than anything else.
2. Sorting is the best-studied problem in CS.
3. Most of the interesting ideas in algorithms can be taught in the context of sorting.
4. Once a set of items is sorted, many other problems become easy.
Advantages of Contiguous Arrays
1. Constant-time access.
2. Purely consists of data; no space is wasted on linking.
3. Physical continuity between successive data accesses helps exploit high-speed cache memory.
Deletion from a Binary Search Tree
1. First, search for the key z to be deleted from the BST.
2. Three cases:
   2.1 z has no children: just remove z (and you're done).
   2.2 z has one child: splice z out (and you're done).
   2.3 z has two children (a bit harder):
       2.3.1 Let y be z's successor.
       2.3.2 Replace z's contents with y's contents.
       2.3.3 Splice y out (y has at most one child, because it is the leftmost node in z's right subtree).
Advantages of Linked List
1. Overflow cannot occur unless memory is actually full.
2. Insertion and deletion are simpler than for contiguous lists (arrays).
3. With large records, it is easier and faster to move pointers than to move the items themselves.
What are the two aspects of any data structure?
1. The abstract operations it supports 2. The implementation of those operations
Dictionary Implementations
1. Unsorted Array 2. Sorted Array 3. Singly-Linked Unsorted List 4. Doubly-Linked Unsorted List 5. Singly-Linked Sorted List 6. Doubly-Linked Sorted List 7. Binary Search Tree 8. Balanced Binary Search Tree 9. Hash Tables
Red-Black Binary Search Trees
A BST is a red-black tree if it satisfies the following properties. (1) Every node is either red or black. (2) If a node is red, then both of its children are black. (3) Every path from a node to a leaf contains the same number of black nodes. (4) The root is always black.
Binary Heap Definition
A binary tree with a key in each node such that:
1. All leaves are on, at most, two adjacent levels.
2. All leaves on the lowest level occur to the left, and all levels except the lowest one are completely filled.
3. The key in any node is ≤ the keys in its children.
Conditions 1 and 2 constrain the shape of the tree, while 3 constrains the labeling. A heap maintains a partial order on the set of elements, which is weaker than the sorted order and more efficient to maintain.
Container
A data structure that permits storage and retrieval of items independent of content, e.g., stacks and queues.
Queue
A first-in-first-out (FIFO) abstract data type.
Splay Trees
A form of self organizing tree that uses rotations to move any accessed key to the root of the tree. This allows for faster searches of frequently used or recently accessed nodes.
Max Heap
A heap where a node dominates its children by having a larger key than they do.
Min Heap
A heap where a node dominates its children by having a smaller key than they do.
Binary Heap: Height
A heap with n elements has a height h = ⌊lg n⌋.
Stack
A last-in-first-out (LIFO) abstract data structure.
Applications of Sorting: Element Uniqueness
A linear scan over a sorted list can easily find duplicates, since equal elements must be adjacent.
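A one-liner sketch of the sort-then-scan test (the function name is illustrative):

```python
def has_duplicates(items):
    """Sort, then check adjacent pairs in one linear scan."""
    s = sorted(items)
    return any(s[i] == s[i + 1] for i in range(len(s) - 1))
```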
Hash Function
A mathematical function that maps keys to integers. Ideally it is cheap to evaluate, has an equal likelihood to hash into any of the m slots, and does so independently of other elements.
Skip Lists
A somewhat cult data structure, constructed as a hierarchy of sorted linked lists.
Radix Sort
A stable sorting algorithm.
- Assume that the input is provided as a linked list.
- Assume that each integer has d digits.
- There will be d passes, starting with the least significant digit.
- Takes O(d(n + r)) time, where r is the base (the number of possible digit values, e.g., r = 10 for base 10) and d is the number of digits/passes.
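A minimal sketch of LSD radix sort for non-negative integers, using Python lists in place of the linked lists assumed above; appending to buckets in input order is what makes each pass stable:

```python
def radix_sort(nums, base=10):
    """Sort non-negative integers with d stable passes, least digit first."""
    if not nums:
        return nums
    digits = len(str(max(nums)))                       # d = number of passes
    for d in range(digits):
        buckets = [[] for _ in range(base)]            # r buckets per pass
        for x in nums:
            buckets[(x // base ** d) % base].append(x) # stable: keeps order
        nums = [x for b in buckets for x in b]         # concatenate buckets
    return nums
```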
Randomized Quicksort
A version of quicksort that simply always selects its pivot randomly, or randomizes the array before sorting. This gives an expected running time of Θ(n lg n): randomization makes the worst case extremely unlikely rather than impossible.
Decision Trees
Any comparison-based sorting program can be thought of as defining a decision tree of possible executions. Running the same program twice on different permutations of the data causes a different sequence of comparisons to be made on each.
Pragmatics of Sorting: Library Functions
Any normal programming language has a built-in sort routine as a library function. You are almost always better off using the system sort than writing your own.
Dynamic Arrays
Arrays that start with a size of 1 and double their size each time we run out of space.
Successor/Predecessor for Binary Search Trees
Assume x is the current node. The successor is the smallest element > x; the predecessor is the largest element < x. Complexity is O(h).
Successor in a BST: if x has a right subtree, the successor is the smallest element in that subtree; otherwise it is the nearest ancestor a of x such that x is in a's left subtree.
Predecessor in a BST: if x has a left subtree, the predecessor is the largest element in that subtree; otherwise it is the nearest ancestor a of x such that x is in a's right subtree.
Pragmatics of Sorting: Comparison Functions
Comparison functions explicitly control the order of keys for each pair of elements.
Build Heap
Converts an arbitrary array into a heap. A simple analysis suggests it takes O(n log n) time, but a tighter analysis shows it actually takes O(n) time.
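A minimal sketch of build-heap for a 0-indexed min-heap array (note the notes' 2k / 2k+1 child formulas are 1-indexed; with 0-indexing the children of i sit at 2i+1 and 2i+2). Function names are illustrative:

```python
def sift_down(a, i, n):
    """Swap a[i] with its smaller child until it dominates both children."""
    while 2 * i + 1 < n:
        child = 2 * i + 1                         # left child (0-indexed)
        if child + 1 < n and a[child + 1] < a[child]:
            child += 1                            # pick the smaller child
        if a[i] <= a[child]:
            break                                 # heap property restored
        a[i], a[child] = a[child], a[i]
        i = child

def build_heap(a):
    """Heapify every internal node from the last parent up to the root: O(n)."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):           # leaves are already heaps
        sift_down(a, i, n)
    return a
```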
Priority Queues
Data structures that provide extra flexibility over sorting. This is important because jobs often enter a system at arbitrary times, and it is more cost-effective to add a new job to a priority queue than to re-sort everything each time.
Binary Heaps: Partial Orders
Defined by the ancestor relation in a heap:
1. Reflexive: x is an ancestor of itself.
2. Anti-symmetric: if x is an ancestor of y and y is an ancestor of x, then x = y.
3. Transitive: if x is an ancestor of y and y is an ancestor of z, then x is an ancestor of z.
Partial orders can be used to model hierarchies with incomplete information or equal-valued elements.
Comparison Sorts and Decision Trees
Different permutations of elements require different steps to sort, so there must be at least n! different paths from the root to the leaves of the decision tree. Because there are at least n! leaves, the tree must have height h such that 2^h ≥ n!, so h ≥ lg(n!) = Θ(n lg n). The height h is the worst-case complexity of the sorter.
Divide and Conquer
Divide the problem into smaller subproblems, solve each subproblem recursively, then meld the partial solutions into one solution for the whole problem. When melding takes less time than solving the subproblems, this yields an efficient algorithm.
Insertion into a Binary Search Tree
Do a binary search of the tree to find where the node should be inserted and replace the terminating NIL pointer with the new item. You need to maintain a trailing pointer so that you can insert the new node as a child of the trailing pointer. Takes O(h) time.
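A minimal recursive sketch of BST insertion (returning the subtree root avoids an explicit trailing pointer; the class and function names are illustrative). An inorder traversal then prints the keys in sorted order, as noted below:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Search down the tree; replace the terminating None with the new node."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def inorder(root):
    """Inorder traversal yields the keys in sorted order."""
    return inorder(root.left) + [root.key] + inorder(root.right) if root else []
```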
Deque
Double Ended Queue that supports both FIFO and LIFO operations.
Collision Resolution by Chaining
The easiest approach to resolving collisions: each slot in the hash table points to a linked list of keys, so insertion, deletion, and search become linked-list operations. If there are n keys in a table of size m, each operation takes expected O(1 + n/m) time. Uses extra memory for the pointers.
Pragmatics of Sorting: Equal Elements
Elements with equal values will all bunch together in any total order, but sometimes the relative order matters. Sorting algorithms that maintain the relative order of equal elements are called stable sorting algorithms. Stability can be achieved by adding the initial position as a secondary key.
Heapify
Given two heaps and a fresh element, they can be merged into one by making the new element the root and bubbling it down. This takes O(log n) time, the height of the heap.
Quicksort: Best Case
Happens when we partition as evenly as possible. The total effort of partitioning at each level is O(n), and even splits give O(lg n) levels, for O(n lg n) overall.
Binary Heap: Insertion Construction
Heaps can be created by incremental insertion which takes Θ(n log n) time.
Quicksort vs Heapsort
Heapsort is Θ(n lg n) while naive selection sort is Θ(n^2), which makes heapsort clearly better; but quicksort, when implemented well, is typically 2-3 times faster than heapsort. Explaining that difference lies outside the realm of asymptotic analysis.
Balanced Search Trees
High-maintenance trees whose height is O(lg n), making all dictionary operations take O(lg n) time in the worst case. Must be rebalanced every time a key is deleted or inserted.
Quicksort: Average Case
If we pick the pivot at random, then half the time the pivot lies between the first and third quartiles of the array, which gives an average-case running time of O(n log n).
Applications of Priority Queues: Discrete Event Simulations
In simulations of airports, parking lots, and computer networks, priority queues can be used to maintain who goes next. Stacks and queues can be seen as special cases of priority-queue orderings.
Array Based Dictionaries: Sorted Arrays
In this context, pointer = index.
Search(S,k) - binary search, O(lg n)
Insert(S,x) - search, then move elements to make space, O(n)
Delete(S,x) - move elements to fill the hole, O(n)
Min(S), Max(S) - first or last element, Θ(1)
Successor(S,x), Predecessor(S,x) - add or subtract 1 from the pointer, Θ(1)
Array Based Dictionaries: Unsorted Arrays
In this context, pointer = index.
Search(S,k) - sequential search, O(n)
Insert(S,x) - place in the first empty spot (as in a dynamic array), Θ(1)
Delete(S,x) - copy the nth item into the xth spot, Θ(1)
Min(S), Max(S) - sequential search, O(n)
Successor(S,x), Predecessor(S,x) - sequential search, O(n)
Red-Black Binary Search Trees Insert and Delete
Insert and Delete will need to be modified so that they don't violate red-black tree properties. Do this using rotations and color changes.
(Max) Priority Queue Operations
Insert(Q,x): given an item x with key k, insert it into priority queue Q.
Find-Maximum(Q): return a pointer to the item with the maximum key value in priority queue Q.
Delete-Maximum(Q): remove the item from priority queue Q whose key is maximum.
Each of these operations can be easily supported in O(log n) using heaps or balanced binary trees.
Dynamic Array Management Time
The total work of managing a dynamic array of n elements is O(n): each doubling recopies the existing elements, and n/2 + n/4 + ... ≈ n total copies are made, so each insertion takes O(1) amortized time.
Merging Sorted Lists
Merge sort's efficiency depends on how efficiently the two sorted halves are combined into a single list. The smallest element overall must be the head of one of the two lists, so it can be removed, leaving behind two sorted lists that are smaller than before. Repeating this merges a level with O(n) comparisons; over O(log n) levels of recursion, merge sort takes O(n log n).
Buffering
Merge sort is inconvenient to implement in a single array, so instead we merge into an extra buffer array before recopying the result to the original array. This uses extra space but no extra asymptotic time.
Dictionary Auxiliary Operations
Min(S), Max(S) - Returns the element of the totally ordered set S which has the smallest (largest) key. Successor(S,x), Predecessor(S,x) - Given an element x whose key is from a totally ordered set S, returns the next largest (smallest) element in S, or NIL if x is the maximum (minimum) element.
Binary Heaps: Can we Implicitly Represent Any Binary Tree?
No: all missing internal nodes still take up space in the array, so the implicit representation is inefficient for sparse trees, where the number of nodes is much less than 2^h. This is why heaps should be balanced/filled at each level where possible.
Array-Based Heaps
Normally pointers are used to represent heaps, however we can use an array of keys to implicitly satisfy the role of pointers.
Applications of Sorting: Mode
Once sorted, a linear scan can count the length of all adjacent runs. In addition, the number of instances of k in a sorted array can be found in O(log n) time by using binary search to find the positions of both k − ε and k + ε.
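Both tricks sketched in Python: counting occurrences via two binary searches (using the standard-library `bisect` module, which plays the role of the k − ε / k + ε searches), and mode via a linear scan of adjacent runs. Function names are illustrative:

```python
from bisect import bisect_left, bisect_right

def count_occurrences(a, k):
    """Copies of k in sorted list a, via two binary searches: O(log n)."""
    return bisect_right(a, k) - bisect_left(a, k)

def mode(a):
    """Most frequent element of sorted list a, scanning adjacent runs: O(n)."""
    best, best_len, i = None, 0, 0
    while i < len(a):
        j = i
        while j < len(a) and a[j] == a[i]:    # walk to the end of this run
            j += 1
        if j - i > best_len:
            best, best_len = a[i], j - i
        i = j
    return best
```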
Applications of Sorting: Median and Selection
Once the keys are in a sorted order the smallest, largest, and median element can be easily selected using the key location.
Applications of Sorting: Convex Hulls
Once the points are sorted by x-coordinate, they can be inserted from left to right into the hull, as the rightmost point so far is always on the boundary.
Binary Heaps: Why Heaps?
A partial order is weaker than a total order, making it easier to build, but less useful than sorting (though still quite important).
Quicksort: Partitioning
Places all elements less than the pivot to the left part of the array and all elements greater than the pivot to the right part of the array. The pivot sits in the middle. Takes O(n) time.
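A minimal sketch of quicksort built on one common partition scheme (Lomuto, last element as pivot; the notes allow other pivot choices). After each partition the pivot is in its final sorted position:

```python
def partition(a, lo, hi):
    """Lomuto partition around pivot a[hi]; returns the pivot's final index."""
    pivot = a[hi]
    i = lo                                  # boundary of the "< pivot" region
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]               # put the pivot in the middle
    return i

def quicksort(a, lo=0, hi=None):
    """Sort a in place by partitioning, then recursing on both sides."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort(a, lo, p - 1)
        quicksort(a, p + 1, hi)
    return a
```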
Inorder Traversal of a Binary Search Tree
Prints all keys in sorted order
Binary Heap: Insertion
Put new element in (n+1)st location in the array then "Bubble" it up to the correct place.
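A minimal sketch of the insert-then-bubble-up step for a 0-indexed min-heap array (parent of index i is (i − 1) // 2 with 0-indexing, matching the notes' ⌊k/2⌋ for 1-indexing; the function name is illustrative):

```python
def heap_insert(a, key):
    """Append key at the end, then bubble it up past any larger parent."""
    a.append(key)
    i = len(a) - 1
    while i > 0 and a[(i - 1) // 2] > a[i]:   # parent fails to dominate
        a[i], a[(i - 1) // 2] = a[(i - 1) // 2], a[i]
        i = (i - 1) // 2
    return a
```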
Randomization
Randomization is a good tool to improve algorithms with bad worst-case but good average-case complexity as the chance of getting the worst-case scenario goes down after randomizing the data.
Binary Search Tree Formal Definition
Rooted binary tree. Let x be any node in the BST:
- If y is a node in x's left subtree, then y→key ≤ x→key.
- If y is a node in x's right subtree, then y→key ≥ x→key.
Binary Search Tree (BST)
Rooted Binary Tree that uses pointers to nodes containing a left and right (child) pointer, key field, and parent pointer (optional as parents can be stored on a stack on the way down)
Dictionary Primary Operations
Search(S,k) - Given a set S and a key value k, returns a pointer x to an element in S whose key is k if it exists Insert(S,x) - Add x to set S. Delete(S,x) - Given a pointer x to an element in the set S, remove x from S. (Observe we are given a pointer, not a key value).
Linked List Structures
Search: walk through the list until you find x. O(n) Insert: insert at front Θ(1) Delete: first search for x, then delete O(n)
Selection Sort: Data Structure Differences
Selection sort takes O(n(T(A) + T(B))) time, where A is finding the smallest remaining element and B is removing it. Using arrays or unsorted linked lists makes A take O(n) time and B take O(1) time, giving an O(n^2) selection sort. Using balanced search trees or heaps makes both A and B take O(log n) time, giving an O(n log n) selection sort.
Selection Sort
Selection sort repeatedly scans the remaining elements, finding the smallest and moving it to its final position. Takes O(n(T(A) + T(B))) time, where A is finding the smallest remaining element and B is removing it.
Dictionary/Dynamic Set
Set of items indexed by keys
Recursive Algorithm Analysis
Set up a recurrence relation, then solve the recurrence relation.
NIL
Similar to null, but only for pointers.
Sorting Algorithms
Sorting is the basic building block that many other algorithms are built around. By understanding sorting, we obtain an amazing amount of power to solve other problems.
What are the Elementary Data Structures?
Stacks, Queues, Lists, and Heaps
Min/Max of a Binary Search Tree
The minimum element is the leftmost node of the BST (keep following left pointers); the maximum element is the rightmost node (keep following right pointers). Neither is necessarily a leaf, since the minimum may still have a right child.
Quicksort: Why Partitioning?
The pivot ends up in its correct place in the total order, and no element ever crosses to the other side of the pivot in the final sorted order; as a result, partitioning gives us a recursive sorting algorithm.
Rotation Operations
There are two types of rotations: left rotations and right rotations
Array-Based Heaps: Left Child Location
The left child of node k sits in position 2k.
Array-Based Heaps: Right Child Location
The right child of node k sits in position 2k + 1.
Array-Based Heaps: Parent Location
The parent of node k sits in position ⌊k/2⌋.
Quicksort: Worst Case
This occurs when the pivot splits the array as unequally as possible, which gives a worst-case time of Θ(n^2) from the recurrence relation T(n) = T(n − 1) + n.
Collisions
Two distinct keys hashing to the same value in a hash table.
Quicksort: Picking a better Pivot
Use the middle element of the subarray, or take the median of three elements (first, last, middle), as the pivot. The worst case remains O(n^2).
Heapsort
Uses a max heap. Exchanging the max element with the last element and calling heapify repeatedly gives an O(n log n) sorting algorithm.
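A minimal in-place heapsort sketch: build a max-heap in O(n), then repeatedly swap the max (root) with the last element of the shrinking heap and sift the new root down (0-indexed children at 2i+1 and 2i+2):

```python
def heapsort(a):
    """Sort a in place: build a max-heap, then repeatedly extract the max."""
    def sift_down(i, n):
        while 2 * i + 1 < n:
            child = 2 * i + 1
            if child + 1 < n and a[child + 1] > a[child]:
                child += 1                    # pick the larger child
            if a[i] >= a[child]:
                break                         # i dominates both children
            a[i], a[child] = a[child], a[i]
            i = child
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):       # build max-heap: O(n)
        sift_down(i, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]           # move current max into place
        sift_down(0, end)                     # restore heap on the rest
    return a
```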
Collision Resolution by Open Addressing
Uses an implicit reference derived from a simple function. If the slot we want is filled, we examine the remaining locations:
1. Sequentially: h, h + 1, h + 2, ...
2. Linearly: h, h + k, h + 2k, h + 3k, ...
3. Quadratically: h, h + 1^2, h + 2^2, h + 3^2, ...
More complicated schemes exist to avoid runs of similar keys. Open addressing makes deletion very ugly.
Hash Tables
Very practical way to maintain a dictionary based on the idea that item lookup in an array is constant time if you know the index.
Pointer Based Dictionaries
We can maintain a dictionary in either a singly or doubly linked list.
Bubble Down
When the element at a node fails to dominate its children (e.g., the new root after a delete-max), it swaps with its dominant child, repeating until the heap property is restored. (After an insertion, the new element instead bubbles up, swapping with its parent until it is dominated.)
External Sorting
When data gets too large to sort in memory, it has to be sorted on disk. Disks benefit from algorithms that read and write data in long streams rather than random accesses. The best external sorting algorithm is merge sort.
Quicksort: When does the Worst Case occur?
When the array is sorted or nearly sorted.
Hash Table Based Dictionaries: Complexities
With either chaining or open addressing: Search - O(1) expected, O(n) worst case Insert - O(1) expected, O(n) worst case Delete - O(1) expected, O(n) worst case Min, Max and Predecessor, Successor Θ(n + m) expected and worst case Pragmatically, a hash table is often the best data structure to maintain a dictionary.
Searching a Binary Search Tree
Works because both the left and right subtrees of a BST are also BSTs - recursive structure, recursive algorithm Takes time proportional to the height of the tree, O(h)
singly linked list
a data structure in which each list element contains a pointer to the next list element. Search is done iteratively or recursively. Insertion is O(n) if the list is sorted, or Θ(1) if unsorted, since you just add the new node at the front of the list. Deletion is trickier because you must first find the predecessor of the node you wish to delete; this takes O(n) time.
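A minimal singly-linked-list sketch covering the three operations above (iterative rather than recursive, for brevity; class and function names are illustrative):

```python
class ListNode:
    def __init__(self, item, next=None):
        self.item, self.next = item, next

def search_list(head, x):
    """Walk the list until x is found (or return None): O(n)."""
    while head and head.item != x:
        head = head.next
    return head

def insert_front(head, x):
    """Unsorted insertion at the front of the list: Theta(1)."""
    return ListNode(x, head)

def delete_item(head, x):
    """Find the predecessor of x's node, then splice it out: O(n)."""
    if head and head.item == x:
        return head.next                      # deleting the head
    pred = head
    while pred and pred.next and pred.next.item != x:
        pred = pred.next
    if pred and pred.next:
        pred.next = pred.next.next            # splice the node out
    return head
```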
doubly linked list
a linked list in which each element has both forward and backward pointers. Search is done iteratively or recursively. Insertion is O(n) if the list is sorted, or Θ(1) if unsorted, since you just add the new node at the front of the list. Deletion is easy, Θ(1), since the predecessor is already linked to the node to be deleted.
Arrays
a structure of fixed-size data records such that each element can be efficiently located by its index
Linked Data Structures
composed of multiple distinct chunks of memory bound together by pointers, and include lists, trees, and graph adjacency lists.
Contiguous Data Structures
composed of single slabs of memory, and include arrays, matrices, heaps, and hash tables.
Binary Search Tree Dictionaries: Complexities
h denotes the height of the tree and is Θ(log n) on the average and Θ(n) in the worst case.
Pointers
represent the address of a location in memory.
Data Structures
techniques for organizing and storing data in computer memory
Minimum Sorting Time
Ω(n log n) for any comparison-based sorting algorithm.