Order Notation
Big Oh Notation
- Big Oh simplifies the analysis of algorithms by ignoring levels of detail that do not impact our comparison of algorithms
- Big Oh ignores the difference between multiplicative constants
- f(n) = O(g(n)) means c*g(n) is an upper bound on f(n): there exists some constant c such that f(n) is always <= c*g(n) for large enough n (n >= n0 for some constant n0)
- f(n) = Omega(g(n)) means c*g(n) is a lower bound on f(n): there exists some constant c such that f(n) is always >= c*g(n) for all n >= n0
- f(n) = Theta(g(n)) means c1*g(n) is an upper bound on f(n) and c2*g(n) is a lower bound on f(n) for all n >= n0. Thus there exist constants c1 and c2 such that f(n) <= c1*g(n) and f(n) >= c2*g(n). This means that g(n) provides a tight bound on f(n)
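A worked example (my own, not from the notes) showing how the constants c and n0 are chosen to prove a Big Oh bound:

```latex
% Claim: f(n) = 3n^2 + 10n = O(n^2), taking g(n) = n^2.
% Choose c = 4 and n_0 = 10. For all n >= 10 we have 10n <= n^2, so
%   f(n) = 3n^2 + 10n <= 3n^2 + n^2 = 4n^2 = c * g(n).
f(n) = 3n^2 + 10n \le 4n^2 \quad \text{for all } n \ge n_0 = 10
% Since f(n) >= 3n^2 for every n >= 1, the same g(n) is also a lower
% bound (with c = 3), so in fact f(n) = \Theta(n^2).
```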
Binary Search Trees
- Binary search requires that we have fast access to two elements -- specifically the median elements above and below the given node. To combine these ideas, we need a linked list with two pointers per node. This is the basic idea behind binary search trees
- A rooted binary tree is recursively defined as either being 1) empty or 2) consisting of a node called the root, together with two rooted binary trees called the left and right subtrees
- A binary search tree labels each node in a binary tree with a single key such that for any node labeled x, all nodes in the left subtree of x have keys < x while all nodes in the right subtree of x have keys > x
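A minimal C declaration matching this recursive definition (field names are my own, not from the notes):

```c
/* A binary search tree node: one key plus left/right subtree pointers.
   An empty tree is represented by a NULL pointer. */
typedef struct tree {
    int key;                /* the node's search key           */
    struct tree *left;      /* subtree with keys < key         */
    struct tree *right;     /* subtree with keys > key         */
    struct tree *parent;    /* pointer back to the parent node */
} tree;
```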
Finding the Minimum and Maximum elements in a tree
- By definition the smallest key must reside in the left subtree of the root, since all keys in the left subtree have values less than that of the root. The minimum element is therefore the leftmost descendant of the root.
- Similarly, the maximum element must be the rightmost descendant of the root.
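A sketch of the minimum operation in C, reusing the tree struct sketched under Binary Search Trees; mirroring it with ->right gives the maximum:

```c
#include <stddef.h>

/* Walk left from the root until hitting NULL: the last node
   visited holds the minimum key. */
tree *find_minimum(tree *t) {
    if (t == NULL) return NULL;   /* empty tree: no minimum */
    while (t->left != NULL)
        t = t->left;              /* keep descending left   */
    return t;                     /* leftmost descendant    */
}
```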
Minimum/Maximum operation of dict w/ unsorted array
- Defined with respect to sorted order and so require linear sweeps to identify in an unsorted array
Linked List
- Each node in the data structure contains one or more data fields that retain the data that we need to store
- Each node contains a pointer field to at least one other node (next). This means that much of the space used in linked data structures has to be devoted to pointers, not data
- We need a pointer to the head of the structure, so we know where to access it
- The three basic operations supported by lists are searching, insertion, and deletion. In doubly-linked lists, each node points both to its predecessor and its successor element
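A minimal C sketch of a singly-linked node and the recursive search operation (names are illustrative):

```c
#include <stdlib.h>

/* A singly-linked list node: one data field plus a next pointer. */
typedef struct list {
    int item;               /* the data field            */
    struct list *next;      /* pointer to successor node */
} list;

/* Recursive search: return a pointer to the node holding x, or NULL. */
list *search_list(list *l, int x) {
    if (l == NULL) return NULL;
    if (l->item == x) return l;
    return search_list(l->next, x);
}
```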
Delete Operation with Hash Table
- Expected Case: O(1) - Worst Case: O(1)
Insert Operation with Hash Table
- Expected Case: O(1) - Worst Case: O(1)
Successor Operation with Hash Table
- Expected Case: O(n + m) - Worst Case: O(n + m)
Search Operation with Hash Table
- Expected Case: O(n/m) (each chain holds n/m items on average, so this is O(1) when m is proportional to n) - Worst Case: O(n)
Hashing
- Hash tables are a very practical way to maintain a dictionary
- They exploit the fact that looking an item up in an array takes constant time once you have its index
- A hash function is a mathematical function that maps keys to integers. We use the value of our hash function as an index into an array and store our item at that position
- The first step of the hash function is usually to map each key to a big integer
- The result is unique identifier numbers, but they are so large they will quickly exceed the number of slots in our hash table. We must reduce this number to an integer between 0 and m-1, by taking the remainder H(S) mod m
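One possible hash function along these lines, sketched in C (treating the key's bytes as base-256 digits is my assumption, not prescribed by the notes):

```c
#include <string.h>

/* Map a string key to a big integer by reading its characters as
   digits in base 256, reducing mod m as we go so the value never
   overflows. Returns a slot index between 0 and m-1. */
unsigned long hash_string(const char *s, unsigned long m) {
    const unsigned long alpha = 256;  /* size of the "alphabet"   */
    unsigned long h = 0;
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        h = (h * alpha + (unsigned char)s[i]) % m;  /* H(S) mod m */
    return h;
}
```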
Insertion operation of dict w/ unsorted array
- Implemented by incrementing n and then copying item x to the nth cell in the array A[n]. The bulk of the array is untouched, so this operation takes constant time
Search operation of dict w/ unsorted array
- Implemented by testing the search key k against each element of an unsorted array. Thus, search takes linear time in the worst case
Data Structures
- Important classes of abstract data types such as containers, dictionaries, and priority queues have many different but functionally equivalent data structures that implement them
- Each implementation realizes different tradeoffs in the time to execute various operations, so total performance can improve dramatically with the right choice
- Data structures can be neatly classified as either contiguous or linked depending upon whether they are based on arrays or pointers:
1. Contiguously-allocated structures are composed of single slabs of memory, and include arrays, matrices, heaps, and hash tables
2. Linked data structures are composed of distinct chunks of memory bound together by pointers, and include lists, trees, and graph adjacency lists
Priority Queue implementation with an unsorted array
- Insert: O(1) - Find-minimum(Q): O(1) - Delete-minimum: O(n)
Priority Queue implementation with a balanced tree
- Insert: O(log n) - Find-minimum(Q): O(1) - Delete-minimum: O(log n)
Priority Queue implementation with a sorted array
- Insert: O(n) - Find-minimum(Q): O(1) - Delete-minimum: O(1)
Advantages of arrays over linked lists
- Linked structures require extra space for storing pointer fields
- Linked lists do not allow efficient random access to items
- Arrays allow better memory locality and cache performance than random pointer jumping
- Take-home lesson: dynamic memory allocation provides us with flexibility on how and where we use our limited storage resources
Priority Queues
- Many algorithms process items in a specific order
- Priority queues are data structures that provide more flexibility than simple sorting, because they allow new elements to enter a system at arbitrary intervals
- It is more cost-effective to insert a new job into a priority queue than to re-sort everything on each arrival
- Insert(Q,x): Given an item x with key k, insert it into the priority queue Q
- Find-Minimum(Q) or Find-Maximum(Q): Return a pointer to the item whose key value is smaller (larger) than any other key in the priority queue Q
- Delete-Minimum(Q) or Delete-Maximum(Q): Remove the item from the priority queue Q whose key is minimum (maximum)
Advantages of Linked Lists over Arrays
- Overflow on linked structures can never occur unless the memory is actually full
- Insertions and deletions are simpler than for contiguous lists
- With large records, moving pointers is easier and faster than moving the items themselves
Balanced Search Trees
- Random search trees are usually good, but if we get unlucky with our order of insertion we can end up with a linear-height tree in the worst case
- A balanced search tree adjusts the tree a little after each insertion, keeping it close enough to balanced that the maximum height stays logarithmic
- The height of the tree is always O(log n), and therefore all operations are also O(log n)
Deletion from a tree
- Removing a node means appropriately linking its two descendant subtrees back into the tree
- If you delete a node with two children, relabel this node with the key of its immediate successor in sorted order. This successor must be the smallest value in the right subtree
- Deletion takes O(h), because it requires the cost of at most two search operations, each taking O(h) time where h is the height of the tree
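A hedged C sketch of deletion, using a recursive variant that returns the new subtree root; it reuses the tree struct sketched earlier and, for simplicity, ignores the parent field:

```c
#include <stdlib.h>

tree *delete_tree(tree *t, int key) {
    if (t == NULL) return NULL;                /* key not present */
    if (key < t->key) {
        t->left = delete_tree(t->left, key);
    } else if (key > t->key) {
        t->right = delete_tree(t->right, key);
    } else if (t->left == NULL || t->right == NULL) {
        tree *child = t->left ? t->left : t->right;  /* 0 or 1 child  */
        free(t);
        return child;                          /* splice subtree back */
    } else {
        tree *succ = t->right;                 /* successor = smallest */
        while (succ->left != NULL)             /* key in right subtree */
            succ = succ->left;
        t->key = succ->key;                    /* relabel with it, then */
        t->right = delete_tree(t->right, succ->key);  /* remove it     */
    }
    return t;
}
```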
Selection Sort
- Repeatedly identifies the smallest remaining unsorted element and puts it at the end of the sorted portion of the array
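A straightforward C version of this idea (my sketch, 0-indexed):

```c
/* Selection sort: on each pass, find the smallest remaining element
   and swap it into the next slot of the growing sorted prefix. */
void selection_sort(int a[], int n) {
    for (int i = 0; i < n; i++) {
        int min = i;                    /* index of smallest seen so far */
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[min]) min = j;
        int tmp = a[i];                 /* swap it to the end of */
        a[i] = a[min];                  /* the sorted portion    */
        a[min] = tmp;
    }
}
```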
Search operation of dict w/ sorted array
- Search can be done in O(log n) using binary search, because we know the median element sits in A[n/2]. Since the upper and lower portions of the array are also sorted, the search can continue recursively on the appropriate portion
- The minimum and maximum elements sit in A[1] and A[n], while the predecessor and successor of A[x] are A[x-1] and A[x+1], respectively
- Insertion and deletion become more expensive, because making room for a new item or filling a hole may require moving many items arbitrarily
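A C sketch of the binary search described above; note it is 0-indexed, while the notes index arrays from 1:

```c
/* Binary search on a sorted array: compare against the median element
   of the current range, then recurse on the half that could still
   contain the key. Returns the index of k, or -1 if absent. */
int binary_search(const int a[], int low, int high, int k) {
    if (low > high) return -1;          /* interval empty: not found */
    int mid = low + (high - low) / 2;   /* median of current range   */
    if (a[mid] == k) return mid;
    if (a[mid] > k)
        return binary_search(a, low, mid - 1, k);   /* lower half */
    return binary_search(a, mid + 1, high, k);      /* upper half */
}
```

Called as binary_search(a, 0, n - 1, k) on an array of n sorted elements.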
Dictionary Operations runtime w/ sorted Array
- Search: O(log n)
- Insert: O(n)
- Delete: O(n)
- Successor: O(1)
- Predecessor: O(1)
- Minimum: O(1)
- Maximum: O(1)
Dictionary Operations runtime w/ Unsorted Array
- Search: O(n)
- Insert: O(1)
- Delete: O(1)
- Successor: O(n)
- Predecessor: O(n)
- Minimum: O(n)
- Maximum: O(n)
Search dict w/ doubly/singly ll
- Sorting provides less benefit for linked lists than it did for arrays
- Binary search is no longer possible, because we can't access the median element without traversing all the elements before it
- Sorted lists do provide quick termination of unsuccessful searches
Queues
- Support retrieval in first-in, first-out (FIFO) order
- You want the container holding jobs to be processed in FIFO order to minimize the maximum time spent waiting
- The average waiting time will be the same regardless of whether FIFO or LIFO is used
- Queues are somewhat trickier to implement than stacks, and thus are most appropriate for applications where the order is important
- Enqueue(x,q): Insert item x at the back of queue q
- Dequeue(q): Return (and remove) the front item from queue q
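One common way to realize these operations is a circular buffer; this C sketch (fixed capacity and names are my own choices) keeps both operations O(1):

```c
/* A circular-buffer queue: enqueue at the back, dequeue from the
   front, both in O(1). QUEUE_MAX is an illustrative fixed capacity. */
#define QUEUE_MAX 1000

typedef struct {
    int items[QUEUE_MAX];
    int front;    /* index of the first item */
    int count;    /* number of items stored  */
} queue;

int enqueue(queue *q, int x) {
    if (q->count == QUEUE_MAX) return -1;             /* full  */
    q->items[(q->front + q->count) % QUEUE_MAX] = x;  /* back  */
    q->count++;
    return 0;
}

int dequeue(queue *q, int *x) {
    if (q->count == 0) return -1;                     /* empty */
    *x = q->items[q->front];
    q->front = (q->front + 1) % QUEUE_MAX;
    q->count--;
    return 0;
}
```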
Stacks
- Supports retrieval by last-in, first-out (LIFO) order
- Stacks are easy to implement and very efficient
- Stacks are the right container to use when retrieval order doesn't matter at all
- Push(x,s): Insert x at the top of stack s
- Pop(s): Return (and remove) the top item of stack s
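A minimal fixed-capacity array stack in C (capacity and names are my own choices):

```c
/* An array stack: push and pop both run in O(1). */
#define STACK_MAX 1000

typedef struct {
    int items[STACK_MAX];
    int top;                 /* number of items currently stored */
} stack;

int push(stack *s, int x) {
    if (s->top >= STACK_MAX) return -1;   /* overflow  */
    s->items[s->top++] = x;
    return 0;
}

int pop(stack *s, int *x) {
    if (s->top == 0) return -1;           /* underflow */
    *x = s->items[--s->top];
    return 0;
}
```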
Insertion Sort
- Builds the sorted array one element at a time, swapping each new element backward until it sits in its proper position
- For the Big Oh analysis, the worst-case running time follows from the largest number of times each nested loop can iterate: both loops run at most n times, so insertion sort is O(n^2)
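The two nested loops are visible directly in a C sketch (0-indexed, my version):

```c
/* Insertion sort: two nested loops, each iterating at most n times,
   giving the O(n^2) worst-case bound described above. */
void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int j = i;
        while (j > 0 && a[j] < a[j - 1]) {  /* swap backward until    */
            int tmp = a[j];                 /* a[j] reaches its place */
            a[j] = a[j - 1];
            a[j - 1] = tmp;
            j--;
        }
    }
}
```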
Searching in a tree
- The binary search tree labeling uniquely identifies where each key is located
- Start at the root; unless it contains the query key x, proceed either left or right, depending upon whether x occurs before or after the root key
- The algorithm works because both the left and right subtrees of a binary search tree are themselves binary search trees
- The search algorithm runs in O(h), where h denotes the height of the tree
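In C, reusing the tree struct sketched under Binary Search Trees, the search looks like this sketch:

```c
#include <stddef.h>

/* Search for key x: recurse left or right depending on how x
   compares with the key at the current node. O(h) time. */
tree *search_tree(tree *t, int x) {
    if (t == NULL) return NULL;          /* fell off the tree: absent */
    if (t->key == x) return t;
    if (x < t->key)
        return search_tree(t->left, x);  /* x must lie to the left  */
    return search_tree(t->right, x);     /* x must lie to the right */
}
```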
Insertion/Deletion dict w/ doubly/singly ll
- The complication here is deletion from a singly-linked list
- The definition of the delete operation states we are given a pointer x to the item to be deleted. But what we really need is a pointer to the element pointing to x in the list, because that is the node that needs to be changed
- We spend linear time searching for the predecessor in a singly-linked list
- Doubly-linked lists avoid this problem, because we can immediately retrieve the predecessor of x
- Deletion is faster for doubly-linked lists than for sorted arrays, because splicing out the deleted element from the list is more efficient than filling the hole by moving the array elements
Pointers
- The connections that hold the pieces of linked structures together
- Pointers represent the address of a location in memory
- A variable storing a pointer to a given data item can provide more freedom than storing a copy of the item itself
- A pointer is assumed to give the address in memory where a particular chunk of data is located
- Pointers in C have types declared at compile time, denoting the data type of the items they point to
- *p denotes the item pointed to by pointer p, and &x denotes the address of a particular variable x
- A special NULL pointer value is used to denote structure-terminating or unassigned pointers
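A tiny C program illustrating *p, &x, and NULL:

```c
#include <stdio.h>

int main(void) {
    int x = 42;
    int *p = NULL;      /* NULL marks an unassigned pointer      */
    p = &x;             /* &x: the address in memory holding x   */
    printf("%d\n", *p); /* *p: the item pointed to by p -> 42    */
    *p = 7;             /* writing through the pointer changes x */
    printf("%d\n", x);  /* prints 7                              */
    return 0;
}
```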
Dictionary Operations with singly-linked sorted list
1. Search(L,k): O(n)
2. Insert(L,x): O(n)
3. Delete(L,x): O(n)
4. Successor(L,x): O(1)
5. Predecessor(L,x): O(n)
6. Minimum(L): O(1)
7. Maximum(L): O(1)
Deletion operation of dict w/ unsorted array
- The definition states that we are given a pointer x to the element to delete, so we need not spend any time searching for the element. But removing the xth element from the array A leaves a hole that must be filled
- We could fill the hole by moving each of the elements A[x+1] to A[n] up one position, but this requires Theta(n) time when the first element is deleted
- The better implementation is to just write over A[x] with A[n] and decrement n, which takes only constant time
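The constant-time variant as a 0-indexed C sketch (the notes use 1-indexed arrays):

```c
/* Constant-time delete from an unsorted array: overwrite the doomed
   element with the last element, then shrink the array by one.
   a holds *n items in a[0 .. *n-1]; x is the index to delete. */
void delete_unsorted(int a[], int *n, int x) {
    a[x] = a[*n - 1];   /* fill the hole with the last element */
    (*n)--;             /* logically remove the final slot     */
}
```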
Dictionaries
- The dictionary data type permits access to data items by content
- Primary operations of a dictionary are:
1. Search(D,k): Given a search key k, return a pointer to the element in dictionary D whose key value is k, if one exists
2. Insert(D,x): Given a data item x, add it to the set in the dictionary D
3. Delete(D,x): Given a pointer to a given data item x in the dictionary D, remove it from D
- Certain dictionary data structures also efficiently support other useful operations:
1. Max(D) or Min(D): Retrieve the item with the largest (or smallest) key from D. This enables the dictionary to serve as a priority queue
2. Predecessor(D,k) or Successor(D,k): Retrieve the item from D whose key is immediately before (or after) k in sorted order. This enables us to iterate through the elements of the data structure
Arrays
- The fundamental contiguously-allocated data structure
- Arrays are structures of fixed-size data records such that each element can be efficiently located by its index or address
- Constant-time access given the index: because the index of each element maps directly to a particular memory address, we can access arbitrary data items instantly provided we know the index
- Space efficiency: arrays consist purely of data, so no space is wasted with links or other formatting information. End-of-record information is not needed because arrays are built from fixed-size records
- Memory locality: arrays are good for iterating through all the items, because they exhibit excellent memory locality. Physical continuity between successive data accesses helps exploit the high-speed cache memory on modern computer architectures
- The downside of arrays is that we cannot adjust their size in the middle of a program's execution. The program will fail as soon as we try to add the (n+1)st customer if we allocated room for only n records
- We can compensate by allocating extremely large arrays, but this can waste space, again restricting what our program can do
Maximum
- The maximum element sits at the tail of the list, which would normally require Theta(n) time to reach in either singly- or doubly-linked lists
- We can maintain a separate pointer to the tail of the list, provided we pay the maintenance costs for this pointer on every insertion and deletion
- The tail pointer can be updated in constant time on doubly-linked lists
Traversal dict w/ doubly/singly ll
- The predecessor pointer problem complicates this - The successor can be implemented in constant time
Containers
- The term container is used to denote a data structure that permits storage and retrieval of data items independent of content
- Containers are distinguished by the particular retrieval order they support
- In the two most important types of containers, this retrieval order depends on the insertion order
- Containers can be implemented using either arrays or linked lists
Best, Worst and Average-Case Complexity
- The worst-case complexity of the algorithm is the function defined by the maximum number of steps taken in any instance of size n
- The best-case complexity of the algorithm is the function defined by the minimum number of steps taken in any instance of size n
- The average-case complexity of the algorithm is the function defined by the average number of steps over all instances of size n
- The worst-case complexity proves to be the most useful of these three measures in practice
Insertion in a tree
- There is only one place to insert an item x into a binary search tree T where we know we can find it again
- We must replace the NULL pointer found in T after an unsuccessful query for the key x
- The implementation uses recursion to combine the search and node insertion stages of key insertion. It takes three arguments:
1. A pointer l to the pointer linking the search subtree to the rest of the tree
2. The key x to be inserted
3. A parent pointer to the parent node containing l
- The node is allocated and linked in on hitting the NULL pointer
- Insertion takes O(h) time
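A C sketch of this pointer-to-pointer recursion, reusing the tree struct sketched earlier (assuming integer keys):

```c
#include <stdlib.h>

/* Insert key x by recursing down to the NULL pointer where an
   unsuccessful search for x would end, then linking a new node there.
   l is a pointer to the pointer connecting this subtree to the tree. */
void insert_tree(tree **l, int x, tree *parent) {
    if (*l == NULL) {                  /* found the insertion point */
        tree *p = malloc(sizeof(tree));
        p->key = x;
        p->left = p->right = NULL;
        p->parent = parent;
        *l = p;                        /* link the new node in */
        return;
    }
    if (x < (*l)->key)
        insert_tree(&((*l)->left), x, *l);
    else
        insert_tree(&((*l)->right), x, *l);
}
```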
Collision Resolution
- Two distinct keys will occasionally hash to the same value
- Chaining is the easiest approach to collision resolution: represent the hash table as an array of m linked lists. The ith list will contain all the items that hash to the value i. Chaining is very natural, but devotes a considerable amount of memory to pointers. This is space that could be used to make the table larger and hence the "lists" smaller
- The alternative to chaining is open addressing. The hash table is maintained as an array of elements, each initialized to null. On an insertion, we check to see if the desired position is empty. If so, we insert the item. If not, we must find some other place to insert it instead. The simplest possibility (called sequential probing) inserts the item in the next open spot in the table. If the table is not too full, the contiguous runs of items should be fairly small, hence this location should be only a few slots from its intended position
- Chaining and open addressing both require O(m) time to initialize an m-element hash table to null elements prior to the first insertion
- Traversing all elements in the table takes O(n + m) time for chaining, because we need to scan all m buckets looking for elements
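A hedged sketch of chaining in C; it assumes the hash_string function sketched under Hashing and omits duplicate checking:

```c
#include <stdlib.h>

/* Chaining: the hash table is an array of m linked lists; the ith
   list holds every item whose key hashes to i. Names illustrative. */
typedef struct chain_node {
    const char *key;
    struct chain_node *next;
} chain_node;

typedef struct {
    chain_node **buckets;   /* m list heads, all NULL after O(m) init */
    unsigned long m;        /* number of buckets                      */
} hash_table;

void ht_insert(hash_table *h, const char *key) {
    unsigned long i = hash_string(key, h->m);   /* pick the chain    */
    chain_node *node = malloc(sizeof(chain_node));
    node->key = key;
    node->next = h->buckets[i];    /* push onto the front of list i  */
    h->buckets[i] = node;
}
```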
Predecessor/Successor operation of dict w/ unsorted array
- These traversal operations refer to the item appearing before/after x in sorted order
- In an unsorted array, an element's physical predecessor (successor) is not necessarily its logical one
- Instead, the predecessor of A[x] is the largest element smaller than A[x], and the successor is the smallest element larger than A[x]
- Both operations require a sweep through all n elements of A to determine the winner
Traversal in a tree
- Visiting all the nodes in a rooted binary tree proves to be an important component of many algorithms
- A prime application of tree traversal is listing the labels of the tree nodes
- Binary search trees make it easy to report the labels in sorted order
- By definition, all keys smaller than the root must lie in the left subtree of the root, and all keys bigger than the root in the right subtree
- Visiting the nodes recursively in accord with this policy produces an in-order traversal
- Each item is processed once during the course of traversal, which runs in O(n) time, where n denotes the number of nodes in the tree
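The in-order traversal in C, reusing the tree struct sketched earlier:

```c
#include <stdio.h>

/* In-order traversal: process the left subtree, then the node itself,
   then the right subtree. Visits each node once, so O(n) total. */
void traverse_tree(tree *t) {
    if (t == NULL) return;
    traverse_tree(t->left);    /* all keys smaller than t->key */
    printf("%d\n", t->key);    /* process the node's label     */
    traverse_tree(t->right);   /* all keys larger than t->key  */
}
```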
Dynamic Arrays
- We can efficiently enlarge arrays as we need them through dynamic arrays
- Suppose we start with an array of size 1, and double its size from m to 2m each time we run out of space
- This doubling process involves allocating a new contiguous array of size 2m, copying the contents of the old array to the lower half of the new one, and returning the space used by the old array to the storage allocation system
- Each of the n elements moves only two times on average, so the total work of managing the dynamic array is the same O(n) as it would have been if a single array of sufficient size had been allocated in advance
- The primary thing lost is the guarantee that each array access takes constant time in the worst case. Now all the queries will be fast, except for those relatively few queries triggering array doubling
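A C sketch of the doubling scheme; realloc performs the allocate-copy-free sequence described above (assumes the array was initialized with a malloc'd slab of capacity 1):

```c
#include <stdlib.h>

typedef struct {
    int *data;       /* contiguous slab of storage       */
    int size;        /* slots currently holding elements */
    int capacity;    /* slots allocated                  */
} dyn_array;

/* Append x, doubling the capacity from m to 2m when full. */
void append(dyn_array *d, int x) {
    if (d->size == d->capacity) {        /* out of space: double     */
        d->capacity *= 2;
        d->data = realloc(d->data, d->capacity * sizeof(int));
    }
    d->data[d->size++] = x;              /* amortized O(1) insertion */
}
```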
How good are binary search trees?
- When implemented using binary search trees, all three dictionary operations take O(h) time, where h is the height of the tree
- The smallest height occurs when the tree is perfectly balanced, where h = ⌈log n⌉
- The insertion algorithm puts each new item at a leaf node where it should have been found. The data structure, however, has no control over the order of insertion
- Binary trees can have heights ranging from log n to n
Dictionary Operations with doubly-linked unsorted list
1. Search(L,k): O(n)
2. Insert(L,x): O(1)
3. Delete(L,x): O(1)
4. Successor(L,x): O(n)
5. Predecessor(L,x): O(n)
6. Minimum(L): O(n)
7. Maximum(L): O(n)
Dictionary Operations with singly-linked unsorted list
1. Search(L,k): O(n)
2. Insert(L,x): O(1)
3. Delete(L,x): O(n)
4. Successor(L,x): O(n)
5. Predecessor(L,x): O(n)
6. Minimum(L): O(n)
7. Maximum(L): O(n)
Dictionary Operations with doubly-linked sorted list
1. Search(L,k): O(n)
2. Insert(L,x): O(n)
3. Delete(L,x): O(1)
4. Successor(L,x): O(1)
5. Predecessor(L,x): O(1)
6. Minimum(L): O(1)
7. Maximum(L): O(1)