ECS 36C Midterm 1 Study Guide

Bipartite Matching

* Will not be asked anything from this slide deck *

Different Classes of Functions *

- Logarithms (regardless of base) are upper-bounded by polynomials.
- Polynomials are upper-bounded by exponential functions.
- Exponential functions are upper-bounded by factorials.
- n! is O(n^n).

Binary search trees. Understand how find, insert, and delete work. Binary tree vs. binary search tree.

A binary search tree is a binary tree in which the nodes in a node's left subtree are less than that node and the nodes in its right subtree are greater. Finding an element works much like binary search: compare the current node to the target; if they are not equal, go left when the target is smaller and right when it is larger, and continue. The worst-case lookup in a binary search tree is linear. This happens when the tree degenerates into a chain, e.g. you insert A, then a B greater than A, then a C greater than B, then a D greater than C. A worst-case find therefore takes Θ(n) time.

Inserting in a binary search tree (assuming no duplicate elements): first perform a find to determine the proper location for the new element, then create a new node there.

Deleting in a binary search tree: first find the target element, then remove it. If the target is a leaf, we can remove it and be finished. If the node being deleted has children, how do we remove it without ruining the order of the children? Solution: replace the deleted element with a proper descendant, specifically an element adjacent to the deleted one in sorted order. There are two choices for the replacement: 1) the max of the left subtree; that max's left subtree is moved up into the max's old spot; 2) the min of the right subtree; that min's right subtree is moved up into the min's old spot.

Binary search tree worst-case runtimes: find() -- Θ(n); insert() -- Θ(n); delete() -- Θ(n).
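A minimal C++ sketch of find and insert on a pointer-based BST (the Node layout and function names are illustrative assumptions, not the lecture's exact code; memory is leaked for brevity):

```cpp
#include <iostream>

// Hypothetical node type for illustration; the lecture code may differ.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Find: walk down, going left or right based on comparisons.
bool find(const Node* root, int target) {
    const Node* curr = root;
    while (curr != nullptr) {
        if (target == curr->key) return true;
        curr = (target < curr->key) ? curr->left : curr->right;
    }
    return false;  // fell off the tree: target is not present
}

// Insert (no duplicates): walk down as in find, then attach a new leaf.
Node* insert(Node* root, int key) {
    if (root == nullptr) return new Node{key, nullptr, nullptr};
    if (key < root->key)      root->left  = insert(root->left, key);
    else if (key > root->key) root->right = insert(root->right, key);
    return root;  // duplicates are ignored
}

int main() {
    Node* root = nullptr;
    for (int k : {8, 3, 10, 1, 6}) root = insert(root, k);
    std::cout << std::boolalpha << find(root, 6) << ' ' << find(root, 7) << '\n';  // true false
}
```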

Tree Terminology

A tree consists of nodes and edges. Edges have an implicit downward direction. The root is the node with no incoming edge. The children of a node are the nodes it has edges down to; the parent of a node is the node from which there is an edge to it. A leaf has no children. Siblings are nodes with the same parent. An ancestor of a node is any node encountered along the path from the root to that node; a proper ancestor excludes the node itself. A node C is a descendant of B if B is an ancestor of C; a proper descendant excludes the node itself. A subtree is a node together with all of its descendants. The level/depth of a node is the number of edges on the path from the root to that node. The height of a tree is the maximum level of any node in the tree (the root itself does not add to the count, so a tree with only a root has height 0).

Why the worst-case time complexity of find, insert, and delete are linear for both a binary tree and a binary search tree.

For a binary search tree, the worst case is a degenerate tree that is just a chain (e.g. elements inserted in sorted order, as described above); a find must then walk through all n nodes, and since insert and delete each begin with a find, all three take Θ(n) time in the worst case. A plain binary tree has no ordering to exploit, so in the worst case a find (and therefore a delete) must examine every node as well.

Using asymptotic notation to create a placeholder for an anonymous function, as was done on slide #49.

Abuses of notation such as T(n) = O(f(n)) permit certain conveniences, such as writing formulas that contain an anonymous function that need not be named. EXAMPLE: f(n) = 2n^2 + Θ(n) is Θ(n^2); here "Θ(n)" refers to some function satisfying that order of growth.

Binary Tree Traversal

Common patterns for traversing all nodes of a tree:
- Preorder traversal: PLR - parent, left, right
- Inorder traversal: LPR - left, parent, right
- Postorder traversal: LRP - left, right, parent
All tree traversals take Θ(n) time: doubling n means doubling the number of nodes to traverse.
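A minimal sketch of the three traversals in C++ (the Node struct and example tree are illustrative assumptions):

```cpp
#include <iostream>

struct Node {
    char label;
    Node* left;
    Node* right;
};

void preorder(const Node* n)  { if (!n) return; std::cout << n->label; preorder(n->left);  preorder(n->right); }
void inorder(const Node* n)   { if (!n) return; inorder(n->left);  std::cout << n->label; inorder(n->right); }
void postorder(const Node* n) { if (!n) return; postorder(n->left); postorder(n->right); std::cout << n->label; }

int main() {
    // A is the parent, B its left child, C its right child.
    Node b{'B', nullptr, nullptr}, c{'C', nullptr, nullptr}, a{'A', &b, &c};
    preorder(&a);  std::cout << '\n';  // ABC (parent, left, right)
    inorder(&a);   std::cout << '\n';  // BAC (left, parent, right)
    postorder(&a); std::cout << '\n';  // BCA (left, right, parent)
}
```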

How linear search works.

In linear search, we iterate through a list (starting at the beginning) until we find the target element or reach the end of the list.
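A minimal sketch, assuming the list is a std::vector<int>:

```cpp
#include <vector>

// Returns true if target appears in v; checks elements one by one from the front.
bool linearSearch(const std::vector<int>& v, int target) {
    for (int x : v) {
        if (x == target) return true;
    }
    return false;  // reached the end without finding target
}
```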

Big-O Formal Definition. Proving that a big-O holds. Big-O is an upper-bound.

Big-O indicates how fast a function can grow as n increases: it classifies the growth rate of a mathematical function f(n) by placing an upper bound on that growth rate. Formally, f(n) is O(g(n)) if there exist constants c > 0 and n0 such that f(n) <= c * g(n) for all n >= n0; to prove that a big-O bound holds, exhibit such a c and n0. Big-O focuses on the dominant part of the function, dismissing lower-order terms and leading coefficients, and can be thought of as an "upper bound".
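A worked example of proving a big-O bound straight from the definition, using the T(n) = 2n^2 + 6 example that appears later in this guide:

```latex
\textbf{Claim.}\quad T(n) = 2n^2 + 6 \text{ is } O(n^2).\\
\textbf{Proof.}\quad \text{Choose } c = 3 \text{ and } n_0 = 3.
\text{ For all } n \ge n_0 \text{ we have } 6 \le n^2, \text{ so }
2n^2 + 6 \;\le\; 2n^2 + n^2 \;=\; 3n^2 \;=\; c \cdot n^2. \qquad\square
```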

Representing a prefix code using a binary tree. Be able to recognize whether a prefix code is optimal or not in simple cases, e.g. you should be able to recognize that the prefix code on slide #14 is not optimal (slide deck #6). Although optimality is defined via the ABL, you will not be asked about the ABL or to compute it. Will not be asked about the Shannon-Fano approach.

Representing prefix codes using binary trees:
* a binary tree with |S| leaves, where S is the alphabet
* label each leaf with a distinct letter in S
* go to the LEFT child on a 0
* go to the RIGHT child on a 1
The GOAL: find a binary tree T and a labeling of the leaves of T that minimizes the average number of bits per letter.
Full binary tree: each node that is not a leaf has two children. The binary tree corresponding to an optimal prefix code must be full.

Trees: Computing the height and balance factor of a node.

Self-balancing binary search trees can usually achieve logarithmic time, i.e. Θ(lg n). Balance factor: each node can be given a balance factor, defined as height(leftSubtree) - height(rightSubtree). A subtree is left-heavy if its root's balance factor is positive and right-heavy if it is negative. The height of an empty/nonexistent subtree is -1. height(node) = 1 + max(height(leftSubtree), height(rightSubtree)).
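A minimal sketch of these two definitions in C++ (the Node struct is an assumption; a real AVL tree would store heights rather than recompute them):

```cpp
#include <algorithm>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Height of an empty/nonexistent subtree is -1; a leaf has height 0.
int height(const Node* n) {
    if (n == nullptr) return -1;
    return 1 + std::max(height(n->left), height(n->right));
}

// Balance factor = height(left subtree) - height(right subtree).
// Positive -> left-heavy, negative -> right-heavy.
int balanceFactor(const Node* n) {
    return height(n->left) - height(n->right);
}
```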

Auxiliary space*

Space used by the algorithm, as a function of n (the input size), not counting the space occupied by the input itself.

Data compression

There are various definitions: 1) encoding information using fewer bits than the original representation; 2) removing redundancy from data; 3) reducing the average number of bits per character. Examples of usage: compressing a file to reduce its storage requirements; speeding up a network transfer by shrinking the item being transferred.

Properties

Symmetry: f(n) is Θ(g(n)) if and only if g(n) is Θ(f(n)). Transpose symmetry: f(n) is O(g(n)) if and only if g(n) is Ω(f(n)).

Definition of big-Θ. What kinds of bounds it is. Proving that it holds.

T(n) is Θ(f(n)) if both of the following are true: 1. T(n) is O(f(n)); 2. T(n) is Ω(f(n)). This means that T is asymptotically tight-bounded by f. Example: T(n) = 2n^2 + 6. T(n) is Θ(n^2). T(n) is not Θ(n^3) because it is not Ω(n^3). T(n) is not Θ(n) because it is not O(n).

How binary search works. When it can be used.

The algorithm for binary search: split the ordered list in half and check whether the middle item is equal to the target. If it is, return true. If not: if target < middle, repeat the process on the left half of the list; if target > middle, repeat the process on the right half. Repeat the above steps until you either find the target or there are no more elements to look at. Binary search can only be used on ordered/sorted lists (it does not matter whether they are sorted from smallest to largest or largest to smallest).
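A minimal iterative sketch, assuming the list is a std::vector<int> sorted in ascending order:

```cpp
#include <vector>

// Assumes v is sorted in ascending order.
bool binarySearch(const std::vector<int>& v, int target) {
    int lo = 0;
    int hi = static_cast<int>(v.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;       // middle of the remaining range
        if (v[mid] == target) return true;
        if (target < v[mid]) hi = mid - 1;  // keep searching the left half
        else                 lo = mid + 1;  // keep searching the right half
    }
    return false;  // range is empty: target is not in the list
}
```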

Change of Base Formula

To convert from base b to base a (and vice versa): log_b(n) = log_a(n) / log_a(b), so logarithms of different bases differ only by a constant factor. EX: log_5(n) is Θ(log_7(n)) and log_7(n) is Θ(log_5(n)).

Fixed-Length Encoding - How it works - Why it is wasteful and not ideal

We store symbols using a fixed number of bits each. For example, all instances of A would be represented as 00000, all instances of B as 00001, C as 00010, and so on. To create sequences, you just concatenate the codewords: AB --> 0000000001. However, this approach is wasteful because it fails to take into account that some symbols are more common than others: it does not shrink the representation of the data that is used most. We would want more common symbols to be represented with fewer bits; for example, A is more common than X or Z, so it would make more sense for A to use 4 bits as opposed to 5.

Variable length encoding - Why it is typically preferred over a fixed-length encoding - Why we want a variable-length encoding to be a prefix code

Consider Morse code: it translates each letter into a sequence of dots (short pulses) and dashes (long pulses), and more frequent letters are encoded with shorter strings. This is preferred over a fixed-length encoding because it takes the more commonly used letters into account, minimizing the space taken. However, there IS a drawback: in Morse code, some letters' encodings are prefixes of other letters' encodings. In real life we use short pulses, long pulses, and pauses to distinguish them, but pauses cannot be translated into bits, which is why we want a variable-length encoding to be a prefix code.

Vectors: - Why inserting at the front of a vector takes linear time. - Implementation of std::vector. Capacity. When the capacity changes. Why changing the capacity may invalidate references. - Why inserting at the back of a vector takes linear time in the worst case. - You will not be asked about amortized analysis.

When we insert an item into a vector at full capacity, we can't assume there will be an open slot at the end of the underlying array for the new item. To handle this, we allocate a new underlying array with increased capacity (usually about twice the previous capacity), copy the elements into the new array, and delete the old array. Deleting the old array invalidates references: if you have a reference (or pointer or iterator) to an element of a vector and the vector's capacity changes, the elements move, so your reference no longer refers to the right location; you must re-obtain it after the reallocation. The worst case for inserting at the end (push_back(), emplace_back(), or insert() at the back) is the one that triggers a capacity increase: such an insertion takes Θ(n) time because all existing elements must be copied/moved. If no reallocation is needed, inserting at the back takes constant time (best case). Inserting at the front takes linear time even without a reallocation, because every existing element must shift over by one slot. All that a "dynamic array" does is hide the work of resizing a "static array" from you.
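A small sketch that makes the capacity growth visible (the exact growth factor is implementation-defined; many implementations roughly double):

```cpp
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t lastCap = v.capacity();
    for (int i = 0; i < 32; ++i) {
        v.push_back(i);
        if (v.capacity() != lastCap) {
            // A reallocation happened: a new, larger array was allocated,
            // the elements were copied/moved over, and the old array was freed.
            // Any pointers/references/iterators into the old array are now invalid.
            std::cout << "size " << v.size() << " -> capacity " << v.capacity() << '\n';
            lastCap = v.capacity();
        }
    }
}
```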

Prefix codes slide deck: data compression, slide 6 clarification on slide 8, 9, 10

A prefix code for an alphabet S maps each symbol to a bit string such that no symbol's representation is a prefix of any other symbol's representation. The recipient reconstructs the message by:
- scanning the bit sequence left to right;
- after encountering enough bits to match the encoding of a letter, outputting that letter;
- deleting that letter's bits and continuing to scan.
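A minimal decoding sketch, assuming the prefix code is stored as the binary tree described above (the Node layout and example code are illustrative assumptions):

```cpp
#include <iostream>
#include <string>

// Hypothetical tree node: internal nodes have two children, leaves hold a letter.
struct Node {
    char letter;   // meaningful only at a leaf
    Node* left;    // followed on a 0 bit
    Node* right;   // followed on a 1 bit
};

// Scan the bit string left to right; every time a leaf is reached,
// output its letter and restart from the root.
std::string decode(const Node* root, const std::string& bits) {
    std::string out;
    const Node* curr = root;
    for (char b : bits) {
        curr = (b == '0') ? curr->left : curr->right;
        if (curr->left == nullptr && curr->right == nullptr) {  // leaf reached
            out += curr->letter;
            curr = root;
        }
    }
    return out;
}

int main() {
    // Prefix code: a -> 0, b -> 10, c -> 11
    Node a{'a', nullptr, nullptr}, b{'b', nullptr, nullptr}, c{'c', nullptr, nullptr};
    Node bc{'\0', &b, &c};
    Node root{'\0', &a, &bc};
    std::cout << decode(&root, "010110") << '\n';  // prints "abca"
}
```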

Common categories of functions: constant, logarithmic, or polynomial time.

Constant time, Θ(1): the running time does not grow with n. Logarithmic time, Θ(lg n): doubling n adds only a constant amount of extra work (e.g. binary search). Polynomial time: the running time grows as a power of n, e.g. linear Θ(n) or quadratic Θ(n^2).

Splay Tree: - Find/insert/delete. Splaying. Understand the different kinds of rebalancing operations that are done / when to do them. - You will not be asked about the amortized worst-case time complexity. - Understand why a splay tree might be preferable over an AVL tree in certain situations

A splay tree is self-balancing but less rigidly so; there are no balance factors. When a node is accessed (whether by find, insert, or delete), the node is pushed/splayed to the top at some point, using a sequence of rebalancing operations.

Rebalancing operations: each rebalancing operation involves three nodes: the node-to-splay and the next two nodes on the path to the root. There are two kinds of operations: 1) zig-zag (an AVL tree-style double rotation) and 2) zig-zig (not similar to anything an AVL tree does). If the node-to-splay is a child of the root, do a single rotation/pivot instead.

When a splay operation occurs (push the node to the root when):
* Inserting the node, i.e. insert the node following normal BST rules, then push it to the top.
* Finding the node. Unlike previous data structures, a find operation modifies the structure.
* Deleting the node.

Deletion: 1. Push the node-to-delete to the top (use a find operation to locate it). 2. Delete it (it is now the root). 3. Find the largest element in the left subtree and push it to the root of the left subtree. 4. Make it the root of the entire tree.

Reasons to use a splay tree: possibly preferred in situations in which data that was recently accessed is more likely to be accessed again.

Worst-case time complexity:
* A single operation (find/insert/delete): Θ(n) in the worst case.
* m operations (any combination of find/insert/delete): Θ(m lg n) time in the worst case --> a single operation takes amortized Θ(lg n) time in the worst case. A splay operation on a deep node brings it and its surrounding nodes up, so the worst-case scenario probably can't happen consistently, unlike in a normal BST.

Compared to an AVL tree:
* The AVL tree is preferable for consistency.
* The splay tree is easier to program.
* The splay tree uses less memory; no height info is stored.

How a hash table and the hash function works.

* Supports fewer operations than a self-balancing BST but is faster; the sorted order of elements isn't maintained.
* The underlying implementation is a list of m slots/buckets.
* As with BSTs, the location of an element is influenced by its key.
* The hash function maps a key to some number in the range [0, m - 1].
* The range of possible keys is usually much larger than m (the number of slots/buckets).
* We want to distribute keys as evenly as possible.
* Assume keys are always integers for now --> a typical hash function is hash(x) = x % m.
You can find an element using the hash function. For example, if you were looking for 97 in a 10-bucket hash table, you would compute 97 % 10 = 7, so 97 should be found in bucket 7.
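A tiny sketch of that hash function and the buckets it picks (it also previews the collision that the next card resolves):

```cpp
#include <iostream>

// Typical hash function for integer keys: bucket index = key % m.
int hashIndex(int key, int m) { return key % m; }

int main() {
    int m = 10;  // a table with 10 buckets
    std::cout << hashIndex(97, m) << '\n';  // 7 -> 97 belongs in bucket 7
    std::cout << hashIndex(13, m) << '\n';  // 3
    std::cout << hashIndex(53, m) << '\n';  // 3 -> collides with 13
}
```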

Sorting + Selection sort - Understand how the algorithm works - Understand the worst-case time complexity + Insertion sort - Understand how the algorithm works - Understand the worst-case time complexity - Understand why this algorithm is better than selection sort on an already sorted list + Mergesort - Understand how the algorithm works + Quicksort - Understand how the algorithm works with the first choice pivot rule - Understand the worst-case time complexity with the first choice pivot rule

*Selection sort*
- View the list as two partitions, one sorted and one unsorted.
- Until the unsorted partition is empty, take the smallest element from the unsorted part and swap it with the element just past the end of the sorted part.
- Worst-case time complexity is Θ(n^2).
- Earlier inner loops do more of the work; the best case is also Θ(n^2) because you still have to compare every element to all of the remaining elements.

*Insertion sort*
- View the list as two partitions, sorted and unsorted. Take the first element of the unsorted partition and slide it to its correct spot relative to the sorted partition.
- Worst-case time complexity is Θ(n^2); later inner loops do more work.
- The best case is when the list is already sorted, resulting in linear Θ(n) time.
- Insertion sort is better than selection sort on an already sorted list because insertion sort only does one comparison per element and never shifts anything (Θ(n)), whereas selection sort still scans the entire unsorted partition on every pass (Θ(n^2)). (See the sketch below.)

*Mergesort*
- Divide-and-conquer algorithm.
- Steps: 1. Divide the unsorted list into two equal halves. 2. Conquer: recursively mergesort each half (the base case can be viewed as either 1 or 2 elements remaining, or 1 element remaining; the choice doesn't change the runtime). 3. Combine the two sorted halves into the final sorted list.
- Worst-case time complexity is Θ(n lg n).

*Quicksort*
- Also a divide-and-conquer algorithm.
- Worst-case time complexity is Θ(n^2); average-case time complexity is Θ(n lg n).
- Steps: 1. According to some rule, choose a pivot element v in the list. Partition the list into two parts, such that the first part has elements less than v and the second part has elements greater than v; v is excluded from both parts, and after this partitioning we know v is in its final spot. 2. Conquer: recursively quicksort each part (base case: 0 or 1 elements remain). 3. Combine the two sorted parts into the final sorted list.
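A minimal insertion sort sketch in C++, showing why an already sorted input takes only Θ(n):

```cpp
#include <vector>

// Insertion sort: grow a sorted prefix; slide each new element left into place.
void insertionSort(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        int key = v[i];
        std::size_t j = i;
        // Shift larger elements of the sorted prefix one slot to the right.
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
        // On an already sorted list the while loop never runs,
        // so the whole sort does Θ(n) work.
    }
}
```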

Huffman's Algorithm - How to perform it

1. Find the two least frequent letters y*, z* in the alphabet S.
2. Replace y* and z* with a "meta-letter" ω (omega) whose frequency is the sum of theirs, forming a smaller alphabet S' with |S'| = |S| - 1.
3. Recursively find a prefix code for S'. Base case (|S| = 2): encode one letter with 0 and the other with 1.
4. "Open up" ω back into y* and z*.
(See slide deck 6, slide 17 for a better visual example.) Unpacking the alphabet created using Huffman's algorithm gives you the optimal prefix code.

Huffman's algorithm implementation and worst-case time complexity:
* Let k = |S| (the size of the alphabet).
* Each recursive call decreases the size of the alphabet by 1 --> Θ(k) recursive calls.
* In each call, you must identify the two lowest-frequency letters:
  * Naive: Θ(k) time.
  * Priority queue: Θ(lg k) time (a letter's frequency is its key; extract the minimum twice and insert the new meta-letter).
* Total:
  * Naive: Θ(k^2) time.
  * Priority queue: Θ(k lg k) time.
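A minimal sketch of Huffman's algorithm using std::priority_queue as the min-priority queue (the Node type and frequency values are illustrative assumptions; memory is leaked for brevity):

```cpp
#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical node type for the code tree; leaves hold letters.
struct Node {
    double freq;
    char letter;          // meaningful only at a leaf
    Node* left = nullptr;
    Node* right = nullptr;
};

struct HigherFreq {  // makes the priority_queue a min heap on frequency
    bool operator()(const Node* a, const Node* b) const { return a->freq > b->freq; }
};

// Repeatedly merge the two least frequent nodes into a "meta-letter".
Node* buildHuffmanTree(const std::map<char, double>& freqs) {
    std::priority_queue<Node*, std::vector<Node*>, HigherFreq> pq;
    for (auto [letter, f] : freqs) pq.push(new Node{f, letter});
    while (pq.size() > 1) {
        Node* y = pq.top(); pq.pop();   // least frequent
        Node* z = pq.top(); pq.pop();   // second least frequent
        pq.push(new Node{y->freq + z->freq, '\0', y, z});  // meta-letter
    }
    return pq.top();
}

// Walk the tree to read off each leaf's code: 0 = left, 1 = right.
void collectCodes(const Node* n, std::string prefix, std::map<char, std::string>& out) {
    if (n->left == nullptr && n->right == nullptr) { out[n->letter] = prefix; return; }
    collectCodes(n->left, prefix + "0", out);
    collectCodes(n->right, prefix + "1", out);
}

int main() {
    std::map<char, double> freqs{{'a', 0.45}, {'b', 0.30}, {'c', 0.15}, {'d', 0.10}};
    std::map<char, std::string> codes;
    collectCodes(buildHuffmanTree(freqs), "", codes);
    for (auto [letter, code] : codes) std::cout << letter << " -> " << code << '\n';
}
```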

Binary Tree

A binary tree is a tree in which every node has at most two children: a left child, a right child, both, or none. Every node in a binary tree can itself be viewed as the root of a subtree, which supports the idea that there are subtrees within trees. The quickest way to insert into a (plain) binary tree is to insert the node wherever it fits. First, create the new node. If we are inserting on the left, set the new node's left child to the current left child of the node we are inserting under, then set that node's left child to the new node. TLDR: if A's left child is B and we insert C on A's left, we connect C to B and then connect A to C, going from A --(left)--> B to A --(left)--> C --(left)--> B.

Hashing and Hash Tables: Collision resolution: - Separate chaining - Open addressing * Linear probing * Quadratic probing * Double hashing Deletion - Lazy deletion - Computing the load factor - Rehashing - Be able to argue why a given operation may be faster/slower on a hash table vs. an AVL tree vs. an ordered or unordered linked list

A collision occurs when you insert a new item but there is already an item in the bucket it hashes to. For example, if you insert 13 (13 % 10 = 3) and then try to insert 53 (53 % 10 = 3), you find that 13 is already in bucket #3. This is a collision.

Collision resolution -> SEPARATE CHAINING: each bucket in the hash table holds a linked list. You use the hash function to determine which list to check; if there's a collision, you just prepend to that bucket's linked list. (deck 7, slide 7)

Separate chaining analysis: the load factor λ (the average list length) is the ratio of the number of elements in the hash table to the table size: λ = n / m. A search involves finding the list to traverse (constant time) and then traversing that list; on average, an unsuccessful search checks λ nodes. For separate chaining, the load factor matters more than the table size. m (the table size) SHOULD ALWAYS BE KEPT PRIME, as it helps with the distribution of keys.

Collision resolution -> open addressing:
* No linked lists.
* If a collision occurs, try to place the key at another bucket as determined by the open addressing scheme; repeat until successful.
* There are three kinds of open addressing schemes: 1. linear probing, 2. quadratic probing, 3. double hashing.

Linear probing:
* If you can't place the key at its bucket, try the next bucket (with wraparound). If that doesn't work, try the next bucket, and so on.
* Find: if the target isn't in the bucket it was supposed to be in, keep checking the following buckets until the target is found or an empty bucket is reached.
* If you reach an empty bucket (assuming no deletions), you know the target cannot be in the table.
* Deletion (the BAD way): simply emptying the item's bucket can cause a false NOT FOUND. For example, put 23 (23 % 7 = 2) in bucket 2 and then 37 (37 % 7 = 2) lands in bucket 3. If you delete 23 by just emptying bucket 2, a later find(37) starts at bucket 2, sees an empty slot, and wrongly reports that 37 is not in the table.
* Deletion (the GOOD way): lazy deletion. When deleting an item, just mark its slot as deleted (you don't actually have to remove it, which saves time); a lazily deleted slot can be reused by a later insertion. Because the slot is marked as "once held an element" rather than truly empty, a find keeps probing past it, so a target that was pushed into a neighboring bucket can still be located.
* Weakness of linear probing: primary clustering. Blocks of nearby occupied buckets tend to form, and a new key may take several collision-resolution attempts (which adds to the cluster). Quadratic probing solves this problem.

Quadratic probing:
* Eliminates the primary clustering issue.
* If you can't place the key at bucket u, try the bucket 1^2 = 1 after u. If that doesn't work, try the bucket 2^2 = 4 after u, then 3^2 = 9 after u, and so on (wrapping around when appropriate). Alternative way of thinking: check the next bucket, then 3 buckets later, then 5 buckets later, then 7 buckets later, etc.
* Analysis: for linear probing, a high λ degrades performance; for quadratic probing, λ > 1/2 can make it impossible to find an empty bucket, and if the table size is not prime it can even happen when λ <= 1/2. It can be proved that if the table is at least half empty and the table size is prime, quadratic probing is guaranteed to find an empty bucket.
* Weakness: quadratic probing is vulnerable to secondary clustering, where elements hashed to the same location probe the same sequence of buckets.

Collision resolution - open addressing: double hashing
* Eliminates the secondary clustering issue.
* Requires a second hash function h2(x).
* If you can't place key k at its bucket, try h2(k) spots later. If that doesn't work, try another h2(k) spots later, and so on (wrapping around when appropriate).
* Slower than quadratic probing in practice because of the second hash function.

Rehashing
* If λ is too high (e.g. λ > 1/2), we can rehash.
* Steps (let m be the old table size and m' the new table size):
1. Create a new table of size m' = nextPrime(2m), where nextPrime(x) returns the lowest prime number above x.
2. Insert each element from the old table into the new table, using the new hash function hash(k) = k % m'.
* Worst-case time complexity of a find/insert/delete that triggers a rehash: Θ(n).
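A minimal sketch of linear probing with lazy deletion (the class and names are illustrative assumptions; it assumes non-negative integer keys and a table that never completely fills up):

```cpp
#include <iostream>
#include <vector>

// Minimal linear-probing hash table for ints, with lazy deletion.
class LinearProbingTable {
    enum class State { Empty, Occupied, Deleted };
    struct Slot { int key = 0; State state = State::Empty; };
    std::vector<Slot> slots;

public:
    explicit LinearProbingTable(int m) : slots(m) {}

    void insert(int key) {
        const int m = static_cast<int>(slots.size());
        int i = key % m;
        // Probe forward (with wraparound) until a reusable slot is found;
        // a lazily Deleted slot may be reused here.
        while (slots[i].state == State::Occupied) i = (i + 1) % m;
        slots[i] = {key, State::Occupied};
    }

    bool contains(int key) const {
        const int m = static_cast<int>(slots.size());
        int i = key % m;
        // Stop only at a truly Empty slot; keep probing past Deleted markers,
        // since the target may have been placed beyond a since-deleted key.
        while (slots[i].state != State::Empty) {
            if (slots[i].state == State::Occupied && slots[i].key == key) return true;
            i = (i + 1) % m;
        }
        return false;
    }

    void remove(int key) {
        const int m = static_cast<int>(slots.size());
        int i = key % m;
        while (slots[i].state != State::Empty) {
            if (slots[i].state == State::Occupied && slots[i].key == key) {
                slots[i].state = State::Deleted;  // lazy delete: keep the slot as a marker
                return;
            }
            i = (i + 1) % m;
        }
    }
};

int main() {
    LinearProbingTable t(7);
    t.insert(23);            // 23 % 7 == 2 -> bucket 2
    t.insert(37);            // 37 % 7 == 2 -> collides, probes to bucket 3
    t.remove(23);
    std::cout << std::boolalpha << t.contains(37) << '\n';  // true, thanks to the Deleted marker
}
```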

Logarithmic Functions

A common example of a logarithmic running time is binary search. Its worst case is when the target is not found, meaning we go through every iteration. The number of iterations scales logarithmically as n increases: doubling the input size increases the number of times the list can be cut in half by 1, which increases the number of iterations by 1. The time complexity of binary search is Θ(lg n) (i.e. Θ(log2 n)).

P4 Concepts + Move semantics - You should understand what a move constructor or move assignment operator does vs. a copy constructor or copy assignment operator and why the move version is more efficient than the copy version - Why it is important to leave the object whose contents are "stolen" in a "moved from" state

A copy constructor makes a copy of the underlying data, costing time and space. With the move constructor, nothing is duplicated; ownership of the contents is transferred. It allows an object we'll call x to steal the contents of an object we'll call y: NO COPYING OCCURS, x does not copy the contents of y. It requires that after y is moved into x: x is equivalent to the former value of y, and y is left in a moved-from state, in which the only valid things to do with y are 1. reassign it or 2. destruct it. It is important to leave y in a moved-from state so that y no longer refers to the stolen data; otherwise both objects would act as if they owned it.
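A minimal sketch contrasting the two constructors (the Buffer class is a made-up example, not the P4 code):

```cpp
#include <utility>
#include <vector>

// Toy type to contrast copying with moving. Names are illustrative only.
class Buffer {
    std::vector<int> data_;
public:
    explicit Buffer(std::size_t n) : data_(n) {}

    // Copy constructor: allocates new storage and copies every element. Θ(n).
    Buffer(const Buffer& other) : data_(other.data_) {}

    // Move constructor: steals other's internal storage (just pointer bookkeeping,
    // no per-element copying) and leaves other in a valid "moved-from" state
    // that should only be reassigned or destructed afterwards.
    Buffer(Buffer&& other) noexcept : data_(std::move(other.data_)) {}
};

int main() {
    Buffer a(1000000);
    Buffer b(a);             // copy: duplicates one million ints
    Buffer c(std::move(a));  // move: a's storage is transferred to c; a is now moved-from
}
```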

Deques: How std::deque is implemented, and how this implementation makes std::deque support a faster insertion at the front than std::vector does. Know why accessing an arbitrary element in an std::deque is slower than in an std::vector.

A deque is a double-ended queue; both ends are open. Like a vector, a deque is a dynamic (easily resized) array. Example: std::deque<int> d; d.push_front(5); d.push_back(18); d.push_front(13); d.push_back(12); If we iterate through the deque and print it, for (int a : d) std::cout << a << ", ";, we get 13, 5, 18, 12. Unlike a vector, a deque supports constant-time insertion at the front (push_front()) as well as at the back (push_back()). std::deque is typically implemented as a set of fixed-size chunks plus a small index structure that tracks them, so pushing at the front only adds to (or allocates) a chunk at the front rather than shifting every element. Accessing an arbitrary element requires going through that index first, i.e. more than one indirection, which is why it is slower than a vector's single array lookup. Inserting/deleting in the middle might require moving some elements, invalidating references.

Linked lists: - How the various methods work and how to implement them. You could be shown a linked list implementation that could be the one from the lectures or one that is similar, and you could be asked to write code involving such an implementation. You should be able to recognize flaws in an implementation, e.g. why a given implementation of the remove() method does not work. - You should be able to determine the worst-case time complexity and best-case time complexity of a given linked list operation.

A linked list maintains the ordering of elements without guaranteed contiguous placement in memory; you only need to know where the head is, and from there you can follow links to the rest of the items in the list. A linked list is a chain of nodes, where every node is made up of 1) data and 2) a pointer (or reference) to the next node. The last node's next pointer is nullptr.
- pushFront() prepends to the front of the list and takes constant time regardless of the length of the list. When prepending, we are inserting at the head: we create a new node (e.g. a new shared_ptr to a Node<T>) to store the data, point the new node's next at the current head (the previous first node, if any), and then make head point at the new node. (Worst-case time complexity: constant. Best-case time complexity: constant.)
- isEmpty() checks whether the list is empty; the list is empty when head is null. (Worst-case: constant. Best-case: constant.)
- length(): to get the length of the list we have to traverse the entire list and count the nodes we encounter. Keep a curr pointer and a counter: curr starts at the first node; at each node we increment the count and move to the next node. Once we reach null we know we are at the end, so we break out of the loop and return the counter. (Worst-case: linear. Best-case: linear.)
- contains() finds a target within the linked list. Using a curr pointer, start at the head and iterate until hitting null, comparing curr's data to the target and returning true if it matches. If we iterate through the entire list without finding the target, return false. (Worst-case: linear. Best-case: constant.)
- remove(): traverse the list element by element until finding the node with the target data, then remove that node. An issue is that removing the node breaks its next link, making us lose access to the remainder of the list. To handle this, keep track of the node that follows the deleted node and set the previous node's next link to it. So if the list was 1->2->3->4 and the target is 3, afterwards the list is 1->2->4. When the item being deleted is the head, simply update head. (Worst-case: linear. Best-case: constant.)
- pushBack() has two scenarios: 1) the list is nonempty: keep iterating until a prev variable references the last node, then attach the new node after it; 2) the list is empty: set the new node as head. (Worst-case: linear. Best-case: constant, when the list is empty.)
- operator<< prints the elements in the list. (Worst-case: linear. Best-case: linear.) The snippet below is the index-based version written for the Array class; a linked-list version traverses node by node instead (see the sketch after this list): std::ostream& operator<<(std::ostream& os, const Array<T, Length>& array) { for (int i = 0; i < (int)Length; i++) { os << array.at(i) << " "; } os << std::endl; return os; }
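A minimal sketch of pushFront(), remove(), and a linked-list operator<< in the shared_ptr style described above (class and member names are assumptions, not the lecture implementation):

```cpp
#include <iostream>
#include <memory>

// Hypothetical singly linked list node; the lecture's layout may differ.
template <typename T>
struct Node {
    T data;
    std::shared_ptr<Node<T>> next;
};

template <typename T>
class LinkedList {
    std::shared_ptr<Node<T>> head;

public:
    // Constant time: the new node simply becomes the new head.
    void pushFront(const T& value) {
        auto node = std::make_shared<Node<T>>();
        node->data = value;
        node->next = head;   // link the new node to the old first node (if any)
        head = node;
    }

    // Linear in the worst case (target at the end or absent),
    // constant in the best case (target is the head).
    bool remove(const T& target) {
        std::shared_ptr<Node<T>> prev = nullptr;
        auto curr = head;
        while (curr) {
            if (curr->data == target) {
                if (prev) prev->next = curr->next;  // bypass the removed node
                else      head = curr->next;        // removing the head
                return true;
            }
            prev = curr;
            curr = curr->next;
        }
        return false;  // target not in the list
    }

    friend std::ostream& operator<<(std::ostream& os, const LinkedList& list) {
        for (auto curr = list.head; curr; curr = curr->next) os << curr->data << ' ';
        return os;
    }
};

int main() {
    LinkedList<int> list;
    for (int x : {4, 3, 2, 1}) list.pushFront(x);  // list is now 1 2 3 4
    list.remove(3);
    std::cout << list << '\n';  // prints: 1 2 4
}
```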

Queues

A queue is a linear data structure in which elements are inserted and removed at opposite ends: the first item in is the first item out. Think of it as a line: you join at the back and exit at the front.

Stacks

A stack is a linear data structure where elements are inserted and removed from the same end. In other words, the first items in are the last items out. Think of it as a stack of dishes: when you place a new dish on the stack, you place it on top, and when you take a dish from the stack you take it from the top. std::stack is a container adapter, meaning its underlying container is by default another STL container (e.g. std::vector, std::deque, std::list). The underlying container affects the runtime analysis. A container adapter effectively reduces the functionality of a container in order to present a simpler interface.
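A small usage sketch of std::stack as a container adapter, here swapping the default std::deque for std::vector:

```cpp
#include <iostream>
#include <stack>
#include <vector>

int main() {
    // std::stack is a container adapter; the underlying container defaults to
    // std::deque but can be chosen explicitly, e.g. std::vector here.
    std::stack<int, std::vector<int>> s;
    s.push(1);
    s.push(2);
    s.push(3);
    std::cout << s.top() << '\n';  // 3: last in, first out
    s.pop();
    std::cout << s.top() << '\n';  // 2
}
```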

AVL Tree: - Understand how an AVL tree has better worst-case time complexity for find/insert/delete than a binary search tree does. - How find works. - How insertion works. When/how to rebalance. Be able to identify the four different rebalancing scenarios and the appropriate rebalancing operations (two single rotations and two double rotations). - How deletion works. When/how to rebalance. May have to rebalance multiple times. Rebalance operation might involve nodes that were not ancestors of the deleted node.

An AVL tree is a binary search tree in which each node has a balance factor of -1, 0, or 1. It can be proved that at any time the height of an AVL tree is at most around 1.44 lg n, where n is the number of nodes. Thus, a search takes Θ(lg n) in the worst case; we avoid the linear worst-case scenarios of a plain BST.

Insert(): start with the same procedure as a BST insert. The inserted node is a leaf, i.e. it has height 0 and balance factor 0. Update the height and balance factor of the parent: if the new node is a left child, increment the parent's balance factor; otherwise decrement it. Then update the parent's parent, etc. Recursively apply this until you either update the root, update a node whose balance factor becomes zero, or update a node that becomes unbalanced, at which point rebalancing occurs. If you adjust a parent's balance factor to zero, don't adjust its ancestors.

Rebalancing: you never need to do more than one rebalancing operation per insertion. The four cases that require a rebalance at a node k1:
1. An insertion into the left subtree of the left child of k1.
2. An insertion into the right subtree of the left child of k1.
3. An insertion into the left subtree of the right child of k1.
4. An insertion into the right subtree of the right child of k1.
Cases 1 and 4 are fixed with a single rotation; cases 2 and 3 are fixed with a double rotation.

Deletion: start with the same procedure as BST deletion. Going from the replacement node (if any) up to the root of the tree, rebalance any unbalanced node; this may require multiple rebalances, and a rebalance may involve nodes that were not ancestors of the deleted node.
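A minimal sketch of the single rotation used for case 1 (left-left); the Node layout is an assumption, and a real implementation would also update stored heights/balance factors here:

```cpp
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Single right rotation: an insertion into the left subtree of k2's left child
// made k2 unbalanced, so k2's left child k1 becomes the new subtree root.
Node* rotateRight(Node* k2) {
    Node* k1 = k2->left;
    k2->left = k1->right;   // k1's right subtree moves under k2
    k1->right = k2;
    return k1;              // caller re-attaches k1 where k2 used to be
}

// The mirror image (a single left rotation) handles case 4 (right-right);
// the two double rotations are one single rotation on the child followed by
// one single rotation on the unbalanced node.
```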

Definition of big-Ω (omega). What kinds of bounds big-Ω has. Proving that it holds.

Big-Ω is a lower bound: f(n) is Ω(g(n)) if there exist constants c > 0 and n0 such that f(n) >= c * g(n) for all n >= n0. To prove that a big-Ω bound holds, exhibit such a c and n0 (equivalently, by transpose symmetry, show that g(n) is O(f(n))).

Graphs - Will not be asked terminology-based questions about graphs, e.g. what is the degree of this vertex? - Graph representation: adjacency matrix vs adjacency list. Trade-offs in regards to space, edge lookup, and iterating over neighbors. - For each of breadth-first search (BFS) and depth-first search (DFS) * Understand how the traversal works * Be able to tell if a given ordering of the vertices is a BFS or DFS ordering. * Be able to run the algorithm in order to generate a BFS or DFS ordering. - Will not be asked about layer-based BFS. - Will not be asked about recursive DFS - Single-source shortest paths problem * Be able to perform Dijkstra's Algorithm *Network Flow problem* - Flow network/capacities - Constraints on a flow - Value of a flow - Ford-Fulkerson algorithm (Residual graph, how the algorithm works)

Defining a graph:
* Consists of vertices/nodes and edges.
* Each edge joins two nodes.
* Self-loop: an edge from a node to itself.
* Multi-edge: two or more edges joining the same two vertices.
* Unless otherwise stated, we assume no self-loops or multi-edges.

Terminology:
* An edge is incident to a node if the edge touches the node; such a node is incident to the edge.
* Neighbors: two nodes joined by an edge.
* Path: a sequence of nodes such that each consecutive pair is joined by an edge.
* Simple path: distinct (i.e. no repeated) vertices.
* Cycle: all nodes are distinct, except for the first and last, which must be the same.
* Acyclic graph: has no cycles.
* Weighted graph: each edge has a weight.
* Undirected graph: each edge has no direction / is bidirectional; the joined nodes are its ends.
* Directed graph: each edge has a direction; an edge leaves its tail and enters its head.
* Undirected graph - degree: number of incident edges.
* Directed graph - in-degree: number of incoming edges; out-degree: number of outgoing edges.
* An undirected graph is connected if for every pair of nodes u and v there is a path from u to v (every node has at least one edge connecting it to the rest of the graph).

Graphs vs. trees:
* A tree is an undirected, connected, acyclic graph.
* All algorithms that work for arbitrary graphs work for trees, but tree-specific algorithms may work better on trees than the graph equivalent.

Representing a graph: adjacency matrix
* An n-by-n matrix A (where n is the number of vertices) such that A[u, v] = 1 ("true") if there is an edge between u and v.
* Undirected graph --> symmetric matrix (i.e. A[u, v] = A[v, u]).
* Analysis: good for dense graphs (Θ(n^2) edges); Θ(n^2) space --> space-inefficient for sparse graphs (far fewer than Θ(n^2) edges); Θ(1) time edge lookup; Θ(n) time to iterate over a node's neighbors.

Adjacency list
* Each node has a neighbor list (for every node, there is a list containing all the nodes that node can reach through its edges).
* Analysis: usually preferred for sparse graphs; Θ(m + n) space, where m is the number of edges and n the number of vertices; Θ(d_v) time edge lookup, where v is an incident node and d_v is its degree; Θ(d_v) time to iterate over neighbors.

Graph traversals
* From a given start node, iterate over all nodes connected to it.
* The specific traversal affects the order in which the nodes are traversed/processed (many common terms). Processing must be done at most once per node.
* Two main traversals: breadth-first search (BFS) and depth-first search (DFS).

Breadth-First Search (BFS)
* Process the start node, then process its neighbors, then those nodes' neighbors, etc.
* Layer-by-layer traversal: explores nodes closer to the start node first.
* There may be multiple legal BFS traversals/orderings.
* Implementation: BFS only requires one queue. Dequeue a node, then process it; when processing a node, enqueue its undiscovered neighbors.
* Space complexity (auxiliary space): Θ(n), where n is the number of vertices.
* Time complexity: Θ(n + m) with an adjacency list; with an adjacency matrix it becomes Θ(n^2), since finding each node's neighbors takes Θ(n).

Side note: handshaking lemma (aka degree sum formula). Two statements:
* Every undirected graph has an even number of vertices with odd degree.
* The sum of all d_v equals 2m, where m is the number of edges and the sum of d_v is the sum of all vertices' degrees.
* Intuitively: whenever we add a new edge to a graph, the sum of the degrees goes up by 2.

Depth-First Search (DFS)
* Rather than layer by layer, you go down a path until you reach a node that has already been discovered.
* There may be multiple legal DFS traversals/orderings.
* Implementation: only requires one stack (which could be the activation stack, used indirectly by recursive DFS). processed[v] is true iff v has been processed (i.e. the node has been popped from the stack at least once); mark it true once it has actually been processed. A node might end up on the stack multiple times; this is something we want to happen.
* DFS analysis: space complexity (auxiliary space) is Θ(n), where n is the number of vertices. Time complexity: the first loop does n iterations; the outer while loop does Θ(m) iterations during the entire duration of the algorithm; the inner for loop does at most the sum of d_v = 2m iterations during the entire duration of the algorithm, NOT per outer-loop iteration, where m is the number of edges. Total: Θ(n + 3m) = Θ(n + m) --> linear time.

Single-Source Shortest Paths (SSSP)
* Input: a graph and a vertex s. The graph can be weighted or unweighted (an unweighted graph is a weighted graph in which each edge has the same arbitrary cost, e.g. 1).
* Output: the shortest path from s to each of the other nodes, where shortest means smallest total edge cost.

Dijkstra's Algorithm
* Finds the shortest path from the start node s to all other nodes in a weighted graph.
* Maintains a set of vertices X; if v is in X, then we have determined the shortest path from s to v.
* At the start, X = {s}.
* At each step, use a greedy rule to choose which vertex to add to X next (including which edge to use to get to that vertex). The chosen vertex must be a neighbor of a vertex in X.
* Repeat until either t is in X (if we only want the shortest path from s to a specific node t) or all vertices are in X.
* We just need the correct greedy rule: which vertex do we choose next?
* Worst-case time complexity: the outer while loop does n iterations; choosing the next vertex takes Θ(n) time per outer iteration; iterating over neighbors takes d_v time per outer iteration --> Θ(m) iterations during the entire duration of the algorithm. Total: Θ(n^2 + m) = Θ(n^2) time --> quadratic time.

*Network Flow problem*
- A directed graph made up of a source node, internal nodes, and a sink node. The source node has no incoming edges and generates traffic/flow; the sink node has no outgoing edges and absorbs traffic/flow; internal nodes may have both incoming and outgoing edges. Edge weights are CAPACITIES. Always assume one source and one sink node. A flow assigns each edge an amount of traffic. The constraints are the capacity condition (an edge's flow cannot exceed its capacity) and the conservation condition (flow into an internal node equals flow out of it). NEED TO FINISH (value of a flow, residual graph, Ford-Fulkerson algorithm).
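A minimal BFS sketch using one queue over an adjacency list, as described above (the vertex numbering and example graph are illustrative):

```cpp
#include <iostream>
#include <queue>
#include <vector>

// BFS from a start node over an adjacency-list graph (vector of neighbor lists).
// Visits nodes layer by layer; Θ(n + m) time and Θ(n) auxiliary space.
std::vector<int> bfsOrder(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> discovered(adj.size(), false);
    std::vector<int> order;
    std::queue<int> q;
    discovered[start] = true;
    q.push(start);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        order.push_back(u);          // process u
        for (int v : adj[u]) {       // enqueue undiscovered neighbors
            if (!discovered[v]) {
                discovered[v] = true;
                q.push(v);
            }
        }
    }
    return order;
}

int main() {
    // Undirected graph with edges 0-1, 0-2, 1-3, 2-3.
    std::vector<std::vector<int>> adj{{1, 2}, {0, 3}, {0, 3}, {1, 2}};
    for (int v : bfsOrder(adj, 0)) std::cout << v << ' ';  // one legal ordering: 0 1 2 3
    std::cout << '\n';
}
```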

Priority Queues - Binary heap implementation. * Definition of binary heap. Heap-order property and structure property. * You will only be asked about min heaps, not max heaps. * Insertion. Percolate up. * Delete. Percolate down. * Will not be asked about underlying array representation. * Understand the worst-case time complexity of operations. - Will not be asked about the extended priority queue API: DecreaseKey(), IncreaseKey(), Remove() will not be on this exam. - Will not be asked about build heap, which we skipped during lecture.

Priority queues decide which item leaves the queue next based on a priority (not just first in, first out like a regular queue). A priority queue supports: Insert and DeleteMin.

Possible implementations:
* Unordered linked list: Insert constant time (at the front); DeleteMin linear time.
* Ordered linked list: Insert linear time; DeleteMin constant time (at the front).
* Binary search tree: Insert linear time; DeleteMin linear time.
* Self-balancing BST: Insert logarithmic time; DeleteMin logarithmic time.
* Binary heap (the preferred implementation for a priority queue): Insert logarithmic time (average: constant time); DeleteMin logarithmic time (peeking at the min: constant time).

BINARY HEAP
* A binary heap is a binary tree with two properties:
1. Structure property: it is a complete binary tree (completely filled, except possibly the last level, which is filled left to right).
2. Heap-order property: each node's key is less than its children's keys.
* Min heap: the root is the smallest element (a max heap reverses the heap-order property).
* Basic operations: insert and delete the root (the min, in a min heap).

Insert:
1. Insert the new element in the spot that maintains the structure property.
2. Move the element up until the heap-order property is restored (percolate up).
Insert analysis: worst-case time complexity Θ(lg n) (the inserted node becomes the root); average-case time complexity constant.

Delete:
1. Remove the root.
2. Move the rightmost leaf to the root, to maintain the structure property.
3. Move this new root down (by swapping, always with the child that has the smaller key) until the heap-order property is restored (percolate down).
Delete analysis: worst-case time complexity Θ(lg n); average-case time complexity Θ(lg n), since we are percolating down a node that was previously at the bottom.

**SKIPPED OVER HEAP CONSTRUCTION BECAUSE IT WON'T BE ON THE EXAM**
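A small usage sketch of a min heap via std::priority_queue (which is a max heap by default, so std::greater flips the ordering):

```cpp
#include <functional>
#include <iostream>
#include <queue>
#include <vector>

int main() {
    // std::priority_queue is a max heap by default; std::greater makes it a min heap.
    std::priority_queue<int, std::vector<int>, std::greater<int>> pq;

    for (int x : {5, 1, 9, 3}) pq.push(x);   // insert: Θ(lg n) worst case per push
    while (!pq.empty()) {
        std::cout << pq.top() << ' ';        // peek at min: constant time
        pq.pop();                            // deleteMin: Θ(lg n) worst case
    }
    std::cout << '\n';                       // prints: 1 3 5 9
}
```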

Identifying worst-case time complexity, best-case time complexity, worst-case space complexity, and best-case space complexity (with each of big-O, big-Ω, and big-Θ) of a given segment of code (or function).

na

