CPSC-350
Benefits of Empirical Analysis
- It's an intuitive, obvious approach
- In this type of analysis you actually implement the algorithms
- You also use the exact same input for every algorithm, giving you a controlled variable
Kruskal's Algorithm (MST)
- Kruskal's algorithm is ideal for sparse, weighted, undirected graphs
- Kruskal's is a greedy algorithm, meaning it always picks the edge with the lowest cost/weight
- Rather than adding one vertex to the tree at a time, it finds the best edge for connecting 2 trees in a spreading forest of growing MST subtrees
1.) Start with V single-vertex trees
2.) Repeatedly combine 2 trees using the shortest edge possible, as long as that edge does not create a cycle
3.) Stop when all the vertices are connected
- Time Complexity is O(|E| log |E|)
- Space Complexity is O(|E| + |V|)
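The steps above can be sketched in code. This is a minimal sketch, assuming edges are given as (weight, u, v) tuples; the union-find (disjoint set) structure is the standard way to detect whether an edge would create a cycle.

```python
def kruskal(num_vertices, edges):
    """edges: list of (weight, u, v) tuples; returns the list of MST edges."""
    parent = list(range(num_vertices))  # each vertex starts as its own tree

    def find(x):
        # Walk up to the root representative of x's tree.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    mst = []
    for weight, u, v in sorted(edges):  # greedily try the cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # only combine two different trees
            parent[root_u] = root_v     # union: no cycle is created
            mst.append((weight, u, v))
    return mst
```

Sorting the edges dominates the cost, which is where the O(|E| log |E|) bound comes from.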
Benefits of Linked Lists
- Linked Lists are dynamically allocated in memory, which means you can grow them without resizing
- Elements can be added or removed without reorganizing the rest of the elements
- Only uses the memory it needs
Benefits of Mathematical Analysis
- The algorithm's runtime can be defined as a function of the input size
- This takes all possible inputs into account
- More cost effective
- Not dependent on external variables
Minimum-Spanning Tree (MST)
- The most efficient way of connecting all the vertices together in the graph (you won't necessarily use all the edges)
- Given a connected graph G, a spanning tree of G is a subgraph that is a tree and connects all the vertices together
- An MST is the spanning tree with the minimum sum of edge weights
Benefits of Trees (BST)
- Very efficient for sorting and searching data (fast lookups)
- Runtime is O(log n) because each comparison skips half of the tree's remaining nodes (assuming the tree is balanced)
Queues
- A Queue is an ADT in which items are inserted at the back of the queue and removed from the front (think of a real line of people at a store)
- Queues are FIFO (First In, First Out)
Graphs
- A graph is a set of vertices (nodes) and a set of edges that connect pairs of distinct vertices. In a simple graph, at most one edge connects any pair of vertices

Undirected Graph:
- An undirected simple graph with V vertices has at most V(V-1)/2 edges

Directed Graph:
- A set of vertices and a collection of directed edges that each connect an ordered pair of vertices (a digraph)
- Application examples: one-way streets, links between websites, maps for navigation, course prerequisites

Weighted Graph:
- A graph with a weight attached to each edge; these can be directed or undirected
How does a single/doubly linked list differ from other data structures / whats its runtime
- A linked list is different because it uses pointers to connect the nodes instead of a sequential block of memory
- Linked lists only use as much memory as they need, unlike an array, which allocates a fixed-size block up front
- Runtime to access and search is O(n)
- Runtime to insert/delete is O(1) at the front and O(n) anywhere else
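A minimal sketch of a singly linked list illustrating the runtimes above; the `Node` and `LinkedList` class names are assumptions for illustration.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None  # pointer to the next node, not a contiguous block


class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, data):
        # O(1): no shifting of other elements, just rewire two pointers.
        node = Node(data)
        node.next = self.head
        self.head = node

    def contains(self, data):
        # O(n): must follow pointers one node at a time.
        curr = self.head
        while curr is not None:
            if curr.data == data:
                return True
            curr = curr.next
        return False
```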
Stacks
- A stack is an ADT in which items are only inserted or removed from the top of the stack (like a deck of cards)
- Stacks are LIFO (Last In, First Out)
Benefits of Arrays
- Arrays are guaranteed to occupy one contiguous block of memory
- Accessing an element is O(1)
Disadvantages of Arrays
- Arrays are a fixed size
- If full, you need to initialize a new array and copy over the data
- Not dynamically allocated
Disadvantages of Mathematical Analysis
- Big-O notation doesn't give us a concrete quantitative answer (e.g., actual seconds of runtime)
- There is slight room for error, but it can give the worst-case scenario
Bubble Sort (Brute Force)
- Bubble Sort works by repeatedly swapping adjacent elements that are in the wrong order
- Compare 2 numbers; if the one on the left is bigger, swap them and move one index to the right
- It iterates through the list until everything is sorted
- Larger numbers "bubble" to the end and are sorted first
- If you want to sort 100 coconuts based on size, use bubble sort: it's easy to implement and only a few lines of code
- Performance is O(n^2): quick to write, but bad with large sets of data
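The description above can be sketched in a few lines; the early-exit `swapped` flag is a common optional optimization.

```python
def bubble_sort(values):
    n = len(values)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):          # the last i elements are already sorted
            if values[j] > values[j + 1]:   # left is bigger -> swap the pair
                values[j], values[j + 1] = values[j + 1], values[j]
                swapped = True
        if not swapped:                     # a full pass with no swaps: done early
            break
    return values
```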
Disadvantages of Trees (BST)
- Complicated to implement, especially removing a node
- Although efficient, from a development view it's really complex
- Trees only have efficient logarithmic performance if the tree is balanced (which is why self-balancing BSTs are useful)
- If the tree is unbalanced, then the tree has linear performance
- Trees only work for elements that have a less-than/greater-than relationship (like integers)
Graph Use Cases
- Computer networks: local area networks, the internet/web
- Circuitry: printed circuit boards, integrated circuits
- Logistics networks
Rules of A Good Hashing Function
- Determinism: for a given input (key), it must always generate the same hash value
- Uniformity: should map the expected inputs as evenly as possible over its output range
- Defined Range: the output of a hashing function should have a fixed size
- Non-Invertible: impossible to reverse engineer the input from the output of h(x) (essentially a one-way hash)
Big-O and Runtime for HashTables
- For Access/Insert/Delete the average case is O(1) and the worst case is O(n)
Benefits of Graphs
- Graphs can display some sort of relationship, whether that's distance (like on a map) or something else
- Allows us to represent not only a collection of items but the connections/relationships between them
Benefits of Hash Tables
- Hash Tables have very fast look up times of O(1) on the average case
Disadvantages of Hash Tables
- Hash tables can waste a lot of space (e.g., a table with 100 buckets that only holds a few entries still allocates all 100 buckets)
- As the hash table fills up, performance can diminish
Rules of a BST
- Inserted elements must be kept in sorted position: values smaller than the root go to the left, while values larger than the root go to the right
- Each tree has a root, which is the initial entry point. The root is the only node without a parent
- Each node has up to two children (a left and/or right child)
- An external node (or leaf node) is a node with no children
- An internal node is a node with at least one child
- A parent is a node with a child
- A node's ancestors include its parent, its parent's parent, and so on all the way up to the root
Insertion Sort (Brute Force)
- Insertion sort works by treating the input as two parts, a sorted and an unsorted part, and then repeatedly inserts the next value from the unsorted part into the correct location in the sorted part (on the left)
- Variable i is the index of the first unsorted element. Initially the element at index 0 is assumed to be sorted, so i starts at 1
- Variable j keeps track of the index of the current element being inserted into the sorted part. If the current element is less than the element to its left, the values are swapped
- Once the current element is inserted in the correct location in the sorted part, i is incremented to the next element in the unsorted part
- If the current element being inserted is smaller than all the elements in the sorted part, then that element will be repeatedly swapped with each sorted element until index 0 is reached
- Once all elements in the unsorted part are inserted into the sorted part, the list is sorted

Side Notes:
- Runtime is O(n^2); best case is O(n)
- Copies are less costly than swaps
- For random data, it runs about 2x as fast as bubble sort and still faster than selection sort
- For sorted or nearly sorted inputs, it runs much faster than the others: O(n)
- If you can't use recursion, use insertion sort
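The i/j walkthrough above can be sketched as:

```python
def insertion_sort(values):
    for i in range(1, len(values)):   # element at index 0 starts as the "sorted" part
        j = i                         # j tracks the element being inserted
        while j > 0 and values[j] < values[j - 1]:
            # Current element is smaller than its left neighbor: swap left.
            values[j], values[j - 1] = values[j - 1], values[j]
            j -= 1
        # values[0..i] is now sorted; the next pass grows the sorted part by one.
    return values
```

On already-sorted input the inner `while` never runs, which is why the best case is O(n).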
Rules of AVL Tree
- Make sure the height difference between subtrees is never more than 1 (the tree stays height balanced)
- An empty tree is height balanced
- If T is a non-empty binary tree with TL and TR as left and right subtrees, T is height balanced if abs(HL - HR) <= 1 (the balance factor)
- AVL trees rotate more frequently than a Red-Black tree, which keeps them more strictly balanced and yields faster searches
Merge Sort (Divide and Conquer)
- Merge sort works by dividing the list into two halves, recursively sorting each half, and then merging the sorted halves to produce a sorted list. The recursive partitioning continues until a list of 1 element is reached, since a list of 1 element is already sorted
- As the lists merge back together, elements are compared and placed into the correct positions of their section until the sorted partitions become one sorted list
- Requires O(n) memory for the temporary arrays made during partitioning

Side Notes:
- Worst/Best/Average case is O(n log n)
- Its downfall is the auxiliary memory used
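A minimal sketch of the divide-and-merge steps; the slice-based splitting makes the O(n) auxiliary memory explicit.

```python
def merge_sort(values):
    if len(values) <= 1:                      # a 1-element list is already sorted
        return values
    mid = len(values) // 2
    left = merge_sort(values[:mid])           # recursively sort each half
    right = merge_sort(values[mid:])

    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # pick the smaller head each time
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])                   # append whichever half has leftovers
    merged.extend(right[j:])
    return merged
```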
Disadvantages of Empirical Analysis
- Not optimal for large sets of data
- Not very cost efficient
- The algorithm needs to be implemented, which in and of itself costs time and money
- Many other variables can change the outcome, such as hardware and platform differences and different compilers
Disadvantages of Graphs
- Overall, graphs have a large memory complexity
- If using an adjacency matrix, space is O(V^2), and iterating over all edges takes O(V^2) time
Prim's Algorithm (MST)
- Prim's algorithm is ideal for dense, weighted, undirected graphs
- Prim's is a greedy algorithm, meaning it always picks the edge with the lowest cost/weight
- Prim's algorithm produces no cycles
1.) Start with 2 sets, VisitedVertex() and NonVisitedVertex(); VisitedVertex() starts empty
2.) Pick an arbitrary vertex and add it to the tree. Add it to VisitedVertex()
3.) Find the edge of least weight that connects a vertex not in the tree (in NonVisitedVertex()) to the tree. Add that vertex to the tree and to VisitedVertex()
4.) Repeat step 3 until all vertices are added to the tree
- Time Complexity is O(|V|^2) with a simple array-based implementation
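The steps above can be sketched with a min-heap standing in for "find the edge of least weight" (this heap-based variant improves on the O(|V|^2) array version). The adjacency-list format `{vertex: [(weight, neighbor), ...]}` is an assumption for illustration.

```python
import heapq

def prim(graph, start):
    """Returns the total weight of the MST grown from `start`."""
    visited = set()              # the VisitedVertex() set from the steps above
    heap = [(0, start)]          # (edge weight into the tree, vertex to add)
    total = 0
    while heap and len(visited) < len(graph):
        weight, vertex = heapq.heappop(heap)   # cheapest edge reaching a new vertex
        if vertex in visited:
            continue                           # stale entry: would create a cycle
        visited.add(vertex)
        total += weight
        for w, neighbor in graph[vertex]:
            if neighbor not in visited:
                heapq.heappush(heap, (w, neighbor))
    return total
```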
Quick Sort (Divide and Conquer)
- Quick sort works by picking an element as a pivot and partitioning the given array around the selected pivot. The pivot can be any value, but it's usually the middle array element. Values in the lower partition are less than or equal to the pivot value
- This is then repeated for the smaller partitions until you reach partitions with only 1 element, which are already sorted
- Midpoint = low index + (high index - low index) / 2
- Once partitioned, each partition needs to be sorted. Quick sort is typically implemented as a recursive algorithm, using calls to quicksort to sort the low and high partitions. This recursive process continues until a partition has one or zero elements, and thus is already sorted

Side Notes:
- Runtime is O(n log n)
- Worst-case runtime is O(n^2)
- If the selected pivot is the smallest or largest element, one partition will have just one element, and the other partition will have all the other elements
- This algorithm performs slightly more comparisons on average than a merge sort
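A minimal sketch of the midpoint-pivot partitioning described above, with the recursion on the low and high partitions.

```python
def quick_sort(values, low=0, high=None):
    if high is None:
        high = len(values) - 1
    if low >= high:
        return values                        # 0 or 1 element: already sorted

    pivot = values[low + (high - low) // 2]  # midpoint pivot, as in the notes
    i, j = low, high
    while i <= j:                            # partition around the pivot
        while values[i] < pivot:             # find a left element >= pivot
            i += 1
        while values[j] > pivot:             # find a right element <= pivot
            j -= 1
        if i <= j:
            values[i], values[j] = values[j], values[i]
            i += 1
            j -= 1
    quick_sort(values, low, j)               # recursively sort the low partition
    quick_sort(values, i, high)              # recursively sort the high partition
    return values
```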
Runtime of Trees (BST)(RBT)(AVL)
- Runtime to access an element is O(log n)
- Insert O(log n): start at the root and compare the element you're inserting with the root node. If the inserted value is less than the root it goes to the left; otherwise it goes to the right. Repeat this comparison until you find an empty child slot. This is efficient because each comparison removes half the remaining search space
- Search O(log n): trace a downward path starting at the root, making comparisons; once we reach a leaf with no children (without finding the key), the search is done
- Space O(n)
- Delete O(log n)
- Rotations O(1)
- Tree Traversal O(n): Pre-order visits a node before its descendants. Post-order visits a node after its descendants. In-order visits a node after its left subtree and before its right subtree
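The insert and search descriptions above can be sketched as follows; the `Node` class and function names are assumptions for illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:                          # found an empty child slot: attach here
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)    # smaller keys go left
    else:
        root.right = insert(root.right, key)  # larger keys go right
    return root

def search(root, key):
    while root is not None:                   # trace a downward path from the root
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

def in_order(root):
    # In-order traversal: left subtree, node, right subtree -> sorted order.
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)
```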
Selection Sort (Brute Force)
- Selection sort works by repeatedly finding the minimum element in the unsorted part and swapping it into place at the start of the unsorted part. The algorithm maintains two subarrays within the given array
- Selection sort treats the input as two parts, a sorted and an unsorted part, which are tracked by variables i and j
- The algorithm searches the unsorted part of the array for the smallest element; indexSmallest stores the index of the smallest element found so far
- The elements at i and indexSmallest are swapped
- The indices for the sorted and unsorted parts are updated
- The unsorted part is searched again, swapping the smallest element with the element at i
- This process repeats until all the elements are sorted

Side Notes:
- Repeatedly places the next smallest value in its correct position
- Easy to implement; runtime is O(n^2), but with far fewer swaps than bubble sort
- With a list of 50 elements, indexSmallest will be assigned at least 49 times
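The i / j / indexSmallest walkthrough above can be sketched as:

```python
def selection_sort(values):
    n = len(values)
    for i in range(n - 1):                   # i marks the start of the unsorted part
        index_smallest = i
        for j in range(i + 1, n):            # search the unsorted part
            if values[j] < values[index_smallest]:
                index_smallest = j           # remember the smallest found so far
        # One swap per pass: far fewer swaps than bubble sort.
        values[i], values[index_smallest] = values[index_smallest], values[i]
    return values
```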
Rules of a Red-Black Tree
- Storing the color of a node uses 1 bit of memory
- Every node must be either red or black
- The root node is always black
- A red node's children cannot also be red
- A null child is considered to be a black leaf node
- All paths from a node to any null leaf descendant must have the same number of black nodes
How is an Array different from other data structures / whats its runtime
- We can access any element in an array by adding an offset (index times element size) to the array's base pointer
- Essentially, accessing myArray[2] takes the same amount of time as myArray[999]
- Runtime to access is O(1)
- Runtime for insert/search/delete is O(n)
How do Graphs differ from other data structures?
- We can use a graph to represent relationships and connectivity
- You can see that two vertices are connected, but what does that represent? (are they friends? do they know each other?)
- If you look at a linked list as storing data, then graphs can also show the relationships within the data
How does a tree (BST) differ from other data structures
- Compared to a linear search through an array, searching through a (balanced) BST is faster
- To add a node to a BST, you must first find the right insertion point by comparing the new value against nodes from the root downward
AVL Tree (self balancing)
An AVL Tree is a BST with a height balance property and specific operations to rebalance the tree when a given node is inserted or deleted
Trees (BST)
An abstract model of a hierarchical structure which consists of nodes with a parent-child relationship
Hash Tables/Maps
A Hash Table is a data structure used to create an associative array or dictionary. It maps keys to values (aka a hashmap). It consists of a hashing function that computes an index into an array (each index in the array is called a bucket). This data structure takes advantage of the O(1) access time of arrays, and hashing itself is O(1)
Red-Black Tree (self balancing)
A Red-Black Tree is a BST with two node types. The two types are red and black nodes. This separation of node types allows us to ensure the tree stays balanced when a given node is removed or added
Big-O Notation
A mathematical way of describing how a function behaves in relation to the size of the input given.
Circular Queue
A queue where the last position is connected back to the first position to make a circle/cycle
ADT (Abstract Data Type)
An ADT is defined by its interface (functionality) rather than its implementation
Divide and Conquer Sorting Algorithms
Characteristics:
- Average case is O(n log n)
- Harder to implement
- Generally fast and efficient
- They are recursive

Examples:
- Merge Sort
- Quick Sort
Brute Force Sorting Algorithms
Characteristics:
- Average case is O(n^2)
- Easy to implement
- Generally not as efficient

Examples:
- Bubble Sort
- Insertion Sort
- Selection Sort
Shortest Path Algorithms (Dijkstra's & Floyd-Warshall)
Dijkstra's Algorithm:
- Initialize all vertices' distances to infinity and set the start vertex's distance to zero
- Set the start vertex as current and mark all other vertices as unvisited
- Create a list of all the unvisited vertices called the unvisited list
- For the current vertex, consider all of its unvisited neighbors and calculate their tentative distances
- Compare each calculated distance to the currently assigned value and keep the smaller one
- When done considering all of the neighbors of the current vertex, mark it as visited and remove it from the unvisited list
- If the destination vertex has been marked visited, or if the smallest distance among the unvisited vertices is infinity, stop: you're done
- Otherwise, select the unvisited vertex with the smallest distance, set it as the new current vertex, and repeat from the neighbor-consideration step
- Time complexity with a list-based queue is O(|V|^2)

Floyd-Warshall Algorithm:
- Generates a |V| x |V| matrix of values representing the shortest path lengths between all vertex pairs in a given graph
- The graph should not have any negative cycles; negative edges are allowed

STEPS:
1.) Every entry is assigned infinity
2.) Each entry representing the path from a vertex to itself is assigned 0
3.) For each edge from X to Y in the graph, the matrix entry for the path from X to Y is initialized with the edge's weight
- Space complexity is O(|V|^2)
- Time complexity is O(|V|^3)
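Dijkstra's steps above can be sketched with a priority queue; the adjacency-list format `{vertex: [(weight, neighbor), ...]}` is an assumption for illustration.

```python
import heapq

def dijkstra(graph, start):
    """Returns a dict of shortest distances from `start` to every vertex."""
    dist = {v: float('inf') for v in graph}  # all distances start at infinity...
    dist[start] = 0                          # ...except the start vertex
    heap = [(0, start)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)           # unvisited vertex with smallest distance
        if u in visited:
            continue
        visited.add(u)
        for weight, v in graph[u]:           # consider each neighbor's tentative distance
            if d + weight < dist[v]:         # keep the smaller of old and new
                dist[v] = d + weight
                heapq.heappush(heap, (dist[v], v))
    return dist
```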
Big-O Notation Formal Definition
For functions f(n) and g(n), f(n) = O(g(n)) as long as positive constants c and n0 exist such that f(n) <= c*g(n) for all values of n >= n0
Mathematical Analysis
Rather than measuring a quantifiable runtime empirically, this type of analysis determines the runtime of an algorithm as a function of its input size and the structure of the code.
Single and Doubly Linked Lists
Linked lists store elements in nodes connected by pointers: a singly linked list's nodes point only to the next node, while a doubly linked list's nodes point to both the next and previous nodes.
Common Hashing Functions
Modulo Hash:
- Uses the remainder from dividing the key by the hash table size
Mid-Square Hash:
- Squares the key, then extracts the middle digits (base 10) or middle bits (base 2) as the hash value
Multiplicative Hash:
- ...
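Minimal sketches of the modulo and mid-square functions described above; the table size and the "two middle digits" choice are illustrative assumptions.

```python
def modulo_hash(key, table_size):
    # Bucket index is the remainder of dividing the key by the table size.
    return key % table_size

def mid_square_hash(key, table_size):
    # Square the key, then extract roughly two middle digits (base-10 variant).
    squared = str(key * key)
    mid = len(squared) // 2
    middle = squared[max(0, mid - 1):mid + 1]
    return int(middle) % table_size
```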
Techniques for analyzing runtime
O(1) - Only a few code instructions, or something that only runs once
O(log n) - Breaks the problem down into smaller pieces by a constant factor
O(n) - A problem with a single loop iterating over n (the input)
O(n log n) - Breaks the big problem down into smaller pieces, solves them separately, and then re-combines them at the end
O(n^2) - Double nested loops iterating over n (all pairs)
O(n^3) - Triple nested loops
O(2^n) - Brute-force solutions that try every possible combination (exponential)
Little-O
Same form as Big-O, but strict: f(n) < cg(n) must hold for every positive constant c, not just some c
Little Omega
Same form as Big-O, but a strict lower bound: f(n) > cg(n) must hold for every positive constant c
Big Omega
Same as Big-O but a lower bound instead: f(n) >= cg(n) for some positive constant c
Techniques for Handling Hash Collisions (Separate Chaining, Linear Probing & Quadratic Probing)
Separate Chaining:
- Each bucket is independent; each cell in the table points to a linked list of the entries that map there
- Popular approach because it's simple to implement
- Requires additional memory outside of the table
- Hash table operations and bucket access take O(1) on average, O(n) in the worst case

Linear Probing:
- Performs a sequential search to find a free bucket in the hash table; this method doesn't require additional memory
- Has a starting value and an interval (the step size), which is normally 1
- Steps through the buckets until it finds an empty bucket, then stores the entry there
- Load Factor: N/k, where N is the number of elements in the hash map and k is the number of buckets
- As the load factor grows, the hash table becomes slower

Quadratic Probing:
- The step size grows quadratically
- Formula is (H + c1*i + c2*i^2) mod (table size)
- H is the key's initial hash value
- c1 and c2 are programmer-defined constants
- Starting with i = 0, each time an empty bucket is not found, i is incremented by 1
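Minimal sketches of separate chaining and linear probing as described above; the table sizes and function names are illustrative assumptions, and Python lists stand in for the linked-list chains.

```python
def chained_insert(table, key):
    # Each bucket is a list acting as the "chain" of colliding keys.
    table[key % len(table)].append(key)

def linear_probe_insert(table, key):
    # Sequentially search for a free (None) bucket, stepping by 1 each time.
    size = len(table)
    start = key % size
    for i in range(size):
        index = (start + i) % size   # wrap around the end of the table
        if table[index] is None:
            table[index] = key
            return index
    return -1                        # table is full
```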
Priority Queue
The contents of the queue are ordered by some key value. Each item (element) has a priority, and items with higher priority are closer to the front of the queue than items with lower priority
Sorting Function
The process of converting a list of elements into ascending or descending order
Empirical Analysis
This is when you implement both programs and run them with the same input. This approach is somewhat obvious but not the most efficient solution.
Two ways to Implement/Represent a Graph
To keep track of edges, you can use either an...

Adjacency List:
- For each vertex, keep a linked list of all the vertices adjacent to it
- Often implemented using a list of lists
- Makes better use of memory
- Fast iteration over all edges, but slow lookups for specific edges when compared to a matrix

OR

Adjacency Matrix:
- Keep a V x V array (a matrix) and call it m (this has O(V^2) space complexity)
- If there is an edge between I and J, set m[I][J] = 1 and m[J][I] = 1
- If there's no edge, set the entry to 0
- Quick lookups, but slow to iterate over all edges: O(V^2)
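Both representations can be sketched for the same small undirected graph; the builder function names are assumptions for illustration.

```python
def build_adjacency_list(num_vertices, edges):
    adj = [[] for _ in range(num_vertices)]   # one neighbor list per vertex
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)                      # undirected: record both directions
    return adj

def build_adjacency_matrix(num_vertices, edges):
    # O(V^2) space regardless of how many edges exist.
    m = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        m[u][v] = 1
        m[v][u] = 1
    return m
```

Checking whether a specific edge exists is a single O(1) index into the matrix, but requires scanning a neighbor list in the list representation.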
Two Ways to Traverse a Graph
You can use either...

Depth-First Search (DFS):
- Visits all the vertices and edges of graph G, determines whether G is connected or not, computes the connected components of G, and computes a spanning forest of G
- A DFS on a graph with n vertices and m edges takes O(n + m) time
- Traverses down the graph one edge at a time (recursive, useful in cycle detection, and requires less memory than BFS)

IMPLEMENTATION:
- A DFS is implemented with a stack and a few rules:
1.) Begin at the starting vertex, mark it as visited, and push it to the stack
2.) If possible, visit an unvisited adjacent vertex, mark it as visited, and push it to the stack
3.) If you can't do 2, pop an item off the stack
4.) If you can't do 2 or 3, you're done

OR

Breadth-First Search (BFS):
- Examines all the vertices at the same level of the graph before moving on to the next level
- Good for finding the shortest path (in an unweighted graph)
- Used for discovering spanning trees
- Requires more memory than DFS

IMPLEMENTATION:
- A BFS is implemented with a queue and a few rules:
1.) Visit the next unvisited vertex (if one exists) that's adjacent to the current vertex. Mark it as visited and insert it into the queue
2.) If you can't do rule 1 because there are no more unvisited adjacent vertices, remove a vertex from the queue, make it the current vertex, and try rule 1 again
3.) If you can't carry out rule 2 because the queue is empty, you're done
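Both traversals can be sketched side by side, a stack for DFS and a queue for BFS; the adjacency-list format `{vertex: [neighbors]}` is an assumption for illustration.

```python
from collections import deque

def dfs(graph, start):
    visited, stack = [], [start]
    while stack:
        vertex = stack.pop()                  # LIFO: go deep before going wide
        if vertex not in visited:
            visited.append(vertex)
            # Reverse so neighbors are explored in their listed order.
            for neighbor in reversed(graph[vertex]):
                stack.append(neighbor)
    return visited

def bfs(graph, start):
    visited, queue = [start], deque([start])
    while queue:
        vertex = queue.popleft()              # FIFO: finish a level before the next
        for neighbor in graph[vertex]:
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)
    return visited
```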