CS161
probability range
0 ≤ P(A) ≤ 1
Algorithms that have worst case O(n^2) time
- selection sort - quick sort - insertion sort - bubble sort (merge sort does not belong here: its worst case is O(n log n))
5 Steps of Inductive Proof
1. Define some property P(n) that you'll prove by induction (you will show that the property is true for 0 and that if it holds for n, it also holds for n + 1). 2. State that the proof is by induction (say that P(n) is true for all the numbers you care about, e.g. all natural numbers). 3. State and prove your base case (having defined P(n) in step 1, you now need to prove P(0)). 4. State and prove the inductive step (assume P(k), then prove P(k+1)). 5. Conclude the proof ("completing the proof by induction").
Basic Properties of Binary Heaps
1. Must be a complete binary tree: all levels are filled except possibly the last, which is filled from the left as far as possible 2. Must abide by the Heap Ordering Property: must be a min or max heap. The Heap Ordering Property is considered an invariant.
Steps for Counting line intersections:
1. Sort the lines according to the y-coordinate of their intersection with the line x = a. Number the lines in sorted order 2. Produce the sequence of line numbers sorted according to the y-coordinate of their intersection with the line x = b 3. Count/report inversions in the sequence produced in step 2.
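The three steps above can be sketched in Python. This is a minimal sketch under stated assumptions: each line is given as a hypothetical (slope, intercept) pair, no two lines meet exactly at x = a or x = b, and the helper `count_inversions` is a brute-force O(n^2) counter kept short for illustration (a mergesort-based counter would bring the total to O(n log n)).

```python
def count_inversions(seq):
    """Brute-force inversion count, O(n^2); kept simple for illustration."""
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


def count_intersections(lines, a, b):
    """Count pairs of lines that cross between x = a and x = b.

    lines: list of (slope, intercept) pairs (an assumed representation).
    """
    def y_at(line, x):
        slope, intercept = line
        return slope * x + intercept

    indices = range(len(lines))
    # Step 1: sort by y-coordinate at x = a and number the lines in that order.
    order_at_a = sorted(indices, key=lambda i: y_at(lines[i], a))
    number = {line_index: n for n, line_index in enumerate(order_at_a)}
    # Step 2: list those numbers in order of y-coordinate at x = b.
    order_at_b = sorted(indices, key=lambda i: y_at(lines[i], b))
    sequence = [number[i] for i in order_at_b]
    # Step 3: every inversion corresponds to one pair of crossing lines.
    return count_inversions(sequence)
```

For example, the lines y = x and y = -x + 1 cross at x = 0.5, so between x = 0 and x = 1 they contribute one inversion, while a line such as y = 5 that stays above both contributes none.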
Two examples of mergesort used in lecture
1. counting inversions and listing inversions in O(n log n + k) time by modifying mergesort 2. counting line intersections and listing line intersections in O(n log n + k) time, by reducing the problem to inversion counting
Binary heaps
A binary heap is a binary tree that abides by two extra properties: 1. Complete Tree: all levels are completely filled except possibly the last, which is filled from the left 2. Must be a min or max heap (min heap - min at root, max heap - max at root) Can be implemented via an array or pointers (not to be confused with a linked list, instead you would have a left and right pointer for a vertex)
Binary Tree Data Structure
A binary tree is a tree such that each node has at most two children (left and right children). A binary search tree is a binary tree that has its data values arranged as follows: each node in the tree contains a value that is larger than each value in its left subtree and smaller than each value in its right subtree.
Sequential Search
A linear search method of finding a targeted value within a list, looking at one item at a time until a match is found. Think iterating through an array. Time: O(n) Space: O(1)
Linked Lists Data structure
A linked list is a linear data structure, in which the elements are not stored at contiguous memory locations. The elements in a linked list are linked using pointers.
Connected Component (graph)
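A maximal connected part of the graph: every pair of its vertices is joined by a path, and none of its vertices is joined to a vertex outside it. Not to be confused with biconnected components, which are defined separately.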
Queue Data Structure
A queue stores items in a first-in, first-out (FIFO) order. This often involves two fxns: enqueue, which adds an element to the back, and dequeue, which removes an element from the front. Picture a queue like the line outside a busy restaurant. First come, first served.
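A short sketch of FIFO behaviour, assuming Python's standard `collections.deque` as the underlying container (one possible choice, not the only one): items are enqueued at the back and dequeued from the front.

```python
from collections import deque

queue = deque()
queue.append("first")     # enqueue: add at the back of the line
queue.append("second")
queue.append("third")
served = queue.popleft()  # dequeue: remove from the front of the line
```

After these calls, `served` holds the item that arrived first and the two later arrivals are still waiting in order.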
Binary Search
A search algorithm that starts at the middle of a sorted set of numbers and eliminates half of the remaining data at each step; this process repeats until the desired value is found or all elements have been eliminated. Time: O(log n) Space: O(1)
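The halving process can be sketched iteratively. This is a minimal sketch that assumes the input list is already sorted in ascending order; the function name and the choice to return -1 for a missing target are illustrative conventions.

```python
def binary_search(items, target):
    """Return an index of target in the sorted list items, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1    # discard the left half
        else:
            high = mid - 1   # discard the right half
    return -1
```

Only the indices `low`, `high`, and `mid` are stored, which is where the O(1) space bound comes from.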
Bucket Sort
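A sorting algorithm that works by distributing the elements of an array into a number of buckets; the buckets are then sorted individually. It is a distribution (address-calculation) sort rather than a comparison sort, although the individual buckets are often sorted with a comparison sort. Time Worst: O(n^2) Average: Θ(n+k)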
Counting Sort
An algorithm for sorting a collection of objects according to keys that are small integers; an integer sorting algorithm. It operates by counting the number of objects that have each distinct key value, and using arithmetic on those counts to determine the positions of each key value in the output sequence. It is only suitable for direct use in situations where the variation in keys is not significantly greater than the number of items. However, it is often used as a subroutine in another sorting algorithm, radix sort, that can handle larger keys more efficiently. Time Worst: O(n+k) Average: Θ(n+k)
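The counting and position arithmetic can be sketched as follows. This is a minimal sketch, assuming the keys are integers in `range(k)`; the function name and parameters are illustrative.

```python
def counting_sort(keys, k):
    """Sort integer keys drawn from range(k) in O(n + k) time."""
    # Count how many times each key value occurs.
    counts = [0] * k
    for key in keys:
        counts[key] += 1
    # Prefix sums: starts[v] is the output index of the first key equal to v.
    starts = [0] * k
    total = 0
    for value in range(k):
        starts[value] = total
        total += counts[value]
    # Place each key at its computed position (left-to-right keeps it stable).
    output = [None] * len(keys)
    for key in keys:
        output[starts[key]] = key
        starts[key] += 1
    return output
```

Because equal keys are placed in their original left-to-right order, this version is stable, which is the property radix sort relies on when it uses counting sort as a subroutine.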
Insertion Sort
An array is split into a sorted and unsorted part and elements from the unsorted part are placed in the correct places in the sorted part. Time Worst: O(n^2) Average: Θ(n^2)
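A minimal in-place sketch of the sorted/unsorted split described above; the function name is illustrative.

```python
def insertion_sort(items):
    """Sort the list in place and return it."""
    for i in range(1, len(items)):
        current = items[i]          # first element of the unsorted part
        j = i - 1
        # Shift larger elements of the sorted part one slot to the right.
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current      # drop it into its correct place
    return items
```

On an already sorted input the inner loop never runs, so the best case is linear even though the worst case is quadratic.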
recurrence relation
An equation that defines a function or sequence in terms of its own values on smaller inputs. Any polynomial or exponential can be represented by a recurrence.
Inversion
An inversion is a pair of items that are out of order: the larger one precedes the smaller one. Swapping such a pair moves the list closer to sorted. To list all inversions: start from the leftmost element and pair it with each element to its right that is smaller; once all of those pairs are listed, move on to the second leftmost element and pair it only with smaller elements to its right, and so on, until the second-to-last element has been processed.
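Counting inversions (pairs where a larger item precedes a smaller one) can be folded into mergesort. The following is a sketch of that idea, not necessarily the exact version from lecture; the function name is illustrative.

```python
def sort_and_count(items):
    """Return (sorted copy of items, number of inversions in items)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, left_count = sort_and_count(items[:mid])
    right, right_count = sort_and_count(items[mid:])
    merged, cross_count = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining item in left,
            # so it forms an inversion with each of them.
            merged.append(right[j])
            j += 1
            cross_count += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, left_count + right_count + cross_count
```

For the list [2, 4, 1, 3, 5] the inversions are (2, 1), (4, 1), and (4, 3), so the count is 3.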
Array Data Structure
Arrays are defined as the collection of similar types of data items stored at contiguous memory locations. It is one of the simplest data structures where each data element can be randomly accessed by using its index number.
Which algorithms can check for connected components in a graph
BFS and DFS (the number of top-level calls of either algorithm, each started from a not-yet-visited vertex, equals the number of connected components)
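A sketch of counting connected components with BFS, assuming an undirected graph stored as a dictionary that maps each vertex to a list of its neighbors (an assumed representation). Each time a search starts from an unvisited vertex, one more component has been found.

```python
from collections import deque


def count_components(adjacency):
    """Count connected components of an undirected graph."""
    visited = set()
    components = 0
    for start in adjacency:
        if start in visited:
            continue
        components += 1            # a new top-level BFS call
        visited.add(start)
        frontier = deque([start])
        while frontier:
            vertex = frontier.popleft()
            for neighbor in adjacency[vertex]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    frontier.append(neighbor)
    return components
```

Replacing the deque with a stack (or recursion) turns the same loop into DFS without changing the count.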
When to use Bayes Formula?
Bayes' theorem is used to find the reverse conditional probability P(A | B) when we know the conditional probability P(B | A), along with P(A) and P(B). This is why the reverse conditional probability appears in the formula.
Insertion for Binary Heaps
Because our heaps are complete trees, we know where the new node must go. We have no choice: it must go in the bottom level, as far left as possible. The new value is placed in this node. We then check if the resulting tree is a heap: the place chosen for the new node guarantees that the structural property will be satisfied, but the ordering property might be violated. The ordering property is re-established by the 'SIFT UP' operation. In short: add the new node at the leftmost open position on the bottom level, then compare it with its parent and swap upward as needed to abide by the heap ordering property.
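Insertion can be sketched for an array-based min heap. The sketch assumes the usual array layout in which the parent of index i sits at index (i - 1) // 2; appending to the array is what places the new node at the leftmost open spot on the bottom level.

```python
def heap_insert(heap, value):
    """Insert value into an array-based min heap, in place."""
    heap.append(value)               # bottom level, as far left as possible
    child = len(heap) - 1
    while child > 0:
        parent = (child - 1) // 2
        if heap[parent] <= heap[child]:
            break                    # ordering property holds again
        heap[parent], heap[child] = heap[child], heap[parent]
        child = parent               # keep sifting up toward the root
```

Inserting 5, 3, 8, 1 into an empty heap in that order leaves the array as [1, 3, 8, 5], with the minimum at the root.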
Are binomial coefficients (n choose k) represented as combinations or permutations?
Combinations. nCk = n!/(k!(n-k)!) C = number of combinations n = total number of objects k = number of objects selected
Equivalence Class
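A set of elements that are all related to one another under an equivalence relation. Ex: the connected components of a graph are the equivalence classes of its vertices under the relation "is connected to".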
Difference between DFS in directed graphs and DFS in undirected graphs
DFS for an undirected graph examines each edge twice (once from each endpoint), whereas DFS for a directed graph examines each edge only once
Sorting Terminology
Describes the different ways sorting methods are categorized. Some example categories are: Types of Sorting: - Internal Sorting - External Sorting Sort Stability: - Stable Sort - Unstable Sort
Comparison Based Sorting
Determining order by comparing pairs of elements usually with a less than or equal to operator. Ex: - quick sort - heap sort - merge sort - insertion sort - selection sort - bubble sort
A ____ _____ _____ is a directed graph with no cycles
Directed Acyclic Graph, or DAG
Merge Sort
Divide and Conquer paradigm: an array is divided in two (and divided into halves again until individual elements are reached), and the pieces are then combined into sorted pairs, sets of four, etc. The individual halves are sorted and then merged until the entire array has been sorted. Implemented recursively. Time Worst: O(n log n) Average: Θ(n log n)
Theta Notation (Θ)
Encloses the function/run-time from above and below (a tight bound), and is often used when stating the average-case complexity of an algorithm. Θ(g(n)) = {f(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1 * g(n) ≤ f(n) ≤ c2 * g(n) for all n ≥ n0} Note: Θ(g) is a set
Formula for any directed graph
For a directed graph
Biconnected Graph
Graph G has no separation edges and no separation vertices
It is possible to define a ____ graph with some edges undirected, others directed.
mixed
External Sorting:
Performed for massive amounts of data, data that needs to be sorted and cannot be placed in memory all at the same time. data are sorted one small segment at a time and then stored into temporary memory. Ex: - merge sort - tag sort - external radix sort - polyphase merge sort - replacement selection sort
polyphase merge sort
A version of mergesort that uses loops instead of recursion to perform essentially the same process; the stated benefit is that it avoids the overhead of recursion. Not stable; an external sorting algorithm. Setup: 1. n records (items) in file 2. m records can fit in memory at once (m ≪ n) 3. f input files can be open at once. Procedure: read in groups of m records, sort each group, and write each sorted group to a separate output file; then iterate: choose f files, merge the contents of the f input files into a new output file, and delete the f input files.
Common examples of greedy Algorithms
Huffman Encoding, Dijkstra's Algorithm, fractional knapsack problem (greedy is optimal here; it is the 0/1 knapsack problem where a greedy algorithm can fail to find the optimal soln)
Stable Sorting
If the same element is present multiple times, then they retain their original ordering relative to one another after the data is sorted. Ex: - merge sort - insertion sort - bubble sort
Hash Table Data Structure
Implements an associative array or dictionary (essentially unique key value pairs), and uses a hash function to compute an index/hash-code into an array of buckets from which the value can be found. Usually there is the concern of collisions, since hash functions are imperfect and can generate the same index for multiple keys.
Heapify
Used when a binary heap doesn't abide by the max or min heap property; also the process of creating a heap data structure from a binary tree represented using an array. Heapify is usually written recursively. In a min heap, the out-of-place node is swapped along its path until the min heap state is achieved; in a max heap, it is swapped until the max heap state is achieved. The primary use of the heap data structure is to implement a priority queue.
A _____ in a directed graph is a pair of edges with the same endpoints and the same direction.
multiedge
The sorting algorithm ______ is not optimal if we look at exact number of comparisons
Merge sort (because MergeSort requires 8 comparisons in the worst case to sort 5 items, while 7 comparisons suffice)
The lower bound of these two sorting algorithms ______ & ______ is asymptotically tight (i.e., cannot be improved asymptotically)
MergeSort , HeapSort
The worst-case running time of _______ and _________ on an input of size n is O(n log n).
MergeSort , HeapSort
Strassen's method
Method of matrix multiplication that uses divide-and-conquer (recursion) and is more efficient because it removes one recursive call: multiplying 2x2 block matrices requires 7 multiplications instead of the 8 that would usually be required. T(n) = 7T(n/2) + O(n^2), so a = 7, b = 2, k = 2; since a > b^k, T(n) = O(n^(log_b a)) = O(n^(log2 7)) ≈ O(n^2.81)
Radix Sort
Non-comparative integer sorting algorithm that sorts data with integer keys by grouping keys by the individual digits which share the same significant position and value. Two classifications of radix sorts are least significant digit (LSD) radix sorts and most significant digit (MSD) radix sorts. Time Worst: O(nk) = O(d(n + b)). d= # fields we're sorting b = size of each range n = number of items Average: Θ(nk)
For counting line intersections: Asymptotically how long does it take to sort the lines according to the y-coordinate of their intersection with the line x = a and number the lines in sorted order (step 1)
O(n log n) time
For counting line intersections: Asymptotically how long does it take to produce the sequence of line numbers sorted according to the y-coordinate of their intersection with the line x = b (step 2)
O(n log n) time
the median of n numbers can be found in _____time
O(n)
Conditional Probability
P(A | B) = P(A∩B) / P(B) The probability of event A occurring given event B has already occurred is equal to the probability that event A AND event B occur divided by the probability of event B
Bayes Formula
P(A | B) = P(B | A) ⋅ P(A) / P(B) The probability of event A occurring given event B has already occurred is the product of the probability of event B given event A and the probability of event A divided by the probability of event B
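A worked example of the formula. All of the numbers below are hypothetical, chosen only for illustration; P(B) is obtained from the law of total probability, P(B) = P(B | A) ⋅ P(A) + P(B | A') ⋅ P(A'), which is a standard identity not stated on the card.

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical setup: A = "part is defective", B = "test flags the part".
p_a = 0.01               # 1% of parts are defective
p_b_given_a = 0.90       # the test flags 90% of defective parts
p_b_given_not_a = 0.05   # the test also flags 5% of good parts

# Law of total probability gives the overall chance of a flag.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
```

Here P(B) = 0.009 + 0.0495 = 0.0585, so P(A | B) = 0.009 / 0.0585, roughly 0.154: even after a flag, a part is defective only about 15% of the time under these assumed numbers.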
Rule of Complementary Events
P(A') + P(A) = 1 The complement: P(A') = 1 - P(A) P(A) is the probability of event A occurring
Disjoint Events
P(A∩B) = 0 Events that cannot occur simultaneously (mutually exclusive)
independent events
P(A∩B) = P(A) ⋅ P(B) The outcome of one event does not affect the outcome of the second event The probability of event a AND event b occurring is the product of the probability of the individual events
Rule of Addition
P(A∪B) = P(A) + P(B) - P(A∩B) Probability of event A OR event B occurring: Where the union of event A and event b is the sum of the probabilities of A and B while subtracting any overlapping probability between the two events that would be double counted (the instance in which both events occur)
Quick Sort
PIVOT: a sorting technique that partitions elements around a pivot and recursively sorts the elements to the left and the right of the pivot. Left are smaller, right are larger. 1. select pivot (strive for middle), move it to the back 2. find the first element from the left that is larger than the pivot, and the first element from the right that is smaller than the pivot 3. swap the item from the left with the item from the right; repeat steps 2-3 until the left index passes the right index 4. swap the item at the left index with the pivot 5. repeat on the left and right sub-arrays, selecting a pivot and swapping Time Worst: O(n^2) Average: Θ(n log n)
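One way to sketch the pivot-to-back partitioning scheme in place. The choice of the middle position as the pivot and the handling of elements equal to the pivot are assumptions of this sketch, not something the card specifies.

```python
def quick_sort(items, low=0, high=None):
    """Sort items[low..high] in place."""
    if high is None:
        high = len(items) - 1
    if low >= high:
        return
    # 1. Select the middle element as pivot and move it to the back.
    mid = (low + high) // 2
    items[mid], items[high] = items[high], items[mid]
    pivot = items[high]
    left, right = low, high - 1
    while True:
        # 2. Scan for an item on the left that is not smaller than the pivot
        #    and an item on the right that is not larger than the pivot.
        while left <= right and items[left] < pivot:
            left += 1
        while right >= left and items[right] > pivot:
            right -= 1
        if left >= right:
            break                      # the scans have met or crossed
        # 3. Swap the out-of-place pair and keep scanning.
        items[left], items[right] = items[right], items[left]
        left += 1
        right -= 1
    # 4. Put the pivot between the two partitions.
    items[left], items[high] = items[high], items[left]
    # 5. Recurse on both sub-arrays.
    quick_sort(items, low, left - 1)
    quick_sort(items, left + 1, high)
```

Picking the middle position avoids the O(n^2) behaviour on already sorted input, although adversarial inputs that trigger the worst case still exist.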
Master Method
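Applies to recurrences of the form: T(n) <= a * T(n/b) + O(n^d) logical meaning of values: a - represents speed of subproblem proliferation (number of recursive calls) b - represents speed of work-per-subproblem reduction (factor by which the input size shrinks) d - exponent of the work done outside the recursive calls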
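The card stops at the meaning of the parameters. As a sketch, the standard three-case statement of the master method (added here; it is not spelled out on the card) compares a with b^d. The function name and the string output format are illustrative only.

```python
import math


def master_method(a, b, d):
    """Bound for T(n) <= a*T(n/b) + O(n^d), assuming a >= 1, b > 1, d >= 0."""
    if a == b ** d:
        # Same amount of work at every level of the recursion tree.
        return f"O(n^{d} log n)"
    if a < b ** d:
        # Work shrinks going down the tree: the root dominates.
        return f"O(n^{d})"
    # Work grows going down the tree: the leaves dominate.
    return f"O(n^{math.log(a, b):.2f})"
```

For instance, merge sort's recurrence (a = 2, b = 2, d = 1) falls in the first case, and Strassen's (a = 7, b = 2, d = 2) falls in the last, giving an exponent of log2 7, about 2.81.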
Selection Sort
Repeatedly finds the minimum element from unsorted part and puts it in the beginning (assumes ascending order). This process maintains two subarrays, an unsorted array and a sorted array. Time Worst: O(n^2) Average: Θ(n^2)
Big-Oh Notation
Represents an upper bound on the run-time of an algorithm; most often used to state worst-case complexity O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0 }
Heap Sort
Requires only O(1) extra space. Works like an improved selection sort. It divides its input into a sorted and unsorted region, and iteratively shrinks the unsorted region by extracting the smallest element and moving it into the sorted region. It uses a heap structure to maintain the unsorted region so that the minimum can be found more quickly. Unstable, comparison-based sorting. Time Worst: O(n log n) Average: Θ(n log n)
Priority Queues
Same thing as a queue just with three additional properties: 1. every item has a priority associated with it 2. an element with high priority is dequeued before an element with low priority 3. If two elements have the same priority they are removed according to their order in the queue
Example of the difference b/w Permutations & Combinations:
Suppose there are 11 runners in a race. Most of the time, all 11 runners aren't given places, just the first 3. There are 11 possibilities for first place, 10 possibilities for second place, and 9 possibilities for third place, or 11 × 10 × 9 = 990 permutations for the first 3 places. If only the set of top-3 finishers mattered and not their order, there would be 990 / 3! = 165 combinations. If all runners were given places, that would lead to 11!, or 39,916,800, possible permutations.
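The race numbers can be checked with Python's standard `math` module (`math.perm` and `math.comb` are available from Python 3.8). The variable names are illustrative.

```python
import math

ordered_top3 = math.perm(11, 3)      # ordered podiums: 11 * 10 * 9
unordered_top3 = math.comb(11, 3)    # sets of three finishers, order ignored
full_rankings = math.factorial(11)   # every runner given a place
```

The ordered count is 3! = 6 times the unordered count, because each set of three finishers can be arranged on the podium in six ways.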
sift down for Binary Heaps
The DELETE operation is based on 'SIFT DOWN'. SIFT DOWN (shown here for a min heap) starts with a value in any node. It moves the value down the tree by successively exchanging the value with the smaller of its two children. The operation continues until the value reaches a position where it is less than or equal to both its children, or, failing that, until it reaches a leaf. Complexity = O(height) = O(log N)
sift up for Binary Heaps
The 'SIFT UP' operation (shown here for a min heap) starts with a value in a leaf node. It moves the value up the path towards the root by successively exchanging the value with the value in the node above. The operation continues until the value reaches a position where it is greater than or equal to its parent, or, failing that, until it reaches the root node. Complexity = O(height) = O(log N)
A _____ in an undirected graph is a pair of edges with the same endpoints.
multiedge
Little-Oh Notation (o)
An upper bound on an algorithm's run-time that is not asymptotically tight (removes equality in Big-Oh) o(g(n)) = { f(n): for any positive constant c, there exists a positive constant n0 such that 0 ≤ f(n) < cg(n) for all n ≥ n0 }
Combinations
The number of different ways in which objects can be selected from a particular set without regard to order, usually in subsets smaller than the set itself. (Never larger than the corresponding number of permutations.)
Permutation
The number of ways in which objects chosen from a particular set can be ordered.
nth Harmonic Number
The sum of reciprocals of the first n natural numbers: Hn = Sum from k=1 to n of 1/k
replacement selection sort
A subsection of a large array is stored in memory. The least element is selected and appended to the output run, and the next element of the large array takes its place in memory. Then the least in-memory element that is no smaller than the last element written to the run is selected and appended, and so on; this repeats until no in-memory element is large enough to extend the run. The sorted sequence that results is called a run. A new run is then started from the remaining elements in the same way. Once the entire large array has been turned into separate runs, the runs are merged into larger sorted groups, and this repeats until all the contents are merged. Makes heavy use of min-heaps while building initial runs and during the merging phase.
Deletion for Binary Heaps
Binary heaps are designed to give access to the min/max element at the top, so we're only allowed to delete that node. Move the bottom rightmost node up to be the root and then swap it downward, each time swapping the node with its smaller child (this process conveniently maintains the correct shape while re-establishing the heap ordering property). 1. delete root 2. move bottom rightmost value to root (maintains complete tree property, shape) 3. swap down following the least child until node is in the right position
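The deletion steps can be sketched for an array-based min heap, assuming the usual layout in which the children of index i sit at indices 2i + 1 and 2i + 2, and assuming the heap is non-empty. The function name is illustrative.

```python
def heap_delete_min(heap):
    """Remove and return the minimum of a non-empty array-based min heap."""
    smallest = heap[0]
    last = heap.pop()                 # bottom rightmost value
    if heap:
        heap[0] = last                # move it to the root
        parent = 0
        while True:
            left = 2 * parent + 1
            right = left + 1
            child = parent
            # Find the smallest among the node and its children.
            if left < len(heap) and heap[left] < heap[child]:
                child = left
            if right < len(heap) and heap[right] < heap[child]:
                child = right
            if child == parent:
                break                 # ordering property restored
            heap[parent], heap[child] = heap[child], heap[parent]
            parent = child            # keep sifting down
    return smallest
```

Starting from the heap [1, 3, 8, 5], one deletion returns 1 and leaves [3, 5, 8].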
_____ ________ is optimal with respect to worst-case performance.
binary search
Any algorithm for locating an item in an array of size n using only comparisons must perform at least __________ comparisons in the worst case
the floor of (log base 2 of n) +1
Factorials
n! = n × (n-1) × ... × 2 × 1: the number of possible orderings (permutations) of n distinct objects
greedy algorithm
an algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. Hopefully finds the global optimum.
separation edge
an edge whose removal disconnects a graph
For some purposes, we can simulate an undirected edge using a pair of _____ directed edges
antiparallel
Since the decision tree is a binary tree with n nodes, the depth is at least ______.
the floor of log base 2 of n
Any algorithm that sorts a list or array of size n using comparisons can be modeled as a ____ ____.
decision tree
Any algorithm that searches for an item x in an array A of size n by comparing entries in A against x can be modeled as a _____ _____.
decision tree
Any algorithm for searching an array of size n can be modeled by a _____________________________________________.
decision tree with at least n nodes
Link Relation
Used to define biconnected components. The link relation on the edges of a graph is an equivalence relation
G is _____ if for any two vertices u, v ∈ V(G)
biconnected (or 2-connected) In order to be biconnected, a graph must be connected.
In an undirected graph the ___ of a vertex v is the number of edges incident on v
degree
The worst-case number of comparisons for the algorithm is the ____________________________.
depth of the decision tree + 1. (Remember, root has depth 0).
In a ______ or _____the edges have directions
directed graph, digraph
The link components of a connected graph G are the ____ ___ of edges with respect to the link relation
equivalence classes
A _____ ____ is an undirected graph with an edge between every pair of vertices.
complete graph A complete graph on n vertices has m = n choose 2 edges
biconnected component analysis can be done in the same asymptotic time as ____ _____ _____.
connected component analysis
A ____ is a set of objects, called ____, and a set of pairs of objects, called _____.
graph , vertices, edges
To count _____: - Run a sorting algorithm - Every time data is rearranged, keep track of how many inversions are being removed. In principle, we can use any sorting algorithm to count _______. Mergesort works particularly nicely.
inversions, inversions
A ______ is an edge with both endpoints the same
loop (sometimes called a self-loop)
To list inversions using a slight modification to _____ we can report inversions in _____ time.
mergesort, O(n log n + k) where k is the number of inversions
In a directed graph the _____ of a vertex is the number of incoming edges.
indegree
An _____ in a sequence or list is a pair of items such that the larger one precedes the smaller one.
inversion
Sorting is the process of removing ______.
inversions
In a list of size n, there can be as many as ______ inversions.
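n choose 2 = n(n-1)/2 (every pair of items is an inversion when the list is in reverse sorted order) combinations nCk = n!/(k!(n-k)!) C = number of combinations n = total number of objects k = number of objects selected (here k = 2)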
Combination Formula
nCr = n!/(r!(n-r)!) C = number of combinations n = total number of objects r = number of objects selected
Permutation Formula
nPr = n!/(n-r)! P = # of possible different arrangements n = total number of objects r = the number of objects selected
In a directed graph the ____ of a vertex is the number of outgoing edges.
outdegree
The reporting algorithm (aka listing inversions) is an example of an ____-____ algorithm.
output-sensitive The performance of the algorithm depends on the size of the output as well as the size of the input.
Sorting is the problem of finding a particular distinguished ______ of a list.
permutation
A ________ of a sequence of items is a reordering of the sequence. A sequence of n items has ____ distinct _____.
permutation, n!, permutations
A ______ ______ is an equation that uses recursion to relate terms in a sequence or elements in an array. It is a way to define a sequence or array in terms of itself.
recurrence relation
Any comparison-based algorithm for sorting a list of size n must perform at least ______ comparisons in the worst case.
the ceiling of log base 2 of (n factorial)
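The bound can be evaluated directly. This is a small sketch using Python's standard `math` module; the function name is illustrative.

```python
import math


def comparison_lower_bound(n):
    """Ceiling of log2(n!): minimum worst-case comparisons to sort n items."""
    return math.ceil(math.log2(math.factorial(n)))
```

For n = 5 there are 5! = 120 possible orderings and log2(120) is about 6.9, so at least 7 comparisons are needed in the worst case.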
In an ____ ____ the edges do not have directions.
undirected graph
separation vertex
vertex whose removal disconnects a graph
Any comparison-based algorithm for sorting a list of size n must perform at least ________ comparisons in the worst case.
Ω(n log n)
Comparison-based sorting has lower bound of ________ comparisons.
Ω(n log n)
A directed graph G is ____ _____ if there is a path from every vertex in G to every other vertex in G.
strongly connected
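Formula relating the degrees of the vertices to the number of edges
sum[deg(v)] = 2m (summing over all vertices v of an undirected graph with m edges; each edge contributes to the degree of both of its endpoints)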
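A quick check of the degree-sum identity on a small undirected graph, assuming the graph is given as a list of edges (an assumed representation; the function name is illustrative).

```python
def degree_sum(edges):
    """Sum of deg(v) over all vertices of an undirected graph."""
    degree = {}
    for u, v in edges:
        # Each edge adds one to the degree of both of its endpoints.
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(degree.values())


edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
total = degree_sum(edges)   # degrees are 2, 2, 3, 1
```

With m = 4 edges the degrees add up to 8 = 2m.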
Asymptotic Notation
the classification of runtime complexity that uses functions that indicate only the growth rate of a bounding function
Decision Tree
- Each node is labeled with an integer ∈ {0 . . . n − 1}. - A node labeled i represents a 3-way comparison between x and A[i]. - The left subtree of a node labeled i describes the decision tree for what happens if x < A[i]. - The right subtree of a node labeled i describes the decision tree for what happens if x > A[i]. A = array A[i] = particular element in the array x = item being searched for
Internal Sorting Algorithms
- heap sort - bubble sort - selection sort - quick sort - insertion sort
Stable Sorting Algorithms
- merge sort - insertion sort - bubble sort
External Sorting Algorithms
- merge sort - tag sort - external radix sort - polyphase merge sort - replacement selection sort
Comparison Based Sorting Algorithms
- quick sort - heap sort - merge sort - insertion sort - selection sort - bubble sort
Unstable Sorting Algorithms
- quick sort - heap sort - shell sort
Address-Calculation Sorting
A sorting algorithm which uses knowledge of the domain of items to calculate the position of each item in the sorted array. Ex: - counting sort - bucket sort - radix sort
Stack Data Structure
A stack stores elements linearly following the First-In-Last-Out principle (FILO, equivalently Last-In-First-Out or LIFO). Often implemented in the form of a class, it has two fxns to insert and remove elements: PUSH to insert at the top, and POP to remove from the top.
Unstable Sorting
After sorting, elements with equal keys may end up in a different order relative to one another than they were in before sorting. Ex: - quick sort - heap sort - shell sort
Internal Sorting:
All data is put in internal memory and then sorted. Ex: - heap sort - bubble sort - selection sort - quick sort - insertion sort - shell sort
Properties required to implement a greedy algorithm
Greedy Choice Property - Algorithm never reconsiders its choices b/c it doesn't need to Optimal Substructure Property - Overall optimal solution is comprised of optimal solutions to subproblems
If the median of n numbers can be found in ____ time, then this means that the maxima-set algorithm presented in the text runs in _____ time because...
O(n) , O(n log n) , the preprocessing step involves sorting the points lexicographically (alphabetical order) prior to calling the MaximaSet algorithm on S therefore it takes O(n log n) for preprocessing and O(1) for each recursive call to find the middle of the list.
Dictionary Data Structure
Stores a group of objects as key value pairs. When the dictionary is presented with a key then it returns an associated value. Dictionaries are generally unordered. This is also known as a hash, a map, or hashmap depending on programming language
Biconnected Components
Subgraphs of a larger graph: each is either a maximal subgraph with no separation vertex or separation edge of its own, or a single separation edge. Count the biconnected components by identifying the separation vertices and separation edges: a separation edge is its own biconnected component, and the separation vertices mark where neighboring biconnected components meet.
Omega-Notation (Ω)
A lower bound on the run-time of an algorithm; often used to state best-case complexity Ω(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0 }
T/F: greedy algorithms are faster than dynamic programming for optimization.
True
Define a link relation on the edges of a graph G:
Two edges e and f are linked if e = f or if G has a simple cycle containing e and f .
When to use recursion
When backtracking steps need to be taken, like exploring a maze or a tree (it's more difficult to think about how a loop would be able to create this functionality)
To count inversions using a slight modification to _____ we can count inversions in _____ time.
mergesort, O(n log n)
A ___ ____ is an edge whose removal causes G to become disconnected.
separation edge
A ___ ____ is a vertex whose removal causes G to become disconnected.
separation vertex
A _____ ____ is a graph that has no loops and no multiedges.
simple graph
How is the improved bicomponent algorithm an improvement over the preliminary bicomponent algorithm?
the preliminary algorithm checks every edge that's connected to a cycle, the improved algorithm takes note of which edges contribute to a cycle and