CS161

External Sorting Algorithms

- merge sort - tag sort - external radix sort - polyphase merge sort - replacement selection sort

Comparison Based Sorting Algorithms

- quick sort - heap sort - merge sort - insertion sort - selection sort - bubble sort

Unstable Sorting Algorithms

- quick sort - heap sort - shell sort

Algorithms that have worst case O(n^2) time

- selection sort - quick sort - insertion sort - bubble sort

probability range

0 ≤ P(A) ≤ 1

Decision Tree

- Each node is labeled with an integer i ∈ {0, ..., n − 1}.
- A node labeled i represents a 3-way comparison between x and A[i].
- The left subtree of a node labeled i describes the decision tree for what happens if x < A[i].
- The right subtree of a node labeled i describes the decision tree for what happens if x > A[i].
Here A is the array, A[i] is a particular element of the array, and x is the item being searched for.

Stable Sorting Algorithms

- merge sort - insertion sort - bubble sort

Binary heaps

A binary heap is a binary tree that abides by two extra properties: 1. Complete tree: all levels are completely filled except possibly the last, which is filled from the left. 2. Heap ordering: it must be a min heap or a max heap (min heap: minimum at the root; max heap: maximum at the root). Can be implemented via an array or via pointers (not to be confused with a linked list; each vertex would instead have a left and a right pointer).

Binary Tree Data Structure

A binary tree is a tree in which each node has at most two children (a left child and a right child). A binary search tree is a binary tree that has its data values arranged as follows: each node in the tree contains a value that is larger than each value in its left subtree and smaller than each value in its right subtree.

Sequential Search

A linear search method that finds a target value within a list by checking items one at a time until a match is found. Think of iterating through an array. Time: O(n) Space: O(1)

Linked Lists Data structure

A linked list is a linear data structure, in which the elements are not stored at contiguous memory locations. The elements in a linked list are linked using pointers.

Connected Component (graph)

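A maximal connected part of the graph: every pair of its vertices is joined by a path, and none of its vertices is adjacent to a vertex outside it. Not the same thing as a biconnected component.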

Define a link relation on the edges of a graph G:

Two edges e and f are linked if e = f or if G has a simple cycle containing e and f .

Any algorithm that sorts a list or array of size n using comparisons can be modeled as a ____ ____.

decision tree

Link Relation

Used to define biconnected components. The link relation on the edges of a graph is an equivalence relation, and its equivalence classes are the biconnected components.

In an undirected graph the ___ of a vertex v is the number of edges incident on v

degree

In a ______ or _____ the edges have directions

directed graph, digraph

The link components of a connected graph G are the ____ ___ of edges with respect to the link relation

equivalence classes

A ______ ______ is an equation that uses recursion to relate terms in a sequence or elements in an array. It is a way to define a sequence or array in terms of itself.

recurrence relation

5 Steps of Inductive Proof

1. Define some property P(n) that you'll prove by induction (the property is true for 0, and if it holds for n, it also holds for n + 1).
2. State that the proof is by induction (say that P(n) is true for all the numbers you care about, e.g. all natural numbers).
3. State and prove your base case (having defined P(n) in step 1, you now prove P(0)).
4. State and prove the inductive step (assume P(k), then prove P(k+1)).
5. Conclude the proof ("completing the proof by induction").

Basic Properties of Binary Heaps

1. Must be a complete binary tree: all levels are filled except possibly the last, which is filled as far left as possible. 2. Must abide by the Heap Ordering Property: must be a min heap or a max heap. Both properties are considered invariants.

Steps for Counting line intersections:

1. Sort the lines according to the y-coordinate of their intersection with the line x = a. Number the lines in sorted order 2. Produce the sequence of line numbers sorted according to the y-coordinate of their intersection with the line x = b 3. Count/report inversions in the sequence produced in step 2.
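
The three steps above can be sketched in Python. This is a minimal illustration, not the lecture's implementation: the function name `count_crossings` and the (slope, intercept) representation of a line are assumptions, and it assumes no two lines share a y-coordinate at x = a or at x = b. Step 3 uses a quadratic inversion counter for brevity; a modified mergesort would bring it down to O(n log n).

```python
def count_crossings(lines, a, b):
    """Count pairs of lines that cross strictly between x = a and x = b.

    lines is a list of (slope, intercept) pairs (a hypothetical representation).
    """
    def y_at(line, x):
        slope, intercept = line
        return slope * x + intercept

    indices = range(len(lines))
    # Step 1: sort by y at x = a and number the lines in that order.
    by_a = sorted(indices, key=lambda i: y_at(lines[i], a))
    number = {line_index: rank for rank, line_index in enumerate(by_a)}
    # Step 2: list those numbers in the order given by y at x = b.
    by_b = sorted(indices, key=lambda i: y_at(lines[i], b))
    sequence = [number[i] for i in by_b]
    # Step 3: every inversion in the sequence is one crossing.
    return sum(
        1
        for i in range(len(sequence))
        for j in range(i + 1, len(sequence))
        if sequence[i] > sequence[j]
    )


# y = x, y = -x + 2, y = 5
example_lines = [(1, 0), (-1, 2), (0, 5)]
print(count_crossings(example_lines, 0, 3))
```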

Two examples of mergesort used in lecture

1. counting inversions and listing inversions in O(n log n + k) time by modifying mergesort 2. counting line intersections and listing line intersections in O(n log n + k) time, by reducing the problem to inversion counting

Queue Data Structure

A queue stores items in a first-in, first-out (FIFO) order. This usually involves two functions: enqueue, which adds an element at the back, and dequeue, which removes an element from the front. Picture a queue like the line outside a busy restaurant. First come, first served.
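
A short Python sketch of FIFO behaviour, where items join at the back and leave from the front. It uses the standard library's `collections.deque`; the helper name `serve_in_order` is made up for this example.

```python
from collections import deque


def serve_in_order(arrivals):
    """Enqueue every arrival, then dequeue until empty; returns the service order."""
    queue = deque()
    for person in arrivals:
        queue.append(person)            # enqueue: join at the back
    served = []
    while queue:
        served.append(queue.popleft())  # dequeue: leave from the front
    return served


print(serve_in_order(["Ana", "Ben", "Cho"]))
```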

Binary Search

A search algorithm that starts at the middle of a sorted set of numbers and discards the half that cannot contain the target; this process repeats until the desired value is found or all elements have been eliminated. Time: O(log n) Space: O(1)
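
A minimal iterative sketch in Python, assuming the input list is already sorted in ascending order. The function name and the convention of returning -1 when the target is absent are choices made for this example.

```python
def binary_search(items, target):
    """Return an index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1
```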

recurrence relation

An equation that is defined in terms of itself. Any polynomial or exponential can be represented by a recurrence.

Which algorithms can find the connected components of a graph

BFS and DFS (the number of top-level calls of either algorithm equals the number of connected components)
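
A sketch of counting connected components with DFS in Python. It assumes an undirected graph stored as a dictionary mapping each vertex to a list of its neighbours; that representation and the name `count_components` are choices made for this example. Each time the outer loop starts a new search, one more component has been found.

```python
def count_components(adjacency):
    """Count connected components of an undirected graph given as {vertex: [neighbours]}."""
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1          # a new top-level search means a new component
        seen.add(start)
        stack = [start]
        while stack:             # iterative DFS from start
            vertex = stack.pop()
            for neighbour in adjacency[vertex]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components
```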

Are binomial coefficients (n choose k) represented as combinations or permutations?

Combinations. nCk = n!/(k!(n-k)!), where C = number of combinations, n = total number of objects, k = number of objects selected

Equivalence Class

Connected components of vertices and their edges

Difference between DFS in directed graphs and DFS in undirected graphs

DFS for an undirected graph takes 2x the edge checks (each edge is seen from both endpoints), compared to a directed graph, which just checks each edge once

Sorting Terminology

Describes the different ways sorting methods are categorized. Some example categories are: Types of Sorting: - Internal Sorting - External Sorting Sort Stability: - Stable Sort - Unstable Sort

A ____ _____ _____ is a directed graph with no cycles

Directed Acyclic Graph, or DAG

Formula for any directed graph

For a directed graph

Biconnected Graph

Graph G has no separation edges and no separation vertices

Properties required to implement a greedy algorithm

Greedy Choice Property - the algorithm never reconsiders its choices because it doesn't need to. Optimal Substructure Property - the overall optimal solution is composed of optimal solutions to subproblems.

Common examples of greedy Algorithms

Huffman Encoding, Dijkstra's Algorithm, fractional knapsack problem (greedy finds the optimal solution for the fractional version; it is the 0/1 knapsack problem where greedy can fail)

Stable Sorting

If the same element is present multiple times, then they retain their original ordering relative to one another after the data is sorted. Ex: - merge sort - insertion sort - bubble sort

Hash Table Data Structure

Implements an associative array or dictionary (essentially unique key-value pairs), and uses a hash function to compute an index/hash code into an array of buckets from which the value can be found. Collisions are usually a concern, since hash functions are imperfect and can generate the same index for multiple keys.

The sorting algorithm ______ is not optimal if we look at exact number of comparisons

Merge sort (because MergeSort requires 8 comparisons to sort 5 items)

The lower bound of these two sorting algorithms ______ & ______ is asymptotically tight (i.e., cannot be improved asymptotically)

MergeSort , HeapSort

The worst-case running time of _______ and _________ on an input of size n is O(n log n).

MergeSort , HeapSort

Strassen's method

Method of matrix multiplication that uses divide-and-conquer (recursion) and is more efficient because it removes one recursive call. Multiplying 2x2 matrices requires 7 scalar multiplications instead of the 8 that would usually be required. T(n) = 7T(n/2) + O(n^2), so a = 7, b = 2, k = 2; since a > b^k, the running time is O(n^(log_2 7)) ≈ O(n^2.81)

If the median of n numbers can be found in ____ time, then this means that the maxima-set algorithm presented in the text runs in _____ time because...

O(n) , O(n log n) , the preprocessing step involves sorting the points lexicographically (alphabetical order) prior to calling the MaximaSet algorithm on S therefore it takes O(n log n) for preprocessing and O(1) for each recursive call to find the middle of the list.

Conditional Probability

P(A | B) = P(A∩B) / P(B) The probability of event A occurring given event B has already occurred is equal to the probability that event A AND event B occur divided by the probability of event B

Bayes Formula

P(A | B) = P(B | A) ⋅ P(A) / P(B) The probability of event A occurring given event B has already occurred is the product of the probability of event B given event A and the probability of event A divided by the probability of event B
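
A worked example of the formula in Python, using exact fractions. The numbers are made up for illustration: P(A) = 1/100, P(B | A) = 9/10 and P(B | not A) = 1/10. P(B) is obtained by summing over the two cases A and not A.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
p_a = Fraction(1, 100)
p_b_given_a = Fraction(9, 10)
p_b_given_not_a = Fraction(1, 10)

# Total probability: P(B) = P(B | A) P(A) + P(B | not A) P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes(p_b_given_a, p_a, p_b)
print(posterior)
```

Even though B is nine times more likely under A than under not A, the small prior P(A) keeps P(A | B) low.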

Rule of Complementary Events

P(A') + P(A) = 1 The probability of the complement is P(A') = 1 - P(A), where P(A) is the probability of event A occurring

Disjoint Events

P(A∩B) = 0 Events that cannot occur simultaneously (mutually exclusive)

Master Method

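Applies to recurrences of the form: T(n) <= a * T(n/b) + O(n^d). Meaning of the values: a - number of recursive calls (the rate of subproblem proliferation); b - factor by which the input size shrinks in each recursive call; d - exponent of the work done outside the recursive calls (b^d is the rate of work-per-subproblem reduction)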
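
A small Python sketch of how the three standard cases of the master method are selected by comparing a with b^d. The function name `master_bound` and the string output format are assumptions made for this example; it expects integer a, b and d so the comparison is exact.

```python
def master_bound(a, b, d):
    """Return the asymptotic bound for T(n) <= a*T(n/b) + O(n^d) as a string."""
    if a == b ** d:
        # Same amount of work at every level of the recursion tree.
        return f"O(n^{d} log n)"
    if a < b ** d:
        # Work shrinks per level, so the root dominates.
        return f"O(n^{d})"
    # Subproblems multiply faster than work shrinks, so the leaves dominate.
    return f"O(n^log_{b}({a}))"


print(master_bound(2, 2, 1))  # mergesort-style recurrence
print(master_bound(7, 2, 2))  # Strassen-style recurrence
```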

Selection Sort

Repeatedly finds the minimum element from unsorted part and puts it in the beginning (assumes ascending order). This process maintains two subarrays, an unsorted array and a sorted array. Time Worst: O(n^2) Average: Θ(n^2)

Big-Oh Notation

Represents an asymptotic upper bound on a function; most often used to bound the worst-case run-time of an algorithm. O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0 }

Heap Sort

Requires only O(1) extra space. Works like an improved selection sort. It divides its input into a sorted and an unsorted region, and iteratively shrinks the unsorted region by extracting the smallest element and moving it into the sorted region. It uses a heap structure to maintain the unsorted region so the minimum can be found more quickly. Unstable, comparison-based sorting. Time Worst: O(n log n) Average: Θ(n log n)

Priority Queues

Same thing as a queue just with three additional properties: 1. every item has a priority associated with it 2. an element with high priority is dequeued before an element with low priority 3. If two elements have the same priority they are removed according to their order in the queue

Dictionary Data Structure

Stores a group of objects as key value pairs. When the dictionary is presented with a key then it returns an associated value. Dictionaries are generally unordered. This is also known as a hash, a map, or hashmap depending on programming language

Biconnected Components

Subgraphs of a larger graph. Count up the number of biconnected components by identifying the separation vertices and separation edges. A separation edge is its own biconnected component. Subgraphs created by the removal of a separation vertex are biconnected components, and a subgraph that results from the removal of a separation edge is also a biconnected component.

Example of the difference b/w Permutations & Combinations:

Suppose there are 11 runners in a race. Most of the time, all 11 runners aren't given places, just the first 3. There are 11 possibilities for first place, 10 possibilities for second place, and 9 possibilities for third place, or 11 · 10 · 9 = 990 permutations of the first 3 places. If only the set of 3 medalists matters and not their order, each set is counted 3! = 6 times, leaving 990 / 6 = 165 combinations. If all runners were given places, that would lead to 11! orderings, or 39,916,800 possible permutations.
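
The runner counts can be checked with Python's standard `math` module (`math.perm` and `math.comb` are available from Python 3.8). The variable names are chosen for this example.

```python
import math

runners = 11
places = 3

ordered_podiums = math.perm(runners, places)    # order matters: 11 * 10 * 9
unordered_podiums = math.comb(runners, places)  # order ignored: divide by 3!
full_rankings = math.factorial(runners)         # every runner gets a place

print(ordered_podiums, unordered_podiums, full_rankings)
```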

sift up for Binary Heaps

The `SIFT UP' operation starts with a value in a leaf node. It moves the value up the path towards the root by successively exchanging the value with the value in the node above. In a min heap, the operation continues until the value reaches a position where it is no smaller than its parent, or, failing that, until it reaches the root node. Complexity = O(height) = O(log N)

Omega-Notation (Ω)

Represents an asymptotic lower bound on a function; often used to state best-case run-times or lower bounds on problems. Ω(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0 }

Little-Oh Notation (o)

An upper bound that is not asymptotically tight (removes equality from Big-Oh). o(g(n)) = { f(n): for any positive constant c, there exists a positive constant n0 such that 0 ≤ f(n) < cg(n) for all n ≥ n0 }

Combinations

The number of different ways in which objects can be selected from a particular set without regard to order, usually in subsets smaller than the set itself. (Never more than the number of permutations of the same size.)

Permutation

The number of ways objects can be ordered that can be created from a particular set.

nth Harmonic Number

The sum of reciprocals of the first n natural numbers: Hn = Sum from k=1 to n of 1/k

When to use recursion

When backtracking steps need to be taken, like exploring a maze or a tree (it's more difficult to think about how a loop would be able to create this functionality)

replacement selection sort

A sub-section of a large array is held in memory. The least element is selected and appended to an output array that is already sorted, and the next element of the large array takes its place in the in-memory sub-array. Then the least in-memory element that is not smaller than the last element written to the sorted output is appended, and this repeats until no in-memory element can extend the output. The sorted array that results is called a run. The remaining elements then go through the same process to build the next run. Once the entire large array has been turned into separate runs, the runs are merged into larger sorted groups, and this repeats until all the contents are merged. Makes heavy use of min-heaps while building the initial runs and during the merging phase.

Factorials

n! = n · (n − 1) · ... · 1, the number of possible orderings of n distinct objects

separation edge

an edge whose removal disconnects a graph

For some purposes, we can simulate an undirected edge using a pair of _____ directed edges

antiparallel

G is _____ if for any two vertices u, v ∈ V(G)

biconnected (or 2-connected) In order to be biconnected, a graph must be connected.

A _____ ____ is an undirected graph with an edge between every pair of vertices.

complete graph A complete graph on n vertices has m = n choose 2 edges

biconnected component analysis can be done in the same asymptotic time as ____ _____ _____.

connected component analysis

A ____ is a set of objects, called ____, and a set of pairs of objects, called _____.

graph , vertices, edges

In a directed graph the _____

indegree

An _____ in a sequence or list is a pair of items such that the larger one precedes the smaller one.

inversion

Sorting is the process of removing ______.

inversions

To count _____: - Run a sorting algorithm - Every time data is rearranged, keep track of how many inversions are being removed. In principle, we can use any sorting algorithm to count _______. Mergesort works particularly nicely.

inversions, inversions
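
A minimal Python sketch of counting inversions with a modified mergesort. The function name `sort_and_count` is an assumption made for this example. Whenever the merge step takes an item from the right half, that item is smaller than every item still waiting in the left half, so that many inversions are removed at once.

```python
def sort_and_count(items):
    """Return (sorted copy of items, number of inversions in items)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, left_count = sort_and_count(items[:mid])
    right, right_count = sort_and_count(items[mid:])

    merged = []
    count = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] precedes all remaining left items: one inversion each.
            merged.append(right[j])
            j += 1
            count += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count
```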

A ______ is an edge with both endpoints the same

loop (sometimes called a self-loop)

To list inversions using a slight modification to _____ we can report inversions in _____ time.

mergesort, O(n log n + k) where k is the number of inversions

It is possible to define a ____ graph with some edges undirected, others directed.

mixed

A _____ in a directed graph is a pair of edges with the same endpoints and the same direction.

multiedge

A _____ in an undirected graph is a pair of edges with the same endpoints.

multiedge

In a directed graph the ____ of a vertex is the number of outgoing edges.

outdegree

Any comparison-based algorithm for sorting a list of size n must perform at least ______ comparisons in the worst case.

the ceiling of log base 2 of n factorial, ⌈log₂(n!)⌉

A ___ ____ is an edge whose removal causes G to become disconnected.

separation edge

A ___ ____ is a vertex whose removal causes G to become disconnected.

separation vertex

A _____ ____ is a graph that has no loops and no multiedges.

simple graph

A directed graph G is ____ _____ if there is a path from every vertex in G to every other vertex in G.

strongly connected

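Formula relating vertex degrees to the number of edges

sum over all vertices v of deg(v) = 2m, where m is the number of edges (each edge contributes to the degree of both of its endpoints)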

Asymptotic Notation

the classification of runtime complexity that uses functions that indicate only the growth rate of a bounding function

Any algorithm for locating an item in an array of size n using only comparisons must perform at least __________ comparisons in the worst case

the floor of (log base 2 of n) +1

Since the decision tree is a binary tree with n nodes, the depth is at least ______.

the floor of log base 2 of n

How is the improved bicomponent algorithm an improvement over the preliminary bicomponent algorithm?

the preliminary algorithm checks every edge that's connected to a cycle, the improved algorithm takes note of which edges contribute to a cycle and

In an ____ ____ the edges do not have directions.

undirected graph

separation vertex

vertex whose removal disconnects a graph

Any comparison-based algorithm for sorting a list of size n must perform at least ________ comparisons in the worst case.

Ω(n log n)

Comparison-based sorting has a lower bound of ________ comparisons.

Ω(n log n)

sift down for Binary Heaps

The DELETE operation is based on `SIFT DOWN'. SIFT DOWN starts with a value in any node. It moves the value down the tree by successively exchanging the value with the smaller of its two children. The operation continues until the value reaches a position where it is less than both its children, or, failing that, until it reaches a leaf. Complexity = O(height) = O(logN)

the median of n numbers can be found in _____time

O(n)

T/F: greedy algorithms are faster than dynamic programming for optimization.

True

Internal Sorting Algorithms

- heap sort - bubble sort - selection sort - quick sort - insertion sort

Bucket Sort

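A sorting algorithm that works by distributing the elements of an array into a number of buckets; the buckets are then sorted individually. The distribution step is not comparison based, although the buckets themselves are often sorted with a comparison sort. Time Worst: O(n^2) (all elements land in one bucket) Average: Θ(n+k)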

Address-Calculation Sorting

A sorting algorithm which uses knowledge of the domain of items to calculate the position of each item in the sorted array. Ex: - counting sort - bucket sort - radix sort

Stack Data Structure

A stack stores elements linearly and follows the First-In-Last-Out principle (FILO, also called LIFO). Often implemented as a class, it has two functions to insert and remove elements: PUSH to insert at the top, and POP to remove from the top.

Unstable Sorting

After sorting, elements with equal keys may be in a different order relative to one another than they were before sorting. Ex: - quick sort - heap sort - shell sort

Internal Sorting:

All data is put in internal memory and then sorted. Ex: - heap sort - bubble sort - selection sort - quick sort - insertion sort - shell sort

Counting Sort

An algorithm for sorting a collection of objects according to keys that are small integers; an integer sorting algorithm. It operates by counting the number of objects that have each distinct key value, and using arithmetic on those counts to determine the positions of each key value in the output sequence. It is only suitable for direct use in situations where the variation in keys is not significantly greater than the number of items. However, it is often used as a subroutine in another sorting algorithm, radix sort, that can handle larger keys more efficiently. Time Worst: O(n+k) Average: Θ(n+k)
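
A short Python sketch of the counting and arithmetic described above, assuming the keys are integers in the range 0 to k - 1. The function name and the parameter k are choices made for this example.

```python
def counting_sort(keys, k):
    """Sort a list of integer keys in range(k) in O(n + k) time."""
    # Count how many times each key value occurs.
    counts = [0] * k
    for key in keys:
        counts[key] += 1

    # Running totals give the first output position for each key value.
    starts = [0] * k
    total = 0
    for value in range(k):
        starts[value] = total
        total += counts[value]

    # Place each key at its position; scanning left to right keeps it stable.
    output = [None] * len(keys)
    for key in keys:
        output[starts[key]] = key
        starts[key] += 1
    return output
```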

Insertion Sort

An array is split into a sorted and unsorted part and elements from the unsorted part are placed in the correct places in the sorted part. Time Worst: O(n^2) Average: Θ(n^2)

Inversion

An inversion is a pair of items in a list such that the larger one comes before the smaller one; removing inversions brings the list closer to sorted order. To list them all: starting from the leftmost element, pair it with each element to its right that is smaller. Once all of its pairs are listed, move on to the second element from the left and pair it only with smaller elements to its right, and so on, until the second-to-last element has been processed.

Array Data Structure

Arrays are defined as the collection of similar types of data items stored at contiguous memory locations. It is one of the simplest data structures where each data element can be randomly accessed by using its index number.

When to use Bayes Formula?

Bayes theorem is used to find the reverse probabilities if we know the conditional probability of an event. Hence why the reverse conditional probability is in the function.

Insertion for Binary Heaps

Because our heaps are complete trees, we know where the new node must go. We have no choice: it must go in the bottom level, as far left as possible. The new value is placed in this node. We then check if the resulting tree is a heap: the place chosen for the new node guarantees that the structural property will be satisfied, but the ordering property might be violated. The ordering property is re-established by the `SIFT UP' operation. In short: add the new node at the leftmost free position of the bottom level, then compare it to its parent, swapping upward as needed to abide by the heap ordering property.
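
A minimal Python sketch of insertion into an array-based min heap. It assumes the usual array layout in which the parent of index i is at (i - 1) // 2; the function names are choices made for this example. Appending to the list is what puts the new value in the bottom level, as far left as possible.

```python
def sift_up(heap, i):
    """Move heap[i] up until its parent is no larger (min-heap order)."""
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] >= heap[parent]:
            break
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent


def heap_insert(heap, value):
    """Insert value into the min heap stored in the list heap."""
    heap.append(value)            # keeps the tree complete
    sift_up(heap, len(heap) - 1)  # restores the ordering property


heap = []
for value in [5, 3, 8, 1]:
    heap_insert(heap, value)
print(heap)
```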

Comparison Based Sorting

Determining order by comparing pairs of elements usually with a less than or equal to operator. Ex: - quick sort - heap sort - merge sort - insertion sort - selection sort - bubble sort

Merge Sort

Divide and Conquer paradigm: an array is divided in two (repeatedly, until individual elements are reached) and the pieces are then combined into sorted pairs, sets of four, etc. The individual halves are sorted and then merged, level by level, until the entire array has been sorted. Usually implemented recursively. Time Worst: O(n log n) Average: Θ(n log n)
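
A minimal recursive sketch in Python that returns a new sorted list rather than sorting in place; the function name and that design choice are assumptions made for this example. Taking from the left half on ties keeps the sort stable.

```python
def merge_sort(items):
    """Return a new list with the elements of items in ascending order."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= prefers the left half on ties
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```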

Theta Notation (Θ)

Encloses the function/run-time from above and below (an asymptotically tight bound); often quoted for the average-case complexity of an algorithm, although it can describe any case. Θ(g(n)) = { f(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1 * g(n) ≤ f(n) ≤ c2 * g(n) for all n ≥ n0 } Note: Θ(g) is a set

Heapify

Used when a binary heap doesn't abide by the max or min heap property; also the process of creating a heap data structure from a binary tree represented as an array. Heapify uses recursion. In a min heap, an out-of-place node is swapped along its path until the min heap state is achieved; in a max heap, it is swapped until the max heap state is achieved. The primary use of such a data structure is to implement a priority queue.

polyphase merge sort

It is a version of mergesort that uses loops instead of recursion to perform essentially the same process; the benefit is that it is more efficient than mergesort because it doesn't use recursion. Not stable; an external sorting algorithm.
Setup: 1. n records (items) in the file 2. m records can fit in memory at once (m ≪ n) 3. f input files can be open at once.
Steps:
- read in groups of m records
- sort each group
- write each sorted group to a separate output file
- iterate: choose f files, merge the contents of the f input files into a new output file, delete the f input files

Quick Sort

PIVOT: a sorting technique that moves elements around a pivot and recursively sorts the elements to the left and the right of the pivot. Left are smaller, right are larger.
1. Select a pivot (strive for the middle value) and move it to the back.
2. Find the first element from the left that is larger than the pivot, and the first element from the right that is smaller than the pivot.
3. Swap the item from the left with the item from the right; repeat until the left position passes the right position.
4. Swap the item at the left position with the pivot.
5. Repeat on the left and right sub-arrays, selecting a pivot and swapping.
Time Worst: O(n^2) Average: Θ(n log n)
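
An in-place Python sketch of this partitioning scheme. It is one possible reading of the steps, not a canonical implementation: it takes the middle position as the pivot, and scans stop at elements equal to the pivot so duplicates are handled. The function signature is an assumption made for this example.

```python
def quick_sort(items, lo=0, hi=None):
    """Sort the list items in place between indices lo and hi (inclusive)."""
    if hi is None:
        hi = len(items) - 1
    if lo >= hi:
        return

    # Step 1: pick the middle element as pivot and move it to the back.
    mid = (lo + hi) // 2
    items[mid], items[hi] = items[hi], items[mid]
    pivot = items[hi]

    left, right = lo, hi - 1
    while True:
        # Step 2: scan for an out-of-place item from each side.
        while left <= right and items[left] < pivot:
            left += 1
        while right >= left and items[right] > pivot:
            right -= 1
        if left >= right:
            break
        # Step 3: swap the pair and keep scanning.
        items[left], items[right] = items[right], items[left]
        left += 1
        right -= 1

    # Step 4: put the pivot between the two parts.
    items[left], items[hi] = items[hi], items[left]

    # Step 5: recurse on both sides of the pivot.
    quick_sort(items, lo, left - 1)
    quick_sort(items, left + 1, hi)
```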

Radix Sort

Non-comparative integer sorting algorithm that sorts data with integer keys by grouping keys by the individual digits which share the same significant position and value. Two classifications of radix sorts are least significant digit (LSD) radix sorts and most significant digit (MSD) radix sorts. Time Worst: O(nk) = O(d(n + b)). d= # fields we're sorting b = size of each range n = number of items Average: Θ(nk)

For counting line intersections: Asymptotically how long does it take to sort the lines according to the y-coordinate of their intersection with the line x = a. Number of the lines in sorted order (step 1)

O(n log n) time

For counting line intersections: Produce the sequence of line numbers sorted according to the y-coordinate of their intersection with the line x = b (step 2)

O(n log n) time

independent events

P(A∩B) = P(A) ⋅ P(B) The outcome of one event does not affect the outcome of the second event The probability of event a AND event b occurring is the product of the probability of the individual events

Rule of Addition

P(A∪B) = P(A) + P(B) - P(A∩B) Probability of event A OR event B occurring: Where the union of event A and event b is the sum of the probabilities of A and B while subtracting any overlapping probability between the two events that would be double counted (the instance in which both events occur)

External Sorting:

Performed for massive amounts of data, data that needs to be sorted and cannot be placed in memory all at the same time. data are sorted one small segment at a time and then stored into temporary memory. Ex: - merge sort - tag sort - external radix sort - polyphase merge sort - replacement selection sort

greedy algorithm

an algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage, in the hope of finding the global optimum.

Deletion for Binary Heaps

Binary heaps are designed to give access to the min/max element at the top, so we're only allowed to delete that node. Move the bottom rightmost node up to be the root and then swap from top to bottom, swapping the node with its smaller child (this process conveniently maintains the correct shape while re-establishing the heap ordering property).
1. Delete the root.
2. Move the bottom rightmost value to the root (maintains the complete tree property, i.e. the shape).
3. Swap down, following the least child, until the node is in the right position.
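
A minimal Python sketch of deleting the root of an array-based min heap. It assumes the usual array layout in which the children of index i are at 2i + 1 and 2i + 2; the function names are choices made for this example, and calling `delete_min` on an empty list raises `IndexError`.

```python
def sift_down(heap, i):
    """Move heap[i] down, swapping with its smaller child, until order holds."""
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest


def delete_min(heap):
    """Remove and return the smallest value of the min heap stored in heap."""
    root = heap[0]
    last = heap.pop()        # bottom rightmost value; keeps the tree complete
    if heap:
        heap[0] = last       # move it to the root
        sift_down(heap, 0)   # restore the ordering property
    return root


heap = [1, 3, 8, 5]
smallest = delete_min(heap)
print(smallest, heap)
```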

_____ ________ is optimal with respect to worst-case performance.

binary search

Any algorithm that searches for an item x in an array A of size n by comparing entries in A against x can be modeled as a _____ _____.

decision tree

Any algorithm for searching an array of size n can be modeled by a _____________________________________________.

decision tree with at least n nodes

The worst-case number of comparisons for the algorithm is the ____________________________.

depth of the decision tree + 1. (Remember, root has depth 0).

To count inversions using a slight modification to _____ we can count inversions in _____ time.

mergesort, O(n log n)

In a list of size n, there can be as many as ______ inversions.

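n choose 2 = n(n-1)/2 (every pair of items is an inversion when the list is in reverse sorted order). In general nCk = n!/(k!(n-k)!), where C = number of combinations, n = total number of objects, k = number of objects selected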

Combination Formula

nCr = n!/(r!(n-r)!), where C = number of combinations, n = total number of objects, r = number of objects selected

Permutation Formula

nPr = n!/(n-r)! P = # of possible different arrangements n = total number of objects r = the number of objects selected

The reporting algorithm (aka listing inversions) is an example of an ____-____ algorithm.

output-sensitive The performance of the algorithm depends on the size of the output as well as the size of the input.

Sorting is the problem of finding a particular distinguished ______ of a list.

permutation

A ________ of a sequence of items is a reordering of the sequence. A sequence of n items has ____ distinct _____.

permutation, n!, permutations

