COMP 3270-002 - Exam 1 - Dr. Heaton
Steps of Insertion Sort
At each step, take the next element and insert it into its sorted position in the left (already sorted) subarray, shifting the larger elements of that subarray one position to the right.
Lower bound of a binary tree
A binary decision tree with n+1 leaf nodes has depth at least log(n+1), so the lower bound on this type of search is Ω(log n)
What is the Implementation: Dynamic Array Stack
A dynamic array stack is an efficient LIFO (Last In, First Out) data structure that takes advantage of the dynamic array's ability to perform insert_last and delete_last operations in O(1) time (amortized).
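A minimal sketch in Python (the class name ArrayStack is illustrative; Python's list is itself a dynamic array, so append and pop at the end give the O(1) amortized behavior described above):

```python
class ArrayStack:
    """LIFO stack backed by a dynamic array (here, a Python list)."""
    def __init__(self):
        self._data = []  # dynamic array; end operations are O(1) amortized

    def push(self, x):
        self._data.append(x)     # insert_last: O(1) amortized

    def pop(self):
        return self._data.pop()  # delete_last: O(1)

    def peek(self):
        return self._data[-1]    # read top without removing

    def __len__(self):
        return len(self._data)
```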
What is the Implementation: Linked List Queue
A queue that uses a linked list, typically with a tail pointer so that enqueue at the tail and dequeue at the head are both O(1)
What is an algorithm?
A series of steps that takes an input and generates a correct output for a computational problem. It can be viewed as a function from inputs to outputs; for each input there is at least one correct output.
How can algorithms be written in non-destructive mode?
All algorithms can be written in non-destructive mode by first copying the data and operating on the copy.
What is the Implementation: Array Queue
An Array Queue is a queue data structure implemented using a contiguous block of memory, typically an array, to store elements.
What is the Implementation: Dynamic Array Sequence
An array-based sequence implementation that is efficient for insert_last and delete_last. Allocate more memory than you currently need; the most common policy is to double the capacity when you run out of space.
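The doubling policy can be sketched in Python (the names DynamicArray and insert_last are illustrative; a real implementation manages raw memory rather than a pre-sized list):

```python
class DynamicArray:
    """Sketch of a doubling dynamic array."""
    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._slots = [None] * self._capacity

    def insert_last(self, x):
        if self._size == self._capacity:
            # out of space: allocate double the capacity and copy (O(n), but rare)
            self._capacity *= 2
            new_slots = [None] * self._capacity
            for i in range(self._size):
                new_slots[i] = self._slots[i]
            self._slots = new_slots
        self._slots[self._size] = x
        self._size += 1

    def get(self, i):
        return self._slots[i]
```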
Decision Tree view of algorithm
Any algorithm can be viewed as a decision tree: a path of execution with branches at conditional statements. The leaf nodes are the results. The tree is binary because comparison operators only return true or false, making each branch binary.
What two properties does an algorithm have?
Arbitrary input length and fixed algorithm length.
What are the two storage methods?
Array-based and pointer-based
Common Implementations of a Sequence
Array, Linked List, Vector, Deque_vec, doubly linked list
Direct Access Array
Assume that the keys are unsigned integers, meaning the keys can be mapped onto word-length unsigned integers. Let u represent the size of the universe of possible keys, such that any key k is in the range k ∈ [0,u). Now we can just allocate an array of size u and store the element with key ki at array[ki]. Suddenly contains_key(k) and get(k) run in constant time! The trade-off is that it takes a lot of memory to store.
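A sketch in Python (DirectAccessArray is an illustrative name; a Python list stands in for the raw array of size u):

```python
class DirectAccessArray:
    """Keys are unsigned ints in [0, u); one slot per possible key."""
    def __init__(self, u):
        self.slots = [None] * u  # O(u) memory, even if few keys are stored

    def insert(self, k, value):
        self.slots[k] = value    # O(1): key is the index

    def get(self, k):
        return self.slots[k]     # O(1)

    def contains_key(self, k):
        return self.slots[k] is not None  # O(1)
```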
What do we use to measure efficiency?
Asymptotic Notation
Why is Quick Sort a randomized algorithm?
Avoids the worst-case runtime of O(n^2) for adversarial data
Steps for Bubble Sort
Bubble sort works by comparing adjacent elements and swapping them if the right element is less than the left element, making up to n - 1 passes over the array
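The passes above can be sketched as runnable Python (a straightforward translation, including the early exit when a pass makes no swaps):

```python
def bubble_sort(A):
    """Destructive, in-place bubble sort with early exit."""
    n = len(A)
    for _ in range(n - 1):          # at most n - 1 passes
        any_change = False
        for i in range(n - 1):
            if A[i + 1] < A[i]:
                A[i], A[i + 1] = A[i + 1], A[i]  # swap adjacent pair
                any_change = True
        if not any_change:          # already sorted: stop early
            break
    return A
```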
Steps of Quick Sort
Choose a pivot (First, last, or random) Partition the array versus the pivot Recursively do the same to the subarrays on each side
How do you handle a collision?
Option 1: store another data structure at array[h(ki)] that can hold multiple items. When we do a get or contains_key for a key that has collided, we have to search within that inner data structure. This is known as chaining. Option 2: find another location for the item. There are many ways to achieve this, and it is what is done in practice, but it gets complicated and messy. This is known as open addressing.
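Chaining can be sketched in Python (ChainedHashMap is an illustrative name; each slot holds a list of (key, value) pairs that is searched on lookup):

```python
class ChainedHashMap:
    """Hash map resolving collisions by chaining: one bucket list per slot."""
    def __init__(self, m=8):
        self.m = m
        self.chains = [[] for _ in range(m)]  # inner structure per slot

    def _bucket(self, k):
        return self.chains[hash(k) % self.m]  # slot for key k

    def insert(self, k, v):
        bucket = self._bucket(k)
        for i, (key, _) in enumerate(bucket):
            if key == k:              # key already present: overwrite
                bucket[i] = (k, v)
                return
        bucket.append((k, v))

    def get(self, k):
        for key, v in self._bucket(k):  # search within the chain
            if key == k:
                return v
        return None
```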
Steps of Selection Sort
Find the smallest element in the array, put it at position 0, Find the next smallest element, put it at position 1, Repeat until array is sorted.
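Those steps translate directly to Python (selection_sort is a plain translation of the description):

```python
def selection_sort(A):
    n = len(A)
    for pos in range(n - 1):
        mindex = pos
        for i in range(pos + 1, n):   # find smallest in A[pos..n-1]
            if A[i] < A[mindex]:
                mindex = i
        A[pos], A[mindex] = A[mindex], A[pos]  # put it at position pos
    return A
```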
What is the implementation: Sorted Array backed set
In a sorted array-backed set, the keys are stored in a sorted array, which allows for efficient searching using binary search. However, the trade-off is that insertion and deletion are more expensive since the elements must remain sorted, requiring shifting elements in the array.
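A sketch using Python's standard bisect module for the binary search (SortedArraySet is an illustrative name):

```python
import bisect

class SortedArraySet:
    """Set of unique keys kept in a sorted list."""
    def __init__(self):
        self.keys = []

    def insert(self, k):
        i = bisect.bisect_left(self.keys, k)   # O(log n) to find the spot
        if i == len(self.keys) or self.keys[i] != k:
            self.keys.insert(i, k)             # O(n): shifts elements right

    def contains_key(self, k):
        i = bisect.bisect_left(self.keys, k)   # O(log n) binary search
        return i < len(self.keys) and self.keys[i] == k
```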
How do you prove correctness of an algorithm for arbitrary input size?
Induction Proofs are used to prove correctness of an algorithm
What are the two classes of randomized algorithms?
Las Vegas algorithms: algorithms that involve randomness, but guarantee an optimal solution. However, its runtime may depend on the randomness. Monte Carlo algorithms: has a fixed or bound runtime, but its correctness is a function of the randomness.
What is the benefit of a consecutive chunk of memory?
It makes accessing any element at a specific index very efficient, since the address of each element can be computed directly using the formula: address of element = base address of array + i × sizeof(type)
What is an in-place algorithm?
an algorithm that only uses O(1) extra memory
What is a destructive algorithm?
an algorithm that overwrites the input array
What is Divide and Conquer?
an algorithmic paradigm in which we divide the problem into two or more subproblems recursively until they are small enough to solve trivially. The subproblems are then combined into a solution of the larger problem.
Time Complexities of Implementation: Array Sequence
build - O(n). get(i) and set(i,x) - O(1), because we can compute the address of the element as address of array + i * sizeof(type) and then read or write that memory address with O(1) basic operations. insert(i,x), delete(i), insert_last(x), delete_last(), insert_first(x), delete_first() - all O(n): we always need to shift items and/or allocate a new array and copy items.
Time Complexity of Implementation: Sorted Array backed set
build() - O(n log n). get(k)/contains_key(k) - binary search! O(log n). insert(k), delete(k) - O(n): finding the location is O(log n), but then we need to shift all of the later elements over, which is O(n).
Time Complexity of Implementation: Linked List Sequence
build() - O(n). get(i), set(i,x) - O(n), much worse than array-based. insert_first(x), delete_first() - O(1), much better than array-based. insert(i,x), delete(i) - O(n), only because you have to traverse the list to find the item; once found, it is constant additional work. insert_last(x) - O(n) by default, but O(1) if the list keeps a tail pointer. delete_last() - O(n) by default; a doubly linked list with a tail pointer makes it O(1) as well.
Core supported operations of Sets and Maps
build(x,y,z,...) - no order to items, unlike sequences. contains_key(k) - checks if the set or map contains the key k; returns True if the key exists in the structure, and False otherwise. get(k) (for maps) - retrieves the value associated with the key k; if the key does not exist, it either returns None or raises an error. insert(x) - insert an item, often insert(k, x) for maps; rarely but sometimes called push. delete(k) - deletes the item at key k.
Core supported operations of a Priority Queue
build(x0..xn−1). enqueue(x) - add item to the queue; the priority value is often the item value, but could be another value. Sometimes this is just called push. dequeue() - get and remove an item in priority order; sometimes called pop() or pop_min() for a min-priority queue. Often also a peek function.
Core supported operations of a Queue
build(x0..xn−1). enqueue(x) - add item to the queue, sometimes called push. dequeue() - get and remove an item in FIFO order, sometimes called pop. There is also sometimes a peek.
Core supported operations of a Stack
build(x0..xn−1). push(x) - push an item to the top of the stack. pop() - get and remove the item from the top of the stack. (Sometimes also peek(), but that's just a pop followed by a push.)
Core supported operations of a Sequence
build(x0..xn−1) - Constructs the sequence from a given set of items. get(i) - Retrieves the value at position i. set(i, x) - Sets the value at position i to x. iter() - Iterates through the items one by one in sequence order. len() - Returns the number of items stored.
Time Complexity of Implementation: Dynamic Array Sequence
build(x0..xn−1) - O(n). get(i)/set(i,x) - O(1). insert_last(x)/delete_last() - O(1)* (*amortized). insert_first(x)/delete_first() - O(n): must shift items before insert or after delete; this can be improved to amortized constant with a deque_vec implementation. insert(i,x)/delete(i) - O(n): must shift items after delete or prior to insertion.
describe array based storage
contiguous blocks of memory - this is the most common - often have to shift elements or copy elements over to a new location
Optional Dynamic Operations of a Priority Queue
decrease_key(i, k) - in min-priority queues with indexed values, this decreases the priority value of item i to priority k. This is a necessary function for certain algorithms such as Dijkstra's, Prim's, and A*. It can be an expensive operation in some implementations and efficient in others.
Optional Dynamic Operations of a Sequence
delete(i): Removes the item at position i, shifting subsequent elements if necessary. insert(i, x): Inserts an item x at position i, shifting subsequent elements to make space. delete_last() (often called pop): Removes the last item in the sequence. insert_last(x) (often called append, push, or push_back): Adds an item x to the end of the sequence. delete_first(): Removes the first item in the sequence. insert_first(x): Adds an item x to the beginning of the sequence.
Time Complexities of Simple array backed set
get(k), delete(k), and contains_key(k) must search the entire array, giving O(n) time. build is O(n). Because we have stipulated that keys are unique, insert also needs to first determine if the key already exists, making it O(n) as well.
Link to Bubble Sort
https://wheaton5.github.io/PlayBubblesort.mp4
Link to Insertion Array
https://wheaton5.github.io/PlayInsertionsort.mp4
Link for Merge Sort
https://wheaton5.github.io/PlayMergesort.mp4
Under the Comparison computational model, how can we differentiate items?
only using comparisons like <, >, <=, >=, !=, ==
Binary Search Pseudo Code
procedure binary_search(A, x) if len(A) == 0 then return false mid <- len(A)/2 if x == A[mid] then return true else if x < A[mid] then return binary_search(A[0...mid], x) else return binary_search(A[mid+1...len(A)], x)
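The pseudocode corresponds to this runnable Python version, which passes index bounds instead of slicing so no array copies are made (the lo/hi parameters are an implementation choice, not part of the card):

```python
def binary_search(A, x, lo=0, hi=None):
    """Recursive binary search on a sorted list; returns True/False."""
    if hi is None:
        hi = len(A)
    if lo >= hi:                 # empty range: not found
        return False
    mid = (lo + hi) // 2
    if x == A[mid]:
        return True
    elif x < A[mid]:
        return binary_search(A, x, lo, mid)      # search left half
    else:
        return binary_search(A, x, mid + 1, hi)  # search right half
```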
Pseudo Code for Bubble Sort
procedure bubble_sort(A) for iteration <- 0 to n do any_change <- false for i <- 0 to n-1 do if A[i + 1] < A[i] then swap (A, i, i + 1) any_change <- true if not any_change then break
Insertion Sort Pseudo Code
procedure insertion_sort(A) for i <- 1 to n - 1 do tmp <- A[i] j <- i - 1 while j >= 0 and A[j] > tmp do A[j + 1] <- A[j] j <- j - 1 A[j + 1] <- tmp
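A runnable Python translation (the standard shift-based insertion sort; variable names follow the pseudocode):

```python
def insertion_sort(A):
    for i in range(1, len(A)):
        tmp = A[i]                    # next element to place
        j = i - 1
        while j >= 0 and A[j] > tmp:  # shift larger elements right
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = tmp                # drop tmp into its sorted position
    return A
```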
Pseudo Code for Merge Sort (indentions don't work for quizlet)
procedure merge(L, R, A) l, r, a <- 0 // pointers for each array while l < len(L) or r < len(R) do if l == len(L) then A[a] <- R[r++] else if r == len(R) then A[a] <- L[l++] else if L[l] <= R[r] then A[a] <- L[l++] else A[a] <- R[r++] a += 1 procedure merge_sort(A) if len(A) <= 1 then return L, R <- A[0:len(A)/2], A[len(A)/2:len(A)] merge_sort(L) merge_sort(R) merge(L, R, A)
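A runnable Python translation of the two procedures (merge_sort sorts A in place by recursing on copies of the two halves, then merging them back into A):

```python
def merge(L, R, A):
    """Merge sorted lists L and R into A using two pointers."""
    l = r = a = 0
    while l < len(L) or r < len(R):
        if l == len(L):              # L exhausted: take from R
            A[a] = R[r]; r += 1
        elif r == len(R):            # R exhausted: take from L
            A[a] = L[l]; l += 1
        elif L[l] <= R[r]:           # <= keeps the sort stable
            A[a] = L[l]; l += 1
        else:
            A[a] = R[r]; r += 1
        a += 1

def merge_sort(A):
    if len(A) <= 1:
        return                        # base case: already sorted
    mid = len(A) // 2
    L, R = A[:mid], A[mid:]           # split into two halves (copies)
    merge_sort(L)
    merge_sort(R)
    merge(L, R, A)                    # merge sorted halves back into A
```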
Pseudo Code for Quick Sort
procedure quick_sort(A) pivot <- rand_int(0, n) pivot_pos <- partition(A, pivot) quick_sort(A[0:pivot_pos]) quick_sort(A[pivot_pos + 1 : n]) procedure partition(A, pivot) swap(A, pivot, 0) pivot_val <- A[0] low <- 1 high <- n - 1 while low <= high do while low <= high and A[low] <= pivot_val do low += 1 while high >= low and A[high] > pivot_val do high -= 1 if low < high then swap(A, low, high) swap(A, 0, high) return high
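A runnable Python sketch (this version passes lo/hi bounds rather than slicing, and picks the pivot with random.randint; both are implementation choices, not from the card):

```python
import random

def partition(A, lo, hi):
    """Move values <= pivot left of it, > pivot right; return pivot's index."""
    p = random.randint(lo, hi)       # random pivot avoids adversarial input
    A[lo], A[p] = A[p], A[lo]        # move pivot to the front
    pivot_val = A[lo]
    low, high = lo + 1, hi
    while low <= high:
        while low <= high and A[low] <= pivot_val:
            low += 1
        while high >= low and A[high] > pivot_val:
            high -= 1
        if low < high:
            A[low], A[high] = A[high], A[low]
    A[lo], A[high] = A[high], A[lo]  # pivot into its final sorted position
    return high

def quick_sort(A, lo=0, hi=None):
    if hi is None:
        hi = len(A) - 1
    if lo < hi:
        p = partition(A, lo, hi)
        quick_sort(A, lo, p - 1)     # recurse on left subarray
        quick_sort(A, p + 1, hi)     # recurse on right subarray
```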
Selection Sort Pseudo Code
procedure swap(A, i, j) tmp <- A[i] A[i] <- A[j] A[j] <- tmp procedure selection_sort(A) for pointer <- 0 to n - 1 do mindex <- pointer for i <- pointer to n - 1 do if A[i] < A[mindex] then mindex <- i swap(A, mindex, pointer)
Time Complexity of Implementation: Dynamic Array Stack
push/pop - O(1)
Time Complexity of Implementation: Linked List Stack
push/pop - O(1) insert_last / delete_last are also O(1)
what is data structure (implementation)?
refers to a particular implementation of an interface; different implementations may have different run times for the operations involved.
What is abstract data structures?
refers to the set of operations a data structure allows, sometimes also known as the interface (ex. append, get_min, remove(i), etc.)
What is the purpose of Data Structures?
store data in a way that efficiently supports how we want to access and change that data.
What is the Abstract data structure: Priority Queue
stores items with some ordering according to some priority value
What does it mean for a computational problem to have an arbitrary input size?
the problem can handle inputs of any size or length. This makes the problem scalable.
What is a problem with Hashing
two unique keys ki and kj could map to the same location in the array. This is known as a collision.
What is the difference between abstract data structures and data structures?
we can think of abstract data structures as the problem of how we want to be able to access and edit data vs implementations as the solution to how we will store, access, and manipulate that data to solve the problem
What do we do if an algorithm is a constant time factor?
we ignore that in asymptotic notation
Link for Quick Sort
https://wheaton5.github.io/PlayQuicksort.mp4
Link to Selection Sort
https://wheaton5.github.io/PlaySelectionsort.mp4
Why use swap(A, i, j) instead of insert(x, i)?
insert(x, i) is an O(n) operation because it must shift the elements. Swapping items can be done in O(1) time. (Super simple, but remember to use a temp variable so you don't overwrite the elements in the array.)
What are some constant time operations?
integer and floating point arithmetic (add, sub, mult, and div); logical operations (and, or, not, ==, <, >, <=, >=); bitwise operations (&, |, <<, >>, etc.); given a memory address a, read(a) and write(a) for a word-size number of bits (64 bits on modern computers). This means that accessing an index of a list is O(1).
Implementation of Simple array backed set
just an array with keys.
describe pointer based storage
non contiguous memory - More Flexible - No random access unless you have the address - in addition to the data, we need to store memory addresses of other data points
Insertion Runtime
O(n) outer loop × O(n) inner shifting = O(n^2)
Steps of Merge Sort
Merge sort proceeds by first recursively splitting the data until we hit our base case of singleton arrays. Once we return up the call stack, we have two sorted arrays, which is exactly what the merge procedure requires to be efficient. Merge is a simple procedure making use of two pointers, one for each sorted subarray. At each step, you take the minimum element from the two sorted arrays at their respective pointer locations and add it to the merged array. You then increment the pointer associated with the value you just added. Proceed until all elements from both sorted arrays have been added to the merged array.
What are some common asymptotic runtimes?
O(1) - constant O(log n) - logarithmic O(n) - linear O(n log n) - log linear O(n^2) - quadratic O(n^c) - polynomial (where c is an integer greater than 2) O(2^n) - exponential
Selection sort runtime?
O(n^2)
Bubble Sort Runtime
O(n^2) - can break early if lucky
What is partitioning
Partitioning is the idea of choosing a "pivot" value and moving all values less than that pivot to the left of the pivot, and everything greater than the pivot to the right of the pivot.
What is the Implementation: Linked List Sequence
Pointer based data structure. Each element has a value and a memory address pointing to the next element. Can change the order, insert, or delete items just by changing pointers. Cannot address an arbitrary element without traversing the linked list.
What is the Implementation: Linked List Stack
A stack backed by a linked list: push and pop both operate at the head, so no tail pointer is needed and both are O(1).
What is the Implementation: Array Sequence
Sequence backed by static array - a consecutive chunk of memory.
How does Universal Hashing work?
Multiply the key k by a randomly chosen a, add a randomly chosen constant b, take the result modulo a prime p, and finally modulo the table size m: ha,b(k) = (((ak + b) % p) % m)
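A sketch in Python (make_universal_hash is an illustrative name; the prime p = 2^31 - 1 is an assumption, chosen to be larger than the keys being hashed):

```python
import random

def make_universal_hash(m, p=2_147_483_647):
    """Build h(k) = ((a*k + b) mod p) mod m with random a, b.
    p is a prime larger than any key (2^31 - 1 here, an assumption)."""
    a = random.randint(1, p - 1)   # random multiplier, nonzero
    b = random.randint(0, p - 1)   # random constant
    return lambda k: ((a * k + b) % p) % m
```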
What is the Abstract data structure: Queue
Simple first in first out (FIFO) data structure. Think of waiting in a line
What is the Abstract data structure: Stack
Simple last in first out (LIFO) data structure. think of a deck of cards
Properties of Abstract data structure: Sequences
Stores a sequence of items It is indexed starting from 0, with items denoted as x0, x1, ..., xn-1
What is the Abstract data structure: Sets and Maps
Stores a set of items with unique keys. When keys have an associated value, we call it a map
Merge sort via Substitution Method
T(n) = 2T(n/2) + n. Substituting T(n/2) = 2T(n/4) + n/2: T(n) = 2(2T(n/4) + n/2) + n = 4T(n/4) + 2n. After k substitutions, T(n) = 2^k T(n/2^k) + kn. Setting k = log2 n: T(n) = 2^(log2 n) T(1) + n log2 n = n + n log n = O(n log n)
What does it mean for an algorithm to have a fixed length?
The algorithm itself has a fixed set of steps of instructions. The algorithm doesn't change based on the input size; the logic stays the same.
what is merge sort the textbook example of?
The algorithmic paradigm known as Divide and Conquer
Explain Amortized Analysis
The key idea behind amortized analysis is that the cost of resizing is "paid off" over a series of operations. For every append operation, some extra cost is "banked" in anticipation of future resizing. When a resize occurs, this banked cost is used to cover the high expense of copying the elements.
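The "banking" argument can be checked numerically: counting every element copy caused by doubling over n appends shows the total stays below 2n, i.e. O(1) amortized per append (total_copies is an illustrative helper, not from the course):

```python
def total_copies(n):
    """Count element copies caused by resizing over n appends
    to a doubling dynamic array."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size      # resize: copy every existing element
            capacity *= 2
        size += 1
    return copies
```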
What is the runtime analysis of this procedure: procedure CONTAINS_DUPS(arr) for i in 0...len(arr) do for j in 0...len(arr) do if i != j and arr[i] == arr[j] then return true return false
The outer loop runs n times. For each of those n loops, the inner loop runs n times. And the inner loop does a constant time amount of work each time it runs. So that is O(n) ∗ O(n) ∗ O(1) = O(n^2) which is quadratic.
Why do we not time out programs to test efficiency?
The test is only on one input and one input size. Different computers operate at different speeds.
Hashing
To reduce the excessive memory usage seen in direct access arrays, we allocate a smaller array of size m, where m <= u (the size of the universe of keys). In the general form of a hash function, the keys no longer need to be unsigned integers, but can be of arbitrary type and size.
What are the three asymptotic bounds?
Upper bound (O) - most used. Lower bound (Ω). Tight bound (Θ).
Merge sort via Recursion tree method
Visualize the recursion tree and count the number of operations at each level and how many levels there are: O(n log n). (View tree in lecture notes.)
What do we do if an algorithm has 2 terms, one that dominates the other?
We drop the lower order term (ex. O(n^2 + n) is just O(n^2))
Time Complexity of Implementation: Array Queue
With a deque_vec backing, both ends support O(1) insertion and deletion, so enqueue(x) and dequeue() are O(1): insert_last pairs with delete_first (or, equivalently, insert_first with delete_last).
Time Complexity of Implementation: Linked List Queue
With a linked list, insert_last and delete_first can be O(1) if we keep a tail pointer, so this is an easy solution. enqueue(x), dequeue() - O(1), but once again, while theoretically identical, the deque_vec array-backed implementation is slightly faster.
Are in-place algorithms destructive?
Yes, All in-place algorithms are destructive. However, not all destructive algorithms are in-place
Is Quick Sort in-place?
Yes, this makes it practically faster than merge sort
What is a computational problem?
a mapping from inputs to correct outputs given some property which makes that output correct for that input.
What is a problem?
a question with a correct answer
What is stable sort?
a sort which does not alter the relative order of equal items versus the input array.
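Python's built-in sorted() is a stable sort, which makes this easy to demonstrate: records with equal keys keep their input order.

```python
# Sort records by count; ties keep their original relative order.
records = [("apple", 2), ("banana", 1), ("cherry", 2), ("date", 1)]
by_count = sorted(records, key=lambda r: r[1])
# banana stays before date (both key 1); apple stays before cherry (both key 2)
```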
