ALGORITHMS / DATA STRUCTURES


Binary Search Tree delete node flow

Find the node; If it is a leaf - just remove it; If it has one child - replace the deleted node with that child; If it has two children - find the largest value in the left subtree (go right as far as possible without turning left), or the smallest value in the right subtree (vice versa). Copy it into the node, then delete that value from the subtree, recursing if needed.
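The flow above can be sketched in Python (a minimal illustration with a hypothetical `Node` class; assumes unique keys and uses the largest-in-left-subtree replacement):

```python
# Minimal BST sketch: insert, delete, and an in-order check helper.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def delete(root, key):
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Leaf or one child: splice the node out.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Two children: go right as far as possible in the left subtree.
        pred = root.left
        while pred.right is not None:
            pred = pred.right
        root.key = pred.key
        root.left = delete(root.left, pred.key)  # remove the copied value
    return root

def inorder(root):
    return inorder(root.left) + [root.key] + inorder(root.right) if root else []
```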

Merge Sort overview

Divide-and-conquer algorithm. Uses recursion and needs O(n) extra space. Better complexity than iterative insertion or selection sort, and highly parallelizable. Not the best practical performance, and requires extra space.

Cartesian Tree definition

A binary tree with two invariants: the heap invariant, plus in-order traversal returns the elements in the order they were inserted. The first inserted element is the leftmost, the last is the rightmost. Can be constructed in linear time. Used in sorting. Worst-case operation complexity is linear because the tree can be unbalanced.

Static Array definition

A fixed-length container holding n elements, indexable over the range [0, n-1]. The array is a contiguous chunk of memory.

Algorithm correctness with loop invariant

A loop invariant is a statement about the variables that is true before and after each iteration of the loop. To prove correctness, show that it holds at all stages of the loop: Initialization - it is true prior to the first iteration; Maintenance - if it is true before an iteration, it remains true before the next iteration; Termination - when the loop terminates, the invariant gives us a useful property that shows the algorithm is correct.
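The three stages can be made concrete with a small sketch (a hypothetical `running_sum` function; the invariant is written out as assertions):

```python
# Invariant: "total == sum(xs[:i])" holds before every iteration.
def running_sum(xs):
    total = 0
    i = 0
    # Initialization: before the first iteration, total == sum(xs[:0]) == 0.
    while i < len(xs):
        assert total == sum(xs[:i])  # Maintenance: true before each iteration.
        total += xs[i]
        i += 1
    # Termination: i == len(xs), so the invariant gives total == sum(xs).
    assert total == sum(xs)
    return total
```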

Red-Black Tree overview

A self-balancing tree where no path from the root to a leaf is more than twice as long as any other.

Heap definition

A tree-based DS that satisfies the heap invariant: if A is a parent node of B, then A is ordered with respect to B, for all nodes A and B in the heap (in a max heap, the value of a parent node is always larger than or equal to the values of all its children, for every element). A heap is usually implemented on top of an array, sometimes augmented with a hashtable.

DS - Data Structure definition

A way of organizing data so that it can be used effectively

Left Child Right Sibling Representation

A way to represent a tree where each node can have an arbitrarily large number of children. This representation requires each node to have only 3 pointers - parent, left child, right sibling. The right sibling pointer points to the next child of the same parent.

Dynamic Array complexity: access, search, insert, append, delete

Access - O(1) Search - O(n) Insert - O(n) Append - O(1) Delete - O(n)

Static Array complexity: access, search, insert, append, delete

Access - O(1), Search - O(n), Insert/Append/Delete - N/A (the size is fixed)

Complete Binary Tree definition

All levels (except possibly the last) are completely filled, and all elements on the last level are as far left as possible

Binary Tree underlying DS

Array (for a complete binary tree, e.g. a binary heap) or linked nodes with pointers

Hash Table underlying structure

Array (since it is indexable with constant-time access) + something for collision resolution (e.g. a linked list per bucket)

Union Find underlying datastructure

Array plus a hashtable (not necessarily the best implementation; if the values are not integers, map them to array indices via the hashtable)

Asymptotic complexity and size input

Asymptotic complexity is basically big O notation. Usually, an algorithm with the best asymptotic complexity is the best choice for all but the smallest inputs

Hash Table Complexity: insert, remove, search

Average - O(1), Worst - O(n)

Binary Search Tree Complexity: insert, remove, search

Average: O(log(n)) Worst: O(n)

Cartesian tree search, insertion, deletion complexity

Average: O(log(n)) Worst: O(n)

Skip List access, search, insertion, deletion, space complexity

Average: O(log(n)) Worst: O(n) Space: O(n) on average, O(n*log(n)) worst case

Quicksort overview

Bad complexity in the worst case - when the partitioning is unbalanced, i.e. at every recursion step the pivot produces a partition with only one element in one of the parts. Best case - the partition produces two halves. In practice, the average case is closer to the best case than to the worst case. Performance is also quadratic when the array is already sorted, if a naive pivot (first or last element) is chosen.

Insertion Sort complexities

Best - O(n) Average - O(n^2) Worst - O(n^2) Space - O(1)

Heap Sort Complexities

Best - O(n*log(n)) Average - O(n*log(n)) Worst - O(n*log(n)) Space - O(1)

Merge Sort complexities

Best - O(n*log(n)) Average - O(n*log(n)) Worst - O(n*log(n)) Space - O(n). The merge procedure itself is linear.

Quick Sort complexities

Best - O(n*log(n)) Average - O(n*log(n)) Worst - O(n^2) Space - O(log(n))

Radix Sort complexities

Best: O(n*k) Average: O(n*k) Worst: O(n*k) Space: O(n+k), where n is the input size and k is the max number of digits. The exact complexities depend on the underlying stable sorting algorithm chosen.

Counting Sort complexities

Best: O(n+k) Average: O(n+k) Worst: O(n+k) Space: O(n+k), where n is the size of the input and k is the range of the input (the size of the auxiliary counter array)

Bucket Sort complexities

Best: O(n+k) Average: O(n+k) Worst: O(n^2) Space: O(n) where n is the input size, k is the number of buckets

Selection Sort complexities

Best: O(n^2) Average: O(n^2) Worst: O(n^2) Space: O(1)

Runtime Complexity types

Big O - upper bound, worst case; Big Ω (omega) - lower bound, best case; Big Θ (theta) - tight bound, both O and Ω

Binary Search Tree definition

Binary Tree, satisfies Binary Search Tree invariant (BST invariant) - left subtree has smaller elements, right subtree has larger elements.

BFS algorithm overview

Breadth-First Search - graph traversal algorithm, implemented with a queue. The algorithm starts at a node (may be the tree root or an arbitrary node, referred to as the 'search key') and traverses the entire graph. It moves forward by visiting the direct neighbors of the current node, moving to the next layer away from the search key only after all nodes in the previous layer have been visited. The current layer is called the frontier (or visiting group).
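A minimal sketch of the algorithm (a hypothetical adjacency-list graph; `deque` plays the role of the queue):

```python
from collections import deque

# BFS over an adjacency-list graph; returns nodes in layer (frontier) order.
def bfs(graph, start):
    visited = {start}
    order = []
    frontier = deque([start])          # the queue holds the current frontier
    while frontier:
        node = frontier.popleft()
        order.append(node)
        for neighbor in graph[node]:   # visit direct neighbors first
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(neighbor)
    return order
```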

Bucket Sort flow

Breaks the interval [0, 1) into equal subintervals - buckets. Buckets are represented by an auxiliary array of linked lists. Distribute the values into these buckets, sorting values within each bucket in the process. Concatenate the buckets to get the output.
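The flow can be sketched as follows (a simplification: Python lists stand in for the linked lists, and the bucket count is a free choice):

```python
# Bucket sort sketch for inputs uniformly distributed in [0, 1).
def bucket_sort(xs, n_buckets=10):
    buckets = [[] for _ in range(n_buckets)]
    for x in xs:
        buckets[int(x * n_buckets)].append(x)  # distribute into buckets
    out = []
    for b in buckets:
        out.extend(sorted(b))                  # sort within each bucket
    return out                                 # concatenation is the output
```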

Open Addressing definition

Collision resolution technique: finds another place in the hash table by offsetting the position, with no auxiliary structure. Relevant term: load factor = number of items / table size. If the load factor is too high, performance degrades rapidly (after about 0.8). Once some threshold is met - create a bigger table, rehash and reinsert the elements. On a hash collision, offset the original hashed position by a probing sequence P(x) and keep probing until a free slot is found. Problem: a badly chosen probing function may result in infinite loops.

Separate Chaining definition

Collision resolution technique: maintain a list of the values mapped to the same hash (an auxiliary DS, the bucket), then search it linearly.

Union Find usage examples

Connectivity problems (computers in a network, people in a social network, elements in a mathematical set, ...): dynamic connectivity, percolation, least common ancestor, equivalence of finite state automata, Kruskal's minimum spanning tree algorithm, image processing.

Big O complexities from smallest to largest

Constant time: O(1) Logarithmic time: O(log(n)) Linear time: O(n) Linearithmic time: O(n*log(n)) Quadratic time: O(n^2) Cubic time: O(n^3) Exponential time: O(b^n), b > 1 Factorial time: O(n!), where n is the size of the input. Also called the asymptotic efficiency of algorithms.

Union Find complexity: construction, union, find, get component size, check if connected, size

Construction - O(n) Union - α(n) Find - α(n) Get component size - α(n) Check if connected - α(n) Size - O(1)

Hash Function definition and properties

H(x) - a function that maps an object to an integer in a defined range. If H(x) == H(y), x may be equal to y; if H(x) != H(y), then x != y. Must be deterministic: if H(x) = y once, it must always be so. Should be uniform, to have as few hash collisions as possible.

Binary Heap definition

A heap where each node has no more than two children

Stable Sort definition

If the input has repeating elements, the output will have these elements in the same order as in the input. Stable sorts: merge, insertion, counting

Open Addressing collision resolution

If there is a hash collision, we offset the original hashed position to a probing sequence P(x). Keep doing it until a free slot is found

PQ complexity: poll, peek, add, remove, contains

Implemented with a binary heap: Poll - O(log(n)), Peek - O(1), Add - O(log(n)), Remove - O(n) (the element must be found first), Contains - O(n). If augmented with a hash table: Remove - O(log(n)), Contains - O(1).

Heapsort overview

In place; uses a binary heap with an array as the underlying DS. Flow: build a max heap from the array (in place) - now the largest element is at the root. Remove the root by swapping it with the last element, then bubble the new root down. Repeat for all elements. During the loop the array has two parts: the sorted removed roots at the end, and the heap in the rest. Compared to quicksort it has worse average performance but a better worst case; compared to mergesort it has worse performance but better space consumption. Not used much. Good for finding max/min values quickly.
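The flow above can be sketched in a few lines (a minimal in-place version; `sift_down` is the 'bubble down' step):

```python
# In-place heapsort sketch: build a max heap, then repeatedly move the
# root to the end of the array and sift the new root down.
def sift_down(a, start, end):
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1
        if child + 1 <= end and a[child] < a[child + 1]:
            child += 1                       # pick the larger child
        if a[root] < a[child]:
            a[root], a[child] = a[child], a[root]
            root = child
        else:
            return

def heapsort(a):
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):  # build a max heap in place
        sift_down(a, start, n - 1)
    for end in range(n - 1, 0, -1):          # sorted part grows at the end
        a[0], a[end] = a[end], a[0]          # swap root with last element
        sift_down(a, 0, end - 1)
    return a
```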

Selection Sort overview

In-place sort. Same complexity as insertion sort, but usually worse performance; only makes sense on small inputs. Uses a double loop. Divides the input into two parts - the sorted part on the left, the remaining elements on the right. On each outer loop iteration, the inner loop finds the smallest element among the remaining ones and appends it to the sorted part.

Insertion Sort algorithm

Iterative, in-place sort; uses a double loop: -the outer for loop iterates over the elements from the second to the last; -take an element, save its index and value; -the inner while loop iterates backwards from the saved index to the first element; -on each step, compare the saved value with the value on the left; -if the value on the left is larger, continue iterating (for an array, shift that value one step to the right); -else the place to insert the saved element is found; -take care not to go out of bounds: check whether the inner loop has reached the first element yet. The outer loop moves one element to the right on every iteration, leaving the elements on its left in sorted order and the elements on its right in the original order. The inner loop takes one entry from the unsorted part at a time and finds a place for it in the sorted part.
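A direct transcription of the steps above (a sketch, not library code):

```python
# Insertion sort: outer loop grows the sorted prefix, inner loop shifts.
def insertion_sort(a):
    for i in range(1, len(a)):          # outer loop: second element to last
        value = a[i]                    # save the element being inserted
        j = i - 1
        while j >= 0 and a[j] > value:  # inner loop walks left; the j >= 0
            a[j + 1] = a[j]             # check guards against out-of-bounds;
            j -= 1                      # larger values shift one step right
        a[j + 1] = value                # place found: insert the saved value
    return a
```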

Find Minimum Spanning Tree Algorithm name

Kruskal's Algorithm

Kruskal's Algorithm definition

Kruskal's Minimum Spanning Tree - algorithm to find a minimum spanning tree in a graph: a subset of the edges that connects all the vertices in the graph with the minimal total edge cost. Implemented with a Union Find. Note - a tree by definition cannot have cycles!

Stack definition

LIFO, one-ended linear data structure

Probing Sequences types

Linear, Quadratic, Double hashing (define a secondary hash function), Pseudo-Random Number Generation (seed the generator with the hash value)

Skip List definition

List-like ordered DS that improves search from O(n) to O(log(n)). Contains multiple layers (represented by linked lists); layer 0 contains all the elements, and each level up duplicates approximately half of the elements from the previous layer, chosen randomly. This allows skipping many elements while performing a linear search.

Binary Heap Remove arbitrary node algorithm

Start at the root and do a linear search (the implementation is an array) until the element is found. Swap the found element with the last element. Remove the last element easily. Bubble the swapped element up or down until the heap invariant is satisfied. Note: looking at the underlying array it is a linear search; looking at the tree it is a breadth-first traversal.

Tree definition

Undirected graph with no cycles (has N nodes and N-1 edges; any 2 vertices are connected by exactly 1 path)

Insertion Sort overview

Uses an incremental approach. Easy to implement; good for small inputs, good for almost-sorted inputs; constant space consumption. Complexity is the same as for bubble sort and selection sort, but in practice it is more efficient. Bad performance on large inputs.

Union Find operations

find (returns the group the given element belongs to), union (merges two groups together)

Stack complexity: push, pop, peek, search, get size

Implementation using a linked list: Push - O(1), Pop - O(1), Peek - O(1), Search - O(n), Get size - O(1)

Amortized Constant Time definition

α(n) denotes the inverse Ackermann function - 'almost constant' time. Amortized time is the average time taken per operation over many operations: when many operations are performed, it does not matter that once in a while a single operation is very slow.

Counting Sort flow

Create an auxiliary array with the size of the value range. This array holds counters - how many elements of each possible value are in the input array. Iterate over the input array and count each value. Transform the counters array into a prefix-sum array - replace each value with a running sum. The new array now holds the correct indices (off by one) for the sorted values. Allocate a new array for the output. Iterate over the input array backwards; for each value, index into the prefix-sum array, decrement it (and save the decremented value) - this is the correct index. Copy the value to the output array at that index.
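The flow above, sketched for integers in a known range [0, k):

```python
# Counting sort sketch: counters -> prefix sums -> stable backwards pass.
def counting_sort(xs, k):
    counts = [0] * k
    for x in xs:
        counts[x] += 1                  # count each value
    for i in range(1, k):               # prefix sums: counts[v] becomes the
        counts[i] += counts[i - 1]      # number of elements <= v
    out = [0] * len(xs)
    for x in reversed(xs):              # backwards pass keeps the sort stable
        counts[x] -= 1                  # decrement, save: the correct index
        out[counts[x]] = x
    return out
```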

Hash Table definition

Data structure that lets us construct a (key, value) mapping using hashing. Keys must be unique, hashable, and immutable.

Union Find definition

Disjoint Set, DS that keeps track of elements which are split into one or more disjoint sets (disjoint sets = connected components)

Hashing methods

Division method: h(k) = k mod m, where m is the needed range. Fast. Avoid m that is a power of 2; a good choice is a prime not too close to a power of 2. Multiplication method: h(k) = floor(m * (k*A mod 1)): multiply the key by a constant A in (0, 1) and extract the fractional part, then multiply by m and take the floor of the result. The value of m is not critical; a power of 2 is convenient. Universal hashing: randomized hashing.
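Both methods can be sketched directly (the particular m values and the constant A are illustrative choices, not fixed requirements; A below is Knuth's commonly suggested value):

```python
import math

# Division method: m chosen as a prime not too close to a power of 2.
def hash_division(k, m=13):
    return k % m

# Multiplication method: m may conveniently be a power of 2.
def hash_multiplication(k, m=16):
    A = (math.sqrt(5) - 1) / 2          # constant in (0, 1)
    return math.floor(m * ((k * A) % 1))
```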

Union Find Path Compression definition

Dynamically change the parents of the nodes so that they all point to the root, thus reducing the time complexity of find

Binary Heap Add element algorithm

Elements are added top to bottom, left to right. Elements are not added to a new level until the previous level is full. Add the element to the leftmost free slot on the current level. Perform 'bubbling up': if its relation with the parent breaks the heap invariant, swap the element and the parent. Continue until the right place for the element is found.
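For an array-backed min heap, 'add to the leftmost free slot' is simply an append, and bubbling up follows parent indices (a sketch with 0-based indices, where the parent of i is (i - 1) // 2):

```python
# Min-heap insert with 'bubbling up' on a plain Python list.
def heap_push(heap, value):
    heap.append(value)                  # leftmost free slot == end of array
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] < heap[parent]:      # invariant broken: swap with parent
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
        else:
            break                       # right place found
    return heap
```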

Queue complexity: enqueue, dequeue, peek, contains, remove, is empty

Enqueue - O(1) Dequeue - O(1) Peek - O(1) Contains - O(n) Remove - O(n) Is empty - O(1)

Binary Heap with Hashtable

Every node is mapped to its index; look up the index to skip the linear search. Because one value may occur at multiple positions, map each value to a set of positions. Movements and swaps must be tracked.

Queue definition

FIFO, linear data structure

Big O notation general definition

Gives the upper bound of the complexity in the worst case, as the input size becomes arbitrarily large. Also called the order of growth.

Tree Traversal definition and types

Visiting each node of the tree according to some rule (usually recursively); Preorder - print value, recurse left, recurse right: a value is printed as soon as it is visited, from the root down to the leftmost leaf, then up, down, up again; Inorder - recurse left, print value, recurse right: values are printed from the leftmost leaf up; if it is a BST, values come out sorted in ascending order; Postorder - recurse left, recurse right, print value; Level Order Traversal - not recursive: print one layer at a time, from left to right, using breadth-first search and a queue.
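The three recursive orders differ only in where the 'visit' step sits (a sketch over a hypothetical tuple-based tree `(value, left, right)`, with `None` for an empty subtree):

```python
# Tree as nested tuples: (value, left_subtree, right_subtree) or None.
def preorder(t):
    return [t[0]] + preorder(t[1]) + preorder(t[2]) if t else []

def inorder(t):
    return inorder(t[1]) + [t[0]] + inorder(t[2]) if t else []

def postorder(t):
    return postorder(t[1]) + postorder(t[2]) + [t[0]] if t else []
```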

Kruskal's Algorithm flow

Make a list of all the edges (with the nodes they connect) in ascending order of weight; Start from the very beginning and check each edge; If neither node belongs to any group yet, create a new group for this edge; If only one of the nodes belongs to some group, add the edge to that group; If both nodes belong to the same group, ignore the edge; If the nodes belong to different groups, merge the groups and add the edge there; Continue until all the nodes are in the same group.
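The flow can be sketched with a minimal inline union-find tracking the groups (nodes are assumed to be integers 0..n-1 and edges `(weight, u, v)` tuples; these conventions are illustrative):

```python
# Kruskal sketch: sort edges ascending, keep an edge only if it joins
# two different groups (tracked by a tiny union-find).
def kruskal(n, edges):
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):           # ascending weight order
        ru, rv = find(u), find(v)
        if ru != rv:                        # different groups: merge, keep edge
            parent[ru] = rv
            mst.append((w, u, v))
    return mst
```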

Heap types (based on ordering)

Max heap - the tree root is the biggest value - used in heapsort. Min heap - the tree root is the smallest value - used in PQs

Union Find Connected Component definition

Maximum set of objects that are mutually connected. The find operation thus checks whether two objects are in the same connected component.

Big O properties

O(n + c) = O(n), O(c*n) = O(n), where c > 0, because as n goes to infinity the constant c does not matter. This is theoretical: in the real world, if c is very large, it will affect computation time.

Binary Heap construction from array complexity

O(n)

Comparison Sort (type) definition

Sorting algorithm based on comparisons between elements. For comparison sorts, O(n*log(n)) is the best possible worst-case complexity.

Union Find implementation

On construction, all values are put into an array, and each (value, index) pair is put into a map. At this point every value is in its own group and is its own root. The array entry at a value's index represents the parent of that value. On union, the root of the first value is found; the second value's root entry is updated to point at the first value's root. On find, the value's index is looked up in the map, then the array is indexed by it. If the result is not the value we searched for, the value is not a root: repeat the same lookup with the value taken from the array (recursion) until a root is found. The root identifies the group. NOTE: a purely recursive implementation is not optimal. NOTE: for an efficient implementation, name the objects with integers so that they can index into the array.
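A compact sketch over integers 0..n-1 (skipping the value-to-index map, and adding the path compression and union-by-size optimizations mentioned in the other cards):

```python
# Union-find with path compression and union by size.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))    # each element starts as its own root
        self.size = [1] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:   # path compression: repoint to root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:   # smaller tree under bigger one
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

    def connected(self, a, b):
        return self.find(a) == self.find(b)
```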

Open Addressing remove element

Place a placeholder (a tombstone) where the value used to be. This way, when later probing with the probing sequence and encountering the tombstone, we know that we must continue probing. Tombstones may be replaced with values later, or removed when the table is resized. Lazy deletion is also possible: replace a tombstone with the next valid value so that we do not have to probe as much next time.

Priority Queue definition

Queue where each element has a certain priority; elements with higher priority are dequeued first. Only supports comparable data. Typically based on heaps.

Radix Sort overview

Radix = base (base 10 for decimal numbers). Works for numbers with a fixed number of digits. For the intermediate sorting, any stable sort can be used (counting sort works well, but is not in place). Good for sorting values like dates (year, month, day), where values are keyed by multiple keys.

Quicksort flow

Recursive. On each step a pivot is chosen, and the pivot's correct place in the array must be found. For that, elements are rearranged so that all elements smaller than the pivot are in the left part, and all elements larger than the pivot are in the right part. Steps: 1. Swap the pivot with the rightmost element; 2. Scan the array from the left and from the right; 3. Identify values larger than the pivot on the left and smaller on the right, and swap them; 4. Insert the pivot in the middle, where the scanning pointers meet; 5. Recursively apply to the left and right parts.
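A sketch of the scheme using a single left-to-right scan (Lomuto-style partitioning with the last element as pivot; the two-pointer scan described above is the Hoare variant, which differs in detail but follows the same idea):

```python
# In-place quicksort: partition around the last element, then recurse.
def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        pivot = a[hi]                   # pivot sits at the rightmost slot
        i = lo
        for j in range(lo, hi):
            if a[j] < pivot:            # smaller values go to the left part
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]       # insert pivot at its final place
        quicksort(a, lo, i - 1)         # recurse on the left part
        quicksort(a, i + 1, hi)         # recurse on the right part
    return a
```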

Merge Sort Algorithm

Recursive, uses extra space. -the mergeSort function takes an array and two indices that represent the subarray to be sorted; -if the subarray is of size 1, it is sorted, return; -else divide the subarray in two; -merge sort each half; -merge the sorted halves; -return. -merge takes two sorted subarrays (actually one array and three indices that delimit them - the subarrays are adjacent in the original array); -copy the subarrays into new arrays; -because the subarrays are sorted, repeatedly compare the values at the front of the two arrays; -on each comparison, take the smaller value, remove it and append it to the merged array; -return the merged array.
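The same flow as a sketch (simplified: slicing replaces the index triples, so subarray copies are implicit):

```python
# Merge sort: split in half, sort each half, merge the sorted halves.
def merge_sort(a):
    if len(a) <= 1:
        return a                        # size <= 1: already sorted
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:         # <= keeps the sort stable
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]  # append whichever half remains
```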

Tree node types (relation to each other)

Root - the top of the tree; any node can be a root; has no parent (or may be considered the parent of itself). Parent - one node up. Child - one node down. Leaf - has no children.

DLL complexity: search, insert head, insert tail, remove head, remove tail, remove middle

Search - O(n), same for insert in the middle Insert head - O(1) Insert tail - O(1) Remove head - O(1) Remove tail - O(1) Remove middle - O(n)

SLL complexity: search, insert head, insert tail, remove head, remove tail, remove middle

Search - O(n), same for insert in the middle Insert head - O(1) Insert tail - O(n) Remove head - O(1) Remove tail - O(n) Remove middle - O(n)

Hash Collisions Resolution Techniques

Separate Chaining, Open Addressing

Quickly multiply by two

Shift the bits of an integer by one position to the left; this works because the representation is binary: 0001 → 0010 → 0100 → 1000, which is 1 → 2 → 4 → 8. Imagine the same in decimal: a left shift would be the equivalent of appending a zero at the end - multiplying by 10.
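In code this is the `<<` operator (a quick check):

```python
x = 5
assert x << 1 == 10     # one left shift doubles the value
assert x << 3 == 40     # shifting by k multiplies by 2**k
```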

Basic Linked lists types

Singly and Doubly Linked lists

Singly vs Doubly linked list

Singly uses less memory and has an easier implementation. Doubly can be traversed backwards; singly cannot easily access the previous element. Doubly takes twice as much pointer memory per node.

BST find successor and predecessor flow

Successor and predecessor are defined by value; ancestor and descendant by structure. To find the successor, two cases are possible: the node either has a right child or it does not. If there is a right child, the successor is the minimum value in the right subtree. If there is no right child, the successor is the closest ancestor of the node whose left child is also an ancestor of the node: go up the tree until you move up from a left child. Finding the predecessor is symmetric.

Union Find Eager approach

Suppose our entries are integers; then we can use them to index into the array. Two elements are in the same connected component if their values in the array are the same.

Radix Sort flow

Suppose we know that the max number of digits in the input is 4. Radix sort will first sort the input only by the last (least significant) digit, then (on the second iteration) by the second digit from the end, and so on. After the 4th iteration, the output is sorted.
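A sketch of this least-significant-digit flow for non-negative base-10 integers (the stable per-digit pass here is a simple bucket distribution, standing in for counting sort):

```python
# LSD radix sort: one stable pass per digit, least significant first.
def radix_sort(xs):
    if not xs:
        return xs
    exp = 1
    while max(xs) // exp > 0:           # one pass per digit position
        buckets = [[] for _ in range(10)]
        for x in xs:
            buckets[(x // exp) % 10].append(x)   # stable: preserves order
        xs = [x for b in buckets for x in b]
        exp *= 10
    return xs
```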

Binary Heap Remove root (Poll) algorithm

Swap the root value with the last element (lowest level, rightmost). Remove the old root value from the end easily. Perform 'bubbling down': while the heap invariant is broken, swap the parent with its smallest child (for a min heap). Continue until the right place for the element is found. If the children are equal, swap with the left one.

BST find min and max flow

The minimum value of a subtree is the leftmost node: go as far left as possible, while the left child is not null. Finding the maximum value is symmetric.

Complexity analysis types

Time complexity and space consumption (for space, the input and output do not count - only the memory allocated by the algorithm)

Binary Search Tree worst case

The tree looks like a line (e.g. when the values 1, 2, 3, 4, 5, 6, ... are inserted in order); the alternative is a balanced Binary Search Tree

Binary Tree definition

Tree where each node has no more than 2 children

Binary Heap structure

Viewed as a binary tree; based on an array. Elements are stored level by level - iterating over the array performs a breadth-first traversal. All levels except the last one are filled. Each next level is twice as large as the previous one (1 -> 2 -> 4 -> 8 -> ...). Given the index of a parent node, the indices of the children are easily calculated: with 1-based indices, node=i, left=2i, right=2i+1; for a child i, parent=floor(i/2). Implementations do this with bit shifts. Height of a node - the number of edges on the longest simple downward path to a leaf (the number of layers below).

Weighted Union Find algorithm

When unioning two subtrees, make sure that the smaller tree becomes the child of the bigger one, not the other way around; thus the distance from each element to the root is kept low. The size of each subtree is stored in an auxiliary array - it stores the size of the tree for each item (worst-case find - O(log(n))).

Counting Sort overview

Works on integers when the min and max possible values are known. Efficient when the range is known and not very big - otherwise it needs a lot of space. Can be used with arbitrary objects if they have some fixed-range integer label. Stable. Not in-place.

Bucket sort overview

Works when the values are distributed uniformly in the range [0, 1)

