Python Data Structures / Algorithms


Recursion (requirements)

A function that calls itself. 3 requirements: 1) calls itself 2) has a base case 3) alters the input parameter toward the base case -without a base case, we could end up in infinite recursion
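a minimal sketch of the three requirements (factorial is an assumed example, not from the original card):

def factorial(n):
    # 2) base case stops the recursion
    if n <= 1:
        return 1
    # 1) calls itself, 3) with an altered (smaller) input
    return n * factorial(n - 1)

print(factorial(5))  # 120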

Kth largest vs Rank in a stream

1) For kth largest - we just look once at an unsorted array to find the kth largest 2) For rank in a stream of values - best to keep a BST / order-statistics tree: insert - lgn, update - lgn

Tree Terminology - diameter/width

longest path between any two leaf nodes -include the two leaves in the count

Binary Search - calculating mid

mid = lo + (hi-lo)//2 (avoids the integer overflow that (lo+hi)//2 can cause in fixed-width languages) https://stackoverflow.com/questions/25571359/why-we-write-lohi-lo-2-in-binary-search
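a minimal iterative sketch using the overflow-safe mid (assumed example):

def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2  # overflow-safe in fixed-width languages
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search([1, 3, 5, 7, 9], 7))  # 3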

Counting Sort - When is it useful

- better when the range k is smaller than n. When the range is large, it's better to resort to mergesort or heapsort to guarantee O(nlgn) - if k <= n, it's an O(n) time / O(n) space algorithm

Binary Search Time Complexity

-every time the array doubles in size, it takes one extra iteration of the algo: 2^(# iterations) = (array size) -Binary Search is O(lgn) time; lgn space for recursive, O(1) space for iterative

heapq.nlargest complexity

-list of length *M*, get *n* largest items
algo:
1) slice n items from M ==> O(n)
2) heapify(list) ==> O(n)
3) for each remaining elem: heappushpop(heap, elem) ==> O( (M-n)lgn )
4) sort() ==> O( nlgn )
Total = n + n + (M-n)lgn + nlgn ==> O( Mlgn ) *if M>>>n*
-If M=n, total complexity = nlgn, which is just equal to MlgM - the same as simply sorting the list
nlargest ==> works with a min heap
nsmallest ==> works with a max heap
https://stackoverflow.com/questions/33644065/overall-complexity-of-finding-kth-largest-element-in-python

Sorting - Bucket Sort

1) create n empty buckets 2) put each item into its appropriate bucket 3) sort the individual buckets 4) concatenate all the buckets

Abstract Data Type (ADT) vs Data Structure

ADT - can be implemented multiple ways provided they meet the functional requirements
-List, Stack, Queue, Priority Queue
Data Structures - the actual way to implement
-Linked List, Array, Heap
-the Priority Queue ADT can be implemented with a binary heap

Types of Binary Tree: Full (strict/proper)

All nodes have 0 or 2 children (never just 1 child). Can be used to represent mathematical expressions

Tree Traversal BFS - Level Order

BFS: Level Order: Top to bottom, left to right Optimal complexity => O(n)

Sorting - Timsort Complexity

Time - nlgn Space - n

Binary Heap Min/Max Condition

-a binary heap is a complete tree (totally filled, other than possibly the rightmost elements on the last level) -insertions happen at the bottom, rightmost open position -the root is either the min or the max (in python's heapq it's the min)

Sorting - Selection Sort

-search through the array and find the min
-swap the min with the first unsorted element
-advance the sorted marker
Time - O(n^2) Space - O(1)
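a minimal sketch of the steps above (assumed example):

def selection_sort(arr):
    for i in range(len(arr)):
        # find the min in the unsorted suffix
        min_idx = i
        for j in range(i + 1, len(arr)):
            if arr[j] < arr[min_idx]:
                min_idx = j
        # swap the min with the first unsorted element, advancing the sorted marker
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr

print(selection_sort([5, 2, 4, 1]))  # [1, 2, 4, 5]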

Memoization

-store the values of function calls so we don't have to repeat the work
-Ex - the Fibonacci sequence has many repeated calls (fib(3), fib(2), etc...)
-in python, we can either:
1) add a decorator to the function to cache the first maxsize values: @lru_cache(maxsize=1000) (maxsize defaults to 128)
2) create a dictionary of cached values
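a minimal sketch of option 1 (assumed example):

from functools import lru_cache

@lru_cache(maxsize=1000)  # default maxsize is 128
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))  # fast: each fib(k) is computed once, then served from the cache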

Sorting - Binary Insertion Sort vs Normal Insertion Sort

Binary insertion sort reduces the number of comparisons (vs normal insertion sort) by using binary search to find the proper location to insert the selected item. Normal insertion sort ==> O(n) comparisons per insertion in the worst case. Binary insertion sort ==> O(lgn) comparisons per insertion in the worst case. Binary insertion sort still has a worst-case running time of O(n^2) due to the swaps required for insertion

Types of Binary Tree: Balanced

Difference between left and right subtree heights is at most one for all nodes. A tree that balances itself is a "self-balancing binary tree" (AVL Trees, Red-Black Trees)

Naive Sorting vs Heapq.nlargest vs Quickselect vs Introselect - finding kth largest (kth order statistic)

Naive/Normal Sort - nlgn / 1
heapq.nlargest - Mlgn / n, or Nlgk / k (if M == n this function just sorts ==> MlgM - the naive solution) (if there's just one item it will call min())
Quickselect - n^2 worst, n avg / n (mutates the input list to get the answer) https://www.youtube.com/watch?v=BP7GCALO2v8
Introselect - n / n (quickselect with median-of-medians as a fallback pivot choice)
l = [8,3,1,2,9,-2]; k=2; np.partition(l, k) ==> [-2,1,2,3,9,8] (result not necessarily in sorted order, but the kth element (0-based) is in its final place)
-similar to quickselect, choosing a median of 3, but if the recursion depth goes above a certain threshold, we fall back to a median-of-medians pivot - which guarantees O(n)
https://stackoverflow.com/questions/33623184/fastest-method-of-getting-k-smallest-numbers-in-unsorted-list-of-size-n-in-pytho

heapify() complexity

O(n) https://www.geeksforgeeks.org/time-complexity-of-building-a-heap/ -O(n) regardless of the branching factor - binary, ternary heaps, etc.

Complexity - number of digits (d) in a number (n)

A number n with d digits is on the order of 10^d:
d | n
1 | 1 to 9 (~10^1)
2 | 10 to 99 (~10^2)
3 | 100 to 999 (~10^3)
n ≈ 10^d ==> lgn = lg(10^d) ==> lgn ∝ d
so the number of digits (and the cost of summing them) is O(lgn)
ctci
https://stackoverflow.com/questions/50261364/explain-why-time-complexity-for-summing-digits-in-a-number-of-length-n-is-ologn

Sorting - Non-Comparison Sorts - Radix and Counting Sort

radix sort uses counting sort as a subroutine

Types of Binary Tree: Complete

All levels are completely filled except possibly the last level, and in the last level nodes are as far left as possible. -a perfect binary tree whose rightmost leaves have been removed is called a complete binary tree

Hash Table - Linear Probing vs Chaining

Chaining
Pros: simpler to implement, very flexible size, less sensitive to the hash function or load factor
Cons: wastes space in that some parts of the hash table are never used
Open Addressing:
Pros: better cache performance as everything is in the same table
Cons: more computation, table may become full, suffers from clustering

Tree Traversal DFS - Pre, In, Post

DFS:
1) *PreOrder* (diagonal-wise) - Visit ==> Left ==> Right - used to create a copy or serialize the tree (for later deserializing)
2) *InOrder* (column-wise) - Left ==> Visit ==> Right - in a BST, inorder traversal gives the nodes in sorted order
3) *PostOrder* (level-wise, starting from the bottom, left to right) - Left ==> Right ==> Visit - used to delete the tree using just O(1) space, since it deletes children before the parent (other traversals require more space)
Complexity for all 3 traversals: O(n)
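a minimal recursive sketch of the three orders (the Node class is an assumed example):

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def preorder(node, out):
    if node:
        out.append(node.val)          # Visit
        preorder(node.left, out)      # Left
        preorder(node.right, out)     # Right

def inorder(node, out):
    if node:
        inorder(node.left, out)       # Left
        out.append(node.val)          # Visit
        inorder(node.right, out)      # Right

def postorder(node, out):
    if node:
        postorder(node.left, out)     # Left
        postorder(node.right, out)    # Right
        out.append(node.val)          # Visit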

Rehashing

When the load factor increases to more than pre-defined value (default load factor is .75), the complexity increases. To overcome this, the array (or underlying data structure) is doubled and all values are hashed again against this larger array to maintain a low load factor and low complexity.

Complexity - sort each string in a list, then sort the list itself

['aba', 'cbd', 'cdc'] ===> ['aab', 'bcd', 'ccd']
-n = # elements in the list
-s = length of the longest string
1) sort each string in the list: n * slgs
2) sort the list itself: s * nlgn (comparing two strings takes O(s))
total = n*s*(lgn + lgs)

AVL Tree

a self-balancing Binary Search Tree where the difference between the heights of the left and right subtrees cannot be more than one for any node (doesn't need to be complete: a complete tree packs its nodes all the way to the left, while an AVL tree keeps the left and right subtree heights close without pushing nodes to the left) worst case complexity (search, insert, delete) - lgn

Sorting - Quicksort Complexity

best - nlgn, avg - nlgn, worst - n^2
space - lgn (n for the worst case) (partition uses O(1) space; multiply by the recursion call tree depth of lgn)
-Worst case occurs when the pivot is always the greatest or smallest element (already in ascending or descending order, or all elements the same). If we know arrays are nearly sorted, we don't want to use quicksort.
-the n^2 problem can be mitigated by choosing a random or median-of-three pivot value. The worst case can still occur if the max (or min) element is always chosen as the pivot
-the best case occurs when the partition divides the list into two nearly equal pieces; each recursive call then processes a list of half the size

Linked List Time Complexity

insert/delete - O(1) (removing the last element with no tail pointer can be O(n)) get item - O(n)

Hash Table

-a data structure that implements an associative array abstract data type, a structure that can map keys to values
-allows you to do lookups in constant time
-take some value ==> convert the value based on a formula ==> spits out a coded version of the value ==> index into the hash table
-a common pattern is to take the modulo/remainder of the last few digits of a big number

Graph Connectivity

-measures the minimum # of elements that need to be removed for a graph to become disconnected/disjoint -in a weakly connected graph, removing a single connection can make it disconnected -sometimes connectivity can be used to answer which graph is 'stronger'

Queue Types: 1) Standard Queue

FIFO data structure (ex - a line of people)
-front - oldest element in the Q (front of the line)
-back - newest added elements
enqueue() - add an element to the back
dequeue() - remove an element from the front
peek() - look at the front element

Complexity - Fibonacci

O(2^N) -for recursive calls ==> O(branches^depth) -a tighter runtime is O(1.6^N), since at the bottom of the call stack there is sometimes 1 recursive call instead of 2 (the exact base is the golden ratio, ~1.618) space - O(N)

Complexity - looping from 1 to (x^2<=n)

O( sqrt(n) )

Complexity - Fibonacci and alternatives

recursive - 2^N / N
recursive + memo - N / N
iterative - N / 1
matrix mult - lgn / 1
formula - 1 / 1

AVL Tree vs B-Tree

-both are height-balanced (self-balancing) trees which allow all operations in log(n) time
1) AVL - better for in-memory use where random access is cheap - a binary tree - just 2 children
2) B-Tree - not a binary tree - nodes can have more children - better for disk-backed storage, because they group a larger number of keys into each node to minimize the number of seeks (for read/write operations)
https://stackoverflow.com/questions/2734692/avl-tree-vs-b-tree

Efficient Hash Function

-choose between 1) a hash function that spreads out values evenly but uses a lot of space 2) one that uses fewer buckets but might have to search within each bucket -Hashing questions are popular because there's never a perfect solution - you're expected to talk about the upsides and downsides of whatever you choose. Do your best to optimize your hash function.

HashTable Collisions - Separate Chaining (open hashing)

-each cell of the table points to a linked list -Pros: simple to implement, the table never fills up and we're always able to add more elements to a chain -Cons: wastes space on the extra linked lists, slower lookup (O(n) worst case if all elements land in one linked list)

Hash Table Load Factor

-gives a sense of how full a hash table is. It's also the expected length of a chain (if we're using chaining) load factor less than 1 => mostly empty spaces, wasting space load factor more than 1 => collisions α = n/m (m => table size, n => elements) -as long as you set your table size to m=n, we are guaranteed O(1) operations on average

Binary Tree

-has at most two children (0,1, or 2 children)

Hash Table Collision

-two different inputs generate the same hash value/index -fixed via chaining, open addressing, or double hashing

Graph - Weakly Connected vs Strongly Connected vs Complete

-weakly and strongly generally apply to directed graphs *Weakly connected* - if considered as an undirected graph, it is connected (there may not be a directed path b/w some pairs of vertices) *Strongly connected* - for a directed graph, there is a path between all pairs of vertices *Unilaterally connected* - semi-path (touches all vertices); there's a path from A to B, but not necessarily from B to A *Complete* - an undirected graph with an edge between every pair of nodes

Sorting - Radix sort complexity

n => nums in the array, d => max number of digits, b => base (usually base 10)
-each pass of counting sort takes O(n+k) where k is the range - the range is always 0 to b-1, so O(n+b)
-we do an O(n+b) pass for each digit, so the final big O ==> O( d * (n+b) )
-very good if the range of the input is limited compared to the number of elements

Set Complexity for Set Operations (union, intersection, difference)

s1 = {1,2,3,4,5} s2 = {4,5,6,7}
Union s|t ==> O( s + t ) - loop over both sets; s1 | s2 = {1,2,3,4,5,6,7}
Intersection s&t ==> O( min(s, t) ) - loop over the smaller set and check membership in the larger; s1 & s2 = {4,5}
Difference s-t ==> O( len(s) ) - loop over one set and check that values are not in the other; s1 - s2 = {1,2,3}

Sorting - Merge Sort complexity

(# iterations) * (# comparisons @ each iteration)
2^iter = n ==> iter = log(n)
# comparisons @ each iteration = n
Time complexity: *O(n lgn)* Space complexity: *O(n)*
-Space complexity - we only use 2 arrays at each step, the original one and the new one we're copying into
https://stackoverflow.com/questions/10342890/merge-sort-time-and-space-complexity

Tree Terminology - Internal Nodes, External Nodes / Leaf

*Internal Nodes* => all nodes except leaves (if there's only one node, it's an external node) *External Nodes / Leaves* => the nodes at the ends

Binary Tree - Search(item), Delete(item), Insert Complexity

*Search* - O(n) (any traversal algo is linear, since in a BT there's no order to the nodes) *Delete* - O(n) (starts with a search, since we need to find what we want to delete. Deletions are tricky - sometimes we need to promote grandchildren up and/or re-arrange nodes) *Insert* - avg => lgn, worst => n

Tree Terminology - Levels, depth, height

*level* => how many connections it takes to reach the root from a node, + 1
*height of a node* => number of edges between the node and its furthest leaf
*depth of a node* => number of edges to the root
-height and depth move inversely - the root's height is highest, a leaf's depth is highest
Max # of tree nodes at level L => 2^(L-1)
Max # of nodes in a tree of height h => (2^h) - 1

Ω, O, Θ - Complexity

*Ω* / Big Omega => lower bound - printing an array is Ω(n), Ω(lgn), and Ω(1)
*O* / Big Oh => upper bound - printing an array is O(n), O(n^2), O(2^n)
*Θ* / Big Theta => both O and Ω - a tight bound on the runtime
-Industry's meaning of big O is closer to what academics mean by Θ. In interviews we always try to offer the tightest description of the runtime.

Sorting - Insertion Sort

-select the first unsorted element, swap elements until the unsorted element is in the correct position
-advance the sorted marker
Time - O(n^2) Space - O(1)
best case - input array already sorted: just one comparison for each item in the array
worst case - array in reverse order
-very good for small arrays (even better than quicksort). A good quicksort implementation will use insertion sort for arrays smaller than a threshold.
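a minimal sketch of the above (assumed example):

def insertion_sort(arr):
    for i in range(1, len(arr)):
        j = i
        # swap the selected element left until it's in the correct position
        while j > 0 and arr[j - 1] > arr[j]:
            arr[j - 1], arr[j] = arr[j], arr[j - 1]
            j -= 1
    return arr

print(insertion_sort([5, 2, 4, 1]))  # [1, 2, 4, 5]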

Tree DFS Traversal - Recursive vs Iterative vs Morris

1) Iterative (best time performance) - N / N 2) Recursive (simplest to write) - N / N 3) Morris (constant space) - N / 1 -Morris finds the predecessor of the current node and sets the predecessor's right pointer to point back to it - so it knows how to go back up the tree (the iterative and recursive implementations need a stack or the call stack to go back up from where we came) -iterative and recursive are one pass; Morris is effectively two passes but still considered linear time
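a minimal sketch of Morris inorder traversal (the Node class is an assumed example):

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def morris_inorder(root):
    out, cur = [], root
    while cur:
        if cur.left is None:
            out.append(cur.val)               # visit
            cur = cur.right
        else:
            pred = cur.left                   # find the inorder predecessor
            while pred.right and pred.right is not cur:
                pred = pred.right
            if pred.right is None:
                pred.right = cur              # thread back up the tree
                cur = cur.left
            else:
                pred.right = None             # remove the thread, restoring the tree
                out.append(cur.val)           # visit
                cur = cur.right
    return out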

Stack - implemented with a list

-*can use append() and pop() on a simple python list to implement a stack* push - O(1) pop - O(1)

Graph

Data structure designed to show relationships between objects (also called a network) -a tree is a specific type of graph -can have cycles, can start anywhere -no root node -nodes generally store data, but edges can also store data in the form of weights

Hash Table Complexity of Hashing String

O(s) where s is the length of the input key -in python each type has a hashing function (or __hash__ for custom types) - for a *string it loops through each char* -note that s is a different variable from the number of keys

DFS / BFS Complexity

Tree:
DFS Pre/In/Post: V / h
BFS Level Order: V / V
Graph:
DFS Iterative or Recursive: V+E / h -or- b^d / d
BFS Iterative: V+E / V
-DFS/BFS time complexity varies by graph representation (adj list (V+E) or adj matrix (V^2))
-DFS/BFS space complexity varies with the type of tree - with a skewed/degenerate tree, BFS space is 1 and DFS space is V

Priority Queue

ADT that is an extension of a queue w/ the following properties:
1) every item has a priority
2) an element with higher priority is dequeued before lower-priority elements
3) 2 elements w/ the same priority are served according to their order in the queue
Operations:
insert(item, priority): inserts an item with the given priority
getHighestPriority(): returns the highest-priority item
deleteHighestPriority(): removes the highest-priority item
-used in algos like Dijkstra's shortest path and Prim's min spanning tree

Types of Binary Tree: Perfect

All internal nodes have exactly 2 children. All leaf nodes on the same level/depth. All perfect trees are complete, but not vice versa

Tree Definition and Properties

Trees are a restricted form of a graph - directed (one direction), acyclic (no cycles) graphs (directed acyclic graphs) -trees are an extension of a linked list Properties: 1) a tree must be connected (can't have an unconnected node) 2) no cycles (acyclic)

Python Heapq operations

h = [3,1,2]
*heapq.heapify(h)* - creates a min heap in place O(n)
*heapq.heappush(h, x)* - push and maintain heap O(lgn)
*heapq.heappop(h)* - pop and maintain heap O(lgn)
*heapq.heapreplace(h, x)* - pop then push O(lgn)
*heapq.heappushpop(h, x)* - push then pop O(lgn)
*heapq.nlargest(n, arr)* - Nlgk for the k largest of arr size N
*heapq.merge(*iterables)* - merge multiple sorted inputs into a single sorted output
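a minimal sketch exercising these operations (assumed example):

import heapq

h = [3, 1, 2]
heapq.heapify(h)             # in-place min heap: h[0] is now the min
heapq.heappush(h, 0)         # h[0] == 0
print(heapq.heappop(h))      # 0
print(heapq.nlargest(2, h))  # [3, 2]
print(list(heapq.merge([1, 4], [2, 3])))  # [1, 2, 3, 4]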

Binary Heap and Complexity (search, insert, delete, extractMax/extractMin, peek)

heap w/ at most two children, often stored in an array
-must be complete (all levels except the last are full, and nodes are all the way to the left). New values are added @ the bottom level, left to right
-being complete guarantees the correct shape and a worst case of O(lgn)
Search: avg - N, worst - N
Insert: avg - 1, worst - lgn
Delete (extractMax/extractMin): avg - lgn, worst - lgn
Peek: O(1)

Red-Black Tree

not as good at self-balancing as AVL, but good enough to guarantee lgn worst case
Rules:
-each node is Red or Black
-the root is Black
-all (nil) leaves are black
-a Red node has two black children
-every path from a node to a leaf has the same number of black nodes
worst case complexity (search, insert, delete) - lgn

Directed Graph Cycle Detection

1) BFS with a queue and in-degrees - the best method, since it gives both a topo-sort list and cycle detection
2) normal DFS with a recStack tracking structure - keep track of vertices currently in the recursion stack; if we reach a vertex that is already in the recursion stack, there's a cycle (for a disconnected graph we check each connected component for a cycle)
3) DFS graph-coloring algorithm:
-White ==> all vertices initially white (not processed)
-Gray ==> vertex being processed (DFS for this vertex has started, but not all descendants are processed yet)
-Black ==> vertex and all its descendants are processed
-if we encounter an edge from the current vertex to a gray vertex, it's a back edge and we have a cycle
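a minimal sketch of method 2 (the graph is an assumed adjacency dict):

def has_cycle(graph):
    visited, rec_stack = set(), set()

    def dfs(v):
        visited.add(v)
        rec_stack.add(v)                 # v is on the current DFS path
        for nxt in graph.get(v, []):
            if nxt in rec_stack:         # back edge ==> cycle
                return True
            if nxt not in visited and dfs(nxt):
                return True
        rec_stack.remove(v)              # done exploring v's subtree
        return False

    return any(dfs(v) for v in graph if v not in visited)

print(has_cycle({'a': ['b'], 'b': ['c'], 'c': ['a']}))  # True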

Sequential/Linear Search

search every item sequentially O(n)

Dijkstra

single source shortest path (for graphs with non-negative edge weights)

Binary Search Tree (BST)

every value on the left is smaller, every value on right is larger

Heap

specific type of tree where the root element is either the max or min value
max-heap => all parents >= children
min-heap => all parents <= children
-a generic heap can have any number of children; a binary heap has at most two
-often stored in an array (less storage vs using Nodes with left/right pointers)

DFS vs BFS Time Complexity Adj List vs Adj Matrix

-Adj Matrix: BFS and DFS time => O(V^2) -Adj List: BFS and DFS time => O(V+E)

DFS vs BFS space complexity

-the DFS stack (or call stack) holds at most the height of the tree (for a skewed/degenerate tree, this is V or N) -BFS generally has a whole level in the queue at once in an n-ary tree; for a balanced binary tree the widest (last) level holds about half the nodes, so BFS space is O(N)

DFS - Tree/Forward/Back/Cross Edge

-Tree edges - part of the path explored by DFS
-Back edge - points up the tree hierarchy (descendant to ancestor)
-Forward edge - points down the hierarchy (ancestor to descendant)
-Cross edge - an edge between 2 vertices w/ no hierarchy between them
https://cs.stackexchange.com/questions/11116/difference-between-cross-edges-and-forward-edges-in-a-dft

Topological Sort

-linear ordering of vertices in a directed graph such that for every directed edge u==>v, u comes before v in the ordering. Similar to how build tools like pip or npm must resolve dependencies first
-the graph must be directed and acyclic to do a topological sort
2 ways:
1) DFS recursive way O(V+E) / O(V) - keep a deque and a visited set
-for each vertex i: if i is not in visited, call toposortutil(i, deq, visited)
-toposortutil(): add i to visited; recurse on each unvisited neighbor; then appendleft i onto the deque
2) BFS with in-degrees (Kahn's Algorithm) O(V+E) / O(V) - see the sketch below
-compute the in-degree of all vertices, and initialize a count of visited nodes to 0
-add all vertices with in-degree 0 to a queue
-while q: pop an item, add it to the ordering, loop through its neighbors and decrease their in-degrees (if an in-degree drops to 0, add that vertex to the q), and add 1 to cnt
-if cnt != # of vertices: we have a cycle; else: print the topological order
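a minimal sketch of Kahn's algorithm (the graph is an assumed adjacency dict):

from collections import deque

def topo_sort(graph):
    indegree = {v: 0 for v in graph}
    for v in graph:
        for nxt in graph[v]:
            indegree[nxt] += 1
    q = deque(v for v in graph if indegree[v] == 0)
    order = []
    while q:
        v = q.popleft()
        order.append(v)
        for nxt in graph[v]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                q.append(nxt)
    if len(order) != len(graph):
        return None  # cycle detected
    return order

print(topo_sort({'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}))  # ['a', 'b', 'c', 'd']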

Union Find

-makeSet(), union(), find()
-all vertices start out as disjoint sets and successively get merged into trees with parents
Naive complexity:
Union - O(V) or O(N)
Find - O(V) or O(N)
Union Find with path compression and union-by-rank:
Union - amortized constant
Find - amortized constant
makeSet() still O(N)
Pseudocode for cycle detection
Main/Driver algo:
1) Initialize - set the parent array representing the vertices all to -1
2) loop through each edge (i, j):
x = findparent(parent, i); y = findparent(parent, j)
if x == y: return True (we have a cycle)
union(parent, x, y)
findparent(parent, i) - search until we find the topmost parent of this set:
if parent[i] <= -1: return i
else: return findparent(parent, parent[i]) # find the parent recursively
union(parent, x, y):
xSet = findparent(parent, x); ySet = findparent(parent, y)
parent[xSet] = ySet

Union Find - use cases

-union find can be used when you simply have a list of edges and don't want to build the graph (adj list or adj matrix) and do dfs -union find is also good for adding edges dynamically from a stream - such as in Kruskal's algorithm

Bidirectional Search vs Traditional BFS Search

-used to find the shortest path b/w a source and destination
1) Traditional BFS - search K nodes on the first level, then K nodes for each of those K nodes ==> O(K^2) - we do this d times, so total O(K^d)
2) Bidirectional search - run two BFS simultaneously from the start and the destination - the searches collide after approx d/2 levels (the midpoint of the path): O( K^(d/2) )
-Bidirectional search is faster by a factor of K^(d/2) [ K^(d/2) * K^(d/2) = K^d ]

heappushpop() vs heapreplace()

1) *heappushpop()* - push first, then pop (i.e., if the heap is empty, you can pop the same element you pushed). Used in heapq.nlargest(k, arr)
z = [-1, 5, 0, 2, -3]
heapq.heapify(z)  # [-3, -1, 0, 2, 5]
heapq.heappushpop(z, 4)  # returns the top of the heap, -3, and re-heapifies O(lgn)
2) *heapreplace()* - pop first, then push (errors on an empty heap). More efficient than heappop() followed by heappush(). Best with a fixed-size heap.
https://stackoverflow.com/questions/33701160/python-heapq-difference-between-heappushpop-and-heapreplace

Shortest Path (BFS vs Dijkstra)

1) BFS => shortest path for an unweighted graph: V+E / V (space may be O(V^2) if whole paths are stored) 2) Dijkstra => shortest path for a weighted-edge graph: (V+E)lgV / V+E (if implemented using a min heap / priority q; using an array for the heap, space complexity can be V) https://stackoverflow.com/questions/50856391/what-is-the-space-complexity-of-dijkstra-algorithm
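a minimal sketch of Dijkstra with heapq (the graph is an assumed dict of {vertex: [(neighbor, weight), ...]}):

import heapq

def dijkstra(graph, src):
    dist = {src: 0}
    pq = [(0, src)]                        # (distance, vertex) min heap
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist.get(v, float('inf')):
            continue                       # stale entry, skip
        for nxt, w in graph.get(v, []):
            nd = d + w
            if nd < dist.get(nxt, float('inf')):
                dist[nxt] = nd
                heapq.heappush(pq, (nd, nxt))
    return dist

g = {'a': [('b', 1), ('c', 4)], 'b': [('c', 2)], 'c': []}
print(dijkstra(g, 'a'))  # {'a': 0, 'b': 1, 'c': 3}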

Strongly Connected Components (SCC) - Kosaraju and Tarjan algorithm

1) Kosaraju's algorithm - two passes - O(V+E) / O(V)
-run DFS on each unvisited vertex; after a vertex's DFS finishes, push it onto a stack
-reverse (transpose) all the graph's edge directions
-clear the visited set
-pop each item from the stack and do a DFS, adding items to visited. When the DFS for all children of a vertex is done, add the vertex to the current component. This builds the different components
2) Tarjan's algorithm - one pass O(V+E) / O(V)
-similar to DFS topo sort (in fact it generates a topo sort as a byproduct)

Types of Binary Tree: Degenerate

Every parent node has only one child, either left or right -suffers the same performance as a linked list (slightly slower, since in a tree we check both left and right)

Sorting - HeapSort

Heapsort uses a heap (complete binary tree - all levels except the last are filled, and nodes are all the way to the left). From left to right, top to bottom, each element of the input array becomes a node.
1) build a max heap (heapify())
2) swap the first and last elements - this moves the highest element to the end (similar to a selection sort)
3) delete that node from the heap and repeat
-efficient for priority queues - as the heap supports insert/delete/extract
in python (heapq builds a min heap, so popping repeatedly yields ascending order):
from heapq import heapify, heappop
l = [4,10,3,5,1]
heapify(l)
[heappop(l) for _ in range(len(l))]  # [1, 3, 4, 5, 10]

Union Find Path Compression and Union-by-Rank

Path compression - the parent of every node in a set becomes the topmost parent - this reduces the time we have to search. On each find query, compress the path (using the same -1 root sentinel as the card above):
def find(x):
    if parent[x] <= -1:
        return x
    parent[x] = find(parent[x])  # path compression
    return parent[x]
Union by rank - attach the shorter tree to the taller tree (in the parent array we can store the number of nodes underneath) - store a rank that shows the depth (only the top node's rank matters)
-using both union by rank and path compression we get amortized constant union and find
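a minimal sketch combining both optimizations (assumed example):

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False               # already connected (a cycle, if x-y is a new edge)
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx            # attach the shorter tree under the taller
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

uf = UnionFind(3)
uf.union(0, 1); uf.union(1, 2)
print(uf.union(0, 2))  # False ==> adding edge 0-2 would create a cycle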

BST - Search, Insert, Delete Complexity

Search, Insert, Delete: avg => lgn, worst case => n -the worst case occurs when the tree degenerates to a linked list

Priority Queue implementation array vs heap

Sorted array:
get/deleteHighestPriority - O(1) (highest priority kept at one end)
insert - O(n) (must keep the array sorted)
Linked list:
get/deleteHighestPriority - N (slightly better on delete)
insert - 1
As a heap (array):
insert - lgn
getHighestPriority - 1
deleteHighestPriority - lgn

Simple vs multigraph Graph

Simple graph - self loops and parallel edges not allowed Multigraph - parallel edges (and self loops) allowed

Undirected Graph Cycle Detection

The Union-Find algorithm can be used to check whether an undirected graph contains a cycle or not. This method assumes the graph doesn't have any self loops

Sorting - Locality of Reference (linear search, quicksort, mergesort, heapsort)

locality of reference - memory accesses clustered around a small number of locations
linear search - over an array has the best locality of reference; over a linked list has bad locality of reference
Quicksort - the partition strategy generally looks at/swaps values at array indexes that are close to each other
Mergesort - in the final step it can access locations n/2 apart
Heapsort - compares against values at locations that are twice or half the index of the current element (in max_heapify())
linear search over an array > quicksort > mergesort > heapsort

BFS Shortest Path

add each whole path to the queue and expand one path at a time; one tradeoff is higher space complexity, where all partial paths can end up in the queue (max E is V^2) - so space complexity can be V^2 https://www.geeksforgeeks.org/building-an-undirected-graph-and-finding-shortest-path-using-dictionaries-in-python/

Python *List* Time Complexity

append - O(1) (amortized)
pop - O(1)
insert and remove - O(n)
get item - O(1)
*in* membership testing - O(n)
-the largest costs come from growing beyond the current allocation size (because everything must move), or from inserting or deleting somewhere near the beginning (because everything after that must move)
-insertion and deletion are messy (due to the fixed-size allocation)
-good for accessing elements in the middle
-if you need to add/remove at both ends, consider using a collections.deque instead

Binary Tree - Level vs number of nodes

each new level can have twice as many nodes as the one before it - we're adding a power of two at each level # nodes at depth D ==> 2^D total nodes at level L and lower ==> 2^L - 1

Sorting - Timsort

hybrid algo that uses merge sort (for large sequences) and binary insertion sort (for small sequences) -looks for increasing/decreasing sequences -incredibly fast for nearly sorted data

Trie (Prefix tree)

n-ary tree in which characters are stored at each node. Each path down the tree may represent a word - and is usually terminated with a * or null node -a node in a trie could have anywhere from 1 through ALPHABET_SIZE + 1 children -a trie can check if a string is a valid prefix in O(K) time where K is the length of the string (same as a hash table looking up a string of length K, which must hash each char) -used in many problems involving a list of valid words, or in situations where we repeatedly look through prefixes (check M, then MA, then MAN - and check at a certain node if it has a child Y for MANY)
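a minimal sketch of a dict-based trie (assumed example):

class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node['*'] = True               # terminator marks a complete word

    def is_prefix(self, prefix):       # O(K) for a string of length K
        node = self.root
        for ch in prefix:
            if ch not in node:
                return False
            node = node[ch]
        return True

t = Trie()
t.insert('many')
print(t.is_prefix('man'), t.is_prefix('max'))  # True False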

Sorting - Stability

stability means that equivalent elements retain their relative positions after sorting. Useful when sorting on multiple keys - e.g., sort by first name, then stably sort by last name: rows with the same last name keep their first-name order 1) stable by default: insertion sort, merge sort, bubble sort 2) unstable by default: quicksort, heapsort, selection sort, shell sort -any sorting algo that is not stable can be modified to be stable -radix sort requires that its underlying sorting algo be stable
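a minimal sketch of multi-key sorting that relies on stability (python's sort/Timsort is stable; the names are assumed examples):

people = [('Ann', 'Smith'), ('Bob', 'Jones'), ('Amy', 'Smith')]
people.sort(key=lambda p: p[0])  # first sort by first name
people.sort(key=lambda p: p[1])  # then stably sort by last name
print(people)  # [('Bob', 'Jones'), ('Amy', 'Smith'), ('Ann', 'Smith')] - Smiths keep first-name order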

Graph Representation - Adjacency List vs Adjacency Matrix (optimal implementation and complexity)

Adj Matrix - dictionary of lists (each vertex maps to a row of entries)
-better for dense graphs
-better for weighted edges
add/remove vertex => V
Add/remove edge => 1
query => 1
space => V^2
Adj List - dictionary of sets
-better for sparse graphs
add vertex => 1
remove vertex => V
Add/remove edge => 1
query => 1
space => V+E

Heap vs Binary Heap

-Heap is any number of children, binary heap is 2 -binary heap is a *complete tree* - all levels except last filled - won't degenerate

Graph Edges

-Nodes are people, but the edges between the people could describe many things: people who met each other, people who lived in the same city at the same time, or people who worked on a project at the same time -the information we decide to store will depend on the use case

Sorting - Quicksort vs MergeSort

-Quicksort is faster for smaller datasets and for arrays, since it has good locality of reference -Mergesort is better for stability, large datasets (especially those that don't fit in memory), or linked lists (LLs don't have good locality of reference and work better with the merge operation). You're also guaranteed nlgn worst case.

Sorting - Quicksort

-a divide and conquer sorting algo (like mergesort) -picks an element as the pivot and places all items less than the pivot to its left, and all items greater than the pivot to its right -recursively quicksorts the left and right halves -in-place but not stable
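a minimal (not in-place) sketch of the idea, assuming a middle-element pivot; a production version would partition in place:

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]    # items less than the pivot
    mid = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]   # items greater than the pivot
    return quicksort(left) + mid + quicksort(right)

print(quicksort([5, 2, 9, 2, 7]))  # [2, 2, 5, 7, 9]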

Hash Table vs Arrays, Linked List, BST, Direct Access Table (Complexity of search and ins/del)

-arrays and linked lists suffer search or insert/delete problems -a balanced BST is lgn for all ops -a direct access table is constant for all ops, but the size of the table would have to be huge (possibly larger than what could be represented in memory) -a HashTable is constant for all operations on avg

Queue Types: 3) Priority Queue

-assign each element a numerical priority on insertion -on dequeue, remove the element with the highest priority -in a priority tie, remove the oldest element -Priority queue is an abstract data type; it can be implemented in many ways, although it's typically implemented as a heap (a heap must be a tree (no cycles), not necessarily binary either)

Sorting - Bucket Sort Complexity

-best case => uniform distribution over the buckets => *O(n+k)* (k buckets)
-worst case => when input keys are close to each other (clustered), some buckets contain more elements than average. With all elements in a single bucket, performance is dominated by the inner sorting algo => *O(nlgn)* for merge/quicksort
Space - n

Sorting - Bubble Sort (sinking sort)

-in each iteration, the largest element will "bubble" to the top -an in-place, stable sorting algo -a better version of bubble sort, known as modified bubble sort, includes a flag that is set if an exchange is made during a full pass over the array (see the sketch below). If no exchange is made, then the array is already in order because no two elements need to be switched - in that case, the sort should end.
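a minimal sketch of the modified version with the early-exit flag (assumed example):

def bubble_sort(arr):
    for end in range(len(arr) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                swapped = True
        if not swapped:    # no exchange made ==> already sorted, stop early
            break
    return arr

print(bubble_sort([3, 1, 2]))  # [1, 2, 3]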

Degree, In-degree, Out-degree, Degree of Graph

Undirected graph:
Degree - the number of edges that are connected to a vertex (loop edges count as 2)
Directed graph:
1) In-degree - number of edges incoming to a vertex
2) Out-degree - number of edges outgoing from a vertex
(a directed loop edge counts as 1 out-degree and 1 in-degree)
Degree of graph - sum of the degrees of all vertices

Sorting - In Place vs Not In Place

-In Place ==> *sorted without having to copy to a new data structure* -in-place sorts have lower space complexity, since we're not recreating the data structure -it's a tradeoff b/w space and time complexity - it won't matter for small arrays, but if you have millions of elements, it makes a huge difference In place: bubble sort, selection sort, insertion sort, heapsort Not in place: merge sort (requires O(n) extra space)

Binary Search Time Complexity alternate way of thinking

-each time we split the array in half we have n/2, n/4, n/8, etc... items left. When we've split enough we get n/2^i = 1 (where i is the number of comparisons): n/2^i = 1 ==> 2^i = n ==> i = lgn

Python method to use max-heap

-heapq is a min heap. heapq does not expose a max-heap, so you need to: 1) negate the values and insert them into the heap 2) when you pop (or read an element), multiply by -1 to get the original value
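a minimal sketch of the negation trick (assumed example):

import heapq

nums = [3, 1, 4, 1, 5]
h = [-x for x in nums]        # 1) negate the values
heapq.heapify(h)
largest = -heapq.heappop(h)   # 2) negate again on the way out
print(largest)                # 5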

Hash Table String Keys

-the key can be a string - one way is using ASCII values combined w/ a hash formula -can generally use ASCII values directly if you have 30 or fewer string keys -if you use all the letters in a string key, the hash value becomes huge and memory may not be able to represent it as an index

Complexity - Computing first n fibonacci numbers

-many people think it's O(n * 2^n) - but in each call to fib(n), n is changing, so the total work is:
fib(1) => 2^1 steps
fib(2) => 2^2 steps
etc...
2^1 + 2^2 + 2^3 + ... + 2^n ==> O(2^N)
-the sum of powers 2^0 + 2^1 + 2^2 + ... + 2^n roughly equals 2^(n+1) - 1

Sorting - Merge Sort

-mergesort is an example of *divide and conquer* - the idea of breaking an array up, sorting all the parts, then building it back up again (several sorting algos use this principle) -a reversed list is a commonly cited bad case, though the time stays O(nlgn) for any input
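a minimal sketch of merge sort (assumed example):

def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left, right = merge_sort(arr[:mid]), merge_sort(arr[mid:])
    # merge the two sorted halves back up
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # <= keeps the sort stable
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(merge_sort([5, 2, 4, 1]))  # [1, 2, 4, 5]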

Sorting - Bubble Sort Complexity

-overall we do (n-1) iterations and at each iteration we do up to (n-1) comparisons Time - O(n^2) (best case O(n) - only 1 number needing to bubble up, or already sorted) Space - O(1) (in place)

HashTable Collisions - Open Addressing / Probing (closed hashing)

-when a collision occurs, we look for the next open space (linear probing; also quadratic probing, double hashing, etc.) -CPython uses random probing, where the next slot is picked in a pseudo-random order

Sorting - Heapsort complexity

-worst case - swapping a value from the bottom of the tree to the top takes lgn swaps (the height of the tree is lgn). This swapping is called n times ==> O(nlgn) Time - nlgn Space - 1

Collection Data Structure Comparisons

Lists - indexed elements; good for accessing elements in the middle; insertion and deletion are messy O(n); has potentially unused memory space
Linked List - better for insert/delete O(1), but difficult to access elements in the middle
Stack - easy to implement with a LL; easy push and pop O(1)
Queue - easy to implement; fast enqueue/dequeue

Counting Sort

O(n + k) / O(n + k) (where k is the range of elements: max(arr) - min(arr))
1) create a count array of range k, and count each value of the input array
2) modify the count array to store the running sum of the previous counts
3) place each input value into the output using its count, decreasing the count by 1

arr = [5, 3, 1]
k = max(arr)  # assumes non-negative keys
1) count = [0] * (k + 1)
   for num in arr:
       count[num] += 1
2) for i in range(1, len(count)):
       count[i] += count[i - 1]
3) output = [0] * len(arr)
   for num in reversed(arr):  # reversed keeps the sort stable
       output[count[num] - 1] = num
       count[num] -= 1

[5,3,1]
count = [0, 0, 0, 0, 0, 0]
count = [0, 1, 0, 1, 0, 1]
count = [0, 1, 1, 2, 2, 3]
output = [1, 3, 5]
-can offset the index to start at 0 for negative values
https://stackoverflow.com/questions/49184457/counting-sort-negative-integers

Python heapq operations complexity (push, pop, pushpop, replace, nlargest, heapify )

push - lgn
pop - lgn
pushpop - lgn
replace - lgn
nlargest - Nlgk (if N>>>k)
heapify - n
merge - NlgK to consume all N items from K sorted inputs (it's a lazy iterator)

heapq.nlargest min heap will get rid of largest or smallest values?

python heapq is a min heap, so if we call heapq.nlargest on an array, it will discard the smallest values and retain the n largest -to find the kth smallest value, we can call heapq.nsmallest, which discards the largest elements and retains the n smallest (internally it keeps the candidates in max-heap order so the largest can be evicted)

Queue Types: 2) Deques (double-ended queue)

queue that goes both ways - can enqueue or dequeue from either end
-generalized version of both stacks and queues, since you can represent either of them with it:
a) Stack - add and remove from the same end (either end)
b) Queue - add on one end, remove on the other
-operations on the right side ==> pop(), append(), extend()
-operations on the left side ==> popleft(), appendleft(), extendleft()
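a minimal sketch using collections.deque both ways (assumed example):

from collections import deque

d = deque([1, 2, 3])
d.append(4)         # right side: [1, 2, 3, 4]
d.appendleft(0)     # left side:  [0, 1, 2, 3, 4]
print(d.pop())      # 4 (stack behavior: remove from the same end as append)
print(d.popleft())  # 0 (queue behavior: remove from the opposite end)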

Complexity - looping powersOf2 (halving a number until it reaches 0 or 1)

the number of times we can divide n by 2 until we get down to the base case. O(lgn) page 54 ctci

