Algorithms & Data Structures
Steps to resizing:
1. Double table size to nearest prime number 2. Re-hash items from old table into the new table.
Queue
A FIFO (First In First Out) data structure, where the first element added will be the first to be removed, and where a new element is added to the back, much like a waiting line.
Binary Tree
A data structure that consists of nodes, with one root node at the base of the tree, where every node (the root included) has at most two children: a left child and a right child.
Graphs
"Uber" data structure. Shows connections between objects. Can be represented as either an adjacency matrix or an adjacency list (a linked list of neighbors for each vertex).
Tree Buckets
+--WC = O(logN) +--no wasting space +--dynamically sized -- more complicated than what's needed. --> insert with dups= O(1) --> W/o dups = O(N)
Chained bucket:
+--easy to implement +-- buckets can't overfill +-- buckets won't waste time. +-- buckets are dynamically sized.
Probe Hashing:
-> Hash it, and if it leads to a collision, use a separate equation to determine the step size and use that step size to find a new site.
Collision Hashing using Buckets
-Each element can store more than one item. -Throw collisions into a bucket. -Buckets aren't sorted.
HashCode Method:
-method of OBJECT class -Returns an int -default hash code is BAD-- computed from Object's memory address. --> must override
Unordered Linked List
Data structure whose operations are not efficiently supported. Is unordered. Has a worst case cost of search and insertion at N, an average case cost of insertion at N, and an average case cost of searching at N/2.
Collision handling:
How you handle the collisions so each element in the hash table stores only one item.
Static Memory
Memory allocated to an array, which cannot grow or shrink once declared.
Contiguous Memory
Memory that is "side-by-side" in a computer, typical of an array structure.
Doubly Linked List: memory
Memory: O(3n) (LL: O 2n)
Array: memory
Memory: O(n)
3-Way Quick Sort
Non-stable, in place sort with an order of growth between N and NlogN. Needs lgN of extra space. Is probabilistic and dependent on the distribution of input keys.
Bucket Sort
O(n+m) where m is the # of buckets.
Prime number Tables
Reduce the chance of collision.
Combinations
Repetition is Allowed: such as coins in your pocket (5,5,5,10,10) No Repetition: such as lottery numbers (2,14,15,27,30,33) https://www.mathsisfun.com/combinatorics/combinations-permutations.html
Replica
Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.
Insertion sort
Side-by-side comparison Best: O(n) Avg: O(n^2) Worst: O(n^2)
Parameter Passing
Small, no modification: pass by value. Large, no modification: pass by const reference. Modified: pass by pointer.
Quick Sort
Unstable, O(n log n) for a good pivot,O(n^2) for a bad pivot Ω(n log n) : Uses partitioning O(n), Pick a median of 1st, middle, and last element for pivot. Random selection is also good, but expensive. Algorithm can be slow because of many function calls.
Memoization
What happens when a sub problem's solution is found during the process of Dynamic Programming. The solution is stored for future use, so that it may be reused for larger problems which contain this same subproblem. This helps to decrease run time.
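As a minimal sketch of the idea on this card, the Fibonacci numbers make a convenient example because each value reuses the two subproblems before it. The function name `fib` is illustrative, and `functools.lru_cache` is used here as one convenient way to store solved subproblems; a hand-written dictionary would work the same way.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Return the n-th Fibonacci number, storing each subproblem's result."""
    if n < 2:  # base cases: fib(0) = 0, fib(1) = 1
        return n
    # Each call below is computed once; repeats are served from the cache.
    return fib(n - 1) + fib(n - 2)


print(fib(50))  # 12586269025
```

Without the cache the same call tree would recompute small subproblems exponentially many times; with it, each value of n is solved once.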
Exception handling
When there's an error, the program makes an error object and passes it off to the runtime system, which looks for a method in the call stack to handle it.
Anagram
a word, phrase, or name formed by rearranging the letters of another, such as cinema, formed from iceman.
Algorithm Analysis
how long it takes a computer to do something
HashTable<K,V> & HashMap<K,V> class
Both are in java.util and implement the Map<K,V> interface. K is the type parameter for the key and V is the type parameter for the associated value. Operations: lookup, insert, delete. The constructor lets you set the initial capacity and load factor. Collisions are handled with chained buckets. Only HashMap allows null keys and values; Hashtable allows neither.
The more items a table can hold, the () likely a collision will happen.
less
Hash function
takes an object and tells you where to put it.
Hash collision
two (or more) keys hash to same slot
What kind of Collection is Hashing?
value-orientated.
Open addressing
Uses probes to find an open location to store data.
Load Factor
#items(n) / table size
Array Bucket
-- a bucket implemented as an array. -Fixed in size. -A size of about 3 usually works well.
Red Black Trees
1. Every node is Red or Black 2. The root is Black 3. If a node is red, its children must be black 4. Every path from a node to a NULL pointer must contain the same number of black nodes
Good Hash Function qualities:
1. Must be deterministic: -> Key must ALWAYS generate the same Hash Index (excluding rehashing). 2. Must achieve uniformity -> Keys should be distributed evenly across hash table. 3. FAST/EASY to compute -> only use parts of the key that DISTINGUISH THE ITEMS FROM EACH OTHER 4. Minimize collisions:
Important Sorting Assumptions
1.Sorting array of integers 2. Length of array is n 3.Sorting least to greatest 4.Can access array element in constant time 5.Compare ints in array only with '<' 6.Focus on # of comparisons
Dictionary: definition
A data structure that maps keys to values.
1D Array
A linear collection of data items in a program, all of the same type, such as an array of integers or an array of strings, stored in contiguous memory, and easily accessed using a process called indexing.
Linked List
A linear data structure, much like an array, that consists of nodes, where each node contains data as well as a link to the next node, but that does not use contiguous memory.
Parent Node
A node, including the root, which has one or more child nodes connected to it.
cycle
A path of positive length that starts and ends at the same vertex and does not traverse the same edge more than once.
Pop
A process used in stack and queue processing where a copy of the top or front value is acquired, and then removed from the stack or queue (Dequeue).
Peek
A process used in stack and queue processing where a copy of the top or front value is acquired, without removing that item.
Push
A process used in stack and queue processing where a new value is inserted onto the top of the stack OR into the back of the queue (Enqueue).
Linear Data Structure
A programming data structure whose elements are arranged in a single sequence, one after another, such as an array of values stored in contiguous memory.
Heap
A type of priority queue. Stores data which is order-able. O(1) access to highest priority item.
Head
A typical object variable identifier name used to reference, or point to, the first object in a linked list. The number one rule for processing linked lists is, 'Never let go of the head of the list!', otherwise all of the list is lost in memory. The number two rule when managing linked lists is, 'Always connect before you disconnect!'.
Array Index
A value that indicates the position in the array of a particular value. The index of the last element in a zero-indexed array is the length of the array, minus 1.
Array length
A value that represents the number of elements contained in an array. Often there is a process associated with an array that provides this value, such as list.length, or len(list).
Array: access, search, insert, delete
Access: O(1) Search: O(n) Insert: O(n) Delete: O(n)
ArrayLists: advantages, disadvantages
Advantage: advantages of an array, plus does not run out of space Disadvantage: inserting can be slower than an array
Graph: advantage, disadvantage
Advantage: best models real-world situations Disadvantage: can be slow and complex
Stack: advantage, disadvantage
Advantage: quick access Disadvantage: inefficient with an array
Array: advantage, disadvantage
Advantage: quick insert, quick access if index is known Disadvantage: slow search, slow delete, fixed size
Doubly Linked List: advantage, disadvantage
Advantage: quick insert, quick delete Disadvantage: slow search
Stack
An abstract data type that serves as a collection of elements, with two principal operations: push, which adds an element to the collection, and pop, which removes the last element that was added. LIFO - Last In First Out
2D Array
An array of arrays, characterized by rows and columns, arranged in a grid format, but still stored in contiguous, or side-by-side memory, accessed using two index values.
Ragged Array
An array where the number of columns in each row may be different.
Row Major
An array where the two index values for any element are the row first, then the column.
Node
An object linked to other objects, representing some entity in that data structure.
Iterators
An object that knows how to "walk" over a collection of things. Encapsulates everything it needs to know about what it's iterating over. Should all have similar interfaces. Can read data, move, know when to stop.
set
An unordered collection (possibly empty) of distinct items called elements of the set.
Reduction
Analysis pattern. Use a well-known solution to some other problem as a subroutine.
Aggregate Data Types
Any type of data that can be referenced as a single entity, and yet consists of more than one piece of data, like strings, arrays, classes, and other complex structures.
Load Factor
Approximately how full the table is. A typical target is 0.7-0.8.
4 Rules of Recursion
1. Base Cases: You must always have some base cases, which can be solved without recursion. 2. Making Progress: For the cases that are to be solved recursively, the recursive call must make progress to a base case. 3. Design Rule: Assume that all recursive calls work. 4. Compound Interest Rule: Never duplicate work by solving the same instance of a problem in separate recursive calls.
Extraction:
Breaking keys into parts and using the parts that uniquely identify with the item. 379452 = 394 121267 = 112
Linear Probing
Checks each spot in order to find available location, causes primary clustering.
Quadratic Probing
On the i-th probe, checks the slot i^2 positions past the home index; causes secondary clustering. Not guaranteed to find an open table spot unless the table is at least 1/2 empty.
Folding:
Combining parts of the key using operations like + and bitwise operations such as exclusive or. Key: 123456789 123 456 789 --- 1368 ( 1 is discarded)
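The worked example on this card (123 + 456 + 789 = 1368, leading 1 discarded) can be sketched in a few lines. The function name `fold_hash` and the `chunk_digits` parameter are illustrative assumptions; the card does not prescribe a chunk size, and addition is only one of the combining operations it mentions.

```python
def fold_hash(key: int, chunk_digits: int = 3) -> int:
    """Fold a non-negative integer key by summing fixed-width digit groups."""
    digits = str(key)
    total = 0
    # Split the key into groups of chunk_digits digits and add them up.
    for start in range(0, len(digits), chunk_digits):
        total += int(digits[start:start + chunk_digits])
    # Discard any carry beyond chunk_digits digits (1368 -> 368).
    return total % (10 ** chunk_digits)


print(fold_hash(123456789))  # 368
```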
Abstract Data Types
Consists of 2 parts: 1. Data it contains 2. Operations that can be performed on it
Hash Table
Constant access time (on average).
What would the Perfect Hash Function be?
Each Key maps to a unique Hash Index.
Weighting:
Emphasizing some parts of the key over another.
Compressing:
Ensuring the hash code is a valid index for the table size.
Collision
Entering into a space already in use.
Rehashing
Expanding the table: double table size, find closest prime number. Rehash each element for the new table size.
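A rough sketch of the rehashing steps, assuming a table stored as a Python list of buckets, where each bucket is a list of (key, value) pairs. The helper names `is_prime`, `next_prime`, and `rehash` are hypothetical, and Python's built-in `hash` stands in for whatever hash function the table really uses.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check; fine for table sizes."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(n: int) -> int:
    """Return the smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


def rehash(old_table):
    """Build a table of roughly double size and re-insert every item."""
    new_size = next_prime(2 * len(old_table))
    new_table = [[] for _ in range(new_size)]
    for bucket in old_table:
        for key, value in bucket:
            # The index must be recomputed because the table size changed.
            new_table[hash(key) % new_size].append((key, value))
    return new_table
```

Every stored item is visited once, which is where the O(N) cost of rehashing comes from.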
queue
FIFO list in which elements are added from one end of the structure and deleted from the other end.
Selection sort
Find smallest, put at beginning Best: O(n^2) Avg: O(n^2) Worst: O(n^2)
Queues
First in, first out. O(1)
Arithmetic progressions
For the sum S(n, p) = 1^p + 2^p + ... + n^p: when p < -1, this sum always converges to a constant.
Relaxation
Getting from A->C more cheaply by using B as an intermediary.
Trie
Has only part of a key for comparison at each node.
HashMap underlying structure:
HashTable with chained buckets
Idea of probing:
If you have a collision, search somewhere else on the table.
One-Sided Binary Search
In the absence of an upper bound, we can repeatedly test larger intervals (A[1], A[2], A[4], A[8], A[16], etc) until we find an upper bound, the transition point, p, in at most 2⌈log p⌉ comparisons. One-sided binary search is most useful whenever we are looking for a key that lies close to our current position.
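One way to sketch this search, assuming the array is modelled by a monotone predicate `is_past(i)` that is False before the transition point and True from it onward. The names `one_sided_search` and `is_past` are illustrative, and indices start at 1 as on the card.

```python
def one_sided_search(is_past) -> int:
    """Return the smallest index i >= 1 for which is_past(i) is True.

    Assumes is_past is monotone: False, ..., False, True, True, ...
    """
    # Phase 1: double the probe position until we overshoot the transition.
    hi = 1
    while not is_past(hi):
        hi *= 2
    lo = hi // 2  # is_past(lo) was False, or lo == 0 (before the first index)

    # Phase 2: ordinary binary search between lo (False) and hi (True).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_past(mid):
            hi = mid
        else:
            lo = mid
    return hi


print(one_sided_search(lambda i: i >= 37))  # 37
```

The doubling phase takes about log p probes and the binary search phase about log p more, which matches the bound quoted on the card.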
Key
Information in items that is used to determine where the item goes into the table.
ArrayLists: insert
Insert: often O(1), sometimes more
Shellsort
Insertion sort over a gap Best: O(n log n) Avg: depends on gap sequence Worst: O(n^2)
Indirect Sorting
Involves the use of smart pointers: objects which contain pointers.
Stable Sorting Algorithm
Items with the same key are sorted based on their relative position in the original permutation
stack
LIFO list in which insertions/deletions are only done at one end.
Stack: definition
Last in, first out.
Standard data structure for solving complex bit manipulation
Lookup table
Lazy Deletion
Marking a spot as deleted in a hash table rather than actually deleting it.
Inversions
Min: 0 (already sorted) Max: n(n-1)/2 (reverse sorted). Swapping two adjacent out-of-order elements removes exactly 1 inversion.
HashMap complexity of basic operations:
O(1)
What is the time complexity for: Insert, lookup, and delete, for hash tables?
O(1) on average with a good hash function; the worst case is O(N), when every key collides.
TreeMap complexity for iterating over associated values:
O(N)
Rehashing Complexity:
O(N)-- costly. Carefully select initial TS to avoid re-hashing.
Complexity for iterating over associated values:
O(T.S + N) --> worst case.
Treemap complexity of basic operations:
O(logN)
B-Trees
Popular in disk storage. Keys are in nodes. Data is in the leaves.
L R N
Postorder traversal (Reverse Polish)
N L R
Preorder traversal (Polish)
Bloom Filters
Probabilistic hash table. No means no. Yes means maybe. Multiple (different) hash functions. Can't resize table. Also can't remove elements.
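A small sketch of the "no means no, yes means maybe" behaviour. The class name `BloomFilter`, its parameters, and the choice of deriving several hash functions by salting SHA-256 with a seed are all assumptions made for illustration; real implementations typically use faster non-cryptographic hashes and a packed bit array.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits  # fixed size: the table cannot be resized

    def _positions(self, item: str):
        """Yield one bit position per hash function (seeded SHA-256 here)."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # False is definite ("no means no"); True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))
```

Elements cannot be removed because clearing a bit could also erase evidence of a different element that hashed to the same position.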
Quadratic Probing:
Probe sequence is (H(k) + i^2) mod TS for i = 0, 1, 2, ... Reduces primary clustering and is better than linear probing at spreading items across the table.
TreeMap underlying Structure:
RBT
Mergesort
Stable sort which is not in place. It has an order of growth of NlogN and requires N amount of extra space. Works by dividing an array in half continuously into smaller and smaller arrays. At the lowest level, these arrays are sorted and then merged together after sorting in the reverse order they were divided apart in.
Merge Sort
Stable, O(n log n), Ω(n log n): Use recursion to split arrays in half repeatedly. An array with size 1 is already sorted.
Bubble Sort
Stable, O(n^2), Ω(n) : Compares neighboring elements to see if sorted. Stops when there's nothing left to sort.
Insertion Sort
Stable, O(n^2), Ω(n) : Swapping elements one at a time starting at the beginning.
Linear Probing:
Step size is 1. Find the index, and keep incrementing by one until you find a free space.
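The "keep incrementing by one" rule can be sketched as below, assuming the table is a Python list in which `None` marks a free slot. The function name `probe_insert` is hypothetical, and Python's built-in `hash` stands in for the table's hash function.

```python
def probe_insert(table, key):
    """Insert key using linear probing; return the slot index that was used."""
    size = len(table)
    index = hash(key) % size  # home slot
    for _ in range(size):
        if table[index] is None:
            table[index] = key
            return index
        index = (index + 1) % size  # step size 1, wrapping around the end
    raise RuntimeError("table is full")


table = [None] * 7
print(probe_insert(table, 3))   # 3
print(probe_insert(table, 10))  # 4 (collides with 3 at slot 3)
print(probe_insert(table, 17))  # 5 (collides at slots 3 and 4)
```

The three keys end up in adjacent slots, which is the primary clustering mentioned on the Linear Probing card.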
Palindrome
String that reads the same forwards as backwards
Little-Oh
T(n) = o(f(n)) if T(n) = O(f(n)) and T(n) != Ω(f(n))
Big-Oh
T(n) = O(f(n)) if there are positive constants c and n0 such that T(n) <= c * f(n) for all n >= n0
Divide-and-Conquer Recurrences
T(n) = aT(n/b) + f(n)
Big Omega
T(n) = Ω(f(n)) if there are positive constants c and n0 such that T(n) >= c * f(n) for all n >= n0
Double Hashing
The process of using two hash functions to determine where to store the data.
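A minimal sketch of double hashing for integer keys, assuming a prime table size so that any step size eventually visits every slot. The function name and the particular second hash, `1 + (key % (size - 1))`, are common textbook choices used here as assumptions, not something the card specifies.

```python
def double_hash_insert(table, key: int) -> int:
    """Insert an integer key using double hashing; return the slot used."""
    size = len(table)                # assumed to be prime
    index = key % size               # first hash: home slot
    step = 1 + (key % (size - 1))    # second hash: step size, never zero
    for _ in range(size):
        if table[index] is None:
            table[index] = key
            return index
        index = (index + step) % size
    raise RuntimeError("table is full")


table = [None] * 7
print(double_hash_insert(table, 3))   # 3
print(double_hash_insert(table, 10))  # 1 (home slot 3 is taken, step is 5)
print(double_hash_insert(table, 17))  # 2 (home slot 3 is taken, step is 6)
```

Keys that share a home slot get different step sizes, so they scatter instead of forming one cluster.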
Articulation Vertex
The weakest point in a graph: a vertex whose removal disconnects the graph.
Heap Sort
Unstable, O(n log n), Ω(n log n): Make a heap, take everything out.
Selection Sort
Unstable, O(n^2), Ω(n^2) : Iterates through every remaining element on each pass, so it does the same work even when the list is already sorted.
Separate Chaining
Uses a linked list to handle collisions at a specific point.
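A compact sketch of separate chaining. For brevity each chain is a Python list rather than a hand-built linked list, which is an assumption of this example; the class name `ChainedHashTable` and its method names are also hypothetical.

```python
class ChainedHashTable:
    def __init__(self, size: int = 11):
        # One chain per slot; colliding keys share a chain.
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value) -> None:
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable()
table.put(1, "one")
table.put(12, "twelve")  # 12 % 11 == 1, so this shares a chain with key 1
print(table.get(12))     # twelve
```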
Collisions:
When the Hash Function returns the same index for different keys.
list
a collection of data items arranged in a certain linear order
data structure
a particular scheme organizing related data items.
priority queue
collection of data items from a totally ordered universe
Algorithm
has input, produces output, definite, finite, operates on the data it is given
L N R
in-order traversal
Chaining
make each slot the head of a linked list
Average Lower bound for adjacent swaps
n(n-1)/4 Ω(n^2)
Recursive Algorithms
solve a problem by solving smaller internal instances of a problem -- work towards a base case.
Mergesort
split into sub-arrays Best: O(n log n) Avg: O(n log n) Worst: O(n log n)
Mnemonic
A device such as a pattern of letters, ideas, or associations that assists in remembering something
Hash Function
A function that takes in the key to compute a specific Hash Index.
Topological Sort
A linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering.
Set Partition
A partitioning of elements of some universal set into a collection of disjointed subsets. Thus, each element must be in exactly one subset.
Permutations
A permutation is an ordered combination. Repetition is Allowed: such as the lock above. It could be "333". No Repetition: for example the first three people in a running race. You can't be first and second. https://www.mathsisfun.com/combinatorics/combinations-permutations.html
Greedy Algorithms
Algorithm design patterns. Compute a solution in stages, making the choice that is locally optimal at each step; these choices are never undone.
hash Table:
An array that stores a collection of items.
Binary Search Tree
Avg height: O(log n) Worst height: O(n)
What is the goal of Hashing?
Do faster than O(LogN) time complexity for: lookup, insert, and remove operations. To achieve O(1)
Sharding
Sharding is a type of database partitioning that separates very large databases into smaller, faster, more easily managed parts called data shards.
Euclidean Algorithm GCD
def gcd(a, b):
    while a:
        b, a = a, b % a
    return b
Quicksort
partitioning Best: O(n log n) (or O(n) three-way) Avg: O(n log n) Worst: O(n^2)
Pre-Order Traversal
1. Process self. 2. Process left child. 3. Process right child.
How does Hashing work?
1. you have a key for the item. 2. the item's key gets churned within the hash function to form the Hash index. 3. The hash index can be applied to the data array, and so, the specific data is found.
memoization
An optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
Dynamic Programming
Break down a problem into smaller and smaller subproblems. At their lowest levels, the subproblems are solved and their answers stored in memory. These saved answers are used again with other larger (sub)problems which may call for a recomputation of the same information for their own answer. Reusing the stored answers allows for optimization by combining the answers of previously solved subproblems.
Double Checked Locking
Double-checked locking is a software design pattern used to reduce the overhead of acquiring a lock by first testing the locking criterion (the "lock hint") without actually acquiring the lock. Only if the locking criterion check indicates that locking is required does the actual locking logic proceed. (Often used in Singletons, and has issues in C++).
Depth-First Search
Explore the newest unexplored vertices first. Place discovered vertices on a stack (or use recursion). In an undirected graph, partitions edges into two classes: tree edges and back edges. Tree edges discover new vertices; back edges point to ancestors.
Radix Sort
Non-comparative integer sorting algorithm that sorts data with integer keys by grouping keys by the individual digits which share the same significant position and value. Two classifications of radix sorts are least significant digit (LSD) radix sorts and most significant digit (MSD) radix sorts.
Heap Sort
Non-stable, in place sort which has an order of growth of NlogN. Requires only one spot of extra space. Works like an improved version of selection sort. It divides its input into a sorted and unsorted region, and iteratively shrinks the unsorted region by extracting the smallest element and moving it into the sorted region. It will make use of a heap structure instead of a linear time search to find the minimum.
Selection Sort
Non-stable, in place sort. Has an N-squared order of growth, needs only one spot of extra space. Works by searching the entire array for the smallest item, then exchanging it with the first item in the array. Repeats this process down the entire array until it is sorted.
Typical runtime of a recursive function with multiple branches
O( branches^depth )
Preemption
Preemption is the act of temporarily interrupting a task being carried out by a computer system, without requiring its cooperation, and with the intention of resuming the task at a later time. Such a change is known as a context switch.
Compute XOR of every bit in an integer
Similar to addition, XOR is associative and commutative, so we can XOR the bits together in any order. First, XOR the top half with the bottom half. Then XOR the top half of that result with its bottom half, and so on, until the answer sits in the lowest bit (shown for a 64-bit x):
x ^= x >> 32
x ^= x >> 16
x ^= x >> 8
x ^= x >> 4
x ^= x >> 2
x ^= x >> 1
x = x & 1
Insertion Sort
Stable, in place sort with an order of growth which is between N and N-squared, needs only one spot of extra space and is dependent on the order of the items. Works by scanning over the list, then inserting the current item to the front of the list where it would fit sequentially. All the items to the left of the list will be sorted, but may not be in their final place as the larger items are continuously pushed back to make room for smaller items if necessary.
Heapify (bubble down)
Swap a node with one of its children, calling bubble_down on the node again until it dominates its children. Each time, place a node that dominates the others as the parent node.
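The bubble-down step might look like the sketch below, assuming a min-heap stored in a Python list where the children of index i live at 2i + 1 and 2i + 2. "Dominates" is taken to mean "is smaller than or equal to" here; the function names are illustrative.

```python
def bubble_down(heap, i: int) -> None:
    """Sink heap[i] until it is no larger than both of its children."""
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:  # node dominates its children: done
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest


def heapify(items) -> None:
    """Turn a list into a min-heap in place by sinking every internal node."""
    for i in range(len(items) // 2 - 1, -1, -1):
        bubble_down(items, i)


data = [9, 4, 7, 1, 3]
heapify(data)
print(data[0])  # 1
```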
How do you insert a value within the hash table?
Table[Hash(key)]=data;
Table Size(TS)
The Array's Length
Replace the lowest bit that is 1 with 0
x & (x - 1)
Compute x modulo a power of 2 (y)
x & (y - 1)
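Both bit tricks above are easy to check on small numbers. The function names are illustrative, and the modulo trick assumes y is a power of two and x is non-negative.

```python
def clear_lowest_set_bit(x: int) -> int:
    """Replace the lowest 1 bit of x with 0."""
    return x & (x - 1)


def mod_power_of_two(x: int, y: int) -> int:
    """Compute x % y, assuming y is a power of two."""
    return x & (y - 1)


print(bin(clear_lowest_set_bit(0b10110)))  # 0b10100
print(mod_power_of_two(77, 16))            # 13
```

Subtracting 1 flips the lowest set bit and everything below it, so the AND keeps only the bits above it; for a power of two, y - 1 is a mask of exactly the low bits that form the remainder.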
Prim's Algorithm
(Minimum Spanning Trees, O(m + nlogn), where m is number of edges and n is the number of vertices) Starting from a vertex, grow the rest of the tree one edge at a time until all vertices are included. Greedily select the best local option from all available choices without regard to the global structure.
Geometric series
.
Rolling hash function
A rolling hash (also known as a rolling checksum) is a hash function where the input is hashed in a window that moves through the input. A few hash functions allow a rolling hash to be computed very quickly: the new hash value is rapidly calculated given only the old hash value, the old value removed from the window, and the new value added to the window, similar to the way a moving average function can be computed much more quickly than other low-pass filters. One of the main applications is the Rabin-Karp string search algorithm.
Divide-and-conquer
Algorithm design patterns. Divide the problem into two or more smaller independent subproblems and solve the original problem using solutions to the subproblems.
Invariants
Algorithm design patterns. Identify an invariant and use it to rule out potential solutions that are suboptimal/dominated by other solutions.
Recursion
Algorithm design patterns. If the structure of the input is defined in a recursive manner, design a recursive algorithm that follows the input definition.
Sorting
Algorithm design patterns. Uncover some structure by sorting the input.
Floyd-Warshall Algorithm
An algorithm for finding shortest paths in a weighted graph with positive or negative edge weights (but with no negative cycles). A single execution of the algorithm will find the lengths (summed weights) of the shortest paths between all pairs of vertices, though it does not return details of the paths themselves.
Dijkstra's Algorithm
An algorithm for finding the shortest paths between nodes in a weighted graph. For a given source node in the graph, the algorithm finds the shortest path between that node and every other. It can also be used for finding the shortest paths from a single node to a single destination node by stopping the algorithm once the shortest path to the destination node has been determined. Its time complexity is O(E + VlogV), where E is the number of edges and V is the number of vertices.
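A short sketch of the algorithm using Python's `heapq` as the priority queue. Note the assumption: `heapq` is a binary heap, so this version runs in O((V + E) log V) rather than the O(E + V log V) bound quoted above, which needs a Fibonacci heap. The graph format (a dict mapping each vertex to a list of (neighbor, weight) pairs) and the function name are illustrative, and edge weights are assumed non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Return a dict of shortest distances from source to each reachable vertex."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter path to u was already found
        for v, weight in graph.get(u, []):
            candidate = d + weight
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 6)],
    "c": [("d", 3)],
    "d": [],
}
print(dijkstra(graph, "a"))  # {'a': 0, 'b': 1, 'c': 3, 'd': 6}
```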
Trie
In computer science, a trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are not necessarily associated with every node. Rather, values tend only to be associated with leaves, and with some inner nodes that correspond to keys of interest.
ShellSort
Non-stable, in place sort with an order of growth which is undetermined, though usually given at being N-to-the 6/5. Needs only one spot of extra space. Works as an extension of insertion sort. It gains speed by allowing exchanges of entries which are far apart, producing partially sorted arrays which are eventually sorted quickly at the end with an insertion sort. The idea is to rearrange the array so that every h-th entry yields a sorted sequence. The array is h-sorted.
Red-black Tree
Worst height: 2 log n
How do you delete a value within the hash table?
You just set Table[hash(Key)] = null
Greedy Algorithm
an algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. Hopefully finds the global optimum. An example would be Kruskal's algorithm.
Counting Sort
An algorithm for sorting a collection of objects according to keys that are small integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that have each distinct key value, and using arithmetic on those counts to determine the positions of each key value in the output sequence. Its running time is linear in the number of items and the difference between the maximum and minimum key values, so it is only suitable for direct use in situations where the variation in keys is not significantly greater than the number of items. However, it is often used as a subroutine in another sorting algorithm, radix sort, that can handle larger keys more efficiently. Because counting sort uses key values as indexes into an array, it is not a comparison sort, and the Ω(n log n) lower bound for comparison sorting does not apply to it.
Divide and Conquer
works by recursively breaking down a problem into two or more sub problems until the problems become simple enough to be solved directly. An example would be mergesort.
If we have UW-Madison student ID's, and we wanted the ideal hash functions, how would we do it, and why would there be a problem
-> We'd simply count each one as an index -> Hash table would be huge.
Bit Array
A bit array is a mapping from some domain (almost always a range of integers) to values in the set {0, 1}. The values can be interpreted as dark/light, absent/present, locked/unlocked, valid/invalid, et cetera. The point is that there are only two possible values, so they can be stored in one bit. As with other arrays, the access to a single bit can be managed by applying an index to the array. Assuming its size (or length) to be n bits, the array can be used to specify a subset of the domain (e.g. {0, 1, 2, ..., n−1}), where a 1-bit indicates the presence and a 0-bit the absence of a number in the set. This set data structure uses about n/w words of space, where w is the number of bits in each machine word. Whether the least significant bit (of the word) or the most significant bit indicates the smallest-index number is largely irrelevant, but the former tends to be preferred (on little-endian machines).
Non-Linear Data Structure
A data structure whose elements are not arranged in a single sequence, such as a graph or tree.
linked list
A sequence of zero or more nodes containing some data and pointers to other nodes of the list.
Data Structure
A way of organizing data in a computer so that it can be used efficiently, such as an array, linked list, stack, queue, or binary tree.
Doubly Linked List: access, search, insert, delete
Access: O(n) Search: O(n) Insert: O(1) Delete: O(1)
Recurrence Relation
An equation that is defined in terms of itself. Any polynomial or exponential can be represented by a recurrence.
Concrete Examples
Analysis pattern. Manually solve concrete instances of the problem and then build a general solution
Rabin-Karp
Compute the hash code of each substring whose length is the length of s, using a hash function with the property that the hash code of a string is an additive function of each individual character. Slide a window of characters over the text, update the hash code incrementally, and compare the actual characters only when the hash codes match.
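A sketch of the sliding-window search, assuming a polynomial rolling hash. The `base` and `modulus` values are arbitrary illustrative choices, and the function name `rabin_karp` is hypothetical. Matching hash codes are confirmed with a direct string comparison because different substrings can share a hash code.

```python
def rabin_karp(text: str, pattern: str, base: int = 256,
               modulus: int = 1_000_000_007) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1

    high = pow(base, m - 1, modulus)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        t_hash = (t_hash * base + ord(text[i])) % modulus

    for start in range(n - m + 1):
        # Verify on a hash match to rule out collisions.
        if p_hash == t_hash and text[start:start + m] == pattern:
            return start
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % modulus
    return -1


print(rabin_karp("abracadabra", "cad"))  # 4
```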
Dynamic Memory
Memory that is allocated as needed, and NOT contiguous (side-by-side), specifically during the implementation of a linked list style data structure, which also includes binary trees and graphs.
Topological Sorting
Receives a DAG as input, outputs an ordering of the vertices. Repeatedly selects a node with no incoming edges and removes its outgoing edges.
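The "select a node with no incoming edges" procedure is often called Kahn's algorithm; a minimal sketch follows. It assumes the graph is a dict mapping every vertex to a list of its successors, and the function name is illustrative.

```python
from collections import deque


def topological_sort(graph):
    """Return the vertices of a DAG so that every edge u -> v has u before v."""
    in_degree = {v: 0 for v in graph}
    for successors in graph.values():
        for w in successors:
            in_degree[w] += 1

    ready = deque(v for v in graph if in_degree[v] == 0)  # no incoming edges
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in graph[v]:       # "remove" v's outgoing edges
            in_degree[w] -= 1
            if in_degree[w] == 0:
                ready.append(w)

    if len(order) != len(graph):
        raise ValueError("graph contains a cycle, so no ordering exists")
    return order


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(topological_sort(graph))  # ['a', 'b', 'c', 'd']
```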
Tree Sort
Stable, O(n log n), Ω(n log n) : Put everything in the tree, traverse in-order.
Hamming Weight
The Hamming weight of a string is the number of symbols that are different from the zero-symbol of the alphabet used (also called the population count, popcount or sideways sum). Algorithm: count the set bits in pairs, then quads, then octets, etc., adding and shifting (shown for a 32-bit unsigned v):
v = v - ((v>>1) & 0x55555555);
v = (v & 0x33333333) + ((v>>2) & 0x33333333);
int count = ((v + (v>>4) & 0xF0F0F0F) * 0x1010101) >> 24;
Insertion & Quick Sort
Using both algorithms together is more efficient: quicksort's O(n log n) behavior only pays off on large arrays, so small subarrays are handed to insertion sort, which has less overhead.
Why do we use prime numbers for table size?
We mod often, and prime numbers give us the most unique numbers. (2*ts+1)
Selection Sort
An in-place comparison sort algorithm, O(n^2). The algorithm divides the input list into two parts: the sublist of items already sorted, which is built up from left to right at the front (left) of the list, and the sublist of items remaining to be sorted that occupy the rest of the list. Initially, the sorted sublist is empty and the unsorted sublist is the entire input list. The algorithm proceeds by finding the smallest (or largest, depending on sorting order) element in the unsorted sublist, exchanging (swapping) it with the leftmost unsorted element (putting it in sorted order), and moving the sublist boundaries one element to the right
Post-Order Traversal
1. Process left child. 2. Process right child. 3. Process self.
In-Order Traversal
1. Process left child. 2. Process self. 3. Process right child.
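The pre-order, in-order, and post-order cards differ only in where "process self" falls relative to the two recursive calls. A small sketch, with a hypothetical `Node` class and "process" taken to mean "append the value to a list":

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def pre_order(node):
    """Self, then left subtree, then right subtree (N L R)."""
    if node is None:
        return []
    return [node.value] + pre_order(node.left) + pre_order(node.right)


def in_order(node):
    """Left subtree, then self, then right subtree (L N R)."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


def post_order(node):
    """Left subtree, then right subtree, then self (L R N)."""
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]


root = Node(2, Node(1), Node(3))
print(pre_order(root))   # [2, 1, 3]
print(in_order(root))    # [1, 2, 3]
print(post_order(root))  # [1, 3, 2]
```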
Fibonacci Heap
A data structure that is a collection of trees satisfying the minimum-heap property, that is, the key of a child is always greater than or equal to the key of the parent. This implies that the minimum key is always at the root of one of the trees. The trees do not have a prescribed shape and in the extreme case the heap can have every element in a separate tree. This flexibility allows some operations to be executed in a "lazy" manner, postponing the work for later operations. For example, merging heaps is done simply by concatenating the two lists of trees, and operation decrease key sometimes cuts a node from its parent and forms a new tree. For the Fibonacci heap, the find-minimum operation takes constant (O(1)) amortized time. The insert and decrease key operations also work in constant amortized time. Deleting an element (most often used in the special case of deleting the minimum element) works in O(log n) amortized time, where n is the size of the heap. This means that starting from an empty data structure, any sequence of a insert and decrease key operations and b delete operations would take O(a + b log n) worst case time, where n is the maximum heap size. In a binary or binomial heap such a sequence of operations would take O((a + b) log n) time. A Fibonacci heap is thus better than a binary or binomial heap when b is smaller than a by a non-constant factor. It is also possible to merge two Fibonacci heaps in constant amortized time, improving on the logarithmic merge time of a binomial heap, and improving on binary heaps which cannot handle merges efficiently. Using Fibonacci heaps for priority queues improves the asymptotic running time of important algorithms, such as Dijkstra's algorithm for computing the shortest path between two nodes in a graph, compared to the same algorithm using other slower priority queue data structures.
Quick Select
A selection algorithm to find the kth smallest element in an unordered list. Quickselect uses the same overall approach as quicksort: it chooses one element as a pivot and partitions the data in two, according to whether each element is less than or greater than the pivot. However, instead of recursing into both sides, as in quicksort, quickselect only recurses into one side: the side containing the element it is searching for. This reduces the average complexity from O(n log n) to O(n); the worst case remains O(n^2).
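A minimal sketch of quickselect, assuming a 0-based k and a Lomuto-style partition with a random pivot (one of several possible partition schemes). The function names are chosen for illustration.

```python
import random


def lomuto_partition(a, lo, hi):
    # Move a random pivot to the end, then sweep smaller items to the front.
    r = random.randint(lo, hi)
    a[r], a[hi] = a[hi], a[r]
    pivot = a[hi]
    store = lo
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]
    return store  # final index of the pivot


def quickselect(items, k):
    """Return the k-th smallest element (k is 0-based) without fully sorting."""
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    a = list(items)  # work on a copy so the caller's list is untouched
    lo, hi = 0, len(a) - 1
    while lo < hi:
        p = lomuto_partition(a, lo, hi)
        if k == p:
            return a[p]
        if k < p:
            hi = p - 1  # continue only in the left part
        else:
            lo = p + 1  # continue only in the right part
    return a[lo]
```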
Inverted Index
An index data structure storing a mapping from content, such as words or numbers, to its locations in a database file, or in a document or a set of documents (named in contrast to a Forward Index, which maps from documents to content). The purpose of an inverted index is to allow fast full text searches, at a cost of increased processing when a document is added to the database.
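As a rough illustration, an inverted index over a few short documents might be built as below. The whitespace tokenization and the function names are simplifying assumptions, not a production design.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The extra work happens in `build_inverted_index` when documents are added; each lookup is then a few set intersections.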
Counting Sort (Key Indexed sort)
An integer sorting algorithm that counts the number of objects having each distinct key value, and then uses arithmetic on those counts to determine the position of each key value in the output array. It cannot handle large keys efficiently, and is often used as a subroutine for other sorting algorithms such as radix sort. Has a time complexity of O(N + k), where k is the range of key values; this is linear in N when k is small.
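A minimal sketch of key-indexed counting, assuming every key is an integer in `range(k)`. The parameter name `k` is chosen for this example.

```python
def counting_sort(keys, k):
    """Stable sort of integer keys, each assumed to lie in range(k)."""
    counts = [0] * (k + 1)
    for key in keys:
        counts[key + 1] += 1          # count occurrences, offset by one
    for r in range(k):
        counts[r + 1] += counts[r]    # counts[r] becomes the start index of key r
    out = [None] * len(keys)
    for key in keys:
        out[counts[key]] = key        # place the key at its next free position
        counts[key] += 1
    return out
```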
Internal Sorting
An internal sort is any data sorting process that takes place entirely within the main memory of a computer. This is possible whenever the data to be sorted is small enough to all be held in the main memory. For sorting larger datasets, it may be necessary to hold only a chunk of data in memory at a time, since it won't all fit. The rest of the data is normally held on some larger, but slower medium, like a hard disk. Any reading or writing of data to and from this slower medium can slow the sorting process considerably.
Binary Search
An ordered array of data which has efficiently supported operations. The worst and average case of a search using this structure is lgN. The worst case of an insertion is N, and the average case of an insertion is N/2, because the items after the insertion point must be shifted.
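A minimal sketch of the lgN search over a sorted Python list. Returning -1 for a missing key is a convention chosen for this example.

```python
def binary_search(a, key):
    """Return an index of key in the sorted list a, or -1 if it is absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if key < a[mid]:
            hi = mid - 1   # key can only be in the left half
        elif key > a[mid]:
            lo = mid + 1   # key can only be in the right half
        else:
            return mid
    return -1
```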
Iterative Refinement
Analysis pattern. Most problems can be solved using a brute-force approach. Find such a solution and improve upon it.
Case Analysis
Analysis pattern. Split the input/execution into a number of cases and solve each case in isolation.
Heuristics
Any approach to problem solving, learning, or discovery that employs a practical method not guaranteed to be optimal or perfect, but sufficient for the immediate goals. Where finding an optimal solution is impossible or impractical, heuristic methods can be used to speed up the process of finding a satisfactory solution. Heuristics can be mental shortcuts that ease the cognitive load of making a decision. Examples of this method include using a rule of thumb, an educated guess, an intuitive judgment, stereotyping, profiling, or common sense.
Transitive Closure
Can one get from node a to node d in one or more hops? A binary relation tells you only that node a is connected to node b, and that node b is connected to node c, etc. After the transitive closure is constructed one may determine that node d is reachable from node a. (use Floyd-Warshall Algorithm)
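A minimal sketch of the boolean (Warshall) form of Floyd-Warshall, assuming nodes are numbered 0 to n-1 and edges are directed pairs. It answers "reachable in one or more hops" for every pair.

```python
def transitive_closure(n, edges):
    """Return reach, where reach[i][j] is True if j is reachable from i."""
    reach = [[False] * n for _ in range(n)]
    for u, v in edges:
        reach[u][v] = True
    # Allow each node k in turn to act as an intermediate hop.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach
```

With nodes a, b, c, d numbered 0 to 3 and edges a→b, b→c, c→d, the closure reports that d is reachable from a but not the reverse.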
Solving Divide-and-Conquer Recurrences
Case 1: Too many leaves. Case 2: Equal work per level. Case 3: Too expensive a root
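Assuming the three cases refer to the standard master theorem for recurrences of the form T(n) = aT(n/b) + f(n), they can be written out as follows. The symbols c, ε, and k are introduced here for the statement.

```latex
\begin{align*}
T(n) &= a\,T(n/b) + f(n), \qquad a \ge 1,\ b > 1,\ c = \log_b a \\
\text{Case 1 (leaves dominate): } & f(n) = O(n^{c-\varepsilon}) \text{ for some } \varepsilon > 0
  \implies T(n) = \Theta(n^{c}) \\
\text{Case 2 (equal work per level): } & f(n) = \Theta(n^{c})
  \implies T(n) = \Theta(n^{c} \log n) \\
\text{Case 3 (root dominates): } & f(n) = \Omega(n^{c+\varepsilon}) \text{ and } a\,f(n/b) \le k\,f(n) \text{ for some } k < 1
  \implies T(n) = \Theta(f(n))
\end{align*}
```

For example, mergesort has T(n) = 2T(n/2) + Θ(n), so c = 1 and Case 2 gives Θ(n log n).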
Breadth-First Search
Explores the oldest unexplored vertices first. Places discovered vertices in a queue. In an undirected graph: Assigns a direction to each edge, from the discoverer to the discovered, and the discoverer is denoted to be the parent.
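A minimal sketch of breadth-first search, assuming the graph is given as a dictionary of adjacency lists. It records the discoverer of each vertex as its parent, as described above.

```python
from collections import deque


def bfs(graph, source):
    """Return a dict mapping each reachable vertex to its BFS parent."""
    parent = {source: None}      # the source has no discoverer
    queue = deque([source])
    while queue:
        u = queue.popleft()      # oldest discovered vertex first
        for v in graph[u]:
            if v not in parent:  # first time v is discovered
                parent[v] = u    # edge directed from discoverer to discovered
                queue.append(v)
    return parent
```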
External Sorting
External sorting is a term for a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead they must reside in the slower external memory (usually a hard drive). External sorting typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted subfiles are combined into a single larger file. Mergesort is typically preferred.
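The sort and merge phases can be sketched on a small scale as below. This is only an illustration: it assumes integer values stored one per line in temporary files, and `chunk_size` stands in for the amount of data that fits in main memory.

```python
import heapq
import tempfile


def _write_run(sorted_chunk):
    # Sorting phase output: one sorted run written to a temporary file.
    run = tempfile.TemporaryFile(mode="w+")
    run.writelines(f"{value}\n" for value in sorted_chunk)
    run.seek(0)
    return run


def _read_run(run):
    # Stream a run back one value at a time instead of loading it whole.
    for line in run:
        yield int(line)


def external_sort(values, chunk_size):
    """Yield the integers in values in sorted order using on-disk runs."""
    runs = []
    chunk = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            runs.append(_write_run(sorted(chunk)))
            chunk = []
    if chunk:
        runs.append(_write_run(sorted(chunk)))
    try:
        # Merge phase: k-way merge of the sorted runs.
        yield from heapq.merge(*(_read_run(run) for run in runs))
    finally:
        for run in runs:
            run.close()
```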
Quick Sort
Non-stable, in-place sort with an average order of growth of NlogN (N^2 in the worst case). Needs lgN of extra space. It has a probabilistic guarantee. Works by making use of a divide and conquer method. The array is divided into two parts, and then the parts are sorted independently. An arbitrary value is chosen as the partitioning item. Afterwards, all items which are larger than this value go to the right of it, and all items which are less than this value go to the left of it. We arbitrarily choose a[lo] as the partitioning item. Then we scan from the left end of the array until we find an entry that is greater than or equal to a[lo]. Likewise, we scan from the right end of the array toward the left until we find an entry that is less than or equal to a[lo]. Once we find these two values, we swap them. We repeat until the two scans cross, then swap a[lo] into its final position between the two parts.
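A minimal sketch of this scheme with a[lo] as the partitioning item and two scans that move toward each other. The initial shuffle is an assumption added to provide the probabilistic guarantee; function names are chosen for the example.

```python
import random


def partition(a, lo, hi):
    v = a[lo]                      # partitioning item
    i, j = lo + 1, hi
    while True:
        while i <= hi and a[i] < v:
            i += 1                 # scan right for an entry >= v
        while a[j] > v:
            j -= 1                 # scan left for an entry <= v (stops at lo)
        if i >= j:
            break                  # scans have crossed
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1
    a[lo], a[j] = a[j], a[lo]      # put the partitioning item in place
    return j


def quicksort(a):
    """Sort the list a in place."""
    random.shuffle(a)              # guards against bad input orderings

    def sort(lo, hi):
        if lo >= hi:
            return
        p = partition(a, lo, hi)
        sort(lo, p - 1)
        sort(p + 1, hi)

    sort(0, len(a) - 1)
```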
Load Factor (LF)
Number of items / table size. For instance, a load factor of 1 means there are as many items as table slots (with open addressing, 100% of the slots are used).
How do you look up a value within the hash table?
return Table[Hash(key)];
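The one-line lookup above assumes each slot holds exactly one item. A slightly fuller sketch, assuming chained buckets to handle collisions, is shown below; the class and method names are hypothetical.

```python
class ChainedHashTable:
    def __init__(self, size=11):
        self.table = [[] for _ in range(size)]  # one bucket (list) per slot
        self.count = 0

    def _index(self, key):
        # Reduce the hash code to a valid slot index.
        return hash(key) % len(self.table)

    def put(self, key, value):
        bucket = self.table[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self.count += 1

    def get(self, key):
        # Hash to the bucket, then scan it for the matching key.
        for existing_key, value in self.table[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)

    def load_factor(self):
        return self.count / len(self.table)
```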
Type erasure
Type erasure is any technique in which a single type can be used to represent a wide variety of types that share a common interface. In C++, the term type erasure is strongly associated with the particular technique that uses templates in the interface and dynamic polymorphism in the implementation. 1. A union is the simplest form of type erasure. - It is bounded, and all participating types have to be mentioned at the point of declaration. 2. A void pointer is a low-level form of type erasure. Functionality is provided by pointers to functions that operate on void* after casting it back to the appropriate type. - It is unbounded, but type unsafe. 3. Virtual functions offer a type-safe form of type erasure. The underlying void and function pointers are generated by the compiler. - It is unbounded, but intrusive. - Has reference semantics. 4. A template-based form of type erasure provides a natural C++ interface. The implementation is built on top of dynamic polymorphism. - It is unbounded and unintrusive. - Has value semantics.
Right propagate the rightmost set bit in x
x | ((x & ~(x - 1)) - 1)
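A short sketch of the bit trick, assuming x is a positive integer. Note that subtraction binds tighter than `&`, so the isolated lowest set bit must be parenthesized before subtracting 1.

```python
def right_propagate(x):
    """Set every bit below the rightmost set bit of x (x assumed positive)."""
    lowest = x & ~(x - 1)     # isolate the rightmost set bit
    return x | (lowest - 1)   # lowest - 1 is a mask of all bits below it
```

For example, 0b01010000 becomes 0b01011111. The same result can be obtained with the shorter expression `x | (x - 1)`.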
Binary Search Tree
Will have a best case height of lgN. This is also its expected height. In the worst case, it will have a height of N, and thus become similar to a linked list. Works by inserting nodes of lesser values to the left of a node, and inserting greater values to the right of the node, traversing down the tree until we reach a blank spot to insert. Has a worst case cost of N to search for and insert a node. The average case of searching will be 1.39lgN compares.
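A minimal sketch of insertion and search as described above. The class name, the choice to ignore duplicate keys, and counting height in nodes are assumptions made for this example.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return BSTNode(key)       # reached a blank spot
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def search(root, key):
    """Return True if key is in the tree rooted at root."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


def height(root):
    """Number of nodes on the longest root-to-leaf path (0 for an empty tree)."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))
```

Inserting keys in sorted order produces the worst case: every node has only a right child, so the height equals N.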