Queue, GRAPHS, Data Structures, Algorithms & Data S
Shell Sort
-Uses insertion sort with a gap size -Gap is initially n/2 -Each iteration reduces the gap -Repeat until gap=1 -Difficult to analyze runtime
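The steps above can be sketched in Python (a minimal sketch; the function name is illustrative):

```python
def shell_sort(a):
    """Insertion sort with a shrinking gap; gap starts at n // 2."""
    n = len(a)
    gap = n // 2
    while gap >= 1:
        # Gapped insertion sort: each gap-separated subsequence gets sorted.
        for i in range(gap, n):
            item = a[i]
            j = i
            while j >= gap and a[j - gap] > item:
                a[j] = a[j - gap]
                j -= gap
            a[j] = item
        gap //= 2  # each iteration reduces the gap; last pass is gap = 1
    return a
```

The final gap = 1 pass is a plain insertion sort, but by then the array is nearly sorted, which is where insertion sort is fast.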
In Place
-Uses less than O(n) memory overhead -Merge sort is the only method we know of that is not in place
Overheads of Recursion
-Memory -Runtime -Allocation of activation records -System calls
Quick Sort
-Take median of 3 elements to select pivot -Split array about pivot -Once array is broken down enough, use insertion sort for fastest runtime. -O(nlogn) best case -O(n^2) worst case
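A sketch of the scheme above (median-of-3 pivot plus an insertion-sort cutoff for small ranges; the cutoff value of 10 is an assumed choice):

```python
def quick_sort(a, lo=0, hi=None, cutoff=10):
    """Median-of-3 pivot; falls back to insertion sort on small ranges."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= cutoff:
        # Once the range is broken down enough, insertion sort is fastest.
        for i in range(lo + 1, hi + 1):
            item, j = a[i], i
            while j > lo and a[j - 1] > item:
                a[j] = a[j - 1]
                j -= 1
            a[j] = item
        return a
    mid = (lo + hi) // 2
    # Median-of-3: order first, middle, last; the middle becomes the pivot.
    if a[mid] < a[lo]: a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]: a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]: a[mid], a[hi] = a[hi], a[mid]
    pivot = a[mid]
    i, j = lo, hi
    while i <= j:  # split the array about the pivot
        while a[i] < pivot: i += 1
        while a[j] > pivot: j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1; j -= 1
    quick_sort(a, lo, j, cutoff)
    quick_sort(a, i, hi, cutoff)
    return a
```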
Implementation
-The how of a data structure -Data members -Code for operations
Dynamic Binding
-The method code executed at run time depends on the object's type, not the reference type
Encapsulation
Objects combine data and operations
Binary Search Tree: search, insert, delete
Search: O(h) / balanced, O(lg n) Insert: O(h) / balanced, O(lg n) Delete: O(h) / balanced, O(lg n)
Field
an item of data
Vertex
an object in a graph - also known as a node
topological sort
an ordering of vertices in a directed acyclic graph (edges have direction, no cycles exist to return to a given vertex after leaving it)
What operation retrieves the front item of the queue?
dequeue()
B-Tree Insertion (Average)
O(log(n))
Cartesian Tree Insertion (Average)
O(log(n))
Red-Black Tree Search (Average)
O(log(n))
AVL Tree Space Complexity
O(n)
What is the goal of Hashing?
Do better than O(log n) time complexity for lookup, insert, and remove operations; ideally achieve O(1)
What would the Perfect Hash Function be?
Each key maps to a unique hash index (no collisions).
leaf
A vertex with no children
ADT
Abstract Data Type: a collection of data and a set of operations on it (e.g., add, remove, change)
greedy algorithm
Dijkstra's, Prim's, Kruskal's
Big-oh: Worst case
Longest time for execution: f(n) is O(g(n))
The number of edges in a tree
One less than the number of vertices
Direct-access table: search
Search: O(1)
Basic Sort
Selection/Bubble/Insertion
Abstraction
Separate the purpose of a module from its implementation; the module can be used without knowing its implementation
Multiway trees
Ternary tree K-ary tree And-or tree (a,b)-tree Link/cut tree SPQR-tree Spaghetti stack Disjoint-set data structure Fusion tree Enfilade Exponential tree Fenwick tree Van Emde Boas tree Rose tree
Root
The base level node in a tree; the node that has no parent.
Hashing
The implementation of hash tables
Length
The length of a path or a cycle is its number of edges.
depth of a vertex v
The length of the simple path from the root to v
Out degree
The number of children for that node
Length
The number of edges it contains
Articulation Vertex
A vertex whose removal disconnects the graph (the weakest point in a graph)
Separate Chaining
Uses a linked list to handle collisions at a specific point.
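A minimal separate-chaining sketch (class and method names are illustrative; Python lists stand in for the linked lists):

```python
class ChainedHashTable:
    """Each slot holds a chain of (key, value) pairs that hash to it."""
    def __init__(self, size=11):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # collision or new key: extend the chain

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)
```

Colliding keys simply share a chain, so the table never "fills up"; performance degrades gradually as the load factor grows.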
position
a place within a list where a single element is stored. Not equivalent to index.
Mergesort
a recursive sorting algorithm that runs in O(NlogN)
linear data structures
arrays and lists; their elements form a sequence
Algorithm Analysis
estimating how much time (and memory) an algorithm needs as a function of input size
dictionary ADT
isEmpty; getNumberOf; add; remove; clear; getValue; contains; traverse;
1 + 2 + ... + n = ?
n(n+1)/2
last node
the rightmost node of maximum depth
stack operations
push(e), pop(), top(), size(), isEmpty()
R-tree*
space partitioning or binary space partitioning data structure. tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984[1] and has found significant use in both theoretical and applied contexts.[2] A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries such as "Find all museums within 2 km of my current location", "retrieve all road segments within 2 km of my location" (to display them in a navigation system) or "find the nearest gas station" (although not taking roads into account). The R-tree can also accelerate nearest neighbor search[3] for various distance metrics, including great-circle distance. -The key idea of the data structure is to group nearby objects and represent them with their minimum bounding rectangle in the next higher level of the tree; the "R" in R-tree is for rectangle. Since all objects lie within this bounding rectangle, a query that does not intersect the bounding rectangle also cannot intersect any of the contained objects. At the leaf level, each rectangle describes a single object; at higher levels the aggregation of an increasing number of objects. This can also be seen as an increasingly coarse approximation of the data set. -Similar to the B-tree, the R-tree is also a balanced search tree (so all leaf nodes are at the same height), organizes the data in pages, and is designed for storage on disk (as used in databases). Each page can contain a maximum number of entries, often denoted as M.
It also guarantees a minimum fill (except for the root node), however best performance has been experienced with a minimum fill of 30%-40% of the maximum number of entries (B-trees guarantee 50% page fill, and B*-trees even 66%). The reason for this is the more complex balancing required for spatial data as opposed to linear data stored in B-trees. -As with most trees, the searching algorithms (e.g., intersection, containment, nearest neighbor search) are rather simple. The key idea is to use the bounding boxes to decide whether or not to search inside a subtree. In this way, most of the nodes in the tree are never read during a search. Like B-trees, this makes R-trees suitable for large data sets and databases, where nodes can be paged to memory when needed, and the whole tree cannot be kept in main memory. -The key difficulty of R-trees is to build an efficient tree that on one hand is balanced (so the leaf nodes are at the same height) on the other hand the rectangles do not cover too much empty space and do not overlap too much (so that during search, fewer subtrees need to be processed). For example, the original idea for inserting elements to obtain an efficient tree is to always insert into the subtree that requires least enlargement of its bounding box. Once that page is full, the data is split into two sets that should cover the minimal area each. Most of the research and improvements for R-trees aims at improving the way the tree is built and can be grouped into two objectives: building an efficient tree from scratch (known as bulk-loading) and performing changes on an existing tree (insertion and deletion). -R-trees do not guarantee good worst-case performance, but generally perform well with real-world data.[5] While more of theoretical interest, the (bulk-loaded) Priority R-tree variant of the R-tree is worst-case optimal,[6] but due to the increased complexity, has not received much attention in practical applications so far. 
-When data is organized in an R-tree, the k nearest neighbors (for any Lp-Norm) of all points can efficiently be computed using a spatial join.[7] This is beneficial for many algorithms based on the k nearest neighbors, for example the Local Outlier Factor. DeLi-Clu,[8] Density-Link-Clustering is a cluster analysis algorithm that uses the R-tree structure for a similar kind of spatial join to efficiently compute an OPTICS clustering.
Merging
take one heap and combine it with another
Linear Probing
Checks each spot in order to find available location, causes primary clustering.
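A sketch of linear probing (the table layout, a list of None/(key, value) slots, and the function name are assumptions):

```python
def linear_probe_insert(table, key, value):
    """Insert into an open-addressing table (list of None or (k, v) slots)."""
    n = len(table)
    index = hash(key) % n
    for step in range(n):
        slot = (index + step) % n   # check each spot in order, wrapping around
        if table[slot] is None or table[slot][0] == key:
            table[slot] = (key, value)
            return slot
    raise RuntimeError("table is full")
```

Note how colliding keys end up in adjacent slots; those runs of occupied slots are the primary clustering the card mentions.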
queue insertion
FIFO (first in, first out)
Insertion sort
Side-by-side comparison Best: O(n) Avg: O(n^2) Worst: O(n^2)
Topological sort
Some things have to come before others Ex: getting dressed, course prereqs Not necessarily unique
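The ordering can be computed with Kahn's algorithm, for example (an assumed choice; the names below are illustrative):

```python
from collections import deque

def topological_sort(vertices, edges):
    """Kahn's algorithm: repeatedly remove vertices with no incoming edges."""
    indegree = {v: 0 for v in vertices}
    adjacent = {v: [] for v in vertices}
    for u, v in edges:                # edge u -> v: u must come before v
        adjacent[u].append(v)
        indegree[v] += 1
    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adjacent[u]:
            indegree[v] -= 1
            if indegree[v] == 0:      # all prerequisites of v are done
                ready.append(v)
    if len(order) != len(vertices):
        raise ValueError("graph has a cycle")
    return order
```

Several vertices can be "ready" at once, which is why the result is not necessarily unique.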
Analyze Algorithms
Space -> Memory Time-> Time to execute program
What is a class?
Template for creating objects of the same type.
tree
a set of nodes storing elements such that the nodes have a parent-child relationship
Array
a set of related data items stored under a single identifier. Can work on one or more dimensions
Interrupt
a signal sent by a device or program requesting its attention
Selection Sort
Finds largest number and moves it to end of array(sorting method) Good choice when data moves are costly, but comparisons are not. O(n^2)
Examples of queue
People waiting for a bus; print jobs sharing one printer
3. Knowing that hash table size is 10 and knowledge of the expected keys is 10, 20, 30, 40, .... How do you think the hash function key % 10 will perform (well or poor)? Why?
Poor: all these keys collide (each of those numbers mod 10 equals 0). The table size should be a prime number to reduce collisions.
B-Trees
Popular in disk storage. Keys are in nodes. Data is in the leaves.
L R N
Postorder traversal (Reverse Polish)
What is the efficiency of Quick Sort?
Quick sort is O(n log n) in the average case but O(n^2) in the worst case. The choice of pivots affects its behavior.
TreeMap underlying Structure:
RBT
height of a tree
maximum depth of any node
Hopscotch hashing
new algorithm that tries to improve on the classic linear probing algorithm.
unique_ptr
C++11 smart pointer (std::unique_ptr); no other pointer can reference the same object
internal node
node with at least one child
external node
node without children (leaf)
depth of a node
number of ancestors
Directory
in extendible hashing, the table of bucket pointers; it has 2^d entries, where d is the number of key bits currently used (the global depth)
Recursive Algorithms
solve a problem by solving smaller internal instances of a problem -- work towards a base case.
subtree
tree consisting of a node and its descendants
Range of |E|
zero to |V|^2 - |V|
Length
the number of edges in the path
What is the relative magnitudes of common growth rate function as supplied in the text? (10)
1 < log(log n) < log n < log^2 n < n < n log n < n^2 < n^3 < 2^n < n!
Acyclic
A graph without cycles
Pseudo graph
Can have multiple edges and loops
XOR linked list*
List data structure. a data structure that takes advantage of the bitwise XOR operation to decrease storage requirements for doubly linked lists. A bitwise XOR takes two bit patterns of equal length and performs the logical exclusive OR operation on each pair of corresponding bits. The result in each position is 1 if only the first bit is 1 or only the second bit is 1, but will be 0 if both are 0 or both are 1. In this we perform the comparison of two bits, being 1 if the two bits are different, and 0 if they are the same. 0101 (decimal 5) XOR 0011 (decimal 3) = 0110 (decimal 6) The bitwise XOR may be used to invert selected bits in a register (also called toggle or flip). Any bit may be toggled by XORing it with 1. For example, given the bit pattern 0010 (decimal 2) the second and fourth bits may be toggled by a bitwise XOR with a bit pattern containing 1 in the second and fourth positions: 0010 (decimal 2) XOR 1010 (decimal 10) = 1000 (decimal 8) This technique may be used to manipulate bit patterns representing sets of Boolean states. -Assembly language programmers sometimes use XOR as a short-cut to setting the value of a register to zero. Performing XOR on a value against itself always yields zero, and on many architectures this operation requires fewer clock cycles and memory than loading a zero value and saving it to the register. -An ordinary doubly linked list stores addresses of the previous and next list items in each list node, requiring two address fields: ... A B C D E ... -> next -> next -> next -> <- prev <- prev <- prev <- An XOR linked list compresses the same information into one address field by storing the bitwise XOR (here denoted by ⊕) of the address for previous and the address for next in one field: ... A B C D E ... <-> A⊕C <-> B⊕D <-> C⊕E <-> When you traverse the list from left to right: supposing you are at C, you can take the address of the previous item, B, and XOR it with the value in the link field (B⊕D). 
You will then have the address for D and you can continue traversing the list. The same pattern applies in the other direction. -To start traversing the list in either direction from some point, you need the address of two consecutive items, not just one. If the addresses of the two consecutive items are reversed, you will end up traversing the list in the opposite direction. -Drawbacks: ...General-purpose debugging tools cannot follow the XOR chain, making debugging more difficult; [1] ...The price for the decrease in memory usage is an increase in code complexity, making maintenance more expensive; ...Most garbage collection schemes do not work with data structures that do not contain literal pointers; ...XOR of pointers is not defined in some contexts (e.g., the C language), although many languages provide some kind of type conversion between pointers and integers; ...The pointers will be unreadable if one isn't traversing the list — for example, if the pointer to a list item was contained in another data structure; ...While traversing the list you need to remember the address of the previously accessed node in order to calculate the next node's address. ...XOR linked lists do not provide some of the important advantages of doubly linked lists, such as the ability to delete a node from the list knowing only its address or the ability to insert a new node before or after an existing node when knowing only the address of the existing node. -Computer systems have increasingly cheap and plentiful memory, and storage overhead is not generally an overriding issue outside specialized embedded systems. Where it is still desirable to reduce the overhead of a linked list, unrolling provides a more practical approach (as well as other advantages, such as increasing cache performance and speeding random access).
What defines a complete binary tree?
A complete binary tree is full to its next-to-last level, and its leaves on the last level are filled from left to right.
tree
A connected, acyclic graph
subgraph
A subgraph is a subset of a graph's edges (and associated vertices) that constitutes a graph.
Array: access, search, insert, delete
Access: O(1) Search: O(n) Insert: O(n) Delete: O(n)
Doubly Linked List: access, search, insert, delete
Access: O(n) Search: O(n) Insert: O(1) Delete: O(1)
Stacks ADT
Add new item/Remove most recent (isEmpty/push/pop/peek)
hash Table:
An array that stores a collection of items.
Hash tables: memory
An implementation of a dictionary. Memory: O(n)
strongly connected digraph
A digraph is strongly connected if there is a directed path from every vertex to every other vertex.
directed acyclic graph
A directed acyclic graph (or DAG) is a digraph with no directed cycles.
directed cycle
A directed cycle is a directed path (with at least one edge) whose first and last vertices are the same.
directed path
A directed path in a digraph is a sequence of vertices in which there is a (directed) edge pointing from each vertex in the sequence to its successor in the sequence.
Sparse Graph
A graph in which the number of edges is close to the minimal number of edges. Sparse graphs can be disconnected
connected components
A graph that is not connected consists of a set of connected components, which are maximal connected subgraphs.
sparse graph
A graph with few edges relative to the number of vertices
How many interfaces can a class implement?
A class can implement more than one interface
Weighting:
Emphasizing some parts of the key over others.
Graph: definition
Finite set of vertices connected by edges, directed or not.
Queue
First In First Out; New elements enter at back, items leave from the front
graph (formal)
G = (V,E). V: finite, nonempty set of vertices. E: set of pairs of V, called edges.
Hash tree
Hash data structure. may refer to: -Hashed array tree -Hash tree (persistent data structure), an implementation strategy for sets and maps -Merkle tree
Complete Binary Tree
In a complete binary tree every level, except possibly the last, is completely filled, and all nodes in the last level are as far left as possible.
General Tree Implementations
List of Children, The Left-Child/Right-Sibling, Dynamic Node
OOAD
Object Oriented Analysis and Design; Consists of interacting classes and objects
OOD
Object Oriented Design; describes solution to problem; express solution in terms of objects
How does a selection sort work?
The smallest item is swapped into the first position, then the smallest of the remaining items into the second position. Rinse and repeat until everything is in order.
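A minimal sketch of the process (the function name is illustrative):

```python
def selection_sort(a):
    """Repeatedly swap the smallest remaining item into the front."""
    for i in range(len(a) - 1):
        smallest = i
        for j in range(i + 1, len(a)):
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]   # at most one move per pass
    return a
```

Only one swap happens per pass, which is why selection sort suits data where moves are costly but comparisons are cheap.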
Connected Components
The maximally connected subgraphs of an undirected graph are called ________
Degree of a Vertex
The number of edges incident to the vertex, with loops counted twice
outdegree and indegree
The outdegree of a vertex is the number of edges pointing from it. The indegree of a vertex is the number of edges pointing to it.
Children
The term used in trees to indicate a node that extends from another node, such as left child and right child in a binary tree.
Depth-first search
Use a stack or recursion to search tree
Lower order terms
When given an approximation of the rate of growth, we tend to drop lower order terms because they are less significant; e.g., y = x^2 + x + 1 is O(x^2)
cache
a high-speed temporary area of memory
function template
a pattern for a function that can work with many data types.
linked list
an alternative to array for storing a sequence of objects
If a graph's edges are unordered [ (u,v) == (v,u)], then the vertices u and v are connected by
an undirected edge (u,v).
Inversion
any pair of values where the value closer to the beginning of the list is larger than a value later in the list
Variable-length array (VLA or also called variable-sized, runtime-sized)
array data structure. an array data structure of automatic storage duration whose length is determined at run time (instead of at compile time). supported in languages such as Ada, ALGOL 68, APL, and C99.
Ω-notation
asymptotic lower bound
Threaded binary tree
binary tree data structure. a binary tree variant that allows fast traversal: given a pointer to a node in a threaded tree, it is possible to cheaply find its in-order successor (and/or predecessor). -types of threaded binary trees: 1) Single Threaded: each node is threaded towards either the in-order predecessor or successor (left or right). 2) Double threaded: each node is threaded towards both the in-order predecessor and successor (left and right).
priority queue
collection of data items from a totally ordered universe
pseudocode
high-level description of an algorithm
big-Theta notation
an asymptotically tight bound: the function grows at the same rate as g(n), bounded above and below
queue size
(r - f + N) mod N
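The formula above assumes a circular array of capacity N with front index f and rear index r (one past the last item). A sketch, leaving one slot empty to tell full from empty (an assumed convention):

```python
class CircularQueue:
    """Array-backed queue; size is (r - f + N) % N."""
    def __init__(self, capacity):
        self.N = capacity + 1        # one spare slot distinguishes full/empty
        self.data = [None] * self.N
        self.f = 0                   # index of the front item
        self.r = 0                   # index one past the last item

    def size(self):
        return (self.r - self.f + self.N) % self.N

    def enqueue(self, item):
        if self.size() == self.N - 1:
            raise OverflowError("queue is full")
        self.data[self.r] = item
        self.r = (self.r + 1) % self.N

    def dequeue(self):
        if self.size() == 0:
            raise IndexError("queue is empty")
        item = self.data[self.f]
        self.f = (self.f + 1) % self.N
        return item
```

Adding N before taking the modulus keeps the result non-negative even when r has wrapped around below f.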
Graphs
"Uber" data structure. Shows connections between objects. Can be displayed as either a matrix or linked list representation.
Load Factor
#items(n) / table size
Adjacency List
- an array of linked lists - each list represents a vertex and vertices it connects to
digraph
A graph whose every edge is directed
Queue Data Structure
-First in first out -Operations: -enqueue -dequeue -get front -Can implement with a singly linked chain or a circular linked chain -Can use to accomplish level order traversal
Array Bucket
-- a bucket of arrays. -Fixed in size. -A size of about 3 usually works well.
Kruskal's Minimum Spanning Tree Algorithm
1) Sort Edges 2) Add the shortest edge to MST, if cycle is created, remove that edge 3) Repeat 2 until |v|-1 edges are successfully added
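A sketch of the three steps, using a simple union-find to detect cycles (the edge format (weight, u, v) and names are assumptions):

```python
def kruskal(num_vertices, edges):
    """edges: list of (weight, u, v). Returns the list of MST edges."""
    parent = list(range(num_vertices))

    def find(x):                          # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for weight, u, v in sorted(edges):    # 1) sort edges by weight
        ru, rv = find(u), find(v)
        if ru != rv:                      # 2) skip edges that would form a cycle
            parent[ru] = rv
            mst.append((weight, u, v))
        if len(mst) == num_vertices - 1:  # 3) stop at |V| - 1 edges
            break
    return mst
```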
# of elements in a binary tree
At most 2^L - 1 nodes, where L is the number of levels (rows)
dense graph
A graph with very few possible edges missing
Heap Binary Tree: definition
A binary tree with two additional constraints: Shape - complete tree Heap property - max/min heap
forest
A forest is a disjoint set of trees.
entry
A key-value pair
Rooted Binary Tree
A rooted binary tree has a root node and every node has at most two children.
ordered tree
A rooted tree in which all children of each vertex are ordered. (Usually left to right)
Heap
A type of priority queue. Stores data which is order-able. O(1) access to highest priority item.
What is a stable sort?
A type of sort that preserves the relative order of equal elements.
parental
A vertex with at least one child
Counting Sort (Key Indexed sort)
An integer sorting algorithm which counts the number of objects that have each distinct key value, and then uses arithmetic on those counts to determine the position of each key value in the output array. It cannot handle large keys efficiently, and is often used as a subroutine for other sorting algorithms such as radix sort. Runs in O(n + k) time, where k is the range of keys.
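A minimal sketch for keys in 0..max_key (names are illustrative):

```python
def counting_sort(keys, max_key):
    """Stable counting sort for integer keys in range 0..max_key."""
    counts = [0] * (max_key + 1)
    for k in keys:                      # count occurrences of each key
        counts[k] += 1
    for i in range(1, max_key + 1):     # prefix sums give final end positions
        counts[i] += counts[i - 1]
    output = [None] * len(keys)
    for k in reversed(keys):            # walk backwards to keep the sort stable
        counts[k] -= 1
        output[counts[k]] = k
    return output
```

The backwards pass is what makes it stable, which matters when counting sort is used as the per-digit subroutine inside radix sort.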
Vertex
An object in a graph.
Base case for calculating n! (n factorial)
if (n ==0) return 1
Two vertices
Are connected by an edge
Are multiple items with the same key allowed in dictionaries ?
yes
Algorithm efficiency calculation
Calculate the number of operations for the best, average, and worst cases
Data Structures
Collection of data items in memory of a running program that are organized in some fashion that allows items to be stored and retrieved by some fixed methods
Each edge
Connects two vertices
Union Find
Data structure used to make sure a cycle is not created in a MST.
Collision
Entering into a space already in use.
Selection sort
Find smallest, put at beginning Best: O(n^2) Avg: O(n^2) Worst: O(n^2)
Queue
First in, first out (FIFO)
Pseudo graphs
Graphs that include loops and possibly multiple edges connecting the same pairs of vertices
What's the difference between Overloading and Overriding?
If a method in a subclass does not match the method of its superclass exactly in its parameter list, then it only overloads the method. Both methods can be accessed.
heap vs BST
If one knows the maximum number of items in the priority queue, a heap is the better implementation; equal-priority items stay in the order encountered.
Directed graph
If (u, v) is a directed edge, u is said to be adjacent to v and v adjacent from u; u is called the initial vertex and v the terminal (end) vertex
Neighbors
If vertices are adjacent
Idea of probing:
If you have a collision, search somewhere else on the table.
Indirect Sorting
Involves the use of smart pointers: objects which contain pointers.
Kruskal's algorithm
Keep adding smallest edges
Tries: definition
Key-value storage; a kind of tree. Keys are -not- stored in nodes; values are. Node variables: Boolean isNode, String value, array edges
Algorithm
Logical sequence of discrete steps that describe a complete solution to a given problem, computable in a finite amount of time and space
Coupling
Low coupling makes modules more adaptable, easier to understand, and more reusable; pair low coupling with high cohesion
Program Structure
Program->Modules->Functions/Methods
Edge
The connection in a graph between two vertices.
Graph G
u ----e---- v: u and v are vertices; e is the edge that connects them
DAG
directed acyclic graph
heap
binary tree storing keys at its nodes
Hashing algorithm
code that creates a unique index from given items of key data
acyclic graph
has no cycle
Euclidean Algorithm GCD
def gcd(a, b):
    while a:
        b, a = a, b % a
    return b
Fibonacci heap
heap data structure. a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the binary heap and binomial heap. Michael L. Fredman and Robert E. Tarjan developed Fibonacci heaps in 1984 and published them in a scientific journal in 1987. They named Fibonacci heaps after the Fibonacci numbers, which are used in their running time analysis. -For the Fibonacci heap, the find-minimum operation takes constant (O(1)) amortized time.[1] The insert and decrease key operations also work in constant amortized time.[2] Deleting an element (most often used in the special case of deleting the minimum element) works in O(log n) amortized time, where n is the size of the heap.[2] This means that starting from an empty data structure, any sequence of a insert and decrease key operations and b delete operations would take O(a + b log n) worst case time, where n is the maximum heap size. In a binary or binomial heap such a sequence of operations would take O((a + b) log n) time. A Fibonacci heap is thus better than a binary or binomial heap when b is smaller than a by a non-constant factor. It is also possible to merge two Fibonacci heaps in constant amortized time, improving on the logarithmic merge time of a binomial heap, and improving on binary heaps which cannot handle merges efficiently. -Using Fibonacci heaps for priority queues improves the asymptotic running time of important algorithms, such as Dijkstra's algorithm for computing the shortest path between two nodes in a graph, compared to the same algorithm using other slower priority queue data structures. |Structure| -A Fibonacci heap is a collection of trees satisfying the minimum-heap property, that is, the key of a child is always greater than or equal to the key of the parent. This implies that the minimum key is always at the root of one of the trees. 
Compared with binomial heaps, the structure of a Fibonacci heap is more flexible. The trees do not have a prescribed shape and in the extreme case the heap can have every element in a separate tree. This flexibility allows some operations to be executed in a lazy manner, postponing the work for later operations. For example, merging heaps is done simply by concatenating the two lists of trees, and operation decrease key sometimes cuts a node from its parent and forms a new tree. -However at some point some order needs to be introduced to the heap to achieve the desired running time. In particular, degrees of nodes (here degree means the number of children) are kept quite low: every node has degree at most O(log n) and the size of a subtree rooted in a node of degree k is at least Fk+2, where Fk is the kth Fibonacci number. This is achieved by the rule that we can cut at most one child of each non-root node. When a second child is cut, the node itself needs to be cut from its parent and becomes the root of a new tree (see Proof of degree bounds, below). The number of trees is decreased in the operation delete minimum, where trees are linked together. -As a result of a relaxed structure, some operations can take a long time while others are done very quickly. For the amortized running time analysis we use the potential method, in that we pretend that very fast operations take a little bit longer than they actually do. This additional time is then later combined and subtracted from the actual running time of slow operations. The amount of time saved for later use is measured at any given moment by a potential function. The potential of a Fibonacci heap is given by Potential = t + 2m where t is the number of trees in the Fibonacci heap, and m is the number of marked nodes. A node is marked if at least one of its children was cut since this node was made a child of another node (all roots are unmarked). 
The amortized time for an operation is given by the sum of the actual time and c times the difference in potential, where c is a constant (chosen to match the constant factors in the O notation for the actual time). -Thus, the root of each tree in a heap has one unit of time stored. This unit of time can be used later to link this tree with another tree at amortized time 0. Also, each marked node has two units of time stored. One can be used to cut the node from its parent. If this happens, the node becomes a root and the second unit of time will remain stored in it as in any other root.
How do you look up a value within the hash table?
return Table[Hash(key)];
Variable
storage location in memory
Adaptive k-d tree
space partitioning or binary space partitioning data structure. a tree for multidimensional points where successive levels may be split along different dimensions.
Dijkstra's Single Point Shortest Path
- find the least cost path in a weighted graph where the edges have non-negative weights - greedy algorithm (when given a choice, pick the best choice at that time) - for a given vertex, look at adjacent vertices to identify possible paths or improvements to existing paths - choose the next step in the shortest path and repeat the previous step
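The steps above can be sketched with a binary heap driving the greedy choice (the adjacency-dict format and names are assumptions):

```python
import heapq

def dijkstra(adjacent, source):
    """adjacent: {vertex: [(neighbor, weight), ...]}, non-negative weights."""
    dist = {source: 0}
    heap = [(0, source)]          # greedy: always settle the closest vertex next
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue              # stale entry; a shorter path was already found
        for v, w in adjacent.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd      # improvement to an existing path
                heapq.heappush(heap, (nd, v))
    return dist
```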
Dijkstra's Minimum Spanning Tree Algorithm
- for each edge, add it to the MST, if a cycle is created, remove the heaviest edge
Tree Iterator
-Needs to be iterative -Does this by creating a stack that mimics the runtime stack
Post fix notation
-The operator follows its operands; it applies to the two operands that precede it
How does Hashing work?
1. You have a key for the item. 2. The hash function maps the key to a hash index. 3. The hash index selects a position in the data array, where the specific data is found.
How does an Insertion Sort work?
Move through the list by element, inserting the element before the next lowest element on the left. Think of the array in 2 partitions, where the left is sorted. The first unsorted item is moved to the sorted side into its proper place. Rinse and repeat.
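A minimal sketch of the two-partition view (the function name is illustrative):

```python
def insertion_sort(a):
    """Left side of the array stays sorted; insert each new item into place."""
    for i in range(1, len(a)):
        item = a[i]                 # first unsorted item
        j = i
        while j > 0 and a[j - 1] > item:
            a[j] = a[j - 1]         # shift larger sorted items one slot right
            j -= 1
        a[j] = item                 # drop it into its proper place
    return a
```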
Cohesion
Each module should perform one well defined task (self-documenting, reusability, robust)
Run-time variable
MagicBox<std::string>* myBox
Lazy Deletion
Marking a spot as deleted in a hash table rather than actually deleting it.
Traversals that still work for general trees
Preorder and postorder traversals.
|V|
Number of vertices
Binary Search Tree Insertion (Worst)
O(n)
Binary Search Tree Search (Worst)
O(n)
Binary Search Tree Space Complexity
O(n)
depth first search
goes as far as possible from a vertex before backing up; lastVisited->firstExplored
Subgraph S
is formed from graph G by selecting a subset Vs of G's vertices and a subset Es of G's edges such that for every edge E, both of E's vertices are in Vs.
tail recursion
linear recursive method makes its recursive call as its last step - easily transformed into non-recursive.
Node depth
number of edges on the path from the root to the node
Free store object
on application heap; new MagicBox<std::string>();
If a graph's edges are ordered [ (u,v) != (v,u)], then the edge (u,v) is _ from _ to _
directed from u to v
FIFO
first in first out data structure such as a queue
dynamic binding
Functions are bound at run time to the function they call: at run time C++ determines the type of the object making the call and binds the call to the appropriate version of the function.
Bubble Sort
Stable, O(n^2), Ω(n) : Compares neighboring elements to see if sorted. Stops when there's nothing left to sort.
Replace the lowest bit that is 1 with 0
x & (x - 1)
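For example (the function name is illustrative):

```python
def clear_lowest_set_bit(x):
    """x & (x - 1) clears the lowest 1 bit: 0b1100 -> 0b1000."""
    # x - 1 flips the lowest 1 bit and all zeros below it; AND-ing with x
    # keeps every higher bit unchanged and zeroes that lowest 1 bit.
    return x & (x - 1)
```

Repeating this until the value reaches 0 is a common way to count set bits.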
Adjacency matrix
|V| * |V| array
Black height
# of black nodes, including nil, on the path from given node to a leaf, not inclusive; any node with height h has black-height >= h/2
Rehashing
the process of running the hashing algorithm when a collision occurs
Stable
-Two equal elements will remain in the same order -Quick, shell, and heap are NOT stable
Average Case Analysis
-sum of the probability times the cost of each element
Red Black Trees
1. Every node is Red or Black 2. The root is Black 3. If a node is red, its children must be black 4. Every path from a node to a NULL pointer must contain the same number of black nodes
Combinations of binary tree traversal sequences that uniquely identify a tree
1. Inorder and preorder. 2. Inorder and postorder. 3. inorder and level-order.
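For combination 1, a sketch of the reconstruction (assumes unique keys; the tree is returned as nested (root, left, right) tuples):

```python
def build_tree(preorder, inorder):
    """Rebuild a binary tree from its preorder and inorder sequences."""
    if not preorder:
        return None
    root = preorder[0]                  # preorder visits the root first
    i = inorder.index(root)             # inorder splits left/right subtrees
    left = build_tree(preorder[1:i + 1], inorder[:i])
    right = build_tree(preorder[i + 1:], inorder[i + 1:])
    return (root, left, right)
```

Preorder alone cannot distinguish a left-only child from a right-only child; the inorder sequence resolves that ambiguity, which is why each pair in the list works.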
Labeled Graph
A graph with labels associated with its vertices
What parts make up an ADT Dictionary?
A keyword (search key) and a value.
What requires more memory, a linked list or array implementation of a bag?
A linked list.
Strongly connected
A directed graph in which there exists a path between every pair of vertices.
7. What graph traversal algorithm uses a queue to keep track of vertices which need to be processed? A. Breadth-first search. B. Depth-first search.
A. Breadth-first search is the correct answer. B. Depth-first search uses a stack.
Application-specific trees
Abstract syntax tree Parse tree Decision tree Alternating decision tree Minimax tree Expectiminimax tree Finger tree Expression tree Log-structured merge-tree
Heap-sort: definition
Array size doesn't change, but heap size does. Repeatedly swap the max off the top to the end of the heap, shrink the heap, and re-heapify; repeat. Building the heap by successive top-down insertion is less efficient than bottom-up max-heapify.
The vertices u and v of the undirected edge(u,v) are the _ of the edge
endpoints
What operation adds a new item to the back of a queue?
enqueue()
Priority Queue
Each entry is assigned one of a finite set of distinct priority values; entries are removed in priority order rather than arrival order.
Virtual Member Function
function in base class that expects to be re-defined in derived class
BagFunctions
getCurrentSize; isEmpty; add; remove; clear; getFrequencyOf; contains; toVector;
Which deque operation is synonymous with peek()?
getFront(). It does not modify the deque.
Separate Chaining
keep a list of all elements that hash to the same value
Map
like an undirected graph, you can travel either way down the road, versus a directed graph where edges can only be followed in one direction
Chaining
make each slot the head of a linked list
Typical runtime of a recursive function with multiple branches
O( branches^depth )
Cartesian Tree Search (Worst)
O(n)
28. What is the worst-time complexity on looking for an element on a binary search tree?
O(n); in the worst case the values arrive in sorted order and all the nodes end up on one side.
Bucket Sort
O(n+m) where m is the # of buckets.
Pointer Variable
often called a pointer, is a variable that holds an address
Record
one line of a text file
static variable
one variable shared among all objects of a class
resolve collisions
Open addressing: linear probing, quadratic probing, double hashing, increasing the size of the hash table. Restructuring: buckets, separate chaining.
transitive property
x <= y ^ y <= z -> x <= z
Dynamic Programming
Break down a problem into smaller and smaller subproblems. At their lowest levels, the subproblems are solved and their answers stored in memory. These saved answers are used again with other larger (sub)problems which may call for a recomputation of the same information for their own answer. Reusing the stored answers allows for optimization by combining the answers of previously solved subproblems.
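The idea above can be sketched with a memoized Fibonacci in Python (the choice of Fibonacci as the example is mine, not the deck's):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    # Each subproblem is solved once and its answer stored;
    # later calls reuse the stored answer instead of recomputing.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```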
Extraction:
Breaking keys into parts and using the parts that uniquely identify the item, e.g., 379452 → 394, 121267 → 112.
Dijkstra's algorithm
Marking nodes as they are added to tree, update reachable unmarked nodes with weight from beginning. Take smallest total weight. Repeat.
Simple cycle
The path is simple, except for the first and last vertices being the same
Double Hashing
The process of using two hash functions to determine where to store the data.
generics framework
allows the use of generic version of methods
descendant of a node
child, grandchild, great-grandchild, and so on
siblings
children of the same parent
Key
use to determine where an item goes on the list
Information Hiding
Ensures modules interact only through well-defined interfaces and that implementation details and data are hidden
Compressing:
Ensuring the hash code is a valid index for the table size.
MST
Minimum spanning tree Least weight that connects all nodes No cycles
Convex combination
a linear combination of vectors whose coefficients are nonnegative and sum to 1; the resulting vector lies within the convex hull
AList
b-tree data structure.
B sharp tree
b-tree data structure.
Degree of vertex
deg(v), the number of edges incident to the vertex
big-Oh notation
gives upper bound on the growth rate
K-ary tree (also sometimes known as a k-way tree, an N-ary tree, or an M-ary tree)
multiway tree data structure. a rooted tree in which each node has no more than k children. A binary tree is the special case where k=2.
data hiding
restricting access to certain members of an object
Latency
the time delay that occurs when transmitting data between devices
weighted graph
A graph with numbers assigned to its edges.
Sparse
A graph with relatively few edges
Low weight
A letter with high weight should have ________ __________
Define level-order traversal of a binary tree?
A level-order traversal begins at the root and visits nodes one level at a time. Within a level, it visits nodes from left to right.
1D Array
A linear collection of data items in a program, all of the same type, such as an array of integers or an array of strings, stored in contiguous memory, and easily accessed using a process called indexing.
Direct-access table: definition
An element key k is stored in slot k.
Internal Node
An existing node in a tree, either the root or any one of the children in the tree.
Minimax tree (sometimes MinMax or MM)
Application-specific tree data structure. a decision rule used in decision theory, game theory, statistics and philosophy for minimizing the possible loss for a worst case (maximum loss) scenario. Originally formulated for two-player zero-sum game theory, covering both the cases where players take alternate moves and those where they make simultaneous moves, it has also been extended to more complex games and to general decision-making in the presence of uncertainty.
Count-Min sketch (CM sketch)
Hash data structure. a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, but unlike a hash table uses only sub-linear space, at the expense of overcounting some events due to collisions. The count-min sketch was invented in 2003 by Graham Cormode and S. Muthu Muthukrishnan and described by them in a 2005 paper. Count-min sketches are essentially the same data structure as the counting Bloom filters introduced in 1998 by Fan et al. However, they are used differently and therefore sized differently: a count-min sketch typically has a sublinear number of cells, related to the desired approximation quality of the sketch, while a counting Bloom filter is more typically sized to match the number of elements in the set.
Multigraphs
Have multiple edges connecting the same vertices
What is queue?
A list that restricts insertion to one end, called the rear, and deletion to the other end, called the front, of the list structure
edges in complete directed graph
N*(N-1) = O(N^2)
AVL Tree Access (Average)
O(log(n))
AVL Tree Deletion (Average)
O(log(n))
AVL Tree Insertion (Average)
O(log(n))
Red-Black Tree Deletion (Average)
O(log(n))
Red-Black Tree Insertion (Average)
O(log(n))
Quicksort
The basic concept of quicksort is to choose an item from the list, then create 3 sublists based on this number. The lists are then combined to produce the sorted list. The average running time is O(N log N), and the worst case is O(N^2).
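A minimal Python sketch of the three-sublist idea (this naive version copies lists rather than partitioning in place):

```python
def quicksort(items):
    # Choose a pivot, build the three sublists (<, ==, >),
    # sort the outer two recursively, then combine.
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```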
What is the most significant contributor to the total time requirement of an algorithm?
The basic operation. A search of an array will require more time than an addition operation, say.
Vector space
a collection of elements that can be formed by adding vectors together or multiplying them by scalars
File
a collection of related data
Abstract data type
a conceptual model of how data can be stored and the operations that can be carried out of the data
Tree
a data structure similar to a graph, but connected and with no cycles
Binary Tree
-Nodes are restricted to have two children at max
Wrappers
-Use composition to adapt an existing class
What 2 steps does a hash function perform?
1. Convert the search key to an integer called the hash code. 2. Compress the hash code into the range of indices for the hash table. (i % tableSize).
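A toy Python version of the two steps (the polynomial hash code is an assumed example, not the deck's exact formula):

```python
def hash_index(key: str, table_size: int) -> int:
    # Step 1: convert the search key to an integer hash code.
    code = sum(ord(ch) * 31 ** i for i, ch in enumerate(key))
    # Step 2: compress the code into the table's index range.
    return code % table_size
```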
Algorithms Cost
1: Computing time 2: Memory required 3: User difficulty 4: Consequence of incorrect action by program
Sorting Algorithm
1: Selection 2: Bubble 3: Insertion 4: Merge 5: Quick 6: Radix 7: Heap 8: Tree
Hash Function
A function that takes in the key to compute a specific Hash Index.
Complete
A graph containing all possible edges
Leaf
A node in a tree data structure that has no children, and is at the end of a branch in a tree.
Simple
A path is _________ if all vertices on the path are distinct
Minimum Spanning Tree
Acyclic, contain all vertexes. Can be approached with either Prim's or Kruskal's method.
Adjacency list: add vertex/edge, delete vertex/edge
Add vertex: O(1) Add edge: O(1) Delete vertex: O(|E|) Delete edge: O(|E|)
Tree
All nodes contain a value. All nodes in the left subtree of a node contain values less than or equal to that node's value; all nodes in the right subtree contain values greater than or equal to it.
Bucket Sort
All numbers need to be positive
What defines a full binary tree?
All possible nodes for that height are present.
Describe the primary requirement to use the Radix Sort.
All values must be of the same length.
descendants of v
All vertices for which v is an ancestor in a tree
proper descendants
All vertices for which v is an ancestor in a tree, excluding v itself.
Stack
An abstract data type that serves as a collection of elements, with two principal operations: push, which adds an element to the collection, and pop, which removes the last element that was added. LIFO - Last In First Out
Recurrence Relation
An equation that is defined in terms of itself. Any polynomial or exponential can be represented by a recurrence.
Selection Sort
An in-place comparison sort algorithm, O(n^2). The algorithm divides the input list into two parts: the sublist of items already sorted, which is built up from left to right at the front (left) of the list, and the sublist of items remaining to be sorted that occupy the rest of the list. Initially, the sorted sublist is empty and the unsorted sublist is the entire input list. The algorithm proceeds by finding the smallest (or largest, depending on sorting order) element in the unsorted sublist, exchanging (swapping) it with the leftmost unsorted element (putting it in sorted order), and moving the sublist boundaries one element to the right
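The description above, as a short Python sketch:

```python
def selection_sort(a):
    # Grow the sorted sublist from the left: find the smallest
    # remaining element and swap it into position i.
    for i in range(len(a)):
        smallest = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[smallest] = a[smallest], a[i]
    return a
```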
set
An unordered collection (possibly empty) of distinct items called elements of the set.
Splay Tree
Any valid BST. Amortized O(log n) access. M operations take O(M log n) total for large M. Any node that is inserted, removed, or accessed gets splayed to the root.
arrays data structure
Array Bit array Bit field Bitboard Bitmap Circular buffer Control table Image Dope vector Dynamic array Gap buffer Hashed array tree Heightmap Lookup table Matrix Parallel array Sorted array Sparse array Sparse matrix Iliffe vector Variable-length array
Big-theta: Average case
Average time for execution, f(n) is theta(g(n))
Binary Search Tree
Avg height: O(log n) Worst height: O(n)
B-trees
B-tree B+ tree B*-tree B sharp tree Dancing tree 2-3 tree 2-3-4 tree Queap Fusion tree Bx-tree AList
2-3 Tree
Balanced tree data structure with logN complexities on searching, inserting, and deleting in both the worst and average cases. In this data structure, every node with children has either two children and one data element, or three children and two data elements. Leaf nodes will contain only one or two data elements.
4 Rules of Recursion
Base Cases: You must always have some base cases, which can be solved without recursion. Making Progress: For the cases that are to be solved recursively, the recursive call must make progress toward a base case. Design Rule: Assume that all recursive calls work. Compound Interest Rule: Never duplicate work by solving the same instance of a problem in separate recursive calls.
Why can you consider an array resizing as O(1)?
Because you amortize the cost over all additions: though that particular operation is O(n), each resize permanently increases capacity, so the average cost per addition is very near O(1).
Dijkstra's Method
Calculates the shortest path to all vertices in a single-source shortest path problem using a priority queue, or a heap. Checks the "frontier" based on cost. The distance to any node is known once it has been "visited".
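A compact sketch using Python's heapq as the priority queue (the dict-of-adjacency-lists graph representation is my assumption):

```python
import heapq

def dijkstra(graph, source):
    # graph: {vertex: [(neighbor, weight), ...]}
    # Pop the cheapest frontier vertex, mark it visited (its
    # distance is now final), and relax its outgoing edges.
    dist = {source: 0}
    visited = set()
    frontier = [(0, source)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if u in visited:
            continue
        visited.add(u)
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(frontier, (d + w, v))
    return dist
```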
Graph with finite vertex set and finite edge set
Called a finite graph
Edge that connects two vertices
Is said to be incident with the vertices it connects
All Pairs Shortest Path
Can be solved using Floyd-Warshall.
Transitive Closure
Can one get from node a to node d in one or more hops? A binary relation tells you only that node a is connected to node b, and that node b is connected to node c, etc. After the transitive closure is constructed one may determine that node d is reachable from node a. (use Floyd-Warshall Algorithm)
no
Can two distinct entries in a map have the same key ?
yes
Can two distinct entries in a priority queue have the same key ?
Interface
Class like construct that contains only constants and abstract methods. All data fields have to be public static final and all methods have to be public abstract. This creates a way for different classes to associate and communicate with each other using a singular system. Like the comparable method that requires a compareTo.
What is folding?
Combining longer search keys into a smaller hash. Do not ignore any part of the key! (int)(key ^ (key >> 32)) This folds a long int into a 32 bit int, with the left 32 bits discarded by the cast to int.
Application of graph
Computer network
Each edge
Connects an unordered pair of vertices
Abstract Data Types
Consists of 2 parts: 1. Data it contains 2. Operations that can be performed on it
9. Which indicates pre-order traversal? A. Left sub-tree, Right sub-tree and root B. Right sub-tree, Left sub-tree and root C. Root, Left sub-tree, Right sub-tree D. Right sub-tree, root, Left sub-tree
Correct answer is C A. Left sub-tree, Right sub-tree and root // postorder B. Right sub-tree, Left sub-tree and root / / right never goes before left D. Right sub-tree, root, Left sub-tree / / right never goes before left
What are the different types a variable can be associated with?
Declared type - the type listed in the variable declaration. Actual type - the type of object that the variable references.
Graphs
A discrete structure consisting of vertices and edges that connect these vertices
Cartesian Tree Difference between Average and Worst
Every operation differs: O(log n) on average, O(n) in the worst case.
External Sorting
External sorting is a term for a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead they must reside in the slower external memory (usually a hard drive). External sorting typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted subfiles are combined into a single larger file. Mergesort is typically preferred.
queue
FIFO list in which elements are added from one end of the structure and deleted from the other end.
Queues
First in, first out. O(1)
Balanced Binary Tree
For each node in the tree, the difference in the height of its left and right subtrees is at most one.
Arithmetic progressions
For p < -1, this sum always converges to a constant.
Directed Graph
Given a path connecting vertices A and B you can only travel in 1 direction
Bipartite Graph
Graph can be colored without conflicts while using only two colors.
Weighted
Graphs whose edges have weights
11. Draw the AVL Tree resulted from reading this sequence of numbers 10, 11, 12, 13, 14, 15, 16, 17
Hand draw this one
4. Draw the directed graph that corresponds to this adjacency matrix: 0 1 2 3 0 | true false true false 1 | true false false false 2 | false false false true 3 | true false true false
Hand draw this one
Trie
Has only part of a key for comparison at each node.
Rolling hash (also known as recursive hashing or rolling checksum)
Hash data structure. a hash function where the input is hashed in a window that moves through the input. -A few hash functions allow a rolling hash to be computed very quickly: the new hash value is rapidly calculated given only the old hash value, the old value removed from the window, and the new value added to the window, similar to the way a moving average function can be computed much more quickly than other low-pass filters. -One of the main applications is the Rabin-Karp string search algorithm, which uses a rolling hash. -Another popular application is the rsync program, which uses a checksum based on Mark Adler's adler-32 as its rolling hash. -Another application is the Low Bandwidth Network Filesystem (LBFS), which uses a Rabin fingerprint as its rolling hash. -At best, rolling hash values are pairwise independent or strongly universal. They cannot be 3-wise independent, for example.
Dynamic perfect hash table
Hash data structure. a programming technique for resolving collisions in a hash table data structure. While more memory-intensive than its hash table counterparts, this technique is useful for situations where fast queries, insertions, and deletions must be made on a large set of elements.
HashMap underlying structure:
HashTable with chained buckets
Heap Priority Queue: insert, max, extract max, increase value
Heap-insert: O(lg n) Heap-maximum: O(1) Heap-extract-max: O(lg n) Heap-increase-value: O(lg n)
Heap v.s. BST (definition)
Heap: Parent is less than both left and right children A binary heap is a complete tree BST: Parent is greater than left child, less than right child
Define the height of a tree?
Height of tree T = 1 + height of the tallest subtree of T. The root counts as its own level. Only an empty tree has height 0.
Red Black Tree: height
Height: O(lg n)
What is the Big O complexity of the Merge Sort?
If n is 2^k, then there will be k recursive calls to mergesort. The merges themselves must compare n elements each time. So the Big O is O(n log n). This is for best, worst and average cases.
Depth
In tree data structure, expressed as the number of steps from the root of the tree to the farthest outlying node in the tree. Height is also used to mean the same thing.
Internal Path Length
In tree processing, this is the sum of all the lengths from the root to the external nodes in a tree.
Priority queue
A queue in which the entries are inserted into a position according to some priority criterion rather than arrival order only. Priority queues are often implemented as heaps. Deque operations, for comparison: Push(x) inserts object x at the front of the deque; Pop() removes the front object and returns its value; Inject(x) inserts object x at the rear; Eject() removes the rear object and returns its value; Empty() returns true if the deque is empty, else false.
Loop
An edge that connects a vertex to itself
The degree of avertex
The number of edges incident with it, except that a loop at a vertex contributes twice to the degree of that vertex
Degree of vertex
The number of edges that connect to it
stack insertion
LIFO ( last in first out)
stack
LIFO list in which insertions/deletions are only done at one end.
Stacks
Last In - First Out; Ex: undo/backspace/palindrome
Big-omega: Best case
Least time for execution: f(n) is omega(g(n))
Equation of handshaking theorem
Let G = (V, E) be an undirected graph with m edges. Then 2m = Σ_{v ∈ V} deg(v).
Array list (dynamic array?)
List data structure.
Standard data structure for solving complex bit manipulation
Lookup table
Directed graph may contain
Loops and multiple directed edges that start and end at the same vertices
Divide and Conquer
Maximum Sub-sequence sum, Linear time tree traversal, mergesort/quicksort. Divide the problem into smaller component
The set of vertices
May be infinite
Dynamic Memory
Memory that is allocated as needed, and NOT contiguous (side-by-side), specifically during the implementation of a linked list style data structure, which also includes binary trees and graphs.
Binary Search Tree: memory
Memory: O(n)
Direct-access table: memory
Memory: O(n)
Are constructors of the superclass inherited by subclasses?
No, but they can be invoked using the "super" keyword from within the subclass constructor.
Radix Sort
Non-comparative integer sorting algorithm that sorts data with integer keys by grouping keys by the individual digits which share the same significant position and value. Two classifications of radix sorts are least significant digit (LSD) radix sorts and most significant digit (MSD) radix sorts.
3-Way Quick Sort
Non-stable, in place sort with an order of growth between N and NlogN. Needs lgN of extra space. Is probabilistic and dependent on the distribution of input keys.
|E|
Number of edges
data types
Primitive types, Composite types, Abstract data types. a classification identifying one of various types of data that determines the possible values for that type, the operations that can be done on values of that type, the meaning of the data, and the way values of that type can be stored. Data types are used within type systems, which offer various ways of defining, implementing and using them. Different type systems ensure varying degrees of type safety.
Bloom Filters
Probabilistic hash table. No means no. Yes means maybe. Multiple (different) hash functions. Can't resize table. Also can't remove elements.
Quadratic Probing:
Probe sequence is h(k) + i^2 for i = 1, 2, 3, .... Minimizes clustering; better at distributing items across the table.
Queues as data structure
A queue is a data structure that follows the first-in, first-out principle. Objects are added to the rear of the queue and removed from the front of the queue.
By which we can implement queues
Queues can be implemented with arrays or linked lists
What is the difference between a reference variable and primitive variable?
Reference variables hold a pointer or references to memory locations of the actual object. If you assign one ref. variable to another, you pass their reference and not the actual value - both variables will be pointing to the exact same memory location and hence the same object. If you modify one of them, the other will be modified as well. Primitive types hold the value of the data and pass by value.
What list operation is not available in the sort list?
Replace by location. It's sorted; how would this even make sense?
Replica
Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.
first child-next sibling representation
Representation used for ordered trees with a potentially varying amount of children per parent node.
Red Black Tree: search, insert, delete
Search: O(lg n) Insert: O(lg n) Delete: O(lg n)
Vertex-coloring
Seeks to assign a label (or color) to each vertex of a graph such that no edge links any two vertices of the same color.
Splay Tree
Self-adjusting binary search tree where recently accessed elements are quick to access again
Stack and Queue
Stacks can be implemented with either an array or a linked list. Each stack operation is O(1): Push, Pop, Top, Empty. Stacks are occasionally known as LIFO queues. Queues can be implemented with either a linked list (with tail pointer) or an array. Each queue operation is O(1): Enqueue, Dequeue, Empty.
Linear Probing:
Step size is 1. Find the index, and keep incrementing by one until you find a free space.
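A minimal open-addressing insert in Python (fixed-size table of (key, value) slots; the helper is my own sketch):

```python
def probe_insert(table, key, value):
    # Start at the home slot; step by 1 (wrapping around) until an
    # empty slot, or a slot holding the same key, is found.
    n = len(table)
    i = hash(key) % n
    for _ in range(n):
        if table[i] is None or table[i][0] == key:
            table[i] = (key, value)
            return i
        i = (i + 1) % n
    raise RuntimeError("hash table is full")
```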
MSD
String/character sort from left to right, must keep appending from previous letters to keep order
LSD
String/character sort from right to left
Little-Oh
T(n) = o(f(n)) if T(n) = O(f(n)) and T(n) != Ω(f(n))
Big-Oh
T(n) = O(f(n)) if there are positive constants c and n0 such that T(n) <= c * f(n) for all n >= n0
Divide-and-Conquer Recurrences
T(n) = aT(n/b) + f(n)
Big Omega
T(n) = Ω(f(n)) if ∃ positive constants c and n0 such that T(n) >= c * f(n) for all n >= n0
How do you insert a value within the hash table?
Table[Hash(key)]=data;
Insertion Sort
Take each item from unsorted region and insert it into sorted region. Drawback is array may have to be shifted when elements are inserted. O(n^2) Omega(n). Unsuitable for large arrays
Hamming Weight
The Hamming weight of a string is the number of symbols that are different from the zero-symbol of the alphabet used (also called the population count, popcount or sideways sum). Algorithm: - Count the number of pairs, then quads, then octs, etc, adding and shifting. v = v - ((v>>1) & 0x55555555); v = (v & 0x33333333) + ((v>>2) & 0x33333333); int count = ((v + (v>>4) & 0xF0F0F0F) * 0x1010101) >> 24;
Describe the Quick Sort?
The array is PARTITIONED into 2 pieces, though not necessarily equal. The division point is the "pivot". It is placed in its final spot in the final array. Elements less than the pivot are on the left of it, elements greater on the right. Rinse and repeat for the smaller arrays.
What is a downside of Quick Sort?
The choice of pivots affects quick sort's efficiency. Some pivot-selection schemes can lead to worst-case behavior if the array is already sorted or nearly sorted.
degree
The degree of a vertex is the number of edges incident on it.
Insertion Sort
The process behind insertion sorting involves using the fact that the items we have already sorted are in the correct positions, so finding where the next item needs to go is a matter of finding the two items it fits between. O(N^2)
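The shifting behavior described above, sketched in Python:

```python
def insertion_sort(a):
    # Slide each new item left past larger sorted elements
    # until the slot where it fits is exposed.
    for i in range(1, len(a)):
        item = a[i]
        j = i
        while j > 0 and a[j - 1] > item:
            a[j] = a[j - 1]  # shift the larger element right
            j -= 1
        a[j] = item
    return a
```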
In-Order Traversal
The process of systematically visiting every node in a tree once, starting at the root and proceeding left down the tree, accessing the first node encountered at its "center", proceeding likewise along the tree, accessing each node as encountered at the "center".
What is the ideal hash table size?
The size of a hash table should be a prime number n greater than 2. Then, if you compress a positive hash code c into an index for the table by using c % n, the indices will be distributed uniformly between 0 and n - 1.
length of a path
The total number of vertices in the vertex sequence defining the path - 1.
Strongly Connected Graphs
There lies a path between any two vertices on a directed graph.
Space-partitioning trees
These are data structures used for space partitioning or binary space partitioning. Segment tree Interval tree Range tree Bin Kd-tree Implicit kd-tree Min/max kd-tree Relaxed Kd-tree Adaptive k-d tree Quadtree Octree Linear octree Z-order UB-tree R-tree R+ tree R* tree Hilbert R-tree X-tree Metric tree Cover tree M-tree VP-tree BK-tree Bounding interval hierarchy BSP tree Rapidly exploring random tree
5. Consider this graph: v0 <------- v2 / \ / \ -> v1 <-/ \-> v4 / \ / \ / \->v3 -------> v5 / / / / v6 <---------/ In what order are the vertices visited for a depth-first search that starts at v0? In what order are the vertices visited for a breadth-first search that starts at v0?
This is a directed graph df: v0 v1 v3 v6 v5 v4 note: v2 will not be visited if we start at v0 bf: v0 v1 v4 v3 v6 v5 note: v2 will not be visited if we start at v0
Describe the Radix Sort?
This is where you work through each place value, from right to left, sorting into buckets based upon the given place value. Each bucket retains the order that it received the values.
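An LSD radix sort sketch in Python for non-negative integers (the bucket count of 10 assumes decimal place values, as in the description):

```python
def radix_sort(nums):
    # Sort by each digit place, least significant first; every
    # bucket pass is stable, preserving earlier passes' order.
    if not nums:
        return nums
    place = 1
    while max(nums) // place > 0:
        buckets = [[] for _ in range(10)]
        for n in nums:
            buckets[(n // place) % 10].append(n)
        nums = [n for bucket in buckets for n in bucket]
        place *= 10
    return nums
```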
True of False: A subclass can access all protected data fields and methods of the superclass
True, only private data fields are inaccessible and by subclasses
Parallel
Two edges are parallel if they connect the same pair of vertices.
Simple graph
Undirected, with no multiple edges or loops
Multigraphs
Undirected; multiple edges between the same vertices are allowed, but no loops
2 Ways to Improve Disjoint Sets
Union By Rank - make smaller tree point to larger tree. Path Compression - Updating parent pointer directly to root.
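Both improvements in one small Python class (a sketch; the class name and array-based layout are my assumptions):

```python
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path compression: repoint x (and its ancestors)
        # directly at the root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, a, b):
        # Union by rank: attach the shorter tree under the taller.
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
```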
Quick Sort
Unstable; O(n log n) for a good pivot, O(n^2) for a bad pivot; Ω(n log n). Uses partitioning, which is O(n). Pick the median of the first, middle, and last elements for the pivot. Random selection is also good, but expensive. The algorithm can be slow because of many function calls.
Friendship graphs use
Used to represent whether two people know each other or not
Graphs
Used to model data communication networks
Why do we use prime numbers for table size?
We mod often, and prime numbers give us the most unique numbers. (2*ts+1)
reachable
We say that a vertex w is reachable from a vertex v if there exists a directed path from v to w.
connected vertices
We say that one vertex is connected to another if there exists a path that contains both of them.
strongly connected vertices
We say that two vertices v and w are strongly connected if they are mutually reachable: there is a directed path from v to w and a directed path from w to v.
Memoization
What happens when a sub problem's solution is found during the process of Dynamic Programming. The solution is stored for future use, so that it may be reused for larger problems which contain this same subproblem. This helps to decrease run time.
Simple directed graph
Contains no loops or multiple directed edges
Collisions:
When the Hash Function returns the same index for different keys.
Single Linked List
Without tail PushFront(Key) O(1) TopFront() O(1) PopFront() O(1) PushBack(Key) O(n) TopBack() O(n) PopBack() O(n) Find(Key) O(n) Erase(Key) O(n) Empty() O(1) AddBefore(Node, Key) O(n) AddAfter(Node, Key) O(1) With tail PushBack(Key) O(1) TopBack() O(1)
25. Given the following elements, insert them into an empty min heap: 12, 5, 15, 9, 13, 7, 15, 10, 3, 20, 4
[12]; insert 5 → 12 ? 5, swap → [5, 12]; insert 15 → ok → [5, 12, 15]; insert 9 → 12 ? 9, swap → [5, 9, 15, 12]; insert 13 → ok → [5, 9, 15, 12, 13]; insert 7 → 15 ? 7, swap → [5, 9, 7, 12, 13, 15]; insert 15 → ok → [5, 9, 7, 12, 13, 15, 15]; insert 10 → 12 ? 10, swap → [5, 9, 7, 10, 13, 15, 15, 12]; insert 3 → swaps with 10, 9, 5 → [3, 5, 7, 9, 13, 15, 15, 12, 10]; insert 20 → ok → [3, 5, 7, 9, 13, 15, 15, 12, 10, 20]; insert 4 → swaps with 13, then 5 → [3, 4, 7, 9, 5, 15, 15, 12, 10, 20, 13]
Circular queue
a FIFO structure implemented as a ring where the front and rear pointers can wrap around from the end to the start of the array
Linear queue
a FIFO structure organised as a line of data
abstract class
a class that may have abstract methods. You cannot instantiate it. Used only as superclass
Stack
a data structure where the last item added is the first item removed
Huffman coding
assigns codes to characters such that the length of the code depends on the relative frequency or weight of the corresponding character. AKA variable-length code
Cartesian Tree
binary tree derived from a sequence of numbers; it can be uniquely defined from the properties that it is heap-ordered and that the symmetric (in-order) traversal of the tree returns the original sequence
What are some primitive data types?
boolean byte char short int long float double
Asymptotic analysis
determines the running time of algorithm in Big-Oh notation
The vertices u and v of the undirected edge(u,v) are _ to the edge
incident
Graphs ADT
isEmpty;vertices;edges;edgeExists;addVertex;addEdge;removeEdge;removeVertex;retrieveVertex;
Greedy Algorithm
operates by making the locally best (e.g., minimum-cost) choice at each step
Perfect hashing
the primary hash table can point to secondary hash tables that are chosen to be collision-free, so lookups take O(1) worst-case time
Hash Function
turn the key that is provided into the index in the array where we want to put the data
Divide and Conquer
works by recursively breaking down a problem into two or more sub problems until the problems become simple enough to be solved directly. An example would be mergesort.
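The mergesort example mentioned above, as a short Python sketch:

```python
def merge_sort(a):
    # Divide: split the list in half. Conquer: sort each half
    # recursively. Combine: merge the two sorted halves.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```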
Tree Buckets
Pros: WC = O(log N); no wasted space; dynamically sized. Cons: more complicated than what's needed. Insert with dups = O(1); w/o dups = O(N).
External Node
A potential node in a tree, where currently either the left or right child pointer of a node is pointing to null, but potentially could reference another node.
Path
A sequence of vertices v1,v2, ... vn forms a ____ of length n - 1 if there exists edges from vi to vi+1 for 1 <= i < n
Array Index
A value that indicates the position in the array of a particular value. The index of the last element in a zero-indexed array is the length of the array minus 1.
Q: For every vertices u, v in a tree, there exists:
A: Exactly one simple path from u to v.
Binary Tree: advantage, disadvantage
Advantage: quick search, delete, insert Disadvantage: complex deletion
Direct-access table: advantage, disadvantage
Advantage: quick search, quick insert and delete Disadvantage: lots of wasted memory, keys must be unique, keys should be dense
Binary Search Tree: advantage, disadvantage
Advantage: quick search, quick insert and delete Disadvantage: slower than hash table
Greedy Algorithms
Algorithm design patterns. Compute a solution in stages, making choices that are local optimum at step; these choices are never undone.
Depth-First Search
Explore newest unexplored vertices first. Placed discovered vertices in a stack (or used recursion). Partitions edges into two classes: tree edges and back edges. Tree edges discover new vertices; back edges are ancestors.
Hash trie
Hash data structure. may refer to: -Hash tree (persistent data structure), a trie used to map hash values to keys -A space-efficient implementation of a sparse trie, in which the descendants of each node may be interleaved in memory. (The name is suggested by a similarity to a closed hash table.) -A data structure which "combines features of hash tables and LC-tries (Least Compression tries) in order to perform efficient lookups and updates"
Mixed graph
Has both directed and undirected edges
Connected Component
In an undirected graph, a connected component is a maximal set of vertices such that there is a path between every pair of vertices (the example shows 3 connected components).
One-Sided Binary Search
In the absence of an upper bound, we can repeatedly test larger intervals (A[1], A[2], A[4], A[8], A[16], etc.) until we find an upper bound, the transition point p, in at most 2⌈log p⌉ comparisons. One-sided binary search is most useful whenever we are looking for a key that lies close to our current position.
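The doubling-then-binary-search idea can be sketched as follows (a hypothetical helper; the array is assumed sorted ascending, and the method returns the first index holding a value ≥ key):

```java
class GallopDemo {
    // Smallest index i with a[i] >= key, or a.length if no such index exists.
    static int gallop(int[] a, int key) {
        int hi = 1;
        while (hi < a.length && a[hi] < key) hi *= 2;  // grow interval: 1, 2, 4, 8, ...
        int lo = hi / 2;                               // key lies in (lo, hi]
        hi = Math.min(hi, a.length - 1);
        if (a.length > 0 && a[hi] < key) return a.length;
        while (lo < hi) {                              // ordinary binary search
            int mid = (lo + hi) / 2;
            if (a[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```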
ArrayLists: insert
Insert: often O(1), sometimes more
ADT Priority Queue
Items in queue get assigned priority value. Adds to queue in sorted position. Remove/get entry with highest priority.
Self-organizing list
List data structure. a list that reorders its elements based on some self-organizing heuristic to improve average access time. The aim of a self-organizing list is to improve efficiency of linear search by moving more frequently accessed items towards the head of the list. A self-organizing list achieves near constant time for element access in the best case. A self-organizing list uses a reorganizing algorithm to adapt to various query distributions at runtime.
Prim's Algorithm
MST builder/greedy algorithm which works by taking a starting vertex and then successively adding the neighbouring vertex whose connecting edge has the lowest cost and does not create a cycle. Time complexity O(E log V) with a binary-heap priority queue.
Adjacency matrix: memory
Memory: O(|V|^2)
Adjacency list: query for adjacency
Query for adjacency: O(|V|)
Topological Sorting
Receives a DAG as input and outputs an ordering of its vertices. Repeatedly selects a node with no incoming edges, outputs it, and removes its outgoing edges.
What is linear probing?
When a hash collision occurs, you step forward through the indices until you find the next open "bucket". The sequence of addresses searched is called the probe sequence.
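A minimal linear-probing sketch (the table size and the modulo hash are illustrative; a real table would also track load factor and resize):

```java
class ProbeDemo {
    Integer[] keys = new Integer[8];               // null marks an empty bucket

    void insert(int key) {
        int i = Math.floorMod(key, keys.length);
        while (keys[i] != null && keys[i] != key)  // collision: probe the next slot
            i = (i + 1) % keys.length;
        keys[i] = key;
    }

    boolean contains(int key) {
        int i = Math.floorMod(key, keys.length);
        while (keys[i] != null) {                  // follow the probe sequence
            if (keys[i] == key) return true;
            i = (i + 1) % keys.length;
        }
        return false;                              // hit an empty slot: key is absent
    }
}
```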
Binary tree
a tree where each node can only have up to two nodes attached to it
Associative array
a two-dimensional structure containing key/value pairs of data
Parent
a type of node in a tree, where there are further nodes below it
Priority queue
a variation of a FIFO structure where some data may leave out of sequence where it has a higher priority than other data items
pure virtual function
a virtual function that MUST be overridden in a derived class that has objects.
Extendible hashing
allows a search to be performed in two disk accesses
Universal hash function
allows us to choose the hash function randomly.
spanning trees
a subgraph of a connected graph that contains all of its vertices and is itself a tree (connected, without cycles)
Image (or system image)
array data structure. a serialized copy of the entire state of a computer system stored in some non-volatile form such as a file. A system is said to be capable of using system images if it can be shut down and later restored to exactly the same state. In such cases, system images can be used for backup. Hibernation is an example that uses an image of the entire machine's RAM. -If a system has all its state written to a disk, then a system image can be produced by simply copying that disk to a file elsewhere, often with disk cloning applications. On many systems a complete system image cannot be created by a disk cloning program running within that system because information can be held outside of disks and volatile memory, for example in non-volatile memory like boot ROMs.
Randomized binary search tree
binary tree data structure. introduced by Martínez and Roura subsequently to the work of Aragon and Seidel on treaps, stores the same nodes with the same random distribution of tree shape, but maintains different information within the nodes of the tree in order to maintain its randomized structure. Rather than storing random priorities on each node, the randomized binary search tree stores a small integer at each node, the number of its descendants (counting itself as one); these numbers may be maintained during tree rotation operations at only a constant additional amount of time per rotation. When a key x is to be inserted into a tree that already has n nodes, the insertion algorithm chooses with probability 1/(n + 1) to place x as the new root of the tree, and otherwise it calls the insertion procedure recursively to insert x within the left or right subtree (depending on whether its key is less than or greater than the root). The numbers of descendants are used by the algorithm to calculate the necessary probabilities for the random choices at each step. Placing x at the root of a subtree may be performed either as in the treap by inserting it at a leaf and then rotating it upwards, or by an alternative algorithm described by Martínez and Roura that splits the subtree into two pieces to be used as the left and right children of the new node. -The deletion procedure for a randomized binary search tree uses the same information per node as the insertion procedure, and like the insertion procedure it makes a sequence of O(log n) random decisions in order to join the two subtrees descending from the left and right children of the deleted node into a single tree. If the left or right subtree of the node to be deleted is empty, the join operation is trivial; otherwise, the left or right child of the deleted node is selected as the new subtree root with probability proportional to its number of descendants, and the join proceeds recursively.
Leonardo Heap (smoothsort)
heap data structure. a comparison-based sorting algorithm. A variant of heapsort, it was invented and published by Edsger Dijkstra in 1981.[1] Like heapsort, smoothsort is an in-place algorithm with an upper bound of O(n log n),[2] but it is not a stable sort.[3][self-published source?] The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state.
D-ary heap (d-heap)
heap data structure. a priority queue data structure, a generalization of the binary heap in which the nodes have d children instead of 2. Thus, a binary heap is a 2-heap, and a ternary heap is a 3-heap. According to Tarjan[2] and Jensen et al.,[4] d-ary heaps were invented by Donald B. Johnson in 1975.[1] This data structure allows decrease priority operations to be performed more quickly than binary heaps, at the expense of slower delete minimum operations. This tradeoff leads to better running times for algorithms such as Dijkstra's algorithm in which decrease priority operations are more common than delete min operations.[1][5] Additionally, d-ary heaps have better memory cache behavior than a binary heap, allowing them to run more quickly in practice despite having a theoretically larger worst-case running time.[6][7] Like binary heaps, d-ary heaps are an in-place data structure that uses no additional storage beyond that needed to store the array of items in the heap
What is one way of approximating the median of a data set, cheaply?
median-of-three pivot selection. Take the median of the first, last and middle value in the array.
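A minimal sketch of median-of-three selection (the sum-minus-min-minus-max trick is just one convenient way to pick the middle of three values):

```java
class MedianDemo {
    // Median of the first, middle, and last values of the array.
    static int medianOfThree(int[] a) {
        int x = a[0], y = a[a.length / 2], z = a[a.length - 1];
        // The median is the total minus the smallest and the largest.
        return x + y + z
             - Math.min(x, Math.min(y, z))
             - Math.max(x, Math.max(y, z));
    }
}
```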
destructor
member function automatically called when an object is destroyed
constructor
member function automatically called when object is created
public interface
members of an object that are available outside the object. this allows object to provide some access to some data without sharing its internal details
Fenwick tree (binary indexed tree)
multiway tree data structure. a data structure that can efficiently update elements and calculate prefix sums in a table of numbers. This structure was proposed by Peter Fenwick in 1994 to improve the efficiency of arithmetic coding compression algorithms.[1] -There is a trade-off between the efficiency of element update and prefix sum calculation. In a flat array of n numbers, calculating prefix sum requires O(n) time, while in an array of prefix sums, updating elements requires O(n) time. Fenwick trees allow both operations to be performed in O(\log n) time. This is achieved by representing the numbers as a tree, where the value of each node is the sum of the numbers in that subtree. The tree structure allows operations to be performed using only O(\log n) node accesses.
Rose tree (multi-way tree)
multiway tree data structure. a tree data structure with a variable and unbounded number of branches per node.[1][better source needed] The name rose tree for this structure is prevalent in the functional programming community, e.g., in the context of the Bird-Meertens formalism.[2] It was coined by Lambert Meertens to evoke the similarly-named, and similarly-structured, common rhododendron
polymorphism
objects referenced by variables of the same reference type can have different forms
hash collision
occurs when multiple entries are mapped to the same entry with hash function
15. The following Java method uses recursion to search for a key in the binary search tree whose root node is referred to by the parameter root. If it finds the key, it returns a reference to the corresponding data item. If it doesn't find it, it returns null. public static Object searchTree(Node root, int key) { if (root == null) return null; else if (key == root.key) return root.data; else if (key < root.key) return searchTree(root.left, key); else return searchTree(root.right, key); } In the space below, rewrite the search() method so that it uses iteration instead of recursion:
public static Object searchTree(Node root, int key) { while (root != null) { if (key == root.key) // found it return root.data; else if (key < root.key) // less than so go left root = root.left; else // greater than so go right root = root.right; } return null; // we didn't find it or throw not found exception }
Linear octree
space partitioning or binary space partitioning data structure. an octree that is represented by a linear array instead of a tree data structure. -To simplify implementation, a linear octree is usually complete (that is, every internal node has exactly 8 child nodes) and where the maximum permissible depth is fixed a priori (making it sufficient to store the complete list of leaf nodes). That is, all the nodes of the octree can be generated from the list of its leaf nodes. Space filling curves are often used to represent linear octrees.
Binary File
stores data as a sequence of 0s and 1s
binary recursion
two recursive calls for each non-base case
for the directed edge (u,v), u is the _ and v is the _
u is the tail, v is the head.
interface
the ultimate abstract class: may have only public methods with no definitions. Java's version of an ADT.
Randomized algorithm
uses random number at some point to make a determination about what to do next
Binary Search Tree: property
value[left[x]] <= value[x] <= value[right[x]]
Increment/Decrement
y = ++x; adds one to x and saves the new value to y. y = x++; saves the old value to y and then increments x.
Hamiltonian Graphs
- class of graphs where path can be created from V(o) to V(k) given that all vertices are encountered one time - if V(o) != V(k), it is a hamiltonian trail(path) - if V(o) == V(k), it is a hamiltonian cycle(circuit) - star graph will never be hamiltonian - wheel/cycle graph always will be hamiltonian
Eulerian Graphs
- class of graphs where there exist one or more paths such that every edge in the graph is used once - Eulerian cycle exists if all vertices have even degrees - Eulerian path exists if all vertices have even degrees except for exactly two
Depth First Search
- explores as far as possible along each branch before backtracking - stack-based
Degenerate Binary Tree
A degenerate (or pathological) tree is where each parent node has only one associated child node.This means that performance-wise, the tree will behave like a linked list data structure.
Mnemonic
A device such as a pattern of letters, ideas, or associations that assists in remembering something
Undirected Graph
A graph whose edges are not directed
Linked List
A linear data structure, much like an array, that consists of nodes, where each node contains data as well as a link to the next node, but that does not use contiguous memory.
Cycle
A path of length 3 or more that connects some vertex to itself
cycle
A path of positive length that starts and ends at the same vertex and does not traverse the same edge more than once.
Perfect Binary Tree
A perfect binary tree is a binary tree in which all interior nodes have two children and all leaves have the same depth or same level.
Heapify
A process in Minimum Heap Trees where the new node is switched up until min heap state is achieved.
Complete Graph
A simple undirected graph in which every pair of distinct vertices is connected by a unique edge, in other words, every vertex is directly connected to every other vertex in the graph
Full Tree
A tree in which every level of the tree is completely full, with no missing nodes.
Binary Search Tree
A tree in which nodes are inserted systematically in natural order, with the final property of each left child being less than or equal to its parent, and each right child being greater than its parent. (Does not preserve the order in which nodes were added.)
Data Structure
A way of organizing data in a computer so that it can be used efficiently, such as an array, linked list, stack, queue, or binary tree.
Priority Queue: advantage, disadvantage
Advantage: cheap way to sort priorities, sometimes you want to do things first Disadvantage: worse at inserting and searching than BST
Heap Binary Tree: advantage, disadvantage
Advantage: fast access, quick insert and delete Disadvantage: slow search, efficient memory if full
Stack: advantage, disadvantage
Advantage: quick access Disadvantage: inefficient with an array
Bellman-Ford Algorithm
Algorithm which computes shortest paths from a single vertex to all other vertices in a weighted digraph. Is slower than its counterpart, but is able to handle edge weights with negative values. Works by initially setting the distance to all nodes to infinity, and then iteratively relaxing the edges in an order which would maintain a shortest path from the starting edge to any other edge. Has a time complexity of O(EV)
Ancestors of a vertex v in a tree
All vertices on the simple path from the root to v
Inverted Index
An index data structure storing a mapping from content, such as words or numbers, to its locations in a database file, or in a document or a set of documents (named in contrast to a Forward Index, which maps from documents to content). The purpose of an inverted index is to allow fast full text searches, at a cost of increased processing when a document is added to the database.
Iterative Refinement
Analysis pattern. Most problems can be solved using a brute-force approach. Find such a solution and improve upon it.
Case Analysis
Analysis pattern. Split the input/execution into a number of cases and solve each case in isolation
Modular Programming
Breaking a program up into smaller, manageable functions or modules. Improves stability and simplifies main program
17. Suppose that you would like to create an instance of a new Map that has an iteration order that is the same as the iteration order of an existing instance of a Map. Which concrete implementation of the Map interface should be used for the new instance? A. TreeMap B. HashMap C. LinkedHashMap D. The answer depends on the implementation of the existing instance.
C. LinkedHashMap // LinkedHashMap preserves the insertion order.
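A quick check of why LinkedHashMap is the answer: its copy constructor preserves the source map's iteration order (the helper name is illustrative):

```java
import java.util.*;

class OrderDemo {
    // Returns the key iteration order of a copy of src.
    static List<String> keyOrder(Map<String, Integer> src) {
        Map<String, Integer> copy = new LinkedHashMap<>(src); // keeps src's order
        return new ArrayList<>(copy.keySet());
    }
}
```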
Folding:
Combining parts of the key using operations like + and bitwise operations such as exclusive-or. Key: 123456789 → 123 + 456 + 789 = 1368 (the leading 1 is discarded, giving 368).
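The worked example above can be sketched as a small fold function (splitting into three-digit groups and keeping the low three digits of the sum is one common variant):

```java
class FoldDemo {
    // Fold a numeric key string into three-digit groups and sum them.
    static int fold(String key) {
        int sum = 0;
        for (int i = 0; i < key.length(); i += 3) {
            int end = Math.min(i + 3, key.length());
            sum += Integer.parseInt(key.substring(i, end)); // 123 + 456 + 789
        }
        return sum % 1000;                                  // 1368 -> 368
    }
}
```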
Directed graphs
Consists of a nonempty set of vertices V and a set of directed edges E. Each edge is associated with an ordered pair of vertices.
Hash Table
Constant access time (on average).
13. A graph implementation that uses a two-dimensional array to represent the edges would be most reasonable for which of the following cases? A. 1000 nodes, 1200 edges B. 100 nodes, 4000 edges C. 1000 nodes, 10000 edges D. 10 nodes, 20 edges E. none of these, since a graph can only be represented by a linked structure.
D. 10 nodes, 20 edges. 10² = 100 cells − 20 edges = 80 wasted, the least waste among the options.
Negative Edge Costs
Dijkstra's cannot solve. Requires Bellman-Ford.
Simple directed graph
Directed edges and no multiple edges or loops
Breadth-First Search
Explores the oldest unexplored vertices first. Places discovered vertices in a queue. In an undirected graph: Assigns a direction to each edge, from the discoverer to the discovered, and the discoverer is denoted to be the parent.
Undirected Graph
Given any path connecting vertices A and B, you can travel from A to B or B to A
Graphs and social networks
Graphs are often used to model social structure based on different kinds of relationships between people or groups of people; such a model is called a social network.
Koorde
Hash data structure. In peer-to-peer networks, Koorde is a Distributed hash table (DHT) system based on the Chord DHT and the De Bruijn graph (De Bruijn sequence). Inheriting the simplicity of Chord, Koorde meets O(log n) hops per node (where n is the number of nodes in the DHT), and O(log n/ log log n) hops per lookup request with O(log n) neighbors per node. -The Chord concept is based on a wide range of identifiers (e.g. 2^160) in a structure of a ring where an identifier can stand for both node and data. Node-successor is responsible for the whole range of IDs between itself and its predecessor.
Collision handling:
How you handle collisions so that each element in the hash table stores only one item.
Floyd-Warshall
Determines shortest paths between all pairs of vertices, implicitly considering every vertex as a possible intermediate.
Implement queue by array
Initialise(): size = 0; front = 1; rear = 0.
boolean empty(): return size = 0.
boolean full(): return size = max.
enqueue(x): if size < max then size = size + 1; rear = (rear mod max) + 1; arrayQueue[rear] = x; end if.
dequeue(x): if size > 0 then size = size - 1; x = arrayQueue[front]; front = (front mod max) + 1; end if.
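A runnable Java sketch of the same circular array queue, using 0-based indices and a size counter in place of the 1-based front/rear arithmetic (class and field names are illustrative):

```java
class ArrayQueue {
    int[] items;
    int front = 0, size = 0;

    ArrayQueue(int capacity) { items = new int[capacity]; }

    boolean isEmpty() { return size == 0; }
    boolean isFull()  { return size == items.length; }

    void enqueue(int x) {
        if (isFull()) throw new IllegalStateException("full");
        items[(front + size) % items.length] = x;  // rear index wraps around
        size++;
    }

    int dequeue() {
        if (isEmpty()) throw new IllegalStateException("empty");
        int x = items[front];
        front = (front + 1) % items.length;        // front index wraps around
        size--;
        return x;
    }
}
```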
Priority Queue
Insert and ExtractMax. In an array/list implementation one operation is very fast (O(1)) but the other one is very slow (O(n)). Binary heap gives an implementation where both operations take O(log n) time.
Stable Sorting Algorithm
Items with the same key keep their relative order from the original permutation
Influence graphs
Model the observed behaviour that certain people can influence the thinking of other people
Non-valid variable names
No!Exclamation; 8addName;no-hyphen
Complexity for iterating over associated values:
O(T.S + N) --> worst case.
B-Tree Search (Average)
O(log(n))
Binary Search Tree Access (Average)
O(log(n))
Treemap complexity of basic operations:
O(logN)
B-Tree Space Complexity
O(n)
Binary Search Tree Access (Worst)
O(n)
N L R
Preorder traversal (Polish)
Prime number Tables
Reduce the chance of collision.
Combinations
Repetition is Allowed: such as coins in your pocket (5,5,5,10,10) No Repetition: such as lottery numbers (2,14,15,27,30,33) https://www.mathsisfun.com/combinatorics/combinations-permutations.html
Heapify (bubble down)
Swap a node with one of its children, calling bubble_down on the node again until it dominates its children. Each time, place a node that dominates the others as the parent node.
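A sketch of bubble_down for a min-heap stored in an array (index arithmetic assumes the usual children-at-2i+1 and 2i+2 layout):

```java
class HeapDemo {
    // Restore the min-heap property below index i in heap[0..n).
    static void bubbleDown(int[] heap, int i, int n) {
        while (true) {
            int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
            if (left  < n && heap[left]  < heap[smallest]) smallest = left;
            if (right < n && heap[right] < heap[smallest]) smallest = right;
            if (smallest == i) return;           // node already dominates its children
            int tmp = heap[i];                   // swap with the smaller child
            heap[i] = heap[smallest];
            heap[smallest] = tmp;
            i = smallest;                        // continue from the swapped position
        }
    }
}
```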
Exception handling
When there's an error, the program makes an error object and passes it off to the runtime system, which looks for a method in the call stack to handle it.
Pointer
a data item that identifies a particular element in a data structure
Adjacency matrix
a data structure set up as a two-dimensional array or grid that shows whether there is an edge between each pair of nodes
Priority Queue
a data structure that allows at least the following two operations: insert, which does the obvious thing; and deleteMin, which finds, returns, and removes the minimum element in the priority queue.
Dictionary
a data structure that maps keys to data
Composite types
a data type. a composite data type or compound data type is any data type that can be constructed in a program using the programming language's primitive data types and other composite types. It is sometimes called a structure or aggregate data type, although the latter term may also refer to arrays, lists, etc. The act of constructing a composite type is known as composition. Array Record (also called tuple or struct) Union Tagged union (also called variant, variant record, discriminated union, or disjoint union)
Text File
a file that contains human-readable characters
complete graph
a graph with every pair of its vertices connected by an edge
Arc
a join or relationship between two nodes - also known as an edge
Static data structure
a method of storing data where the amount of data stored is fixed
data structure
a particular scheme organizing related data items.
Heap
a pool of unused memory that can be allocated to a dynamic data structure
Skew Heap
a self-adjusting form of a leftist heap.
A path from vertex u to vertex v
a sequence of adjacent vertices that starts with u and ends with v
If a graph's edges are unordered [ (u,v) == (v,u)], then the vertices u and v are
adjacent
Reference variable
alias of another variable. changes to reference variable, changes variable it references.
Θ-notation
asymptotically tight bound
Derived class
child class inheriting from base class
heap
a special binary tree, ordered in a weaker sense than a search tree and always complete; its root contains a value greater than or equal to each of its children, and its subtrees are themselves heaps.
Array data type
composite data type. a data type that is meant to describe a collection of elements (values or variables), each selected by one or more indices (identifying keys) that can be computed at run time by the program. Such a collection is usually called an array variable, array value, or simply array.[1] By analogy with the mathematical concepts of vector and matrix, array types with one and two indices are often called vector type and matrix type, respectively.
Compiling files with g++
g++ <insert source file name here>.cpp -o <insert your desired executable file name here>
exception
indicates something unexpected has occurred or been detected; allows the program to deal with the problem in a controlled manner.
The more items a table can hold, the () likely a collision will happen.
less
30. What is the height of a full binary search tree with n elements?
log2(n)
Graph
made up of a set of vertices and edges
Leftist Heap
maintains the heap order property (Smaller values over larger values)
backtracking algorithm
reduce the test size over a brute force type solution by pruning (removing) very bad possibilities
BSP tree (binary space partitioning)*
space partitioning or binary space partitioning data structure. BSP is a method for recursively subdividing a space into convex sets (the region such that, for every pair of points within the region, every point on the straight line segment that joins the pair of points is also within the region) by hyperplanes (a subspace of one dimension less than its ambient space). This subdivision gives rise to a representation of objects within the space by means of a tree data structure known as a BSP tree. -Binary space partitioning was developed in the context of 3D computer graphics,[1][2] where the structure of a BSP tree allows spatial information about the objects in a scene that is useful in rendering, such as their ordering from front-to-back with respect to a viewer at a given location, to be accessed rapidly. Other applications include performing geometrical operations with shapes (constructive solid geometry) in CAD,[3] collision detection in robotics and 3-D video games, ray tracing and other computer applications that involve handling of complex spatial scenes. -Binary space partitioning is a generic process of recursively dividing a scene into two until the partitioning satisfies one or more requirements. It can be seen as a generalisation of other spatial tree structures such as k-d trees and quadtrees, one where hyperplanes that partition the space may have any orientation, rather than being aligned with the coordinate axes as they are in k-d trees or quadtrees. When used in computer graphics to render scenes composed of planar polygons, the partitioning planes are frequently (but not always) chosen to coincide with the planes defined by polygons in the scene. -The specific choice of partitioning plane and criterion for terminating the partitioning process varies depending on the purpose of the BSP tree. For example, in computer graphics rendering, the scene is divided until each node of the BSP tree contains only polygons that can render in arbitrary order. 
When back-face culling is used, each node therefore contains a convex set of polygons, whereas when rendering double-sided polygons, each node of the BSP tree contains only polygons in a single plane. In collision detection or ray tracing, a scene may be divided up into primitives on which collision or ray intersection tests are straightforward. -Binary space partitioning arose from the computer graphics need to rapidly draw three-dimensional scenes composed of polygons. A simple way to draw such scenes is the painter's algorithm, which produces polygons in order of distance from the viewer, back to front, painting over the background and previous polygons with each closer object. This approach has two disadvantages: time required to sort polygons in back to front order, and the possibility of errors in overlapping polygons. Fuchs and co-authors[2] showed that constructing a BSP tree solved both of these problems by providing a rapid method of sorting polygons with respect to a given viewpoint (linear in the number of polygons in the scene) and by subdividing overlapping polygons to avoid errors that can occur with the painter's algorithm. A disadvantage of binary space partitioning is that generating a BSP tree can be time-consuming. Typically, it is therefore performed once on static geometry, as a pre-calculation step, prior to rendering or other realtime operations on a scene. The expense of constructing a BSP tree makes it difficult and inefficient to directly implement moving objects into a tree. -BSP trees are often used by 3D video games, particularly first-person shooters and those with indoor environments. Game engines utilising BSP trees include the Doom engine (probably the earliest game to use a BSP data structure was Doom), the Quake engine and its descendants. 
In video games, BSP trees containing the static geometry of a scene are often used together with a Z-buffer, to correctly merge movable objects such as doors and characters onto the background scene. While binary space partitioning provides a convenient way to store and retrieve spatial information about polygons in a scene, it does not solve the problem of visible surface determination. -The canonical use of a BSP tree is for rendering polygons (that are double-sided, that is, without back-face culling) with the painter's algorithm. Each polygon is designated with a front side and a back side which could be chosen arbitrarily and only affects the structure of the tree but not the required result
list operations
size(), isEmpty(), get(i), set(i, e), add(i, e), remove(e)
dictionary implementations
sorted by search key, array-based; sorted by search key, link-based; unsorted, array-based; unsorted, link-based
Source to Executable
source->preprocessor->modifiedSource->compiler->objectCode->linker->executable
node
stores the element and the link to the next node
open addressing
the colliding item is placed in a different cell of the table
Quadratic Probing
the collision function is quadratic
height of a tree
the length of the longest simple path from the root to a leaf
Index
the location where values will be stored, calculated from the keys
percolate down
the new element is percolated down the heap until correct location is found
tree root
top level vertex
Clustering
when a hashing algorithm produces indices that are not randomly distributed
Full Tree
-All leaves are on the same level
Selection Sort
-Array has a sorted portion & unsorted portion. -Absolutely sort the next element, grow sorted portion -O(n^2) runtime always
Algorithm Analysis
-Based on function growth rate -Big O() notation -Worst case, avg case, best case analysis
Runtime Stack
-Contains all data from a method call: local variables, arguments, return address -Pushes new return address when new method is called
Iterator
-Data type that traverses an object -Acts as a cursor through a data member -Saves its position in the data type
Insertion Sort
-Grow sorted element section -Relatively sort elements in sorted section -Worst Case: O(n^2) -Best Case: O(n)
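The steps above can be sketched as a short Java routine (names are illustrative):

```java
class InsertionSortDemo {
    // Grow the sorted prefix one element at a time.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i], j = i - 1;
            while (j >= 0 && a[j] > key) {  // shift larger elements right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;                 // drop key into its slot
        }
    }
}
```

On already-sorted input the inner while loop never runs, which is why the best case is O(n).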
Inheritance
-"Is-a" relationship -Writes a specialized subclass for a superclass -"Extends"
Linked Chain
-Node based -More memory overhead -Each node has an address of the next node in the chain and an object
Divide and Conquer
-Recursive case that divides the problem into fractional sized components
Maximum Heap Tree
A tree in which every parent is greater in value than both its children, which means that the root of the tree is greatest value in the tree.
Heap Binary Tree: access, search, insert, delete,
Access: O(1) Search: O(n) Insert: O (lg n) Best case: sorted array Delete: O (lg n)
Red Black Tree: advantage, disadvantage
Advantage: quick insert, delete, and search Disadvantage: complex implementation
Iterators
An object that knows how to "walk" over a collection of things. Encapsulates everything it needs to know about what it's iterating over. Should all have similar interfaces. Can read data, move, know when to stop.
What is the big O of the BODY of a for loop with n elements that contains only: sum = sum + 1?
O(1). The loop portion itself would be O(n), since it has to go through n elements.
Double Rotation
Two single rotation at different locations, either right-left or left-right. First rotation is deeper than the second.
Quick Sort
efficient sorting algorithm, serving as a systematic method for placing the elements of an array in order
Lists data structure
linear data structure.
Recursion
the process of a subroutine calling itself
Linear Probing
the process of finding an empty cell involves checking in one direction of the collision for an empty cell
16. Questions about binary trees refer to the definitions on the last page for BinaryTree, BinarySearchTreeInterface, BinarySearchTree, and Node. 1. Briefly explain why the root data member in the BinaryTree class is declared to be protected? 2. Briefly explain why the Node class is declared to be static? 3. Write a recursive BinaryTree method named count_leaves that returns the number of leaves in the tree.
1. So BinarySearchTree (the subclass) can also use it. 2. Because the Node class doesn't need to be bound to a single instance of the enclosing class. 3. public int count_leaves(Node node) { if (node == null) return 0; if (node.left == null && node.right == null) return 1; return count_leaves(node.left) + count_leaves(node.right); }
Dictionary: definition
A data structure that maps keys to values.
Dense
A graph with many edges
Pop
A process used in stack and queue processing where a copy of the top or front value is acquired, and then removed from the stack or queue (Dequeue).
Peek
A process used in stack and queue processing where a copy of the top or front value is acquired, without removing that item.
Push
A process used in stack and queue processing where a new value is inserted onto the top of the stack OR into the back of the queue (Enqueue).
Linear Data Structure
A programming data structure that occupies contiguous memory, such as an array of values.
Rolling hash function
A rolling hash (also known as a rolling checksum) is a hash function where the input is hashed in a window that moves through the input. A few hash functions allow a rolling hash to be computed very quickly—the new hash value is rapidly calculated given only the old hash value, the old value removed from the window, and the new value added to the window—similar to the way a moving average function can be computed much more quickly than other low-pass filters. One of the main applications is the Rabin-Karp string search algorithm.
Self-loop
A self-loop is an edge that connects a vertex to itself.
linked list
A sequence of zero or more nodes containing some data and pointers to other nodes of the list.
Subtrees of T
A set of nodes with a distinguished root node R and a partition of the remaining nodes into subsets T0, T1, ..., Tn-1, which are themselves trees.
simple cycle
A simple cycle is a cycle with no repeated edges or vertices (except the requisite repetition of the first and last vertices).
spanning forest
A spanning forest of a graph is the union of the spanning trees of its connected components.
spanning tree
A spanning tree of a connected graph is a subgraph that contains all of that graph's vertices and is a single tree.
tree
A tree is an acyclic connected graph.
Binary trees
AA tree AVL tree Binary search tree Binary tree Cartesian tree Left-child right-sibling binary tree Order statistic tree Pagoda Randomized binary search tree Red-black tree Rope Scapegoat tree Self-balancing binary search tree Splay tree T-tree Tango tree Threaded binary tree Top tree Treap WAVL tree Weight-balanced tree
How can you make it so that you can determine the difference between a full and empty circular array?
Add an empty space that will act as delimiter so that the array is full when frontIndex equals (backIndex + 2) % queue.length, whereas it is empty when frontIndex equals (backIndex + 1) % queue.length.
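This convention can be realized as a circular-array queue in Java; the sketch below (class and method names are illustrative, not from the deck) allocates one extra, always-empty slot so the full and empty tests above are distinguishable:

```java
public class CircularQueue {
    private int[] queue;
    private int frontIndex; // index of the front item
    private int backIndex;  // index of the last item added

    public CircularQueue(int capacity) {
        queue = new int[capacity + 1];  // one extra, always-empty slot
        frontIndex = 0;
        backIndex = queue.length - 1;   // back sits "just before" front
    }

    public boolean isEmpty() {
        return frontIndex == (backIndex + 1) % queue.length;
    }

    public boolean isFull() {
        return frontIndex == (backIndex + 2) % queue.length;
    }

    public void enqueue(int item) {
        if (isFull()) throw new IllegalStateException("queue is full");
        backIndex = (backIndex + 1) % queue.length;
        queue[backIndex] = item;
    }

    public int dequeue() {
        if (isEmpty()) throw new IllegalStateException("queue is empty");
        int item = queue[frontIndex];
        frontIndex = (frontIndex + 1) % queue.length;
        return item;
    }
}
```

With capacity 2, the backing array has length 3: after two enqueues the full test fires even though one cell is still physically empty.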
weight/cost matrix
Adjacency matrix of a weighted graph.
Bellman-Ford
Allowed to reconsider costs of reaching vertices. Can detect negative-cost cycles. Able to handle graphs with negative edge weights by performing relaxation on all edges V-1 times, where V is the number of vertices.
Reduction
Analysis pattern. Use a well-known solution to some other problem as a subroutine.
Weight
Associated with each edge
Heap v.s. BST (time complexity)
BST: Insert O(log N), FindMin O(log N); all operations work in O(log N) time (GetMax even works in O(1)). Heap: Insert O(log N), FindMin O(1), DeleteMin O(log N), BuildHeap from N inputs O(N), Find(X) O(N). The heap sacrifices the performance of search operations in order to get O(1) performance for FindMin.
How would a recursive merge sort work?
By making each sub-array the input for the next recursive Merge Sort call, until the sub-arrays are small enough to be trivially sorted.
How can you read data from an input file?
By using a Scanner object: Scanner read = new Scanner(new File("filename"));
Examples
Call digraph Web digraph Software design
Quadratic Probing
On the nth probe, checks the cell n^2 positions away from the original hash location; causes secondary clustering. Not guaranteed to find an open table spot unless the table is at least half empty.
Relaxation
Getting from A->C more cheaply by using B as an intermediary.
8. Here is an adjacency list representation of a directed graph where there are no weights assigned to the edges). See Question 8 for picture (a) Draw a picture of the directed graph that has the above adjacency list representation. (b) Another way to represent a graph is an adjacency matrix. Draw the adjacency matrix for this graph.
Hand draw this one
Weighted path length
Leaf's weight times its depth
Array: memory
Memory: O(n)
Splay Tree Space Complexity
O(n)
Influence graph mechanism
A person at vertex a can influence the behaviour of the person at vertex b. No loops or multiple edges are used.
Preemption
Preemption is the act of temporarily interrupting a task being carried out by a computer system, without requiring its cooperation, and with the intention of resuming the task at a later time. Such a change is known as a context switch.
OOA
Process of gathering requirements of a solution; Object Oriented Analysis
Trie
A search tree with a child position for each character in the alphabet. Think spelling.
Hash tables: search, insert, delete
Search: O(1-n) Insert: O(1-n) Delete: O(1-n)
Sharding
Sharding is a type of database partitioning that separates very large databases into smaller, faster, more easily managed parts called data shards.
Compute XOR of every bit in an integer
Similar to addition, XOR is associative and commutative, so we can XOR every bit together. First, XOR the top half with the bottom half; then XOR the top quarter of the remaining half with the bottom quarter, and so on: x ^= x >> 32; x ^= x >> 16; x ^= x >> 8; x ^= x >> 4; x ^= x >> 2; x ^= x >> 1; x = x & 1
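The same bit-folding trick can be written as a Java method (a sketch; in Java the unsigned shift >>> is used so sign bits don't leak in):

```java
public class Parity {
    // XOR-folds all 64 bits of x down to bit 0; the result (0 or 1)
    // is the XOR of every bit, i.e. the parity of the 1-bit count.
    public static long parity(long x) {
        x ^= x >>> 32;
        x ^= x >>> 16;
        x ^= x >>> 8;
        x ^= x >>> 4;
        x ^= x >>> 2;
        x ^= x >>> 1;
        return x & 1;
    }
}
```

Each step halves the span of bits still carrying information, so the whole fold takes six XORs for a 64-bit value.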
Parameter Passing
Small, no modification - value Large, no modification - CONST reference modified - pointer
Minimum Product Spanning Tree
The cost of a tree is the product of all the edge weights in the tree, instead of the sum of the weights. Since log(a*b) = log(a) + log(b), the minimum spanning tree on a graph whose edge weights are replaced with their logarithms gives the minimum product spanning tree on the original graph.
What is the ideal pivot value in an array?
The median value.
Minimum Spanning Trees
The smallest connected graph in terms of edge weight, minimizing the total length over all possible spanning trees. However, there can be more than one minimum spanning tree in a graph. All spanning trees of an unweighted graph are minimum spanning trees.
What happens to elements in each sub array that equal the pivot value?
They are allowed to stay in their respective sub-arrays.
Adjacent vertices
Two vertices connected by an edge in an undirected graph.
Heap Sort
Unstable, O(n log n), Ω(n log n): Make a heap, take everything out.
Dictionary
Uses search key to identify its entries
How can you write data to an output file?
Using a PrintWriter object.
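A minimal round-trip sketch using both a PrintWriter (to write) and a Scanner (to read back); helper names and the file name are illustrative, and checked exceptions are wrapped for brevity:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Scanner;

public class FileIOExample {
    // Write lines to a file with a PrintWriter.
    public static void writeLines(String filename, String[] lines) {
        try {
            PrintWriter out = new PrintWriter(filename);
            for (String line : lines) out.println(line);
            out.close();
        } catch (FileNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    // Read the first line back with a Scanner.
    public static String readFirstLine(String filename) {
        try {
            Scanner read = new Scanner(new File(filename));
            String first = read.hasNextLine() ? read.nextLine() : "";
            read.close();
            return first;
        } catch (FileNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```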
Insertion & Quick Sort
Using both algorithms together is more efficient: Quick Sort's O(n log n) advantage only shows on large arrays, so small sub-arrays are handed to Insertion Sort.
Bx-tree
b-tree data structure. a query and update efficient B+ tree-based index structure for moving objects. -The base structure of the Bx-tree is a B+ tree in which the internal nodes serve as a directory, each containing a pointer to its right sibling. In the earlier version of the Bx-tree,[1] the leaf nodes contained the moving-object locations being indexed and corresponding index time. In the optimized version,[2] each leaf node entry contains the id, velocity, single-dimensional mapping value and the latest update time of the object. The fanout is increased by not storing the locations of moving objects, as these can be derived from the mapping values.
B*-tree
b-tree data structure. balances more neighboring internal nodes to keep the internal nodes more densely packed (Comer 1979, p. 129). This variant requires non-root nodes to be at least 2/3 full instead of 1/2 (Knuth 1998, p. 488). To maintain this, instead of immediately splitting up a node when it gets full, its keys are shared with a node next to it. When both nodes are full, then the two nodes are split into three. Deleting nodes is somewhat more complex than inserting however.
static member function
can only access static member variables; can be called before any objects are defined.
What is the formula for the number of nodes in a full tree of height h? Derive the version that will give you height based upon nodes?
n = 2^h - 1; h = log(base2)(n+1)
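Both formulas are easy to check in Java (a sketch; height here counts levels, matching n = 2^h - 1):

```java
public class FullTree {
    // Number of nodes in a full binary tree of height h: n = 2^h - 1.
    public static int nodes(int h) {
        return (1 << h) - 1;
    }

    // Inverse: h = log2(n + 1), computed with integer bit operations.
    public static int height(int n) {
        return 31 - Integer.numberOfLeadingZeros(n + 1);
    }
}
```

For example, height 3 gives 2^3 - 1 = 7 nodes, and 7 nodes map back to height log2(8) = 3.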
Average Lower bound for adjacent swaps
n(n-1)/4 Ω(n^2)
HeapSort
priority queues can be used to sort in O(N log N) time. The basic strategy is to build a binary heap of N elements.
concrete class
provide implementation for each method and all abstract methods in super class
shared_pointer
provides shared ownership of object c++ 11 smart pointer
weak_pointer
reference to an object already managed by shared pointer... does not have ownership of object c++ 11 smart pointer
Compute x modulo a power of 2 (y)
x & (y - 1)
Isolate the lowest bit that is 1 in x
x & ~(x - 1)
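Both of these bit identities can be wrapped as small Java helpers (a sketch; modPow2 assumes y is a power of two, and x & ~(x - 1) is equivalent to x & -x):

```java
public class BitTricks {
    // x mod y, assuming y is a power of two.
    public static int modPow2(int x, int y) {
        return x & (y - 1);
    }

    // Isolates the lowest set bit of x.
    public static int lowestBit(int x) {
        return x & ~(x - 1);
    }
}
```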
antisymmetric property
x <= y and y <= x implies x = y
Graph theory
the underlying mathematical principles behind the use of graphs
Chained bucket:
+ easy to implement + buckets can't overfill + buckets won't waste space + buckets are dynamically sized
Cycle Graph
- Cn, n=number of vertices, n>=3 - circle with every vertex connected to two other vertices
Complete Bipartite
- Ka,b - every vertex in set A connects to every vertex in set B
Complete Graph
- Kn, n=number of vertices, n>=3 - every vertex is connected to every other vertex
Star Graph
- Sn, n=number of vertices, n>=4 - single point connected to all other points
Wheel Graph
- Wn, n=number of vertices, n>=4 - Cycle graph + Star graph
Minimum Spanning Trees
- a tree that spans a weighted graph such that the sum of the edges in the tree is the smallest possible
Probe Hashing:
-> Hash it, and if it leads to a collision, use a separate equation to determine the step size and use that step size to find a new site.
-Complete tree
-All rows are filled except the last -Last row is filled left to right
Casting
-Allows you to look at an object type as if it were something else -Temporarily changes the reference type
Circular Linked Chain
-Always has at least one node empty -If full: back.next.next()=front -If empty: back.next()=front
Interfaces
-Fully abstract class -Can declare static constants, methods have no code -"Implements"
Abstract Type
-No code -Has abstract methods -Defines data that is stored and operations. -Defines the WHAT not the HOW
Tree Data Type
-No restrictions on structure -Stored with each node pointing to its right sibling and its 1st child -Lots of overhead this way
Heap Sort
-Put array in a heap, then remove the root and re heapify it -Once all of the elements are removed, array is in sorted order -O(nlogn) runtime
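A compact in-place version of this idea in Java, simulating a max-heap inside the array itself (a sketch, not the deck's reference implementation):

```java
public class HeapSortSketch {
    // Build a max-heap, then repeatedly swap the root (largest) to the
    // end of the array and re-heapify the shrunken unsorted prefix.
    public static void sort(int[] a) {
        int n = a.length;
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, i, n); // build heap
        for (int end = n - 1; end > 0; end--) {
            int t = a[0]; a[0] = a[end]; a[end] = t;            // remove root
            siftDown(a, 0, end);                                // re-heapify
        }
    }

    // Restore the heap property below index i in a heap of size n.
    private static void siftDown(int[] a, int i, int n) {
        while (2 * i + 1 < n) {
            int child = 2 * i + 1;
            if (child + 1 < n && a[child + 1] > a[child]) child++;
            if (a[i] >= a[child]) return;
            int t = a[i]; a[i] = a[child]; a[child] = t;
            i = child;
        }
    }
}
```

Building the heap bottom-up is O(n); the n root removals each cost O(log n), giving the O(n log n) total.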
Contiguous memory
-Random Access -Binary Search -Little per element memory overhead -Simple -Allocated all at once -Requires shifts for in order
Amortized Analysis
-Sequence of operations where the runtime follows a pattern -Gives the average runtime per operation of an algorithm -Steps: determine the pattern; add the runtime for the entire sequence; divide by the number of operations
Geometric series
A series in which each term is a fixed multiple r of the previous one: 1 + r + r^2 + ... + r^k = (r^(k+1) - 1)/(r - 1) for r != 1. With r = 2, the sum 1 + 2 + 4 + ... + 2^k equals 2^(k+1) - 1, which is why doubling-based costs (e.g., array resizing) telescope to O(n) in amortized analysis.
BST Priority Queue: insert, max, extract max, increase value
BST-insert: O(h) BST-maximum: O(h) BST-extract-max: O(h) BST-increase-value: O(h)
6. What is the expected number of operations needed to loop through all the edges terminating at a particular vertex given an adjacency matrix representation of the graph? (Assume n vertices are in the graph and m edges terminate at the desired node). A. O(m) B. O(n) C. O(m²) D. O(n²)
B. O(n). With an adjacency matrix, the edges at a vertex are found by scanning that vertex's row (or column), which has n entries regardless of m.
Mixed graphs
Have directed and undirected edges and can use loops
Rehashing Complexity:
O(N) - costly. Carefully select the initial table size to avoid re-hashing.
if (u,v) is the last edge of the simple path from the root to vertex v, v is the _ of u
child
extends
only one class in Java
stack
sequence of elements with one end designed as the top.
24. Given the following trees stored as arrays, which of them represent a heap? treeA = [13, 21, 16, 24, 31, 19, 68, 65, 26, 32] treeB = [13, 21, 16, 6, 31, 19, 68, 65, 26, 32]
treeA = [13, 21, 16, 24, 31, 19, 68, 65, 26, 32] is a min-heap. (In treeB, the node 6 at index 3 is smaller than its parent 21, violating the heap property.)
Hash collision
two (or more) keys hash to same slot
Loop
when a vertex has an edge to itself
No access modifier (default) Access
-Accessible by the class and package
Steps to resizing:
1. Double table size to nearest prime number 2. Re-hash items from old table into the new table.
simple path
A directed path with no repeated vertices
Set Partition
A partitioning of elements of some universal set into a collection of disjointed subsets. Thus, each element must be in exactly one subset.
Weakly connected
A directed graph is weakly connected if there is a path between every pair of vertices when the edges are treated as undirected.
Path
A path in a graph is a sequence of vertices connected by edges. A simple path is one with no repeated vertices.
Define post-order traversal of a binary tree?
A postorder traversal visits the root of a binary tree after visiting the nodes in the root's subtrees. In particular, it visits nodes in the following order: 1-Visit all the nodes in the root's left subtree 2-Visit all the nodes in the root's right subtree 3-Visit the root
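The three steps map directly onto a recursive method; this sketch (with an illustrative Node class) appends each visited value to a list:

```java
import java.util.List;

public class PostOrder {
    static class Node {
        int value;
        Node left, right;
        Node(int v, Node l, Node r) { value = v; left = l; right = r; }
    }

    // Postorder: left subtree, then right subtree, then the root.
    public static void postorder(Node node, List<Integer> out) {
        if (node == null) return;
        postorder(node.left, out);
        postorder(node.right, out);
        out.add(node.value);
    }
}
```

For a root 1 with children 2 and 3, the visit order is 2, 3, 1.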
Quick Select
A selection algorithm to find the kth smallest element in an unordered list. Quickselect uses the same overall approach as quicksort, choosing one element as a pivot and partitioning the data in two based on the pivot, accordingly as less than or greater than the pivot. However, instead of recursing into both sides, as in quicksort, quickselect only recurses into one side - the side with the element it is searching for. This reduces the average complexity from O(n log n) to O(n).
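A sketch of Quickselect in Java using Lomuto partitioning (the last-element pivot choice and helper names are illustrative):

```java
public class QuickSelect {
    // Returns the k-th smallest element of a (k is 0-based).
    public static int select(int[] a, int k) {
        return select(a, 0, a.length - 1, k);
    }

    private static int select(int[] a, int lo, int hi, int k) {
        if (lo == hi) return a[lo];
        int p = partition(a, lo, hi);
        if (k == p) return a[k];
        // Recurse into only the side that contains index k.
        return k < p ? select(a, lo, p - 1, k) : select(a, p + 1, hi, k);
    }

    // Lomuto partition: pivot is a[hi]; returns its final index.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }
}
```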
Minimum Weight Spanning Trees (MST)
A spanning tree whose weight is no larger than the weight of any other spanning tree which could be made with the graph. The properties of this thing include that the graph is connected, the edge weights may not necessarily be distances, the edge weights may be zero or negative, and the edge weights are all different. Can be constructed using a greedy algorithm such as Prim's or Kruskal's. Generally used in network design.
Array length
A value that represents the number of elements contained in an array. Often there is a process associated with an array that provides this value, such as list.length, or len(list).
memoization
An optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
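A classic illustration is Fibonacci with a cache; this Java sketch (names are illustrative) stores each computed value so it is never recomputed, turning exponential naive recursion into linear time:

```java
import java.util.HashMap;
import java.util.Map;

public class Memo {
    private static final Map<Integer, Long> cache = new HashMap<>();

    // Fibonacci with memoization: each n is computed at most once.
    public static long fib(int n) {
        if (n <= 1) return n;
        Long cached = cache.get(n);
        if (cached != null) return cached;
        long result = fib(n - 1) + fib(n - 2);
        cache.put(n, result);
        return result;
    }
}
```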
Binary Search
An ordered array of data which has efficiently supported operations. The worst and average case of a search using this structure is lgN. The Worst case of an insertion is N, and the average case of an insertion is N/2.
Decision tree
Application-specific tree data structure. a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm. -Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal, but are also a popular tool in machine learning.
Load Factor
Approximately how full the table is; typically kept around 0.7-0.8 or lower.
Abstract Class
Contains abstract methods (method placeholders) that are implemented by concrete subclasses.
Directed acyclic graph (DAG)
Graph data structure. a finite directed graph with no directed cycles. That is, it consists of finitely many vertices and edges, with each edge directed from one vertex to another, such that there is no way to start at any vertex v and follow a consistently-directed sequence of edges that eventually loops back to v again. Equivalently, a DAG is a directed graph that has a topological ordering, a sequence of the vertices such that every edge is directed from earlier to later in the sequence. -The graph-structured stack is an essential part of Tomita's algorithm, where it replaces the usual stack of a pushdown automaton. This allows the algorithm to encode the nondeterministic choices in parsing an ambiguous grammar, sometimes with greater efficiency.
In loops
Initial and end vertex are same
Static Memory
Memory allocated to an array, which cannot grow or shrink once declared.
Contiguous Memory
Memory that is "side-by-side" in a computer, typical of an array structure.
Binary Search Tree Search (Average)
O(log(n))
Winged edge
Other data structure. a data representation used to describe polygon models in computer graphics. It explicitly describes the geometry and topology of faces, edges, and vertices when three or more surfaces come together and meet at a common edge. The ordering is such that the surfaces are ordered counter-clockwise with respect to the innate orientation of the intersection edge. Moreover the representation allows numerically unstable situations like that depicted below. -The winged edge data structure allows for quick traversal between faces, edges, and vertices due to the explicitly linked structure of the network. This rich form of specifying an unstructured grid is in contrast to simpler specifications of polygon meshes such as a node and element list, or the implied connectivity of a regular grid. It has application in common modeling operations such as subdivision, extrusion etc.
Insertion Sort
Stable, O(n^2), Ω(n) : Swapping elements one at a time starting at the beginning.
Prim's Algorithm
Start with one vertex, grow tree on min weight edge from all vertices So out of all reachable edges that don't cause cycles, take the smallest
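A simple O(V^2) sketch of this strategy in Java on an adjacency matrix (0 meaning no edge; names are illustrative):

```java
public class PrimSketch {
    // Prim's algorithm: grow the tree from vertex 0 by always taking the
    // cheapest edge that crosses from the tree to a non-tree vertex.
    // Returns the total weight of a minimum spanning tree.
    public static int mstWeight(int[][] w) {
        int n = w.length;
        boolean[] inTree = new boolean[n];
        int[] best = new int[n];             // cheapest edge into the tree
        java.util.Arrays.fill(best, Integer.MAX_VALUE);
        best[0] = 0;
        int total = 0;
        for (int step = 0; step < n; step++) {
            int u = -1;
            for (int v = 0; v < n; v++)      // pick nearest non-tree vertex
                if (!inTree[v] && (u == -1 || best[v] < best[u])) u = v;
            inTree[u] = true;
            total += best[u];
            for (int v = 0; v < n; v++)      // relax edges out of u
                if (!inTree[v] && w[u][v] != 0 && w[u][v] < best[v])
                    best[v] = w[u][v];
        }
        return total;
    }
}
```

On a triangle with edge weights 1, 2, and 3, the two cheapest edges are taken and the MST weight is 3.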
What is a simple definition of the height of a tree?
The number of nodes between the root and the leaf on the longest path.
Module dependency graph
Used in software design to show how a program is divided into parts, and in testing and maintenance of the resulting programs.
siblings
Vertices of a tree that have the same parent.
How do you delete a value within the hash table?
You just set Table[hash(key)] = null. (With open addressing, a special 'deleted' marker is used instead of null so that later probe sequences are not broken.)
Binary Heap
a binary tree with every level completely filled except possibly the last level
Graph
a mathematical structure that models the relationship between pairs of objects
Call stack
a special type of stack used to store information about active subroutines and functions within a program
trees data structure
a widely used abstract data type (ADT)—or data structure implementing this ADT—that simulates a hierarchical tree structure, with a root value and subtrees of children with a parent node, represented as a set of linked nodes. binary trees, b-trees, heaps, trees, multiway trees, space-partitioning trees, application-specific trees
Dynamic array (or growable array, resizable array, dynamic table, mutable array, or array list)
array data structure. a random access, variable-size list data structure that allows elements to be added or removed. It is supplied with standard libraries in many modern mainstream programming languages. -A dynamic array is not the same thing as a dynamically allocated array, which is an array whose size is fixed when the array is allocated, although a dynamic array may use such a fixed-size array as a back end. -The simplest dynamic array is constructed by allocating a fixed-size array and then dividing it into two parts: the first stores the elements of the dynamic array and the second is reserved, or unused. We can then add or remove elements at the end of the dynamic array in constant time by using the reserved space, until this space is completely consumed. The number of elements used by the dynamic array contents is its logical size or size, while the size of the underlying array is called the dynamic array's capacity or physical size, which is the maximum possible size without relocating data. -In applications where the logical size is bounded, the fixed-size data structure suffices. This may be short-sighted, as more space may be needed later. A philosophical programmer may prefer to write the code to make every array capable of resizing from the outset, then return to using fixed-size arrays during program optimization. Resizing the underlying array is an expensive task, typically involving copying the entire contents of the array. -To avoid incurring the cost of resizing many times, dynamic arrays resize by a large amount, such as doubling in size, and use the reserved space for future expansion. 
-The dynamic array has performance similar to an array, with the addition of new operations to add and remove elements: 1) Getting or setting the value at a particular index (constant time) 2) Iterating over the elements in order (linear time, good cache performance) 3) Inserting or deleting an element in the middle of the array (linear time) 4) Inserting or deleting an element at the end of the array (constant amortized time) -Dynamic arrays benefit from many of the advantages of arrays, including good locality of reference and data cache utilization, compactness (low memory use), and random access. They usually have only a small fixed additional overhead for storing information about the size and capacity. This makes dynamic arrays an attractive tool for building cache-friendly data structures. However, in languages like Python or Java that enforce reference semantics, the dynamic array generally will not store the actual data, but rather it will store references to the data that resides in other areas of memory. In this case, accessing items in the array sequentially will actually involve accessing multiple non-contiguous areas of memory, so the many advantages of the cache-friendliness of this data structure are lost. -Compared to linked lists, dynamic arrays have faster indexing (constant time versus linear time) and typically faster iteration due to improved locality of reference; however, dynamic arrays require linear time to insert or delete at an arbitrary location, since all following elements must be moved, while linked lists can do this in constant time. This disadvantage is mitigated by the gap buffer and tiered vector variants discussed under Variants below. Also, in a highly fragmented memory region, it may be expensive or impossible to find contiguous space for a large dynamic array, whereas linked lists do not require the whole data structure to be stored contiguously. 
-A balanced tree can store a list while providing all operations of both dynamic arrays and linked lists reasonably efficiently, but both insertion at the end and iteration over the list are slower than for a dynamic array, in theory and in practice, due to non-contiguous storage and tree traversal/manipulation overhead.
29. What advantage does balanced trees provide over BST?
A balanced tree's best and worst search time is O(log n); a BST's worst case is O(n).
23. Given the complete tree shown below and stored using the array representation of trees (for an element at position i, the left child is on position 2*i + 1, right child is on position 2*(i+1) ), return the height of the tree. [1, 2, 3 , 4 , 5, 6, 8, 10, 12, 13, 15]
height = 3
Radix Sort
non-comparative integer sorting algorithm that sorts data with integer keys by grouping keys by the individual digits which share the same significant position and value
ancestors of a node
parent, grandparent, great-grandparent, and so on up to the root
Inheritance
provides way to create new class from an existing class. the new class is specialized version of the existing class
Cover tree
space partitioning or binary space partitioning data structure. a type of data structure in computer science that is specifically designed to facilitate the speed-up of a nearest neighbor search. It is a refinement of the Navigating Net data structure, and related to a variety of other data structures developed for indexing intrinsically low-dimensional data.[1] -The tree can be thought of as a hierarchy of levels with the top level containing the root point and the bottom level containing every point in the metric space. Each level C is associated with an integer value i that decrements by one as the tree is descended. Each level C_i in the cover tree has three important properties: 1. Nesting: C_i is a subset of C_{i-1}. 2. Covering: for every point p in C_{i-1}, there exists a point q in C_i such that the distance from p to q is less than or equal to 2^i, and exactly one such q is a parent of p. 3. Separation: for all points p, q in C_i, the distance from p to q is greater than 2^i.
X-tree (for eXtended node tree)
space partitioning or binary space partitioning data structure. an index tree structure based on the R-tree used for storing data in many dimensions. It appeared in 1996,[2] and differs from R-trees (1984), R+-trees (1987) and R*-trees (1990) because it emphasizes prevention of overlap in the bounding boxes, which increasingly becomes a problem in high dimensions. In cases where nodes cannot be split without preventing overlap, the node split will be deferred, resulting in super-nodes. In extreme cases, the tree will linearize, which defends against worst-case behaviors observed in some other data structures.
Nesting
the process of putting one statement inside another statement
Load factor
the ratio of occupied slots to the total number of slots in the table
Magnitude and direction
the two components of a vector
Components
the values within a vector
root
top node, node without parent
Suffix array
Array data structure. a sorted array of all suffixes of a string. It is a data structure used, among others, in full text indices, data compression algorithms and within the field of bioinformatics.[1] -Suffix arrays were introduced by Manber & Myers (1990) as a simple, space efficient alternative to suffix trees. They have independently been discovered by Gaston Gonnet in 1987 under the name PAT array (Gonnet, Baeza-Yates & Snider 1992).
Graph
A data structure in programming which consists of a set of vertices (nodes) and edges (connections).
Binary Tree
A data structure that consists of nodes, with one root node at the base of the tree, and two nodes (left child and right child) extending from the root, and from each child node.
Non-Linear Data Structure
A data structure that does not occupy contiguous memory, such as a linked list, graph, or tree.
Undirected Graph
A graph that contains edges between vertices with no specific direction associated with any edge.
Directed Graph
A graph where an edge has a direction associated with it, for example, a plane flight that takes off in one location and arrives in another. The return flight would be considered a separate edge.
Connected Graph
A graph where there exists a simple path from any vertex in the graph to any other vertex in the graph, even if it takes several "hops" to get there.
Parent Node
A node, including the root, which has one or more child nodes connected to it.
Full Tree Traversal
A non-executable, visual approach to help determine the pre-order, in-order, or post-order traversal of a tree.
Minimum Heap Tree
A tree in which every parent is lesser in value than both its children, which means that the root of the tree is the least value in the tree.
Complete Tree
A tree in which there are no missing nodes when looking at each level of the tree. The lowest level of tree may not be completely full, but may not have any missing nodes. All other levels are full.
Head
A typical object variable identifier name used to reference, or point to, the first object in a linked list. The number one rule for processing linked lists is: 'Never let go of the head of the list!' Otherwise all of the list is lost in memory. The number two rule when managing linked lists is: 'Always connect before you disconnect!'
2D Array
An array of arrays, characterized by rows and columns, arranged in a grid format, but still stored in contiguous, or side-by-side memory, accessed using two index values.
Ragged Array
An array where the number of columns in each row may be different.
Row Major
An array where the two index values for any element are the row first, then the column.
Node
An object linked to other objects, representing some entity in that data structure.
Friendship graphs mechanism
Each person in a particular group is represented by a vertex. Undirected edges connect two people when they know each other. No loops or multiple edges are used.
Graph equation
G = (V, E), where V is a nonempty set of vertices (nodes) and E is a set of edges.
26. Suppose that we have int values between 1 and 1000 in a binary search tree, and we search for value 363. Which of the following cannot be a sequence of values examined to obtain the value 363? (a) 2 252 401 398 330 363 (b) 399 387 219 266 382 381 278 363 (c) 3 923 220 911 244 898 258 362 363 (d) 4 924 278 347 621 299 392 358 363 (e) 5 925 202 910 245 363
(d) 4 924 278 347 621 299 392 358 363. Since 363 > 347, the search goes right at 347, so every value examined after 347 must be greater than 347; 299 violates this.
Backtracking
-Build a solution, adding one new decision per recursive call -When a dead end is found, undo most recent decision & try a new option -Useful for a very large solution space
Stack Data Type
-Collection of elements -Ordered -Can have Dupes -First in, last out -Operations: add, remove, peek
Heap Data Structure
-Complete binary tree -Root is greater in value than both of its children, children in arbitrary order -Operations: -Add -Remove -peek -Should be stored in an array that simulates a tree structure
Queue
A FIFO (First In First Out) data structure, where the first element added will be the first to be removed, and where a new element is added to the back, much like a waiting line.
What does overriding a method mean?
A subclass can redefine a method of a superclass - the overridden method has the same name, parameter list and return type. The keyword @Override is used to indicate to java that a method is overridden. Static methods of the superclass are not overridden.
Heap: advantage, disadvantage
Advantage: quick insert, quick delete, access to largest item Disadvantage: slow access to all other items
binary tree
An ordered tree in which every vertex has no more than two children, with each child designated as a left or right child. Potentially empty.
Graph Modeling
Analysis pattern. Describe the problem using a graph and solve it using an existing algorithm.
Bubble Sort
Compares adjacent numbers and exchanges them if out of order. Requires several passes over the data; the largest item bubbles to the end of the array on each pass. Best: O(n), Worst: O(n^2).
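A sketch in Java with the early-exit flag that gives the O(n) best case on already-sorted input:

```java
public class BubbleSortSketch {
    // Bubble sort: each pass bubbles the largest remaining item to the
    // end; a pass with no swaps means the array is already sorted.
    public static void sort(int[] a) {
        for (int pass = a.length - 1; pass > 0; pass--) {
            boolean swapped = false;
            for (int i = 0; i < pass; i++) {
                if (a[i] > a[i + 1]) {
                    int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                    swapped = true;
                }
            }
            if (!swapped) break;  // early exit: O(n) best case
        }
    }
}
```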
Unordered Linked List
Data structure with non-efficently supported operations. Is unordered. Has a worst case cost of search and insertion at N, an average case cost of insertion at N, and an average case cost of searching at N/2.
Adjacency matrix
Graph data structure. a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph. -In the special case of a finite simple graph, the adjacency matrix is a (0,1)-matrix with zeros on its diagonal. If the graph is undirected, the adjacency matrix is symmetric. The relationship between a graph and the eigenvalues and eigenvectors of its adjacency matrix is studied in spectral graph theory. -The adjacency matrix should be distinguished from the incidence matrix (a matrix that shows the relationship between two classes of objects) for a graph, a different matrix representation whose elements indicate whether vertex-edge pairs are incident or not.
2. A hash table has buckets 0 to 9 and uses a hash function of key % 10. If the table is initially empty and the following inserts are applied in the order shown, the insert of which item results in a collision?
HashInsert(hashTable, item 55): 55 % 10 = 5 (bucket 5, no collision) HashInsert(hashTable, item 90): 90 % 10 = 0 (bucket 0, no collision) HashInsert(hashTable, item 95): 95 % 10 = 5, collision with item 55.
Directed multiple graphs
Have multiple directed edges from one vertex to a second vertex.
Describe the Merge Sort?
It splits the array into 2 distinct arrays, which are sorted, then put back together into a single array in order. It will compare the current positions in each sub-array, placing the lowest value first of the pair.
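The description above can be sketched recursively in Java (array copies keep the sketch short; an in-place merge with an auxiliary buffer is the usual optimization):

```java
import java.util.Arrays;

public class MergeSortSketch {
    // Recursive merge sort: split, sort each half, then merge by
    // repeatedly taking the smaller of the two front elements.
    public static int[] sort(int[] a) {
        if (a.length <= 1) return a;
        int mid = a.length / 2;
        int[] left = sort(Arrays.copyOfRange(a, 0, mid));
        int[] right = sort(Arrays.copyOfRange(a, mid, a.length));
        return merge(left, right);
    }

    private static int[] merge(int[] left, int[] right) {
        int[] out = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length) out[k++] = left[i++];
        while (j < right.length) out[k++] = right[j++];
        return out;
    }
}
```

Taking `left[i]` when the two front elements are equal keeps the sort stable.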
Skip list*
List data structure. a data structure that allows fast search within an ordered sequence of elements. Fast search is made possible by maintaining a linked hierarchy of subsequences, each skipping over fewer elements. Searching starts in the sparsest subsequence until two consecutive elements have been found, one smaller and one larger than or equal to the element searched for. Via the linked hierarchy, these two elements link to elements of the next sparsest subsequence, where searching is continued until finally we are searching in the full sequence. The elements that are skipped over may be chosen probabilistically [2] or deterministically,[3] with the former being more common. -A schematic picture of the skip list data structure: each box with an arrow represents a pointer and a row is a linked list giving a sparse subsequence; the numbered boxes at the bottom represent the ordered data sequence. Searching proceeds downwards from the sparsest subsequence at the top until consecutive elements bracketing the search element are found. -A skip list is built in layers. The bottom layer is an ordinary ordered linked list. Each higher layer acts as an "express lane" for the lists below, where an element in layer i appears in layer i+1 with some fixed probability p (two commonly used values for p are 1/2 or 1/4). On average, each element appears in 1/(1-p) lists, and the tallest element (usually a special head element at the front of the skip list) appears in all the lists, log_{1/p} n of them. A search for a target element begins at the head element in the top list, and proceeds horizontally until the current element is greater than or equal to the target. If the current element is equal to the target, it has been found. If the current element is greater than the target, or the search reaches the end of the linked list, the procedure is repeated after returning to the previous element and dropping down vertically to the next lower list.
The expected number of steps in each linked list is at most 1/p, which can be seen by tracing the search path backwards from the target until reaching an element that appears in the next higher list or reaching the beginning of the current list. Therefore, the total expected cost of a search is (log _{1/p}n)/p, which is O(log n) when p is a constant. By choosing different values of p, it is possible to trade search costs against storage costs.
Difference list
List data structure. May refer to one of two data structures for representing lists. One of these data structures contains two lists, and represents the difference of those two lists. -The second data structure is a functional representation of a list with an efficient concatenation operation. In the second approach, difference lists are implemented as single-argument functions, which take a list as argument and prepend to that list. As a consequence, concatenation of difference lists of the second type is implemented essentially as function composition, which is O(1). However, of course the list still has to be constructed eventually (assuming all of its elements are needed), which is plainly at least O(n). -A difference list of the second sort represents lists as a function f, which when given a list x, returns the list that f represents, prepended to x. It is typically used in functional programming languages such as Haskell, although it could be used in imperative languages as well. Whether this kind of difference list is more efficient than other list representations depends on usage patterns. If an algorithm builds a list by concatenating smaller lists, which are themselves built by concatenating still smaller lists, then use of difference lists can improve performance by effectively "flattening" the list building computations.
Inversions
Min: 0 Max: n(n-1)/2 Swapping removes 1 inversion
Heap Sort
Non-stable, in place sort which has an order of growth of NlogN. Requires only one spot of extra space. Works like an improved version of selection sort. It divides its input into a sorted and unsorted region, and iteratively shrinks the unsorted region by extracting the largest element and moving it into the sorted region. It makes use of a heap structure instead of a linear-time search to find the maximum.
Quick Sort
Non-stable, in place sort with an average order of growth of NlogN. Needs lgN of extra space for the recursion. It has a probabilistic guarantee. Works by making use of a divide and conquer method. The array is divided into two parts, and then the parts are sorted independently. An arbitrary value is chosen as the partition. Afterwards, all items which are larger than this value go to the right of it, and all items which are less than this value go to the left of it. We arbitrarily choose a[lo] as a partitioning item. Then we scan from the left end of the array until we find an entry that is greater than or equal to a[lo], and we scan from the right end of the array until we find an entry that is less than or equal to a[lo]. Once we find these two values, we swap them, and continue until the scans cross.
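A minimal Java sketch of the partitioning scheme described above (class name and layout are illustrative, not a definitive implementation):

```java
import java.util.Arrays;

public class QuickSortDemo {
    // Sort a[lo..hi] by partitioning around a[lo], then recursing on both halves.
    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1);
        sort(a, p + 1, hi);
    }

    // i scans right past entries < pivot, j scans left past entries > pivot;
    // out-of-place pairs are swapped until the scans cross.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < pivot) if (i == hi) break;
            while (a[--j] > pivot) if (j == lo) break;
            if (i >= j) break;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
        // Put the pivot into its final position j.
        int t = a[lo]; a[lo] = a[j]; a[j] = t;
        return j;
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 8, 1, 9, 2};
        sort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 5, 8, 9]
    }
}
```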
Selection Sort
Non-stable, in place sort. Has an N-squared order of growth, needs only one spot of extra space. Works by searching the entire array for the smallest item, then exchanging it with the first item in the array. Repeats this process down the entire array until it is sorted.
O(1)
O(1) = constant time; the cost of the operation does not grow with the input size.
Big O for Insertion sort?
O(n) for best (if everything started in order), O(n^2) otherwise.
Big O for Selection Sort?
O(n^2) - gotta go through all the elements for all the elements, approximately. Best and worst.
How many classes can one class inherit from?
ONE! In Java a class can inherit from only one class, though it may implement any number of interfaces.
LinkedList
Objects called nodes linked together. Worse for searching (no random access), but better for adding and removing elements.
Polymorphism
Objects determine appropriate operations at execution time
What is an object?
Objects represent an entity in the real world. They have state, represented by data fields, and behavior, defined by methods.
LSD Radix Sort
Stable sort which sorts fixed length strings. Uses an auxiliary array, and therefore is not in place. Goes through to the last character of a string (its least significant digit), and takes its value. All strings given are then organized based on the value of their least significant digit. Following this, the algorithm proceeds to the next least significant digit, repeating the process until it has gone through the length of the strings. Best used for sorting things with fixed string lengths, like Social Security numbers or License Plates. Has a time complexity of O(n*k) where n is the number of keys and k is the average length of those keys.
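The column-by-column process above can be sketched with key-indexed counting in Java (the alphabet size R = 256 and the names here are assumptions for illustration):

```java
import java.util.Arrays;

public class LsdRadixDemo {
    // Sort an array of equal-length (width w) strings, one character column
    // at a time, starting from the last (least significant) position.
    static void sort(String[] a, int w) {
        int n = a.length, R = 256;             // extended-ASCII alphabet
        String[] aux = new String[n];          // auxiliary array -> not in place
        for (int d = w - 1; d >= 0; d--) {
            int[] count = new int[R + 1];
            for (String s : a) count[s.charAt(d) + 1]++;          // frequencies
            for (int r = 0; r < R; r++) count[r + 1] += count[r]; // cumulates
            for (String s : a) aux[count[s.charAt(d)]++] = s;     // stable move
            System.arraycopy(aux, 0, a, 0, n);
        }
    }

    public static void main(String[] args) {
        String[] plates = {"4PGC938", "2IYE230", "3CIO720", "1ICK750"};
        sort(plates, 7);
        System.out.println(Arrays.toString(plates));
    }
}
```

The inner counting pass is stable, which is what lets each later (more significant) column preserve the order established by earlier ones.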
Tree Sort
Stable, O(n log n), Ω(n log n) : Put everything in the tree, traverse in-order.
Merge Sort
Stable, O(n log n), Ω(n log n): Use recursion to split arrays in half repeatedly. An array with size 1 is already sorted.
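A minimal Java merge sort along these lines (names are illustrative): split recursively until subarrays have size 1, then merge sorted halves back together.

```java
import java.util.Arrays;

public class MergeSortDemo {
    // Recursively split in half; an array of size 1 is already sorted.
    static void sort(int[] a, int lo, int hi) {
        if (hi - lo < 1) return;
        int mid = lo + (hi - lo) / 2;
        sort(a, lo, mid);
        sort(a, mid + 1, hi);
        merge(a, lo, mid, hi);
    }

    // Merge the sorted halves a[lo..mid] and a[mid+1..hi] using a copy.
    static void merge(int[] a, int lo, int mid, int hi) {
        int[] aux = Arrays.copyOfRange(a, lo, hi + 1);
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid)                        a[k] = aux[j++ - lo];
            else if (j > hi)                    a[k] = aux[i++ - lo];
            else if (aux[j - lo] < aux[i - lo]) a[k] = aux[j++ - lo];
            else                                a[k] = aux[i++ - lo];
        }
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 7, 1, 3};
        sort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a)); // [1, 3, 4, 7, 9]
    }
}
```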
Palindrome
String that reads the same forwards as backwards
What must needs be done when the array sizes get small for Quick Sort?
Switch out for another sorting algorithm. Insertion sort maybe.
What is indicated in a circular array when the frontIndex and backIndex + 1 are equal?
That either the circular array is empty or that it is full! To tell the two cases apart, implementations typically keep a separate count of elements or leave one array slot unused.
B-tree
A multiway search tree where a node can hold several keys (for example, 5 keys with 6 child links); all leaves are on the same level.
Table Size(TS)
The Array's Length
Post-Order Traversal
The process of systematically visiting every node in a tree once, where each node is accessed only after both of its subtrees have been fully visited: left subtree first, then right subtree, then the node itself.
Selection Sort
Unstable, O(n^2), Ω(n^2) : Repeatedly scans the unsorted portion for the smallest element and swaps it into the next position of the sorted portion.
MSD Radix Sort
Used to sort an array of strings based on their first character. Is done recursively and can sort strings which are of different lengths. This algorithm will be slower than its counterpart if used for sets of strings which all have the same length. Has a time complexity of 2W(N+R).
Open addressing
Uses probes to find an open location to store data.
In an ADT bag implemented using an array when is the addition of an element NOT fast?
When the array must be resized.
Cycles
When there are at least two unique paths which connect vertices A and B, forming a loop or loops
Red Black Tree
Worst case height of 2log(n+1). The nodes are either red or black. The root is black. If a node is red, its children MUST BE BLACK. Every path from a node to a leaf must contain the same number of black nodes. New insertions will always be red and, in a left-leaning red-black tree, always left leaning. Insertions must satisfy the conditions that red nodes have black children and that they have the same number of black nodes in all paths. Time complexity on its operations are O(logN).
built-in type
a data type for which the programming language provides built-in support.
complete graph
a graph in which every vertex is directly connected to every other vertex
Dynamic data structure
a method of storing data where the amount of data stored will vary as the program is being run
Abstract data type (ADT)
a model of data structure that specifies the type of data stored, the operations supported on them, and the type of parameters of the operations.
Leaf
a node that does not have any other nodes beneath it
Bit field
array data structure. -Can be distinguished from a bit array in that the latter is used to store a large set of bits indexed by integers and is often wider than any integral type supported by the language. Bit fields, on the other hand, typically fit within a machine word, and the denotation of bits is independent of their numerical index. -A bit field can be used to reduce memory consumption when it is known that only some bits would be used for a variable. Bit fields allow efficient packing of data in memory. For example, an int may occupy two bytes (16 bits) on some systems, while a variable temp may only need to store the value 0 or 1. Storing it in a full int wastes memory; with a bit field, only one bit is used rather than 16, which can save a lot of memory.
Abstract base class
class that has no objects and serves as a basis for a derived class. a class becomes this when one or more of its member functions are pure virtual functions.
priority queue
collection of elements, called values, each having an associated key that is provided at the time the element is inserted
A graph is undirected if
every edge in it is undirected.
d-Heap
exactly like a binary heap except that all nodes have d children
adjacency matrix
good for dense graphs; connectivity between two vertices can be tested quickly.
Adjacency list
good for sparse graphs; vertices adjacent to another vertex can be found quickly;
Algorithm
has input, produces output, definite, finite, operates on the data it is given
Ternary heap
heap data structure; a 3-heap, in which each node has 3 children.
Using a circular array to represent a queue what is the formula to determine the backIndex (or frontIndex) when you add or remove an item?
index = (index + 1) % queue.length;
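For example, with a backing array of length 5 the modulo in that formula wraps the index from 4 back to 0 (names here are illustrative):

```java
public class CircularQueueDemo {
    // Advance an index one step through a circular array of the given length.
    static int advance(int index, int length) {
        return (index + 1) % length;
    }

    public static void main(String[] args) {
        int length = 5;   // capacity of the backing array
        int index = 3;
        for (int step = 0; step < 4; step++) {
            index = advance(index, length);
            System.out.println(index);  // prints 4, then wraps: 0, 1, 2
        }
    }
}
```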
Tree T
is a finite set of one or more nodes such that there is one designated node R, called the root of T.
ADT heap
isEmpty; getNumberOfNodes; getHeight; peekTop; add; remove; clear;
HashTable<K,V> & HashMap<K,V> class
java.util; implements the Map<K,V> interface. K -- type parameter for the key, V -- type parameter for the associated value. Operations: lookup, insert, delete. Constructor lets you set initial capacity and load factor. Handles collisions with chained buckets. HashMap allows null keys and values; Hashtable does not.
rep of complete binary tree
left child: if exists [2*i+1]; right child: if exists [2*i+2]; parent: if exists [(i-1)/2]
Arrays data structure
linear data structure. a data structure consisting of a collection of elements (values or variables), each identified by at least one array index or key. An array is stored so that the position of each element can be computed from its index tuple by a mathematical formula.[1][2][3] The simplest type of data structure is a linear array, also called one-dimensional array.
linear recursion
makes only one recursive call. This step may have a test that decides which of several possible recursive steps to make, but should ultimately make only one
hash function
maps a search key into a location of the hash table (ideally a unique one)
Dynamic Programming
method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions
Dot product
multiplying two vectors together to produce a number
Spaghetti stack (cactus stack or saguaro stack)
multiway tree data structure. an in-tree or parent pointer tree is an N-ary tree data structure in which each node has a pointer to its parent node, but no pointers to child nodes. When used to implement a set of stacks, the structure is called a spaghetti stack, cactus stack or saguaro stack (after the saguaro, a kind of cactus).[1] Parent pointer trees are also used as disjoint-set data structures. -The structure can be regarded as a set of singly linked lists that share part of their structure, in particular, their tails. From any node, one can traverse to ancestors of the node, but not to any other node.
Quicksort
partitioning Best: O(n log n) (or O(n) three-way) Avg: O(n log n) Worst: O(n^2)
this pointer
predefined pointer available to a class's member functions; always points to the instance of the class whose function is being called.
Tree Topology
A tree is a widely used abstract data type (ADT), or a data structure implementing this ADT, that simulates a hierarchical tree structure, with a root value and subtrees of children with a parent node, represented as a set of linked nodes.
Aggregate Data Types
Any type of data that can be referenced as a single entity, and yet consists of more than one piece of data, like strings, arrays, classes, and other complex structures.
Binary Tree Traversal
The process of systematically visiting every node in a tree once. The three most common traversals are: pre-order, in-order, and post-order.
Circular queue
A queue implemented so that the front and rear indices can keep moving without hitting a boundary, wrapping around to the start of the array when they reach the end.
Binary Search Tree
Binary tree where each node has a comparable key and satisfies the restriction that the key in any node is larger than the keys in all nodes in that node's left subtree and smaller than the keys in all nodes in that node's right subtree
Important Sorting Assumptions
1. Sorting array of integers 2. Length of array is n 3. Sorting least to greatest 4. Can access array element in constant time 5. Compare ints in array only with '<' 6. Focus on # of comparisons
Tries
A collection of nodes, each of which can hold a key and a value- often the values will be null. The nodes will have a value attached to the last character of the string upon insertion, which makes key lookup straightforward. Very useful for searching keys.
Forest
A collection of one or more trees
Recursion
A common method of simplification is to divide a problem into subproblems of the same type. As a computer programming technique, this is called divide and conquer and is key to the design of many important algorithms. Divide and conquer serves as a top-down approach to problem solving, where problems are solved by solving smaller and smaller instances. A contrary approach is dynamic programming. This approach serves as a bottom-up approach, where problems are solved by solving larger and larger instances, until the desired size is reached. -Recursion in computer programming is exemplified when a function is defined in terms of simpler, often smaller versions of itself. The solution to the problem is then devised by combining the solutions obtained from the simpler versions of the problem.
Cycle
A cycle is a path (with at least one edge) whose first and last vertices are the same. A simple cycle is a cycle with no repeated edges or vertices (except the requisite repetition of the first and last vertices).
simple path
A simple path has the additional criteria that every vertex be unique, except possibly the first/last
Maximum Spanning Tree
A spanning tree of a weighted graph having maximum weight. It can be computed by negating the edge weights and running either Prim's or Kruskal's algorithm.
Describe a perfect hash function.
A perfect hash function maps each search key into a different integer that is suitable as an index to the hash table. A perfect hash function would require that unequal objects have distinct hash codes.
Permutations
A permutation is an ordered combination. Repetition allowed: e.g., a lock combination could be "333". No repetition: e.g., the first three people in a running race; you can't be first and second. https://www.mathsisfun.com/combinatorics/combinations-permutations.html
Merge Sort
Divide and conquer T(n) = 2T(n/2) + O(n) O(n*log(n))
Recursion
Powerful tool that breaks a problem into smaller problems that are, in some sense, identical to the original one. The smaller problems provide solution to the larger one.
Mergesort
Stable sort which is not in place. It has an order of growth of NlogN and requires N amount of extra space. Works by dividing an array in half continuously into smaller and smaller arrays. At the lowest level, these single-element arrays are already sorted, and they are merged back together in the reverse order in which they were divided.
Insertion Sort
Stable, in place sort with an order of growth which is between N and N-squared, needs only one spot of extra space and is dependent on the order of the items. Works by scanning over the list and inserting the current item into the sorted portion at its correct position. All the items to the left of the scan are sorted, but may not be in their final places, since larger items are shifted right to make room for smaller items when necessary.
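The shift-and-insert behavior can be sketched in Java (names are illustrative):

```java
import java.util.Arrays;

public class InsertionSortDemo {
    // Shift larger items one slot right until the current item fits.
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];   // larger item shifted right
                j--;
            }
            a[j + 1] = key;        // current item placed in sorted prefix
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 4, 6, 1};
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 4, 5, 6]
    }
}
```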
Breadth-first search
Visits the neighbor vertices before visiting the child vertices. Often used to find the shortest path from one vertex to another. A queue is usually used in the implementation.
recursion
a method that calls itself
Graphs
- Collection of points, aka vertices and edges which connect a pair of vertices
O-notation
asymptotic upper bound
L N R
in-order traversal
1 + 2 + ... + (n-1) = ?
n(n-1)/2
Trie (digital tree and sometimes radix tree or prefix tree)
tree data structure where each tree node compares a bit slice of key values. a kind of search tree -- an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are not necessarily associated with every node. Rather, values tend only to be associated with leaves, and with some inner nodes that correspond to keys of interest. For the space-optimized presentation of prefix tree, see compact prefix tree. -A trie can be seen as a tree-shaped deterministic finite automaton. Each finite language is generated by a trie automaton, and each trie can be compressed into a deterministic acyclic finite state automaton. -Though tries are usually keyed by character strings, they need not be. The same algorithms can be adapted to serve similar functions of ordered lists of any construct, e.g. permutations on a list of digits or shapes. In particular, a bitwise trie is keyed on the individual bits making up any fixed-length binary datum, such as an integer or memory address. -A trie has a number of advantages over binary search trees.[6] A trie can also be used to replace a hash table, over which it has the following advantages: 1. Looking up data in a trie is faster in the worst case, O(m) time (where m is the length of a search string), compared to an imperfect hash table. An imperfect hash table can have key collisions. A key collision is the hash function mapping of different keys to the same position in a hash table. 
The worst-case lookup speed in an imperfect hash table is O(N) time, but far more typically is O(1), with O(m) time spent evaluating the hash. 2. There are no collisions of different keys in a trie. 3. Buckets in a trie, which are analogous to hash table buckets that store key collisions, are necessary only if a single key is associated with more than one value. 4. There is no need to provide a hash function or to change hash functions as more keys are added to a trie. 5. A trie can provide an alphabetical ordering of the entries by key. -Tries do have some drawbacks as well: 1. Tries can be slower in some cases than hash tables for looking up data, especially if the data is directly accessed on a hard disk drive or some other secondary storage device where the random-access time is high compared to main memory.[7] 2. Some keys, such as floating point numbers, can lead to long chains and prefixes that are not particularly meaningful. Nevertheless, a bitwise trie can handle standard IEEE single and double format floating point numbers. 3. Some tries can require more space than a hash table, as memory may be allocated for each character in the search string, rather than a single chunk of memory for the whole entry, as in most hash tables. Dictionary Representation -A common application of a trie is storing a predictive text or autocomplete dictionary, such as found on a mobile telephone. Such applications take advantage of a trie's ability to quickly search for, insert, and delete entries; however, if storing dictionary words is all that is required (i.e., storage of information auxiliary to each word is not required), a minimal deterministic acyclic finite state automaton (DAFSA) would use less space than a trie. This is because a DAFSA can compress identical branches from the trie which correspond to the same suffixes (or parts) of different words being stored. 
-Tries are also well suited for implementing approximate matching algorithms,[8] including those used in spell checking and hyphenation[4] software. Term Indexing -A discrimination tree term index stores its information in a trie data structure.
Binary tree
tree data structure.
Collision
when two keys hash to the same value
Backtracking
- going back to a previous decision point and making a different choice - e.g. N-Queens - a non-polynomial time problem
Bipartite
- graph that has two sets of vertices A and B, such that there are only edges between vertices in opposite sets
If we have UW-Madison student ID's, and we wanted the ideal hash functions, how would we do it, and why would there be a problem
-> We'd simply count each one as an index -> Hash table would be huge.
Private Access
-Accessible by the class
Protected Access
-Accessible by the class, package, and subclass
Public access
-Accessible by the class, package, subclass, and everyone else
List ADT
-Add(E) -Add(int, E) -Remove(int) -contains(E)
Collision Hashing using Buckets
-Each bucket can store more than one item. -Throw collisions into a bucket. -Buckets aren't sorted.
Binary Search Tree
-Each element to the left is less than its parent, each to the right is greater -Operations: -E add(E elem) -E remove(E elem) -E getEntry(E elem) - search method
Bubble Sort
-Iterate through array, comparing adjacent elements -Swap pairs based on relative order -Worst Case: O(n^2) -Best Case: O(n)
Static definition
-Only one of these variables for the class, NOT each instance of it
Merge Sort
-Splits array into two halves, locally sort them, then recombines in a copy array. Copies copy array into original array -O(nlogn)
Tail recursion
-The very last step in the recursive case is the recursive call -Can be trivially transformed into an iterative algorithm
What are the 2 fundamental ways you can resolve hash collisions?
-Use another location in the hash table (open addressing) -Change the structure of the hash table so that each array location can represent more than one value (bucket hashing)
binary search algorithm
-binary search, also known as half-interval search[1] or logarithmic search,[2] is a search algorithm that finds the position of a target value within a sorted array. It compares the target value to the middle element of the array; if they are unequal, the half in which the target cannot lie is eliminated and the search continues on the remaining half until it is successful. -Binary search runs in at worst logarithmic time, making O(log n) comparisons, where n is the number of elements in the array and log is the binary logarithm; and using only constant (O(1)) space. Although specialized data structures designed for fast searching—such as hash tables—can be searched more efficiently, binary search applies to a wider range of search problems. -Although the idea is simple, implementing binary search correctly requires attention to some subtleties about its exit conditions and midpoint calculation. -There exist numerous variations of binary search. One variation in particular (fractional cascading) speeds up binary searches for the same value in multiple arrays.
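A minimal Java sketch of the half-interval idea described above, with the usual care around exit conditions and midpoint calculation (names are illustrative):

```java
public class BinarySearchDemo {
    // Returns the index of target in the sorted array a, or -1 if absent.
    static int search(int[] a, int target) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;      // avoids overflow of (lo + hi) / 2
            if (a[mid] == target) return mid;
            else if (a[mid] < target) lo = mid + 1;  // target can't be in left half
            else hi = mid - 1;                       // target can't be in right half
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7, 9, 11};
        System.out.println(search(a, 7));  // 3
        System.out.println(search(a, 4));  // -1
    }
}
```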
Heap Parent/Child algebra relations
-leftChild(i) = 2i+1 -rightChild(i)=2i+2 -parent(i)=((i-1)/2)
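These index relations translate directly to Java (integer division gives the floor needed for parent):

```java
public class HeapIndexDemo {
    static int leftChild(int i)  { return 2 * i + 1; }
    static int rightChild(int i) { return 2 * i + 2; }
    static int parent(int i)     { return (i - 1) / 2; }   // integer division

    public static void main(String[] args) {
        // For the node stored at array index 4:
        System.out.println(leftChild(4));   // 9
        System.out.println(rightChild(4));  // 10
        System.out.println(parent(9));      // 4
    }
}
```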
HashCode Method:
-method of OBJECT class -Returns an int -default hash code is BAD-- computed from Object's memory address. --> must override
Pre Order Traversal
-outputs data -recurses on left child -recurses on right child
In order traversal
-recurses on left child -outputs data -recurses on right child
Post Order traversal
-recurses on left child -recurses on right child -outputs data
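The three traversals above differ only in where the "output data" step sits relative to the recursive calls; here is the in-order version as a Java sketch (the Node class is illustrative):

```java
public class TraversalDemo {
    static class Node {
        int val; Node left, right;
        Node(int v, Node l, Node r) { val = v; left = l; right = r; }
    }

    // In-order: recurse on left child, output data, recurse on right child.
    static void inOrder(Node n, StringBuilder out) {
        if (n == null) return;
        inOrder(n.left, out);
        out.append(n.val).append(' ');
        inOrder(n.right, out);
    }

    public static void main(String[] args) {
        //      2
        //     / \
        //    1   3
        Node root = new Node(2, new Node(1, null, null), new Node(3, null, null));
        StringBuilder out = new StringBuilder();
        inOrder(root, out);
        System.out.println(out.toString().trim()); // 1 2 3
    }
}
```

Moving the append before both recursive calls gives pre-order; moving it after both gives post-order.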
Good Hash Function qualities:
1. Must be deterministic: -> Key must ALWAYS generate the same Hash Index (excluding rehashing). 2. Must achieve uniformity -> Keys should be distributed evenly across hash table. 3. FAST/EASY to compute -> only use parts of the key that DISTINGUISH THE ITEMS FROM EACH OTHER 4. Minimize collisions:
Constructing a heap in linear time
1. Place the data into the heap's data set blindly. It will have the correct shape, but the dominance order will be incorrect. 2. Starting from the last (nth) position, walk backwards through the array until we encounter an internal node with children. 3. Perform bubble down on each such node. Explanation: heapify() takes time proportional to the height of the heaps it is merging. Most of these heaps are extremely small. In a full binary tree on n nodes, there are n/2 nodes that are leaves, n/4 nodes that are height 1, n/8 nodes that are height 2, and so on. In general, there are at most n/2^(h+1) nodes of height h, so the cost of building the heap is <= 2n.
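The linear-time construction above can be sketched in Java as a max-heap build (names are illustrative):

```java
public class BuildHeapDemo {
    // Sift a[i] down until max-heap order is restored within a[0..n-1].
    static void bubbleDown(int[] a, int i, int n) {
        while (2 * i + 1 < n) {
            int child = 2 * i + 1;
            if (child + 1 < n && a[child + 1] > a[child]) child++; // bigger child
            if (a[i] >= a[child]) break;                           // order holds
            int t = a[i]; a[i] = a[child]; a[child] = t;
            i = child;
        }
    }

    // Walk backwards from the last internal node, bubbling each one down.
    static void buildHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            bubbleDown(a, i, a.length);
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7, 9, 2};
        buildHeap(a);
        System.out.println(a[0]); // 9 -- the maximum is now at the root
    }
}
```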
Combinations of binary tree traversal sequences that do not uniquely identify a tree
1. Postorder and preorder. 2. Preorder and level-order. 3. Postorder and level-order.
Post-Order Traversal
1. Process left child. 2. Process right child. 3. Process self.
In-Order Traversal
1. Process left child. 2. Process self. 3. Process right child.
Pre-Order Traversal
1. Process self. 2. Process left child. 3. Process right child.
Solving Divide-and-Conquer Recurrences
Case 1: Too many leaves. Case 2: Equal work per level. Case 3: Too expensive a root
Base Case
Case for which the solution can be stated non-recursively
Recursive case
Case for which the solution is expressed in terms of a smaller version of itself
Binomial Tree
Each of the heap-ordered trees that make up a binomial heap; the binomial tree of order k has 2^k nodes.
Define pre-order traversal of a binary tree?
In a preorder traversal, we visit the root before we visit the root's subtrees. We then visit all the nodes in the root's left subtree before we visit the nodes in the right subtree.
Trie
In computer science, a trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are not necessarily associated with every node. Rather, values tend only to be associated with leaves, and with some inner nodes that correspond to keys of interest. For the space-optimized presentation of prefix tree, see compact prefix tree.
Trees
In these data structures each tree node compares a bit slice of key values: Trie, Radix tree, Suffix tree, Suffix array, Compressed suffix array, FM-index, Generalised suffix tree, B-trie, Judy array, X-fast trie, Y-fast trie, Merkle tree, Ctrie.
Queue
a data structure where the first item added is the first item removed
Breadth-first search
Use a queue to search tree
Red-black Tree
Worst height: 2 log n
O(n)
happens for each element
27. Given the following array: [20, 3, 4, 5, 6, 1, 10, 15, 16, 2, 8] What does the array look like after one iteration of heapsort in ascending order? Assume the following: 1. One iteration of heapsort is after the removal of the first element in the heap 2. Assume that the removed element from the heap is inserted on a new array Long Explanation in handout
→ [16, 15, 10, 8, 6, 1, 4, 3, 5, 2]
Operation on queue
Empty() - true if the queue is empty, else false. Enqueue(x) - add an element x at the rear end of the queue. Dequeue() - remove an element from the front of the queue.
Each edge of a graph
Has either one or two vertices associated with it (one in the case of a self-loop)
Acquaintanceship graph of all people in the world
Has more than six billion vertices and more than a trillion edges
Spatial locality
Close by in memory
Rabin-Karp
Compute hash codes of each substring whose length is the length of s, using a hash function with the property that the hash code of a string is an additive function of each individual character. Take the hash code of a sliding window of characters and check whether the hash matches.
Depth First Search
A method which is used to traverse through a graph. Works by creating a stack of nodes to visit, which consist of all the nodes around your current position. You move to the next location and add the nodes surrounding it to the stack, making sure not to add any nodes you may have already visited. You repeat this pattern until you either reach the destination, or a dead end. At a dead end, you would backtrack to the last node which still has unvisited neighbors. Time complexity of O(|V|+|E|).
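The stack-based process described above can be sketched in Java over an adjacency-list graph (the graph and names are illustrative):

```java
import java.util.*;

public class DfsDemo {
    // Iterative DFS; returns the order in which vertices are visited.
    static List<Integer> dfs(Map<Integer, List<Integer>> adj, int start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            int v = stack.pop();
            if (!visited.add(v)) continue;     // skip already-visited nodes
            order.add(v);
            for (int w : adj.getOrDefault(v, List.of())) {
                if (!visited.contains(w)) stack.push(w);  // neighbors to explore
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> adj = Map.of(
            0, List.of(1, 2),
            1, List.of(3),
            2, List.of(3),
            3, List.of());
        System.out.println(dfs(adj, 0)); // [0, 2, 3, 1]
    }
}
```

Backtracking happens implicitly: popping the stack returns to the most recent vertex that still has unexplored neighbors.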
simple path
A path in which all vertices are distinct
TreeMap complexity for iterating over associated values:
O(N)
AVL Tree Search (Average)
O(log(n))
B-Tree Access (Average)
O(log(n))
B-Tree Deletion (Average)
O(log(n))
Binary Search Tree Deletion (Average)
O(log(n))
Binary Search Tree Insertion (Average)
O(log(n))
Cartesian Tree Deletion (Average)
O(log(n))
Cartesian Tree Search (Average)
O(log(n))
Red-Black Tree Access (Average)
O(log(n))
Splay Tree Deletion (Average)
O(log(n))
Splay Tree Insertion (Average)
O(log(n))
Splay Tree Search (Average)
O(log(n))
Binary Search Tree Deletion (Worst)
O(n)
Cartesian Tree Deletion (Worst)
O(n)
Cartesian Tree Insertion (Worst)
O(n)
Cartesian Tree Space Complexity
O(n)
Red-Black Tree Space Complexity
O(n)
What are the goals of Quality Software?
1. It works 2. It can be modified 3. It is reusable 4. Completed on time
Prim's Minimum Spanning Tree Algorithm
1) choose a vertex, add that to the MST 2) look at adjacent vertices of that vertex and track their distances to the tree 3) choose the closest vertex not in the MST and add it to the tree and repeat from 1 with that vertex
Linked List summary
1. Constant time to insert at or remove from the front. 2. With tail and doubly-linked, constant time to insert at or remove from the back. 3. O(n) time to find arbitrary element. 4. With doubly-linked list, constant time to insert between nodes or remove a node.
Red Black Tree: properties
1. Every node is either red or black 2. The root is black 3. Every leaf (NIL) is black 4. If a node is red, then both its children are black 5. All simple paths from node to child leaves contain the same # of black nodes
What are the Phases of Software Development
1. Specification of the task 2. Design of a solution 3. Implementation of the solution 4. Testing and debugging 5. Analysis of the solution 6. Maintenance and evolution of the system
Adjacency list
a data structure that stores a list of nodes with their adjacent nodes
Hash table
a data structure that stores key/value pairs based on an index calculated by a hash function
basic type
a data type provided by a programming language as a basic building block. Most languages allow more complicated composite types to be recursively constructed starting from basic types.
Multiple inheritance
a derived class has more than one base class
Path
a sequence of connected vertices
Java stream: definition
a sequence of data
path
a sequence of vertices that connect two nodes of a graph
Radix Sort
a series of consecutive bucket sorts, sorting one digit at a time
Non-deterministic Polynomial-time (NP)
the class of decision problems whose solutions can be verified in polynomial time; the NP-complete problems among them appear similar enough that they can be converted back and forth between each other in polynomial time, and solved in the same way
convex hull
the smallest convex shape that contains all the points of a given set
Edges/Arcs
pair of vertices that are connected
if (u,v) is the last edge of the simple path from the root to vertex v, u is the _ of v
parent
Base class
the parent class that is inherited from
Rehashing
the process of growing the hash table size, and re-determine which index every item should have
Heap-order Property
the property that each node's key is less than or equal to (min-heap) or greater than or equal to (max-heap) its children's keys; it keeps the extreme element at the root and allows operations to be performed quickly
pivot
the element picked in Quick-sort (at random or by median-of-three) around which the array is partitioned
Depth-first search
Visits the child vertices before visiting the sibling vertices. Usually implemented with a stack (or recursion).
Union-Find
(Disjoint-set data structure) keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. It supports two useful operations: find and union. Find: Determine which subset a particular element is in. Find typically returns an item from this set that serves as its "representative"; by comparing the result of two Find operations, one can determine whether two elements are in the same subset. Union: Join two subsets into a single subset. In order to define these operations more precisely, some way of representing the sets is needed. One common approach is to select a fixed element of each set, called its representative, to represent the set as a whole. Then, Find(x) returns the representative of the set that x belongs to, and Union takes two set representatives as its arguments.
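The find and union operations described above can be sketched compactly; the path-compression and union-by-size optimizations are standard additions, assumed here rather than stated in the notes.

```java
// Compact union-find with path compression and union by size.
public class UnionFind {
    private final int[] parent, size;

    public UnionFind(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
    }

    public int find(int x) {                 // representative of x's set
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];   // path compression (halving)
            x = parent[x];
        }
        return x;
    }

    public void union(int a, int b) {        // merge the two sets
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (size[ra] < size[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;                     // smaller tree under larger
        size[ra] += size[rb];
    }

    public static void main(String[] args) {
        UnionFind uf = new UnionFind(5);
        uf.union(0, 1);
        uf.union(3, 4);
        System.out.println(uf.find(0) == uf.find(1)); // true
        System.out.println(uf.find(1) == uf.find(3)); // false
    }
}
```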
Prim's Algorithm
(Minimum Spanning Trees, O(m + nlogn), where m is number of edges and n is the number of vertices) Starting from a vertex, grow the rest of the tree one edge at a time until all vertices are included. Greedily select the best local option from all available choices without regard to the global structure.
Kruskal's Algorithm
(Minimum Spanning Trees, O(mlogm) with a union find, which is fast for sparse graphs) Builds up connected components of vertices, repeatedly considering the lightest remaining edge and tests whether its two endpoints lie within the same connected component. If not, insert the edge and merge the two components into one.
20. Mark all properties that are TRUE for a hashtable with n elements? (a) an ideal hash table using array doubling has worst-case time complexity of O(1) for every insert operation (b) an ideal hash table using array doubling has average-case time complexity of O(1) for lookups (c) can be used to sort an array of n real numbers with average-case time complexity O(n) (d) it is possible to have different keys being hashed to the same position in the array
(b) an ideal hash table using array doubling has average-case time complexity of O(1) for lookups (d) it is possible to have different keys being hashed to the same position in the array
Ctrie (concurrent hash-trie)
*tree data structure where each tree node compares a bit slice of key values. *hash data structure. a concurrent thread-safe lock-free implementation of a hash array mapped trie. It is used to implement the concurrent map abstraction. It has particularly scalable concurrent insert and remove operations and is memory-efficient.[3] It is the first known concurrent data-structure that supports O(1), atomic, lock-free snapshots. |Advantages| Ctries have been shown to be comparable in performance with concurrent skip lists,[2][4] concurrent hash tables and similar data structures in terms of the lookup operation, being slightly slower than hash tables and faster than skip lists due to the lower level of indirections. However, they are far more scalable than most concurrent hash tables where the insertions are concerned.[1] Most concurrent hash tables are bad at conserving memory - when the keys are removed from the hash table, the underlying array is not shrunk. Ctries have the property that the allocated memory is always a function of only the current number of keys in the data-structure.[1] -Ctries have logarithmic complexity bounds of the basic operations, albeit with a low constant factor due to the high branching level (usually 32). -Ctries support a lock-free, linearizable, constant-time snapshot operation,[2] based on the insight obtained from persistent data structures. This is a breakthrough in concurrent data-structure design, since existing concurrent data-structures do not support snapshots. The snapshot operation allows implementing lock-free, linearizable iterator, size and clear operations - existing concurrent data-structures have implementations which either use global locks or are correct only given that there are no concurrent modifications to the data-structure. In particular, Ctries have an O(1) iterator creation operation, O(1) clear operation, O(1) duplicate operation and an amortized O(logn) size retrieval operation. 
|Problems| Most concurrent data structures require dynamic memory allocation, and lock-free concurrent data structures rely on garbage collection on most platforms. The current implementation[4] of the Ctrie is written for the JVM, where garbage collection is provided by the platform itself. While it's possible to keep a concurrent memory pool for the nodes shared by all instances of Ctries in an application or use reference counting to properly deallocate nodes, the only implementation so far to deal with manual memory management of nodes used in Ctries is the common-lisp implementation cl-ctrie, which implements several stop-and-copy and mark-and-sweep garbage collection techniques for persistent, memory-mapped storage. Hazard pointers are another possible solution for a correct manual management of removed nodes. Such a technique may be viable for managed environments as well, since it could lower the pressure on the GC. A Ctrie implementation in Rust makes use of hazard pointers for this purpose
Generics
- Use when we want to write a method that can handle any type of object -Can be written in an interface, method, or class -Generics are invariant (no covariance): a container of type T is not also a container of T's superclasses.
Breadth First Search
- at starting vertex V(0), tries to find V(x) by examining all vertices that are closest to V(0) and then proceeding further away from the start - use a queue to track unprocessed vertices that we still need to search ---starting with V(0), enqueue all unknown adjacent vertices of that vertex ---dequeue and repeat
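The queue-driven procedure above, sketched on an adjacency-list graph; the vertex numbering and the returned visit order are illustrative assumptions.

```java
// Breadth-first search: enqueue unknown neighbors, dequeue, repeat.
import java.util.*;

public class BfsDemo {
    public static List<Integer> bfs(List<List<Integer>> adj, int start) {
        boolean[] known = new boolean[adj.size()];
        List<Integer> order = new ArrayList<>();
        Queue<Integer> queue = new ArrayDeque<>();
        known[start] = true;
        queue.add(start);                       // start with V(0)
        while (!queue.isEmpty()) {
            int u = queue.remove();             // dequeue and repeat
            order.add(u);
            for (int v : adj.get(u))            // enqueue unknown neighbors
                if (!known[v]) { known[v] = true; queue.add(v); }
        }
        return order;
    }

    public static void main(String[] args) {
        // Edges: 0-1, 0-2, 1-3
        List<List<Integer>> adj = List.of(
            List.of(1, 2), List.of(0, 3), List.of(0), List.of(1));
        System.out.println(bfs(adj, 0)); // [0, 1, 2, 3]
    }
}
```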
Adjacency Matrix
- square matrix of vertices where the elements of the matrix describe the edges - vertices will be listed in sorted order - e.g. Binary Adjacency Matrix
Floyd's All Pairs Shortest Path
- uses each vertex in turn as a reference "jumping point" to discover shorter paths - identifies the costs but not the actual paths
Recursion
-Solve problem P by breaking it down into structurally identical sub problems. -Requirements: -Recursive Case -Base Case -Termination
19. We have a hash table of size 7 to store integer keys, with linear probing and a hash function h(x) = x mod 7 (x mod 7 return the remainder of the integer division with 7). Show the content of the hashtable after inserting the keys 0,11,3,7,1,9 in the given order.
0: 0
1: 7 (b/c index 0 was taken)
2: 1 (b/c index 1 was taken)
3: 3
4: 11
5: 9 (b/c indexes 2, 3, and 4 were taken)
6: (empty)
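The probing in the worked example can be sketched as code; the -1 empty-slot marker is an illustrative assumption.

```java
// Linear probing with h(x) = x mod tableSize; -1 marks an empty slot.
import java.util.Arrays;

public class LinearProbingDemo {
    public static int[] insertAll(int tableSize, int[] keys) {
        int[] table = new int[tableSize];
        Arrays.fill(table, -1);
        for (int key : keys) {
            int i = key % tableSize;                        // home index
            while (table[i] != -1) i = (i + 1) % tableSize; // probe next slot
            table[i] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] table = insertAll(7, new int[] { 0, 11, 3, 7, 1, 9 });
        System.out.println(Arrays.toString(table)); // [0, 7, 1, 3, 11, 9, -1]
    }
}
```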
AVL Tree
A BST in which, for every node, the heights of the left and right subtrees differ by at most 1
Full Binary Tree
A full binary tree (sometimes referred to as a proper or plane binary tree) is a tree in which every node in the tree has either 0 or 2 children.
binary search tree
A binary tree with the property that for all parent nodes, the left subtree contains only values less than the parent, and the right subtree contains only values greater than the parent.
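A minimal sketch of the ordering property above: search and insert each follow one root-to-leaf path, which is where the O(h) costs quoted elsewhere in these notes come from. The `Bst`/`Node` names are placeholders.

```java
// Minimal binary search tree: left < parent < right.
public class Bst {
    static class Node {
        int val; Node left, right;
        Node(int v) { val = v; }
    }

    static Node insert(Node root, int v) {
        if (root == null) return new Node(v);
        if (v < root.val) root.left = insert(root.left, v);
        else if (v > root.val) root.right = insert(root.right, v);
        return root;                          // duplicates ignored
    }

    static boolean search(Node root, int v) {
        if (root == null) return false;
        if (v == root.val) return true;
        return v < root.val ? search(root.left, v) : search(root.right, v);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int v : new int[] { 5, 2, 8, 1 }) root = insert(root, v);
        System.out.println(search(root, 8)); // true
        System.out.println(search(root, 7)); // false
    }
}
```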
bipartite graph
A bipartite graph is a graph whose vertices we can divide into two sets such that all edges connect a vertex in one set with a vertex in the other set.
Bit Array
A bit array is a mapping from some domain (almost always a range of integers) to values in the set {0, 1}. The values can be interpreted as dark/light, absent/present, locked/unlocked, valid/invalid, et cetera. The point is that there are only two possible values, so they can be stored in one bit. As with other arrays, the access to a single bit can be managed by applying an index to the array. Assuming its size (or length) to be n bits, the array can be used to specify a subset of the domain (e.g. {0, 1, 2, ..., n−1}), where a 1-bit indicates the presence and a 0-bit the absence of a number in the set. This set data structure uses about n/w words of space, where w is the number of bits in each machine word. Whether the least significant bit (of the word) or the most significant bit indicates the smallest-index number is largely irrelevant, but the former tends to be preferred (on little-endian machines).
Fibonacci Heap
A data structure that is a collection of trees satisfying the minimum-heap property, that is, the key of a child is always greater than or equal to the key of the parent. This implies that the minimum key is always at the root of one of the trees. The trees do not have a prescribed shape and in the extreme case the heap can have every element in a separate tree. This flexibility allows some operations to be executed in a "lazy" manner, postponing the work for later operations. For example, merging heaps is done simply by concatenating the two lists of trees, and operation decrease key sometimes cuts a node from its parent and forms a new tree. For the Fibonacci heap, the find-minimum operation takes constant (O(1)) amortized time. The insert and decrease key operations also work in constant amortized time. Deleting an element (most often used in the special case of deleting the minimum element) works in O(log n) amortized time, where n is the size of the heap. This means that starting from an empty data structure, any sequence of a insert and decrease key operations and b delete operations would take O(a + b log n) worst case time, where n is the maximum heap size. In a binary or binomial heap such a sequence of operations would take O((a + b) log n) time. A Fibonacci heap is thus better than a binary or binomial heap when b is smaller than a by a non-constant factor. It is also possible to merge two Fibonacci heaps in constant amortized time, improving on the logarithmic merge time of a binomial heap, and improving on binary heaps which cannot handle merges efficiently. Using Fibonacci heaps for priority queues improves the asymptotic running time of important algorithms, such as Dijkstra's algorithm for computing the shortest path between two nodes in a graph, compared to the same algorithm using other slower priority queue data structures.
connected graph
A graph is connected if there is a path from every vertex to every other vertex.
connected graph
A graph such that for all vertices u and v, there exists a path from u to v.
forest
A graph that has no cycles but is not necessarily connected
DAG (Directed Acyclic Graph)
A graph which is directed and contains no cycles
Weighted Graph
A graph which places "costs" on the edges for traveling their path
Directed Graph/Digraph
A graph with edges directed from one vertex to another
acyclic graph
A graph with no cycles.
Topological Sort
A linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering.
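One standard way to produce such an ordering is Kahn's algorithm, sketched below: repeatedly output a vertex with in-degree zero and remove its outgoing edges. A DAG input is assumed.

```java
// Kahn's algorithm for topological sort on an adjacency-list DAG.
import java.util.*;

public class TopoSortDemo {
    public static List<Integer> topoSort(List<List<Integer>> adj) {
        int n = adj.size();
        int[] indeg = new int[n];
        for (List<Integer> edges : adj)
            for (int v : edges) indeg[v]++;
        Queue<Integer> ready = new ArrayDeque<>();
        for (int v = 0; v < n; v++) if (indeg[v] == 0) ready.add(v);
        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int u = ready.remove();
            order.add(u);
            for (int v : adj.get(u))           // "remove" u's outgoing edges
                if (--indeg[v] == 0) ready.add(v);
        }
        return order;                          // size < n means a cycle existed
    }

    public static void main(String[] args) {
        // Edges: 0 -> 1, 0 -> 2, 1 -> 2
        List<List<Integer>> adj = List.of(List.of(1, 2), List.of(2), List.of());
        System.out.println(topoSort(adj)); // [0, 1, 2]
    }
}
```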
directed path
A sequence of vertices (v1,v2,v3,...) such that v1->v2, v2->v3,v3->... for a directed graph.
14. A binary tree is constructed of nodes that are instances of the following class: public class Node { public int val; public Node left; public Node right; } Consider the following method: public static Node mystery(Node root) { if (root.right == null) return root; else return mystery(root.right); } You consult three supposedly tech-savvy consultants, and you get the following three opinions about what the method does when passed a reference to the root node of a binary tree: I. It returns the last node visited by an inorder traversal II. It returns the last node visited by a postorder traversal III. It returns the last node visited by a level-order traversal Which of these opinions is correct regardless of the contents of the tree? A. I only B. II only C. III only D. I and III E. II and III
A. I only
Temporal locality
Accessing the same data repeatedly over a short period of time
Adjacency matrix: add vertex/edge, delete vertex/edge
Add vertex: O(|V|^2) Add edge: O(1) Delete vertex: O(|V|^2) Delete edge: O(1)
AVL Trees
Adelson-Velskii & Landis: Any pair of sibling nodes have a height difference of at most 1. On insertion, at most one rotation (single or double) is needed to restore balance. On removal, multiple rotations may be necessary.
ArrayLists: advantages, disadvantages
Advantage: advantages of an array, plus does not run out of space Disadvantage: inserting can be slower than an array
Graph: advantage, disadvantage
Advantage: best models real-world situations Disadvantage: can be slow and complex
Tries: advantage, disadvantage, memory
Advantage: faster search than a hash table, no collisions, no hash function needed, quick insert and delete Disadvantage: can take up more space than a hash table Memory: A LOT - need empty memory for every possibility
Array: advantage, disadvantage
Advantage: quick insert, quick access if index is known Disadvantage: slow search, slow delete, fixed size
Doubly Linked List: advantage, disadvantage
Advantage: quick insert, quick delete Disadvantage: slow search
Divide-and-conquer
Algorithm design patterns. Divide the problem into two or more smaller independent subproblems and solve the original problem using solutions to the subproblems.
Invariants
Algorithm design patterns. Identify an invariant and use it to rule out potential solutions that are suboptimal/dominated by other solutions.
Recursion
Algorithm design patterns. If the structure of the input is defined in a recursive manner, design a recursive algorithm that follows the input definition.
Sorting
Algorithm design patterns. Uncover some structure by sorting the input.
machine data type
All data in computers based on digital electronics is represented as bits (alternatives 0 and 1) on the lowest level. The smallest addressable unit of data is usually a group of bits called a byte (usually an octet, which is 8 bits). The unit processed by machine code instructions is called a word (as of 2011, typically 32 or 64 bits). Most instructions interpret the word as a binary number, such that a 32-bit word can represent unsigned integer values from 0 to (2^32) - 1 or signed integer values from -2^31 to (2^31) - 1. Because of two's complement, the machine language and machine doesn't need to distinguish between these unsigned and signed data types for the most part.
subtree of T rooted at v
All descendants of a vertex v
Proper ancestors of a vertex v in a tree
All vertices on the simple path from the root to v, but excluding v itself.
acyclic
An acyclic graph is a graph with no cycles.
Floyd-Warshall Algorithm
An algorithm for finding shortest paths in a weighted graph with positive or negative edge weights (but with no negative cycles). A single execution of the algorithm will find the lengths (summed weights) of the shortest paths between all pairs of vertices, though it does not return details of the paths themselves.
Dijkstra's Algorithm
An algorithm for finding the shortest paths between nodes in a weighted graph. For a given source node in the graph, the algorithm finds the shortest path between that node and every other. It can also be used for finding the shortest paths from a single node to a single destination node by stopping the algorithm once the shortest path to the destination node has been determined. Its time complexity is O(E + VlogV), where E is the number of edges and V is the number of vertices.
Counting Sort
An algorithm for sorting a collection of objects according to keys that are small integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that have each distinct key value, and using arithmetic on those counts to determine the positions of each key value in the output sequence. Its running time is linear in the number of items and the difference between the maximum and minimum key values, so it is only suitable for direct use in situations where the variation in keys is not significantly greater than the number of items. However, it is often used as a subroutine in another sorting algorithm, radix sort, that can handle larger keys more efficiently.[1][2][3] Because counting sort uses key values as indexes into an array, it is not a comparison sort, and the Ω(n log n) lower bound for comparison sorting does not apply to it.
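The count-then-position process described above can be sketched directly; keys in [0, max] are assumed.

```java
// Counting sort: count each key, prefix-sum the counts, place stably.
import java.util.Arrays;

public class CountingSortDemo {
    public static int[] countingSort(int[] a, int max) {
        int[] count = new int[max + 1];
        for (int x : a) count[x]++;                              // count each key
        for (int k = 1; k <= max; k++) count[k] += count[k - 1]; // output positions
        int[] out = new int[a.length];
        for (int i = a.length - 1; i >= 0; i--)                  // stable placement
            out[--count[a[i]]] = a[i];
        return out;
    }

    public static void main(String[] args) {
        int[] sorted = countingSort(new int[] { 4, 1, 3, 1, 0 }, 4);
        System.out.println(Arrays.toString(sorted)); // [0, 1, 1, 3, 4]
    }
}
```

The stable right-to-left placement is what makes counting sort usable as the per-digit subroutine inside radix sort.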
Adjacency list
An array of linked lists. The array is |V| items long, with position i storing a pointer to the linked list of edges for vertex Vi.
Define in-order traversal of a binary tree?
An inorder traversal visits the root of a binary tree between visiting the nodes in the root's subtrees. In particular, it visits nodes in the following order: 1-Visit all the nodes in the root's left subtree 2-Visit the root 3-Visit all the nodes in the root's right subtree
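The three steps above as a recursive method; the `Node` class and output list are illustrative names, not from the original notes.

```java
// Inorder traversal: left subtree, root, right subtree.
import java.util.*;

public class InorderDemo {
    static class Node {
        int val; Node left, right;
        Node(int v, Node l, Node r) { val = v; left = l; right = r; }
    }

    static void inorder(Node root, List<Integer> out) {
        if (root == null) return;
        inorder(root.left, out);   // 1. visit the left subtree
        out.add(root.val);         // 2. visit the root
        inorder(root.right, out);  // 3. visit the right subtree
    }

    public static void main(String[] args) {
        //      2
        //     / \
        //    1   3
        Node root = new Node(2, new Node(1, null, null), new Node(3, null, null));
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        System.out.println(out); // [1, 2, 3]
    }
}
```

On a binary search tree, this visit order yields the keys in sorted order.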
Internal Sorting
An internal sort is any data sorting process that takes place entirely within the main memory of a computer. This is possible whenever the data to be sorted is small enough to all be held in the main memory. For sorting larger datasets, it may be necessary to hold only a chunk of data in memory at a time, since it won't all fit. The rest of the data is normally held on some larger, but slower medium, like a hard-disk. Any reading or writing of data to and from this slower media can slow the sortation process considerably.
What is an aggregating object?
An object that contains another object is called the aggregating object. The object that it contains is called the aggregated object. e.g. a Student object has a name, which is a String object.
Connected
An undirected graph is __________ if there is at least one path from any vertex to another
Concrete Examples
Analysis pattern. Manually solve concrete instances of the problem and then build a general solution
Heuristics
Any approach to problem solving, learning, or discovery that employs a practical method not guaranteed to be optimal or perfect, but sufficient for the immediate goals. Where finding an optimal solution is impossible or impractical, heuristic methods can be used to speed up the process of finding a satisfactory solution. Heuristics can be mental shortcuts that ease the cognitive load of making a decision. Examples of this method include using a rule of thumb, an educated guess, an intuitive judgment, stereotyping, profiling, or common sense
Log-structured merge-tree (or LSM tree)
Application-specific tree data structure. a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data. LSM trees, like other search trees, maintain key-value pairs. LSM trees maintain data in two or more separate structures, each of which is optimized for its respective underlying storage medium; data is synchronized between the two structures efficiently, in batches. -One simple version of the LSM tree is a two-level LSM tree.[1] As described by Patrick O'Neil, a two-level LSM tree comprises two tree-like structures, called C0 and C1. C0 is smaller and entirely resident in memory, whereas C1 is resident on disk. New records are inserted into the memory-resident C0 component. If the insertion causes the C0 component to exceed a certain size threshold, a contiguous segment of entries is removed from C0 and merged into C1 on disk. The performance characteristics of LSM trees stem from the fact that each component is tuned to the characteristics of its underlying storage medium, and that data is efficiently migrated across media in rolling batches, using an algorithm reminiscent of merge sort. -Most LSM trees used in practice employ multiple levels. Level 0 is kept in main memory, and might be represented using a tree. The on-disk data is organized into sorted runs of data. Each run contains data sorted by the index key. A run can be represented on disk as a single file, or alternatively as a collection of files with non-overlapping key ranges. To perform a query on a particular key to get its associated value, one must search in the Level 0 tree, as well as each run. -A particular key may appear in several runs, and what happens depends on the application. Some applications simply want the newest key-value pair with a given key. Some applications must combine the values in some way to get the proper aggregate value to return. 
For example, in Apache Cassandra, each value represents a row in a database, and different versions of the row may have different sets of columns.[2] -In order to keep down the cost of queries, the system must avoid a situation where there are too many runs. -Extensions to the 'levelled' method to incorporate B+ structures have been suggested, for example bLSM[3] and Diff-Index.[4] -LSM trees are used in database management systems such as BigTable, HBase, LevelDB, MongoDB, SQLite4, RocksDB, WiredTiger,[5] Apache Cassandra, and InfluxDB
Alternating decision tree (ADTree)
Application-specific tree data structure. a machine learning method for classification. It generalizes decision trees and has connections to boosting. -An ADTree consists of an alternation of decision nodes, which specify a predicate condition, and prediction nodes, which contain a single number. An instance is classified by an ADTree by following all paths for which all decision nodes are true, and summing any prediction nodes that are traversed. -An alternating decision tree consists of decision nodes and prediction nodes. Decision nodes specify a predicate condition. Prediction nodes contain a single number. ADTrees always have prediction nodes as both root and leaves. An instance is classified by an ADTree by following all paths for which all decision nodes are true and summing any prediction nodes that are traversed. This is different from binary classification trees such as CART (Classification and regression tree) or C4.5 in which an instance follows only one path through the tree.
Finger tree
Application-specific tree data structure. a purely functional data structure used in efficiently implementing other functional data structures. A finger tree gives amortized constant time access to the "fingers" (leaves) of the tree, where data is stored, and also stores in each internal node the result of applying some associative operation to its descendants. This "summary" data stored in the internal nodes can be used to provide the functionality of data structures other than trees. For example, a priority queue can be implemented by labeling the internal nodes by the minimum priority of its children in the tree, or an indexed list/array can be implemented with a labeling of nodes by the count of the leaves in their children. -Finger trees can provide amortized O(1) pushing, reversing, popping, O(log n) append and split; and can be adapted to be indexed or ordered sequences. And like all functional data structures, it is inherently persistent; that is, older versions of the tree are always preserved. -They have since been used in the Haskell core libraries (in the implementation of Data.Sequence), and an implementation in OCaml exists[1] which was derived from a proven-correct Coq specification;[2] and a C# implementation of finger trees was published in 2008; the Yi text editor specializes finger trees to finger strings for efficient storage of buffer text. Finger trees can be implemented with or without lazy evaluation,[3] but laziness allows for simpler implementations. -They were first published in 1977 by Leonidas J. Guibas,[4] and periodically refined since (e.g. a version using AVL trees,[5] non-lazy finger trees, simpler 2-3 finger trees,[6] B-Trees and so on)
Expectiminimax tree
Application-specific tree data structure. a specialized variation of a minimax game tree for use in artificial intelligence systems that play two-player zero-sum games such as backgammon, in which the outcome depends on a combination of the player's skill and chance elements such as dice rolls. In addition to "min" and "max" nodes of the traditional minimax tree, this variant has "chance" ("move by nature") nodes, which take the expected value of a random event occurring.[1] In game theory terms, an expectiminimax tree is the game tree of an extensive-form game of perfect, but incomplete information. -In the traditional minimax method, the levels of the tree alternate from max to min until the depth limit of the tree has been reached. In an expectiminimax tree, the "chance" nodes are interleaved with the max and min nodes. Instead of taking the max or min of the utility values of their children, chance nodes take a weighted average, with the weight being the probability that that child is reached.[1] -The interleaving depends on the game. Each "turn" of the game is evaluated as a "max" node (representing the AI player's turn), a "min" node (representing a potentially-optimal opponent's turn), or a "chance" node (representing a random effect or player).[1] -For example, consider a game in which each round consists of a single dice throw, and then decisions made by first the AI player, and then another intelligent opponent. The order of nodes in this game would alternate between "chance", "max" and then "min"
Expression tree
Application-specific tree data structure. a specific kind of a binary tree used to represent expressions. Two common types of expressions that a binary expression tree can represent are algebraic[1] and boolean. These trees can represent expressions that contain both unary (an operation with only one operand, i.e. a single input) and binary (a function that takes two inputs) operators.[1] -Each node of a binary tree, and hence of a binary expression tree, has zero, one, or two children. This restricted structure simplifies the processing of expression trees. -The leaves of a binary expression tree are operands, such as constants or variable names, and the other nodes contain operators. These particular trees happen to be binary, because all of the operations are binary, and although this is the simplest case, it is possible for nodes to have more than two children. It is also possible for a node to have only one child, as is the case with the unary minus operator. An expression tree, T, can be evaluated by applying the operator at the root to the values obtained by recursively evaluating the left and right subtrees
Abstract syntax tree (AST, or just syntax tree)
Application-specific tree data structure. a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. The syntax is "abstract" in not representing every detail appearing in the real syntax. For instance, grouping parentheses are implicit in the tree structure, and a syntactic construct like an if-condition-then expression may be denoted by means of a single node with three branches. -This distinguishes abstract syntax trees from concrete syntax trees, traditionally designated parse trees, which are often[citation needed] built by a parser during the source code translation and compiling process. Once built, additional information is added to the AST by means of subsequent processing, e.g., contextual analysis. -Abstract syntax trees are also used in program analysis and program transformation systems.
Parse tree (or parsing tree or derivation tree or (concrete) syntax tree)
Application-specific tree data structure. an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term parse tree itself is used primarily in computational linguistics; in theoretical syntax the term syntax tree is more common. Parse trees are distinct from the abstract syntax trees used in computer programming, in that their structure and elements more concretely reflect the syntax of the input language. They are also distinct from (although based on similar principles to) the sentence diagrams (such as Reed-Kellogg diagrams) sometimes used for grammar teaching in schools. -Parse trees are usually constructed based on either the constituency relation of constituency grammars (phrase structure grammars) or the dependency relation of dependency grammars. Parse trees may be generated for sentences in natural languages (see natural language processing), as well as during processing of computer languages, such as programming languages. -A related concept is that of phrase marker or P-marker, as used in transformational generative grammar. A phrase marker is a linguistic expression marked as to its phrase structure. This may be presented in the form of a tree, or as a bracketed expression. Phrase markers are generated by applying phrase structure rules, and themselves are subject to further transformational rules.
Red-Black Tree
BSTs having red and black links satisfying: -Red links lean left -No node has two red links connected to it -The tree has perfect black balance: every path from the root to a null link has the same number of black links
Double Checked Locking
Double-checked locking is a software design pattern used to reduce the overhead of acquiring a lock by first testing the locking criterion (the "lock hint") without actually acquiring the lock. Only if the locking criterion check indicates that locking is required does the actual locking logic proceed. (Often used in Singletons, and has issues in C++).
lists data structure
Doubly linked list Array list Linked list Self-organizing list Skip list Unrolled linked list VList Conc-Tree list Xor linked list Zipper Doubly connected edge list Difference list Free list
12. If the binary tree below is printed by a preorder traversal, what will the result be? See Handout A. 9 4 17 16 12 11 6 B. 9 17 6 4 16 22 12 C. 6 9 17 4 16 22 12 D. 6 17 22 9 4 16 12 E. 6 17 9 4 22 16 12
E. 6 17 9 4 22 16 12
Rehashing
Expanding the table: double table size, find closest prime number. Rehash each element for the new table size.
Dijkstra's Algorithm
Finds the shortest paths from a source vertex in a graph with no negative weights. Tries to find the distance to all other vertices in the graph. It produces a shortest-paths tree by initializing the distance of all other nodes to infinity, then relaxing these distances step by step, iteratively adding the vertex that is not yet in the tree and has the lowest tentative distance. Time complexity O(|E|+|V|log|V|)
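The relaxation process above can be sketched with a priority queue; the edge representation (`int[]{to, weight}`) and the stale-entry check are illustrative assumptions.

```java
// Dijkstra's algorithm with a priority queue of (vertex, distance) pairs.
import java.util.*;

public class DijkstraDemo {
    public static int[] shortestPaths(List<List<int[]>> adj, int src) {
        int n = adj.size();
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);   // all distances start at infinity
        dist[src] = 0;
        PriorityQueue<int[]> pq =
            new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[1]));
        pq.add(new int[] { src, 0 });
        while (!pq.isEmpty()) {
            int[] cur = pq.remove();
            int u = cur[0];
            if (cur[1] > dist[u]) continue;     // skip stale queue entries
            for (int[] e : adj.get(u)) {        // relax each outgoing edge
                int v = e[0], w = e[1];
                if (dist[u] + w < dist[v]) {
                    dist[v] = dist[u] + w;
                    pq.add(new int[] { v, dist[v] });
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // Edges: 0->1 (4), 0->2 (1), 2->1 (2)
        List<List<int[]>> adj = List.of(
            List.of(new int[] { 1, 4 }, new int[] { 2, 1 }),
            List.of(),
            List.of(new int[] { 1, 2 }));
        System.out.println(Arrays.toString(shortestPaths(adj, 0))); // [0, 3, 1]
    }
}
```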
What is the Big O complexity of Sequential Search (sorted or unsorted) vs. a binary search of a sorted array?
For sequential (sorted OR unsorted) best is O(1). Worst and average O(n). The binary search is O(log n).
Adjacency list
Graph data structure. a collection of unordered lists used to represent a finite graph. Each list describes the set of neighbors of a vertex in the graph. This is one of several commonly used representations of graphs for use in computer programs. |Data Structures| For use as a data structure, the main alternative to the adjacency list is the adjacency matrix. Because each entry in the adjacency matrix requires only one bit, it can be represented in a very compact way, occupying only |V|^2/8 bytes of contiguous space, where | V| is the number of vertices of the graph. Besides avoiding wasted space, this compactness encourages locality of reference. -However, for a sparse graph, adjacency lists require less space, because they do not waste any space to represent edges that are not present. Using a naïve array implementation on a 32-bit computer, an adjacency list for an undirected graph requires about 2(32/8)| E| = 8| E| bytes of space, where | E| is the number of edges of the graph. -Noting that an undirected simple graph can have at most | V|^2/2 edges, allowing loops, we can let d = |E|/| V|^2 denote the density of the graph. Then, 8| E | > | V |^2/8 when | E|/| V|^2 > 1/64, that is the adjacency list representation occupies more space than the adjacency matrix representation when d > 1/64. Thus a graph must be sparse enough to justify an adjacency list representation. -Besides the space trade-off, the different data structures also facilitate different operations. Finding all vertices adjacent to a given vertex in an adjacency list is as simple as reading the list. With an adjacency matrix, an entire row must instead be scanned, which takes O(| V |) time. Whether there is an edge between two given vertices can be determined at once with an adjacency matrix, while requiring time proportional to the minimum degree of the two vertices with the adjacency list.
Propositional directed acyclic graph (PDAG)
Graph data structure. a data structure that is used to represent a Boolean function. A Boolean function can be represented as a rooted, directed acyclic graph of the following form: 1.Leaves are labeled with ⊤ (true), ⊥ (false), or a Boolean variable. 2.Non-leaves are △ (logical and), ▽ (logical or) and ◇ (logical not). 3.△- and ▽-nodes have at least one child. 4.◇-nodes have exactly one child. Leaves labeled with ⊤ (⊥) represent the constant Boolean function which always evaluates to 1 (0). A leaf labeled with a Boolean variable x is interpreted as the assignment x=1, i.e. it represents the Boolean function which evaluates to 1 if and only if x=1. The Boolean function represented by a △-node is the one that evaluates to 1 if and only if the Boolean functions of all its children evaluate to 1. Similarly, a ▽-node represents the Boolean function that evaluates to 1 if and only if the Boolean function of at least one child evaluates to 1. Finally, a ◇-node represents the complementary Boolean function of its child, i.e. the one that evaluates to 1 if and only if the Boolean function of its child evaluates to 0.
Binary decision diagram (BDD or branching program)
Graph data structure. a data structure that is used to represent a Boolean function. On a more abstract level, BDDs can be considered as a compressed representation of sets or relations. Unlike other compressed representations, operations are performed directly on the compressed representation, i.e. without decompression. Other data structures used to represent a Boolean function include negation normal form (NNF), and propositional directed acyclic graph (PDAG). |Definition| A Boolean function can be represented as a rooted, directed, acyclic graph, which consists of several decision nodes and terminal nodes. There are two types of terminal nodes called 0-terminal and 1-terminal. Each decision node N is labeled by Boolean variable V_N and has two child nodes called low child and high child. The edge from node V_N to a low (or high) child represents an assignment of V_N to 0 (resp. 1). Such a BDD is called 'ordered' if different variables appear in the same order on all paths from the root. A BDD is said to be 'reduced' if the following two rules have been applied to its graph: 1.Merge any isomorphic subgraphs. 2.Eliminate any node whose two children are isomorphic. -In popular usage, the term BDD almost always refers to Reduced Ordered Binary Decision Diagram (ROBDD in the literature, used when the ordering and reduction aspects need to be emphasized). The advantage of an ROBDD is that it is canonical (unique) for a particular function and variable order.[1] This property makes it useful in functional equivalence checking and other operations like functional technology mapping. -A path from the root node to the 1-terminal represents a (possibly partial) variable assignment for which the represented Boolean function is true. As the path descends to a low (or high) child from a node, then that node's variable is assigned to 0 (resp. 1).
Graph-structured stack
Graph data structure. a directed acyclic graph (DAG) where each directed path represents a stack. -The graph-structured stack is an essential part of Tomita's algorithm, where it replaces the usual stack of a pushdown automaton. This allows the algorithm to encode the nondeterministic choices in parsing an ambiguous grammar, sometimes with greater efficiency. |Tomita's algorithm| -A GLR parser (GLR standing for "generalized LR", where L stands for "left-to-right" and R stands for "rightmost (derivation)") is an extension of an LR parser algorithm to handle nondeterministic and ambiguous grammars. The theoretical foundation was provided in a 1974 paper[1] by Bernard Lang (along with other general Context-Free parsers such as GLL). It describes a systematic way to produce such algorithms, and provides uniform results regarding correctness proofs, complexity with respect to grammar classes, and optimization techniques. The first actual implementation of GLR was described in a 1984 paper by Masaru Tomita, it has also been referred to as a "parallel parser". Tomita presented five stages in his original work,[2] though in practice it is the second stage that is recognized as the GLR parser. -Though the algorithm has evolved since its original forms, the principles have remained intact. As shown by an earlier publication,[3] Lang was primarily interested in more easily used and more flexible parsers for extensible programming languages. Tomita's goal was to parse natural language text thoroughly and efficiently. Standard LR parsers cannot accommodate the nondeterministic and ambiguous nature of natural language, and the GLR algorithm can. 
|Advantages of GLR| Recognition using the GLR algorithm has the same worst-case time complexity as the CYK algorithm and Earley algorithm: O(n³). However, GLR carries two additional advantages: 1.The time required to run the algorithm is proportional to the degree of nondeterminism in the grammar: on deterministic grammars the GLR algorithm runs in O(n) time (this is not true of the Earley and CYK algorithms, but the original Earley algorithms can be modified to ensure it) 2.The GLR algorithm is "online" - that is, it consumes the input tokens in a specific order and performs as much work as possible after consuming each token. -In practice, the grammars of most programming languages are deterministic or "nearly deterministic", meaning that any nondeterminism is usually resolved within a small (though possibly unbounded) number of tokens. Compared to other algorithms capable of handling the full class of context-free grammars (such as Earley or CYK), the GLR algorithm gives better performance on these "nearly deterministic" grammars, because only a single stack will be active during the majority of the parsing process. -GLR can be combined with the LALR(1) algorithm in a hybrid parser, allowing still higher performance.
And-inverter graph (AIG)
Graph data structure. a directed, acyclic graph that represents a structural implementation of the logical functionality of a circuit or network. An AIG consists of two-input nodes representing logical conjunction, terminal nodes labeled with variable names, and edges optionally containing markers indicating logical negation. This representation of a logic function is rarely structurally efficient for large circuits, but is an efficient representation for manipulation of boolean functions. Typically, the abstract graph is represented as a data structure in software. -Conversion from the network of logic gates to AIGs is fast and scalable. It only requires that every gate be expressed in terms of AND gates and inverters. This conversion does not lead to unpredictable increase in memory use and runtime. This makes the AIG an efficient representation in comparison with either the binary decision diagram (BDD) or the "sum-of-product" (ΣoΠ) form,[citation needed] that is, the canonical form in Boolean algebra known as the disjunctive normal form (DNF). The BDD and DNF may also be viewed as circuits, but they involve formal constraints that deprive them of scalability. For example, ΣoΠs are circuits with at most two levels while BDDs are canonical, that is, they require that input variables be evaluated in the same order on all paths.
Scene graph
Graph data structure. a general data structure commonly used by vector-based graphics editing applications and modern computer games, which arranges the logical and often (but not necessarily) spatial representation of a graphical scene. Examples of such programs include Acrobat 3D, Adobe Illustrator, AutoCAD, CorelDRAW, OpenSceneGraph, OpenSG, VRML97, X3D, Hoops and Open Inventor. -A scene graph is a collection of nodes in a graph or tree structure. A tree node (in the overall tree structure of the scene graph) may have many children but often only a single parent, with the effect of a parent applied to all its child nodes; an operation performed on a group automatically propagates its effect to all of its members. In many programs, associating a geometrical transformation matrix (see also transformation and matrix) at each group level and concatenating such matrices together is an efficient and natural way to process such operations. A common feature, for instance, is the ability to group related shapes/objects into a compound object that can then be moved, transformed, selected, etc. as easily as a single object. -It also happens that in some scene graphs, a node can have a relation to any node including itself, or at least an extension that refers to another node (for instance Pixar's PhotoRealistic RenderMan because of its usage of Reyes rendering algorithm, or Adobe Systems's Acrobat 3D for advanced interactive manipulation). -The term scene graph is sometimes confused with Canvas (GUI), since some canvas implementations include scene graph functionality.
Hypergraph
Graph data structure. a generalization of a graph in which an edge can connect any number of vertices. Formally, a hypergraph H is a pair H = (X, E) where X is a set of elements called nodes or vertices, and E is a set of non-empty subsets of X called hyperedges or edges. Therefore, E is a subset of P(X) \ {∅}, where P(X) is the power set of X. -While graph edges are pairs of nodes, hyperedges are arbitrary sets of nodes, and can therefore contain an arbitrary number of nodes. However, it is often desirable to study hypergraphs where all hyperedges have the same cardinality; a k-uniform hypergraph is a hypergraph such that all its hyperedges have size k. (In other words, one such hypergraph is a collection of sets, each such set a hyperedge connecting k nodes.) So a 2-uniform hypergraph is a graph, a 3-uniform hypergraph is a collection of unordered triples, and so on. -A hypergraph is also called a set system or a family of sets drawn from the universal set X. The difference between a set system and a hypergraph is in the questions being asked. Hypergraph theory tends to concern questions similar to those of graph theory, such as connectivity and colorability, while the theory of set systems tends to ask non-graph-theoretical questions, such as those of Sperner theory. -There are variant definitions; sometimes edges must not be empty, and sometimes multiple edges, with the same set of nodes, are allowed. -Hypergraphs can be viewed as incidence structures. In particular, there is a bipartite "incidence graph" or "Levi graph" corresponding to every hypergraph, and conversely, most, but not all, bipartite graphs can be regarded as incidence graphs of hypergraphs. -Hypergraphs have many other names. 
In computational geometry, a hypergraph may sometimes be called a range space and then the hyperedges are called ranges.[1] In cooperative game theory, hypergraphs are called simple games (voting games); this notion is applied to solve problems in social choice theory. In some literature edges are referred to as hyperlinks or connectors.[2] -Special kinds of hypergraphs include, besides k-uniform ones, clutters, where no edge appears as a subset of another edge; and abstract simplicial complexes, which contain all subsets of every edge. -The collection of hypergraphs is a category with hypergraph homomorphisms as morphisms.
Multigraph
Graph data structure. a graph which is permitted to have multiple edges (also called parallel edges[1]), that is, edges that have the same end nodes. Thus two vertices may be connected by more than one edge. -There are two distinct notions of multiple edges: 1.Edges without own identity: The identity of an edge is defined solely by the two nodes it connects. In this case, the term "multiple edges" means that the same edge can occur several times between these two nodes. 2.Edges with own identity: Edges are primitive entities just like nodes. When multiple edges connect two nodes, these are different edges. A multigraph is different from a hypergraph, which is a graph in which an edge can connect any number of nodes, not just two. -For some authors, the terms pseudograph and multigraph are synonymous. For others, a pseudograph is a multigraph with loops.
Directed graph (or digraph)
Graph data structure. a graph, or set of vertices connected by edges, where the edges have a direction associated with them. In formal terms, a directed graph is an ordered pair G = (V, A) (sometimes G = (V, E)) where[1] 1.V is a set whose elements are called vertices, nodes, or points; 2.A is a set of ordered pairs of vertices, called arrows, directed edges (sometimes simply edges with the corresponding set named E instead of A), directed arcs, or directed lines. -It differs from an ordinary or undirected graph, in that the latter is defined in terms of unordered pairs of vertices, which are usually called edges, arcs, or lines. -A directed graph is called a simple digraph if it has no multiple arrows (two or more edges that connect the same two vertices in the same direction) and no loops (edges that connect vertices to themselves). A directed graph is called a directed multigraph or multidigraph if it may have multiple arrows (and sometimes loops). In the latter case the arrow set forms a multiset, rather than a set, of ordered pairs of vertices.
Zero-suppressed decision diagram (ZSDD or ZDD)
Graph data structure. a type of binary decision diagram (BDD) where instead of nodes being introduced when the positive and the negative part are different, they are introduced when positive part is different from constant 0. A zero-suppressed decision diagram is also commonly referred to as a zero-suppressed binary decision diagram (ZBDD). -They are useful when dealing with functions that are almost everywhere 0. -In a 2011 talk "All Questions Answered",[1] Donald Knuth referred to ZDD as the most beautiful construct in computer science. -In The Art of Computer Programming, volume 4, Knuth introduces his Simpath algorithm for constructing a ZDD representing all simple paths between two vertices in a graph.
Distributed hash table (DHT)
Hash data structure. a class of a decentralized distributed system that provides a lookup service similar to a hash table: (key, value) pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures. -DHTs form an infrastructure that can be used to build more complex services, such as anycast, cooperative Web caching, distributed file systems, domain name services, instant messaging, multicast, and also peer-to-peer file sharing and content distribution systems. Notable distributed networks that use DHTs include BitTorrent's distributed tracker, the Coral Content Distribution Network, the Kad network, the Storm botnet, the Tox instant messenger, Freenet and the YaCy search engine.
Double Hashing
Hash data structure. a computer programming technique used in hash tables to resolve hash collisions (a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint, or cryptographic digest. Collisions are unavoidable whenever members of a very large set (such as all possible person names, or all possible computer files) are mapped to a relatively short bit string. This is merely an instance of the pigeonhole principle, which states that if n items are put into m containers, with n > m, then at least one container must contain more than one item), in cases when two different values to be searched for produce the same hash key. It is a popular collision-resolution technique in open-addressed hash tables. Double hashing is implemented in many popular libraries. -Like linear probing, it uses one hash value as a starting point and then repeatedly steps forward an interval until the desired value is located, an empty location is reached, or the entire table has been searched; but this interval is decided using a second, independent hash function (hence the name double hashing). Unlike linear probing and quadratic probing, the interval depends on the data, so that even values mapping to the same location have different bucket sequences; this minimizes repeated collisions and the effects of clustering. -Given two randomly, uniformly, and independently selected hash functions h_{1} and h_{2}, the ith location in the bucket sequence for value k in a hash table T is: h(i,k)=(h_1(k) + i * h_2(k)) mod |T|. Generally, h_{1} and h_{2} are selected from a set of universal hash functions.
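A small open-addressing sketch following the probe formula above, h(i, k) = (h1(k) + i·h2(k)) mod |T|. The two hash functions are illustrative assumptions; the key property is that h2 never returns 0, so the probe sequence always advances:

```python
TABLE_SIZE = 11  # a prime table size helps the probe sequence visit every slot

def h1(k):
    return k % TABLE_SIZE

def h2(k):
    # Step size in [1, TABLE_SIZE - 1]; never 0, so probing always moves.
    return 1 + (k % (TABLE_SIZE - 1))

def insert(table, k):
    """Probe with double hashing until an empty slot is found."""
    for i in range(TABLE_SIZE):
        slot = (h1(k) + i * h2(k)) % TABLE_SIZE
        if table[slot] is None:
            table[slot] = k
            return slot
    raise RuntimeError("table is full")

table = [None] * TABLE_SIZE
insert(table, 3)    # lands in its home slot, 3
insert(table, 14)   # also hashes to 3, but steps by h2(14) to resolve the collision
```

Because the step h2(k) depends on the key, two keys that collide at the same home slot still follow different probe sequences, which is exactly what distinguishes double hashing from linear and quadratic probing.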
Hash table (or hash map)*
Hash data structure. a data structure used to implement an associative array, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found. -Ideally, the hash function will assign each key to a unique bucket, but it is possible that two keys will generate an identical hash causing both keys to point to the same bucket. Instead, most hash table designs assume that hash collisions—different keys that are assigned by the hash function to the same bucket—will occur and must be accommodated in some way. -In a well-dimensioned hash table, the average cost (number of instructions) for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key-value pairs, at (amortized[2]) constant average cost per operation. -In many situations, hash tables turn out to be more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.
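A minimal sketch of a hash table using separate chaining: the hash function computes a bucket index, and keys that collide share a bucket list. The class and method names are illustrative:

```python
class HashMap:
    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The hash function maps a key to one of the buckets.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # colliding keys chain in the list

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

m = HashMap()
m.put("apple", 1)
m.put("pear", 2)
print(m.get("apple"))  # 1
```

With a well-dimensioned table, each bucket stays short, so `get` and `put` cost O(1) on average regardless of how many pairs are stored.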
Prefix hash tree (PHT)
Hash data structure. a distributed data structure that enables more sophisticated queries over a distributed hash table (DHT). The prefix hash tree uses the lookup interface of a DHT to construct a trie-based data structure that is both efficient (updates are doubly logarithmic in the size of the domain being indexed), and resilient (the failure of any given node in a prefix hash tree does not affect the availability of data stored at other nodes).
Bloom filter
Hash data structure. a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not, thus a Bloom filter has a 100% recall rate. In other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed (though this can be addressed with a "counting" filter). The more elements that are added to the set, the larger the probability of false positives. Bloom proposed the technique for applications where the amount of source data would require an impractically large amount of memory if "conventional" error-free hashing techniques were applied. He gave the example of a hyphenation algorithm for a dictionary of 500,000 words, out of which 90% follow simple hyphenation rules, but the remaining 10% require expensive disk accesses to retrieve specific hyphenation patterns. With sufficient core memory, an error-free hash could be used to eliminate all unnecessary disk accesses; on the other hand, with limited core memory, Bloom's technique uses a smaller hash area but still eliminates most unnecessary accesses. For example, a hash area only 15% of the size needed by an ideal error-free hash still eliminates 85% of the disk accesses, an 85-15 form of the Pareto principle.[1] More generally, fewer than 10 bits per element are required for a 1% false positive probability, independent of the size or number of elements in the set.
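A minimal Bloom filter sketch: k hash functions set k bits per added element, and a query reports "possibly in set" only if all k bits are set. Deriving the k hashes by salting Python's built-in `hash` is an illustrative assumption, not how production filters choose hash functions:

```python
class BloomFilter:
    def __init__(self, m_bits=64, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = [False] * m_bits

    def _positions(self, item):
        # k bit positions derived from k salted hashes of the item.
        return [hash((i, item)) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        # True means "possibly in set"; False means "definitely not in set".
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
bf.add("cat")
print(bf.might_contain("cat"))  # True: added elements are never missed
```

An element that was added always answers True (no false negatives), while an absent element may still answer True with some small probability that grows as more elements are added.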
MinHash (or the min-wise independent permutations locality sensitive hashing scheme)
Hash data structure. a technique for quickly estimating how similar two sets are. The scheme was invented by Andrei Broder (1997), and initially used in the AltaVista search engine to detect duplicate web pages and eliminate them from search results. It has also been applied in large-scale clustering problems, such as clustering documents by the similarity of their sets of words. -The Jaccard similarity coefficient is a commonly used indicator of the similarity between two sets. |Applications| The original applications for MinHash involved clustering and eliminating near-duplicates among web documents, represented as sets of the words occurring in those documents.[1][2] Similar techniques have also been used for clustering and near-duplicate elimination for other types of data, such as images: in the case of image data, an image can be represented as a set of smaller subimages cropped from it, or as sets of more complex image feature descriptions.[6] -In data mining, Cohen et al. (2001) use MinHash as a tool for association rule learning. Given a database in which each entry has multiple attributes (viewed as a 0-1 matrix with a row per database entry and a column per attribute) they use MinHash-based approximations to the Jaccard index to identify candidate pairs of attributes that frequently co-occur, and then compute the exact value of the index for only those pairs to determine the ones whose frequencies of co-occurrence are below a given strict threshold.
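A sketch of the MinHash estimate: for each of several hash functions, record the minimum hash value over a set; the fraction of hash functions on which two sets' minima agree estimates their Jaccard similarity. Salting Python's built-in `hash` to get the hash family is an illustrative choice:

```python
def minhash_signature(items, num_hashes=100):
    """One minimum per salted hash function forms the set's signature."""
    return [min(hash((seed, x)) for x in items) for seed in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    # The probability that one hash function's minima agree equals the
    # true Jaccard index |A ∩ B| / |A ∪ B|, so the match rate estimates it.
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

a = {"the", "quick", "brown", "fox"}
b = {"the", "quick", "red", "fox"}
est = estimated_jaccard(minhash_signature(a), minhash_signature(b))
# True Jaccard here is 3/5 = 0.6; the estimate should land in that vicinity.
```

The appeal is that signatures are small and fixed-size, so near-duplicate detection over millions of documents compares short signatures instead of full word sets.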
Hash array mapped trie (HAMT)
Hash data structure. an implementation of an associative array that combines the characteristics of a hash table and an array mapped trie.[1] It is a refined version of the more general notion of a hash tree. |Operation| A HAMT is an array mapped trie where the keys are first hashed in order to ensure an even distribution of keys and a constant key length. -In a typical implementation of HAMT's array mapped trie, each node contains a table with some fixed number N of slots with each slot containing either a nil pointer or a pointer to another node. N is commonly 32. As allocating space for N pointers for each node would be expensive, each node instead contains a bitmap which is N bits long where each bit indicates the presence of a non-nil pointer. This is followed by an array of pointers equal in length to the number of ones in the bitmap, (its Hamming weight). |Advantages of HAMTs| The hash array mapped trie achieves almost hash table-like speed while using memory much more economically. Also, a hash table may have to be periodically resized, an expensive operation, whereas HAMTs grow dynamically. Generally, HAMT performance is improved by a larger root table with some multiple of N slots; some HAMT variants allow the root to grow lazily[1] with negligible impact on performance.
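The bitmap trick described above can be shown in isolation: instead of N slots per node, a node stores an N-bit bitmap plus a dense array holding only the non-nil children, and a child's position in that array is the popcount (Hamming weight) of the bitmap bits below its logical slot. This is a sketch of just that indexing step, not a full HAMT:

```python
N = 32  # slots per node, as in a typical HAMT

def array_index(bitmap, slot):
    """Index into the dense child array for logical slot `slot`."""
    mask = (1 << slot) - 1                 # bits strictly below `slot`
    return bin(bitmap & mask).count("1")   # popcount of those bits

# Suppose only logical slots 2, 7 and 20 of a node are occupied:
bitmap = (1 << 2) | (1 << 7) | (1 << 20)
print(array_index(bitmap, 2))   # 0  (first occupied slot)
print(array_index(bitmap, 7))   # 1
print(array_index(bitmap, 20))  # 2
```

The node thus pays for 32 bits of bitmap plus exactly three pointers, instead of 32 pointer slots, which is where the memory economy comes from.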
Quotient filter
Hash data structure. Introduced by Bender et al. in 2011, the quotient filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set (an approximate member query filter, AMQ). A query will elicit a reply specifying either that the element is definitely not in the set or that the element is probably in the set. The former result is definitive; i.e., the test does not generate false negatives. But with the latter result there is some probability, ε, of the test returning "element is in the set" when in fact the element is not present in the set (i.e., a false positive). There is a tradeoff between ε, the false positive rate, and storage size; increasing the filter's storage size reduces ε. Other AMQ operations include "insert" and "optionally delete". The more elements are added to the set, the larger the probability of false positives. -An approximate member query (AMQ) filter can be used to speed up answers in a key-value storage system. Key-value pairs are stored on a disk which has slow access times. AMQ filter decisions are much faster. However some unnecessary disk accesses are made when the filter reports a positive (in order to weed out the false positives). Overall answer speed is better with the AMQ filter than without it. Use of an AMQ filter for this purpose, however, does increase memory usage. A typical application for quotient filters, and other AMQ filters, is to serve as a proxy for the keys in a database on disk. As keys are added to or removed from the database, the filter is updated to reflect this. Any lookup will first consult the fast quotient filter, then look in the (presumably much slower) database only if the quotient filter reported the presence of the key. If the filter returns absence, the key is known not to be in the database without any disk accesses having been performed. -A quotient filter has the usual AMQ operations of insert and query. 
In addition it can also be merged and re-sized without having to re-hash the original keys (thereby avoiding the need to access those keys from secondary storage). This property benefits certain kinds of log-structured merge-trees.
Hash list
Hash data structure. typically a list of hashes of the data blocks in a file or set of files. Lists of hashes are used for many different purposes, such as fast table lookup (hash tables) and distributed databases (distributed hash tables). This article covers hash lists that are used to guarantee data integrity. -A hash list is an extension of the old concept of hashing an item (for instance, a file). A hash list is usually sufficient for most needs, but a more advanced form of the concept is a hash tree. -Hash lists can be used to protect any kind of data stored, handled and transferred in and between computers. An important use of hash lists is to make sure that data blocks received from other peers in a peer-to-peer network are received undamaged and unaltered, and to check that the other peers do not "lie" and send fake blocks. -Usually a cryptographic hash function such as SHA-256 is used for the hashing. If the hash list only needs to protect against unintentional damage less secure checksums such as CRCs can be used. -Hash lists are better than a simple hash of the entire file since, in the case of a data block being damaged, this is noticed, and only the damaged block needs to be redownloaded. With only a hash of the file, many undamaged blocks would have to be redownloaded, and the file reconstructed and tested until the correct hash of the entire file is obtained. Hash lists also protect against nodes that try to sabotage by sending fake blocks, since in such a case the damaged block can be acquired from some other source.
Key
Information in an item that is used to determine where the item goes in the table.
What is Inheritance?
Inheritance is defining new classes from existing ones. The specialized class "inherits" from the general class. The class that inherits is called the subclass and the class that is inherited from is called the superclass. The keyword "extends" is used to indicate inheritance.
Single Source Shortest Path
Input: a graph and a starting vertex. Output: shortest paths to all vertices. Unweighted: BFS. Weighted: Dijkstra's algorithm.
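A sketch of the weighted case, Dijkstra's algorithm with a binary heap; the example graph (an adjacency list of weighted edges) is an illustrative assumption:

```python
import heapq

def dijkstra(graph, source):
    """Return shortest distances from source to every reachable vertex."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2)], "C": []}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 1, 'C': 3}
```

For an unweighted graph, the heap degenerates into a plain FIFO queue and the same loop becomes BFS, which is why BFS suffices in that case.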
Shellsort
Insertion sort over a gap Best: O(n log n) Avg: depends on gap sequence Worst: O(n^2)
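A shellsort sketch using the original Shell gap sequence n/2, n/4, ..., 1 (other sequences change the average-case behavior, as the card notes):

```python
def shellsort(a):
    """Insertion sort over a shrinking gap; sorts the list in place."""
    gap = len(a) // 2
    while gap > 0:
        for i in range(gap, len(a)):
            item, j = a[i], i
            # Gapped insertion sort: shift larger elements right by `gap`.
            while j >= gap and a[j - gap] > item:
                a[j] = a[j - gap]
                j -= gap
            a[j] = item
        gap //= 2  # each pass halves the gap until it reaches 1
    return a

print(shellsort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The final gap=1 pass is ordinary insertion sort, but by then the earlier passes have left the list nearly sorted, which is the situation insertion sort handles fastest.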
Why do you need to override the default hashCode function of Object?
The default hashCode is based on the object's memory address, and therefore will not necessarily assign the same hash to two equivalent objects at different memory locations. To be useful in a dictionary implementation, hashing must map equal objects to the same location in the hash table.
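The same contract exists in Python, where the analog of overriding hashCode alongside equals is defining __hash__ alongside __eq__ over the same fields. This sketch uses an illustrative Point class:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash the same fields used by __eq__, so that equal objects
        # always land in the same hash bucket.
        return hash((self.x, self.y))

seen = {Point(1, 2)}
print(Point(1, 2) in seen)  # True: a distinct but equal object is found
```

Without the __hash__ override, the two Point(1, 2) objects would hash by identity, the membership test would miss, and the set would behave incorrectly as a dictionary key store.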
B-Trees
Items are stored in leaves. The root is either a leaf, or it will have between two and M children. All non-leaf nodes will have between M/2 and M children. All leaves will be at the same depth and store between L/2 and L data values, where we are free to choose L. Useful for data storage, searching a database/sorted files. Time complexity of O(log N).
Stack: definition
Last in, first out.
Topological Sort
Linear ordering of the vertices of a directed graph such that for every directed edge "uv" which connects "u" to "v" (u points to v), u comes before v. This ordering is possible if and only if there are no directed cycles in the graph; therefore, the graph must be a DAG.
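One standard way to compute the ordering is Kahn's algorithm: repeatedly emit a vertex with no remaining incoming edges. The example DAG below is an illustrative assumption:

```python
from collections import deque

def topological_sort(graph):
    """Kahn's algorithm over an adjacency-list DAG; raises on a cycle."""
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] += 1
    queue = deque(u for u in graph if indegree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:   # all of v's predecessors are emitted
                queue.append(v)
    if len(order) != len(graph):
        raise ValueError("graph has a cycle; no topological order exists")
    return order

dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(topological_sort(dag))  # ['a', 'b', 'c', 'd']
```

The cycle check falls out for free: if any vertices are never emitted, their in-degrees never reached zero, which can only happen on a directed cycle.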
Doubly connected edge list (DCEL, also known as half-edge data structure)*
List data structure. a data structure to represent an embedding of a planar graph in the plane, and polytopes in 3D -This data structure provides efficient manipulation of the topological information associated with the objects in question (vertices, edges, faces). It is used in many algorithms of computational geometry to handle polygonal subdivisions of the plane, commonly called planar straight-line graphs (PSLG).[1] For example, a Voronoi diagram is commonly represented by a DCEL inside a bounding box. -This data structure was originally suggested by Muller and Preparata[2] for representations of 3D convex polyhedra. -Later a somewhat different data structuring was suggested, but the name "DCEL" was retained. -For simplicity, only connected graphs are considered, however the DCEL structure may be extended to handle disconnected graphs as well. -DCEL is more than just a doubly linked list of edges. In the general case, a DCEL contains a record for each edge, vertex and face of the subdivision. Each record may contain additional information, for example, a face may contain the name of the area. Each edge usually bounds two faces and it is therefore convenient to regard each edge as two half-edges. Each half-edge bounds a single face and thus has a pointer to that face. A half-edge has a pointer to the next half-edge and previous half-edge of the same face. To reach the other face, we can go to the twin of the half-edge and then traverse the other face. Each half-edge also has a pointer to its origin vertex (the destination vertex can be obtained by querying the origin of its twin, or of the next half-edge). -Each vertex contains the coordinates of the vertex and also stores a pointer to an arbitrary edge that has the vertex as its origin. Each face stores a pointer to some half-edge of its outer boundary (if the face is unbounded then pointer is null). It also has a list of half-edges, one for each hole that may be incident within the face. 
If the vertices or faces do not hold any interesting information, there is no need to store them, thus saving space and reducing the data structure's complexity.
Free list
List data structure. a data structure used in a scheme for dynamic memory allocation. It operates by connecting unallocated regions of memory together in a linked list, using the first word of each unallocated region as a pointer to the next. It is most suitable for allocating from a memory pool, where all objects have the same size. -Free lists make the allocation and deallocation operations very simple. To free a region, one would just link it to the free list. To allocate a region, one would simply remove a single region from the end of the free list and use it. If the regions are variable-sized, one may have to search for a region of large enough size, which can be expensive. -Free lists have the disadvantage, inherited from linked lists, of poor locality of reference and so poor data cache utilization, and they do not automatically consolidate adjacent regions to fulfill allocation requests for large regions, unlike the buddy allocation system. Nevertheless, they're still useful in a variety of simple applications where a full-blown memory allocator is unnecessary or requires too much overhead.
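A sketch of a free list over a pool of fixed-size regions, modeled here as indices into a preallocated list; each free region's slot stores the index of the next free region, playing the role of "the first word of each unallocated region as a pointer to the next". The class and names are illustrative:

```python
class FixedPool:
    def __init__(self, size):
        self.slots = [None] * size
        # Thread every slot onto the free list: each free region
        # points at the next free one; the last points at nothing.
        self.free_head = 0
        for i in range(size - 1):
            self.slots[i] = i + 1
        self.slots[size - 1] = None

    def allocate(self):
        """O(1): pop the head region off the free list."""
        if self.free_head is None:
            raise MemoryError("pool exhausted")
        slot = self.free_head
        self.free_head = self.slots[slot]  # unlink from the free list
        return slot

    def free(self, slot):
        """O(1): push the region back onto the free list."""
        self.slots[slot] = self.free_head
        self.free_head = slot

pool = FixedPool(3)
a = pool.allocate()
b = pool.allocate()
pool.free(a)
print(pool.allocate())  # the just-freed region is reused first
```

Both operations are constant time precisely because all regions are the same size; variable-sized regions would force a search, as the card notes.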
Conc-Tree list*
List data structure. a data-structure that stores element sequences, and provides amortized O(1) time append and prepend operations, O(log n) time insert and remove operations and O(log n) time concatenation. This data structure is particularly viable for functional task-parallel and data-parallel programming, and is relatively simple to implement compared to other data-structures with similar asymptotic complexity.[1] Conc-Trees were designed to improve efficiency of data-parallel operations that do not depend on the iteration order,[3] and improve constant factors in these operations by avoiding unnecessary copies of the data.[2] Orthogonally, they are used to efficiently aggregate data in functional-style task-parallel algorithms, as an implementation of the conc-list data abstraction.[4] Conc-list is a parallel programming counterpart to functional cons-lists, and was originally introduced by the Fortress language Fortress (programming language).
Linked list*
List data structure. a linear collection of data elements, called nodes, pointing to the next node by means of a pointer. It is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of data and a reference (in other words, a link) to the next node in the sequence. This structure allows for efficient insertion or removal of elements from any position in the sequence during iteration. More complex variants add additional links, allowing efficient insertion or removal from arbitrary element references. -Linked lists are among the simplest and most common data structures. They can be used to implement several other common abstract data types, including lists (the abstract data type), stacks, queues, associative arrays, and S-expressions, though it is not uncommon to implement the other data structures directly without using a list as the basis of implementation. -The principal benefit of a linked list over a conventional array is that the list elements can easily be inserted or removed without reallocation or reorganization of the entire structure because the data items need not be stored contiguously in memory or on disk, while an array has to be declared in the source code, before compiling and running the program. Linked lists allow insertion and removal of nodes at any point in the list, and can do so with a constant number of operations if the link previous to the link being added or removed is maintained during list traversal. -On the other hand, simple linked lists by themselves do not allow random access to the data, or any form of efficient indexing. 
Thus, many basic operations — such as obtaining the last node of the list (assuming that the last node is not maintained as separate node reference in the list structure), or finding a node that contains a given datum, or locating the place where a new node should be inserted — may require sequential scanning of most or all of the list elements. The advantages and disadvantages of using linked lists are given below.
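A minimal Python sketch of a singly linked list showing both sides of the trade-off above: O(1) insertion at the head, but O(n) sequential scan to find an element (class and method names are illustrative):

```python
class Node:
    def __init__(self, data, next=None):
        self.data, self.next = data, next

class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def prepend(self, data):          # O(1): no shifting, unlike an array
        self.head = Node(data, self.head)

    def find(self, data):             # O(n): must scan node by node
        node = self.head
        while node is not None and node.data != data:
            node = node.next
        return node

    def to_list(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.data)
            node = node.next
        return out
```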
Doubly linked list*
List data structure. a linked data structure that consists of a set of sequentially linked records called nodes. Each node contains two fields, called links, that are references to the previous and to the next node in the sequence of nodes. The beginning and ending nodes' previous and next links, respectively, point to some kind of terminator, typically a sentinel node or null, to facilitate traversal of the list. If there is only one sentinel node, then the list is circularly linked via the sentinel node. It can be conceptualized as two singly linked lists formed from the same data items, but in opposite sequential orders. -The two node links allow traversal of the list in either direction. While adding or removing a node in a doubly linked list requires changing more links than the same operations on a singly linked list, the operations are simpler and potentially more efficient (for nodes other than first nodes) because there is no need to keep track of the previous node during traversal or no need to traverse the list to find the previous node, so that its link can be modified. The concept is also the basis for the mnemonic link system memorization technique.
VList
List data structure. a persistent data structure designed by Phil Bagwell in 2002 that combines the fast indexing of arrays with the easy extension of cons-based (or singly linked) linked lists. -Like arrays, VLists have constant-time lookup on average and are highly compact, requiring only O(log n) storage for pointers, allowing them to take advantage of locality of reference. Like singly linked or cons-based lists, they are persistent, and elements can be added to or removed from the front in constant time. Length can also be found in O(log n) time. -The primary operations of a VList are: 1) Locate the kth element (O(1) average, O(log n) worst-case) 2) Add an element to the front of the VList (O(1) average, with an occasional allocation) 3) Obtain a new array beginning at the second element of an old array (O(1)) 4) Compute the length of the list (O(log n)) -The primary advantage VLists have over arrays is that different updated versions of the VList automatically share structure. Because VLists are immutable, they are most useful in functional programming languages, where their efficiency allows a purely functional implementation of data structures traditionally thought to require mutable arrays, such as hash tables. -However, VLists also have a number of disadvantages over their competitors: ...While immutability is a benefit, it is also a drawback, making it inefficient to modify elements in the middle of the array. ...Access near the end of the list can be as expensive as O(log n); it is only constant on average over all elements. This is still, however, much better than performing the same operation on cons-based lists. ...Wasted space in the first block is proportional to n. This is similar to linked lists, but there are data structures with less overhead. When used as a fully persistent data structure, the overhead may be considerably higher and this data structure may not be appropriate. 
-VList may be modified to support the implementation of a growable array. In the application of a growable array, immutability is no longer required. Instead of growing at the beginning of the list, the ordering interpretation is reversed to allow growing at the end of the array.
Zipper
List data structure. a technique of representing an aggregate data structure so that it is convenient for writing programs that traverse the structure arbitrarily and update its contents, especially in purely functional programming languages. The zipper was described by Gérard Huet in 1997.[1] It includes and generalizes the gap buffer technique sometimes used with arrays. -The zipper technique is general in the sense that it can be adapted to lists, trees, and other recursively defined data structures. Such modified data structures are usually referred to as "a tree with zipper" or "a list with zipper" to emphasize that the structure is conceptually a tree or list, while the zipper is a detail of the implementation. -A layman's explanation for a tree with zipper would be an ordinary computer filesystem with operations to go to parent (often cd ..), and the possibility to go downwards (cd subdirectory). The zipper is the pointer to the current path. Behind the scenes the zippers are efficient when making (functional) changes to a data structure, where a new, slightly changed, data structure is returned from an edit operation (instead of making a change in the current data structure). -Uses: -The zipper is often used where there is some concept of focus or of moving around in some set of data, since its semantics reflect that of moving around but in a functional non-destructive manner. -The zipper has been used in ...Xmonad, to manage focus and placement of windows ...Huet's papers cover a structural editor based on zippers and a theorem prover ...A filesystem (ZipperFS) written in Haskell offering "...transactional semantics; undo of any file and directory operation; snapshots; statically guaranteed the strongest, repeatable read, isolation mode for clients; pervasive copy-on-write for files and directories; built-in traversal facility; and just the right behavior for cyclic directory references." ...Clojure has extensive support for zippers.
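The list form of a zipper can be sketched as a pair (reversed prefix, suffix), with the focus at the head of the suffix. In a cons-list language each move and edit is O(1); the Python list copies below trade that away for clarity (function names are illustrative):

```python
def from_list(xs):
    return ([], list(xs))        # (reversed prefix, suffix); focus = suffix[0]

def focus(zipper):
    return zipper[1][0]

def move_right(zipper):          # shift focus one step right, non-destructively
    left, right = zipper
    return ([right[0]] + left, right[1:])

def move_left(zipper):           # shift focus one step left, non-destructively
    left, right = zipper
    return (left[1:], [left[0]] + right)

def replace(zipper, x):          # edit at the focus, returning a new zipper
    left, right = zipper
    return (left, [x] + right[1:])

def to_list(zipper):
    left, right = zipper
    return list(reversed(left)) + right
```

Each operation returns a new zipper rather than mutating the old one, mirroring the functional, non-destructive editing the definition describes.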
Unrolled linked list*
List data structure. a variation on the linked list which stores multiple elements in each node. It can dramatically increase cache performance, while decreasing the memory overhead associated with storing list metadata such as references. It is related to the B-tree.
Doubly Linked List: memory
Memory: O(n) — roughly 3 words per node (data, next, prev), versus 2 per node for a singly linked list
Heap Binary Tree: memory
Memory: O(n)
Red Black Tree: memory
Memory: O(n)
Adjacency list: memory
Memory: O(|V|+|E|)
Divide and Conquer (Recursive)
Merge/Quick Sorting
Kruskal's Algorithm
MST Builder/Greedy Algorithm which works by taking edges in order of their weight values, continuously adding edges to the tree if their addition doesn't create a cycle. Is generally slower than the other prominent Greedy Algorithm due to its need to check whether or not an edge is part of a cycle at each phase. Time complexity ElogE
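The procedure above can be sketched in Python with a path-compressing union-find to perform the cycle check (function and variable names are illustrative):

```python
def kruskal(n, edges):
    """edges: (weight, u, v) tuples over vertices 0..n-1.
    Sort edges by weight, then add each edge whose endpoints lie in
    different components, tracked with a union-find structure."""
    parent = list(range(n))

    def find(x):                    # path-compressing find
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):   # O(E log E), dominated by the sort
        ru, rv = find(u), find(v)
        if ru != rv:                # skip edges that would close a cycle
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total
```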
Heap Binary Tree: max-heapify, build-max-heap, heap-sort
Max-heapify: O(lg n) Build-max-heap: O(n) Heap-sort: O(n lg n)
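These operations can be sketched in Python: sift-down does the per-node O(lg n) work, build-max-heap applies it bottom-up in O(n) overall, and heap-sort repeatedly extracts the max (names are illustrative):

```python
def sift_down(a, i, n):
    """Restore the max-heap property for the subtree rooted at i (heap size n)."""
    while True:
        largest, l, r = i, 2 * i + 1, 2 * i + 2
        if l < n and a[l] > a[largest]:
            largest = l
        if r < n and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def heap_sort(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # build-max-heap: O(n) overall
        sift_down(a, i, n)
    for end in range(n - 1, 0, -1):       # n-1 extractions, O(lg n) each
        a[0], a[end] = a[end], a[0]       # move current max to its final spot
        sift_down(a, 0, end)
    return a
```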
Binary Search Tree: max, min, successor, predecessor
Max: O(h) Min: O(h) Successor: O(h) Predecessor: O(h)
Red Black Tree: max, min, successor, predecessor
Max: O(lg n) Min: O(lg n) Successor: O(lg n) Predecessor: O(lg n)
Growth of functions
Measure of algorithm's time requirement as a function of problem size. Demonstrates contrast in growth rates.
edges in complete undirected graph
N*(N-1)/2 = O(N^2)
Disjoint Sets
Never allowed to break apart sets. Also known as Union/Find Algorithm. Each node has a parent pointer which points to a representative for each set
Lower Bound on the complexity of pairwise comparisons
No comparison-based sorting algorithm can guarantee fewer than ~N lg N compares in the worst case (the decision-tree lower bound)
ShellSort
Non-stable, in-place sort whose exact order of growth is unknown, though it is often quoted as about N^(6/5). Needs only O(1) extra space. Works as an extension of insertion sort: it gains speed by allowing exchanges of entries that are far apart, producing partially sorted arrays that are finished off quickly at the end with an insertion sort. The idea is to rearrange the array so that taking every h-th entry yields a sorted sequence; such an array is said to be h-sorted.
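A Python sketch of shell sort; Knuth's gap sequence (1, 4, 13, 40, ...) is used here as one common choice of shrinking gaps, not the only one:

```python
def shell_sort(a):
    """In-place, non-stable. h-sorts the array for a shrinking sequence of
    gaps; the final pass (gap 1) is plain insertion sort, fast because the
    array is already partially sorted."""
    n, gap = len(a), 1
    while gap < n // 3:
        gap = 3 * gap + 1            # Knuth's sequence: 1, 4, 13, 40, ...
    while gap >= 1:
        for i in range(gap, n):      # insertion sort among entries gap apart
            j = i
            while j >= gap and a[j] < a[j - gap]:
                a[j], a[j - gap] = a[j - gap], a[j]
                j -= gap
        gap //= 3
    return a
```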
Load Factor (LF)
Number of items / table size. For instance, a load factor of 1 means the number of items equals the number of slots (on average, one item per slot).
What is the Big O value of the Radix Sort?
O(d·n), where d is the number of digits in each key; since d is small and essentially constant, this is effectively O(n).
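A least-significant-digit radix sort sketch in Python for non-negative integers (the base and names are illustrative); each pass is a stable bucket distribution, one pass per digit:

```python
def radix_sort(nums, base=10):
    """LSD radix sort: d stable distribution passes, one per digit,
    so total work is O(d * (n + base))."""
    if not nums:
        return nums
    digits = len(str(max(nums)))     # d, the number of digit positions
    for p in range(digits):
        buckets = [[] for _ in range(base)]
        for x in nums:               # stable: equal digits keep their order
            buckets[(x // base ** p) % base].append(x)
        nums = [x for b in buckets for x in b]
    return nums
```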
HashMap complexity of basic operations:
O(1)
What is the best big O value of a hash search of a dictionary?
O(1)
What is the worst case time complexity for: Insert, lookup, and delete, for hash functions?
O(n). In the worst case every key hashes to the same bucket (or probe sequence), so insert, lookup, and delete may each scan all n items; with a good hash function the expected cost is O(1).
Quad-edge
Other data structure. a computer representation of the topology of a two-dimensional or three-dimensional map, that is, a graph drawn on a (closed) surface. The quad-edge data structure: 1.represents simultaneously both the map, its dual and mirror image. 2.can represent the most general form of a map, admitting vertices and faces of degree 1 and 2. 3.is a variant of the earlier winged edge data structure. -The fundamental idea behind the quad-edge structure is the recognition that a single edge, in a closed polygonal mesh topology, sits between exactly two faces and exactly two vertices. Thus, it can represent a dual of the graph simply by reversing the convention on what is a vertex and what is a face. |Uses| Much like Winged Edge, quad-edge structures are used in programs to store the topology of a 2D or 3D polygonal mesh. The mesh itself does not need to be closed in order to form a valid quad-edge structure. -Using a quad-edge structure, iterating through the topology is quite easy. Often, the interface to quad-edge topologies is through directed edges. This allows the two vertices to have explicit names (start and end), and this gives faces explicit names as well (left and right, relative to a person standing on start and looking in the direction of end). The four edges are also given names, based on the vertices and faces: start-left, start-right, end-left, and end-right. A directed edge can be reversed to generate the edge in the opposite direction. -Iterating around a particular face only requires having a single directed edge to which that face is on the left (by convention) and then walking through all of the start-left edges until the original edge is reached.
Symbol table
Other data structure. a data structure used by a language translator such as a compiler or interpreter, where each identifier in a program's source code is associated with information relating to its declaration or appearance in the source. |Implementation| A common implementation technique is to use a hash table. A compiler may use one large symbol table for all symbols or use separated, hierarchical symbol tables for different scopes. Trees, linear lists, and self-organizing lists can also be used to implement a symbol table. It also simplifies the classification of literals in tabular format. The symbol table is accessed by most phases of a compiler, from lexical analysis through optimization. |Uses| An object file will contain a symbol table of the identifiers it contains that are externally visible. During the linking of different object files, a linker will use these symbol tables to resolve any unresolved references. -A symbol table may only exist during the translation process, or it may be embedded in the output of that process for later exploitation, for example, during an interactive debugging session, or as a resource for formatting a diagnostic report during or after execution of a program. -While reverse engineering an executable, many tools refer to the symbol table to check what addresses have been assigned to global variables and known functions. If the symbol table has been stripped or cleaned out before being converted into an executable, tools will find it harder to determine addresses or understand anything about the program. -When accessing variables and allocating memory dynamically, the compiler must perform additional work, and the extended stack model requires the symbol table.
Lightmap
Other data structure. a data structure used in lightmapping, a form of surface caching in which the brightness of surfaces in a virtual scene is pre-calculated and stored in texture maps for later use. Lightmaps are most commonly applied to static objects in realtime 3d graphics applications, such as video games, in order to provide lighting effects such as global illumination at a relatively low computational cost.
Routing table (routing information base or RIB)
Other data structure. in computer networking, a routing table is a data table stored in a router or a networked computer that lists the routes to particular network destinations, and in some cases, metrics (distances) associated with those routes. The routing table contains information about the topology of the network immediately around it. The construction of routing tables is the primary goal of routing protocols. Static routes are entries made in a routing table by non-automatic means and which are fixed rather than being the result of some network topology "discovery" procedure.
Adjacency matrix: query for adjacency
Query for adjacency: O(1)
Rate of growth
Rate at which running time increases as a function of input
Single Rotation
Rotation preserves order. Inner children become the child of the node which was replaced.
Depth First Search
Runs in time proportional to the size of the graph, O(|V| + |E|), and can determine whether a graph has a cycle.
Prim's
Same overall algorithm as Dijkstra's except that it only considers lowest cost of single edge. Continually builds onto a tree with the cheapest cost edges.
What is a constructor?
Special method that is used to construct an object. Classes can have multiple constructors.
Kruskal's
Takes edges in sorted order by cost, creates many trees which join into one large tree.
Children (of trees)
The Ti subtrees of root node R. They are ordered by index: T0 < T1 < ... < Tn-1. T0 is the leftmost child; Tn-1 is the rightmost child.
Path
A sequence of vertices in which each consecutive pair is joined by an edge. (A single link between two vertices is an edge, not a path.)
Describe the various components of trees?
The nodes at each successive level of a tree are the children of the nodes at the previous level, and a node that has children is the parent of those children. For example, suppose node A is the parent of nodes B, C, D, and E. Since these children have the same parent, they are called siblings. They are also descendants of node A, and node A is their ancestor. Furthermore, if a node P lies deeper in A's subtree, then P is a descendant of A and A is an ancestor of P. If node P has no children, it is called a leaf. A node that is not a leaf—that is, one that has children—is called an interior node or nonleaf. Such a node is also a parent.
Pre-Order Traversal
The process of systematically visiting every node in a tree once, starting with the root node, proceeding to the left along the tree and accessing the node when the "left" side of the node is encountered.
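For a binary tree, the traversal described above can be sketched recursively in Python (node class and names are illustrative): visit the node first, then its left subtree, then its right subtree.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def preorder(node):
    """Visit the node itself, then recurse left, then right."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)
```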
Heap-sort: time complexity
A naive bound for BuildHeap is O(n log n), since SiftDown is called for O(n) nodes. However, a node that is already close to the leaves sifts down quickly, so BuildHeap is actually O(n). PartialSorting(A[1..n], k): BuildHeap(A); for i from 1 to k: ExtractMax(). Total: O(n + k log n).
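Using Python's heapq as a stand-in for BuildHeap/ExtractMax, partial sorting of the k largest elements can be sketched as follows (negation simulates a max-heap, since heapq is a min-heap):

```python
import heapq

def partial_sort(a, k):
    """Largest k elements in descending order: build a heap in O(n),
    then pop k times at O(log n) each -> O(n + k log n) total."""
    heap = [-x for x in a]            # negate to get max-heap behavior
    heapq.heapify(heap)               # BuildHeap: O(n)
    return [-heapq.heappop(heap) for _ in range(k)]   # k ExtractMax calls
```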
Chromatic Number
The smallest number of colors needed for a vertex coloring of a graph (no two adjacent vertices share a color); the analogous quantity for edge colorings is the chromatic index
Double Hashing
This method uses a second hash function to attempt to generate a new valid location for the data
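The probe sequence under double hashing can be sketched in Python: the i-th probe lands at (h1(key) + i·h2(key)) mod table_size. The two hash functions below are hypothetical examples for small integer keys, not from the source:

```python
def probe_sequence(key, table_size, h1, h2, tries):
    """Indices visited under double hashing. h2 must never return 0,
    and its values should be coprime with table_size (prime table
    sizes make this easy), so every slot can eventually be reached."""
    step = h2(key)
    return [(h1(key) + i * step) % table_size for i in range(tries)]

# hypothetical example hash functions for integer keys
h1 = lambda k: k % 7
h2 = lambda k: 5 - (k % 5)    # in range 1..5, never 0
```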
Directed Acyclic Graph
A directed graph without cycles
Parent pointer implementation of trees
To store for each node only a pointer to that node's parent. "Given 2 nodes, are they in the same tree?"
Sequential
Tree Implementation that has the advantage of saving space because no pointers are stored.
Adjacent
Two vertices are __________ if they are joined by an edge.
Type erasure
Type erasure is any technique in which a single type can be used to represent a wide variety of types that share a common interface. In the C++ lands, the term type-erasure is strongly associated with the particular technique that uses templates in the interface and dynamic polymorphism in the implementation. 1. A union is the simplest form of type erasure. - It is bounded, and all participating types have to be mentioned at the point of declaration. 2. A void pointer is a low-level form of type erasure. Functionality is provided by pointers to functions that operate on void* after casting it back to the appropriate type. - It is unbounded, but type unsafe. 3. Virtual functions offer a type safe form of type erasure. The underlying void and function pointers are generated by the compiler. - It is unbounded, but intrusive. - Has reference semantics. 4. A template based form of type erasure provides a natural C++ interface. The implementation is built on top of dynamic polymorphism. - It is unbounded and unintrusive. - Has value semantics.
What is polymorphism?
A variable of a superclass type can refer to an object of a subclass. Person p = new Student ( ) is allowed but Student s = new Person ( ) isn't. This is because a student is a person but a person isn't necessarily a student.
What is Separate Chaining?
When a hash table references a bucket that is a linked chain. These must have the specific keys in addition to their values.
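A minimal separately chained hash table in Python (class and method names are illustrative); note that each chain entry stores the key alongside the value, exactly as the definition requires:

```python
class ChainedHashTable:
    """Each bucket is a list of (key, value) pairs; colliding keys
    share a bucket, so keys must be stored alongside their values."""
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # replace existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self._bucket(key):     # O(chain length) scan
            if k == key:
                return v
        raise KeyError(key)
```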
Binary Search Tree
Will have a best-case height of lg N, which is also its expected height. In the worst case its height is N, making it similar to a linked list. Works by inserting keys smaller than a node to its left and larger keys to its right, traversing down the tree until reaching an empty spot to insert. Search and insert each cost N in the worst case; an average-case search takes about 1.39 lg N compares.
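The smaller-left / greater-right rule above can be sketched in Python (class and function names are illustrative); both operations walk one root-to-leaf path, hence the O(h) cost:

```python
class BSTNode:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Walk down: smaller keys go left, larger go right. O(h) per insert."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Follow the same comparisons downward until found or None. O(h)."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root
```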
primitive data types
a basic type or a built-in type. In most programming languages, all basic data types are built-in. In addition, many languages also provide a set of composite data types. Opinions vary as to whether a built-in type that is not basic should be considered "primitive". Examples: Boolean (true or false); Character; Floating-point (single-precision real number values); Double (a wider floating-point size); Integer (integral or fixed-precision values); Enumerated type (a small set of uniquely-named values).
Complete Binary Tree
a binary tree that is completely filled, with the possible exception of the bottom level, which is filled from left to right
lists
a collection S of elements stored in a certain linear order
Stack frame
a collection of data about a subroutine call
list
a collection of data items arranged in a certain linear order
Binomial Queues
a collection of heap-ordered trees
function
a collection of statements to perform a task
Data structure
a common format for storing large volumes of related data, which is an implementation of an abstract data type
Hash table
a fixed-size array that provides nearly constant-time search
friend function or class
a function or class that is not a member of a class but has access to the private members of that class
minimum spanning tree
a spanning tree of a connected weighted graph that connects every vertex while minimizing the total edge weight
Weighted graph
a graph that has a data value labelled on each edge
Directed graph
a graph where the relationship between vertices is one-way
Undirected graph
a graph where the relationship between vertices is two-way
abstract data type (ADT)
a mathematical model of the data objects that make up a data type as well as the functions that operate on these objects. A data type is defined by its behavior (semantics), specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This contrasts with data structures, which are concrete representations of data, and are the point of view of an implementer, not a user. Examples: Container, List, Associative array, Multimap, Set, Multiset, Stack, Queue, Double-ended queue, Priority queue, Tree, Graph.
two's complement
a mathematical operation on binary numbers, as well as a binary signed number representation based on this operation. Its wide use in computing makes it the most important example of a radix complement. The two's-complement system has the advantage that the fundamental arithmetic operations of addition, subtraction, and multiplication are identical to those for unsigned binary numbers (as long as the inputs are represented in the same number of bits and any overflow beyond those bits is discarded from the result). This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic. Also, zero has only a single representation, obviating the subtleties associated with negative zero, which exists in ones'-complement systems.
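The masking arithmetic can be demonstrated in Python, assuming an 8-bit width for illustration; discarding overflow with a mask is exactly the "identical to unsigned" property described above:

```python
def twos_complement(value, bits=8):
    """Two's-complement bit pattern of value at the given width."""
    return value & ((1 << bits) - 1)      # the mask discards overflow bits

def from_twos_complement(pattern, bits=8):
    """Interpret a bit pattern as a signed two's-complement integer."""
    if pattern & (1 << (bits - 1)):       # sign bit set -> negative value
        return pattern - (1 << bits)
    return pattern
```

For example, adding the 8-bit patterns for 5 and -3 with plain unsigned addition (and masking the overflow) yields the pattern for 2.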
instance variable
a member variable in a class that each object has a copy of
abstract method
a method that has no definition (a method without a body)
Child
a node directly below another node (its parent) in the hierarchy; every node except the root is a child
algorithm
a step-by-step procedure for performing some tasks.
data structure
a systematic way of organising, accessing and updating data
Chaining
a collision-resolution technique in which all key/value pairs that hash to the same index are stored in a list kept at that index (colliding keys share an index rather than receiving a unique one)
Anagram
a word, phrase, or name formed by rearranging the letters of another, such as cinema, formed from iceman.
18. In the Java standard library, the Iterator interface defines a method called "remove". What remove() is supposed to do (when it's implemented, which is optional), is remove the element that was most recently returned by next(). If you were to implement remove() what would you expect the worst case time complexity to be for one call to remove() for each of the following data structures (assume there are N elements in the data structure). a. A LinkedList (assume doubly-linked, if that matters) b. A Hash Table (assume the table size is M and chaining is used, if that matters)
a. A LinkedList (doubly-linked): O(1). The iterator already holds a reference to the node most recently returned by next(), and unlinking a node from a doubly linked list is a constant number of pointer updates. b. A Hash Table (table size M, chaining): O(n). In the worst case all n elements hash to the same chain; if the chain is singly linked, removing the most recently returned element requires walking the chain to find its predecessor.
Container
abstract data type. a class, a data structure,[1][2] or an abstract data type (ADT) whose instances are collections of other objects. In other words, they store objects in an organized way that follows specific access rules. The size of the container depends on the number of objects (elements) it contains. Containers can be looked at in three ways: -access, that is the way of accessing the objects of the container. In the case of arrays, access is done with the array index. In the case of stacks, access is done according to the LIFO (last in, first out) order (alternative name: FILO, first in, last out)[3] and in the case of queues it is done according to the FIFO (first in, first out) order (alternative name: LILO, last in, last out); -storage, that is the way of storing the objects of the container; -traversal, that is the way of traversing the objects of the container. -Container classes are expected to implement methods to do the following: create an empty container; insert objects into the container; delete objects from the container; delete all the objects in the container (clear); access the objects in the container; access the number of objects in the container (size). Containers are sometimes implemented in conjunction with iterators. -Containers can be divided into two groups: single value containers and associative containers.
Multimap (sometimes also multihash)
abstract data type. a generalization of a map or associative array abstract data type in which more than one value may be associated with and returned for a given key. Both map and multimap are particular cases of containers (for example, see C++ Standard Template Library containers). Often the multimap is implemented as a map with lists or sets as the map values. Languages with multimap support include C++, Dart, Java, OCaml, and Scala.
Queue
abstract data type. a particular kind of abstract data type or collection in which the entities in the collection are kept in order and the principal (or only) operations on the collection are the addition of entities to the rear terminal position, known as enqueue, and removal of entities from the front terminal position, known as dequeue. This makes the queue a First-In-First-Out (FIFO) data structure. In a FIFO data structure, the first element added to the queue will be the first one to be removed. This is equivalent to the requirement that once a new element is added, all elements that were added before have to be removed before the new element can be removed. Often a peek or front operation is also entered, returning the value of the front element without dequeuing it. A queue is an example of a linear data structure, or more abstractly a sequential collection. -Queues provide services in computer science, transport, and operations research where various entities such as data, objects, persons, or events are stored and held to be processed later. In these contexts, the queue performs the function of a buffer. -Queues are common in computer programs, where they are implemented as data structures coupled with access routines, as an abstract data structure or in object-oriented languages as classes. Common implementations are circular buffers and linked lists.
Tree data structure
abstract data type. a widely used abstract data type (ADT)—or data structure implementing this ADT—that simulates a hierarchical tree structure, with a root value and subtrees of children with a parent node, represented as a set of linked nodes. -a (possibly non-linear) data structure made up of nodes or vertices and edges without having any cycle. The tree with no nodes is called the null or empty tree. A tree that is not empty consists of a root node and potentially many levels of additional nodes that form a hierarchy. -A tree data structure can be defined recursively (locally) as a collection of nodes (starting at a root node), where each node is a data structure consisting of a value, together with a list of references to nodes (the "children"), with the constraints that no reference is duplicated, and none points to the root. -Alternatively, a tree can be defined abstractly as a whole (globally) as an ordered tree, with a value assigned to each node. Both these perspectives are useful: while a tree can be analyzed mathematically as a whole, when actually represented as a data structure it is usually represented and worked with separately by node (rather than as a list of nodes and an adjacency list of edges between nodes, as one may represent a digraph, for instance).
Associative array (or map, symbol table, or dictionary)
abstract data type. an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears at most once in the collection. -Operations associated with this data type allow: the addition of a pair to the collection the removal of a pair from the collection the modification of an existing pair the lookup of a value associated with a particular key -The dictionary problem is a classic computer science problem: the task of designing a data structure that maintains a set of data during 'search', 'delete', and 'insert' operations. The two major solutions to the dictionary problem are a hash table or a search tree. In some cases it is also possible to solve the problem using directly addressed arrays, binary search trees, or other more specialized structures. -Many programming languages include associative arrays as primitive data types, and they are available in software libraries for many others. Content-addressable memory is a form of direct hardware-level support for associative arrays. -Associative arrays have many applications including such fundamental programming patterns as memoization and the decorator pattern.
Set
abstract data type. an abstract data type that can store certain values, without any particular order, and no repeated values. It is a computer implementation of the mathematical concept of a finite set. Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set. -Some set data structures are designed for static or frozen sets that do not change after they are constructed. Static sets allow only query operations on their elements — such as checking whether a given value is in the set, or enumerating the values in some arbitrary order. Other variants, called dynamic or mutable sets, allow also the insertion and deletion of elements from the set. -An abstract data structure is a collection, or aggregate, of data. The data may be booleans, numbers, characters, or other data structures. If one considers the structure yielded by packaging or indexing, there are four basic data structures: unpackaged, unindexed: bunch packaged, unindexed: set unpackaged, indexed: string (sequence) packaged, indexed: list (array)
Double-ended queue (dequeue, often abbreviated to deque, pronounced deck)
abstract data type. an abstract data type that generalizes a queue, for which elements can be added to or removed from either the front (head) or back (tail). It is also often called a head-tail linked list, though properly this refers to a specific data structure implementation. -This differs from the queue abstract data type or First-In-First-Out List (FIFO), where elements can only be added to one end and removed from the other. This general data class has some possible sub-types: 1) An input-restricted deque is one where deletion can be made from both ends, but insertion can be made at one end only. 2) An output-restricted deque is one where insertion can be made at both ends, but deletion can be made from one end only. -Both the basic and most common list types in computing, queues and stacks can be considered specializations of deques, and can be implemented using deques. -There are at least two common ways to efficiently implement a deque: with a modified dynamic array or with a doubly linked list. The dynamic array approach uses a variant of a dynamic array that can grow from both ends, sometimes called array deques. These array deques have all the properties of a dynamic array, such as constant-time random access, good locality of reference, and inefficient insertion/removal in the middle, with the addition of amortized constant-time insertion/removal at both ends, instead of just one end. Three common implementations include: 1) Storing deque contents in a circular buffer, and only resizing when the buffer becomes full. This decreases the frequency of resizings. 2) Allocating deque contents from the center of the underlying array, and resizing the underlying array when either end is reached. This approach may require more frequent resizings and waste more space, particularly when elements are only inserted at one end. 3) Storing contents in multiple smaller arrays, allocating additional arrays at the beginning or end as needed. 
Indexing is implemented by keeping a dynamic array containing pointers to each of the smaller arrays.
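A minimal sketch of deque behavior using Python's `collections.deque`, which supports amortized constant-time insertion and removal at both ends:

```python
from collections import deque

# A deque supports O(1) insertion/removal at both the front and the back.
d = deque()
d.append(1)           # add to back (tail)
d.append(2)
d.appendleft(0)       # add to front (head); d is now [0, 1, 2]
back = d.pop()        # removes 2 from the back
front = d.popleft()   # removes 0 from the front
print(list(d))        # [1]
```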
Graph data structure
abstract data type. an abstract data type that is meant to implement the undirected graph and directed graph concepts from mathematics. -A graph data structure consists of a finite (and possibly mutable) set of vertices or nodes or points, together with a set of unordered pairs of these vertices for an undirected graph or a set of ordered pairs for a directed graph. These pairs are known as edges, arcs, or lines for an undirected graph and as arrows, directed edges, directed arcs, or directed lines for a directed graph. The vertices may be part of the graph structure, or may be external entities represented by integer indices or references. -A graph data structure may also associate to each edge some edge value, such as a symbolic label or a numeric attribute (cost, capacity, length, etc.). -Different data structures for the representation of graphs are used in practice: 1) Adjacency list: Vertices are stored as records or objects, and every vertex stores a list of adjacent vertices. This data structure allows the storage of additional data on the vertices. Additional data can be stored if edges are also stored as objects, in which case each vertex stores its incident edges and each edge stores its incident vertices. 2) Adjacency matrix: A two-dimensional matrix, in which the rows represent source vertices and columns represent destination vertices. Data on edges and vertices must be stored externally. Only the cost for one edge can be stored between each pair of vertices. 3) Incidence matrix: A two-dimensional Boolean matrix, in which the rows represent the vertices and columns represent the edges. The entries indicate whether the vertex at a row is incident to the edge at a column. -see the time-complexity table for operation costs -Adjacency lists are generally preferred because they efficiently represent sparse graphs. 
An adjacency matrix is preferred if the graph is dense, that is, the number of edges |E| is close to the number of vertices squared, |V|², or if one must be able to quickly look up whether there is an edge connecting two vertices. -some properties of abstract data types: order, unique, associative
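A sketch of the two most common representations, built for the same small directed graph (vertex names are illustrative):

```python
# The same 3-vertex directed graph in two representations.
vertices = ["A", "B", "C"]

# 1) Adjacency list: each vertex maps to a list of its neighbors.
adj_list = {"A": ["B", "C"], "B": ["C"], "C": []}

# 2) Adjacency matrix: rows are source vertices, columns are destinations.
index = {v: i for i, v in enumerate(vertices)}
adj_matrix = [[0] * len(vertices) for _ in vertices]
for src, neighbors in adj_list.items():
    for dst in neighbors:
        adj_matrix[index[src]][index[dst]] = 1

# Edge lookup is O(1) in the matrix, O(degree) in the list.
print(adj_matrix[index["A"]][index["C"]])  # 1: edge A -> C exists
```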
List (or sequence)
abstract data type. an abstract data type that represents an ordered sequence of values, where the same value may occur more than once. An instance of a list is a computer representation of the mathematical concept of a finite sequence; the (potentially) infinite analog of a list is a stream.[1] Lists are a basic example of containers, as they contain other values. If the same value occurs multiple times, each occurrence is considered a distinct item. -Implementation of the list data structure may provide some of the following operations: a constructor for creating an empty list; an operation for testing whether or not a list is empty; an operation for prepending an entity to a list; an operation for appending an entity to a list; an operation for determining the first component (or the "head") of a list; an operation for referring to the list consisting of all the components of a list except for its first (this is called the "tail" of the list).
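The operations above can be sketched on a cons-style linked list, where a list is either `None` (empty) or a `(head, tail)` pair (function names here are illustrative, not from any standard library):

```python
# A list is None (empty) or a (head, tail) pair.
def empty():            return None
def is_empty(lst):      return lst is None
def prepend(x, lst):    return (x, lst)          # O(1)
def head(lst):          return lst[0]
def tail(lst):          return lst[1]
def append(lst, x):     # O(n): rebuild the list with x at the end
    return (x, None) if is_empty(lst) else (lst[0], append(lst[1], x))

lst = prepend(1, prepend(2, empty()))   # list [1, 2]
lst = append(lst, 3)                    # list [1, 2, 3]
print(head(lst), head(tail(lst)))       # 1 2
```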
Stack
abstract data type. an abstract data type that serves as a collection of elements, with two principal operations: push, which adds an element to the collection, and pop, which removes the most recently added element that was not yet removed. The order in which elements come off a stack gives rise to its alternative name, LIFO (for last in, first out). Additionally, a peek operation may give access to the top without modifying the stack. -The name "stack" for this type of structure comes from the analogy to a set of physical items stacked on top of each other, which makes it easy to take an item off the top of the stack, while getting to an item deeper in the stack may require taking off multiple other items first. -Considered as a linear data structure, or more abstractly a sequential collection, the push and pop operations occur only at one end of the structure, referred to as the top of the stack. This makes it possible to implement a stack as a singly linked list and a pointer to the top element.
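A minimal sketch: a Python list used as a stack, with push and pop occurring only at one end (the top):

```python
# Push/pop at the end of a Python list are amortized O(1).
stack = []
stack.append("a")      # push
stack.append("b")      # push
top = stack[-1]        # peek: "b", stack unchanged
popped = stack.pop()   # pop: removes "b" (last in, first out)
print(top, popped, stack)  # b b ['a']
```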
Priority queue
abstract data type. an abstract data type which is like a regular queue or stack data structure, but where additionally each element has a "priority" associated with it. In a priority queue, an element with high priority is served before an element with low priority. If two elements have the same priority, they are served according to their order in the queue. -While priority queues are often implemented with heaps (it is the maximally efficient implementation of it), they are conceptually distinct from heaps. A priority queue is an abstract concept like "a list" or "a map"; just as a list can be implemented with a linked list or an array, a priority queue can be implemented with a heap or a variety of other methods such as an unordered array. -One can imagine a priority queue as a modified queue, but when one would get the next element off the queue, the highest-priority element is retrieved first. -Stacks and queues may be modeled as particular kinds of priority queues. As a reminder, here is how stacks and queues behave: 1) stack - elements are pulled in last-in first-out-order (e.g., a stack of papers) 2) queue - elements are pulled in first-in first-out-order (e.g., a line in a cafeteria) In a stack, the priority of each inserted element is monotonically increasing; thus, the last element inserted is always the first retrieved. In a queue, the priority of each inserted element is monotonically decreasing; thus, the first element inserted is always the first retrieved.
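A sketch of a heap-backed priority queue using Python's `heapq` (a binary min-heap, so the smallest priority value is served first):

```python
import heapq

# Entries are (priority, value); the min-heap serves the lowest number first.
pq = []
heapq.heappush(pq, (2, "medium"))
heapq.heappush(pq, (1, "urgent"))
heapq.heappush(pq, (3, "later"))
print(heapq.heappop(pq))  # (1, 'urgent') — highest priority retrieved first
```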
Multiset (or bag)
abstract data type. similar to a set but allows repeated ("equal") values (duplicates). This is used in two distinct senses: either equal values are considered identical, and are simply counted, or equal values are considered equivalent, and are stored as distinct items.
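The "counted" sense of a multiset is exactly what Python's `collections.Counter` provides:

```python
from collections import Counter

# Equal values are considered identical and simply counted.
bag = Counter(["apple", "banana", "apple"])
bag["apple"] += 1                 # add another occurrence
print(bag["apple"], bag["banana"])  # 3 1
```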
In a deque which operations are synonymous with enqueue and dequeue?
addToBack() and removeFront(), respectively.
Which deque operation is synonymous with push()?
addToFront(T)
Decision Tree
an abstraction used to prove lower bounds
Dijkstra's algorithm
an algorithm for finding the shortest paths between nodes in a graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.[1][2] The algorithm exists in many variants; Dijkstra's original variant found the shortest path between two nodes,[2] but a more common variant fixes a single node as the "source" node and finds shortest paths from the source to all other nodes in the graph, producing a shortest-path tree. For a given source node in the graph, the algorithm finds the shortest path between that node and every other.[3] It can also be used for finding the shortest paths from a single node to a single destination node by stopping the algorithm once the shortest path to the destination node has been determined. For example, if the nodes of the graph represent cities and edge path costs represent driving distances between pairs of cities connected by a direct road, Dijkstra's algorithm can be used to find the shortest route between one city and all other cities. As a result, the shortest-path algorithm is widely used in network routing protocols, most notably IS-IS and Open Shortest Path First (OSPF). It is also employed as a subroutine in other algorithms such as Johnson's. Dijkstra's original algorithm does not use a min-priority queue and runs in time O(|V|²) (where |V| is the number of nodes). The idea of this algorithm is also given in (Leyzorek et al. 1957). The implementation based on a min-priority queue implemented by a Fibonacci heap, running in O(|E| + |V| log |V|) (where |E| is the number of edges), is due to (Fredman & Tarjan 1984). This is asymptotically the fastest known single-source shortest-path algorithm for arbitrary directed graphs with unbounded non-negative weights. 
However, specialized cases (such as bounded/integer weights, directed acyclic graphs, etc.) can indeed be improved further with specialized variants. In some fields, artificial intelligence in particular, Dijkstra's algorithm or a variant of it is known as uniform-cost search and formulated as an instance of the more general idea of best-first search.
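A sketch of the single-source variant using a binary-heap priority queue (so roughly O((|V| + |E|) log |V|) rather than the Fibonacci-heap bound quoted above); the graph format is assumed to be a dict of adjacency lists with edge weights:

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths over non-negative edge weights.
    graph: {node: [(neighbor, weight), ...]}"""
    dist = {source: 0}
    pq = [(0, source)]                     # min-heap of (distance, node)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                       # skip stale heap entries
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd               # relax the edge u -> v
                heapq.heappush(pq, (nd, v))
    return dist

graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2)], "C": []}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 1, 'C': 3}
```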
Greedy Algorithm
an algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage, in the hope of finding the global optimum. An example would be Kruskal's algorithm.
greedy algorithm
an algorithmic paradigm that follows the problem solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum.
Counting Radix Sort
an alternative implementation of radix sort that avoids using ArrayLists
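A sketch of this idea for non-negative integers: each digit pass is a counting sort over fixed-size count/offset arrays, so no per-bucket lists (ArrayLists) are allocated:

```python
def counting_radix_sort(nums, base=10):
    """LSD radix sort for non-negative ints; each digit pass is a
    counting sort using a fixed count array instead of bucket lists."""
    if not nums:
        return nums
    exp = 1
    while max(nums) // exp > 0:
        counts = [0] * base
        for x in nums:                    # count occurrences of this digit
            counts[x // exp % base] += 1
        for d in range(1, base):          # prefix sums -> ending offsets
            counts[d] += counts[d - 1]
        out = [0] * len(nums)
        for x in reversed(nums):          # reversed keeps the pass stable
            counts[x // exp % base] -= 1
            out[counts[x // exp % base]] = x
        nums = out
        exp *= base
    return nums

print(counting_radix_sort([170, 45, 75, 90, 2, 802, 24, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```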
bucket array
an array A of size N, where each cell of A is thought of as a "bucket" (that is, a collection of key-value pairs)
Primary Clustering
any key that hashes into the cluster will require several attempts to resolve the collision
Bitboard
array data structure. A bitboard, often used for boardgames such as chess, checkers, othello and word games, is a specialization of the bit array data structure, where each bit represents a game position or state, designed for optimization of speed and/or memory or disk use in mass calculations. Bits in the same bitboard relate to each other in the rules of the game, often forming a game position when taken together. Other bitboards are commonly used as masks to transform or answer queries about positions. The "game" may be any game-like system where information is tightly packed in a structured form with "rules" affecting how the individual units or pieces relate. -Bitboards are used in many of the world's highest-rated chess playing programs such as Houdini, Stockfish, and Critter. They help the programs analyze chess positions with few CPU instructions and hold a massive number of positions in memory efficiently. -Bitboards allow the computer to answer some questions about game state with one logical operation. For example, if a chess program wants to know if the white player has any pawns in the center of the board (center four squares) it can just compare a bitboard for the player's pawns with one for the center of the board using a logical AND operation. If there are no center pawns then the result will be zero. -Query results can also be represented using bitboards. For example, the query "What are the squares between X and Y?" can be represented as a bitboard. These query results are generally pre-calculated, so that a program can simply retrieve a query result with one memory load. However, as a result of the massive compression and encoding, bitboard programs are not easy for software developers to either write or debug.
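A sketch of the "pawns in the center" query from the paragraph above, with one bit per square of an 8x8 board (the pawn placement here is a made-up example):

```python
# 64-bit bitboard: bit (rank * 8 + file) represents one square,
# with file, rank in 0..7 (bit 0 = a1, bit 63 = h8).
def square_bit(file, rank):
    return 1 << (rank * 8 + file)

# Hypothetical white pawns on d4 and a2.
white_pawns = square_bit(3, 3) | square_bit(0, 1)

# Mask for the four center squares: d4, e4, d5, e5.
center = (square_bit(3, 3) | square_bit(4, 3) |
          square_bit(3, 4) | square_bit(4, 4))

# One logical AND answers the query; nonzero means "yes".
print(bool(white_pawns & center))  # True: there is a pawn on d4
```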
Parallel array (SoA)
array data structure. a data structure for representing arrays of records. It keeps a separate, homogeneous data array for each field of the record, each having the same number of elements. Then, objects located at the same index in each array are implicitly the fields of a single record. Pointers from one object to another are replaced by array indices. This contrasts with the normal approach of storing all fields of each record together in memory (also known as AoS). For example, one might declare an array of 100 names, each a string, and 100 ages, each an integer, associating each name with the age that has the same index.
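The names-and-ages example from the definition, sketched directly:

```python
# Parallel arrays (SoA): one homogeneous array per field, linked by index.
names = ["Ada", "Grace", "Alan"]
ages = [36, 45, 41]

# Record i is implicitly (names[i], ages[i]); "pointers" between
# records are just array indices.
i = names.index("Grace")
print(names[i], ages[i])  # Grace 45
```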
Circular buffer (or circular queue, cyclic buffer or ring buffer)
array data structure. a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. This structure lends itself easily to buffering data streams. -a data buffer (or just buffer) is a region of a physical memory storage used to temporarily store data while it is being moved from one place to another. -the useful property of a circular buffer is that it does not need to have its elements shuffled around when one is consumed. (If a non-circular buffer were used then it would be necessary to shift all elements when one is consumed.) In other words, the circular buffer is well-suited as a FIFO buffer while a standard, non-circular buffer is well suited as a LIFO buffer. -Circular buffering makes a good implementation strategy for a queue that has a fixed maximum size. Should a maximum size be adopted for a queue, then a circular buffer is a completely ideal implementation; all queue operations are constant time. However, expanding a circular buffer requires shifting memory, which is comparatively costly. For arbitrarily expanding queues, a linked list approach may be preferred instead. -In some situations, an overwriting circular buffer can be used, e.g. in multimedia. If the buffer is used as the bounded buffer in the producer-consumer problem then it is probably desirable for the producer (e.g., an audio generator) to overwrite old data if the consumer (e.g., the sound card) is momentarily unable to keep up. Also, the LZ77 family of lossless data compression algorithms operates on the assumption that strings seen more recently in a data stream are more likely to occur soon in the stream. Implementations store the most recent data in a circular buffer. 
-A circular buffer can be implemented using four pointers, or two pointers and two integers: buffer start in memory buffer end in memory, or buffer capacity start of valid data (index or pointer) end of valid data (index or pointer), or amount of data currently in the buffer (integer) -A circular-buffer implementation may be optimized by mapping the underlying buffer to two contiguous regions of virtual memory. (Naturally, the underlying buffer's length must then equal some multiple of the system's page size.) Reading from and writing to the circular buffer may then be carried out with greater efficiency by means of direct memory access; those accesses which fall beyond the end of the first virtual-memory region will automatically wrap around to the beginning of the underlying buffer. When the read offset is advanced into the second virtual-memory region, both offsets—read and write—are decremented by the length of the underlying buffer.
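A minimal sketch of the "start index plus count" implementation described above (class and method names are illustrative):

```python
class RingBuffer:
    """Fixed-capacity FIFO circular buffer: a start index plus a count."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0       # index of the oldest element
        self.count = 0      # number of elements currently stored

    def enqueue(self, x):
        if self.count == len(self.buf):
            raise OverflowError("buffer full")
        tail = (self.head + self.count) % len(self.buf)  # wrap around
        self.buf[tail] = x
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("buffer empty")
        x = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)      # wrap around
        self.count -= 1
        return x

rb = RingBuffer(3)
for v in (1, 2, 3):
    rb.enqueue(v)
print(rb.dequeue(), rb.dequeue())  # 1 2 — FIFO order, no element shifting
```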
Dope vector
array data structure. a data structure used to hold information about a data object, e.g. an array, especially its memory layout. -A dope vector typically contains information about the type of array element, rank of an array, the extents of an array, and the stride of an array as well as a pointer to the block in memory containing the array elements. -It is often used in compilers to pass entire arrays between procedures in a high level language like Fortran. -The dope vector includes an identifier, a length, a parent address, and a next child address. The identifier is an assigned name and may be unused. The length is the amount of allocated storage to this vector from the end of the dope vector that contains data of use to the internal processes of the computer. This length is called the offset, span, or vector length. The parent and child references are absolute memory addresses, or register and offset settings to the parent or child depending on the type of computer. -Dope vectors are often managed internally by the operating system and allow the processor to allocate and de-allocate storage in specific segments as needed. -Dope vectors may also have a status bit that tells the system if they are active; if it is not active it can be reallocated when needed. Using this technology the computer can perform a more granular memory management.
Iliffe vector (or display)
array data structure. a data structure used to implement multi-dimensional arrays. An Iliffe vector for an n-dimensional array (where n ≥ 2) consists of a vector (or 1-dimensional array) of pointers to an (n − 1)-dimensional array. They are often used to avoid the need for expensive multiplication operations when performing address calculation on an array element. They can also be used to implement jagged arrays, such as triangular arrays, triangular matrices and other kinds of irregularly shaped arrays. The data structure is named after John K. Iliffe. Their disadvantages include the need for multiple chained pointer indirections to access an element, and the extra work required to determine the next row in an n-dimensional array to allow an optimising compiler to prefetch it. Both of these are a source of delays on systems where the CPU is significantly faster than main memory. The Iliffe vector for a 2-dimensional array is simply a vector of pointers to vectors of data, i.e., the Iliffe vector represents the columns of an array where each column element is a pointer to a row vector. Multidimensional arrays in languages such as Java, Python (multidimensional lists), Ruby, Visual Basic .NET, Perl, PHP, JavaScript, Objective-C (when using NSArray, not a row-major C-style array), Swift, and Atlas Autocode are implemented as Iliffe vectors. Iliffe vectors are contrasted with dope vectors in languages such as Fortran, which contain the stride factors and offset values for the subscripts in each dimension.
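As noted above, Python's multidimensional lists are themselves Iliffe vectors, which a short sketch makes concrete:

```python
# A Python "2-D array" of lists is an Iliffe vector: an outer vector
# of pointers to row vectors. Rows may even be jagged (different lengths).
rows, cols = 3, 4
matrix = [[0] * cols for _ in range(rows)]

# Element access uses two chained indirections: row pointer, then element.
matrix[1][2] = 7
print(matrix[1])  # [0, 0, 7, 0]
```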
Hashed array tree (HAT)
array data structure. a dynamic array data structure published by Edward Sitarski in 1996,[1] maintaining an array of separate memory fragments (or "leaves") to store the data elements, unlike simple dynamic arrays which maintain their data in one contiguous memory area. Its primary objective is to reduce the amount of element copying due to automatic array resizing operations, and to improve memory usage patterns. -Whereas simple dynamic arrays based on geometric expansion waste linear (Ω(n)) space, where n is the number of elements in the array, hashed array trees waste only order O(√n) storage space. An optimization of the algorithm allows data copying to be eliminated completely, at a cost of increasing the wasted space. -It can perform access in constant (O(1)) time, though slightly slower than simple dynamic arrays. The algorithm has O(1) amortized performance when appending a series of objects to the end of a hashed array tree. Contrary to its name, it does not use hash functions. -As defined by Sitarski, a hashed array tree has a top-level directory containing a power-of-two number of leaf arrays. All leaf arrays are the same size as the top-level directory. This structure superficially resembles a hash table with array-based collision chains, which is the basis for the name hashed array tree. A full hashed array tree can hold m² elements, where m is the size of the top-level directory. The use of powers of two enables faster physical addressing through bit operations instead of arithmetic operations of quotient and remainder[1] and ensures the O(1) amortized performance of the append operation in the presence of an occasional global array copy while expanding. -Brodnik et al. presented a dynamic array algorithm with a similar space wastage profile to hashed array trees. Brodnik's implementation retains previously allocated leaf arrays, with a more complicated address calculation function as compared to hashed array trees.
Gap buffer
array data structure. a dynamic array that allows efficient insertion and deletion operations clustered near the same location. Gap buffers are especially common in text editors, where most changes to the text occur at or near the current location of the cursor. The text is stored in a large buffer in two contiguous segments, with a gap between them for inserting new text. Moving the cursor involves copying text from one side of the gap to the other (sometimes copying is delayed until the next operation that changes the text). Insertion adds new text at the end of the first segment; deletion deletes it. Text in a gap buffer is represented as two strings, which take very little extra space and which can be searched and displayed very quickly, compared to more sophisticated data structures such as linked lists. However, operations at different locations in the text and ones that fill the gap (requiring a new gap to be created) may require copying most of the text, which is especially inefficient for large files. The use of gap buffers is based on the assumption that such recopying occurs rarely enough that its cost can be amortized over the more common cheap operations. This makes the gap buffer a simpler alternative to the rope for use in text editors such as Emacs.
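A minimal sketch of the two-segment idea: the text on each side of the cursor is kept in its own stack, so edits at the cursor are O(1) and only cursor movement copies characters across the gap (class and method names are illustrative):

```python
class GapBuffer:
    """Text stored as two stacks; the gap (the cursor) sits between them."""
    def __init__(self, text=""):
        self.before = list(text)  # characters left of the cursor
        self.after = []           # characters right of the cursor, reversed

    def insert(self, ch):         # O(1): fill in at the gap's left edge
        self.before.append(ch)

    def delete(self):             # O(1): backspace at the cursor
        if self.before:
            self.before.pop()

    def left(self):               # O(1): move cursor one character left
        if self.before:
            self.after.append(self.before.pop())

    def text(self):
        return "".join(self.before) + "".join(reversed(self.after))

gb = GapBuffer("helo")
gb.left()        # cursor now between "hel" and "o"
gb.insert("l")   # cheap insertion near the cursor
print(gb.text())  # hello
```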
Bitmap
array data structure. a mapping from some domain (for example, a range of integers) to bits, that is, values which are zero or one. It is also called a bit array or bitmap index. In computer graphics, when the domain is a rectangle (indexed by two coordinates) a bitmap gives a way to store a binary image, that is, an image in which each pixel is either black or white (or any two colors). -The more general term pixmap refers to a map of pixels, where each one may store more than two colors, thus using more than one bit per pixel. Often bitmap is used for this as well. In some contexts, the term bitmap implies one bit per pixel, while pixmap is used for images with multiple bits per pixel -A bitmap is a type of memory organization or image file format used to store digital images. The term bitmap comes from the computer programming terminology, meaning just a map of bits, a spatially mapped array of bits. Now, along with pixmap, it commonly refers to the similar concept of a spatially mapped array of pixels. Raster images in general may be referred to as bitmaps or pixmaps, whether synthetic or photographic, in files or memory. -Many graphical user interfaces use bitmaps in their built-in graphics subsystems -Similarly, most other image file formats, such as JPEG, TIFF, PNG, and GIF, also store bitmap images (as opposed to vector graphics), but they are not usually referred to as bitmaps, since they use compressed formats internally. -In typical uncompressed bitmaps, image pixels are generally stored with a color depth of 1, 4, 8, 16, 24, 32, 48, or 64 bits per pixel. Pixels of 8 bits and fewer can represent either grayscale or indexed color. An alpha channel (for transparency) may be stored in a separate bitmap, where it is similar to a grayscale bitmap, or in a fourth channel that, for example, converts 24-bit images to 32 bits per pixel. 
-The bits representing the bitmap pixels may be packed or unpacked (spaced out to byte or word boundaries), depending on the format or device requirements. Depending on the color depth, a pixel in the picture will occupy at least n/8 bytes, where n is the bit depth.
Sparse matrix
array data structure. a matrix in which most of the elements are zero. By contrast, if most of the elements are nonzero, then the matrix is considered dense. The number of zero-valued elements divided by the total number of elements (e.g., m × n for an m × n matrix) is called the sparsity of the matrix (which is equal to 1 minus the density of the matrix). -Conceptually, sparsity corresponds to systems which are loosely coupled. Consider a line of balls connected by springs from one to the next: this is a sparse system as only adjacent balls are coupled. By contrast, if the same line of balls had springs connecting each ball to all other balls, the system would correspond to a dense matrix. The concept of sparsity is useful in combinatorics and application areas such as network theory, which have a low density of significant data or connections. -Large sparse matrices often appear in scientific or engineering applications when solving partial differential equations. -When storing and manipulating sparse matrices on a computer, it is beneficial and often necessary to use specialized algorithms and data structures that take advantage of the sparse structure of the matrix. Operations using standard dense-matrix structures and algorithms are slow and inefficient when applied to large sparse matrices as processing and memory are wasted on the zeroes. Sparse data is by nature more easily compressed and thus requires significantly less storage. Some very large sparse matrices are infeasible to manipulate using standard dense-matrix algorithms.
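One of the simplest specialized formats is dictionary-of-keys (DOK), sketched here: only nonzero entries are stored, keyed by (row, column):

```python
# DOK sparse matrix: store only nonzero entries of a mostly-zero 4x4 matrix.
dok = {(0, 2): 5.0, (3, 1): -1.0}

def get(i, j):
    return dok.get((i, j), 0.0)   # absent entries read as zero

nnz = len(dok)                    # number of nonzeros actually stored
sparsity = 1 - nnz / (4 * 4)      # fraction of zero entries
print(get(0, 2), get(1, 1), sparsity)  # 5.0 0.0 0.875
```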
Heightmap (or heightfield)
array data structure. a raster image used to store values, such as surface elevation data, for display in 3D computer graphics. A heightmap can be used in bump mapping to calculate where this 3D data would create shadow in a material, in displacement mapping to displace the actual geometric position of points over the textured surface, or for terrain where the heightmap is converted into a 3D mesh. -A heightmap contains one channel interpreted as a distance of displacement or "height" from the "floor" of a surface and sometimes visualized as luma of a grayscale image, with black representing minimum height and white representing maximum height. When the map is rendered, the designer can specify the amount of displacement for each unit of the height channel, which corresponds to the "contrast" of the image. Heightmaps can be stored by themselves in existing grayscale image formats, with or without specialized metadata, or in specialized file formats such as Daylon Leveller, GenesisIV and Terragen documents. -One may also exploit the use of individual color channels to increase detail. For example, a standard RGB 8-bit image can only show 256 values of grey and hence only 256 heights. By using colors, a greater number of heights can be stored (for a 24-bit image, 256³ = 16,777,216 heights can be represented (256⁴ = 4,294,967,296 if the alpha channel is also used)). This technique is especially useful where height varies slightly over a large area. Using only grey values, because the heights must be mapped to only 256 values, the rendered terrain appears flat, with "steps" in certain places. -Heightmap of Earth's surface (including water and ice) in equirectangular projection, normalized as 8-bit grayscale Heightmaps are commonly used in geographic information systems, where they are called digital elevation models.
Matrix
array data structure. a rectangular array of numbers, symbols, or expressions, arranged in rows and columns. The individual items in a matrix are called its elements or entries. -Applications of matrices are found in most scientific fields. In every branch of physics, including classical mechanics, optics, electromagnetism, quantum mechanics, and quantum electrodynamics, they are used to study physical phenomena, such as the motion of rigid bodies. In computer graphics, they are used to project a 3D model onto a two-dimensional screen. In probability theory and statistics, stochastic matrices are used to describe sets of probabilities; for instance, they are used within the PageRank algorithm that ranks the pages in a Google search.[5] Matrix calculus generalizes classical analytical notions such as derivatives and exponentials to higher dimensions. -A major branch of numerical analysis is devoted to the development of efficient algorithms for matrix computations, a subject that is centuries old and is today an expanding area of research. Matrix decomposition methods simplify computations, both theoretically and practically. Algorithms that are tailored to particular matrix structures, such as sparse matrices and near-diagonal matrices, expedite computations in finite element method and other computations. Infinite matrices occur in planetary theory and in atomic theory. A simple example of an infinite matrix is the matrix representing the derivative operator, which acts on the Taylor series of a function.
Sorted array
array data structure. an array data structure in which each element is sorted in numerical, alphabetical, or some other order, and placed at equally spaced addresses in computer memory. It is typically used in computer science to implement static lookup tables to hold multiple values which have the same data type. Sorting an array is useful in organising data in ordered form and recovering it rapidly. -There are many well-known methods by which an array can be sorted, which include, but are not limited to: selection sort, bubble sort, insertion sort, merge sort, quicksort, heapsort, and counting sort. -Sorted arrays are the most space-efficient data structure with the best locality of reference for sequentially stored data. -Elements within a sorted array are found using a binary search, in O(log n); thus sorted arrays are suited for cases when one needs to be able to look up elements quickly, e.g. as a set or multiset data structure. This complexity for lookups is the same as for self-balancing binary search trees. -In some data structures, an array of structures is used. In such cases, the same sorting methods can be used to sort the structures according to some key as a structure element; for example, sorting records of students according to roll numbers or names or grades. -If one is using a sorted dynamic array, then it is possible to insert and delete elements. The insertion and deletion of elements in a sorted array executes at O(n), due to the need to shift all the elements following the element to be inserted or deleted; in comparison a self-balancing binary search tree inserts and deletes at O(log n). In the case where elements are deleted or inserted at the end, a sorted dynamic array can do this in amortized O(1) time while a self-balancing binary search tree always operates at O(log n). 
-Elements in a sorted array can be looked up by their index (random access) at O(1) time, an operation taking O(log n) or O(n) time for more complex data structures.
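A sketch of the key trade-off using Python's `bisect` module: O(log n) binary-search lookup, but O(n) ordered insertion because elements must shift:

```python
import bisect

arr = [10, 20, 30, 40]

# O(log n) lookup: binary search for 30.
i = bisect.bisect_left(arr, 30)
found = i < len(arr) and arr[i] == 30

# O(n) insertion: insort keeps the order by shifting later elements.
bisect.insort(arr, 25)
print(found, arr)  # True [10, 20, 25, 30, 40]
```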
Bit array (also known as bitmap, bitset, bit string, or bit vector)
array data structure. an array data structure that compactly stores bits. It can be used to implement a simple set data structure. A bit array is effective at exploiting bit-level parallelism in hardware to perform operations quickly. A typical bit array stores kw bits, where w is the number of bits in the unit of storage, such as a byte or word, and k is some nonnegative integer. If w does not divide the number of bits to be stored, some space is wasted due to internal fragmentation. -A bit array is a mapping from some domain (almost always a range of integers) to values in the set {0, 1}. The values can be interpreted as dark/light, absent/present, locked/unlocked, valid/invalid, et cetera. The point is that there are only two possible values, so they can be stored in one bit. As with other arrays, the access to a single bit can be managed by applying an index to the array. Assuming its size (or length) to be n bits, the array can be used to specify a subset of the domain (e.g. {0, 1, 2, ..., n−1}), where a 1-bit indicates the presence and a 0-bit the absence of a number in the set. This set data structure uses about n/w words of space, where w is the number of bits in each machine word. Whether the least significant bit (of the word) or the most significant bit indicates the smallest-index number is largely irrelevant, but the former tends to be preferred (on little-endian machines).
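A sketch of the set-of-integers use described above, packing bits into w-bit words (the word size and helper names are illustrative):

```python
# A bit array over words of W bits, representing a subset of {0..31}.
W = 8                    # bits per word, small for illustration
bits = [0] * 4           # 4 words = 32 bits

def add(n):      bits[n // W] |= 1 << (n % W)     # set bit n
def remove(n):   bits[n // W] &= ~(1 << (n % W))  # clear bit n
def contains(n): return bool(bits[n // W] >> (n % W) & 1)

add(3); add(17); remove(3)
print(contains(17), contains(3))  # True False
```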
Sparse array
array data structure. an array in which most of the elements have the default value (usually 0 or null). The occurrence of zero-value elements in a large array is inefficient for both computation and storage. An array in which there is a large number of zero elements is referred to as being sparse. -In the case of sparse arrays, one can ask for a value from an "empty" array position. If one does this, then for an array of numbers, a value of zero should be returned, and for an array of objects, a value of null should be returned. -A naive implementation of an array may allocate space for the entire array, but in the case where there are few non-default values, this implementation is inefficient. Typically the algorithm used instead of an ordinary array is determined by other known features (or statistical features) of the array. For instance, the sparsity may be known in advance, or the elements may be arranged according to some function (e.g., the elements occur in blocks). A heap memory allocator in a program might choose to store regions of blank space in a linked list rather than storing all of the allocated regions in, say, a bit array. -An obvious question that might be asked is why we need a linked list to represent a sparse array if we can represent it easily using a normal array. The answer to this question lies in the fact that while representing a sparse array as a normal array, a lot of space is allocated for zero or null elements. For example, consider the following array declaration: double arr[1000][1000] -When we define this array, enough space for 1,000,000 doubles is allocated. If each double requires 8 bytes of memory, this array will require 8 million bytes of memory. Because this is a sparse array, most of its elements will have a value of zero (or null). Hence, defining this array will soak up all this space and waste memory (compared to an array in which memory has been allocated only for the nonzero elements). 
An effective way to overcome this problem is to represent the array using a linked list which requires less memory as only elements having non-zero value are stored. This involves a time-space trade-off: though less memory is used, average access and insertion time becomes linear in the number of elements stored because the previous elements in the list must be traversed to find the desired element. A normal array has constant access and insertion time. -A sparse array as a linked list contains nodes linked to each other. In a one-dimensional sparse array, each node includes the non-zero element's "index" (position), the element's "value", and a node pointer "next" (for linking to the next node). Nodes are linked in order as per the index. In the case of a two-dimensional sparse array, each node contains a row index, a column index (which together give its position), a value at that position and a pointer to the next node.
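The one-dimensional linked-list representation described above can be sketched like this (class names are illustrative; reads of "empty" positions return the default 0, and both get and set walk the list, which is the time cost of the space savings):

```python
class Node:
    def __init__(self, index, value, next=None):
        self.index, self.value, self.next = index, value, next

class SparseArray:
    def __init__(self):
        self.head = None

    def set(self, index, value):
        prev, cur = None, self.head
        while cur is not None and cur.index < index:   # nodes kept in index order
            prev, cur = cur, cur.next
        if cur is not None and cur.index == index:
            cur.value = value                          # overwrite existing entry
            return
        node = Node(index, value, cur)
        if prev is None:
            self.head = node
        else:
            prev.next = node

    def get(self, index):
        cur = self.head
        while cur is not None and cur.index < index:
            cur = cur.next
        return cur.value if cur is not None and cur.index == index else 0
```

Only the non-zero entries consume memory, no matter how large the index range is.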
Lookup table
array data structure. an array that replaces runtime computation with a simpler array indexing operation. The savings in terms of processing time can be significant, since retrieving a value from memory is often faster than undergoing an "expensive" computation or input/output operation. The tables may be precalculated and stored in static program storage, calculated (or "pre-fetched") as part of a program's initialization phase (memoization), or even stored in hardware in application-specific platforms. Lookup tables are also used extensively to validate input values by matching against a list of valid (or invalid) items in an array and, in some programming languages, may include pointer functions (or offsets to labels) to process the matching input. -Examples: 1) Simple lookup in an array, an associative array or a linked list (unsorted list): this is known as a linear search or brute-force search, each element being checked for equality in turn and the associated value, if any, used as a result of the search. This is often the slowest search method unless frequently occurring values occur early in the list. For a one-dimensional array or linked list, the lookup is usually to determine whether or not there is a match with an 'input' data value. 2) Binary search in an array or an associative array (sorted list): an example of a "divide and conquer" algorithm, binary search involves each element being found by determining which half of the table a match may be found in and repeating until either success or failure. This is only possible if the list is sorted, but it gives good performance even if the list is lengthy. 3) Trivial hash function: for a trivial hash function lookup, the unsigned raw data value is used directly as an index into a one-dimensional table to extract a result. For small ranges, this can be among the fastest lookups, even exceeding binary search speed, with zero branches and constant-time execution.
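Example (3), the trivial hash function, can be sketched with a precomputed table of per-byte bit counts (the choice of bit counting as the "expensive" computation is illustrative):

```python
# Built once, e.g. during a program's initialization phase.
POPCOUNT = [bin(i).count("1") for i in range(256)]

def popcount_byte(b):
    # The raw byte value indexes the table directly: one load, no branches.
    return POPCOUNT[b]
```

Every later call replaces a loop over eight bits with a single array access.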
Control table
array data structure. tables that control the control flow or play a major part in program control. There are no rigid rules about the structure or content of a control table—its qualifying attribute is its ability to direct control flow in some way through "execution" by a processor or interpreter. The design of such tables is sometimes referred to as table-driven design[1][2] (although this typically refers to generating code automatically from external tables rather than direct run-time tables). In some cases, control tables can be specific implementations of finite-state-machine-based automata-based programming. If there are several hierarchical levels of control table they may behave in a manner equivalent to UML state machines -control flow (or alternatively, flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an imperative programming language from a declarative programming language. Within an imperative programming language, a control flow statement is a statement whose execution results in a choice being made as to which of two or more paths should be followed. For non-strict functional languages, functions and language constructs exist to achieve the same result, but they are not necessarily called control flow statements.
hash table load factor
currentNumberOfEntries/tableSize
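As a tiny sketch of the formula (the 0.75 resize threshold is a common convention assumed here for illustration; it is not part of the card):

```python
def load_factor(current_number_of_entries, table_size):
    return current_number_of_entries / table_size

def needs_resize(entries, table_size, threshold=0.75):
    # many hash table implementations grow the table past a fixed load factor
    return load_factor(entries, table_size) >= threshold
```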
Queap
b-tree data structure. a priority queue data structure. The data structure allows insertions and deletions of arbitrary elements, as well as retrieval of the highest-priority element. Each deletion takes amortized time logarithmic in the number of items that have been in the structure for a longer time than the removed item. Insertions take constant amortized time. -The data structure consists of a doubly linked list and a 2-4 tree data structure, each modified to keep track of its minimum-priority element. The basic operation of the structure is to keep newly inserted elements in the doubly linked list, until a deletion would remove one of the list items, at which point they are all moved into the 2-4 tree. The 2-4 tree stores its elements in insertion order, rather than the more conventional priority-sorted order. -Both the data structure and its name were devised by John Iacono and Stefan Langerman.
2-3-4 tree (also called a 2-4 tree)
b-tree data structure. a self-balancing data structure that is commonly used to implement dictionaries. -The numbers mean a tree where every node with children (internal node) has either two, three, or four child nodes: 1) a 2-node has one data element, and if internal has two child nodes; 2) a 3-node has two data elements, and if internal has three child nodes; 3) a 4-node has three data elements, and if internal has four child nodes. -2-3-4 trees are B-trees of order 4;[1] like B-trees in general, they can search, insert and delete in O(log n) time. One property of a 2-3-4 tree is that all external nodes are at the same depth. -2-3-4 trees are an isometry of red-black trees, meaning that they are equivalent data structures. In other words, for every 2-3-4 tree, there exists at least one red-black tree with data elements in the same order. Moreover, insertion and deletion operations on 2-3-4 trees that cause node expansions, splits and merges are equivalent to the color-flipping and rotations in red-black trees. Introductions to red-black trees usually introduce 2-3-4 trees first, because they are conceptually simpler. 2-3-4 trees, however, can be difficult to implement in most programming languages because of the large number of special cases involved in operations on the tree. Red-black trees are simpler to implement,[2] so tend to be used instead.
Dancing tree
b-tree data structure. a tree data structure similar to B+ trees. It was invented by Hans Reiser, for use by the Reiser4 file system. As opposed to self-balancing binary search trees that attempt to keep their nodes balanced at all times, dancing trees only balance their nodes when flushing data to a disk (either because of memory constraints or because a transaction has completed).[1] The idea behind this is to speed up file system operations by delaying optimization of the tree and only writing to disk when necessary, as writing to disk is thousands of times slower than writing to memory. Also, because this optimization is done less often than with other tree data structures, the optimization can be more extensive. In some sense, this can be considered to be a self-balancing binary search tree that is optimized for storage on a slow medium, in that the on-disc form will always be balanced but will get no mid-transaction writes; doing so eases the difficulty (at the time) of adding and removing nodes, and instead performs these (slow) rebalancing operations at the same time as the (much slower) write to the storage medium. However, a (negative) side effect of this behavior is witnessed in cases of unexpected shutdown, incomplete data writes, and other occurrences that may prevent the final (balanced) transaction from completing. In general, dancing trees will pose a greater difficulty for data recovery from incomplete transactions than a normal tree; though this can be addressed by either adding extra transaction logs or developing an algorithm to locate data on disk not previously present, then going through with the optimizations once more before continuing with any other pending operations/transactions.
2-3 tree
b-tree data structure. a tree data structure, where every node with children (internal node) has either two children (2-node) and one data element or three children (3-node) and two data elements. According to Knuth, "a B-tree of order 3 is a 2-3 tree." Nodes on the outside of the tree (leaf nodes) have no children and one or two data elements.[1][2] 2-3 trees were invented by John Hopcroft in 1970.
10. Which of these methods can be used to obtain set of all keys in a map? a) getAll() b) getKeys() c) keyall() d) keySet()
d) keySet()
B+ tree
b-tree data structure. an n-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves.[1] The root may be either a leaf or a node with two or more children.[2] -A B+ tree can be viewed as a B-tree in which each node contains only keys (not key-value pairs), and to which an additional level is added at the bottom with linked leaves. -The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context — in particular, filesystems. This is primarily because unlike binary search trees, B+ trees have very high fanout (number of pointers to child nodes in a node,[1] typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree. -The ReiserFS, NSS, XFS, JFS, ReFS, and BFS filesystems all use this type of tree for metadata indexing; BFS also uses B+ trees for storing directories. NTFS uses B+ trees for directory indexing. EXT4 uses extent trees (a modified B+ tree data structure) for file extent indexing.[3] Relational database management systems such as IBM DB2,[4] Informix,[4] Microsoft SQL Server,[4] Oracle 8,[4] Sybase ASE,[4] and SQLite[5] support this type of tree for table indices. Key-value database management systems such as CouchDB[6] and Tokyo Cabinet[7] support this type of tree for data access.
Fusion tree
b-tree data structure. multiway tree data structure. a type of tree data structure that implements an associative array on w-bit integers. When operating on a collection of n key-value pairs, it uses O(n) space and performs searches in O(log_w n) time, which is asymptotically faster than a traditional self-balancing binary search tree, and also better than the van Emde Boas tree for large values of w. It achieves this speed by exploiting certain constant-time operations that can be done on a machine word. Fusion trees were invented in 1990 by Michael Fredman and Dan Willard.[1] -Several advances have been made since Fredman and Willard's original 1990 paper. In 1999[2] it was shown how to implement fusion trees under a model of computation in which all of the underlying operations of the algorithm belong to AC0, a model of circuit complexity that allows addition and bitwise Boolean operations but disallows the multiplication operations used in the original fusion tree algorithm. A dynamic version of fusion trees using hash tables was proposed in 1996[3] which matched the original structure's O(log_w n) runtime in expectation. Another dynamic version using exponential trees was proposed in 2007[4] which yields worst-case runtimes of O(log_w n + log log u) per operation, where u is the size of the largest key. It remains open whether dynamic fusion trees can achieve O(log_w n) per operation with high probability. -A fusion tree is essentially a B-tree with branching factor w^(1/5) (any small exponent is also possible), which gives it a height of O(log_w n). To achieve the desired runtimes for updates and queries, the fusion tree must be able to search a node containing up to w^(1/5) keys in constant time. This is done by compressing ("sketching") the keys so that all can fit into one machine word, which in turn allows comparisons to be done in parallel.
Cuckoo Hashing
based on the concept of giving each key two possible spots in the table, one per hash function. A lookup only ever checks those two spots. On insertion, the item goes into one of its spots; if both are occupied, the resident item is kicked out and moved to its own alternate spot, which may displace another item in turn, and so on until every item has a home (or, after too many evictions, the table is rehashed).
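A minimal insert sketch under these rules (the two hash functions here are toy choices for illustration; a real table would use independent hash functions and rehash on failure):

```python
SIZE = 11

def h1(key):
    return key % SIZE

def h2(key):
    return (key * 7 + 3) % SIZE   # toy second hash function

def insert(table, key, max_evictions=32):
    slot = h1(key)
    for _ in range(max_evictions):
        if table[slot] is None:
            table[slot] = key
            return True
        table[slot], key = key, table[slot]              # kick out the resident
        slot = h2(key) if slot == h1(key) else h1(key)   # send it to its other spot
    return False   # probable eviction cycle: a real table would rehash and retry
```

Keys 4 and 15 collide at both of their candidate slots, and the eviction chain still places them both.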
Hamilton circuit
begins at vertex v, passes through every other vertex exactly once, and ends at vertex v
Euler Circuit
begins at vertex v, passes through every edge exactly once, and ends at vertex v.
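A standard consequence of this definition (Euler's theorem, assumed here rather than stated on the card): a connected graph has an Euler circuit exactly when every vertex has even degree, which is easy to check:

```python
from collections import defaultdict

def has_euler_circuit(edges):
    # edges is a list of (u, v) pairs; the graph is assumed connected
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(d % 2 == 0 for d in degree.values())
```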
Red-black tree
binary tree data structure. The leaf nodes of red-black trees do not contain data. These leaves need not be explicit in computer memory—a null child pointer can encode the fact that this child is a leaf—but it simplifies some algorithms for operating on red-black trees if the leaves really are explicit nodes. To save memory, sometimes a single sentinel node performs the role of all leaf nodes; all references from internal nodes to leaf nodes then point to the sentinel node. Red-black trees, like all binary search trees, allow efficient in-order traversal (that is: in the order Left-Root-Right) of their elements. The search-time results from the traversal from root to leaf, and therefore a balanced tree of n nodes, having the least possible tree height, results in O(log n) search time.
Binary search tree (BST, sometimes called ordered or sorted binary trees)
binary tree data structure. BSTs are a particular type of container: data structures that store "items" (such as numbers, names etc.) in memory. They allow fast lookup, addition and removal of items, and can be used to implement either dynamic sets of items, or lookup tables that allow finding an item by its key (e.g., finding the phone number of a person by name). -Binary search trees keep their keys in sorted order, so that lookup and other operations can use the principle of binary search: when looking for a key in a tree (or a place to insert a new key), they traverse the tree from root to leaf, making comparisons to keys stored in the nodes of the tree and deciding, based on the comparison, to continue searching in the left or right subtree. On average, this means that each comparison allows the operations to skip about half of the tree, so that each lookup, insertion or deletion takes time proportional to the logarithm of the number of items stored in the tree. This is much better than the linear time required to find items by key in an (unsorted) array, but slower than the corresponding operations on hash tables. -Several variants of the basic binary search tree have been studied in computer science.
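The root-to-leaf comparison walk described above, as a minimal sketch:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root                  # duplicate keys are ignored in this sketch

def search(root, key):
    # each comparison discards one subtree, i.e. roughly half of a balanced tree
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None
```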
Left-child right-sibling binary tree (or child-sibling representation, doubly chained tree or filial-heir chain)
binary tree data structure. Every multi-way or k-ary tree structure studied in computer science admits a representation as a binary tree. In a binary tree that represents a multi-way tree T, each node corresponds to a node in T and has two pointers: one to the node's first child, and one to its next sibling in T. The children of a node thus form a singly-linked list. -use cases: the LCRS representation fits two criteria: 1) minimal memory use matters (each node stores only two pointers, regardless of how many children it has); 2) random access to a node's children is not required. Case (1) arises when large multi-way trees are necessary, especially when the trees contain a large set of data. For example, if storing a phylogenetic tree, the LCRS representation might be suitable. Case (2) arises in specialized data structures in which the tree structure is used in very specific ways. For example, many types of heap data structures that use multi-way trees can be space-optimized by using the LCRS representation, because in heap data structures the most common operations tend to be 1) remove the root of a tree and process each of its children, or 2) join two trees together by making one tree a child of the other. Operation (1) is very efficient: the root's children already form a linked list, which can be traversed directly once the root is detached. Operation (2) is also efficient: making one tree a child of another is a constant-time prepend to the child list.
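A sketch of the two-pointer representation (names are illustrative); note that add_child is the O(1) prepend mentioned above, which is why new children appear first when iterating:

```python
class LCRSNode:
    def __init__(self, value):
        self.value = value
        self.first_child = None    # the "left child" pointer
        self.next_sibling = None   # the "right sibling" pointer

    def add_child(self, child):
        # prepend in O(1); children form a singly linked list
        child.next_sibling = self.first_child
        self.first_child = child

    def children(self):
        cur = self.first_child
        while cur is not None:
            yield cur
            cur = cur.next_sibling
```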
Cartesian tree*
binary tree data structure. a binary tree derived from a sequence of numbers; it can be uniquely defined from the properties that it is heap-ordered and that a symmetric (in-order) traversal of the tree returns the original sequence. Introduced by Vuillemin (1980) in the context of geometric range searching data structures, Cartesian trees have also been used in the definition of the treap and randomized binary search tree data structures for binary search problems. The Cartesian tree for a sequence may be constructed in linear time using a stack-based algorithm for finding all nearest smaller values in a sequence. -Cartesian trees may be used as part of an efficient data structure for range minimum queries, a range searching problem involving queries that ask for the minimum value in a contiguous subsequence of the original sequence.[2] In a Cartesian tree, this minimum value may be found at the lowest common ancestor of the leftmost and rightmost values in the subsequence.
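The linear-time, stack-based construction mentioned above, sketched for a min-heap-ordered Cartesian tree (the stack holds the tree's rightmost path; popping until the nearest smaller value on the left places each new element):

```python
class Node:
    def __init__(self, value):
        self.value, self.left, self.right = value, None, None

def cartesian_tree(seq):
    stack = []                          # rightmost path, smallest value at the bottom
    for value in seq:
        node, last = Node(value), None
        while stack and stack[-1].value > value:
            last = stack.pop()          # pop until the nearest smaller value
        node.left = last                # popped subtree becomes the left child
        if stack:
            stack[-1].right = node
        stack.append(node)
    return stack[0] if stack else None

def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)
```

Each element is pushed and popped at most once, giving the linear time bound; the in-order traversal recovers the original sequence, and the minimum sits at the root.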
queue operations
enqueue(e), dequeue(), front(), size(), isEmpty()
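The five operations can be sketched with collections.deque so that both enqueue and dequeue are O(1):

```python
from collections import deque

class Queue:
    def __init__(self):
        self._items = deque()

    def enqueue(self, e):
        self._items.append(e)        # add at the back

    def dequeue(self):
        return self._items.popleft() # remove and return the front item

    def front(self):
        return self._items[0]        # peek at the front without removing

    def size(self):
        return len(self._items)

    def isEmpty(self):
        return not self._items
```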
Top tree
binary tree data structure. a data structure based on a binary tree for unrooted dynamic trees that is used mainly for various path-related operations. It allows simple divide-and-conquer algorithms. It has since been augmented to maintain dynamically various properties of a tree such as diameter, center and median. -A top tree R is defined for an underlying tree T and a set ∂T of at most two vertices called External Boundary Vertices.
Treap*
binary tree data structure. The treap was first described by Cecilia R. Aragon and Raimund Seidel in 1989;[1][2] its name is a portmanteau of tree and heap. It is a Cartesian tree in which each key is given a (randomly chosen) numeric priority. As with any binary search tree, the inorder traversal order of the nodes is the same as the sorted order of the keys. The structure of the tree is determined by the requirement that it be heap-ordered: that is, the priority number for any non-leaf node must be greater than or equal to the priority of its children. Thus, as with Cartesian trees more generally, the root node is the maximum-priority node, and its left and right subtrees are formed in the same manner from the subsequences of the sorted order to the left and right of that node. An equivalent way of describing the treap is that it could be formed by inserting the nodes highest-priority-first into a binary search tree without doing any rebalancing. Therefore, if the priorities are independent random numbers (from a distribution over a large enough space of possible priorities to ensure that two nodes are very unlikely to have the same priority) then the shape of a treap has the same probability distribution as the shape of a random binary search tree, a search tree formed by inserting the nodes without rebalancing in a randomly chosen insertion order. Because random binary search trees are known to have logarithmic height with high probability, the same is true for treaps.
Aragon and Seidel also suggest assigning higher priorities to frequently accessed nodes, for instance by a process that, on each access, chooses a random number and replaces the priority of the node with that number if it is higher than the previous priority. This modification would cause the tree to lose its random shape; instead, frequently accessed nodes would be more likely to be near the root of the tree, causing searches for them to be faster. Naor and Nissim[3] describe an application in maintaining authorization certificates in public-key cryptosystems.
Rope (or cord)
binary tree data structure. a data structure composed of smaller strings that is used for efficiently storing and manipulating a very long string. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently -A rope is a binary tree having leaf nodes that contain a short string. Each node has a weight value equal to the length of its string plus the sum of all leaf nodes' weight in its left subtree, namely the weight of a node is the total string length in its left subtree for a non-leaf node, or the string length of itself for a leaf node. Thus a node with two children divides the whole string into two parts: the left subtree stores the first part of the string. The right subtree stores the second part and its weight is the sum of the left child's weight and the length of its contained string. -The binary tree can be seen as several levels of nodes. The bottom level contains all the nodes that contain a string. Higher levels have fewer and fewer nodes. The top level consists of a single "root" node. The rope is built by putting the nodes with short strings in the bottom level, then attaching a random half of the nodes to parent nodes in the next level.
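A sketch of the weight rule above, used to index a character (0-based i; the Leaf and Concat class names are illustrative): descend left when i is below the node's weight, otherwise subtract the weight and descend right.

```python
class Leaf:
    def __init__(self, s):
        self.s = s
        self.weight = len(s)          # a leaf's weight is its own string length

class Concat:
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.weight = total_length(left)   # total string length in the left subtree

def total_length(node):
    if isinstance(node, Leaf):
        return len(node.s)
    return node.weight + total_length(node.right)

def index(node, i):
    if isinstance(node, Leaf):
        return node.s[i]
    if i < node.weight:
        return index(node.left, i)
    return index(node.right, i - node.weight)
```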
AA tree*
binary tree data structure. a form of balanced tree used for storing and retrieving ordered data efficiently. AA trees are named for Arne Andersson, their inventor. -AA trees are a variation of the red-black tree, a form of binary search tree which supports efficient addition and deletion of entries. Unlike red-black trees, red nodes on an AA tree can only be added as a right subchild. In other words, no red node can be a left sub-child. This results in the simulation of a 2-3 tree instead of a 2-3-4 tree, which greatly simplifies the maintenance operations. The maintenance algorithms for a red-black tree need to consider seven different shapes to properly balance the tree (see image in wiki). An AA tree on the other hand only needs to consider two shapes due to the strict requirement that only right links can be red (see image in wiki). -Balancing rotations: -Whereas red-black trees require one bit of balancing metadata per node (the color), AA trees require O(log(N)) bits of metadata per node, in the form of an integer "level". The following invariants hold for AA trees: 1. The level of every leaf node is one. 2. The level of every left child is exactly one less than that of its parent. 3. The level of every right child is equal to or one less than that of its parent. 4. The level of every right grandchild is strictly less than that of its grandparent. 5. Every node of level greater than one has two children. A link where the child's level is equal to that of its parent is called a horizontal link, and is analogous to a red link in the red-black tree. Individual right horizontal links are allowed, but consecutive ones are forbidden; all left horizontal links are forbidden. These are more restrictive constraints than the analogous ones on red-black trees, with the result that re-balancing an AA tree is procedurally much simpler than re-balancing a red-black tree. 
Insertions and deletions may transiently cause an AA tree to become unbalanced (that is, to violate the AA tree invariants). Only two distinct operations are needed for restoring balance: "skew" and "split". Skew is a right rotation to replace a subtree containing a left horizontal link with one containing a right horizontal link instead. Split is a left rotation and level increase to replace a subtree containing two or more consecutive right horizontal links with one containing two fewer consecutive right horizontal links. Implementation of balance-preserving insertion and deletion is simplified by relying on the skew and split operations to modify the tree only if needed, instead of making their callers decide whether to skew or split.
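The skew and split operations described above can be sketched as follows (levels as plain integers; a full insert or delete would apply these bottom-up along the search path):

```python
class Node:
    def __init__(self, key, level=1, left=None, right=None):
        self.key, self.level = key, level
        self.left, self.right = left, right

def skew(t):
    # right rotation: turn a left horizontal link into a right horizontal link
    if t is not None and t.left is not None and t.left.level == t.level:
        l = t.left
        t.left, l.right = l.right, t
        return l
    return t

def split(t):
    # left rotation + level increase: break up two consecutive right horizontal links
    if (t is not None and t.right is not None and t.right.right is not None
            and t.level == t.right.right.level):
        r = t.right
        t.right, r.left = r.left, t
        r.level += 1
        return r
    return t
```

Each helper returns the (possibly new) subtree root, so callers simply reassign: t = split(skew(t)).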
Pagoda
binary tree data structure. a priority queue implemented with a variant of a binary tree. The root points to its children, as in a binary tree. Every other node points back to its parent and down to its leftmost (if it is a right child) or rightmost (if it is a left child) descendant leaf. The basic operation is merge or meld, which maintains the heap property. An element is inserted by merging it as a singleton. The root is removed by merging its right and left children. Merging is bottom-up, merging the leftmost edge of one with the rightmost edge of the other.
Splay tree*
binary tree data structure. a self-adjusting binary search tree with the additional property that recently accessed elements are quick to access again. It performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For many sequences of non-random operations, splay trees perform better than other search trees, even when the specific pattern of the sequence is unknown. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.[1] -All normal operations on a binary search tree are combined with one basic operation, called splaying. Splaying the tree for a certain element rearranges the tree so that the element is placed at the root of the tree. One way to do this is to first perform a standard binary tree search for the element in question, and then use tree rotations in a specific fashion to bring the element to the top. Alternatively, a top-down algorithm can combine the search and the tree reorganization into a single phase.
Scapegoat tree
binary tree data structure. a self-balancing binary search tree, invented by Arne Andersson[1] and again by Igal Galperin and Ronald L. Rivest.[2] It provides worst-case O(log n) lookup time, and O(log n) amortized insertion and deletion time. -Unlike most other self-balancing binary search trees that provide worst case O(log n) lookup time, scapegoat trees have no additional per-node memory overhead compared to a regular binary search tree: a node stores only a key and two pointers to the child nodes. This makes scapegoat trees easier to implement and, due to data structure alignment, can reduce node overhead by up to one-third.
AVL tree
binary tree data structure. a self-balancing binary search tree. It was the first such data structure to be invented.[2] In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations. -The AVL tree is named after its two Soviet inventors, Georgy Adelson-Velsky and Evgenii Landis, who published it in their 1962 paper "An algorithm for the organization of information". -AVL trees are often compared with red-black trees because both support the same set of operations and take O(log n) time for the basic operations. For lookup-intensive applications, AVL trees are faster than red-black trees because they are more rigidly balanced. Similar to red-black trees, AVL trees are height-balanced. Both are in general not weight-balanced nor μ-balanced for any μ≤1⁄2;[5] that is, sibling nodes can have hugely differing numbers of descendants. -Both AVL trees and red-black trees are self-balancing binary search trees and they are very similar mathematically.[9] The operations to balance the trees are different, but both occur on the average in O(1) with maximum in O(log n). The real difference between the two is the limiting height. -AVL trees are more rigidly balanced than red-black trees, leading to faster retrieval but slower insertion and deletion.
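The height-difference invariant stated above can be checked directly (a simple validation sketch, not the rotation logic that maintains the invariant):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def is_avl(node):
    if node is None:
        return True
    balance = height(node.left) - height(node.right)  # the node's balance factor
    return abs(balance) <= 1 and is_avl(node.left) and is_avl(node.right)
```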
WAVL tree (or weak AVL tree)
binary tree data structure. a self-balancing binary search tree. WAVL trees are named after AVL trees, another type of balanced search tree, and are closely related both to AVL trees and red-black trees, which all fall into a common framework of rank balanced trees. Like other balanced binary search trees, WAVL trees can handle insertion, deletion, and search operations in time O(log n) per operation.[1][2] -WAVL trees are designed to combine some of the best properties of both AVL trees and red-black trees. One advantage of AVL trees over red-black trees is that they are more balanced: they have height at most log_φ(n) ≈ 1.44 log₂(n) (for a tree with n data items, where φ is the golden ratio), while red-black trees have a larger maximum height, 2 log₂(n). If a WAVL tree is created using only insertions, without deletions, then it has the same small height bound that an AVL tree has. On the other hand, red-black trees have the advantage over AVL trees that they perform less restructuring of their trees. In AVL trees, each deletion may require a logarithmic number of tree rotation operations, while red-black trees have simpler deletion operations that use only a constant number of tree rotations. WAVL trees, like red-black trees, use only a constant number of tree rotations, and the constant is even better than for red-black trees.[1][2] -WAVL trees were introduced by Haeupler, Sen & Tarjan (2015). The same authors also provided a common view of AVL trees, WAVL trees, and red-black trees as all being a type of rank-balanced tree.
Binary tree*
binary tree data structure. a tree data structure in which each node has at most two children, which are referred to as the left child and the right child. A recursive definition using just set theory notions is that a (non-empty) binary tree is a triple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set.[1] Some authors allow the binary tree to be the empty set as well.[2] -From a graph theory perspective, binary (and K-ary) trees as defined here are actually arborescences.[3] A binary tree may thus be also called a bifurcating arborescence[3]—a term which actually appears in some very old programming books,[4] before the modern computer science terminology prevailed. It is also possible to interpret a binary tree as an undirected, rather than a directed graph, in which case a binary tree is an ordered, rooted tree.[5] Some authors use rooted binary tree instead of binary tree to emphasize the fact that the tree is rooted, but as defined above, a binary tree is always rooted.[6] A binary tree is a special case of an ordered K-ary tree, where k is 2. -In computing, binary trees are seldom used solely for their structure. Much more typical is to define a labeling function on the nodes, which associates some value to each node.[7] Binary trees labelled this way are used to implement binary search trees and binary heaps, and are used for efficient searching and sorting. The designation of non-root nodes as left or right child even when there is only one child present matters in some of these applications, in particular it is significant in binary search trees.[8] In mathematics, what is termed binary tree can vary significantly from author to author. Some use the definition commonly used in computer science,[9] but others define it as every non-leaf having exactly two children and don't necessarily order (as left/right) the children either
Tango tree
binary tree data structure. a type of binary search tree proposed by Erik D. Demaine, Dion Harmon, John Iacono, and Mihai Patrascu in 2004. -It is an online binary search tree that achieves an O(log log n) competitive ratio relative to the optimal offline binary search tree, while only using O(log log n) additional bits of memory per node. This improved upon the previous best known competitive ratio, which was O(log n). -Tango trees work by partitioning a binary search tree into a set of preferred paths, which are themselves stored in auxiliary trees (so the tango tree is represented as a tree of trees). -First, we define for each node its preferred child, which informally is the most-recently touched child by a traditional binary search tree lookup. More formally, consider a subtree T, rooted at p, with children l (left) and r (right). We set r as the preferred child of p if the most recently accessed node in T is in the subtree rooted at r, and l as the preferred child otherwise. Note that if the most recently accessed node of T is p itself, then l is the preferred child by definition. -A preferred path is defined by starting at the root and following the preferred children until reaching a leaf node. Removing the nodes on this path partitions the remainder of the tree into a number of subtrees, and we recurse on each subtree (forming a preferred path from its root, which partitions the subtree into more subtrees).
2-3 heap
heap data structure. a data structure, a variation on the heap, designed by Tadao Takaoka in 1999. The structure is similar to the Fibonacci heap, and borrows from the 2-3 tree. -Time costs for some common heap operations are: 1) Delete-min takes O(log(n)) amortized time. 2) Decrease-key takes constant amortized time. 3) Insertion takes constant amortized time.
T-tree
binary tree data structure. a type of binary tree data structure that is used by main-memory databases, such as Datablitz, EXtremeDB, MySQL Cluster, Oracle TimesTen and MobileLite. -A T-tree is a balanced index tree data structure optimized for cases where both the index and the actual data are fully kept in memory, just as a B-tree is an index structure optimized for storage on block oriented secondary storage devices like hard disks. T-trees seek to gain the performance benefits of in-memory tree structures such as AVL trees while avoiding the large storage space overhead which is common to them. -T-trees do not keep copies of the indexed data fields within the index tree nodes themselves. Instead, they take advantage of the fact that the actual data is always in main memory together with the index so that they just contain pointers to the actual data fields. -The 'T' in T-tree refers to the shape of the node data structures in the original paper that first described this type of index. -Although T-trees seem to be widely used for main-memory databases, recent research indicates that they actually do not perform better than B-trees on modern hardware. -The main reason seems to be that the traditional assumption of memory references having uniform cost is no longer valid given the current speed gap between cache access and main memory access. -A T-tree node usually consists of pointers to the parent node, the left and right child node, an ordered array of data pointers and some extra control data. Nodes with two subtrees are called internal nodes, nodes without subtrees are called leaf nodes and nodes with only one subtree are named half-leaf nodes. A node is called the bounding node for a value if the value is between the node's current minimum and maximum value, inclusively. 
-For each internal node, leaf or half leaf nodes exist that contain the predecessor of its smallest data value (called the greatest lower bound) and one that contains the successor of its largest data value (called the least upper bound). Leaf and half-leaf nodes can contain any number of data elements from one to the maximum size of the data array. Internal nodes keep their occupancy between predefined minimum and maximum numbers of elements
Weight-balanced tree (WBTs)
binary tree data structure. a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences.[1] These trees were introduced by Nievergelt and Reingold in the 1970s as trees of bounded balance, or BB[α] trees.[2][3] Their more common name is due to Knuth.[4] Like other self-balancing trees, WBTs store bookkeeping information pertaining to balance in their nodes and perform rotations to restore balance when it is disturbed by insertion or deletion operations. Specifically, each node stores the size of the subtree rooted at the node, and the sizes of left and right subtrees are kept within some factor of each other. Unlike the balance information in AVL trees (which store the height of subtrees) and red-black trees (which store a fictional "color" bit), the bookkeeping information in a WBT is an actually useful property for applications: the number of elements in a tree is equal to the size of its root, and the size information is exactly the information needed to implement the operations of an order statistic tree, viz., getting the n'th largest element in a set or determining an element's index in sorted order.[5] -Weight-balanced trees are popular in the functional programming community and are used to implement sets and maps in MIT Scheme, SLIB and implementations of Haskell. -A weight-balanced tree is a binary search tree that stores the sizes of subtrees in the nodes. That is, a node has fields ...key, of any ordered type ...value (optional, only for mappings) ...left, right, pointer to node ...size, of type integer. By definition, the size of a leaf (typically represented by a nil pointer) is zero. The size of an internal node is the sum of sizes of its two children, plus one (size[n] = size[n.left] + size[n.right] + 1). Based on the size, one defines the weight as weight[n] = size[n] + 1
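The size bookkeeping described above (size[n] = size[n.left] + size[n.right] + 1, weight[n] = size[n] + 1) can be sketched as a node type with helper functions; the names `WBNode`, `sz`, `update`, and `weight` are illustrative assumptions, not from the source.

```cpp
#include <cstddef>

// Node of a weight-balanced tree: each node stores the size of the
// subtree rooted at it.
struct WBNode {
    int key;
    WBNode* left = nullptr;
    WBNode* right = nullptr;
    int size = 1;  // number of nodes in this subtree
};

// A leaf (nil pointer) has size zero by definition.
int sz(const WBNode* n) { return n ? n->size : 0; }

// Recompute a node's bookkeeping after its children change:
// size[n] = size[n.left] + size[n.right] + 1.
void update(WBNode* n) { n->size = sz(n->left) + sz(n->right) + 1; }

// weight[n] = size[n] + 1, as in the definition above.
int weight(const WBNode* n) { return sz(n) + 1; }
```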
Order statistic tree
binary tree data structure. a variant of the binary search tree (or more generally, a B-tree[1]) that supports two additional operations beyond insertion, lookup and deletion: -Select(i) — find the i'th smallest element stored in the tree -Rank(x) - find the rank of element x in the tree, i.e. its index in the sorted list of elements of the tree -Both operations can be performed in O(log n) worst case time when a self-balancing tree is used as the base data structure.
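Both Select and Rank follow from the stored subtree sizes; a minimal sketch assuming the tree is already size-augmented and the queried element exists (the names `OSNode`, `select`, and `rankOf` are illustrative):

```cpp
#include <cstddef>

// Node of an order statistic tree: a BST augmented with subtree sizes.
struct OSNode {
    int key;
    OSNode* left = nullptr;
    OSNode* right = nullptr;
    int size = 1;  // nodes in this subtree
};

int osSize(const OSNode* n) { return n ? n->size : 0; }

// Select(i): the i-th smallest key (1-indexed), assuming 1 <= i <= size.
int select(const OSNode* n, int i) {
    int r = osSize(n->left) + 1;       // rank of n within its subtree
    if (i == r) return n->key;
    if (i < r)  return select(n->left, i);
    return select(n->right, i - r);    // skip left subtree and n itself
}

// Rank(x): index of key x in sorted order (1-indexed), assuming x is present.
int rankOf(const OSNode* n, int x) {
    if (x == n->key) return osSize(n->left) + 1;
    if (x < n->key)  return rankOf(n->left, x);
    return osSize(n->left) + 1 + rankOf(n->right, x);
}
```

With a self-balancing base tree, both run in O(log n) worst case, as stated above.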
Self-balancing binary search tree (or height-balanced binary search tree)
binary tree data structure. any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions -These structures provide efficient implementations for mutable ordered lists, and can be used for other abstract data structures such as associative arrays, priority queues and sets. -The red-black tree, which is a type of self-balancing binary search tree, was called symmetric binary B-tree[2] and was renamed but can still be confused with the generic concept of self-balancing binary search tree because of the initials. -popular implementations: 2-3 tree AA tree AVL tree Red-black tree Scapegoat tree Splay tree Treap
Treap
binary tree data structure. heap data structure. The treap was first described by Cecilia R. Aragon and Raimund Seidel in 1989;[1][2] its name is a portmanteau of tree and heap. It is a Cartesian tree in which each key is given a (randomly chosen) numeric priority. As with any binary search tree, the inorder traversal order of the nodes is the same as the sorted order of the keys. The structure of the tree is determined by the requirement that it be heap-ordered: that is, the priority number for any non-leaf node must be greater than or equal to the priority of its children. Thus, as with Cartesian trees more generally, the root node is the maximum-priority node, and its left and right subtrees are formed in the same manner from the subsequences of the sorted order to the left and right of that node. -An equivalent way of describing the treap is that it could be formed by inserting the nodes highest-priority-first into a binary search tree without doing any rebalancing. Therefore, if the priorities are independent random numbers (from a distribution over a large enough space of possible priorities to ensure that two nodes are very unlikely to have the same priority) then the shape of a treap has the same probability distribution as the shape of a random binary search tree, a search tree formed by inserting the nodes without rebalancing in a randomly chosen insertion order. Because random binary search trees are known to have logarithmic height with high probability, the same is true for treaps. -Aragon and Seidel also suggest assigning higher priorities to frequently accessed nodes, for instance by a process that, on each access, chooses a random number and replaces the priority of the node with that number if it is higher than the previous priority. This modification would cause the tree to lose its random shape; instead, frequently accessed nodes would be more likely to be near the root of the tree, causing searches for them to be faster. 
-Naor and Nissim[3] describe an application in maintaining authorization certificates in public-key cryptosystems.
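The insertion scheme described above (plain BST insert, then restore heap order) is commonly implemented with rotations; a minimal max-heap-ordered sketch (the names `TNode`, `rotateLeft`, `rotateRight` are illustrative, and priorities are supplied by the caller rather than drawn randomly here):

```cpp
// Treap node: BST-ordered by key, max-heap-ordered by priority.
struct TNode {
    int key, priority;
    TNode *left = nullptr, *right = nullptr;
};

// Rotations preserve BST order while moving a child above its parent.
TNode* rotateRight(TNode* y) { TNode* x = y->left;  y->left = x->right;  x->right = y; return x; }
TNode* rotateLeft (TNode* x) { TNode* y = x->right; x->right = y->left;  y->left  = x; return y; }

// Insert as in a plain BST, then rotate upward until the heap property
// (parent priority >= child priority) is restored.
TNode* insert(TNode* root, TNode* n) {
    if (!root) return n;
    if (n->key < root->key) {
        root->left = insert(root->left, n);
        if (root->left->priority > root->priority) root = rotateRight(root);
    } else {
        root->right = insert(root->right, n);
        if (root->right->priority > root->priority) root = rotateLeft(root);
    }
    return root;
}
```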
Tagged union (also called variant, variant record, discriminated union, or disjoint union)
composite data type. a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which one is in use. It can be thought of as a type that has several "cases," each of which should be handled correctly when that type is manipulated. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time. -Tagged unions are most important in functional languages such as ML and Haskell, where they are called datatypes (see algebraic data type) and the compiler is able to verify that all cases of a tagged union are always handled, avoiding many types of errors. They can, however, be constructed in nearly any language, and are much safer than untagged unions, often simply called unions, which are similar but do not explicitly keep track of which member of the union is currently in use. -Tagged unions are often accompanied by the concept of a type constructor, which is similar but not the same as a constructor for a class. Type constructors produce a tagged union type, given the initial tag type and the corresponding type. -An enumerated type can be seen as a degenerate case: a tagged union of unit types. It corresponds to a set of nullary constructors and may be implemented as a simple tag variable, since it holds no additional data besides the value of the tag.
Record (also called tuple or struct)
composite data type. a record (also called struct or compound data) is a basic data structure. A record is a collection of fields ("members" in object-oriented programming or "elements"), possibly of different data types, typically in fixed number and sequence. Most modern computer languages allow the programmer to define new record types. The definition includes specifying the data type of each field and an identifier (name or label) by which it can be accessed. Records can exist in any storage medium, including main memory and mass storage devices such as magnetic tapes or hard disks. Records are a fundamental component of most data structures, especially linked data structures. Many computer files are organized as arrays of logical records, often grouped into larger physical records or blocks for efficiency. An object in object-oriented language is essentially a record that contains procedures specialized to handle that record; and object types are an elaboration of record types. Indeed, in most object-oriented languages, records are just special cases of objects, and are known as plain old data structures (PODSs), to contrast with objects that use OO features. A record can be viewed as the computer analog of a mathematical tuple. In the same vein, a record type can be viewed as the computer language analog of the Cartesian product of two or more mathematical sets, or the implementation of an abstract product type in a specific language.
Union
composite data type. a value that may have any of several representations or formats; or it is a data structure that consists of a variable that may hold such a value. Some programming languages support special data types, called union types, to describe such values and variables. In other words, a union type definition will specify which of a number of permitted primitive types may be stored in its instances, e.g., "float or long integer". Contrast with a record (or structure), which could be defined to contain a float and an integer; in a union, there is only one value at any given time. -A union can be pictured as a chunk of memory that is used to store variables of different data types. Once a new value is assigned to a field, the existing data is overwritten with the new data. The memory area storing the value has no intrinsic type (other than just bytes or words of memory), but the value can be treated as one of several abstract data types, having the type of the value that was last written to the memory area. -Because of the limitations of their use, untagged unions are generally only provided in untyped languages or in a type-unsafe way (as in C). They have the advantage over simple tagged unions of not requiring space to store a data type tag. -The name "union" stems from the type's formal definition. If a type is considered as the set of all values that that type can take on, a union type is simply the mathematical union of its constituting types, since it can take on any value any of its fields can. Also, because a mathematical union discards duplicates, if more than one field of the union can take on a single common value, it is impossible to tell from the value alone which field was last written. -However, one useful programming function of unions is to map smaller data elements to larger ones for easier manipulation. 
A data structure consisting, for example, of 4 bytes and a 32-bit integer, can form a union with an unsigned 64-bit integer, and thus be more readily accessed for purposes of comparison etc. -In C and C++, untagged unions are expressed nearly exactly like structures (structs), except that each data member begins at the same location in memory. The data members, as in structures, need not be primitive values, and in fact may be structures or even other unions. -The primary use of a union is allowing access to a common location by different data types, for example hardware input/output access, perhaps bitfield and word sharing. Unions also provide crude polymorphism. However, there is no checking of types, so it is up to the programmer to be sure that the proper fields are accessed in different contexts. The relevant field of a union variable is typically determined by the state of other variables, possibly in an enclosing struct.
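The "map smaller data elements to larger ones" use described above is what C and C++ untagged unions are typically used for; a minimal sketch (note that in C this byte/word punning is well defined, while strict C++ formally forbids reading a non-active member, even though compilers commonly support it):

```cpp
#include <cstdint>

// An untagged union: both members share the same storage, so writing
// one member and reading the other reinterprets the same 4 bytes.
// (Well defined in C; a common but formally non-standard idiom in C++.)
union Word {
    std::uint32_t asInt;      // the storage viewed as one 32-bit integer
    std::uint8_t  asBytes[4]; // the same storage viewed byte-by-byte
};
```

Which byte of `asBytes` holds which part of `asInt` depends on the machine's endianness, illustrating that the memory has no intrinsic type of its own.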
data structures
concrete representations of data from the point of view of an implementer, not a user
Graph G
consists of a set of vertices V and a set of edges E, such that each edge in E is a connection between a pair of vertices in V.
deallocating memory
delete object; set pointer to nullptr or dangling pointers will exist
Array
easy but can waste space and size is static
Secondary Clustering
occurs when elements that hash to the same position follow the same sequence of probes when searching for an alternative cell, so they compete for the same cells in the same order
Recursive case for calculating n! ( n factorial)
else{ return n * factorial(n-1); }
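The recursive case above fits into a complete function once the base case is added (a minimal sketch; the signature is an assumption):

```cpp
// n! computed recursively: base case n <= 1, recursive case n * (n-1)!
long factorial(int n) {
    if (n <= 1) {
        return 1;                     // base case stops the recursion
    } else {
        return n * factorial(n - 1);  // recursive case
    }
}
```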
Object-Oriented programming
focus is on the objects, which contain data and a means to manipulate the data. Messages are sent to objects to perform operations.
Procedural Programming
focus is on the process. procedures/functions are written to process data
Skew heap (or self-adjusting heap)
heap data structure. a heap data structure implemented as a binary tree. Skew heaps are advantageous because of their ability to merge more quickly than binary heaps. In contrast with binary heaps, there are no structural constraints, so there is no guarantee that the height of the tree is logarithmic. Only two conditions must be satisfied: The general heap order must be enforced Every operation (add, remove_min, merge) on two skew heaps must be done using a special skew heap merge. A skew heap is a self-adjusting form of a leftist heap which attempts to maintain balance by unconditionally swapping all nodes in the merge path when merging two heaps. (The merge operation is also used when adding and removing values.) With no structural constraints, it may seem that a skew heap would be horribly inefficient. However, amortized complexity analysis can be used to demonstrate that all operations on a skew heap can be done in O(log n).
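The special skew-heap merge described above (keep the smaller root, recurse down the right path, swap children unconditionally) can be sketched as follows for a min-heap; the names `SNode` and `merge` are illustrative:

```cpp
// Skew heap node: an ordinary binary tree node, no balance bookkeeping.
struct SNode {
    int key;
    SNode *left = nullptr, *right = nullptr;
};

// Skew-heap merge: the smaller root wins, its right subtree is merged
// with the other heap, and the children are swapped unconditionally.
SNode* merge(SNode* a, SNode* b) {
    if (!a) return b;
    if (!b) return a;
    if (b->key < a->key) { SNode* t = a; a = b; b = t; }  // a has smaller root
    SNode* merged = merge(a->right, b);  // merge along the right path
    a->right = a->left;                  // unconditional child swap
    a->left = merged;
    return a;
}
```

Insertion is then a merge with a one-node heap, and remove-min is a merge of the root's two children, matching the text above.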
heap-order
for every internal node v other than the root, key(v) >= key(parent(v))
What is the queue equivalent of peek()?
getFront(). Looks at the front item of the queue without changing the queue.
big-Omega notation
gives lower bound on the growth rate
breadth first search
visits all adjacent vertices before going deeper; the first vertex visited is the first explored (FIFO order, typically implemented with a queue)
linear probing
handles collisions by placing the colliding item in the next (circularly) available table cell
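A minimal sketch of linear-probing insertion, assuming non-negative integer keys hashed by key % N, a table that is not full, and a sentinel value for empty cells (all of these are assumptions for illustration):

```cpp
#include <vector>

const int EMPTY = -1;  // sentinel marking an unused table cell

// Linear probing: start at the key's home position and, on collision,
// try the next cell (wrapping around circularly) until one is empty.
void insertLinear(std::vector<int>& table, int key) {
    int N = static_cast<int>(table.size());
    int i = key % N;                  // home position from the hash function
    while (table[i] != EMPTY)         // collision: probe the next cell
        i = (i + 1) % N;              // circular wrap-around
    table[i] = key;
}
```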
O(lg n)
happens for up to the height of a balanced tree
Beap (bi-parental heap)
heap data structure. a data structure where a node usually has two parents (unless it is the first or last on a level) and two children (unless it is on the last level). Unlike a heap, a beap allows sublinear search. The beap was introduced by Ian Munro and Hendra Suwanda. A related data structure is the Young tableau.
Binary heap
heap data structure. a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues -A binary heap is defined as a binary tree with two additional constraints: 1. Shape property: a binary heap is a complete binary tree; that is, all levels of the tree, except possibly the last one (deepest) are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left to right. 2. Heap property: the key stored in each node is either greater than or equal to or less than or equal to the keys in the node's children, according to some total order. -Heaps where the parent key is greater than or equal to (≥) the child keys are called max-heaps; those where it is less than or equal to (≤) are called min-heaps. Efficient (logarithmic-time) algorithms are known for the two operations needed to implement a priority queue on a binary heap: inserting an element, and removing the smallest (largest) element from a min-heap (max-heap). Binary heaps are also commonly employed in the heapsort sorting algorithm, which is an in-place algorithm owing to the fact that binary heaps can be implemented as an implicit data structure, storing keys in an array and using their relative positions within that array to represent child-parent relationships.
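The implicit array representation mentioned above puts the parent of index i at (i - 1) / 2; insertion into a min-heap then appends at the end and sifts up. A minimal sketch (the name `heapInsert` is illustrative):

```cpp
#include <vector>
#include <utility>
#include <cstddef>

// Insert into an array-based (implicit) min-heap: append the key, then
// sift it up while it is smaller than its parent at (i - 1) / 2.
void heapInsert(std::vector<int>& h, int key) {
    h.push_back(key);
    std::size_t i = h.size() - 1;
    while (i > 0 && h[(i - 1) / 2] > h[i]) {  // parent larger: swap up
        std::swap(h[(i - 1) / 2], h[i]);
        i = (i - 1) / 2;
    }
}
```

Each sift-up step climbs one level, so insertion costs O(log n), matching the logarithmic bound stated above.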
Binomial heap
heap data structure. a heap similar to a binary heap but also supports quick merging of two heaps. This is achieved by using a special tree structure. It is important as an implementation of the mergeable heap abstract data type (also called meldable heap), which is a priority queue supporting merge operation. -A binomial heap is implemented as a collection of binomial trees (compare with a binary heap, which has a shape of a single binary tree), which are defined recursively as follows: 1. A binomial tree of order 0 is a single node 2. A binomial tree of order k has a root node whose children are roots of binomial trees of orders k−1, k−2, ..., 2, 1, 0 (in this order). -A binomial tree of order k has 2^k nodes and height k. -Because of its unique structure, a binomial tree of order k can be constructed from two trees of order k−1 trivially by attaching one of them as the leftmost child of the root of the other tree. This feature is central to the merge operation of a binomial heap, which is its major advantage over other conventional heaps. -The name comes from the shape: a binomial tree of order n has C(n, d) nodes at depth d, where C(n, d) is the binomial coefficient "n choose d".
Brodal queue
heap data structure. a heap/priority queue structure with very low worst case time bounds: O(1) for insertion, find-minimum, meld (merge two queues) and decrease-key and O(log (n)) for delete-minimum and general deletion; they are the first heap variant with these bounds. Brodal queues are named after their inventor Gerth Stølting Brodal. -While having better asymptotic bounds than other priority queue structures, they are, in the words of Brodal himself, "quite complicated" and "[not] applicable in practice."[1] Brodal and Okasaki describe a persistent (purely functional) version of Brodal queues
Leftist heap (leftist tree)
heap data structure. a priority queue implemented with a variant of a binary heap. Every node has an s-value which is the distance to the nearest leaf. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value. -The height-biased leftist tree was invented by Clark Allan Crane.[1] The name comes from the fact that the left subtree is usually taller than the right subtree. -When inserting a new node into a tree, a new one-node tree is created and merged into the existing tree. To delete a minimum item, we remove the root and the left and right sub-trees are then merged. Both these operations take O(log n) time. For insertions, this is slower than binomial heaps which support insertion in amortized constant time, O(1) and O(log n) worst-case. -Leftist trees are advantageous because of their ability to merge quickly, compared to binary heaps which take Θ(n). In almost all cases, the merging of skew heaps has better performance. However merging leftist heaps has worst-case O(log n) complexity while merging skew heaps has only amortized O(log n) complexity.
Pairing heap
heap data structure. a type of heap data structure with relatively simple implementation and excellent practical amortized performance, introduced by Michael Fredman, Robert Sedgewick, Daniel Sleator, and Robert Tarjan in 1986.[1] Pairing heaps are heap-ordered multiway tree structures, and can be considered simplified Fibonacci heaps. They are considered a "robust choice" for implementing such algorithms as Prim's MST algorithm,[2] and support the following operations (assuming a min-heap): 1. find-min: simply return the top element of the heap. 2. merge: compare the two root elements, the smaller remains the root of the result, the larger element and its subtree is appended as a child of this root. 3. insert: create a new heap for the inserted element and merge into the original heap. 4. decrease-key (optional): remove the subtree rooted at the key to be decreased, replace the key with a smaller key, then merge the result back into the heap. 5. delete-min: remove the root and merge its subtrees. Various strategies are employed. The analysis of pairing heaps' time complexity was initially inspired by that of splay trees.[1] The amortized time per delete-min is O(log n), and the operations find-min, merge, and insert run in O(1) amortized time.[3] -Determining the precise asymptotic running time of pairing heaps when a decrease-key operation is needed has turned out to be difficult. 
Initially, the time complexity of this operation was conjectured on empirical grounds to be O(1),[4] but Fredman proved that the amortized time per decrease-key is at least Ω(log log n) for some sequences of operations.[5] Using a different amortization argument, Pettie then proved that insert, meld, and decrease-key all run in O(2^(2√(log log n))) amortized time, which is o(log n).[6] Elmasry later introduced a variant of pairing heaps for which decrease-key runs in O(log log n) amortized time and with all other operations matching Fibonacci heaps,[7] but no tight Θ(log log n) bound is known for the original data structure.[6][3] Moreover, it is an open question whether an o(log n) amortized time bound for decrease-key and an O(1) amortized time bound for insert can be achieved simultaneously.[8] -Although this is worse than other priority queue algorithms such as Fibonacci heaps, which perform decrease-key in O(1) amortized time, the performance in practice is excellent. Stasko and Vitter,[4] Moret and Shapiro,[9] and Larkin, Sen, and Tarjan[8] conducted experiments on pairing heaps and other heap data structures. They concluded that pairing heaps are often faster in practice than array-based binary heaps and d-ary heaps, and almost always faster in practice than other pointer-based heaps, including data structures like Fibonacci heaps that are theoretically more efficient.
AF-heap
heap data structure. a type of priority queue for integer data, an extension of the fusion tree using an atomic heap proposed by M. L. Fredman and D. E. Willard.[1] Using an AF-heap, it is possible to perform m insert or decrease-key operations and n delete-min operations on machine-integer keys in time O(m + n log n / log log n). This allows Dijkstra's algorithm to be performed in the same O(m + n log n / log log n) time bound on graphs with n edges and m vertices, and leads to a linear time algorithm for minimum spanning trees, with the assumption for both problems that the edge weights of the input graph are machine integers in the transdichotomous model.
Soft heap
heap data structure. a variant on the simple heap data structure that has constant amortized time for 5 types of operations. This is achieved by carefully "corrupting" (increasing) the keys of at most a certain number of values in the heap. The constant time operations are: 1. create(S): Create a new soft heap 2. insert(S, x): Insert an element into a soft heap 3. meld(S, S' ): Combine the contents of two soft heaps into one, destroying both 4. delete(S, x): Delete an element from a soft heap 5. findmin(S): Get the element with minimum key in the soft heap Other heaps such as Fibonacci heaps achieve most of these bounds without any corruption, but cannot provide a constant-time bound on the critical delete operation. -The amount of corruption can be controlled by the choice of a parameter ε, but the lower this is set, the more time insertions require (O(log 1/ε) for an error rate of ε). -More precisely, the guarantee offered by the soft heap is the following: for a fixed value ε between 0 and 1/2, at any point in time there will be at most ε*n corrupted keys in the heap, where n is the number of elements inserted so far. Note that this does not guarantee that only a fixed percentage of the keys currently in the heap are corrupted: in an unlucky sequence of insertions and deletions, it can happen that all elements in the heap will have corrupted keys. Similarly, we have no guarantee that in a sequence of elements extracted from the heap with findmin and delete, only a fixed percentage will have corrupted keys: in an unlucky scenario only corrupted elements are extracted from the heap. -The soft heap was designed by Bernard Chazelle in 2000. The term "corruption" in the structure is the result of what Chazelle called "carpooling" in a soft heap. Each node in the soft heap contains a linked-list of keys and one common key. The common key is an upper bound on the values of the keys in the linked-list. 
Once a key is added to the linked-list, it is considered corrupted because its value is never again relevant in any of the soft heap operations: only the common keys are compared. This is what makes soft heaps "soft"; you can't be sure whether or not any particular value you put into it will be corrupted. The purpose of these corruptions is effectively to lower the information entropy of the data, enabling the data structure to break through information-theoretic barriers regarding heaps.
Weak heap
heap data structure. a variation of the binary heap data structure. A weak max-heap on a set of n values is defined to be a binary tree with n nodes, one for each value, satisfying the following constraints: 1. The root node has no left child 2. For every node, the value associated with that node is greater than or equal to the values associated with all nodes in its right subtree. 3. The leaves of the tree have heights that are all within one of each other. A weak min-heap is similar, but reverses the required order relationship between the value at each node and in its right subtree. -In a weak max-heap, the maximum value can be found (in constant time) as the value associated with the root node; similarly, in a weak min-heap, the minimum value can be found at the root. As with binary heaps, weak heaps can support the typical operations of a priority queue data structure: insert, delete-min, delete, or decrease-key, in logarithmic time per operation. Variants of the weak heap structure allow constant amortized time insertions and decrease-keys, matching the time for Fibonacci heaps.
Size of tree (code)
int Size(tree)
    if tree == nullptr
        return 0
    return 1 + Size(tree.left) + Size(tree.right)
Weighted union rule
joins the tree with fewer nodes to the tree with more nodes by making the smaller tree's root point to the root of the bigger tree. (Depth limited to O(log n))
LIFO
last in first out refers to a data structure such as a stack
hash function
maps keys of a given type to integers in a fixed interval [0, N - 1]
map
models a searchable collection of key-value entries
implements
multiple interfaces
Link/cut tree
multiway tree data structure. data structure for representing a forest, a set of rooted trees, and offers the following operations: -Add a tree consisting of a single node to the forest. Given a node in one of the trees, disconnect it (and its subtree) from the tree of which it is part. Attach a node to another node as its child. Given a node, find the root of the tree to which it belongs. By doing this operation on two distinct nodes, one can check whether they belong to the same tree. The represented forest may consist of very deep trees, so if we represent the forest as a plain collection of parent pointer trees, it might take us a long time to find the root of a given node. However, if we represent each tree in the forest as a link/cut tree, we can find which tree an element belongs to in O(log(n)) amortized time. Moreover, we can quickly adjust the collection of link/cut trees to changes in the represented forest. In particular, we can adjust it to merge (link) and split (cut) in O(log(n)) amortized time. -Link/cut trees divide each tree in the represented forest into vertex-disjoint paths, where each path is represented by an auxiliary tree (often splay trees, though the original paper predates splay trees and thus uses biased binary search trees). The nodes in the auxiliary trees are keyed by their depth in the corresponding represented tree. In one variation, Naive Partitioning, the paths are determined by the most recently accessed paths and nodes, similar to Tango Trees. In Partitioning by Size paths are determined by the heaviest child (child with the most children) of the given node. This gives a more complicated structure, but reduces the cost of the operations from amortized O(log n) to worst case O(log n). It has uses in solving a variety of network flow problems. -In the original publication, Sleator and Tarjan referred to link/cut trees as "dynamic trees".
Enfilade (Xanadu)
multiway tree data structure. a class of tree data structures used in Project Xanadu "Green" designs of the 1970s and 1980s. Enfilades allow quick editing, versioning, retrieval and inter-comparison operations in a large, cross-linked hypertext database. The Xanadu "Gold" design starting in the 1990s used a related data structure called the Ent. -Although the principles of enfilades can be applied to any tree data structure, the particular structure used in the Xanadu system was much like a B-Tree. What distinguishes enfilades is the use of dsps and wids in the indexing information within tree nodes. -Dsps are displacements, offsets or relative keys. A dsp is the difference in key between a containing node and that of a subtree or leaf. For instance, the leaf for a grid square in a map might have a certain longitude and latitude offset relative to the larger grid represented by the subtree the leaf is part of. The key of any leaf of an enfilade is found by combining all the dsps on the path down the tree to that leaf. Dsps can also be used for other context information that is imposed top-down on entire subtrees or ranges of content at once. -Wids are widths, ranges, or bounding boxes. A wid is relative to the key of a subtree or leaf, but specifies a range of addresses covering all items within the subtree. Wids identify the interesting parts of sparsely populated address spaces. In some enfilades, the wids of subtrees under a given node can overlap, and in any case, a search for data within a range of addresses must visit any subtrees whose wids intersect the search range. Wids are combined from the leaves of the tree, upward through all layers to the root (although they are maintained incrementally). Wids can also contain other summaries such as totals or maxima of data. -The relative nature of wids and dsps allows subtrees to be rearranged within an enfilade. 
By changing the dsp at the top of a subtree, the keys of all the data underneath are implicitly changed. Edit operations in enfilades are performed by "cutting," or splitting the tree down relevant access paths, inserting, deleting or rearranging subtrees, and splicing the pieces back together. The cost of cutting and splicing operations is generally log-like in 1-D trees and between log-like and square-root-like in 2-D trees. -Subtrees can also be shared between trees, or be linked from multiple places within a tree. This makes the enfilade a fully persistent data structure with virtual copying and versioning of content. Each use of a subtree inherits a different context from the chain of dsps down to it. Changes to a copy create new nodes only along the cut paths, and leave the entire original in place. The overhead for a version is very small, a new version's tree is balanced and fast, and its storage cost is related only to changes from the original. -One-dimensional enfilades are intermediate between arrays' direct addressability and linked lists' ease of insertion, deletion and rearrangement. Multidimensional enfilades resemble loose, rearrangeable, versionable Quad trees, Oct trees or k-d trees.
Disjoint-set data structure (also called a union-find data structure or merge-find set)
multiway tree data structure. a data structure that keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. It supports two useful operations: Find: Determine which subset a particular element is in. Find typically returns an item from this set that serves as its "representative"; by comparing the result of two Find operations, one can determine whether two elements are in the same subset. Union: Join two subsets into a single subset. The other important operation, MakeSet, which makes a set containing only a given element (a singleton), is generally trivial. With these three operations, many practical partitioning problems can be solved (see the Applications section). In order to define these operations more precisely, some way of representing the sets is needed. One common approach is to select a fixed element of each set, called its representative, to represent the set as a whole. Then, Find(x) returns the representative of the set that x belongs to, and Union takes two set representatives as its arguments.
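The three operations above (MakeSet, Find, Union) can be sketched in Java as follows; this is a minimal illustrative implementation (class and method names are my own) using the standard path-compression and union-by-rank optimizations, not code from the text:

```java
// Disjoint-set (union-find) sketch with path compression and union by rank.
public class DisjointSet {
    private final int[] parent;
    private final int[] rank;

    // MakeSet for elements 0..n-1: each element starts as its own representative.
    public DisjointSet(int n) {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    // Find with path compression: every visited node is re-pointed at the root,
    // so future Finds on the same path are nearly constant time.
    public int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }

    // Union by rank: attach the shallower tree under the deeper one.
    public void union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return;                       // already in the same subset
        if (rank[ra] < rank[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        if (rank[ra] == rank[rb]) rank[ra]++;
    }
}
```

Comparing `find(x) == find(y)` then answers "are x and y in the same subset?" exactly as the flashcard describes.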
And-or tree
multiway tree data structure. a graphical representation of the reduction of problems (or goals) to conjunctions and disjunctions of subproblems (or subgoals).
(a,b)-tree
multiway tree data structure. a kind of balanced search tree. An (a,b)-tree has all of its leaves at the same depth, and all internal nodes except for the root have between a and b children, where a and b are integers such that 2 ≤ a ≤ (b+1)/2. The root has, if it is not a leaf, between 2 and b children.
Ternary tree
multiway tree data structure. a tree data structure in which each node has at most three child nodes, usually distinguished as "left", "mid" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to the "root" node (the ancestor of all nodes), if it exists. Any node in the data structure can be reached by starting at root node and repeatedly following references to either the left, mid or right child. -Ternary trees are used to implement Ternary search trees and Ternary heaps.
SPQR-tree
multiway tree data structure. a tree data structure used in computer science, and more specifically graph algorithms, to represent the triconnected components of a graph. The SPQR tree of a graph may be constructed in linear time[1] and has several applications in dynamic graph algorithms and graph drawing. -The basic structures underlying the SPQR tree, the triconnected components of a graph, and the connection between this decomposition and the planar embeddings of a planar graph, were first investigated by Saunders Mac Lane (1937); these structures were used in efficient algorithms by several other researchers[2] prior to their formalization as the SPQR tree by Di Battista and Tamassia (1989, 1990, 1996). |Structure| An SPQR tree takes the form of an unrooted tree in which for each node x there is associated an undirected graph or multigraph Gx. The node, and the graph associated with it, may have one of four types, given the initials SPQR: 1.In an S node, the associated graph is a cycle graph with three or more vertices and edges. This case is analogous to series composition in series-parallel graphs; the S stands for "series".[3] 2.In a P node, the associated graph is a dipole graph, a multigraph with two vertices and three or more edges, the planar dual to a cycle graph. This case is analogous to parallel composition in series-parallel graphs; the P stands for "parallel".[3] 3.In a Q node, the associated graph has a single real edge. This trivial case is necessary to handle the graph that has only one edge. In some works on SPQR trees, this type of node does not appear in the SPQR trees of graphs with more than one edge; in other works, all non-virtual edges are required to be represented by Q nodes with one real and one virtual edge, and the edges in the other node types must all be virtual. 4.In an R node, the associated graph is a 3-connected graph that is not a cycle or dipole. 
The R stands for "rigid": in the application of SPQR trees in planar graph embedding, the associated graph of an R node has a unique planar embedding.[3] -Each edge xy between two nodes of the SPQR tree is associated with two directed virtual edges, one of which is an edge in Gx and the other of which is an edge in Gy. Each edge in a graph Gx may be a virtual edge for at most one SPQR tree edge. -An SPQR tree T represents a 2-connected graph GT, formed as follows. Whenever SPQR tree edge xy associates the virtual edge ab of Gx with the virtual edge cd of Gy, form a single larger graph by merging a and c into a single supervertex, merging b and d into another single supervertex, and deleting the two virtual edges. That is, the larger graph is the 2-clique-sum of Gx and Gy. Performing this gluing step on each edge of the SPQR tree produces the graph GT; the order of performing the gluing steps does not affect the result. Each vertex in one of the graphs Gx may be associated in this way with a unique vertex in GT, the supervertex into which it was merged. -Typically, it is not allowed within an SPQR tree for two S nodes to be adjacent, nor for two P nodes to be adjacent, because if such an adjacency occurred the two nodes could be merged into a single larger node. With this assumption, the SPQR tree is uniquely determined from its graph. When a graph G is represented by an SPQR tree with no adjacent P nodes and no adjacent S nodes, then the graphs Gx associated with the nodes of the SPQR tree are known as the triconnected components of G.
Exponential tree
multiway tree data structure. almost identical to a binary search tree, with the exception that the dimension of the tree is not the same at all levels. In a normal binary search tree, each node has a dimension (d) of 1, and has 2^d = 2 children. In an exponential tree, the dimension equals the depth of the node, with the root node having a d = 1. So the second level can hold two nodes, the third can hold eight nodes, the fourth 64 nodes, and so on.
Van Emde Boas tree (or Van Emde Boas priority queue, also known as a vEB tree)
multiway tree data structure. is a tree data structure which implements an associative array with m-bit integer keys. It performs all operations in O(log m) time, or equivalently in O(log log M) time, where M = 2m is the maximum number of elements that can be stored in the tree. The M is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data-structures is often measured. The vEB tree has good space efficiency when it contains a large number of elements, as discussed below. It was invented by a team led by Dutch computer scientist Peter van Emde Boas in 1975.
Node height
number of edges on the longest downward path between node and a leaf
Height of binary tree
number of edges on the longest downward path between the root and a leaf; O(log n) for a complete binary tree
Double-precision floating-point
primitive data type. a computer number format that occupies 8 bytes (64 bits) in computer memory and represents a wide, dynamic range of values by using a floating point. Double-precision floating-point format usually refers to binary64, as specified by the IEEE 754 standard, not to the 64-bit decimal format decimal64. In older computers, different floating-point formats of 8 bytes were used, e.g., GW-BASIC's double-precision data type was the 64-bit MBF floating-point format. Double-precision binary floating-point is a commonly used format on PCs, due to its wider range over single-precision floating point, in spite of its performance and bandwidth cost. As with single-precision floating-point format, it lacks precision on integer numbers when compared with an integer format of the same size. It is commonly known simply as double. The IEEE 754 standard specifies a binary64 as having: Sign bit: 1 bit Exponent width: 11 bits Significand precision: 53 bits (52 explicitly stored)
Boolean data type
primitive data type. a data type, having two values (usually denoted true and false), intended to represent the truth values of logic and Boolean algebra. It is named after George Boole, who first defined an algebraic system of logic in the mid 19th century. The Boolean data type is primarily associated with conditional statements, which allow different actions and change control flow depending on whether a programmer-specified Boolean condition evaluates to true or false. It is a special case of a more general logical data type; logic does not always have to be Boolean. In programming languages that have a built-in Boolean data type, such as Pascal and Java, the comparison operators such as > and ≠ are usually defined to return a Boolean value. Conditional and iterative commands may be defined to test Boolean-valued expressions. Languages without an explicit Boolean data type, like C90 and Lisp, may still represent truth values by some other data type. Common Lisp uses an empty list for false, and any other value for true. C uses an integer type, where relational expressions like i > j and logical expressions connected by && and || are defined to have value 1 if true and 0 if false, whereas the test parts of if, while, for, etc., treat any non-zero value as true.[1][2] Indeed, a Boolean variable may be regarded (and be implemented) as a numerical variable with a single binary digit (bit), which can store only two values. It is worth noting that Booleans in computers are most often implemented as a full word rather than a single bit; this is usually due to the ways computers transfer blocks of information. Most programming languages, even those that do not have an explicit Boolean type, have support for Boolean algebraic operations such as conjunction (AND, &, *), disjunction (OR, |, +), equivalence (EQV, =, ==), exclusive or/non-equivalence (XOR, NEQV, ^, !=), and negation (NOT, ~, !). 
In some languages, like Ruby, Smalltalk, and Alice the "true" and "false" values belong to separate classes—i.e. True and False, resp.—so there is no single Boolean "type." In SQL, which uses a three-valued logic for explicit comparisons because of its special treatment of Nulls, the Boolean data type (introduced in SQL:1999) is also defined to include more than two truth values, so that SQL "Booleans" can store all logical values resulting from the evaluation of predicates in SQL. A column of Boolean type can also be restricted to just TRUE and FALSE though.
Enumerated type
primitive data type. a small set of uniquely-named values. a data type consisting of a set of named values called elements, members, enumeral, or enumerators of the type. The enumerator names are usually identifiers that behave as constants in the language. An enumerated type can be seen as a degenerate tagged union of unit type. A variable that has been declared as having an enumerated type can be assigned any of the enumerators as a value. In other words, an enumerated type has values that are different from each other, and that can be compared and assigned, but are not specified by the programmer as having any particular concrete representation in the computer's memory; compilers and interpreters can represent them arbitrarily. (also called enumeration or enum, or factor in the R programming language, and a categorical variable in statistics)
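A minimal Java enum makes the card concrete; the enum name and members below are illustrative, not from the text:

```java
public class EnumDemo {
    // Enumerators behave as named constants; their concrete in-memory
    // representation is chosen by the compiler, not the programmer.
    enum Direction { NORTH, EAST, SOUTH, WEST }

    public static void main(String[] args) {
        Direction d = Direction.EAST;
        // Values can be assigned and compared; ordinal() merely exposes
        // declaration order, which code should not normally depend on.
        System.out.println(d + " has ordinal " + d.ordinal());
    }
}
```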
Character
primitive data type. a unit of information that roughly corresponds to a grapheme (the smallest unit of a writing system of any given language), grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language. Examples of characters include letters, numerical digits, common punctuation marks (such as "." or "-"), and whitespace. The concept also includes control characters, which do not correspond to symbols in a particular natural language, but rather to other bits of information used to process text in one or more languages. Examples of control characters include carriage return or tab, as well as instructions to printers or other devices that display or otherwise process text. Characters are typically combined into strings. Computers and communication equipment represent characters using a character encoding that assigns each character to something — an integer quantity represented by a sequence of digits, typically — that can be stored or transmitted through a network. Two examples of usual encodings are ASCII and the UTF-8 encoding for Unicode. While most character encodings map characters to numbers and/or bit sequences, Morse code instead represents characters using a series of electrical impulses of varying length.
Integer
primitive data type. integral or fixed-precision values. a datum of integral data type, a data type which represents some finite subset of the mathematical integers. Integral data types may be of different sizes and may or may not be allowed to contain negative values. Integers are commonly represented in a computer as a group of binary digits (bits). The size of the grouping varies so the set of integer sizes available varies between different types of computers. Computer hardware, including virtual machines, nearly always provide a way to represent a processor register or memory address as an integer. The most common representation of a positive integer is a string of bits, using the binary numeral system.
floating point
primitive data type. single-precision real number values. In mathematics, a real number is a value that represents a quantity along a continuous line (vs imaginary). A number is, in general, represented approximately to a fixed number of significant digits (the significand) and scaled using an exponent in some fixed base; the base for the scaling is normally two, ten, or sixteen. The term floating point refers to the fact that a number's radix point (decimal point, or, more commonly in computers, binary point) can "float"; that is, it can be placed anywhere relative to the significant digits of the number. This position is indicated as the exponent component, and thus the floating-point representation can be thought of as a kind of scientific notation.
1. Write a method that takes an array of keys and an array of values and creates a corresponding Hashmap.
public static <K, V> HashMap<K, V> makeMap(K[] keys, V[] values) {
    HashMap<K, V> map = new HashMap<>();
    for (int i = 0; i < keys.length; i++) {  // arrays use .length, not .size()
        map.put(keys[i], values[i]);
    }
    return map;
}
22. A binary tree is considered to be perfect if all the leaves are at the same level and every non-leaf node has exactly two children. An empty tree is considered to be perfect. Given the following data structure called TreeNode, write a recursive algorithm that determines if a tree is perfect. Assume you can use the method created on (20) to obtain the height of a tree. The signature of the method is as follows: public class TreeNode<T> { public T content; public TreeNode<T> leftRoot; public TreeNode<T> rightNode; }
public <T> boolean isPerfect(TreeNode<T> node) {
    if (node == null)
        return true;                                          // an empty tree is perfect
    if (height(node.leftRoot) != height(node.rightNode))
        return false;                                         // leaves would sit at different levels
    return isPerfect(node.leftRoot) && isPerfect(node.rightNode);
}
31. Password checker. Write a program that uses hash tables and reads in a string from the command line and a dictionary of words from standard input, and checks whether it is a "good" password. Here, assume "good" means that it (i) is at least 8 characters long, (ii) is not a word in the dictionary, (iii) is not a word in the dictionary followed by a digit 0-9 (e.g., hello5), (iv) is not two words separated by a digit (e.g., hello2world)
import java.util.Hashtable;
import java.util.Scanner;

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Enter a password. length >= 8, doesn't contain a word in the dictionary");
        Scanner scan = new Scanner(System.in);
        String password = scan.next().trim();                 // get input
        if (password.length() < 8) {                          // test char count
            System.out.println("bad. too small");
        } else {
            System.out.println("Enter in dictionary of words. Enter 0 when finished");
            Hashtable<String, String> dict = new Hashtable<>();
            String input = "";
            while (!input.equals("0")) {                      // get dictionary of words
                input = scan.next().trim().toLowerCase();
                dict.put(input, input);
            }
            // "[0-9]+" is a regular expression that matches a run of digits, so
            // splitting the password on it exposes a word followed by a digit
            // (hello5) or two words separated by a digit (hello2world).
            String[] test = password.split("[0-9]+");
            boolean isGood = true;
            for (int i = 0; i < test.length && isGood; i++) {
                System.out.println(test[i]);
                if (dict.containsKey(test[i].toLowerCase())) {
                    isGood = false;
                }
            }
            if (isGood) System.out.println("\nGood");
            else System.out.println("\nBAD");
        }
    }
}
Generalised suffix tree
tree data structure where each tree node compares a bit slice of key values. a suffix tree for a set of strings. Given the set of strings D=S_{1},S_{2},...,S_{d} of total length n, it is a Patricia tree containing all n suffixes of the strings. It is mostly used in bioinformatics.[1]
21. Create a recursive algorithm that given the TreeNode structure shown below, it calculates the height of a tree. The signature of the method is as follows: The TreeNode structure is as follows: public class TreeNode<T> { public T content; public TreeNode<T> leftRoot; public TreeNode<T> rightNode; }
public static int height(TreeNode node) {
    if (node == null)                           // base case: empty tree
        return 0;
    int leftCount = height(node.leftRoot);      // height of left subtree
    int rightCount = height(node.rightNode);    // height of right subtree
    if (leftCount >= rightCount)                // return the greater of the two, plus this node
        return leftCount + 1;
    else
        return rightCount + 1;
}
Which deque operation is synonymous with pop()?
removeFront(). It DOES modify the deque, returning the item.
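In Java's standard library this equivalence is visible in `ArrayDeque`, whose `pop()` is defined to behave exactly like `removeFirst()` — a small illustrative demo:

```java
import java.util.ArrayDeque;

public class DequeDemo {
    public static void main(String[] args) {
        ArrayDeque<Integer> deque = new ArrayDeque<>();
        deque.push(1);            // addFirst(1)
        deque.push(2);            // addFirst(2): front of the deque is now 2
        int front = deque.pop();  // same as removeFirst(): removes AND returns 2
        System.out.println(front);
        System.out.println(deque.peek()); // 1 remains, confirming pop() modified the deque
    }
}
```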
topological sorting applications
scheduling jobs; logic synthesis; order of compilation tasks; data serialization; resolving symbol dependencies in linkers.
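All of these applications reduce to ordering the vertices of a DAG; a common way to compute that order is Kahn's algorithm, sketched below (class and method names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TopoSort {
    // Kahn's algorithm: repeatedly emit a vertex with in-degree 0 and
    // remove its outgoing edges. adj maps each vertex 0..n-1 to its
    // outgoing neighbors.
    public static List<Integer> sort(int n, List<List<Integer>> adj) {
        int[] indeg = new int[n];
        for (List<Integer> outs : adj)
            for (int v : outs) indeg[v]++;
        Deque<Integer> ready = new ArrayDeque<>();
        for (int v = 0; v < n; v++)
            if (indeg[v] == 0) ready.add(v);
        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int u = ready.remove();
            order.add(u);
            for (int v : adj.get(u))
                if (--indeg[v] == 0) ready.add(v);
        }
        // Fewer than n vertices emitted means a cycle exists (not a DAG).
        return order.size() == n ? order : null;
    }
}
```

For job scheduling, an edge u→v means "job u must run before job v", and the returned list is a valid schedule.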
B-Tree
self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time
singly linked list
a linear sequence of nodes in which each node stores a data element and a reference to the next node; traversal is one-directional, from the head toward the tail
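A minimal sketch of the node-chain idea (names are illustrative):

```java
public class SinglyLinkedList<T> {
    // Each node stores one datum and a reference to the next node;
    // the last node's next reference is null.
    private static class Node<T> {
        T data;
        Node<T> next;
        Node(T data) { this.data = data; }
    }

    private Node<T> head;
    private int size;

    // O(1): the new node becomes the head of the sequence.
    public void addFirst(T value) {
        Node<T> n = new Node<>(value);
        n.next = head;
        head = n;
        size++;
    }

    public T getFirst() { return head.data; }

    public int size() { return size; }
}
```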
VP-tree (vantage-point tree)
space partitioning or binary space partitioning data structure. a BSP tree which segregates data in a metric space by choosing a position in the space (the "vantage point") and dividing the data points into two partitions: those that are nearer to the vantage point than a threshold, and those that are not. By repeatedly applying this procedure to partition the data into smaller and smaller sets, a tree data structure is created where neighbors in the tree are likely to be neighbors in the space.[1] -One variation of it is called the multi-vantage point tree, or MVP tree: an abstract data structure for indexing objects from large metric spaces for similarity search queries. It uses more than one point to partition each level. -The way a VP tree stores data can be represented by a circle.[5] First, understand that each node of this tree contains an input point and a radius. All the left children of a given node are the points inside the circle and all the right children of a given node are outside of the circle. The tree itself does not need to know any other information about what is being stored. All it needs is the distance function that satisfies the properties of the metric space.[5] Just imagine a circle with a radius. The left children are all located inside the circle and the right children are located outside the circle. |Advantages| 1.Instead of inferring multidimensional points for the domain before the index is built, the index is built directly on the distance function.[5] This avoids pre-processing steps. 2.Updating a VP tree is relatively easy compared to the fast-map approach. For fast maps, after inserting or deleting data, there will come a time when fast-map will have to rescan itself. That takes up too much time and it is unclear when the rescanning will start. 3.Distance based methods are flexible. It is "able to index objects that are represented as feature vectors of a fixed number of dimensions."
UB-tree
space partitioning or binary space partitioning data structure. a balanced tree for storing and efficiently retrieving multidimensional data. It is basically a B+ tree (information only in the leaves) with records stored according to Z-order, also called Morton order. Z-order is simply calculated by bitwise interlacing the keys. -Insertion, deletion, and point query are done as with ordinary B+ trees. To perform range searches in multidimensional point data, however, an algorithm must be provided for calculating, from a point encountered in the data base, the next Z-value which is in the multidimensional search range. -The original algorithm to solve this key problem was exponential with the dimensionality and thus not feasible[1] ("GetNextZ-address"). A solution to this "crucial part of the UB-tree range query" linear with the z-address bit length was described later.[2] This method had already been described in an older paper[3] where using Z-order with search trees was first proposed. -proposed by Rudolf Bayer and Volker Markl
Relaxed Kd-tree
space partitioning or binary space partitioning data structure. a data structure presented as a variant of the well known K-d trees. As any other variant of K-dimensional trees, a relaxed K-dimensional tree stores a set of n multidimensional records, each one having a unique K-dimensional key x=(x0,... ,xK−1). Unlike in K-d trees, in a relaxed K-d tree the discriminants in each node are arbitrary. Relaxed K-d trees were introduced in 1998
Bin*
space partitioning or binary space partitioning data structure. a data structure which allows efficient region queries. Each time a data point falls into a bin, the frequency of that bin is increased by one.[1] -For example, if there are some axis-aligned rectangles on a 2D plane, it answers the question: "Given a query rectangle, what are the rectangles intersecting it?". In the example in the figure, A, B, C, D, E and F are existing rectangles, the query with the rectangle Q should return C, D, E and F, if we define all rectangles as closed intervals. -The data structure partitions a region of the 2D plane into uniform-sized bins. The bounding box of the bins encloses all candidate rectangles to be queried. All the bins are arranged in a 2D array. All the candidates are represented also as 2D arrays. The size of a candidate's array is the number of bins it intersects. For example, in the figure, candidate B has 6 elements arranged in a 3 row by 2 column array because it intersects 6 bins in such an arrangement. Each bin contains the head of a singly linked list. If a candidate intersects a bin, it is chained to the bin's linked list. Each element in a candidate's array is a link node in the corresponding bin's linked list.
Z-order (Morton order, or Morton code)*
space partitioning or binary space partitioning data structure. a function which maps multidimensional data to one dimension while preserving locality of the data points. It was introduced in 1966 by G. M. Morton.[1] The z-value of a point in multidimensions is simply calculated by interleaving (an interleave sequence is obtained by merging or shuffling two sequences) the binary representations of its coordinate values. Once the data are sorted into this ordering, any one-dimensional data structure can be used such as binary search trees, B-trees, skip lists or (with low significant bits truncated) hash tables. The resulting ordering can equivalently be described as the order one would get from a depth-first traversal of a quadtree. -The figure below shows the Z-values for the two dimensional case with integer coordinates 0 ≤ x ≤ 7, 0 ≤ y ≤ 7 (shown both in decimal and binary). Interleaving the binary coordinate values yields binary z-values as shown. Connecting the z-values in their numerical order produces the recursively Z-shaped curve. Two-dimensional Z-values are also called quadkeys. |applications| -linear algebra -texture mapping (Some GPUs store texture maps in Z-order to increase spatial locality of reference during texture mapped rasterization)
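The bit-interleaving step can be sketched with the standard "bit spreading" trick; this is an illustrative 2-D implementation for 16-bit coordinates (names are my own):

```java
public class Morton {
    // Interleave the low 16 bits of x and y into a 32-bit Z-value:
    // bit i of x lands at bit 2i, bit i of y at bit 2i+1.
    public static int zValue(int x, int y) {
        return spread(x) | (spread(y) << 1);
    }

    // Spread the low 16 bits of v so they occupy the even bit positions,
    // doubling the gap between bits at each masking step.
    private static int spread(int v) {
        v &= 0xFFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }
}
```

With this convention, (x=7, y=7) maps to z=63, matching the corner of the 8×8 grid described in the card.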
Implicit kd-tree
space partitioning or binary space partitioning data structure. a k-d tree defined by an implicit splitting function rather than an explicitly-stored set of splits -a k-d tree defined implicitly above a rectilinear grid. Its split planes' positions and orientations are not given explicitly but implicitly by some recursive splitting-function defined on the hyperrectangles belonging to the tree's nodes. Each inner node's split plane is positioned on a grid plane of the underlying grid, partitioning the node's grid into two subgrids. -Implicit max-k-d trees are used for ray casting isosurfaces/MIP (maximum intensity projection). The attribute assigned to each inner node is the maximal scalar value given in the subgrid belonging to the node. Nodes are not traversed if their scalar values are smaller than the searched iso-value/current maximum intensity along the ray. The low storage requirements of the implicit max kd-tree and the favorable visualization complexity of ray casting make it possible to ray cast (and even change the isosurface for) very large scalar fields at interactive framerates on commodity PCs. Similarly, an implicit min/max kd-tree may be used to efficiently evaluate queries such as terrain line of sight.
Min/max kd-tree
space partitioning or binary space partitioning data structure. a k-d tree that associates a minimum and maximum value with each of its nodes -a k-d tree with two scalar values - a minimum and a maximum - assigned to its nodes. The minimum/maximum of an inner node is equal to the minimum/maximum of its children's minima/maxima. |Construction| Min/max kd-trees may be constructed recursively. Starting with the root node, the splitting plane orientation and position is evaluated. Then the children's splitting planes and min/max values are evaluated recursively. The min/max value of the current node is simply the minimum/maximum of its children's minima/maxima. |Properties| The min/max kd-tree has - besides the properties of a kd-tree - the special property that each of an inner node's min/max values coincides with a min/max value of one of its children. This makes it possible to discard the storage of min/max values at the leaf nodes by storing two bits at inner nodes that assign the min/max values to the children: each inner node's min/max values are then known in advance, with the root node's min/max values stored separately. Each inner node carries, besides its two min/max values, two bits defining to which child those min/max values are assigned (0: to the left child, 1: to the right child). The non-assigned min/max values of the children are the min/max values already known from the current node. The two bits may also be stored in the least significant bits of the min/max values, which then have to be approximated by rounding them down/up. -The resulting memory reduction is not minor, as the leaf nodes of full binary kd-trees are one half of the tree's nodes.
R+ tree
space partitioning or binary space partitioning data structure. a method for looking up data using a location, often (x, y) coordinates, and often for locations on the surface of the earth. Searching on one number is a solved problem; searching on two or more, and asking for locations that are nearby in both x and y directions, requires craftier algorithms. -Fundamentally, an R+ tree is a tree data structure, a variant of the R tree, used for indexing spatial information. |Difference between R+ trees and R trees| -R+ trees are a compromise between R-trees and kd-trees: they avoid overlapping of internal nodes by inserting an object into multiple leaves if necessary. Coverage is the entire area to cover all related rectangles. Overlap is the entire area which is contained in two or more nodes.[1] Minimal coverage reduces the amount of "dead space" (empty area) which is covered by the nodes of the R-tree. Minimal overlap reduces the set of search paths to the leaves (even more critical for the access time than minimal coverage). Efficient search requires minimal coverage and overlap. R+ trees differ from R trees in that: nodes are not guaranteed to be at least half filled, the entries of any internal node do not overlap, and an object ID may be stored in more than one leaf node. |Advantages| Because nodes are not overlapped with each other, point query performance benefits since all spatial regions are covered by at most one node. A single path is followed and fewer nodes are visited than with the R-tree |Disadvantages| Since rectangles are duplicated, an R+ tree can be larger than an R tree built on same data set. Construction and maintenance of R+ trees is more complex than the construction and maintenance of R trees and other variants of the R tree.
BK-tree
space partitioning or binary space partitioning data structure. a metric tree suggested by Walter Austin Burkhard and Robert M. Keller[1] specifically adapted to discrete metric spaces. For simplicity, let us consider integer discrete metric d(x,y). Then, BK-tree is defined in the following way. An arbitrary element a is selected as root node. The root node may have zero or more subtrees. The k-th subtree is recursively built of all elements b such that d(a,b)=k. BK-trees can be used for approximate string matching in a dictionary
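A compact sketch of the construction above, using Levenshtein edit distance as the discrete metric d(x,y) and the triangle-inequality pruning rule |k − d| ≤ tol for approximate string matching (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BKTree {
    private final String word;
    private final Map<Integer, BKTree> children = new HashMap<>();

    public BKTree(String root) { word = root; }

    // The k-th subtree holds all elements b with d(word, b) = k.
    public void add(String s) {
        int d = distance(s, word);
        if (d == 0) return;                       // already present
        BKTree child = children.get(d);
        if (child == null) children.put(d, new BKTree(s));
        else child.add(s);
    }

    // Collect all stored words within edit distance tol of the query.
    // Only subtrees with key k satisfying |k - d| <= tol can contain matches.
    public List<String> search(String q, int tol) {
        List<String> out = new ArrayList<>();
        int d = distance(q, word);
        if (d <= tol) out.add(word);
        for (Map.Entry<Integer, BKTree> e : children.entrySet())
            if (Math.abs(e.getKey() - d) <= tol)
                out.addAll(e.getValue().search(q, tol));
        return out;
    }

    // Plain dynamic-programming Levenshtein distance.
    static int distance(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) dp[i][0] = i;
        for (int j = 0; j <= b.length(); j++) dp[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                dp[i][j] = Math.min(
                    dp[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1),
                    Math.min(dp[i - 1][j], dp[i][j - 1]) + 1);
        return dp[a.length()][b.length()];
    }
}
```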
Bounding interval hierarchy (BIH)
space partitioning or binary space partitioning data structure. a partitioning data structure similar to that of bounding volume hierarchies or kd-trees. Bounding interval hierarchies can be used in high performance (or real-time) ray tracing and may be especially useful for dynamic scenes. -The BIH was first presented under the name of SKD-Trees,[1] presented by Ooi et al., and BoxTrees,[2] independently invented by Zachmann. Bounding interval hierarchies (BIH) exhibit many of the properties of both bounding volume hierarchies (BVH) and kd-trees. Whereas the construction and storage of BIH is comparable to that of BVH, the traversal of BIH resemble that of kd-trees. Furthermore, BIH are also binary trees just like kd-trees (and in fact their superset, BSP trees). Finally, BIH are axis-aligned as are its ancestors. Although a more general non-axis-aligned implementation of the BIH should be possible (similar to the BSP-tree, which uses unaligned planes), it would almost certainly be less desirable due to decreased numerical stability and an increase in the complexity of ray traversal. -The key feature of the BIH is the storage of 2 planes per node (as opposed to 1 for the kd tree and 6 for an axis aligned bounding box hierarchy), which allows for overlapping children (just like a BVH), but at the same time featuring an order on the children along one dimension/axis (as it is the case for kd trees). -It is also possible to just use the BIH data structure for the construction phase but traverse the tree in a way a traditional axis aligned bounding box hierarchy does. This enables some simple speed up optimizations for large ray bundles [3] while keeping memory/cache usage low. 
-Some general attributes of bounding interval hierarchies (and techniques related to BIH) as described by [4] are: Very fast construction times Low memory footprint Simple and fast traversal Very simple construction and traversal algorithms High numerical precision during construction and traversal Flatter tree structure (decreased tree depth) compared to kd-trees
Kd-tree (short for k-dimensional tree)*
space partitioning or binary space partitioning data structure. a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest neighbor searches). k-d trees are a special case of binary space partitioning trees. -The k-d tree is a binary tree in which every node is a k-dimensional point. Every non-leaf node can be thought of as implicitly generating a splitting hyperplane that divides the space into two parts, known as half-spaces. Points to the left of this hyperplane are represented by the left subtree of that node and points right of the hyperplane are represented by the right subtree. The hyperplane direction is chosen in the following way: every node in the tree is associated with one of the k-dimensions, with the hyperplane perpendicular to that dimension's axis. So, for example, if for a particular split the "x" axis is chosen, all points in the subtree with a smaller "x" value than the node will appear in the left subtree and all points with larger "x" value will be in the right subtree. In such a case, the hyperplane would be set by the x-value of the point, and its normal would be the unit x-axis
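A minimal 2-d sketch of the idea in Python: build by median split on the cycling axis, then answer nearest-neighbor queries, crossing the splitting hyperplane only when the far half-space could still hold a closer point. Names and structure are illustrative, not from any particular library.

```python
# Toy k-d tree: median-split construction plus nearest-neighbor search.

class Node:
    def __init__(self, point, axis, left, right):
        self.point, self.axis, self.left, self.right = point, axis, left, right

def build(points, depth=0, k=2):
    if not points:
        return None
    axis = depth % k                      # cycle through the k dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                # median becomes the splitting point
    return Node(points[mid], axis,
                build(points[:mid], depth + 1, k),
                build(points[mid + 1:], depth + 1, k))

def nearest(node, target, best=None):
    if node is None:
        return best
    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(p, target))
    if best is None or dist2(node.point) < dist2(best):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Only cross the splitting hyperplane if the other half-space
    # could still contain a strictly closer point.
    if diff ** 2 < dist2(best):
        best = nearest(far, target, best)
    return best
```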
Segment tree
space partitioning or binary space partitioning data structure. a tree data structure for storing intervals, or segments. It allows querying which of the stored segments contain a given point. It is, in principle, a static structure; that is, its structure cannot be modified once it is built. A similar data structure is the interval tree. -A segment tree for a set I of n intervals uses O(n log n) storage and can be built in O(n log n) time. Segment trees support searching for all the intervals that contain a query point in O(log n + k), k being the number of retrieved intervals or segments.[1] -Applications of the segment tree are in the areas of computational geometry, and geographic information systems. -The segment tree can be generalized to higher dimension spaces as well.
Octree*
space partitioning or binary space partitioning data structure. a tree data structure in which each internal node has exactly eight children. Octrees are most often used to partition a three dimensional space by recursively subdividing it into eight octants. Octrees are the three-dimensional analog of quadtrees. The name is formed from oct + tree, but note that it is normally written "octree" with only one "t". Octrees are often used in 3D graphics and 3D game engines. -Each node in an octree subdivides the space it represents into eight octants. In a point region (PR) octree, the node stores an explicit 3-dimensional point, which is the "center" of the subdivision for that node; the point defines one of the corners for each of the eight children. In a matrix based (MX) octree, the subdivision point is implicitly the center of the space the node represents. The root node of a PR octree can represent infinite space; the root node of an MX octree must represent a finite bounded space so that the implicit centers are well-defined. Note that octrees are not the same as k-d trees: k-d trees split along a dimension and octrees split around a point. Also k-d trees are always binary, which is not the case for octrees. Using a depth-first search, the nodes are traversed and only the required surfaces are viewed.
Quadtree*
space partitioning or binary space partitioning data structure. a tree data structure in which each internal node has exactly four children. Quadtrees are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The regions may be square or rectangular, or may have arbitrary shapes. This data structure was named a quadtree by Raphael Finkel and J.L. Bentley in 1974. A similar partitioning is also known as a Q-tree. -All forms of quadtrees share some common features: 1.They decompose space into adaptable cells 2.Each cell (or bucket) has a maximum capacity. When maximum capacity is reached, the bucket splits 3.The tree directory follows the spatial decomposition of the quadtree. -Quadtrees are the two-dimensional analog of octrees. -some common quadtrees: region quadtree, point quadtree, edge quadtree, polygonal map quadtree -some common uses of quadtrees: image representation; spatial indexing; efficient collision detection in two dimensions; view frustum culling of terrain data; storing sparse data, such as formatting information for a spreadsheet[2] or for some matrix calculations; solution of multidimensional fields (computational fluid dynamics, electromagnetism); Conway's Game of Life simulation programs;[3] state estimation;[4] fractal image analysis; maximum disjoint sets
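The bucket-splitting behavior (a cell splits once its capacity is reached) can be sketched as a toy Python point quadtree; the capacity, class name, and method names are assumptions for the example.

```python
# Toy point quadtree with bucket capacity: a cell stores points until it is
# full, then splits into four equal sub-cells and pushes its points down.

class Quadtree:
    CAPACITY = 4

    def __init__(self, x, y, w, h):
        self.x, self.y, self.w, self.h = x, y, w, h   # cell: [x, x+w) x [y, y+h)
        self.points = []
        self.children = None                          # four sub-cells once split

    def insert(self, px, py):
        if not (self.x <= px < self.x + self.w and self.y <= py < self.y + self.h):
            return False                              # point lies outside this cell
        if self.children is None:
            if len(self.points) < self.CAPACITY:
                self.points.append((px, py))
                return True
            self._split()
        return any(c.insert(px, py) for c in self.children)

    def _split(self):
        hw, hh = self.w / 2, self.h / 2
        self.children = [Quadtree(self.x,      self.y,      hw, hh),
                         Quadtree(self.x + hw, self.y,      hw, hh),
                         Quadtree(self.x,      self.y + hh, hw, hh),
                         Quadtree(self.x + hw, self.y + hh, hw, hh)]
        for p in self.points:                         # push stored points down
            any(c.insert(*p) for c in self.children)
        self.points = []

    def query(self, qx, qy, qw, qh):
        """All stored points inside the rectangle [qx, qx+qw) x [qy, qy+qh)."""
        if qx + qw <= self.x or self.x + self.w <= qx or \
           qy + qh <= self.y or self.y + self.h <= qy:
            return []                                 # rectangles do not overlap
        found = [p for p in self.points
                 if qx <= p[0] < qx + qw and qy <= p[1] < qy + qh]
        for c in self.children or []:
            found.extend(c.query(qx, qy, qw, qh))
        return found
```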
Interval tree
space partitioning or binary space partitioning data structure. a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree. -The trivial solution is to visit each interval and test whether it intersects the given point or interval, which requires O(n) time, where n is the number of intervals in the collection. Since a query may return all intervals, for example if the query is a large interval intersecting all intervals in the collection, this is asymptotically optimal; however, we can do better by considering output-sensitive algorithms, where the runtime is expressed in terms of m, the number of intervals produced by the query. Interval trees have a query time of O(log n + m) and an initial creation time of O(n log n), while limiting memory consumption to O(n). After creation, interval trees may be dynamic, allowing efficient insertion and deletion of an interval in O(log n). If the endpoints of intervals are within a small integer range (e.g., in the range [1,...,O(n)]), faster data structures exist[1] with preprocessing time O(n) and query time O(1+m) for reporting m intervals containing a given query point
R* tree*
space partitioning or binary space partitioning data structure. a variant of R-trees used for indexing spatial information. R*-trees have slightly higher construction cost than standard R-trees, as the data may need to be reinserted; but the resulting tree will usually have better query performance. Like the standard R-tree, it can store both point and spatial data. It was proposed by Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger in 1990 |Difference between R* Trees and R Trees| Minimization of both coverage and overlap is crucial to the performance of R-trees. Overlap means that, on data query or insertion, more than one branch of the tree needs to be expanded (due to the way data is being split in regions which may overlap). A minimized coverage improves pruning performance, allowing whole pages to be excluded from search more often, in particular for negative range queries. The R*-tree attempts to reduce both, using a combination of a revised node split algorithm and the concept of forced reinsertion at node overflow. This is based on the observation that R-tree structures are highly susceptible to the order in which their entries are inserted, so an insertion-built (rather than bulk-loaded) structure is likely to be sub-optimal. Deletion and reinsertion of entries allow them to "find" a place in the tree that may be more appropriate than their original location. -When a node overflows, a portion of its entries are removed from the node and reinserted into the tree. (In order to avoid an indefinite cascade of reinsertions caused by subsequent node overflow, the reinsertion routine may be called only once in each level of the tree when inserting any one new entry.) This has the effect of producing more well-clustered groups of entries in nodes, reducing node coverage. Furthermore, actual node splits are often postponed, causing average node occupancy to rise. 
Re-insertion can be seen as a method of incremental tree optimization triggered on node overflow. |Performance| -Improved split heuristic produces pages that are more rectangular and thus better for many applications. -Reinsertion method optimizes the existing tree, but increases complexity. -Efficiently supports point and spatial data at the same time.
Hilbert R-tree*
space partitioning or binary space partitioning data structure. an R-tree variant, is an index for multidimensional objects like lines, regions, 3-D objects, or high-dimensional feature-based parametric objects. It can be thought of as an extension to B+-tree for multidimensional objects. -The performance of R-trees depends on the quality of the algorithm that clusters the data rectangles on a node. Hilbert R-trees use space-filling curves, and specifically the Hilbert curve, to impose a linear ordering on the data rectangles. |Hilbert curve or Hilbert space-filling curve| a continuous fractal space-filling curve first described by the German mathematician David Hilbert in 1891,[1] as a variant of the space-filling Peano curves discovered by Giuseppe Peano in 1890.[2] -Because it is space-filling, its Hausdorff dimension is 2 (precisely, its image is the unit square, whose dimension is 2 in any definition of dimension; its graph is a compact set homeomorphic to the closed unit interval, with Hausdorff dimension 2). -H_{n} is the nth approximation to the limiting curve. The Euclidean length of H_{n} is 2^{n} - (1/(2^{n})), i.e., it grows exponentially with n, while at the same time always being bounded by a square with a finite area. |Back to Hilbert R-Trees| -There are two types of Hilbert R-trees, one for static databases, and one for dynamic databases. In both cases Hilbert space-filling curves are used to achieve better ordering of multidimensional objects in the node. This ordering has to be 'good', in the sense that it should group 'similar' data rectangles together, to minimize the area and perimeter of the resulting minimum bounding rectangles (MBRs). Packed Hilbert R-trees are suitable for static databases in which updates are very rare or in which there are no updates at all. -The dynamic Hilbert R-tree is suitable for dynamic databases where insertions, deletions, or updates may occur in real time. 
Moreover, dynamic Hilbert R-trees employ a flexible deferred-splitting mechanism to increase space utilization. The Hilbert R-tree sorts rectangles according to the Hilbert value of the center of the rectangles (i.e., of the MBR). (The Hilbert value of a point is the length of the Hilbert curve from the origin to the point.) Given this ordering, every node has a well-defined set of sibling nodes; thus, deferred splitting can be used, and by adjusting the split policy the Hilbert R-tree can achieve a degree of space utilization as high as desired. By contrast, other R-tree variants have no control over space utilization.
Rapidly exploring random tree (RRT)
space partitioning or binary space partitioning data structure. an algorithm designed to efficiently search nonconvex, high-dimensional spaces by randomly building a space-filling tree. The tree is constructed incrementally from samples drawn randomly from the search space and is inherently biased to grow towards large unsearched areas of the problem. RRTs were developed by Steven M. LaValle and James J. Kuffner Jr. [1] .[2] They easily handle problems with obstacles and differential constraints (nonholonomic and kinodynamic) and have been widely used in autonomous robotic path planning. -RRTs can be viewed as a technique to generate open-loop trajectories for nonlinear systems with state constraints. An RRT can also be considered as a Monte-Carlo method to bias search into the largest Voronoi regions of a graph in a configuration space. Some variations can even be considered stochastic fractals. -An RRT grows a tree rooted at the starting configuration by using random samples from the search space. As each sample is drawn, a connection is attempted between it and the nearest state in the tree. If the connection is feasible (passes entirely through free space and obeys any constraints), this results in the addition of the new state to the tree. With uniform sampling of the search space, the probability of expanding an existing state is proportional to the size of its Voronoi region. As the largest Voronoi regions belong to the states on the frontier of the search, this means that the tree preferentially expands towards large unsearched areas. -The length of the connection between the tree and a new state is frequently limited by a growth factor. If the random sample is further from its nearest state in the tree than this limit allows, a new state at the maximum distance from the tree along the line to the random sample is used instead of the random sample itself. 
The random samples can then be viewed as controlling the direction of the tree growth while the growth factor determines its rate. This maintains the space-filling bias of the RRT while limiting the size of the incremental growth. -RRT growth can be biased by increasing the probability of sampling states from a specific area. Most practical implementations of RRTs make use of this to guide the search towards the planning problem goals. This is accomplished by introducing a small probability of sampling the goal to the state sampling procedure. The higher this probability, the more greedily the tree grows towards the goal.
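The growth rule described above (random sample, nearest tree node, step of bounded length toward the sample) fits in a short sketch. The bounds, step size, and seed are arbitrary choices for the illustration; no obstacles or goal bias are modeled.

```python
# Bare-bones RRT in a 2-D obstacle-free square: grow a tree from `start` by
# repeatedly sampling the space, finding the nearest existing node, and
# stepping at most `step` toward the sample (the growth factor).
import math
import random

def rrt(start, n_samples, step=0.5, bounds=(0.0, 10.0), seed=42):
    rng = random.Random(seed)
    nodes = [start]
    parent = {start: None}              # tree edges, child -> parent
    for _ in range(n_samples):
        sample = (rng.uniform(*bounds), rng.uniform(*bounds))
        near = min(nodes, key=lambda p: math.dist(p, sample))
        d = math.dist(near, sample)
        if d == 0:
            continue
        # Cap the new edge at `step`, pointing from `near` toward the sample.
        t = min(1.0, step / d)
        new = (near[0] + t * (sample[0] - near[0]),
               near[1] + t * (sample[1] - near[1]))
        nodes.append(new)
        parent[new] = near
    return nodes, parent
```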
Range tree
space partitioning or binary space partitioning data structure. an ordered tree data structure to hold a list of points. It allows all points within a given range to be reported efficiently, and is typically used in two or higher dimensions. Range trees were introduced by Jon Louis Bentley in 1979.[1] Similar data structures were discovered independently by Lueker,[2] Lee and Wong,[3] and Willard.[4] The range tree is an alternative to the k-d tree. Compared to k-d trees, range trees offer faster query times of (in Big O notation) O(log^d n + k) but worse storage of O(n log^(d-1) n), where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query. -Bernard Chazelle improved this to query time O(log^(d-1) n + k) and space complexity O(n (log n / log log n)^(d-1))
Metric tree
space partitioning or binary space partitioning data structure. any tree data structure specialized to index data in metric spaces. Metric trees exploit properties of metric spaces such as the triangle inequality to make accesses to the data more efficient. Examples include the M-tree, vp-trees, cover trees, MVP Trees, and BK-trees -If there is no structure to the similarity measure then a brute force search requiring the comparison of the query image to every image in the dataset is the best that can be done. If, however, the similarity function satisfies the triangle inequality then it is possible to use the result of each comparison to prune the set of candidates to be examined. |Triangle Inequality| states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side |Back to metric tree| -The first article on metric trees, as well as the first use of the term "metric tree", published in the open literature was by Jeffrey Uhlmann in 1991.[2] Other researchers were working independently on similar data structures. In particular, Peter Yianilos claimed to have independently discovered the same method, which he called a vantage point tree (VP-tree).[3] The research on metric tree data structures blossomed in the late 1990s and included an examination by Google co-founder Sergey Brin of their use for very large databases.[4] The first textbook on metric data structures was published in 2006
M-tree
space partitioning or binary space partitioning data structure. tree data structures that are similar to R-trees and B-trees. It is constructed using a metric and relies on the triangle inequality for efficient range and k-nearest neighbor (k-NN) queries. While M-trees can perform well in many conditions, the tree can also have large overlap and there is no clear strategy on how to best avoid overlap. In addition, it can only be used for distance functions that satisfy the triangle inequality, while many advanced dissimilarity functions used in information retrieval do not satisfy this
Mergesort
split into sub-arrays, sort each recursively, then merge Best: O(n log n) Avg: O(n log n) Worst: O(n log n)
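A minimal top-down implementation sketch in plain Python, returning a new list rather than sorting in place:

```python
# Textbook mergesort: split in half, sort each half recursively, then merge.
def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Merge step: repeatedly take the smaller front element of the two halves.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]   # append whichever half has leftovers
```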
Hash function
takes an object and tells you where to put it.
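For instance, a toy polynomial string hash reduced modulo the table size; the multiplier 31 and the table sizes below are arbitrary choices for the example.

```python
# Toy hash function: mix each character into a running polynomial value,
# reduced modulo the table size so the result is a valid bucket index.
def string_hash(key, table_size):
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % table_size
    return h                              # index in [0, table_size)
```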
Key/value pair
the key and its associated data
Null Path length
the length of the shortest path from X to a node without two children
percolate Up
the new element is percolated up the heap until correct location is found
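A sketch of percolate-up for an array-based min-heap (the helper name is made up; the parent of index i lives at index (i - 1) // 2):

```python
# Min-heap insert: append the new value at the end of the array, then swap
# it with its parent until the heap-order property holds again.
def heap_insert(heap, value):
    heap.append(value)
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break                    # correct location found
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent
```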
What is the problem size in algorithm analysis?
the number of elements; think of it as n.
Root
the starting node in a rooted tree structure, from which all other nodes branch off
applications of graphs
topological sorting; spanning trees; minimum spanning trees; shortest path; circuits;
B-trie
tree data structure where each tree node compares a bit slice of key values. Several trie variants are suitable for maintaining sets of strings in external memory, including suffix trees. A trie/B-tree combination called the B-trie has also been suggested for this task; compared to suffix trees, they are limited in the supported operations but also more compact, while performing update operations faster
Compressed suffix array
tree data structure where each tree node compares a bit slice of key values. a compressed data structure for pattern matching. Compressed suffix arrays are a general class of data structure that improve on the suffix array.[1][2] These data structures enable quick search for an arbitrary string with a comparatively small index. -Given a text T of n characters from an alphabet Σ, a compressed suffix array supports searching for arbitrary patterns in T. For an input pattern P of m characters, the search time is typically O(m) or O(m + log n). The space used is typically O(n·H_k(T)) + o(n), where H_k(T) is the k-th order empirical entropy of the text T. The time and space to construct a compressed suffix array are normally O(n). -The original instantiation of the compressed suffix array[1] solved a long-standing open problem by showing that fast pattern matching was possible using only a linear-space data structure, namely, one proportional to the size of the text T, which takes O(n log |Σ|) bits. The conventional suffix array and suffix tree use Ω(n log n) bits, which is substantially larger. The basis for the data structure is a recursive decomposition using the "neighbor function," which allows a suffix array to be represented by one of half its length. The construction is repeated multiple times until the resulting suffix array uses a linear number of bits. Following work showed that the actual storage space was related to the zeroth-order entropy and that the index supports self-indexing.[4] The space bound was further improved, achieving the ultimate goal of higher-order entropy; the compression is obtained by partitioning the neighbor function by high-order contexts, and compressing each partition with a wavelet tree.[3] The space usage is extremely competitive in practice with other state-of-the-art compressors,[5] and it also supports fast pattern matching. 
-The memory accesses made by compressed suffix arrays and other compressed data structures for pattern matching are typically not localized, and thus these data structures have been notoriously hard to design efficiently for use in external memory. Recent progress using geometric duality takes advantage of the block access provided by disks to speed up the I/O time significantly[6] In addition, potentially practical search performance for a compressed suffix array in external-memory has been demonstrated.
FM-index
tree data structure where each tree node compares a bit slice of key values. a compressed full-text substring index based on the Burrows-Wheeler transform, with some similarities to the suffix array. It was created by Paolo Ferragina and Giovanni Manzini,[1] who describe it as an opportunistic data structure as it allows compression of the input text while still permitting fast substring queries. The name stands for Full-text index in Minute space.[2] It can be used to efficiently find the number of occurrences of a pattern within the compressed text, as well as locate the position of each occurrence. Both the query time and storage space requirements are sublinear with respect to the size of the input data. The original authors have devised improvements to their original approach and dubbed it "FM-Index version 2".[3] A further improvement, the alphabet-friendly FM-index, combines the use of compression boosting and wavelet trees [4] to significantly reduce the space usage for large alphabets. The FM-index has found use in, among other places, bioinformatics.[5]
Suffix tree (also called PAT tree or, in an earlier form, position tree)
tree data structure where each tree node compares a bit slice of key values. a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particularly fast implementations of many important string operations. The construction of such a tree for the string S takes time and space linear in the length of S. Once constructed, several operations can be performed quickly, for instance locating a substring in S, locating a substring if a certain number of mistakes are allowed, locating matches for a regular expression pattern etc. Suffix trees also provide one of the first linear-time solutions for the longest common substring problem. These speedups come at a cost: storing a string's suffix tree typically requires significantly more space than storing the string itself.
X-fast trie
tree data structure where each tree node compares a bit slice of key values. a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n log M) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982,[1] along with the more complicated y-fast trie, as a way to improve the space usage of van Emde Boas trees, while retaining the O(log log M) query time.
Y-fast trie
tree data structure where each tree node compares a bit slice of key values. a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982[1] to decrease the O(n log M) space used by an x-fast trie.
Judy array
tree data structure where each tree node compares a bit slice of key values. a data structure that has high performance, low memory usage and implements an associative array. Unlike normal arrays, Judy arrays may be sparse, that is, they may have large ranges of unassigned indices. They can be used for storing and looking up values using integer or string keys. The key benefits of using a Judy array are its scalability, high performance, memory efficiency and ease of use.[1] -Judy arrays are both speed- and memory-efficient, and therefore they can sometimes replace common in-memory dictionary implementations (like red-black trees or hash tables). -Roughly speaking, Judy arrays are highly optimized 256-ary radix trees.[2] Judy arrays use over 20 different compression techniques on trie nodes to reduce memory usage. -The Judy array was invented by Douglas Baskins and named after his sister.
Radix tree (also radix trie or compact prefix tree)
tree data structure where each tree node compares a bit slice of key values. a data structure that represents a space-optimized trie in which each node that is the only child is merged with its parent. The result is that the number of children of every internal node is at least the radix r of the radix tree, where r is a positive integer and a power x of 2, having x ≥ 1. Unlike in regular tries, edges can be labeled with sequences of elements as well as single elements. This makes radix trees much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes. -Unlike regular trees (where whole keys are compared en masse from their beginning up to the point of inequality), the key at each node is compared chunk-of-bits by chunk-of-bits, where the quantity of bits in that chunk at that node is the radix r of the radix trie. When the r is 2, the radix trie is binary (i.e., compare that node's 1-bit portion of the key), which minimizes sparseness at the expense of maximizing trie depth—i.e., maximizing up to conflation of nondiverging bit-strings in the key. When r is an integer power of 2 greater or equal to 4, then the radix trie is an r-ary trie, which lessens the depth of the radix trie at the expense of potential sparseness. -As an optimization, edge labels can be stored in constant size by using two pointers to a string (for the first and last elements).[1] -Note that although the examples in this article show strings as sequences of characters, the type of the string elements can be chosen arbitrarily; for example, as a bit or byte of the string representation when using multibyte character encodings or Unicode.
Right propagate the rightmost set bit in x
x | (x - 1), or equivalently x | ((x & ~(x - 1)) - 1); the inner parentheses are required, since - binds tighter than &
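A quick sanity check of the identity in Python, where ~(x - 1) == -x, so x & ~(x - 1) isolates the rightmost set bit; subtracting 1 yields the mask of lower bits, and OR-ing sets them. The one-step form x | (x - 1) does the same thing.

```python
# Right-propagate the rightmost set bit: turn on every bit below it.
def right_propagate(x):
    return x | (x - 1)

def right_propagate_long(x):
    # Same result via the longer identity; the inner parentheses matter,
    # because - binds tighter than & in both C and Python.
    return x | ((x & ~(x - 1)) - 1)
```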
Merkle Tree (hash tree)
tree data structure where each tree node compares a bit slice of key values. a tree in which every non-leaf node is labelled with the hash of the labels or values (in case of leaves) of its child nodes. Hash trees allow efficient and secure verification of the contents of large data structures. Hash trees are a generalization of hash lists and hash chains. -Demonstrating that a leaf node is a part of the given hash tree requires processing an amount of data proportional to the logarithm of the number of nodes of the tree;[1] this contrasts with hash lists, where the amount is proportional to the number of nodes. -The concept of hash trees is named after Ralph Merkle who patented it in 1979
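A minimal Merkle-root sketch using SHA-256. Pairing an odd leftover node with itself is one common convention, not a universal rule; real systems (Bitcoin, Certificate Transparency, etc.) each fix their own padding and domain-separation rules.

```python
# Toy Merkle root: hash each leaf, then repeatedly hash the concatenation of
# each pair of child labels until a single root label remains.
import hashlib

def sha256(data):
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [sha256(leaf) for leaf in leaves]   # labels of the leaf nodes
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])             # duplicate the odd node
        # Each parent's label is the hash of its two children's labels.
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

Changing any leaf changes the root, which is what makes efficient verification of large data structures possible.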
B-trees*
tree data structure. a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children (Comer 1979, p. 123). Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. B-trees are a good example of a data structure for external memory. It is commonly used in databases and filesystems.
Heaps
tree data structure. a specialized tree-based data structure that satisfies the heap property: If A is a parent node of B then the key (the value) of node A is ordered with respect to the key of node B with the same ordering applying across the heap. A heap can be classified further as either a "max heap" or a "min heap". In a max heap, the keys of parent nodes are always greater than or equal to those of the children and the highest key is in the root node. In a min heap, the keys of parent nodes are less than or equal to those of the children and the lowest key is in the root node. Heaps are crucial in several efficient graph algorithms such as Dijkstra's algorithm, and in the sorting algorithm heapsort. A common implementation of a heap is the binary heap, in which the tree is a complete binary tree (see figure). -In a heap, the highest (or lowest) priority element is always stored at the root. A heap is not a sorted structure and can be regarded as partially ordered. As visible from the heap-diagram, there is no particular relationship among nodes on any given level, even among the siblings. When a heap is a complete binary tree, it has a smallest possible height—a heap with N nodes always has log N height. A heap is a useful data structure when you need to remove the object with the highest (or lowest) priority. -Note that, as shown in the graphic, there is no implied ordering between siblings or cousins and no implied sequence for an in-order traversal (as there would be in, e.g., a binary search tree). The heap relation mentioned above applies only between nodes and their parents, grandparents, etc. The maximum number of children each node can have depends on the type of heap, but in many types it is at most two, which is known as a binary heap. -The heap is one maximally efficient IMPLEMENTATION of an abstract data type called a priority queue, and in fact priority queues are often referred to as "heaps", regardless of how they may be implemented. 
-A heap data structure should not be confused with the heap which is a common name for the pool of memory from which dynamically allocated memory is allocated. The term was originally used only for the data structure. -Types: binary heap, weak heap, binomial heap, Fibonacci heap, AF-heap, Leonardo heap, 2-3 heap, soft heap, pairing heap, leftist heap, treap, beap, skew heap, ternary heap, d-ary heap, Brodal queue
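The priority-queue view of a heap can be demonstrated with Python's heapq module, which maintains a binary min-heap over a plain list, so the lowest key is always popped first:

```python
# heapq keeps the list in heap order; tuples compare by their first element,
# so the integer priority decides the pop order.
import heapq

tasks = [(3, "low"), (1, "urgent"), (2, "normal")]
heapq.heapify(tasks)                     # O(n) bottom-up heap construction
order = [heapq.heappop(tasks)[1] for _ in range(3)]
```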
adjacent nodes
two nodes connected by an edge
hash tables
used to implement a map
Shellsort/ Diminishing increment sort
uses an increment sequence to determine which "Shell" will be sorted
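A sketch using the simple n/2 halving increment sequence from the Shell Sort notes; each pass is an insertion sort over the elements that lie gap positions apart.

```python
# Shellsort: gapped insertion sort, shrinking the gap until it reaches 1,
# at which point the final pass is an ordinary insertion sort.
def shell_sort(a):
    gap = len(a) // 2
    while gap > 0:
        for i in range(gap, len(a)):
            tmp, j = a[i], i
            # Shift larger elements (gap apart) right until tmp fits.
            while j >= gap and a[j - gap] > tmp:
                a[j] = a[j - gap]
                j -= gap
            a[j] = tmp
        gap //= 2
    return a
```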
Literal
value written into program's code
What kind of Collection is Hashing?
value-oriented.
fractional cascading
variation in binary search. a technique to speed up a sequence of binary searches for the same value in a sequence of related data structures. The first binary search in the sequence takes a logarithmic amount of time, as is standard for binary searches, but successive searches in the sequence are faster. The original version of fractional cascading, introduced in two papers by Chazelle and Guibas in 1986 (Chazelle & Guibas 1986a; Chazelle & Guibas 1986b), combined the idea of cascading, originating in range searching data structures of Lueker (1978) and Willard (1978), with the idea of fractional sampling, which originated in Chazelle (1983). Later authors introduced more complex forms of fractional cascading that allow the data structure to be maintained as the data changes by a sequence of discrete insertion and deletion events. -In general, fractional cascading begins with a catalog graph, a directed graph in which each vertex is labeled with an ordered list. A query in this data structure consists of a path in the graph and a query value q; the data structure must determine the position of q in each of the ordered lists associated with the vertices of the path. For the simple example above, the catalog graph is itself a path, with just four nodes. It is possible for later vertices in the path to be determined dynamically as part of a query, in response to the results found by the searches in earlier parts of the path. -To handle queries of this type, for a graph in which each vertex has at most d incoming and at most d outgoing edges for some constant d, the lists associated with each vertex are augmented by a fraction of the items from each outgoing neighbor of the vertex; the fraction must be chosen to be smaller than 1/d, so that the total amount by which all lists are augmented remains linear in the input size. 
Each item in each augmented list stores with it the position of that item in the unaugmented list stored at the same vertex, and in each of the outgoing neighboring lists. In the simple example above, d = 1, and we augmented each list with a 1/2 fraction of the neighboring items. -A query in this data structure consists of a standard binary search in the augmented list associated with the first vertex of the query path, together with simpler searches at each successive vertex of the path. If a 1/r fraction of items are used to augment the lists from each neighboring item, then each successive query result may be found within at most r steps of the position stored at the query result from the previous path vertex, and therefore may be found in constant time without having to perform a full binary search.
Collision
when a hashing algorithm produces the same index for two or more keys
Cycle
when it is possible to return to a given vertex after visiting other vertices
static binding
Without virtual member functions, C++ uses static binding, which happens at compile time.