Data Structures Final Exam
Quadratic Probing
- +-n^2 probing -- +1 from target index -- -1 from target index -- +4 from target index -- -4 from target index -- +9 from target index...
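The probe order on this card can be sketched in a few lines of Python. The function name `quadratic_probe_sequence` and its parameters are made up for illustration, and wrapping with modulo is an assumption about how the table handles offsets that run past either end.

```python
def quadratic_probe_sequence(target, table_size, max_steps):
    """Return the indices probed in the order +1, -1, +4, -4, +9, -9, ..."""
    sequence = []
    for n in range(1, max_steps + 1):
        offset = n * n
        # Python's % always returns a non-negative result, so negative
        # offsets wrap around to the end of the table.
        sequence.append((target + offset) % table_size)
        sequence.append((target - offset) % table_size)
    return sequence


print(quadratic_probe_sequence(5, 11, 3))
```

With a target index of 5 in an 11-slot table, the first probes land on 6, 4, 9, 1, 3, and 7.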
Memory Leak
- A failure in a program to release discarded memory
Perfect Hash
- A hash where 0 collisions occur
Recursion
- A method that calls itself - Base case: causes the recursion to stop - Self-call (maybe indirect) - Production rule: causes us to approach the base case
Functional notation
- A more precise measure of the number of steps/operations that a given algorithm requires to complete
Internal Node
- A node with Children
Minimum Spanning Trees
- A tree that spans a weighted graph such that the sum of the edge weights in the tree is the smallest possible
N-ary Tree
- A tree where each internal node has at most N children
Parity
- Adding extra bit(s) of data to detect data corruption -- Parity Bit: the bit added -- Even Parity: the parity bit is set so that the total number of on bits, parity bit included, is even
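A minimal sketch of even parity, assuming the data is a list of 0/1 integers and a single parity bit is appended at the end. The helper names are hypothetical.

```python
def even_parity_bit(bits):
    """Return the bit that makes the total number of 1s even."""
    return sum(bits) % 2


def passes_even_parity(bits_with_parity):
    """A received word is accepted when its count of 1s is even."""
    return sum(bits_with_parity) % 2 == 0


data = [1, 0, 1, 1, 0, 0, 1]
sent = data + [even_parity_bit(data)]

corrupted = list(sent)
corrupted[2] ^= 1  # flip a single bit in transit

print(passes_even_parity(sent), passes_even_parity(corrupted))
```

A single flipped bit is detected, but two flipped bits cancel out and pass the check, which is why one parity bit only detects an odd number of bit errors.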
Collision Resolution
- Addressing a collision by relocating data elsewhere - Simplest resolution is known as linear probing: we just traverse down the array until we find an available cell, wrapping to the top if needed
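Linear probing could look roughly like the sketch below. It assumes integer keys, a modulo hash, and `None` as the marker for an empty cell; `insert_linear_probe` is an illustrative name, not something from the course.

```python
def insert_linear_probe(table, key):
    """Insert key into the table and return the index where it landed."""
    size = len(table)
    start = key % size  # assumed hash: key modulo table size
    for step in range(size):
        index = (start + step) % size  # wrap to the top when we run off the end
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("table is full")


table = [None] * 7
landed = [insert_linear_probe(table, k) for k in (10, 17, 24, 6, 13)]
print(landed)
```

Here 10, 17, and 24 all hash to index 3, so they settle in 3, 4, and 5; 13 hashes to the occupied index 6 and wraps around to index 0.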
Level in tree
- All nodes that are at the same distance from the root - ie. root@level0, ABCD@level1, EFG@level2
Siblings in tree
- All nodes that have same parent
Children in tree
- All nodes that are linked to by a given node
2-3 B-Tree
- Allows 2 or 3 children and 1 or 2 data points per node - 1st subtree has values less than D[0] - 2nd subtree has values greater than D[0], but less than D[1] - 3rd subtree has values greater than D[1] 1) Find the correct leaf starting at the root by comparing data elements and digging into the branches 2) Add the value in sorted order to that leaf 3) If there isn't room in the leaf, split it and promote the middle value up the tree 4) Keep promoting until there is room
Adjacency List
- An array of linked lists - Each list represents a vertex and vertices it connects to
Algorithm Analysis
- Analyze code to see if its speed and/or memory consumption is optimal for a given problem
Memory Compression
- Analyzing the data and re-encoding it with the goal of reducing space utilization
Sorting
- Applying 'order' to the data - ie. ascending order - ie. descending order
Chaining
- Array of linked lists
Breadth First Search
- Starting at vertex V(0), tries to find V(x) by examining all vertices that are closest to V(0) and then proceeding further away from the start - Use a queue to track unprocessed vertices that we still need to search --- Starting with V(0), enqueue all unknown adjacent vertices of that vertex --- Dequeue and repeat
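The queue-driven process can be sketched as follows, assuming the graph is a dictionary that maps each vertex to a list of adjacent vertices (an adjacency list). The function name `bfs_order` and the sample graph are illustrative.

```python
from collections import deque


def bfs_order(graph, start):
    """Return vertices in the order breadth first search visits them."""
    known = {start}
    queue = deque([start])
    order = []
    while queue:
        vertex = queue.popleft()  # dequeue the oldest unprocessed vertex
        order.append(vertex)
        for neighbor in graph[vertex]:
            if neighbor not in known:  # enqueue only unknown adjacent vertices
                known.add(neighbor)
                queue.append(neighbor)
    return order


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs_order(graph, "A"))
```

Marking a vertex as known at enqueue time, rather than at dequeue time, keeps "D" from being queued twice.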
B-Tree
- Balanced Search Tree - Built bottom up - Non-leaf nodes are allowed to have between m & n children - i.e. 2-3 B-Tree
Balanced Trees
- Being balanced can allow for O(logn) operations
O(1)
- Big oh of one - Constant time
Linked Binary Tree
- Binary means there can be at most 2 children, aka 2-ary tree - Left pointer refers to the left subtree, Right pointer refers to right subtree - maximum leaf count = 2^height
Heap
- Binary tree - Ordered --- Min heap: children contain values greater than or equal to the parent; a max heap is the reverse and keeps larger values above - Heaps are array based - Grow top down --- Fill the tree level by level as it grows and from left to right
RAID 5
- Block level striping with distributed parity - Minimum 3 disks needed, since one drive's worth of capacity goes to parity - Parity allows for one drive to die and for the array to be rebuildable
Tri-Color Garbage Collection
- Break all objects in memory into 3 sets: black, grey, and white - Repeatedly process the objects, tracing through to find everything and clean what trash it finds - White set: Potential trash - Black set: Objects that have no reference to the white set - Grey set: Reachable from the roots and may reference the white set 1) Load grey set with the roots 2) Load white set with all other objects 3) While the grey set is not empty, move an object from grey to black and move the white objects it references into grey 4) Whatever is left in the white set is trash
Folding
- Break the sequence into x parts and add them together
Non-tail Recursion
- Call(s) are embedded in the function body
Hashing
- Chop and Mix - Fast data access - Cryptography - Applying a hash algorithm to a key/data produces a hash value
Hamiltonian Graphs
- Class of graphs where a path can be created from V(0) to V(k) such that all vertices are encountered exactly one time - If V(0) != V(k), it is a hamiltonian trail(path) - If V(0) == V(k), it is a hamiltonian cycle(circuit)
Eulerian Graphs
- Class of graphs where there exist one or more paths such that every edge in the graph is used once - Eulerian cycle exists if the graph is connected and all vertices have even degree - Eulerian path exists if the graph is connected and all vertices have even degree except for exactly two
Cycle Graph
- Cn, n=number of vertices, n>=3 - Circle with every vertex connected to two other vertices
Compression Strategies
- Codewords to be as short as possible - We prefer a one-one mapping - No lookahead, prefix principle: a codeword does not exist as the beginning of another codeword - Capitalize on the frequency of symbols
Graphs
- Collection of points, aka vertices and edges which connect a pair of vertices
Array Memory Addressing
- Contiguously allocated (Single Block) - ie. 10 element array of 4 byte elements is 10 * 4 = 40 bytes
Arrays
- Contiguous chunk of memory - Random access --- Element 5 is just as accessible as element 20
Deleting from BST where the node has 2 children
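- Copy a value close to the deleted value from 1 of the subtrees (the largest value in the left subtree or the smallest value in the right subtree), then delete the node that value came from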
Fragmentation
- Data that is broken apart, not contiguous - Types: Data, internal, and external
Spatial Locality
- Distance based, things that are close to other resources being used are more likely to be used soon - e.g. Hard Disk Drive
Huffman Codes
- Encoding scheme that uses the frequencies of the symbols to build a custom symbol table - Create a forest with one single-node tree for each symbol - Repeatedly collapse the forest by taking the two trees with the lowest frequencies and joining them together, creating a root that holds the sum of the frequencies - After creating the tree, label left branches 0, right branches 1, traverse down from root to each leaf(symbol) building the codeword from these 0s and 1s - Encoded size is equal to codeword length in bits times symbol frequency, summed over all symbols. --Decoding: start at root, if we read 0 traverse left, if 1 traverse right, when we hit a leaf, emit symbol and return to the root and resume reading
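The forest-collapsing procedure can be sketched with a priority queue. This is only one possible arrangement: it assumes a non-empty input string, represents a leaf as a one-character string and an internal node as a `(left, right)` tuple, and decodes with a reverse lookup table instead of walking the tree (equivalent here because no codeword is a prefix of another). All names are illustrative.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Build a symbol -> codeword table from the symbol frequencies in text."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(freq, next(tie), symbol) for symbol, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Join the two lowest-frequency trees under a root holding their sum.
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, str):
            codes[tree] = prefix or "0"  # lone-symbol input still needs 1 bit
        else:
            walk(tree[0], prefix + "0")  # left branch is labeled 0
            walk(tree[1], prefix + "1")  # right branch is labeled 1

    walk(heap[0][2], "")
    return codes


def decode(bits, codes):
    """Emit a symbol each time the bits read so far match a codeword."""
    reverse = {code: symbol for symbol, code in codes.items()}
    output, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            output.append(reverse[current])
            current = ""
    return "".join(output)


text = "aaaabbc"
codes = huffman_codes(text)
encoded = "".join(codes[symbol] for symbol in text)
print(codes, len(encoded))
```

For "aaaabbc" the most frequent symbol gets a 1-bit codeword and the other two get 2 bits each, so the encoded size is 4*1 + 2*2 + 1*2 = 10 bits.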
Depth First Search
- Explores as far as possible along each branch before backtracking - Stack-based
Compaction
- External fragmentation can cause heavy problems with allocations, so we may need to compact memory by moving objects around
Dijkstra's Single Point Shortest Path
- Find the least cost path in a weighted graph where the edges have non-negative weights - Greedy algorithm (when given a choice, pick the best choice at that time) - For a given vertex, look at adjacent vertices to identify possible paths or improvements to existing paths - Choose the next step in the shortest path and repeat the previous step
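A compact sketch of the greedy procedure, assuming the graph is a dictionary mapping each vertex to a list of `(neighbor, weight)` pairs with non-negative weights. Using `heapq` as the priority queue is a choice made for this example, and it returns only the costs, not the paths.

```python
import heapq


def dijkstra(graph, source):
    """Return the least cost from source to every reachable vertex."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, vertex = heapq.heappop(heap)  # greedy: cheapest known vertex next
        if cost > dist.get(vertex, float("inf")):
            continue  # stale entry; a cheaper path was already found
        for neighbor, weight in graph[vertex]:
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost  # improvement to an existing path
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}
print(dijkstra(graph, "A"))
```

In the sample graph the direct edge A->B costs 4, but going through C costs 3, so the path to B is improved after C is processed.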
Best Fit Memory Allocation
- Find the smallest free space, large enough for the allocation request
Dijkstra's Minimum Spanning Tree Algorithm
- For each edge, add it to the MST; if a cycle is created, remove the heaviest edge in that cycle
Dynamic Memory
- Forever: stays allocated until the program explicitly releases it (or a garbage collector reclaims it)
Radix Transformation
- From one base to another - ie. base 10 -> base 16
Building a Sorted Double Linked List
- Generally have two named nodes (Front/End, Head/Tail) 1) Allocate 2) Set Data 3) Set Front 4) Set End 5) Update Head/Tail
Backtracking
- Going back to a previous decision point and making a different choice - i.e. N-Queens - a non-polynomial time problem
Bipartite
- Graph that has two sets of vertices A and B, such that there are only edges between vertices in opposite sets
AVL Tree
- Height Balanced - Binary Search Tree - Allow subtrees to differ in height by no more than 1 - Build it the same way as we build a BST - As we add nodes, we'll check the branch we added to and verify we have height balance; if not, we'll perform 1 or 2 rotations to restore balance --- Height(node) = max(H(Left), H(Right)) + 1 --- Base case: null root => -1
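The height formula and its base case translate directly into a recursive function. The `Node` class and the `is_height_balanced` check below are illustrative helpers, and the rotations themselves are left out of this sketch.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def height(node):
    """max(H(Left), H(Right)) + 1, with a null root counting as -1."""
    if node is None:
        return -1
    return max(height(node.left), height(node.right)) + 1


def is_height_balanced(node):
    """True when no node's subtrees differ in height by more than 1."""
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > 1:
        return False
    return is_height_balanced(node.left) and is_height_balanced(node.right)


balanced = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))  # what inserting 1, 2, 3 into a plain BST gives
print(height(balanced), height(chain))
```

The chain has a left height of -1 and a right height of 1 at its root, a difference of 2, which is the situation a rotation would repair.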
[#]
- Indexing operator - # * sizeof (element) + base address - ie. array[6] is 6 * 4 + 100 = 124
Heap Big-Os
- Insert O(logn) - Building O(nlogn) - Delete O(logn) - Destroy O(nlogn)
Complete Bipartite
- Ka,b - Every vertex in set A connects to every vertex in set B
Big O from functional notation
- Keep only the fastest growing term - Drop coefficient
Complete Graph
- Kn, n=number of vertices, n>=3 - Every vertex is connected to every other vertex
Big Oh Notation
- Least upper bound measure of the time complexity of an algorithm
In-Order Traversal
- Left->mid->right - smallest to largest
Post-Order Traversal
- Left->right->mid
Tree Height
- Length of the longest path in the tree from the root to a leaf
Stack
- Linear - Two basic operations - Push: add data onto the top of the stack - Pop: remove data from the top of the stack --- O(1)
Queues
- Linear data structure - FIFO (First In First Out) or LILO (Last In Last Out) - Enqueue: add data at the end, after the last element - Dequeue: remove data from the front
O(nlogn)
- Linear log
O(n)
- Linear time - As the problem doubles so does the amount of work
O(logn)
- Logarithmic Time - As the problem doubles, it only takes one more unit of time
Run Length Encoding
- Looks for repeated sequences of chars and replaces that sequence with a number and that char/symbol - i.e. AAABBBFFFF -> 3A3B4F - Compression Ratio: (Len(input) - Len(output))/Len(input)
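A short sketch of the encoding and the compression ratio formula from this card. It assumes every run is written as a count followed by the symbol, even for runs of length 1, and that the input contains no digits; the function names are illustrative.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of a repeated symbol with its length and the symbol."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))


def compression_ratio(original, encoded):
    """(Len(input) - Len(output)) / Len(input)"""
    return (len(original) - len(encoded)) / len(original)


source = "AAAABBBCCD"
packed = rle_encode(source)
print(packed, compression_ratio(source, packed))
```

The 10-character input becomes the 8-character string 4A3B2C1D, a ratio of 0.2. Input with few repeats can grow instead, which shows up as a negative ratio.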
First Type of Garbage Collection
- Marking / Tracing Garbage Collection - Starting from the root references (global vars, stack vars, etc.), we trace through all the references that are reachable from those roots. If we can't reach an allocated object, we reclaim it
Base Address
- Memory address of the array, which is the same as its first element's memory address
Pre-Order Traversal
- Mid->left->right - Starting from root to left, print all the passing nodes
RAID 1
- Mirroring, no parity, no striping - No gain on write speed, but some gain in read speed
Multiple Linked Structures
- Node based - 2 or more self-referential links
Link Base N-ary Trees
- Node structure that supports a data element and N pointers
Leaf
- Node with no child
Named Nodes
- Nodes that have pointer(s) assigned to them that are held in the list object
Trees
- Non-linear data structures - Primary node called the root from which we get to all other nodes in the tree
Binary Search
- O(logn) - Divide and conquer algorithm: instead of reducing the problem by 1 on each iteration, we reduce it by a fractional amount (half) - For sorted array
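As an illustration, an iterative binary search over a sorted Python list might look like this. Returning -1 for a missing key is a convention chosen for the example.

```python
def binary_search(sorted_items, key):
    """Return the index of key in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == key:
            return mid
        if sorted_items[mid] < key:
            low = mid + 1   # discard the lower half
        else:
            high = mid - 1  # discard the upper half
    return -1


values = [1, 3, 5, 7, 9, 11]
print(binary_search(values, 7), binary_search(values, 4))
```

Each pass throws away half of the remaining range, which is where the O(logn) bound comes from.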
Linear Search
- O(n) - Typically in terms of worst case - For unsorted array
Selection Sort
- O(n^2) - Repeatedly select the extreme value from the unsorted part of the array and add it to the end of the sorted part of the array - Initially, we have an empty sorted part and the rest is unsorted
Bubble Sort
- O(n^2) - Simplest - Repeatedly compare adjacent elements in the array, if out of order, swap those elements
Insertion Sort
- O(n^2) - Take the 1st element of the unsorted part and insert into its sorted position in the sorted part - Initially, start with the first element as the sorted part, all others the unsorted part
Merge Sort
- O(nlogn) - Recursively divide the array into subarrays that are empty or have a single element and then merge the sorted arrays back together
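A minimal sketch of the divide-then-merge idea. This version returns a new list rather than sorting in place, which is a simplification for readability.

```python
def merge_sort(items):
    """Return a new sorted list built by splitting and merging."""
    if len(items) <= 1:  # empty or single element: already sorted
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two
    merged.extend(right[j:])  # still has elements left
    return merged


print(merge_sort([38, 27, 43, 3, 9, 82, 10]))
```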
Quick Sort
- O(nlogn) average but O(n^2) worst case - Split the data into two parts, one part will have values less than or equal to a 'pivot' value, the second part has the other values - Pivot is one of the elements
Heap Property
- Order of data
Binary Search Tree
- Ordered tree - Left subtree contains values smaller than its parent - Right subtree contains values greater than its parent - Duplicates are generally not allowed - Built top down
Locality of Reference
- Pattern that generally occurs in processing data where resources that were used recently, or that are close to other resources in use, are likely to be used soon - Temporal, Spatial, and Sequential Localities
Minimal Perfect Hash
- Perfect hash with a load factor of 1.00
Linked Base
- Pointer/Reference - One or more that link to another data point/node
Reverse Polish Notation
- Postfix mathematical expression syntax - Uses a stack to hold operands (numbers) - Operators exist in the input AFTER OPERANDS
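A stack-based evaluator for postfix expressions can be sketched as below. It assumes space-separated tokens, the four basic binary operators, and a well-formed expression; `eval_rpn` is an illustrative name.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(expression):
    """Evaluate a space-separated postfix expression and return a float."""
    stack = []
    for token in expression.split():
        if token in OPS:
            right = stack.pop()  # the most recent operand is the right-hand side
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))  # operands wait on the stack
    return stack.pop()


print(eval_rpn("3 4 + 2 *"))
print(eval_rpn("5 1 2 + 4 * + 3 -"))
```

Both sample expressions evaluate to 14.0: the first is (3 + 4) * 2 and the second is 5 + (1 + 2) * 4 - 3.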
Balanced Binary Search Tree
- Prune branches and reattach
Array Based Queue structure
- Move F(Front) and E(End) towards the next array index as data is dequeued and enqueued
O(n^2)
- Quadratic
Garbage Collection
- Reclaim dynamically allocated memory
Excessive Recursion
- Redundant recursion calls
Second Type of Garbage Collection
- Reference Counting Garbage Collection - Keep a table with a list of objects and a count of how many references there are to each; an object whose count reaches 0 can be reclaimed
Array Based Binary Search Tree
- Root at index 0 - Left child at 2 * index + 1 - Right child at 2 * index + 2
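The index arithmetic makes it possible to search a binary search tree stored in a flat list without any pointers. The sketch below assumes empty positions are marked with `None`, and the function names are hypothetical.

```python
def left_child(index):
    return 2 * index + 1


def right_child(index):
    return 2 * index + 2


def contains(tree, key):
    """Search an array-based BST, starting from the root at index 0."""
    index = 0
    while index < len(tree) and tree[index] is not None:
        if key == tree[index]:
            return True
        index = left_child(index) if key < tree[index] else right_child(index)
    return False


#            50
#          /    \
#        30      70
#       /  \    /  \
#     20   40  60   80
tree = [50, 30, 70, 20, 40, 60, 80]
print(contains(tree, 60), contains(tree, 65))
```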
Array Based N-ary Trees
- Root is at element 0 - The next N elements are the children of the root
Binary Search Tree (Searching Process)
- Searching starts at the root - Compare root to search key - Traverse left if key is less than root, right if key is greater, otherwise return that root
Linked Based Queue structure
- Singly Linked - Two named nodes
Star Graph
- Sn, n=number of vertices, n>=4 - Single point connected to all other points
Custom Symbol Tables
- Source Data is Unicode -> saves as ASCII - Read data, for each symbol encountered, add it to custom table
External Fragmentation
- Space that exists between allocated chunks of memory
Adjacency Matrix
- Square matrix of vertices where the elements of the matrix describe the edges - Vertices will be listed in sorted order - i.e. Binary Adjacency Matrix
Linear Data Structures
- Straight, effectively single dimension structure - Commonly implemented with an array
RAID 0
- Striping, no mirroring, no parity - Data is split between drives, writing is faster and reading is faster - Increased risk of data loss
Load Factor
- The number of elements divided by table size - Indicates the probability of a collision on the next insert - i.e. 0.70
Tail Recursion
- The only self-calls are on the last statement of the function
Sequential Locality
- Things are hit in order - eg. Processor's order of operations
Temporal Locality
- Time based, things that were just used are more likely to be used soon than things that have not been accessed in a while - eg. L1, L2, and L3 cache
Collision
- Two different keys yielding the same hash value
Buckets
- Two-dimensional Array - Collisions move to the right and outflow into the neighboring bucket
Floyd's All Pairs Shortest Path
- Use a reference vertex as a jumping point to see shorter paths - Identify the costs but not the actual paths
Double Hash
- Use two hash functions, one for the initial index, a second as an offset for collisions
RAID
- Used for redundancy and/or speed
Worst Fit Memory Allocation
- Utilize the largest free space
Internal Fragmentation
- Wasted space within the data structure itself - i.e. boolean needs 1 bit space but uses 32 bits
First Fit Memory Allocation
- We'll use the 1st free space large enough to service the request
Height Balanced
- Where the difference in the heights of the subtrees is small
Wheel Graph
- Wn, n=number of vertices, n>=4 - Cycle graph + Star graph
Two Dimensional Arrays
- ie. matrix[4][7] is 28 elements - Row Major Order: each row is stored completely before the next row begins --ie. with 2 byte elements and base address 240, matrix[1][5] is (1) * 7 * 2 + (5) * 2 + 240 = 264
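The row major address calculation can be checked with a small helper. The function name and parameter names are made up for this example; the numbers are the ones on the card (7 columns, 2 byte elements, base address 240).

```python
def row_major_address(base, row, col, num_cols, elem_size):
    """Skip `row` full rows, then `col` elements, starting from the base address."""
    return row * num_cols * elem_size + col * elem_size + base


print(row_major_address(base=240, row=1, col=5, num_cols=7, elem_size=2))
```

This prints 264: one full row of 7 elements (14 bytes) plus 5 more elements (10 bytes) past the base address.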
Simplest Hash Algorithm (Modulo)
- key % table size = hash value - Use that value as an index into the array where we want to store the data
Prim's Minimum Spanning Tree Algorithm
1) Choose a vertex, add that to the MST 2) Look at adjacent vertices of that vertex and track their distances to the tree 3) Choose the closest vertex not in the MST and add it to the tree and repeat from 1 with that vertex
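One way to sketch these steps is with a priority queue of candidate edges. The graph format (a dictionary of `(neighbor, weight)` lists for an undirected, connected graph) and the use of `heapq` to find the closest vertex are assumptions made for this example.

```python
import heapq


def prim(graph, start):
    """Return the MST as a list of (from_vertex, to_vertex, weight) edges."""
    in_tree = {start}
    # Candidate edges leaving the tree, ordered by weight.
    heap = [(weight, start, neighbor) for neighbor, weight in graph[start]]
    heapq.heapify(heap)
    mst = []
    while heap and len(in_tree) < len(graph):
        weight, u, v = heapq.heappop(heap)  # closest candidate vertex
        if v in in_tree:
            continue  # already connected by a cheaper edge
        in_tree.add(v)
        mst.append((u, v, weight))
        for neighbor, w in graph[v]:
            if neighbor not in in_tree:
                heapq.heappush(heap, (w, v, neighbor))
    return mst


graph = {
    0: [(1, 1), (2, 3)],
    1: [(0, 1), (2, 2), (3, 5)],
    2: [(0, 3), (1, 2), (3, 4)],
    3: [(2, 4), (1, 5)],
}
print(prim(graph, 0))
```

Starting from vertex 0, the tree picks up the edges 0-1, 1-2, and 2-3 for a total weight of 7.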
Deleting from heap
1) Return value at the root 2) Prune the right most leaf from the lowest level 3) Reinsert that leaf's data at the root and sift it down until the heap order is restored
Pop() Operation
1) Return value from the top of the stack 2) Shift the top over 3) Clean memory
Kruskal's Minimum Spanning Tree Algorithm
1) Sort edges 2) Add the shortest edge to MST, if cycle is created, remove that edge 3) Repeat 2 until |v|-1 edges are successfully added
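A sketch of these steps follows. Instead of adding an edge and removing it when a cycle appears, this version uses a union-find structure to detect the cycle before adding, which gives the same result; that substitution, the `(weight, u, v)` edge format, and integer vertex labels 0..n-1 are choices made for this example.

```python
def kruskal(num_vertices, edges):
    """Return the MST edges as (weight, u, v), given edges in the same format."""
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent links to the set representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for weight, u, v in sorted(edges):  # shortest edge first
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            continue  # same component already: this edge would create a cycle
        parent[root_u] = root_v
        mst.append((weight, u, v))
        if len(mst) == num_vertices - 1:  # |V| - 1 edges: the tree is complete
            break
    return mst


edges = [(1, 0, 1), (3, 0, 2), (2, 1, 2), (4, 2, 3), (5, 1, 3)]
print(kruskal(4, edges))
```

The edge of weight 3 between vertices 0 and 2 is skipped because 0 and 2 are already connected through vertex 1.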
Indirect Recursion
A() -> B() -> A()