CS400 exam one
treap
A BST in which every key is assigned a random priority. The nodes must maintain two properties: -They are in order with respect to their keys, as in a typical binary search tree -They are in heap order with respect to their priorities, that is, no node has an ancestor of lower priority. O(log N) expected time for all operations, O(N) worst case.
Trie
nil leaves imply completed word/phrase when traversing, store place in tree so that as you build words you don't have to search from scratch
time complexity of insert/delete/lookup for balanced tree vs. unbalanced tree
O(N) if unbalanced vs. O(log N) if balanced
double hashing. general formula? Why do you want a prime (ish) table size?
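Open addressing; uses a 2nd HF to compute the step size for probing. This HF is completely independent of the first, and a single key will always yield the same step size. Probe sequence: Hk, Hk + ss*1, Hk + ss*2, Hk + ss*3, ... (each taken mod table size). If the step size shares a factor with the table size (worst case: it is a multiple of it), you'll jump to the same spots over and over and may never be able to place the key! With a prime table size, every step size from 1 to TS - 1 eventually reaches every slot.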
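A minimal Java sketch of the probe sequence, assuming int keys. The two hash functions here (h1, h2) are hypothetical choices for illustration only, not a formula from the course. It shows that a prime table size lets the probes reach every slot, while a step size that shares a factor with the table size keeps revisiting the same slots.

```java
import java.util.Arrays;

class DoubleHashingSketch {
    // First hash: picks the starting slot (hypothetical choice).
    static int h1(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    // Second hash: picks the step size, always in 1..tableSize-1, never 0.
    static int h2(int key, int tableSize) {
        return 1 + Math.floorMod(key, tableSize - 1);
    }

    // Returns the first `count` slots probed for `key`.
    static int[] probeSequence(int key, int tableSize, int count) {
        int[] slots = new int[count];
        int start = h1(key, tableSize);
        int step = h2(key, tableSize);
        for (int i = 0; i < count; i++) {
            slots[i] = (start + i * step) % tableSize;
        }
        return slots;
    }

    public static void main(String[] args) {
        // Prime table size 7: step 5 visits all seven slots.
        System.out.println(Arrays.toString(probeSequence(10, 7, 7)));
        // Table size 8: step 4 shares a factor with 8, so only two slots are ever probed.
        System.out.println(Arrays.toString(probeSequence(10, 8, 4)));
    }
}
```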
B - tree: what does order m mean? what does branching factor mean in terms of m? what is the min numChildren for nodes? for root? what is the min numKeys and max numKeys for nodes? for root?
order m = branching factor = max numChildren. Min children: leaf = 0, root = 2 (one on either side, if the root is not a leaf), internal = ceiling(m/2). Min keys: root = 1, all others = ceiling(m/2) - 1. Max keys: all = m - 1.
Quadratic Probing
The nth probe checks the home index plus n^2: hash, hash + 1, hash + 4, hash + 9, hash + 16, etc. (mod table size). Causes secondary clustering. Not guaranteed to find an open table spot unless the table is at least 1/2 empty (and the table size is prime).
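A small Java sketch of the quadratic probe sequence, assuming an int home index; the class and method names are hypothetical. It counts how many distinct slots the first tableSize probes can reach, which illustrates why a free slot is only guaranteed while the table is at most about half full.

```java
class QuadraticProbingSketch {
    // Slot checked on the given attempt (attempt 0 is the home index).
    static int probe(int home, int attempt, int tableSize) {
        return (home + attempt * attempt) % tableSize;
    }

    // Number of distinct slots reached by attempts 0..tableSize-1.
    static int distinctSlots(int home, int tableSize) {
        boolean[] seen = new boolean[tableSize];
        int count = 0;
        for (int i = 0; i < tableSize; i++) {
            int slot = probe(home, i, tableSize);
            if (!seen[slot]) {
                seen[slot] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // With a prime table size of 11, only 6 of the 11 slots are reachable.
        System.out.println(distinctSlots(3, 11));
    }
}
```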
git diff
Command to compare the files in the working directory with the files in the staging area, i.e. it shows changes that have not been staged yet. Also useful for inspecting conflicts during a merge.
convert 57 to binary
Divide the number by 2. Keep the integer quotient for the next iteration and the remainder as a binary digit. Repeat until the quotient is 0. The binary number is the list of remainders in reverse order (111001).
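The repeated-division steps can be sketched in Java as below, assuming a non-negative int; the class and method names are hypothetical.

```java
class BinaryConversionSketch {
    // Converts a non-negative int to binary by repeated division by 2.
    static String toBinary(int n) {
        if (n == 0) {
            return "0";
        }
        StringBuilder remainders = new StringBuilder();
        while (n > 0) {
            remainders.append(n % 2); // remainder is the next binary digit
            n /= 2;                   // integer quotient feeds the next iteration
        }
        // Digits were collected lowest first, so reverse them.
        return remainders.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(toBinary(57)); // prints 111001
    }
}
```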
weighting
Emphasizing some parts of the key over another. ex: p1 * 11^1 , p2*11^2 , p3*11^3 . . . . Then added together in folding
Rehashing (when to do it, how its done, time complexity)
Expanding the table: double the table size, then move to the closest prime number (OR! just subtract 1 from the doubled number if the nearest prime is far away, which can happen when you're dealing with large table sizes). Rehash each element for the new table size. Generally done when the load factor goes over ~.7. Time complexity = O(N)
Is it better to show all results (passes and fails), just those that pass, or just those that fail?
For automated testing, we really only need to see results for tests that fail. Of course, you must make sure that other tests are being run, maybe use a test counter or some other way to show that a test has been run.
balanced vs. height balanced
HB = for every node, left subtree height - right subtree height is -1, 0 or 1. B = can't tell from a picture: the height of the tree as it grows is bounded by O(log N). A snapshot of the tree in time will not tell you if this is true; you must know with certainty how it will grow.
Height balanced and/or balanced: RBT, AVL
HB: AVL. B: AVL and RBT. If a tree maintains HB, it will also be B.
left rotate algorithm
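Store the current node and parent node (x is passed in as the current node; z is x's right child). Cut the relevant parent/child ties and reassign them: z's left subtree becomes x's right subtree, and x becomes z's left child. Return the new root of this situation (z). Before: x on top with right child z. After: z on top with left child x.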
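A minimal Java sketch of a left rotation, assuming a bare Node class with int keys and no parent pointers (both are assumptions for illustration). The caller is expected to attach the returned node where x used to hang.

```java
class RotationSketch {
    static class Node {
        int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    // Rotates left around x and returns z, the new root of this subtree.
    // Assumes x has a right child.
    static Node rotateLeft(Node x) {
        Node z = x.right;
        x.right = z.left; // z's old left subtree now hangs under x
        z.left = x;       // x becomes z's left child
        return z;
    }
}
```

Usage would look like `parent.left = rotateLeft(parent.left);`, or `root = rotateLeft(root);` when x is the root.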
Unit Testing
test individual units or pieces of code for a system ex. method, functionality within that method unit = data structure unit test = one of many tests of its functionality
load factor
the fraction of the table's capacity that is filled ~.7 ish is the general limit
what will java return if you call hashCode() on an int?
the int itself (Integer's hashCode() returns the wrapped int value)
git checkout -- <filename>
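unmodifies a modified file: discards the unstaged changes by restoring the file from the staging area, which matches the last HEAD commit if nothing has been staged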
end to end test
used to test whether the flow of an application right from start to finish is behaving as expected (problem: hard to execute all possible code paths)
trivial hash function (use case)
using the key itself as the hash code, if data is discrete and spaced out over a reasonable range (ie ints under 100)
full hash table
when LF > LF threshold
does hashcode need to result in an int
yes, that's the whole point, you need it to give you an index
add a type parameter (do both generic and specific) to the declaration of a new list
List x = new LinkedList(); - raw type, can contain any objects. List<Integer> x = new LinkedList<Integer>(); - contains only Integers, which lets the compiler detect type errors. List<K> x = new LinkedList<K>(); - inside a generic class with type parameter K
primary clustering
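Many elements hashing to the same region of the table, so that long runs of occupied slots build up (typical of linear probing): any key that hashes into a run probes to its end and makes the run longer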
number of expected collisions (trying to place a key where one already exists) where N is number of keys and M is number of indexes to place them at
N(N-1)/(2M): there are N(N-1)/2 pairs of keys, and each pair lands on the same index with probability 1/M
complexity of BST print, lookup, insert, delete
O(H) for lookup, insert, and delete, where H is the height, and we want H = O(log N). Print visits every node, so it is O(N).
complexity of lookup, insert, and delete in a B tree
O(logN) with a base of b where b is the branching factor and N is the # of nodes
complexity of insert and delete in a red black tree
O(log2 N)
cp -r
Recursively copy directories
linear probing, general implementation
Step size is 1. Find the index, and keep incrementing by one until you find a free space. - tends toward primary clustering, but will always find a spot
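A minimal Java sketch of linear-probing insert, assuming int keys, a fixed-size table, and key mod table size as the hash (all assumptions for illustration; no delete or resize).

```java
class LinearProbingSketch {
    private final Integer[] table;

    LinearProbingSketch(int capacity) {
        table = new Integer[capacity];
    }

    // Stores the key and returns the slot used, or -1 if the table is full.
    int insert(int key) {
        int home = Math.floorMod(key, table.length);
        for (int i = 0; i < table.length; i++) {
            int slot = (home + i) % table.length; // step size 1, wrapping around
            if (table[slot] == null) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;
    }
}
```

With a table of size 7, the keys 10, 17, and 24 all hash to index 3 and end up in slots 3, 4, and 5, which is the start of a primary cluster.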
Black box test
Tester has no prior knowledge of the implementation; tests compare expected results with actual results through the interface only. Hard to know where problems originate, so you need many unit tests. Benefits: anyone can write the tests, they just need to know the interface.
BST delete and time complexity
delete(key): node = lookup(key); if (isLeaf) >> just delete; if (hasOneChild) >> replace with child; if (hasTwoKids) >> replace with in-order successor, then delete the in-order successor. Worst case O(N).
techniques for generating hashcode
extraction (break up into parts), weighting (weigh some parts to be more important), folding (combine weighted vals back into an int)
cp filename filename cp -r Src_file1 Src_file2 Src_file3 Dest_directory cp -r directory1 directory2
filename filename: copies the first file's contents into the second, overwriting it; if the 2nd one doesn't exist, creates it. Multiple src files then directory: copies all files to the directory; the command must end with a directory name if multiple files are to be copied. Directory directory: if 2 doesn't exist, creates it and copies 1 into it; if 2 does exist, 1 becomes a subdirectory of 2.
repository (repo)
the files, tracking data, configurations, etc. that are being tracked
Stages of Team Development
forming (getting to know, polite, strong leader needed) storming (conflict) cycle between these norming (resolve, bond) performing (hard work, no friction)
checking out
get an earlier version of files from your repo into your local working directory (git checkout abdf)
how to propose changes in git
git add <filename>
create a working copy of a local repository
git clone /path/to/repository
commit changes to the head
git commit -m "Commit message"
how do you "commit" ? what does this mean, what step in the process is it?
git commit -m "Commit message" this DOES NOT put it in your remote repo, but it does commit it to the HEAD
create a new repository
git init
to study repository history
git log
send changes from HEAD to your remote repository
git push origin master where "master" is whatever branch you're pushing to
displays the state of the working directory and the staging area
git status
two steps to go from key to hash index
hashcode() generates a number, and that number is modded by the table size to get the index. hash.index = hash_code % TS
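A short Java sketch of the two steps, with a hypothetical helper name. One caveat that is an addition to the card: hashCode() can be negative in Java and % keeps the sign, so this sketch uses Math.floorMod to keep the index inside the table.

```java
class HashIndexSketch {
    // Step 1: hashCode() turns the key into an int.
    // Step 2: reduce that int mod the table size to get the index.
    static int indexFor(Object key, int tableSize) {
        int hashCode = key.hashCode();
        return Math.floorMod(hashCode, tableSize); // never negative for tableSize > 0
    }

    public static void main(String[] args) {
        System.out.println(indexFor(57, 11)); // 57 % 11 = 2
        System.out.println(indexFor(-3, 11)); // 8, whereas -3 % 11 would be -3
    }
}
```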
hashtable delete(k key)
hashtable[hash(key)] = null; (assuming no collisions)
hashtable insert(k key, d data)
hashtable[hash(key)] = data; (assuming no collisions)
points to the last commit you made (current node reference)
head
calculate balance factor code. what is the balance factor of a node (pos neg)
height of left - height of right
height of subtree code
int maxDepth(Node node) {
    if (node == null) return 0;
    /* compute the depth of each subtree */
    int lDepth = maxDepth(node.left);
    int rDepth = maxDepth(node.right);
    /* use the larger one */
    return Math.max(lDepth, rDepth) + 1;
}
branching factor
in a search tree, the number of children of a given node. Often, the branching factors of individual nodes will vary, so an average value may be used. To guarantee a branching factor of 2 to 4, each internal node must store 1 to 3 keys.
where and why does .hashCode() use an XOR
In dealing w/ doubles (and longs). The 64 bits are split into two 32-bit halves, the top half is shifted down on top of the bottom half, and the two halves are XORed to produce the int hashcode. If you fill out an XOR truth table, you'll find that it is true 50% of the time, whereas OR is true 3/4 of the time and AND only 1/4, so XOR keeps the result bits evenly spread.
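A short Java sketch of the fold-and-XOR step for a double, with a hypothetical class and method name. It mirrors the computation documented for Double.hashCode(): XOR the upper and lower 32 bits of the value's 64-bit representation.

```java
class DoubleHashSketch {
    static int hashOfDouble(double value) {
        long bits = Double.doubleToLongBits(value); // the 64-bit pattern of the double
        return (int) (bits ^ (bits >>> 32));        // XOR the top half onto the bottom half
    }

    public static void main(String[] args) {
        // Both lines print the same number.
        System.out.println(hashOfDouble(3.14));
        System.out.println(Double.valueOf(3.14).hashCode());
    }
}
```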
BST Insert and time complexity
insert(parent, node, key)
    if (node == null) {
        if (key < parent.key) parent.left = new Node(key);
        else parent.right = new Node(key);
        return;
    }
    if (key < node.key) return insert(node, node.left, key)
    if (key > node.key) return insert(node, node.right, key)
best O(1), worst O(N)
how to compile in linux with the arguments "10 1 2"
javac *.java
java MyProgram 10 1 2
redirecting output to a file after compiling
javac *.java
java TestPQ
java TestPQ PQ01 PQ02 PQ03 MyPQ > results.txt
List<String> ls = new ArrayList<String>(); // 1 List<Object> lo = ls; // 2 lo.add(new Object()); // 3 String s = ls.get(0); // 4 Why won't this work?
Line 2 is the compile-time error: a List<String> is not a List<Object>. If it were allowed, line 3 could put a plain Object into the list and line 4 would then attempt to assign an Object to a String. In general, if Foo is a subtype (subclass or subinterface) of Bar, and G is some generic type declaration, it is not the case that G<Foo> is a subtype of G<Bar>. Instead, the supertype of all kinds of collections is Collection<?>.
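A minimal Java sketch of the wildcard alternative, using a hypothetical helper method. A List<?> parameter accepts a list of any element type, but its elements can only be read as Object.

```java
import java.util.ArrayList;
import java.util.List;

class WildcardSketch {
    // Accepts List<String>, List<Integer>, and so on.
    static int countNonNull(List<?> items) {
        int count = 0;
        for (Object item : items) {
            if (item != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> ls = new ArrayList<String>();
        ls.add("a");
        // List<Object> lo = ls; // does not compile: List<String> is not a List<Object>
        List<?> any = ls;        // compiles: the wildcard is a supertype of every List<T>
        System.out.println(countNonNull(any));
    }
}
```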
ls
list files in current directory
list files in current directory - command line
ls
the default branch when you create a repository
master
B+ Tree
1) Maintain a copy of all keys in the leaves of the tree. 2) Create a linked-list out of the leaf nodes of the tree. 3) all data actually stored in leaf nodes. internal nodes simply act as a road map to get near to desired value 4) advantageous for range queries 5) insertion and deletion behaves similarly to b-tree
in order traversal method and what it does
// print in ascending order!
inOrder(Node n)
    if (n != null)
        inOrder(n.left)
        print(n)
        inOrder(n.right)
pre order traversal method and what it does
// print self, print immediate left child, all the way down, print right children back up
preOrder(Node n)
    if (n != null)
        print(n)
        preOrder(n.left)
        preOrder(n.right)
perfect hashing (use case/when is it possible)
-zero collisions -best when few inserts and deletes, static data like a dictionary -constant search time as worse case O(1) because HF returns correct HI every time
post order traversal recursive method and what it will print
// far left leaf node, its sibling, then its parent. It then repeats this pattern on its parent's sibling, always visiting the root of things last
void printPostorder(Node node) {
    if (node == null) return;
    printPostorder(node.left);
    printPostorder(node.right);
    print(node);
}
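For comparison, here is a runnable Java sketch of the in-order, pre-order, and post-order traversals on one small tree. The Node class, the sample tree, and collecting keys into a list instead of printing are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TraversalSketch {
    static class Node {
        int key;
        Node left, right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    static void inOrder(Node n, List<Integer> out) {
        if (n == null) return;
        inOrder(n.left, out);
        out.add(n.key);
        inOrder(n.right, out);
    }

    static void preOrder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.key);
        preOrder(n.left, out);
        preOrder(n.right, out);
    }

    static void postOrder(Node n, List<Integer> out) {
        if (n == null) return;
        postOrder(n.left, out);
        postOrder(n.right, out);
        out.add(n.key);
    }

    // Sample BST: 4 at the root, 2 and 6 below it, then 1, 3, 5, 7 as leaves.
    static Node sample() {
        Node left = new Node(2, new Node(1, null, null), new Node(3, null, null));
        Node right = new Node(6, new Node(5, null, null), new Node(7, null, null));
        return new Node(4, left, right);
    }

    public static void main(String[] args) {
        List<Integer> in = new ArrayList<>();
        List<Integer> pre = new ArrayList<>();
        List<Integer> post = new ArrayList<>();
        inOrder(sample(), in);
        preOrder(sample(), pre);
        postOrder(sample(), post);
        System.out.println(in);   // [1, 2, 3, 4, 5, 6, 7]
        System.out.println(pre);  // [4, 2, 1, 3, 6, 5, 7]
        System.out.println(post); // [1, 3, 2, 5, 7, 6, 4]
    }
}
```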
".." vs "."
"cd ../folder/folder/folder" lets you navigate there without knowing the whole path. "cd .." goes up one level in the directory tree. Basically .. represents the parent directory and . represents the cwd.
Version
(aka revision, or COMMIT) - name (or number) for a given copy
local repository
- Stored on local computer.
properties of a good hash function
- deterministic (if you put the key in the HF, it returns the same thing every time ie not dependent on date/time/random) - SHOULD achieve relatively uniform distribution (clusters lead to worse clusters) - SHOULD minimize collisions (mostly mapping uniquely) - ALSO a problem if all values seem to be entered equidistant (clustering) - SHOULD be fast and easy for a COMPUTER to compute
max/min heap insert and delete
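-complete binary trees, no ordering between left and right children -min: child always greater than or equal to parent, min at top -max: child always less than or equal to parent, max at top -insert: always add at the bottom level, from left to right. Say it's a max heap: if you add a key that is greater than its parent, swap it with the parent, and keep swapping all the way up until it's not greater anymore -delete (remove the top): move the last node of the bottom level to the root, then swap it down with its larger child (max heap) or smaller child (min heap) until heap order holds again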
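A compact Java sketch of a min heap stored in a list, showing insert (swap up) and remove-min (swap down). The class name, the list-backed layout with children of index i at 2i + 1 and 2i + 2, and the lack of an empty-heap check are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class MinHeapSketch {
    private final List<Integer> items = new ArrayList<>();

    void insert(int key) {
        items.add(key); // next free spot on the bottom level
        int child = items.size() - 1;
        while (child > 0) {
            int parent = (child - 1) / 2;
            if (items.get(child) >= items.get(parent)) break; // heap order holds
            swap(child, parent);
            child = parent;
        }
    }

    // Assumes the heap is not empty.
    int removeMin() {
        int min = items.get(0);
        int last = items.remove(items.size() - 1); // take the last bottom-level node
        if (!items.isEmpty()) {
            items.set(0, last); // move it to the root, then swap it down
            int parent = 0;
            while (true) {
                int left = 2 * parent + 1;
                int right = left + 1;
                int smallest = parent;
                if (left < items.size() && items.get(left) < items.get(smallest)) smallest = left;
                if (right < items.size() && items.get(right) < items.get(smallest)) smallest = right;
                if (smallest == parent) break;
                swap(parent, smallest);
                parent = smallest;
            }
        }
        return min;
    }

    private void swap(int i, int j) {
        int tmp = items.get(i);
        items.set(i, items.get(j));
        items.set(j, tmp);
    }
}
```

A max heap is the same sketch with the comparisons flipped.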
cp vs. scp vs. pscp
-cp: used to copy a file, files, or directories to a new location on disk, optionally with a different name. Generally it's "cp source destination". -scp: available on windows and linux; copies to a remote directory over SSH without opening an interactive session first: "scp sourcefile remotedestination". Can also be used to copy on one local machine (you can use it on ya own computer), where it basically functions the same as cp. -pscp: PuTTY's scp command, allows windows users to do the above without launching PuTTY
RBT insert
1. If the tree is empty, the new node is the root; set it black.
2. Otherwise create the node and color it red.
   a. If the parent is black, done.
   b. If the parent is red, check the uncle's color.
      i. If the uncle is black/null:
         1. Straight line (left left or right right): rotate at the grandparent to fix the straight line, then switch the colors of parent and grandparent.
         2. Triangle (left right or right left): rotate at the parent to turn it into a straight line, then rotate and recolor as in the straight-line case.
      ii. If the uncle is red:
         1. Push black down from the grandparent: grandparent becomes red, uncle and parent become black.
         2. If the grandparent is the root node, make it black and exit.
         3. Otherwise go up to the grandparent and great-grandparent and so on, making fixes if needed.
3. Always move back up the line and check these properties as you back up.
B Tree Delete
1. Leaf node: steal from a sibling; if no sibling can spare a key, merge with a sibling and the parent's separator key (this can propagate all the way up to the root). 2. Interior node: steal the in-order successor or in-order predecessor; if not possible, delete and combine the children; if ever something is not possible, combine with sibling and parent. Order to try: 1. steal in-order successor or predecessor (if there are children) 2. delete, combine children 3. steal from sibling 4. combine with sibling and parent
three steps to rehashing
1. new table with double table size to the "nearest" prime, may be far away, tablesize * 2 - 1 is probably good enough 2. rehash all keys into table 3. reassign ht pointer to this new table
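The three steps can be sketched in Java as follows, assuming an open-addressing table of Integer keys with linear probing and key mod table size as the hash (assumptions for illustration). This version searches upward for the next prime rather than using tablesize * 2 - 1.

```java
class RehashSketch {
    // Steps 1 and 2: build a larger prime-sized table and rehash every key into it.
    // Step 3 is the caller reassigning its table reference to the returned array.
    static Integer[] rehash(Integer[] oldTable) {
        int newSize = nextPrime(oldTable.length * 2);
        Integer[] newTable = new Integer[newSize];
        for (Integer key : oldTable) {
            if (key == null) continue;              // skip empty slots
            int slot = Math.floorMod(key, newSize); // index for the NEW table size
            while (newTable[slot] != null) {
                slot = (slot + 1) % newSize;        // linear probing on collision
            }
            newTable[slot] = key;
        }
        return newTable;
    }

    static int nextPrime(int n) {
        while (!isPrime(n)) n++;
        return n;
    }

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }
}
```

Usage: `table = RehashSketch.rehash(table);`. Every key is visited once, which is where the O(N) cost comes from.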
what are the three cases of BST delete when you've found the key to delete
1. no children, just delete 2. one child, replace with child 3. two children, replace with in order successor and delete in order successor
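A minimal recursive Java sketch of the three delete cases, assuming int keys and no duplicates; the class and helper names are hypothetical. The leaf case and the one-child case collapse into the same two lines, because replacing a node with its (possibly null) only child handles both.

```java
class BstDeleteSketch {
    static class Node {
        int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    static Node insert(Node node, int key) {
        if (node == null) return new Node(key);
        if (key < node.key) node.left = insert(node.left, key);
        else if (key > node.key) node.right = insert(node.right, key);
        return node; // duplicates are ignored
    }

    // Returns the root of the subtree after removing key (if present).
    static Node delete(Node node, int key) {
        if (node == null) return null;
        if (key < node.key) {
            node.left = delete(node.left, key);
        } else if (key > node.key) {
            node.right = delete(node.right, key);
        } else {
            // Cases 1 and 2: no children or one child, so replace with the child (or null).
            if (node.left == null) return node.right;
            if (node.right == null) return node.left;
            // Case 3: two children, so copy in the in-order successor, then delete it.
            Node successor = node.right;
            while (successor.left != null) successor = successor.left;
            node.key = successor.key;
            node.right = delete(node.right, successor.key);
        }
        return node;
    }

    static String inOrder(Node node) {
        if (node == null) return "";
        return inOrder(node.left) + node.key + " " + inOrder(node.right);
    }
}
```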
AVL insert algorithm
1. recursive BST insert 2. check balance factor, do rotation ( ll, rl, lr, OR rr ) 3. exit method (all ancestors will be recursively checked because of recursive BST insert)
convert 111001 (binary) to decimal
1⋅2^5 + 1⋅2^4 + 1⋅2^3 + 0⋅2^2 + 0⋅2^1 + 1⋅2^0 = 57
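A tiny Java sketch of the same conversion, with a hypothetical method name. Instead of summing explicit powers of 2, it doubles a running total for each digit, which gives the same result; Integer.parseInt with radix 2 is the built-in way to do it.

```java
class BinaryToDecimalSketch {
    // Assumes the string contains only '0' and '1' and fits in an int.
    static int toDecimal(String bits) {
        int value = 0;
        for (char c : bits.toCharArray()) {
            value = value * 2 + (c - '0'); // shift the total left, then add the new digit
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("111001"));           // prints 57
        System.out.println(Integer.parseInt("111001", 2)); // prints 57
    }
}
```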
when weighing characters in string for HF, what is the suggested number to weigh them by?
31, i.e. C1 * p^1 + C2 * p^2 + C3 * p^3 . . . .
Integration Testing
After unit testing, integration testing is done to see that the modules communicate the necessary data between and among themselves and that all modules work together smoothly.
generic list that can include any objects of the subtype comparable
LinkedList<K extends Comparable<K>>
three trees of your local repository
Working Directory which holds the actual files. Index which acts as a staging area HEAD which points to the last commit you've made.
ASCII code
a code that defines how keyboard characters are encoded into digital strings of ones and zeros - important for exam: given A you should be able to say what X is (add the number of letters between; codes go up in alphabetical order). Ex: A is 65 and X comes 23 letters later, so X is 88.
code coverage
a measure of how many parts of a program have been tested. hard to create tests and run the program so that all paths execute
complete (empty tree, one node)
all levels are full except maybe not last level, but all those nodes are pushed to the left
avl vs redblack advantages
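AVL: stricter balance, so potentially many rotations (a delete can rotate at every level on the way up); red-black: maximum of two rotations per insert. AVL is better for lookup, RB is better for many inserts and deletes.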
left bias/right bias
B-tree: when the order is odd, a full node has an even number of keys (m - 1), so there is no single middle key; the bias decides whether you promote the left-of-middle or right-of-middle key when splitting
what do you do if an insert to RBT gets you a double red and the uncle is red? if the uncle is black?
black - do avl rotation and recolor parent and grandparent, move up tree and check for rules broken. red - push blackness down to children, make gpa red, move up tree and check for violations (if the root node is the gpa, it must turn black)
extraction
break up number into pieces which will then be added together via folding
BST - search() and best/worst case time complexity
lookup(node, key) {
    if (node == null) return null;
    if (key == node.key) return node;
    if (key < node.key) return lookup(node.left, key);
    return lookup(node.right, key);
}
best = O(1), worst = O(N) (straight line)
cat
can be used for lotta stuff -cat filename shows contents of file -cat filename filename shows contents of both -cat >newFile creates file newFile -can also use extra verbiage to specify how copying and showing contents is done
rm
can delete any file or directory. Won't remove a directory unless you include -d (delete empty directory) or -r (delete directory and all contents)
cd
change directory. "cd ../directory/directory"
> and >> (shell redirection)
command > file, where command = ls -al, cat, tree, etc. > writes the command's output to the file, overwriting what is there. >> appends the command's output to the file.
javac
compiles, which is NECESSARY before running it javac *.java javac *.java *.java *.java
Suggested method for hashing String keys, reasoning
Convert each string character to its ASCII code Ci, then compute C0 * p^i + C1 * p^(i-1) + C2 * p^(i-2) + ... (i = index of the last character), then mod by the table size, which should be a prime or prime-ish. Must weight the letters differently, bc otherwise CAT would get the same hash as ACT. p is a prime near the alphabet size: generally 31 if you're considering only lowercase letters, otherwise 53 for upper and lower.
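A short Java sketch of the weighted (polynomial) string hash next to an unweighted sum; the class and method names are hypothetical. With p = 31 the weighted version produces the same value as Java's own String.hashCode(), which is documented to use this formula. The result still needs to be reduced mod the table size to get an index.

```java
class StringHashSketch {
    // Computes C0*p^(n-1) + C1*p^(n-2) + ... + C(n-1), letting int overflow wrap.
    static int polyHash(String s, int p) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = hash * p + s.charAt(i);
        }
        return hash;
    }

    // Unweighted alternative: every anagram gets the same value.
    static int plainSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(plainSum("CAT") + " " + plainSum("ACT"));         // 216 216
        System.out.println(polyHash("CAT", 31) + " " + polyHash("ACT", 31)); // 66486 64626
    }
}
```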
check out a repo, what's it mean, how to do it locally and remotely? (github)
create a working copy of local repo git clone /path/to/repository remote: git clone username@host:/path/to/repository
pwd
print working directory, in form of /src/src/fold/dir/
what does "adding to the index" mean? how do you do it? what step in the process is it?
proposing changes, comes first after you edit git add <filename> git add *
update your local repository to the most recent commit
git pull
checking in
put new and changed local working directory files into repo (git commit)
AVL Delete
recursive BST delete, followed by checking node for balance and doing relevant rotation (ll rr lr OR rl)
max height of red black tree
a red black tree with n nodes has a max height of 2 * log2(n + 1)
rmdir
removes a directory only if it is empty and gives an error if it is not; you must use rm -r on a non-empty directory if you still want to delete it
remote repository
repository on a different computer or network
white box testing
require knowledge of the implementation, and access to all fields of the program that are being tested
complexity of rehashing and resizing
resize - O(1), rehash - O(N)
naive expand
resizing and putting elements at their same index. -creates clusters -rehashing leads to diff indexes (% table size)
hashtable lookup(k key)
return hashtable[hash(key)];
commit
saves and names changes as a new "version" in the repository
bit shift in hashing (right vs left) what does it do to a number if you shift left 4 times? why represent numbers this way in hash code?
shift right - slide one place to the right, ie divide by 2 and eliminate the remainder (1101 -> 110). shift left - slide one place left, doubling the number (1101 -> 11010). shift left 4 times = n * 2^4 = n * 16. Easy operations for a computer to do quickly. Can make numbers very large and then mod by table size.
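A quick Java check of the shift arithmetic, using 1101 (decimal 13) as the sample value; the class and method names are hypothetical.

```java
class BitShiftSketch {
    static int shiftRight(int n, int times) {
        return n >> times; // each shift halves and drops the remainder
    }

    static int shiftLeft(int n, int times) {
        return n << times; // each shift doubles
    }

    public static void main(String[] args) {
        int n = 0b1101; // 13
        System.out.println(Integer.toBinaryString(shiftRight(n, 1))); // 110 (6)
        System.out.println(Integer.toBinaryString(shiftLeft(n, 1)));  // 11010 (26)
        System.out.println(shiftLeft(n, 4));                          // 208 = 13 * 16
    }
}
```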