247 exam 2

multiplicative hashing

-b(c) = floor(((c * A) mod 1.0) * m). x mod 1.0 is the fractional part of x, so 47.2465 mod 1.0 is 0.2465. cA mod 1.0 is in [0,1), so b(c) is an integer in [0,m), an index. floor takes the next lowest integer, so a real number in [0,m) gets mapped to an integer from 0 to m-1 -initial observations: A should not be too small, because then too many hashcodes would be mapped to 0; suggest picking A from [0.5,1) -if q = cA mod 1.0 is distributed uniformly in [0,1), then we can use any value for m and still get uniform indices (we could use m = 2^v if we want)
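A minimal Java sketch of this mapping, assuming the hashcode is already a non-negative int and m is the table size (the method name is illustrative; the constant is Knuth's suggested A):

// multiplicative hashing: b(c) = floor(((c * A) mod 1.0) * m)
// assumes hashcode >= 0; a negative hashcode would need Math.abs first
static int multiplicativeHash(int hashcode, int m) {
    final double A = (Math.sqrt(5.0) - 1.0) / 2.0;  // ~0.618..., an irrational A in [0.5, 1)
    double q = (hashcode * A) % 1.0;                 // fractional part of c*A, lies in [0, 1)
    return (int) Math.floor(q * m);                  // integer index in [0, m)
}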

merge sort from a cloud

-basically you read data down from arrays X and Y in the cloud into arrays A and B on the computer; the length of the in-memory arrays is the size of the chunk you pull down -then you run through arrays A and B and put the minimum of the two current elements into a third in-memory array called C, increment the counter for whichever array supplied the minimum, and always increment k (the count of elements written so far) -you keep doing this until a counter hits a multiple of the chunk size in one of the arrays, and then you read down more of that array starting at its counter; when both have reached the end of their chunks, you write C up to the cloud into the Z array -then you keep going until there is nothing left in the cloud arrays (read when i mod b = 0 or j mod b = 0)
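A rough Java sketch of one such merge pass, as one way to organize the buffering; the Cloud interface and its readChunk/writeChunk methods are hypothetical stand-ins for the remote arrays X, Y, and Z, both inputs are assumed to have length n, and the chunk size b is assumed to divide n:

interface Cloud {                                            // hypothetical remote-storage API for this sketch
    int[] readChunk(String array, int offset, int len);     // read array[offset .. offset+len)
    void writeChunk(String array, int offset, int[] data);  // write data starting at array[offset]
}

static void cloudMergePass(Cloud cloud, int n, int b) {
    int[] A = cloud.readChunk("X", 0, b);                    // first chunk of X
    int[] B = cloud.readChunk("Y", 0, b);                    // first chunk of Y
    int[] C = new int[b];                                    // in-memory output buffer
    int i = 0, j = 0, k = 0;                                 // how far we are into X, Y, and Z
    while (k < 2 * n) {
        boolean takeFromA = (j == n) || (i < n && A[i % b] <= B[j % b]);
        C[k % b] = takeFromA ? A[(i++) % b] : B[(j++) % b];  // copy the smaller current element
        k++;
        if (k % b == 0) cloud.writeChunk("Z", k - b, C);          // flush a full output chunk
        if (i < n && i % b == 0) A = cloud.readChunk("X", i, b);  // pull down the next chunk of X
        if (j < n && j % b == 0) B = cloud.readChunk("Y", j, b);  // pull down the next chunk of Y
    }
}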

special rules for master theorem

-extended case 2: this applies when log_b a equals the degree of n in f(n) but f(n) also carries a log factor, i.e. f(n) = Θ(n^(log_b a) · log^k n); with no log factor (k = 0) it reduces to the ordinary case 2. look at the power k of the log: if k > -1, then T(n) = Θ(n^(log_b a) · log^(k+1) n); if k = -1, then T(n) = Θ(n^(log_b a) · log log n); if k < -1, then T(n) = Θ(n^(log_b a))
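A worked example (not from the original card): for T(n) = 2T(n/2) + n log n, log_2 2 = 1 matches the degree of n in f(n) and the log's power is k = 1 > -1, so T(n) = Θ(n log² n); for T(n) = 2T(n/2) + n / log n, k = -1, so T(n) = Θ(n log log n).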

how to build a dictionary

-conceptually just a bag of records -what concrete data structure do we use to implement it? must support efficient dynamic add/remove and find

radix sort vs counting sort

-consider n integers with max value m and k options per digit -counting sort: one pass of size (n+m) -radix sort: d passes of size (n+k) -for large n the time comparison is roughly dn vs. n, and m vs. dk for the rest; basically radix sort does d passes of size ~n rather than just one pass whose size depends on m -do radix sort for space: k vs. m buckets per pass -or for non-numerical / mixed data: sort student records by year in school, name, grade, etc.

radix sort cost

-d passes of counting sort -each pass takes time Θ(n+k) -hence total time is Θ(d(n+k))

radix sort

-divide each input integer into d digits -digits may be in any base k; we will use base 10 -sort using d successive passes of counting sort -the jth pass uses the jth digit of each input as its sorting key -key requirements: sort by least significant digit first, and the sort in each pass must be stable (it never inverts the order of 2 inputs with the same key) -first look at the least significant digit, put the inputs in buckets 0-9, then append them back in order of the buckets (FIFO within a bucket); then do the sort again on the next digit, etc.
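A minimal Java sketch of LSD radix sort in base 10, assuming non-negative ints with at most d decimal digits; each pass is a stable counting sort keyed on one digit:

// d = number of decimal digits in the largest input
static void radixSort(int[] a, int d) {
    int divisor = 1;
    for (int pass = 0; pass < d; pass++) {       // least significant digit first
        countingSortByDigit(a, divisor);
        divisor *= 10;
    }
}

static void countingSortByDigit(int[] a, int divisor) {
    int n = a.length;
    int[] count = new int[10];
    for (int x : a) count[(x / divisor) % 10]++;            // tally each digit value
    for (int v = 1; v < 10; v++) count[v] += count[v - 1];  // prefix sums = end positions
    int[] out = new int[n];
    for (int i = n - 1; i >= 0; i--)                        // walk backwards so the pass is stable
        out[--count[(a[i] / divisor) % 10]] = a[i];
    System.arraycopy(out, 0, a, 0, n);
}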

perils of division hashing

-does every choice of m yield SUH-like behavior? ex. suppose that m is divisible by a small integer d. claim: if j = c mod m, then j mod d = c mod d. so if d = 2, even hashcodes map only to even indices, and natural subsets of the hashcodes do not map uniformly across the entire table (not SUH behavior) -another bad case is m a power of 10 (or of 2), because the mod chops off all but the low-order digits (or bits), so inputs that agree in the low-order digits but differ in the high-order ones map to the same index even though they are very different
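A small worked example of the claim (not from the original card): with m = 12, which is divisible by d = 2, the even hashcodes 4, 10, and 26 map to indices 4, 10, and 2 (all even), while the odd hashcodes 5 and 17 both map to index 5 (odd), so each subset lands in only half of the table.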

goals for mapping to indices from SUH

-each hashcode should be equally likely to map to any value in [0,m) -mappings for different hashcodes should be independent, hence uncorrelated: knowing the mapping for one should give little or no info about the mapping for another

chaining

-each table cell becomes a bucket that can hold multiple records -a bucket holds a list of all records whose key maps to it -find(k) must traverse bucket h(k)'s list looking for a record with key k -analogous extensions for insert() and remove() -the performance of a hash table is dominated by the cost of find -insert is constant cuz you just put the record in that key's bucket; delete is constant if you already have a handle on the node, otherwise n to traverse through the bucket -the worst case for find is everything landing in one bucket, which then behaves like an unsorted list, so n

lower bound of comparison sorts (concept from decision tree)

-a first bound: every comparison sort takes time Ω(n) (this can be improved! the bound needs to hold for ANY possible comparison sort) -why: a correct sorting algorithm must inspect every element of its input array at least once, and each comparison inspects only 2 elements, so at least n/2 comparisons are needed -the actual lower bound comes from the decision tree: the worst-case running time is proportional to the height of the decision tree, which is Ω(log n!)

decision trees relation to sorting algorithms

-for any comparison sort, we can construct its decision tree on inputs of size n -just trace what the code does for every possible input of size n -for any decision tree representing a correct comparison sort, we can derive equivalent code -hence a lower bound on running time for all decision trees holds for all comparison sorts, and vice versa

master theorem

-case 1: if f(n) = O(n^(log_b a − ε)) for some constant ε > 0, then T(n) = Θ(n^(log_b a)) -case 2: if f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) · log n) -case 3: if f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and if a·f(n/b) ≤ c·f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)) -f(n) dominates g(n) iff it grows polynomially faster than g(n); this is a stronger condition than f(n) = ω(g(n)) -the theorem does not apply in the gaps between the three cases (little o / little omega without a polynomial gap)

why does radix sort work?

-invariant: after j sorting passes, the input is sorted by its j least significant digits -stability is needed to show the invariant holds for inputs with equal jth digits -proof: the base case is that after 1 pass through the loop, the input is sorted by its least significant digit, so the invariant is satisfied. for the inductive step, assume the invariant holds after j−1 passes, so the input is sorted by its j−1 least significant digits. the jth pass sorts by the jth least significant digit; if two inputs have equal jth digits, the pass is stable and keeps them in the order guaranteed by the inductive hypothesis, so afterwards the input is sorted by its j least significant digits and the invariant holds. since the invariant holds for every j, taking j = d shows the input is fully sorted after the last pass

why is multiplication a good strategy?

-the mapping c -> q = cA mod 1.0 is a diffusing operation -i.e. the most significant digits of q depend in a complex way on many digits of c (this makes q look uniform and obscures correlations among the c's) -hence the bin # = floor(q · m) looks uniform and uncorrelated with c -the same is true if we replace digits by bits and work in binary

how do we approach ideal performance for hashing

-must have the SUH assumptions hold -must distribute keys equally and independently across the range [0,m) -need to talk about how to design a good hash function h(k) -the input keys we see must have average behavior

is every choice of A for multiplication equally good?

-not all A's have equally good diffusion/complexity properties -fractions with few nonzero digits (e.g. 0.75) or repeating decimals have poor diffusion and/or low complexity -advice: pick an irrational # between 0.5 and 1, ex. (√5 − 1)/2 = 0.6180339887... (Knuth's suggestion)

how to break the n log n barrier

-our lower bound is for comparison sorts, which work on items from any totally ordered set -to sort faster, we need to be able to inspect input using operations other than comparisons -will limit attention to sorting integers

generic lower bound for comparison sort (can be applied to any)

-suppose every decision tree for a problem of size n has at least t(n) leaves -moreover, the operation labeling each internal node has at most w possible outcomes -then the problem requires at least log_w t(n) operations to solve -proof: starting from the root, each level of the tree increases the # of nodes by a factor of at most w, so we need enough levels h that w^h ≥ t(n), hence h ≥ log_w t(n) -for comparison sorting, every node of the tree is a comparison using >, hence w = 2 (w is the max branching factor) and t(n) = n! (t(n) is the number of leaves), so the bound is log_2 n!

advice on division hashing

-the table size m should be chosen so that there are no obvious correlations between hashcode bit patterns and indices; the index should depend on all bits of the hashcode, not just some -idea: make m a prime # (no small factors) and avoid choices of m close to powers of 2 or 10

simple uniform hashing

-this is a weaker performance estimate -assume that, for any key k in U, the hash function h is equally likely to map k to each value in [0,m), independent of all other keys (evenly distributed)

knuth's constant

1737350767

outcomes for A[i] < x

2 (either yes or no)

how many bits to represent colors

256 options per color component, which is 2^8, so 8 bits for each component × 4 components = 32 bits. every color is realizable, and there are about the same number of hashcodes in each bin across the 32 bins, so deviations from uniform are small (as opposed to one long bin and one smaller bin, where lookups are faster only in the smaller bin)

recurrence for merge sort

2T(n/2) + cn

how to map hashpoints to codes

31*x + y (some points will have the same hashcode though)
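A minimal sketch of that hashcode as a Java hashCode override (the class and field names are illustrative):

class HashPoint {
    final int x, y;
    HashPoint(int x, int y) { this.x = x; this.y = y; }
    @Override public int hashCode() {
        return 31 * x + y;   // distinct points can collide, e.g. (0, 31) and (1, 0) both give 31
    }
}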

add vs. addfirst

If the person accesses the most recently inserted item, addFirst would likely yield better performance because the most recently inserted is at the front of the list (that way we don't have to iterate through the whole list to access it, so it would be constant time to access). If the person wants to access the least recently inserted items, then we would use add because then the least recently inserted item is at the front of the list again so we don't have to iterate through the whole list to access it, so there is constant time to access.

find method table

In the find method, I called the stringToHashcode helper method, passing it the record's key. Now the key has been converted to a hashcode, so I can send that hashcode to the toIndex helper method, which turns the hashcode into an index. Then I used a for-each loop to iterate through the records in the list at that particular index (bucket) in buckets. As I iterated through these records, I checked whether the parameter key was equal to the key of the record in the list; if it was, then I have found the record with that key, so I returned that record. If I make it through the entire list, the record is not currently in the table, so I returned null.
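A minimal sketch of a find along these lines, assuming the instance variable buckets (a list of LinkedList<Record> chains), a Record with a public key field, and the helper methods named in these notes:

public Record find(String key) {
    int index = toIndex(stringToHashcode(key));  // key -> hashcode -> bucket index
    for (Record r : buckets.get(index)) {        // scan only the records in that bucket
        if (r.key.equals(key)) return r;         // found a record with a matching key
    }
    return null;                                 // no record with this key in the table
}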

insert for table

In the insert method, I called the find helper method, which checks whether the record I am trying to insert is already in the table. If find returns null, the record is not there, so it needs to be inserted into the instance variable buckets. To figure out where to insert it, I called the stringToHashcode helper method with the record's key; now that the key has been converted to a hashcode, I sent that hashcode to the toIndex helper method, which turns it into an index. Then I inserted the record into the linked list at that index (at that bucket) in buckets, incremented the instance variable size, and returned true because the record was inserted. However, if the find method did not return null, then a record with that key already exists, so I just returned false and did not insert it.
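A minimal sketch of the insert just described, under the same assumptions as the find sketch above:

public boolean insert(Record r) {
    if (find(r.key) != null) return false;        // a record with this key is already stored
    int index = toIndex(stringToHashcode(r.key)); // key -> hashcode -> bucket index
    buckets.get(index).add(r);                    // append to that bucket's chain
    size++;
    return true;
}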

table remove method

In the remove method, I called the find helper method, which checks whether the record I am trying to remove is actually in the table. If find returns a record (not null), then I need to remove it. To figure out which list/index to remove it from, I converted the parameter key into a hashcode and then into an index, by first calling the stringToHashcode helper method (key to hashcode) and then the toIndex helper method (hashcode to index). Then I removed the record from the list at that index in buckets and decreased size by 1.
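A minimal sketch of the remove just described, under the same assumptions:

public void remove(String key) {
    Record r = find(key);                        // is the record actually in the table?
    if (r == null) return;                       // nothing to remove
    int index = toIndex(stringToHashcode(key));  // key -> hashcode -> bucket index
    buckets.get(index).remove(r);                // unlink it from that bucket's chain
    size--;
}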

toindex method

In the toIndex method, a hashcode is passed in as a parameter and needs to be converted to an index in the array. To do this, I used the diffusion method from the slides: I first multiplied the hashcode by an irrational number between 0.5 and 1 (I chose (√5 − 1)/2), then took this number mod 1.0 to get its fractional part, then multiplied that by the instance variable nBuckets (the total number of buckets) to map it to a bucket. I then took the absolute value, in case the hashcode was negative (so I stay in bounds in the list), and cast the number to an integer to get an integer index for the bucket.

hash function

-a hash function h maps keys k of some type to integers h(k) in a fixed range [0, N) -the integer h(k) is the key's hashcode under h -if N = |U|, h could map every k to a distinct integer, giving us a way to index our direct table -basically, if you have a word you could use some hash function to convert it to an integer (add up the ASCII values or something), then take that number in the universe and map the k records you keep into an array
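A minimal sketch of such a string-to-integer hashcode (a simple polynomial over the characters, similar in spirit to Java's built-in String.hashCode; the method name matches the helper mentioned elsewhere in these notes):

static int stringToHashcode(String key) {
    int h = 0;
    for (int i = 0; i < key.length(); i++) {
        h = 31 * h + key.charAt(i);   // fold each character into the running code
    }
    return h;                         // may overflow and go negative; handle that in toIndex
}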

limited decisions for sorting

-all the sorting algorithms we listed work on any comparable data type -the only way they inspect the input is by comparing pairs of elements to each other! -can answer "is x > y?" in constant time

running time for a decision tree

-as many comparisons as it takes to get from the root to a leaf -the worst case is the max depth -hence the worst-case running time of a decision tree is its height

counting sorts

-assume our inputs are n integers in range [0, k) -count how often each value occurs in the input -write that many copies of each value to the output -basically you have a table with one column listing every possible value and a second column holding the count for each value; the output then walks the values in order and writes each one as many times as its count -this is a linear-time sort in n (but it depends on k): cost = Θ(n+k) -can extend to sort arbitrary items with integer keys -highly efficient when the max value k is small vs. n (fewer possible values than items), cuz then the running time is just ~n -needs a discrete range: you gotta know the # of possible values, but not necessarily which values actually occur until you scan
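A minimal Java sketch of counting sort for plain ints, assuming all values lie in [0, k):

static int[] countingSort(int[] a, int k) {
    int[] count = new int[k];
    for (int x : a) count[x]++;                             // tally how often each value occurs
    int[] out = new int[a.length];
    int pos = 0;
    for (int v = 0; v < k; v++)                             // walk the possible values in order
        for (int c = 0; c < count[v]; c++) out[pos++] = v;  // write each value count[v] times
    return out;
}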

decision tree

-at each node you ask whether element i belongs before element j; the left branch is yes and the right is no (left is less than or equal, right is greater) -the choice of later comparisons depends on the results of the earlier ones -at each leaf you have a resulting ordering of the array, and all possible orderings appear among the leaves -how to read a decision tree: start at the root, do the specified operation at each internal node and follow the edge matching its outcome; the leaf reached represents the answer

division hashing

-b(c) = c mod m -bucket index = hashcode modulo table size -very easy to implement (mod in java is %) -the result is surely in range [0,m) and non-negative as long as the hashcode is non-negative (java's % can return a negative value for a negative hashcode)
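A minimal Java sketch, with the absolute value added because Java's % follows the sign of the dividend (see the note on negative hashcodes elsewhere in these notes):

static int divisionHash(int hashcode, int m) {
    return Math.abs(hashcode % m);   // bucket index in [0, m)
}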

why do we want a constant load factor

Accessing each list is constant time because we have an index. What matters for the running time is the length of each list, because we iterate through its items in the find method (which insert and remove also rely on). A constant load factor helps the table's performance because it ensures the expected length of each list stays bounded and does not get too long.

stringtable constructor

The constructor initializes buckets as a new linked list whose size will be nBuckets, i.e. one entry per bucket. The instance variable size, the number of records stored, is initialized to 0. Then, with a for loop, I traversed the positions of buckets and created a new linked list at each index (at each bucket).
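A minimal sketch of a constructor matching that description; the Record type is assumed and the field names follow these notes:

import java.util.LinkedList;

public class StringTable {
    private final LinkedList<LinkedList<Record>> buckets;  // one chain per bucket
    private final int nBuckets;                            // number of buckets
    private int size;                                      // number of records stored

    public StringTable(int nBuckets) {
        this.nBuckets = nBuckets;
        this.size = 0;                                     // table starts empty
        this.buckets = new LinkedList<>();
        for (int i = 0; i < nBuckets; i++) {
            buckets.add(new LinkedList<Record>());         // an empty list at each index/bucket
        }
    }
}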

dictionary ADT description

a data structure that stores a collection of objects -each object is associated with a key -objects can be dynamically inserted and removed -can efficiently find an object in the dictionary by its key

when master theorem does not apply

the master theorem does not apply when: -a < 1 or b ≤ 1 (a must be ≥ 1 and b must be > 1; they don't necessarily need to be integers) -f(n) falls in a gap between the cases: you can check this with the limit test, leaving in the +ε, and if there is no ε you can pick that makes one side polynomially faster, then the case does not apply (ex. n log n vs. n^(1+ε), cuz both have n^1 in them) -f(n) is negative -a is not a constant (ex. a = 2^n) -constant work is subtracted

A x h vs. h' x a if h = h' mod m

A × h is smaller than A × h', since h = h' mod m makes h the smaller of the two

comparison sort

any sorting algorithm that inspects its input only via pairwise comparisons. once we have one comparison, we can derive the rest in constant time: if we can test x > y, then we can test x ≤ y (not x > y), hence x = y (x ≤ y and x ≥ y), x ≥ y (x = y or x > y), and x < y (not x ≥ y)

keys

are strings, so you need the .equals() method to compare them (== would compare references)

average cost of find hash table

the average size of a bucket is n/m (because n elements are distributed across m buckets), plus 1 to find the bucket, so 1 + n/m on average

proof by induction on the binary search

base case: on the 0th iteration, the claim holds trivially: either x lies somewhere in A or x was never in A at all. inductive hypothesis: x exists in A iff x lies in the subarray A[L...H]. inductive step: the iteration does if (A[mid] <= x) L <- mid else H <- mid - 1, which preserves the hypothesis; at the end, if A[L] = x, return L

merge sort

based on linear merge of 2 sorted arrays. divide and conquer algorithm

order of methods to call to turn key into an index

call stringToHashcode (passing it the key), then send the resulting hashcode to toIndex

applying the master theorem given recurrences

the coefficient is a, the divisor is b, and f(n) is the non-recursive work per call; compare n^(log_b a) vs. f(n)

examples of dictionary

a database -an actual dictionary: a collection that maps words to definitions -class list: set of students with name or ID as key -DMV database: collection of cars accessed by license plate

2 main approaches to index mapping

division hashing, multiplicative hashing

fixed point decimal

do the multiplication with A scaled up to an integer, then divide by the appropriate power of 10 to rescale the result

how to draw a decision tree with code

for binary search you are checking to see whether the middle element equals the target or not. start in the middle: if equal, return that value; if not, check whether the target is < or >. if <, go to the lower values; if >, go to the greater values; keep going until you reach the end, and if the target was never found, return not found
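A sketch of binary search matching that description (a standard variant, not necessarily the exact code from the course):

static int binarySearch(int[] a, int x) {
    int lo = 0, hi = a.length - 1;            // invariant: if x is in a, it lies in a[lo..hi]
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;         // start in the middle (avoids (lo+hi) overflow)
        if (a[mid] == x) return mid;          // equal: return this position
        else if (a[mid] < x) lo = mid + 1;    // x is greater: check the upper half
        else hi = mid - 1;                    // x is smaller: check the lower half
    }
    return -1;                                // range is empty: not found
}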

hash function pipeline

go from an object with key k to integers (hashcode c) to buckets (indices j) assumptions: -object to be hashed have been converted to integer hashcodes -hashcodes are in range [0, N) -need to convert hashcodes to indices in [0, m) where m is table size

selection sort

repeatedly goes through the unsorted part of the array, finds the smallest remaining element, and brings it to the front of that part

how many comparisons do we need to sort an input array of size n?

if each comparison takes constant time, and comparison is the dominant cost of sorting, then the # of comparisons gives the time complexity. known algorithms use n log n comparisons to sort an array of size n, hence the # required by the fastest possible algorithm is O(n log n) (it doesn't take longer than n log n). any fixed sorting algorithm gives an upper bound on the cost of the fastest possible algorithm

problem w modulus operator for toIndex method

if the hashcode is negative, then the modulus will return a negative number, which would be out of bounds for the list; make sure to take the absolute value when computing the index

finding the work of the tree asymptotic complexity (in general)

if the top of the tree dominates the work, Θ(f(n)); if the bottom of the tree dominates, Θ(n^(log_b a)); if the top and bottom balance, Θ(f(n) · log n) (all levels contribute equally and there are log n levels)

what is worrisome about hashpoints

if there are duplicates

diffusion

if you multiply two long decimals together, you have to add placeholders and shift partial products, and the only digits where none of the partial products are forced to 0 are the ones right after the decimal point (further right is padded with placeholder 0s and further left can start with 0s). those digits therefore depend on many digits of both factors, and they determine the bin number

dictionary operations

insert(Record r) - add r to the dictionary; find(key k) - return one/some/all records whose key matches k, if any; remove(key k) - remove all records whose key matches k, if any. other versions are possible (ex. remove might take a record to remove rather than a key), and other operations might exist (isEmpty(), size(), iterator())
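A minimal Java sketch of this ADT as an interface (illustrative; the Record type and the string key type are assumptions):

interface Dictionary {
    void insert(Record r);    // add a record to the dictionary
    Record find(String k);    // return a record whose key matches k, or null if none
    void remove(String k);    // remove records whose key matches k, if any
    boolean isEmpty();
    int size();
}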

min heap time complexity dictionary

insert: log n; delete and find are not supported

heap sort

insert the n inputs into the heap, then call extractMin n times; the values come out sorted least to greatest

time complexity dictionary sorted array

insert: n, delete: n, find: log n (with binary search), space: n

time complexity dictionary direct table

insert: 1 delete: 1 find: 1 space: size of the universe

time complexity dictionary unsorted list

insert: 1 delete: n find: n space: n

simple uniform hashing average running times

insert: 1+ alpha or 1 delete: 1+ alpha or 1 find: 1+ alpha or 1 space: m+n or n

hash table worst case complexity

insert: n delete: n find: n space: m+n

time complexity dictionary sorted list

insert: n delete: n find: n space: n

when does the merge sort terminate

k = m + n + 2, because you do one extra call for both arrays, even after you have gone through their full lengths

direct addressed table

let U be the universe of all possible keys. allocate an array of size |U|; if we get a record with key k, put it in cell k. so basically you have an outer circle of all possible keys and an inner circle of the k records actually kept, and you map each key to a position in the array and put the records into the array. BUT this could waste a lot of space if the universe is large and there are not a lot of keys; another problem is how to index an array if the keys aren't integers

lower bound decision tree

log n (cuz max height)

how big is logn!

log n! = Θ(n log n), because log n! = log(n · (n−1) · (n−2) · ... · 1) = log n + log(n−1) + ... + log 1 ≤ log n + log n + ... + log n = n log n, and it is also ≥ log(n/2) + log(n/2) + ... + log(n/2) over the largest n/2 terms = (n/2) log(n/2), so it is Ω(n log n) as well

java hashcodes

may be positive or negative. may have to take absolute value or make >= 0

asymptotically optimal comparison sorts

merge sort and heap sort

number of outcomes for binary search

n + 1 (n in the search + not found)

impossible to sort in less than ___ time w comparisons

n log n

quick sort running time

n log n expected (worst case n^2)

how many orders for n elements in decision tree

n! because n options for first spot in array, then n-1, etc.

total work merge sort in cloud

(n/b) log n, where b is the chunk size and is significant, so it takes a lot less time (vs. n log n for merge sort in computer memory)

load factor

α = n/m. the larger m is, the lower the load factor: shorter lists and a sparser table, but more space, so this is a space vs. time tradeoff. m has to be proportional to n for constant running time

insertion sort running time

n^2

running time of bubble sort

n^2

shell sort running time

n^2, or n^(4/3), or n log² n, depending on the gap sequence

polynomially larger? n^2 log n vs. n^2

no

are powers of 2 good for hashcode

no, because multiplying by 32 just shifts the bits over, and a later mod by a power of 2 basically shifts them back off, so that part of the hashcode does nothing

running time of decision tree

proportional to the height of the tree

difference between size and nBuckets

size is the number of elements in the list nbuckets is the number of lists/buckets you have

less uniform distribution for hashing

with a smaller # of bins, some values of m might be better than others (it kind of depends, because of how the mod interacts with the hashcodes)

where to allocate the list for buckets

the constructor. then also allocate a new linked list for each bucket (each index. do a for loop)

hash table

we often cannot afford to store a table the size of the universe, so instead we map keys to a smaller space of size m, where m is less than the size of the universe. but by the pigeonhole principle you are then guaranteed that some keys will share the same table cell (collide), and the table must still work in the presence of collisions. how to fix this -> chaining

model of computation

what we can do in constant time. a sorting algorithm can only do limited work in constant time; it can make only limited decisions about its inputs in constant time

hashpoints equality

two hashpoints are equal when both the x and y coordinates are equal for both points

heap for cloud

would not work, because for the cloud the items need to be read and written sequentially (in chunks), and that is not how a heap is structured

number of calls to read and write in cloud

write: n/b times (the # of chunks); read: n/b + 2 times, because if there is still more in the other array it will read once more

simplest possible hashcodes where 2 have the same code

x+y

polynomially larger? n^2 vs. n

yes

polynomially larger? n^2.001 vs. n^2

yes

polynomially larger? n^3 log n vs. n^2

yes

polynomially larger? n log n vs. n^(log_4 3)

yes

