COMP10002 Foundations of Algorithms
What is the range of values for different bit sizes for twos-complement?
32-Bit Computer → range from -(2^31) to 2^31 - 1 64-Bit Arithmetic (type long long) → range from -(2^63) to 2^63 - 1 8-Bit (type char) → range from -(2^7) to 2^7 - 1 so, for w bits, range from -(2^(w-1)) to 2^(w-1) - 1
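A minimal sketch of the general w-bit formula, assuming w is between 1 and 63 so that the result fits in an int64_t. The function names twos_min and twos_max are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest value in w-bit twos-complement: -(2^(w-1)).
 * Hypothetical helper; w is limited to 63 to avoid overflow. */
int64_t twos_min(int w) {
    assert(w >= 1 && w <= 63);
    return -((int64_t)1 << (w - 1));
}

/* Largest value in w-bit twos-complement: 2^(w-1) - 1. */
int64_t twos_max(int w) {
    assert(w >= 1 && w <= 63);
    return ((int64_t)1 << (w - 1)) - 1;
}
```

For example, twos_min(8) and twos_max(8) give the 8-bit range -128 to 127.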
What are IDEs and their purpose?
IDE: integrated development environments developed for multimodule program development → Xcode for Mac → Eclipse for Mac and PC Functionality provided → automatic makefile generation → built-in debugging tools → conditional compilation → cross-module type checking → pretty-printing → language-directed editing
What is the generate and test approach to problem solving?
If a solution space can be enumerated and mapped onto a sequence of integers, then systematically try candidate answers until one that meets specific criteria is found. → if solution space infinite, include loop counter and "no answer found" exit point → if solution space is multiply-infinite, ensure that dimensions are treated fairly
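As an illustration only, the sketch below applies generate and test to a toy problem: finding an integer x with x*x equal to a target. The function name find_square_root and the limit parameter are assumptions made for this example; the limit plays the role of the loop counter, and -1 is the "no answer found" exit point.

```c
#include <assert.h>

/* Generate and test: try x = 0, 1, 2, ... until x*x == target.
 * limit bounds the search; -1 signals "no answer found". */
long find_square_root(long target, long limit) {
    assert(limit >= 0);
    for (long x = 0; x < limit; x++) {
        if (x * x == target) {
            return x;   /* candidate meets the criterion */
        }
    }
    return -1;          /* search space exhausted */
}
```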
What are heaps?
In an array A[0...n-1] → two children of A[i] are A[2i + 1] and A[2i + 2] → parent of A[i] is A[(i-1)/2] → heaps are balanced trees based on array positions → root is A[0] → leaves are in A[n/2...n-1] → no extra space required for pointers → no child may be larger than its parent
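The index arithmetic above can be checked with a small sketch. The function is_heap is a hypothetical helper, not part of any library; it verifies the "no child larger than its parent" property using the parent position (i-1)/2.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if A[0..n-1] satisfies the max-heap property, else 0. */
int is_heap(const int A[], size_t n) {
    assert(A != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        size_t parent = (i - 1) / 2;   /* parent of A[i] */
        if (A[i] > A[parent]) {
            return 0;                  /* child larger than parent */
        }
    }
    return 1;
}
```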
What type are relational and logical expressions?
int (value 1 if true, 0 if false)
What is the standard pseudocode for memory allocation?
Key points: → always use sizeof(), don't hard-code sizes → test the pointer that is returned → free pointers → match every malloc() with a corresponding free() → always set access pointer to NULL after freeing to prevent improper re-access → use realloc() to grow multiplicatively, not additively
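One possible sketch of these points, assuming a hypothetical growable array type intvec_t (the type and function names are illustrative only). It uses sizeof, tests the returned pointer, doubles the capacity with realloc(), and sets the pointer to NULL after free().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of ints. Start with {NULL, 0, 0}. */
typedef struct {
    int *data;
    size_t used, cap;
} intvec_t;

/* Append x, doubling the capacity when the array is full. */
void vec_push(intvec_t *v, int x) {
    if (v->used == v->cap) {
        size_t newcap = v->cap ? 2 * v->cap : 4;  /* multiplicative growth */
        int *p = realloc(v->data, newcap * sizeof(*v->data));
        assert(p != NULL);                        /* test returned pointer */
        v->data = p;
        v->cap = newcap;
    }
    v->data[v->used++] = x;
}

/* Release the memory and clear the access pointer. */
void vec_free(intvec_t *v) {
    free(v->data);
    v->data = NULL;
    v->used = v->cap = 0;
}
```

Calling realloc() with a NULL pointer behaves like malloc(), which is why the sketch needs no separate initial allocation.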
What does the function fopen() take as arguments?
1 A filename as a string 2 An access mode of either → "r" open for reading → "w" open for writing, previous content deleted at moment of opening → "a" open for appending, previous contents retained If a "+" is appended to the mode, the file is opened for both reading and writing, and fseek() and ftell() can be used for random-access seek/read/rewrite processing
What is the logic behind binary search and its efficiency?
1 Array is sorted 2 Search by repeated halving Takes O(log n) time
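A minimal sketch of repeated halving over a sorted int array. The name binary_search and the convention of returning -1 for "not found" are assumptions for this example.

```c
#include <assert.h>

/* Search sorted A[0..n-1] for key; return its index, or -1 if absent. */
int binary_search(const int A[], int n, int key) {
    assert(n >= 0);
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;  /* avoids overflow of lo + hi */
        if (A[mid] == key) {
            return mid;
        } else if (A[mid] < key) {
            lo = mid + 1;              /* discard the left half */
        } else {
            hi = mid - 1;              /* discard the right half */
        }
    }
    return -1;
}
```

Each iteration halves the remaining range, which is where the O(log n) bound comes from.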
What are the different layers of analysis for the efficiency of an algorithm?
1 Best case → for academic interest only 2 Average case → randomness required in input: high risk unless input can be guaranteed to be randomised → randomness enforced by algorithm, regardless of input 3 Worst case → bet your life on it
What are the five generic approaches to problem solving?
1 Generate and test 2 Divide and conquer 3 Simulation 4 Approximation 5 Adaption
What are the four key benefits of C?
1 High level control, direct mapping to hardware and relatively unrestricted access to memory 2 Relatively light execution footprint 3 Compilers robust, efficient, and portable 4 Useful for applications and systems programming
What are two drawbacks of C?
1 Limited run-time diagnostics and lack of error trapping can catch out careless programmers 2 Low-level operation ---- libraries are required for operations and types that may be native in other languages
What is the logic behind quick-sort and its efficiency?
1. Partition the values into those smaller and those larger than a chosen pivot value 2. Sort the small part and the big part recursively 3. Return the sorted array Best case O(n log n) Worst case O(n^2), e.g. when the array is already sorted and the first element is always chosen as the pivot Average case O(n log n)
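A sketch of one way to implement these steps, assuming a middle-element pivot and a three-way partition (smaller, equal, larger). Both choices are assumptions for this example rather than the only option; the names quick_sort and swap_int are hypothetical.

```c
#include <assert.h>

static void swap_int(int *a, int *b) {
    int t = *a; *a = *b; *b = t;
}

/* Sort A[0..n-1] in place using a middle-element pivot. */
void quick_sort(int A[], int n) {
    assert(n >= 0);
    if (n <= 1) return;
    int pivot = A[n / 2];
    /* Invariant: A[0..lt) < pivot, A[lt..i) == pivot, A[gt..n) > pivot */
    int lt = 0, i = 0, gt = n;
    while (i < gt) {
        if (A[i] < pivot) {
            swap_int(&A[lt++], &A[i++]);
        } else if (A[i] > pivot) {
            swap_int(&A[--gt], &A[i]);
        } else {
            i++;
        }
    }
    quick_sort(A, lt);            /* sort the small part */
    quick_sort(A + gt, n - gt);   /* sort the big part */
}
```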
What is heap sort?
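1. build a heap from the array 2. repeatedly swap the root (the largest item) with the last item in the heap, shrink the heap by one, and sift the new root down to restore the heap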
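The two phases can be sketched as follows, assuming a max-heap stored in the array with children of A[i] at A[2i+1] and A[2i+2]. The names sift_down and heap_sort are hypothetical.

```c
#include <assert.h>

/* Restore the heap property below position i in the heap A[0..n-1]. */
static void sift_down(int A[], int i, int n) {
    for (;;) {
        int child = 2 * i + 1;
        if (child >= n) return;                        /* i is a leaf */
        if (child + 1 < n && A[child + 1] > A[child]) {
            child++;                                   /* larger child */
        }
        if (A[i] >= A[child]) return;                  /* already a heap */
        int t = A[i]; A[i] = A[child]; A[child] = t;
        i = child;
    }
}

/* Sort A[0..n-1] into ascending order. */
void heap_sort(int A[], int n) {
    assert(n >= 0);
    for (int i = n / 2 - 1; i >= 0; i--) {   /* phase 1: build heap */
        sift_down(A, i, n);
    }
    for (int end = n - 1; end > 0; end--) {  /* phase 2: swap and sift */
        int t = A[0]; A[0] = A[end]; A[end] = t;
        sift_down(A, 0, end);
    }
}
```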
What is the order of precedence for operators?
1. post inc, post dec 2. not, negation, casting 3. multiplication and division 4. addition and subtraction 5. comparisons 6. equalities 7. and 8. or 9. assignment
Why is separate compilation preferable?
A big program might contain tens or hundreds of thousands of lines of code → may take many minutes or longer to compile → a module only needs recompilation if it has changed → each .o file is dependent on its corresponding .c file → which is in turn dependent on any .h files it references via #include directives → final executable depends on the .o files
What is complexity theory?
A branch of computing that seeks to prove, for a given problem, that no better algorithm can exist, i.e. to answer "can there be a faster algorithm?" with "no there can't be"
What is a file pointer?
A connection between an executing program and an input or output device → files need to be opened before they can be used → text files are manually edited and viewed → binary files provide faster input and output for arrays and structures → cannot be processed manually
What is big O notation?
Allows asymptotic costs to be compared → best case and average case analysis that requires randomness is risky → worst case and average case with randomness in algorithm is better Quicksort → requires O(n log n) on average and O(n^2) in worst case
What does separate compilation mean?
Any non-trivial C program gets written as a sequence of modules → each module provides a public interface and private implementation → interface is described by a .h file ---- lists the public functions associated with the module ---- also includes data types used to access that functionality → modules separately compiled without needing main function → compiled object files linked together with gcc for executable program
What are the fundamental differences between arrays and structures?
An array name behaves as a pointer to its first element in most expressions; a structure is a value object. → structs can be assigned, arrays cannot → arrays are compared as pointers (addresses, not contents) → an array name is a pointer constant and cannot be reassigned → arrays can be used as pointers → both can be an argument to a function, but only structs can be returned from a function ---- structs are copied and arrays are passed as pointers Array of structures behaves as an array Structure that has arrays as elements behaves as a structure
What is the pseudocode/logic behind linear search and its efficiency?
Examine each element in turn until the target is found or the array is exhausted. As this requires at most n steps, each of constant cost, it takes linear O(n) time
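A minimal sketch of linear search over an int array; the name linear_search and the -1 "not found" convention are assumptions for this example.

```c
#include <assert.h>

/* Return the first index i with A[i] == key, or -1 if key is absent. */
int linear_search(const int A[], int n, int key) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {   /* at most n steps */
        if (A[i] == key) {
            return i;
        }
    }
    return -1;
}
```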
What are assertions?
Assertions: argued statements about what must be true as a program executes. → C pseudo-function assert() is defined in assert.h → if the argument expression is violated, program execution will be halted and the invalid assertion identified
What is a bit and byte?
Binary digits are bits → either 0 or 1 Byte is a unit of eight bits → most words are units of either four or eight bytes → typically words store a set of 32 or 64 bits
Are for loops and while loops different or similar?
Can feel different but are almost identical
What is the function cmp and its resulting values?
Compares two elements and returns: → -ve if first item is smaller → 0 if two items are equal → +ve if first item is larger e.g. strcmp and strncmp
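A sketch of a comparison function in this style, written with the signature that the standard library qsort() expects. The name int_cmp is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Compare two ints via void pointers, as required by qsort().
 * Returns -ve, 0, or +ve, without the overflow risk of x - y. */
int int_cmp(const void *a, const void *b) {
    assert(a != NULL && b != NULL);
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}
```

It can then be passed to the library sort, for example qsort(A, n, sizeof(A[0]), int_cmp).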
What are floating point representations?
Float and double floating point types are stored as → one bit sign → w_e-bit integer exponent of 2 or 16 → w_m-bit mantissa normalised so that leading binary or hexadecimal digit is non-zero When w=32 → float variable has around w_m = 24 bits of precision in mantissa → corresponds to around 7 or 8 digits of decimal precision → in double (w=64), around w_m = 53 bits of precision are maintained under IEEE 754, corresponding to around 15 or 16 digits of decimal precision
What is the logic/pseudocode behind KMP pattern search and its efficiency?
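For matching a pattern against a text → start with the pattern aligned at the first position of the text → when a mismatch occurs, shift the pattern as far as possible without missing a match In every iteration (where s is the current alignment of the pattern in the text and i is the number of pattern characters matched so far) either → i goes up by one and s is unchanged or → s goes up by the same as i decreases or → s goes up by 1 and i remains zero Takes O(n + m) time for a text of n characters and a pattern of m characters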
What does isalpha() do?
Function in the ctype.h library that tests whether a character is alphabetic, returning non-zero if it is and zero otherwise
What is the difference between a queue and stack?
Fundamental data structures that allow data to be processed systematically, possibly in an order other than the one it was received in Queue → FIFO first in first out → insert at tail → extract from head Stack → LIFO last in first out → insert at head → extract from head
What does sizeof() do?
Gives the size of types and values
What is the fastest way to search through a dictionary?
Hashing: insert, search and delete in O(1) average time Balanced search tree → can do O(log n) time for all operations → but complex to implement Binary search tree → not robust and hence vulnerable Sorted array → insertion is slow Array → search and ordered processing is slow
What are three approaches when there is a collision in the hash?
Linear Advance: move cyclically forward to the next vacant cell in the array → simple but not that effective → chains get longer → deletion is problematic Cuckoo Hashing: displace the previous object to allow a new one to be in its spot and re-insert it using another hash function → effective but needs a suite of hash functions → still needs complex deletion Separate Chaining: use a secondary data structure to maintain the set of colliding objects → e.g. a linked list, tree, or dynamic array
What does the program gdb do?
Manages an executing program and provides debugging information → compile using additional -g flag → executing gdb prog accesses additional symbol table information that connects source code and the executable
What is the adaption method of problem solving?
Modify the solution or approach already used for another similar problem → essentially borrowing → always give full attribution
What are octal and hexadecimal values?
Octal: base 8 Hexadecimal: base 16 Any integer constant that starts with 0 is taken to be octal; one that starts with 0x is taken to be hexadecimal
What is the logic behind merge sort?
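→ break array into two halves → continue breaking each half into two parts until only single-element parts remain (each is trivially sorted) → merge pairs of sorted parts by repeatedly taking the smaller of the two front elements → continue merging until the whole array is sorted from smallest to largest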
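A top-down sketch of this logic, assuming a single temporary buffer of n ints is allocated for the merge step. The names merge, sort_rec, and merge_sort are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merge sorted A[0..mid-1] and A[mid..n-1] via tmp, back into A. */
static void merge(int A[], int mid, int n, int tmp[]) {
    int i = 0, j = mid, k = 0;
    while (i < mid && j < n) {
        /* take from the left part on ties, which keeps the sort stable */
        tmp[k++] = (A[j] < A[i]) ? A[j++] : A[i++];
    }
    while (i < mid) tmp[k++] = A[i++];
    while (j < n) tmp[k++] = A[j++];
    memcpy(A, tmp, n * sizeof(*A));
}

static void sort_rec(int A[], int n, int tmp[]) {
    if (n <= 1) return;               /* single part: already sorted */
    int mid = n / 2;
    sort_rec(A, mid, tmp);            /* sort the first half */
    sort_rec(A + mid, n - mid, tmp);  /* sort the second half */
    merge(A, mid, n, tmp);
}

/* Sort A[0..n-1] into ascending order using O(n) extra space. */
void merge_sort(int A[], int n) {
    assert(n >= 0);
    if (n <= 1) return;
    int *tmp = malloc(n * sizeof(*tmp));
    assert(tmp != NULL);
    sort_rec(A, n, tmp);
    free(tmp);
}
```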
What are pointer values?
Pointer values map variables and compound structures to addresses in memory. → C provides operations that manipulate pointer values → pointers derive their types from the underlying variables ---- e.g. int* is type "pointer to int" → functions that need to alter their arguments use pointers → void* allows untyped pointers → array names behave as pointer constants
What are recursive struct types and examples of one-, two-, and higher-dimensional structures?
Recursive struct types include pointers of their own type One-dimensional → lists → queues → stacks Two-dimensional → trees → tries Higher dimensional → undirected and directed graphs
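What is the dictionary abstract data type?
Requires the operations insert() and search(), and can be implemented with structures such as arrays, sorted arrays, binary search trees, balanced search trees, and hash tables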
What are scope rules and the different scopes?
Scope rules determine which variables can be accessed at each point in the program → local: argument variables are considered to be local to the function ---- can be shadowed by local variables declared within the function → automatic: variables declared within a function are local or automatic → global: variables declared outside any function ---- local and global variables can be declared with the modifier static ---- static variables are initialised once and thereafter retain their value through the execution
How do you implement separate chaining for hashing collisions?
Separate chaining → easy to implement, robust against overflow → when records are large, overhead of extra pointers is small → whenever table loading gets too high, double size of array of lists, rehash all current objects, and resume
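A minimal sketch of a chained hash table, assuming unsigned integer keys, a fixed table size, and a deliberately simple modulus hash function. All names here (table_t, entry_t, table_insert, table_search, TABLE_SIZE) are hypothetical, and resizing and freeing are omitted to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SIZE 101   /* t, the number of lists (assumed fixed here) */

typedef struct entry entry_t;
struct entry {
    unsigned key;
    int item;
    entry_t *next;       /* next colliding object in this chain */
};

/* Initialise with: table_t T = {{NULL}}; */
typedef struct {
    entry_t *slots[TABLE_SIZE];
} table_t;

/* Toy hash function, chosen only so collisions are easy to see. */
static unsigned hash(unsigned key) {
    return key % TABLE_SIZE;
}

/* Insert at the front of the chain for the key's slot. */
void table_insert(table_t *T, unsigned key, int item) {
    entry_t *e = malloc(sizeof(*e));
    assert(e != NULL);
    unsigned h = hash(key);
    e->key = key;
    e->item = item;
    e->next = T->slots[h];
    T->slots[h] = e;
}

/* Return a pointer to the item stored for key, or NULL if absent. */
int *table_search(table_t *T, unsigned key) {
    for (entry_t *e = T->slots[hash(key)]; e != NULL; e = e->next) {
        if (e->key == key) {
            return &e->item;
        }
    }
    return NULL;
}
```

With this toy hash function, keys 5 and 106 land in the same slot and are kept apart by the chain.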
How do you deal with negative numbers?
Sign-magnitude vs Twos-complement Sign-magnitude → first bit reserved for the sign → w-1 bits reserved for the magnitude of the number ---- where w is the number of bits Twos-complement → the leading bit has the weight of -(2^(w-1)) ---- rather than just 2^(w-1)
What is the 4-bit integer representation and the difference between unsigned, sign-magnitude, and twos-complement?
Unsigned: values 0 to 15 Sign-magnitude: values -7 to +7; negative patterns start at -0 (1000), so zero has two representations Twos-complement: values -8 to +7; negative patterns start at -8 (1000)
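To make the three readings of a 4-bit pattern concrete, here is a small sketch. The function names are hypothetical, and the pattern is passed in the low four bits of an unsigned value.

```c
#include <assert.h>

/* Read the 4-bit pattern as an unsigned value, 0..15. */
int as_unsigned(unsigned bits) {
    assert(bits <= 0xF);
    return (int)bits;
}

/* Leading bit is the sign, remaining three bits the magnitude. */
int as_sign_magnitude(unsigned bits) {
    assert(bits <= 0xF);
    int mag = (int)(bits & 0x7);
    return (bits & 0x8) ? -mag : mag;   /* 1000 reads as -0, i.e. 0 */
}

/* Leading bit has weight -(2^3) = -8. */
int as_twos_complement(unsigned bits) {
    assert(bits <= 0xF);
    int low = (int)(bits & 0x7);
    return (bits & 0x8) ? low - 8 : low;
}
```

For the pattern 1000 these give 8, 0 (that is, -0), and -8 respectively.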
What is the difference and similarities between symbolic and numeric processing?
Similarities → above all else, correct → straightforward to implement → efficient in terms of memory and time Symbolic Processing → e.g. sorting strings → scalable and/or parallelizable (for massive data) → statistical confidence in answers and assumptions made Numeric Processing → effective i.e. correct answers with broad applicability and/or limited restrictions on use → stable and reliable in terms of underlying arithmetic
What are the similarities and differences of C and Python?
Similarities → both imperative languages → offer range of arithmetic and logical operations → offer range of control structures incl. selection, iteration, recursion → arguments received as initial values of local variables → libraries available for wide range of other operations Differences → C structure indicated by semicolons and braces; Python by layout → C integer arithmetic is bounded and silently overflows → C does not have an explicit bool type (before C99 added _Bool and stdbool.h) → C has static typing and requires declarations; Python has dynamic typing → C is compiled whereas Python is interpreted → Python provides built-in list, set, and dictionary structures with operations → C provides explicit pointer variables and operations
What is the approximation method of problem solving?
Solve a simpler problem with either → known degree of fidelity relative to original → clear strategy for estimating the extent of the error Need to be careful with the floating point arithmetic being performed.
What is the logic/pseudocode behind suffix array pattern search and its efficiency?
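Suffix array S[0...n-1] is an array of pointers S[i] into the text T such that T_S[i] (the suffix of T starting at position S[i]) lexicographically precedes T_S[i+1] → if built by sorting the suffix pointers with a standard quicksort and string comparison, average time is O(n^2 log n) → worst case O(n^3) time Not cheap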
What is the idea behind hashing?
Take each key and use a hash function h() to deterministically construct a seemingly-random integer in a constrained range. → use integers to index an array A of size t, putting x in location A[h(x)] → if lucky, everything fits fine and there are no collisions → if location is used, put the item "somewhere else" that makes sense → when searching for x, look first in the location A[h(x)] → if not there then look in other possible locations → if not there then not in dictionary → try to keep set of 'elsewhere' location options small Collision handling choices are → linear advance → cuckoo hashing → separate chaining
What is the efficiency of heap sort?
Takes O(n log n) time in the worst case
What is the logic behind ternary quick-sort and its efficiency?
Ternary Quick sort: partition on one character at depth d in the string and do three recursive calls. → shaves a factor of up to n off execution from suffix array pattern search → worst case becomes O(n^2) → the average case still requires randomness
What does gprof do for profiling?
The program gprof reads a file gmon.out that is created by running a program compiled with -pg → provides list of functions in the program → also provides number of times each was called and time spent → used to find hot spots that will benefit from careful tuning
What are the three files stdin, stdout and stderr?
Three files are always provided when a program is executing stdin = for input from the keyboard → available for redirection by the shell stdout = for output to the screen → available for redirection by the shell stderr = for error messages to the screen → available for separate redirection by the shell printf(...) is equivalent to fprintf(stdout, ...) scanf(...) is equivalent to fscanf(stdin, ...)
What does it mean to be polymorphic for binary search trees?
To allow type-free functions, comparison function must also be polymorphic → captured at the time the tree is created → rather than being passed into every tree manipulation function
What are Monte Carlo methods of problem solving?
Use pseudo-random number generators to model a physical system. → srand() used to initialise random-number system → each call to rand() returns the next seemingly unrelated int in the sequence
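As an illustration, the sketch below uses srand() and rand() to estimate pi by sampling random points in the unit square and counting those inside the quarter circle. The choice of problem and the name estimate_pi are assumptions for this example; the exact estimate depends on the library's rand() implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Estimate pi from `trials` random points in the unit square. */
double estimate_pi(unsigned seed, long trials) {
    assert(trials > 0);
    srand(seed);                               /* initialise the sequence */
    long inside = 0;
    for (long t = 0; t < trials; t++) {
        double x = rand() / (RAND_MAX + 1.0);  /* uniform in [0, 1) */
        double y = rand() / (RAND_MAX + 1.0);
        if (x * x + y * y < 1.0) {
            inside++;                          /* inside quarter circle */
        }
    }
    return 4.0 * inside / trials;
}
```

Using the same seed reproduces the same sequence, which makes a simulation repeatable while debugging.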
What do the functions fread() and fwrite() do?
Used to transfer blocks of data between files and arrays → a file pointer of type FILE* must be opened before it is used for either operation
What are abstract data types?
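Abstract data types describe a suite of operations independently of how they are implemented; different algorithms require different suites of operations → dictionary abstract data type → priority queue abstract data type, which yields O(n log n) worst-case time sorting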
What is an array?
Where every element is of the same type and the i'th value can be accessed independently of other elements via A[i-1] → arrays are an address mapping directly on to the computer's memory structure → without subscript, the array is essentially a pointer constant → size of each element of an array must be declared ---- but not number of elements → in a function array is essentially pointer variable used to alter array elements ---- A[i] is essentially shorthand for *(A+i)
What are unsigned types?
Where negative numbers cannot be stored → unsigned char → unsigned short → unsigned int (or just unsigned) → unsigned long → unsigned long long Printed using "%u", "%lu", or "%llu"; values will still be printed if "%d" is used, but large values may appear as negative numbers
What is the logic/pseudocode behind sequential pattern search and its efficiency?
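Try each alignment of the pattern (m characters) against the text (n characters) in turn, comparing character by character until a mismatch or a complete match Worst case = O(nm) time Average case = O(n) time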
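A minimal sketch of sequential pattern search over C strings, trying each alignment s in turn. The name seq_search and the -1 "not found" convention are assumptions for this example.

```c
#include <assert.h>
#include <string.h>

/* Return the first position in T at which P occurs, or -1 if none. */
int seq_search(const char *T, const char *P) {
    assert(T != NULL && P != NULL);
    size_t n = strlen(T), m = strlen(P);
    if (m > n) return -1;
    for (size_t s = 0; s + m <= n; s++) {    /* each alignment s */
        size_t i = 0;
        while (i < m && T[s + i] == P[i]) {  /* up to m comparisons */
            i++;
        }
        if (i == m) {
            return (int)s;                   /* complete match */
        }
    }
    return -1;
}
```

The nested loops give the O(nm) worst case; on typical text most alignments fail after one or two comparisons.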
What is the efficiency of insertionsort?
Worst case: n(n-1)/2 comparisons, therefore O(n^2) time
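A sketch of insertion sort that also counts key comparisons, so the worst-case figure can be observed on reverse-sorted input. The name insertion_sort and the idea of returning the comparison count are assumptions for this example.

```c
#include <assert.h>

/* Sort A[0..n-1] ascending; return the number of key comparisons. */
long insertion_sort(int A[], int n) {
    assert(n >= 0);
    long comparisons = 0;
    for (int i = 1; i < n; i++) {
        int x = A[i];
        int j = i - 1;
        while (j >= 0) {
            comparisons++;
            if (A[j] <= x) break;   /* found the insertion point */
            A[j + 1] = A[j];        /* shift larger item right */
            j--;
        }
        A[j + 1] = x;
    }
    return comparisons;
}
```

On five items in reverse order this performs 1 + 2 + 3 + 4 = 10 comparisons, matching n(n-1)/2 for n = 5.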
What is the divide and conquer approach to problem solving?
→ break problem into smaller instances → solve instances (can be recursively) → combine solutions to create solution to original problem e.g. quicksort
What is the efficiency of binary search trees?
→ if input data is a random permutation, average depth will be O(log n) → average search cost is O(log n) → worst case O(n)
What is the equivalent of true and false in C?
false = zero true = non-zero
What do malloc(), realloc() and free() do?
malloc() = new memory can be sized and requested realloc() = amounts that have already been allocated can be resized free() = hands back the memory when it is no longer required
How do you pop the front element off the chain in a linear linked structure?
→ if p is a pointer to a non-empty chain, can pop the front element off the chain and discard it → must be used as list = pop(list) ---- not newlist = pop(oldlist) If data from front of list is to be returned and the list shortened → front_data = pop(&list) → with list holding a different value after the call than before
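A sketch of the first form, list = pop(list), assuming a hypothetical node type node_t holding an int. The push function is included only so the example is self-contained.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node node_t;
struct node {
    int data;
    node_t *next;
};

/* Add a new front element; returns the new head of the chain. */
node_t *push(node_t *list, int data) {
    node_t *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->data = data;
    n->next = list;
    return n;
}

/* Discard the front element; returns the new head of the chain.
 * Must be used as list = pop(list), since the old head is freed. */
node_t *pop(node_t *list) {
    assert(list != NULL);       /* chain must be non-empty */
    node_t *rest = list->next;
    free(list);
    return rest;
}
```

Writing newlist = pop(oldlist) would leave oldlist pointing at freed memory, which is why the result is assigned back to the same variable.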
What are the similarities and differences between merge sort and quick sort?
→ both are divide and conquer sorting algorithms → quick sort is "hard split, easy join" → merge sort is "easy split, hard join"
What are the operations for bits in int and unsigned variables?
→ << left shift → >> right shift → & bitwise and → | bitwise or → ^ bitwise exclusive or → ~ bitwise complement
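A few hypothetical helper functions showing how these operators combine to manipulate a single bit of an unsigned value; bit 0 is the least significant bit.

```c
#include <assert.h>
#include <limits.h>

#define UNSIGNED_BITS ((int)(sizeof(unsigned) * CHAR_BIT))

/* Turn bit k on using | and <<. */
unsigned set_bit(unsigned x, int k) {
    assert(k >= 0 && k < UNSIGNED_BITS);
    return x | (1u << k);
}

/* Turn bit k off using & and ~. */
unsigned clear_bit(unsigned x, int k) {
    assert(k >= 0 && k < UNSIGNED_BITS);
    return x & ~(1u << k);
}

/* Flip bit k using ^. */
unsigned toggle_bit(unsigned x, int k) {
    assert(k >= 0 && k < UNSIGNED_BITS);
    return x ^ (1u << k);
}

/* Return 1 if bit k is on, else 0, using >> and &. */
int test_bit(unsigned x, int k) {
    assert(k >= 0 && k < UNSIGNED_BITS);
    return (int)((x >> k) & 1u);
}
```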
What is the efficiency of hashing?
→ After n objects inserted, average list length is n/t items long → If hash function is good and data not pathological → if n/t < K for some K, then insert and search are O(1) → provided hash function can be evaluated in O(1) time per key → can be enhanced by moving any item that is accessed to front of its list → accelerates search unless access pattern is uniform
What are operations supported by the dictionary abstract data type?
→ D make_empty() → D' insert(D, key, item) → item_ptr search(D, key) → D' delete(D, key) → find_smallest_key(D) → find_next_key(D)
What are strings in C?
→ No pre-defined string type in C → instead, stored as null-terminated array of characters ---- null-terminated array does not have pre-specified number of characters unlike normal arrays → string operations carried out using character pointers → C does not check bounds: writing more characters than the array can hold (including the terminating null) is undefined behaviour rather than a reported error → functions to manipulate strings make use of char* pointers → arrays of char* are used to manipulate sets of strings ---- including argv, the initiating command line
What are linear linked structures?
→ a struct type that includes a pointer to its own type (rather than, say, a pointer to a string) → needs a forward declaration → type node_t declared before it is defined → sequence of elements of struct type are threaded together in a chain of pointers → last item in chain has a null pointer
How are arrays essentially pointers?
→ arrays are an address mapping directly on to the computer's memory structure → without subscript, the array is essentially a pointer constant → size of each element of an array must be declared ---- but not number of elements → in a function array is essentially pointer variable used to alter array elements ---- A[i] is essentially shorthand for *(A+i) ---- subscript indexing is another form of pointer dereference → functions cannot return arrays ---- instead, allocate new memory space and return a pointer to it
What are the 7 commands of gdb for debugging?
→ b nn: set a breakpoint at line nn → run: execute the program through until a breakpoint → continue: continue the program until a breakpoint → print var: print the value of var → next: execute a single line (bypasses functions) → step: execute a single line (goes into functions) → Return key (empty command line): do the last command again
What is the struct type?
→ in struct, heterogeneous-typed data is aggregated with individual elements identified by component name → accessed via the . selection operator → -> allows direct access to the components of a structure identified by a pointer → if an array of structures is passed to a function, it is passed as a pointer → when a structure is passed, it is copied → usual to pass structure pointers to functions
What are some key things to keep in mind with int, float and double representation?
→ int and int operations result in ints ---- even if division → float and double can result in order and rounding issues
What are the advantages of twos-complement representation?
→ only one representation for zero → integer arithmetic is easy to perform
What are some pitfalls of numeric computation?
→ subtracting numbers that may be close together → absolute errors are additive → relative errors are magnified → adding large sets of small numbers to large numbers one by one ---- precision is likely to be lost → comparing values that are the result of floating point arithmetic e.g. zero may not be zero
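Because values produced by floating point arithmetic may not compare exactly equal, one common workaround is a tolerance-based comparison. The sketch below is an assumption-laden example: the name nearly_equal and the choice of a tolerance scaled by the larger magnitude (with a floor of 1.0) are illustrative, and a suitable eps depends on the computation.

```c
#include <assert.h>

/* Return 1 if a and b differ by at most eps, relative to the larger
 * of their magnitudes (or absolutely, for values near zero). */
int nearly_equal(double a, double b, double eps) {
    assert(eps >= 0.0);
    double diff = (a > b) ? a - b : b - a;
    double mag_a = (a < 0.0) ? -a : a;
    double mag_b = (b < 0.0) ? -b : b;
    double scale = (mag_a > mag_b) ? mag_a : mag_b;
    if (scale < 1.0) {
        scale = 1.0;   /* fall back to an absolute test near zero */
    }
    return diff <= eps * scale;
}
```

For instance, nearly_equal(0.1 + 0.2, 0.3, 1e-9) holds even though the two sides are typically not bit-for-bit identical.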
What are preprocessors and some facilities provided by the c-preprocessor?
→ symbolic substitution via #define → parameterized "string replacement" substitution in #define definitions and expansions → conditional compilation via #if and #ifdef → access to compile-time variables
What is the efficiency of merge sort?
→ worst case O(n log n) time → however, requires O(n) extra space for merging