CPE357

Empty String

"" is a string that is really just "\0". In other words it still takes up one byte for that terminating NULL.

Pointer Cast

"(SomeType *) somePointer" is the proper format for casting a type to a void pointer. After being casted, it can be dereferenced.

Output/Input Redirection

"< inputFile" after a command gives it the inputFile as its stdin. "> outputFile" redirects the output to outputFile.

EOD Code

"end of data" - appears only once, as the last piece of compressed ouput in the LZW compression project. Takes the code value 256.

Make

"make" is the Unix tool for build management.

Printing (and writing) in Hexadecimal and Octal

%o - the format specifier for octal. %x - the lowercase format specifier for hexadecimal. %X - the uppercase format specifier for hexadecimal. Writing a number in C as hexadecimal merely requires preceding it with 0x (ex: int i = 0x8D31). Writing a number in octal requires preceding it with a 0 (ex: int mode = 0755).

Bitwise Operators

& - bitwise and: a result bit is 1 only where both operands have a 1; every other bit is set to 0.
| - bitwise or: every bit position where at least 1 of the 2 numbers being or'd is a 1 will be a 1.
^ - bitwise exclusive or: only bits where the input numbers differ will be set to 1.
~ - bitwise flip (unary, only one operand): flips all 0's to 1's and 1's to 0's.
<< - shift bits left: shifts the bits in the lefthand operand to the left by the righthand operand (each position shifted doubles the value). Topmost bits are shifted off and lost while the bottommost bits are filled in with 0's.
>> - shift bits right: shifts the bits in the lefthand operand to the right by the righthand operand (each position shifted divides by 2). Bottommost bits are shifted off and lost. For unsigned values the topmost bits are filled in with 0's; for signed values, on most machines, they are filled in with whatever the MSB is.
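A short sketch of how these operators are commonly combined to work on single bits. The function names are hypothetical; unsigned operands are used so that every shift has a fully defined result.

```c
/* Illustrative single-bit helpers (names are made up for this sketch).
 * pos must be smaller than the number of bits in an unsigned int. */
unsigned set_bit(unsigned v, unsigned pos)    { return v | (1u << pos); }   /* force bit to 1 */
unsigned clear_bit(unsigned v, unsigned pos)  { return v & ~(1u << pos); }  /* force bit to 0 */
unsigned toggle_bit(unsigned v, unsigned pos) { return v ^ (1u << pos); }   /* flip the bit   */
int      test_bit(unsigned v, unsigned pos)   { return (v >> pos) & 1u; }   /* read the bit   */
```

For example, set_bit(0x8, 0) gives 0x9 and clear_bit(0xF, 1) gives 0xD.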

Compression Terminology

*Symbols* are what uncompressed data is viewed as. These symbols are drawn from an allowed *alphabet*, or collection of existing symbols. Most compression algorithms produce a series of *codes* as their compressed data. Each code usually represents either an individual symbol, or a group of symbols. In the LZW algorithm, we'll have codes for individual symbols, and also for commonly used sequences of symbols. Every compression algorithm tries to make the most out of the bits in its compressed output, so the output is invariably in binary form.

BST (Binary Search Tree) Review

1. BSTs have nodes. Each node has a key and a left and right pointer to other nodes, termed its child nodes. It is their parent node.
2. One node, the root, is at the top of the tree: the ultimate parent. You diagram/visualize the tree as "hanging" from that root node, growing downward from it.
3. A parent node P's child nodes may in turn point to other child nodes (unsurprisingly called P's grandchildren or descendants), which in turn may point to further child nodes, etc., in a branching pattern of potentially unlimited depth.
4. One or both of a node's child pointers may be null, which is how the BST reaches its bottom. If both pointers are null, the node is a leaf.
5. In a BST, all descendant nodes reached from a parent P's left pointer have key values less than P's, and all descendant nodes to P's right have larger key values. In effect, the left pointer from P points to a mini-BST of children, called a subtree, all of which have keys less than P's, and its right pointer points to a mini-BST of children having keys greater than P's. P's left and right children are the subroots of their own small BSTs.
6. Each subtree is similarly organized, with nodes having keys less than its subroot's key falling to the left of the subroot, and those with keys greater falling to the right of the subroot. The BST thus divides the key values more and more finely at each level, in a recursive pattern.
7. You search for a key K in the BST by starting at the root. If K matches the root's key, you're done. If not, then if K is less than the root's key, any node holding K must be to the root's left, and if K is greater than the root's key, any node holding K must be to the root's right. So you go to the left or right child, and repeat the process. At each node, you either find K, or are guided to the left or right. This process guarantees finding K, if it's in the BST at all.
8. If K is not in the tree, then you ultimately reach a node N whose left/right pointer that should have led to K is instead NULL. (Note this doesn't necessarily mean N is a leaf, since N's other pointer might be non-NULL.) You can correctly add a new node containing K by hanging a new leaf node containing K off of that pointer. Doing this ensures you'll find K by the process in point 7, should you search for it again.
9. The search process in point 7 is fast if the tree doesn't have too many generations, or levels, in it. If the tree is reasonably balanced, with most nodes having both left and right children, this will be so. Indeed, a tree of L levels may have 2^L - 1 nodes in it. One of 20 levels may have over a million nodes, meaning such a balanced tree lets you find one key out of a million with 20 steps.
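The search in point 7 and the insertion in point 8 can be sketched as follows. This is an illustrative sketch with int keys; the Node layout and the names bst_find and bst_insert are assumptions, not taken from the course materials.

```c
#include <stdlib.h>

/* Assumed node layout: an int key plus left and right child pointers. */
typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

/* Point 7: walk down from the root, going left or right at each node.
 * Returns the node holding key, or NULL if key is not in the tree. */
Node *bst_find(Node *root, int key)
{
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;
}

/* Point 8: follow the same path; when the pointer that should lead to
 * key is NULL, hang a new leaf off of it.
 * Returns 1 if a node was added, 0 on a duplicate key or failed malloc. */
int bst_insert(Node **rootp, int key)
{
    Node **link = rootp;              /* the pointer we may overwrite */

    while (*link != NULL) {
        if (key == (*link)->key)
            return 0;                 /* already present */
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    *link = malloc(sizeof **link);
    if (*link == NULL)
        return 0;
    (*link)->key = key;
    (*link)->left = (*link)->right = NULL;
    return 1;
}
```

Tracking a pointer to the link (a Node **) rather than to the node lets the empty-tree case and the general case share one code path.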

Dictionary

A *dictionary* is a collection of all existing codes (including the single-symbol codes for the alphabet).

Degenerate BST

A BST with no two-child nodes. Effectively just a linked list at this point.

Porting Bug

A bug that only shows up when moving to a new machine.

Memory Block

A larger chunk of the runtime heap - say large enough for an array of ints.

Tag Name

A name for a struct given after the struct keyword that allows the struct to self reference.

Two's Complement

A representation of negative integers that is formed by changing each 1 bit to a 0 and each 0 bit to a 1 and then adding 1.
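A small sketch of the flip-and-add-1 rule on 8-bit values. The function name is hypothetical; an unsigned 8-bit type is used so the wrap-around is well defined.

```c
#include <stdint.h>

/* Returns the 8-bit two's complement bit pattern for -x:
 * flip every bit, then add 1, keeping only the low 8 bits. */
uint8_t twos_complement8(uint8_t x)
{
    return (uint8_t)(~x + 1u);
}
```

For instance, twos_complement8(5) yields 0xFB, which is the 8-bit pattern for -5, and applying the function twice gives back the original value.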

Linked List

A series of structs that contain pointers pointing to other structs of the same type.

Makefile

A textual configuration file describing dependencies and rebuild commands.

Address

A unique identification number for a byte.

Pointer

A variable that contains an address. All pointers (no matter the type) are 4 bytes on 32-bit Unix (8 bytes on typical 64-bit systems). A '*' in a declaration (declarative asterisk) makes the variable a pointer; this does NOT allocate the memory it will point to, and its value is garbage until assigned. A '*' on a previously declared pointer (dereferencing asterisk) dereferences it and gives the value of its target. '&' is the address-of operator: it gives the address of a variable when used before it (ex: &index).

Void Pointer

A void pointer is a generic pointer that can point to anything. Note that because these pointers don't have a real type, they must be cast to a typed pointer before they can be dereferenced or used in pointer arithmetic.

Integer Overflow

Adding to the maximum or subtracting from the minimum. For signed integers the result is technically undefined, but on almost all systems adding 1 to the max value results in a "wrap around" back to the minimum value, and subtracting 1 from the minimum results in a wrap around back to the max. For unsigned integers, wrap around is the defined behavior.

Commandline Arguments

Additional strings after the program name to indicate options, filenames, etc. The first argument is passed via the system as the name of the program file being run.

Strings in C

All C strings are character pointers that point to an area in memory containing a series of characters followed by a terminating NUL character ('\0' as a char). This NUL character acts as an end-of-string marker and has the lowest ASCII code, 0.

Dynamic Allocation

Allocating (reserving) memory from the runtime heap for your variables.

Storage Leak

Allocating memory in a program that is not freed after its usefulness is gone. Especially in long-running programs, this causes dead space in the runtime heap to build up until no more can be allocated. REMEMBER YOUR FREE CALLS (unless the program is literally about to exit, then not so important - but do it anyway for [the majority of] this course).

Sign-Magnitude

An alternate way of representing negative numbers where the MSB becomes an indicator of a minus sign. Not used on most machines because the hardware to do it is more complex due to it having a +0 and -0.

lvalue

An lvalue is anything that could go on the left side of an assignment: anything that signifies a location in memory and not just a value. You may only take the address of an lvalue.

Pointer Arithmetic

In most expressions, an array name converts to a pointer to its first element. Therefore, general pointers can be assigned arrays. Using "array[i]" is shortcut notation for *(array + i). The type of a pointer tells it not only where to point, but how much to point to. The multiplication by the size of the type is done for free when adding an index to a pointer.
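A brief sketch showing that indexing and pointer arithmetic reach the same elements. Both function names are hypothetical.

```c
#include <stddef.h>

/* Sums n ints using index notation. */
long sum_by_index(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];              /* same as *(a + i) */
    return total;
}

/* Sums n ints by stepping a pointer; p++ advances by sizeof(int) bytes. */
long sum_by_pointer(const int *a, size_t n)
{
    long total = 0;
    for (const int *p = a; p < a + n; p++)
        total += *p;
    return total;
}
```

Both functions return the same total for the same array, since a[i] and *(a + i) name the same element.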

Pad Bytes

Because of the limitations of the bus, integers and doubles must lie on addresses divisible by 4 (the double may require an 8 byte divisible address on some architectures). If smaller types are put into a struct before them (chars/shorts), the compiler will add in extra empty bytes to make sure that they fall on properly divisible addresses.

Benchmark Place Values

Breaks in the naming convention for a number, like when you cross from 999 to 1000 or 999,999 to 1,000,000.

"Heisenburg Bugs"

Bugs that occur only when not trying to be observed. Trying to observe them (like by adding a printf) may cure the bug because it shifts the contents of the string table around.

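limits.h

C header containing constants for the limits of the different integer types in the signed and unsigned cases. These take the form <type>_MAX or <type>_MIN, e.g. SHRT_MAX for the maximum short, INT_MIN for the minimum signed int, UINT_MAX for the maximum unsigned int, etc. (Unsigned types have no _MIN constant, since their minimum is 0.)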

Bus error

Caused by the fact that bytes are not fetched one at a time. Typically, they are fetched 4 at a time; hence, all valid integer addresses are divisible by 4. Garbage data or a trashed int * may contain an address not divisible by 4, and this will cause a bus error.

segfault

Caused by your program trying to access past its own area (or segment) of memory.

ln

Command to create a link to a file or directory. By default makes a hard link which is a directory entry pointing to the same file. The -s flag makes a soft link (symbolic link) that gives a path to the file.

gcc <filename> -o <outputFile>

Compiles a file to a.out unless the -o flag is used and an output filename given.

Trashing the Runtime Heap

Corrupting the header of an allocated block will lead to very hard-to-trace errors in all of the memory allocation functions. Don't do it.

Configuration Data

Data needed for a callback function in order for it to know what to do and when. Typically given via a void pointer parameter.

Code Complexity Metric

Don't use needless parentheses; memorize the order of operations.

Node

Each struct in a linked list. The last one points to NULL.

Base Conversions

Every group of three binary digits (range: 0-7) represents a single octal digit. Therefore a 9-bit binary number can be viewed as a 3-digit octal number. Every group of four binary digits (range: 0-15) represents a single hexadecimal digit. Therefore a 16-bit binary number can be viewed as a 4-digit hexadecimal number.

String Constants and the String Table

Every string constant used in your code is saved in the "string table" one after the other. The string table is simply a designated area of code that contains all the string constants used in the code.

Function Pointer

Example of a function pointer to a function that takes 2 integer parameters and returns an integer: "int (*fp)(int, int);". Function pointers are a great way of customizing the behavior of a program so that it diverges in one part but then re-converges. Their main utility is in being passed to other functions to be used by them. Dereferencing of function pointers is done for free, so you can make calls either way: "(*fp)(4, 5);" or "(fp)(4, 5);".
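A minimal sketch of passing a function pointer to another function. The names add, mul, and apply are made up for illustration.

```c
/* Two functions with the same signature: (int, int) -> int. */
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

/* apply behaves differently depending on which function it is handed. */
int apply(int (*fp)(int, int), int x, int y)
{
    return fp(x, y);   /* (*fp)(x, y) would work identically */
}
```

Here apply(add, 4, 5) returns 9 while apply(mul, 4, 5) returns 20; the caller chooses the behavior without apply changing.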

Headers

Extra space containing information for the system about the size (and sometimes more) of allocated memory (allocated via malloc or calloc).

Variadic Function

Function with a "..." following its formal parameters. This "..." permits varying numbers and types of parameters. printf and scanf are examples of variadic functions.
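A small sketch of a variadic function using the standard <stdarg.h> macros. The function name sum_ints and the convention of passing the count first are assumptions chosen for this example.

```c
#include <stdarg.h>

/* Adds up `count` int arguments that follow the count itself.
 * The caller must pass exactly `count` ints after the first parameter. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);               /* start after the last named parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);      /* fetch the next argument as an int */
    va_end(ap);
    return total;
}
```

A variadic function has no built-in way to know how many arguments it received, which is why this sketch takes a count and printf relies on its format string.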

Bit Fields

Groups of bits within the larger integer, with each bit field holding a small value in its own right. A field of 6 bits, for instance, can hold unsigned values from 0 to 63. Configuring hardware tables, as in an MMU, often requires breaking an integer into bit fields.

Call-by-value

How all C parameters are passed. The formal parameters in the function header are copies of the actual parameters passed to them.

Building

In large projects, the process of compilation becomes so significant that it's referred to as *building* the project, rather than merely compiling it.

Garbage Collector

In many modern languages, this is a feature which senses when you no longer have any reference or pointer to a dynamically allocated object and frees it automatically. In C, YOU are the garbage collector; don't forget that or you'll be saying hello to storage leaks galore.

Automatic Promotions

In order to combat bus errors and general confusion with variadic parameters, smaller types are automatically promoted to their larger counterparts when passed as variadic arguments. This means that for printf, there is actually no need for short integer format specifiers (%hd or %hu) because shorts are promoted to full-sized ints anyway. Note that longs still need the %ld and %lu specifiers because you are reading more bytes, not fewer. The promotions are:
char -> int
short -> int
float -> double

Dynamically Sized

Initially some reasonable size is given as a size for a memory block to store data. If we fill up that block then we reallocate our data into a larger memory block.
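One way to sketch this grow-when-full pattern with realloc. The IntVec type, the starting capacity of 4, and the doubling policy are all assumptions for illustration.

```c
#include <stdlib.h>

/* A growable array of ints: len items in use out of cap allocated. */
typedef struct {
    int *data;
    size_t len, cap;
} IntVec;

/* Appends value, moving to a larger block first if the current one is full.
 * Returns 1 on success, 0 if the larger block could not be allocated. */
int vec_push(IntVec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;   /* assumed growth policy */
        int *bigger = realloc(v->data, new_cap * sizeof *bigger);
        if (bigger == NULL)
            return 0;                               /* old block is still valid */
        v->data = bigger;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 1;
}
```

Start with an IntVec whose fields are all zero (data NULL, len 0, cap 0), and free(v.data) when finished.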

Masks

Integer values with a group of 1 bits and 0 bits everywhere else. Very useful for bitwise arithmetic.

Most Significant and Least Significant Bits (MSBs and LSBs)

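The MSB is the bit with the highest place value (the leftmost bit as normally written); the LSB is the bit with the lowest place value (the rightmost, worth 1). Just remember your powers of 2, they're important. Ex: 2^15 = 32768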

LZW Compression

LZW stands for Lempel Ziv Welch, the algorithm's creators. The algorithm is as follows:
1. Start with a dictionary having one code per original alphabet symbol. Use as many bits per symbol as needed to represent the entire alphabet. Convert the input symbols into equivalent output codes.
2. On each step, pick the code that represents as many of the next input symbols as possible. Initially, that will be only one symbol per code, but as we add codes to the dictionary, we may be able to represent two or more symbols with a single code.
3. After each code C is output, add a new code that is C plus the next input symbol that C did not include. The new code presumes that (C + next symbol) is a pattern you'll see later in the input. When you do, the new code will be able to represent the pattern compactly.
4. When the number of codes passes a power-of-2 boundary, so that one more bit is needed to represent all the codes, extend every code by a bit. At any point in the compression, all codes are the same length and can be distinguished from one another in the continuous sequence of compressed bits.
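Steps 1 through 3 can be sketched as below. This is a simplified illustration, not the course project's implementation: it emits codes as plain ints instead of packing them into bits (so step 4 is omitted), uses a slow linear dictionary search, and assumes a byte alphabet with codes 0-255, EOD as code 256, and a 4096-code limit. All names are hypothetical.

```c
#include <stddef.h>

#define EOD_CODE   256    /* "end of data" marker */
#define FIRST_CODE 257    /* first code added to the dictionary */
#define MAX_CODES  4096   /* assumed dictionary limit for this sketch */

/* A multi-symbol code is an existing code (prefix) plus one more symbol. */
typedef struct { int prefix; unsigned char sym; } Entry;

/* Returns the code for (prefix + sym), or -1 if no such code exists yet. */
static int find_code(const Entry *dict, int next, int prefix, unsigned char sym)
{
    for (int c = FIRST_CODE; c < next; c++)
        if (dict[c].prefix == prefix && dict[c].sym == sym)
            return c;
    return -1;
}

/* Writes the codes for in[0..n-1], followed by EOD_CODE, into out.
 * out must have room for n + 1 ints. Returns the number of codes written.
 * Not reentrant: the dictionary lives in static storage. */
size_t lzw_codes(const unsigned char *in, size_t n, int *out)
{
    static Entry dict[MAX_CODES];
    int next = FIRST_CODE;
    size_t count = 0;

    if (n > 0) {
        int cur = in[0];                       /* longest match so far */
        for (size_t i = 1; i < n; i++) {
            int longer = find_code(dict, next, cur, in[i]);
            if (longer >= 0) {
                cur = longer;                  /* step 2: extend the match */
            } else {
                out[count++] = cur;            /* output code C */
                if (next < MAX_CODES) {        /* step 3: add C + next symbol */
                    dict[next].prefix = cur;
                    dict[next].sym = in[i];
                    next++;
                }
                cur = in[i];
            }
        }
        out[count++] = cur;
    }
    out[count++] = EOD_CODE;
    return count;
}
```

On the 7-byte input "ABABABA" this produces the codes 65, 66, 257, 259, 256: code 257 stands for "AB" and code 259 for "ABA".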

Little/Big Endian

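Little Endian stores a multi-byte value with its least significant byte at the lowest address, so memory order from left to right runs least to most significant (index: 01234). Used on Intel processors. Big Endian stores the most significant byte at the lowest address, so memory order runs most to least significant (index: 43210). Historically common on non-Intel processors, and used as network byte order.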

List Traversal

Moving down a list. In a linked list, this is done by assigning the current node pointer the value of that node's pointer to the next node (current = current->next).

malloc(size_t size) and calloc(size_t nitems, size_t size)

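Part of C's <stdlib.h> library. Both allocate space from the runtime heap and return a void pointer to the address (or NULL if the allocation fails). malloc leaves the block's contents uninitialized; calloc allocates nitems * size bytes and sets them all to zero.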

free(void *ptr)

Part of C's <stdlib.h> library. Frees dynamically allocated space in the runtime heap that can then be reused/recycled. DO NOT ATTEMPT TO ACCESS DATA AFTER IT HAS BEEN FREED, IT IS GARBAGE.

MMU (memory management unit)

Part of the CPU that catches any attempt by a program to access memory not its own.

Dictionary Recycling

Periodic recycling of the dictionary in LZW compression keeps the dictionary from taking up too much memory and keeps it current to the distribution of sequences in the file if the compression runs for a long file.

Order of Operations

Primary Expression Operators (left-to-right): () [] . -> expr++ expr--
Unary Operators (right-to-left): * & + - ! ~ ++expr --expr (typecast) sizeof
Binary Operators (left-to-right), one precedence level per line, highest first:
  * / %
  + -
  >> <<
  < > <= >=
  == !=
  &
  ^
  |
  &&
  ||
Ternary Operator (right-to-left): ?:
Assignment Operators (right-to-left): = += -= *= /= %= >>= <<= &= ^= |=
Comma (left-to-right): ,

makedepend

Primitive Unix makefile generator.

Assignments as Expressions

Recall that in expressions such as *source++ the postincrement binds more tightly than the star: the pointer is what gets incremented, and the star dereferences the pointer's old value. Also recall that assignment "statements" are actually expressions which have a value of their own. Hence, we can do things like...
c = 4 + (a=42); // Copies 4 + 42, or 46 into c
arr[a=42] = 0; // indexes arr by 42
d = sqrt(a=42); // takes square root of 42

Cyclic Arithmetic

Recall the "wrap around" effect. Adding 1 to the max value wraps the value of a number over the top to the bottom, and subtracting 1 from the min wraps it under the bottom to the top.

Polling

Repeatedly calling an operating system function to check on the event or data. However, this is time-wasteful, requires a dedicated loop in your code, and makes it difficult for your program to do anything else while polling. The preferred approach is to arrange for some function in your code to be "called back" by the operating system when the event or data is ready.

Sign Extension

Right shift fills in on the left end by "extending" the sign bit, whether it is 0 or 1. Assembly programmers will also recognize this as a "right arithmetic shift".

Signed vs Unsigned Integers (Range Limits)

SIGNED
1 byte: -128 to 127 (covers all the ASCII codes)
2 bytes: -32768 to 32767
4 bytes: about -2 billion to 2 billion
8 bytes: about -9 quintillion to 9 quintillion
UNSIGNED
1 byte: 0 to 255
2 bytes: 0 to 65535
4 bytes: 0 to over 4 billion
8 bytes: 0 to over 18 quintillion

chmod ### <filename>

Sets the given file to the permissions specified by ###, which is a number in octal where each digit represents the rwx permission. A value of 700 hence gives read, write, and execute permission to owner only (rwx------), while 451 would indicate read for the owner, read and execute for groups, and execute only for everyone else (r--r-x--x).

Dynamic Masks

Shifts and subtractions can easily create multi-bit masks: (1 << bits) - 1 produces a mask whose lowest "bits" bit positions are all 1s and whose remaining bits are 0s.
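A short sketch of building such a mask and using it to pull a field out of a larger integer. The function names are hypothetical, and bits is assumed to be smaller than the width of an unsigned int.

```c
/* Mask with the lowest `bits` positions set to 1.
 * Assumes bits is less than the number of bits in an unsigned int. */
unsigned low_mask(unsigned bits)
{
    return (1u << bits) - 1u;
}

/* Extracts the `bits`-wide field that starts `shift` positions above the LSB. */
unsigned get_field(unsigned value, unsigned shift, unsigned bits)
{
    return (value >> shift) & low_mask(bits);
}
```

For example, low_mask(6) is 63, and get_field(0xABCD, 4, 8) extracts the middle byte 0xBC.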

Scanf Format Specifiers

Signed short - %hd
Unsigned short - %hu
Signed long - %ld
Unsigned long - %lu
Unsigned int - %u
(and of course) Signed int - %d
Read but don't assign (suppress assignment) of a string composed only of dashes - %*[-]
Read but don't assign everything up to the first digit, then assign the digits from there into an integer - %*[^0123456789]%d
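A sketch of the last specifier above, using sscanf on a string rather than scanf on stdin. The function name first_number is hypothetical.

```c
#include <stdio.h>

/* Skips leading non-digit characters, then reads an int into *out.
 * Returns 1 if a number was read, 0 otherwise.
 * Caveat: %*[^0123456789] must match at least one character, so this
 * reports failure when the text begins with a digit. */
int first_number(const char *text, int *out)
{
    return sscanf(text, "%*[^0123456789]%d", out) == 1;
}
```

Suppressed conversions (those with the '*') do not count toward the return value of sscanf, so a successful call returns 1 here, not 2.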

Double and Triple Pointers

Something along the lines of... int **dblPtr; int ***trplPtr; These function just as you might expect, and can be dereferenced by providing the appropriate number of asterisks. Passing a double pointer as a parameter allows the function to change the caller's pointer and pass the new value back without the need of a return (a triple pointer does the same for a double pointer).
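A minimal sketch of a function that hands a new pointer value back to its caller through a double pointer. The name make_counts is made up for this example.

```c
#include <stdlib.h>

/* Allocates a zeroed array of n ints and stores its address in the
 * caller's pointer via *out. Returns 1 on success, 0 on failure
 * (in which case *out is left unchanged). */
int make_counts(int **out, size_t n)
{
    int *block = calloc(n, sizeof *block);
    if (block == NULL)
        return 0;
    *out = block;     /* changes the caller's pointer, not a copy of it */
    return 1;
}
```

The caller passes the address of its own pointer, as in make_counts(&p, 4), and later releases the block with free(p).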

StrCmp

Take note of how this function walks down strings and compares them.

rvalue

A temporary value that does not have a lasting memory address of its own (e.g. 10, 7 + 1).

Protected/conditional/short-circuited and/or

The && operator will not run any following tests if the one(s) before it fail(s). In a linked list this allows the special case of temp == NULL to be ruled out before getting to the second term which would otherwise cause an error if run when temp == NULL. The || has similar behavior, only if the first test is true then it won't bother running the second one.

Pass-by-reference

The address to a value is passed and by changing the target you can change its value in the caller.

Build Dependencies

The dependency of a .c file on the headers it includes, or that of an executable on the .o files that link together to make the executable.

Head Pointer

The initial pointer in a linked list.

Runtime Heap

The large block of memory that is reserved to be allocated for new variables that show up as the program runs.

Collating Order

The order of characters as determined by their numerical values. Note that codes 65->90 for 'A' - 'Z' are "less" than all lowercase letters codes 97->122 for 'a' - 'z'.

Postincrement vs Preincrement

The postincrement (someValue++) returns the old value and increments the value. The preincrement (++someValue) immediately increases the value and returns the new value. Preincrements can be slightly faster than postincrements because they do not need to save the old value temporarily, though compilers usually optimize the difference away when the result is unused.

NULL Pointer

The value malloc returns when there is no space left to allocate. Essentially it is just a pointer to "nothing", which is represented as a 0.

GDB

The standard Unix debugger ("gdb" from commandline). It does the following (use *-g* when compiling):
1. It lets you run the program, under control of the debugger, including allowing commandline arguments and optional input and output file redirection.
2. It reports the source code location of any faults, and lets you display the source at that point.
3. It lets you print the value of variables or expressions at the point of the fault.
4. It shows you the series of calls that led up to the fault.
5. It lets you set breakpoints to stop execution at any point.
6. It lets you automatically print the value of variables, without stopping the code, at any point in the execution, in the way you might do by hand with debugging printfs.

vi

The text editor that you must learn to use for this course.
MOVEMENT
'ENTER' to the start of the next line.
'-' to the start of the prior line.
'0' to beginning of current line.
'$' to end of current line.
'^d' and '^u' scroll down/up a half screen.
'^f' and '^b' scroll down/up a full screen.
SEARCHING
'/pattern' cursor goes forward in file until pattern is found.
'#G' go to line number # (don't hit return); 'G' with no number goes to the bottom of the file.
'fx' searches forward in the current line for the character x.
'Fx' searches backward in the current line for the character x.
EDITING
'o' open a new blank line under the current.
'O' open a new blank line above the current.
'a' / 'i' enter insert mode after / at the cursor (normal typing, ESC to exit).
'dd' delete the current line; precede with a number to delete that many lines, starting at the current one.
'dfx' delete forward until 'x'.
'd#w' delete the # of words.
'd$' delete the rest of the line.
'u' undoes last change.
'p' put whatever was last deleted/copied on the line below the current one.
'y' copy. Functions like 'd'.
'Y' copies the current line.
'.' repeats last edit.
'x' delete a single letter at the cursor.
'r' replace a single letter at the cursor.
'c' replace an entire section of the current line, works like 'd'.
'J' joins this and the next line together.
FILE MANAGEMENT
':x' or 'ZZ' write and quit
':w' write file
':q' quit
':q!' quit and discard changes

Target

The value a pointer is pointing to.

Pass-by-value

The value passed is a copy of the original and does not modify it.

Formal parameter

The variables found in a function header. Often loosely called arguments, although strictly speaking the arguments are the actual values passed in a call.

Dependency Rules

These rules take the form:
target: dependencies
	build commands (must be indented w/ single tab)
where target is a file that might need rebuilding, dependencies is a list of files on which the target depends, and build commands is a list of commands that must be run (e.g. compilations or linking) to bring the target up to date, if any of the dependencies have changed.

Arrow Operator (->)

This is used to access a member of a struct through a pointer to that struct, without first dereferencing the pointer yourself (ex: student->grade, which is shorthand for (*student).grade).

LZW Decompression

To decompress:
1. Start with the same original dictionary as the compressor. Add new codes to it just as the compressor did.
2. Obtain the next code C from the compressed bits. You can tell how many bits it has based on the current dictionary size.
3. Add to the uncompressed data the symbols that C represents. If you are on the second or later code, then also use C's first symbol to fill in the missing final symbol from the last iteration's new code.
4. Add another new code to the dictionary, representing C extended by one more unknown symbol: the first symbol of the next code. Leave a placeholder in the new code until you learn that symbol on the next iteration.
5. Increase the bits-per-code if the dictionary size has crossed a power-of-two boundary.
6. Repeat steps 2-5.

Node Removal

To remove a node in a linked list, set the next pointer of the node before it to NodeToBeRemoved->next (which is NULL if NodeToBeRemoved was the last node). If NodeToBeRemoved is the first node, update the head pointer instead. Then free(NodeToBeRemoved).
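A sketch of node removal in a singly linked list of ints. The ListNode layout and the helper names are assumptions; list_push exists only so the example can build a list.

```c
#include <stdlib.h>

typedef struct ListNode {
    int value;
    struct ListNode *next;
} ListNode;

/* Adds a node at the front of the list. Returns 1 on success, 0 on failure. */
int list_push(ListNode **headp, int value)
{
    ListNode *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->value = value;
    n->next = *headp;
    *headp = n;
    return 1;
}

/* Removes the first node holding value. Returns 1 if one was removed. */
int list_remove(ListNode **headp, int value)
{
    ListNode **link = headp;    /* the pointer that points at the current node */

    /* && short-circuits, so (*link)->value is never read when *link is NULL */
    while (*link != NULL && (*link)->value != value)
        link = &(*link)->next;
    if (*link == NULL)
        return 0;               /* value not in the list */

    ListNode *doomed = *link;
    *link = doomed->next;       /* bypass the node; may become NULL */
    free(doomed);
    return 1;
}
```

Because link may point at either the head pointer or some node's next field, removing the first node needs no special case.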

valgrind <executable>

Unix tool helpful in finding difficult bugs. Using *-g* flag with gcc when compiling will get us locations in terms of C source lines for debugging.

touch <filename>

Updates the access and modification times of the file to the current system time. If the file does not exist, an empty file is created.

Callback

Used to notify when an event has occurred or data has become available. Much more efficient than polling.

Integer Truncation

When assigning a larger sized data type into a smaller one, for example by setting a char value to that of an int or long, the uppermost bytes are cut off. In cases where the source's value exceeds the destination's maximum value, this will result in a different value when the destination is printed.

Flushing

When using printf to trace progress through code, make sure to use a '\n' at the end to make the buffer flush (this works when stdout is line-buffered, as on a terminal), call fflush(stdout), or use fprintf and have stderr be the output. Not flushing the buffer when checking for errors may cause output to be lost, as the content of the buffer may never reach its intended destination if the program crashes. Buffering of output is done to speed up the operation of print functions.

argc, argv

argc - the number of arguments passed to the program, counting the program name itself. argv - a double pointer with each pointer in the block pointing to a single string argument (char pointer). In short, a double char pointer with length given by argc.

Variety of Integer Sizes

char <= short <= int <= long. Typically... long = 8 bytes, int = 4 bytes, short = 2 bytes, char = 1 byte, but this is not guaranteed.

File Permissions

ls -l will bring up all files in the current directory with permissions in "drwxrwxrwx" format where the 'd' indicates it is a directory, 'r' is read, 'w' is write, and 'x' is execute. The three groups of "rwx" give permissions for owner (user), group, and everyone.

Printf Field Width

printf("%*d", width, val) is how you would assign a minimum width for an integer output.
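A small sketch of the '*' field width, using snprintf so the result lands in a buffer instead of on stdout. The wrapper name format_padded is hypothetical.

```c
#include <stdio.h>

/* Formats val right-aligned in a field at least `width` characters wide.
 * Returns what snprintf returns: the length the full result needs. */
int format_padded(char *buf, size_t size, int width, int val)
{
    return snprintf(buf, size, "%*d", width, val);
}
```

With width 5 and val 42, the buffer holds three spaces followed by "42". The width is only a minimum: a value with more digits than the width is printed in full.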

Other basic commands

rm: remove the specified file; the -r flag removes a directory and its contents, recursively. mkdir: make a new directory. rmdir: remove an empty directory.

