Data Structures (From InterviewCake and GeeksForGeeks)
How many different numbers can we express with 1 byte (8 bits)?
2^8 = 256 different numbers. How? If we map out the possibilities, they form a binary tree: the number of options doubles with each additional bit, so it's an exponential pattern, 2^n. With 1 bit: 0, 1. With 2 bits: 00, 01, 10, 11. With 3 bits: 000, 001, 010, 011, 100, 101, 110, 111.
What is binary 101 in base 10?
5
binary tree
A binary tree is a tree where every node has two or fewer children. The children are usually called left and right.
How to express decimals in binary?
Store two numbers: 1) the number with the decimal point taken out, and 2) the position where the decimal point goes (how many digits over from the leftmost digit).
graphs: directed
In directed graphs, edges point from the node at one end to the node at the other end
How to express fractions in binary?
Store two numbers: the numerator and the denominator.
What does log10(100) mean?
What power must you raise 10 to in order to get 100? (Here the answer is 2, since 10^2 = 100.)
What is the default number system?
base 10
0-12 in binary
0 = 0000 1 = 0001 2 = 0010 3 = 0011 4 = 0100 5 = 0101 6 = 0110 7 = 0111 8 = 1000 9 = 1001 10 = 1010 11 = 1011 12 = 1100
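To sanity-check these by hand, standard JavaScript number parsing and formatting can convert between bases (nothing here is specific to the source):
parseInt('1100', 2);   // 12: read the string as base 2
(12).toString(2);      // '1100': write 12 out in base 2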
The places in binary (base 2) are sequential powers of _____
2 (2^0 = 1, 2^1 = 2, 2^2 = 4, 2^3 = 8, ...)
Place values ("spots") in binary
256, 128, 64, 32, 16, 8, 4, 2, 1 (reading right to left, each place is the next power of 2)
32-bit integers have x possible values
2^32 possible values—more than 4 billion
64-bit integers have x possible values
2^64 possible values, more than 10 billion billion (10^19)
How many bits is a byte?
8
Circular Buffer
A circular buffer, circular queue, cyclic buffer or ring buffer is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. This structure lends itself easily to buffering data streams. The useful property of a circular buffer is that it does not need to have its elements shuffled around when one is consumed. (If a non-circular buffer were used then it would be necessary to shift all elements when one is consumed.) In other words, the circular buffer is well-suited as a FIFO buffer while a standard, non-circular buffer is well suited as a LIFO buffer. Circular buffering makes a good implementation strategy for a queue that has fixed maximum size. Should a maximum size be adopted for a queue, then a circular buffer is a completely ideal implementation; all queue operations are constant time. However, expanding a circular buffer requires shifting memory, which is comparatively costly. For arbitrarily expanding queues, a linked list approach may be preferred instead. In some situations, an overwriting circular buffer can be used, e.g. in multimedia. If the buffer is used as the bounded buffer in the producer-consumer problem then it is probably desired for the producer (e.g., an audio generator) to overwrite old data if the consumer (e.g., the sound card) is unable to momentarily keep up. Also, the LZ77 family of lossless data compression algorithms operates on the assumption that strings seen more recently in a data stream are more likely to occur soon in the stream. Implementations store the most recent data in a circular buffer.
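A minimal sketch of a fixed-size ring buffer in JavaScript, assuming the non-overwriting variant; the names CircularBuffer, write, and read are illustrative, not from the source:
// A fixed-size circular (ring) buffer. When full, write() reports failure
// rather than overwriting old data.
function CircularBuffer(capacity) {
  this.items = new Array(capacity);
  this.capacity = capacity;
  this.head = 0;   // index of the oldest item
  this.size = 0;   // number of items currently stored
}
// O(1): add an item at the logical end of the buffer.
CircularBuffer.prototype.write = function (item) {
  if (this.size === this.capacity) return false;          // buffer full
  var tail = (this.head + this.size) % this.capacity;
  this.items[tail] = item;
  this.size += 1;
  return true;
};
// O(1): remove and return the oldest item (FIFO), with no shifting of elements.
CircularBuffer.prototype.read = function () {
  if (this.size === 0) return undefined;                   // buffer empty
  var item = this.items[this.head];
  this.head = (this.head + 1) % this.capacity;
  this.size -= 1;
  return item;
};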
Minimum spanning tree
A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. That is, it is a spanning tree whose sum of edge weights is as small as possible. More generally, any undirected graph (not necessarily connected) has a minimum spanning forest, which is a union of the minimum spanning trees for its connected components.
queue
A queue is like a line at the movie theater. It's "first in, first out" (FIFO), which means that the item that was put in the queue longest ago is the first item that comes out. "First come, first served." Queues have two main methods: enqueue(): adds an item; dequeue(): removes and returns the next item in line. They can also include some utility methods: peek(): returns the item at the front of the queue, without removing it; isEmpty(): returns true if the queue is empty, false otherwise.
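A minimal queue sketch in JavaScript; the method names follow the card, while the backing array is just one possible (illustrative) choice:
function Queue() {
  this.items = [];
}
Queue.prototype.enqueue = function (item) {
  this.items.push(item);        // add to the back of the line
};
Queue.prototype.dequeue = function () {
  return this.items.shift();    // remove and return the front item (FIFO)
};
Queue.prototype.peek = function () {
  return this.items[0];         // look at the front without removing it
};
Queue.prototype.isEmpty = function () {
  return this.items.length === 0;
};
Note that Array.prototype.shift() is O(n) on a plain JavaScript array; backing the queue with a linked list or the circular buffer above gives O(1) dequeues.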
What is an address in RAM?
A shelf in the "stack" of RAM. Each shelf holds 8 bits. A bit is a tiny electrical switch that can be turned "on" or "off," but instead of calling it "on" or "off" we call it 1 or 0.
stack
A stack is like a stack of plates. It's "last in, first out" (LIFO), which means the item that was put in the stack most recently is the first item that comes out. A stack has a top and a bottom. Stacks have two main methods: push(): adds an item; pop(): removes and returns the top item. They can also include some utility methods: peek(): returns the item on the top of the stack, without removing it; isEmpty(): returns true if the stack is empty, false otherwise.
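A matching stack sketch in JavaScript, again with the card's method names and an illustrative backing array:
function Stack() {
  this.items = [];
}
Stack.prototype.push = function (item) {
  this.items.push(item);                      // add to the top
};
Stack.prototype.pop = function () {
  return this.items.pop();                    // remove and return the top item (LIFO)
};
Stack.prototype.peek = function () {
  return this.items[this.items.length - 1];   // look at the top without removing it
};
Stack.prototype.isEmpty = function () {
  return this.items.length === 0;
};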
Base of hexadecimal?
Base 16
What is a single doubling append, what is its time complexity, and why?
The append that happens when the underlying array is full: we have to create a new, bigger array and copy all the items over before appending. Worst case it's an O(n) operation, since we copy all n items. But while the cost of each O(n) doubling append doubles each time, the number of O(1) appends we get before the next doubling also doubles. The costs "cancel out": n appends take at most n + (1 + 2 + 4 + ... + n) < 3n steps in total, so each append has an average cost or amortized cost of O(1).
linked list
Each character in our data structure is a two-item array (a node) holding the character itself and a pointer to the next character. We call this series of nodes a linked list. We'll also sometimes keep a pointer to the tail, which comes in handy when we want to add something new to the end of the linked list. So the tradeoff with linked lists is that they have faster prepends and faster appends than dynamic arrays, but they have slower lookups: it takes i + 1 steps down the linked list to get to the ith node (indices are zero-based, to match arrays), so linked lists have O(i)-time lookups. Much slower than the O(1)-time lookups for arrays and dynamic arrays. Not only that: walking down a linked list is not cache-friendly. Because the next node could be anywhere in memory, we don't get any benefit from the processor cache, which makes lookups even slower.
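A sketch of the idea in JavaScript, using small objects as nodes instead of two-item arrays; LinkedListNode, LinkedList, prepend, append, and get are illustrative names:
function LinkedListNode(value) {
  this.value = value;
  this.next = null;
}
function LinkedList() {
  this.head = null;
  this.tail = null;
}
// O(1) prepend: the new node becomes the head.
LinkedList.prototype.prepend = function (value) {
  var node = new LinkedListNode(value);
  node.next = this.head;
  this.head = node;
  if (this.tail === null) this.tail = node;
};
// O(1) append, because we keep a pointer to the tail.
LinkedList.prototype.append = function (value) {
  var node = new LinkedListNode(value);
  if (this.tail === null) {
    this.head = node;
    this.tail = node;
  } else {
    this.tail.next = node;
    this.tail = node;
  }
};
// O(i) lookup: walk i + 1 nodes from the head to reach the ith node (zero-based).
LinkedList.prototype.get = function (i) {
  var current = this.head;
  for (var step = 0; step < i && current !== null; step++) {
    current = current.next;
  }
  return current;
};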
How to keep arrays fast
Each item in the array needs to be the same size, and you need a big block of uninterrupted free memory to store the array; you can't skip over memory addresses, it has to be one contiguous chunk. These constraints make our formula for finding the nth item work, because they make our array predictable: we can predict exactly where in memory the nth element of our array will be. But they also constrain what kinds of things we can put in an array. Every item has to be the same size. And if our array is going to store a lot of stuff, we'll need a bunch of uninterrupted free space in RAM, which gets hard when most of our RAM is already occupied by other programs.
How can computers get extra speed boost when reading memory?
Even though the memory controller can jump between far-apart memory addresses quickly, programs tend to access memory that's nearby. So computers are tuned to get an extra speed boost when reading memory addresses that are close to each other. Here's how it works: the processor has a cache where it stores a copy of stuff it's recently read from RAM.
What is a memory leak?
In computer science, a memory leak is a type of resource leak that occurs when a computer program incorrectly manages memory allocations in such a way that memory which is no longer needed is not released.
hashing function
In our hash table, the counts are the values and the words ("lies," etc.) are the keys (analogous to the indices in an array). The process we used to translate a key into an array index is called a hashing function.
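One simple hashing function (an illustrative choice, not the only one): sum the character codes of the key and take the remainder modulo the array length.
// Map a string key to an array index by summing its character codes
// and taking the remainder modulo the array's length.
function hashKey(key, arrayLength) {
  var sum = 0;
  for (var i = 0; i < key.length; i++) {
    sum += key.charCodeAt(i);
  }
  return sum % arrayLength;
}
// e.g. hashKey('lies', 30) gives some index between 0 and 29.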
graphs: undirected
In undirected graphs, the edges simply connect the nodes at each end.
How do pointers make arrays faster?
Instead of storing the strings right inside our array, let's just put the strings wherever we can fit them in memory. Then we'll have each element in our array hold the address in memory of its corresponding string. Each address is an integer, so really our outer array is just an array of integers. We can call each of these integers a pointer, since it points to another spot in memory. The items don't have to be the same length—each string can be as long or as short as we want. We don't need enough uninterrupted free memory to store all our strings next to each other—we can place each of them separately, wherever there's space in RAM.
Linked lists worst case append time
O(1) because a linked list keeps a pointer to the tail.
Binary trees have a few interesting properties when they're perfect...
Property 1: the number of nodes on each "level" doubles as we move down the tree (1, 2, 4, 8, 16, ...). Property 2: the number of nodes on the last level is equal to the sum of the number of nodes on all other levels, plus 1. The total number of nodes in a perfect tree with h levels is 2^h - 1.
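A quick check of the node-count formula in JavaScript (nodesInPerfectTree is an illustrative name; h here means the number of levels):
// Total nodes in a perfect binary tree with h levels: 2^h - 1.
function nodesInPerfectTree(h) {
  return Math.pow(2, h) - 1;
}
nodesInPerfectTree(4);  // 15: levels of 1 + 2 + 4 nodes, plus 8 on the last level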
What is the trade-off of using pointers with arrays?
Remember how the memory controller sends the contents of nearby memory addresses to the processor with each read? And the processor caches them? So reading sequential addresses in RAM is faster because we can get most of those reads right from the cache? Our original array was very cache-friendly, because everything was sequential. So reading from the 0th index, then the 1st index, then the 2nd, etc. got an extra speedup from the processor cache. But the pointers in this array make it not cache-friendly, because the baby names are scattered randomly around RAM. So reading from the 0th index, then the 1st index, etc. doesn't get that extra speedup from the cache. That's the tradeoff: this pointer-based array requires less uninterrupted memory and can accommodate elements that aren't all the same size, but it's slower because it's not cache-friendly. This slowdown isn't reflected in the big O time cost; lookups in this pointer-based array are still O(1) time.
How to express negative numbers in binary?
Reserve the leftmost bit for expressing the sign of the number. 0 for positive and 1 for negative.
Explain the difference in performance between reading sequentially and non sequentially.
So if the processor asks for the contents of address 951, then 952, then 953, then 954...it'll go out to RAM once for that first read, and the subsequent reads will come straight from the super-fast cache. But if the processor asks to read address 951, then address 362, then address 419...then the cache won't help, and it'll have to go all the way out to RAM for each read. So reading from sequential memory addresses is faster than jumping around.
Advantage of dynamic arrays
The advantage of dynamic arrays over arrays is that you don't have to specify the size ahead of time, but the disadvantage is that some appends can be expensive.
What happens if we have the number 255 in an 8-bit unsigned integer (1111 1111 in binary) and we add 1?
The answer (256) needs a 9th bit (1 0000 0000). But we only have 8 bits! This is called an integer overflow. At best, we might just get an error. At worst, our computer might compute the correct answer but then just throw out the 9th bit, giving us zero (0000 0000) instead of 256 (1 0000 0000)! (JavaScript automatically converts the result to Infinity if it gets too big.)
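A small sketch of what "throwing out the 9th bit" looks like, simulated in JavaScript with a typed array that stores 8-bit unsigned integers (the variable name is illustrative):
var byte = new Uint8Array(1);  // one 8-bit unsigned slot
byte[0] = 255;                 // 1111 1111
byte[0] += 1;                  // wraps around: the 9th bit is thrown out
byte[0];                       // 0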
graphs: Degree
The degree of a node is the number of edges connected to the node.
memory controller
The memory controller does the actual reading and writing to and from RAM. It has a direct connection to each shelf of RAM. It means we can access address 0 and then immediately access address 918,873 without having to "climb down" our massive bookshelf of RAM.
What does a processor do to get even faster access to memory addresses?
The processor has a series of caches where it stores a copy of stuff it's recently read from RAM. (We can picture them all lumped together as one cache.) This cache is much faster to read from than RAM, so the processor saves time whenever it can read something from the cache instead of going out to RAM.
"perfect" binary tree
There are no "gaps." We call this kind of tree "perfect." Every level of the tree is complete fully
How are characters represented in a string?
Each character is stored as a number. This mapping of numbers to characters is called a character encoding. One common character encoding is "ASCII". For example: A: 01000001, B: 01000010, C: 01000011.
Explain "random access" in RAM
We can access the bits at any random address in memory right away (that's what makes it Random Access Memory). Spinning hard drives don't have this "random access" superpower, because there's no direct connection to each byte on the disc. Instead, there's a reader, called a head, that moves along the surface of a spinning storage disc (like the needle on a record player). Reading bytes that are far apart takes longer because you have to wait for the head to physically move along the disc.
The 256 possibilities we get with 1 byte are pretty limiting. So...
We usually use 4 or 8 bytes (32 or 64 bits) for storing integers.
How does hexadecimal work?
We've talked about base 10 and base 2...you may have also seen base 16, also called hexadecimal or hex. In hex, our possible values for each digit are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, and f. Hex numbers are often prefixed with "0x" or "#".
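A few worked conversions, using only standard JavaScript parsing and formatting:
// Each hex place is a power of 16, so 0xff = 15 * 16 + 15 = 255.
var n = parseInt('ff', 16);   // 255
var s = (255).toString(16);   // 'ff'
var color = 0xff0000;         // hex literals are common for colors: 16711680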
hash tables tradeoff
You get fast lookups by key...except some lookups could be slow (because of hash collisions). And the quick lookups are only in one direction: we can quickly get the value for a given key, but the only way to get the key for a given value is to walk through all the values and keys, which takes O(n) time. Same thing with arrays: we can quickly look up the value at a given index, but the only way to figure out the index for a given value is to walk through the whole array.
How to pick the right data structure?
You have to know what's important in the problem you're working on. What does your data structure need to do quickly? Is it lookups by index? Is it appends or prepends?
graph
an abstract data structure with nodes (or vertices) that are connected by edges. They're useful in cases where you have things that connect to other things. Nodes and edges could, for example, respectively represent cities and highways, routers and ethernet cables, or Facebook users and their friendships.
graph loop
an edge that connects a node to itself.
graphs: Two nodes connected by an edge a----b
are adjacent or neighbors.
dynamic array
An array that resizes itself when it runs out of space by doing the following: 1) Make a new, bigger array, usually twice as big. 2) Copy each element from the old array into the new array. 3) Free up the old array (this tells the operating system, "you can use this memory for something else now"). 4) Append your new item.
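A sketch of a dynamic array with doubling appends in JavaScript; the fixed-capacity inner array stands in for a raw block of memory, and all names are illustrative:
function DynamicArray() {
  this.capacity = 1;
  this.length = 0;
  this.items = new Array(this.capacity);  // stand-in for a fixed block of memory
}
DynamicArray.prototype.append = function (item) {
  if (this.length === this.capacity) {
    // Doubling append: make a new array twice as big and copy everything over. O(n).
    this.capacity *= 2;
    var bigger = new Array(this.capacity);
    for (var i = 0; i < this.length; i++) {
      bigger[i] = this.items[i];
    }
    this.items = bigger;  // the old array can now be freed / garbage collected
  }
  // Ordinary append: O(1).
  this.items[this.length] = item;
  this.length += 1;
};
DynamicArray.prototype.get = function (i) {
  return this.items[i];  // O(1) lookup, just like a regular array
};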
What base is binary?
base 2
If a graph is weighted...
each edge has a "weight." The weight could, for example, represent the distance between two locations, or the cost or time it takes to travel between the locations.
fixed-width integers take up______ space or _______ space.
fixed-width integers take up constant space or O(1) space.
The process we used to translate a key into an array index is called a
hashing function.
Note, for hash maps, our quick lookups are only
in one direction—we can quickly get the value for a given key, but the only way to get the key for a given value is to walk through all the values and keys. Same thing with arrays—we can quickly look up the value at a given index, but the only way to figure out the index for a given value is to walk through the whole array.
In a directed graph, nodes have an _________ and an ___________.
indegree and outdegree
How to fix a hash collision
Instead of storing the actual values in our array, let's have each array slot hold a pointer to a linked list holding the counts for all the words that hash to that index. One problem: how do we know which count is for "lies" and which is for "foes"? To fix this, we'll store the word as well as the count in each linked list node.
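A sketch of collision handling by chaining in JavaScript; for brevity each slot holds a plain array of {word, count} entries as a stand-in for a linked list, and all names (HashTable, hashKey, set, get) are illustrative:
function HashTable(numSlots) {
  this.slots = new Array(numSlots);
}
// Same idea as before: sum character codes, mod the number of slots.
HashTable.prototype.hashKey = function (key) {
  var sum = 0;
  for (var i = 0; i < key.length; i++) sum += key.charCodeAt(i);
  return sum % this.slots.length;
};
HashTable.prototype.set = function (word, count) {
  var index = this.hashKey(word);
  if (!this.slots[index]) this.slots[index] = [];   // the "chain" for this slot
  var chain = this.slots[index];
  for (var i = 0; i < chain.length; i++) {
    if (chain[i].word === word) {                   // word already present: update it
      chain[i].count = count;
      return;
    }
  }
  chain.push({ word: word, count: count });         // store the word AND the count
};
HashTable.prototype.get = function (word) {
  var index = this.hashKey(word);
  var chain = this.slots[index] || [];
  for (var i = 0; i < chain.length; i++) {
    if (chain[i].word === word) return chain[i].count;
  }
  return undefined;
};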
A graph is cyclic if
it has a cycle—an unbroken series of nodes with no repeating nodes or edges that connects back to itself
A graph is acyclic if
it has no cycle
When the processor asks for the contents of a given memory address...
the memory controller also sends the contents of a handful of nearby memory addresses. And the processor puts all of it in the cache.
Adjacency matrix
var graph = [
  [0, 0, 0, 1],
  [0, 0, 1, 1],
  [0, 1, 0, 1],
  [1, 1, 1, 0],
];
A matrix of 0s and 1s indicating whether node x connects to node y (0 means no, 1 means yes). Since node 1 has edges to nodes 2 and 3, graph[1][2] and graph[1][3] have value 1.
graph adjacency list array
var graph = [
  [3],
  [2, 3],
  [1, 3],
  [0, 1, 2],
];
A list where the index represents the node and the value at that index is a list of the node's neighbors. Since node 1 has edges to nodes 2 and 3, graph[1] has the adjacency list [2, 3].
graph adjacency list dictionary
var graph = {
  0: [3],
  1: [2, 3],
  2: [1, 3],
  3: [0, 1, 2],
};
We could also use a dictionary where the keys represent the node and the values are the lists of neighbors.
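A small usage sketch of the representations above (the variable names here are illustrative):
// Adjacency list array: the neighbors of node 1 are just the entry at index 1.
var adjacencyList = [[3], [2, 3], [1, 3], [0, 1, 2]];
var neighborsOfOne = adjacencyList[1];                   // [2, 3]
// Adjacency matrix: nodes 1 and 3 are adjacent if the entry is 1.
var adjacencyMatrix = [
  [0, 0, 0, 1],
  [0, 0, 1, 1],
  [0, 1, 0, 1],
  [1, 1, 1, 0],
];
var oneAndThreeAdjacent = adjacencyMatrix[1][3] === 1;   // true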