Interview Prep 2023
append to list
the_list.append(value)
Geometric Series
1 + 2 + 4 + 8 + 16 + ... + N sums to 2N - 1, roughly 2N, which in big O is O(N).
Python division
5 / 2 = 2.5 while 5 // 2 = 2. The // operator is floor division: divide, then take the floor.
Nines of Availability
5 nines (99.999%) is the gold standard, the best. How to increase availability? Remove single points of failure by adding redundancy.
Binary Search Tree
A Binary Search Tree (BST) is a type of data structure that facilitates efficient data storage, retrieval, and manipulation. It's defined by the following properties: Binary Tree Structure: A BST is a binary tree, meaning each node has at most two children, commonly referred to as the left and right child. Ordered Elements: In a BST, the nodes are ordered such that for any given node: All elements in the left subtree are less than the node's value. All elements in the right subtree are greater than the node's value. This ordering principle enables efficient operations like search, insertion, and deletion, typically with time complexities of O(log n) in balanced trees, where 'n' is the number of nodes. However, in the worst case (e.g., when the tree becomes skewed), these operations can degrade to O(n).
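A minimal sketch of BST insert and search (my own illustrative code, not from any specific library):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    # Walk down until we find an empty spot that preserves the ordering.
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root

def contains(root, value):
    # Each comparison discards one whole subtree: O(log n) when balanced.
    while root is not None:
        if value == root.value:
            return True
        root = root.left if value < root.value else root.right
    return False

root = None
for v in [8, 3, 10, 1, 6]:
    root = insert(root, v)
```

Note the worst case: inserting already-sorted values produces a skewed tree, and these operations degrade to O(n).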
Key Value Store
A Key-Value Store is a flexible NoSQL database that's often used for caching and dynamic configuration. Popular options include DynamoDB, etcd, Redis, and ZooKeeper. These are great because they are simple and flexible, but querying is much less robust than in a relational database. Some write to disk, while others (like Redis) keep data in memory. Some are strongly consistent; others are eventually consistent.
Linked list, high level
A collection of nodes. A node is a value, and next to it in memory is a pointer to the next node in the linked list. This describes a singly linked list. Key point: the values do not have to be stored contiguously. The starting and ending nodes are the head and the tail. Often an implementation of a linked list will store both the head and the tail so you can easily access either end. A doubly linked list has 2 pointers at each node: one pointing to the next node and one pointing to the previous node.
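A sketch of a singly linked list that keeps both head and tail (illustrative only; the names are my own):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None  # pointer to the next node; None marks the tail

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None  # storing the tail makes append O(1)

    def append(self, value):
        node = Node(value)
        if self.head is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def to_list(self):
        # Traverse from head to tail following the next pointers.
        out, cur = [], self.head
        while cur:
            out.append(cur.value)
            cur = cur.next
        return out

ll = LinkedList()
for v in [1, 2, 3]:
    ll.append(v)
```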
Relational Database
ACID: A: Atomicity. Multiple steps, like moving money from one bank account to another, are treated as one step. They are an atomic unit, so they all fail or succeed together. C: Consistency. All transactions conform to the rules in the database, and future transactions take into account prior transactions. I: Isolation. Multiple transactions can be submitted at once but will be executed as if in sequence. D: Durability. When you commit a transaction it is permanent. Database Index: you create another data structure that allows you to more rapidly search through a given column. Under the hood there is another data structure, maybe a tree, that is built for efficient searching, and its values are pointers to the actual rows in the database. The downsides to an index are that it takes up more space (you are storing an extra data structure) and writes take longer (you must update both the table and the index). Strong consistency means reads basically always see the latest data. Eventual consistency means reads might briefly see stale data, but replicas converge over time.
What is a protocol
An agreed upon set of rules that specify the behavior of a system.
Static array
An array of a predetermined size. You must define the size at the start when you initialize the array. (Python doesn't have these natively.)
Dynamic Array
An array that doesn't necessarily have a set size when initialized. Or, if you did specify a size, about 2 times that size will actually be allocated. This has all the same operations as a static array; only inserting at the end of the array is different. This is what Python's list is.
Greedy Algorithms
At each step you take the best step you can: the local optimum, hoping that taking all the local optimums produces the global one. No backtracking; you look at things once. The best solution in a coding interview can be a greedy algorithm, but greedy doesn't always give an optimal answer in general. Some sort of sorting is often used here. Greedy problems usually ask for a minimum or maximum result; they are optimization problems. There will often be some constraint or constraints; you only consider options that are feasible, then look for the most optimal among them. A common pattern: sort by whatever makes a choice more optimal, then take items one by one until you are full. I am not sure how general this approach is, but it covers a lot of cases.
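The sort-then-take pattern above can be sketched with interval scheduling (pick the maximum number of non-overlapping intervals); this is my own example:

```python
def max_non_overlapping(intervals):
    # Greedy choice: always take the interval that ends earliest,
    # which leaves the most room for the rest. Sorting drives the order.
    intervals = sorted(intervals, key=lambda iv: iv[1])
    count, last_end = 0, float("-inf")
    for start, end in intervals:
        if start >= last_end:  # feasible: doesn't overlap what we already took
            count += 1
            last_end = end
    return count
```

No backtracking happens: each interval is looked at once and either taken or skipped for good.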
Streaming
Can be used when you want updates fast, instead of polling very frequently. This is basically done through a long-lived connection, like a WebSocket. Streaming is the server pushing data to the client, whereas polling is the client pulling data from the server every so often. So there is a bit of difference there.
Cascade delete
Cascade delete is a referential action related to the integrity constraints of foreign keys in a database: when a row in a parent table (the referenced table in a foreign-key relationship) is deleted, the cascade delete action automatically deletes the corresponding rows in the child table(s) (the tables that contain the foreign key).
Things to remember for System Design Part 1
Do I need to consider access control? For RDS or similar I should think about the tables specifically; indexes and foreign keys might come up here. When thinking about the backend, break out the services at least a little to do different jobs, just like you normally would. Make sure to state things I am taking for granted. Like if I am working with AWS I take scalability and such for granted, but it is good to mention that it is a thing. Rate limiting: do I want to add that anywhere? Logging? Monitoring? Mention where I assume there is auto scaling or how scaling will happen. I will take this for granted. Don't.
Behavioral Question: How do you think about receiving and giving feedback? Describe a time when you received tough feedback and/or a time when you gave tough feedback. How did you react to it? How did you give it?
Don't take it personally if someone is giving it to me; this is my chance to grow. It shows they care, supposing it is in good faith. When giving, make it clear this isn't something personal. Giving or receiving hard feedback doesn't start in that moment; it started when the relationship started. If you have a good, healthy working relationship then you can give or receive the feedback; otherwise it will not go over well. I believe in being very positive and supportive, and also clear and direct when feedback needs to be given. Example: working with a junior member of the team, pair programming. Feedback was not large but small and fairly often. Brainstorming ideas together; when there was an issue, always highlight the good in the idea and point out where it is good, then guide it to a more optimal solution. I think it is important to always be encouraging: stay positive and build that good relationship. I could also discuss working with the mentees at CodeDay: keeping our schedule, diagnosing why it was not working, and creating actionable steps they could take to hit the goals we had established. Another example was moving S3 data to the backup buckets, TBs of data. He wanted to do a Python script, but that would take too long, many days to complete. I suggested Go, which had much better concurrency. We did that and it went much faster.
Queue High level
FIFO (first in, first out). Think of this like the line at the bank: the first person in the line will be the first one serviced. This is usually implemented as a linked list. Normally enqueue and dequeue are the operations.
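In Python, collections.deque is the usual way to get a queue (a plain list's pop(0) is O(N)); a quick sketch:

```python
from collections import deque

queue = deque()
queue.append("alice")    # enqueue at the back
queue.append("bob")
first = queue.popleft()  # dequeue from the front: FIFO order
```

deque is a doubly linked structure under the hood, so both enqueue and dequeue are O(1).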
Behavioral Question: Describe a challenging project that you worked on. Why was it challenging? What was your role in the project? How did you deal with the various difficulties of the project?
Farmlet! Farmlet latency. The Director of Engineering asked someone to look into it and I volunteered. It took several weeks. This had been a hidden problem for years: legacy code. I was able to come to understand the code and the issue. Because of this I became the expert for this system in the company, and it was critical to our main product as well. It is the reason I was kept on the transition team. The solution came in phases. A full rewrite would have been ideal, but that was not practical. An immediate fix was changing how the service scaled. Then we were getting ready for a middle-term solution: decomposing the service into 3 parts. I was able to architect this solution, though we didn't end up being able to do it.
Behavioral Question: Describe a time when you had to deal with an outage at work. How did you handle the situation? What steps did you take after the issue was resolved?
Farmlet! Farmlet latency issues. Give what the PagerDuty alert was; I was on call. What was my high-level approach for delving into the issue? Farmlet often had latency issues. I got a call when I was starting to really get into the weeds with this legacy code base. It had a large amount of latency in its requests. So I went and looked at the service and the logs, and noticed some weird gaps in the logs. I thought the system might be overloaded, so I scaled in extra instances. After observing the latency for a while I felt good that it was going down and ok. Wrote a postmortem. The future work here was to fix Farmlet. This issue was becoming more common as traffic patterns changed and caused spikes of latency. This resulted in changing the scaling policy for the ASG, then designing longer-term solutions as well.
Behavioral Question: Describe a time when you went out of your comfort zone. Why did you do it? What lessons did you learn from the experience?
Farmlet: volunteering to work on the latency issue and other similar work from the Director of Engineering. I learned a lot of new things about this service, its frameworks, and concurrency in Python, and I ended up becoming the expert in this area. It was good. If I want to I can get into anxiety and how I have had a hard time in the past, but when I have taken the leap to come out of my shell it has been good, and I have been consistently doing that now.
Proxies
Forward proxies are often just called proxies. The job here is to forward requests to the desired destination: the client makes a request to the proxy, then the proxy makes the actual request to the backend server. A VPN is kind of like this. It can be used to hide who made the original request. I don't think this will come up all that much in system design. Reverse proxies: the client wants to make a request to the server, but there is a reverse proxy in the middle that the client doesn't know is there. So the client's request goes to the reverse proxy, then to the server. The reverse proxy can filter out requests, have a cache attached to it, act as a load balancer, or be used for analytics or log gathering.
Arrays Big O (dynamic has some specific cards for it)
Get = O(1). Set = O(1). Initialize = O(N) (for static, where N is the size). Traverse: O(N) time, O(1) space. Copy = O(N). Insert = O(N) time, because a static array is contiguous: you have to copy the whole thing and find a new place to put it with the new element. This is true of an insert anywhere in the array. Pop = O(1) if it is at the end; elsewhere it is O(N).
Linked List O(N)
Get and Set = O(i) time, where i is the index, O(1) space. Init = O(n) time and space. Copy = O(n) time and space. Traversal = O(n) time, O(1) space. Insert = O(1) time and space when you have the node preceding it (or the head). Otherwise you must traverse the linked list until you reach the node before the insertion point, so without that node it is O(i) time, O(1) space.
HTTP
Hypertext Transfer Protocol. It works with requests and responses.
Behavioral Question: What aspects of software engineering do you think you're very good at? What about areas where you'd like to improve? How do you plan on improving?
I think I am very good at looking at the big picture. I am good at seeing the business need and how priorities fit together: the large-scale architecture of a system and how we can update and modify it to meet our goals. I am less good at some of the smaller, more minute details: what should the name of this variable be, should this method be in this class or the other. I want to get better by practicing more and working through these experiences. I have been slowly reading the book Clean Code to get better at this.
one-to-many relationship and similar
One row in a parent table can relate to many rows in a child table; it's implemented with a foreign key on the child. Example: one customer has many orders, and each order row stores the customer's id. A many-to-many relationship needs a join table holding pairs of foreign keys.
whats up with python numbers?
Python ints are arbitrary precision: they can grow as big as memory allows, so there is no overflow like with a fixed 32/64-bit int. Floats are 64-bit IEEE 754 doubles.
Immutable Infrastructure
Immutable components are replaced in every deployment rather than being updated in place. This practice is known as immutable infrastructure: servers and other infrastructure components are never updated after they are deployed; if you need to make changes or updates, you simply replace them with a new instance that includes the desired changes. This concept is rooted in the principles of treating infrastructure as code and automation, which are core tenets of the DevOps philosophy.
Hash Table Complexity
Insert = O(1) time. Search = O(1) time. Delete = O(1) time. All of the above are O(1) on average, but in the absolute worst case (like everything collides) they would be O(n) as you traverse the linked list. But hash functions are so good that we basically consider all of these O(1). The hashing function itself is generally considered O(1). A note on inserting: eventually the underlying array will get full. It will expand like a dynamic array, and all keys go through the hash function again. But inserting is still O(1) amortized, for the same reason it is for a dynamic array: O(N)/N is about 1, so O(1). For the actual storage, a hash table is O(N): we store the values, and generally pointers to the keys rather than copies of them. Though even storing both would be O(2N) -> O(N).
Stack and Queue O(N)
Insert: O(1) time and space. Delete: O(1) time and space. Search: O(N) time, O(1) space. Peek: O(1) time and space (you just look at the next element to come out, without removing it).
MapReduce
Is for processing very large amounts of data across multiple machines, parallelizing the work. There are two data-processing steps: map the data, then reduce it, hence the name. Map is some function that splits the data into key-value pairs. Reduce then groups those key-value pairs to produce some result. A central control plane coordinates all the efforts. You send the map function to the data, not the other way around. If something breaks during these processes the idea is to just run it again, so it is important that the map and reduce steps are idempotent. This was basically the only way to do this years ago; I am fairly sure we have other options these days, but IDK how they compare.
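The two steps can be sketched single-process with the classic word-count example (in real MapReduce the chunks live on different machines and a shuffle groups pairs by key between the phases; this is just my toy illustration):

```python
from collections import defaultdict

def map_phase(chunk):
    # Map: emit (key, value) pairs -- here (word, 1) for each word in a chunk.
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    # Reduce: group the pairs by key and combine the values.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

chunks = ["the cat", "the dog"]  # stand-ins for data on different machines
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(pairs)
```

Both functions are idempotent over their inputs, so re-running a failed chunk is safe.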
Graph high level
It is a collection of vertices (aka nodes) that might be connected via edges. Some graphs have direction, some don't
Hash Table high level
It is an array that contains multiple linked lists (when using chaining as the solution to handle collisions). Each node in the linked list is some sort of object that contains the key and value pair.
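A minimal chained hash table along those lines (illustrative sketch; here each bucket is a Python list of pairs standing in for the linked list):

```python
class HashTable:
    # Chaining: each bucket holds the (key, value) pairs
    # whose keys hashed to that index.
    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # collision or new key: chain it

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

table = HashTable()
table.set("a", 1)
table.set("b", 2)
```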
What is a char in python?
It is just a string of length 1. No official data type
Dynamic programming
It seems that finding subproblems, then using the answers to those to build the actual answer, is pretty core to this approach. It can also often involve sorting. Used to solve optimization problems.
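The subproblem idea can be sketched with the classic coin-change problem (my own example): the answer for an amount is built from the answers for smaller amounts.

```python
def min_coins(amount, coins):
    # Subproblem: fewest coins to make each smaller amount.
    # Build answers bottom-up and reuse them for larger amounts.
    best = [0] + [float("inf")] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a:
                best[a] = min(best[a], best[a - c] + 1)
    return best[amount] if best[amount] != float("inf") else -1
```

Note a greedy approach fails here for some coin sets, which is why the dynamic programming table is needed.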
Behavioral Question: Why do you want to work at <company-name>?
It would probably be good to mention the tech stack they have that I like to work with. Do reading on other things about their company, maybe the mission or the product. I think it is ok to mention working remote as well.
Binary search
Iterative: O(log(n)) time, O(1) space. Recursive: O(log(n)) time, O(log(n)) space (call stack). General logic: have pointers to the start and end of the section you are looking at, then find the middle. Adjust the section based on whether the middle is higher or lower than what you are looking for; if it is what you need, you are done.
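The iterative version sketched out (returns the index, or -1 if not found):

```python
def binary_search(array, target):
    # left and right bound the section still under consideration.
    left, right = 0, len(array) - 1
    while left <= right:
        middle = (left + right) // 2
        if array[middle] == target:
            return middle
        if array[middle] < target:
            left = middle + 1   # target must be in the right half
        else:
            right = middle - 1  # target must be in the left half
    return -1
```

Python's stdlib also has the bisect module for this kind of search on sorted lists.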
Behavioral Question: How would you go about distributing work for a project across a team of software engineers? If you've led a project in the past, describe what you did.
Key to distributing a project is first planning it out. If you have a clear idea of the work that needs to be done, that will better enable you to spread the work across team members. Break things out into tickets: the smaller the better, while not being too small. A challenge at my last job was that we often didn't have one big effort we were working on, but several smaller projects, with high-priority things coming in. It was my job to lead sprint planning. I had to balance what people were working on versus the priority of the upcoming work, and look at deadlines and see how those compared to the current tasks. Also keeping an eye on not being too disruptive, if possible, to their current work so they could get more done. I needed to consider areas of knowledge: while we tried not to have silos, some people still had more experience in some things than others.
Stack high level
LIFO (last in, first out). The last thing you put in is the first one you take out. Think of this like a stack of dishes: you take the top one off, which is the last one you put there. This is usually implemented as a dynamic array. Normally push and pop are the operations.
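In Python a plain list works as the stack, since append and pop at the end are O(1) amortized:

```python
stack = []
stack.append("plate 1")  # push
stack.append("plate 2")
top = stack[-1]          # peek: look at the top without removing it
popped = stack.pop()     # pop: the last one in comes out first
```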
What is pagination?
Splitting a large result set into pages returned one at a time, e.g. offset/limit or cursor-based. Look up how to implement.
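As a starting point, offset-based pagination can be sketched like this (my own toy example; real APIs do this with query parameters and a database LIMIT/OFFSET or a cursor):

```python
def paginate(items, page, page_size):
    # Offset-based pagination: page numbers start at 1.
    start = (page - 1) * page_size
    return items[start:start + page_size]
```

Cursor-based pagination avoids the drawback that large offsets get slow and can skip or repeat rows when data changes between requests.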
Increasingly bad time/space complexity
O(1) < O(log(N)) < O(N) < O(N*log(N)) < O(N^2) < O(2^N) < O(N!)
Append to dynamic array complexity
O(1) time, using amortized analysis. Where does this come from? When there is room left in the dynamic array, appending is O(1): you just slap the value at the end. Easy. But what happens when you have used all your extra room? You have to create a new array about 2 times as big, copy the current array into it, and delete the old one. Each time you do this, the series of copy costs looks like N + N/2 + N/4 + N/8 + ... + 1 (looking backwards from the most recent doubling). That series converges to 2N, which is O(2N) -> O(N). So we did a total of O(N) work over N operations: O(N)/N = O(1). Appending a value to the end of a dynamic array is therefore O(1) amortized. Inserting anywhere but the end is still O(N).
Python slice of a list
Syntax: sublist = original_list[start:stop], where start is the starting index (inclusive) and stop is the stopping index (exclusive). To take from the start of the list: sublist = original_list[:3]. To go from the middle to the end: sublist = original_list[4:]
1 to 1 relationship
One entity can be related to only one other entity and vice versa. Like a marriage: one husband and one wife; they can't have multiples. A few more examples: User and Profile, Employee and Parking Space, National ID and Citizen. Design: often this kind of relationship is done in just one table, as multiple columns. This might be a primary key and a unique column. But it can be across tables if needed; that is better when each entity has multiple attributes you want to store about it.
HTTP put vs post
POST is to create a new resource. PUT is to update (replace) a resource, or create it if it doesn't exist. PUT is idempotent; POST is not.
Time series DB
Prometheus is a common example: a database optimized for storing and querying timestamped data like metrics.
What is a rest api?
A REST API is a way for applications to communicate over HTTP(S). Often in JSON format, but it can be things like XML as well. It is stateless: each request contains all the info it needs. It uses verbs like GET, POST, PUT, DELETE.
Load Balancers
Round robin: make the rounds. Weighted round robin: some servers might get multiple requests in a row because they have a higher weight. Load-based: the balancer tracks metrics for which server is being used the least and sends it more traffic. IP-based: it hashes the IP address so a given client consistently goes to the same server. This is important if you are caching things on each server: if a client wants several of the same thing (which is more likely for one person), the stuff will already be cached. Path-based: depending on the endpoint the client is trying to access, you send it to different servers.
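The round-robin strategy is just cycling through the server list; a tiny sketch (my own example, using itertools.cycle):

```python
import itertools

def round_robin(servers):
    # Cycle through the servers forever, handing out one per request.
    return itertools.cycle(servers)

picker = round_robin(["server-a", "server-b", "server-c"])
first_four = [next(picker) for _ in range(4)]
```

Weighted round robin would be the same idea with each server repeated in the list according to its weight.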
Publisher Subscriber Pattern
So things like Kafka or AWS SQS. Publishers push messages onto topics (or channels), then subscribers get the messages. These messages are stored in persistent storage, and these systems guarantee at-least-once delivery: it might be more, but it will be at least once. This is accomplished by the subscriber acknowledging that it got the message; but the ack might get lost, in which case the message would be sent again. Hence at least once. Ordering is guaranteed here too. Because messages might be delivered more than once, the subscriber's handling of a message needs to be idempotent.
IP
Stands for Internet Protocol. An IP packet is what is sent between machines. There is a header section containing metadata, like where the packet is going and what version of IP it is using; it is quite small. Then there is the body, which contains the data being sent. IP packets cannot be very big, so there are basically always multiple used. But ensuring all the packets arrived, and in the correct order, is not something IP can handle. That is the job of TCP.
Configuration
Static config: config that is bundled with your code, so if you want to update the config you have to redeploy. This seems limiting on average. The advantage here is that it might get more thorough review and maybe testing. Dynamic config: the config can be deployed independently from the code, so it can be updated more often. This can be less safe. There is more complexity here, as the config must live in a DB or similar, and the service needs to be able to pull it and update itself with the new values. It is worth noting that dynamic config can be made pretty safe; a lot of the good properties of static config can be brought to dynamic config.
Tree O(N)
Storage is O(N), where N is the number of nodes. Traversing the whole tree is O(N) time. Binary search tree search is O(log(N)) if the tree is balanced.
Graph O(N)
Storing a graph is O(V+E): vertices plus edges. Traversal is done via depth-first or breadth-first search (I will have separate cards for these). Either of these traversal methods is O(V+E) time. You can also add to the graph and so on, but the complexity of those operations is pretty specific to the situation. Basically, graph complexities are specific to the given scenario.
Behavioral Question: Describe a time when there was a conflict within your team. How did you help resolve the conflict? Did you do anything to prevent it in the future? I think the following question can apply to this example as well. Describe a time when you strongly disagreed with a coworker about an engineering decision. How did you go about making the final decision? What did you do after the decision was made?
Talk about dependency pinning with dippery. We disagreed on what the best approach was. He had his points and I had mine; I think we both made good points, but we did ultimately disagree. I eventually went with his suggestion, which had some good points even though we saw it differently. Clarify that I didn't think his solution was bad, just different, with different strengths and weaknesses. Then afterwards I came to him and discussed an optimal solution he had suggested that we could not do at the time, and stressed how I thought it was a very good idea. So we ended in a good place. I could mention that I assume good intent from the other person and try to find common ground; both of these can be seen in my example. I thought he simply believed his approach was better, nothing against me. Afterwards I went out of my way to discuss our common ground. Also: we don't attack people, we discuss ideas. When people are discussing my idea, that doesn't mean they are attacking me; it is about the ideas. Having that separation is important.
What does ** do in python?
The ** operator in Python has a couple of distinct uses related to dictionaries:

Dictionary Unpacking: ** can unpack the contents of a dictionary and pass them as keyword arguments to a function. When used in a function call, each key-value pair in the dictionary corresponds to a named argument and its value.

def greet(name, greeting):
    print(f"{greeting}, {name}!")

person = {'name': 'Alice', 'greeting': 'Hello'}
greet(**person)  # This will output: Hello, Alice!

Merging Dictionaries: ** can be used to merge two or more dictionaries into a new dictionary. If the same key appears in multiple dictionaries, the value from the last dictionary will be used.

dict_one = {'a': 1, 'b': 2}
dict_two = {'b': 3, 'c': 4}
merged_dict = {**dict_one, **dict_two}  # merged_dict is {'a': 1, 'b': 3, 'c': 4}

Creating New Dictionaries with Additional Pairs: you can use ** to create a new dictionary based on an existing one, with additional key-value pairs.

original = {'a': 1, 'b': 2}
new = {**original, 'c': 3}  # new is {'a': 1, 'b': 2, 'c': 3}

Keyword Argument Variadic Functions: in function definitions, ** can be used to accept an arbitrary number of keyword arguments. The function receives the arguments as a dictionary.

def print_kwargs(**kwargs):
    for key, value in kwargs.items():
        print(f"{key}: {value}")

print_kwargs(a=1, b=2)  # Output: a: 1 then b: 2

The ** operator is part of Python's approach to handling arguments, which also includes * for unpacking iterables and for variadic positional arguments in functions.
Behavioral Question: Describe a time when you made a mistake. How did you deal with the repercussions of the mistake? What lessons did you learn from the mistake?
The time Zach committed all our keys to Ansible; I was on the code review and I missed it. I found the mistake and brought it forward, then led the effort to fix the issues: going back, removing the keys from the code base, and removing the PR from GitHub. Lesson: I need to be more thorough in code reviews, specifically with large files; they need to be viewed. These credentials were in vault files, which was not good; we should not have had them there, and this highlighted it. They should have been in some centralized tool that is actually designed for credential management. Blameless culture.
Python not operator
There is no !, at least for not-ing a bool; you must use not. For example: blueShirtSpeeds.sort(reverse=not fastest)
Behavioral Question: Imagine you and your team are in the middle of a major project at work, with many moving parts, complicated context, a lot of work, etc.. A new software engineer joins your team, and you're tasked with onboarding them; what do you do?
There should be a real effort given to this so they can become productive quickly. Meeting with them to go over the ropes is super important, and having good documentation is super important here too. Meet, give context, and assign them something small; then check back to see how they are doing and answer any questions, and repeat. It will probably be at a higher frequency at first and slower later. Be careful with what work you first give them: something that isn't too bad, just to get used to working with everything. Do pair programming or work that is relevant to what they will be doing; this will help them. Start them small and let them build. Be open to answering questions they might have. Walk with them until they feel comfortable.
Tell us about yourself?
Think about this and write it down
Peer To Peer network
This can be useful if you need to transfer a lot of data to a lot of servers, though it is complex. At least in the example shown in the video, you send things in small chunks, so the peer-to-peer network can start passing those chunks around right away; there is almost exponential growth in how the data gets sent around. Basically you get many, many machines (the peers in the network) all working for you instead of just the normal work servers. But how do the peers know who to send data to? One way is some sort of centralized DB or similar that organizes where stuff should be sent; that central DB is called a tracker. There is also the idea of a gossip (or epidemic) protocol, where each machine has a hash map of which peer has which piece of the data. Then each time a peer talks to another peer, it builds up its hash table with the info the other peer has. This hash table is called a DHT, or distributed hash table.
Selection Sort
Time = O(n^2), Space = O(1). You go through the array, find the smallest value, and move it to the start. Then move one past the smallest one you just placed at the start and do the process again: find the second smallest value and move it to the second spot. Continue until done. This is basically the most naive way to sort a list, so not very effective.

This is my implementation (cleaned up):

def selectionSort(array):
    for i in range(len(array)):
        smallest_num_index = i
        # scan the unsorted remainder for the smallest value
        for cur_index in range(i + 1, len(array)):
            if array[cur_index] < array[smallest_num_index]:
                smallest_num_index = cur_index
        # swap it into position i
        array[i], array[smallest_num_index] = array[smallest_num_index], array[i]
    return array
python sort space time complexity
Time Complexity Both sort() and sorted() have a time complexity of O(n log n) on average. This complexity is derived from the Timsort algorithm that Python uses under the hood for sorting. Timsort is a hybrid sorting algorithm derived from merge sort and insertion sort. It's designed to perform well on many kinds of real-world data. Best Case: O(n) - This best-case scenario occurs when the data is already mostly sorted. Timsort can recognize this and perform optimizations that lead to a linear time complexity. Average Case: O(n log n) - For most unsorted or randomly sorted data, Timsort performs at this efficiency. Worst Case: O(n log n) - Even in the worst-case scenarios, Timsort manages to keep the time complexity within O(n log n). Space Complexity The space complexity of the sort() method is O(1), which means it sorts the list in place and does not require additional space that depends on the size of the input list. However, for the sorted() function, the space complexity is O(n), because it returns a new list containing all the sorted elements, which requires allocating additional space for the entirety of the input data.
depth first search really for a tree
Time O(v+e) for a general graph (more general than a tree). Space is O(v): in the problem I did, we stored an array of the result, so that is an easy O(v). But say you just print it; it is still O(v) because of the call stack (see the image of the worst-case scenario, ignore the extra e). To clarify: for a tree like a BST, time would be O(n), the number of nodes. The above is for graphs in general.

Algorithm (for a tree):

def depthFirstSearch(self, array):
    array.append(self.name)
    for child in self.children:
        child.depthFirstSearch(array)
    return array
Python sorting an array
To sort an array in place:

array = [
    {'key': 'a', 'key2': 3},
    {'key': 'b', 'key2': 1},
    {'key': 'c', 'key2': 2}
]
array.sort(key=lambda x: x['key2'])
print(array)

How to think of the lambda function:

def function(x):
    return x['key2']

If you want to create a new array instead of changing the one in place:

sorted_array = sorted(array, key=lambda x: x['key2'])

Sorting a simpler list:

numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
numbers.sort()
print(numbers)

Sorting in descending order: pass the reverse=True argument.

Using sorted(): sorted_numbers = sorted(numbers, reverse=True)
Using sort(): numbers.sort(reverse=True)
TCP
Transmission Control Protocol. Built on top of IP to solve having multiple packets: it ensures all of them arrive, and that they arrive in the correct order. If it can't get all the packets through, it will also know that and be able to report, in effect, "not everything arrived despite my best effort, so this data will not work." Inside the IP packet there is also a TCP header section, which contains things like the ordering of the packets.
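A sketch of what TCP looks like from application code: Python's socket module with SOCK_STREAM gives you a TCP connection where delivery and ordering are handled for you. This runs a tiny echo server on loopback in a thread, purely for illustration:

```python
import socket
import threading

def echo_server(server_sock):
    # Accept one connection and echo its bytes back, in order.
    conn, _ = server_sock.accept()
    conn.sendall(conn.recv(1024))
    conn.close()

# SOCK_STREAM means TCP: a reliable, ordered byte stream on top of IP.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
t = threading.Thread(target=echo_server, args=(server,))
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"hello over tcp")
received = client.recv(1024)  # TCP delivered these bytes intact and in order
print(received)
t.join()
client.close()
server.close()
```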
String O(N)
Traverse: O(n) time, O(1) space. Copy: O(n) time and space. Get (by index): O(1) time and space. Strings are immutable in Python, so when you "modify a string" you are really making a new one, which is O(n) time and space. Concatenation is O(n + m) time and space; however you look at it, the total size of the new string is O(n + m). Since strings are immutable in Python (and similar languages), if you have to do a lot of string manipulation you will want to first convert the string to a list, which is mutable.
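The pattern that follows from this: edit a mutable list of characters and join once at the end, which is O(n) overall instead of paying for a new string copy on every change. (censor_vowels is a made-up name for illustration.)

```python
def censor_vowels(s):
    # Strings are immutable, so work on a mutable list of chars instead.
    chars = list(s)
    for i, c in enumerate(chars):
        if c in "aeiou":
            chars[i] = "*"
    # One O(n) join at the end instead of repeated string copies.
    return "".join(chars)

print(censor_vowels("hello world"))  # h*ll* w*rld
```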
Tree high level
Trees are graphs. A tree has a root, the top node. Each node has child nodes. Edges flow down and can't flow back up, so there are no cycles. A node can only have one parent, and the whole tree is connected.
A binary tree is one where each node has at most 2 child nodes.
A tree is considered balanced if it keeps about O(log(n)) time complexity when searching.
A path from the root down to a node is a branch. Bottom nodes are leaf nodes. There are levels in a tree; depth is how far down a node is.
A complete tree is one where every level is full except maybe the last, which is filled left to right.
A full tree is one where every node has either no children or the max number of children it can have. Note: a tree can be full and not complete.
A perfect tree is one that is completely filled up.
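A minimal binary tree node plus a height helper, as a sketch (the names are my own, not from any particular library):

```python
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def height(node):
    # Height of an empty tree is 0; otherwise 1 + the taller subtree.
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

#        1
#       / \
#      2   3
#     /
#    4
root = TreeNode(1, TreeNode(2, TreeNode(4)), TreeNode(3))
print(height(root))  # 3
```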
Behavioral Question: Imagine you had a low performer on your team. How would you handle the situation? What would you do to help them?
Try to understand why they are struggling: is it lack of knowledge or motivation? Offer to work with them to give extra support on whatever they are struggling with. Make yourself available to them. Be thoughtful about finding things they can take the lead on; bolstering confidence can be a big help.
Hashing for system design
Using modulo of the number of servers does not work well, because you will probably have servers scaling in and out, and if you need something like caching to work, this will not be ideal.
Consistent hashing: think of a circle, and place both servers and requests on it. Each request goes to the next closest server clockwise. That is the general idea. Then you can add and remove servers on this circle as they scale in and out, and it won't mess up every mapping like modulo would have; it might just shift one or two. You can also accommodate a server you want more traffic to go to by placing that server on the circle multiple times, i.e. passing it through the hashing function more times.
Rendezvous hashing: basically you hash each request against each server and rank the servers, so every request has a best match, a second-best, a third, and so on. Then you assign the request to the top server, the best match. What is good about this is that if a server is removed, only the requests whose best match it was have to move, and each of those simply falls to its next-best server.
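A minimal rendezvous (highest-random-weight) hashing sketch; the server names are hypothetical:

```python
import hashlib

def score(server, key):
    # Hash server and key together; higher score = better match.
    digest = hashlib.sha256(f"{server}:{key}".encode()).hexdigest()
    return int(digest, 16)

def pick_server(servers, key):
    # Every caller ranks servers the same deterministic way,
    # so no shared state or coordination is needed.
    return max(servers, key=lambda s: score(s, key))

servers = ["server-a", "server-b", "server-c"]
choice = pick_server(servers, "user-42")
print(choice)

# If a server OTHER than the chosen one is removed, this key stays put;
# if the chosen one is removed, the key falls to its second-best match.
remaining = [s for s in servers if s != choice]
print(pick_server(remaining, "user-42"))
```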
python naming conventions
Variables: employee_name, total_amount, user_id Functions: calculate_total(), print_message() Classes: Employee, BankAccount, DatabaseConnector Constants: MAX_OVERFLOW, TOTAL_WIDTH Module Names: json, my_module Method Names (inside classes): start_engine()
Map and reduce in python?
map(function, iterable) applies the function to every element and returns an iterator of the results. reduce(function, iterable) lives in functools and folds the iterable down to a single value by repeatedly combining an accumulator with the next element. In practice, list comprehensions are often preferred over map in Python.
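A quick sketch of both:

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# map: apply a function to every element (returns a lazy iterator).
squares = list(map(lambda x: x * x, numbers))
print(squares)  # [1, 4, 9, 16]

# reduce: fold the list into one value; here, a running sum
# starting from the initial value 0.
total = reduce(lambda acc, x: acc + x, numbers, 0)
print(total)  # 10
```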
When is the complexity O(log(n)) on a high level
When each step cuts the remaining elements in half. So 32, then 16, then 8, then 4: the input shrinks to 1 in about log2(n) steps. Similarly, things with binary trees can be O(log(n)), since you can cut the options in half each time.
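The classic example is binary search over a sorted list, sketched here:

```python
def binary_search(sorted_list, target):
    # Each iteration halves the search window -> O(log n) time.
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # not found

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))  # -1
```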
Client server model
When the client makes a request to the server, it first makes a DNS query to find the IP address of the server, so it can make the request to the actual address. Then the request made is an HTTP request to the actual server. The server listens on a specific port. There are 2^16 = 65,536 port numbers on a given machine. Depending on the protocol you are using to talk, you know what port to send the request to. HTTP: port 80. HTTPS: port 443.
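A sketch of the same lookup steps from Python. "localhost" resolves locally, so it works without network access; a real hostname like example.com would go out to DNS the same way:

```python
import socket

# Resolve a hostname to an IP address, like the DNS query a browser
# makes before sending the HTTP request.
ip = socket.gethostbyname("localhost")
print(ip)  # 127.0.0.1

# Well-known ports can also be looked up by service name.
print(socket.getservbyname("http"))   # 80
print(socket.getservbyname("https"))  # 443
```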
Database Replication
Where you have a copy of a database. It can also be used to have something on standby, ready to take over if the main one goes down. When you want your replica exactly up to date with the main DB, a write that fails on the replica must fail on the main as well, because we don't want them to get out of sync. It might also be that the replica gets updated asynchronously, if it isn't super time sensitive that it always has the most up-to-date data; then updates only happen maybe every few seconds or so.
Types of cache
Write-through cache: when you go to update the data at the source, you also update the cache on the way there, so the cache and the source (maybe a DB) are both up to date.
Write-back cache: the data is updated in the cache but not the database. Then, asynchronously (maybe on some interval), the database is updated with the new value. That means the original request doesn't have to go to the database.
Sometimes it will be essential that caches are kept very up to date, like comments on a YouTube video; other times not so much, like views on a YouTube video. So we can handle caching differently depending on the need.
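A minimal write-through cache sketch; the "database" is just a dict standing in for real storage, and the class name is made up for illustration:

```python
class WriteThroughCache:
    def __init__(self, database):
        self.database = database  # stand-in for a real DB
        self.cache = {}

    def write(self, key, value):
        # Write-through: update the cache AND the source on every
        # write, so the two can never disagree.
        self.cache[key] = value
        self.database[key] = value

    def read(self, key):
        # Serve from cache when possible; fall back to the database.
        if key in self.cache:
            return self.cache[key]
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

db = {}
store = WriteThroughCache(db)
store.write("views", 10)
print(store.read("views"))  # 10
print(db["views"])          # 10 (the source was updated immediately)
```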
python whole division and remainder
Yes, in Python you can perform whole division (also known as integer or floor division) and obtain both the quotient and the remainder in a single operation using the `divmod()` function, or by using the `//` (floor division) and `%` (modulus) operators separately. Here are both methods illustrated:

### 1. Using the `divmod()` function:

The `divmod()` function takes two arguments, the dividend and the divisor, and returns a tuple containing the quotient and the remainder.

```python
dividend = 17
divisor = 5

quotient, remainder = divmod(dividend, divisor)
print(f'Quotient: {quotient}, Remainder: {remainder}')
# Output: Quotient: 3, Remainder: 2
```

### 2. Using the `//` and `%` operators:

You can use the `//` operator to perform floor division and obtain the quotient, and the `%` operator to obtain the remainder.

```python
dividend = 17
divisor = 5

quotient = dividend // divisor
remainder = dividend % divisor
print(f'Quotient: {quotient}, Remainder: {remainder}')
# Output: Quotient: 3, Remainder: 2
```

Both methods give the same result, so you can choose based on your preference or the specific requirements of your code. The `divmod()` function may be more concise if you need both the quotient and the remainder, while the separate operators allow you to obtain just the quotient or just the remainder if that's all you need.
Do arrays have to allocated back to back? In contiguous memory?
Yes; that contiguity is what makes O(1) indexing possible. If the values are scattered in memory and linked by pointers, it is a linked list instead. For some reason I didn't remember this.
If you have 2 variables like array of length N and array of length M do you keep both N and M in the space time analysis?
Yes, keep both. So it might look like O(m + log(n)), just a random example.
Leader Election
You have multiple servers that can all do the same job, but you really want just one doing it at any given time; the rest are just on standby, ready to step in if needed. So you do leader election to elect the one server that is actually going to be doing the work. Actually having all the servers agree on who the leader is can be hard, so you often use tools for this, like ZooKeeper or Etcd. These tools will hold a key-value pair recording who the leader is, and because they are strongly consistent and highly available, you can feel good that they are up to date with the correct leader.
Database sharding
You partition your database data so that several sort-of mini databases (shards) each hold part of the data. This is how you do horizontal scaling, so to speak, for databases. How the data is distributed between shards is important; some sort of hashing probably makes sense.
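A sketch of hash-based shard routing. Note this uses modulo, which assumes a fixed shard count; if shards scale in and out you would reach for consistent hashing instead. Names and shard count are made up for illustration:

```python
import hashlib

def pick_shard(key, num_shards):
    # Hash the key so similar keys still spread evenly across shards.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Route each row's key to one of 4 hypothetical shards.
shards = [{} for _ in range(4)]
for user_id in ["user-1", "user-2", "user-3", "user-4"]:
    shard = pick_shard(user_id, len(shards))
    shards[shard][user_id] = {"name": user_id}

# The same key always maps to the same shard.
print(pick_shard("user-1", 4) == pick_shard("user-1", 4))  # True
```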
many-to-many relationship
A relationship between two entities in which an instance of one entity is related to many instances of the other, and one instance of the other can be related to many instances of the first. Think of students and classes: each student has multiple classes, but each class has multiple students. Some other examples: authors and research papers; actors and movies.
Design: to implement this you will need three tables. Let's say we are talking about classes and students. You will have a class table and a student table. Then you will have an intermediary table which holds references to both of those tables, so each row has a class id and a student id. These are foreign keys. This way you can have multiple rows with the same class, or multiple rows with the same student, but each class-student pair is unique. So we can properly map these relationships.
python create set and check if value is in set and add a num to the set
Create an empty set:

the_set = set()

Create a set with values:

example_set = {1, 2, 3, 4, 5}

# Check if the number 3 is in the set
number_to_check = 3
is_number_in_set = number_to_check in example_set

# Adding a number to the set
number_to_add = 6
example_set.add(number_to_add)
how to remove an element from a list
del the_list[2]         # removes the element at index 2
the_list.pop()          # removes and returns the last element
the_list.remove(value)  # removes the first occurrence of value
dict initialize
dictionary = {'key1':1, 'key2':2, 'key3':3}
1 to many relationship
Each object in the origin table can be related to multiple objects in the destination table. Like a user and comments on a website: that one user has many comments, but each comment has just one user. Like a professor and classes: that professor teaches multiple classes, but each class has just one professor. Some more examples: author and books; teacher and students.
Design: you will have two tables here. The "1" side of the relationship doesn't reference the "many" side at all, but each row of the "many" table has a foreign key pointing to the "1" table. So you might have a professor table with just a prof name and a prof id. Then you will have a classes table with the class id, the class name, and the prof id. This way you can have multiple classes taught by the same prof. To find all classes taught by a given prof, you look at the many table and filter for all classes where the prof id matches the one you are looking for.
Python generate a range of values
for i in range(1, 8):
    print(i)

start is inclusive, end is exclusive.

To count down:

for i in range(10, 0, -1):  # from 10 to 1
    print(i)

You can also do this to count down:

for i in reversed(range(5)):
    print(i)

which will give 4 3 2 1 0
Python if, else if and else statement
if condition1:
    # code to execute if condition1 is True
elif condition2:
    # code to execute if condition1 is False but condition2 is True
elif condition3:
    # code to execute if both condition1 and condition2 are False, but condition3 is True
# You can have as many "elif" clauses as you want
else:
    # code to execute if all the above conditions are False
combine a list of strings
It takes the form delimiter.join(list_of_strings)

Examples:

Joining without a delimiter:

list_of_strings = ['hello', 'world', '!']
combined_string = ''.join(list_of_strings)
print(combined_string)  # Output: 'helloworld!'

Joining with a space delimiter:

combined_string = ' '.join(list_of_strings)
print(combined_string)  # Output: 'hello world !'

Joining with a custom delimiter:

combined_string = '-'.join(list_of_strings)
print(combined_string)  # Output: 'hello-world-!'
check length of list
len(array)
You use this also for checking if it is empty: len(array) == 0 (or the idiomatic `if not array:`)
python convert number to string
numb = 44
string_numb = str(numb)
check if key in dict
my_dict = {'a': 1, 'b': 2, 'c': 3}
key_to_check = 'b'

if key_to_check in my_dict:
    print(f"Key '{key_to_check}' is in the dictionary.")
else:
    print(f"Key '{key_to_check}' is not in the dictionary.")
iterate over each char in a string
my_string = "Hello, World!"
for char in my_string:
    print(char)
python infinite
positive_infinity = float('inf')
negative_infinity = float('-inf')

These are floats, but they compare correctly against ints too (float('inf') is greater than any int), so they usually work fine as initial values. If you truly need an integer sentinel, the best we can do is a very large number:

positive_infinity = 10**100   # a very large number
negative_infinity = -10**100  # a very large negative number
To create a list of all keys that have a value of True in your repeats dictionary
return [char for char, value in repeats.items() if value is True]
turn string into set
string = 'aabbcc'
string_set = set(string)
# result = {'a', 'b', 'c'}
python initialize a list with all the same value
value = 2
the_list = [value] * 5
will result in [2, 2, 2, 2, 2]
(Avoid naming the variable `list`, which shadows the built-in.)
python iterate dicts
work_days = {1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0}

Iterate over keys:

for day in work_days:
    print(day)

Iterate over values:

for hours in work_days.values():
    print(hours)

Iterate over keys and values:

for day, hours in work_days.items():
    print(f"Day {day}: {hours} hours")