SA General Questions
What are the Cons of Block Storage
Can only be accessed by one server, size limitations, metadata limitations, more expensive
In a distributed system, what is concurrency
Concurrency refers to the system's ability to handle simultaneous access to and use of shared resources. This is important because, without a control measure, two nodes making different changes to the same resource can corrupt or lose data, and the system can carry that error through later processes and produce an incorrect result.
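A minimal sketch of one such control measure, a lock around a shared counter. Threads stand in for nodes here, and the names counter, worker, and run are hypothetical, chosen only for illustration.

```python
import threading

counter = 0                      # shared resource (stands in for shared data)
lock = threading.Lock()          # guards every read-modify-write of counter


def worker(increments: int) -> None:
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(increments):
        with lock:               # only one thread may update at a time
            counter += 1


def run(num_threads: int = 4, increments: int = 10_000) -> int:
    """Reset the counter, run the workers to completion, return the total."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(increments,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock in place every increment is applied, so the total equals threads times increments. Without it, interleaved updates could in principle be lost.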
What is a Web App Framework
Frameworks are, in short, libraries that help you develop your application with ease
What is RPO
RPO (Recovery Point Objective) is the maximum amount of data, usually measured as a window of time, that can be lost or will have to be re-entered after a system outage.
What are SAN, NAS, and DAS?
Storage area network (SAN), network-attached storage (NAS), direct-attached storage (DAS).
How does the process of translating a hostname to an IP address work?
A client types google.com into the browser, which initiates a DNS lookup. A recursive query is sent to a resolver, which then has the onus to return either an IP address or an error. The resolver, provided by the ISP or a public DNS service, submits a series of iterative queries to different name servers, starting with a root server, then the TLD server, then the authoritative server, and returns the IP address to the client. The resolver then caches the A record for faster retrieval.
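The lookup chain can be sketched as a toy simulation. This is not real DNS: the server names, the zone data, and the address 192.0.2.10 (from the documentation range) are all made up for illustration, and plain dictionaries stand in for the root, TLD, and authoritative servers.

```python
# Toy name-server data; every name and address here is hypothetical.
ROOT = {"com": "tld-com"}                              # root refers to a TLD server
TLD = {"tld-com": {"example.com": "ns-example"}}       # TLD refers to an authoritative server
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.10"}}

cache: dict[str, str] = {}                             # the resolver's cache of A records


def resolve(hostname: str) -> str:
    """Resolve hostname the way a recursive resolver would, using a cache.

    Raises KeyError for unknown names, standing in for a DNS error.
    """
    if hostname in cache:                              # answered from cache
        return cache[hostname]
    labels = hostname.split(".")
    tld_server = ROOT[labels[-1]]                          # 1. ask the root server
    auth_server = TLD[tld_server][".".join(labels[-2:])]   # 2. ask the TLD server
    address = AUTHORITATIVE[auth_server][hostname]         # 3. ask the authoritative server
    cache[hostname] = address                          # cache for faster retrieval
    return address
```

The first call to resolve("www.example.com") walks all three levels; later calls return straight from the cache.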
When you are trying to build a high-volume distributed web application, what are some of the key areas you have to be concerned with?
A strong API, statelessness, caching where possible, and redundant resources. Also heterogeneity, openness, transparency, fault tolerance, and concurrency.
What is a superblock
A superblock is a record of the characteristics of a filesystem, including its size, the block size, the empty and the filled blocks and their respective counts, the size and location of the inode tables, the disk block map and usage information, and the size of the block groups.
What is a virtual interface?
A virtual network interface (VIF) is an abstract, virtualized representation of a computer network interface that may or may not correspond directly to a network interface controller. It resides at the OS level: the OS kernel keeps track of VIFs, so communication over VIFs can be independent of physical network interfaces.
What are some common Web App Frameworks
ASP.NET, Play (Java), Angular/Ember/Meteor (JS), Play!/Lift (Scala), CakePHP, Joomla, Zend (PHP), Django (Python), Rails/Sinatra (Ruby), etc.
How would you increase I/O capacity to a server
Add a larger cache to improve read/write performance, so commonly accessed data does not need a full retrieval. Use SSDs, since HDDs are slower. Change the RAID type. Place commonly accessed data on hotter (faster) storage devices. Use an I/O scheduler like noop, which is a simple queue.
What is an inode
An inode is a data structure on a filesystem on a Unix-like operating system that stores the information about a file except its name and its actual data. It contains metadata about the file, such as its size, ownership, permissions, and the location of its data blocks on the disk.
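A short sketch showing that the name is not part of the inode: a hard link gives a second name to the same inode. It assumes a Unix-like system with hard-link support, and the file names are arbitrary.

```python
import os
import tempfile


def inode_demo() -> tuple[int, int, int]:
    """Create a file and a hard link; return both inode numbers and the link count."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")
        link = os.path.join(tmp, "hardlink.txt")
        with open(original, "w") as f:
            f.write("hello")
        os.link(original, link)              # second directory entry, same inode
        a, b = os.stat(original), os.stat(link)
        return a.st_ino, b.st_ino, a.st_nlink
```

Both names report the same st_ino, and st_nlink counts how many names point at that inode.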
In a distributed system, what is openness
An open distributed system offers services according to clearly defined rules. An open system is capable of easily interoperating with other open systems but also allows applications to be easily ported between different implementations of the same system
Tell me some ways you'd stop/start or configure an Apache webserver
Apache: /etc/init.d/apache2 reload/stop/start, service apache2 restart, etc
What is load balancing at Layer 7
Balancing happens at the application layer, so the LB can decrypt and inspect the request to be smart about where to direct it. It can offer caching and smart (content-based) balancing, and it can support a microservices architecture. The LB must hold the TLS cert, it decrypts the data, and it uses multiple TCP connections (one to the client and one to the backend).
What is load balancing at Layer 4
Balancing happens at the transport layer, so the LB has limited knowledge of the data being requested and transmitted: essentially the client's IP address and port. Only one TCP connection is made and the LB does not decrypt traffic, which makes it secure and efficient. It cannot handle a microservices architecture well and offers no caching or smart balancing.
What are some tools available in a CI workflow
Automation of the parts of software development related to building, testing, and deploying
What are the Cons of Object storage
Bad for modifiability (objects are typically rewritten whole rather than edited in place), the OS cannot access object storage like it can an attached disk, and it will be slower than attached block storage.
Can you talk to me about how you optimize your code for performance?
Benchmarking and Big O notation to analyze time and space complexity
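A minimal sketch of pairing the two: a membership test that is O(n) on a list and O(1) on average on a set, timed with timeit. The function names and sizes are hypothetical, and absolute timings will vary by machine.

```python
import timeit


def contains_list(items: list[int], target: int) -> bool:
    return target in items       # O(n): scans elements one by one


def contains_set(items: set[int], target: int) -> bool:
    return target in items       # O(1) on average: hash lookup


def benchmark(n: int = 100_000, repeats: int = 100) -> tuple[float, float]:
    """Return (list_seconds, set_seconds) for `repeats` membership tests."""
    data_list = list(range(n))
    data_set = set(data_list)
    target = n - 1               # last element: worst case for the list scan
    t_list = timeit.timeit(lambda: contains_list(data_list, target), number=repeats)
    t_set = timeit.timeit(lambda: contains_set(data_set, target), number=repeats)
    return t_list, t_set
```

Big O predicts the gap should widen as n grows; the benchmark is how you confirm it on real data.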
What is continuous integration?
CI is the practice of committing code several times per day, where it is then built and "integrated" with other developers' code so that errors and bugs are found quickly and there are fewer merge conflicts.
What techniques do you use to ensure that your code has a low bug rate and performs well under load?
CI, test-driven development or automated testing (such as unit testing and more), and hopefully some form of performance testing.
What are the Pros of Object Storage
Cheaper, good for read-heavy workloads, good metadata, can be used to create data lakes, virtually unlimited in size
You've been tasked with identifying the level of efficiency of hardware use in your datacenters, how would you approach this?
Check the performance of my resources: things like CPU utilization and energy usage. Survey end users to see how performance has been.
What is a CIDR block?
CIDR stands for Classless Inter-Domain Routing. An address is split into a network prefix and a host identifier, and a bitmask denotes where the network prefix ends and the host ID begins. Top-down approach. When written, the number after the slash represents how many leading bits in the mask are 1.
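A small worked example using Python's standard ipaddress module. The block 192.168.1.0/24 and the host address are arbitrary choices for illustration.

```python
import ipaddress

network = ipaddress.ip_network("192.168.1.0/24")   # example block

prefix_bits = network.prefixlen            # number of 1 bits in the mask
netmask = str(network.netmask)             # the same mask in dotted form
total_addresses = network.num_addresses    # 2 ** (32 - prefix_bits)
is_member = ipaddress.ip_address("192.168.1.42") in network
```

Here /24 means 24 mask bits are 1, leaving 8 host bits, so the block holds 2^8 = 256 addresses.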
What's the difference between a scripting language and a compiled language? Can you tell us the pro and cons of each? (CL)
A compiled language has a compile step in which code is translated to machine language. Pros: tends to be faster. Cons: additional time is needed to complete the entire compilation step before testing, and the generated binary code is platform dependent.
Without using caching, what are some ways to speed up delivery of assets to end users from a web server?
Compress/minify JavaScript and CSS, make fewer HTTP requests, optimize images.
Imagine you are responsible for ensuring that a particular blog post on your company's site could handle an extreme amount of traffic when it was published, how would you handle the preparation for that traffic?
Consider using an S3 static website. Use caching to lower latency and to relieve stress on the DB. Use multiple instances behind a load balancer with an auto scaling group. Use active-active replication or read replicas of the DB.
Have you ever used Infrastructure/System Monitoring tools?
Datadog, vROps, New Relic, SCOM. They can monitor resource utilization and health of resources across different platforms for a holistic view of system performance.
What is denormalization of tables
Denormalization is the process of adding precomputed redundant data to an otherwise normalized relational database to improve its read performance, for example by precomputing common joins into additional tables.
What are the Pros of Block Storage
Faster performance, better for R/W, easier to modify files
In a distributed system, what is fault tolerance
Fault tolerance is the ability of the system to keep operating when components fail. This is achieved by using recovery and redundancy. Recovery is where a component will act in a predictable, controlled way if a component it relies on fails. Redundancy is where crucial systems and processes have a backup that takes over if a system fails.
What does it mean to federate an identity?
Federated identity management relies on strong agreements. Identity providers and service providers develop an understanding of what attributes (such as your location or phone number) are representative of who you are online. Once those credentials are verified, you're authenticated across multiple platforms. Common standards: SAML (Security Assertion Markup Language), OAuth, OpenID.
What are some Version Control Systems
Git. Each person edits his or her own copy of the files and chooses when to share those changes with the rest of the team by committing and pushing changes back up to the main repo.
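You are in the process of migrating data centers. You're seeing poor throughput when transferring a lot of data from one server in one data center to another server across the country. You've got a high bandwidth and high latency link between the two. What steps could you take to improve the throughput?
I would look to see where the bottlenecks are. On a high-bandwidth, high-latency link the usual culprit is the TCP window: throughput per connection is capped at roughly window size divided by round-trip time, so I would enable window scaling and larger socket buffers, run multiple transfers in parallel, and compress the data before sending.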
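For a high-bandwidth, high-latency link, one bottleneck worth quantifying is the TCP window against the bandwidth-delay product (BDP). The sketch below is a back-of-the-envelope calculation; the 1 Gbit/s bandwidth, 80 ms round-trip time, and 64 KiB window are assumed example figures, not measurements.

```python
def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the link full."""
    return bandwidth_bps * rtt_seconds / 8


def max_throughput_bps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on a single TCP connection: one window per round trip."""
    return window_bytes * 8 / rtt_seconds


# Assumed example figures: 1 Gbit/s link, 80 ms RTT, 64 KiB window.
bdp = bandwidth_delay_product_bytes(1e9, 0.080)
capped = max_throughput_bps(65_535, 0.080)
```

With these figures the link needs about 10 MB in flight, while a 64 KiB window caps a single connection at roughly 6.5 Mbit/s, which is why a larger window or parallel streams help.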
Tell me some ways you'd stop/start or configure an IIS webserver
IIS: Start -> Run -> iisreset. UI: IIS Manager -> Actions -> Start/Stop, etc. CLI: net stop WAS, net start W3SVC.
If you were to design a distributed and scalable system, what are the key issues to consider in this design and why?
If I were designing a scalable system, I would ensure heterogeneity, openness, concurrency, fault tolerance, and transparency.
You just deployed a virtual machine in the cloud. You configure your instance to be a webserver. You know you can access the internet from the instance as you downloaded the necessary packages. However, when trying to access your webpage, you receive an error message. How would you troubleshoot?
If in the cloud and behind a load balancer, check whether the server is registered and is passing health checks. Make sure the server is listening (telnet host port). Make sure the ports are open and the firewalls are allowing connections. Make sure there is a route from the IGW to the node.
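The "is the server listening" check can be sketched as a plain TCP connect, similar in spirit to telnet host port. The function name port_is_open and the 2-second timeout are hypothetical choices.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:              # refused, timed out, unreachable, etc.
        return False
```

A refused connection usually points at the service not listening, while a timeout more often suggests a firewall or routing problem in between.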
In cryptography, what is the difference between symmetric algorithms and asymmetric algorithms?
In symmetric-key encryption the message is encrypted by using a key and the same key is used to decrypt the message, which makes it easy to use but less secure, because it requires a safe method to transfer the key from one party to another. Asymmetric encryption uses a public and private key pair. Data is encrypted with the server's public key and sent to the server, so that only the server, which holds the private key, can decrypt it.
What's the difference between a scripting language and a compiled language? Can you tell us the pro and cons of each? (SL)
An interpreted (scripting) language is executed line by line. Pros: more flexibility and platform independence. Cons: execution speed is lower.
What is iotop
iotop is a free Linux utility, similar to the top command, that provides an easy way to monitor disk I/O usage and prints a table of current I/O utilization by process or thread on the system.
What are ephemeral ports?
An ephemeral port is a temporary port used by the sender for the duration of a connection. The sender's OS picks a port from a high-numbered ephemeral range as the source port for outgoing connections.
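A quick way to see the OS hand out such a port is to bind a socket to port 0, which asks the OS to choose a free one. The exact range depends on the OS and its configuration, so the sketch makes no assumption about specific bounds beyond the port being non-privileged.

```python
import socket


def pick_ephemeral_port() -> int:
    """Bind to port 0 so the OS assigns a free port, and return its number."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))     # port 0 means "let the OS choose"
        return s.getsockname()[1]
```

The port is released when the socket closes, so a later call may return a different number.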
What is the difference between a layer 2 and a layer 3 network
Layer 2 switches only deal with MAC addresses, so no IP addresses are known. Layer 3 switches can do everything a layer 2 switch can do and also have an IP routing table. They can handle intra-VLAN communication and packet routing between VLANs.
What are other ways you can secure login access besides a password?
MFA, biometrics (Touch ID, Face ID), SSH keys
What is ioping
Measures input output latency on a disk as you would measure network latency with ping
How would you increase the size of an on prem DB
Microsoft SQL: In Object Explorer, connect to an instance of the SQL Server Database Engine, and then expand that instance. Expand Databases, right-click the database to increase, and then click Properties. In Database Properties, select the Files page. To increase the size of an existing file, increase the value in the Initial Size (MB) column for the file. You must increase the size of the database by at least 1 megabyte.
Without using caching, what are some ways to speed up delivery of assets to end users from a web server?
Minify/compress the JS, CSS, and HTML files, use fewer redirects, optimize images.
What is MPLS
Multiprotocol Label Switching (MPLS) is a routing technique in telecommunications networks that directs data from one node to the next based on labels rather than network addresses. In an MPLS network, labels are assigned to data packets. Packet-forwarding decisions are made solely on the contents of this label, without the need to examine the packet itself. This removes dependence on layer 2 of the OSI model and provides a unified data transmission service that works for packet-switching clients and circuit-based clients. It can carry many different types of traffic, such as IP packets and Ethernet frames.
In a datacenter/on-premise environment: what are the minimum requirements for making a website available on the internet?
Need a server that will serve the data. Need any related storage and/or DB to be up and running and connected to the server. Need to install code, dependencies, frameworks, etc. and configs on the server. Need a public IP and a domain name. Configure the server to allow HTTP connections. Allow inbound traffic on firewalls for access from the public internet.
Tell me a few ways passwords or credentials can be compromised, and how you would prevent it.
Phishing, social engineering, malware, dictionary attacks (using a list of commonly used passwords as a bank), and brute-force attacks.
Can you talk about any layers of the OSI model?
Physical, data link, network, transport, session, presentation, application
Can you talk about the importance of internationalization or localization when delivering content to your end users?
Placing the data closer to end users improves user experience by lowering latency. It can also ensure you are abiding by any laws governing data in a country.
What is Readahead:
Readahead is a system call of the Linux kernel that loads a file's contents into the page cache. This prefetches the file so that when it is subsequently accessed, its contents are read from the main memory (RAM) rather than from a hard disk drive (HDD), resulting in lower file access latency.
What is RTO
Recovery Time Objective (RTO) is the duration of time and the service level within which a business process must be restored after a disaster in order to avoid unacceptable consequences associated with a break in continuity. In short: how much time can system recovery take after a notification of failure?
How are certificates used to validate the authenticity of a server?
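During the SSL/TLS handshake the server presents its certificate. The client checks that the certificate is signed by a certificate authority it trusts, that it has not expired, and that it matches the hostname being requested; the server then proves it holds the matching private key.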
Imagine your employees are spread worldwide. What methods might you employ to make sure only authorized employees could get content from your CDN?
Signed URLs and signed cookies. With a CDN such as CloudFront, you can help protect your content by requiring signed URLs or signed cookies, so only requests carrying a valid signature are served.
What is CI/CD, Explain its benefits on how development changes get moved into production
CI/CD solves the problem of "integration hell", which arises when hundreds of contributors are working on a project. It uses automation for things like testing, merging, and conflict detection to properly manage changes to enterprise-level projects.
What is Block Storage
A storage model that builds on contiguous sets of bytes called blocks, held on volumes that are attached to servers. It typically has a max file size and is accessed via network protocols or direct attachment.
What is Object storage
Data is stored as objects, each with a unique identifier. Examples: Amazon S3, Azure Blob Storage.
What is a write buffer
Stores evicted data from the L1 cache in a buffer to perform faster loads from read misses
Can you think of an example of when you'd need to pass query-string parameters to a CDN?
Suppose that your website is available in five languages. The directory structure and file names for all five versions of the website are identical. As a user views your website, requests that are forwarded to CloudFront include a language query string parameter based on the language that the user chose. You can configure CloudFront to forward query strings to the origin and to cache based on the language parameter. If you configure your web server to return the version of a given page that corresponds with the selected language, CloudFront caches each language version separately, based on the value of the language query string parameter.
What is the purpose of garbage collection and what are some considerations for this technique?
The biggest benefit of Java garbage collection is that it automatically handles the deletion of unused objects or objects that are out of reach to free up vital memory resources.
Could you explain what "Tiered Storage" is and how it would be used?
Tiered storage is a method for assigning different categories of data to various types of storage media to reduce overall storage costs and improve the performance and availability of mission-critical applications. In short, it matches different data requirements to different types of storage.
In a distributed system, what is transparency
Transparency in a distributed system refers to the idea that the user perceives that they are interacting with a single whole rather than a collection of cooperating components. Transparency can be split into eight sub-characteristics: access, location, concurrency, replication, failure, mobility, performance, and scaling.
You are designing an application that is going to deliver real-time media. Would you choose TCP or UDP for your transport layer? Why?
UDP, because it tolerates loss. When delivering real-time media (a TV show, movie, or sports game), dropping to a lower definition while still delivering enough media to keep up with the stream is better than high latency due to handshakes and retransmission delays. UDP uses a checksum, is connectionless, and supports multicast. TCP treats communication as an ordered stream of bytes. It requires a handshake and as a result is slower to start transmitting, it occasionally needs retransmission or reordering, and it is "heavy" because it needs to set up a socket connection and close it.
Describe the difference between unicast and multicast. Can you explain when you would use unicast vs multicast transmission?
Unicast is a one-to-one mapping of device to device for transmitting data, e.g. web surfing or file transfer. Scaling unicast to many receivers is more costly on bandwidth. Multicast is a one-to-many mapping of devices and is comparatively more efficient with bandwidth, e.g. IPTV or a live video stream sent to many subscribers.
How would you help a customer with short life span content that needs to be massively distributed globally?
Use a CDN like CloudFront
How would you calculate RPO
Use factors such as Max data loss allowed before catastrophic fail, Cost of data loss, Industry specific requirements
Can you explain to me the concept of containerization?
Using lightweight, VM-like isolated units to package code, configs, and dependencies all in one object to ensure reliability and consistency in deployment. Containers provide multitenancy on a single OS via software (OS-level) virtualization; Docker is an example. Useful when you want to add a new service or feature to an existing application with minimal effect on the system as a whole: you can isolate a process as a microservice and attach the necessary database, all serving a single function.
Can you explain to me the concept of containerization?
Using software virtualization to allow multitenancy on one server's OS in the form of smaller "VMs" (e.g. Docker). All the code, configs, and dependencies are packed into one object. Why use it? It gives consistency and reliability when deploying apps in different computing environments. How would you leverage it? In a microservice architecture, and for better testing and deployment, since devs can spin up test environments on their own PCs.
What is RAID
Redundant Array of Inexpensive (or Independent) Disks. A storage virtualization technique that combines multiple physical drives into one or more logical units. Levels 1 through 6 differ in how much mirroring or parity they use for the sake of redundancy and availability. RAID 6 can handle two failed drives; RAID 1 is the more performant of the two.
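A minimal sketch of how single parity lets an array survive one failed drive, using XOR as in RAID 5-style parity. The block contents and the four-drive layout are hypothetical; real arrays stripe and rotate parity across drives.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]   # hypothetical data blocks on three drives
parity = xor_blocks(data)            # parity block stored on a fourth drive

# Simulate losing the second drive and rebuilding it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Because XOR is its own inverse, XORing the surviving blocks with the parity block yields the missing block.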
What is iostat
iostat is a Linux system monitoring tool used to collect and show operating system storage input and output statistics. It looks at how long devices are active in relation to their average transfer rates. You can use it to compare I/O performance between disks and make changes to balance the load.
What are three different network routing protocols?
Link-state routing (e.g. OSPF), distance-vector routing (e.g. RIP), exterior gateway protocols (e.g. BGP)
In a distributed system, what is Heterogeneity
Heterogeneity refers to the ability of the system to operate on a variety of different hardware and software components. This is achieved through the implementation of middleware in the software layer. The goal of the middleware is to abstract and interpret the procedure calls such that distributed processing can be achieved on a variety of differing nodes.