Cloud Computing
Argue cloud security is better than enterprise
1) dedicated security experts 2) regular auditing 3) better support for data recovery 4) physical security (amazon building security) 5) amortized hardware 6) Patched OS's --> up to date 7) Blessed OS's --> verified 8) No public cloud security breaches known to the public yet 9) no possibility of disgruntled enterprise employees
Argue enterprise security is better than cloud
1) only my humans, no amazon employees 2) I know where my data is 3) more specific design (maybe you don't need something too sophisticated depending on nature of data) 4) faster, more specific reaction to breach 5) smaller attack surface 6) no wire attacks, no "in-flight" attacks
Why virtualization?
1) reduce capital expenditures (like buying hardware) and OpEx (operational expenditures). 2) Avoid downtime with VM relocation. 3) dynamically re-balance workload to guarantee application SLAs. 4) enforce security policy
MAC address
A MAC address is given to a network adapter when it is manufactured. It is hardwired or hard-coded onto your computer's network interface card (NIC) and is unique to it. The ARP (Address Resolution Protocol) translates an IP address into a MAC address.
Elastic beanstalk
A PaaS that we used in PA2 to run Docker containers. Takes care of provisioning ECS for you.
containers
An isolated system. Multiple containers are run on a single control host and access a single kernel. Because containers share the same OS kernel as the host, containers can be more efficient than VMs, which require separate OS instances. Containers hold the components necessary to run the desired software. The host OS constrains each container's access to physical resources (CPU, memory) so that a single container cannot consume all of them.
EFS (elastic file system)
A file system that can be mounted on multiple VMs at the same time. Pro: Remote access; well-known interface (NFS: network file system). Con: Lack of structure; NFS scalability
DynamoDB
A NoSQL database, which gives up some benefits of querying but gains scalability. Pro: scalability and performance. Con: Poor analytics/aggregates; does not have a well-known interface; may be difficult to see everything in its entirety (can look at one entry really easily, but hard to get an overview)
public IP
A public (or external) IP address is the one that your ISP (Internet Service Provider) provides to identify the network to the outside world. It is an IP address that is unique throughout the entire Internet. It can be an IP address that never changes (a fixed/static IP address), but most ISPs provide an IP address that can change (a dynamic IP address). A machine may or may not have a public IP, but it always has a private IP.
Script vs. program
A script is interpreted at runtime; a program is compiled. A script gives rapid prototyping and allows for dynamic typing.
AWS security group
A security group acts as a virtual firewall that controls the traffic to and from virtual instances
Protocol
A sequence of well defined messages. Ex: HTTP
sockets
A socket is one endpoint of a two-way communication link between two programs running on the network. A socket is bound to a port number so that the TCP layer can identify the application that data is destined to be sent to. An endpoint is a combination of an IP address and a port number.
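A minimal sketch of the socket idea above, using Python's standard `socket` module on the loopback interface. The helper names `run_echo_once` and `echo_roundtrip` are made up for illustration; binding to port 0 simply asks the OS for any free port.

```python
import socket
import threading


def run_echo_once(server_sock):
    """Accept one connection and echo back whatever the client sends."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def echo_roundtrip(message: bytes) -> bytes:
    """Send a message to a local echo server and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 lets the OS choose a free port; the socket is now bound to it.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        host, port = server.getsockname()  # the endpoint: IP address + port

        worker = threading.Thread(target=run_echo_once, args=(server,))
        worker.start()

        # The client socket is the other endpoint of the two-way link.
        with socket.create_connection((host, port)) as client:
            client.sendall(message)
            reply = client.recv(1024)

        worker.join()
        return reply
```

Calling `echo_roundtrip(b"hello")` should hand back the same bytes, since the server side only echoes.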
Monolithic
A software system is called "monolithic" if it has a monolithic architecture, in which functionally distinguishable aspects (for example data input and output, data processing, error handling, and the user interface) are all interwoven, rather than containing architecturally separate components.
Synchronous vs. asynchronous
A synchronous operation blocks a process till the operation completes. An asynchronous operation is non-blocking and only initiates the operation
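A small sketch of the difference, assuming Python's `time.sleep` as a stand-in for a blocking operation and `asyncio.sleep` as a stand-in for a non-blocking one. All function names here are illustrative.

```python
import asyncio
import time


def blocking_op(seconds):
    """Synchronous: the caller waits here until the operation completes."""
    time.sleep(seconds)
    return seconds


async def nonblocking_op(seconds):
    """Asynchronous: awaiting yields control so other work can proceed."""
    await asyncio.sleep(seconds)
    return seconds


def run_synchronously(delays):
    # Each call blocks, so the total time is roughly the sum of the delays.
    return [blocking_op(d) for d in delays]


async def run_asynchronously(delays):
    # Each task is only initiated here; they then wait concurrently,
    # so the total time is roughly the longest single delay.
    tasks = [asyncio.create_task(nonblocking_op(d)) for d in delays]
    return await asyncio.gather(*tasks)
```

Usage: `run_synchronously([0.1, 0.1])` versus `asyncio.run(run_asynchronously([0.1, 0.1]))`. Both return the same results, but the asynchronous version overlaps the waiting.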
VPN
A virtual private network (VPN) extends a private network across a public network, and enables users to send and receive data across shared or public networks as if their computing devices were directly connected to the private network.
Lambda functions
AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you. Functions must have an endpoint and some triggering event. Used in PA3 and PA6.
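A rough sketch of what a Python Lambda handler can look like. The `(event, context)` signature is the usual Python handler shape; the event layout below assumes an API Gateway proxy-style trigger that passes the HTTP body as a string under `"body"`, and the `"name"` field and greeting are invented for illustration.

```python
import json


def lambda_handler(event, context):
    """Handle one triggering event and return an HTTP-style response."""
    # Assumption: the trigger delivers a JSON string under "body" (may be absent).
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")  # hypothetical field, for illustration

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

The handler can be exercised locally by calling it with a hand-built event, e.g. `lambda_handler({"body": '{"name": "PA3"}'}, None)`.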
S3
AWS blob storage. Pro: simple, good for large data. Con: poor analytics/aggregates, no structure.
horizontal scaling (scale out)
Add more machines to ensure reasonable performance as traffic increases (what most cloud systems do)
Xen
Allows multiple operating systems to execute on the same computer hardware concurrently
AMI
Amazon machine image. PRO: faster boot (than DevOps script). Con: less transparent than DevOps script
VPC
Amazon virtual private cloud......
RDS
Amazon's DB services: Relational database as a service. Pro: queries, Con: not super scalable.
Amdahl's law
Amdahl's law states that in parallelization, if P is the proportion of a system or program that can be made parallel, and 1-P is the proportion that remains serial, then the maximum speedup that can be achieved using N processors is 1/((1-P)+(P/N)). As N tends to infinity, the maximum speedup tends to 1/(1-P).
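A worked sketch of the formula, with P as the parallel fraction and N as the processor count. The function names are illustrative.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Maximum speedup with parallel fraction p on n processors."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p: float) -> float:
    """Upper bound on speedup as the number of processors grows without limit."""
    if p == 1.0:
        return float("inf")  # fully parallel: no serial part to bound the speedup
    return 1.0 / (1.0 - p)
```

For example, with P = 0.9 and N = 10 the speedup is 1/(0.1 + 0.09), a little over 5x, and no number of processors can push it past 10x.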
EBS (elastic block storage)
An external drive. A single device attached to a single VM. Pro: not intra-VM. Con: only one VM.
ports
An identifier to a particular subsystem on a machine --> a gateway
sandbox
An isolated computing environment in which a program or file can be executed without affecting the application in which it runs. Sandboxes are used by developers to test. Sandboxes cannot: write to the filesystem, open a socket and access another host directly, spawn a subprocess or thread, or make other system calls. An example of a PaaS.
Heroku
Another example of a PaaS. Hosted and managed within AWS. Bought by salesforce.
API
Application programming interface: a collection of function signatures
Why choose one cloud storage option over another?
Availability, scalability, security, interface, speed, cost, preference for latency or bandwidth?
typical scripting languages
Bash, Javascript, PHP, Perl, Python
Availability
Can I get the data when I want it?
Import/Export snowball
Cloud storage: Amazon service for moving terabytes of data into the cloud. You can ship (e.g. UPS) the data on a physical device and Amazon will upload it. AKA sneaker net AKA fedexnet
DLL
Collection of code that is the body of the API. When an app is booted, the app is combined w/ all the other libraries it uses at runtime
DevOps
Combining the roles of software developers and IT professionals; automating the process of software delivery and infrastructure changes. Docker is an example of a DevOps tool.
CLI
Command line interface
declarative vs. procedural
Declarative = what, procedural = how
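A tiny illustration of the contrast, using plain Python rather than any DevOps tool. The first function spells out how to build the result step by step; the second leans toward the declarative style by stating what the result contains. Both function names are made up.

```python
def squares_of_evens_procedural(numbers):
    """Procedural: describe HOW, one step at a time."""
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n * n)
    return result


def squares_of_evens_declarative(numbers):
    """Closer to declarative: describe WHAT the result is."""
    return [n * n for n in numbers if n % 2 == 0]
```

Both return the same list for the same input; only the way the intent is expressed differs.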
Load balancer
Determines what web server to go to. Could be stateful or stateless, depending on architecture (stateful if it stores info about previous server traffic). CON: single point of failure.
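One simple way a load balancer might choose a server is round-robin. This is only a sketch of that one policy, not how ELB or any specific product works; the class name and server labels are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, one per request."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        # The only thing remembered is the position in the rotation;
        # nothing is stored about individual clients or past requests.
        self._rotation = itertools.cycle(servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._rotation)
```

Usage: `lb = RoundRobinBalancer(["web-1", "web-2"])`, then call `lb.pick()` once per incoming request.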
Docker
Docker is an open-source program that enables a Linux application and its dependencies to be packaged as a container. Makes containers portable amongst Linux systems. Recently expanded to support Windows containers. Capabilities: packaging --> creating images with dependencies, execution/scheduling, versioning/deployment
Durability
Does the data persist (in bad times)?
OS-level virtualization
Dynamically create/destroy containers that are "bigger" than processes but smaller than "virtual machines". PRO: lightweight (fast creation/destruction, little overhead switching btw. instances, no emulation), good isolation (security, resource usage). CON: generally runs "the same OS" as the host machine (i.e. you cannot run windows on linux via OS-level virtualization).
AutoScaling
Dynamically scaling and descaling. Takes time: condition must exist for some time, cloudwatch takes time --> separate infrastructure, autoscale is a service --> takes time, VMs take time to boot, ELB (elastic load balancer) takes time to determine liveness
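A sketch of the "condition must exist for some time" idea: scale out only when a metric stays above a threshold for several consecutive measurement periods. The threshold, period count, and function name are assumptions for illustration, not CloudWatch defaults.

```python
def should_scale_out(cpu_samples, threshold=70.0, sustained_periods=3):
    """Return True if the most recent samples all exceed the threshold.

    cpu_samples: CPU utilization percentages, oldest first, one per period.
    """
    if len(cpu_samples) < sustained_periods:
        return False  # not enough history yet to call the condition sustained
    recent = cpu_samples[-sustained_periods:]
    return all(sample > threshold for sample in recent)
```

A single spike does not trigger scaling, which is one reason autoscaling reacts with a delay.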
ECS
EC2 container service. Elastic beanstalk managed ECS for us in PA2. You can use ECS directly if you want more control. Steeper learning curve than beanstalk. Supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances.
ELB
Elastic load balancer. Determines what web server to go to.
EULA
End-user license agreement: what the customer can/cannot do with the product, and ramifications of violations.
Why would you pick one OS over another?
File system architecture (speed or reliability), it's what you used last time, cost, efficiency (resource management, boot time, minimal operating resources), U.I., security, dev tools
Full virtualization vs. paravirtualization
Full: Unmodified OS/application code. Performance hit because of hypervisor mediation. x86 architecture problems. Full is slow! Para: OS cooperates with hypervisor. OS code must be modified for this cooperation. Xen uses paravirtualization.
Least privilege
Giving access only to the bare minimum of information and resources necessary to complete the legitimate purpose
google app engine
Good for running web apps. Client interacts over http(s). Supports Java, Python, Go, PHP. Simple app configuration. Scalability is no longer the dev's concern.
IAM
Identity and access management.
AWS region
Independent geographic location. Each region consists of multiple availability zones.
IaaS
Infrastructure as a service. Maintain complete control of software, but don't want to maintain the hardware. IaaS providers provide the VM, and you can put whatever software you like on it. Ex: EC2
IDE
Integrated development environment: some type of built-in debugging
IIS
Internet Information Services (IIS) is a flexible, general-purpose web server from Microsoft that runs on Windows systems to serve requested HTML pages or files. Used this in PA1.
latency
Latency is the amount of time a message takes to traverse a system. In a computer network, it is an expression of how much time it takes for a packet of data to get from one designated point to another. It is sometimes measured as the time required for a packet to be returned to its sender. Analogous to length of pipe.
LXC
Linux container: looks like a VM, but has the speed of a process. Make each process have its own filesystem
DNS
Maps names (strings) to IP addresses. Amazon has a DNS server that resolves an instance's hostname to its IP address. Does not assign IP addresses (that's DHCP).
Microservices
Microservices architecture is an approach to application development in which a large application is built as a suite of modular services. Each module supports a specific business goal and uses a simple, well-defined interface to communicate with other sets of services. Containers and microservices are a good fit --> scale individual microservices, roll out new version of a microservice
AWS availability zone
Multiple availability zones per region. Isolated zones, but connected via low latency links to the other zones in its region.
Defense in Depth
Multiple layers of security are placed throughout a system
NLP
Natural language processing
NIC
Network interface card: when the machine comes on, this thing has a logical/physical sequence of bits used as identification (the MAC address)
force.com
PaaS for business apps by salesforce
PaaS
Platform as a service. Provides application developers with tools to develop for that particular platform. Not a blank slate like IaaS, but not a finished product like SaaS. Write/run/debug in local emulation environment, then give to platform for deployment. No concept of SSH here: use browser to control the application. Ex: Microsoft Windows Azure. Pro: simplifies work if you do things the way the platform wants you to. Con: more app logic to learn, can cause vendor lock-in.
Intra-VM cloud storage
Pro: It works; fast development and prototyping. Con: VM death = data death! Single point of failure. Can't scale out, only scale up.
Database schema (pros & cons)
Pro: keeps data homogeneous. Con: conforming to a schema means you require data for each field, and you may not know or care about a specific field for an entry. Requires design. Difficult to add attributes halfway through.
HTTP(S)
Protocol for sending webpages. A means of communication between server and client. 4 common HTTP verbs: POST, PUT, GET, DELETE. HTTP over SSL/TLS is HTTPS.
programming assignment 1 (health plans)
Purpose: gain experience setting up an IaaS cloud application and using rds. Created a program client to hit our site to measure latency before scaling. Used RDP to get on windows VM, because windows VM doesn't support SSH.
Cloud storage options
RDS (relational database as a service), BLOB storage (unstructured, ex: S3), Block device (disk abstraction, single device attached to a single VM), EFS-NFS, DynamoDB (noSQL)
SSH
Secure SHell. Program designed to allow users to log into another computer over a network, to execute commands on that computer and to move files to and from that computer. No certificates. Involves public key cryptography.
SSL
Secure socket layer. Protocol by which to secure websites. Server authentication, optional client authentication. Involves public key cryptography.
"Serverless"
Serverless computing, AKA function as a service (FaaS), is a cloud computing code execution model in which the cloud provider fully manages starting and stopping a function's container, PaaS-style, as necessary to serve requests. Requests are billed by an abstract measure of the resources required to satisfy the request, rather than per VM per hour (never pay for idle time). Does not actually involve running code without servers. Called "serverless computing" b/c the person that owns the system does not have to purchase, rent or provision servers or VMs for the back-end code to run on.
SOA
Service oriented architecture. Good: incremental update (versioning), speed to market/customers. Bad: communication cost, requires discovery.
SaaS
Software as a service. Uses the web to deliver applications managed by a third-party vendor. Interface is accessed on the client's side. Eliminates the need to install and run applications on individual machines. Examples: gmail, salesforce
SDK
Software development kit: The extra "stuff" other than code itself that one might use in an IDE. Ex: the documentation associated with each function
Stateless
Stateless: a stateless protocol does not require the server to retain information
Why would a restarted VM have the same IP address/DNS name?
The IP lease is not yet up. (Starting/stopping the VM does not automatically generate a new IP address.)
Why would a restarted VM have a different IP address/DNS name?
The IP lease is up
TCP
Transmission control protocol. TCP enables two hosts to establish a connection and exchange streams of data. TCP guarantees delivery of data and also guarantees that packets will be delivered in the same order in which they were sent.
vertical scaling (scale up)
Use a better machine to ensure reasonable performance as traffic increases. CON: single point of failure.
Programming assignment 2 (auto-grader)
Used the PaaS elastic beanstalk to run a Docker container to implement an autograder system.
programming assignment 3 (slack)
Using AWS lambda functions to create a slack chatbot. API gateway is the function's endpoint
sudo
a Linux program that allows users to run programs with the security privileges of another user, by default the superuser
DevOps concerns/issues
expressiveness/complexity of the language, efficiency/speed of language implementation, sub-changes to a deployed infrastructure, offline error checking, tooling/ease of use, security
private IP
Local devices see another device on the same network via its private IP address. However, devices residing outside of your local network cannot communicate directly via the private IP address; they use your router's public IP address instead. If you ask a machine "who am I", it only knows its private IP.
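A quick way to check whether an address falls in a private range, using Python's standard `ipaddress` module. The function name and the sample addresses are just illustrations.

```python
import ipaddress


def is_private(address: str) -> bool:
    """Return True if the address belongs to a private (non-routable) range."""
    return ipaddress.ip_address(address).is_private
```

Addresses such as `10.0.0.5` or `192.168.1.20` are private, while a public resolver address such as `8.8.8.8` is not.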
Glacier
Long-term archival storage (reportedly on Blu-ray). Pro: super super cheap. Con: slow (4 hour latency)
MFA
Multi-factor authentication: 1) who you are (e.g. fingerprint), 2) what you know (e.g. security questions or passwords), 3) what you have (e.g. a phone or hardware token)
SLA
Service-level agreement: what does the customer get. Public cloud SLAs are "lousy". Amazon promises a monthly uptime percentage of 99.95%. If they violate it, they give post-facto service credit (not a refund).
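To make the 99.95 figure concrete, here is a small calculation of how much downtime such a promise still allows. It assumes a 30-day month by default; the function name is illustrative.

```python
def allowed_downtime_minutes(uptime_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime per month that still satisfy the uptime promise."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1.0 - uptime_percent / 100.0)
```

With 99.95 and a 30-day month this comes to about 21.6 minutes of permitted downtime.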
bandwidth
the amount of data that can be transmitted in a fixed amount of time. Analogous to width of pipe.
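A simplified model of how latency (length of pipe) and bandwidth (width of pipe) combine: the time to deliver a message is roughly the one-way latency plus the size divided by the bandwidth. This ignores protocol overhead and congestion; the function and parameter names are illustrative.

```python
def transfer_time_seconds(size_bytes: float,
                          bandwidth_bytes_per_s: float,
                          latency_s: float) -> float:
    """Estimate delivery time as latency plus size / bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + size_bytes / bandwidth_bytes_per_s
```

For small messages the latency term dominates; for large transfers the bandwidth term dominates.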
Virtualization
Virtualization is software that abstracts physical infrastructure to create various dedicated resources. It is the fundamental technology that powers cloud computing.