Cisco DevNet Associate -- Infrastructure and Automation
Stateless / No state to store
This app requires only atomic/synchronous interactions between client and server: each request from client to server returns a result wholly independent of prior and subsequent requests. An example of this application is a public web server that returns an HTML page, image, or other data on request from a browser. The application can be scaled by duplicating servers and data behind a simple load balancer.
what is declarative automation
creating a declarative, static model that represents the desired end product. This model is used by middleware that incorporates deployment-specific details, examines present circumstances, and brings real infrastructure into alignment with the model, via the least disruptive, and usually least time-consuming path.
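A minimal sketch of the idea in Python, assuming a hypothetical model that maps service names to replica counts. The function name `reconcile` and the action tuples are illustrative only and do not come from any real tool: the model states what should exist, and the function works out the smallest set of changes needed to get there.

```python
def reconcile(desired, actual):
    """Return the actions needed to make `actual` match the `desired` model."""
    actions = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, 0)
        if have < want:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    # Anything running that the model does not mention is removed.
    for name, have in sorted(actual.items()):
        if name not in desired and have > 0:
            actions.append(("stop", name, have))
    return actions


# Hypothetical model and observed state.
desired = {"web": 3, "db": 1}
actual = {"web": 1, "db": 1, "cache": 2}

plan = reconcile(desired, actual)
print(plan)
```

When the actual state already matches the model, the plan is empty, which is the "least disruptive path" in its simplest form.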
automation mitigation - self-heal
allocate resources according to policy and automatically redeploy failed components as needed to return the application to a healthy state in current conditions.
how to run an Ansible playbook
ansible-playbook -i <inventory file> <yaml_file>.yml
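As a rough illustration, the same command can be assembled from Python before handing it to a process runner. The helper name `playbook_command` and the file names `inventory.ini` and `site.yml` are assumptions made up for this sketch.

```python
import shlex


def playbook_command(inventory, playbook):
    """Build the argument list for ansible-playbook with an explicit inventory."""
    return ["ansible-playbook", "-i", inventory, playbook]


# Hypothetical file names, for illustration only.
cmd = playbook_command("inventory.ini", "site.yml")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where Ansible is installed.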
Idempotence
is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application.
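A small Python sketch of the property, using two made-up functions: `normalize` is idempotent because applying it a second time leaves the result unchanged, while `append_suffix` is not.

```python
def normalize(hostname):
    """Strip whitespace and lower-case a hostname; repeating it changes nothing."""
    return hostname.strip().lower()


def append_suffix(hostname):
    """Not idempotent: every call changes the result again."""
    return hostname + ".local"


once = normalize("  Router1.Example.COM ")
twice = normalize(once)
print(once == twice)

print(append_suffix("r1") == append_suffix(append_suffix("r1")))
```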
GitOps
operations by pull request
What is Immutability
the state of being unchangeable, but in DevOps parlance, it refers to maintaining systems entirely as code, performing no manual operations on them at all
self-service
Cloud resources can be available within hours or minutes of needing them, speeding all phases of development and enabling rapid scaling of production capacity. Applications can be scaled in the public cloud region, service set, or provider that is most cost-effective.
what are ansible role folders and files
Each role folder tree aggregates resources that collectively enable a phase of detailed configuration. A role folder contains a /tasks folder with a main.yml tasks file. It also contains a folder of asynchronous handler task files. For more information about roles, refer to "Roles" in the Ansible documentation.
benefits of cloud paradigms
Self-service ("platforms on demand"); close specification, consistency, repeatability; platform abstraction
Challenges of Cloud paradigm - Great power, great responsibility
Access control is critical, because cloud users with the wrong permissions can do a lot of damage to their organization's assets. Cloud permissions can be also challenging to manage, particularly in manually operated scenarios.
why need automation - financial cost
An often-quoted Gartner statistic (albeit from 2014) places the average cost of an IT outage at upwards of $5,600 per minute, or over $300,000 USD per hour. The cost of a security breach can be even greater: in the worst cases, representing an existential threat to human life, property, business reputation, and/or organizational survival.
What is VIRL
Cisco Virtual Internet Routing Laboratory, a commercial product originally developed for internal use at Cisco, with broad and active community support. Now in version 2, VIRL can run on bare metal or on large virtual machines on several hypervisor platforms (ESXi and VMware Workstation 12+ among them).
Goal of modern automation tools
Compilation of variable definitions. Server inventory as structured data and other details separate from generic code. Orderly means to inject variable values into code, config file templates, and other destinations at runtime.
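A minimal sketch of this separation in Python. Ansible itself uses Jinja2 templates; the standard-library `string.Template` is used here only as a stand-in, and the host names, interface names, and addresses are invented for illustration.

```python
from string import Template

# Generic, reusable template: contains no host-specific values.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " ip address $address $mask\n"
)

# Inventory kept as structured data, separate from the template (made-up values).
inventory = {
    "router1": {"name": "GigabitEthernet0/1", "description": "uplink",
                "address": "192.0.2.1", "mask": "255.255.255.0"},
    "router2": {"name": "GigabitEthernet0/2", "description": "lab",
                "address": "198.51.100.1", "mask": "255.255.255.0"},
}


def render(host):
    """Inject one host's variable values into the generic template."""
    return INTERFACE_TEMPLATE.substitute(inventory[host])


print(render("router1"))
```

Adding a host means adding data, not editing the template.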
disadvantages of manual operation
Manual processes — such as waiting for infrastructure availability, manual app configuration and deployment, and production system maintenance — work against speed and scale, keeping your team from delivering new capabilities to colleagues and customers. Manual processes are always subject to human error, and documentation meant for humans is often incomplete and ambiguous, hard to test, and quickly outdated. This makes it difficult to encode and leverage hard-won knowledge about known-good configurations and best practices across large organizations and their disparate infrastructures.
What is SLI
Service level indicator
What is SLO
Service level objective, uses SLIs as the metric
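A short Python sketch of the relationship, assuming a request-success ratio as the SLI and a made-up target of 99.9% as the SLO. The function names and numbers are illustrative only.

```python
def availability_sli(successful, total):
    """SLI: fraction of requests that succeeded (1.0 when there was no traffic)."""
    if total == 0:
        return 1.0
    return successful / total


def meets_slo(sli, slo_target):
    """SLO check: the measured indicator must reach the agreed objective."""
    return sli >= slo_target


# Hypothetical measurement window.
sli = availability_sli(successful=99_950, total=100_000)
print(f"SLI = {sli:.4f}, SLO met: {meets_slo(sli, 0.999)}")
```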
Defining moments of Devops
Site Reliability Engineering (SRE); Debois and "Agile Infrastructure"; Allspaw and Hammond
Challenges of Cloud Paradigm
So you thought application coding was tricky; great power, great responsibility; cost as a metric
what are ansible variable files
These files describe variable values pertinent to groups of hosts and individual hosts.
Chaos Engineering
This philosophy is based on the assertion that failure is normal: as applications scale, some parts are always failing, so apps and platforms should be engineered to: minimize the "blast radius" of issues; self-heal; monitor events.
3 major shifts in automation
Walk - get visibility and insight into your network; Run - activate policy and intent across different network domains; Fly - proactively manage applications, users, and devices with DevOps workflows
Run
With these "Run stage" automation scenarios you can safely enable users to provision their own network updates. You can also automate on-boarding workflows, manage day-to-day network configurations, and run through Day 0, Day 1, and daily (Day n) scenarios
What are ansible main playbook files
Written in YAML, these files may reference one another or lower-level roles.
what does the -i flag do when running an Ansible playbook
lets you specify which inventory file to use
What are roles in ansible
low-level playbook task sequences
automation mitigation - minimize the blast radius of issues
recognize problems quickly and route traffic to alternative capacity, ensuring that end users aren't severely impacted and that on-call operations personnel aren't unnecessarily paged.
automation mitigation - monitor events
remember everything that led to the incident, so that fixes can be scheduled and post-mortems can be performed
What is a Service Level Indicator (SLI)
SLIs are engineered to map to the practical reality of delivering a service to customers: they may represent a single threshold or provide more sophisticated bracketing to further classify outlier results.
Benefits of microservices
Scalability: Microservices can be scaled and load-balanced as needed across many networked servers or even multiple, geographically-separate datacenters or public cloud regions, eliminating single points of failure, and providing as much of each task-specific flavor of capacity as conditions demand, wherever it's needed, eliminating bottlenecks. Infrastructure automation tools: Increasingly, the dynamism of microservice-based applications is provided by infrastructure: container "orchestrators" like Kubernetes or Mesos, which automate on-demand scaling, self-healing, and more.
features of pyATS
The pyATS framework and libraries can be leveraged within any Python code. It is modular and includes components such as: AEtest, which executes the test scripts. Easypy, the runtime engine that enables parallel execution of multiple scripts, collects logs in one place, and provides a central point from which to inject changes to the topology under test. A CLI that enables rapid interrogation of live networks and extraction of facts, and helps automate running of test scripts and other forensics. This enables very rapid 'no-code' debugging and correction of issues in network topologies created and maintained using these tools.
What is Provisioning
refers to obtaining compute, storage, and network infrastructure (real or virtual), enabling communications, putting it into service, and making it ready for use by operators and developers (e.g., by installing an operating system, machine-level metrics, ssh keys, and the lowest level of operations tooling).
Stateful / State stored on server
A record of user state must be maintained across a series of transactions. An example of this application is a website that requires authentication: the app isn't allowed to serve pages to a user who is not logged in. User state is typically persisted by giving the client an identifying cookie that is returned to the server with each new request and used to match an ID stored there. This application can't be scaled just by adding servers: if a logged-in user is routed to a server that hasn't stored an ID matching the user's cookie, that server won't recognize them as being logged in, and will refuse their request.
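The scaling problem can be sketched in a few lines of Python. The class name `StatefulServer` and the cookie value are hypothetical; each server instance keeps its session IDs only in its own memory.

```python
class StatefulServer:
    """Keeps logged-in session IDs in its own memory only."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def login(self, session_id):
        self.sessions.add(session_id)

    def handle(self, session_id):
        # Serve the page only if this particular server knows the session.
        return "200 OK" if session_id in self.sessions else "401 Unauthorized"


server_a = StatefulServer("a")
server_b = StatefulServer("b")

server_a.login("cookie-123")          # the user logs in on server A
print(server_a.handle("cookie-123"))  # same server: recognized
print(server_b.handle("cookie-123"))  # different server: refused
```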
Core Principles of Devops/SRE
A relentless focus on automation. The idea that "failure is normal". A reframing of "availability" in terms of what a business can tolerate.
Close specification, consistency, repeatability
Developers can capture and standardize unique configurations, maintaining configurational consistency of platforms through development, testing, staging, and production. Deploying a known-good application and configuration prevents bugs that can be introduced during manual platform configuration changes.
Challenges of Cloud Paradigm - So you thought application coding was tricky
Developers must pay close attention to platform design, architecture, and security, and cloud environments make new demands on applications. Public or private cloud frameworks have varying UIs, APIs, and quirks, meaning users can't always treat cloud resources as the commodities they really should be — especially when trying to manage clouds manually.
What are ansible inventory files
Also called hostfiles. These organize your inventory of resources (e.g., servers) under management. This enables you to aim deployments at a sequence of environments: e.g., dev, test, staging, production. For more information about inventory files, refer to "How to build your inventory" in the Ansible documentation.
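To make the grouping concrete, here is a rough Python sketch that reads a tiny INI-style inventory into groups. It uses the standard-library `configparser` purely as an illustration and handles only bare host names; real Ansible inventories also support host variables, ranges, and nested groups, which this sketch does not parse. The host and group names are invented.

```python
import configparser

# Hypothetical inventory: group names in brackets, one host per line.
INVENTORY_TEXT = """
[dev]
dev-web1.example.com

[production]
prod-web1.example.com
prod-web2.example.com
"""


def hosts_by_group(text):
    """Map each inventory group to the list of hosts under it."""
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(text)
    return {group: list(parser[group]) for group in parser.sections()}


groups = hosts_by_group(INVENTORY_TEXT)
print(groups["production"])
```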
Observability
An observable system enables users to infer the internal state of a complex system from its outputs. Observability (sometimes abbreviated as o11y) can be achieved through platform and application monitoring and through proactive production testing for failure modes and performance issues, but in a dynamic operation that includes autoscaling and other application behaviors, complexity increases and entities become ephemeral. A recent report by observability framework provider DataDog states that the average lifetime of a container under orchestration is only 12 hours; microservices and functions may only live for seconds. Making ephemeral entities observable and testing in production are only possible with automation.
What is scale on demand
Apps and platforms need to be able to scale up and down in response to traffic and workload requirements and to use heterogeneous capacity; for example, burst-scaling from private to public cloud, and traffic shaping appropriately. Cloud platforms may provide the ability to autoscale VMs, containers, or workloads on a serverless framework.
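A simplified sketch of an autoscaling decision in Python. The function `desired_replicas`, its parameters, and the limits are assumptions for illustration and do not reflect any specific cloud platform's autoscaler.

```python
import math


def desired_replicas(current_load, capacity_per_replica, minimum=1, maximum=10):
    """Replicas needed to serve `current_load`, clamped to the allowed range."""
    needed = math.ceil(current_load / capacity_per_replica)
    return max(minimum, min(maximum, needed))


# Hypothetical loads in requests per second, 100 per replica.
for load in (0, 450, 5000):
    print(load, desired_replicas(load, 100))
```

The clamp keeps a floor of capacity running during quiet periods and caps spending during bursts.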
What is self-service
Automated self-service frameworks enable users to requisition infrastructure on demand, including: Standard infrastructure components such as database instances and VPN endpoints. Development and testing platforms. Hardened web servers and other application instances, along with the isolated networks and secured internet access that make them useful, safe, and resistant to errors. Analytics platforms such as Apache Hadoop, Elastic Stack, InfluxData, and Splunk.
downside of dependencies
Components need to be flexibly configurable (able to work alongside many other components in many different situations) and "unopinionated" (showing no more preference for specific companion components or architectures than absolutely necessary). Component developers may abandon support for obsolete features and rarely-encountered integrations, disrupting processes that depend on those features. It's also difficult or impossible to test a release exhaustively, accounting for every configuration. Dependency-ridden application setups tend to get locked into fragile and increasingly insecure deployment stacks, effectively becoming monoliths: "special snowflakes" that can't easily be managed, improved, scaled, or migrated to new, perhaps more cost-effective infrastructures. Updates and patches may be postponed because changes are risky to apply and difficult to roll back.
Platform Abstraction
Container technologies abstract apps and platforms away from one another, by encapsulating application dependencies and letting your containerized app run on a generically-specified host environment.
Fly
Here you can get ahead of needs by monitoring and proactively managing your users and devices, plus gaining insights with telemetry data.
Challenges of microservices
Increased complexity: Microservices mean that there are many moving parts to configure and deploy and more demanding operations, including scaling-on-demand, self-healing and other features. Automation is a requirement: Manual methods can't realistically cope with the complexity of deploying and managing dynamic applications and their orchestrator platforms, with their high-speed, autonomous operations and their transitory and ephemeral bits and pieces.
ansible file hierarchy
Inventory files; variable files; library and utility files; main playbook files; role folders and files
Principles of Idempotence
Look before you leap: Also known as "If it ain't broke, don't fix it" and "First, do no harm". Ensure the change you want to make hasn't already been made. Doing nothing is almost always a better choice than doing something wrong and possibly unrecoverable. Get to a known-good state, if possible, before making changes: For example, you may need to remove and purge earlier versions of applications before installing later versions. In production infra-as-code environments, this principle becomes the basis for immutability: the idea that changes are never made on live systems. Instead, we change automation and use it to build brand-new, known-good components from scratch. Test for idempotency: Be scrupulous about building automation free from side effects. One bad apple spoils the bunch: Only if all components of a procedure are known to be idempotent can the procedure as a whole be idempotent.
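The "look before you leap" principle can be sketched in Python as a check-then-change helper. The function `ensure_line` and the configuration lines are hypothetical; the helper reports whether it actually changed anything, so a second run is a no-op.

```python
def ensure_line(lines, wanted):
    """Return (new_lines, changed); add `wanted` only if it is missing."""
    if wanted in lines:
        return lines, False      # already in the desired state: do nothing
    return lines + [wanted], True


# Made-up device configuration.
config = ["hostname router1"]

config, changed_first = ensure_line(config, "ntp server 192.0.2.10")
config, changed_second = ensure_line(config, "ntp server 192.0.2.10")

print(changed_first, changed_second)
print(config)
```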
Infrastructure automation benefits
Speed; repeatability; the ability to work at scale, with reduced risk
Why need automation
Speed and agility enable the business to explore, experiment with, and exploit opportunities ahead of competition. Scaling operations enables the business to capture market share efficiently and to scale capacity to match demand. Developers need to accelerate every phase of software building: coding and iterating, testing, and staging. And because DevOps practices mean that developers deploy and manage apps in production, developers need to automate those activities as well.
what are ansible library and utility files
These optional files contain Python code for custom modules and the utilities they may require. You may wish to write custom modules and utilities yourself, or obtain them from Ansible Galaxy or other sources. For example, Ansible ships with a large number of modules already present for controlling main features of Cisco ACI, but also provides tutorials on how to compose additional custom modules for ACI features currently lacking coverage.
Stateless / State stored on database
User state is stored in a database accessible to any webserver in the middle tier. An example of this application is a web server that needs to be aware of the correspondence between a user ID and user cookie. New webservers and copies of the website can be added freely without disrupting user sessions in progress and without requiring that each request from a given user be routed to the specific server that maintains their session.
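A minimal Python sketch of this arrangement. `SessionStore` stands in for the shared database and `WebServer` for a middle-tier server; both names and the cookie value are assumptions for illustration. Because every server consults the same store, any of them can serve a logged-in user.

```python
class SessionStore:
    """Stand-in for a shared database of session IDs."""

    def __init__(self):
        self._ids = set()

    def add(self, session_id):
        self._ids.add(session_id)

    def contains(self, session_id):
        return session_id in self._ids


class WebServer:
    """Holds no session state itself; always asks the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, session_id):
        self.store.add(session_id)

    def handle(self, session_id):
        return "200 OK" if self.store.contains(session_id) else "401 Unauthorized"


store = SessionStore()
servers = [WebServer(name, store) for name in ("a", "b", "c")]

servers[0].login("cookie-123")  # log in through one server
responses = [server.handle("cookie-123") for server in servers]
print(responses)
```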
Walk
Using automation tools, you can gather information about your network configuration. This scenario offers answers to the most basic and also most common question you can ask, "What changed?"
Challenges of Cloud paradigm - cost as a metric
When cloud resources can be self-served quickly via manual operations, consumption can be hard to manage and costs are difficult to calculate. Private clouds require frequent auditing and procedures for retiring unused virtual infrastructure. Public cloud users can be surprised by unexpected costs when pay-by-use resources are abandoned but not torn down.
What is pyATS
a Python-based network device test and validation solution, originally developed by Cisco for internal use, then made available to the public and partially open-sourced. pyATS can be used to help check whether your changes work before putting them into production, and to continue validation and monitoring in production to ensure smooth operations.
what is a .VIRL file
a human-readable YAML file. The .virl file contains a complete description of the IOS routers, their interface configurations and connections (plus other configuration information), credentials for accessing them, and other details. These files can be used to launch simulations via the VIRL REST API, and you can convert .virl files to and from "testbed" files for use with pyATS and Genie.
imperative procedure
an ordered sequence of commands aimed at achieving a goal. The sequence may include flow-control, conditions, functional structure, classes, and more
what is a Site Reliability Engineer
intended to fuse the disciplines and skills of Dev and Ops, creating a new specialty and best-practices playbook for doing Ops with software methods.
What is deployment
involves building, arranging, integrating, and preparing multi-component applications (such as database clusters) or higher-level platforms (like Kubernetes clusters), often across multiple nodes.
What is Orchestration
may refer to several things. When meant concretely, it usually refers to user-built or platform-inherent automation aimed at managing workload lifecycles and reacting dynamically to changing conditions (e.g., by autoscaling or self-healing), particularly in container environments. When meant abstractly, it may refer simply to processes or workflows that link automation tasks to deliver business benefits, like self-service.
What is Configuration
means installing base applications and services and performing the operations, tasks, and tests required to prepare a low-level platform to deploy applications or a higher-level platform.
What is the goal of SLO/SLI?
permits cheaper, more rapid delivery of business value by removing the obligation to seek perfection in favor of building what's "good enough". It can also influence the pace, scope, and other aspects of development to ensure and improve adequacy
What is an Ansible control node
runs on virtually any Linux machine running Python 2 or 3, including a laptop, a Linux VM residing on a laptop of any kind, or a small virtual machine adjacent to cloud-resident resources under management. All system updates are performed on the control node. The control node connects to managed resources over SSH. Through this connection, Ansible can: Run shell commands on a remote server or transact with a remote router or other network entity via its REST interface. Inject Python scripts into targets and remove them after they run. Install Python on target machines if required.