ISM Module two and three (study this one)

Infrastructure as a Service (IaaS)

"The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (for example, host firewalls)." - NIST IaaS pricing may be subscription-based or based on resource usage. The provider pools the underlying IT resources and they are typically shared by multiple consumers through a multitenant model. IaaS can even be implemented internally by an organization, with internal IT managing the resources and services

Software as a Service (SaaS)

"The capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (for example, web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings." - NIST In the SaaS model, a provider offers a cloud-hosted application to multiple consumers as a service. The consumers do not own or manage any aspect of the cloud infrastructure. In SaaS, a given version of an application, with a specific configuration (hardware and software) typically provides service to multiple consumers by partitioning their individual sessions and data. SaaS applications execute in the cloud and usually do not need installation on end-point devices. This enables a consumer to access the application on demand from any location and use it through a web browser on a variety of end-point devices. Some SaaS applications may require a client interface to be locally installed on an end-point device. Customer Relationship Management (CRM), email, Enterprise Resource Planning (ERP), and office suites are examples of applications delivered through SaaS.

Mobile cloud computing

- Enables access to cloud services over mobile devices
- Examples: cloud storage, travel expense management, and CRM

Cloud Foundry

- PaaS offering based on an industry open-source project
- Enables streamlined and agile application development
- Supports multiple programming platforms and data services

Pivotal Cloud Foundry (CF) is an enterprise Platform as a Service, built on the foundation of the Cloud Foundry open-source PaaS project. The Cloud Foundry open-source project is sustained by the Cloud Foundry Foundation, which has many leading global enterprises as members. Pivotal CF, powered by Cloud Foundry, enables streamlined application development, deployment, and operations in both private and public clouds. It supports multiple programming languages and frameworks including Java, Ruby, Node.js, PHP, and Python. It supports agile application development and enables developers to continuously deliver updates to and horizontally scale web and third platform applications with no downtime. Developers can rapidly develop and deploy applications without being concerned about configuring and managing the underlying cloud infrastructure. Pivotal CF also supports multiple leading data services such as Jenkins, MongoDB, MySQL, Redis, and Hadoop. The use of open standards enables migration of applications between compatible public and private clouds. Pivotal CF provides a unified management console for the entire platform that enables in-depth application and infrastructure monitoring.

Enterprise Mobility use case

- Provides employees with ubiquitous access to information and applications
- Increases collaboration and productivity
- Facilitates BYOD

Data Center Infrastructure Virtual Infrastructure

- Virtualization abstracts physical resources and creates virtual resources
- Virtual components: virtual compute, virtual storage, and virtual network, created from physical resource pools using virtualization software
- Benefits of virtualization: resource consolidation and multitenant environment; improved resource utilization and increased ROI; flexible resource provisioning and rapid elasticity

Virtualization is the process of abstracting physical resources, such as compute, storage, and network, and creating virtual resources from them. Virtualization is achieved through the use of virtualization software that is deployed on compute systems, storage systems, and network devices. Virtualization software aggregates physical resources into resource pools from which it creates virtual resources. A resource pool is an aggregation of computing resources, such as processing power, memory, storage, and network bandwidth. For example, storage virtualization software pools the capacity of multiple storage devices to create a single large storage capacity. Similarly, compute virtualization software pools the processing power and memory capacity of a physical compute system to create an aggregation of the power of all processors (in megahertz) and all memory (in megabytes). Examples of virtual resources include virtual compute (virtual machines), virtual storage (LUNs), and virtual networks.

Virtualization enables a single hardware resource to support multiple concurrent instances of systems, or multiple hardware resources to support a single instance of a system. For example, a single disk drive can be partitioned and presented as multiple disk drives to a compute system. Similarly, multiple disk drives can be concatenated and presented as a single disk drive to a compute system. With virtualization, it is also possible to make a resource appear larger or smaller than it actually is.

Virtualization offers several benefits in a data center. It enables the consolidation of physical IT resources and supports a multitenant environment. This optimizes the utilization of physical resources, which in turn results in an increased return on investment (ROI) and reduces the cost of purchasing new hardware. Virtualization also reduces space and energy requirements and simplifies infrastructure management. It also increases the flexibility of resource provisioning through the dynamic creation and reclamation of virtual resources. Virtualization is a key enabling technology to meet the resource pooling and rapid elasticity characteristics of cloud computing. Compute virtualization is covered later in this module, while different storage virtualization and network virtualization techniques are covered later in the course in the storage modules and network modules respectively.
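
To make the pooling idea concrete, here is a minimal, hypothetical Python sketch (not any vendor's implementation) of virtualization software aggregating the capacity of two physical compute systems into a resource pool and carving virtual machines out of it; the class and field names are illustrative assumptions.

```python
# Hypothetical sketch: pooling physical capacity and provisioning virtual resources.
from dataclasses import dataclass

@dataclass
class ResourcePool:
    cpu_mhz: int     # aggregated processing power of all pooled processors
    memory_mb: int   # aggregated memory capacity of all pooled systems

    def provision_vm(self, cpu_mhz: int, memory_mb: int) -> dict:
        """Carve a virtual machine's share of resources out of the pool."""
        if cpu_mhz > self.cpu_mhz or memory_mb > self.memory_mb:
            raise ValueError("insufficient capacity in the pool")
        self.cpu_mhz -= cpu_mhz
        self.memory_mb -= memory_mb
        return {"cpu_mhz": cpu_mhz, "memory_mb": memory_mb}

    def reclaim(self, vm: dict) -> None:
        """Return a virtual machine's capacity to the pool (dynamic reclamation)."""
        self.cpu_mhz += vm["cpu_mhz"]
        self.memory_mb += vm["memory_mb"]

# Two physical compute systems: 4 cores at 2400 MHz and 32 GB of RAM each.
pool = ResourcePool(cpu_mhz=2 * 4 * 2400, memory_mb=2 * 32 * 1024)
vm = pool.provision_vm(cpu_mhz=4800, memory_mb=8192)
print(pool)       # remaining pooled capacity after provisioning the VM
pool.reclaim(vm)  # rapid elasticity: the capacity returns to the pool
```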

Agility

- Improve operational agility and facilitate innovation
- Reduce time-to-market

retail use case

In retail, organizations use Big Data analytics to gain valuable insights for competitive pricing, anticipating future demand, effective marketing campaigns, optimized inventory assortment, and improved distribution. This enables them to provide optimal prices and services to customers, and also improve operations and revenue.

What is a compute system

A computing platform (hardware and system software) that runs applications.
- Physical components include processor, memory, internal storage, and I/O devices
- Logical components include OS, device drivers, file system, and logical volume manager

In an enterprise data center, applications are typically deployed on compute clusters for high availability and for balancing computing workloads. A compute cluster is a group of two or more compute systems that function together, sharing certain network and storage resources, and logically viewed as a single system.

On-demand self-service

A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.

Data lake definition

A data lake is a collection of structured and non-structured data assets that are stored as exact or near-exact copies of the source formats. The data lake architecture is a "store-everything" approach to Big Data. Unlike conventional data warehouses, data is not classified when it is stored in the repository, as the value of the data may not be clear at the outset. The data is also not arranged as per a specific schema and is stored using an object-based storage architecture. As a result, data preparation is eliminated and a data lake is less structured compared to a data warehouse. Data is classified, organized, or analyzed only when it is accessed. When a business need arises, the data lake is queried, and the resultant subset of data is then analyzed to provide a solution. The purpose of a data lake is to present an unrefined view of data to highly-skilled analysts, and to enable them to implement their own data refinement and analysis techniques.

data warehouse

A data warehouse is a central repository of integrated data gathered from multiple different sources. It stores current and historical data in a structured format. It is designed for query and analysis to support an organization's decision making process. For example, a data warehouse may contain current and historical sales data that is used for generating trend reports for sales comparisons.

What is cloud computing?

A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction

packaged applications

An organization may also migrate standard packaged applications, such as email and collaboration software out of the private cloud to a public cloud. This frees up internal IT resources for higher value projects and applications.

Application development and testing

An organization may also use the hybrid cloud model for application development and testing. An application can be tested for scalability and under heavy workload using public cloud resources, before incurring the capital expense associated with deploying it in a production environment. Once the organization establishes a steady-state workload pattern and the longevity of the application, it may choose to bring the application into the private cloud environment.

Web application hosting

An organization may use the hybrid cloud model for web application hosting. The organization may host mission-critical applications on a private cloud, while less critical applications are hosted on a public cloud. By deploying less critical applications in the public cloud, an organization can leverage the scalability and cost benefits of the public cloud. For example, e-commerce applications use public-facing web assets outside the firewall and can be hosted in the public cloud.

characteristics of third platform infrastructure

Availability, Security, Scalability, Performance, Ease of Access, Interoperability, Manageability

Social networking use cases

Brand Networking, Enterprise Collaboration, Marketing, Customer Support

Broad network access

Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).

Rapid Elasticity

Capabilities can be rapidly and elastically provisioned, in some cases automatically, to scale rapidly outward and inward commensurate with demand.

Hybrid Cloud Model Use Cases

Cloud bursting, web application hosting, migrating packaged applications, and application development and testing

cloud bursting

Cloud bursting is a common usage scenario of a hybrid cloud. In cloud bursting, an organization uses a private cloud for normal workloads, but optionally accesses a public cloud to meet transient higher workload requirements. For example, an application can get additional resources from a public cloud for a limited time period to handle a transient surge in workload.

measured service

Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service.

On premise community cloud

Community cloud: "The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (for example, mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises." - NIST The organizations participating in the community cloud typically share the cost of deploying the cloud and offering cloud services. This enables them to lower their individual investments. Since the costs are shared by a fewer consumers than in a public cloud, this option may be more expensive. However, a community cloud may offer a higher level of control and protection than a public cloud. As with the private cloud, there are two variants of a community cloud: on-premise and externally-hosted. In an on-premise community cloud, one or more organizations provide cloud services that are consumed by the community. The cloud infrastructure is deployed on the premises of the organizations providing the cloud services. The organizations consuming the cloud services connect to the community cloud over a secure network. The figure on the slide illustrates an example of an on-premise community cloud.

Pivotal Greenplum Database

- Complete SMAQ solution for business intelligence and analytics
- Massive linear scalability and parallel processing: automatic workload parallelization and data distribution across the cluster
- Support for SQL, Hadoop, MapReduce, and programmable analytics

Pivotal Greenplum Database is a complete SMAQ solution, designed for business intelligence and Big Data analytics. It has a linearly scalable, massively parallel processing (MPP) architecture that stores and analyzes Terabytes to Petabytes of data. In this architecture, each server node acts as a self-contained database management system that owns and manages a distinct portion of the overall data. It provides automatic parallelization with no need for manual partitioning or tuning. The system automatically distributes data and parallelizes query workloads across all available hardware. In-database analytics is enabled via the support of high-performance and flexible data exchange between Hadoop and Greenplum Database. It has embedded support for SQL, MapReduce, and programmable analytics. It also provides tools for database management, backup, and disaster recovery.

Data Repositories

Data for analytics typically comes from data warehouses and data lakes.

A data warehouse is a central repository of integrated data gathered from different sources:
- Stores current and historical data in a structured format
- Designed for query and analysis to support decision making

A data lake is a collection of data that is stored as an exact or near-exact copy of the source format:
- Data is classified, organized, or analyzed only when it is accessed
- Enables analysts to implement their own analysis techniques

Data center infrastructure Services

Delivers IT resources as services to users:
- Enables users to achieve desired business results
- Users have no liabilities associated with owning the resources

Components:
- Service catalog
- Self-service portal

Functions of the service layer:
- Stores service information in the service catalog and presents it to the users
- Enables users to access services via a self-service portal

Similar to a cloud service, an IT service is a means of delivering IT resources to the end users to enable them to achieve the desired business results and outcomes without having any liabilities such as risks and costs associated with owning the resources. Examples of services are application hosting, storage capacity, file services, and email. The service layer is accessible to applications and end users. This layer includes a service catalog that presents the information about all the IT resources being offered as services. The service catalog is a database of information about the services and includes a variety of information about the services, including the description of the services, the types of services, cost, supported SLAs, and security mechanisms. The provisioning and management requests are passed on to the orchestration layer, where the orchestration workflows that fulfill the requests are defined.
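
As a rough illustration only (the service name and fields below are invented for this example, not taken from any product), a service catalog entry and a self-service portal request might be modeled like this:

```python
# Hypothetical sketch of a service catalog entry and a self-service request.
service_catalog = {
    "file-storage": {
        "description": "Shared file storage service",
        "type": "storage",
        "cost_per_gb_month": 0.05,
        "sla": "99.9% availability",
    },
}

def request_service(name: str, **params) -> dict:
    """A portal request is looked up in the catalog; the resulting request
    would then be handed to the orchestration layer for fulfillment."""
    entry = service_catalog[name]
    return {"service": name, "sla": entry["sla"], "parameters": params}

print(request_service("file-storage", capacity_gb=500))
```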

query

Efficient way to process, store, and retrieve data; a platform for user-friendly analytics systems

organizational transformation

Emergence of new roles and responsibilities to establish and manage services. Examples: service manager, cloud architect, capacity planner, and service operation manager.

Mobile computing use cases

Enterprise Mobility, Mobility-Based Products and Services, Mobile Cloud Computing

File system

A file is a collection of related records stored as a single named unit in contiguous logical address space. A file system controls and manages the storage and retrieval of files:
- Enables users to perform various operations on files
- Groups and organizes files in a hierarchical structure

File systems may be broadly classified as:
- Disk-based file system
- Network-based file system
- Virtual file system

A file is a collection of related records or data stored as a single named unit in contiguous logical address space. Files are of different types, such as text, executable, image, audio/video, binary, library, and archive. Files have a number of attributes, such as name, unique identifier, type, size, location, owner, and protection.

A file system is an OS component that controls and manages the storage and retrieval of files in a compute system. A file system enables easy access to the files residing on a storage drive, a partition, or a logical volume. It consists of logical structures and software routines that control access to files. It enables users to perform various operations on files, such as create, access (sequential/random), write, search, edit, and delete. A file system typically groups and organizes files in a tree-like hierarchical structure. It enables users to group files within a logical collection called a directory, which is a container for storing pointers to multiple files. A file system maintains a pointer map to the directories, subdirectories (if any), and files that are part of the file system. It also stores all the metadata (file attributes) associated with the files.

A file system block is the smallest unit allocated for storing data. Each file system block is a contiguous area on the physical disk. The block size of a file system is fixed at the time of its creation. The file system size depends on the block size and the total number of file system blocks. A file can span multiple file system blocks because most files are larger than the predefined block size of the file system. File system blocks cease to be contiguous and become fragmented when new blocks are added or deleted. Over the course of time, as files grow larger, the file system may become fragmented.

File systems may be broadly classified as disk-based, network-based, and virtual file systems. These are described below.

Disk-based file system: A disk-based file system manages the files stored on storage devices such as solid-state drives, disk drives, and optical drives. Examples of disk-based file systems are Microsoft NT File System (NTFS), Apple Hierarchical File System (HFS) Plus, Extended File System family for Linux, Oracle ZFS, and Universal Disk Format (UDF).

Network-based file system: A network-based file system uses networking to allow file system access between compute systems. Network-based file systems may use either the client-server model, or may be distributed/clustered. In the client-server model, the file system resides on a server, and is accessed by clients over the network. The client-server model allows clients to mount the remote file systems from the server. NFS for the UNIX environment and CIFS for the Windows environment (both covered in Module 6, 'File-based Storage System (NAS)') are two standard client-server file sharing protocols. A clustered file system is a file system that is simultaneously mounted on multiple compute systems (or nodes) in a cluster. It allows the nodes in the cluster to share and concurrently access the same storage device. Clustered file systems provide features like location-independent addressing and redundancy. A clustered file system may also spread data across multiple storage nodes, for redundancy and/or performance. Examples of network-based file systems are Microsoft Distributed File System (DFS), Hadoop Distributed File System (HDFS), VMware Virtual Machine File System (VMFS), Red Hat GlusterFS, and Red Hat CephFS.

Virtual file system: A virtual file system is a memory-based file system that enables compute systems to transparently access different types of file systems on local and network storage devices. It provides an abstraction layer that allows applications to access different types of file systems in a uniform way. It bridges the differences between the file systems for different operating systems, without the application's knowledge of the type of file system they are accessing. Examples of virtual file systems are Linux Virtual File System (VFS) and Oracle CacheFS.

The following is the process of mapping user files to the storage that uses an LVM (see the sketch below):
1. Files are created and managed by users and applications.
2. These files reside in the file systems.
3. The file systems are mapped to file system blocks.
4. The file system blocks are mapped to logical extents of a logical volume.
5. These logical extents in turn are mapped to the physical extents either by the OS or by the LVM.
6. These physical extents are mapped to the sectors in a storage subsystem.

If there is no LVM, then there are no logical extents. Without LVM, file system blocks are directly mapped to sectors. Apart from the files and directories, the file system also includes a number of other related records, which are collectively called the metadata. The metadata of a file system must be consistent for the file system to be considered healthy.
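
The mapping chain in steps 1-6 can be walked through with toy numbers; the block size, sector size, and the one-block-per-extent mapping below are simplifying assumptions, not defaults of any real file system or LVM.

```python
# Toy walk through the file -> block -> extent -> sector mapping described above.
BLOCK_SIZE = 4096    # file system block size, fixed when the file system is created
SECTOR_SIZE = 512    # sector size on the storage subsystem

def file_to_blocks(file_size_bytes: int) -> list[int]:
    """Step 3: a file occupies one or more fixed-size file system blocks."""
    return list(range((file_size_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE))

def block_to_logical_extent(block: int) -> tuple[str, int]:
    """Step 4: file system blocks map to logical extents of a logical volume
    (one block per extent here, purely for simplicity)."""
    return ("lv0", block)

def extent_to_sectors(extent: tuple[str, int]) -> list[int]:
    """Steps 5-6: the LVM maps logical extents to physical extents, which map
    to sectors; without an LVM, blocks map directly to sectors."""
    _, index = extent
    first = index * (BLOCK_SIZE // SECTOR_SIZE)
    return list(range(first, first + BLOCK_SIZE // SECTOR_SIZE))

blocks = file_to_blocks(10_000)   # a 10 KB file spans 3 blocks of 4 KB
print([extent_to_sectors(block_to_logical_extent(b)) for b in blocks])
```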

Big Data Analytics Use cases

Healthcare, Finance, Retail, Government

Finance use case

In finance, organizations use Big Data analytics for activities such as correlating purchase history, profiling customers, and analyzing behavior on social networks. This also enables them to control customer acquisition costs and target sales promotions more effectively. Big Data analytics is also being used extensively in detecting credit card fraud.

government use case

In government organizations, Big Data analytics enables improved efficiency and effectiveness across a variety of domains such as social services, education, defense, national security, crime prevention, transportation, tax compliance, and revenue management.

Healthcare use case

In healthcare, Big Data analytics solutions provide consolidated diagnostic information and enable healthcare providers to analyze patient data; improve patient care and outcomes; minimize errors; increase patient engagement; and improve operations and services. These solutions also enable healthcare providers to monitor patients and analyze their experiences in real time.

Externally hosted community cloud

In the externally-hosted community cloud model, the organizations of the community outsource the implementation of the community cloud to an external cloud service provider. The cloud infrastructure is hosted on the premises of the provider and not within the premises of any of the participant organizations. The provider manages the cloud infrastructure and facilitates an exclusive community cloud environment for the organizations. The IT infrastructure of each of the organizations connects to the externally-hosted community cloud over a secure network. The cloud infrastructure may be shared with multiple tenants. However, the community cloud resources are securely separated from other cloud tenants by access policies implemented by the provider.

Pivotal GemFire

- In-memory distributed database for high-scale NoSQL applications
- Provides low latency data access to applications at massive scale
- Supports real-time analytics
- Automatic data distribution across the cluster
- Support for multiple programming languages

Pivotal GemFire is an in-memory distributed database for high-scale custom NoSQL applications. GemFire stores all operational data in RAM across distributed nodes to provide fast access to data while minimizing the performance penalty of reading from the storage drives. This provides low latency data access to applications at massive scale with many concurrent transactions involving Terabytes of operational data. Designed for maintaining consistency of concurrent operations across its distributed data nodes, GemFire supports ACID (Atomicity, Consistency, Isolation, Durability) transactions for massively-scaled applications, such as stock trading, financial payments, and ticket sales having millions of transactions a day. GemFire provides linear scalability that allows capacity and data storage to be increased predictably by adding additional nodes to a cluster. Data distribution and system resource usage is automatically adjusted as nodes are added or removed, making it easy to scale up or down to quickly meet expected or unexpected spikes in demand. GemFire offers built-in fail-over and resilient self-healing clusters that allow developers to meet the most stringent service level requirements for data accessibility. It provides native support for the Java, C++, and C# programming languages, while applications written in other programming languages are supported via a REST API.

Cloud Service Models

Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS)

Query more fully defined by the SMAQ stack

It is unintuitive and inconvenient to specify MapReduce jobs in terms of distinct Map and Reduce functions in a programming language. To mitigate this, SMAQ systems incorporate a higher-level query layer to simplify both the specification of the MapReduce operations, and the analysis of the results. The query layer implements high-level languages that enable users to describe, run, and monitor MapReduce jobs. The languages are designed to handle not only the processing, but also the loading and saving of data from and to the MapReduce cluster. The languages typically support integration with NoSQL databases implemented on the MapReduce cluster.

Storage more fully defined in the SMAQ stack

MapReduce fetches data sets and stores the results of the computation in storage. The data must be available in a distributed fashion, to serve each processing node. The design and features of the storage layer are important not just because of the interface with MapReduce, but also because they affect the ease with which data can be loaded and the results of computation extracted and searched. A storage system in the SMAQ stack is based on either a proprietary or an open-source distributed file system, such as Hadoop Distributed File System (HDFS). The storage system may also support multiple file systems for client access. The storage system consists of multiple nodes—collectively called a "cluster"—and the file system is distributed across all the nodes in the cluster. Each node in the cluster has processing capability as well as storage capacity. The system has a highly-scalable architecture, and additional nodes can be added dynamically to meet the workload and the capacity needs. The distributed file systems like HDFS typically provide only an interface similar to that of regular file systems. Unlike a database, they can only store and retrieve data and not index it, which is essential for fast data retrieval. To mitigate this and gain the advantages of a database system, SMAQ solutions may implement a NoSQL database on top of the distributed file system. NoSQL databases may have built-in MapReduce features that allow processing to be parallelized over their data stores. In many applications, the primary source of data is in a relational database. Therefore, SMAQ solutions may also support the interfacing of MapReduce with relational database systems.

Fully description of map reduce

MapReduce is the driving force behind most Big Data processing solutions. It is a parallel programming framework for processing large data sets on a compute cluster. The key innovation of MapReduce is the ability to take a query over a data set, divide it, and run it in parallel over multiple compute systems or nodes. This distribution solves the issue of processing data that is too large to be processed by a single machine. MapReduce works in two phases, "Map" and "Reduce", as suggested by its name. An input data set is split into independent chunks which are distributed to multiple compute systems. The Map function processes the chunks in a completely parallel manner, and transforms them into multiple smaller intermediate data sets. The Reduce function condenses the intermediate results and reduces them to a summarized data set, which is the desired end result. Typically both the input and the output data sets are stored on a file system. The MapReduce framework is highly scalable and supports the addition of processing nodes to process chunks. Apache's Hadoop MapReduce is the predominant open-source Java-based implementation of MapReduce.

A classic example of MapReduce is the task of counting the number of unique words in a very large body of data including millions of documents. In the Map phase, each word is identified and given the count of 1. In the Reduce phase, the counts are added together for each word. Another example is the task of grouping customer records within a data set into multiple age groups, such as 20-30, 30-40, 40-50, and so on. In the Map phase, the records are split and processed in parallel to generate intermediate groups of records. In the Reduce phase, the intermediate data sets are summarized to obtain the distinct groups of customer records.
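
The word count example can be sketched in a few lines of Python. This single-process sketch only mimics the two phases; it is not how a Hadoop MapReduce job is written, where the framework distributes the chunks across cluster nodes.

```python
# Minimal single-process sketch of the Map and Reduce phases for word counting.
from collections import defaultdict
from itertools import chain

def map_phase(chunk: str):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def reduce_phase(pairs):
    """Reduce: condense the intermediate pairs into per-word totals."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Each "document" stands in for an independent chunk processed in parallel.
chunks = ["the quick brown fox", "the lazy dog", "the fox jumps"]
intermediate = chain.from_iterable(map_phase(c) for c in chunks)
print(reduce_phase(intermediate))   # e.g. {'the': 3, 'fox': 2, ...}
```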

Skills transformation

Need for developing new technical and soft skills.

Drivers for Transforming to the Third Platform

New Business Models, Agility, Intelligent Operations, New Products and Services, Mobility, Social Networking

Essential Cloud Characteristics

On-demand self-service, Broad network access, Resource pooling, Rapid elasticity, Measured service

Logical Components of a Compute System

Operating System, Virtual Memory, Logical Volume Manager and File System

Data Center Infrastructure physical layer

- Physical infrastructure is the foundation layer of the data center infrastructure
- Physical components: compute systems, storage, and network devices; these require operating systems, system software, and protocols for their functions
- Executes the requests generated by the virtual and software-defined layers

The physical infrastructure forms the foundation layer of a data center. It includes equipment such as compute systems, storage systems, and networking devices along with the operating systems, system software, protocols, and tools that enable the physical equipment to perform their functions. A key function of physical infrastructure is to execute the requests generated by the virtual and software-defined infrastructure, such as storing data on the storage devices, performing compute-to-compute communication, executing programs on compute systems, and creating backup copies of data.

Mobility-Based Products and Services

Provide customers with ubiquitous access to mobility-based solutions. Examples: social networking, banking, e-commerce, and location-based services.

Components of a big data analytics solution

Query, MapReduce, Storage

collective name of technology layers in the big data analytics solution

SMAQ stack

Syncplicity

SaaS solution for file sharing and data protection:
- Provides mobile and web access
- Enables BYOD
- Synchronizes files across devices in real time

EMC Syncplicity is an enterprise-grade online file sharing, collaboration, and data protection SaaS solution. It enables a business user to securely share files and folders, and collaborate with other users. It supports both mobile and web access to files from any device, and the files are also available offline. It synchronizes file changes across all devices in real time, so documents are always protected and available on any device. If a device fails, access to files would still be available from other devices. It enables a bring-your-own-device (BYOD) workforce, while providing access controls, single sign-on (SSO), data encryption, and other enterprise-grade features. Syncplicity currently has four offerings: Personal Edition (for individuals), Business Edition (for small and medium businesses), Department Edition (for enterprise departments), and Enterprise Edition. The Enterprise Edition has support for public, on-premise, and hybrid deployment options.

SLA

Service Level Agreement

SLO

Service Level Objective

Brand Networking

Showcase products and interact with customers; improve brand visibility; gain insights on the customer base through analytics

storage

Storage systems consist of multiple nodes collectively called a "cluster":
- Based on distributed file systems
- Each node has processing capability and storage capacity
- Highly-scalable architecture

A NoSQL database may be implemented on top of the distributed file system.

virtual memory

The amount of physical memory (RAM) in a compute system determines both the size and the number of applications that can run on the compute system. Memory virtualization presents physical memory to applications as a single logical collection of contiguous memory locations called virtual memory. While executing applications, the processor generates logical addresses (virtual addresses) that map into the virtual memory. The memory management unit of the processor then maps the virtual address to the physical address. The OS utility, known as the virtual memory manager (VMM), manages the virtual memory and also the allocation of physical memory to virtual memory.

An additional memory virtualization feature of an OS enables the capacity of secondary storage devices to be allocated to the virtual memory. This creates a virtual memory with an address space that is much larger than the actual physical memory space present in the compute system. This enables multiple applications and processes, whose aggregate memory requirement is greater than the available physical memory, to run on a compute system without impacting each other. The VMM manages the virtual-to-physical memory mapping and fetches data from the secondary storage when a process references a virtual address that points to data at the secondary storage. The space used by the VMM on the secondary storage is known as a swap space. A swap space (also known as page file or swap file) is a portion of the storage drive that is used as physical memory.

In a virtual memory implementation, the memory of a system is divided into contiguous blocks of fixed-size pages. A process known as paging moves inactive physical memory pages onto the swap file and brings them back to the physical memory when required. This enables efficient use of the available physical memory among different applications. The OS typically moves the least-used pages into the swap file so that enough RAM is available for processes that are more active. The access to swap file pages is slower than physical memory pages because swap file pages are allocated on the storage drive, which is slower than the physical memory.
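
A toy sketch of the paging behavior described above, assuming a least-recently-used policy and made-up page names and sizes; real virtual memory managers are far more involved.

```python
# Toy demand-paging sketch: pages are fetched from the swap space on a miss,
# and the least-recently-used resident page is moved out to make room.
from collections import OrderedDict

PHYSICAL_FRAMES = 3                                   # RAM holds only 3 pages here
memory = OrderedDict()                                # resident pages, ordered by last use
swap = {f"page{i}": f"data{i}" for i in range(8)}     # pages held on the swap file

def access(page: str) -> str:
    if page in memory:                                # hit: refresh recency only
        memory.move_to_end(page)
        return memory[page]
    if len(memory) >= PHYSICAL_FRAMES:                # miss with full RAM:
        victim, data = memory.popitem(last=False)     # evict least-recently-used page
        swap[victim] = data                           # page out to the swap file
    memory[page] = swap[page]                         # page in from the swap file
    return memory[page]

for p in ["page0", "page1", "page2", "page0", "page3"]:
    access(p)
print(list(memory))   # page1 was paged out; page0 stayed resident because it was reused
```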

Platform as a Service (PaaS)

The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment. Most PaaS offerings support multiple operating systems and programming frameworks for application development and deployment. PaaS usage fees are typically calculated based on factors, such as the number of consumers, the types of consumers (developer, tester, and so on), the time for which the platform is in use, and the compute, storage, or network resources consumed by the platform.

Hybrid Cloud

The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

Private Cloud

The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.

Many organizations may not wish to adopt public clouds due to concerns related to privacy, external threats, and lack of control over the IT resources and data. When compared to a public cloud, a private cloud offers organizations a greater degree of privacy and control over the cloud infrastructure, applications, and data. There are two variants of private cloud: on-premise and externally-hosted. The on-premise private cloud is deployed by an organization in its data center within its own premises. In the externally-hosted private cloud (or off-premise private cloud) model, an organization outsources the implementation of the private cloud to an external cloud service provider. The cloud infrastructure is hosted on the premises of the provider and may be shared by multiple tenants. However, the organization's private cloud resources are securely separated from other cloud tenants by access policies implemented by the provider.

Public Cloud

The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider. Public cloud services may be free, subscription-based, or provided on a pay-per-use model. A public cloud provides the benefits of low up-front expenditure on IT resources and enormous scalability. However, some concerns for the consumers include network availability, risks associated with multi-tenancy, visibility and control over the cloud resources and data, and restrictive default service levels.

Operating system

The operating system (OS) is software that acts as an intermediary between a user of a compute system and the compute system hardware. It controls and manages the hardware and software on a compute system. The OS manages hardware functions and application execution, and provides a user interface (UI) for users to operate and use the compute system. Some functions (or services) of an OS include program execution, memory management, resource management and allocation, and input/output management. An OS also provides networking and basic security for the access and usage of all managed resources. It also performs basic storage management tasks while managing other underlying components, such as the device drivers, logical volume manager, and file system. An OS also contains high-level Application Programming Interfaces (APIs) to enable programs to request services.

To interact with a particular hardware resource, an OS requires a device driver, which is a special system software that permits the OS to interact with the specific device. For example, hardware such as printers, mice, disk drives, network adapters, and graphics cards require device drivers. A device driver enables the OS to recognize the device, and to access and control it. Device drivers are hardware-dependent and OS-specific.

Resource pooling

The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand.

What is mobile computing

The use of mobile devices to remotely access applications and information "on the go" over a wireless network - mobile devices include laptops, tablets, smartphones, and PDAs

Volume

The word "Big" in Big Data refers to the massive volumes of data. Organizations are witnessing an ever-increasing growth in data of all types, such as transaction-based data stored over the years, sensor data, and unstructured data streaming in from social media. This growth in data is reaching Petabyte—and even Exabyte—scales. The excessive volume not only requires substantial cost-effective storage, but also gives rise to challenges in data analysis.

types of compute systems

Tower, Rack-Mounted, Blade. The compute systems used in building data centers are typically classified into three categories: tower compute system, rack-mounted compute system, and blade compute system.

A tower compute system, also known as a tower server, is a compute system built in an upright standalone enclosure called a "tower", which looks similar to a desktop cabinet. Tower servers have a robust build, and have integrated power supply and cooling. They typically have individual monitors, keyboards, and mice. Tower servers occupy significant floor space and require complex cabling when deployed in a data center. They are also bulky, and a group of tower servers generates considerable noise from their cooling units. Tower servers are typically used in smaller environments. Deploying a large number of tower servers in large environments may involve substantial expenditure.

A rack-mounted compute system, also known as a rack server, is a compute system designed to be fixed inside a frame called a "rack". A rack is a standardized enclosure containing multiple mounting slots called "bays", each of which holds a server in place with the help of screws. A single rack contains multiple servers stacked vertically in bays, thereby simplifying network cabling, consolidating network equipment, and reducing the floor space use. Each rack server has its own power supply and cooling unit. Typically, a console is mounted on a rack to enable administrators to manage all the servers in the rack. Some concerns with rack servers are that they are cumbersome to work with, and they generate a lot of heat, which requires more cooling and in turn increases power costs. A "rack unit" (denoted by U or RU) is a unit of measure of the height of a server designed to be mounted on a rack. One rack unit is 1.75 inches (44.45 mm). A 1U rack server is typically 19 inches (482.6 mm) wide. The standard rack cabinets are 19 inches wide and the common rack cabinet sizes are 42U, 37U, and 27U. The rack cabinets are also used to house network, storage, telecommunication, and other equipment modules. A rack cabinet may also contain a combination of different types of equipment modules.

A blade compute system, also known as a blade server, is an electronic circuit board containing only core processing components, such as processor(s), memory, integrated network controllers, storage drive, and essential I/O cards and ports. Each blade server is a self-contained compute system and is typically dedicated to a single application. A blade server is housed in a slot inside a blade enclosure (or chassis), which holds multiple blades and provides integrated power supply, cooling, networking, and management functions. The blade enclosure enables interconnection of the blades through a high-speed bus and also provides connectivity to external storage systems. The modular design of the blade servers makes them smaller, which minimizes the floor space requirements, increases the compute system density and scalability, and provides better energy efficiency as compared to the tower and the rack servers. It also reduces the complexity of the compute infrastructure and simplifies compute infrastructure management. It provides these benefits without compromising on any capability that a non-blade compute system provides. Some concerns with blade servers include the high cost of a blade system (blade servers and chassis), and the proprietary architecture of most blade systems, due to which a blade server can typically be plugged only into a chassis from the same vendor.

Value

Value refers to the cost-effectiveness of the Big Data analytics technology used and the business value derived from it. Many large enterprise-scale organizations have maintained large data repositories, such as data warehouses, managed non-structured data, and carried out real-time data analytics for many years. With hardware and software becoming more affordable and the emergence of more providers, Big Data analytics technologies are now available to a much broader market. Organizations are also gaining the benefits of business process enhancements, increased revenues, and better decision making.

Variability

Variability refers to the constantly changing meaning of data. For example, analysis of natural language search and social media posts requires interpretation of complex and highly variable grammar. The inconsistency in the meaning of data gives rise to challenges related to gathering the data and interpreting its context.

Variety

Variety refers to the diversity in the formats and types of data. Data is generated by numerous sources in various structured and non-structured forms. Organizations face the challenge of managing, merging, and analyzing the different varieties of data in a cost-effective manner. The combination of data from a variety of data sources and in a variety of formats is a key requirement in Big Data analytics. An example of such a requirement is combining a large number of changing records of a particular patient with various published medical research to find the best treatment.

Velocity

Velocity refers to the rate at which data is produced and changes, and also how fast the data must be processed to meet business requirements. Today, data is generated at an exceptional speed, and real-time or near-real-time analysis of the data is a challenge for many organizations. It is essential for the data to be processed and analyzed, and the results to be delivered in a timely manner. An example of such a requirement is real-time face recognition for screening passengers at airports.

veracity

Veracity refers to the varying quality and reliability of data. The quality of the data being gathered can differ greatly, and the accuracy of analysis depends on the veracity of the source data. Establishing trust in Big Data presents a major challenge because as the variety and number of sources grows, the likelihood of noise and errors in the data increases. Therefore, a significant effort may go into cleaning data to remove noise and errors, and to produce accurate data sets before analysis can begin. For example, a retail organization may have gathered customer behavior data from across systems to analyze product purchase patterns and to predict purchase intent. The organization would have to clean and transform the data to make it consistent and reliable.

Characteristics of big data

Volume, Velocity, Variety, Variability, Veracity, Value

vCloud Air

A public cloud to which you can connect a private cloud to form a hybrid cloud:
- Extending/migrating existing workloads
- New application development
- Disaster recovery

VMware vCloud Air is a secure public cloud owned and operated by VMware that offers Infrastructure as a Service for enterprise use cases, such as extending existing data center workloads into the public cloud, migrating applications from on-premise clouds to the public cloud, new application development, and disaster recovery. It is built on the foundation of vSphere and is compatible with existing VMware on-premise clouds. It enables organizations to adopt the hybrid cloud model by seamlessly extending their on-premise clouds into the public cloud. vCloud Air allows existing applications to run in the public cloud without the need to rewrite or re-architect them. Organizations can use the same networking, security, and management tools, skills, and policies that are used in their on-site environments. A consolidated view of allocated resources is provided to enable administrators to manage resource utilization. vCloud Air has three primary service offerings (with more expected in the future): Dedicated Cloud (single-tenant, physically isolated cloud service), Virtual Private Cloud (logically isolated, multi-tenant cloud service), and Disaster Recovery (cloud-based disaster recovery service). vCloud Air offers both term-based subscription and pay-as-you-go options.

ease of access

access to applications and information from any location over mobile devices

operating model transformation

Adoption of the ITaaS model; IT resources are provisioned by LoBs through a self-service portal

marketing

Advertise products on the pages of individuals; identify potential customers through analytics

technology transformation

- Application transformation: cloud delivery, and analytics and mobile capabilities
- Infrastructure transformation: standardization, consolidation, and automation

security

Challenges related to unauthorized access, data ownership, malware, governance, and compliance; ensure security across multiple third platform technologies

Data center infrastructure orchestration

Component: orchestration software
- Provides workflows for executing automated tasks
- Interacts with various components across layers and functions to invoke provisioning tasks

The orchestration layer includes the orchestration software. The key function of this layer is to provide workflows for executing automated tasks to accomplish a desired outcome. Workflow refers to a series of inter-related tasks that perform a business operation. The orchestration software enables this automated arrangement, coordination, and management of the tasks. This helps to group and sequence tasks with dependencies among them into a single, automated workflow. Associated with each service listed in the service catalog, there is an orchestration workflow defined. When a service is selected from the service catalog, an associated workflow in the orchestration layer is triggered. Based on this workflow, the orchestration software interacts with the components across the software-defined layer and the BC, security, and management functions to invoke the provisioning tasks to be executed by the entities.
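
As a minimal, made-up illustration of a workflow (the task names below are hypothetical and not from any orchestration product), a catalog service could trigger an ordered series of provisioning tasks like this:

```python
# Hypothetical sketch of an orchestration workflow: ordered, inter-related tasks
# triggered when a service is selected from the service catalog.
def create_vm(ctx):         ctx["vm"] = "vm-01"    # provision compute
def attach_storage(ctx):    ctx["lun"] = "lun-07"  # provision storage for the VM
def configure_network(ctx): ctx["vlan"] = 100      # connect the VM to a network

# Workflow associated with one catalog entry; order matters because later tasks
# depend on the outputs of earlier ones (held in the shared context).
workflow = [create_vm, attach_storage, configure_network]

def run_workflow(tasks) -> dict:
    context = {}
    for task in tasks:
        task(context)
    return context

print(run_workflow(workflow))   # {'vm': 'vm-01', 'lun': 'lun-07', 'vlan': 100}
```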

new products and services

Create new or additional products and services; create new revenue streams

New Business Models

Create new or improve existing business models; improve decision-making through valuable insights from data

Logical Volume Manager (LVM)

Creates and controls compute-level logical storage:
- Provides a logical view of physical storage
- Logical data blocks are mapped to physical data blocks
- Physical volumes form a volume group
- LVM manages volume groups as a single entity
- Logical volumes are created from a volume group

Logical Volume Manager (LVM) is software that runs on a compute system and manages logical and physical storage. LVM is an intermediate layer between the file system and the physical drives. It can partition a larger-capacity disk into virtual, smaller-capacity volumes (partitioning) or aggregate several smaller disks to form a larger virtual volume (concatenation). LVMs are mostly offered as part of the OS.

Earlier, an entire storage drive would be allocated to the file system or the other data entity used by the OS or application. The disadvantage of this was a lack of flexibility. When a storage drive ran out of space, there was no easy way to extend the file system's size. As the storage capacity of the disk drive increased, allocating the entire disk drive for the file system often resulted in underutilization of the storage capacity. The evolution of LVMs enabled dynamic extension of file system capacity and efficient storage management. The LVM provides optimized storage access and simplifies storage resource management. It hides details about the physical disk and the location of data on the disk. It enables administrators to change the storage allocation even when the application is running.

The basic LVM components are physical volumes, volume groups, and logical volumes. In LVM terminology, each physical disk connected to the compute system is a physical volume (PV). A volume group is created by grouping together one or more PVs. A unique physical volume identifier (PVID) is assigned to each PV when it is initialized for use by the LVM. Physical volumes can be added or removed from a volume group dynamically. They cannot be shared between different volume groups, which means the entire PV becomes part of a volume group. Each PV is divided into equal-sized data blocks called physical extents when the volume group is created. Logical volumes (LVs) are created within a given volume group. An LV can be thought of as a disk partition, whereas the volume group itself can be thought of as a disk. The size of an LV is based on a multiple of the number of physical extents. The LV appears as a physical device to the OS. An LV is made up of noncontiguous physical extents and may span multiple physical volumes. A file system is created on a logical volume. These LVs are then assigned to the application. A logical volume can also be mirrored to provide enhanced data availability.
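
A toy model of these concepts, assuming an invented 4 MB extent size and two invented drive names; it only shows how a logical volume is assembled from physical extents that can span physical volumes, not how a real LVM works.

```python
# Toy LVM sketch: physical volumes are split into equal-sized physical extents,
# pooled into a volume group, and logical volumes are allocated from that pool.
EXTENT_MB = 4

def make_pv(name: str, size_mb: int) -> list[tuple[str, int]]:
    """Initialize a physical volume as a list of physical extents."""
    return [(name, i) for i in range(size_mb // EXTENT_MB)]

# Volume group: the pooled free extents of two physical drives.
volume_group = make_pv("sda", 32) + make_pv("sdb", 32)

def create_lv(vg: list, size_mb: int) -> list[tuple[str, int]]:
    """Allocate a logical volume from the volume group's free extents; the
    extents may be noncontiguous and may span multiple physical volumes."""
    needed = size_mb // EXTENT_MB
    lv, vg[:] = vg[:needed], vg[needed:]
    return lv

lv0 = create_lv(volume_group, 40)   # 40 MB -> 10 extents: 8 on sda, 2 on sdb
print(len(lv0), "extents:", lv0)
```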

availability

Critical services continue to be available following disruptive events; resilient infrastructure and application design

Data center infrastructure Software-defined infrastructure

Deployed either on the virtual layer or on the physical layer. All infrastructure components are virtualized and aggregated into pools:
- Underlying resources are abstracted from applications
- Enables ITaaS
- Centralized, automated, and policy-driven management and delivery of heterogeneous resources

Components:
- Software-defined compute
- Software-defined storage
- Software-defined network

The software-defined infrastructure layer is deployed either on the virtual layer or on the physical layer. In the software-defined approach, all infrastructure components are virtualized and aggregated into pools. This abstracts all underlying resources from applications. The software-defined approach enables ITaaS, in which consumers provision all infrastructure components as services. It centralizes and automates the management and delivery of heterogeneous resources based on policies. The key architectural components in the software-defined approach include software-defined compute (equivalent to compute virtualization), software-defined storage (SDS), and software-defined network (SDN). Software-defined data center is covered later in this module. Software-defined storage is covered in Module 8, whereas software-defined network is covered in the network modules.

Storage define

Distributed architecture; non-relational, non-structured data

Enterprise Collaboration

enable employees to communicate and collaborate

Data center infrastructure business continuity

Enables ensuring the availability of services in line with the SLA. Supports all the layers to provide uninterrupted services. Includes adoption of measures to mitigate the impact of downtime:
- Proactive: business impact analysis, risk assessment, technology solutions deployment (backup and replication)
- Reactive: disaster recovery, disaster restart

The business continuity (BC) cross-layer function specifies the adoption of proactive and reactive measures that enable an organization to mitigate the impact of downtime due to planned and unplanned outages. The proactive measures include activities and processes such as business impact analysis, risk assessment, and technology solutions such as backup, archiving, and replication. The reactive measures include activities and processes such as disaster recovery and disaster restart to be invoked in the event of a service failure. This function supports all the layers (physical, virtual, software-defined, orchestration, and services) to provide uninterrupted services to the consumers. The BC cross-layer function of a cloud infrastructure enables a business to ensure the availability of services in line with the service level agreement (SLA).

performance

Ensure optimal performance for mixed workloads; ensure high throughput and low latency

intelligent operations

improve operational efficiency through intelligent tools and strategies

social networking

increase visibility, market reach, and provide better service

scalability

massive scalability to accommodate changes in workloads and data volume

customer support

monitor customer comments and resolve issues

interoperability

Multiple systems or components share and use information and services through APIs, web services, or middleware

Imperatives for third platform transformation

Operating model transformation, organizational transformation, technology transformation, skills transformation

MapReduce

Parallel computation across many servers; batch-processing model

Physical components of a compute system

Processor, random-access memory, read-only memory, motherboard, chipset, secondary storage

mobility

Provide ubiquitous access to applications and information; improve collaboration, productivity, and profitability

Cloud Deployment Models

public, private, community, hybrid

Big Data

Represents the information assets whose high volume, high velocity, and high variety require the use of new technical architectures and analytical methods to gain insights and derive business value. The definition of Big Data has three principal aspects: characteristics of data, data processing needs, and business value.
- Includes both structured and non-structured data
- Requires highly-scalable storage architecture and new tools for processing
- Analytics enables better decision making

Query

Simplifies the specification of MapReduce operations, and the retrieval and analysis of the results:
- Designed to retrieve and process massive amounts of non-structured data
- Provides support for analytics and reporting

manageability

single pane of management, automation, and multi-party orchestration

SNA

social network analysis

