Data Management Midterm Review


Based on the 2020 graphic below, which cloud provider is #1 in market share?

AWS

To create a repo that has a connection to GitHub, you must create the repository on GitHub first A) True B) False

B) False

In AWS CAF what does CAF stand for?

Cloud Adoption Framework

Which section of an AWS CloudFormation template is the only required section and specifies resources, such as an Amazon EC2 instance or Amazon S3 bucket, and their properties? A) Metadata B) Parameters C) Mappings D) Resources

D) Resources

A data engineering team is building a data pipeline for batch analytics. Which resource would provide the most relevant guidance about AWS best practices? A) Cost optimization pillar of the AWS Well-Architected Framework. B) Operational excellence pillar of the AWS Well-Architected Framework. C) Machine learning lifecycle section of the Machine Learning Lens for the AWS Well-Architected Framework. D) Scenarios section of the Data Analytics Lens of the AWS Well-Architected Framework.

D) Scenarios section of the Data Analytics Lens of the AWS Well-Architected Framework.

Which statement represents a strategy that is suggested for innovating with artificial intelligence and machine learning (AI/ML)? A) Focus on structured data where you can be more certain of the outcomes. B) Expect a significant investment to train analysts to use new tools and abandon older, familiar tools, such as SQL. C) To protect your investment, limit ML activities to data scientists who have significant experience with ML. D) Take advantage of cloud services with AI/ML features that democratize access.

D) Take advantage of cloud services with AI/ML features that democratize access.

In an AWS CloudFormation template, how could an engineer ensure that an Amazon EC2 instance is created before an Amazon RDS database is created? A) Use a wait condition in the Outputs section for the EC2 instance. B) Add a template section to define the properties for the EC2 instance. C) Place the EC2 instance section earlier in the template than the Amazon RDS section. CloudFormation will automatically create the EC2 instance first. D) Use the DependsOn attribute in the Properties section for the Amazon RDS resource.

D) Use the DependsOn attribute in the Properties section for the Amazon RDS resource.
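A minimal sketch of how DependsOn looks in a template (resource names, AMI ID, and credentials below are placeholders, not working values). Note that in actual CloudFormation syntax, DependsOn is a resource-level attribute, a sibling of Properties rather than a key inside it:

```yaml
Resources:
  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0123456789abcdef0   # placeholder AMI ID
      InstanceType: t3.micro
  AppDatabase:
    Type: AWS::RDS::DBInstance
    # DependsOn sits at the resource level (sibling of Properties):
    # CloudFormation will not start creating AppDatabase until WebServer succeeds.
    DependsOn: WebServer
    Properties:
      Engine: mysql
      DBInstanceClass: db.t3.micro
      AllocatedStorage: "20"
      MasterUsername: admin
      MasterUserPassword: placeholder-password
```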

Put the following in chronological order, from the earliest approach to storing and accessing data (1) to the newest (4). ____Purpose-built cloud data stores ____Nonrelational ____Data lakes ____Relational

__4__Purpose-built cloud data stores __2__Nonrelational __3__Data lakes __1__Relational

One advantage of cloud computing is that we trade _________ expense for __________ expense

capital, variable

You have a file called readme.md and want to put it in the staging area. What command does this?

git add readme.md
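A minimal sketch of staging a file (the directory name and file contents are illustrative):

```shell
# Create a throwaway repo with an untracked readme.md.
mkdir demo-add && cd demo-add
git init -q
echo "# Demo" > readme.md

git add readme.md    # moves readme.md into the staging area (the index)
git status --short   # "A  readme.md" -- A marks a staged addition
cd ..
```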

How can you create a new branch called features?

git branch features
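A minimal sketch (repo and identity values are illustrative). Note that `git branch features` creates the branch but does not switch to it, and branching requires at least one commit to point at:

```shell
mkdir demo-branch && cd demo-branch
git init -q
# An initial commit is needed before a branch can be created;
# the -c identity flags avoid relying on global git config.
git -c user.email=demo@example.com -c user.name=Demo \
    commit --allow-empty -q -m "initial"

git branch features   # create the branch (stays on the current branch)
git branch --list     # features now appears alongside the default branch
cd ..
```

To create and switch in one step, `git switch -c features` (or the older `git checkout -b features`) can be used instead.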

How would you make a commit with the message 'add readme'?

git commit -m 'add readme'
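A minimal end-to-end sketch (directory name, file contents, and identity are illustrative):

```shell
mkdir demo-commit && cd demo-commit
git init -q
echo "# Demo" > readme.md
git add readme.md
# -m supplies the commit message inline; the -c identity flags
# avoid depending on a global user.name/user.email being set.
git -c user.email=demo@example.com -c user.name=Demo \
    commit -q -m 'add readme'
git log --oneline   # newest commit shows the message "add readme"
cd ..
```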

Go ______ in ________

global, minutes

Both the customer and AWS share responsibility for security. The customer is responsible for security ______ the cloud. AWS is responsible for the security ______ the cloud.

in, of

What is true about regions?

A region is a physical location composed of multiple Availability Zones. Each region is located in a separate geographic area.

Which of these are ways to access AWS core services? Select all that apply A) AWS Command Line Interface (AWS CLI) B) Technical support calls C) AWS Marketplace D) Software Development Kits (SDKs) E) AWS Management Console

A) AWS Command Line Interface (AWS CLI) D) Software Development Kits (SDKs) E) AWS Management Console

A system administrator creates cryptographic keys for their organization, but as the organization grows, they find it more difficult to audit and manage all the keys. They still would like to control encryption of data across AWS services. Which AWS service would best meet their need? A) AWS Key Management Service (AWS KMS) B) AWS CloudTrail C) AWS Auto Scaling D) AWS CloudFormation

A) AWS Key Management Service (AWS KMS)

Which of the following is a compute service? A) Amazon EC2 B) Amazon S3 C) Amazon Redshift D) Amazon CloudFront E) Amazon VPC

A) Amazon EC2

The stream processing pipeline example that is presented in this module describes producers as part of the ingestion and producers layer. Which statement describes a producer? A) An integration that collects data from a source and loads it onto a stream. B) An application that reads and processes data from the stream. C) An application that reads the data but does not process it. D) Temporary storage that is used while processing data in the stream.

A) An integration that collects data from a source and loads it onto a stream.

A customer wants to create a website that highlights cars only. Because there are many different cars, the customer plans to train a model by showing it numerous pictures that are labeled as cars and letting it learn to identify unlabeled pictures as cars. Which data science approach could accomplish this task? A) Artificial intelligence and machine learning (AI/ML) B) Sentiment analysis C) Traditional analytics D) Historical analysis E) Natural language processing (NLP)

A) Artificial intelligence and machine learning (AI/ML)

How could a healthcare company use data to support the most valuable personalized experience for their customers? A) Combine disparate data of different types from on-premises databases, public datasets, and customer devices to alert customers who might be at risk for a health event. B) Collect heart rate data from customers' personal devices and automatically add it to their healthcare records. C) Provide summary reports of health trends based on an interval customer database. The company can ensure the best experience because it has control over this data and is certain of its validity. D) Give customers access to full public health datasets through a mobile app. Users can search for a variety of health conditions.

A) Combine disparate data of different types from on-premises databases, public datasets, and customer devices to alert customers who might be at risk for a health event.

Which statement best describes data wrangling? A) Data wrangling is a set of steps that are performed to transform large amounts of data from multiple sources into a meaningful dataset. B) Data wrangling is a data transformation approach that requires the use of sophisticated tools to review and transform data from a given data source. C) Data wrangling provides a set of transformation steps, which are each performed one time in sequence as data is ingested. D) Data wrangling provides rigid guidance for data transformations to ensure that they adhere to standards that are needed for ML models.

A) Data wrangling is a set of steps that are performed to transform large amounts of data from multiple sources into a meaningful dataset.

Which task might be performed as part of data structuring within the data wrangling process? A) Map data from a source file into a format that can be used with existing sources and used in the pipeline. B) Fill in missing data. C) Determine which visualization tools should be used to present the data to business users. D) Merge multiple data sources into a single dataset.

A) Map data from a source file into a format that can be used with existing sources and used in the pipeline.

A data engineer would like to control access to workload infrastructure. Which security best practices should they implement? Select THREE. A) Monitor infrastructure changes and user activities. B) Allow SSH access. C) Implement least privilege policies. D) Prevent unintended access. E) Allow data owners to determine access. F) Use a public subnet.

A) Monitor infrastructure changes and user activities. C) Implement least privilege policies. D) Prevent unintended access.

Which of the following are NOT benefits of AWS Cloud computing? Select two A) Multiple procurement cycles B) High latency C) Temporary and disposable resources D) Fault tolerant databases E) High Availability

A) Multiple procurement cycles B) High latency

Which statement describes how data is processed through the pipeline? A) Processing will almost always be iterative. B) The ingestion step of the pipeline might be iterative, but after data is stored to be processed, iterations rarely occur. C) After the selected data sources have been ingested into the pipeline, it would be unlikely for additional iterations to involve new data sources. D) Processing should almost never be iterative.

A) Processing will almost always be iterative.

What are the reasons that a data engineer might use AWS CloudFormation to set up resources instead of manually setting them up? Select THREE. A) Simplify infrastructure management. B) Secure AWS Identity and Access Management (IAM) policies. C) Allow for application monitoring. D) Easily control and track changes to infrastructure. E) Manage costs. F) Quickly replicate infrastructure.

A) Simplify infrastructure management. D) Easily control and track changes to infrastructure. F) Quickly replicate infrastructure.

Which of these are cloud computing models? Select all that apply A) Software as a service B) Platform as a service C) Infrastructure as a service D) System administration as a service

A) Software as a service B) Platform as a service C) Infrastructure as a service

Which of these are benefits of cloud computing over on-premises computing? Select ALL that apply A) Trade capital expense for variable expense B) Eliminate guessing on your infrastructure capacity needs C) Increase speed and agility D) Pay for racking, stacking, and powering servers E) Benefit from massive economies of scale

A) Trade capital expense for variable expense B) Eliminate guessing on your infrastructure capacity needs C) Increase speed and agility E) Benefit from massive economies of scale

AWS owns and maintains the network-connected hardware required for application services, while you provision and use what you need. A) True B) False

A) True

AWS services work together like building blocks A) True B) False

A) True

Data management is the practice of collecting, keeping, and using data securely, efficiently, and cost-effectively. A) True B) False

A) True

I make changes to a file and add it to the staging area. I make further changes to that file. If I want to commit the most updated version, I need to re-add the file to the staging area. A) True B) False

A) True
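The point above can be demonstrated directly: edits made after `git add` are not staged automatically, so the file must be re-added. A minimal sketch (names and contents are illustrative):

```shell
mkdir demo-stage && cd demo-stage
git init -q
echo "v1" > notes.txt
git add notes.txt      # snapshot of v1 is now staged
echo "v2" > notes.txt  # further edits are NOT staged automatically
git status --short     # "AM notes.txt": staged add (A) + unstaged modification (M)
git add notes.txt      # re-add so a commit would capture v2
git status --short     # back to "A  notes.txt" -- fully staged again
cd ..
```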

In Prof Eicher's opinion, a certification through a cloud provider is more valuable in the job market than a data management certification (like those through DAMA). A) True B) False

A) True

One advantage of version control is that it allows collaborators to see a transparent history of changes, who made them, and how they contribute to the development of a project A) True B) False

A) True

The purpose of a pull request is to collaborate with others, generally telling them you've made changes to a branch A) True B) False

A) True

This DTSC-320 course will cover two AWS Academy courses (Data Engineering & Cloud Foundations), along with a module on Git and GitHub. A) True B) False

A) True

You will often get a .gitignore file straight from GitHub A) True B) False

A) True

A data engineer is building a pipeline to ingest unstructured and semistructured data. A data scientist will explore the data for potential use in an ML solution. Which ingestion approach is best for this business need? A) Use an extract, load, and transform (ELT) approach and load the nearly raw data into an S3 data lake. B) Use an extract, transform, and load (ETL) approach and load the highly transformed data into an Amazon Redshift data warehouse. C) Use an extract, load, and transform (ELT) approach and load the minimally transformed data into an Amazon Redshift data warehouse. D) Use an extract, transform, and load (ETL) approach and load the highly transformed data into an S3 data lake.

A) Use an extract, load, and transform (ELT) approach and load the nearly raw data into an S3 data lake.

What are the advantages of cloud computing over computing on-premises? Select all that apply A) Use on demand capacity B) Increase speed and agility C) Go global in minutes D) Avoid large capital purchases

A) Use on demand capacity B) Increase speed and agility C) Go global in minutes D) Avoid large capital purchases

Which statement accurately describes design characteristics of the ingestion layer in the AWS modern data architecture? A) Use purpose-built services to match the connectivity, data format, data structure, and data velocity requirements for each data source. B) Use AWS DataSync to perform a one-time migration from relational database sources. C) Use Amazon AppFlow to ingest streaming data from sources such as social media. D) Select a single ingestion service that can support the data formats, structures, and velocity requirements of all the data sources that you intend to collect.

A) Use purpose-built services to match the connectivity, data format, data structure, and data velocity requirements for each data source.

Which statement about data veracity is true? A) Value rests on veracity. If you can't trust the data, then your analysis will not have much value. B) The first point at which a data engineer or data scientist will address veracity is when cleaning and transforming the data during ingestion. C) The data analytics field shares a common definition of clean data to ensure that veracity standards are met on all data sources. D) All data issues that challenge veracity can be addressed by cleansing data that enters the pipeline.

A) Value rests on veracity. If you can't trust the data, then your analysis will not have much value.

Which statements describe volume and velocity in a data pipeline? (Select THREE.) A) Velocity is about how quickly data enters and moves through a pipeline. B) Volume is about how much data you need to process. C) Only volume drives the expected throughput and scaling requirements of your pipeline. D) Velocity is about how much data you need to process. E) Volume and velocity together drive the expected throughput and scaling requirements of a pipeline.

A) Velocity is about how quickly data enters and moves through a pipeline. B) Volume is about how much data you need to process. E) Volume and velocity together drive the expected throughput and scaling requirements of a pipeline.

Which data characteristic describes how accurate, precise, and trusted the data is? A) Veracity B) Variety C) Velocity D) Volume

A) Veracity

A data engineer is designing a pipeline to ingest semistructured data from their organization's customer service systems. Which statement best describes the approach that they should take? A) Work with business users to understand the primary use case. Then, build a process that suits the use case and leaves room for evolving needs. B) Build an extract, load, and transform (ELT) process that keeps the data in a raw state to be used for a variety of use cases. C) Build an extract, transform, and load (ETL) process to transform the data for fast SQL queries in a data warehouse. D) Ask the business users which method they prefer, recognizing that the choice they make now cannot easily be changed.

A) Work with business users to understand the primary use case. Then, build a process that suits the use case and leaves room for evolving needs.

Where can a customer go to get more detail about Amazon Elastic Compute Cloud (Amazon EC2) billing activity that took place 3 months ago?

AWS Cost Explorer

Which AWS services are free?

AWS IAM, AWS CloudFormation, Auto Scaling, Amazon VPC, Elastic Beanstalk

Which of the following are geographic areas that host two or more Availability Zones?

AWS Regions

Which AWS service provides infrastructure security optimization recommendations?

AWS Trusted Advisor

Which component of the AWS Global Infrastructure does Amazon CloudFront use to ensure low-latency delivery?

AWS edge locations

What AWS tool lets you explore AWS services and create an estimate for the cost of your use cases on AWS?

AWS Pricing Calculator

What does AURI stand for?

All Upfront Reserved Instance

Why is AWS more economical than traditional data centers for applications with varying compute workloads?

Amazon EC2 instances can be launched on-demand when needed

Which of these statements about Availability Zones is true? Select all that apply.

Availability Zones are designed for fault isolation. Availability Zones are made up of one or more data centers. Availability Zones are connected to one another using high-speed private links.

Which statement accurately describes design characteristics of the storage layer of the AWS modern data architecture? A) Amazon S3 and Amazon Redshift are not natively integrated but can be integrated in the data pipeline by using a connector utility. B) Amazon S3 objects in the data lake are organized into different buckets or identified by different prefixes to represent different states of data (landing, raw, trusted, curated). C) Data arrives in the landing zone within Amazon S3. Data then moves to the raw zone, where it is made available for complex querying by the Amazon Redshift data warehouse. D) Amazon S3 is used to house highly structured data. Data cannot be ingested into the lake without first defining the schema.

B) Amazon S3 objects in the data lake are organized into different buckets or identified by different prefixes to represent different states of data (landing, raw, trusted, curated).

Which statements are true regarding how data might be acted upon as it passes through a pipeline? (Select TWO.) A) All the data that is ingested into the pipeline will be enriched. B) Data will probably be transformed through some type of extract, transform, and load (ETL) process. C) Extract, transform, and load (ETL) processes are not used in analytics pipelines. D) Data might need to be cleaned or normalized. E) Data that passes through the pipeline will be transformed only once, and this will occur during the ingestion phase.

B) Data will probably be transformed through some type of extract, transform, and load (ETL) process. D) Data might need to be cleaned or normalized.

Cloud computing provides a simple way to access servers, storage, databases, and a broad set of application services over the internet. You own the network-connected hardware equipment required for these services and Amazon Web Services provisions what you need. A) True B) False

B) False

GitHub is a version control system. A) True B) False

B) False

If you don't need to sync with the original repository, and just want to play with the code, you should fork instead of clone A) True B) False

B) False

In order to create a repository on GitHub called 'example', we use the command 'git github add example' A) True B) False

B) False

In order to create a repository with a connection to GitHub, you must create the repository locally first A) True B) False

B) False

Only one academic field is concerned with "Data management": data science. A) True B) False

B) False

To clone a repository, it is best to first use git init to create a repository, and then clone within it A) True B) False

B) False

When you fork a repository, it keeps all the information from the original repository, just like when you clone A) True B) False

B) False

git log allows you to create a report of your recent changes to code, and you create that log in any text editor A) True B) False

B) False

Which option describes a task that a data engineer would be likely to perform during data cleaning? A) Update column headings to match the headings in the target data store. B) Modify the format of a date field to comply with the format of a target data store. C) Research values that appear to be outliers, and replace them with values that are more within the normal range. D) Check that the postal codes in an address record fall within the actual postal code ranges for the state that is specified in the record.

B) Modify the format of a date field to comply with the format of a target data store.

A data engineer would like to improve the security of the data processing phase of a machine learning (ML) pipeline. What are best practices that they could implement? Select THREE. A) Choose an Amazon EC2 instance with more resources. B) Protect the privacy of sensitive data. C) Add metadata. D) Keep only the relevant data. E) Enable tagging. F) Enforce data lineage.

B) Protect the privacy of sensitive data. D) Keep only the relevant data. F) Enforce data lineage.

A business analyst requests to add a new data source into their data pipeline to provide details about customer interactions that can be analyzed for trend reports. Which data discovery task should a data engineer perform first? A) Determine how to organize the data in the pipeline storage layer. B) Query the source to see whether it provides data that is related to customer interactions. C) Evaluate the value of the data compared to the cost of extracting it. D) Determine whether the team has the correct tools to analyze the data.

B) Query the source to see whether it provides data that is related to customer interactions.

What is an example of structured data? A) Clickstream data B) Relational database table C) CSV, JSON, or XML file D) Image

B) Relational database table

Which data type best describes CSV, JSON, and XML files? A) Structured B) Semistructured C) Relational D) Unstructured

B) Semistructured

Which statement about the data validating step is true? A) A data engineer must perform all of the validating tasks. B) The data validating step shares some similarities with the data cleaning step. C) A data analyst must perform all of the validating tasks. D) Validating is an entirely manual process of reviewing data values in the enriched dataset.

B) The data validating step shares some similarities with the data cleaning step.

Which statement describes a key factor to design an effective decision-making infrastructure? A) The designer should focus on collecting data and building a pipeline that can be used for all the analytics in the organization. B) The designer should start with the business problem to be solved and then build the pipeline that suits that use case. C) Every data pipeline should follow the same pattern of tasks and move directly from ingestion to storage to processing and then to analysis and visualization, sequentially. D) The design should prevent data from being transformed as it moves through the pipeline.

B) The designer should start with the business problem to be solved and then build the pipeline that suits that use case.

Which statement about data types is correct? A) Unstructured data and structured data are equally difficult to query. B) Unstructured data is the hardest to query but the most flexible. Structured data is the easiest to query but the least flexible. C) Unstructured data is the hardest to query and the least flexible. Structured data is the easiest to query and the most flexible. D) Unstructured data is the easiest to query but the most flexible. Structured data is the hardest to query but the least flexible.

B) Unstructured data is the hardest to query but the most flexible. Structured data is the easiest to query but the least flexible.

Which statements are TRUE regarding horizontal and vertical scaling? Select TWO. A) Adding more Amazon EC2 instances to your resource pool is an example of vertical scaling. B) Upgrading to a higher Amazon EC2 instance type is an example of vertical scaling. C) Upgrading to a higher Amazon EC2 instance type and adding more EC2 instances to your resource pool are both examples of horizontal scaling. D) Adding more Amazon EC2 instances to your resource pool is an example of horizontal scaling. E) Upgrading to a higher Amazon EC2 instance type is an example of horizontal scaling.

B) Upgrading to a higher Amazon EC2 instance type is an example of vertical scaling. D) Adding more Amazon EC2 instances to your resource pool is an example of horizontal scaling.

If we clone a repository and change the remote to our own, but want to keep a connection to the original repository, by convention we refer to that remote as _________ A) prime B) upstream C) main D) master E) downstream

B) upstream
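A minimal sketch of the convention above (both URLs are illustrative placeholders, not real repositories, so nothing is fetched here):

```shell
mkdir demo-remote && cd demo-remote
git init -q
# "origin" points at your own copy; "upstream" keeps a link
# back to the original repository you cloned or forked from.
git remote add origin   https://github.com/you/example.git
git remote add upstream https://github.com/original-owner/example.git
git remote -v   # lists both remotes with their fetch/push URLs
cd ..
```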

What are the support plans offered by AWS?

Basic, Developer, Business, Enterprise

When you work on a project, you're going to have a bunch of different features or ideas in progress at a given time. Those features can be worked on without disrupting your main code by creating and working on a __________

Branch

Which statement best describes the data scientist role in processing data through a pipeline? A) A data scientist focuses on granting the correct level of access to different types of users. B) A data scientist focuses on the infrastructure that data passes through. C) A data scientist works with data in the pipeline. D) A data scientist works on moving data into the pipeline.

C) A data scientist works with data in the pipeline.

A data engineer is building their infrastructure. They would like to create and deploy infrastructure as code to simplify and automate this process. Which service could the data engineer use to accomplish this task? A) AWS Key Management Service (AWS KMS) B) AWS Auto Scaling C) AWS CloudFormation D) AWS CloudTrail

C) AWS CloudFormation

Which task is part of the data publishing step of the data wrangling process? A) Fix outliers in the data before it is published. B) Decide how to map source fields to those in the data warehouse. C) Apply appropriate access controls. D) Review the dataset before writing it to pipeline storage.

C) Apply appropriate access controls.

Which type or types of data movement should the modern data architecture support? A) The architecture should not support data movement between data stores. Data should be ingested into the data store where it will be used. B) Only inside-out data movement, which is when data in the lake is moved to a purpose-built data store. C) Both inside-out and outside-in data movement. The architecture should also support movement directly between purpose-built data stores. D) Only outside-in data movement, which is when data from purpose-built data stores is moved into the data lake.

C) Both inside-out and outside-in data movement. The architecture should also support movement directly between purpose-built data stores.

Which statement describes a key task that is performed as part of the data enriching step? A) Ingest data sources into the target data store in the pipeline. B) Give business users access to the data to build reports. C) Combine data from different sources into a single dataset. D) Run scripts to perform row counts for each data source.

C) Combine data from different sources into a single dataset.

Economies of scale results from __________ A) Having many different cloud providers B) Having to invest heavily in data centers and servers C) Having hundreds of thousands of customers aggregated in the cloud D) Having hundreds of cloud services available over the internet

C) Having hundreds of thousands of customers aggregated in the cloud

A data engineer would like to improve their overall data security after encountering suspicious activity in their environment. Which principles can they apply to help strengthen their data security? Select THREE. A) Process data manually. B) Implement security best practices manually. C) Implement a strong identity foundation. D) Apply security at the network access control list (ACL) layer only. E) Protect data in transit and at rest. F) Enable traceability.

C) Implement a strong identity foundation. E) Protect data in transit and at rest. F) Enable traceability.

Which statement accurately reflects how an organization should think about using its data to make decisions? A) A solution should never sacrifice accuracy for speed. B) Data becomes more valuable for decision-making over time. C) Preventative and predictive analysis promise the most value when compared to other types of analytics. D) An organization should always store as much data as possible to maximize value.

C) Preventative and predictive analysis promise the most value when compared to other types of analytics.

An administrator needs to identify an AWS Identity and Access Management (IAM) user who terminated a production Amazon EC2 instance. Which service should they use in this situation? A) AWS CloudFormation B) AWS Auto Scaling C) Amazon CloudWatch D) AWS CloudTrail

D) AWS CloudTrail

What is the role of Amazon Redshift in the AWS modern data architecture? A) Amazon Redshift provides a relational database engine that is compatible with PostgreSQL. B) Amazon Redshift provides object storage for structured and unstructured data. C) Amazon Redshift provides a fully managed nonrelational database that supports key-value data models. D) Amazon Redshift provides a fully managed data warehouse service.

D) Amazon Redshift provides a fully managed data warehouse service.

Which statement best describes the unify part of the three-pronged strategy to build modern data infrastructures? A) Centrally control access to data. B) Use artificial intelligence and machine learning (AI/ML) to discover new insights faster. C) Migrate to purpose-built tools and data stores. D) Break down silos and democratize access.

D) Break down silos and democratize access.

What is the data engineer's role in processing data through a pipeline? A) Look for additional insights in the data. B) Train data analysts on analysis and visualization tools. C) Evaluate the results. D) Build the infrastructure that the data passes through.

D) Build the infrastructure that the data passes through.

Which scenario describes a challenge to velocity? A) Data is ingested from regional sales sites, and the overnight batch job fails because it runs out of space. B) Regional offices send the same type of data but use different CSV formats for their files. C) A sales department wants to use a data source but does not have information about its lineage or how it has been maintained. D) Clickstream data is collected from a shopping website to make personalized recommendations while a user is shopping. When the website is very busy, there is a delay in returning results to customers.

D) Clickstream data is collected from a shopping website to make personalized recommendations while a user is shopping. When the website is very busy, there is a delay in returning results to customers.

Which statement describes how data architectures evolved from 1970 to the present? A) Hierarchical databases dominated the market until the explosion of data during the rise of the internet. B) Data stores evolved from relational to nonrelational structures to support demand for higher levels of connected users. C) Data warehouses evolved out of the need to process data for artificial intelligence and machine learning (AI/ML) applications. D) Data stores evolved to adapt to increasing demands of data volume, variety, and velocity.

D) Data stores evolved to adapt to increasing demands of data volume, variety, and velocity.

An analyst wants to ingest data from a customer feedback database for an ML model to predict customer behavior. Records include copied emails, text notes, and screenshots. The most recent record is 2 years old. Many records have blank values for the customer identifier fields. How should a data engineer proceed with this request? A) Create scripts to structure and clean the data. B) Create a folder structure in the data lake to separate the data by type (for example, text or image). C) Buy an extraction tool that is designed to quickly extract message content from emails into a structured format. D) Evaluate whether the potential value of the data appears worth the effort to extract and prepare it.

D) Evaluate whether the potential value of the data appears worth the effort to extract and prepare it.

Which type of data source includes data that is generated continually by events that include a time-based component? A) Public datasets B) File stores C) On-premises databases D) Events, Internet of Things (IoT) devices, and sensors

D) Events, Internet of Things (IoT) devices, and sensors

What is the role of AWS Glue in the AWS modern data architecture? A) Provide the ability to query data directly from the data lake by using SQL. B) Help you to monitor and classify data. C) Secure access to sensitive data. D) Facilitate data movement and transformation between data stores.

D) Facilitate data movement and transformation between data stores.

Which scenario could benefit from batch ingestion or processing? A) A dashboard is populated with real-time error rates of sensors in a factory. B) Clickstream data from a retailer's website sends a large volume of small bits of data at a continuous pace. Analysis must be performed immediately. C) Making a batch of chocolate chip cookies and ingesting them. D) Sales transaction data from a retailer's website is sent to a central location periodically. Data is analyzed overnight with reports sent to branches in the morning. E) Real-time alerts are produced based on log data to identify potential fraud as soon as it occurs.

D) Sales transaction data from a retailer's website is sent to a central location periodically. Data is analyzed overnight with reports sent to branches in the morning.

You have a file called "readme.md" and want to put it in the staging area. What command does this? A) add git readme.md B) add readme.md git C) readme.md git add D) git add readme.md

D) git add readme.md
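The correct answer's workflow can be sketched as a short shell session (the directory and file contents are illustrative, not part of the question):

```shell
# A minimal sketch of Git's staging workflow.
cd "$(mktemp -d)"                # work in a throwaway directory
git init -q demo-repo            # create a new, empty local repository
cd demo-repo
echo "# Demo" > readme.md        # the file we want Git to track
git add readme.md                # copy the file into the staging area (index)
git status --short               # "A  readme.md": A = staged for commit
```

`git status --short` prefixes each staged-but-uncommitted file with `A`, confirming that `git add readme.md` placed the file in the staging area.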

Which of the following is NOT a git command? A) git init B) git clone C) git ignore D) git create

D) git create

The Bible of Data Management is called the (5-letter acronym):

DMBoK

What is at the center of the DAMA wheel? (two words)

Data Governance

The goal of the modern data architecture is to store data in a centralized location and make it available to all consumers to perform analytics and run AI/ML applications. What is the two-word term for the centralized repository of data (the center of the wheel in the diagram shown)?

Data Lake

Prof Eicher used the LinkedIn image below to explain that we shouldn't skip (two words)

Data Management

What is at the top of the Aiken Pyramid? (two words)

Data Science

What is the pricing model that enables AWS customers to pay for resources on an as-needed basis? A) Pay as you eat B) Pay as you buy C) Pay as you reserve D) Pay as you decommission E) Pay as you go

E) Pay as you go

A data scientist writes a program that applies rule-based logic to customer profile data to determine which customers are likely to respond to an advertisement. How would you categorize this type of analysis? A) Machine Learning (ML) B) Preventative analytics C) Megonigal analytics D) Artificial Intelligence (AI) E) Traditional analytics

E) Traditional analytics

As AWS grows, the cost of doing business is reduced and savings are passed back to the customer with lower pricing. What is this optimization called?

Economies of scale

Which AWS support plan offers a Technical Account Manager (TAM)?

Enterprise

By default, AWS replicates your data and resources across Availability Zones for residency. True or false

False

Edge locations are only located in the same general area as Regions. True or false

False

To receive the discounted rate associated with reserved instances, you must make a full, upfront payment for the term of the agreement. True or false

False

Unlimited services are available with the AWS Free Tier to new AWS customers for 12 months following their AWS sign-up date. True or false

False

Select all true statements about AWS data centers.

It is the responsibility of AWS to keep the data centers secure
Data centers consist of a large number of physical servers
Data center locations are not disclosed, and all access to them is restricted
A data center is the location where the actual data resides

Data is the new _______. It's raw, and it needs to be refined, protected, and managed.

Oil

What are the benefits of using AWS Organizations?

Provides the ability to create groups of accounts and then attach policies to a group
Simplifies automating account creation and management by using APIs

Which statement is true about the pricing model on AWS?

Storage is typically charged per gigabyte

What does TCO stand for?

Total Cost of Ownership

Availability Zones within a Region are connected over low-latency links.

True

Networking, storage, compute, and databases are examples of service categories that AWS offers.

True

Regions are isolated from one another; resources in one Region are not automatically replicated to other Regions.

True

Match the AWS service with its corresponding use in the consumption layer. 1. Business Intelligence (BI) 2. Interactive SQL 3. Machine Learning ____Redshift ____Athena ____SageMaker ____QuickSight

__2__ Redshift __2__ Athena __3__ SageMaker __1__ QuickSight

Match the AWS service with its primary purpose. 1. AWS CloudTrail 2. AWS CloudWatch 3. AWS KMS 4. AWS IAM ____ Create and manage cryptographic keys ____ Monitoring and observability service ____ Logging service ____ Share and control access to your AWS resources for individuals & groups

__3__ Create and manage cryptographic keys __2__ Monitoring and observability service __1__ Logging service __4__ Share and control access to your AWS resources for individuals & groups

Put the following data zones in order from "beginning" (1) to "end" (4). ____Trusted ____Raw ____Landing ____Curated

__3__Trusted __2__Raw __1__Landing __4__Curated

AWS Global Infrastructure is _______, which means it has built-in component redundancy that enables it to continue operation despite a failed component.

fault tolerant

