AWS Certified Machine Learning 2

A specialist in machine learning is trying to construct a linear regression model. Given only the residual plot provided, what is the MOST LIKELY cause of the model's failure? A. Linear regression is inappropriate. The residuals do not have constant variance. B. Linear regression is inappropriate. The underlying data has outliers. C. Linear regression is appropriate. The residuals have a zero mean. D. Linear regression is appropriate. The residuals have constant variance.

A. Linear regression is inappropriate. The residuals do not have constant variance.

Which of the following metrics should a Machine Learning Specialist often utilize when comparing and evaluating machine learning classification models? A. Recall B. Misclassification rate C. Mean absolute percentage error (MAPE) D. Area Under the ROC Curve (AUC)

D. Area Under the ROC Curve (AUC)

A Machine Learning Specialist is enabling Amazon SageMaker to provide simultaneous access to notebooks, model training, and endpoint deployment by numerous Data Scientists. To guarantee optimal operational performance, the Specialist must be able to monitor the frequency with which the Scientists deploy models, the GPU and CPU use of deployed SageMaker endpoints, and any issues that occur when an endpoint is called. Which services are linked with Amazon SageMaker for the purpose of tracking this data? (Select two.) A. AWS CloudTrail B. AWS Health C. AWS Trusted Advisor D. Amazon CloudWatch E. AWS Config

A. AWS CloudTrail D. Amazon CloudWatch

A data ingestion solution for the organization's Amazon S3-based data lake is required by a machine learning specialist working for an online fashion company. The Specialist wishes to develop a set of ingestion mechanisms that will allow the following future capabilities: ✑ Real-time analytics ✑ Interactive analytics of historical data ✑ Clickstream analytics ✑ Product recommendations Which services should be used by the Specialist? A. AWS Glue as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for real-time data insights; Amazon Kinesis Data Firehose for delivery to Amazon ES for clickstream analytics; Amazon EMR to generate personalized product recommendations B. Amazon Athena as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for near-real-time data insights; Amazon Kinesis Data Firehose for clickstream analytics; AWS Glue to generate personalized product recommendations C. AWS Glue as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for historical data insights; Amazon Kinesis Data Firehose for delivery to Amazon ES for clickstream analytics; Amazon EMR to generate personalized product recommendations D. Amazon Athena as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for historical data insights; Amazon DynamoDB streams for clickstream analytics; AWS Glue to generate personalized product recommendations

A. AWS Glue as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for real-time data insights; Amazon Kinesis Data Firehose for delivery to Amazon ES for clickstream analytics; Amazon EMR to generate personalized product recommendations

Using a dataset of 100 continuous numerical characteristics, a Data Scientist is developing a model to predict customer attrition. The Marketing department has offered no guidance on which characteristics are significant for churn prediction. The Marketing department wants to interpret the model and determine the direct effect of important characteristics on the model's output. While training a logistic regression model, the Data Scientist notices a significant difference in the accuracy of the training and validation sets. Which techniques may the Data Scientist use to enhance the model's performance and meet the Marketing team's requirements? (Select two.) A. Add L1 regularization to the classifier B. Add features to the dataset C. Perform recursive feature elimination D. Perform t-distributed stochastic neighbor embedding (t-SNE) E. Perform linear discriminant analysis

A. Add L1 regularization to the classifier C. Perform recursive feature elimination
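
A minimal, hedged sketch of the two selected techniques with scikit-learn on synthetic churn-like data (dataset shape and hyperparameters are illustrative assumptions, not taken from the question):

```python
# Illustrative sketch: L1-regularized logistic regression and recursive feature
# elimination (RFE) on synthetic data with 100 continuous features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=100, n_informative=10, random_state=42)

# The L1 (lasso) penalty drives the weights of uninformative features toward zero,
# which regularizes the model and keeps the remaining coefficients interpretable.
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
l1_model.fit(X, y)
print("non-zero coefficients:", (l1_model.coef_ != 0).sum())

# RFE repeatedly drops the weakest features, leaving a small, interpretable subset.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=10)
rfe.fit(X, y)
print("selected feature indices:", [i for i, keep in enumerate(rfe.support_) if keep])
```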

An expert in machine learning (ML) wishes to secure calls to the Amazon SageMaker Service API. The expert has created an Amazon Virtual Private Cloud (VPC) with a VPC interface endpoint for the Amazon SageMaker Service API and is trying to protect traffic from particular instances and IAM users. A single public subnet is set up in the VPC. Which combination of procedures should the machine learning professional take to ensure traffic security? (Select two.) A. Add a VPC endpoint policy to allow access to the IAM users. B. Modify the users' IAM policy to allow access to Amazon SageMaker Service API calls only. C. Modify the security group on the endpoint network interface to restrict access to the instances. D. Modify the ACL on the endpoint network interface to restrict access to the instances. E. Add a SageMaker Runtime VPC endpoint interface to the VPC.

A. Add a VPC endpoint policy to allow access to the IAM users. C. Modify the security group on the endpoint network interface to restrict access to the instances.

A financial services firm wants to make Amazon SageMaker its primary data science environment. On sensitive financial data, the company's data scientists run machine learning (ML) models. The organization is concerned about data egress and wants a machine learning engineer to secure the environment. Which methods can the machine learning engineer use to control data egress from SageMaker? (Select three.) A. Connect to SageMaker by using a VPC interface endpoint powered by AWS PrivateLink. B. Use SCPs to restrict access to SageMaker. C. Disable root access on the SageMaker notebook instances. D. Enable network isolation for training jobs and models. E. Restrict notebook presigned URLs to specific IPs used by the company. F. Protect data with encryption at rest and in transit. Use AWS Key Management Service (AWS KMS) to manage encryption keys.

A. Connect to SageMaker by using a VPC interface endpoint powered by AWS PrivateLink. D. Enable network isolation for training jobs and models. F. Protect data with encryption at rest and in transit. Use AWS Key Management Service (AWS KMS) to manage encryption keys.

A security-conscious company's machine learning specialist is creating a dataset for model training. The dataset is hosted in Amazon S3 and includes Personally Identifiable Information (PII). The dataset: ✑ Must be accessible from a VPC only. ✑ Must not traverse the public internet. How can these criteria be met? A. Create a VPC endpoint and apply a bucket access policy that restricts access to the given VPC endpoint and the VPC. B. Create a VPC endpoint and apply a bucket access policy that allows access from the given VPC endpoint and an Amazon EC2 instance. C. Create a VPC endpoint and use Network Access Control Lists (NACLs) to allow traffic between only the given VPC endpoint and an Amazon EC2 instance. D. Create a VPC endpoint and use security groups to restrict access to the given VPC endpoint and an Amazon EC2 instance

A. Create a VPC endpoint and apply a bucket access policy that restricts access to the given VPC endpoint and the VPC.

A Data Scientist is required to develop a serverless ingestion and analytics solution for real-time streaming data at high speeds. Without sacrificing data integrity, the ingestion process must buffer and transform incoming records from JSON to a query-optimized, columnar format. The output datastore must be highly available, and analysts must be able to query the data using SQL and link to pre-existing business intelligence dashboards. Which solution should the Data Scientist construct in order to meet the requirements? A. Create a schema in the AWS Glue Data Catalog of the incoming data format. Use an Amazon Kinesis Data Firehose delivery stream to stream the data and transform the data to Apache Parquet or ORC format using the AWS Glue Data Catalog before delivering to Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena, and connect to BI tools using the Athena Java Database Connectivity (JDBC) connector. B. Write each JSON record to a staging location in Amazon S3. Use the S3 Put event to trigger an AWS Lambda function that transforms the data into Apache Parquet or ORC format and writes the data to a processed data location in Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena, and connect to BI tools using the Athena Java Database Connectivity (JDBC) connector. C. Write each JSON record to a staging location in Amazon S3. Use the S3 Put event to trigger an AWS Lambda function that transforms the data into Apache Parquet or ORC format and inserts it into an Amazon RDS PostgreSQL database. Have the Analysts query and run dashboards from the RDS database. D. Use Amazon Kinesis Data Analytics to ingest the streaming data and perform real-time SQL queries to convert the records to Apache Parquet before delivering to Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena and connect to BI tools using the Athena Java Database Connectivity (JDBC) connector.

A. Create a schema in the AWS Glue Data Catalog of the incoming data format. Use an Amazon Kinesis Data Firehose delivery stream to stream the data and transform the data to Apache Parquet or ORC format using the AWS Glue Data Catalog before delivering to Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena, and connect to BI tools using the Athena Java Database Connectivity (JDBC) connector.

An online retailer wants to provide a new cloud-based product suggestion tool to its web application. Due to data localization requirements, sensitive data must remain on-premises and the product recommendation model must be trained and evaluated using only nonsensitive data. IPsec is required for data transport to the cloud. The web application is hosted on-premises and all data is stored in a PostgreSQL database. Each day, the organization needs the data securely sent to Amazon S3 for model retraining. How should a professional in machine learning achieve these requirements? A. Create an AWS Glue job to connect to the PostgreSQL DB instance. Ingest tables without sensitive data through an AWS Site-to-Site VPN connection directly into Amazon S3. B. Create an AWS Glue job to connect to the PostgreSQL DB instance. Ingest all data through an AWS Site-to-Site VPN connection into Amazon S3 while removing sensitive data using a PySpark job. C. Use AWS Database Migration Service (AWS DMS) with table mapping to select PostgreSQL tables with no sensitive data through an SSL connection. Replicate data directly into Amazon S3. D. Use PostgreSQL logical replication to replicate all data to PostgreSQL in Amazon EC2 through AWS Direct Connect with a VPN connection. Use AWS Glue to move data from Amazon EC2 to Amazon S3.

A. Create an AWS Glue job to connect to the PostgreSQL DB instance. Ingest tables without sensitive data through an AWS Site-to-Site VPN connection directly into Amazon S3.

This graph plots the training and validation loss for a neural network versus the number of epochs. The following network is being trained: ✑ Two dense layers, one output neuron ✑ 100 neurons in each layer ✑ 100 epochs Which strategy may be utilized to increase the accuracy of the model in the validation set? A. Early stopping B. Random initialization of weights with appropriate seed C. Increasing the number of epochs D. Adding another layer with 100 neurons

A. Early stopping

A Machine Learning Specialist is employed by a multinational cybersecurity firm that handles real-time security events for businesses worldwide. The cybersecurity firm wants to develop a system that would enable it to employ machine learning to classify dangerous events as anomalies in data as it is consumed. Additionally, the corporation wishes to save the findings in its data lake for subsequent processing and analysis. Which method is the MOST EFFECTIVE for completing these tasks? A. Ingest the data using Amazon Kinesis Data Firehose, and use Amazon Kinesis Data Analytics Random Cut Forest (RCF) for anomaly detection. Then use Kinesis Data Firehose to stream the results to Amazon S3. B. Ingest the data into Apache Spark Streaming using Amazon EMR, and use Spark MLlib with k-means to perform anomaly detection. Then store the results in an Apache Hadoop Distributed File System (HDFS) using Amazon EMR with a replication factor of three as the data lake. C. Ingest the data and store it in Amazon S3. Use AWS Batch along with the AWS Deep Learning AMIs to train a k-means model using TensorFlow on the data in Amazon S3. D. Ingest the data and store it in Amazon S3. Have an AWS Glue job that is triggered on demand transform the new data. Then use the built-in Random Cut Forest (RCF) model within Amazon SageMaker to detect anomalies in the data.

A. Ingest the data using Amazon Kinesis Data Firehose, and use Amazon Kinesis Data Analytics Random Cut Forest (RCF) for anomaly detection. Then use Kinesis Data Firehose to stream the results to Amazon S3.

A machine learning expert is currently working on a bespoke video recommendation model for an application. The dataset used to train this model is quite huge, including millions of data points that are stored in an Amazon S3 bucket. The Specialist wants to avoid putting all of this data into an Amazon SageMaker notebook instance since doing so would take hours and would surpass the notebook instance's associated 5 GB Amazon EBS volume. Which strategy enables the Specialist to train the model using all available data? A. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode. B. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to the instance. Train on a small amount of the data to verify the training code and hyperparameters. Go back to Amazon SageMaker and train using the full dataset C. Use AWS Glue to train a model using a small subset of the data to confirm that the data will be compatible with Amazon SageMaker. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode. D. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to train the full dataset.

A. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.
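
A hedged sketch of launching a SageMaker training job in Pipe input mode with the SageMaker Python SDK; the image URI, role ARN, instance type, and S3 paths are placeholders, not values from the question:

```python
# Sketch only: Pipe mode streams training data from S3 to the container instead of
# downloading it to the instance's EBS volume first.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()

estimator = Estimator(
    image_uri="<training-image-uri>",                        # placeholder container image
    role="arn:aws:iam::123456789012:role/SageMakerRole",     # placeholder execution role
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    input_mode="Pipe",                                       # stream rather than copy the dataset
    sagemaker_session=session,
)

# The full dataset stays in S3 and is streamed to the training job at run time.
estimator.fit({"train": TrainingInput("s3://my-bucket/training-data/", input_mode="Pipe")})
```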

A machine learning specialist is now developing a comprehensive Bayesian network on a dataset describing New York City's public transportation. One of the random variables is discrete, and it reflects the amount of time that New Yorkers wait for a bus, provided that buses run every ten minutes on average. Which probability distribution should the machine learning specialist use as a prior for this variable? A. Poisson distribution B. Uniform distribution C. Normal distribution D. Binomial distribution

A. Poisson distribution

A machine learning specialist is currently developing a logistic regression model to predict whether or not an individual would buy pizza. The Specialist is attempting to construct the best model with the best classification threshold. Which model assessment approach should the Specialist use to determine the effect of various classification thresholds on the model's performance? A. Receiver operating characteristic (ROC) curve B. Misclassification rate C. Root Mean Square Error (RMSE) D. L1 norm

A. Receiver operating characteristic (ROC) curve

A logistics firm requires a forecast model to anticipate the inventory requirements for a single item across ten warehouses for the next month. A machine learning expert uses Amazon Forecast to create a forecast model using three years' worth of monthly data. There are no data gaps. The expert determines the algorithm to use to train a predictor. The predictor's mean absolute percentage error (MAPE) is much larger than the MAPE generated by the human forecasters currently in use. Which modifications to the CreatePredictor API call could improve the MAPE? (Select two.) A. Set PerformAutoML to true. B. Set ForecastHorizon to 4. C. Set ForecastFrequency to W for weekly. D. Set PerformHPO to true. E. Set FeaturizationMethodName to filling.

A. Set PerformAutoML to true. D. Set PerformHPO to true.

The graph depicted is from a forecasting model used to validate a time series. Using just the graph, what conclusion should a Machine Learning Specialist draw about the model's behavior? A. The model predicts both the trend and the seasonality well B. The model predicts the trend well, but not the seasonality. C. The model predicts the seasonality well, but not the trend. D. The model does not predict the trend or the seasonality well.

A. The model predicts both the trend and the seasonality well

A machine learning specialist is creating a daily ETL routine that includes various ETL operations. The workflow is comprised of the following steps: • Begin the procedure immediately upon data upload to Amazon S3. • Once all datasets are accessible in Amazon S3, start an ETL task to join the new datasets with the numerous terabyte-sized datasets already in Amazon S3. • Store the results of joining datasets in Amazon S3. • Notify the Administrator if one of the tasks fails. Which arrangement will satisfy these criteria? A. Use AWS Lambda to trigger an AWS Step Functions workflow to wait for dataset uploads to complete in Amazon S3. Use AWS Glue to join the datasets. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure. B. Develop the ETL workflow using AWS Lambda to start an Amazon SageMaker notebook instance. Use a lifecycle configuration script to join the datasets and persist the results in Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure. C. Develop the ETL workflow using AWS Batch to trigger the start of ETL jobs when data is uploaded to Amazon S3. Use AWS Glue to join the datasets in Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure. D. Use AWS Lambda to chain other Lambda functions to read and join the datasets in Amazon S3 as soon as the data is uploaded to Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.

A. Use AWS Lambda to trigger an AWS Step Functions workflow to wait for dataset uploads to complete in Amazon S3. Use AWS Glue to join the datasets. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.

A media corporation with a large collection of unlabeled photographs, text, audio, and video footage seeks to index its assets in order to enable the Research team to quickly identify relevant information. The firm wishes to use machine learning in order to expedite the work of its in-house researchers, who have minimal experience with machine learning. Which approach is the FASTEST for indexing the assets? A. Use Amazon Rekognition, Amazon Comprehend, and Amazon Transcribe to tag data into distinct categories/classes. B. Create a set of Amazon Mechanical Turk Human Intelligence Tasks to label all footage. C. Use Amazon Transcribe to convert speech to text. Use the Amazon SageMaker Neural Topic Model (NTM) and Object Detection algorithms to tag data into distinct categories/classes. D. Use the AWS Deep Learning AMI and Amazon EC2 GPU instances to create custom models for audio transcription and topic modeling, and use object detection to tag data into distinct categories/classes.

A. Use Amazon Rekognition, Amazon Comprehend, and Amazon Transcribe to tag data into distinct categories/classes.

A data scientist wants to utilize Amazon Forecast to develop a forecasting model for a retail company's inventory demand. The firm has provided a .csv file with historical inventory demand data for its items in an Amazon S3 bucket. The table below contains a representative sample of the dataset. How should the data scientist transform the data? A. Use ETL jobs in AWS Glue to separate the dataset into a target time series dataset and an item metadata dataset. Upload both datasets as .csv files to Amazon S3. B. Use a Jupyter notebook in Amazon SageMaker to separate the dataset into a related time series dataset and an item metadata dataset. Upload both datasets as tables in Amazon Aurora. C. Use AWS Batch jobs to separate the dataset into a target time series dataset, a related time series dataset, and an item metadata dataset. Upload them directly to Forecast from a local machine. D. Use a Jupyter notebook in Amazon SageMaker to transform the data into the optimized protobuf recordIO format. Upload the dataset in this format to Amazon S3.

A. Use ETL jobs in AWS Glue to separate the dataset into a target time series dataset and an item metadata dataset. Upload both datasets as .csv files to Amazon S3.

On Amazon S3, a financial services business is constructing a powerful serverless data lake. The data lake should be adaptable to changing circumstances and satisfy the following requirements: ✑ Support querying old and new data on Amazon S3 through Amazon Athena and Amazon Redshift Spectrum. ✑ Support event-driven ETL pipelines. ✑ Provide a quick and easy way to understand metadata. Which technique satisfies these criteria? A. Use an AWS Glue crawler to crawl S3 data, an AWS Lambda function to trigger an AWS Glue ETL job, and an AWS Glue Data catalog to search and discover metadata. B. Use an AWS Glue crawler to crawl S3 data, an AWS Lambda function to trigger an AWS Batch job, and an external Apache Hive metastore to search and discover metadata. C. Use an AWS Glue crawler to crawl S3 data, an Amazon CloudWatch alarm to trigger an AWS Batch job, and an AWS Glue Data Catalog to search and discover metadata. D. Use an AWS Glue crawler to crawl S3 data, an Amazon CloudWatch alarm to trigger an AWS Glue ETL job, and an external Apache Hive metastore to search and discover metadata.

A. Use an AWS Glue crawler to crawl S3 data, an AWS Lambda function to trigger an AWS Glue ETL job, and an AWS Glue Data catalog to search and discover metadata.

A Machine Learning Specialist works for an online shop that wants to perform analytics on each client visit using a machine learning pipeline. The data must be ingested at a rate of up to 100 transactions per second using Amazon Kinesis Data Streams, and the JSON data blob must be 100 KB in size. What is the MINIMUM number of shards that the Specialist should employ in Kinesis Data Streams to effectively ingest this data? A. 1 shard B. 10 shards C. 100 shards D. 1,000 shards

B. 10 shards
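
A quick back-of-the-envelope check of the answer, assuming the usual Kinesis Data Streams per-shard ingest limits of 1 MB/s or 1,000 records/s:

```python
# Shard sizing arithmetic for the question above.
import math

records_per_second = 100
record_size_kb = 100

ingest_mb_per_second = records_per_second * record_size_kb / 1000   # 10 MB/s of ingest
shards_for_throughput = math.ceil(ingest_mb_per_second / 1.0)        # 1 MB/s per shard -> 10
shards_for_records = math.ceil(records_per_second / 1000)            # 1,000 records/s per shard -> 1

print(max(shards_for_throughput, shards_for_records))                # 10 shards
```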

A data scientist is working on a binary classifier that will identify whether a patient has a certain ailment based on a sequence of test findings. The Data Scientist has information on 400 randomly chosen patients from the community. The illness affects 3% of the population. Which approach for cross-validation should the Data Scientist use? A. A k-fold cross-validation strategy with k=5 B. A stratified k-fold cross-validation strategy with k=5 C. A k-fold cross-validation strategy with k=5 and 3 repeats D. An 80/20 stratified split between training and validation

B. A stratified k-fold cross-validation strategy with k=5
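
An illustrative scikit-learn sketch of why stratification matters here: with only ~3% positives in 400 samples, stratified folds preserve the class ratio in every fold, which plain k-fold cannot guarantee. The data below is synthetic.

```python
# Stratified 5-fold cross-validation on a highly imbalanced synthetic dataset.
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
y = rng.choice([0, 1], size=400, p=[0.97, 0.03])   # ~3% disease prevalence
X = rng.normal(size=(400, 10))                      # placeholder test results

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # Each validation fold receives a proportional share of the rare positive class.
    print(f"fold {fold}: positives in validation = {y[val_idx].sum()}")
```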

A Machine Learning Specialist works for a credit card processing firm and is responsible for predicting in near-real time whether transactions are fraudulent. The Specialist must specifically train a model that provides the chance that a certain transaction is fraudulent. How should the Specialist frame this business problem? A. Streaming classification B. Binary classification C. Multi-category classification D. Regression classification

B. Binary classification

A Machine Learning Specialist is developing a technique to increase a company's sales. The goal is to leverage the massive quantity of data the corporation has on users' behavior and product preferences to forecast which things consumers would like based on their similarities to other users. What actions should the Specialist take to accomplish this objective? A. Build a content-based filtering recommendation engine with Apache Spark ML on Amazon EMR B. Build a collaborative filtering recommendation engine with Apache Spark ML on Amazon EMR. C. Build a model-based filtering recommendation engine with Apache Spark ML on Amazon EMR D. Build a combinative filtering recommendation engine with Apache Spark ML on Amazon EMR

B. Build a collaborative filtering recommendation engine with Apache Spark ML on Amazon EMR.

A Machine Learning Specialist is putting a bespoke ResNet model into a Docker container in order to train the model using Amazon SageMaker. The Specialist is training the model on Amazon EC2 P3 instances and wants to set up the Docker container effectively to take advantage of the NVIDIA GPUs. What should the Specialist do? A. Bundle the NVIDIA drivers with the Docker image. B. Build the Docker container to be NVIDIA-Docker compatible. C. Organize the Docker container's file structure to execute on GPU instances. D. Set the GPU flag in the Amazon SageMaker CreateTrainingJob request body.

B. Build the Docker container to be NVIDIA-Docker compatible.

A health care business intends to categorize X-ray pictures into normal and pathological categories using neural networks. The labeled data is separated into a 1,000-image training set and a 200-image test set. Initial training of a neural network model with 50 hidden layers resulted in a 99 percent accuracy on the training set but only a 55% accuracy on the test set. What modifications should the Specialist consider in order to resolve this situation? (Select three.) A. Choose a higher number of layers B. Choose a lower number of layers C. Choose a smaller learning rate D. Enable dropout E. Include all the images from the test set in the training set F. Enable early stopping

B. Choose a lower number of layers D. Enable dropout F. Enable early stopping
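
A hedged Keras sketch of the selected fixes: a shallower network, dropout between dense layers, and early stopping on validation loss. The input shape, layer sizes, and rates are illustrative assumptions, not values from the question.

```python
# Sketch: a smaller network with dropout, trained with early stopping to curb overfitting.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(224, 224, 1)),    # assumed X-ray input shape
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),                           # randomly drop units during training
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),         # normal vs. abnormal
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)
# model.fit(train_images, train_labels, validation_split=0.2,
#           epochs=100, callbacks=[early_stop])   # placeholder arrays not defined here
```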

A firm operates a vast number of factories and maintains a complicated supply chain connection in which an unexpected breakdown of a machine might result in the suspension of operations at multiple plants. A data scientist wishes to examine factory sensor data in order to detect equipment in need of preventative maintenance and then deploy a repair crew to avoid unscheduled downtime. A single machine's sensor data may include up to 200 data points, including temperatures, voltages, vibrations, RPMs, and pressure measurements. The firm has installed Wi-Fi and LANs across the plants to capture this sensor data. Although many industrial sites lack stable or high-speed internet access, the manufacturer wants to retain near-real-time inference capabilities. Which model deployment architecture will satisfy these business requirements? A. Deploy the model in Amazon SageMaker. Run sensor data through this model to predict which machines need maintenance. B. Deploy the model on AWS IoT Greengrass in each factory. Run sensor data through this model to infer which machines need maintenance. C. Deploy the model to an Amazon SageMaker batch transformation job. Generate inferences in a daily batch report to identify machines that need maintenance. D. Deploy the model in Amazon SageMaker and use an IoT rule to write data to an Amazon DynamoDB table. Consume a DynamoDB stream from the table with an AWS Lambda function to invoke the endpoint.

B. Deploy the model on AWS IoT Greengrass in each factory. Run sensor data through this model to infer which machines need maintenance.

A Machine Learning Specialist is given a TensorFlow project and is required to work for an extended time without access to Wi-Fi. Which strategy should the Specialist use in order to continue working? A. Install Python 3 and boto3 on their laptop and continue the code development using that environment. B. Download the TensorFlow Docker container used in Amazon SageMaker from GitHub to their local environment, and use the Amazon SageMaker Python SDK to test the code. C. Download TensorFlow from tensorflow.org to emulate the TensorFlow kernel in the SageMaker environment. D. Download the SageMaker notebook to their local environment, then install Jupyter Notebooks on their laptop and continue the development in a local notebook.

B. Download the TensorFlow Docker container used in Amazon SageMaker from GitHub to their local environment, and use the Amazon SageMaker Python SDK to test the code.

A machine learning specialist has developed a deep learning neural network model that performs well on training data but fails miserably on test data. Which of the following should the Specialist examine in order to rectify this? (Select three.) A. Decrease regularization. B. Increase regularization. C. Increase dropout. D. Decrease dropout. E. Increase feature combinations. F. Decrease feature combinations.

B. Increase regularization. C. Increase dropout. F. Decrease feature combinations.

A machine learning specialist is currently developing a model for recognizing the make and model of automobiles in photographs. The Specialist wants to use transfer learning with a model that has already been trained on images of common objects. The Specialist has compiled a large bespoke collection of images of several car makes and models. What should the Specialist do to re-train the model with the custom data? A. Initialize the model with random weights in all layers including the last fully connected layer. B. Initialize the model with pre-trained weights in all layers and replace the last fully connected layer. C. Initialize the model with random weights in all layers and replace the last fully connected layer. D. Initialize the model with pre-trained weights in all layers including the last fully connected layer.

B. Initialize the model with pre-trained weights in all layers and replace the last fully connected layer.
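
A hedged Keras sketch of option B: load ImageNet-pretrained weights into every layer, drop the original classification head, and attach a new fully connected layer sized for the custom car classes (the class count below is an assumption for illustration).

```python
# Transfer learning sketch: pre-trained backbone, new fully connected output layer.
import tensorflow as tf

NUM_CAR_CLASSES = 196   # assumed number of make/model classes

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # optionally freeze the pre-trained layers for the first training phase

# Replace the original 1,000-class ImageNet head with a head for the car dataset.
outputs = tf.keras.layers.Dense(NUM_CAR_CLASSES, activation="softmax")(base.output)
model = tf.keras.Model(inputs=base.input, outputs=outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```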

A machine learning specialist is running a linear least squares regression model on a 1,000-record dataset with 50 features. Prior to training, the machine learning specialist observes that two features are perfectly linearly dependent on one another. Why could this be an issue for the linear least squares regression model? A. It could cause the backpropagation algorithm to fail during training B. It could create a singular matrix during optimization, which fails to define a unique solution C. It could modify the loss function during optimization, causing it to fail during training D. It could introduce non-linear dependencies within the data, which could invalidate the linear assumptions of the model

B. It could create a singular matrix during optimization, which fails to define a unique solution
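
A tiny numpy demonstration of the issue: when one feature is an exact linear function of another, the Gram matrix X^T X is rank-deficient, so the normal equations have no unique solution.

```python
# Demonstration: a perfectly collinear feature makes X^T X singular (rank-deficient).
import numpy as np

x1 = np.random.rand(1000)
x2 = 3.0 * x1                     # perfectly linearly dependent on x1
X = np.column_stack([x1, x2])

gram = X.T @ X
print("rank of X^T X:", np.linalg.matrix_rank(gram))   # 1 instead of 2
print("condition number:", np.linalg.cond(gram))        # astronomically large (near-singular)
# With a singular Gram matrix, the least squares objective has infinitely many minimizers.
```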

A real estate firm wishes to develop a machine learning model capable of forecasting home values using a historical dataset. 32 features are included in the dataset. Which model is most appropriate for the business requirement? A. Logistic regression B. Linear regression C. K-means D. Principal component analysis (PCA)

B. Linear regression

A Machine Learning Specialist is assigned to a Fraud Detection team and is responsible for tuning an XGBoost model that performs correctly on test data but does not perform as expected on unknown data. The following table summarizes the existing parameters. Which parameters should the Specialist tune in order to prevent overfitting? A. Increase the max_depth parameter value. B. Lower the max_depth parameter value. C. Update the objective to binary:logistic. D. Lower the min_child_weight parameter value.

B. Lower the max_depth parameter value.
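
A hedged xgboost sketch showing the lever named in the answer: shallower trees (a lower max_depth) constrain model complexity and are a standard defense against overfitting. The data and the other parameter values are placeholders, not the values from the question's table.

```python
# Sketch: train an XGBoost binary classifier with a deliberately low max_depth.
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

params = {
    "objective": "binary:logistic",
    "max_depth": 3,          # lowered from a deeper setting to reduce overfitting
    "eta": 0.1,
    "eval_metric": "auc",
}
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train(params, dtrain, num_boost_round=100)
```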

A machine learning specialist is developing a new natural language processing program that will parse 1 million words from a dataset. The objective is to then run Word2Vec to produce sentence embeddings and allow various forms of predictions. The following is an excerpt from the dataset: "The quck BROWN FOX leaps over the sluggish dog." Which of the following actions does the Specialist need to execute in order to properly clean and prepare the data in a repeatable manner? (Select three.) A. Perform part-of-speech tagging and keep the action verb and the nouns only. B. Normalize all words by making the sentence lowercase. C. Remove stop words using an English stopword dictionary. D. Correct the typography on "quck" to "quick." E. One-hot encode all words in the sentence. F. Tokenize the sentence into words.

B. Normalize all words by making the sentence lowercase. C. Remove stop words using an English stopword dictionary. F. Tokenize the sentence into words.
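
A minimal, dependency-free sketch of the three selected steps applied to the excerpt: lowercase the text, tokenize it into words, and drop stop words. The stop-word set here is a tiny illustrative subset of a real English stop-word dictionary.

```python
# Lowercase -> tokenize -> remove stop words, applied to the dataset excerpt.
import re

STOP_WORDS = {"the", "over", "a", "an", "and", "of", "to", "in"}   # illustrative subset

def preprocess(sentence: str) -> list[str]:
    lowered = sentence.lower()                          # normalize case
    tokens = re.findall(r"[a-z']+", lowered)            # tokenize into words
    return [t for t in tokens if t not in STOP_WORDS]   # drop stop words

print(preprocess("The quck BROWN FOX leaps over the sluggish dog."))
# ['quck', 'brown', 'fox', 'leaps', 'sluggish', 'dog']
```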

A Machine Learning Specialist has developed a proof of concept for a client using a small data set, and is now prepared to construct an end-to-end solution in AWS using Amazon SageMaker. Amazon RDS is used to store the past training data. Which technique should the Specialist use while training a model on such data? A. Write a direct connection to the SQL database within the notebook and pull data in B. Push the data from Microsoft SQL Server to Amazon S3 using an AWS Data Pipeline and provide the S3 location within the notebook. C. Move the data to Amazon DynamoDB and set up a connection to DynamoDB within the notebook to pull data in. D. Move the data to Amazon ElastiCache using AWS DMS and set up a connection within the notebook to pull data in for fast access.

B. Push the data from Microsoft SQL Server to Amazon S3 using an AWS Data Pipeline and provide the S3 location within the notebook.

A machine learning professional is running an Amazon SageMaker endpoint on a P3 instance and using the built-in object identification algorithm to make real-time predictions in a production application. When the expert examines the model's resource consumption, he or she sees that the model is only using a portion of the GPU. Which architectural change would maximize the use of the provisioned resources? A. Redeploy the model as a batch transform job on an M5 instance. B. Redeploy the model on an M5 instance. Attach Amazon Elastic Inference to the instance. C. Redeploy the model on a P3dn instance. D. Deploy the model onto an Amazon Elastic Container Service (Amazon ECS) cluster using a P3 instance.

B. Redeploy the model on an M5 instance. Attach Amazon Elastic Inference to the instance.

A machine learning specialist created a deep learning model for picture categorization. The Specialist, however, encountered an overfitting issue, with training and testing accuracies of 99% and 75%, respectively. How should the Specialist approach this situation, and what is the underlying cause? A. The learning rate should be increased because the optimization process was trapped at a local minimum. B. The dropout rate at the flatten layer should be increased because the model is not generalized enough. C. The dimensionality of dense layer next to the flatten layer should be increased because the model is not complex enough. D. The epoch number should be increased because the optimization process was terminated before it reached the global minimum.

B. The dropout rate at the flatten layer should be increased because the model is not generalized enough.

A big consumer goods firm is now selling the following items: • 34 different toothpaste variants • 48 different toothbrush variants • 43 different mouthwash variants Amazon S3 stores the complete sales history of all of these goods. Currently, the firm forecasts demand for these items using custom-built autoregressive integrated moving average (ARIMA) models. The corporation wants to forecast demand for a newly announced product. Which of the following solutions should a Machine Learning Specialist implement? A. Train a custom ARIMA model to forecast demand for the new product. B. Train an Amazon SageMaker DeepAR algorithm to forecast demand for the new product. C. Train an Amazon SageMaker k-means clustering algorithm to forecast demand for the new product. D. Train a custom XGBoost model to forecast demand for the new product.

B. Train an Amazon SageMaker DeepAR algorithm to forecast demand for the new product.

Using a study dataset, a Machine Learning team trains an Apache MXNet handwritten digit classification model using Amazon SageMaker. The team wants to be notified when the model becomes overfit. Auditors wish to inspect the Amazon SageMaker log activity report to guarantee no unauthorized API calls have occurred. What should the Machine Learning team do to ensure that the criteria are met with the fewest lines of code and procedures possible? A. Implement an AWS Lambda function to log Amazon SageMaker API calls to Amazon S3. Add code to push a custom metric to Amazon CloudWatch. Create an alarm in CloudWatch with Amazon SNS to receive a notification when the model is overfitting. B. Use AWS CloudTrail to log Amazon SageMaker API calls to Amazon S3. Add code to push a custom metric to Amazon CloudWatch. Create an alarm in CloudWatch with Amazon SNS to receive a notification when the model is overfitting. C. Implement an AWS Lambda function to log Amazon SageMaker API calls to AWS CloudTrail. Add code to push a custom metric to Amazon CloudWatch. Create an alarm in CloudWatch with Amazon SNS to receive a notification when the model is overfitting. D. Use AWS CloudTrail to log Amazon SageMaker API calls to Amazon S3. Set up Amazon SNS to receive a notification when the model is overfitting

B. Use AWS CloudTrail to log Amazon SageMaker API calls to Amazon S3. Add code to push a custom metric to Amazon CloudWatch. Create an alarm in CloudWatch with Amazon SNS to receive a notification when the model is overfitting.
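
A hedged boto3 sketch of the custom-metric half of the answer: push an overfitting-style metric to CloudWatch from the training code, then alarm on it through SNS. The namespace, metric name, threshold, and topic ARN are illustrative choices, not AWS-defined values.

```python
# Sketch: publish a custom CloudWatch metric and alarm on it via SNS.
import boto3

cloudwatch = boto3.client("cloudwatch")

# Training code pushes the gap between training and validation accuracy each epoch.
cloudwatch.put_metric_data(
    Namespace="MXNetTraining",
    MetricData=[{"MetricName": "TrainValidationAccuracyGap", "Value": 0.12}],
)

# Alarm notifies an SNS topic when the gap suggests overfitting.
cloudwatch.put_metric_alarm(
    AlarmName="mxnet-overfitting",
    Namespace="MXNetTraining",
    MetricName="TrainValidationAccuracyGap",
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0.10,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ml-team-alerts"],  # placeholder topic ARN
)
```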

A marketing manager of a pet insurance business intends to conduct a focused social media marketing campaign in order to increase client acquisition. Currently, the following data is stored in Amazon Aurora: ✑ Profiles for all past and existing customers ✑ Profiles for all past and existing insured pets ✑ Policy-level information ✑ Premiums received ✑ Claims paid How could a machine learning model be used to detect prospective new clients on social media? A. Use regression on customer profile data to understand key characteristics of consumer segments. Find similar profiles on social media B. Use clustering on customer profile data to understand key characteristics of consumer segments. Find similar profiles on social media C. Use a recommendation engine on customer profile data to understand key characteristics of consumer segments. Find similar profiles on social media. D. Use a decision tree classifier engine on customer profile data to understand key characteristics of consumer segments. Find similar profiles on social media.

B. Use clustering on customer profile data to understand key characteristics of consumer segments. Find similar profiles on social media

A Data Scientist is tasked with the responsibility of migrating an on-premises ETL process to the cloud. The present procedure is scheduled to run on a regular basis and makes use of PySpark to consolidate and format many huge data sources into a single consolidated output for downstream processing. The Data Scientist has been instructed to provide the cloud solution with the following requirements: ✑ Combine multiple data sources. ✑ Reuse existing PySpark logic. ✑ Run the solution on the existing schedule. ✑ Minimize the number of servers that will need to be managed. Which architecture should the Data Scientist use for this solution's construction? A. Write the raw data to Amazon S3. Schedule an AWS Lambda function to submit a Spark step to a persistent Amazon EMR cluster based on the existing schedule. Use the existing PySpark logic to run the ETL job on the EMR cluster. Output the results to a "processed" location in Amazon S3 that is accessible for downstream use. B. Write the raw data to Amazon S3. Create an AWS Glue ETL job to perform the ETL processing against the input data. Write the ETL job in PySpark to leverage the existing logic. Create a new AWS Glue trigger to trigger the ETL job based on the existing schedule. Configure the output target of the ETL job to write to a "processed" location in Amazon S3 that is accessible for downstream use. C. Write the raw data to Amazon S3. Schedule an AWS Lambda function to run on the existing schedule and process the input data from Amazon S3. Write the Lambda logic in Python and implement the existing PySpark logic to perform the ETL process. Have the Lambda function output the results to a "processed" location in Amazon S3 that is accessible for downstream use. D. Use Amazon Kinesis Data Analytics to stream the input data and perform real-time SQL queries against the stream to carry out the required transformations within the stream. Deliver the output results to a "processed" location in Amazon S3 that is accessible for downstream use.

B. Write the raw data to Amazon S3. Create an AWS Glue ETL job to perform the ETL processing against the input data. Write the ETL job in PySpark to leverage the existing logic. Create a new AWS Glue trigger to trigger the ETL job based on the existing schedule. Configure the output target of the ETL job to write to a "processed" location in Amazon S3 that is accessible for downstream use.

A Machine Learning Specialist is tasked with determining the optimal SageMakerVariantInvocationsPerInstance setting for an endpoint's automated scaling configuration. The Specialist conducted a load test on a single instance and concluded that the maximum number of requests per second (RPS) without degrading service is around 20 RPS. Due to the fact that this is the Specialist's first deployment, he wishes to set the invocation safety factor to 0.5. What should the Specialist put as the SageMakerVariantInvocationsPerInstance setting based on the aforementioned parameters and the fact that the invocations per instance setting is monitored on a per-minute basis? A. 10 B. 30 C. 600 D. 2,400

C. 600. SageMakerVariantInvocationsPerInstance = MAX_RPS × SAFETY_FACTOR × 60 = 20 × 0.5 × 60 = 600, using the AWS-recommended safety factor of 0.5.

A Machine Learning Specialist must be capable of ingesting streaming data and storing it in Apache Parquet files for analysis and exploration.Which of the following services would correctly ingest and store this data? A. AWS DMS B. Amazon Kinesis Data Streams C. Amazon Kinesis Data Firehose D. Amazon Kinesis Data Analytics

C. Amazon Kinesis Data Firehose

A Machine Learning Specialist is using an Amazon SageMaker notebook instance inside a company VPC's private subnet. The ML Specialist needs to create a snapshot of the Amazon SageMaker notebook instance's Amazon EBS volume. The ML Specialist, however, is unable to locate the Amazon SageMaker notebook instance's EBS volume or Amazon EC2 instance inside the VPC. Why is the machine learning specialist unable to view the instance in the VPC? A. Amazon SageMaker notebook instances are based on the EC2 instances within the customer account, but they run outside of VPCs. B. Amazon SageMaker notebook instances are based on the Amazon ECS service within customer accounts. C. Amazon SageMaker notebook instances are based on EC2 instances running within AWS service accounts. D. Amazon SageMaker notebook instances are based on AWS ECS instances running within AWS service accounts.

C. Amazon SageMaker notebook instances are based on EC2 instances running within AWS service accounts.

A Machine Learning Specialist is tasked with the responsibility of establishing a long-running Amazon EMR cluster. There will be one master node, ten core nodes, and twenty task nodes in the EMR cluster. The Specialist will use Spot Instances in the EMR cluster to save money.Which nodes should be launched on Spot Instances by the Specialist? A. Master node B. Any of the core nodes C. Any of the task nodes D. Both core and task nodes

C. Any of the task nodes

The Machine Learning Specialist is developing a model to forecast future employment rates using a variety of economic variables. While analyzing the data, the Specialist observes that the magnitude of the input features varies significantly. The Specialist does not want the model to be dominated by features of greater magnitude. What should the Specialist do to prepare the data for model training? A. Apply quantile binning to group the data into categorical bins to keep any relationships in the data by replacing the magnitude with distribution. B. Apply the Cartesian product transformation to create new combinations of fields that are independent of the magnitude. C. Apply normalization to ensure each field will have a mean of 0 and a variance of 1 to remove any significant magnitude. D. Apply the orthogonal sparse bigram (OSB) transformation to apply a fixed-size sliding window to generate new features of a similar magnitude.

C. Apply normalization to ensure each field will have a mean of 0 and a variance of 1 to remove any significant magnitude.
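
An illustrative scikit-learn sketch of option C: standardize each economic indicator to zero mean and unit variance so no feature dominates purely because of its scale. The sample values are made up.

```python
# Standardize features to mean 0 and variance 1.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[120000.0, 3.2],      # e.g., raw payroll counts vs. an interest rate
              [135000.0, 2.9],
              [128000.0, 3.5]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.mean(axis=0))   # approximately 0 for every column
print(X_scaled.std(axis=0))    # approximately 1 for every column
```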

A Machine Learning Specialist uploads a dataset to an Amazon S3 bucket that is encrypted on the server side using AWS Key Management Service (KMS). How should the Machine Learning Specialist configure the Amazon SageMaker notebook instance so that it may access the same dataset stored in Amazon S3? A. Define security group(s) to allow all HTTP inbound/outbound traffic and assign those security group(s) to the Amazon SageMaker notebook instance. B. Configure the Amazon SageMaker notebook instance to have access to the VPC. Grant permission in the KMS key policy to the notebook's KMS role. C. Assign an IAM role to the Amazon SageMaker notebook with S3 read access to the dataset. Grant permission in the KMS key policy to that role. D. Assign the same KMS key used to encrypt data in Amazon S3 to the Amazon SageMaker notebook instance.

C. Assign an IAM role to the Amazon SageMaker notebook with S3 read access to the dataset. Grant permission in the KMS key policy to that role.

A major company's data science team use Amazon SageMaker notebooks to access data stored in Amazon S3 buckets. The IT Security team is worried that internet-enabled notebook instances provide a security risk by allowing malicious programs to breach data privacy. The organization requires that all instances remain inside a protected VPC with no access to the internet, and that all data communication traffic remains within the AWS network. What configuration should the Data Science team make to the notebook instance placement to suit these requirements? A. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Place the Amazon SageMaker endpoint and S3 buckets within the same VPC. B. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Use IAM policies to grant access to Amazon S3 and Amazon SageMaker. C. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Ensure the VPC has S3 VPC endpoints and Amazon SageMaker VPC endpoints attached to it. D. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Ensure the VPC has a NAT gateway and an associated security group allowing only outbound connections to Amazon S3 and Amazon SageMaker.

C. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Ensure the VPC has S3 VPC endpoints and Amazon SageMaker VPC endpoints attached to it.

A corporation is experiencing poor accuracy while training on Amazon SageMaker's default built-in picture categorization algorithm. The Data Science team wants to use an Inception neural network architecture rather than a ResNet one. Which of the following is the most effective way to do this? (Select two.) A. Customize the built-in image classification algorithm to use Inception and use this for model training. B. Create a support case with the SageMaker team to change the default image classification algorithm to Inception. C. Bundle a Docker container with TensorFlow Estimator loaded with an Inception network and use this for model training. D. Use custom code in Amazon SageMaker with TensorFlow Estimator to load the model with an Inception network, and use this for model training. E. Download and apt-get install the inception network code into an Amazon EC2 instance and use this instance as a Jupyter notebook in Amazon SageMaker.

C. Bundle a Docker container with TensorFlow Estimator loaded with an Inception network and use this for model training. D. Use custom code in Amazon SageMaker with TensorFlow Estimator to load the model with an Inception network, and use this for model training.

Amazon Textract is being used everyday by a corporation to extract textual data from thousands of scanned text-heavy legal documents. The firm utilizes this information to automatically process loan applications. Certain papers do not pass business validation and are returned to human reviewers for investigation. This action adds to the time required to process loan applications. What should the organization do to expedite loan application processing? A. Configure Amazon Textract to route low-confidence predictions to Amazon SageMaker Ground Truth. Perform a manual review on those words before performing a business validation. B. Use an Amazon Textract synchronous operation instead of an asynchronous operation. C. Configure Amazon Textract to route low-confidence predictions to Amazon Augmented AI (Amazon A2I). Perform a manual review on those words before performing a business validation. D. Use Amazon Rekognition's feature to detect text in an image to extract the data from scanned images. Use this information to process the loan applications.

C. Configure Amazon Textract to route low-confidence predictions to Amazon Augmented AI (Amazon A2I). Perform a manual review on those words before performing a business validation.

A machine learning expert wishes to integrate a bespoke algorithm into Amazon SageMaker. The Specialist runs the algorithm in a Docker container that Amazon SageMaker supports. How should the Specialist bundle the Docker container in order for Amazon SageMaker to properly start the training? A. Modify the bash_profile file in the container and add a bash command to start the training program B. Use CMD config in the Dockerfile to add the training program as a CMD of the image C. Configure the training program as an ENTRYPOINT named train D. Copy the training program to directory /opt/ml/train

C. Configure the training program as an ENTRYPOINT named train

A Machine Learning Specialist is developing a prediction model for a large number of variables by using linear models such as linear regression and logistic regression. The Specialist discovers that several characteristics are strongly associated with one another during exploratory data analysis. This may result in the model being unstable. What can be done to mitigate the negative effect of such a vast number of features? A. Perform one-hot encoding on highly correlated features. B. Use matrix multiplication on highly correlated features. C. Create a new feature space using principal component analysis (PCA) D. Apply the Pearson correlation coefficient.

C. Create a new feature space using principal component analysis (PCA)
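
A hedged scikit-learn sketch of option C: project the correlated features onto principal components, which are orthogonal by construction, and keep only enough components to retain most of the variance. Dimensions and the variance threshold are illustrative.

```python
# PCA on a standardized feature matrix to decorrelate and reduce the feature space.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.rand(500, 50)                     # placeholder feature matrix

X_std = StandardScaler().fit_transform(X)       # PCA is scale-sensitive, so standardize first
pca = PCA(n_components=0.95)                    # keep enough components for ~95% of the variance
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                          # fewer, mutually orthogonal features
print(pca.explained_variance_ratio_.sum())      # fraction of variance retained
```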

A medical imaging business wants to train a computer vision model to identify suspicious spots on CT images of patients. The firm has amassed a sizable collection of unlabeled CT scans that are associated with individual patients and kept in an Amazon S3 bucket. Access to the scans must be restricted to authorized users. A machine learning engineer is tasked with developing a labeling pipeline. Which sequence of stages should the engineer follow in order to construct the labeling pipeline with the LEAST amount of effort? A. Create a workforce with AWS Identity and Access Management (IAM). Build a labeling tool on Amazon EC2 Queue images for labeling by using Amazon Simple Queue Service (Amazon SQS). Write the labeling instructions. B. Create an Amazon Mechanical Turk workforce and manifest file. Create a labeling job by using the built-in image classification task type in Amazon SageMaker Ground Truth. Write the labeling instructions. C. Create a private workforce and manifest file. Create a labeling job by using the built-in bounding box task type in Amazon SageMaker Ground Truth. Write the labeling instructions. D. Create a workforce with Amazon Cognito. Build a labeling web application with AWS Amplify. Build a labeling workflow backend using AWS Lambda. Write the labeling instructions.

C. Create a private workforce and manifest file. Create a labeling job by using the built-in bounding box task type in Amazon SageMaker Ground Truth. Write the labeling instructions.

A gaming business has introduced an online game in which players may sign up for free but must pay to access certain features. The organization must develop an automated system that can forecast if a new user will convert to a premium subscriber within a year. The business has compiled a labeled collection of data from one million consumers. The training dataset contains 1,000 positive samples (from users who paid within a year) and 999,000 negative samples (from users who never paid). Each data sample contains 200 attributes about the user, such as their age, device, location, and play behaviors. The Data Science team constructed a random forest model on this dataset, which converged to above 99% accuracy on the training set. However, the prediction accuracy on a test dataset was insufficient. Which of the following strategies should the Data Science team use to address this issue? (Select two.) A. Add more deep trees to the random forest to enable the model to learn more features. B. Include a copy of the samples in the test dataset in the training dataset. C. Generate more positive samples by duplicating the positive samples and adding a small amount of noise to the duplicated data. D. Change the cost function so that false negatives have a higher impact on the cost value than false positives. E. Change the cost function so that false positives have a higher impact on the cost value than false negatives.

C. Generate more positive samples by duplicating the positive samples and adding a small amount of noise to the duplicated data. D. Change the cost function so that false negatives have a higher impact on the cost value than false positives.

An online reseller has a huge, multi-column dataset that is missing 30% of the data in one column. According to a Machine Learning Specialist, some columns in the dataset may be utilized to recreate the missing data. Which reconstruction technique should the Specialist utilize to maintain the dataset's integrity? A. Listwise deletion B. Last observation carried forward C. Multiple imputation D. Mean substitution

C. Multiple imputation
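
A hedged scikit-learn sketch of multiple-imputation-style reconstruction with IterativeImputer, which models each incomplete column as a function of the other columns. A full multiple-imputation workflow would repeat this with different random seeds and pool the results; the tiny matrix below is purely illustrative.

```python
# Model-based imputation of missing values from the other columns.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables the import below)
from sklearn.impute import IterativeImputer

X = np.array([[25.0, 3.0, 100.0],
              [32.0, np.nan, 140.0],
              [47.0, 7.0, np.nan],
              [51.0, 8.0, 210.0]])

imputer = IterativeImputer(random_state=0, sample_posterior=True)  # sampling mimics multiple imputation
print(imputer.fit_transform(X))
```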

A business wants to categorize user behavior as fraudulent or normal. A Machine Learning Specialist wants to develop a binary classifier based on two features: account age and transaction month. The graphic shown illustrates the class distribution of these characteristics. Which model, based on this information, would have the HIGHEST recall rate in relation to the fraudulent class? A. Decision tree B. Linear support vector machine (SVM) C. Naive Bayesian classifier D. Single Perceptron with sigmoidal activation function

C. Naive Bayesian classifier

A business operates a machine learning prediction service that produces 100 terabytes of forecasts daily. A Machine Learning Specialist must create a visualization of the daily precision-recall curve derived from the predictions and provide it to the Business team in a read-only format. Which method involves the LEAST work in terms of coding? A. Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3. Give the Business team read-only access to S3. B. Generate daily precision-recall data in Amazon QuickSight, and publish the results in a dashboard shared with the Business team. C. Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3. Visualize the arrays in Amazon QuickSight, and publish them in a dashboard shared with the Business team. D. Generate daily precision-recall data in Amazon ES, and publish the results in a dashboard shared with the Business team.

C. Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3. Visualize the arrays in Amazon QuickSight, and publish them in a dashboard shared with the Business team.
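
Whichever service handles the plotting, the underlying precision-recall data can be computed with scikit-learn; the labels and scores below are hypothetical placeholders for values that would really be aggregated from the daily prediction output in Amazon S3:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical ground-truth labels and model scores for one day of predictions.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.05, 0.65])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Save the (recall, precision) pairs; a BI tool such as QuickSight can plot them as a line chart.
np.savetxt("precision_recall.csv",
           np.column_stack([recall, precision]),
           delimiter=",", header="recall,precision", comments="")
```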

A Machine Learning Specialist is developing a convolutional neural network (CNN) to classify ten different animal species. The Specialist has built a series of layers in a neural network that takes an image of an animal as input, passes it through a succession of convolutional and pooling layers, and then passes it through a dense, fully connected layer with ten nodes. The Specialist wants the neural network to output a probability distribution describing how likely the input image is to belong to each of the ten classes. Which function produces the desired output? A. Dropout B. Smooth L1 loss C. Softmax D. Rectified linear units (ReLU)

C. Softmax
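
A minimal NumPy sketch of the softmax function, showing that it turns the ten raw outputs of the final dense layer into a probability distribution that sums to 1 (the logits below are made up for illustration):

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into a probability distribution that sums to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw outputs of the 10-node dense layer for one input image.
logits = np.array([2.0, 1.0, 0.1, 0.5, 3.2, 0.0, 1.5, 0.3, 0.7, 2.4])
probs = softmax(logits)
print(probs, probs.sum())   # ten class probabilities summing to 1.0
```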

A business wants to classify user behavior as either fraudulent or normal. A Machine Learning Specialist will build a binary classifier based on two features: the age of the account, denoted by x, and the month of the transaction, denoted by y. The class distributions are shown in the accompanying figure; the positive class is shown in red and the negative class in black. Which model would be the MOST accurate? A. Linear support vector machine (SVM) B. Decision tree C. Support vector machine (SVM) with a radial basis function kernel D. Single perceptron with a Tanh activation function

C. Support vector machine (SVM) with a radial basis function kernel
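
A small scikit-learn sketch comparing a linear SVM with an RBF-kernel SVM on synthetic, non-linearly separable data; the concentric-circles dataset is only a stand-in for the figure that accompanies the question:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Hypothetical data where no straight line separates the two classes.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)

linear_svm = SVC(kernel="linear")
rbf_svm = SVC(kernel="rbf", gamma="scale")

# The RBF kernel maps the data into a space where a non-linear boundary becomes separable.
print("linear SVM accuracy:", cross_val_score(linear_svm, X, y, cv=5).mean())
print("RBF SVM accuracy:   ", cross_val_score(rbf_svm, X, y, cv=5).mean())
```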

A Machine Learning Specialist is preparing data for training on Amazon SageMaker using one of SageMaker's built-in algorithms. The dataset is stored in .CSV format and is converted to a numpy.array, which appears to be slowing down the training process. What should the Specialist do to optimize the data for SageMaker training? A. Use the SageMaker batch transform feature to transform the training data into a DataFrame. B. Use AWS Glue to compress the data into the Apache Parquet format. C. Transform the dataset into the RecordIO protobuf format. D. Use the SageMaker hyperparameter optimization feature to automatically optimize the data.

C. Transform the dataset into the RecordIO protobuf format.
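
A hedged sketch of the conversion using the write_numpy_to_dense_tensor helper from the SageMaker Python SDK; the file name, label-column position, and S3 bucket and key are assumptions made for illustration:

```python
import io
import numpy as np
import boto3
import sagemaker.amazon.common as smac   # helper shipped with the SageMaker Python SDK

# Hypothetical CSV already on disk; assume the label is the first column.
data = np.loadtxt("train.csv", delimiter=",", dtype=np.float32)
labels = data[:, 0]
features = data[:, 1:]

# Serialize to the RecordIO-wrapped protobuf format that built-in algorithms stream efficiently.
buf = io.BytesIO()
smac.write_numpy_to_dense_tensor(buf, features, labels)
buf.seek(0)

# Upload the protobuf file for training (bucket and prefix are placeholders).
boto3.client("s3").upload_fileobj(buf, "my-training-bucket", "train/data.protobuf")
```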

A credit card company wants to build a credit scoring model that will help it determine whether a new credit card applicant will default on a credit card payment. The company gathered data from a variety of sources and extracted hundreds of raw features. Early experiments with training a classification model showed that many features are highly correlated, that the large number of features significantly slows down training, and that there are some overfitting concerns. The Data Scientist working on this project wants to speed up model training without losing too much information from the original dataset. Which feature engineering technique should the Data Scientist use to accomplish these objectives? A. Run self-correlation on all features and remove highly correlated features B. Normalize all numerical values to be between 0 and 1 C. Use an autoencoder or principal component analysis (PCA) to replace original features with new features D. Cluster raw data using k-means and use sample data from each cluster to build a new dataset

C. Use an autoencoder or principal component analysis (PCA) to replace original features with new features
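
A brief scikit-learn sketch of option C using PCA: the synthetic matrix below imitates hundreds of highly correlated raw features, and n_components=0.95 keeps just enough principal components to retain roughly 95% of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in for hundreds of correlated raw features:
# 20 independent signals plus 180 noisy linear combinations of them.
base = rng.normal(size=(5_000, 20))
X = np.hstack([base,
               base @ rng.normal(size=(20, 180)) + rng.normal(scale=0.05, size=(5_000, 180))])

# Scale, then keep enough components to retain ~95% of the variance.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.95, random_state=0))
X_reduced = pipeline.fit_transform(X)
print(X.shape, "->", X_reduced.shape)   # far fewer, decorrelated features
```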

A city wants to monitor its air quality in order to mitigate the effects of air pollution. A Machine Learning Specialist needs to forecast the city's air quality, measured as pollutants in parts per million (ppm), for the next two days. Because this is a prototype, only daily data from the previous year is available. Which model is MOST LIKELY to provide the best results in Amazon SageMaker? A. Use the Amazon SageMaker k-Nearest-Neighbors (kNN) algorithm on the single time series consisting of the full year of data with a predictor_type of regressor. B. Use Amazon SageMaker Random Cut Forest (RCF) on the single time series consisting of the full year of data. C. Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of regressor. D. Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of classifier.

C. Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of regressor.
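
The gist of option C is to frame the single time series as a supervised regression problem with lagged features. The sketch below does this locally with scikit-learn's LinearRegression as a stand-in for SageMaker Linear Learner in regressor mode; the synthetic readings, lag count, and two-day horizon are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical daily pollutant readings (ppm) for the past year.
rng = np.random.default_rng(0)
ppm = pd.Series(50 + 10 * np.sin(np.arange(365) * 2 * np.pi / 365) + rng.normal(0, 2, 365))

# Frame as regression: the 7 readings ending 2 days earlier predict today's value.
lags, horizon = 7, 2
X = np.column_stack([ppm.shift(i) for i in range(horizon, horizon + lags)])
y = ppm.values
mask = ~np.isnan(X).any(axis=1)   # drop rows whose lags fall before the series starts

model = LinearRegression().fit(X[mask], y[mask])

# Forecast two days beyond the last observed day using the most recent 7 readings.
latest = ppm.values[-lags:][::-1].reshape(1, -1)
print("forecast for two days ahead (ppm):", model.predict(latest)[0])
```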

A Machine Learning Specialist is deciding whether to build a naive Bayesian model or a full Bayesian network for a classification problem. The Specialist computes the Pearson correlation coefficients between each pair of features and finds that their absolute values range from 0.1 to 0.95. Which model best describes the underlying data in this situation? A. A naive Bayesian model, since the features are all conditionally independent. B. A full Bayesian network, since the features are all conditionally independent. C. A naive Bayesian model, since some of the features are statistically dependent. D. A full Bayesian network, since some of the features are statistically dependent.

D. A full Bayesian network, since some of the features are statistically dependent.

A Data Scientist is training a multilayer perceptron (MLP) on a multiclass dataset. Although the target class of interest is distinct from the other classes in the dataset, the model does not achieve an acceptable recall score. The Data Scientist has already experimented with changing the number and size of the MLP's hidden layers, but the results have not improved significantly. A solution that improves recall must be implemented as quickly as possible. Which technique should be used to meet these requirements? A. Gather more data using Amazon Mechanical Turk and then retrain B. Train an anomaly detection model instead of an MLP C. Train an XGBoost model instead of an MLP D. Add class weights to the MLP's loss function and then retrain

D. Add class weights to the MLP's loss function and then retrain
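
A hedged Keras sketch of option D: the class_weight argument scales the loss for the under-recalled class so its mistakes are penalized more heavily. The synthetic dataset, architecture, and the weight of 5.0 assigned to class 2 are all illustrative assumptions:

```python
import numpy as np
import tensorflow as tf

# Hypothetical multiclass dataset where class 2 is the under-recalled target class.
rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 20)).astype("float32")
y = rng.integers(0, 4, size=5_000)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Penalize mistakes on the target class (label 2) more heavily than on the others.
model.fit(X, y, epochs=5, batch_size=64,
          class_weight={0: 1.0, 1: 1.0, 2: 5.0, 3: 1.0}, verbose=0)
```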

A data scientist is building a pipeline to ingest streaming web traffic data. As part of the pipeline, the data scientist must implement a technique for identifying anomalous web traffic patterns. The patterns will be used downstream for alerting and incident response. The data scientist has access to unlabeled historical data to use, if needed. The solution must do the following: ✑ Calculate an anomaly score for each web traffic entry. ✑ Adapt the detection of unusual events to changing web patterns over time. Which approach should the data scientist use to meet these requirements? A. Use historic web traffic data to train an anomaly detection model using the Amazon SageMaker Random Cut Forest (RCF) built-in model. Use an Amazon Kinesis Data Stream to process the incoming web traffic data. Attach a preprocessing AWS Lambda function to perform data enrichment by calling the RCF model to calculate the anomaly score for each record. B. Use historic web traffic data to train an anomaly detection model using the Amazon SageMaker built-in XGBoost model. Use an Amazon Kinesis Data Stream to process the incoming web traffic data. Attach a preprocessing AWS Lambda function to perform data enrichment by calling the XGBoost model to calculate the anomaly score for each record. C. Collect the streaming data using Amazon Kinesis Data Firehose. Map the delivery stream as an input source for Amazon Kinesis Data Analytics. Write a SQL query to run in real time against the streaming data with the k-Nearest Neighbors (kNN) SQL extension to calculate anomaly scores for each record using a tumbling window. D. Collect the streaming data using Amazon Kinesis Data Firehose. Map the delivery stream as an input source for Amazon Kinesis Data Analytics. Write a SQL query to run in real time against the streaming data with the Amazon Random Cut Forest (RCF) SQL extension to calculate anomaly scores for each record using a sliding window.

D. Collect the streaming data using Amazon Kinesis Data Firehose. Map the delivery stream as an input source for Amazon Kinesis Data Analytics. Write a SQL query to run in real time against the streaming data with the Amazon Random Cut Forest (RCF) SQL extension to calculate anomaly scores for each record using a sliding window.

The chief editor of a product catalog has asked the research and development team to build a machine learning system that determines whether or not people in a set of images are wearing the company's retail brand. The team has a set of training data. Which machine learning approach should the researchers use to best meet these requirements? A. Latent Dirichlet Allocation (LDA) B. Recurrent neural network (RNN) C. K-means D. Convolutional neural network (CNN)

D. Convolutional neural network (CNN)
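
A minimal Keras sketch of a CNN for the binary "wearing the brand / not wearing it" task; the 128x128 input size and layer widths are arbitrary placeholders, not a tuned architecture:

```python
import tensorflow as tf

# Illustrative CNN: convolution and pooling layers extract visual features,
# and a sigmoid output gives the probability that the brand is present.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```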

An interactive online dictionary wants to add a widget that displays words used in similar contexts. A Machine Learning Specialist must provide word features for the nearest-neighbor model that powers the widget. What should the Specialist do to meet these requirements? A. Create one-hot word encoding vectors. B. Produce a set of synonyms for every word using Amazon Mechanical Turk. C. Create word embedding vectors that store edit distance with every other word. D. Download word embeddings pre-trained on a large corpus.

D. Download word embeddings pre-trained on a large corpus.
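
A short sketch of option D using gensim's downloader to load publicly available pre-trained GloVe vectors (model name valid at the time of writing); the nearest-neighbor widget then reduces to a cosine-similarity lookup in the embedding space:

```python
import gensim.downloader as api

# Download pre-trained GloVe embeddings trained on a large corpus.
vectors = api.load("glove-wiki-gigaword-100")

# Words used in similar contexts end up close together in the vector space.
print(vectors.most_similar("dictionary", topn=5))
```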

A manufacturing company has a large set of labeled historical sales data. The manufacturer would like to predict how many units of a particular part should be produced each quarter. Which machine learning approach should be used to solve this problem? A. Logistic regression B. Random Cut Forest (RCF) C. Principal component analysis (PCA) D. Linear regression

D. Linear regression
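
A tiny scikit-learn sketch of option D on made-up quarterly sales data; the feature columns and figures are hypothetical and only illustrate framing the demand forecast as a regression problem:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical labeled historical sales data.
history = pd.DataFrame({
    "quarter_index": [1, 2, 3, 4, 5, 6, 7, 8],
    "promo_spend":   [10, 12, 9, 15, 11, 14, 13, 16],
    "open_orders":   [200, 230, 180, 260, 210, 250, 240, 275],
    "units_sold":    [1_050, 1_180, 960, 1_320, 1_100, 1_270, 1_230, 1_400],
})

# Fit a linear regression that predicts the continuous target: units to produce.
model = LinearRegression()
model.fit(history[["quarter_index", "promo_spend", "open_orders"]], history["units_sold"])

next_quarter = pd.DataFrame({"quarter_index": [9], "promo_spend": [15], "open_orders": [280]})
print("forecast units:", model.predict(next_quarter)[0])
```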

A Data Engineer needs to build a model using a dataset that contains customer credit card information. How can the Data Engineer ensure that the data remains encrypted and the credit card information is secure? A. Use a custom encryption algorithm to encrypt the data and store the data on an Amazon SageMaker instance in a VPC. Use the SageMaker DeepAR algorithm to randomize the credit card numbers. B. Use an IAM policy to encrypt the data on the Amazon S3 bucket and Amazon Kinesis to automatically discard credit card numbers and insert fake credit card numbers. C. Use an Amazon SageMaker launch configuration to encrypt the data once it is copied to the SageMaker instance in a VPC. Use the SageMaker principal component analysis (PCA) algorithm to reduce the length of the credit card numbers. D. Use AWS KMS to encrypt the data on Amazon S3 and Amazon SageMaker, and redact the credit card numbers from the customer data with AWS Glue.

D. Use AWS KMS to encrypt the data on Amazon S3 and Amazon SageMaker, and redact the credit card numbers from the customer data with AWS Glue.
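
A hedged boto3 sketch of the S3 half of option D: uploading the raw file with SSE-KMS server-side encryption. The bucket name, object key, local file, and KMS key alias are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Upload the raw data encrypted at rest with a customer-managed KMS key.
with open("customers.csv", "rb") as f:
    s3.put_object(
        Bucket="example-credit-data-bucket",   # placeholder bucket
        Key="raw/customers.csv",               # placeholder key
        Body=f,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="alias/example-data-key",  # placeholder KMS key alias
    )
```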

A Data Scientist is building a machine learning model to predict future health outcomes based on data about each patient and their treatment plans. The model should predict a continuous value. The provided dataset contains labeled outcomes for 4,000 patients. The study examined a group of people over the age of 65 who have a particular condition that is known to worsen with age. Initial models have performed poorly. While reviewing the underlying data, the Data Scientist notices that, out of 4,000 patient observations, 450 have an age value of 0. The other features for these observations appear normal compared with the rest of the sample population. How should the Data Scientist correct this issue? A. Drop all records from the dataset where age has been set to 0. B. Replace the age field value for records with a value of 0 with the mean or median value from the dataset C. Drop the age feature from the dataset and train the model using the rest of the features. D. Use k-means clustering to handle missing features

D. Use k-means clustering to handle missing features
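
One way to read option D, sketched below with scikit-learn: treat age 0 as missing, cluster the patients on the remaining features with k-means, and impute each missing age with the mean observed age of its cluster. The tiny DataFrame and the cluster count are illustrative assumptions:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical patient data where age 0 marks a missing value.
df = pd.DataFrame({
    "age":            [72, 0, 68, 81, 0, 77],
    "bmi":            [24.1, 27.3, 22.8, 30.2, 26.5, 23.9],
    "blood_pressure": [130, 142, 125, 150, 138, 128],
})

# Cluster patients on the complete features (everything except age).
features_without_age = df.drop(columns=["age"])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(features_without_age)

# Impute each missing age with the mean age of observed patients in the same cluster.
observed = df[df["age"] != 0]
cluster_mean_age = observed.groupby("cluster")["age"].mean()
missing = df["age"] == 0
df.loc[missing, "age"] = df.loc[missing, "cluster"].map(cluster_mean_age)
df = df.drop(columns=["cluster"])
print(df)
```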

