AWS AI Practitioner
Initial steps to train an ML model on AWS
- Step 1: Upload the dataset to Amazon S3.
- Step 2: Create a training job in Amazon SageMaker.
- Step 3: Configure the training job to use the dataset from Amazon S3.
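The three steps can be sketched with the request shape of the SageMaker `CreateTrainingJob` API. This is a minimal sketch, not a complete setup. The bucket, role ARN, training image URI, and instance settings are hypothetical placeholders. The real AWS calls need credentials, so they appear only as comments.

```python
def build_training_job_request(job_name, s3_train_uri, s3_output_uri, role_arn, image_uri):
    """Return keyword arguments for SageMaker's CreateTrainingJob API (hypothetical values)."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "RoleArn": role_arn,
        # Step 3: point the job's "train" channel at the dataset uploaded in Step 1.
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": s3_train_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": s3_output_uri},
        "ResourceConfig": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 10},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

request = build_training_job_request(
    "demo-job",
    "s3://my-bucket/data/",                        # hypothetical bucket
    "s3://my-bucket/output/",
    "arn:aws:iam::123456789012:role/DemoRole",     # hypothetical role
    "example-training-image-uri",                  # hypothetical image URI
)
# With AWS credentials, the steps would run roughly like this:
# import boto3
# boto3.client("s3").upload_file("train.csv", "my-bucket", "data/train.csv")  # Step 1
# boto3.client("sagemaker").create_training_job(**request)                    # Steps 2 and 3
print(request["InputDataConfig"][0]["DataSource"]["S3DataSource"]["S3Uri"])
```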
foundation model
A large, pre-trained model that can be adapted for multiple tasks.
Retrieval Augmented Generation (RAG)
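A method for enhancing model output by retrieving relevant information from an external knowledge source (such as a vector database) and adding it to the prompt before the model generates a response.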
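A minimal sketch of the RAG flow: retrieve the most relevant document, then add it to the prompt. Here, simple word overlap stands in for embedding similarity. That is an assumption for illustration, since production systems typically use vector search. The helper names are hypothetical.

```python
def tokenize(text):
    """Lowercase word set with basic punctuation stripped."""
    return set(text.lower().replace(".", "").replace("?", "").split())

def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query (a stand-in for vector similarity)."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_rag_prompt(query, documents, k=1):
    """Augment the user's question with the retrieved context."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {query}"

docs = [
    "Amazon S3 is an object storage service.",
    "Amazon Polly converts text to speech.",
]
print(build_rag_prompt("Which service converts text to speech?", docs))
```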
chunking
A technique to split large inputs into smaller, manageable segments.
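A minimal sketch of fixed-size chunking by words, with a small overlap so that context spanning a boundary is not lost. The chunk size and overlap values are arbitrary assumptions. Real pipelines often chunk by tokens or sentences instead.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of up to chunk_size words; consecutive chunks share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last chunk reached the end of the text
            break
    return chunks

print(chunk_words("one two three four five six seven", chunk_size=4, overlap=1))
# ['one two three four', 'four five six seven']
```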
Amazon QuickSight
BI tool for dashboards and reports
Amazon SageMaker Canvas
Build ML models with no code by simply interacting with data and obtaining predictions. Best for nontechnical people.
AWS Cloud Adoption Framework
Six perspectives: Business, People, Governance, Platform, Security, Operations
Batch Inference
Use for processing large datasets offline without requiring a persistent endpoint.
Amazon Polly
Converts text to speech—important for voice-enabled applications.
Amazon SageMaker Model Cards
Document the lifecycle of ML models, including information on training data, performance metrics, and intended use cases.
A data scientist is tasked with ensuring that an AI system is designed in a way that prioritizes user trust and comprehension. What does human-centered design mean in this context?
Creating AI systems that prioritize human needs and understanding
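RLHF (reinforcement learning from human feedback) steps
Supervised fine-tuning, creating a reward model from human preference rankings, then optimizing the model against that reward model with reinforcement learning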
Amazon SageMaker
Crucial for building, training, and deploying ML models. You don't need to be an expert, but understanding SageMaker's features (like Clarify, Model Monitor, and Ground Truth) is essential.
key difference between data access control and data integrity
Data access control handles user authentication and authorization; data integrity ensures data is accurate and unchanged.
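A minimal sketch of the integrity side: record a checksum when data is stored, then verify it later. Access control (authentication and authorization, for example through IAM) is a separate mechanism and is not shown. SHA-256 is one common choice of checksum. The sample data is made up.

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()

def is_unchanged(data: bytes, expected_digest: str) -> bool:
    """True if the data still matches the digest recorded earlier."""
    return sha256_digest(data) == expected_digest

# Record a digest when the data is stored...
original = b"customer_id,balance\n1,100\n"
recorded = sha256_digest(original)

# ...and verify it later; any modification changes the digest.
print(is_unchanged(original, recorded))                         # True
print(is_unchanged(b"customer_id,balance\n1,900\n", recorded))  # False
```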
How best to tackle skewed examples in training data?
Data augmentation: create additional training examples from existing data, especially for underrepresented cases.
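A minimal sketch of one simple augmentation for numeric features: create extra minority-class examples by adding small random noise (jitter) to existing ones. The function name, noise level, and number of copies are illustrative assumptions. Images and text use other transformations, such as flips or paraphrases.

```python
import random

def augment_minority(samples, labels, minority_label, copies=2, noise=0.05, seed=0):
    """Return new lists with `copies` jittered duplicates of each minority-class sample added."""
    rng = random.Random(seed)
    new_samples, new_labels = list(samples), list(labels)
    for x, y in zip(samples, labels):
        if y != minority_label:
            continue
        for _ in range(copies):
            # Perturb each feature slightly so the new example is similar but not identical.
            new_samples.append([v + rng.gauss(0, noise) for v in x])
            new_labels.append(y)
    return new_samples, new_labels

X = [[1.0, 2.0], [1.1, 2.1], [5.0, 5.0]]
y = [0, 0, 1]
X_aug, y_aug = augment_minority(X, y, minority_label=1)
print(y_aug)  # [0, 0, 1, 1, 1]
```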
AWS Glue
Data cleaning service. An extract, transform, and load (ETL) service that can categorize, clean, and transform unstructured data, such as medical records, into a structured format.
Most important chatbot capabilities for ensuring regulatory compliance
Data protection, monitoring and threat detection
Why use Decision Trees over KNN and SVM?
Decision trees clearly illustrate how different factors influence the outcome, which makes their predictions easy to understand and interpret.
distinction between discriminative and generative models
Generative models create new data from learned patterns, while discriminative models distinguish between classes in the data.
Transparent Model
Gives insight into decision making
Significant challenges of generative AI
Hallucinations, toxicity, intellectual property, plagiarism, disruption of the nature of work, knowledge cutoff, lack of explainability, security and privacy concerns
Amazon Comprehend
Handles natural language processing (NLP) tasks like sentiment analysis and entity recognition. Finds insights and relationships in text.
Model Monitoring
Helps with Model Explainability, Detecting Drift, and the Model Update Pipeline
Underfitting
High bias and low variance. Performs poorly on both training and test data.
Human vs automatic evaluation
Human evaluation assesses qualitative aspects like interpretability, while automatic evaluation provides quantitative scores using metrics like F1 and BERTScore.
Amazon SageMaker Clarify
Helps make a model's predictions transparent and explainable to stakeholders. Can also help assess and mitigate bias before and after training.
Serverless Inference in Amazon SageMaker
Ideal for synchronous workloads with spiky traffic patterns that can tolerate latency variations
Amazon Bedrock
A fully managed service that makes high-performing foundation models (FMs) from leading AI startups and Amazon available through a common API. AWS's service (released in 2023) for building scalable AI applications. Expect questions on how Bedrock integrates with modern AI applications, especially generative AI.
Gen AI techniques typically used today?
LLMs and RAG
What factors directly influence the latency of a machine learning model?
Length of the input and output sequences
Examples of supervised learning algorithms
Linear Regression, Neural Network
Overfitting
Low bias and high variance. Performs well on training data but poorly on test data.
MLOps meaning
Machine Learning Operations
Which characteristics of a Generative AI system are relevant for creating personalized product recommendations?
Personalization, Data efficiency, Scalability
Amazon SageMaker Data Wrangler
Prepare and transform data. Data cleansing and handling
Amazon VPC Gateway Endpoint
Privately connects a VPC to Amazon S3 and DynamoDB without an internet gateway or VPN. Traffic stays within the AWS network.
Amazon SageMaker Inference Options
Real Time, Serverless, Asynchronous, Batch
What is a primary difference between real-time inference and batch inference?
Real-time inference provides immediate predictions with low latency, while batch inference processes large data volumes at once with higher latency.
What can evaluate the performance of a model for text summarization?
Recall-Oriented Understudy for Gisting Evaluation-N (ROUGE-N)
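A minimal sketch of ROUGE-N against a single reference: count the overlapping n-grams and divide by the number of n-grams in the reference. That ratio is the "recall-oriented" part of the name. Whitespace tokenization and the example sentences are simplifying assumptions. Library implementations also report precision and F-measure.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """Overlapping n-grams divided by total n-grams in the reference summary."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum((cand & ref).values())  # clipped counts: min of each n-gram's frequency
    return overlap / total

reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(round(rouge_n_recall(candidate, reference, 1), 3))  # 0.833 (ROUGE-1)
print(round(rouge_n_recall(candidate, reference, 2), 3))  # 0.6   (ROUGE-2)
```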
A legal firm has a large collection of scanned legal documents and case files in PDF format. They need a system to automate text extraction, identify key elements like tables and forms, and analyze image content to minimize manual intervention. Which AWS services will most efficiently meet the firm's needs with LEAST operational management?
Rekognition, Textract
Amazon RDS
Relational database service. Supports vector storage and similarity queries for ML model embeddings (for example, with the pgvector extension on RDS for PostgreSQL).
A machine learning engineer trained a model on AWS, but the training dataset included confidential information. How can the engineer ensure that the model does not generate inferences containing sensitive data?
Remove the confidential data from the training dataset, then retrain the model so that no traces of the sensitive content remain.
Which two AWS services can provide detailed documentation of the model's training and ensure its predictions are fully explainable for audits?
SageMaker Model Cards, SageMaker Clarify
Amazon Macie
Service that uses machine learning to discover, classify, and protect sensitive data automatically
Amazon SageMaker Feature Store
Share ML features across projects
Amazon S3
Simple Storage Service, a scalable, low-cost object storage service for storing and retrieving any amount of data, such as training datasets, backups, archives, and application files
Amazon OpenSearch Service
Storing embeddings in a vector database for efficient similarity searches
Types of machine learning
Supervised Learning, Unsupervised Learning, Reinforcement Learning
When must you purchase Provisioned Throughput in Amazon Bedrock?
To use a custom model for inference.
Few-shot prompting
Use a prompt that includes several examples of the task to guide the model in producing accurate responses.
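A minimal sketch of assembling a few-shot prompt: labeled examples come first, then the new input with its label left blank for the model to complete. The sentiment task and wording are illustrative assumptions. Passing an empty example list gives a zero-shot prompt.

```python
def few_shot_prompt(examples, query):
    """Build a prompt from (text, label) example pairs followed by the unlabeled query."""
    lines = ["Classify the sentiment of each review as Positive or Negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [("Great product!", "Positive"), ("Broke after a day.", "Negative")]
print(few_shot_prompt(examples, "Works perfectly."))
```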
Real-time Inference in Amazon SageMaker
Use for low-latency workloads with predictable traffic patterns that need consistent latency characteristics and are always available
AWS PrivateLink
Privately connects a VPC directly to AWS services without exposure to the public internet. Similar to a VPC gateway endpoint, but works with many more services.
How best to address concerns about data quality and model trustworthiness?
Veracity and Robustness, Explainability
Amazon SageMaker
Lets developers create, train, and deploy machine learning (ML) models. Released in 2017.
Context window
amount of text (measured in tokens) an AI model can process at once, including both the input and its response
Zero-shot prompting
asks the model to perform tasks without any prior examples
Amazon Bedrock Model Evaluation
assess, compare, and choose the most suitable foundation models
Intelligent Document Processing (IDP)
automates data processing using OCR, computer vision, NLP, and machine learning
Amazon SageMaker Model Registry
catalog models, manage model versions, and associate metadata with the models
AWS Security Hub
comprehensive view of your high-priority security alerts and compliance status
Embeddings
convert textual content into numerical vectors that represent data in a high-dimensional space
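A minimal sketch of how embeddings are compared: cosine similarity between vectors, plus a nearest-neighbor lookup like the one a vector store runs at scale. The 3-dimensional vectors are made-up toy values. Real embedding models output hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical toy embeddings: similar meanings point in similar directions.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "invoice": [0.0, 0.2, 0.95],
}

def nearest(query_vec, store):
    """Return the key whose embedding is most similar to the query vector."""
    return max(store, key=lambda k: cosine_similarity(query_vec, store[k]))

print(nearest([0.88, 0.12, 0.02], embeddings))
```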
Amazon Redshift
data warehouse service for big data analytics
model drift
degradation of model performance due to changes in the underlying data distribution
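One common way to quantify a shift in the data distribution is the population stability index (PSI). PSI compares the share of data in each bin at training time with the share seen in production. This sketch illustrates the idea and is not a description of how SageMaker Model Monitor computes drift. The bins, proportions, and the rough 0.2 "significant shift" threshold are assumptions (a rule of thumb).

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions given as lists of proportions."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        psi += (a - e) * math.log(a / e)
    return psi

baseline = [0.25, 0.25, 0.25, 0.25]  # feature distribution in the training data
current = [0.10, 0.20, 0.30, 0.40]   # same feature in recent production traffic
print(round(population_stability_index(baseline, baseline), 3))  # 0.0, no drift
print(round(population_stability_index(baseline, current), 3))   # 0.228, above the ~0.2 rule of thumb
```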
AWS Artifact
download and review compliance documents anytime
PartyRock
environment for experimenting with generative AI models
BERTScore
evaluation metric that uses contextual embeddings to compare generated text with a reference text, making it well-suited for assessing the semantic similarity of chatbot responses
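A minimal sketch of the BERTScore calculation, assuming the token embeddings have already been produced. In the real metric they come from a contextual model such as BERT. Here they are hypothetical toy vectors. Each token is greedily matched to its most similar token on the other side. Precision averages over candidate tokens, recall averages over reference tokens, and F1 combines the two. Optional IDF weighting and baseline rescaling are omitted.

```python
import math

def _cos(a, b):
    """Cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def bertscore_f1(cand_vecs, ref_vecs):
    """Greedy-matching BERTScore F1 from per-token embedding vectors."""
    precision = sum(max(_cos(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    recall = sum(max(_cos(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    return 2 * precision * recall / (precision + recall)

# Hypothetical 2-d token embeddings: the candidate covers only one of two reference tokens.
reference = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0]]
print(round(bertscore_f1(candidate, reference), 3))  # 0.667 (precision 1.0, recall 0.5)
```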
Chain-of-thought prompting
guide the reasoning process of a model in a logical sequence
Negative prompting
guiding a generative AI model to avoid certain topics or content
Data efficiency
how well a model can learn from limited data
Discriminative models
learn the boundary between classes
AWS CloudTrail
records user activity and API calls so you can log, audit, govern, and ensure compliance and security within your AWS account
Self-refine prompting
model iteratively solves a problem, critiques its own solution, and then revises the solution based on the critique
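A minimal sketch of the self-refine loop: generate a draft, ask for a critique, revise, and stop when the critique finds nothing to fix or a round limit is reached. The `generate`, `critique`, and `revise` callables are hypothetical stand-ins for prompts sent to the same model. Here they are trivial stubs so the loop can run.

```python
def self_refine(task, generate, critique, revise, max_rounds=3):
    """Iteratively improve a draft until the critic returns no feedback."""
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback is None:  # critic is satisfied
            break
        draft = revise(task, draft, feedback)
    return draft

# Hypothetical stubs standing in for model calls.
def generate(task):
    return "2 + 2 = 5"

def critique(task, draft):
    return "arithmetic error" if draft.endswith("5") else None

def revise(task, draft, feedback):
    return "2 + 2 = 4"

print(self_refine("Add 2 and 2", generate, critique, revise))  # 2 + 2 = 4
```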
Amazon Q in QuickSight
natural language query feature in Amazon QuickSight that enables users to ask questions in plain language and receive relevant visualizations and dashboards without the need for coding or complex querying.
Instruction-based fine-tuning
pre-trained foundation model is further trained with specific instructions to perform particular tasks
Amazon Bedrock Guardrails
set up protections for your AI applications to meet accuracy and regulatory compliance standards
Amazon Q
tailored for conversational AI applications. Can generate code snippets, manage reference tracking, and monitor open-source license compliance
Amazon Lex
used for building conversational interfaces like chatbots with voice and text.
AZ
Availability Zone: one or more discrete data centers within a Region, isolated from the other AZs in that Region to provide redundancy.
Prompt Leaking
AI unintentionally exposes hidden prompt content (such as system instructions) or information from previous queries
Data Poisoning
Malicious or misleading data is injected into the training dataset, causing the model to learn harmful, biased, or incorrect behavior
Which AWS service is primarily responsible for managing user access and permissions to secure AI resources?
AWS Identity and Access Management (IAM)
Amazon Rekognition
AWS service for image and video analysis, such as facial recognition, object detection, and image moderation tasks. Has a Content Moderation feature.
Amazon SageMaker JumpStart
Accelerate your AI journey with pre-trained models and pre-built solutions
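Why use accuracy over precision or recall?
Accuracy summarizes correctness across all predictions, while precision and recall each focus on one type of error (false positives and false negatives, respectively). Prefer accuracy when classes are balanced and both errors cost about the same; on imbalanced data it can be misleading.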
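The trade-off between these metrics is easiest to see from a confusion matrix. This minimal sketch computes accuracy, precision, recall, and F1 for a hypothetical imbalanced case: 1,000 transactions, 10 of them fraudulent, and a model that flags none. Accuracy looks excellent, while recall shows that the model misses every fraud case.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical imbalanced data: the model predicts "not fraud" for everything.
print(classification_metrics(tp=0, fp=0, fn=10, tn=990))
# accuracy 0.99, but precision, recall, and F1 are all 0.0
```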
AWS tools to evaluate a model's performance and integrate human review
Amazon SageMaker Model Monitor and Amazon A2I (Amazon Augmented AI)
Which AWS machine learning services can detect and read text from images?
Amazon Textract, Amazon Rekognition
Amazon Inspector
Analyze security of EC2 instances by identifying potential vulnerabilities
Best AWS vector search services
DocumentDB, OpenSearch, Neptune ML, Aurora (PostgreSQL-compatible), MemoryDB
Amazon Personalize
Easily generate personalized recommendations.
What does Amazon EFS stand for?
Elastic File System. When you need a traditional file system.
A data engineer uses an Amazon Bedrock base model to analyze chat interactions for customer support. To track model input and output data for monitoring, which strategy should the engineer implement?
Enable invocation logging directly in Amazon Bedrock.
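A sketch of the configuration payload for Bedrock model invocation logging, assuming the `loggingConfig` shape used by the Bedrock `PutModelInvocationLoggingConfiguration` API, as I understand it. Check field names against the current API reference before use. The bucket name and prefix are hypothetical. The actual call needs AWS credentials, so it appears only as a comment.

```python
def build_invocation_logging_config(bucket_name, key_prefix="bedrock-invocations/"):
    """Build a loggingConfig payload that delivers Bedrock invocation logs to S3."""
    return {
        # Where model inputs and outputs are written; the bucket policy must allow Bedrock to write.
        "s3Config": {"bucketName": bucket_name, "keyPrefix": key_prefix},
        # Chat analysis only needs text prompts and completions.
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }

config = build_invocation_logging_config("my-bedrock-logs")  # hypothetical bucket name
# With AWS credentials, the setting would be applied roughly like this:
# import boto3
# boto3.client("bedrock").put_model_invocation_logging_configuration(loggingConfig=config)
print(config)
```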
Best way to evaluate Amazon Bedrock models for business value?
Evaluate the models using a human workforce and custom prompt datasets.
Amazon Textract
Extract text and data from documents
Core principles of responsible AI
Fairness, Explainability, Privacy & Security, Veracity & Robustness, Controllability, Transparency, Governance, Safety
What is SageMaker Batch Transform best for?
Make predictions in batches where immediate access is not required.
Role of Agents within Amazon Bedrock
Managing complex AI workflows with multiple steps.
AWS Regions and AZs
Most AWS Regions have a minimum of three Availability Zones (AZs), though some exceptions exist.
Federal framework that outlines standards to protect the confidentiality, integrity, and availability of the data accessed by the AI.
NIST AI RMF (AI Risk Management Framework)
A financial technology startup is developing an innovative tool to predict stock market trends. The tool analyzes vast amounts of historical stock data to forecast future price movements. Which of the following statements accurately describes the neural networks in this financial application?
Neural networks are utilized as deep learning models that simulate the human brain's pattern recognition capabilities, learning from historical financial data to anticipate future stock market trends.
Best pricing model to pay only for what you consume?
On-Demand
