Microsoft AI-900: Generative AI Topics
Why define embeddings or contextual vectors? (transformer model)
To create a vocabulary that includes semantic relationships between tokens
What can be thought of as coordinates in a multidimensional space? (Transformer model)
The elements of a token's embedding vector; each token occupies a specific "location" in that space
What Azure OpenAI model would you use to convert text to numeric vectors for analysis? Ex. comparing sources of text for similarity
Embeddings
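Worked example (sketch): one common way to compare two embedding vectors is cosine similarity. The vectors below are made-up three-element toys, not output from any Azure OpenAI model, and the function name is only illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy, made-up embeddings; real ones have hundreds or thousands of elements.
dog = [10, 3, 1]
puppy = [10, 3, 2]
skateboard = [3, 1, 10]

# Semantically close texts should score higher than unrelated ones.
print(cosine_similarity(dog, puppy) > cosine_similarity(dog, skateboard))
```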
You can utilize ChatGPT's capabilities on Azure OpenAI via this model:
GPT-35-turbo
What does GitHub Copilot integrate with?
Integrates the power of OpenAI Codex into a plugin for developer environments like Visual Studio Code
What is the metaprompt and grounding layer? (generative AI)
Involves the construction of prompts submitted to the model
What approach is used to mitigate potential harms of a generative AI solution?
Layered approach
What is unique about the Codex model family?
More capable across more languages than GPT models
In practice, what technique uses different elements of the embeddings to calculate multiple attention scores?
Multi-head attention
What are vectors in an embedding?
Multi-valued numeric representations of information. Example: [10, 3, 1]
Does the attention layer of an encoder or decoder block work with numeric vector representations of the tokens or the actual text?
Numeric vector representations of the tokens
What does the system layer of a generative AI solution involve?
Platform-level configurations and capabilities
During training, what is the goal of the attention layer in a decoder block?
Predict the vector for the final token in the sequence based on preceding tokens
For language tokens, what does each element of a token's vector represent?
Semantic attribute of the token
What is a Large Language Model (LLM)?
Specialized type of ML model that can be used to perform NLP tasks
When editing an image using DALL-E, how do you indicate the area of the image you want to edit?
Specify a transparent mask
What does GitHub call GitHub Copilot?
The AI pair programmer
What type of architecture are LLMs based on?
Transformer
The encoder and decoder blocks in a transformer model include multiple layers. What is one of the types of layers used in both blocks?
attention layer
What are some image generation capabilities supported by Azure OpenAI Service?
generate and edit images
What can Codex be used for?
- Turn comments into code - Rewrite code for efficiency - Complete your next line of code - Bring knowledge to you (ex. API call for an app) - Add comments
What are some mitigation techniques used at the safety system layer?
1. Abuse detection algorithms 2. Alert notifications
What large language model capabilities are supported with Azure OpenAI Service?
1. Deploy 2. Customize 3. Host
List 4 steps in identifying harm (generative AI)
1. Identify harms 2. Prioritize harms based on: likelihood of occurrence and level of impact 3. Test and verify harms 4. Document and share verified harms
What are the 4 steps to responsible Generative AI solution?
1. Identify potential harms 2. Measure the presence and impact 3. Mitigate harms 4. Operate responsibly
What are some common AI workloads that Azure OpenAI supports?
1. Machine Learning 2. Computer Vision 3. NLP 4. Conversational AI 5. Anomaly detection 6. Knowledge Mining
LLMs based on Transformer architecture have proven to be successful in _______ & _______ .
1. Modeling vocabularies 2. Generating language
Generative AI workloads can create original content in a variety of formats. What are the 3 formats?
1. Natural language 2. Images 3. Code
As it relates to code generation, GPT models can take in the following types of inputs and translate them into code:
1. Natural language 2. Code snippets
Common types of harm related to generative AI:
1. Offensive or discriminatory harm 2. Factual inaccuracies 3. Content that encourages illegal or unethical practices
List 3 steps to measure potential harms (generative AI)
1. Prepare a diverse selection of inputs 2. Submit prompts to system 3. Apply pre-defined criteria to evaluate output (ex. "Harmful" or "Not harmful")
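Worked example (sketch): the three measurement steps can be pictured as a small loop. Both `generate` and `is_harmful` are hypothetical placeholders invented for this illustration; a real setup would call the deployed system and apply properly defined criteria.

```python
# Hypothetical stand-ins: neither function is a real API.
def generate(prompt):
    """Placeholder for the generative AI system under test."""
    canned = {
        "How do I bake bread?": "Mix flour, water, yeast and salt, then bake.",
        "How do I pick a lock?": "Here is how to break into a house...",
    }
    return canned.get(prompt, "I can't help with that.")

def is_harmful(output):
    """Placeholder criterion; a real one might be a classifier or human review."""
    return "break into" in output.lower()

# 1. Prepare a diverse selection of inputs.
prompts = ["How do I bake bread?", "How do I pick a lock?", "Tell me a joke."]

# 2. Submit each prompt to the system, 3. apply the pre-defined criteria to the output.
results = {p: ("Harmful" if is_harmful(generate(p)) else "Not harmful") for p in prompts}

# A simple baseline figure that later runs can be compared against.
harmful_rate = sum(label == "Harmful" for label in results.values()) / len(results)
print(results)
print(f"Baseline harmful rate: {harmful_rate:.2f}")
```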
With DALL-E, how are image variations created? 2 steps
1. Providing image 2. Specifying how many variations of the image you would like
What are the main factors to consider when trying to identify harm?
1. Specific services 2. Model used 3. Fine-tuning techniques 4. Grounding data
What are some mitigation techniques you can use at the metaprompt and grounding layer?
1. Specifying metaprompts or system inputs that define behavioral parameters 2. Applying prompt engineering to add grounding data 3. Retrieval augmented generation (RAG) to retrieve contextual data from trusted data sources and include it in prompts
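Worked example (sketch): a metaprompt plus retrieved grounding data can be combined into one prompt before it is sent to the model. The documents, the `retrieve` helper, and the keyword-overlap scoring are all assumptions made for illustration; real RAG solutions typically use embeddings or a search index.

```python
# Hypothetical trusted data source: a tiny in-memory "knowledge base".
DOCUMENTS = [
    "Contoso support hours are 9am to 5pm, Monday to Friday.",
    "Contoso refunds are processed within 14 days.",
]

# Metaprompt (system input) that defines behavioral parameters.
METAPROMPT = "You are a support assistant. Answer only from the provided context."

def retrieve(question, documents):
    """Naive retrieval: pick the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question):
    """Combine metaprompt, retrieved grounding data, and the user question."""
    context = retrieve(question, DOCUMENTS)
    return f"{METAPROMPT}\n\nContext: {context}\n\nQuestion: {question}"

print(build_prompt("How long do refunds take?"))
```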
What are some natural language generation capabilities supported by Azure OpenAI Service?
1. Text completion: generate & edit text 2. Embeddings: search, classify, and compare text
What are some overlapping capabilities between Azure AI Language Service and Azure OpenAI Service?
1. Translation 2. Sentiment Analysis 3. Keyword extraction
What are 2 excellent applications of generative pre-trained transformer (GPT) models?
1. Understanding language 2. Creating language
Editing an image with DALL-E, how are edits made? 3 steps
1. Uploading original image 2. Specifying a transparent mask that indicates what area of the image to edit 3. Prompt indicating what is to be edited
How many elements are in this vector? [10, 3, 1]
3
What is a characteristic of a higher performing generative AI model?
A model that has been fine-tuned for specific tasks
What does each numeric element in a vector represent?
A particular attribute of the information Example: [10, 3, 1] has 3 elements
What is a multi-valued numeric representation of information, for example [10, 3, 1], in which each numeric element represents a particular attribute of the information?
A vector
What is the 'magic' of the GPT-4 model?
A. Ability to string a coherent sentence together B. Large vocabulary
You would like to complete some code. Which generative AI model would you use? A. Codex B. GPT C. DALL-E
A. Codex
What is the best way to start using a Codex model? Start with a ________.
A. Comment B. Data C. Code
What are embeddings in a transformer model?
A. Contextual vectors B. Encapsulates semantic relationships between tokens to give meaning to words in the vocabulary
The ability to string a coherent sentence together does NOT imply that GPT-4 has any __________ or _________.
A. Knowledge B. Intelligence
What is a decoder block?
A. One of two components of a transformer model architecture B. Generates new language sequences
Which of Azure OpenAI's AI models can be trained and customized with fine-tuning?
All models
How are ChatGPT, OpenAI, and Azure OpenAI related? A. Azure OpenAI is Microsoft's version of ChatGPT, a chatbot that uses generative AI models. B. ChatGPT and OpenAI are chatbots that generate natural language, code, and images. Azure OpenAI provides access to these two chatbots. C. OpenAI is a research company that developed ChatGPT, a chatbot that uses generative AI models. Azure OpenAI provides access to many of OpenAI's AI models.
C. OpenAI is a research company that developed ChatGPT, a chatbot that uses generative AI models. Azure OpenAI provides access to many of OpenAI's AI models.
In the Azure OpenAI Studio, where can you find the assistant setup to instruct the model about how it should behave?
Chat playground
What is an example of an automated testing method to measure potential harm? (generative AI)
Classification model to evaluate output
In the Azure OpenAI Studio, where can you experiment with prompts, configure parameters, and see responses without having to code?
Completions playground
Which additional piece of information is included with each phrase returned by an image description task of the Azure AI Vision?
Confidence score. Tip: each phrase returned by an image description task of Azure AI Vision includes a confidence score. Bounding box coordinates are returned by services such as object detection, but not by image description.
What generative AI model works with images?
DALL-E
What happens as you continue to train a transformer model?
Each new token in the training text is added to the vocabulary with appropriate token IDs
Transformer model architecture consists of two components, or blocks. What are they?
1. Encoder block - creates semantic representations of training vocabulary 2. A decoder block - generates new language sequences
What 2 Azure OpenAI models would you use to generate natural language or code?
1. GPT-4 (latest) 2. GPT-3.5
What are some other Generative AI workloads that OpenAI supports?
1. Generating natural language 2. Generating code 3. Generating images
What's an example of a code prompt to generate a code response?
"Write a for loop counting from 1 to 10 in Python"
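Worked example (sketch): one plausible code response to that prompt is shown here. A model's actual output may differ in style or naming.

```python
# Count from 1 to 10, printing each number.
for number in range(1, 11):  # range's end is exclusive, so 11 stops at 10
    print(number)
```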
What are two ways to test a diverse selection of input prompts? (generative AI)
1. Automated Testing 2. Manual testing
OpenAI GPT models are proficient in over a dozen (coding) languages. What are some examples?
1. C# 2. JavaScript 3. Perl 4. PHP 5. Python (most capable)
What are some natural language tasks that GPT models are great at completing?
1. Classifying text 2. Summarizing text 3. Translation 4. Generating names or phrases 5. Suggesting content 6. Answering questions
What do GPT models empower developers to do?
1. Code faster 2. Understand new coding languages 3. Focus on solving bigger problems in their applications
What are some examples of natural language generation that an LLM can perform?
1. Determining sentiment or classifying natural language text 2. Summarizing text 3. Comparing text sources for semantic similarity 4. Generating new natural language
What can styles be used for when it comes to image generation?
1. Edits 2. Variations
Generative AI image capabilities generally fall into 3 categories. What are those 3 categories?
1. Image creation 2. Editing an image 3. Creating variations of an image
What is the goal of measuring potential harms? (generative AI)
1. Initial baseline that quantifies harms in given scenarios 2. Track improvements against baseline
How might one prioritize identified harms? By assessing the following 2 conditions:
1. Likelihood of occurrence (ex. inaccurate cooking times resulting in illness) 2. Impact if it does occur (ex. death resulting from illegal use)
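Worked example (sketch): one simple way to turn the two conditions into a ranking is to score each on a scale and multiply. The 1-5 scale, the scores, and the multiplication rule are assumptions for illustration, not a prescribed method.

```python
# Made-up scores on a 1-5 scale; real scores would come from your own assessment.
harms = [
    {"harm": "Inaccurate cooking times", "likelihood": 4, "impact": 2},
    {"harm": "Instructions for illegal use", "likelihood": 1, "impact": 5},
    {"harm": "Offensive wording", "likelihood": 3, "impact": 3},
]

def priority(harm):
    """Assumed scoring rule: likelihood multiplied by impact."""
    return harm["likelihood"] * harm["impact"]

# Highest-priority harms first.
prioritized = sorted(harms, key=priority, reverse=True)
for h in prioritized:
    print(priority(h), h["harm"])
```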
What are the 4 layers where harm mitigation should / can occur? (generative AI)
1. Model 2. Safety system 3. Metaprompt and grounding 4. User experience
What are some mitigation techniques you can apply at the model layer to mitigate harms caused by a generative AI solution?
1. Model selection 2. Fine-tuning a model with your own training data
Azure OpenAI Service consists of these 4 components:
1. Pre-trained generative AI models 2. Customization capabilities (fine-tune) 3. Built-in tools to detect or mitigate harmful use cases 4. Enterprise-grade security with role-based access control (RBAC) and private networks
What are the key elements of a transformer model?
1. Tokenization 2. Embeddings 3. Attention
How are specific categories for the elements of the vectors in a language model determined?
A. During training B. Based on how commonly words are used together or in similar context
What is an encoder block?
A. One of two components of a transformer model architecture B. Creates semantic representations of the training vocabulary
What is one action Microsoft takes to support ethical AI practices in Azure OpenAI? A. Provides Transparency Notes that share how technology is built and asks users to consider its implications. B. Logs users out of Azure OpenAI Studio after a period of inactivity to ensure it's only used by one user. C. Allows users to build any application, regardless of harmful effects, to ensure fairness.
A. Provides Transparency Notes that share how technology is built and asks users to consider its implications.
What makes the GPT-4 model so powerful?
A. Sheer volume of data with which it has been trained B. Complexity of the network
In a decoder block, how is the attention technique used to predict the next token in a sequence?
A. Takes into account the sequence of tokens leading up to that point B. Considers which of the tokens are most influential Ex. I heard a dog [bark]
What are 2 capabilities of Azure OpenAI's natural language models?
A. able to take in natural language B. able to generate responses
What is a content filter?
Applies criteria to suppress prompts and responses based on classification of content into: - 4 severity levels - 4 categories of harm
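Worked example (sketch): the idea of filtering by category and severity can be pictured as a threshold check. The category and severity names below are assumptions about what the four of each are, and the `classification` input is a hypothetical result from some upstream classifier; this is not the service's actual API.

```python
SEVERITY = ["safe", "low", "medium", "high"]               # 4 severity levels (assumed names)
CATEGORIES = ["hate", "sexual", "violence", "self_harm"]   # 4 categories of harm (assumed names)

def is_blocked(classification, threshold="medium"):
    """Return True when any category is at or above the threshold severity.

    `classification` maps category -> severity label; missing categories
    are treated as "safe".
    """
    limit = SEVERITY.index(threshold)
    return any(
        SEVERITY.index(classification.get(category, "safe")) >= limit
        for category in CATEGORIES
    )

print(is_blocked({"hate": "safe", "violence": "low"}))  # below the threshold
print(is_blocked({"violence": "high"}))                 # at or above the threshold
```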
What is a technique used to examine a sequence of text tokens and try to quantify the strength of the relationships between them?
Attention
During training, what is used to calculate a possible vector for the next token based on numeric weights assigned to each token in the sequence so far?
Attention score
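Worked example (sketch): attention scores can be pictured as softmax-normalized dot products that weight each earlier token's vector. This simplification uses the raw toy embeddings directly as queries, keys, and values; real transformer layers apply learned projections first, and the numbers here are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors, one output element per dimension.
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, output

# Toy embeddings for the tokens seen so far (made-up numbers).
tokens = ["I", "heard", "a", "dog"]
embeddings = [[0.1, 0.0], [0.5, 0.2], [0.0, 0.1], [1.0, 0.9]]

# Use the last token as the query to build a candidate vector for the next token.
weights, next_vector = attend(embeddings[-1], embeddings, embeddings)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.2f}")
```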
What is the Codex model series a descendant of?
The GPT-3 series
What's an example of a safety system layer configuration or capability? (generative AI)
Azure OpenAI's content filters
What is an example of a well-known LLM that uses ONLY the encoder block?
BERT, developed by Google to support its search engine
What happens first when an attention technique is applied in a decoder?
A positional encoding layer adds a value to each embedding to indicate its position in the sequence
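Worked example (sketch): the card does not say which values are added. The sinusoidal scheme below, from the original Transformer design, is one well-known choice and is used here only as an assumption; the function names are illustrative.

```python
import math

def positional_encoding(position, dim):
    """Sinusoidal encoding: sine on even indices, cosine on odd indices."""
    return [
        math.sin(position / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(position / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]

def add_positions(embeddings):
    """Add each position's encoding to the embedding at that position."""
    dim = len(embeddings[0])
    return [
        [e + p for e, p in zip(vector, positional_encoding(position, dim))]
        for position, vector in enumerate(embeddings)
    ]

# The same toy embedding at two positions ends up with different values.
same_token_twice = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
encoded = add_positions(same_token_twice)
print(encoded[0] != encoded[1])
```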
What technique is used to examine a sequence of text tokens and determine how other tokens around one particular token influence that token's meaning?
Self-attention
During training, what is known? (Attention layer)
The actual sequence of tokens is known, but the tokens that come later in the sequence are masked
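Worked example (sketch): masking later tokens is often pictured as a lower-triangular table of which positions each position may look at. The helper name and the boolean representation are assumptions for illustration.

```python
def causal_mask(length):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]

tokens = ["I", "heard", "a", "dog", "bark"]
mask = causal_mask(len(tokens))

# Show which tokens are visible at each position; later ones stay hidden.
for i, row in enumerate(mask):
    visible = [token for token, allowed in zip(tokens, row) if allowed]
    print(f"position {i}: sees {visible}")
```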
What is the first step in training a transformer model?
Tokenization - Decompose training text into tokens
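Worked example (sketch): a whitespace tokenizer that assigns an ID to each distinct token. Real tokenizers usually split text into subword units, so this is a simplification, and the function names are illustrative.

```python
def build_vocabulary(text):
    """Assign an ID to each distinct token, in order of first appearance."""
    vocabulary = {}
    for token in text.lower().split():
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary) + 1
    return vocabulary

def encode(text, vocabulary):
    """Replace each token with its ID (tokens must already be in the vocabulary)."""
    return [vocabulary[token] for token in text.lower().split()]

training_text = "I heard a dog bark loudly at a cat"
vocab = build_vocabulary(training_text)
print(vocab)
print(encode("a dog heard a cat", vocab))
```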
In an encoder block, how is the attention technique used to examine each token in context and determine an appropriate encoding for its vector embedding?
Vector values are based on the relationship between the token and other tokens. As a result, the same word might have multiple embeddings depending on its context. Ex. "bark of a tree" vs. "a dog bark"
What are some code generation capabilities supported by Azure OpenAI Service?
generate, edit, and explain code
Once the plugin is installed and enabled, how long until you can start writing code?
immediately