OCI Generative AI Foundations
What does accuracy measure in the context of fine-tuning results for a generative model?
How many predictions the model made correctly out of all the predictions in an evaluation
What do embeddings in Large Language Models (LLMs) represent?
The semantic content of data in high-dimensional vectors
Prompt Engineering
The process of iteratively refining a prompt to elicit a style of response
What is the role of temperature in the decoding process of a Large Language Model (LLM)?
To adjust the sharpness of the probability distribution over the vocabulary when selecting the next word
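For illustration, a minimal numpy sketch of temperature scaling (the logits are made up; this is not the OCI API):

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into next-token probabilities at a given temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract the max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
print(softmax_with_temperature(logits, 0.5))  # peaked distribution, near-greedy
print(softmax_with_temperature(logits, 2.0))  # flatter distribution, more varied choices
```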
Prompt Injection
To deliberately provide an LLM with a prompt intended to cause harm.
How can the concept of "Groundedness" differ from "Answer Relevance" in the context of Retrieval Augmented Generation (RAG)?
Groundedness pertains to factual correctness, whereas Answer Relevance concerns query relevance.
What are vector databases designed for?
Handling high-dimensional data efficiently. Often used in applications that involve machine learning models and similarity search.
Characteristics of Llama?
Highly performant, open-source model. Model parameters: 70B; context window: 4096 tokens
Characteristics of Cohere Command?
Highly performant, instruction-following conversational model. Model parameters: 52B; context window: 4096 tokens
True or False: String Prompts can only support a single variable at a time.
False. String prompts can handle multiple variables at a time, or none at all.
T-Few Fine-Tuning is an additive form of what type of fine-tuning method?
Few-Shot Parameter-Efficient Fine-Tuning
If you need more instruction following, or need to teach the LLM how to act, it is best to use...
Fine Tuning
Fine-Tuning requires training the __________ model leading to ___________ Computational costs.
Fine-Tuning requires training the entire model, leading to increased computational costs.
Which LangChain component is responsible for generating the linguistic output in a chatbot system?
LLMs
What does a cosine distance of 0 indicate about the relationship between two embeddings?
A cosine distance of 0 between two embeddings indicates that they are perfectly similar in terms of orientation; in other words, they are pointing in the same direction in the vector space. Cosine distance is usually calculated as 1 minus the cosine similarity.
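A minimal numpy sketch of that calculation (illustrative only):

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# One vector is a scaled copy of the other, so they point the same way:
print(cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # ~0.0
```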
What is a diffusion model?
A deep neural network that holds latent variables capable of learning the structure of a given image by removing its blur (i.e., noise). After a model's network is trained to "know" the concept abstraction behind an image, it can create new variations of that image.
Model Endpoint
A designated point on a dedicated AI cluster where an LLM can accept requests and send back responses
What is an RDMA super cluster?
A cluster built on Remote Direct Memory Access (RDMA): direct memory access from the memory of one computer into that of another without involving either one's operating system.
Base Models with text generation capability
(1) Cohere Command (2) Cohere Command-Light (3) Llama2
Cluster types
(1) Fine-Tuning - for training models (2) Hosting - for hosting an endpoint for inference
What are the two types of RAG techniques?
(1) RAG Sequence (2) RAG Token
Fine Tuning Benefits
(1) A more effective mechanism for improving model performance than prompt engineering (2) customizing with domain-specific data creates more contextually relevant responses (3) reduces the number of tokens needed for your model
How many endpoints can a Hosting Dedicated AI Cluster have?
50
What is LangChain?
A Python library for building applications with Large Language Models
What are the pros of a RAG system?
Access to the latest data; grounds the results.
When does a chain typically interact with memory in a run within the LangChain framework?
After user input but before chain execution, and again after core logic but before output
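A hand-rolled toy sketch of those two touchpoints (an EchoChain stand-in, not LangChain's actual internals):

```python
class EchoChain:
    """Toy chain showing when memory is read and written during a run."""
    def __init__(self):
        self.memory = []  # stands in for a LangChain memory object

    def run(self, user_input):
        # 1. After user input, before core logic: load memory variables.
        context = list(self.memory)
        # 2. Core logic (a trivial echo standing in for the LLM call).
        output = f"Echo: {user_input} (history size: {len(context)})"
        # 3. After core logic, before returning output: save the new turn.
        self.memory.append((user_input, output))
        return output

chain = EchoChain()
print(chain.run("hello"))
print(chain.run("hello again"))
```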
How does OCI ensure security on a Dedicated AI Cluster?
GPUs allocated for a customer's work are isolated from other customers' GPUs on the dedicated RDMA network.
What is the Chain-of-Thought technique?
A prompting technique in which the LLM is asked to emit intermediate reasoning steps as part of its response.
What is the difference between an LLM without RAG and one with RAG?
An LLM with RAG grounds its responses in an external knowledge source, typically a vector database; an LLM without RAG relies only on what it learned during training.
How are documents usually evaluated in the simplest form of keyword-based search?
Based on the presence and frequency of the user-provided keywords
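A minimal sketch of presence-and-frequency scoring (illustrative, not a real search engine):

```python
def keyword_score(document, keywords):
    """Score a document by how often the user's keywords occur in it."""
    words = document.lower().split()
    return sum(words.count(k.lower()) for k in keywords)

docs = ["OCI hosts large language models",
        "Vector databases store embeddings for language models"]
ranked = sorted(docs, key=lambda d: keyword_score(d, ["language", "models"]), reverse=True)
print(ranked)
```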
Why is it challenging to apply diffusion models to text generation?
Because text representation is categorical, unlike images.
In the context of generating text with a Large Language Model (LLM), what does the process of greedy decoding entail?
Choosing the word with the highest probability at each step of decoding
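A one-step sketch of greedy decoding over a made-up distribution:

```python
import numpy as np

def greedy_step(probabilities, vocabulary):
    """Greedy decoding: always pick the single most probable next word."""
    return vocabulary[int(np.argmax(probabilities))]

vocab = ["cat", "dog", "bird"]
probs = [0.2, 0.7, 0.1]  # made-up next-word distribution
print(greedy_step(probs, vocab))  # -> "dog"
```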
Base Models with Summarization capability
Cohere.Command
In-Context Learning
Conditioning an LLM with instructions and/or demonstrations of the task it is meant to complete.
What does in-context learning in Large Language Models involve?
Conditioning the model with task-specific instructions or demonstrations
A RAG Sequence....
Considers all of the documents together and constructs a single coherent response
Cohere, Llama, and GPT-4 are examples of?
Decoders
True or False: StreamlitChatMessageHistory can be used in any type of LLM application.
False
Bert and Embed-light are examples of?
Encoders
Few Shot (K Shot) Prompting
Explicitly providing K examples of the intended task in the prompt
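For example, a hypothetical K=2 prompt for sentiment classification:

```python
# Two explicit demonstrations, then the new input the model should complete.
prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day. Sentiment: Positive
Review: The screen cracked in a week. Sentiment: Negative
Review: Setup was quick and painless. Sentiment:"""
print(prompt)
```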
Which are the main differences between PEFT and Fine-Tuning in terms of the number of parameters modified and the type of data used?
Fine-tuning modifies all parameters using labeled, task-specific data, whereas Parameter Efficient Fine-Tuning updates a few, new parameters also with labeled, task-specific data.
What does the RAG Sequence model do in the context of generating a response?
For each input query, it retrieves a set of relevant documents and considers them together to generate a cohesive response.
What does the Ranker do in a text generation system?
It evaluates and prioritizes the information retrieved by the Retriever.
What differentiates Semantic search from traditional keyword search?
It involves understanding the intent and context of the search.
When to use few shot prompting?
The LLM already understands the topics necessary for the text generation.
When to use fine-tuning?
The LLM does not perform a task well, the data required to adapt the LLM is too large for prompt engineering, or latency is too high.
How does the structure of vector databases differ from traditional relational databases?
It is based on distances and similarities in a vector space.
How does a presence penalty function in language model generation?
It penalizes a token each time it appears after the first occurrence.
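One simple interpretation of that rule, as a sketch (made-up logits; real decoders apply this at every step):

```python
def apply_presence_penalty(logits, generated_tokens, penalty):
    """Lower the score of any token that has already appeared in the output."""
    seen = set(generated_tokens)
    return {tok: score - penalty if tok in seen else score
            for tok, score in logits.items()}

logits = {"the": 2.0, "cat": 1.5, "sat": 1.0}
print(apply_presence_penalty(logits, ["the", "cat"], penalty=0.5))
```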
Dedicated AI Clusters Available in OCI
(1) Large Cohere - hosting and fine-tuning (2) Small Cohere - hosting and fine-tuning (3) Embed Cohere - hosting (4) Llama2 - hosting
What does the Loss metric indicate about a model's predictions?
Loss is a measure that indicates how wrong the model's predictions are.
What are the minimum unit hours required for a fine-tuning cluster that will be active for 10 days?
A minimum of 2 units is required for a fine-tuning cluster: 10 days × 24 hours × 2 units = 480 unit hours.
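The arithmetic, spelled out:

```python
units = 2             # minimum units for a fine-tuning cluster
hours = 10 * 24       # 10 days of uptime
print(hours * units)  # 480 unit hours
```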
Top P
The model selects the next token from the smallest set of top tokens whose cumulative probability reaches the threshold p.
Top K
The model selects the next token from the K most probable tokens in its list.
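A sketch of both filters over a made-up distribution (illustrative, not the service's implementation):

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k most probable tokens, then renormalize."""
    order = np.argsort(probs)[::-1]
    mask = np.zeros_like(probs)
    mask[order[:k]] = probs[order[:k]]
    return mask / mask.sum()

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose probabilities sum to >= p."""
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    mask = np.zeros_like(probs)
    mask[order[:cutoff]] = probs[order[:cutoff]]
    return mask / mask.sum()

probs = np.array([0.5, 0.3, 0.1, 0.06, 0.04])
print(top_k_filter(probs, 2))    # only the 2 most probable tokens survive
print(top_p_filter(probs, 0.8))  # tokens whose cumulative probability reaches 0.8
```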
What are the cons of a RAG system?
More complex setup, requires a compatible data source.
non-deterministic decoding
Picks randomly among high-probability candidates at each step.
Hosting Dedicated AI Cluster Unit Consumption
One Unit
How do vector databases differ from traditional relational databases?
Optimized for operations like nearest neighbor search in high-dimensional space
Pros and Cons of fine Tuning?
PROS: Increased model performance on specific tasks; no impact on model latency. CONS: Requires labeled data, which can be expensive and time-consuming to acquire.
Parameter-Efficient Fine-Tuning involves updating a _____________ set of parameters ____________ computational costs.
Parameter-Efficient Fine-Tuning involves updating a small subset of parameters, decreasing computational costs.
When customizing an LLM it is best to start with....
Prompt engineering as it is easiest to start with; test and learn quickly.
What do prompt templates use for templating in language model applications?
Prompt templates in language model applications typically use Python's str.format syntax or similar templating mechanisms. This is because str.format provides a straightforward way to interpolate variables or dynamic content into a string, which is very useful for generating prompts where certain pieces of information need to be inserted into a pre-defined text structure. It allows the developer to define a template with placeholders that can be replaced by actual values at run time, providing a flexible way to generate dynamic prompts based on user input or other data.
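For instance, with plain str.format (a generic sketch, not tied to any one library):

```python
template = "Summarize the following {document_type} in {num_words} words:\n{text}"
prompt = template.format(document_type="article", num_words=50, text="LLMs are ...")
print(prompt)
```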
Pros and Cons of few shot prompting?
Pros: very simple, no training costs. Cons: adds latency to each model request.
If you need to optimize context in an LLM it is best to use...
RAG
Inference
Refers to the process of using a trained ML model to make predictions or decisions based on new input
What are some examples of knowledge bases or corpora of information that retrievers get information from?
Retrievers are generally used to retrieve relevant information from a corpus of data, such as a knowledge base or the Internet.
Characteristics of Cohere Command-light?
Smaller, faster, not as capable as Cohere Command. Model parameters: 6B; context window: 4096 tokens
Vector databases improve accuracy by improving what type of relationships?
Semantic relationships that capture the meaning and context of words.
Fine-Tuning
Take a pre-trained model and use labeled data for a specific task and train the model to perform the task by altering all of its parameters.
Parameter Efficient Fine Tuning (PEFT)
Takes a pre-trained model and uses labeled data for a specific task and trains the model to perform the task by altering a small set of existing parameters or adding new ones.
What are the minimum unit hours required for a hosting cluster?
The minimum hosting commitment is 744 unit hours.
Soft Prompting
A training technique that adds learnable parameters to a prompt in order to cue the model to complete a specific task; it uses labeled data for training.
What is the T-Few fine-tuning method?
This method is characterized by selectively updating a fraction of the model's weights. This approach is a form of parameter-efficient fine-tuning that aims to fine-tune large models without updating all of the weights, thus saving on computational resources and time.
What is the function of the Generator in a text generation system?
To generate human-like text using the information retrieved and ranked, along with the user's original query
What is the purpose of Retrieval Augmented Generation (RAG) in text generation?
To generate text using extra information obtained from an external data source
In the simplified workflow for managing and querying vector data, what is the role of indexing?
To map vectors to a data structure for faster searching, enabling efficient retrieval
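A brute-force nearest-neighbor sketch with numpy; real vector databases use index structures (e.g., HNSW graphs) to avoid scanning every vector:

```python
import numpy as np

# Toy "index": a matrix of stored embeddings, one row per document.
index = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.7, 0.3]])

def nearest_neighbor(query, index):
    """Return the row of the indexed vector most cosine-similar to the query."""
    sims = index @ query / (np.linalg.norm(index, axis=1) * np.linalg.norm(query))
    return int(np.argmax(sims))

print(nearest_neighbor(np.array([0.75, 0.25]), index))  # -> 1
```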
What is the purpose of Retrievers in LangChain?
To retrieve relevant information from knowledge bases.
True or False: A given StreamlitChatMessageHistory will NOT be shared across user sessions.
True
True or False: A given StreamlitChatMessageHistory will NOT be persisted.
True
True or False: A single dedicated AI cluster can be used to train multiple models
True
True or False: StreamlitChatMessageHistory will store messages in Streamlit session state at the specified key.
True
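A minimal usage sketch, assuming the langchain-community package and execution inside a Streamlit app:

```python
# Run inside a Streamlit app (e.g., `streamlit run app.py`).
from langchain_community.chat_message_histories import StreamlitChatMessageHistory

# Messages live in Streamlit session state under the given key, so they are
# scoped to one user session and are not persisted across sessions.
history = StreamlitChatMessageHistory(key="chat_messages")
history.add_user_message("Hello!")
history.add_ai_message("Hi! How can I help?")
print(history.messages)
```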
Fine Tuning Dedicated AI Cluster Unit Consumption
Two Units
Which method requires more training time: vanilla or t-few fine-tuning method?
Vanilla fine-tuning requires more training time than the T-Few fine-tuning method.
In which scenario is soft prompting appropriate compared to other training styles?
When there is a need to add learnable parameters to a Large Language Model (LLM) without task-specific training
When is soft-prompting appropriate?
When there is a need to add learnable parameters to an LLM without task specific training.
Domain Adaptation
adapting a model to enhance its performance outside of the domain/subject-area it was trained on
Base Models with embedding capability
cohere.embed
A RAG Token...
Considers each part of the response in turn, collecting documents for it and constructing the response incrementally.
In-Context Learning
constructing a prompt that has demonstrations of the task that the model is meant to complete.
Dedicated AI clusters GPUs _____________ host your custom models
exclusively
Groundedness pertains to...
factual correctness
Increasing temperature...
flattens the distribution, allowing for more varied word choices.
Hallucination
generated text that is non-factual or ungrounded
Decoder
models are designed to decode or generate text.
Decreasing temperature...
peaks the distribution, allowing for less varied responses.
Answer Relevance concerns...
query relevance
least to most prompting
solve simpler problems first and use the solutions to the simple problems to solve more difficult problems.
Embedding
the process of converting a sequence of words into a single vector or a sequence of vectors.
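A toy sketch of pooling word vectors into one sentence vector (2-dimensional toy vectors; real embeddings are high-dimensional):

```python
import numpy as np

word_vectors = {"the": np.array([0.1, 0.2]),
                "cat": np.array([0.7, 0.1]),
                "sat": np.array([0.3, 0.9])}

def embed_sentence(sentence):
    """Mean-pool per-word vectors into a single sentence vector."""
    return np.mean([word_vectors[w] for w in sentence.split()], axis=0)

print(embed_sentence("the cat sat"))
```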
Encoder
models are designed to encode text and produce embeddings.
Prompting alone may be inappropriate when
training data exists, or domain adaptation is required.