OCI Generative AI Foundations

What does accuracy measure in the context of fine-tuning results for a generative model?

How many predictions the model made correctly out of all the predictions in an evaluation

What do embeddings in Large Language Models (LLMs) represent?

The semantic content of data in high-dimensional vectors

Prompt Engineering

The process of iteratively refining a prompt to elicit a style of response

What is the role of temperature in the decoding process of a Large Language Model (LLM)?

To adjust the sharpness of probability distribution over vocabulary when selecting the next word
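
As a rough illustration (a minimal numpy sketch with made-up logits, not OCI's actual implementation), temperature divides the logits before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it:

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = np.asarray(logits, dtype=float) / temperature
    exp = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5]  # made-up next-word logits
print(softmax_with_temperature(logits, temperature=0.5))  # peaked
print(softmax_with_temperature(logits, temperature=2.0))  # flatter
```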

Prompt Injection

Deliberately providing an LLM with a prompt intended to cause harm.

How can the concept of "Groundedness" differ from "Answer Relevance" in the context of Retrieval Augmented Generation (RAG)?

Groundedness pertains to factual correctness, whereas Answer Relevance concerns query relevance.

What are vector databases designed for?

Handling high-dimensional data efficiently. Often used in applications that involve machine learning models and similarity search.

Characteristics of Llama?

Highly performant, open-source model. Model parameters: 70B; context window: 4,096 tokens

Characteristics of Cohere Command?

Highly performant, instruction-following conversational model. Model parameters: 52B; context window: 4,096 tokens

True or False: String Prompts can only support a single variable at a time.

False. String prompts can handle multiple variables at a time, or none at all.

T-Few Fine-Tuning is an additive version of what type of fine-tuning method?

Few Shot Parameter Efficient Fine Tuning

If you need more instruction following, or to teach the LLM how to act, it is best to use...

Fine Tuning

Fine-Tuning requires training the __________ model, leading to ___________ computational costs.

Fine-Tuning requires training the entire model, leading to increased computational costs.

Which LangChain component is responsible for generating the linguistic output in a chatbot system?

LLMs

What does a cosine distance of 0 indicate about the relationship between two embeddings?

A cosine distance of 0 between two embeddings indicates that they are perfectly similar in terms of orientation; in other words, they are pointing in the same direction in the vector space. Cosine distance is usually calculated as 1 minus the cosine similarity.
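
A minimal numpy sketch of that relationship, using made-up vectors:

```python
import numpy as np

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_distance([1, 2, 3], [2, 4, 6]))  # 0.0: same direction
print(cosine_distance([1, 0], [0, 1]))        # 1.0: orthogonal
```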

What is a diffusion model?

A deep neural network with latent variables that learns the structure of a given image by progressively removing noise from it. After the network is trained to "know" the concept abstraction behind an image, it can create new variations of that image.

Model Endpoint

A designated point on a dedicated AI cluster where an LLM can accept requests and send back responses

What is an RDMA super cluster?

Remote Direct Memory Access (RDMA) lets one computer access the memory of another without involving either one's operating system; a super cluster connects many GPUs over an RDMA network for low-latency, high-bandwidth communication.

Base Models with Text Generation

(1) Cohere (2) Cohere Light (3) Llama2

Cluster types

(1) Fine-tuning: for training a model (2) Hosting: for hosting an endpoint for inference

What are the two types of RAG techniques?

(1) RAG Sequence (2) RAG Token

Fine Tuning Benefits

(1) a more effective mechanism for improving model performance than prompt engineering (2) customizing the model with domain-specific data creates more contextually relevant responses (3) reduces the number of tokens needed for your model

How many end points can a Hosting Dedicated AI Cluster have?

50

What is LangChain?

A Python library for building applications with Large Language Models

What are the pros of a RAG system?

Accesses the latest data and grounds the results.

When does a chain typically interact with memory in a run within the LangChain framework?

After user input but before chain execution, and again after core logic but before output

How does OCI ensure security on a Dedicated AI Cluster?

All GPUs isolated from other GPUs on the dedicated RDMA network

What is the Chain-of-Thought technique?

Prompting an LLM to emit intermediate reasoning steps as part of its response.
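
A hypothetical prompt illustrating the technique; the worked example spells out its intermediate reasoning so the model imitates it on the new question:

```python
# Hypothetical chain-of-thought prompt (the numbers are made up).
cot_prompt = (
    "Q: A cluster uses 2 units and runs for 3 days. How many unit hours?\n"
    "A: 3 days is 3 * 24 = 72 hours. 72 hours * 2 units = 144 unit hours.\n\n"
    "Q: A cluster uses 2 units and runs for 10 days. How many unit hours?\n"
    "A:"
)
print(cot_prompt)
```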

What is the difference between an LLM without RAG and one with RAG?

An LLM with RAG retrieves information from an external data store, typically a vector database, and grounds its response in it.

How are documents usually evaluated in the simplest form of keyword-based search?

Based on the presence and frequency of the user-provided keywords
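
A toy sketch of that scoring (the documents and keywords are made up):

```python
# Score each document by the total frequency of the user's keywords in it.
def keyword_score(document, keywords):
    words = document.lower().replace(".", "").split()
    return sum(words.count(k.lower()) for k in keywords)

docs = [
    "Vector databases store embeddings for similarity search.",
    "Keyword search ranks documents by keyword frequency.",
]
print(max(docs, key=lambda d: keyword_score(d, ["keyword", "search"])))
```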

Why is it challenging to apply diffusion models to text generation?

Because text representation is categorical, unlike images.

In the context of generating text with a Large Language Model (LLM), what does the process of greedy decoding entail?

Choosing the word with the highest probability at each step of decoding
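
A toy sketch of greedy decoding; `fake_probs` is a made-up stand-in for a real model's next-token distribution:

```python
import numpy as np

vocab = ["the", "cat", "sat", "<eos>"]

def fake_probs(step):
    # Stand-in for an LLM's next-token distribution (assumption, not a real model).
    rng = np.random.default_rng(step)
    p = rng.random(len(vocab))
    return p / p.sum()

tokens = []
for step in range(10):
    next_token = vocab[int(np.argmax(fake_probs(step)))]  # always take the argmax
    if next_token == "<eos>":
        break
    tokens.append(next_token)
print(" ".join(tokens))
```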

Base Models with Summarization capability

Cohere.Command

In-Context Learning

Conditioning an LLM with instructions and/or demonstrations of the task it is meant to complete.

What does in-context learning in Large Language Models involve?

Conditioning the model with task-specific instructions or demonstrations

A RAG Sequence....

Considers all of the documents together and constructs a single coherent response

Cohere, Llama, and GPT-4 are examples of?

Decoders

True or False: StreamlitChatMessageHistory can be used in any type of LLM application.

False

BERT and Embed-light are examples of?

Encoders

Few Shot (K Shot) Prompting

Explicitly providing K examples of the intended task in the prompt
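
A hypothetical 2-shot prompt: two worked examples of the task precede the actual input:

```python
# Hypothetical few-shot (K=2) prompt for sentiment classification.
few_shot_prompt = (
    "Classify the sentiment as positive or negative.\n\n"
    "Review: The service was wonderful.\nSentiment: positive\n\n"
    "Review: The food was cold and bland.\nSentiment: negative\n\n"
    "Review: I loved every minute of it.\nSentiment:"
)
print(few_shot_prompt)
```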

Which are the main differences between PEFT and Fine-Tuning in terms of the number of parameters modified and the type of data used?

Fine-tuning modifies all parameters using labeled, task-specific data, whereas Parameter Efficient Fine-Tuning updates a few, new parameters also with labeled, task-specific data.

What does the RAG Sequence model do in the context of generating a response?

For each input query, it retrieves a set of relevant documents and considers them together to generate a cohesive response.

What does the Ranker do in a text generation system?

It evaluates and prioritizes the information retrieved by the Retriever.

What differentiates Semantic search from traditional keyword search?

It involves understanding the intent and context of the search.

When to use few shot prompting?

The LLM already understands the topics necessary for the text generation.

When to use fine-tuning?

(1) The LLM does not perform a task well (2) the data required to adapt the LLM is too large for prompt engineering (3) latency with the current LLM is too high

How does the structure of vector databases differ from traditional relational databases?

It is based on distances and similarities in a vector space.

How does a presence penalty function in language model generation?

It penalizes a token each time it appears after the first occurrence.
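
A minimal sketch of that behavior on made-up logits; real implementations differ in details:

```python
import numpy as np

def apply_presence_penalty(logits, generated_ids, penalty=0.5):
    # Flat penalty per token that has appeared at least once, regardless of count.
    logits = np.array(logits, dtype=float)
    for token_id in set(generated_ids):
        logits[token_id] -= penalty
    return logits

logits = [2.0, 1.0, 0.5, 0.1]  # made-up logits
print(apply_presence_penalty(logits, generated_ids=[0, 0, 2]))
# tokens 0 and 2 are each penalized once, however often they occurred
```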

Dedicated AI Clusters Available in OCI

(1) Large Cohere: hosting and fine-tuning (2) Small Cohere: hosting and fine-tuning (3) Embed Cohere: hosting (4) Llama2: hosting

What does the Loss metric indicate about a model's predictions?

Loss is a measure that indicates how wrong the model's predictions are.

What are the minimum unit hours required for a fine-tuning cluster that will be active for 10 days?

A minimum of 2 units is required for a fine-tuning cluster: 10 days × 24 hours × 2 units = 480 unit hours.

Top P

The model samples the next token from the smallest set of most-probable tokens whose cumulative probability exceeds p.

Top K

The model samples the next token from the K most probable tokens in its list.
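
A minimal numpy sketch of both strategies on a made-up distribution (real decoders work on logits and handle ties more carefully):

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_sample(probs, k):
    # Keep only the k most probable tokens, renormalize, then sample.
    probs = np.asarray(probs, dtype=float)
    keep = np.argsort(probs)[-k:]
    mask = np.zeros_like(probs)
    mask[keep] = probs[keep]
    return int(rng.choice(len(probs), p=mask / mask.sum()))

def top_p_sample(probs, p):
    # Keep the smallest top-ranked set whose cumulative probability reaches p.
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
    keep = order[:cutoff]
    mask = np.zeros_like(probs)
    mask[keep] = probs[keep]
    return int(rng.choice(len(probs), p=mask / mask.sum()))

probs = [0.5, 0.25, 0.15, 0.10]  # made-up next-token probabilities
print(top_k_sample(probs, k=2), top_p_sample(probs, p=0.75))
```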

What are the cons of a RAG system?

More complex setup, requires a compatible data source.

non-deterministic decoding

Pick randomly among high probability candidates at each step.

Hosting Dedicated AI Cluster Unit Consumption

One Unit

How do vector databases differ from traditional relational databases?

Optimized for operations like nearest neighbor search in high-dimensional space
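
A brute-force sketch of the core operation on made-up vectors; real vector databases use approximate nearest-neighbor indexes instead of scanning everything:

```python
import numpy as np

def nearest_neighbor(query, vectors):
    # Return the index of the stored vector most similar (by cosine) to the query.
    vectors = np.asarray(vectors, dtype=float)
    query = np.asarray(query, dtype=float)
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    return int(np.argmax(sims))

stored = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]  # made-up embeddings
print(nearest_neighbor([1.0, 0.0], stored))    # 0: closest in direction
```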

Pros and Cons of fine Tuning?

Pros: increased model performance on specific tasks; no impact on model latency. Cons: requires labeled data, which can be expensive and time-consuming to acquire.

Parameter-Efficient Fine-Tuning involves updating a _____________ set of parameters ____________ computational costs.

Parameter-Efficient Fine-Tuning involves updating a small subset of parameters, decreasing computational costs.

When customizing an LLM it is best to start with....

Prompt engineering as it is easiest to start with; test and learn quickly.

What do prompt templates use for templating in language model applications?

Prompt templates typically use Python's str.format syntax (or a similar templating mechanism) to interpolate variables into a pre-defined text structure. The developer defines a template with placeholders that are replaced by actual values at run time, which makes it easy to generate dynamic prompts from user input or other data.
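
A minimal example of that mechanism using plain str.format (the template text is made up):

```python
# Placeholders in the template are filled with actual values at run time.
template = (
    "Answer the question using only the given context.\n\n"
    "Context: {context}\nQuestion: {question}"
)
prompt = template.format(
    context="Dedicated AI clusters host custom models.",
    question="What hosts custom models?",
)
print(prompt)
```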

Pros and Cons of few shot prompting?

Pros: very simple; no training costs. Cons: adds latency to each model request.

If you need to optimize context in an LLM it is best to use...

RAG

Inference

Refers to the process of using a trained ML model to make predictions or decisions based on new input

What are some examples of knowledge bases or corpora of information that retrievers get information from?

Retrievers are generally used to retrieve relevant information from a corpus of data, such as a knowledge base or the Internet.

Characteristics of Cohere Command-light?

Smaller, faster, and less capable than Cohere Command. Model parameters: 6B; context window: 4,096 tokens

Vector databases improve accuracy through what type of relationships?

Semantic relationships that capture the meaning and context of words.

Fine-Tuning

Take a pre-trained model, use labeled data for a specific task, and train the model to perform that task by altering all of its parameters.

Parameter Efficient Fine Tuning (PEFT)

Takes a pre-trained model and uses labeled data for a specific task, training the model to perform the task by altering a small set of existing parameters or adding new ones.

What are the minimum unit hours required for a hosting cluster?

The minimum hosting commitment is 744 unit hours (31 days × 24 hours at one unit).

Soft Prompting

Training that adds learnable parameters to a prompt in order to cue the model to complete a specific task; it uses labeled data.

What is the T-Few fine-tuning method?

This method is characterized by selectively updating a fraction of the model's weights. This approach is a form of parameter-efficient fine-tuning that aims to fine-tune large models without updating all of the weights, thus saving on computational resources and time.

What is the function of the Generator in a text generation system?

To generate human-like text using the information retrieved and ranked, along with the user's original query

What is the purpose of Retrieval Augmented Generation (RAG) in text generation?

To generate text using extra information obtained from an external data source
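
A minimal end-to-end sketch of that flow; `call_llm` is a hypothetical stand-in for any text-generation API, and the toy retriever just ranks documents by word overlap:

```python
def retrieve(query, corpus, k=2):
    # Toy retriever: rank documents by word overlap with the query.
    q = set(query.lower().split())
    return sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

def rag_answer(query, corpus, call_llm):
    # Ground the prompt in retrieved text before generating.
    context = "\n".join(retrieve(query, corpus))
    prompt = f"Use only the context below to answer.\n\nContext:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)
```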

In the simplified workflow for managing and querying vector data, what is the role of indexing?

To map vectors to a data structure for faster searching, enabling efficient retrieval

What is the purpose of Retrievers in LangChain?

To retrieve relevant information from knowledge bases.

True or False: A given StreamlitChatMessageHistory will NOT be shared across user sessions.

True

True or False: A given StreamlitChatMessageHistory will NOT be persisted.

True

True or False: A single dedicated AI cluster can be used to train multiple models

True

True or False: StreamlitChatMessageHistory will store messages in Streamlit session state at the specified key.

True
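
A hedged sketch of the behavior these cards describe; the import path may vary across LangChain versions, and this only makes sense inside a running Streamlit app:

```python
# Runs inside a Streamlit app; import path may differ by LangChain version.
from langchain_community.chat_message_histories import StreamlitChatMessageHistory

# Messages live in Streamlit session state under this key, so they are scoped
# to the current user session and are not persisted anywhere else.
history = StreamlitChatMessageHistory(key="chat_messages")
history.add_user_message("Hello!")
history.add_ai_message("Hi! How can I help?")
print(history.messages)
```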

Fine Tuning Dedicated AI Cluster Unit Consumption

Two Units

Which method requires more training time: vanilla or t-few fine-tuning method?

Vanilla fine-tuning requires more training time than the T-Few fine-tuning method.

In which scenario is soft prompting appropriate compared to other training styles?

When there is a need to add learnable parameters to a Large Language Model (LLM) without task-specific training

When is soft-prompting appropriate?

When there is a need to add learnable parameters to an LLM without task specific training.

Domain Adaptation

adapting a model to enhance its performance outside of the domain/subject-area it was trained on

Base Models with embedding capability

cohere.embed

A RAG Token...

Considers each part of the response, retrieves documents for it, and constructs the response incrementally.

In-Context Learning

constructing a prompt that contains demonstrations of the task the model is meant to complete.

Dedicated AI cluster GPUs _____________ host your custom models

exclusively

Groundedness pertains to...

factual correctness

Increasing temperature...

flattens the distribution, allowing for more varied word choices.

Hallucination

generated text that is non-factual or ungrounded

Decoder

models are designed to decode or generate text.

Decreasing temperature...

sharpens the distribution, allowing for less varied responses.

Answer Relevance concerns...

query relevance

least to most prompting

solve simpler problems first and use the solutions to the simple problems to solve more difficult problems.

Embedding

the process of converting a sequence of words into a single vector or a sequence of vectors.

Encoder

to encode text and produce embeddings.

Prompting alone may be inappropriate when

training data exists, or domain adaptation is required.

