LLM Quiz Questions

"Prompt Tuning is a technique used to adjust all hyperparameters of a language model." Is this true or false? True False

False Correct Prompt Tuning focuses on optimizing the prompts given to the model using trainable tokens that don't correspond directly to human language. The number of tokens you choose to train, however, would be a hyperparameter of your training process.

"Smaller LLMs can struggle with one-shot and few-shot inference:" Is this true or false? True False

True Correct Even when you include a couple of examples, smaller models might still struggle to learn the new task through examples.

"Parameter Efficient Fine-Tuning (PEFT) updates only a small subset of parameters. This helps prevent catastrophic forgetting." True or False? True False

True Performing full fine-tuning can lead to catastrophic forgetting because it changes all of the model's parameters. Since PEFT only updates a small subset of parameters, it is more robust against catastrophic forgetting.

Which transformer-based model architecture has the objective of guessing a masked token using the surrounding sequence of tokens, by building bidirectional representations of the input sequence? a. Autoregressive b. Sequence-to-sequence c. Autoencoder

c. Autoencoder

Which of the following best describes how LoRA works? a. LoRA freezes all weights in the original model layers and introduces new components which are trained on new data. b. LoRA continues the original pre-training objective on new data to update the weights of the original model. c. LoRA decomposes weights into two smaller rank matrices and trains those instead of the full model weights. d. LoRA trains a smaller, distilled version of the pre-trained LLM to reduce model size.

c. LoRA decomposes weights into two smaller rank matrices and trains those instead of the full model weights. Correct LoRA represents large weight matrices as two smaller, rank decomposition matrices, and trains those instead of the full weights. The product of these smaller matrices is then added to the original weights for inference.
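
To make the decomposition concrete, here is a minimal NumPy sketch of the LoRA idea; the layer size and rank are arbitrary example values, not ones taken from the original paper.

```python
import numpy as np

# Illustrative LoRA forward pass: the frozen weight matrix W is augmented by
# the product of two small trainable matrices B (d_out x r) and A (r x d_in).
d_in, d_out, r = 512, 512, 8                # example dimensions; rank r << d_in, d_out

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pre-trained weights
A = rng.standard_normal((r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                    # trainable; starts at zero so W is unchanged at first

x = rng.standard_normal(d_in)
y = W @ x + B @ (A @ x)                     # original output plus the low-rank update

full_params = d_out * d_in                  # 262,144 weights in the full matrix
lora_params = r * (d_in + d_out)            # 8,192 weights in A and B combined (~3%)
print(full_params, lora_params)
```

Because B starts at zero, the adapted layer initially behaves exactly like the original one; only A and B receive gradient updates during fine-tuning.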

Which configuration parameter for inference can be adjusted to either increase or decrease randomness within the model output layer? a. Max new tokens b. Top-k sampling c. Temperature d. Number of beams & beam search

c. Temperature. Temperature is used to affect the randomness of the output of the softmax layer. A lower temperature results in reduced variability, while a higher temperature results in increased randomness of the output.
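
As a quick illustration of how temperature reshapes the output distribution, here is a small sketch; the logits are made-up values.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Dividing the logits by the temperature before the softmax sharpens
    # (T < 1) or flattens (T > 1) the resulting probability distribution.
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                      # for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5, 0.1]
print(softmax_with_temperature(logits, 0.5))    # peaked: less random sampling
print(softmax_with_temperature(logits, 1.0))    # unchanged softmax
print(softmax_with_temperature(logits, 2.0))    # flatter: more random sampling
```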

Large Language Models (LLMs) are capable of performing multiple tasks supporting a variety of use cases. Which of the following tasks supports the use case of converting code comments into executable code? a. Information Retrieval b. Invoke actions from text c. Translation d. Text summarization

c. Translation. Converting code comments into executable code is a translation task: the model translates natural-language descriptions into a programming language.

Fill in the blanks: __________ involves using many prompt-completion examples as the labeled training dataset to continue training the model by updating its weights. This is different from _________ where you provide prompt-completion examples during inference. a. In-context learning, Instruction fine-tuning b. Pre-training, Instruction fine-tuning c. Prompt engineering, Pre-training d. Instruction fine-tuning, In-context learning

d. Instruction fine-tuning, In-context learning

Which in-context learning method involves creating an initial prompt that states the task to be completed and includes a single example question with answer followed by a second question to be answered by the LLM? a. Hot shot b. Zero shot c. Few shot d. One shot

d. One shot. One-shot inference involves providing a single example question with its answer, followed by a second question to be answered by the LLM. Few-shot inference provides multiple example prompts and answers, while zero-shot provides no examples, only the prompt to be answered by the LLM.
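
The difference between the settings is easiest to see in the prompts themselves. The sentiment-classification task and reviews below are invented purely for illustration:

```python
task = "Classify the sentiment of the review as positive or negative."

# Zero shot: only the task and the question, no worked examples.
zero_shot = f"{task}\nReview: The plot was predictable.\nSentiment:"

# One shot: a single example question with its answer, then the new question.
one_shot = (
    f"{task}\n"
    "Review: I loved every minute of it.\nSentiment: positive\n"
    "Review: The plot was predictable.\nSentiment:"
)

# Few shot: several example question/answer pairs before the new question.
few_shot = (
    f"{task}\n"
    "Review: I loved every minute of it.\nSentiment: positive\n"
    "Review: The acting was wooden.\nSentiment: negative\n"
    "Review: The plot was predictable.\nSentiment:"
)
```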

What is the purpose of fine-tuning with prompt datasets? a. To eliminate the need for instructions and prompts in training a language model. b. To decrease the accuracy of a pre-trained language model by introducing new prompts. c. To increase the computational resources required for training a language model. d. To improve the performance and adaptability of a pre-trained language model for specific tasks.

d. To improve the performance and adaptability of a pre-trained language model for specific tasks. Correct This option accurately describes the purpose of fine-tuning with prompt datasets. It aims to improve the performance and adaptability of a pre-trained language model by training it on specific tasks using instruction prompts.

Interacting with Large Language Models (LLMs) differs from traditional machine learning models. Working with LLMs involves natural language input, known as a _____, resulting in output from the Large Language Model, known as the ______. Choose the answer that correctly fills in the blanks. a. tunable request, completion b. prediction request, prediction response c. prompt, fine-tuned LLM d. prompt, completion

d. prompt, completion

"RNNs are better than Transformers for generative AI Tasks." Is this true or false? True False

False. Transformers process whole sequences in parallel and use self-attention to capture long-range dependencies, which makes them better suited to generative AI tasks than RNNs.

Do we always need to increase the model size to improve its performance? True False

False. Scaling laws such as Chinchilla show that training on more data, or fine-tuning for the task at hand, can improve performance without increasing model size.

"You can combine data parallelism with model parallelism to train LLMs." Is this true or false? True False

True

"PEFT methods can reduce the memory needed for fine-tuning dramatically, sometimes to just 12-20% of the memory needed for full fine-tuning." Is this true or false? True False

True Correct By training a smaller number of parameters, whether through selecting a subset of model layers to train, adding new, small components to the model architecture, or through the inclusion of soft prompts, the amount of memory needed for training is reduced compared to full fine-tuning.

Which of the following statements about pretraining scaling laws are correct? Select all that apply: a. To scale our model, we need to jointly increase dataset size and model size, or they can become a bottleneck for each other. b. There is a relationship between model size (in number of parameters) and the optimal number of tokens to train the model with. c. When measuring compute budget, we can use "PetaFLOP per second-day" as a metric. d. You should always follow the recommended number of tokens, based on the Chinchilla laws, to train your model.

a, b & c a. To scale our model, we need to jointly increase dataset size and model size, or they can become a bottleneck for each other. Correct For instance, while increasing dataset size is helpful, if we do not jointly improve the model size, it might not be able to capture value from the larger dataset. b. There is a relationship between model size (in number of parameters) and the optimal number of tokens to train the model with. Correct This relationship is describe in the Chinchilla paper, that shows that many models might even be overparametrized according to the relationship they found. c. When measuring compute budget, we can use "PetaFlops per second-Day" as a metric. Correct Petaflops per second-day is a useful measure for computing budget as it reflects the both hardware and time required to train the model.

What is the self-attention that powers the transformer architecture? a. A mechanism that allows a model to focus on different parts of the input sequence during computation. b. A measure of how well a model can understand and generate human-like language. c. The ability of the transformer to analyze its own performance and make adjustments accordingly. d. A technique used to improve the generalization capabilities of a model by training it on diverse datasets.

a. A mechanism that allows a model to focus on different parts of the input sequence during computation.

What is a soft prompt in the context of LLMs (Large Language Models)? a. A set of trainable tokens that are added to a prompt and whose values are updated during additional training to improve performance on specific tasks. b. A strict and explicit input text that serves as a starting point for the model's generation. c. A technique to limit the creativity of the model and enforce specific output patterns. d. A method to control the model's behavior by adjusting the learning rate during training.

a. A set of trainable tokens that are added to a prompt and whose values are updated during additional training to improve performance on specific tasks. Correct A soft prompt refers to a set of trainable tokens that are added to a prompt. Unlike the tokens that represent language, these tokens can take on any value within the embedding space. The token values may not be interpretable by humans, but are located in the embedding space close to words related to the language prompt or task to be completed.
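
A rough sketch of the mechanics, with made-up sizes: the soft prompt is just a small trainable matrix of embedding vectors that gets prepended to the embedded input tokens.

```python
import numpy as np

vocab_size, embed_dim, num_soft_tokens = 32_000, 768, 20   # illustrative sizes

rng = np.random.default_rng(0)
token_embeddings = rng.standard_normal((vocab_size, embed_dim))   # frozen embedding table

# Trainable soft prompt: its rows live in embedding space but need not match
# the embedding of any real vocabulary token.
soft_prompt = rng.standard_normal((num_soft_tokens, embed_dim)) * 0.02

input_ids = np.array([15, 204, 999])          # a made-up tokenized language prompt
input_embeds = token_embeddings[input_ids]

model_input = np.concatenate([soft_prompt, input_embeds], axis=0)
print(model_input.shape)                      # (23, 768): 20 soft tokens + 3 language tokens
```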

Parameter Efficient Fine-Tuning (PEFT) methods specifically attempt to address some of the challenges of performing full fine-tuning. Which of the following options describe challenges that PEFT tries to overcome? a. Computational constraints b. Catastrophic forgetting c. Storage requirements d. Model performance

a. Computational constraints Correct Because most parameters are frozen, we typically only need to train 15%-20% of the original LLM weights, making the training process less expensive (less memory required). b. Catastrophic forgetting Correct With PEFT, most parameters of the LLM are unchanged, which helps make it less prone to catastrophic forgetting. c. Storage requirements Correct With PEFT, we change only a small number of parameters when fine-tuning, so during inference you can combine the original model with the new parameters instead of duplicating the entire model for each new task you want to fine-tune for.
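
A toy PyTorch sketch of the additive idea: freeze the base network and train only a small added component, so only a few percent of the parameters need gradients and optimizer state. The layer sizes are arbitrary stand-ins, not a real LLM architecture.

```python
import torch.nn as nn

base = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))   # stands in for the LLM
adapter = nn.Sequential(nn.Linear(512, 16), nn.ReLU(), nn.Linear(16, 512))  # small new component

for p in base.parameters():
    p.requires_grad = False          # frozen: no gradients and no optimizer state needed

trainable = sum(p.numel() for p in adapter.parameters())
total = trainable + sum(p.numel() for p in base.parameters())
print(f"trainable share: {trainable / total:.1%}")   # roughly 3% in this toy example
```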

Scaling laws for pre-training large language models consider several aspects to maximize the performance of a model within a set of constraints and available scaling choices. Which of the following should be considered for scaling when performing model pre-training? Select all that apply: a. Compute budget: Compute constraints b. Batch size: Number of samples per iteration c. Model size: Number of parameters d. Dataset size: Number of tokens

a. Compute budget: Compute constraints c. Model size: Number of parameters d. Dataset size: Number of tokens

Which of the following statements about multi-task finetuning is correct? Select all that apply: a. FLAN-T5 was trained with multi-task finetuning. b. Multi-task finetuning requires separate models for each task being performed. c. Performing multi-task finetuning may lead to slower inference. d. Multi-task finetuning can help prevent catastrophic forgetting.

a. FLAN-T5 was trained with multi-task finetuning. Correct The FLAN family of models have been trained with multi-task instruction finetuning. d. Multi-task finetuning can help prevent catastrophic forgetting. Correct! However, remember that to prevent catastrophic forgetting it is important to fine-tune on multiple tasks with a lot of data.

Which of the following are Parameter Efficient Fine-Tuning (PEFT) methods? Select all that apply. a. Reparameterization b. Additive c. Subtractive d. Selective

a. Reparameterization Correct Reparameterization methods create a new low-rank transformation of the original network weights to train, decreasing the trainable parameter count while still working with high-dimensional matrices. LoRA is a common technique in this category. b. Additive Correct Additive methods freeze all of the original LLM weights and introduce new model components that are fine-tuned for a specific task. d. Selective Correct Selective methods are a category of PEFT that fine-tunes a subset of the original LLM parameters, using different approaches to identify which parameters to update.

Which evaluation metric below focuses on precision in matching generated output to the reference text and is used for text translation? a. ROUGE-1 b. BLEU c. HELM d. ROUGE-2

b. BLEU. BLEU focuses on precision and is used for text translation, while ROUGE is used for text summarization.
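
As a toy example of the precision focus, here is a clipped unigram precision, the basic building block of BLEU (real BLEU combines 1- to 4-gram precisions with a brevity penalty; the sentences below are invented):

```python
from collections import Counter

def unigram_precision(candidate, reference):
    # Fraction of candidate words that also appear in the reference,
    # with counts clipped so repeated words cannot be over-credited.
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    return clipped / sum(cand_counts.values())

print(unigram_precision("the cat sat on the mat",
                        "the cat is on the mat"))   # 5/6 ≈ 0.83
```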

Which of the following are true in respect to Catastrophic Forgetting? Select all that apply. a. Catastrophic forgetting only occurs in supervised learning tasks and is not a problem in unsupervised learning. b. Catastrophic forgetting occurs when a machine learning model forgets previously learned information as it learns new information. c. Catastrophic forgetting is a common problem in machine learning, especially in deep learning models. d. One way to mitigate catastrophic forgetting is by using regularization techniques to limit the amount of change that can be made to the weights of the model during training.

b. Catastrophic forgetting occurs when a machine learning model forgets previously learned information as it learns new information. Correct This is especially problematic in sequential learning scenarios where the model is trained on multiple tasks over time. c. Catastrophic forgetting is a common problem in machine learning, especially in deep learning models. Correct These models typically have many parameters, which can lead to overfitting and make it more difficult to retain previously learned information. d. One way to mitigate catastrophic forgetting is by using regularization techniques to limit the amount of change that can be made to the weights of the model during training. Correct Limiting how much the weights can change helps preserve the information learned during earlier training phases and prevents overfitting to the new data.
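
As a sketch of option d, one simple (if crude) regularizer penalizes how far the weights drift from their pre-fine-tuning values; methods such as EWC refine this by weighting each parameter by its estimated importance. The tiny model and penalty strength below are placeholders.

```python
import torch

def drift_penalty(model, original_params, strength=0.01):
    # Quadratic penalty on the distance between current and pre-fine-tuning weights.
    penalty = sum(((p - p0) ** 2).sum()
                  for p, p0 in zip(model.parameters(), original_params))
    return strength * penalty

model = torch.nn.Linear(8, 8)                              # stands in for the LLM
original_params = [p.detach().clone() for p in model.parameters()]

# During fine-tuning the total loss would be: task_loss + drift_penalty(model, original_params)
print(drift_penalty(model, original_params))               # zero before any weights have moved
```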

How can RLHF align the performance of large language models with human preferences? Select all that apply: a. RLHF increases the model's size by adding new parameters that represent human preferences b. RLHF can help reduce model toxicity and misinformation c. RLHF can enhance the interpretability of generated text d. Inference is faster after RLHF, improving the user experience

b. RLHF can help reduce model toxicity and misinformation Correct Human feedback guides the model's answers; if the reward favors completions that avoid toxicity and misinformation, the model learns to reduce them. c. RLHF can enhance the interpretability of generated text Correct By involving human feedback, models can be tuned to provide explanations or insights into their decision-making processes, improving interpretability and allowing users to better understand the model's outputs.

Which of the following stages are part of the generative AI model lifecycle mentioned in the course? (Select all that apply) a. Performing regularization b. Selecting a candidate model and potentially pre-training a custom model. c. Manipulating the model to align with specific project needs. d. Defining the problem and identifying relevant datasets. e. Deploying the model into the infrastructure and integrating it with the application.

b. Selecting a candidate model and potentially pre-training a custom model. c. Manipulating the model to align with specific project needs. d. Defining the problem and identifying relevant datasets. e. Deploying the model into the infrastructure and integrating it with the application.

Which transformer-based model architecture is well-suited to the task of text translation? a. Autoregressive b. Sequence-to-sequence c. Autoencoder

b. Sequence-to-sequence

When using Reinforcement Learning with Human Feedback (RLHF) to align large language models with human preferences, what is the role of human labelers? a. To write prompts and completions from scratch that are used during fine-tuning with RLHF b. To score prompt completions, so that this score is used to train the reward model component of the RLHF process. c. To compare the original LLM completions to the RLHF updated model completions and ensure they don't diverge too much. d. To identify model weights that should be updated

b. To score prompt completions, so that this score is used to train the reward model component of the RLHF process. Correct In RLHF, human labelers score a dataset of completions by the original model based on alignment criteria like helpfulness, harmlessness, and honesty. This dataset is used to train the reward model that scores the model completions during the RLHF process.
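
The human scores or rankings are typically turned into a pairwise ranking loss for the reward model, as in this sketch; the scores are made-up numbers standing in for reward-model outputs on preferred versus rejected completions.

```python
import torch
import torch.nn.functional as F

score_chosen = torch.tensor([1.8, 0.4])      # reward-model scores for human-preferred completions
score_rejected = torch.tensor([0.9, -0.3])   # scores for the less-preferred completions

# The loss is small when chosen completions out-score rejected ones by a wide margin.
loss = -F.logsigmoid(score_chosen - score_rejected).mean()
print(loss)
```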

Fine-tuning a model on a single task can improve model performance specifically on that task; however, it can also degrade the performance of other tasks as a side effect. This phenomenon is known as: a. Instruction bias b. Catastrophic loss c. Model toxicity d. Catastrophic forgetting

d. Catastrophic forgetting

Which of the following best describes the role of data parallelism in the context of training Large Language Models (LLMs) with GPUs? a. Data parallelism is used to increase the size of the training data by duplicating it across multiple GPUs. b. Data parallelism refers to a type of storage mechanism where data is stored across multiple GPUs. c. Data parallelism is a technique to reduce the model size so that it can fit into the memory of a single GPU. d. Data parallelism allows for the use of multiple GPUs to process different parts of the same data simultaneously, speeding up training time.

d. Data parallelism allows for the use of multiple GPUs to process different parts of the same data simultaneously, speeding up training time. Data parallelism is a strategy that splits the training data across multiple GPUs. Each GPU processes a different subset of the data simultaneously, which can greatly speed up the overall training time.
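
A conceptual NumPy sketch of that strategy: each "device" computes gradients on its own shard of the batch, the gradients are averaged (the all-reduce step), and every copy of the weights receives the same update. The toy loss and sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(4)              # every device holds an identical copy
data = rng.standard_normal((8, 4))            # global batch of 8 examples
shards = np.array_split(data, 2)              # 2 devices, 4 examples each

def gradient(w, batch):
    # Toy loss: mean of (w . x)^2 over the shard; gradient is 2 * mean((w . x) * x).
    preds = batch @ w
    return 2 * (batch * preds[:, None]).mean(axis=0)

grads = [gradient(weights, shard) for shard in shards]   # computed in parallel on each device
avg_grad = np.mean(grads, axis=0)                        # all-reduce: average across devices
weights -= 0.1 * avg_grad                                # identical update on every copy
```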

