Model Monitoring and Debugging (subset of ML Ops)


#layers #sop What are the standard "helper functions" you always need to make to test your model?

- all of the helper functions required to test your model quantitatively (data splits, training, plotting the loss)
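A minimal sketch of what those helpers might look like, assuming a PyTorch-style setup; the names split_data, train_one_epoch, and plot_loss are illustrative, not standard:

EXAMPLE
import torch
import matplotlib.pyplot as plt

def split_data(dataset, val_fraction=0.2, seed=0):
    # Deterministic train/validation split
    n_val = int(len(dataset) * val_fraction)
    gen = torch.Generator().manual_seed(seed)
    return torch.utils.data.random_split(
        dataset, [len(dataset) - n_val, n_val], generator=gen)

def train_one_epoch(model, loader, loss_fn, optimizer):
    # One pass over the data; returns the mean loss so it can be plotted
    model.train()
    losses = []
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return sum(losses) / len(losses)

def plot_loss(history):
    # Quick visual check on the training curve
    plt.plot(history)
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.show()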

#sop #startsmall You use an iterative approach to creating your model monitoring metrics - what is the first iteration when you get a new data set?

1. Pick a small, simple, fast model you've used in the past that is relevant to the new data set.
2. Write a helper function to evaluate that model qualitatively on the new data set.
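For example (a hypothetical sketch assuming tabular classification data; scikit-learn's LogisticRegression stands in for the "small, simple, fast" model):

EXAMPLE
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def qualitative_check(model, X, y, n=5):
    # Eyeball a handful of predictions next to the true labels
    idx = np.random.choice(len(X), n, replace=False)
    for i in idx:
        print(f"true={y[i]}  predicted={model.predict(X[i:i+1])[0]}")

X, y = np.random.rand(200, 8), np.random.randint(0, 2, 200)  # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
baseline = LogisticRegression().fit(X_tr, y_tr)  # small, simple, fast
qualitative_check(baseline, X_te, y_te)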

#sop you are setting up a new model, in addition to making training metrics what are the two other "pre-steps"?

1. Setting up a virtual environment.
2. Installing dependencies: the required software libraries, packages, and modules that the model relies on to function properly.
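Both steps are usually done from the shell, but they can be scripted too; a minimal sketch using Python's standard-library venv module (the requirements.txt path and POSIX pip path are assumptions):

EXAMPLE
import venv
import subprocess

# 1. Create the virtual environment, with pip available inside it
venv.create(".venv", with_pip=True)

# 2. Install dependencies into it (use .venv\Scripts\pip on Windows)
subprocess.run([".venv/bin/pip", "install", "-r", "requirements.txt"], check=True)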

#troubleshooting You have a problem, and you know the main challenges in ML, what are they? How do they help you troubleshoot?

CHALLENGES
- lack of data
- poor data quality
- nonrepresentative data
- uninformative features
- excessively simple models that underfit the training data
- excessively complex models that overfit the data
HOW TO OVERCOME
- Keep meticulous records of what you have done
- Keep versions of past data sets, etc., if you can, to help you find your way to "less terrible"
- Try creative ways to get better data
- Try more/different models - do a literature review to see which ones have "worked" for others on similar data

#whatelse #data #shuffling What are the limits to how much "deterministic shuffling" helps?

CHALLENGES: parallelism, floating-point arithmetic inconsistencies across platforms, and non-deterministic operations in some layers all limit how much your data shuffling "helps." STILL DO IT: controlling factors like data shuffling is a step in the right direction (see the sketch below).
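A sketch of the usual PyTorch determinism knobs; even with all of these set, the platform- and parallelism-related caveats above still apply:

EXAMPLE
import random
import numpy as np
import torch

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

# Error out on (or avoid) kernels known to be non-deterministic
torch.use_deterministic_algorithms(True)

# cuDNN-specific switches for GPU runs
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False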

#metrics #algorithms What is HNSW?

DEFINITION "Hierarchical Navigable Small World," it's an ML algo used for approximate nearest neighbor search, which is a problem where given a set of points and a query point, you want to find the nearest points to the query. HOW IT WORKS The core idea behind the algorithm is - create a hierarchical graph -- where each level of the hierarchy captures the structure of the dataset at a different scale - Then, it can quickly navigate to the region of interest in the high-dimensional space - and a more detailed search within that "region"

#REQUIREMENT You shuffle data in a "deterministic manner" - what does this mean? How do you do it? Give an example of use in a GAN.

DEFINITION: data shuffled the same way every time during training. WHY IMPORTANT: helpful for debugging and for ensuring reproducibility.
Scenario: Imagine you're training a GAN to generate realistic images of faces. GANs consist of two networks: a generator and a discriminator. During training, the generator tries to produce images that the discriminator can't distinguish from real images, while the discriminator tries to get better at distinguishing real images from fake ones.
Problem: At some point during training, you notice that the generated images start to degrade in quality, or artifacts begin to appear in the generated faces.
Debugging with Deterministic Behavior:
- Reproducibility: To understand the issue, you'd like to reproduce the training exactly as it happened before. If your pipeline (including data shuffling, model initialization, etc.) is deterministic, you can run the training again and observe the problem at the exact same point in the process. This makes it much easier to narrow down potential causes.
- Isolating the Issue: Once you can consistently reproduce the problem, you might hypothesize that it's due to a particular layer in the generator network or a certain configuration. With deterministic behavior, you can make isolated changes and see their exact impact on the outcome.
- Comparison: If you're experimenting with different GAN architectures or training strategies, deterministic behavior ensures that any differences in results between experiments are due to the changes you made and not random variation.
How Deterministic Shuffling Helps: The order in which the generator sees training samples can influence how it learns. If the shuffling is random each time, one run might degrade in quality because of a specific sequence of samples, while another run might not show the issue at all. With deterministic shuffling, the order of samples remains consistent across runs, ensuring that the training dynamics remain the same.
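In PyTorch, deterministic shuffling usually means seeding the DataLoader's generator; a minimal sketch (the dataset is a stand-in for real face images):

EXAMPLE
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 3, 64, 64))  # stand-in data

# A seeded generator gives the same shuffle order on every run
g = torch.Generator()
g.manual_seed(0)
loader = DataLoader(dataset, batch_size=16, shuffle=True, generator=g)

first_order = [batch[0].sum().item() for batch in loader]
# Re-seeding and re-iterating reproduces exactly the same order
g.manual_seed(0)
assert first_order == [batch[0].sum().item() for batch in loader]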

#basics Define Latency

DEFINITION the time taken to process an input and produce an output. High = bad (it takes longer to produce a result). WHY IT MATTERS: many generative AI applications are real-time or interactive, so high latency directly degrades the experience they exist to provide.
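A simple way to estimate latency is to time the forward pass; a sketch assuming any callable model (GPU timing would additionally need torch.cuda.synchronize):

EXAMPLE
import time
import torch

model = torch.nn.Linear(512, 512)  # stand-in model
x = torch.randn(1, 512)

model(x)  # warm-up so one-time setup costs don't skew the measurement

start = time.perf_counter()
for _ in range(100):
    model(x)
latency_ms = (time.perf_counter() - start) / 100 * 1000
print(f"mean latency: {latency_ms:.3f} ms per input")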

#fine-tuning What are the challenges of enabling continuous learning for a model? COME BACK TO THIS ONE

Data Quality:
- Noise: Real-time feedback can often be noisy, especially if it's coming from user interactions or automated processes that haven't been rigorously validated.
- Bias: The feedback might be biased based on the subset of users providing it or the specific conditions under which it's collected.
Computational Challenges:
- Processing Overhead: Online learning requires computational resources that might not be available, especially on edge devices.
- Memory Constraints: Storing data for real-time fine-tuning can be a challenge, especially if there are memory constraints.
Model Stability:
- Catastrophic Forgetting: When a model is fine-tuned continuously, it might forget previous knowledge, especially if the new data is significantly different or if it's updated too aggressively (a replay-buffer sketch follows below).
- Overfitting: The model might overfit to recent feedback and lose its generalization ability.
Model Update Frequency:
- Latency Issues: Depending on the model architecture and the amount of data, fine-tuning can introduce latency.
- Deciding Update Frequency: It's challenging to determine how frequently the model should be updated. Too frequent updates can lead to instability, while infrequent updates might not capture the benefits of real-time feedback.
Security and Privacy:
- Sensitive Data: Real-time feedback can sometimes include sensitive or personal information. Proper care must be taken to ensure user data privacy.
- Adversarial Attacks: Real-time feedback mechanisms can be susceptible to adversarial attacks, where malicious actors provide misleading feedback to degrade model performance.
Infrastructure Challenges:
- Deployment: Continuously updating models might require a robust deployment mechanism to ensure that updated models are seamlessly integrated without causing downtime.
- Versioning: Keeping track of model versions and ensuring the right version is used for the right task becomes crucial.
Evaluation:
- Validation: Since the model is continuously updating, maintaining a robust validation mechanism to track its performance is challenging.
- Feedback Loop: A faulty feedback loop can lead to model degradation. If the model's predictions influence the feedback and the feedback is used for training, it can create an echo chamber.
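One common mitigation for catastrophic forgetting is to mix a replay buffer of older samples into each fine-tuning batch; a hedged sketch of the idea, not a full continuous-learning pipeline:

EXAMPLE
import random

replay_buffer = []  # holds (x, y) pairs seen during earlier training
BUFFER_SIZE = 10_000

def add_to_buffer(x, y):
    # Random-replacement cap keeps the buffer bounded in memory
    if len(replay_buffer) < BUFFER_SIZE:
        replay_buffer.append((x, y))
    else:
        replay_buffer[random.randrange(BUFFER_SIZE)] = (x, y)

def make_batch(new_samples, replay_fraction=0.5):
    # Blend fresh feedback with remembered samples from the buffer
    n_replay = int(len(new_samples) * replay_fraction)
    old = random.sample(replay_buffer, min(n_replay, len(replay_buffer)))
    return list(new_samples) + old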

#layers #sop What do you need to use to make sure your layers are doing what they should be doing?

Make a test to verify that your layer is doing the right thing. For LLMs, 2 EXAMPLES: 1. the RoPE embeddings have a specific property that you can test for (see the sketch below) 2. for Transformers, you can test that the attention is working by looking at the attention map
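A sketch of such a RoPE property test: rotary embeddings rotate each (even, odd) feature pair by a position-dependent angle, so the dot product between a rotated query and key should depend only on their relative offset. This is an illustrative re-implementation, not any particular library's code:

EXAMPLE
import torch

def rotate(x, pos, theta=10000.0):
    # Apply a rotary rotation to x (last dim must be even) at position pos
    d = x.shape[-1]
    idx = torch.arange(0, d, 2, dtype=torch.float32)
    freqs = theta ** (-idx / d)  # one frequency per feature pair
    angles = pos * freqs
    cos, sin = torch.cos(angles), torch.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q, k = torch.randn(64), torch.randn(64)

# Property: <rotate(q, m), rotate(k, n)> depends only on the offset m - n
a = torch.dot(rotate(q, 5), rotate(k, 2))   # offset 3
b = torch.dot(rotate(q, 10), rotate(k, 7))  # offset 3
assert torch.allclose(a, b, atol=1e-4), "RoPE relative-offset property violated"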

#troubleshooting Your model is producing nothing like what you expected, how do you start to fix that?

The model is likely overfitting the training data (or we got extremely lucky on the training data). HOW TO FIX
- Get more data
- Simplify the model (select a simpler algorithm, reduce the number of parameters or features used, or regularize the model) - see the sketch below
- Reduce the noise in the training data
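For the "simplify/regularize the model" option, a minimal PyTorch sketch; dropout and weight decay are two standard regularizers (the values here are arbitrary):

EXAMPLE
import torch
import torch.nn as nn

# A deliberately small model with dropout regularization
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(p=0.3),  # randomly zeroes activations during training
    nn.Linear(64, 10),
)

# weight_decay adds an L2 penalty on the parameters
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)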

What are helper functions in the context of gen AI models?

These are required to test your model quantitatively. EXAMPLES
- data splits
- training
- plotting the loss

#layers #sop What does a layer test do? Give examples of layer tests for RoPE and Attention-type layers

This is an SOP - you need to have tests for each layer to make sure it is doing what it should:

1. RoPE (Rotary Positional Embeddings): One property you can test for is whether the positional embeddings are correctly capturing the relative distances between tokens. Note: the example below is a simplified sinusoidal-style positional encoding standing in for true rotary embeddings, but it shows the shape of such a test.

EXAMPLE
import math
import torch
import torch.nn as nn

# Sinusoidal positional embedding (simplified stand-in for RoPE)
class RoPEEmbedding(nn.Module):
    def __init__(self, d_model, max_length):
        super(RoPEEmbedding, self).__init__()
        self.d_model = d_model
        self.max_length = max_length
        self.rope = self._generate_rope(max_length, d_model)

    def _generate_rope(self, max_length, d_model):
        # Fill in sin/cos values for every position and feature pair
        rope = torch.zeros(max_length, d_model)
        for pos in range(max_length):
            for i in range(0, d_model, 2):
                angle = pos / (10000 ** (i / d_model))
                rope[pos, i] = math.sin(angle)
                rope[pos, i + 1] = math.cos(angle)
        return rope

    def forward(self, x):
        # Add the positional embeddings to the input
        return x + self.rope[:x.size(1), :]

# Testing the embeddings
d_model = 512
max_length = 100
rope_emb = RoPEEmbedding(d_model, max_length)

# Input tensor: batch size 2, sequence length max_length
input_tensor = torch.randn(2, max_length, d_model)

# Apply the embeddings
output_tensor = rope_emb(input_tensor)
print(output_tensor.shape)

OUTPUT torch.Size([2, 100, 512])

In this example, we define a RoPEEmbedding module that generates position embeddings for the specified maximum sequence length and model dimension; `_generate_rope` fills them in from the sinusoidal formula. To test, we create an input tensor with batch size 2, sequence length `max_length`, and model dimension `d_model`, pass it through the module so the embeddings are added, and print the shape of the output tensor to verify it matches the expected shape.

2. Transformer Attention: For the Transformer model, you can test whether the attention mechanism is working correctly by examining the attention map. The attention map represents how each token attends to other tokens within the sequence.
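A minimal sketch of an attention-map check, using PyTorch's built-in nn.MultiheadAttention as a stand-in for a model's own attention layer: with need_weights=True the module returns the attention map, and every row should sum to 1 because of the softmax:

EXAMPLE
import torch
import torch.nn as nn

embed_dim, num_heads, seq_len = 64, 4, 10
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(2, seq_len, embed_dim)  # (batch, seq, dim)
# Returns the attention map averaged over heads: (batch, seq, seq)
_, attn_map = mha(x, x, x, need_weights=True)

# Each query position should distribute exactly 1.0 of attention over the keys
assert attn_map.shape == (2, seq_len, seq_len)
assert torch.allclose(attn_map.sum(dim=-1), torch.ones(2, seq_len), atol=1e-5)
print("attention map OK")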

#debugging What is "assert"?

This statement checks whether a given condition is true; if the condition evaluates to false, it raises an AssertionError with an error message.

EXAMPLE
import numpy as np

# Checking if a variable is positive
x = 5
assert x > 0, "x should be positive"

# Checking if two arrays have the same shape
arr1 = np.zeros((3, 4))
arr2 = np.ones((3, 4))
assert arr1.shape == arr2.shape, "Arrays should have the same shape"

OUTPUT (nothing if the checks pass; an AssertionError otherwise)

In this example, the first assert statement checks whether the variable `x` is positive; if not, it raises an AssertionError with the specified error message. The second assert statement verifies that `arr1` and `arr2` have the same shape; if their shapes differ, it raises an AssertionError.

#metrics F1 Score: what is this? When do you use it? How do you calculate it?

WHAT IS IT It is a metric to monitor (classification and other) models DEFINITION this metric calculates the harmonic mean of precision and recall, giving a balanced measure between the two. HOW DOES IT WORK Takes two variables:
1. Precision: Of the instances the model predicted as positive, how many were actually positive?
2. Recall (or Sensitivity): Of all the actual positive instances, how many were correctly predicted by the model?
3. Combines them as: F1 = 2 * (precision * recall) / (precision + recall)
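A quick sketch of the calculation on toy labels (made up for illustration); the manual result should match sklearn.metrics.f1_score:

EXAMPLE
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f1)                        # 0.75, the harmonic mean
print(f1_score(y_true, y_pred))  # matches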

#debugging what is the plt.imshow function?

___ is part of the matplotlib library and is used to display an image or array as a visual representation. It is commonly used to visualize images generated by generative AI models.

EXAMPLE
import numpy as np
import matplotlib.pyplot as plt

# Creating a 32x32 image array of random values
image = np.random.rand(32, 32)

# Displaying the image with a gray color map
plt.imshow(image, cmap='gray')
plt.show()

OUTPUT (the rendered plot)

In this example, we create a 32x32 image of random values. `plt.imshow` displays it using a gray color map, and `plt.show` renders it on the screen.

#debugging Define the "shape" function

___ obtains the dimensions of a given array or tensor; it is an attribute that returns a tuple representing the size of each dimension.

EXAMPLE
import numpy as np

# Creating a numpy array
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Getting the shape of the array
arr_shape = arr.shape
print(arr_shape)

OUTPUT (2, 3)

`shape` returns the shape of the array `arr`, which is `(2, 3)`, indicating that it has 2 rows and 3 columns.

#monitoring #finetuning What does it mean to do continuous learning for your model?

It means incorporating a mechanism to fine-tune your model on the fly with real-time feedback.

#debugging Which functions are commonly used in the context of monitoring and debugging generative AI models?

- shape
- assert
- plt.imshow

