Deep Learning Exam 1


Federated coefficient = sum(coef * n/n_total)
Hosp 1 (n=122): 23.1 - 37.9*age + 165.7*male + 60.3*dose
Hosp 2 (n=236): 1698.4 - 22.8*age - 37.3*male + 24.7*dose
Federated coefficients:
intercept = 23.1*(122/358) + 1698.4*(236/358) = 1127.5
age = -37.9*(122/358) - 22.8*(236/358) = -27.9
male = 165.7*(122/358) - 37.3*(236/358) = 31.9
dose = 60.3*(122/358) + 24.7*(236/358) = 36.8
Federated model: 1127.5 - 27.9*age + 31.9*male + 36.8*dose
Prediction for a female (male=0) patient with age 60 and dose_gy 75:
1127.5 - 27.9*(60) + 31.9*(0) + 36.8*(75) = 1127.5 - 1674 + 0 + 2760 = 2213.5

Build a federated model and make a prediction on length of survival for the following patient: Female patient with age of 60 and dose_gy of 75.
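
A minimal Python sketch of the weighted averaging above and the requested prediction (array names are illustrative; the coefficients, sample sizes, and patient values come from the card):

```python
# Weighted average of per-hospital coefficients, then a prediction.
import numpy as np

n = np.array([122, 236])                       # patients at Hosp 1 and Hosp 2
coefs = np.array([
    [23.1,   -37.9, 165.7, 60.3],              # Hosp 1: intercept, age, male, dose
    [1698.4, -22.8, -37.3, 24.7],              # Hosp 2: intercept, age, male, dose
])

weights = n / n.sum()                          # 122/358 and 236/358
federated = weights @ coefs                    # weighted average of each coefficient
print(federated)                               # ~[1127.5, -27.9, 31.9, 36.8]

# Female (male=0) patient, age 60, dose_gy 75
x = np.array([1, 60, 0, 75])                   # leading 1 for the intercept
print(federated @ x)                           # predicted length of survival, ~2213
```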

p = last-layer neuron output / sum of last-layer outputs
cat: p1 = 0.07/(0.07 + 1.28 + 5.25) = 0.011
dog: p2 = 1.28/(0.07 + 1.28 + 5.25) = 0.194
deer: p3 = 5.25/(0.07 + 1.28 + 5.25) = 0.795

Calculate final probability values (p1, p2, and p3) for the following network.
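
A minimal Python sketch of the sum-normalization above (note this is a simple divide-by-the-sum, not a softmax; the last-layer outputs 0.07, 1.28, and 5.25 come from the card):

```python
# Each probability is a neuron's output divided by the sum of all last-layer outputs.
outputs = {"cat": 0.07, "dog": 1.28, "deer": 5.25}

total = sum(outputs.values())                  # 6.60
probs = {label: value / total for label, value in outputs.items()}
print(probs)                                   # cat ~0.011, dog ~0.194, deer ~0.795
```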

Cons:
- Limited computing power on the user side often limits the size of the model
- Sensitive information can still be revealed to a third party or to the central server during federation
- Many algorithmic and technical challenges remain in addressing the heterogeneity of the incoming models and end nodes
- Nodes send different numbers of updates, and the updates arrive at different times

Cons of federated learning

MSE evaluates how well the predictions for a continuous target variable match the true data (e.g., if the target variable is quality). Accuracy evaluates how well the model classifies the data (e.g., if the classification is cat/dog).

Deep learning model validation- MSE vs Accuracy
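
A small sketch contrasting the two metrics, assuming scikit-learn and toy numbers (not exam data): mean squared error for a continuous target such as quality, accuracy for a cat/dog label.

```python
from sklearn.metrics import mean_squared_error, accuracy_score

# Regression-style validation: continuous "quality" scores
y_true_quality = [5.0, 6.5, 7.0, 4.5]
y_pred_quality = [5.2, 6.0, 7.1, 5.0]
print(mean_squared_error(y_true_quality, y_pred_quality))   # average squared error

# Classification-style validation: cat/dog labels
y_true_label = ["cat", "dog", "dog", "cat"]
y_pred_label = ["cat", "dog", "cat", "cat"]
print(accuracy_score(y_true_label, y_pred_label))            # fraction correct = 0.75
```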

NOTE: This is an ACCURACY vs. epoch graph.
Line 1: Training accuracy (increases at high epochs)
Line 2: Validation accuracy (decreases at high epochs)
The overfitting line goes past the intersection of the two curves: once training accuracy is higher than validation accuracy, overfitting begins.

Draw a vertical line to mark the initial epoch where the overfitting starts to occur.

NOTE: This is a LOSS vs. epoch graph.
Line 1: Validation loss (increases at high epochs)
Line 2: Training loss (decreases at high epochs)
The overfitting line goes past the intersection of the two curves: once validation loss is higher than training loss, overfitting begins.

Draw a vertical line to mark the initial epoch where the overfitting starts to occur.
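
The rule from the two cards above can be checked numerically. A minimal Python sketch on made-up accuracy curves (the numbers are illustrative, not from the exam): mark the first epoch where training accuracy exceeds validation accuracy; for loss curves the same idea applies once validation loss rises above training loss.

```python
# Find the first epoch past the intersection of the two curves.
train_acc = [0.60, 0.70, 0.78, 0.85, 0.90, 0.94]
val_acc   = [0.62, 0.71, 0.79, 0.80, 0.79, 0.77]

overfit_epoch = next(
    epoch for epoch, (tr, va) in enumerate(zip(train_acc, val_acc), start=1)
    if tr > va                                  # training accuracy overtakes validation
)
print(overfit_epoch)                            # 4: where the vertical line would be drawn
```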

Piecewise regression does not have an on/off switch the way a neural activation function does. Activation functions can be combined across many neurons and layers and fitted to non-linear patterns such as a spiral, which is why deep learning is optimal for complex data. With piecewise functions the pieces cannot harmonize: only one function can be active at a time, so the model cannot capture the spiral pattern in the classification task.

Explain why piecewise regression is not able to perform classification task shown below:

Loss is the difference between the true value and the predicted probability. The correct label has a true value of 1 and the other labels have a true value of 0. Total loss is the sum of the per-label losses, i.e. (1 - correct probability) + the incorrect probabilities.
cat: loss1 = 0.011 - 0 = 0.011
dog: loss2 = 0.194 - 0 = 0.194
deer: loss3 = 1 - 0.795 = 0.205
total loss = 0.011 + 0.194 + 0.205 = 0.410

Given that the correct label is deer, calculate loss values for the following network.
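
A small Python sketch of the loss rule above, reusing the probabilities from the earlier card (the dictionary layout is illustrative):

```python
# Per-label loss is |true value - probability|; the correct label (deer) has true value 1.
probs = {"cat": 0.011, "dog": 0.194, "deer": 0.795}
truth = {"cat": 0, "dog": 0, "deer": 1}

losses = {label: abs(truth[label] - probs[label]) for label in probs}
print(losses)                # cat 0.011, dog 0.194, deer 0.205
print(sum(losses.values()))  # total loss = 0.410
```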

16 weights + 1 bias = 17 betas

How many beta(s) does the boxed in neuron have?

4 neurons in previous layer plus intercept (beta0) = 5 betas

How many betas does the boxed in neuron have?

Global features: more layers are needed (higher-order polynomial). Example: face shape.

Layers and global features

Local features: few layers are needed (lower-order polynomial). Example: eyebrows.

Layers and local features

MSE: truth in the training data set on which your model is built (training accuracy/loss).
PMSE: truth in real life, i.e. in new incoming data (validation accuracy/loss).
The best model has the smallest PMSE.

PMSE vs MSE
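
A minimal sketch of the MSE/PMSE distinction, assuming scikit-learn and synthetic data (none of this is from the exam): MSE is measured on the data the model was fit to, PMSE on held-out data, and the model with the smallest PMSE is preferred.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for "real" measurements
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)

mse  = mean_squared_error(y_train, model.predict(X_train))  # truth in the training set
pmse = mean_squared_error(y_val,   model.predict(X_val))    # truth in new incoming data
print(mse, pmse)                                            # pick the model with smallest PMSE
```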

Pros:
- Improvement in model performance when training data quantity is limited
- Parallelization of computing power
- Complete decentralization
- The centralized server can continuously improve various models (like voice or face recognition) without transferring data

Pros of federated learning

Hyperparameters 4-9 affect the flexibility of the model. Increasing the number of layers and the number of neurons increases the number of parameters, and more parameters increase the flexibility of the model.

The following table has 10 hyperparameters you can adjust. List out the hyperparameters affecting the flexibility of your model and explain how they affect the flexibility.

N = 2000, K = 5, so 2000/5 = 400 images in each group. One group is for testing and four are for training: 400 testing images and 1600 training images. TRUE

There are 1000 dog images and 1000 cat images in your data set. TRUE/FALSE: Model training using the K-fold validation method with K=5 should result in your model trained on 1600 images

N = 2000, K = 1000. Leave-one-out validation has N folds, i.e. 2000 folds; K = 1000 is not the same as K = 2000. FALSE

There are 1000 dog images and 1000 cat images in your data set. TRUE/FALSE: Results from the K fold validation with K=1000 should be equivalent to ones from the leave-one-out validation

60/20/20 is the train/test/validation split. 60% of the 1000 cat images = 1000 * 0.6 = 600 training cat images.

There are 1000 images of dog and 1000 images of cat in your dataset. Given that you are using 60/20/20 for the splitting, on how many cat images will your model be trained?

N = 2400, K = 5, so 2400/5 = 480 images in each group. One group is for testing and four are for training: 480 testing images and 1920 training images (not 1800). FALSE

There are 1200 dog images and 1200 cat images in your data set. TRUE/FALSE: Model training using the K-fold validation method with K=5 should result in your model trained on 1800 images

N = 2400, K = 1200. Leave-one-out validation has N folds, i.e. 2400 folds; K = 1200 is not the same as K = 2400. FALSE

There are 1200 dog images and 1200 cat images in your data set. TRUE/FALSE: Results from the K fold validation with K=1200 should be equivalent to ones from the leave-one-out validation

They are more flexible, easier to calculate, easier to interpret

What are the advantages of piecewise regression as compared to polynomial regression?

Pros:
- Learns faster
Cons:
- May not find the best parameters
- May not converge properly, or may not converge at all, because drastic updates lead to divergent behavior

What are the pros and cons of using a large learning rate?

Pros:
- Will find good, optimal parameters
Cons:
- Takes longer to train the model
- May not learn

What are the pros and cons of using a small learning rate?
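
Both cards above can be illustrated with a toy gradient-descent run on f(x) = x^2, a made-up example: a large learning rate overshoots the minimum and diverges, while a small one moves in the right direction but slowly.

```python
# Gradient descent on f(x) = x**2, whose minimum is at x = 0.
def gradient_descent(learning_rate, steps=20, x=5.0):
    for _ in range(steps):
        grad = 2 * x                  # derivative of x**2
        x = x - learning_rate * grad  # update step
    return x

print(gradient_descent(learning_rate=1.1))    # diverges: |x| keeps growing
print(gradient_descent(learning_rate=0.01))   # heads toward 0, but very slowly
```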

You split the data into K groups, keep one group for testing, and use the others for training. The number of images in each group is the total number of images divided by the number of folds (N/K). Example: for K = 5, one set is for testing and four are for training. With 1000 total images (N = 1000), there are 200 images in each set (1000/5): one set (200 images) is used for testing and the other four sets (800 images) for training.

What is K-fold validation?
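
A minimal sketch of the K = 5 example, assuming scikit-learn's KFold (the image array is a stand-in):

```python
import numpy as np
from sklearn.model_selection import KFold

images = np.arange(1000)                      # stand-ins for 1000 images
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

for train_idx, test_idx in kfold.split(images):
    print(len(train_idx), len(test_idx))      # 800 training, 200 testing, five times
```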

It is difficult to determine the number of splines

What is a con of piecewise regression?

The number of sets equals the total number of images in the dataset (K = N). One set (a single observation) is kept for testing and the others are used for training. Example: for a data set with 1000 total images (N = 1000), the number of sets is K = 1000; one observation is used for testing and 999 for training.

What is leave-one-out validation?
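
A minimal sketch of leave-one-out, assuming scikit-learn's LeaveOneOut (again with stand-in data): the number of splits equals N, and each split holds out a single observation.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

images = np.arange(1000)                       # stand-ins for 1000 images
loo = LeaveOneOut()
print(loo.get_n_splits(images))                # 1000 folds, i.e. K = N

train_idx, test_idx = next(iter(loo.split(images)))
print(len(train_idx), len(test_idx))           # 999 for training, 1 for testing
```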

Masking allows you to see which variable is the most impactful by hiding one variable at a time, "masking it." You can see how the impact of the other variables changes when a specific variable is masked.

What is the key component of TabNet which improves its explainability and how is the improvement achieved?

Using what was learned from a particular task to solve a different task. Example: taking a model that classifies cats and dogs and retraining it to classify raccoons and deer. There are multiple ways to do this:
- Use the model as-is and train it further with more data
- Use the model as part of a new network and then train it further with more data

What is transfer learning?
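
A minimal transfer-learning sketch, assuming TensorFlow/Keras; base_model stands in for a network already trained on cats vs. dogs, its 100-feature input is arbitrary, and the raccoon/deer arrays are hypothetical.

```python
import tensorflow as tf

# Pretend this network was already trained on cats vs. dogs.
base_model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),              # hypothetical feature size
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
])
base_model.trainable = False                   # keep what was already learned

# Reuse the frozen base as part of a new network for raccoons vs. deer.
new_model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Dense(2, activation="softmax"),   # new output head
])
new_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# new_model.fit(raccoon_deer_x, raccoon_deer_y, epochs=5)  # then train further with new data
```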

1st layer: 64 biases (one beta0 per neuron)
2nd layer: 32 biases
3rd layer: 100 biases
Total: 196 biases

You are building an ANN model for wine quality prediction. How many biases does your model have?

1st layer: 7 input variables * 64 neurons = 448 weights
2nd layer: 64 neurons * 32 neurons = 2048 weights
3rd layer: 32 neurons * 100 neurons = 3200 weights
Biases: 196
Total: 448 + 2048 + 3200 + 196 = 5892 betas

You are building an ANN model for wine quality prediction. How many parameters does your model have?
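
A small Python sketch of the parameter arithmetic above (the 7 input features and layer sizes 64/32/100 are taken from the card; the helper name is illustrative):

```python
# Weights between consecutive layers plus one bias (beta0) per neuron.
def count_parameters(n_inputs, layer_sizes):
    weights = 0
    previous = n_inputs
    for size in layer_sizes:
        weights += previous * size     # one weight per incoming connection
        previous = size
    biases = sum(layer_sizes)          # one bias per neuron
    return weights + biases

print(count_parameters(7, [64, 32, 100]))   # 448 + 2048 + 3200 + 196 = 5892
```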

Each neuron has a single bias (beta0).
1st layer: 16 neurons = 16 biases
2nd layer: 8 neurons = 8 biases
3rd layer: 2 neurons = 2 biases
Total: 26 biases

You are building an ANN model which takes 200-by-200 RGB images. The following is the overview of your ANN model structure. How many biases does your model have?

Parameters = biases + (number of inputs)*(1st-layer neurons) + (1st-layer neurons)*(2nd-layer neurons) + ... + ((n-1)-layer neurons)*(n-layer neurons)
Number of inputs = 200*200*3 (RGB) = 120000
Biases: 26
1st layer: 120000*16 = 1920000 weights
2nd layer: 16*8 = 128 weights
3rd layer: 8*2 = 16 weights
Total: 26 + 1920000 + 128 + 16 = 1920170 betas

You are building an ANN model which takes 200-by-200 RGB images. The following is the overview of your ANN model structure. How many parameters does your model have?

Number of inputs = 300*300*3 (RGB) = 270000
Biases = 20 + 12 + 3 = 35
1st layer: 270000*20 = 5400000 weights
2nd layer: 20*12 = 240 weights
3rd layer: 12*3 = 36 weights
Total parameters: 35 + 5400000 + 240 + 36 = 5400311 betas

You are building an ANN model which takes 300-by-300 RGB images. The following is the overview of your ANN model structure. How many parameters does your model have?
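
As a cross-check, the same arithmetic can be verified with Keras's own parameter count (TensorFlow/Keras assumed; the layer sizes 20/12/3 are taken from the card):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(300 * 300 * 3,)),        # 270000 flattened pixel inputs
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(12, activation="relu"),
    tf.keras.layers.Dense(3),
])
print(model.count_params())                        # 35 + 5400000 + 240 + 36 = 5400311
```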

