CS 7643 Quiz 3

L1 Loss

Sum of Absolute Value of (true - predicted)

L2 Loss

Sum of (true - predicted)^2

Mask R-CNN

Same as Faster R-CNN, but additionally learns a mask that says which pixels belong to the object, which helps deal with background pixels. Has a lot of hyperparameters. Slower than YOLO/SSD but more accurate in general.

Mean Squared Error (MSE)

Average of (true - predicted)^2

Focal Loss

-1 * (1- prediction of true class)^gamma * log(prediction of true class)

Balanced Cross-Entropy Loss

-1 * alpha * log(prediction of true class)

Class Balanced Focal Loss

-1 * alpha_t * (1 - prediction of true class)^gamma * log(prediction of true class)

Binary Cross-Entropy Loss

-1 * log(prediction of true class)
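
A minimal NumPy sketch of the loss-formula cards above (function and argument names are my own; p_true denotes the model's predicted probability for the true class):

```python
import numpy as np

def l1_loss(y_true, y_pred):
    # L1: sum of absolute errors
    return np.sum(np.abs(y_true - y_pred))

def l2_loss(y_true, y_pred):
    # L2: sum of squared errors
    return np.sum((y_true - y_pred) ** 2)

def mse(y_true, y_pred):
    # MSE: mean of squared errors
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(p_true):
    # negative log of the probability assigned to the true class
    return -np.log(p_true)

def balanced_cross_entropy(p_true, alpha):
    # cross-entropy weighted by a class-balancing factor alpha
    return -alpha * np.log(p_true)

def focal_loss(p_true, gamma):
    # (1 - p_true)^gamma down-weights easy examples (p_true close to 1)
    return -((1 - p_true) ** gamma) * np.log(p_true)

def class_balanced_focal_loss(p_true, alpha_t, gamma):
    # focal loss with an additional class-balancing weight alpha_t
    return -alpha_t * ((1 - p_true) ** gamma) * np.log(p_true)
```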

VGGNet

2x(2xCONV=>POOL) => 3x(3xCONV=>POOL) => 3xFC. Repeated application of 3x3 Conv (stride 1, padding 1) and 2x2 Max Pooling (stride 2) blocks. Very large number of parameters (most in the FC layers); most memory is in the Conv layers (you are storing the activations produced in the forward pass). Critical development: blocks of repeated structures.

AlexNet

2x(CONV=>MAXPOOL=>NORM) => 3xCONV => MAXPOOL => 3xFC. ReLU, specialized normalization layers, PCA-based data augmentation, Dropout, and ensembling (used 7 NNs with different random weights). Critical development: more depth and ReLU.

ResNet

Allows information from a layer to propagate to a future layer: the residual (skip) connection takes the input of a layer at depth x and adds it to the output of the layer at depth x+1. Global average pooling block at the end. Critical development: passing residuals of previous layers forward. (See the sketch below.)
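
A minimal PyTorch sketch of the skip connection described above, following the common basic-block pattern (layer choices such as BatchNorm placement vary across ResNet variants):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    # two 3x3 convs; the block's input is added to its output (skip connection)
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        # residual connection: information from x propagates forward unchanged
        return self.relu(self.body(x) + x)
```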

Faster R-CNN

Uses deep learning to do everything. An RPN (Region Proposal Network) generates proposals using anchors laid out as a grid, selects the Top-K of them, and outputs an objectness score and a bounding box for each. Four losses: RPN bounding box loss, RPN objectness score loss, final bounding box regression loss, and a classifier loss for each class.

Inception Net

Deeper and more complex than VGGNet. Average pooling before the FC layer. Built from repeated blocks, where each block is made of simple layers: FC, Conv, MaxPool, and softmax. Parallel filters of different sizes extract features at multiple scales. Uses the Network-in-Network concept, i.e. 1x1 convolutions as a sort of dimensionality reduction. Downside: increased computational work. Critical development: blocks of parallel paths.

Estimation Error

Even if you find the best hypothesis, weights, and parameters that minimize training error, they may not generalize to the test set.

Optimization Error

Even if your NN can perfectly model the world, your optimization algorithm may not find good weights that model the function. As model complexity increases, modeling error decreases but optimization error increases.

R-CNN

Find regions of interest (ROIs) containing object-like things, then classify those regions and refine their bounding boxes. Slow: based on Selective Search, returns scores and bounding boxes, and produces hundreds of image crops to process, wasting a lot of compute on the same image portions.

Number of parameters for CNN

For each layer, sum ((Kernel_dim1 * Kernel_dim2 * InputChannels) + 1) * NumberOfFilters, where InputChannels is the depth of the layer's input. Helpful link for parameter counting: https://stackoverflow.com/questions/42786717/how-to-calculate-the-number-of-parameters-for-convolutional-neural-network
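
A quick sketch of this formula in Python (names are illustrative):

```python
def conv_params(kernel_h, kernel_w, in_channels, num_filters):
    # each filter has kernel_h * kernel_w * in_channels weights plus 1 bias
    return (kernel_h * kernel_w * in_channels + 1) * num_filters

# e.g. a 3x3 conv over a 64-channel input producing 128 feature maps:
print(conv_params(3, 3, 64, 128))  # (3*3*64 + 1) * 128 = 73856
```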

Number of parameters for FC Layers

For each layer, sum NumHiddenUnits * (InputSize + 1). Helpful link for parameter counting: https://stackoverflow.com/questions/42786717/how-to-calculate-the-number-of-parameters-for-convolutional-neural-network
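
The same idea for an FC layer, sketched in Python:

```python
def fc_params(input_size, num_hidden_units):
    # each hidden unit has one weight per input plus 1 bias
    return num_hidden_units * (input_size + 1)

# e.g. a 4096-unit FC layer fed by a 4096-dimensional input (as in VGG):
print(fc_params(4096, 4096))  # 4096 * (4096 + 1) = 16781312
```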

Modeling Error

Given a particular NN architecture, the actual model that represents the real world may not be in that hypothesis space. As model complexity increases, modeling error decreases but optimization error increases.

Equivariance

If the input changes, the output changes in the same way; that is, f(g(x)) = g(f(x)). E.g. if the beak of a bird in a picture moves a bit, the corresponding output values will move in the same way: a change to the input causes an equal change to the output.

Invariance

If the input changes, the output stays the same; that is, f(g(x)) = f(x). E.g. rotating/scaling a digit will still result in that digit being classified the same: a change to the input does not affect the output. Useful if we care more about whether a feature is present than exactly where it is.
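
A tiny NumPy check of both properties, using circular shift as the transformation g and circular cross-correlation as a toy stand-in for a conv layer (all names here are illustrative):

```python
import numpy as np

def circular_corr(x, w):
    # 1-D circular cross-correlation: out[i] = sum_j x[(i+j) mod n] * w[j]
    n = len(x)
    return np.array([np.sum(np.roll(x, -i)[:len(w)] * w) for i in range(n)])

x = np.array([1., 3., 2., 5., 4., 0.])
w = np.array([1., -1., 2.])
shift = lambda v: np.roll(v, 2)  # g: translate the input by 2 positions

# Equivariance: f(g(x)) == g(f(x)), i.e. the feature map shifts with the input
assert np.allclose(circular_corr(shift(x), w), shift(circular_corr(x, w)))

# Invariance: global max pooling discards position, so f(g(x)) == f(x)
assert np.max(circular_corr(shift(x), w)) == np.max(circular_corr(x, w))
```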

Transpose convolution

In contrast to a regular convolution, which reduces input elements via the kernel, a transposed convolution broadcasts input elements via the kernel, producing an output that is larger than the input. If we feed X into a convolutional layer f to output Y = f(X) and create a transposed convolutional layer g with the same hyperparameters as f, except with the number of output channels set to the number of channels in X, then g(Y) will have the same shape as X. Since convolutions can be implemented as matrix multiplications, the transposed convolutional layer simply exchanges the forward propagation and backpropagation functions of the convolutional layer. Sometimes known as a deconvolution: the forward and backward passes are essentially reversed compared to a regular convolution layer. A normal convolution layer maps pixels -> features; a transposed/deconv layer maps features -> pixels. Source: https://machinelearningmastery.com/upsampling-and-transpose-convolution-layers-for-generative-adversarial-networks/
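
The shape-inversion property described above, checked in PyTorch (layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 16, 16)                # input X
f = nn.Conv2d(3, 8, kernel_size=3)           # regular conv: 16x16 -> 14x14
g = nn.ConvTranspose2d(8, 3, kernel_size=3)  # same hyperparameters, channels swapped

y = f(x)
print(y.shape)     # torch.Size([1, 8, 14, 14])
print(g(y).shape)  # torch.Size([1, 3, 16, 16]), same shape as X
```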

Adversarial examples

Inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence.
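
One standard way to construct such perturbations is the Fast Gradient Sign Method (FGSM); a minimal PyTorch sketch, assuming model and loss_fn are given:

```python
import torch

def fgsm_attack(model, loss_fn, x, y, eps):
    # perturb the input in the direction that most increases the loss,
    # bounded by eps per pixel (the "small but worst-case" perturbation)
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()
```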

saliency maps

Instead of using deconvnets, we can take the gradient of the class score we are trying to visualize with respect to the image itself (the input of the network), rather than the error gradient with respect to the model parameters. This gives the degree to which each pixel contributed to that class score; take the absolute value of the gradient because we care about degree, not direction. Helps us understand why the model gave the response it did. Another method to make saliency maps: the guided backpropagation algorithm (a combination of deconvnet and the gradient of the class score w.r.t. the input of the network). In short: the sensitivity of the score to individual pixel changes, using pre-softmax scores (gradient, then absolute value, then sum across channels).
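
A minimal PyTorch sketch of the gradient-based saliency map just described (the model is assumed to output pre-softmax class scores):

```python
import torch

def saliency_map(model, x, class_idx):
    # gradient of the pre-softmax class score w.r.t. the input image
    x = x.clone().detach().requires_grad_(True)
    scores = model(x)                # shape: (1, num_classes)
    scores[0, class_idx].backward()
    # absolute value (degree, not direction), then sum across channels
    return x.grad.abs().sum(dim=1)   # shape: (1, H, W)
```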

Guided Backprop

Works layer by layer (deconvolution is similar to backprop), going from details to more abstract representations.

Number of parameters for Pooling Layers

None (pooling layers have no learnable parameters).

Memory per CNN layer (KB)

NumFilters * HeightOut * WidthOut * BytesPerElement / 1024, where BytesPerElement = 4 for 32-bit floating point.

Memory per FC Layer (KB)

NumHiddenNodes * BytesPerElement / 1024, where BytesPerElement = 4 for 32-bit floating point.
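
Both memory formulas, sketched in Python:

```python
def conv_activation_memory_kb(num_filters, h_out, w_out, bytes_per_element=4):
    # memory to store one conv layer's output activations, in KB
    return num_filters * h_out * w_out * bytes_per_element / 1024

def fc_activation_memory_kb(num_hidden_nodes, bytes_per_element=4):
    # memory to store one FC layer's output activations, in KB
    return num_hidden_nodes * bytes_per_element / 1024

# e.g. VGG's first conv output: 64 filters at 224x224, 32-bit floats
print(conv_activation_memory_kb(64, 224, 224))  # 12544.0 KB
```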

Receptive fields

The Receptive Field (RF) is the size of the region in the input that produces a given feature. Basically, it is a measure of the association of an output feature (of any layer) with an input region (patch). When dealing with high-dimensional inputs such as images, it is impractical to connect neurons to all neurons in the previous volume. Instead, we connect each neuron to only a local region of the input volume. The spatial extent of this connectivity is a hyperparameter called the receptive field of the neuron (equivalently, this is the filter size). The extent of the connectivity along the depth axis is always equal to the depth of the input volume. It is important to emphasize this asymmetry in how we treat the spatial dimensions (width and height) and the depth dimension: the connections are local in 2D space (along width and height), but always full along the entire depth of the input volume. Easy-to-understand link: https://theaisummer.com/receptive-field/
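
The RF of a stack of conv/pool layers can be computed with a simple recurrence: each layer grows the RF by (kernel - 1) times the cumulative stride. A small sketch:

```python
def receptive_field(layers):
    # layers: list of (kernel_size, stride) for each conv/pool layer
    rf, jump = 1, 1            # RF and cumulative stride at the input
    for k, s in layers:
        rf += (k - 1) * jump   # each layer grows the RF by (k-1) * jump
        jump *= s
    return rf

# two 3x3 convs (stride 1) see a 5x5 input patch; a 2x2 pool (stride 2)
# followed by another 3x3 conv grows the RF further:
print(receptive_field([(3, 1), (3, 1)]))                  # 5
print(receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)]))  # 10
```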

Effectiveness of transfer learning under certain conditions

Transfer learning = reuse features learned on a very large dataset for a completely new task. Steps: (1) train on a very large dataset; (2) take your custom dataset and initialize the network with the weights trained in step 1, replacing the last fully connected layer (initialized randomly) since the classes in the new task will be different; (3) continue training on the new dataset. You can either retrain all weights ("fine-tune") or freeze (i.e. not update) the weights in certain layers; freezing reduces the number of parameters you need to learn. Typically you freeze the CNN layers, or the early layers, and learn the parameters of the FC layers. Performs very well on a very small amount of training data if it is similar to the original data; does not work very well if the target task's dataset is very different. If you have enough data in the target domain and it is different from the source, it is better to just train on the new data. (See the sketch below.)
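
A minimal PyTorch/torchvision sketch of these steps (assumes a recent torchvision; the class count is illustrative):

```python
import torch.nn as nn
from torchvision import models

# Steps 1-2: initialize with weights trained on a very large dataset (ImageNet)
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pretrained layers so their weights are not updated
for param in model.parameters():
    param.requires_grad = False

# Replace the last FC layer (randomly initialized) for the new task's classes
num_new_classes = 10  # illustrative
model.fc = nn.Linear(model.fc.in_features, num_new_classes)

# Step 3: train as usual; only the new layer's parameters are optimized
```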

SSD (Single-Shot Detector)

Uses a grid as anchors, with different scales and aspect ratios. Based on the VGG model up to the conv5_3 layer.

Forwards and backwards computation across a convolution layer (i.e. know whether the backward pass with respect to the weights or the input is a convolution or a cross-correlation).

Short answer: if the forward pass is implemented as cross-correlation (as in most frameworks), the backward pass with respect to the input is a convolution of the upstream gradient with the kernel (equivalently, a cross-correlation with the 180-degree-flipped kernel), and the backward pass with respect to the weights is a cross-correlation of the input with the upstream gradient. See: https://medium.com/@pavisj/convolutions-and-backpropagations-46026a8f5d2c and https://glassboxmedicine.com/2019/07/26/convolution-vs-cross-correlation/
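
This can be verified numerically with SciPy (shapes are illustrative):

```python
import numpy as np
from scipy.signal import correlate2d, convolve2d

x = np.random.randn(5, 5)   # input
w = np.random.randn(3, 3)   # kernel

# Forward pass of a "conv" layer is really a cross-correlation
y = correlate2d(x, w, mode="valid")   # 3x3 output

# Backward w.r.t. the input: a *full* convolution of the upstream
# gradient with the kernel (= cross-correlation with the flipped kernel)
dy = np.random.randn(*y.shape)        # upstream gradient dL/dy
dx = convolve2d(dy, w, mode="full")   # dL/dx, shape (5, 5)

# Backward w.r.t. the weights: cross-correlation of the input with dy
dw = correlate2d(x, dy, mode="valid") # dL/dw, shape (3, 3)
```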

Convolutional layers and how they work (forward/backward)

https://www.youtube.com/watch?v=Lakz2MoHy6o&t=1299s (Don't have a good short summary)

Fast R-CNN

Maps each ROI in the image to the corresponding region in the feature maps, reusing computation by finding regions in the feature maps rather than the raw image; feature extraction happens once per image. Issue: variable input size to the FC layers, solved with ROI Pooling.

Style Transfer

Measures the difference in style between the synthesized image and the style image as the sum, over layers, of the squared difference between the Gram matrices of the style image and the prediction, where the Gram matrix G abstracts the correlations between a layer's feature maps.
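
A short PyTorch sketch of the Gram matrix and the resulting style loss (the normalization constant varies across implementations):

```python
import torch

def gram_matrix(features):
    # features: (channels, height, width) activations from one layer
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.t()) / (c * h * w)  # channel-to-channel correlations

def style_loss(synth_feats, style_feats):
    # sum of squared Gram-matrix differences over the chosen layers
    return sum(((gram_matrix(a) - gram_matrix(b)) ** 2).sum()
               for a, b in zip(synth_feats, style_feats))
```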

Grad-CAM

A more versatile version of CAM that can produce visual explanations for any arbitrary CNN, even if the network contains a stack of fully connected layers. Let the gradients of any target concept score flow into the final convolutional layer, then compute an importance score based on those gradients and produce a coarse localization map highlighting the regions of the image that are important for predicting that concept. Answers: what regions of the image is the model looking at to make its prediction, and which individual regions have the highest class activation at a given layer of the CNN? Uses the direction/magnitude of gradients to determine which activations contribute most. Objective: inspect a given layer of the CNN and correlate it to the output. Task specific (if asked about a dog, dog pixels are more important).
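
A minimal sketch of the Grad-CAM computation, assuming the final conv layer's activations and the gradients of the target class score w.r.t. them have already been captured (e.g. via hooks):

```python
import torch
import torch.nn.functional as F

def grad_cam(activations, gradients):
    # activations: (C, H, W) feature maps from the final conv layer
    # gradients:   (C, H, W) gradient of the target class score w.r.t. them
    weights = gradients.mean(dim=(1, 2))                 # global-average-pool the grads
    cam = (weights[:, None, None] * activations).sum(0)  # weighted sum of maps
    return F.relu(cam)                                   # keep positive influence only
```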

YOLO (You Only Look Once)

Single-scale; faster than other detectors for the same input size. Customized architecture with fully connected layers at the end. Applies NMS (non-maximum suppression) before returning results.

Content Loss

The difference in content features between the synthesized image and the content image, measured via the squared loss function.

CAM = Class Activation Mapping

Uses a Global Average Pooling layer as the final layer to average the activations of each feature map, then runs the result through a softmax loss layer; highlights the important regions of the image by projecting the weights of the output layer back onto the convolutional feature maps. (See the sketch below.)
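
Sketched in PyTorch, the projection step is just a weighted sum of the final feature maps, using the output-layer weights for the class of interest:

```python
import torch

def cam(feature_maps, fc_weights, class_idx):
    # feature_maps: (C, H, W) from the last conv layer (before global average
    # pooling); fc_weights: (num_classes, C) from the final linear layer
    w = fc_weights[class_idx]                        # (C,)
    return (w[:, None, None] * feature_maps).sum(0)  # (H, W) localization map
```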

