CECS 456 Midterm

Linear Regression

Uses a straight line (the line of best fit) to predict a numerical value. Slope-intercept form: y = X*W + b, where y is the predicted value, X is the feature, W is the weight (slope), and b is the y-intercept (also written w0).
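
As a rough sketch of how the line of best fit can be found, the snippet below uses ordinary least squares for a single feature. The function name `fit_line` and the data points are made up for illustration; they are not part of the course material.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - w * mean_x
    return w, b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # made-up data lying exactly on y = 2x + 1
w, b = fit_line(xs, ys)
prediction = w * 5.0 + b  # predicted value for a new input x = 5
```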

What is bootstrap sampling?

A resampling procedure that uses data from one sample to generate a sampling distribution by repeatedly taking random samples from the known sample, with replacement.
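
A minimal sketch of the idea, assuming the statistic of interest is the mean. The helper `bootstrap_sample`, the seed, and the observations are hypothetical choices for illustration.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = [2, 4, 4, 5, 7, 9, 10, 12]  # made-up observations

boot_means = []
for _ in range(1000):
    resample = bootstrap_sample(sample, rng)
    boot_means.append(sum(resample) / len(resample))

# boot_means now approximates the sampling distribution of the mean.
```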

How does a random forest differ from BAGGing?

A random forest uses a modified tree learning algorithm that selects a random subset of the features at each candidate split in the learning process, whereas BAGGing considers every feature at each split.

What are the three layers of a CNN?

1. Convolutional Layer
2. Pooling Layer
3. Fully-connected Layer
Early layers focus on simple features. By the end of processing, the CNN recognizes larger elements, eventually identifying the object.

What is a single-layer perceptron model?

A representation of a linear separator in input space; or a straight line that separates inputs in an xy plane to determine the output. Think boolean functions.
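
To make "think boolean functions" concrete, here is a small sketch of a perceptron computing AND and OR. The weights and biases are hand-picked assumptions, not learned values.

```python
def perceptron(inputs, weights, bias):
    """Output 1 if the weighted sum plus bias is positive, else 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Hand-picked parameters: the line x1 + x2 = 1.5 separates AND's inputs,
# and the line x1 + x2 = 0.5 separates OR's inputs.
and_outputs = [perceptron(p, (1.0, 1.0), -1.5) for p in pairs]
or_outputs = [perceptron(p, (1.0, 1.0), -0.5) for p in pairs]
```

XOR cannot be handled this way because no single straight line separates its positive and negative inputs.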

What is ensemble learning?

Combines the predictions of several base estimators built with a given learning algorithm in order to improve generalizability/robustness over a single estimator. In other words, several ML models ("weak learners") are combined to make one optimal predictive model ("strong learner").

What is regression?

A form of supervised learning where the model predicts results within a continuous output. Predicted values are numerical. The goal is to map the inputs to a continuous function, like a parabola.

What is classification?

A form of supervised learning where the model specifies which of k categories some input belongs to. Model predicts results in a discrete output, or the model categorizes the inputs.

Convolutional Network/Convolutional Neural Network (CNN)

A kind of neural network for processing data with a grid-like topology. The network employs "convolution". It is most often used for classification and computer vision tasks. Examples of uses: time series data (a 1D grid of samples at regular time intervals), image data (a 2D grid of pixels).

What is information entropy?

A measure of the uncertainty of a random variable. A coin that is always heads has an entropy of 0. A fair coin has higher entropy than the heads coin. A coin with 99% chance heads should have entropy lower than a fair coin but slightly higher than all heads coin.
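
The three coins can be checked numerically with the standard Shannon entropy formula in bits. The function name `entropy` is an illustrative choice.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; outcomes with zero probability contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


always_heads = entropy([1.0, 0.0])  # no uncertainty at all
fair_coin = entropy([0.5, 0.5])     # maximum uncertainty for two outcomes
biased_coin = entropy([0.99, 0.01]) # small but non-zero uncertainty
```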

What is a multi-layer perceptron model?

A model with one or more hidden layers & non-linear activation functions, used to solve problems where the data is not linearly separable.

What is an activation function?

A non-linear function that introduces non-linearity into the network, allowing it to approximate complex functions. The default activation function in modern neural networks is ReLU: g(z) = max{0, z}
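
A small sketch of why the non-linearity matters: with ReLU in one hidden layer, a tiny network can compute XOR, which no single linear separator can. The weights below are hand-picked for illustration, not learned.

```python
def relu(z):
    """ReLU activation: g(z) = max{0, z}."""
    return max(0.0, z)


def xor_net(x1, x2):
    """Two hidden ReLU units followed by a linear output unit."""
    h1 = relu(x1 + x2)        # hidden unit 1
    h2 = relu(x1 + x2 - 1.0)  # hidden unit 2, active only when both inputs are 1
    return h1 - 2.0 * h2      # linear output layer


xor_outputs = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Without `relu`, the two layers would collapse into a single linear function of x1 and x2.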

What is polynomial regression?

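A type of regression where the relationship between the independent variable and the output is modeled as an nth-degree polynomial, e.g. y = w0 + w1*x + w2*x^2. (A model with 2 or more independent variables is multiple regression.)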

What is linear regression?

A type of regression where the output is modeled as a linear function of the input. With one independent variable (simple linear regression), think y = mx + b.

What is a feature?

An abstract representation of the data. Each feature appears as a column in a dataset.

What are the classification metrics?

Accuracy - how often the model makes a correct prediction: (TP + TN) / (TP + FP + TN + FN)
Precision - how many of the predicted positives are actually positive: TP / (TP + FP)
Sensitivity (recall) - how good the model is at finding the actual positives: TP / (TP + FN)
F1-Score - combines precision & recall by taking their harmonic mean: (2 x precision x recall) / (precision + recall)
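
The four formulas can be sketched directly from confusion-matrix counts. The counts below are made up for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = (2 * precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1


# Made-up counts: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives.
accuracy, precision, recall, f1 = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

This sketch assumes at least one predicted positive and one actual positive; otherwise precision or recall would divide by zero.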

What is a feedforward neural network?

An ML algorithm loosely inspired by neuroscience that is a composition of many different functions. Layers between the input & output layers are called hidden layers. The final layer is the output layer. The depth of a network is defined by the number of layers (including the output layer). The width of a network is the dimensionality of its hidden layers (the number of units per layer). A layer consists of many units (neurons), each representing a vector-to-scalar function.

What are the components of ML?

Datasets, Features, and Algorithms

What is unsupervised learning?

Approach to problems with little or no idea what the results should look like. Uses a dataset where we don't know the effects of the variables. The algorithm discovers patterns from unlabeled data & identifies hidden features. Applications: Clustering, Dimensionality Reduction, Association, & Anomaly Detection.

Guess the supervised learning type: Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.

Classification

What is a dataset?

Collection of samples of data used to train an ML algorithm. Can include numbers, images, texts, etc.

What is logistic regression?

Estimates the probability of an event occurring, such as voted or didn't vote, based on a given set of data points. Since the outcome is a probability, the dependent variable is bounded between 0 & 1. Can be used to solve both regression & classification problems, but is mainly used for binary classification. The predicted value lies strictly between 0 & 1.

What are the advantages of SVR?

Flexible, robust to outliers, the decision model can be easily updated, and it has a lower computational cost than other regression techniques.

Information gain equation

Gain(A) = B(p/(p + n)) - sum(k=1..d, ((pk + nk)/(p + n)) * B(pk/(pk + nk)))
where B(q) = -(q*log2(q) + (1 - q)*log2(1 - q))
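
A sketch of the equation in code, where p and n are the positive and negative example counts before the split and (pk, nk) are the counts in each of the d subsets. The function names and the example counts are illustrative assumptions.

```python
import math


def B(q):
    """Entropy (in bits) of a Boolean variable that is true with probability q."""
    if q in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def gain(p, n, subsets):
    """Information gain of an attribute.

    p, n    -- positive and negative example counts before the split
    subsets -- list of (pk, nk) counts, one pair per attribute value
    """
    remainder = sum((pk + nk) / (p + n) * B(pk / (pk + nk)) for pk, nk in subsets)
    return B(p / (p + n)) - remainder


# Made-up counts: attribute A gives two pure subsets and one mixed subset,
# attribute B leaves every subset split 50/50.
gain_a = gain(6, 6, [(0, 2), (4, 0), (2, 4)])
gain_b = gain(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)])
```

Attribute A has the higher gain, so it would be the better choice for the split.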

What is Bootstrap Aggregating (BAGGing)?

Given a standard training set D of size n, BAGGing generates k new training sets Di, each of size n' (typically n' = n), by sampling from D uniformly with replacement. In other words, it uses bootstrap sampling to generate a new sample set for each predictor in the ensemble.

How do you choose an attribute for a decision tree?

Ideally, pick an attribute that splits the examples into subsets that are all positive or all negative. In practice, pick the attribute with the highest information gain.

What is the difference between binary, multinomial, & ordinal logistic regression?

In binary logistic regression, the dependent variable is dichotomous. It is the most common classifier for binary classification. Ex. spam detection.
In multinomial logistic regression, the dependent variable has 3 or more possible outcomes, but these values have no specified order. Ex. types of food.
In ordinal logistic regression, the dependent variable has 3 or more possible outcomes in a defined order. Ex. grading scales from A to F or rating scales from 1 to 5.

What is semi-supervised learning?

Input data is a mix of labeled & unlabeled samples. The programmer has to keep the desired outcome in mind, but the model must find patterns to structure the data & make predictions itself.

What are components of SVR?

Kernel: function/algorithm used to map lower dimensional data into higher dimensional data.
Hyperplane: the line that helps us predict the target value. The middle line in the graph.
Boundary lines: decision boundaries of the hyperplane, used to create a margin between data points. The lines on either side of the hyperplane.
Support vectors: data points that are closest to the decision boundaries. Used to define the hyperplane.

What is a decision tree?

Non-parametric supervised learning algorithm that is hierarchical in structure like a tree graph. Represents a function that takes as input a vector of attribute values & returns a "decision" - a single output value. Used for both regression & classification.

Bayes' Law/Theorem

P(b | a) = P(a | b)P(b) / P(a)
P(b | a) - the conditional probability that b occurs given a has occurred
P(a | b) - the conditional probability of a occurring given b has occurred
P(b) - the probability of b
P(a) - the probability of a
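
A worked example may help here. In this sketch b is "has the condition" and a is "test is positive"; all three input probabilities are invented for illustration.

```python
def bayes_posterior(p_a_given_b, p_b, p_a):
    """Bayes' theorem: P(b | a) = P(a | b) * P(b) / P(a)."""
    return p_a_given_b * p_b / p_a


p_b = 0.01              # made-up prior: 1% have the condition
p_a_given_b = 0.90      # made-up: test is positive for 90% of those with it
p_a_given_not_b = 0.05  # made-up: test is positive for 5% of those without it

# Total probability of a positive test, P(a), over both cases.
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

posterior = bayes_posterior(p_a_given_b, p_b, p_a)  # P(b | a), about 0.154
```

Even with a fairly accurate test, the posterior stays low because the prior P(b) is small.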

Fully-Connected Layer

Performs the classification task based on features extracted by the previous layers.

What are the pros & cons of KNN?

Pros: Easy to implement, adapts easily to new data, & few hyperparameters (only k & distance metric) Cons: Doesn't scale well, doesn't perform well with high dimensional data, & prone to overfitting

Guess the supervised learning type: Given a picture of a person, we have to predict their age on the basis of the given picture.

Regression

Guess the supervised learning type: Given data about the size of houses in the real estate market, try to predict their price.

Regression

What is normalization?

Scales features into the range 0 to 1: x' = (x - min) / (max - min). The formula depends on the min & max values of the feature, hence we need to ensure those values are correct & not affected by outliers.

What is standardization?

Scales the feature so that its mean becomes 0 with a standard deviation of 1: z = (x - mean) / std. There is no predefined range into which the feature gets scaled.

Logistic regression equation

Sigmoid function: y-hat = 1 / (1 + e^(-z)), where z = b + sum(wn*xn)
b - bias (y-intercept), moves the curve left & right
w - the model's learned weights (slope), values that control the behavior of the system
x - features (known inputs)
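
The equation can be sketched in a few lines. The weights, bias, and feature values below are made-up numbers, not fitted parameters.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """y-hat = sigmoid(b + sum(wn * xn))."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


weights = [0.8, -0.4]  # made-up learned weights
bias = -0.2            # made-up bias
y_hat = predict_proba([2.0, 1.0], weights, bias)  # z = -0.2 + 1.6 - 0.4 = 1.0
```

A common convention for binary classification is to predict the positive class when y-hat is at least 0.5, which corresponds to z >= 0.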

What are the types of feature scaling?

Standardization (Z-score scaling) & normalization (min-max scaling)
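
Both scaling types can be sketched with the standard library. The function names and the sample values are illustrative, and the sketch assumes the feature is not constant (otherwise it would divide by zero).

```python
import statistics


def min_max_normalize(values):
    """Normalization: rescale values into the range 0 to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: rescale values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


feature = [10.0, 20.0, 30.0, 40.0, 50.0]  # made-up feature column
normalized = min_max_normalize(feature)
standardized = standardize(feature)
```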

What is AI?

The effort to automate intellectual tasks normally performed by humans. AI encompasses ML & DL but also includes approaches that don't involve learning.

What is ML?

The field of study that gives computers the ability to learn without being explicitly programmed.

What are the regions of a confusion matrix?

True positive, true negative, false positive, & false negative

What is reinforcement learning?

Type of learning technique that enables an agent to learn in an interactive environment by trial & error using feedback from its own actions & experiences. Uses a reward system.

What is support vector regression (SVR)?

Type of regression that uses Support Vector Machine (SVM) to solve regression problems. Fits as many data points as possible inside a margin of width epsilon around the prediction (the epsilon-tube), so that errors smaller than the threshold epsilon are ignored. The goal is to approximate the best value within that margin.

What are the different types of learning?

Supervised learning, unsupervised learning, semi-supervised learning, & reinforcement learning.

What is supervised learning?

Uses a dataset where we already know what the correct output should be. We have an idea what the relationship between the input & output should be. Trains on labeled data (input & output variables given). The system can predict future outcomes based on past data. Applications: Classification & Regression

Pooling Layer

Uses a pooling function to 'shrink' the output of the network at each location, replacing it with a summary statistic (such as the max or average) of the nearby outputs.
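
A minimal sketch of one common choice, max pooling over non-overlapping 2x2 windows. The input grid is made up, and the window size and stride are assumptions for illustration.

```python
def max_pool_2x2(grid):
    """Max pooling with a 2x2 window and stride 2 over a 2D list of numbers."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            window = [grid[r][c], grid[r][c + 1],
                      grid[r + 1][c], grid[r + 1][c + 1]]
            pooled_row.append(max(window))  # summary statistic of the window
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 4],
    [1, 0, 3, 5],
]
pooled = max_pool_2x2(feature_map)  # 4x4 input shrinks to 2x2
```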

Convolutional Layer components

f - input
g - kernel/filter (feature detector)
f*g - feature map/activation map
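
To illustrate f*g, here is a sketch of a 1D discrete convolution without padding. The signal and kernel are made up; note that many deep learning libraries actually compute cross-correlation (no kernel flip) under the name convolution.

```python
def convolve_valid(f, g):
    """1D discrete convolution of input f with kernel g, without padding."""
    k = len(g)
    flipped = g[::-1]  # true convolution flips the kernel before sliding it
    return [
        sum(f[i + j] * flipped[j] for j in range(k))
        for i in range(len(f) - k + 1)
    ]


signal = [0, 0, 1, 1, 1, 0, 0]  # f: made-up input with a rising and a falling edge
kernel = [1, -1]                # g: simple difference filter (edge detector)
feature_map = convolve_valid(signal, kernel)  # f*g
```

The feature map is non-zero only where the input changes, which is what a feature detector for edges should report.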

KNN (lazy learning algorithm, memory-based)

K-nearest neighbors: a non-parametric, distance-based algorithm. Assumes that data points near each other belong to the same category/class, and classifies a new data point by majority vote among its k nearest neighbors. Choose an odd number for k to avoid ties.
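
A bare-bones sketch of the majority vote, assuming Euclidean distance as the metric. The function name `knn_predict` and the training points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify query by majority vote among its k nearest training points.

    train -- list of (point, label) pairs, where point is a tuple of numbers
    """
    # Lazy learning: no training step, just sort the stored points by distance.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((5.0, 5.0), "B"), ((6.0, 5.5), "B"), ((5.5, 6.5), "B"),
]
label_near_a = knn_predict(train, (1.5, 1.5), k=3)
label_near_b = knn_predict(train, (5.5, 5.0), k=3)
```

Because every prediction scans all stored points, this also shows why KNN does not scale well to large datasets.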

Naive Bayes Classifier

Predicts the probability of a certain outcome based on prior occurrences of related events. Used in supervised learning. Assumes every feature is independent of the others given the class.
P(b | a) - Posterior Probability
P(a) - Evidence
P(b) - Prior Probability
P(a | b) - Likelihood

