Machine Learning


What is regression?

A process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. A typical problem involves finding a model that takes an input and produces a numerical output. In regression, we are predicting a continuous variable.

Given an inconsistent system Ax = b, derive the least squares solution (i.e., an x₀ such that |Ax₀ - b| is minimized)

An x₀ that minimizes |Ax₀ - b| is a solution to the normal equations A^TAx₀ = A^Tb (Linear Algebra and Its Applications, pg. 363)
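The normal equations can be checked on a small inconsistent system. The sketch below is a minimal pure-Python illustration, not a general solver: the helper names (`transpose`, `matmul`, `solve_2x2`) and the example matrix are assumptions chosen for the demonstration, and the 2x2 solve only works because A has two columns.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve_2x2(m, v):
    """Solve the 2x2 system m @ x = v with Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("A^T A is singular: the columns of A are linearly dependent")
    return [(v[0] * d - b * v[1]) / det, (a * v[1] - c * v[0]) / det]


# An inconsistent system: three equations, two unknowns (illustrative values).
A = [[4, 0], [0, 2], [1, 1]]
b = [[2], [0], [11]]

At = transpose(A)
AtA = matmul(At, A)                       # left side of the normal equations
Atb = [row[0] for row in matmul(At, b)]   # right side of the normal equations
x0 = solve_2x2(AtA, Atb)                  # least squares solution
print(x0)
```

For larger systems a numerical library routine would normally replace the hand-written 2x2 solve.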

What are the five methods of building models (within multiple linear regression)?

1) All-in (use all your variables) 2) Backward Elimination 3) Forward Selection 4) Bidirectional Elimination 5) Score Comparison

What are some applications of machine learning?

1) Facebook photo recognition 2) Voice recognition 3) Recommendation systems

Logistic Regression Algorithm

A categorical regression model used to predict a binary dependent variable. The logistic regression algorithm is a supervised learning algorithm. We are typically https://en.wikipedia.org/wiki/Logistic_regression#Model_fitting

Formal definition of Machine learning

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E (Stanford lecture 1)

What is Simple Linear Regression?

A machine learning algorithm in which we use a linear equation to model the relationship between a single independent variable and a single dependent variable. Does not require feature scaling
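As a rough sketch of what fitting such a linear equation involves, the example below uses the ordinary least squares closed form for one independent variable. The function name `fit_simple_linear_regression` and the sample data are assumptions made for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data generated from y = 2x + 1 with no noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)
```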

What is Multiple Linear Regression?

A machine learning algorithm in which we use a linear equation to model the relationship between multiple independent variables and a single dependent variable. Does not require feature scaling

What is reinforcement learning?

A paradigm of learning by trial-and-error, solely from rewards or punishments

What are interval scales? Give examples

Interval scales are numeric scales in which we know not only the order, but also the exact differences between the values. A good example of an interval scale is Celsius temperature or time.

What is the K-nearest neighbors (KNN) algorithm?

It's a classification algorithm (it can also be used for regression) that uses the K nearest neighbors to classify a new data object. We are given a collection of labeled objects in feature space and are then given an unlabeled point. We then classify the new object based on a majority vote among its K nearest neighbors. https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm#Algorithm
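A minimal sketch of KNN classification, assuming Euclidean distance and a simple vote among the K nearest labeled points. The function name `knn_classify` and the toy data are hypothetical, chosen only to illustrate the idea.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Classify `query` by a vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Two well-separated clusters in a 2D feature space (illustrative data).
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_classify(points, labels, (2, 2)))
print(knn_classify(points, labels, (6, 5)))
```

This brute-force version computes the distance to every training point, which is fine for small datasets.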

What are ordinal scales? Give examples

Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc. With ordinal scales, it is the order of the values that is important, but the differences between them are not really known. http://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/

What are nominal scales? Give examples

Scales used for labeling variables, without any quantitative value. Think of labels. http://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/

What is the Vapnik-Chervonenkis (VC) dimension of a hypothesis class H? (VC(H)) (pg. 27)

The VC dimension of a hypothesis class H is the maximum number of points that can be arranged so that H shatters them (pg. 27)

What is the definition of Machine Learning?

The field of study that gives computers the ability to learn without being explicitly programmed. (Stanford Lecture 1)

What are Dummy Variables? What is the dummy Variable trap?

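Variables we create to represent categorical variables. When using dummy variables, always omit one of them: if every category gets its own dummy variable, the dummies always sum to 1, so together with the model's constant term they are perfectly multicollinear and the regression coefficients cannot be determined uniquely. This is the dummy variable trap.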
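A small sketch of dummy encoding that drops one category, so that the remaining columns are not perfectly collinear with a model's constant term. The helper `dummy_encode` and the city data are assumptions for illustration; in pandas, `pandas.get_dummies(..., drop_first=True)` serves a similar purpose.

```python
def dummy_encode(values, drop_first=True):
    """Return (column_names, rows) of 0/1 dummy variables for `values`."""
    categories = sorted(set(values))
    # Dropping the first category makes it the implicit baseline.
    kept = categories[1:] if drop_first else categories
    rows = [[1 if value == category else 0 for category in kept] for value in values]
    return kept, rows


cities = ["Paris", "Berlin", "Madrid", "Berlin"]
columns, rows = dummy_encode(cities)
print(columns)
for city, row in zip(cities, rows):
    print(city, row)
```

Here "Berlin" is the omitted category, so a row of all zeros stands for Berlin.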

What is data preprocessing? What are the details?

What we do to prepare our dataset so that we can run it through an algorithm. Details include importing our data, filling missing data, creating training and testing arrays, feature scaling, etc.

Explain what standard deviation and variance are

http://www.mathsisfun.com/data/standard-deviation.html
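As a short worked example: the variance is the average squared deviation from the mean, and the standard deviation is its square root. The sketch below uses the population versions from Python's `statistics` module on made-up data; the sample versions (`statistics.variance`, `statistics.stdev`) divide by n - 1 instead of n.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

mean = statistics.fmean(data)
# Population variance: mean of squared deviations from the mean.
variance_by_hand = sum((x - mean) ** 2 for x in data) / len(data)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)  # square root of the population variance

print(mean, variance, std_dev)
```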

What are the four types of data measurement scales?

nominal, ordinal, interval, and ratio http://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/

What does it mean for a hypothesis class H to shatter N points?

pg. 27

What are ratio scales? Give examples

Ratio scales tell us about order and the exact differences between units, and they have an absolute zero. Good examples of ratio variables include height and weight.

Which library do we use to help manage datasets?

pandas

Let A be an mxn matrix. Given the columns of A are linearly independent, state two logically equivalent statements regarding the least squares solution to the equation Ax = b

(pg. 365)

What assumptions are used in Linear Regression?

1) Linearity 2) Homoscedasticity 3) Multivariate normality 4) Independence of errors 5) Lack of multicollinearity

List of common classification algorithms

1) Logistic Regression 2) K-Nearest Neighbors 3) Support Vector Machine (SVM) 4) Kernel SVM 5) Naive Bayes 6) Decision Tree Classification 7) Random Forest Classification

What are support vector machines? Aka SVM

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. http://docs.opencv.org/2.4/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html

What is a random variable?

A random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon. There are two types of random variables, discrete and continuous.

What is class learning?

Class learning is finding a description that is shared by all positive examples but none of the negative examples. (Introduction to Machine Learning pg. 16)

What are the two types of random variables? Differentiate between them

Discrete and continuous. Discrete random variables take on distinct, countable values. Examples of discrete random variables include the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor's surgery, the number of defective light bulbs in a box of ten, etc. A continuous random variable is one that can take any value in an interval, so it has uncountably many possible values. Continuous random variables are usually measurements. Examples include height, weight, the amount of sugar in an orange, and the time required to run a mile.

What is Gradient Descent?

Gradient descent is a first-order iterative optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point (https://en.wikipedia.org/wiki/Gradient_descent)
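The update rule can be sketched in a few lines. This is a minimal one-dimensional illustration with a fixed step size; the function name `gradient_descent`, the learning rate, and the example function f(x) = (x - 3)^2 are assumptions chosen for the demonstration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        # Step proportional to the negative of the gradient at the current point.
        x = x - learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)
```

With a learning rate that is too large the iterates can diverge, so the step size usually needs tuning for the problem at hand.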

What is overfitting? How is it avoided?

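In overfitting, a statistical model describes random error or noise instead of the underlying relationship. Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. It can be reduced by using a simpler model, gathering more training data, applying regularization, or checking performance with cross-validation.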

What is feature scaling? Why must it be done?

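Feature scaling puts all features on a comparable scale, for example by standardization or normalization. Since many algorithms are based on Euclidean distance, if our features don't all use the same scale, features with greater differences in values are more heavily weighted than they should be.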
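One common way to put features on the same scale is standardization: subtract the mean and divide by the standard deviation. The sketch below assumes that approach; the helper name `standardize` and the age and salary columns are hypothetical.

```python
import statistics


def standardize(column):
    """Rescale a feature column to mean 0 and standard deviation 1."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        raise ValueError("constant column: standard deviation is zero")
    return [(value - mean) / std for value in column]


# Two features on very different scales (illustrative values).
ages = [20, 30, 40]
salaries = [30000, 50000, 70000]

scaled_ages = standardize(ages)
scaled_salaries = standardize(salaries)
print(scaled_ages)
print(scaled_salaries)
```

After scaling, both columns span the same range, so neither dominates a Euclidean distance computation.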

What is supervised learning?

Supervised learning is the machine learning task of inferring a function from labeled training data. Both regression and classification are supervised learning problems (Wikipedia)

What is data mining?

The application of machine learning algorithms to large amounts of data (Introduction to Machine Learning pg. 16)

logistic function

The function used in logistic regression, σ(t) = 1 / (1 + e^(-t)), which maps any real number to a value between 0 and 1 https://en.wikipedia.org/wiki/Logistic_regression#Model_fitting
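A short sketch of the standard logistic (sigmoid) function, 1 / (1 + e^(-z)). The two-branch form below is one way to avoid floating-point overflow for large negative inputs; the function name `logistic` is an illustrative choice.

```python
import math


def logistic(z):
    """Standard logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form e^z / (1 + e^z)
    # so that math.exp never receives a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)


print(logistic(0))
print(logistic(2.0), logistic(-2.0))
```

The outputs for z and -z always sum to 1, which is why the result can be read as the probability of one of two classes.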

What is unsupervised learning?

Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses.

What is a classification problem?

A problem in which our outputs take discrete (rather than continuous) values.

