Machine Learning Algorithms - Supervised Learning
What are the two dilemmas in model fitting?
Overfitting and underfitting.
What is overfitting?
Overfitting (a low-bias, high-variance model) occurs when a model maps to the training set too closely. The model essentially memorizes the training data and is unable to generalize to unseen data.
What are the applications of linear regression?
Prediction of employee salary based on years of experience.
What are the two types of data?
Quantitative and qualitative. Quantitative data has meaning as a measurement. Qualitative data has no mathematical meaning and is instead categorical.
What are the two types of supervised learning?
Regression and classification
What is regression?
Regression is a type of supervised learning that estimates continuous values (real valued output)
How can we remedy overfitting?
Regularization.
What is C in SVR?
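C is the regularization parameter. The strength of the regularization is inversely proportional to C, so a smaller C gives a more strongly regularized model.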
What are the different forms of regression performance measurement?
Mean absolute error (MAE), mean squared error (MSE) and root mean squared error (RMSE).
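As a minimal sketch of how these three measures relate, the following computes each one from a list of actual and predicted values. The function names and the sample numbers are made up for illustration.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference; penalizes large errors more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))

# Illustrative values only (not from any real dataset).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
print(mae, mse, rmse)
```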
What are the three types of data sources?
Structured, unstructured and semi-structured.
What is regression used for?
To investigate the relationship between variables
What is a common file format to save the data in?
CSV. Each row in a csv is an observation and each value in a row delimited by a comma is an attribute.
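A short sketch of reading such a file with Python's standard csv module. The column names and values are hypothetical, and the text is read from an in-memory string so the example is self-contained; a real file would be opened with open(path, newline="").

```python
import csv
import io

# Hypothetical CSV content: one observation per row, one attribute per column.
raw = "years_experience,salary\n1,40000\n3,52000\n5,65000\n"

reader = csv.DictReader(io.StringIO(raw))
rows = [
    {
        "years_experience": float(record["years_experience"]),
        "salary": float(record["salary"]),
    }
    for record in reader
]

print(len(rows))   # number of observations
print(rows[0])     # first observation as a dict of attributes
```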
What is deep learning?
Deep Learning involves taking large volumes of structured or unstructured data and using complex algorithms to train neural networks. It performs complex operations to extract hidden patterns and features (for instance, distinguishing the image of a cat from that of a dog).
What is Gamma in SVR?
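It determines the curvature of the decision boundary. A higher gamma gives each training point a more local influence, which produces a more curved fit.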
What are the 3 different techniques in regularization?
Elastic net, lasso and ridge.
What is regularization?
It is a form of regression that constrains (regularizes) the coefficient estimates by shrinking them towards zero. This technique discourages learning a more complex model, which reduces the risk of overfitting.
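The shrinking effect can be sketched for the simplest case: ridge regression with one feature and no intercept, where the penalized least-squares slope has the closed form sum(x*y) / (sum(x*x) + lam). The name lam for the penalty strength and the sample data are assumptions made for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Slope minimizing sum((y - w*x)^2) + lam * w^2 for a single feature."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Illustrative data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# A larger penalty pulls the coefficient further towards zero.
slopes = {lam: ridge_slope(xs, ys, lam) for lam in (0.0, 1.0, 10.0, 100.0)}
for lam, slope in slopes.items():
    print(lam, round(slope, 4))
```

With lam = 0 the result is the ordinary least-squares slope; as lam grows, the slope shrinks but never reaches exactly zero, which is typical of the ridge penalty.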
What is support vector regression?
It is a form of regression where we form two lines on either side of a given line, and the data points that fall between these lines are disregarded from the error point of view. The two surrounding lines are called the decision boundaries, and the best-fit line between them is the hyperplane.
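The idea that points inside the two boundary lines contribute no error is commonly expressed as an epsilon-insensitive loss. The sketch below assumes the half-width of the band is called epsilon, a name not used in these notes.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the band of half-width epsilon, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# A prediction inside the band costs nothing.
inside = epsilon_insensitive_loss(1.0, 1.05, epsilon=0.1)

# A prediction outside the band is charged only for the excess distance.
outside = epsilon_insensitive_loss(1.0, 1.5, epsilon=0.1)

print(inside, outside)
```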
What is the kernel in SVR?
It is a function that takes the data as input and transforms it into the form required for processing. The most common kernels are rbf (Radial Basis Function), poly (polynomial) and sigmoid.
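As an illustration, the RBF kernel is usually written as exp(-gamma * ||x - z||^2). The sketch below is a plain-Python version of that formula, not any library's implementation; it also shows how a larger gamma makes the similarity between two points fall off faster.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Similarity between two points: 1.0 when identical, near 0 when far apart."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0])
low_gamma = rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=0.5)
high_gamma = rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=5.0)

print(same_point, low_gamma, high_gamma)
```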
What is linear regression?
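It is a linear modelling algorithm that finds the relationship between the independent variables X and the dependent variable Y. With a single independent variable, the model is y = mx + b, where m is the slope and b is the intercept.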
What is a regression model?
It is a model that gives a target prediction based on independent variables.
What is polynomial regression?
It is a regression algorithm in which the relationship between the independent and dependent variables is modeled as an nth-degree polynomial.
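One way to sketch this without external libraries is to build the least-squares normal equations for the polynomial coefficients and solve them by Gaussian elimination. The helper names and the sample data are assumptions for illustration; in practice a numerical library would normally be used.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] of c0 + c1*x + ..."""
    size = degree + 1
    # Normal equations (V^T V) c = V^T y, where V[i][j] = xs[i] ** j.
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(a, b)

# Illustrative data generated from y = 1 + 2x + 3x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]

coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])
```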
Why is deep learning called as such?
It is called deep learning as there are multiple hidden layers between the input and output layer.
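To make the idea of stacked hidden layers concrete, here is a sketch of a forward pass through a tiny network with two hidden layers. The weights, the layer sizes and the choice of ReLU as the activation are all made up for illustration; a real network would learn its weights from data.

```python
def relu(values):
    """Activation: keep positive values, replace negatives with zero."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer: each row of weights feeds one output node."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + bias
        for row, bias in zip(weights, biases)
    ]

# Hypothetical fixed weights: 2 inputs -> 2 hidden -> 2 hidden -> 1 output.
w1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
w2, b2 = [[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.0]
w3, b3 = [[2.0, -1.0]], [0.5]

inputs = [2.0, 1.0]
hidden1 = relu(dense(inputs, w1, b1))   # first hidden layer
hidden2 = relu(dense(hidden1, w2, b2))  # second hidden layer
output = dense(hidden2, w3, b3)         # output layer (no activation)

print(hidden1, hidden2, output)
```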
What is unstructured data?
It is data in its rawest form. To get value out of it, we need to extract structured features from the data and then abstract meaning from them. Examples include text, video, sound and pictures.
What is structured data?
It is tabular data which is very well defined. There are clear key-value pairs and the attributes are well-defined and known.
What is the r-squared value?
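It is the coefficient of determination, denoted R^2. In simple linear regression it equals the square of the correlation r. It measures the proportion of variation in the dependent variable that can be attributed to the independent variables. For a least-squares linear fit with an intercept, evaluated on the data it was fitted to, this value is between 0 and 1 inclusive.
How does SVR perform against outliers?
It is fairly robust to outliers: points inside the decision boundaries contribute no error, and the error of points outside them grows linearly rather than quadratically with their distance.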
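A minimal sketch of the calculation, using the common definition R^2 = 1 - SS_res / SS_tot (residual sum of squares over total sum of squares). The function name and the numbers are illustrative.

```python
def r_squared(y_true, y_pred):
    """Proportion of the variation in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]

# Predictions that match every point explain all of the variation.
perfect = r_squared(y_true, [1.0, 2.0, 3.0, 4.0])

# Always predicting the mean explains none of it.
mean_only = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])

print(perfect, mean_only)
```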
What does an r-squared value of 1 mean?
It means that all variation in the dependent variable (for example, the movements of a security) is completely explained by the independent variable (for example, the movements of an index). In the context of linear regression, this means that all data points lie on the fitted line.
What does a negative r-squared value mean?
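It means that the model fits the data worse than a horizontal line at the mean of the dependent variable. This can happen when the model is evaluated on data it was not fitted to, or when it is fitted without an intercept. A negative association between the variables is indicated by a negative correlation r, not by a negative R^2.
What does variance refer to?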
It refers to a statistical representation of the spread of numbers in a dataset.
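As a quick illustration, Python's standard statistics module provides both the population variance (divide by n) and the sample variance (divide by n - 1). The data values are made up.

```python
import statistics

# Illustrative data with mean 5.0.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Average squared deviation from the mean, treating data as the whole population.
population_variance = statistics.pvariance(data)

# Same sum of squared deviations, divided by n - 1 instead of n.
sample_variance = statistics.variance(data)

print(population_variance, sample_variance)
```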
How does deep learning work?
It works through algorithms that learn from data by updating the weights assigned to feature nodes, testing how relevant specific features are in determining the general type of item.
What are the algorithms in supervised learning regression?
Linear regression, polynomial regression, support vector regression and the MLP (multilayer perceptron) feedforward artificial neural network.
What is semi-structured data?
Semi-structured data refers to data that has a consistent format but is not strict. It is not necessarily tabular and parts of the data may be incomplete or differing types. Examples of these are JSON and XML.
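A small sketch of what this looks like in practice: the JSON records below (entirely hypothetical) share a loose format, but some fields are missing or have a different type, so the code has to handle each case when turning them into uniform rows.

```python
import json

# Hypothetical records: consistent overall shape, but not strict.
raw = """[
  {"name": "Ada", "salary": 52000, "skills": ["python", "sql"]},
  {"name": "Lin", "years_experience": 3},
  {"name": "Sam", "salary": "unknown"}
]"""

records = json.loads(raw)

rows = []
for record in records:
    salary = record.get("salary")  # may be absent or not a number
    rows.append({
        "name": record["name"],
        "salary": float(salary) if isinstance(salary, (int, float)) else None,
    })

print(rows)
```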
What is supervised learning?
Supervised learning is the machine learning task of inferring a function from labelled training data.
What is the linear regression approach?
The approach of linear regression is to find the best-fit line by placing a line such that the total squared distance between the line and the data points is minimized.
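For a single independent variable, the least-squares line has a well-known closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below applies it to made-up salary data; the function name and the numbers are illustrative.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: years of experience and salary.
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [35000.0, 40000.0, 45000.0, 50000.0, 55000.0]

m, b = fit_line(years, salary)
predicted = m * 6.0 + b  # prediction for 6 years of experience
print(m, b, predicted)
```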
What is the goal of a neural net?
The goal is to arrive at the point of least error as fast as possible. The training loop is: guess, measure the error, update the weights, and repeat with incremental adjustments to the coefficients.
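The guess, measure, update cycle can be sketched with gradient descent on a single weight. This is a deliberately tiny stand-in for a neural net (one weight, no hidden layers); the data, learning rate and step count are assumptions chosen for illustration.

```python
# Illustrative data lying on y = 2x, so the ideal weight is 2.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0               # initial guess for the weight
learning_rate = 0.01  # size of each incremental adjustment
errors = []

for step in range(200):
    # Guess: predict with the current weight.
    preds = [w * x for x in xs]
    # Measure error: mean squared error of the guesses.
    mse = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    errors.append(mse)
    # Update weight: step against the gradient of the error.
    gradient = 2 * sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    w -= learning_rate * gradient

print(round(w, 6), errors[0], errors[-1])
```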
What are the important parameters in Support Vector Regression?
The important parameters are kernel, C and gamma.
What does it mean when a model performs exceptionally well on the training set but not on the testing set?
The model is overfitted to the training set and as such is unable to make accurate predictions for data it has not seen before.
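The gap between training and testing performance can be sketched with a model that simply memorizes its training data. Here a nearest-neighbour lookup stands in for an overfitted model; this choice, the helper names and the noisy data are all assumptions for illustration.

```python
def memorizer_predict(train_x, train_y, x):
    """Return the stored target of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up training data: roughly y = 2x with some noise.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.3, 3.8, 6.4, 7.7]

# Unseen test data following the underlying trend y = 2x.
test_x = [1.5, 2.5, 3.5]
test_y = [3.0, 5.0, 7.0]

train_mse = mse(train_y, [memorizer_predict(train_x, train_y, x) for x in train_x])
test_mse = mse(test_y, [memorizer_predict(train_x, train_y, x) for x in test_x])

# The memorizer reproduces the training set exactly but does worse on new data.
print(train_mse, test_mse)
```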
What is underfitting?
Underfitting, or high bias, is when the form of our hypothesis function h maps poorly to the trend of the data. The model is unable to capture the relationship between the input examples and the target values.
What is unsupervised learning?
Unsupervised learning is the machine learning task of inferring a function to describe hidden structures in unlabeled data (for example, clustering).
When does bias occur?
When the algorithm has limited flexibility to learn the true signal from a dataset.
What is correlation denoted by in linear regression?
r
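As a sketch, r here is taken to be the Pearson correlation coefficient, which can be computed directly from its definition: the covariance of x and y divided by the product of their standard deviations. The function name and data are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: +1 for a perfect increasing line, -1 for decreasing."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
increasing = pearson_r(xs, [2.0, 4.0, 6.0, 8.0, 10.0])
decreasing = pearson_r(xs, [10.0, 8.0, 6.0, 4.0, 2.0])

print(increasing, decreasing)
```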
What are the types of machine learning?
Supervised, unsupervised and reinforcement learning.