Linear Regression
What is gradient descent used to solve for?
- Gradient descent is used to minimize the cost function
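A minimal sketch of the idea, not from the card: gradient descent repeatedly adjusts guessed coefficients in the direction that lowers the cost. The data, learning rate `lr`, and names `beta0`/`beta1` are all illustrative assumptions.

```python
import numpy as np

# Illustrative data: y = 3x + 5 plus noise (assumed, not from the card).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 100)
y = 3.0 * X + 5.0 + rng.normal(0.0, 1.0, 100)

beta0, beta1 = 0.0, 0.0  # initial guesses for the coefficients
lr = 0.01                # learning rate (a hyperparameter)
m = len(X)

for _ in range(2000):
    y_hat = beta0 + beta1 * X
    error = y_hat - y
    # Gradients of the cost J = (1/2m) * sum(error^2)
    beta0 -= lr * error.sum() / m
    beta1 -= lr * (error * X).sum() / m

print(beta0, beta1)  # should move toward 5 and 3
```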
What is the squared error?
- The residual error squared: (yᵢ − ŷᵢ)²
Why do we "guess" values for the beta coefficients?
Because solving the derivative-equals-zero equations analytically (the normal equation) is too computationally intensive for many features, so we start with a guess and iteratively refine it with gradient descent
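For contrast, a sketch of the analytic route the card alludes to, in standard least-squares notation (assumed, not from the card): the normal equation solves for the coefficients directly, but inverting the n×n matrix X^T X costs roughly O(n^3) in the number of features, which is why iterative guessing scales better.

```latex
% Normal equation: the closed-form least-squares solution.
% Inverting X^T X is ~O(n^3) in the number of features,
% which motivates gradient descent for large n.
\hat{\beta} = \left( X^{\top} X \right)^{-1} X^{\top} y
```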
What is a residual plot?
- A plot of residuals against predicted values - A valid model shows no clear line or curve (the residuals look random) - Scalable to any number of features, since it only needs predictions and residuals
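A hypothetical residual plot in Python, assuming matplotlib and a least-squares fit via numpy.polyfit; the data are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data for illustration.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, 100)
y = 2.0 * X + 1.0 + rng.normal(0.0, 1.0, 100)

# Degree-1 least-squares fit; polyfit returns (slope, intercept).
beta1, beta0 = np.polyfit(X, y, 1)
residuals = y - (beta0 + beta1 * X)

plt.scatter(beta0 + beta1 * X, residuals)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.title("Residual plot: look for no clear line or curve")
plt.show()
```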
What is the equation for a cost function (minimization)?
- J(β) = (1/2m) Σ (ŷᵢ − yᵢ)², summed over all m rows - The 2 in the denominator is there because when you minimize something, you take the derivative and set it to zero, and differentiating the square cancels the 2 - Minimizing a function is the same as minimizing half that value
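A worked version of why the 2 cancels, assuming the standard least-squares cost over m rows:

```latex
% The 1/2 factor cancels the 2 produced by differentiating the square.
J(\beta) = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2
\qquad
\frac{\partial J}{\partial \beta_j}
  = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)\, x_{ij}
```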
What are considered "good/valid" residual errors?
- Residual errors are random (show no pattern) - Residual errors follow close to a normal distribution centered at zero (the mean of the residuals should be close to zero)
How do you test if data is normally distributed?
A normal probability plot: if the points fall roughly on a straight line, the data are approximately normal
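A quick sketch of such a plot, assuming SciPy's stats.probplot and synthetic residuals standing in for a real model's:

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Synthetic residuals standing in for a real model's (assumed).
rng = np.random.default_rng(2)
residuals = rng.normal(0.0, 1.0, 200)

# probplot compares the ordered data to theoretical normal quantiles;
# points falling on the straight line suggest approximate normality.
stats.probplot(residuals, dist="norm", plot=plt)
plt.title("Normal probability plot of residuals")
plt.show()
```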
What is the difference between a regression task and a classification task?
Regression predicts continuous values; classification predicts categorical labels
What is a cost function?
How we define the error we are trying to minimize and how it relates to the beta coefficients
Why is the root mean square error (RMSE) used instead of MSE?
Taking the square root puts the error back into the original units of the target variable, which makes it easier to interpret
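A toy numeric illustration (values assumed): if the target is in dollars, MSE comes out in dollars squared, while RMSE is back in dollars.

```python
import numpy as np

# Made-up predictions for a target measured in dollars.
y_true = np.array([100.0, 150.0, 200.0])
y_pred = np.array([110.0, 140.0, 195.0])

mse = np.mean((y_true - y_pred) ** 2)  # units: dollars^2
rmse = np.sqrt(mse)                    # units: dollars

print(f"MSE:  {mse:.2f} (squared units)")
print(f"RMSE: {rmse:.2f} (original units)")
```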
What is simple linear regression?
Linear regression with only 1 x variable
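A minimal fit with one x variable, assuming numpy.polyfit as the least-squares solver and made-up data:

```python
import numpy as np

# Made-up data from a known line y = 4x + 2 plus noise.
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 50)
y = 4.0 * x + 2.0 + rng.normal(0.0, 1.0, 50)

beta1, beta0 = np.polyfit(x, y, 1)  # degree-1 least-squares fit
print(f"y ≈ {beta0:.2f} + {beta1:.2f} * x")
```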
What are hyperparameters?
Parameters or constants in the model itself that are set before training (not learned from the data) and that we can tune to improve the model
What is Anscombe's Quartet?
Four datasets with nearly identical summary statistics and regression lines but very different shapes; it shows problem data where linear regression is not valid (and you should choose another model or inspect the data visually)
What is the residual error?
- Residual = true value − predicted value - What we want to minimize in regression
What is the equation for average squared error for m rows?
- MSE = (1/m) Σ (ŷᵢ − yᵢ)², summed over all m rows - This is what we need for the cost function
Mathematically, what is the goal of linear regression?
We want to minimize the squared error
When do you need to normalize data?
When units are not consistent across features
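A z-score normalization sketch with hypothetical features (square feet and bedroom count are assumed examples): each column is rescaled to mean 0 and standard deviation 1 so no feature dominates just because of its units.

```python
import numpy as np

# Hypothetical feature columns: square feet, number of bedrooms.
X = np.array([
    [2100.0, 3.0],
    [1600.0, 2.0],
    [2400.0, 4.0],
    [1400.0, 2.0],
])

# Z-score normalization: subtract each column's mean, divide by its std.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_norm)  # each column now has mean 0 and standard deviation 1
```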