Machine Learning


What are the steps in a backward elimination model method?

1. Select a significance level (SL) to stay in the model, e.g. SL = 0.05. 2. Fit the full model with all possible predictors. 3. Consider the predictor with the highest p-value. If P > SL, go to step 4; otherwise go to FIN (if all p-values are at or below the significance level, the model is finished). 4. Remove the predictor with the highest p-value. 5. Fit the model without this variable, then return to step 3.
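
As a rough illustration, the loop can be sketched in base R by refitting with lm and dropping the predictor with the highest p-value on each pass. The function name backward_eliminate, the built-in mtcars data, and the starting predictors are assumptions made for this example; it also assumes numeric predictors, so that coefficient names match column names.

```r
# Hypothetical helper: backward elimination by p-value (numeric predictors only).
backward_eliminate <- function(data, response, predictors, sl = 0.05) {
  repeat {
    # Fit the model with the predictors that are still in play
    fit <- lm(reformulate(predictors, response = response), data = data)
    p <- summary(fit)$coefficients[predictors, "Pr(>|t|)"]
    names(p) <- predictors
    worst <- names(p)[which.max(p)]
    # FIN: every p-value is at or below SL (or only one predictor is left)
    if (p[worst] <= sl || length(predictors) == 1) return(fit)
    # Otherwise remove the predictor with the highest p-value and refit
    predictors <- setdiff(predictors, worst)
  }
}

final_fit <- backward_eliminate(mtcars, "mpg", c("wt", "hp", "qsec", "drat"))
print(formula(final_fit))
```

The sketch stops with one predictor even if it is not significant, which is a simplification.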

What are the steps in an All Possible Models method?

1. Select a criterion of goodness of fit (e.g. the Akaike criterion). 2. Construct all possible regression models: 2^n - 1 total combinations for n predictors. 3. Select the one with the best criterion.
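
A minimal sketch of this enumeration in base R, using AIC as the criterion. The mtcars data and the three predictors are assumptions chosen for illustration; with n = 3 there are 2^3 - 1 = 7 candidate models.

```r
predictors <- c("wt", "hp", "qsec")  # illustrative choice, n = 3

# Every non-empty subset of the predictors
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one model per subset and record its AIC (lower is better)
aics <- sapply(subsets, function(vars) {
  AIC(lm(reformulate(vars, response = "mpg"), data = mtcars))
})

best <- subsets[[which.min(aics)]]
print(length(subsets))
print(best)
```

The number of models doubles with each extra predictor, so this approach only suits small n.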

True and False/ Yes and No is coded as what when encoding for Machine Learning? Why?

0 and 1 because they are binary

What are the five methods of building models?

1. All in 2. Backward Elimination 3. Forward Selection 4. Bidirectional Elimination 5. Score Comparison

What are the five assumptions of linear regression?

1. Linearity 2. Homoscedasticity 3. Multivariate normality 4. Independence of errors 5. Lack of multicollinearity

What are the steps in a bidirectional elimination model method?

1. Select a significance level to enter and to stay in the model, e.g. SL ENTER = 0.05, SL STAY = 0.05. 2. Perform the next step of Forward Selection (new variables must have P < SL ENTER to enter). 3. Perform ALL steps of Backward Elimination (old variables must have P < SL STAY to stay), then return to step 2. 4. Stop (FIN) when no new variables can enter and no old variables can exit.

What are the steps in a Forward Selection model method?

1. Select a significance level (SL) to enter the model. 2. Fit all simple regression models y ~ xn (one per variable) and select the one with the lowest p-value. 3. Keep this variable and fit all possible models with one extra predictor added to the one(s) you already have. 4. Consider the predictor with the lowest p-value. If P < SL, go to step 3; otherwise go to FIN and keep the previous model (the one without the last, non-significant predictor).
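
A hedged base R sketch of the same idea: on each round, try every remaining candidate, keep the one with the lowest p-value if it is below SL, and stop otherwise. The function name forward_select, the mtcars data, and the candidate list are assumptions for this example; predictors are assumed numeric.

```r
# Hypothetical helper: forward selection by p-value (numeric predictors only).
forward_select <- function(data, response, candidates, sl = 0.05) {
  selected <- character(0)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # p-value of each candidate when added to the predictors already kept
    p <- sapply(remaining, function(v) {
      fit <- lm(reformulate(c(selected, v), response = response), data = data)
      summary(fit)$coefficients[v, "Pr(>|t|)"]
    })
    if (min(p) >= sl) break  # FIN: no candidate is significant enough
    selected <- c(selected, remaining[which.min(p)])
  }
  selected  # predictors of the final (previous) model
}

chosen <- forward_select(mtcars, "mpg", c("wt", "hp", "qsec", "drat"))
print(chosen)
```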

Reinforcement Learning

Algorithms that take actions to maximize cumulative reward

Why do we call it a linear regression?

As the independent variable increases or decreases, the dependent variable increases or decreases in a linear fashion.

What are the 10 applications of Machine Learning discussed in the course?

Face recognition, Facebook ads, voice recognition, Amazon, Audible, Netflix, physical video games, medical imaging, space imaging, robots.

What is the dummy variable trap?

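If a categorical variable with n categories is encoded as n dummy variables and every observation falls into exactly one category, the dummies always sum to 1 and are perfectly collinear with the constant, so the regression cannot separate their effects (multicollinearity). Always omit one dummy variable.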
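
The collinearity can be checked numerically. This small sketch uses made-up state labels (an assumption for illustration) and compares the rank of the design matrix with and without the redundant dummy.

```r
state <- factor(c("NY", "CA", "FL", "NY", "CA", "FL"))  # illustrative data

# One dummy column per category, plus a constant column: the trap
all_dummies <- model.matrix(~ state - 1)
X_trap <- cbind(Intercept = 1, all_dummies)
print(ncol(X_trap))      # 4 columns ...
print(qr(X_trap)$rank)   # ... but only 3 are linearly independent

# Default coding drops one dummy, so the matrix has full column rank
X_ok <- model.matrix(~ state)
print(qr(X_ok)$rank == ncol(X_ok))
```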

In a regression y is the

dependent variable

Not all Artificial Intelligence could count as Machine Learning since some basic Rule-based engines could be classified as AI but they don't learn from experience therefore they do not belong to the machine learning category

True

The simple linear regression package takes care of feature scaling (standardizing and normalizing the data)

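True. Feature scaling does not have to be applied manually before fitting a simple linear regression with lm.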

Deep Learning

A specialized field of Machine Learning that relies on training of Deep Artificial Neural Networks (ANNs) using a large dataset such as images or texts.

Machine Learning

A subfield of Artificial Intelligence that enables machines to improve at a given task with experience. It is important to note that all machine learning techniques are classified as Artificial Intelligence ones.

Why is a simple linear regression simple?

Because it examines the relationship between only two variables: one independent and one dependent.

We should use Simple Linear Regression to predict the winner of a football game

False. Simple linear regression is used to predict continuous data. The football question has a yes/no answer, which makes it discrete (categorical) data, so you need a classification model such as Logistic Regression.

The split ratio parameter in the sample.split function can only use decimal numbers, e.g. 0.8

False. You can also use fractions, e.g. 2/3

You should include all of your dummy variables in your multiple linear regression models

False. You should never include all of your dummy variables in your model.

What package is the stepAIC function in?

MASS
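
A brief usage sketch, assuming the MASS package is installed and using the built-in mtcars data with an illustrative set of predictors. Note that stepAIC selects variables by AIC, not by p-values, so it is a related technique and not the same procedure as the p-value-based methods in this set.

```r
library(MASS)

# Start from a full model and let stepAIC add or drop terms by AIC
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
reduced <- stepAIC(full, direction = "both", trace = FALSE)

print(formula(reduced))
```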

Brackets instead of parentheses in R represent

Indexes

Artificial Neural Networks

Information processing models inspired by the human brain

Why is a polynomial linear regression still linear when the visualization is curved?

Because "linear" refers to the coefficients, not to the variables: the equation y = b0 + b1*x1 + b2*x1^2 + ... + bn*x1^n is a linear combination of the coefficients b0, b1, ..., bn.
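
To make this concrete, the sketch below fits a curved relationship with the ordinary lm function by adding a squared term as an extra column. The simulated data and its true coefficients are assumptions for illustration.

```r
set.seed(1)
x <- seq(1, 10, length.out = 30)
y <- 2 + 0.5 * x + 0.3 * x^2 + rnorm(30, sd = 0.5)  # simulated curved data

# Still a linear model: y is a weighted sum of 1, x and x^2
fit <- lm(y ~ x + I(x^2))
print(coef(fit))
```

The fitted curve bends, but the model is solved with the same least squares machinery as a straight line.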

What does this function do? split = sample.split(dataset$Purchased, SplitRatio = 0.8)

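It returns a logical vector that marks about 80% of the observations as TRUE (training set) and the remaining 20% as FALSE (test set), keeping the class proportions of the dependent variable Purchased similar in both parts.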
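
To show how the resulting logical vector is used, here is a base R stand-in. The helper stratified_split is hypothetical and only imitates the idea of splitting within each class; it is not the caTools implementation, and the small dataset is made up.

```r
# Hypothetical stand-in for a class-preserving split (not caTools itself)
stratified_split <- function(y, ratio) {
  keep <- logical(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    n_train <- round(ratio * length(idx))
    # Mark a random n_train of this class's rows as training rows
    keep[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  keep
}

set.seed(123)
dataset <- data.frame(Age = c(25, 31, 47, 52, 38, 29, 44, 36, 58, 41),
                      Purchased = rep(c(0, 1), times = c(6, 4)))

split <- stratified_split(dataset$Purchased, 0.8)
training_set <- subset(dataset, split == TRUE)
test_set <- subset(dataset, split == FALSE)
print(nrow(training_set))
print(nrow(test_set))
```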

Why would you use the all in method to build a model?

1. Prior knowledge 2. You have to (company requested etc.) 3. Preparing for Backward Elimination

All in model method

Putting all of the variables in your dataset into your model.

Artificial Intelligence

Science that empowers computers to mimic human intelligence such as decision making, text processing, and visual perception. AI is a broader field (i.e. the big umbrella) that contains several subfields such as machine learning, robotics, and computer vision.

What does this expression mean? dataset[2:3]

Since R's indexes start at 1, this expression tells R to take columns 2 through 3 from dataset.
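
A quick check on a small made-up data frame (the column names and values are assumptions for illustration):

```r
dataset <- data.frame(Country = c("France", "Spain"),
                      Age = c(44, 27),
                      Salary = c(72000, 48000))

# Single-bracket indexing with no comma selects columns by position
selected <- dataset[2:3]
print(names(selected))
```

With a comma, as in dataset[2:3, ], the same numbers would select rows 2 to 3 instead.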

Why do you have to encode your categorical variables to factors when modeling?

Because machine learning models are built on mathematical equations, categorical values such as text labels must be converted to a numeric representation (factors) before they can enter an equation.

What do we know about Ordinary Least Squares?

That y_i is the observed value and y_i hat is the fitted value on the trend line. For every candidate line, the sum of squared differences SUM (y_i - y_i hat)^2 is calculated; the ordinary least squares line is the one with the minimum sum, the "best fit".
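
As a worked example on made-up numbers, the closed-form least squares estimates for a single predictor can be computed by hand and compared with lm. The data values are assumptions for illustration.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form estimates that minimise SUM (y_i - y_i hat)^2
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Sum of squared residuals for the best-fit line and for a nudged line
ssr_best <- sum((y - (b0 + b1 * x))^2)
ssr_other <- sum((y - (b0 + (b1 + 0.1) * x))^2)

fit <- lm(y ~ x)
print(all.equal(unname(coef(fit)), c(b0, b1)))
print(ssr_best < ssr_other)
```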

What is this part of the function doing? dataset$Age = ifelse(is.na(dataset$Age),

is.na(dataset$Age) returns TRUE for every missing (NA) value in the Age column and FALSE otherwise; ifelse uses this as the condition that decides which values get replaced.

Supervised Learning

Training algorithms using labeled input/output data. Classification and regression are supervised learning techniques.

Unsupervised Learning

Training algorithms with no labeled data. It attempts to discover hidden patterns on its own. Clustering is considered unsupervised learning.

When looking at your data, what should you distinguish between the variables?

Which variables are dependent and which are independent.

What should you do if you need to put categorical variables in your linear regression model?

Transform the categorical variable into dummy variables, each holding 0 or 1 to indicate whether an observation belongs to that category. This yields as many dummy variables as there are categories in the original variable, but one of them must be left out of the model to avoid the dummy variable trap.

Why do you have to split the data into a training and test set when modeling?

Your model could overfit the data it was trained on, and detecting that requires a separate dataset the model has not seen.

Statistical Significance

a statistical statement of how likely it is that an obtained result occurred by chance

What package in R takes care of the dummy variable trap for you?

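stats (built into R): its lm function automatically leaves out one dummy level for each factor. caTools provides sample.split for splitting the data, not dummy variable handling.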

In a regression b1 is the

coefficient

In a regression b0 is the

constant

The lower the p-value the higher the

statistical significance of the predictor's effect on the dependent variable

Multiple Linear Regression

examines the relationship between more than two variables

How can you fit a linear regression model in R with all variables except one predictor value?

formula = y ~ . - x, where y is the dependent variable and x is the predictor we want to remove from the model
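
For instance, on the built-in mtcars data (an illustrative choice, with mpg as y and cyl as the predictor to drop):

```r
# "." expands to every other column; "- cyl" removes that one predictor
fit <- lm(mpg ~ . - cyl, data = mtcars)
print(names(coef(fit)))
```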

In a regression x1 is the

independent variable

Before building a linear regression model you need to make sure what is true?

That the linear regression assumptions hold

What is the function used in R to create a simple linear regressor?

lm

Polynomial Regression

models the relationship between the independent variable and the dependent variable as an nth-degree polynomial in x

what does the alt + n shortcut do in R?

selects the tilde symbol

Write down the function used to split a dataset

split = sample.split(dataset$Purchased, SplitRatio = 0.8)

Feature Scaling

standardizing or normalizing your numeric variables so that their ranges are similar and no variable dominates the others.
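
Both rescalings are short formulas in base R. The salary values below are made up for illustration; scale is the base R function that standardizes by default.

```r
salary <- c(48000, 54000, 61000, 72000, 83000)  # illustrative values

# Standardization: subtract the mean, divide by the standard deviation
standardized <- (salary - mean(salary)) / sd(salary)

# Normalization (min-max): map the smallest value to 0 and the largest to 1
normalized <- (salary - min(salary)) / (max(salary) - min(salary))

# scale() performs the same standardization (it returns a one-column matrix)
print(all.equal(as.vector(scale(salary)), standardized))
print(range(normalized))
```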

What is this part of the function doing? ave(dataset$Age, FUN = function(x) mean(x, na.rm = TRUE)), dataset$Age)

This part computes the average of the Age column, with FUN specifying the function to apply (mean). Including na.rm = TRUE tells R to exclude the missing values when it calculates the mean. The last argument, dataset$Age, is what ifelse returns when the condition is FALSE, i.e. when the value is not missing.
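
Putting the two fragments from these cards together gives the full statement. The small Age column below is made up for illustration.

```r
dataset <- data.frame(Age = c(44, 27, NA, 38, NA, 35))  # illustrative data

# Replace each missing Age with the mean of the non-missing ages
dataset$Age <- ifelse(is.na(dataset$Age),
                      ave(dataset$Age, FUN = function(x) mean(x, na.rm = TRUE)),
                      dataset$Age)
print(dataset$Age)
```

Here the mean of 44, 27, 38 and 35 is 36, so both NA entries become 36.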

When would you NOT split the dataset into a training and test dataset when creating a model?

When there are too few observations (n < 10)

What are the first three arguments of the factor function?

x (the character variable), levels (the current values), labels (the new names for those values)
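
A short sketch with made-up country names (an assumption for illustration) shows how the three arguments work together:

```r
country <- c("France", "Spain", "Germany", "Spain")  # illustrative data

encoded <- factor(country,
                  levels = c("France", "Spain", "Germany"),  # current values
                  labels = c(1, 2, 3))                       # new names

print(as.character(encoded))
print(levels(encoded))
```

The labels are stored as text, so the result is a factor with levels "1", "2" and "3", not a numeric vector.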

What is the regression equation?

y = b0 + b1 * x1

What is the correct way of writing a simple linear regression equation in the formula parameter in R?

y ~ x

