Deep Learning/Machine Learning

How to import TensorFlow?

import tensorflow as tf

What is Normal Distribution?

- A normal distribution is also known as a bell-shaped curve or Gaussian curve.
- In a normal distribution, the mean, median and mode all have the same value.
- A normal distribution is perfectly symmetrical around its center.
- The total area under the curve is 1.

What is Bias and Variance Tradeoff?

- Bias and variance are a way to diagnose the performance of a prediction algorithm by breaking down its prediction error into two types: bias and variance.
- <1> Bias is error from erroneous assumptions in the learning algorithm. It occurs when an algorithm has limited flexibility to learn the true signal from a dataset. It is the difference between the model's expected prediction and the true value.
- <2> Variance is the algorithm's sensitivity to the specific training set: the variability of the model's prediction for a given data point.

When will you use classification over regression?

- Classification is used when the target variable is categorical, e.g. predicting a person's gender or a type of color.
- Regression is used when the target variable is continuous, e.g. estimating the sales or price of a product, a sports score, or the amount of rainfall.
- Both belong to the category of supervised ML algorithms.

Types of recommender systems?

- Collaborative filters are among the most popular recommender models used in industry and have found huge success at companies such as Amazon.
- Collaborative filtering can be broadly classified into two types: user-based filtering and item-based filtering.
- Content-based systems are the other main family of recommender systems.

What is Linear Regression?

- It is a way of finding a relationship between a single continuous variable, called the dependent or target variable, and one or more other variables (continuous or not), called independent variables.
- The fitted model is a straight line.
- The vertical distances between the data points and the regression line are the errors.
- Linear regression finds the best-fitting straight line by minimizing the sum of squared vertical distances between the data points and the line.

What is Simple vs. Multiple Linear Regression?

- Simple linear regression has only one independent variable.
- Multiple linear regression has more than one independent variable.

How to detect Outliers?

- The most commonly used method to detect outliers is visualization.
- Common plots for this are the box plot, histogram and scatter plot.

What is Overfitting?

- Overfitting occurs when the model fits the training data too closely, learning its noise instead of patterns that carry over to unseen data. It is typically the result of an overly complex model with many variables.
- An overfitted model is inaccurate on new data because the trend it learned does not reflect the reality of the data.

What are the applications of supervised Machine Learning in modern businesses?

- Spam detection
- Sentiment analysis
- Healthcare diagnosis
- Fraud detection

What is a normal distribution?

- The normal distribution is the most important and most widely used distribution in statistics.
- It is sometimes called the "bell curve".
- A normal distribution is perfectly symmetrical around its center.
- A normal distribution has a bell-shaped density curve described by its mean and standard deviation.
- The density curve is symmetrical, centered about its mean, with its spread determined by its standard deviation.

How can you choose a classifier based on training set size?

- When the training set is small, a model with high bias and low variance tends to work better because it is less likely to overfit, e.g. Naive Bayes.
- When the training set is large, a model with low bias and high variance tends to perform better because it can capture complex relationships, e.g. a decision tree.

Why might it be preferable to include fewer predictors over many?

- Adding irrelevant features increases the model's tendency to overfit because those features introduce more noise.
- When two variables are correlated, their effects can be harder to interpret, for example in regression.
- Curse of dimensionality.
- Adding random noise makes the model more complicated but no more useful.
- Computational cost.

What are the three stages to build a model in Machine learning?

1> Model building: choose a suitable algorithm for the model and train it according to the requirement.
2> Model testing: check the accuracy of the model using the test data.
3> Applying the model: make the required changes after testing and apply the final model.

Equation of entropy?

H = -p * log2(p) - q * log2(q), where p and q = 1 - p are the proportions of the two classes.
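
As a rough illustration, the two-class entropy formula above can be written as a small function. The name `binary_entropy` is made up for this sketch, and it assumes q = 1 - p and the usual convention that a zero proportion contributes nothing.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a two-class split with proportions p and 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    # Convention: 0 * log2(0) is treated as 0, so zero proportions are skipped.
    return sum(-x * math.log2(x) for x in (p, q) if x > 0)

print(binary_entropy(0.5))   # even split: maximum disorder
print(binary_entropy(0.9))   # mostly one class: lower entropy
print(binary_entropy(1.0))   # pure node: no disorder
```

An even split gives the largest value (1 bit) and a pure node gives 0, which matches the reading of entropy as a measure of disorder.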

What does it mean to have high variance?

- A high variance indicates that the data points are very spread out from the mean and from one another.
- For a model, high variance means the model is too specific (overfitting).

What does it mean when standard deviation is higher than mean?

A large standard deviation indicates that the data points can spread far from the mean, and a small standard deviation indicates that they are clustered closely around the mean.

What does a standard deviation of 1 mean?

A low standard deviation indicates that the data points tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values.

When is a normal distribution called a Standard Normal Distribution (SND)?

A normal distribution is called Standard Normal Distribution (SND) when its mean is zero and SD is equal to 1.

What is a normal distribution determined by?

A normal distribution is determined by two parameters: the mean and the variance.

What does it mean to have small variance?

A small variance indicates that the data points tend to be very close to the mean, and to each other.

What error metric would you use to evaluate how good a binary classifier is? What if the classes are imbalanced? What if there are more than 2 groups?

- Accuracy: the proportion of instances you predict correctly. Pros: intuitive, easy to explain. Cons: works poorly when the class labels are imbalanced and the signal from the data is weak.
- AUROC: plot the false positive rate on the x axis and the true positive rate on the y axis for different thresholds. Given a random positive instance and a random negative instance, the AUC is the probability that the model ranks the positive one higher. Pros: works well for testing the ability to distinguish the two classes. Cons: predictions can't be interpreted as probabilities (because AUC is determined by rankings), so it can't explain the uncertainty of the model.
- Log loss/deviance: Pros: an error metric based on probabilities. Cons: very sensitive to false positives and false negatives.
- When there are more than 2 groups, we can run k binary classifications and add up their log losses. Some metrics, like AUC, are only applicable in the binary case.
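
The three metrics discussed above can be sketched in plain Python. This is an illustrative version only: the function names, the 0.5 decision threshold and the toy scores are assumptions, and AUROC is computed directly from its ranking interpretation (the share of positive/negative pairs ordered correctly, ties counted as half) rather than by tracing the curve.

```python
import math

def accuracy(y_true, y_pred):
    """Proportion of instances predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, probs, eps=1e-15):
    """Average negative log-likelihood of the true labels (binary case)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def auroc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (made up): true labels and predicted probabilities of class 1.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))
print(auroc(y_true, scores))
print(log_loss(y_true, scores))
```

Accuracy depends on the chosen threshold, AUROC only on the ordering of the scores, and log loss on the probability values themselves.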

Applications of Reinforcement learning?

Applications of ____ include: economics, genetics, game playing.

Given a database of all previous alumni donations to your university, how would you predict which recent alumni are most likely to donate?

Based on frequency and amount of donations, graduation year, major, etc, construct a supervised regression (or binary classification) algorithm.

In a search engine, given partial data on what the user has typed, how would you predict the user's eventual search query?

Based on the past frequencies of words that followed a given sequence of words, we can construct conditional probabilities for the possible next sequences of words (n-gram). The sequences with the highest conditional probabilities can be shown as top candidates. To further improve this algorithm, we can put more weight on past sequences that showed up more recently and near the user's location, to account for trends. We can also show the user's own recent searches that match the partial data.
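
A minimal sketch of the n-gram idea, using bigrams (n = 2) for brevity. The query history is invented for illustration, the function names are hypothetical, and recency or location weighting is left out.

```python
from collections import Counter, defaultdict

def build_bigram_model(queries):
    """Count how often each word follows the previous word in past queries."""
    counts = defaultdict(Counter)
    for query in queries:
        words = query.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def suggest_next(counts, partial, k=2):
    """Return up to k (word, conditional probability) pairs for the next word."""
    words = partial.lower().split()
    if not words or words[-1] not in counts:
        return []
    followers = counts[words[-1]]
    total = sum(followers.values())
    return [(word, count / total) for word, count in followers.most_common(k)]

# Made-up search history.
history = [
    "weather in paris",
    "weather in london",
    "weather in paris today",
    "flights to paris",
]
model = build_bigram_model(history)
print(suggest_next(model, "weather in"))
```

Here "paris" follows "in" in two of three cases, so it is ranked first. A production system would condition on longer contexts and smooth the counts.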

You're Uber and you want to design a heatmap to recommend to drivers where to wait for a passenger. How would you approach this?

- Based on the past pickup locations of passengers around the same time of day and day of the week (month, year), construct the heatmap.
- Ask someone for more details.
- Base it on the number of past pickups, accounting for periodicity (seasonal, monthly, weekly, daily, hourly) and special events (concerts, festivals, etc.) detected from tweets.

Vectors are (ordered/unordered) collections of numbers?

Because vectors are ordered collections of numbers, they are often seen as column matrices: they have just one column and a certain number of rows.

What is the difference between bias and precision?

Bias is the average difference between the estimator and the true value. Precision is the standard deviation of the estimator. One measure of the overall variability is the Mean Squared Error, MSE, which is the average of the individual squared errors.
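
To make the three quantities concrete, here is a small sketch that computes them from repeated estimates of a known true value. The numbers are invented, and the function name is hypothetical. It also shows the standard identity MSE = bias^2 + variance, which holds when the variance is the population variance of the estimates.

```python
from statistics import fmean, pstdev

def estimator_summary(estimates, true_value):
    """Return (bias, precision, mse) for a list of estimates of true_value."""
    bias = fmean(estimates) - true_value              # average error
    precision = pstdev(estimates)                     # spread of the estimates
    mse = fmean([(e - true_value) ** 2 for e in estimates])
    return bias, precision, mse

# Made-up estimates of a quantity whose true value is 10.
bias, precision, mse = estimator_summary([9.0, 10.0, 11.0, 12.0], true_value=10.0)
print(bias, precision, mse)
print(bias ** 2 + precision ** 2)  # matches mse up to rounding
```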

What is bias?

Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. - Bias is the average difference between the estimator and the true value. - High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting). - Model with high bias pays very little attention to the training data and oversimplifies the model. - It always leads to high error on training and test data.

Given the long list of ML algorithms and a dataset, how do you decide which one to use?

Choosing an algo' depends on following questions: -

Explain confusion matrix with respect to Machine Learning algorithm?

- A confusion matrix (or error matrix) is a specific table used to measure the performance of an algorithm.
- It is mostly used in supervised learning (in unsupervised learning it is called a matching matrix).
- It has 2 dimensions: 1) actual and 2) predicted.
- Both dimensions have the identical set of classes.
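
A short sketch of how such a table can be built from actual and predicted labels. The labels and the row/column convention (rows = actual, columns = predicted) are choices made for this example.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Made-up labels for five emails.
actual = ["spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "spam"]

matrix = confusion_matrix(actual, predicted, labels=["spam", "ham"])
for row in matrix:
    print(row)
```

The diagonal holds the correct predictions, and every off-diagonal cell counts one specific kind of mistake.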

What is meant by dependent and independent variables?

Dependent variable depends upon independent variable.

What is Machine Learning?

Giving computers the ability to learn to make decisions from data without being explicitly programmed.

What is Item-based filtering?

If a group of people have rated two items similarly, then the two items must be similar. Therefore, if a person likes one particular item, they're likely to be interested in the other item too.
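
One way to sketch this idea is to compare items by the ratings the same users gave them. Cosine similarity is used here as an assumed similarity measure (the card does not name one), and the ratings are invented.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical ratings: one list per item, one entry per user (same user order).
ratings = {
    "item_a": [5, 4, 1, 1],
    "item_b": [4, 5, 1, 2],
    "item_c": [1, 1, 5, 4],
}

def most_similar_item(target, ratings):
    """Return the other item whose ratings are closest to the target item's."""
    others = [name for name in ratings if name != target]
    return max(others, key=lambda name: cosine_similarity(ratings[target], ratings[name]))

print(most_similar_item("item_a", ratings))
```

Users who rated item_a highly also rated item_b highly, so item_b is what this sketch would recommend to someone who liked item_a.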

What is k-fold cross-validation?

In _____, we basically do holdout cross-validation many times. So in ______, we partition the dataset into k equal-sized samples. This cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. The k results can then be averaged to produce a single estimation.
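
The partitioning described above might be sketched as follows. This version does not shuffle the data and lets fold sizes differ by one when the sample count is not divisible by k. `score_fn` is a hypothetical callback that trains on one index list and returns a metric for the other.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

def cross_validate(n_samples, k, score_fn):
    """Average the k scores returned by score_fn(train_idx, validation_idx)."""
    scores = [score_fn(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

for train, validation in k_fold_indices(6, 3):
    print("train:", train, "validation:", validation)
```

Each index appears in exactly one validation fold, which is what makes every subsample serve once as validation data.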

What is Holdout cross-validation?

In _______, we hold out a percentage of observations and so we get two datasets. One is called the training dataset and the other is called the testing dataset. Here, we use the testing dataset to calculate our evaluation metrics, and the rest of the data is used to train the model.
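
A minimal sketch of such a split. The 30% test fraction and the fixed seed are arbitrary choices for the example, and `holdout_split` is a made-up name.

```python
import random

def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train, test = holdout_split(list(range(10)), test_fraction=0.3)
print(len(train), len(test))
```

Changing the seed changes which rows land in the test set, so a metric computed on a single split can shift from one split to another.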

What is Variable Transformation?

In data modelling, transformation refers to the replacement of a variable by a function. For instance, replacing a variable x by the square / cube root or logarithm x is a transformation. In other words, transformation is a process that changes the distribution or relationship of a variable with others.

What does high bias mean?

In machine learning terminology, underfitting means that a model is too general, leading to high bias, while overfitting means that a model is too specific, leading to high variance. ... Since you can't realistically avoid bias and variance altogether, this is called the bias-variance tradeoff.

Bias-variance tradeoff is___

In prediction models, prediction errors can be decomposed into two main subcomponents we care about: error due to "bias" and error due to "variance". There is a tradeoff between a model's ability to minimize bias and variance. Understanding these two types of error can help us diagnose model results and avoid the mistake of over- or under-fitting.

What is Z distribution?

- In statistics, the Z-distribution is used to help find probabilities and percentiles for general normal distributions (X).
- It serves as the standard by which all other normal distributions are measured.
- The Z-distribution is a normal distribution with mean 0 and standard deviation 1.

What do nodes and edges in TensorFlow represent?

In the data flow graphs, nodes represent mathematical operations, while the edges represent the data, usually multidimensional data arrays or tensors, that are communicated between the nodes.

What are the essential steps in a predictive modeling project?

It consists of the following steps:
1 - Establish the business objective of the predictive model
2 - Pull historical data, internal and external
3 - Select the observation and performance windows
4 - Create newly derived variables
5 - Split the data into training, validation and test samples
6 - Clean the data: treatment of missing values and outliers
7 - Variable reduction / selection
8 - Variable transformation
9 - Develop the model
10 - Validate the model
11 - Check model performance
12 - Deploy the model
13 - Monitor the model

Advantage of holdout cross-validation?

It is very easy to implement and it is a very intuitive method of cross-validation.

Problem of holdout cross-validation?

It provides a single estimate for the evaluation metric of the model. This is problematic because some models rely on randomness, so in principle the evaluation metrics calculated on the test set can vary a lot from one run to the next.

LSA stands for___

Latent Semantic Analysis

What is Variable Type of Linear Regression?

Linear regression requires the dependent variable to be continuous i.e. numeric values (no categories or groups).

What are the types of Outliers?

Outlier can be of two types: Univariate and Multivariate. - Multi-variate outliers are outliers in an n-dimensional space.

What is Random Forest?

- Random Forest is a supervised ML algorithm that is generally used for classification problems (it can also be used for regression).
- It operates by constructing multiple decision trees during the training phase.
- The decision of the majority of the trees is chosen by the random forest as the final decision.
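
The majority-vote step can be sketched on its own. The three "trees" below are hand-written stand-in rules with invented feature names, not trained decision trees, so this only illustrates how the final decision is chosen.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Stand-ins for trained trees: each maps a sample to a class label.
trees = [
    lambda email: "spam" if email["links"] > 3 else "ham",
    lambda email: "spam" if email["exclamations"] > 5 else "ham",
    lambda email: "spam" if email["known_sender"] == 0 else "ham",
]

def forest_predict(trees, sample):
    """Collect one vote per tree and return the majority class."""
    return majority_vote([tree(sample) for tree in trees])

email = {"links": 5, "exclamations": 1, "known_sender": 0}
print(forest_predict(trees, email))
```

Two of the three rules vote "spam" for this sample, so the ensemble returns "spam" even though one rule disagrees.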

What is regularization and where might it be helpful? What is an example of using regularization in a model?

Regularization is useful for reducing variance in the model, which helps avoid overfitting. For example, Lasso regression uses L1 regularization to penalize large coefficients.
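
As a simplified sketch of how an L1 penalty shrinks a coefficient, consider a one-predictor model y = a * x with no intercept. It minimizes 0.5 * sum((y - a*x)^2) + lam * |a|, whose solution is a soft-thresholded least-squares slope. The single-predictor setup, the objective scaling and the data are assumptions made to keep the example short.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam, and clip to zero inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_slope(xs, ys, lam):
    """Slope minimizing 0.5 * sum((y - a*x)^2) + lam * |a| for one predictor."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam) / sxx

# Made-up data lying close to y = 2x.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-4.1, -1.9, 0.1, 2.0, 3.9]

for lam in (0.0, 5.0, 25.0):
    print(lam, lasso_slope(xs, ys, lam))
```

With lam = 0 this is ordinary least squares. Larger penalties pull the slope toward zero, and a large enough penalty sets it exactly to zero, which is why L1 regularization also acts as variable selection.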

What is difference between simple linear and multiple linear regressions?

Simple linear regression has only one x and one y variable. Multiple linear regression has one y and two or more x variables. For instance, when we predict rent based on square feet alone that is simple linear regression. When we predict rent based on square feet and age of the building that is an example of multiple linear regression.

SVD stands for ___

Singular Value Decomposition

How do you calculate normal distribution?

To convert a value to a standard score ("z-score"), first subtract the mean, then divide by the standard deviation: z = (x - mean) / SD.

How do you convert a normal distribution to a standard normal distribution?

To convert a value to a standard score ("z-score"), first subtract the mean, then divide by the standard deviation: z = (x - mean) / SD.
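
The conversion can be sketched in a few lines. The example values (a score of 130 on a scale with mean 100 and standard deviation 15) are made up, and `NormalDist` from Python's standard library is used to look up the standard normal probability.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z = z_score(130, mean=100, sd=15)
prob_below = NormalDist().cdf(z)   # standard normal: mean 0, standard deviation 1

print(z)
print(prob_below)
```

Once a value is expressed as a z-score, probabilities for any normal distribution can be read from the one standard normal distribution.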

Reinforcement learning

Software agents interact with an environment: - Learn how to optimize their behavior - Given a system of rewards and punishments - Draw inspiration from behavioral psychology

What is the importance of standard deviation?

Standard deviation is a number used to tell how measurements for a group are spread out from the average (mean), or expected value. A low standard deviation means that most of the numbers are very close to the average. A high standard deviation means that the numbers are spread out.

(Given a dataset) Analyze this dataset and give me a model that can predict this response variable.

- Start by fitting a simple model (multivariate regression, logistic regression), do some feature engineering accordingly, and then try more complicated models.
- Always split the dataset into train, validation and test sets, and use cross-validation to check performance.
- Determine whether the problem is classification or regression.
- Favor simple models that run quickly and that you can easily explain.
- Mention cross-validation as a means to evaluate the model.
- Plot and visualize the data.

What are the 4 assumptions of linear regression?

The 4 assumptions are:
- Linearity of the relationship between the predictors and the response
- Independence of residuals
- Normal distribution of residuals
- Equal variance of residuals

What is Z * in statistics?

- The Z score is a test statistic that helps you decide whether or not to reject the null hypothesis.
- The p-value is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true.
- Z scores are measured in standard deviations. ... Both statistics are associated with the standard normal distribution.

What is residual?

The difference between an observed (actual) value of the dependent variable and the value of the dependent variable predicted from the regression line.

Define observation and performance window?

The first step of building a predictive model is to choose the periods for the predictors (independent variables) and the target variable (dependent variable). For that we need to define the observation and performance periods (windows). It is important to spend enough time on this step of the predictive modeling project.
1 - Observation period: the period the independent variables/predictors come from. In other words, the independent variables are created considering this period (window) only.
2 - Performance period: the period the dependent variable/target comes from. It is the period following the observation window.

What is the goal of Linear Regression?

The goal of simple (univariate) linear regression is to model the relationship between a single feature (explanatory variable x) and a continuous valued response (target variable y). y = ax + b
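
A minimal sketch of fitting y = ax + b by ordinary least squares, using the closed-form formulas for a single feature. The data points are invented and lie exactly on y = 2x + 1 so the result is easy to check by eye.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]  # observed minus predicted

print(a, b)
print(residuals)
```

With noisy data the residuals would not all be zero. The fitted line is the one that makes their squared sum as small as possible.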

What is User-based filtering?

The main idea behind (...) is that if we are able to find users that have bought and liked similar items in the past, they are more likely to buy similar items in the future too. Therefore, these models recommend items to a user that similar users have also liked

What could be some issues if the distribution of the test data is significantly different than the distribution of the training data?

- A model that has high training accuracy might have low test accuracy. Without further knowledge, it is hard to know which dataset represents the population, so the generalizability of the algorithm is hard to measure. This should be mitigated by repeated splitting into train and test datasets (as in cross-validation).
- A change in data distribution is called dataset shift. If the train and test data have different distributions, the classifier will likely overfit to the train data. This issue can be overcome by using a more general learning method.
- This can occur when P(y|x) is the same but P(x) is different (covariate shift), or when P(y|x) is different (concept shift).
- The causes can be: training samples obtained in a biased way (sample selection bias), or train differing from test because of temporal or spatial changes (non-stationary environments).
- A solution to covariate shift is importance-weighted cross-validation.

What is Standard Normal Distribution?

The standard normal distribution (z distribution) is a normal distribution with a mean of 0 and a standard deviation of 1.

What are z scores used for?

The standard score (more commonly referred to as a z-score) is a very useful statistic because it (a) allows us to calculate the probability of a score occurring within our normal distribution and (b) enables us to compare two scores that are from different normal distributions.

What is Z score in normal distribution?

The standard score (more commonly referred to as a z-score) is a very useful statistic because it (a) allows us to calculate the probability of a score occurring within our normal distribution and (b) enables us to compare two scores that are from different normal distributions.

How do you handle missing or corrupted data in a dataset?

The ways to handle missing or corrupted data are to drop the affected rows/columns or to replace the values with some other value. Pandas offers 2 approaches:
1> isnull() and dropna() help find the rows/columns with missing data and drop them.
2> fillna() replaces the missing values with a placeholder value (e.g. 0).

How To Treat Outliers?

There are several methods to treat outliers:
- Percentile capping
- Box-plot method
- Standard deviation
- Weight of evidence transformation
Box-plot method: a box plot is a graphical display for describing the distribution of the data. Box plots use the median and the lower and upper quartiles. An outlier is a value above the upper fence or below the lower fence.
Standard deviation method: a value more than three standard deviations above or below the mean is considered an outlier. This is based on the characteristics of a normal distribution, for which about 99.73% of the data fall within this range.
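
The box-plot and standard deviation rules can be sketched as follows. The fences are taken to be Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, a common convention that the card itself does not spell out. The data are invented, and quartiles come from `statistics.quantiles` with its default method.

```python
from statistics import mean, pstdev, quantiles

def iqr_outliers(values, k=1.5):
    """Values outside the box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

def three_sigma_outliers(values):
    """Values more than three standard deviations away from the mean."""
    mu, sd = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) > 3 * sd]

# Made-up measurements with one suspicious value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]

print(iqr_outliers(data))
print(three_sigma_outliers(data))
```

Both rules flag 95 here. Because an extreme value inflates the mean and standard deviation themselves, the three-sigma rule can miss outliers in small samples, whereas the quartile-based fences are less affected.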

What are various ways to predict a binary response variable? Can you compare two of them and tell me when one would be more appropriate? What's the difference between these? (SVM, Logistic Regression, Naive Bayes, Decision Tree, etc.)

Things to look at: N, P, whether the problem is linearly separable, whether the features are independent, the tendency to overfit, speed, performance, and memory usage.
Logistic Regression
- suits features that are roughly linear and problems that are roughly linearly separable
- robust to noise; L1/L2 regularization can be used for model selection and to avoid overfitting
- the output comes as probabilities
- efficient, and the computation can be distributed
- can be used as a baseline for other algorithms
- (-) can hardly handle categorical features
SVM
- with a nonlinear kernel, can deal with problems that are not linearly separable
- (-) slow to train; not really efficient for most industry-scale applications
Naive Bayes
- computationally efficient when P is large, by alleviating the curse of dimensionality
- works surprisingly well in some cases even if the independence condition doesn't hold
- with word frequencies as features, the independence assumption can be seen as reasonable, so the algorithm can be used in text categorization
- (-) conditional independence of every feature from the others should be met
Tree Ensembles
- good for large N and large P; can deal with categorical features very well
- non-parametric, so no need to worry about outliers
- GBTs work better, but their parameters are harder to tune
- RF works out of the box, but usually performs worse than GBT
Deep Learning
- works well for some classification tasks (e.g. image)
- used to squeeze something extra out of the problem

How do you find the Z value?

To find the Z score of a sample, you'll need to find the mean, variance and standard deviation of the sample. To calculate the z-score, you will find the difference between a value in the sample and the mean, and divide it by the standard deviation.

How to prevent overfitting?

To prevent overfitting, we can use techniques like: 1> cross-validation, 2> regularization, early stopping, pruning, Bayesian priors, dropout and model comparison. Make the model simpler: with fewer variables and parameters, the variance can be reduced.

Difference Between Linear And Logistic Regression?

The two main differences are as follows:
1> Dependent variable
- Linear regression requires the dependent variable to be continuous, i.e. numeric values (no categories or groups).
- Binary logistic regression requires the dependent variable to be binary: two categories only (0/1).
- Multinomial or ordinal logistic regression can have a dependent variable with more than two categories.
2> Estimation method
- Linear regression is based on least squares estimation, which says the regression coefficients should be chosen so as to minimize the sum of the squared distances of each observed response to its fitted value.
- Logistic regression is based on maximum likelihood estimation, which says the coefficients should be chosen so as to maximize the probability of Y given X (the likelihood).

What is Unsupervised learning?

Uncovering hidden patterns from unlabeled data

What is variance?

- Variance is the variability of the model's prediction for a given data point; for data, it is a value that tells us how spread out the data are.
- Variance is the average of the squared distances from each point to the mean.
- Variance shows how sensitive the model is to outliers, meaning values that are far away from the mean.
- A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn't seen before.
- As a result, such models perform very well on training data but have high error rates on test data.
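
The "average of the squared distances from each point to the mean" definition translates directly into code. The two small datasets are made up. Both have mean 7 but very different spread.

```python
import math

def variance(values):
    """Population variance: mean squared distance from the mean."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

tight = [6, 7, 7, 8]      # clustered around 7
spread = [1, 4, 10, 13]   # same mean, far more spread out

print(variance(tight), math.sqrt(variance(tight)))
print(variance(spread), math.sqrt(variance(spread)))
```

The square root of the variance is the standard deviation, which is expressed in the same units as the data.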

What are some ways I can make my model more robust to outliers?

- We can use regularization such as L1 or L2 to reduce variance (at the cost of increased bias).
- Changes to the algorithm: use tree-based methods instead of regression methods, as they are more resistant to outliers; for statistical tests, use non-parametric tests instead of parametric ones; use robust error metrics such as MAE or Huber loss instead of MSE.
- Changes to the data: winsorize the data; transform the data (e.g. log); remove outliers only if you're certain they're anomalies not worth predicting.
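
To see why MAE and Huber loss count as robust, the sketch below compares how much each metric grows when a single large error is added. The error values and the Huber threshold `delta=1.0` are arbitrary choices for illustration.

```python
def mse(errors):
    """Mean squared error: large errors are squared, so they dominate."""
    return sum(e * e for e in errors) / len(errors)

def mae(errors):
    """Mean absolute error: every error counts in proportion to its size."""
    return sum(abs(e) for e in errors) / len(errors)

def huber(errors, delta=1.0):
    """Quadratic for small errors, linear beyond delta."""
    def loss(e):
        a = abs(e)
        return 0.5 * e * e if a <= delta else delta * (a - 0.5 * delta)
    return sum(loss(e) for e in errors) / len(errors)

errors = [0.5, -0.5, 1.0, -1.0]
with_outlier = errors + [20.0]   # one made-up extreme error

for name, metric in (("MSE", mse), ("MAE", mae), ("Huber", huber)):
    print(name, metric(errors), metric(with_outlier))
```

The single outlier multiplies MSE by a far larger factor than MAE or Huber loss, so a model trained to minimize MSE is pulled much harder toward that one point.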

What is deep learning?

____ is a sub-field of machine learning that is a set of algorithms that is inspired by the structure and function of the brain

What is offset or residuals?

____ are the vertical lines from the regression line to the sample points, called prediction errors. vertical offset = |ŷ - y|

What is regression line?

___is the best-fitting line.

What is entropy?

___is the measure of disorder or how messy the data is.

Cross validation?

a technique to evaluate predictive models by partitioning the original sample into a training set to train the model, and a test set to evaluate it. - Cross validation is a model evaluation method that is better than residuals. - The problem with residual evaluations is that they do not give an indication of how well the learner will do when it is asked to make new predictions for data it has not already seen.

What are vectors?

are special types of matrices, which are rectangular arrays of numbers.

Descriptive statistics

are statistics that describe, show or summarize data in a meaningful way such that, for example, patterns might emerge from the data. - Descriptive statistics do not, however, allow us to make conclusions beyond the data we have analysed or reach conclusions regarding any hypotheses we might have made. They are simply a way to describe our data.

How would you suggest to a franchise where to open a new store?

- Build a master dataset with the local demographic information available for each location: local income levels, proximity to traffic, weather, population density, proximity to other businesses.
- Add a reference dataset on local, regional, and national macroeconomic conditions (e.g. unemployment, inflation, prime interest rate, etc.).
- Include any data on the local franchise owner-operators, to the degree the manager
- Identify a set of KPIs acceptable to the management that requested the analysis concerning the most desirable factors surrounding a franchise: quarterly operating profit, ROI, EVA, pay-down rate, etc.
- Run econometric models to understand the relative significance of each variable.
- Run machine learning algorithms to predict the performance of each location candidate.

2 kinds of cross validation ?

exhaustive and non-exhaustive

How would you predict who someone may want to send a Snapchat or Gmail to?

- For each user, assign every other user a score for how likely the user is to send them an email.
- The rest is feature engineering: number of past emails, number of responses, the last time they exchanged an email, whether the last email ends with a question mark, features about the other users, etc.
- Ask someone for more details.
- A simple baseline: the people the user emailed most in the past, weighted by time decay.

What is the name of "TensorFlow" derived from?

from the operations which neural networks perform on multidimensional data arrays or tensors!

explicitly

clearly and in full detail, leaving nothing implied

What Is P-value And How It Is Used For Variable Selection?

- The p-value, or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true.
- It is compared with a chosen significance level; a commonly used level is 0.05.
- p-value < 0.05: reject the null hypothesis in favor of the alternative hypothesis.
- p-value > 0.05: fail to reject the null hypothesis.
- For variable selection, predictors whose coefficients have p-values above the chosen level are candidates for removal from the model.
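
As an illustration of the decision rule, the sketch below turns a z statistic into a two-sided p-value under the standard normal distribution and compares it with 0.05. The function names are made up, and a two-sided z-test is an assumption, since the card does not say which test is used.

```python
import math

def two_sided_p_value(z):
    """Probability of a result at least as extreme as z if H0 is true."""
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

def decide(z, alpha=0.05):
    """Apply the usual rule: reject H0 only when the p-value is below alpha."""
    return "reject H0" if two_sided_p_value(z) < alpha else "fail to reject H0"

for z in (0.5, 1.0, 2.5):
    print(z, two_sided_p_value(z), decide(z))
```

A z statistic of about 1.96 sits right at the 0.05 boundary, which is why that value appears so often in significance testing.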

What is TensorFlow?

the second machine learning framework that Google created and used to design, build, and train deep learning models.

What can TensorFlow do?

to do numerical computations, which are done with data flow graphs.

lexicon

vocabulary

What does a standard deviation of 0 mean?

Every deviation xi - x̄ equals 0, which means that every data value is equal to the mean. Conversely, if all values are identical, every deviation is 0. So the sample standard deviation of a data set is zero if and only if all of its values are identical.

bias

tendency; inclination

What does the Z score tell you?

z-score is how many standard deviations away from the mean a data point is.

