ML
R-squared (R²) is ___ of __.
coefficient of determination
What algorithms are used for density estimation?
Algorithms such as expectation maximization (EM), for example when fitting Gaussian mixture models, are used for density estimation.
What are two alternative terms for a cost function?
For linear regression, the cost function is also called: • Squared error function. • Mean squared error.
What does a standard deviation of 0 mean?
A standard deviation of exactly 0 means every data point equals the mean. More generally, a standard deviation close to 0 indicates that the data points tend to be very close to the mean (also called the expected value) of the set.
What does high volatility mean for a stock?
A stock with high volatility generally has a high standard deviation of returns. - The more volatile a security, the larger the variance and standard deviation.
What is recall?
The number of relevant documents retrieved out of all RELEVANT documents.
What is precision?
The number of relevant documents retrieved out of all RETRIEVED documents.
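Below is a minimal sketch of the recall and precision definitions for a single query, using Python sets. The document IDs are hypothetical and are included only for illustration.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for one query."""
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0  # out of all RETRIEVED
    recall = len(hits) / len(relevant) if relevant else 0.0       # out of all RELEVANT
    return precision, recall


retrieved = {"d1", "d2", "d3", "d4"}       # what the system returned (hypothetical)
relevant = {"d1", "d3", "d5", "d6", "d7"}  # what the user actually needed (hypothetical)

p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.5 0.4
```

The same two hits are divided by different totals: 4 retrieved documents for precision and 5 relevant documents for recall.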
What is standard deviation?
(___) is the square root of the variance
Covariance is a measure of (what)?
Covariance is a measure of correlation in an unscaled form: it describes how two variables vary together (their joint variability).
What are interpolation and extrapolation?
- Interpolation is an estimation of a value between two known values in a sequence of values. - Extrapolation is an estimation of a value based on extending a known sequence of values or facts beyond the range that is known with certainty. - Polynomial interpolation is a method of estimating values between known data points by fitting a polynomial through them.
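The following sketch contrasts the two ideas under one simplifying assumption: the values follow a straight line through two known points. The helper name `linear_estimate` is hypothetical.

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Estimate y at x from the straight line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)


# Known points: (1, 10) and (3, 30)
inside = linear_estimate(1, 10, 3, 30, 2)   # interpolation: x = 2 lies between the known x values
outside = linear_estimate(1, 10, 3, 30, 5)  # extrapolation: x = 5 lies beyond them
print(inside, outside)  # 20.0 50.0
```

The arithmetic is the same in both cases. Extrapolation is riskier only because nothing guarantees the trend continues outside the observed range.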
What is the difference between standard deviation and average deviation?
- Two of the most popular measures of variability are standard deviation and average deviation, also called the mean absolute deviation. - Standard deviation is the most common measure of variability and is frequently used to determine the volatility of stock markets or other investments. - To calculate the standard deviation, you must first determine the variance. - Variance is itself a good measure of variability, as a larger variance reflects a greater spread in the underlying data. - Squaring the differences between each point and the mean avoids the issue of negative differences for values below the mean, but it means the variance is no longer in the same unit of measure as the original data. - Taking the square root of the variance means the standard deviation returns to the original unit of measure and is easier to interpret and use in further calculations. - The average (mean absolute) deviation instead averages the absolute differences from the mean, so it needs no squaring or square root.
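This short comparison computes both measures on the same small data set. It uses `statistics.pstdev` (the population standard deviation) from the standard library and a hand-written mean absolute deviation.

```python
import statistics


def mean_absolute_deviation(data):
    """Average of the absolute distances from the mean (no squaring, no square root)."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


data = [2, 4, 4, 4, 5, 5, 7, 9]
sd = statistics.pstdev(data)  # population standard deviation
mad = mean_absolute_deviation(data)
print(sd, mad)  # 2.0 1.5
```

The standard deviation comes out larger because squaring gives the point farthest from the mean (9) more weight than the absolute value does.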
What is the difference between variance and covariance?
- A covariance refers to the measure of how two random variables change together and is used to calculate the correlation between variables. - The variance refers to the spread of a single data set: how far the numbers are from the mean, for instance.
What are L1 and L2 regularization?
- A regression model that uses the L1 regularization technique is called Lasso Regression, and a model that uses L2 is called Ridge Regression. - L1 and L2 regularization add a penalty term to the loss function being minimized, which discourages large coefficients. - The key difference between these two is the penalty term. Ridge regression adds the "squared magnitude" of the coefficients as the penalty term, while Lasso adds their absolute magnitude. - In other words, the L2 penalty is the sum of the squares of the weights, while the L1 penalty is the sum of the absolute values of the weights.
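The sketch below computes both penalty terms and adds each one to a base loss. It assumes mean squared error as the base loss and a single strength parameter `lam`. The weights and predictions are made-up numbers.

```python
def mse(y_true, y_pred):
    """Base loss: mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def l1_penalty(weights, lam):
    """Lasso penalty: lam * sum of absolute values of the weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Ridge penalty: lam * sum of squared weights."""
    return lam * sum(w ** 2 for w in weights)


weights = [0.5, -2.0, 0.0, 3.0]  # hypothetical model coefficients
y_true = [1.0, 2.0, 3.0]
y_pred = [1.1, 1.9, 3.2]
lam = 0.1

base = mse(y_true, y_pred)
lasso_loss = base + l1_penalty(weights, lam)  # 0.1 * 5.5   = 0.55 added
ridge_loss = base + l2_penalty(weights, lam)  # 0.1 * 13.25 = 1.325 added
print(base, lasso_loss, ridge_loss)
```

The L2 penalty is dominated by the largest weights (3.0 contributes 9.0), while the L1 penalty grows linearly with every weight.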
What does covariance tell you?
- Covariance is a measure of how changes in one variable are associated with changes in a second variable. - Specifically, covariance measures the degree to which two variables are linearly associated.
What is covariance divided by variance?
- Covariance measures how much two variables change in the same direction, i.e., whether they are correlated. - Dividing the covariance of x and y by the variance of the independent variable x (s_x²) gives the slope of the least-squares regression line: slope = cov(x, y) / s_x².
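This is a small worked check that cov(x, y) / var(x) gives the least-squares slope. The data are made up. The intercept line relies on the standard fact that the least-squares line passes through the point (mean x, mean y).

```python
def sample_cov(x, y):
    """Sample covariance (divides by n - 1); sample_cov(x, x) is the sample variance of x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope = sample_cov(x, y) / sample_cov(x, x)             # cov(x, y) / var(x)
intercept = sum(y) / len(y) - slope * sum(x) / len(x)   # line passes through the means
print(slope, intercept)  # approximately 0.6 and 2.2
```

The n - 1 divisors cancel in the ratio, so the population and sample versions give the same slope.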
What is meant by least square method?
- Least squares is a statistical method used to determine a line of best fit by minimizing the sum of the squared residuals, the differences between the observed values and the values predicted by the fitted function. - Each "square" is the squared vertical distance between a data point and the regression line.
What is Normal Distribution?
- The mean (average), median (middle), and mode are equal (in a sample, approximately equal). - The distribution curve is bell-shaped. - The curve is symmetric about its center, the mean. - The width of the curve is determined by the standard deviation of the distribution.
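This sketch shows how the standard deviation sets the width of the bell curve. It uses the normal density formula and `math.erf` to compute the share of values within k standard deviations of the mean. The 68-95-99.7 figures it reproduces are the well-known empirical rule, which is added here for illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the bell curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def prob_within(k):
    """P(|X - mu| < k * sigma) for a normal X; identical for every mu and sigma."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))  # about 0.6827, 0.9545, 0.9973

# A larger sigma gives a lower, wider curve (the total area stays 1).
print(normal_pdf(0, sigma=1), normal_pdf(0, sigma=2))
```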
Can variance and standard deviation be negative?
- Neither can be negative. - Variance is calculated by summing all the squared distances from the mean and dividing by the number of cases (n for a population, n - 1 for a sample). - Each squared distance is zero or positive, even when the difference from the mean is negative, so the variance cannot be negative. - Standard deviation cannot be negative because it is the non-negative square root of the variance.
What does standard deviation measure?
- Standard deviation measures how spread out values are around their mean. It is the most common measure of variability and is used to measure the volatility of stock markets or other investments.
Explain how a ROC curve works.
- The ROC curve is a graphical plot of the true positive rate against the false positive rate at various classification thresholds. - It is often used to show the trade-off between the sensitivity of the model (true positive rate) and the fall-out, the probability that it will trigger a false alarm (false positive rate).
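This is a minimal sketch of how ROC points are produced. It sweeps the threshold over every distinct score from high to low and records (false positive rate, true positive rate) at each step. The area-under-the-curve (AUC) summary is an addition: it is computed with the trapezoidal rule. The scores and labels are made up, and both classes are assumed to be present.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per threshold, from the strictest threshold down.
    labels: 1 = positive, 0 = negative."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]  # classifier confidence for "positive"
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(pts)
print(auc(pts))  # about 0.889 (8/9)
```

Lowering the threshold can only add predictions, so both rates rise together. The shape of the curve shows how much false-alarm rate each gain in sensitivity costs.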
What is covariance?
- Covariance is a calculation of the relationship between two variables. - Its sign shows the direction of the relationship; its magnitude depends on the units of the variables, so strength is easier to compare using correlation. - If one variable increases and the other also tends to increase, the covariance is positive. - If one variable goes up and the other tends to go down, the covariance is negative.
How will you define the number of clusters in a clustering algorithm?
- The k-means clustering algorithm attempts to split a given unlabeled data set (a set containing no information as to class identity) into a fixed number (k) of clusters. Initially, k so-called centroids are chosen. A centroid is a data point (imaginary or real) at the center of a cluster. - In general, there is no method for determining the exact value of k, but a good estimate can be obtained using the following techniques. One metric commonly used to compare results across different values of k is the mean distance between data points and their cluster centroid. - The optimal number of clusters can be estimated as follows (the elbow method): compute the clustering algorithm (e.g., k-means clustering) for different values of k. ... For each k, calculate the total within-cluster sum of squares (WSS). Plot the curve of WSS against the number of clusters k; the location of a bend (elbow) in the plot is generally taken as an indicator of the appropriate number of clusters.
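Here is a compact sketch of the elbow procedure: plain k-means (Lloyd's algorithm) run for several values of k, with the WSS recorded for each. The deterministic choice of initial centroids (points at evenly spaced positions in the input list) is an assumption made to keep the example reproducible. Real implementations usually use random or k-means++ initialization.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then move each
    centroid to the mean of its points; stop when the centroids no longer change."""
    centroids = [points[i * len(points) // k] for i in range(k)]  # simple deterministic init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new = [mean_point(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters


def wss(centroids, clusters):
    """Total within-cluster sum of squared distances to each cluster's centroid."""
    return sum(sq_dist(p, c) for c, members in zip(centroids, clusters) for p in members)


# Three well-separated groups of three points each (made-up data).
points = [(1, 1), (1, 2), (2, 1),
          (8, 8), (8, 9), (9, 8),
          (1, 8), (1, 9), (2, 8)]
scores = {k: wss(*kmeans(points, k)) for k in range(1, 5)}
for k, s in scores.items():
    print(k, round(s, 2))  # 1 200.0, 2 77.5, 3 4.0, 4 3.17
```

WSS drops steeply up to k = 3 and barely improves after that. That bend is the elbow.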
You want to forecast sales for your company, and you've concluded that your company's sales go up and down depending on changes in GDP. - Which are the dependent and independent variables?
- The sales you are forecasting are the dependent variable, because their value "depends" on the value of GDP. - GDP is the independent variable.
Are variance and standard deviation the same?
- No. The standard deviation is the square root of the variance. - The standard deviation is expressed in the same units as the mean, whereas the variance is expressed in squared units.
Why do we use standard deviation instead of variance?
- The standard deviation is the square root of the variance. - The standard deviation is expressed in the same units as the mean, whereas the variance is expressed in squared units, which makes the standard deviation easier to interpret. For describing a distribution you can use either, as long as you are clear about which one you are using.
Why is variance used?
- Variance is used in statistics to describe the spread of a probability distribution or data set. - Since variance measures variability (volatility) around the mean, and volatility is a measure of risk, the variance statistic can help determine the risk an investor might assume when purchasing a specific security.
What is variance?
- Variance measures the average squared difference between each point and the mean. - The greater the variance, the more spread out the data.
What are variance, covariance, and correlation?
- Variance refers to the spread of a data set: how far apart the numbers are in relation to the mean. - Covariance refers to the measure of how two random variables change together and is used to calculate the correlation between variables. - Correlation is the covariance divided by the product of the two standard deviations, a unitless value between -1 and +1.
How is correlation related to covariance?
- We standardize the covariance to make it easier to interpret and use in forecasting, and the result is the correlation. - The correlation calculation takes the covariance and divides it by the product of the standard deviations of the two variables. - Correlation values range from -1 to +1.
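The sketch below shows why the standardization matters. When one variable is rescaled (here, multiplied by 100 as if recorded in different units), the covariance changes by the same factor, but the correlation does not. Sample (n - 1) formulas are assumed, and the data are made up.

```python
import math


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    sx = math.sqrt(covariance(x, x))  # the covariance of x with itself is its variance
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_scaled = [v * 100 for v in y]  # the same measurements in different units

print(covariance(x, y), covariance(x, y_scaled))    # 1.5 150.0: covariance depends on units
print(correlation(x, y), correlation(x, y_scaled))  # both about 0.775: correlation is unitless
```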
Naive Bayes?
- A family of simple "probabilistic classifiers" based on Bayes' theorem, - with an assumption of independence among predictors. - In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. - Naive Bayes is considered "naive" because it makes an assumption that rarely holds in real-life data: the conditional probability is calculated as the plain product of the individual probabilities of the components. This implies complete independence of the features given the class, a condition almost never met in real life.
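Here is a minimal, count-based sketch of the idea for text. It assumes a multinomial model over words with add-one (Laplace) smoothing, which is one common variant among several. The per-word probabilities are multiplied (summed as logs) exactly as if words were independent given the class. The tiny training set is made up.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for words, label in zip(docs, labels):
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict_nb(model, words):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # log P(class) + sum of log P(word | class): the "naive" product of
        # per-word probabilities, taken in log space to avoid underflow.
        score = math.log(n_docs / total_docs)
        n_words = sum(word_counts[label].values())
        for w in words:
            # Add-one smoothing so an unseen word does not zero out the product.
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [
    "win cash prize now".split(),
    "cheap prize win".split(),
    "meeting schedule for monday".split(),
    "project meeting notes".split(),
]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(predict_nb(model, "win a prize".split()))             # spam
print(predict_nb(model, "monday project meeting".split()))  # ham
```

Training is nothing more than counting, which is why the method is fast and needs relatively little data.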
Confusion matrix?
- Also known as an error matrix or table of confusion. - For a binary classifier, the confusion matrix is a two-by-two table containing the four possible outcomes: true positives, false positives, false negatives, and true negatives. - It describes the performance of a classification model (or "classifier") on a set of test data for which the true values are known. - Accuracy alone is not a reliable metric for the real performance of a classifier, because it yields misleading results if the data set is unbalanced (that is, when the numbers of observations in different classes vary greatly).
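This small illustration shows the imbalance problem. On a made-up test set with 95 negatives and 5 positives, a useless classifier that always predicts "negative" scores 95% accuracy. The confusion matrix exposes the fact that it never finds a single positive.

```python
def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


y_true = [0] * 95 + [1] * 5  # imbalanced test set
y_pred = [0] * 100           # always predict the majority class

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)
print(tp, fp, fn, tn)    # 0 0 5 95
print(accuracy, recall)  # 0.95 0.0
```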
Least square method
- Finds the line of best fit for a dataset, providing a visual demonstration of the relationship between the data points. - "Least squares" means that the overall solution minimizes the sum of the squares of the residuals made in the results of every single equation. - The best fit in the least-squares sense minimizes the sum of squared residuals (a residual being the difference between an observed value and the fitted value provided by a model). - When the problem has substantial uncertainties in the independent variable (the x variable), simple regression and least-squares methods have problems; in such cases, the methodology required for fitting errors-in-variables models may be considered instead of that for least squares. - Least-squares problems fall into two categories, linear (ordinary) least squares and nonlinear least squares, depending on whether or not the residuals are linear in all unknowns. - The linear least-squares problem occurs in statistical regression analysis; it has a closed-form solution. - In the simplest case, linear (ordinary) least squares fits a straight line that minimizes the sum of the squared residuals, the differences between the observed values and the values predicted by the model. - The nonlinear problem is usually solved by iterative refinement; at each iteration the system is approximated by a linear one, so the core calculation is similar in both cases. - Polynomial least squares describes the variance in a prediction of the dependent variable as a function of the independent variable and the deviations from the fitted curve.
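The closed-form solution of the linear problem can be sketched with the normal equations, (XᵀX)β = Xᵀy, solved here by Gaussian elimination in plain Python. This is an illustration under stated assumptions: an intercept column is prepended, and the data follow y = 1 + 2·x1 + 3·x2 exactly, so the recovered coefficients can be checked.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y.
    A column of ones is prepended, so beta[0] is the intercept."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * a + 3 * b for a, b in X]  # exact linear relationship, no noise
print(ols(X, y))  # approximately [1.0, 2.0, 3.0]
```

Forming XᵀX is the textbook route, but it can be numerically fragile when the features are nearly collinear. Numerical libraries usually solve least squares with QR or SVD decompositions instead.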
Why is naive Bayes so naive?
Naive Bayes is so naive because it assumes that all of the features in a dataset are equally important and independent of each other given the class.
What are the three stages to build the hypotheses or model in machine learning?
1. Model building 2. Model testing 3. Applying the model
How do you find the standard deviation?
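1. Calculate the mean (the simple average of the numbers). 2. Then, for each number, subtract the mean and square the result. 3. Then work out the mean of those squared differences (this is the variance). 4. Take the square root of that, and we are done! (This gives the population standard deviation; for a sample, divide by n - 1 instead of n in step 3.)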
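The four steps translate directly into code. The example data are made up, and the result is cross-checked against the standard library's `statistics.pstdev` (population) and `statistics.stdev` (sample, n - 1).

```python
import math
import statistics


def std_dev(numbers):
    """Population standard deviation, one line per step."""
    mean = sum(numbers) / len(numbers)                  # 1. the mean
    squared_diffs = [(x - mean) ** 2 for x in numbers]  # 2. subtract the mean, square
    variance = sum(squared_diffs) / len(squared_diffs)  # 3. mean of the squared differences
    return math.sqrt(variance)                          # 4. square root


data = [600, 470, 170, 430, 300]
print(std_dev(data))            # about 147.32
print(statistics.pstdev(data))  # same value from the standard library
print(statistics.stdev(data))   # about 164.71: sample version, divides by n - 1
```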
What is the correlation formula in terms of covariance?
corr(x, y) = cov(x, y) / (standard deviation(x) * standard deviation(y))
What does high variance mean?
A _____ indicates that the data points are very spread out from the mean.
What does a high standard deviation mean?
A high standard deviation means that the data are widely spread around the statistical average, so the average is less representative of individual values.
What does perfect negative correlation mean?
A perfect ____ correlation (a correlation of -1) means the two variables have an exact linear relationship in which one always decreases as the other increases.
Why is covariance important?
A positive covariance means that assets generally move in the same direction. - Negative covariance means assets generally move in opposite directions. - Covariance is an important measurement used in modern portfolio theory (MPT). ... - This can reduce the volatility of the portfolio.
What are the assumptions of logistic regression?
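Logistic regression does not make many of the key assumptions of linear regression and general linear models that are based on ordinary least squares algorithms, particularly regarding linearity, normality, homoscedasticity, and measurement level. It does assume, among other things, a binary dependent variable (for binary logistic regression), independent observations, little or no multicollinearity among the independent variables, and linearity between the independent variables and the log odds.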
What is the relationship between covariance and correlation?
Both measure the relationship and the dependency between two random variables; correlation is the standardized, unitless form of covariance.
How do we measure the accuracy of a hypothesis function?
By using a cost function, usually denoted by J.
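This is a minimal sketch of J for a linear hypothesis h(x) = theta0 + theta1·x. It uses the squared-error form with a 1/(2m) factor. That convention is an assumption (the 1/2 only simplifies the derivative), and other texts use 1/m, which is plain mean squared error.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2),
    where h(x) = theta0 + theta1 * x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


xs = [1, 2, 3]
ys = [1, 2, 3]
print(cost(0, 1, xs, ys))    # 0.0: the hypothesis y = x fits perfectly
print(cost(0, 0.5, xs, ys))  # about 0.583: a worse hypothesis has a higher cost
```

The lower J is, the more accurate the hypothesis is on the data. Training means searching for the theta values that minimize it.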
To forecast sales, you would then need to determine the strength of the relationship between these two variables, sales and GDP. - If GDP increases or decreases by 1%, how much will your sales increase or decrease? - What formula do you use to calculate the relationship between two variables?
Covariance
What is a high covariance?
Covariance is a measure of the directional relationship between the returns on two risky assets. - A positive covariance means that asset returns move together while a negative covariance means returns move inversely.
What do you understand by the term Normal Distribution?
Data can be distributed in different ways: skewed to the left or to the right, or all jumbled up. However, data may also be distributed around a central value without any bias to the left or right, following a normal distribution in the form of a bell-shaped curve. The random variables are distributed in the form of a symmetrical bell-shaped curve.
What are the disadvantages of decision trees?
Decision trees are prone to overfitting. However, this can be mitigated by ensemble methods like random forests or boosted trees.
Example of Least Squares Method
For example, an analyst may want to test the relationship between a company's stock returns and the returns of the index for which the stock is a component. In this example, the analyst seeks to test the dependence of the stock returns on the index returns. To do this, all of the returns are plotted on a chart. The index returns are then designated as the independent variable, and the stock returns are the dependent variable. The line of best fit provides the analyst with coefficients explaining the level of dependence.
Why is Overfitting called high variance?
High variance means that your estimator (or learning algorithm) varies a lot depending on the data that you give it. ... This type of high variance is called ____. Thus usually ____ is related to high variance. This is bad because it means your algorithm is probably not robust to noise for example.
What does no correlation mean?
If there is no correlation between x and y, there is no linear relationship between the two variables. - It does not rule out a nonlinear relationship: x and y can have zero correlation and still be strongly dependent.
What are the advantages of Naive Bayes?
If its conditional independence assumption holds, a Naïve Bayes classifier will converge quicker than discriminative models like logistic regression, so you need less training data. Its main disadvantage is that it can't learn interactions between features.
What does high bias mean?
In machine learning terminology, underfitting means that a model is too general, leading to ____ ____, while overfitting means that a model is too specific, leading to high variance. ... - Since you can't realistically eliminate both bias and variance at once, balancing the two is called the bias-variance tradeoff.
What is perfect correlation?
In statistics, a ___ correlation is represented by 1, while 0 indicates no correlation, and negative 1 indicates a perfect negative correlation.
What is naive Bayes used for?
It has been successfully used for many purposes, but it works particularly well with natural language processing (NLP) problems.
What is the difference between data mining and machine learning?
Machine learning relates to the study, design, and development of algorithms that give computers the capability to learn without being explicitly programmed. Data mining, by contrast, can be defined as the process of extracting knowledge or previously unknown, interesting patterns from (often unstructured) data. Machine learning algorithms are used during this process.
How do you explain a negative correlation?
Negative correlation is a relationship between two variables in which one variable increases as the other decreases, and vice versa. In statistics, a perfect negative correlation is represented by the value -1.00, a 0.00 indicates no correlation, and a +1.00 indicates a perfect positive correlation.
Can correlation be negative?
Yes. Negative correlation means that there is an inverse relationship between two variables: when one variable decreases, the other increases. - The reverse, where one variable increases and the other decreases, is a negative correlation too.
What are the disadvantages of neural networks?
Neural networks require a large amount of training data to converge. It's also difficult to pick the right architecture, and the internal "hidden" layers are hard to interpret.
Another name of Bell Curve?
Normal Distribution
What is OLS regression?
Ordinary least squares (OLS) regression is a tool commonly used in forecasting and financial analysis. - OLS is a type of linear least squares method for estimating the unknown parameters in a linear regression model. - OLS chooses the parameters of a linear function of a set of explanatory variables by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable (values of the variable being predicted) in the given dataset and those predicted by the linear function.
What is decision tree pruning?
Pruning is a technique in machine learning that reduces the size of decision trees by removing sections of the tree that provide little power to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting.
Why is regularization important?
Regularization is an important concept in machine learning because it helps reduce overfitting. Understanding it is essential for training a good model.
What is regularization?
Regularization refers to methods of preventing overfitting by explicitly controlling model complexity. It smooths the regression line by penalizing large coefficients, which would otherwise let the line bend to closely match noisy data points.
What does ridge regression do?
Ridge Regression is a technique for analyzing multiple regression data that suffer from multicollinearity. ... By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors.
Why do we use standard deviation?
Standard deviation is a number used to tell how measurements for a group are spread out from the average (mean), or expected value. - A low standard deviation means that most of the numbers are very close to the average. - A high standard deviation means that the numbers are spread out.
What does the mean and standard deviation tell you?
- The mean tells you the center (the average) of the data. - The standard deviation tells you how spread out the measurements are around that mean: a low standard deviation means that most of the numbers are very close to the average, and a high standard deviation means that the numbers are spread out.
What are the advantages of naive Bayes?
Super simple, you're just doing a bunch of counts. If the NB conditional independence assumption actually holds, a Naive Bayes classifier will converge quicker than discriminative models like logistic regression, so you need less training data.
What is F1 score?
The ___ is the harmonic mean of the precision and recall scores. - Formula: F1 = 2 * (Precision * Recall) / (Precision + Recall). - The F1 measure penalizes classifiers with imbalanced precision and recall scores, like the trivial classifier that always predicts the positive class. - A model with perfect precision and recall scores will achieve an F1 score of one.
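This is a small worked example of the formula, including the trivial always-positive classifier. It assumes a data set with 10% positives, so that classifier has recall 1.0 and precision 0.1.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1_score(0.1, 1.0))  # about 0.18, far below the arithmetic mean of 0.55
print(f1_score(1.0, 1.0))  # 1.0
```

The harmonic mean is pulled toward the smaller of the two scores, which is exactly what penalizes the imbalanced case.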
What does the mean represent?
The ____ doesn't necessarily represent the middle of the data, and instead represents the average score, including all outlying data points.
What are the advantages decision trees?
The advantages of decision trees are: • Decision trees are easy to interpret • Nonparametric • There are relatively few parameters to tune
What are the advantages of Naive Bayes?
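The advantages of Naive Bayes are: • The classifier will converge quicker than discriminative models (if the independence assumption holds), so it needs less training data • It is simple: training is mostly counting. Its main limitation is that it cannot learn the interactions between features.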
Why does variance use squares (and standard deviation the square root of variance)?
The calculation of variance uses squares because it weights outliers more heavily than data very near the mean. - Squaring also prevents differences above the mean from canceling out those below: the unsquared differences from the mean always sum to zero. - Taking the square root afterwards returns the standard deviation to the original units.
What is the difference between L1 and L2 regularization?
The differences between L1 and L2 regularization are as follows: • L1 (Laplace prior) tends to tolerate both large values and very small values of coefficients more than L2 (Gaussian prior) • L1 can yield sparse models, while L2 doesn't • Both L1 and L2 regularization prevent overfitting by shrinking the coefficients • L2 (Ridge) shrinks all the coefficients but eliminates none, while L1 (Lasso) can shrink some coefficients to exactly zero, performing variable selection • The L1 norm is the sum of absolute values; for two points it is the distance |x1 - x2| (Manhattan distance in several dimensions). The L2 norm is the square root of the sum of squares (Euclidean distance), and the L2 penalty uses its square, |x1 - x2|^2 • L2 regularization tends to spread error among all the terms, while L1 is more binary/sparse
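The "shrink to zero" versus "shrink but keep" difference is easiest to see with a single feature and no intercept. That simplification is assumed here so that both penalized problems have closed-form solutions. Ridge divides by (Σx² + λ), so the weight gets smaller but never reaches zero. Lasso applies soft-thresholding, which returns exactly 0 once λ ≥ |Σxy|. The data are made up.

```python
def ridge_weight(x, y, lam):
    """Minimizer of 0.5 * sum((y - w*x)^2) + 0.5 * lam * w^2 (one feature, no intercept)."""
    s = sum(a * b for a, b in zip(x, y))
    t = sum(a * a for a in x)
    return s / (t + lam)


def lasso_weight(x, y, lam):
    """Minimizer of 0.5 * sum((y - w*x)^2) + lam * |w| (one feature, no intercept):
    soft-thresholding, exactly 0 once lam >= |sum(x*y)|."""
    s = sum(a * b for a, b in zip(x, y))
    t = sum(a * a for a in x)
    shrunk = max(abs(s) - lam, 0.0)
    return (shrunk if s >= 0 else -shrunk) / t


x = [1.0, 2.0, 3.0]
y = [0.5, 0.9, 1.6]
for lam in (0.0, 2.0, 10.0):  # lam = 0 is ordinary least squares for both
    print(lam, ridge_weight(x, y, lam), lasso_weight(x, y, lam))
```

At lam = 10 the ridge weight is still positive (about 0.30), while the lasso weight is exactly 0.0. With many features, this is how L1 performs variable selection.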
What are the disadvantages of Naive Bayes?
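The disadvantages of Naive Bayes are: • Problems arise with continuous features, which must be discretized or modeled with an assumed distribution (e.g., Gaussian) • It makes a very strong assumption about the shape of your data distribution (features independent given the class) • Data scarcity can produce zero probability estimates for feature values never seen with a class, which is why smoothing is used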
Why is the regression line called the Line of Best Fit?
The regression line is called the "line of best fit" because, of all possible lines, it minimizes the sum of the squared vertical distances (residuals) between the data points and the line.
How do you avoid overfitting in decision trees?
There are two main approaches to avoiding overfitting when building decision trees: pre-pruning, which stops growing the tree early, before it perfectly classifies the training set; and post-pruning, which allows the tree to perfectly classify the training set and then prunes it back.
What is pre-pruning?
Pre-pruning (early stopping) halts tree growth before the tree perfectly classifies the training set, for example by limiting the maximum depth or requiring a minimum number of samples to split a node. It contrasts with post-pruning, which grows the full tree first and then prunes it.
How can you assess a good logistic model?
There are various methods to assess the results of a logistic regression analysis: • A classification (confusion) matrix, to look at the true and false positives and negatives. • Concordance, which measures the ability of the logistic model to discriminate between the event happening and not happening. • Lift, which assesses the logistic model by comparing it with random selection.
What are the advantages of neural networks?
Neural networks have led to performance breakthroughs on unstructured data sets such as images, audio, and video. Their flexibility allows them to learn patterns that other machine learning algorithms often cannot.
What is lambda in regularization?
When we fit a high-degree polynomial in a linear regression setup, we use regularization to prevent overfitting by adding a penalty term, weighted by a parameter lambda, to the cost function. Lambda controls the strength of the penalty: larger values shrink the theta parameters more. Because the penalty is part of the cost, it also appears in the gradient descent update for theta.
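Below is a minimal sketch of how lambda enters the gradient descent updates. It assumes a straight-line hypothesis (rather than a high-degree polynomial) to keep the example short, a cost of 1/(2m)·[Σ(h(x) - y)² + λ·theta1²], and the common convention that the intercept theta0 is not penalized. The learning rate and iteration count are illustrative choices.

```python
def fit_ridge_gd(xs, ys, lam=0.0, alpha=0.1, iters=2000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x on the cost
    J = 1/(2m) * [sum((h(x) - y)^2) + lam * theta1^2]; theta0 is not penalized."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        # lambda adds an extra pull of lam * theta1 / m toward zero on the penalized weight
        grad1 = (sum(e * x for e, x in zip(errors, xs)) + lam * theta1) / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2x
print(fit_ridge_gd(xs, ys, lam=0.0))   # close to (1.0, 2.0)
print(fit_ridge_gd(xs, ys, lam=10.0))  # close to (3.0, 0.667): slope pulled toward 0
```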
In k-NN classification, the output is a ____.
class membership
What does a covariance of 0 mean?
Zero covariance - if the two random variables are independent, the covariance will be zero. However, a covariance of zero does not necessarily mean that the variables are independent. A nonlinear relationship can exist that still would result in a covariance value of zero.
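Here is a tiny counterexample to "zero covariance implies independence". In it, y is completely determined by x (y = x²), yet the covariance is exactly 0 because the relationship is symmetric rather than linear. The example uses the population formula (divide by n).

```python
def covariance(x, y):
    """Population covariance (divides by n)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # y depends entirely on x
print(covariance(x, y))  # 0.0
```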
What is bias (1) and variance (2) in machine learning?
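__(1)__ is the error introduced by overly simple assumptions: the gap between the model's average prediction and the true value. __(2)__ shows how much the model's predictions change when it is trained on different samples of data, i.e., how sensitive it is to noise in the training set. The expected mean squared error (MSE) is the sum of squared __(1)__, __(2)__, and irreducible noise.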
knn is parametric/non parametric method?
___ is non parametric method.
knn is used for ___ and ___?
___ is used for both regression and classification.
Type I error?
___ occurs when the null hypothesis (H0) is true, but is rejected.
Univariate
___ refers to a function of only one variable.
What is a residual?
____ = y - ŷ. The ___ is the difference between an observed value of the response variable and the value predicted by the regression line.
Multivariate
____ refers to analysis involving more than two variables.
What does R Squared mean?
____ is a statistical measure of how close the data are to the fitted regression line. - It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.
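This minimal sketch uses the usual definition R² = 1 - SS_res / SS_tot. Here SS_res sums the squared residuals (y - ŷ) and SS_tot sums the squared distances of y from its mean. The predictions are made up to show the three reference cases.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # squared residuals y - y_hat
    ss_tot = sum((a - mean_y) ** 2 for a in y)            # total variation around the mean
    return 1 - ss_res / ss_tot


y = [3, 5, 7, 9]
print(r_squared(y, [3, 5, 7, 9]))          # 1.0: perfect fit
print(r_squared(y, [6, 6, 6, 6]))          # 0.0: no better than predicting the mean
print(r_squared(y, [2.5, 5.5, 6.5, 9.5]))  # about 0.95: close to the line
```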
Type II error ?
____ occurs when the null hypothesis is false, but erroneously fails to be rejected.
Type I error is denoted as ___?
alpha
Type II error is denoted as ___?
beta
A type II error may be compared with a so-called ____ (where an actual 'hit' was disregarded by the test and seen as a 'miss')
false negative
A type I error may be likened to a so-called ____ (a result that indicates that a given condition is present when it actually is not present).
false positive
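What does density estimation mean?
Density estimation is the construction, from observed data, of an estimate of an underlying probability density function: a function that describes the relative likelihood for a random variable to take on a given value.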
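The sketch below uses kernel density estimation with Gaussian kernels, a common technique chosen here for illustration (it is different from the EM/mixture approach). The estimate at x is the average of small bell curves centred on each observed sample. The bandwidth of 0.5 and the sample values are assumptions.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density at x as the average of Gaussian
    bumps (standard deviation = bandwidth) centred on each sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


samples = [1.0, 1.2, 1.9, 2.1, 2.2, 5.0]
f = gaussian_kde(samples, bandwidth=0.5)
print(f(2.0), f(3.5))  # higher where samples cluster, lower in the gap
```

Because each bump integrates to 1 and the bumps are averaged, the estimate is itself a valid density. The bandwidth trades smoothness against detail.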
What is knn?
k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. - Input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression: - In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor. - In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
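Here is a minimal sketch of both k-NN modes as described: a majority vote among the k nearest neighbours for classification, and the average of their values for regression. It assumes Euclidean distance (`math.dist`, Python 3.8+). The training points are made up.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """train: list of (point, target) pairs. Return the k pairs closest to query."""
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]


def knn_classify(train, query, k):
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]  # majority vote


def knn_regress(train, query, k):
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / k  # average of the neighbours' values


labelled = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
            ((6, 6), "blue"), ((6, 7), "blue"), ((7, 6), "blue")]
print(knn_classify(labelled, (2, 2), k=3))  # red

priced = [((1,), 10.0), ((2,), 12.0), ((3,), 14.0), ((10,), 50.0)]
print(knn_regress(priced, (2.5,), k=2))  # 13.0 (mean of 12.0 and 14.0)
```

There is no training step; all the work happens at query time, which is why k-NN is called non-parametric and lazy. An odd k avoids ties in two-class votes.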
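What is the formula for the slope of a regression line?
m = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)², which equals cov(x, y) / var(x). (The two-point formula m = (y2 - y1)/(x2 - x1) gives the slope of a line through two points; it matches the regression slope only when there are exactly two data points.)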
In k-NN regression, the output is the ____ for the object.
property value
discriminative
clear, distinct (clearly separating)
Is naive Bayes supervised or unsupervised?
supervised
standard deviation is the square root of the _____?
variance