Data Science Definitions

What is not Machine Learning?

a) Artificial Intelligence b) Rule based inference

What is running time?

The number of computational steps an algorithm takes. It matters because we want to know how the number of steps scales as the size of the input grows. Note: it is not literally the wall-clock time needed to execute the algorithm.

When should you use classification over regression?

-Classification produces discrete values and assigns data points to strict categories. -Regression gives you continuous results that let you better distinguish differences between individual points. -Use classification over regression if you want your results to reflect the membership of data points in certain explicit categories.

Bias

-Definition: error due to erroneous or overly simplistic assumptions in the algorithm. -Result: underfitting your data, which lowers predictive accuracy and makes it harder to generalize from the training set to the test set.

Variance

-Definition: error due to too much complexity in the algorithm. -Result: the algorithm is highly sensitive to variation in your training data, which can lead your model to overfit: it carries too much noise from the training data to be very useful on the test data.

What's the F1 score? How would you use it?

-It is the harmonic mean of a model's precision and recall, with results tending to 1 being the best and those tending to 0 being the worst. -You would use it in classification tests where true negatives don't matter much.
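
As a minimal sketch, F1 can be computed from confusion-matrix counts as 2PR / (P + R), where P is precision and R is recall. The function name and the counts below are hypothetical, chosen only for illustration. Note that true negatives never enter the formula.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives and false negatives."""
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 0.8, recall = 0.5
score = f1_score(tp=8, fp=2, fn=8)
print(round(score, 3))
```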

Explain the difference between L1 and L2 regularization.

-L2 regularization tends to spread the penalty among all the terms, shrinking every weight a little, while L1 is sparser: it drives many weights to exactly 0 and keeps only a few variables. -L1 corresponds to setting a Laplace prior on the terms, while L2 corresponds to a Gaussian prior.
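
The difference in sparsity can be illustrated on a single weight. This is a sketch under a simplifying assumption: each weight w is fitted independently to a target z with a squared loss, so the penalized optimum has a closed form. The function names and numbers are hypothetical.

```python
def l1_shrink(z: float, lam: float) -> float:
    """Minimizer of 0.5 * (w - z)**2 + lam * abs(w), i.e. soft thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small weights are set exactly to zero


def l2_shrink(z: float, lam: float) -> float:
    """Minimizer of 0.5 * (w - z)**2 + 0.5 * lam * w**2."""
    return z / (1 + lam)  # every weight is scaled down, none becomes zero


unpenalized = [3.0, 0.4, -0.2, -2.5]
l1_weights = [l1_shrink(z, 0.5) for z in unpenalized]
l2_weights = [l2_shrink(z, 0.5) for z in unpenalized]
print(l1_weights)
print(l2_weights)
```

With L1 the two small weights become exactly zero, while L2 keeps all four weights non-zero but smaller.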

What's the difference between probability and likelihood?

-Probability is the chance of an outcome given fixed model parameters. Ex: for a fair coin (p = 0.5), the probability of heads on one toss is 0.5. -Likelihood treats the observed data as fixed and asks how well a parameter value explains it. Ex: after observing 7 heads in 10 tosses, the likelihood of p = 0.5 is the probability of that result computed under p = 0.5, and it can be compared with the likelihood of other values of p.

How is a decision tree pruned?

-When certain branches of a decision tree have weak predictive power, we remove them to reduce the complexity of the model and increase its predictive accuracy. -Reduced error pruning is perhaps the simplest version: replace each node with its most popular class. If that doesn't decrease predictive accuracy, keep it pruned.

What are the main factors in analyzing an algorithm?

-Correctness: typically shown with a proof by induction.

What's the difference between a generative and discriminative model?

-Generative model: learns the categories of data. -Discriminative model: learns the distinction between different categories of data. Note: discriminative models will generally outperform generative models on classification tasks.

How do you ensure you're not overfitting with a model?

1- Keep the model simpler: reduce variance by taking into account fewer variables and parameters, thereby removing some of the noise in the training data. 2- Use cross-validation techniques such as k-fold cross-validation. 3- Use regularization techniques such as LASSO that penalize certain model parameters if they're likely to cause overfitting.

How can you avoid overfitting?

1. Use a lot of data. 2. Use cross-validation.

What's a Fourier transform?

A Fourier transform is a generic method to decompose generic functions into a superposition of symmetric functions. Ex: given a pie, it's how we find the recipe. -The Fourier transform finds the set of cycle speeds, amplitudes and phases to match any time signal. A Fourier transform converts a signal from time to frequency domain — it's a very common way to extract features from audio signals or other time series such as sensor data.
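
To make the time-to-frequency conversion concrete, here is a naive discrete Fourier transform written with the standard library only. It is a sketch for intuition, not an efficient FFT; the 8-sample sine wave is a made-up signal that completes 2 cycles in the window.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a list of samples."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


n = 8
# Hypothetical time signal: a sine wave with 2 cycles per window.
signal = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]

magnitudes = [abs(c) for c in dft(signal)]
# Only look at the first half of the spectrum; the rest mirrors it for real signals.
dominant = max(range(n // 2 + 1), key=lambda k: magnitudes[k])
print(dominant)
```

The dominant frequency bin is the kind of feature that is often extracted from audio or sensor data.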

What is a classifier in machine learning?

A classifier in machine learning is a system that takes a vector of discrete or continuous feature values as input and outputs a single discrete value, the class.

How would you implement a recommendation system for our company's users?

A lot of machine learning interview questions of this type will involve implementation of machine learning models to a company's problems. You'll have to research the company and its industry in-depth, especially the revenue drivers the company has, and the types of users the company takes on in the context of the industry it's in.

How would you simulate the approach AlphaGo took to beat Lee Sedol at Go?

AlphaGo beating Lee Sedol, the best human player at Go, in a best-of-five series was a truly seminal event in the history of machine learning and deep learning. The Nature paper on AlphaGo describes how this was accomplished with "Monte-Carlo tree search with deep neural networks that have been trained by supervised learning, from human expert games, and by reinforcement learning from games of self-play."

How would you handle an imbalanced dataset?

In an imbalanced dataset, most of the examples belong to one class, for example 90% of the data in a classification test. An accuracy of 90% can then be misleading if you have no predictive power on the other category of data! Here are a few tactics to get over the hump: 1- Collect more data to even out the imbalances in the dataset. 2- Resample the dataset to correct for imbalances. 3- Try a different algorithm altogether on your dataset.
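
Resampling (tactic 2) can be sketched as random oversampling of the minority class. This is one possible approach among several, written with the standard library; the `oversample` helper, the labels and the seed are hypothetical.

```python
import random
from collections import Counter


def oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until every class is equally large."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical 90/10 imbalance
rows = list(range(10))
labels = ["ok"] * 9 + ["fraud"]
new_rows, new_labels = oversample(rows, labels)
counts = Counter(new_labels)
print(counts)
```

Resampling is usually applied to the training split only, so that the test set still reflects the real class balance.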

What are some differences between a linked list and an array?

An array is an ordered collection of objects. A linked list is a series of objects with pointers that direct how to process them sequentially. An array assumes that every element has the same size, unlike the linked list. A linked list can more easily grow organically: an array has to be pre-defined or re-defined for organic growth. Shuffling a linked list involves changing which pointers point where, whereas shuffling an array is more complex and takes more memory.

Say a flu test comes back positive 60% of the time for people who actually have the flu, but it also comes back positive 50% of the time for people who don't, and the overall population has only a 5% chance of having the flu. Would you actually have a 60% chance of having the flu after a positive test?

Bayes' Theorem says no. It says that you have a (0.6 * 0.05) / ((0.6 * 0.05) + (0.5 * 0.95)) = 0.0594, or 5.94%, chance of having the flu. Here 0.6 * 0.05 is the true positive rate of the condition sample and 0.5 * 0.95 is the false positive rate of the population.
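
The same arithmetic can be checked in a few lines. This is a sketch; the function and parameter names are hypothetical, and the numbers are the ones used in the card.

```python
def posterior(prior: float, true_positive_rate: float, false_positive_rate: float) -> float:
    """P(condition | positive test) via Bayes' theorem."""
    true_positives = true_positive_rate * prior
    false_positives = false_positive_rate * (1 - prior)
    return true_positives / (true_positives + false_positives)


p_flu = posterior(prior=0.05, true_positive_rate=0.6, false_positive_rate=0.5)
print(round(p_flu, 4))
```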

What are Bayesian Networks (BN)?

A Bayesian network is a graphical model that represents the probabilistic relationships among a set of variables.

What are the two components of a Bayesian logic program?

A Bayesian logic program consists of two components. The first component is a logical one: it consists of a set of Bayesian clauses, which capture the qualitative structure of the domain. The second component is a quantitative one: it encodes the quantitative information about the domain.

Which is more important to you- model accuracy, or model performance?

Depends. -Ex: if you wanted to detect fraud in a massive dataset with millions of samples, the most accurate model would most likely predict no fraud at all if only a tiny minority of cases were fraud. However, that would be useless as a predictive model: a model designed to find fraud that asserted there was no fraud at all!
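
The fraud example can be reproduced with made-up numbers. In this sketch, 1% of 1,000 hypothetical transactions are fraud and the "model" always predicts no fraud; accuracy looks excellent while recall on the fraud class is zero.

```python
# Hypothetical labels: 1 = fraud, 0 = legitimate (1% fraud)
labels = [1] * 10 + [0] * 990
# A degenerate model that never flags fraud
predictions = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
caught = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
recall = caught / sum(labels)

print(accuracy, recall)
```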

What is the difference between artificial intelligence and machine learning?

Machine learning is the design and development of algorithms whose behaviour is learned from empirical data. Artificial intelligence includes machine learning but also covers other aspects such as knowledge representation, natural language processing, planning and robotics.

Why is ensemble learning used?

Ensemble learning is used to improve the classification, prediction, function approximation etc of a model.

When to use ensemble learning?

Ensemble learning is used when you can build component classifiers that are accurate and independent of each other.

Name an example where ensemble techniques might be useful.

Ensemble techniques use a combination of learning algorithms to obtain better predictive performance. -They reduce overfitting and make the model more robust. -Ex: bagging, boosting, the "bucket of models" method.

What are the functions of 'Supervised Learning'?

a) Classifications b) Speech recognition c) Regression d) Predict time series e) Annotate strings

What is Genetic Programming?

Genetic programming is one of the two techniques used in machine learning. The model is based on testing candidates and selecting the best choice among a set of results.

What is dimension reduction in Machine Learning?

In machine learning and statistics, dimension reduction is the process of reducing the number of random variables under consideration. It can be divided into feature selection and feature extraction.

What is Perceptron in Machine Learning?

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers: it decides whether an input belongs to a given class using a linear function of the input's features.

What are the advantages of Naive Bayes?

A Naive Bayes classifier will converge more quickly than discriminative models such as logistic regression, so you need less training data. Its main disadvantage is that it can't learn interactions between features.

What is 'Overfitting' in Machine learning?

In machine learning, overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. It is normally observed when a model is excessively complex, with too many parameters relative to the number of training examples. A model that has been overfit exhibits poor predictive performance.

What is 'Training set' and 'Test set'?

In areas of information science such as machine learning, the set of data used to discover a potentially predictive relationship is known as the 'training set'. The training set is the set of examples given to the learner, while the test set is the set of examples held back from the learner and used to test the accuracy of the hypotheses it generates. The training set is distinct from the test set.

What is an Incremental Learning algorithm in ensemble?

Incremental learning is the ability of an algorithm to learn from new data that becomes available after a classifier has already been generated from the existing dataset.

What is Inductive Logic Programming in Machine Learning?

Inductive Logic Programming (ILP) is a subfield of machine learning that uses logic programming to represent background knowledge and examples.

Why are instance-based learning algorithms sometimes referred to as lazy learning algorithms?

Instance-based learning algorithms are also referred to as lazy learning algorithms because they delay the induction or generalization process until classification is performed.

How is KNN different from k-means clustering?

KNN is a supervised classification algorithm and needs labeled points, while k-means is an unsupervised clustering algorithm and doesn't.

What is algorithm-independent machine learning?

Machine learning whose mathematical foundations are independent of any particular classifier or learning algorithm is referred to as algorithm-independent machine learning.

Where do you usually source datasets?

Machine learning interview questions like these try to get at the heart of your machine learning interest. Somebody who is truly passionate about machine learning will have gone off and done side projects on their own, and have a good idea of what great datasets are out there. If you're missing any, check out Quandl for economic and financial data, and Kaggle's Datasets collection for another great list.

How do you think Google is training data for self-driving cars?

Machine learning interview questions like this one really test your knowledge of different machine learning methods, and your inventiveness if you don't know the answer. Google is currently using reCAPTCHA to source labelled data on storefronts and traffic signs. They are also building on training data collected by Sebastian Thrun at GoogleX, some of which was obtained by his grad students driving buggies on desert dunes!

What is Machine learning?

Machine learning is a branch of computer science that deals with programming systems so that they automatically learn and improve with experience. For example, robots are programmed so that they can perform tasks based on data they gather from sensors. The system automatically learns programs from data.

What is the difference between data mining and machine learning?

Machine learning is the study, design and development of algorithms that give computers the capability to learn without being explicitly programmed. Data mining can be defined as the process of extracting knowledge or previously unknown interesting patterns from unstructured data. Machine learning algorithms are used during this process.

What's your favorite algorithm, and can you explain it to me in less than a minute?

Make sure you have a choice and make sure you can explain different algorithms so simply and effectively that a five-year-old could grasp the basics!

Why is Bayes Theorem useful for machine learning?

Bayes' Theorem gives you the posterior probability of an event given prior knowledge. Mathematically, it's expressed as the true positive rate of a condition sample divided by the sum of the false positive rate of the population and the true positive rate of a condition.

What is PAC Learning?

PAC (Probably Approximately Correct) learning is a learning framework that has been introduced to analyze learning algorithms and their statistical efficiency.

What are PCA, KPCA and ICA used for?

PCA (Principal Component Analysis), KPCA (Kernel-based Principal Component Analysis) and ICA (Independent Component Analysis) are important feature extraction techniques used for dimensionality reduction.

In what areas is pattern recognition used?

Pattern recognition can be used in a) Computer Vision b) Speech Recognition c) Data Mining d) Statistics e) Information Retrieval f) Bioinformatics

Which data visualization libraries do you use? What are your thoughts on the best data visualization tools?

Popular tools include R's ggplot, Python's seaborn and matplotlib, and tools such as Plot.ly and Tableau.

What is sequence learning?

Sequence learning is a method of teaching and learning in a logical manner.

What is batch statistical learning?

Statistical learning techniques allow learning a function or predictor from a set of observed data that can make predictions about unseen or future data. These techniques provide guarantees on the performance of the learned predictor on the future unseen data based on a statistical assumption on the data generating process.

What are support vector machines?

Support vector machines are supervised learning algorithms used for classification and regression analysis.

What's the "kernel trick" and how is it useful?

The kernel trick involves kernel functions that let you operate in higher-dimensional spaces without explicitly calculating the coordinates of points in those spaces: instead, kernel functions compute the inner products between the images of all pairs of data in a feature space. This gives them the very useful attribute of working as if in higher dimensions while being computationally cheaper than explicitly calculating those coordinates. Many algorithms can be expressed in terms of inner products. Using the kernel trick lets us effectively run such algorithms in a high-dimensional space with lower-dimensional data.
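
A small check makes the idea tangible. For 2-D inputs, the degree-2 polynomial kernel k(a, b) = (a · b)^2 equals the inner product of the explicit 3-D feature maps (x1^2, sqrt(2)·x1·x2, x2^2). The sketch below compares both routes; the function names and sample points are hypothetical.

```python
import math


def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def poly_kernel(a, b):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(a, b) ** 2


def feature_map(v):
    """Explicit 3-D feature map whose inner product matches poly_kernel."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


a, b = (1.0, 2.0), (3.0, 4.0)
implicit = poly_kernel(a, b)                     # never builds the 3-D vectors
explicit = dot(feature_map(a), feature_map(b))   # builds them, then multiplies
print(implicit, explicit)
```

Both routes give the same value up to floating-point rounding, but the kernel needs only one 2-D inner product.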

How would you approach the "Netflix Prize" competition?

The Netflix Prize was a famed competition where Netflix offered $1,000,000 for a better collaborative filtering algorithm. The winning team, called BellKor, achieved a 10% improvement and used an ensemble of different methods to win. Some familiarity with the case and its solution will help demonstrate you've paid attention to machine learning for a while.

In which areas of robotics and information processing do sequential prediction problems arise?

Sequential prediction problems arise in a) Imitation learning b) Structured prediction c) Model-based reinforcement learning

If a classifier assigns labels at random, how many true positives and true negatives does it have?

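On average, its true positive rate equals its false positive rate, which is the straight diagonal of the ROC curve. The raw counts of true positives and true negatives depend on the class balance and on how often the classifier predicts each label.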

What is the difference between heuristic for rule learning and heuristics for decision trees?

The difference is that the heuristics for decision trees evaluate the average quality of a number of disjoint sets, while rule learners only evaluate the quality of the set of instances covered by the candidate rule.

List down various approaches for machine learning?

The different approaches in Machine Learning are a) Concept Vs Classification Learning b) Symbolic Vs Statistical Learning c) Inductive Vs Analytical Learning

What are the different methods for Sequential Supervised Learning?

The different methods to solve Sequential Supervised Learning problems are a) Sliding-window methods b) Recurrent sliding windows c) Hidden Markov models d) Maximum entropy Markov models e) Conditional random fields f) Graph transformer networks

What are the different Algorithm techniques in Machine Learning?

The different types of techniques in Machine Learning are a) Supervised Learning b) Unsupervised Learning c) Semi-supervised Learning d) Reinforcement Learning e) Transduction f) Learning to Learn

What is bias-variance decomposition of classification error in ensemble method?

The expected error of a learning algorithm can be decomposed into bias and variance. A bias term measures how closely the average classifier produced by the learning algorithm matches the target function. The variance term measures how much the learning algorithm's prediction fluctuates for different training sets.

What is the general principle of an ensemble method and what is bagging and boosting in ensemble method?

The general principle of an ensemble method is to combine the predictions of several models built with a given learning algorithm in order to improve robustness over a single model. Bagging is an ensemble method for improving unstable estimation or classification schemes. Boosting methods are applied sequentially to reduce the bias of the combined model. Both boosting and bagging can reduce errors by reducing the variance term.

What are the components of relational evaluation techniques?

The important components of relational evaluation techniques are a) Data Acquisition b) Ground Truth Acquisition c) Cross Validation Technique d) Query Type e) Scoring Metric f) Significance Test

What is inductive machine learning?

The inductive machine learning involves the process of learning by examples, where a system, from a set of observed instances tries to induce a general rule.

Why does overfitting happen?

The possibility of overfitting exists because the criterion used for training the model is not the same as the criterion used to judge its efficacy.

What is Model Selection in Machine Learning?

The process of selecting models among different mathematical models, which are used to describe the same data set is known as Model Selection. Model selection is applied to the fields of statistics, machine learning and data mining.

Give a popular application of machine learning that you see on day to day basis?

The recommendation engine implemented by major ecommerce websites uses Machine Learning

What are the three stages to build the hypotheses or model in machine learning?

The standard approach to supervised learning is to split the set of examples into a training set and a test set.

What are the two methods used for the calibration in Supervised Learning?

The two methods used for predicting good probabilities in supervised learning are a) Platt calibration b) Isotonic regression. These methods are designed for binary classification, and extending them to multiclass problems is not trivial.

What are the two paradigms of ensemble methods?

The two paradigms of ensemble methods are a) Sequential ensemble methods b) Parallel ensemble methods

What are two techniques of Machine Learning?

The two techniques of Machine Learning are a) Genetic Programming b) Inductive Learning

How can we use your machine learning skills to generate revenue?

This is a tricky question. The ideal answer would demonstrate knowledge of what drives the business and how your skills could relate. For example, if you were interviewing for music-streaming startup Spotify, you could remark that your skills at developing a better recommendation model would increase user retention, which would then increase revenue in the long run. Reading up on startup metrics will help you understand exactly which performance indicators are important for startups and tech companies as they think about revenue and growth.

Pick an algorithm. Write the pseudocode for a parallel implementation.

This kind of question demonstrates your ability to think in parallelism and how you could handle concurrency in programming implementations dealing with big data. Take a look at pseudocode frameworks such as Peril-L and visualization tools such as Web Sequence Diagrams to help you demonstrate your ability to write code that reflects parallelism.

What is ensemble learning?

To solve a particular computational problem, multiple models such as classifiers or experts are strategically generated and combined. This process is known as ensemble learning.

Which method is frequently used to prevent overfitting?

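Cross-validation and regularization are the most common safeguards. Isotonic regression is a calibration method rather than a way to prevent overfitting: it is itself prone to overfitting and should be used only when there is sufficient data.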

What cross-validation technique would you use on a time series dataset?

Wrong: standard k-fold cross-validation. A time series is not randomly distributed data; it is inherently chronological.
Right: forward chaining, where you model on past data and then look at forward-facing data:
fold 1: training [1], test [2]
fold 2: training [1 2], test [3]
fold 3: training [1 2 3], test [4]
fold 4: training [1 2 3 4], test [5]
fold 5: training [1 2 3 4 5], test [6]
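
The forward-chaining folds listed above can be generated with a short helper. This is a sketch with a hypothetical function name; periods are numbered from 1 to match the card.

```python
def forward_chaining_splits(n_periods: int):
    """Yield (train, test) lists of period numbers; test always follows train."""
    for end in range(1, n_periods):
        train = list(range(1, end + 1))
        test = [end + 1]
        yield train, test


splits = list(forward_chaining_splits(6))
for fold, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold}: training {train}, test {test}")
```

Because every test period lies strictly after its training periods, the model is never evaluated on data older than what it was trained on.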

How do you handle missing or corrupted data in a dataset?

You could find missing/corrupted data in a dataset and either drop those rows or columns, or decide to replace them with another value. In Pandas, there are two very useful methods: isnull() and dropna() that will help you find columns of data with missing or corrupted data and drop those values. If you want to fill the invalid values with a placeholder value (for example, 0), you could use the fillna() method.
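
A short sketch of those three pandas methods on a made-up DataFrame follows. The column names, values and placeholder choices are hypothetical; which strategy is appropriate depends on the dataset.

```python
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({
    "age": [25.0, None, 40.0],
    "city": ["Paris", "Lyon", None],
})

# Count missing values per column
missing_per_column = df.isnull().sum()

# Option 1: drop every row that contains a missing value
dropped = df.dropna()

# Option 2: fill missing values with a placeholder per column
filled = df.fillna({"age": 0, "city": "unknown"})

print(missing_per_column)
print(dropped)
print(filled)
```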

What evaluation approaches would you use to gauge the effectiveness of a machine learning model?

You would first split the dataset into training and test sets, or perhaps use cross-validation techniques to further segment the dataset into composite sets of training and test sets within the data. You should then implement a choice selection of performance metrics. You could use measures such as the F1 score, the accuracy, and the confusion matrix. What's important here is to demonstrate that you understand the nuances of how a model is measured and how to choose the right performance measures for the right situations.

What are the two classification methods that SVM (Support Vector Machine) can handle?

a) Combining binary classifiers b) Modifying binary to incorporate multiclass learning

What are the functions of 'Unsupervised Learning'?

a) Find clusters of the data b) Find low-dimensional representations of the data c) Find interesting directions in data d) Interesting coordinates and correlations e) Find novel observations/ database cleaning

Into what categories can the sequence learning process be divided?

a) Sequence prediction b) Sequence generation c) Sequence recognition d) Sequential decision

Bias-variance tradeoff

-The bias-variance decomposition splits the learning error of any algorithm into bias, variance and irreducible error due to noise. -If you make the model more complex and add more variables, you'll reduce bias but gain some variance.

Describe a hash table.

A hash table is a data structure that implements an associative array. Each key is mapped to its value through the use of a hash function. Hash tables are often used for tasks such as database indexing.
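
A toy implementation shows how the hash function maps keys to storage. This is a teaching sketch that assumes separate chaining and a fixed number of buckets; the class and method names are hypothetical, and Python's built-in `dict` is what you would use in practice.

```python
class HashTable:
    """Minimal hash table using separate chaining (lists of key/value pairs)."""

    def __init__(self, n_buckets: int = 8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The hash function decides which bucket a key lives in.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


table = HashTable()
table.put("alice", 1)
table.put("bob", 2)
table.put("alice", 3)  # replaces the earlier value for "alice"
print(table.get("alice"), table.get("bob"), table.get("carol"))
```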

Why is "Naive" Bayes naive?

It makes an assumption that is virtually impossible in real data: the conditional probability is calculated as the pure product of the individual probabilities of components. This implies the absolute independence of features, a condition probably never met in real life. Ex: a Naive Bayes classifier that figured out that you liked pickles and ice cream would probably naively recommend you a pickle ice cream.

