ML - Chapter 1

What is the purpose of validation set?

A validation set is a holdout set used to compare candidate models: it makes it possible to select the best model and tune its hyperparameters.
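As a minimal sketch of the idea above (not taken from the chapter), the snippet below picks a regularization strength by comparing errors on a validation set. The toy data, the no-intercept ridge model, and the names `fit_ridge_slope`, `mse`, and `candidates` are all illustrative assumptions.

```python
import random

def fit_ridge_slope(data, lam):
    """Closed-form slope for y ~ w * x with an L2 penalty lam (no intercept)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, y in data)
    return sxy / (sxx + lam)

def mse(w, data):
    """Mean squared error of the slope w on the given (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Hypothetical toy data: y = 2x plus Gaussian noise.
rng = random.Random(0)
points = []
for _ in range(60):
    x = rng.uniform(-3, 3)
    points.append((x, 2.0 * x + rng.gauss(0, 1)))

# Three disjoint sets: train to fit, validation to choose, test to report.
train, val, test = points[:36], points[36:48], points[48:]

# Fit one model per candidate hyperparameter and compare them on the validation set.
candidates = [0.0, 1.0, 10.0, 100.0]
val_errors = {lam: mse(fit_ridge_slope(train, lam), val) for lam in candidates}
best_lam = min(val_errors, key=val_errors.get)

# Retrain with the chosen value, then measure once on the untouched test set.
final_w = fit_ridge_slope(train + val, best_lam)
test_error = mse(final_w, test)
```

The test set is used only once, after the hyperparameter has been chosen.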

What is the difference between a model parameter and a learning algorithm's hyperparameter?

A model has one or more model parameters that determine what it will predict given a new instance (e.g., the slope of a linear model). A learning algorithm tries to find optimal values for these parameters such that the model generalizes well to new instances. A hyperparameter is a parameter of the learning algorithm itself, not of the model (e.g., the amount of regularization to apply); it is set before training and stays constant during training.
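A small illustration of the distinction, assuming a one-feature linear model without intercept trained with an L2 penalty (the function name and data are hypothetical): the slope `w` is the model parameter found by training, while `lam` is a hyperparameter chosen beforehand.

```python
def fit_slope(data, lam):
    """Learn the model parameter w (the slope) for a fixed hyperparameter lam."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, y in data)
    return sxy / (sxx + lam)  # minimizes sum((w*x - y)^2) + lam * w^2

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # exactly y = 2x

w_unregularized = fit_slope(data, lam=0.0)   # 28 / 14
w_regularized = fit_slope(data, lam=14.0)    # 28 / 28

print(w_unregularized, w_regularized)  # 2.0 1.0
```

The same data yields a different learned slope for each value of `lam`, which is why hyperparameters have to be tuned separately from training.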

What is a labeled training set?

A labeled training set is a data set you feed to an algorithm which includes the desired solutions, called labels.

What is a test set and why would you want to use it?

A test set is a portion of the available data that is held out from training and used only to test the model. It is used to estimate the generalization error that a model will make on new instances, before the model is launched in production.
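One way to hold out a test set is sketched below with the standard library only. The helper name `train_test_split`, the 20% ratio, and the seed are assumptions for illustration.

```python
import random

def train_test_split(instances, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data and hold out a fraction of it as the test set."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))  # stand-in for 100 instances
train_set, test_set = train_test_split(data)

print(len(train_set), len(test_set))  # 80 20
```

The two sets are disjoint, so the error measured on `test_set` comes from instances the model never saw during training.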

How would you define Machine Learning?

Machine Learning is the science (and art) of programming computers so they can learn from data. Machine Learning is about building systems that can learn from data. Learning means getting better at some task, given some performance measure.

What type of Machine Learning algorithm would you use to allow a robot to walk in various unknown terrains?

Reinforcement Learning. It is likely to perform best if we want a robot to learn to walk in various unknown terrains, since this is typically the type of problem that Reinforcement Learning tackles. It might be possible to express the problem as a supervised or semisupervised learning problem, but it would be less natural.

Would you frame the problem of spam detection as a supervised learning problem or an unsupervised learning problem?

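Supervised Learning. Spam detection is a typical supervised learning problem: the algorithm is fed many emails along with their labels (spam or not spam).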

What is out-of-core learning?

The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data. Out-of-core algorithms can handle vast quantities of data that cannot fit in a computer's main memory. An out-of-core learning algorithm chops the data into mini-batches and uses online learning techniques to learn from these mini-batches.
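The loop described above can be sketched as follows. This is a minimal illustration, assuming a generator (`data_stream`) stands in for reading rows lazily from disk and a one-parameter linear model is updated by mini-batch gradient descent; all names are hypothetical.

```python
def iter_mini_batches(stream, batch_size):
    """Group a (possibly huge) stream of instances into lists of batch_size."""
    batch = []
    for instance in stream:
        batch.append(instance)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # last, possibly smaller, batch
        yield batch

def data_stream(n):
    """Stand-in for lazily reading rows from disk; yields (x, y) with y = 3x."""
    for i in range(n):
        x = (i % 10) / 10.0
        yield x, 3.0 * x

w = 0.0              # model parameter, updated one mini-batch at a time
learning_rate = 0.5
for batch in iter_mini_batches(data_stream(2000), batch_size=50):
    # Gradient of the mean squared error on this mini-batch only.
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    w -= learning_rate * grad
```

Only one mini-batch is held in memory at any time, which is what lets the approach scale past main memory.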

Can you name four common unsupervised tasks?

1) Visualization: you feed the algorithm a lot of complex, unlabeled data, and it outputs a 2D or 3D representation of your data that can easily be plotted. 2) Dimensionality reduction: the goal is to simplify the data without losing too much information. One way to do this is to merge several correlated features into one. 3) Anomaly detection: the system is trained with normal instances, and when it sees a new instance it can tell whether it looks like a normal one or whether it is likely an anomaly. 4) Association rule learning: the goal is to dig into large amounts of data and discover interesting relations between attributes. Common unsupervised tasks include clustering, visualization, dimensionality reduction, and association rule learning.

Can you name four of the main challenges in Machine Learning?

1) Insufficient Quantity of Training Data 2) Nonrepresentative Training Data 3) Poor-Quality Data 4) Irrelevant Features Some of the main challenges in Machine Learning are the lack of data, poor data quality, nonrepresentative data, uninformative features, excessively simple models that underfit the training data, and excessively complex models that overfit the data.

Can you name four types of problems where Machine Learning shines?

1) Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better. 2) Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution. 3) Fluctuating environments: a Machine Learning system can adapt to new data. 4) Getting insights about complex problems and large amounts of data. Machine Learning is great for complex problems for which we have no algorithmic solution, to replace long lists of hand-tuned rules, to build systems that adapt to fluctuating environments, and finally to help humans learn (e.g., data mining).

What are the two most common supervised tasks?

Classification and regression.

What type of learning algorithm relies on a similarity measure to make predictions?

Instance-based learning. An instance-based learning system learns the training data by heart; then, when given a new instance, it uses a similarity measure to find the most similar learned instances and uses them to make predictions.
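A k-nearest-neighbors classifier is one common example of this approach. The sketch below assumes Euclidean distance as the similarity measure and a majority vote among the `k` closest stored instances; the data and names are illustrative.

```python
import math
from collections import Counter

def knn_predict(training_set, new_instance, k=3):
    """Predict the majority label among the k stored instances closest to new_instance."""
    neighbors = sorted(
        training_set,
        key=lambda item: math.dist(item[0], new_instance),  # smaller distance = more similar
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# "Learning by heart": the training data itself is the model.
training_set = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

print(knn_predict(training_set, (1.1, 1.0)))  # A
print(knn_predict(training_set, (5.0, 5.1)))  # B
```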

What do model-based learning algorithms search for? What is the most common strategy they use to succeed? How do they make predictions?

Model-based learning algorithms search for an optimal value for the model parameters such that the model will generalize well to new instances. We usually train such systems by minimizing a cost function that measures how bad the system is at making predictions on the training data, plus a penalty for model complexity if the model is regularized. To make predictions, we feed the new instance's features into the model's prediction function, using the parameter values found by the learning algorithm.
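The three pieces named above (prediction function, cost function with an optional complexity penalty, and a search for good parameter values) can be sketched for a one-feature linear model. Gradient descent is used here as one possible training strategy; the learning rate, step count, and `alpha` penalty weight are assumptions.

```python
def predict(theta, x):
    """Prediction function: intercept + slope * x."""
    intercept, slope = theta
    return intercept + slope * x

def cost(theta, data, alpha=0.0):
    """Mean squared error on the training data plus an L2 penalty on the slope."""
    mse = sum((predict(theta, x) - y) ** 2 for x, y in data) / len(data)
    return mse + alpha * theta[1] ** 2

def train(data, alpha=0.0, learning_rate=0.05, n_steps=5000):
    """Minimize the cost by plain gradient descent; returns (intercept, slope)."""
    b, w = 0.0, 0.0
    n = len(data)
    for _ in range(n_steps):
        errors = [(b + w * x - y, x) for x, y in data]
        grad_b = 2 * sum(e for e, _ in errors) / n
        grad_w = 2 * sum(e * x for e, x in errors) / n + 2 * alpha * w
        b -= learning_rate * grad_b
        w -= learning_rate * grad_w
    return b, w

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # exactly y = 1 + 2x
theta = train(data)
prediction = predict(theta, 4.0)  # close to 9.0
```

With `alpha=0.0` the model is unregularized; a positive `alpha` trades some training error for a smaller slope.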

What type of algorithm would you use to segment your customers into multiple groups?

A clustering algorithm. If you don't know how to define the groups, then you can use a clustering algorithm (unsupervised learning) to segment your customers into clusters of similar customers. However, if you know what groups you would like to have, then you can feed many examples of each group to a classification algorithm (supervised learning), and it will classify all your customers into these groups.
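For the unsupervised case, k-means is one widely used clustering algorithm. The sketch below runs it on a single hypothetical feature (annual spend) with hand-picked starting centroids; real customer data would have more features and usually a randomized initialization.

```python
def k_means_1d(values, centroids, n_iterations=20):
    """Alternate between assigning values to the nearest centroid and re-centering."""
    centroids = list(centroids)
    clusters = []
    for _ in range(n_iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

annual_spend = [12, 15, 14, 90, 95, 88, 300, 310, 295]  # hypothetical customers
centroids, clusters = k_means_1d(annual_spend, centroids=[0, 100, 400])

print(clusters)  # [[12, 15, 14], [90, 95, 88], [300, 310, 295]]
```

No labels are supplied: the three groups emerge from the similarity of the values alone.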

If your model performs great on the training data but generalizes poorly to new instances, what is happening? Can you name three possible solutions?

The model has overfit the training data. Possible solutions are: 1) To simplify the model by selecting one with fewer parameters, by reducing the number of attributes in the training data, or by constraining the model. 2) To gather more training data. 3) To reduce the noise in the training data (e.g. fix data errors and remove outliers). If a model performs great on the training data but generalizes poorly to new instances, the model is likely overfitting the training data (or we got extremely lucky on the training data). Possible solutions to overfitting are getting more data, simplifying the model (selecting a simpler algorithm, reducing the number of parameters or features used, or regularizing the model), or reducing the noise in the training data.
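To make the symptom concrete, here is a rough sketch using a 1-nearest-neighbor classifier on noisy labels. It is an illustrative setup, not an example from the chapter. With `k=1` the model memorizes the training set, noise included, so its training error is zero while its error on fresh data is not; a larger `k` is one way of constraining the model.

```python
import random

def knn_label(train, x, k):
    """Majority vote (labels are 0 or 1) among the k training points closest to x."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    positives = sum(label for _, label in nearest)
    return 1 if 2 * positives > k else 0

def error_rate(train, data, k):
    """Fraction of instances in data that the k-NN model built on train gets wrong."""
    return sum(knn_label(train, x, k) != y for x, y in data) / len(data)

def make_data(rng, n, noise=0.2):
    """True rule: label is 1 when x > 0.5; a fraction of labels is flipped at random."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 1)
        label = 1 if x > 0.5 else 0
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data

rng = random.Random(1)
train_set = make_data(rng, 100)
test_set = make_data(rng, 200)

for k in (1, 15):
    print(k, error_rate(train_set, train_set, k), error_rate(train_set, test_set, k))
```

The gap between the two error rates at `k=1` is the signature of overfitting; comparing it with the `k=15` row shows the effect of a more constrained model on this particular sample.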

What can go wrong if you tune hyperparameters using the test set?

The model will be tuned for that set, but is unlikely to perform well on new sets. If you tune hyperparameters using the test set, you risk overfitting the test set, and the generalization error you measure will be optimistic (you may launch a model that performs worse than you expect).

What is an online learning system?

You train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. An online learning system can learn incrementally, as opposed to a batch learning system. This makes it capable of adapting rapidly to both changing data and autonomous systems, and of training on very large quantities of data.
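A minimal sketch of incremental learning, assuming a one-parameter linear model updated by one gradient step per instance (the class name, `learn_one` method, and learning rate are hypothetical). The stream changes its underlying relationship halfway through to show the model adapting.

```python
class OnlineLinearModel:
    """One-parameter model y = w * x, updated one instance at a time."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x

    def learn_one(self, x, y):
        # Single gradient step on the squared error of this instance only.
        error = self.predict(x) - y
        self.w -= self.learning_rate * 2 * error * x

def stream(slope, n):
    """Stand-in for live data arriving sequentially; yields (x, slope * x)."""
    for i in range(n):
        x = (0.5, 1.0, 1.5)[i % 3]
        yield x, slope * x

model = OnlineLinearModel()
for x, y in stream(2.0, 100):     # data initially follows y = 2x
    model.learn_one(x, y)
w_before_drift = model.w

for x, y in stream(-1.0, 100):    # the relationship changes to y = -x
    model.learn_one(x, y)
w_after_drift = model.w
```

No retraining from scratch is needed: the same model object tracks the new relationship as the new instances arrive.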

What is cross-validation and why would you prefer it to a validation set?

The training set is split into complementary subsets, and each model is trained against a different combination of these subsets and validated against the remaining parts. This avoids "wasting" too much training data in validation sets. Cross-validation is a technique that makes it possible to compare models (for model selection and hyperparameter tuning) without the need for a separate validation set. This saves precious training data.
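The splitting scheme can be sketched as k-fold cross-validation, written here with the standard library only. The helper names and the toy model are assumptions, and in practice the data would normally be shuffled before folding.

```python
def k_fold_indices(n_instances, n_folds):
    """Yield (train_indices, validation_indices) for each of n_folds contiguous folds."""
    fold_sizes = [
        n_instances // n_folds + (1 if i < n_instances % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_instances))
        yield train_idx, val_idx
        start += size

def cross_val_score(data, fit, score, n_folds=5):
    """Train on every combination of n_folds - 1 folds; average the validation scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(data), n_folds):
        model = fit([data[i] for i in train_idx])
        scores.append(score(model, [data[i] for i in val_idx]))
    return sum(scores) / len(scores)

# Toy usage: a no-intercept linear model on data that follows y = 2x exactly.
def fit_slope(train):
    return sum(x * y for x, y in train) / sum(x * x for x, y in train)

def mse(w, val):
    return sum((w * x - y) ** 2 for x, y in val) / len(val)

data = [(float(x), 2.0 * x) for x in range(1, 11)]
mean_val_error = cross_val_score(data, fit_slope, mse, n_folds=5)
```

Every instance serves as validation data exactly once and as training data in all the other folds, so none of it is permanently set aside.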

