Machine Learning

What is logistic regression?

Logistic regression is a classification algorithm that predicts a binary outcome from a set of independent variables. It estimates the probability that a binary event occurs and assigns a class based on that probability. Examples: predicting whether an incoming email is spam or not spam, whether a credit card transaction is fraudulent or legitimate, or whether a tumor is malignant or benign.
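
As a rough illustration of how a probability becomes a binary label, here is a minimal Python sketch. The weights, bias, and the two spam features (link count and exclamation-mark count) are made up for the example; a real model would learn its weights from labeled data.

```python
import math


def sigmoid(z):
    """Map any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Estimated probability of the positive class for one example."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict(features, weights, bias, threshold=0.5):
    """Turn the probability into a binary label (1 = positive class)."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical hand-picked parameters, not learned from any dataset.
# Features: [number of links, number of exclamation marks]
weights = [0.8, 0.5]
bias = -3.0

print(predict_proba([5, 4], weights, bias))  # high probability of spam
print(predict([0, 1], weights, bias))        # classified as not spam (0)
```

The 0.5 threshold is only a common default; it can be moved when false positives and false negatives have different costs.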

What are factorization machines?

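Factorization machines are a general-purpose supervised learning algorithm for classification and regression. They extend a linear model to capture interactions between features, and they are most often associated with high-dimensional sparse datasets, such as those produced by one-hot encoding.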

What visualizations can be used for multiple variable distributions for several objects?

A box plot or scatter plot can be used for multiple variable distributions.

What is PCA?

Principal component analysis (PCA) is an unsupervised algorithm that attempts to reduce dimensionality while retaining as much information as possible.
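
To make the idea concrete, the sketch below reduces 2-D points to 1-D by projecting them onto the direction of greatest variance. It is a simplified illustration that assumes exactly two input features, so the eigenvector of the 2x2 covariance matrix can be written in closed form; the function name and sample points are invented for the example.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Returns (direction, projected_values, explained_variance_ratio).
    Assumes at least two points and non-zero total variance.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix (closed form).
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    largest = half_trace + root

    # Matching eigenvector; handle the already-diagonal case separately.
    if sxy != 0:
        vx, vy = sxy, largest - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    projected = [x * vx + y * vy for x, y in centered]
    explained = largest / (sxx + syy)
    return (vx, vy), projected, explained


# Made-up points that lie close to the line y = 2x.
points = [(0.0, 0.1), (1.0, 2.0), (2.0, 3.9), (3.0, 6.1)]
direction, projected, explained = pca_2d_to_1d(points)
print(direction, explained)
```

The explained variance ratio indicates how much of the original spread survives the reduction; a value near 1 means little information was lost.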

What is a visualization that compares values between variables in a lookup fashion?

A bar chart can compare values between variables in a lookup fashion, e.g. how much does this car cost compared to these other cars?

What visualization is used to find relationships between three variables?

A bubble chart can be used to describe a relationship between three variables.

What visualization can be used for compositions of variables within an object that is static?

A pie chart can be used for compositions of variables within an object that is static, e.g. categories of spending for the month of April.

What is an example of an ML unsupervised clustering problem?

Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.

What is DeepAR?

DeepAR is a supervised learning algorithm for time series forecasting that uses recurrent neural networks (RNN) to produce both point and probabilistic forecasts.

What happens if AWS SageMaker does not support a desired algorithm?

If SageMaker does not support a desired algorithm, you can either bring your own or buy/subscribe to an algorithm from the AWS Marketplace.

What is a machine learning classification problem?

In a classification problem, we are trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories. Example: given a patient with a tumor, predict whether the tumor is malignant or benign.

What is an example of an ML non-clustering problem?

Non-clustering: The "Cocktail Party Algorithm" allows you to find structure in a chaotic environment, for example identifying individual voices and music from a mesh of sounds at a cocktail party.

What is one-hot encoding?

One-hot encoding is a process by which categorical variables are converted into a numeric form that ML algorithms can use: each category becomes its own binary (0 or 1) column. This often helps the algorithms do a better job in prediction.
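
A minimal Python sketch of the idea follows. The function name and the color values are made up for illustration, and the column order is simply the sorted category names, which is an arbitrary choice for this example.

```python
def one_hot_encode(values):
    """Convert a list of category labels into binary indicator rows.

    Returns (categories, encoded), where each row of `encoded` has a
    single 1 in the column of that value's category.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

With many distinct categories the rows become mostly zeros, which is how one-hot encoding produces the high-dimensional sparse datasets mentioned in other cards.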

What is the Semantic Segmentation algorithm?

Semantic Segmentation can perform edge detection, which could be used to identify orientation. Object Detection and Image Analysis are better used for whole images, whereas Semantic Segmentation can create a segmentation mask or outline of part of an image.

What are the two types of unsupervised learning?

The two types of unsupervised learning are clustering and non-clustering.

What is unsupervised learning in machine learning?

Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables. We can derive this structure by clustering the data based on relationships among the variables in the data. With unsupervised learning there is no feedback based on the prediction results.

What is the XGBoost algorithm?

XGBoost is an extremely flexible algorithm which can be used in regression, binary classification and multi-class classification.

What visualization can be used for compositions of variables of an object that change over time?

A stacked bar chart or a stacked area chart can be used for compositions of variables that change over time.

What visualization is used to find relationships between two variables?

A scatter plot is the best visualization to describe a relationship between two variables.

What is Amazon Comprehend?

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text.

What are the two supervised learning category types?

Supervised learning problems are categorized into "regression" and "classification" problems.

What is Object2Vec?

The AWS SageMaker Object2Vec algorithm is a general-purpose neural embedding algorithm that is highly customizable. You can use the learned embeddings to efficiently compute nearest neighbors of objects and to visualize natural clusters of related objects in low-dimensional space.

What is Machine Learning?

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." Example: playing checkers. E = the experience of playing many games of checkers; T = the task of playing checkers; P = the probability that the program will win the next game. In general, any machine learning problem can be assigned to one of two broad classifications: supervised learning and unsupervised learning.

What visualization can be used for a single distribution?

A histogram can be used for single distributions. I.e. what were the grades for a math class?

What visualization compares changes over time between variables?

A line chart compares changes over time between variables.

What is a machine learning regression problem?

In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. Example: given a picture of a person, predict their age on the basis of the picture.

What is supervised learning?

In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.

What is Stochastic Gradient Descent (SGD)?

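Stochastic Gradient Descent (SGD) is an optimization algorithm that seeks the minimum of a cost function, i.e. the parameters with the minimal error. It updates the parameters in small steps, each based on the gradient computed from a single randomly chosen example (or a small batch). This can be analogous to trying to find the lowest point on a landscape by repeatedly stepping downhill.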
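
The downhill-stepping idea can be sketched in a few lines of Python. This is an illustrative example only: it fits a straight line y = w*x + b to made-up, noise-free data, and the learning rate, step count, and seed are assumptions chosen so the toy problem converges.

```python
import random


def sgd_linear_fit(xs, ys, learning_rate=0.02, steps=5000, seed=0):
    """Fit y = w*x + b by stochastic gradient descent.

    Each step picks ONE random example and moves w and b a little
    in the direction that reduces that example's squared error.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        i = rng.randrange(len(xs))
        error = (w * xs[i] + b) - ys[i]
        # Gradient of 0.5 * error**2 with respect to w and b.
        w -= learning_rate * error * xs[i]
        b -= learning_rate * error
    return w, b


def mean_squared_error(xs, ys, w, b):
    """Cost function: average squared prediction error."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = sgd_linear_fit(xs, ys)
print(w, b)                               # close to 2 and 1
print(mean_squared_error(xs, ys, w, b))   # close to 0
```

Here the mean squared error plays the role of the landscape, and SGD is the procedure that walks toward its lowest point.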

