C3.ai Glossary Terms

What is R squared?

1 - (unexplained variation / total variation)
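
As a minimal sketch of the formula above, assuming "unexplained variation" means the sum of squared residuals and "total variation" means the sum of squared deviations from the mean (the function name and sample values are illustrative, not from the glossary):

```python
def r_squared(y_true, y_pred):
    """Return 1 - (unexplained variation / total variation)."""
    mean_y = sum(y_true) / len(y_true)
    # Unexplained variation: squared residuals between truth and prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total variation: squared deviations of the truth from its own mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]
print(r_squared(y_true, y_pred))
```

A perfect prediction gives 1, and a model that always predicts the mean gives 0.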

What is a hyperparameter?

A hyperparameter is a parameter whose value is set before training begins, rather than learned from the data during training

What is Stochastic Optimization?

A method that generates and uses random variables to represent an optimization problem, typically one with noisy or uncertain inputs, in order to produce more suitable and consistent results

What is a Gaussian Mixture Model?

A probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters

What is XGBoost?

A supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler and weaker models.

What is a reason to do dimensionality reduction?

Algorithms have a hard time learning patterns when there are many sources of input data relative to the amount of training data

What is a Generalized Linear Model?

An extension of linear regression that allows the observations to follow distributions other than the normal distribution, with a "link" function relating the linear predictor to the expected value of the observations

What is the C3 AI Platform?

An open, extensible, multi-cloud platform for a wide range of skill sets to take advantage of the latest innovations in AI/ML

What are three benefits of Gaussian Mixture Models?

Are found in open-source libraries, are easy to implement, and are faster and more stable than other solutions like gradient descent in converging to a minimum

How are Shapley values useful to ML?

By interpreting a model trained on a set of features as a value function on a coalition of players, Shapley values provide a natural way to compute which features contribute to a prediction
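
The coalition idea can be sketched with an exact computation on a toy game. This is an illustration under stated assumptions, not how any particular library computes Shapley values: `toy_value` is a hypothetical payoff function standing in for a model, the feature names are invented, and enumerating every ordering is only feasible for a handful of players.

```python
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical payoff of a coalition of features (not a real model)."""
    score = 0.0
    if "age" in coalition:
        score += 10.0
    if "income" in coalition:
        score += 20.0
    if "age" in coalition and "income" in coalition:
        score += 6.0  # interaction term shared by both features
    return score


phi = shapley_values(["age", "income", "zip"], toy_value)
print(phi)
```

The values sum to the payoff of the full coalition, and a feature that never changes the payoff (here `zip`) receives zero.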

What are Gradient-Boosted Decision Trees?

An ensemble of decision trees built one after another, where each new tree is fit to the errors (the gradient of the loss function) left by the trees before it, so that every iteration reduces the loss on the target value

What are the benefits of Random Forest models?

Easy to tune with intuitive hyperparameters, easy to view the relative importance of input features, and able to generalize well without overfitting given a sufficient number of trees

What is the False Positive Rate?

FP / (FP + TN)

What is the x-axis in the receiver operating characteristic curve?

FPR

When should F1 score be used?

For (binary) classification when the classes are imbalanced

What is the False Positive Rate in words?

What fraction of the actual negatives did the model incorrectly predict as positive?

What is LIME?

Local interpretable model-agnostic explanations is a technique that approximates any black box machine learning model with a local, interpretable model to explain each individual prediction

How can overfitting be handled?

More data including through augmentation, cross-validation, select fewer better features, regularization

How do you handle underfitting?

More training data, more model parameters, more model complexity, train longer, decrease regularization

What correlation coefficient measures the linear relationship between two variables?

Pearson's

How can overfitting be handled in Deep Learning models?

Reduce layers or nodes in hidden layers, apply regularization, dropout layers, early stopping

What correlation coefficient measures the monotonic (possibly non-linear) relationship between two variables?

Spearman's
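
A small sketch of the difference between the two coefficients, assuming the standard definitions: Spearman's coefficient is Pearson's coefficient computed on ranks, so it stays at 1 for any strictly increasing relationship even when that relationship is not a straight line. The helper names are illustrative, and the ranking helper assumes there are no tied values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks from 1 to n; this sketch assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = float(rank)
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # increasing, but not linear
print(pearson(x, y), spearman(x, y))
```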

What is recall?

TP / (TP + FN)

What is precision?

TP / (TP + FP)

What is the y-axis in the receiver operating characteristic curve?

TPR
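
The two ROC axes can be sketched by sweeping a decision threshold and recording one (FPR, TPR) point per threshold. The labels, scores, and thresholds below are made-up sample data, and the sketch assumes both classes are present so that neither denominator is zero.

```python
def roc_point(y_true, scores, threshold):
    """Return (FPR, TPR) when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    fpr = fp / (fp + tn)  # x-axis: FP / (FP + TN)
    tpr = tp / (tp + fn)  # y-axis: TP / (TP + FN), the same as recall
    return fpr, tpr


y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
curve = [roc_point(y_true, scores, t) for t in (1.0, 0.75, 0.5, 0.3, 0.0)]
print(curve)
```

Lowering the threshold moves the point from (0, 0) toward (1, 1); a better model reaches a high TPR while the FPR is still low.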

What is the F1 score formula?

The harmonic mean of precision and recall: 2 * (precision * recall) / (precision + recall)
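
As a minimal sketch, precision, recall, and their harmonic mean can be computed directly from confusion-matrix counts. The function name and the counts are illustrative, and the sketch assumes the denominators are non-zero.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean: pulled toward the smaller of the two values.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(p, r, f1)
```

With precision 0.8 and recall 0.5, the harmonic mean comes out below the arithmetic mean of 0.65, which is why F1 penalizes a model that is strong on only one of the two.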

What is Mean Absolute Percent Error (MAPE)?

The mean of absolute relative errors (expressed as percentage)

What is the Mean Absolute Error?

The sum of the absolute values of the differences between ground truth and predicted values, divided by the sample size
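
The two error metrics above can be sketched side by side. The function names and sample values are illustrative, and the MAPE helper assumes no ground-truth value is zero, since each error is divided by the true value.

```python
def mean_absolute_error(y_true, y_pred):
    """Sum of absolute differences divided by the sample size."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percent_error(y_true, y_pred):
    """Mean of absolute relative errors, as a percentage.

    Assumes no ground-truth value is zero.
    """
    relative = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100 * sum(relative) / len(relative)


y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
print(mean_absolute_error(y_true, y_pred))
print(mean_absolute_percent_error(y_true, y_pred))
```

MAE is reported in the units of the target, while MAPE is unit-free, so the same absolute miss counts for more on a small true value than on a large one.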

What is Deep Learning?

The use of multi-layered neural networks to transform input data into successively higher-level values to produce results similar to human experts

True or False: Clustering algorithms are a type of classification technique

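False. Clustering is an unsupervised technique that groups unlabeled data by similarity, whereas classification is a supervised technique that assigns predefined labels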

What is information leakage?

When information that should not be in the training data (because it will not be available at prediction time) inflates the model's apparent performance during training and validation, causing poor performance in production

What are the 10 fundamental capabilities of an enterprise AI Platform?

data aggregation, multi-cloud, edge, data virtualization, enterprise semantic model, microservices, data governance, system simulation, open platform, cross-collaboration

What are two phrases to describe an overfit model?

high variance, low bias

What are three loss functions used in classification?

hinge loss, cross-entropy loss, and KL (Kullback-Leibler) divergence loss
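
A minimal per-example sketch of the three losses, assuming their standard textbook definitions. The function names and label conventions (labels of -1 or +1 for hinge loss, probability lists for the other two) are choices made for this illustration.

```python
from math import log


def hinge_loss(label, score):
    """label is -1 or +1; score is the raw signed model output."""
    return max(0.0, 1.0 - label * score)


def cross_entropy(p_true, p_pred):
    """Cross-entropy between a true and a predicted class distribution."""
    return -sum(t * log(q) for t, q in zip(p_true, p_pred) if t > 0)


def kl_divergence(p, q):
    """KL divergence of q from p; terms with p == 0 contribute nothing."""
    return sum(a * log(a / b) for a, b in zip(p, q) if a > 0)


one_hot = [0.0, 1.0, 0.0]
predicted = [0.2, 0.7, 0.1]
print(hinge_loss(1, 2.0), hinge_loss(-1, 0.5))
print(cross_entropy(one_hot, predicted), kl_divergence(one_hot, predicted))
```

For a one-hot true distribution the cross-entropy and the KL divergence coincide, because the entropy of the true distribution is zero.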

What are two phrases to describe an underfit model?

low variance, high bias

What are three loss functions used in regression?

mean square error loss (MSE), mean absolute error loss (MAE), and quantile loss
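
As a sketch of how these losses differ, the snippet below implements mean square error and the quantile (pinball) loss under their standard definitions. The names and sample values are illustrative. At `q = 0.5` the quantile loss is half the mean absolute error, so the snippet does not repeat MAE separately.

```python
def mse(y_true, y_pred):
    """Mean of squared errors; large misses dominate."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def quantile_loss(y_true, y_pred, q):
    """Pinball loss: under-predictions cost q, over-predictions cost 1 - q."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


y_true = [10.0, 20.0, 30.0]
y_pred = [12.0, 18.0, 30.0]
print(mse(y_true, y_pred))
print(quantile_loss(y_true, y_pred, 0.5))
print(quantile_loss([10.0], [8.0], 0.9), quantile_loss([10.0], [12.0], 0.9))
```

With `q = 0.9`, predicting too low is penalized nine times as heavily as predicting too high by the same amount, which pushes the fitted model toward the 90th percentile of the target.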
