CPSC 483 Machine Learning Final Exam Terms
AdaBoost
An example of a sequential ensemble model.
Decision theory
A prediction system involving choices, possibilities, and outcomes.
Ensemble learning
Improves machine learning results by combining several models; the resulting ensemble of classifiers may or may not be more accurate than any of its individual members.
Entropy
This is a measure of uncertainty of a random variable; it characterizes the impurity of an arbitrary collection of examples.
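For reference (a standard formula, not part of the original card), the entropy of a discrete random variable X with outcome probabilities p_i is H(X) = -\sum_i p_i \log_2 p_i; for example, a fair coin gives H = 1 bit, the maximum uncertainty for two outcomes.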
Decision tree
Not an example of an ensemble method. A model that branches at each node based on whether a condition is true or false.
common classes of machine learning
clustering, regression, classification
The most widely used metrics and tools to assess a classification model.
confusion matrix, cost-sensitive accuracy, area under the ROC curve.
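A minimal sketch of computing two of these metrics with scikit-learn (assuming it is installed; the arrays are made-up examples):

    from sklearn.metrics import confusion_matrix, roc_auc_score

    y_true  = [0, 0, 1, 1, 1]            # ground-truth labels (hypothetical)
    y_pred  = [0, 1, 1, 1, 0]            # hard predictions from some classifier
    y_score = [0.2, 0.6, 0.9, 0.7, 0.4]  # predicted probabilities for class 1

    print(confusion_matrix(y_true, y_pred))  # rows = true class, columns = predicted class
    print(roc_auc_score(y_true, y_score))    # area under the ROC curve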
The benefits of data preprocessing
cost reduction, time reduction, smarter business decisions
Offline Learning and Batch Learning
With these, the model is trained on all of the data in a single batch.
Logistic Regression
not a variant of Naive Bayes
KNN algorithm
The algorithm that does more computation at test time than at training time.
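A minimal scikit-learn sketch of this behavior (the toy data is hypothetical): fit() essentially just stores the training set, while predict() does the distance computations.

    from sklearn.neighbors import KNeighborsClassifier

    X_train = [[0], [1], [2], [3]]  # hypothetical training points
    y_train = [0, 0, 1, 1]          # hypothetical labels
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)       # cheap: mostly just stores the data
    print(knn.predict([[1.5]]))     # costly part: distances to stored points are computed here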
Root mean squared error
A metric that measures the difference between the predicted values and the true values.
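As a reminder of the standard definition (not from the original card), for n predictions \hat{y}_i against true values y_i: RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}.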
The difference between bagging and boosting
Bagging can be parallelized, whereas boosting cannot, because each boosted learner depends on the errors of the previous one.
The advantages of Classification and Regression trees
Decision trees implicitly perform variable screening or feature selection, can handle both numerical and categorical data, and can handle multi-output problems.
Bayes Theorem
P(A|B) = P(B|A) P(A) / P(B)
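A made-up numerical illustration: if P(A) = 0.01, P(B|A) = 0.9, and P(B) = 0.1, then P(A|B) = (0.9 × 0.01) / 0.1 = 0.09.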
Supervised learning
Learning in which the training data also includes the desired outputs (labels).
Principal components
The new features created in PCA are known as this.
redundant, irrelevant
Feature selection tries to eliminate these features
K-means clustering performing poorly
Outliers and an increasing number of dimensions are both causes of this; it can be addressed using dimensionality reduction.
Differences between clustering and classification
In classification the number of classes is known, the model is built from a training set, and learning is supervised; in clustering the number of classes is unknown, there is no prior knowledge, and learning is unsupervised.
The number of classes in the classification problem
Using the one-vs-all method, this is the number of times you need to train the SVM model (e.g., a 10-class problem requires training 10 binary SVMs).
Dimensionality reduction
one of the possible ways to reduce the computation time required to build a model.
Deep learning
A subset of machine learning
Machine learning
A type of artificial intelligence that leverages massive amounts of data so that computers can improve the accuracy of actions and predictions on their own without additional programming.
Methods for handling missing or corrupted data in a dataset
Drop missing rows or columns; assign a unique category to missing values; replace missing values with mean/median/mode.
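A minimal pandas sketch of these three approaches (the tiny DataFrame is a made-up example):

    import pandas as pd

    df = pd.DataFrame({"age": [25, None, 40], "city": ["LA", None, "SF"]})
    dropped = df.dropna()                           # drop rows containing missing values
    df["age"] = df["age"].fillna(df["age"].mean())  # replace missing values with the mean
    df["city"] = df["city"].fillna("Unknown")       # assign a unique category to missing values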
Leaf node
In random forests/decision trees, the node that holds the value of the target attribute (the prediction).
Root nodes
Information gain is biased towards choosing attributes with a large number of values as these.
Training data
The sample data from which machine learning algorithms build a model.
The maximum number of iterations
A stopping criterion that can be used for k-means clustering.
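In scikit-learn, for example, this appears as the max_iter parameter (a sketch with made-up data):

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.rand(100, 2)                          # hypothetical unlabeled data
    km = KMeans(n_clusters=3, max_iter=100, n_init=10)  # stop after at most 100 iterations
    labels = km.fit_predict(X)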
PCA
A technique that can be used for projecting and visualizing data in lower dimensions.
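A minimal scikit-learn sketch of projecting data down to two dimensions for plotting (X is hypothetical):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(100, 10)                  # hypothetical 10-dimensional data
    X_2d = PCA(n_components=2).fit_transform(X)  # keep the first two principal components
    print(X_2d.shape)                            # (100, 2), ready for a scatter plot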
Naive Bayes
A supervised learning algorithm used for classification that assumes the features are conditionally independent given the class. It is a probabilistic model based on Bayes' theorem.
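Under that conditional-independence assumption, the classifier picks the class y maximizing P(y) \prod_i P(x_i | y), which follows from Bayes' theorem after dropping the constant denominator P(x_1, ..., x_n).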
Independent trees
What makes combining multiple trees most effective.
All vs All
The multiclass strategy that requires more computation and memory than one-vs-all, because it trains one classifier for every pair of classes (K(K-1)/2 classifiers for K classes).
Training error
The error that usually increases as the amount of training data increases, while the generalization error decreases.
Pearson Correlation, Spearman Correlation
Correlation measures that are appropriate for feature selection.
Stopping rules for step 2 of classification
The tree is stopped when each group has the same number of presences and absences; the tree is stopped when all groups are relatively homogeneous; the tree is stopped when a predefined maximum number of splits is reached.
Random Forest
This has multiple decision trees as base learning models.
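A minimal scikit-learn sketch (the data is synthetic; n_estimators sets the number of decision trees used as base learners):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    rf = RandomForestClassifier(n_estimators=100, random_state=0)  # 100 decision trees
    rf.fit(X, y)
    print(rf.predict(X[:3]))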
The difference between k means clustering and k nearest neighbors
k-nearest neighbors makes predictions for new data based on existing labeled data, whereas k-means clustering looks for patterns (clusters) within unlabeled data.
sigmoid function
Linear regression assumes that the data follows a linear function, whereas logistic regression models the data using this.
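The standard form (not from the original card) is \sigma(z) = \frac{1}{1 + e^{-z}}, and logistic regression models P(y = 1 \mid x) = \sigma(w \cdot x + b).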
examples of indirect feedback from users
page views, clicks, purchase records