ML MID 2

Reward

... in a Markov Decision Process is a numerical value that represents the immediate benefit or cost of taking an action in a particular state. The goal of the agent is to maximize the cumulative ... over time. r(s, a) takes a state and an action and outputs a number giving the ... for that state/action pair.

Terminal State

... in a Markov Decision Process is a state in which the agent stops making decisions. It can be reached either because the agent has achieved its goal or because it has reached a point of no return.

Discount Factor

... in a Markov Decision Process is a value between 0 and 1 that represents the importance of future rewards relative to immediate rewards. It is used to ... the future rewards to account for the time value of money and uncertainty.
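
As a minimal sketch of how the ... is used (the reward list and the value 0.5 below are made-up), a reward received t steps in the future is weighted by the discount factor raised to the power t:

def discounted_return(rewards, gamma=0.9):
    # Sum of rewards, each weighted by a power of the discount factor gamma.
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75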

k-Nearest Neighbors (kNN)

... is a popular machine learning algorithm used for classification and regression problems. It works by finding the k nearest neighbors of a query point based on a distance metric and using the labels or values of those neighbors to predict the label or value of the query point.
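
A minimal sketch of kNN classification, assuming NumPy arrays, Euclidean distance, and majority voting (the function name and toy data are illustrative):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # Distance from the query point to every training point.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points.
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1])))  # predicts class 0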

Policy Iteration

... is an algorithm used in Markov Decision Processes to find the optimal policy. It alternates between policy evaluation, which computes the value function of a policy, and policy improvement, which updates the policy based on the computed value function.
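
A sketch of tabular policy iteration, assuming (this representation is not from the course) that P[s][a] is a list of (probability, next_state, reward) tuples and every state has the same set of actions:

import numpy as np

def policy_iteration(P, gamma=0.9, tol=1e-8):
    n_states, n_actions = len(P), len(P[0])
    policy = np.zeros(n_states, dtype=int)
    V = np.zeros(n_states)
    while True:
        # Policy evaluation: repeatedly apply the Bellman equation for the current policy.
        while True:
            delta = 0.0
            for s in range(n_states):
                v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the evaluated value function.
        stable = True
        for s in range(n_states):
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in range(n_actions)]
            if int(np.argmax(q)) != policy[s]:
                policy[s] = int(np.argmax(q))
                stable = False
        if stable:
            return policy, V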

Intrinsic Dimension

... is the minimum number of parameters needed to represent a dataset in a lower-dimensional space without losing too much information. kNN usually performs well if the intrinsic dimension of the data is small. The concept is closely related to the curse of dimensionality and is often used in feature selection and dimensionality reduction techniques.

Action

In a Markov Decision Process, an ... is a decision made by the agent that affects the state of the environment. It is selected based on the current state and the policy of the agent.

Transition Function

...in a Markov Decision Process is a function that specifies the probability of moving from one state to another when an action is taken. It takes the current state and the action as input and returns a probability distribution over the next states.

Principal Component

linear combination of the original variables in a dataset that captures the maximum amount of variation or information in the data. The first ... explains the largest amount of variation, the second ... explains the second largest amount of variation, and so on.

Markov Decision Process

mathematical framework used in decision-making problems where the outcomes are uncertain. It consists of a set of states, actions, rewards, transition probabilities, and a discount factor, and provides a way to model the interaction between an agent and an environment.

Gain function

mathematical function that measures the information gain resulting from a particular split in a decision tree algorithm. It is used to determine the best feature to split the data on at each node of the tree. Common ... include the information gain and the Gini index.

Boosting

technique that combines multiple weak learners to form a strong learner that can achieve high accuracy on a classification or regression problem. The algorithm works by iteratively training weak learners on weighted versions of the training data and adjusting the weights of the data points based on the errors of the previous iteration.

Principal Component Analysis

technique used in machine learning for dimensionality reduction. It works by transforming the original data into a new set of variables called principal components, which capture the most important patterns and relationships in the data.
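
A minimal sketch of PCA via the eigendecomposition of the covariance matrix, assuming a NumPy data matrix with one example per row (names are illustrative):

import numpy as np

def pca(X, n_components=2):
    Xc = X - X.mean(axis=0)               # center each feature
    C = np.cov(Xc, rowvar=False)          # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(C)  # real eigenpairs of a symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]     # sort directions by the variance they explain
    W = eigvecs[:, order[:n_components]]  # top principal directions as columns
    return Xc @ W                         # coordinates of the data along the principal components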

Weighted ERM (empirical risk minimization)

technique used to optimize a model by minimizing a weighted sum of the errors on the training data. The weights are assigned based on the importance of each data point, and the algorithm learns to give more weight to the points that are more difficult to classify correctly.
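
A minimal sketch of the weighted objective with squared loss (normalizing by the total weight is one common convention, not necessarily the one used in the course):

import numpy as np

def weighted_empirical_risk(y_true, y_pred, weights):
    # Each example's loss is scaled by its weight before averaging.
    w = np.asarray(weights, dtype=float)
    losses = (np.asarray(y_true) - np.asarray(y_pred)) ** 2
    return np.sum(w * losses) / np.sum(w)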

Ellipsoid

three-dimensional geometric shape that is defined by three semiaxes. In PCA, the principal components can be visualized as an ..., where the length of each semiaxis represents the amount of variation captured by the corresponding principal component.

Time Horizon

... in a Markov Decision Process is the number of time steps for which the agent is making decisions. It is the duration of the decision-making process.

Value Iteration

... is an algorithm used in Markov Decision Processes to find the optimal value function. It starts with an initial estimate of the value function and iteratively updates it until it converges to the optimal value function.
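
A sketch of tabular value iteration, using the same assumed representation as the policy-iteration sketch above (P[s][a] is a list of (probability, next_state, reward) tuples):

import numpy as np

def value_iteration(P, gamma=0.9, tol=1e-8):
    n_states, n_actions = len(P), len(P[0])
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup: best expected reward plus discounted value of the next state.
            v = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in range(n_actions))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V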

Cost Complexity Pruning

technique used to optimize the size of a decision tree by balancing the trade-off between accuracy and complexity. It works by adding a complexity penalty to the error objective (for example, the sum of squared errors plus a term proportional to the number of leaves), which encourages the algorithm to favor smaller trees.

Pruning

technique used to reduce the size of a decision tree by removing branches that are unlikely to improve its accuracy. It helps to prevent overfitting and improve the generalization ability of the model.

Initial State

the ... in a Markov Decision Process is the starting point of the environment. It is the state in which the agent begins its decision-making process.

Clustering

type of unsupervised machine learning algorithm that groups similar data points together into ... based on some similarity metric. The goal of ... is to discover the underlying structure of the data and identify groups or patterns that are not immediately apparent.

Policy

a ... in a Markov Decision Process is a mapping from states to actions. It specifies what action the agent should take in each state to maximize its expected cumulative reward.

Value Function

... in a Markov Decision Process is a function that assigns a value to each state or state-action pair. It represents the expected cumulative reward that the agent will receive if it starts in that state and follows a particular policy.

Consistent Classifier

A ... is a machine learning algorithm that converges to the true model as the sample size increases, ensuring that its predictions become more accurate with more data. For example, the kNN classifier is consistent if k → ∞ and k/N → 0 as the sample size N approaches infinity.

State

In a Markov Decision Process, a ... is a description of the current situation of the environment that the agent is in. It contains all the relevant information that the agent needs to make decisions.

Reinforcement Learning

Learning setting in which the agent makes a decision now but sometimes does not see any feedback until much later. Decisions made in the present can change what is possible in the future.

Voronoi cell

The ... of a training point x_j is the set of points x that are closer to x_j than to any other training point x_i. It is a region of space defined by a Voronoi diagram. In the kNN algorithm, the ... represents the area around a training point where that point is the closest neighbor.

Support Vector

a feature vector (or training example) that is linearly combined with other support vectors to produce the SVM decision boundary hyperplane.

Universal Kernel

a kernel function that is capable of approximating any continuous function with arbitrary accuracy. It is a desirable property for a kernel function to have, as it means that the machine learning algorithm can accurately model complex relationships between variables. The Gaussian (RBF) kernel is a well-known example of a ...

Decision Stump

a simple decision tree model that consists of only a single decision node and two leaf nodes. It is often used as a weak learner in boosting algorithms, where multiple decision stumps are combined to form a more complex model.

Eigenvector

an ... is a non-zero vector that, when multiplied by a square matrix, results in a scalar multiple of the same vector. The scalar multiple is called the eigenvalue of the matrix, and the set of all eigenvalues of a matrix is called its spectrum. In PCA, the ... and eigenvalues of the covariance matrix are used to compute the principal components.

ID3 (Iterative Dichotomiser 3)

decision tree algorithm that is used for classification problems. The algorithm works by recursively splitting the data based on the feature that provides the maximum information gain until a stopping criterion is met. It will always achieve zero training error if it is possible to do so.

Manhattan distance

distance metric used in machine learning to measure the distance between two points in a grid-like space. It is calculated as the sum of the absolute differences between the coordinates of the two points.
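
A minimal example:

def manhattan_distance(p, q):
    # Sum of absolute differences between corresponding coordinates.
    return sum(abs(a - b) for a, b in zip(p, q))

print(manhattan_distance((1, 2), (4, 6)))  # |1-4| + |2-6| = 7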

Kernel

function that transforms data into a higher-dimensional space, allowing it to be more easily separated and classified. It is commonly used in Support Vector Machines (SVMs) to map input data into a space where a linear classifier can be applied. The choice of ... function can significantly impact the performance of a machine learning algorithm.

Voronoi diagram

geometric structure that partitions a space into regions based on the distance to a set of points. In machine learning, ... are used to represent the nearest neighbors of a query point in the kNN algorithm.

K-means++

improvement over the K-means algorithm that improves the initialization of the cluster centroids. Instead of choosing the initial centroids randomly, ... selects them based on a probability distribution that favors points that are far from existing centroids. This initialization method often leads to faster convergence and better results than the random initialization used in K-means.
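
A sketch of just the seeding step, assuming a NumPy data matrix with one point per row (the function name and seed handling are illustrative):

import numpy as np

def kmeans_plus_plus_init(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]  # first centroid chosen uniformly at random
    for _ in range(k - 1):
        # Squared distance from each point to its nearest already-chosen centroid.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        # Sample the next centroid with probability proportional to that squared distance.
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)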

Cluster (Cluster Centroid/Cluster Representative/Cluster Center)

in ..., a ... centroid, ... representative, or ... center is a data point that represents the center or average of a ... It is used to characterize the ... and to assign new data points to the nearest ...

Vector similarity search

is a search technique used in machine learning to find similar data points in a high-dimensional space. It is commonly used in recommendation systems and search engines, where the goal is to find items or documents that are similar to a given query.

Gram Matrix

matrix of inner products between all pairs of data points in a dataset, i.e., a matrix containing the dot products of all pairs of vectors. It captures the similarity between each pair of data points and is often used in kernel methods such as Support Vector Machines (SVMs) to efficiently compute the inner products required for the kernel function.
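
For the plain linear kernel the ... is simply X X^T; for a general kernel k, entry (i, j) would be k(x_i, x_j). A minimal example with made-up data:

import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

G = X @ X.T   # G[i, j] is the dot product of example i and example j
print(G)      # 3x3 symmetric matrix of pairwise inner products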

Covariance matrix

matrix that describes the ... or the relationship between two or more variables in a dataset. It is a square matrix where the diagonal elements are the variances of the individual variables, and the off-diagonal elements are the ... between the variables.

Gini index

measure of impurity used in decision tree algorithms. It measures the probability of incorrectly classifying a randomly chosen data point based on the distribution of the classes in a subset of the data. The goal of the decision tree algorithm is to minimize the ... at each node of the tree.
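
A minimal sketch of the impurity computation for a subset of labels (the function name is illustrative):

import numpy as np

def gini_index(labels):
    # Probability of misclassifying a random point if it is labeled according to the class proportions.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_index([0, 0, 1, 1]))  # 0.5: maximally impure for two balanced classes
print(gini_index([1, 1, 1, 1]))  # 0.0: a pure subset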

Edge

in boosting, a measure of the quality of the classifier produced by the weak learner. It is defined as the difference between the accuracy of the classifier and the accuracy of random guessing, and it is used to assign weights to the data points in the next iteration of the algorithm.

Information gain

measure of the reduction in entropy or uncertainty achieved by splitting the data based on a particular feature. It is often used in decision tree algorithms, where the goal is to maximize the ... at each node to produce a tree that accurately classifies or predicts the outcome variable.
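
A minimal sketch: the ... of a split is the entropy of the parent node minus the size-weighted entropy of the child nodes (function names are illustrative):

import numpy as np

def entropy(labels):
    # Shannon entropy of the label distribution, in bits.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent_labels, child_label_lists):
    n = len(parent_labels)
    weighted_child_entropy = sum(len(c) / n * entropy(c) for c in child_label_lists)
    return entropy(parent_labels) - weighted_child_entropy

print(information_gain([0, 0, 1, 1], [[0, 0], [1, 1]]))  # 1.0 bit: a perfect split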

Weak learner

model or algorithm that performs slightly better than random guessing on a classification or regression problem. In machine learning, ... are often combined to form a strong learner, which can achieve high accuracy on the same problem.

K-means

popular clustering algorithm that partitions a dataset into k clusters, where k is a user-defined parameter. The algorithm works by iteratively assigning each data point to the nearest cluster centroid and updating the centroids based on the mean of the points in the cluster.
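
A minimal sketch of Lloyd's algorithm with random initialization, assuming a NumPy data matrix (a production implementation would also handle clusters that become empty):

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = np.argmin(dists, axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = np.array([X[assign == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign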

Radial Basis Function

popular kernel function used in machine learning. It measures the similarity between two data points as a function of the distance between them in a high-dimensional feature space. A kernel method with the ... kernel will always fit the training data perfectly if (1) no example appears twice in the training set, and (2) the regularization constant is 0. The ... kernel is often used in SVMs for classification and regression problems.
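
A minimal sketch of the kernel value for two points, using the common form exp(-gamma * ||x - y||^2) (the gamma parameterization is one convention among several):

import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Similarity decays exponentially with the squared distance between x and y.
    return float(np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2)))

print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # 1.0 for identical points
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))  # close to 0 for distant points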

Decision Tree

popular machine learning algorithm that uses a tree-like structure to model decisions and their possible consequences. It is often used in classification and regression problems and works by recursively partitioning the data into subsets based on the most significant features until a certain stopping criterion is met.

Curse of dimensionality

problem that arises in machine learning when the number of dimensions or features in a dataset is large. It leads to sparsity and makes it difficult for machine learning algorithms to find patterns and relationships in the data.

Split

process of dividing a set of data into two or more subsets based on a particular feature or criterion. In decision tree algorithms, the ... is the point at which the data is divided into two or more subsets, with the goal of maximizing the information gain or minimizing the impurity of the resulting subsets.

Dimensionality reduction

process of reducing the number of features or variables in a dataset while retaining most of the important information. It is a technique used in machine learning to improve model performance, reduce computation time, and visualize high-dimensional data.

Hypothesis class

set of possible prediction functions that can be used to learn from a set of data. The goal of the machine learning algorithm is to find the best model or function from the ... that accurately predicts the outcome variable on new data.

AdaBoost

specific boosting algorithm that commonly uses decision stumps as weak learners. It works by iteratively training decision stumps on weighted versions of the training data and adjusting the weights of the data points based on the errors of the previous iteration.
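
A self-contained sketch of discrete AdaBoost with exhaustive threshold stumps, assuming binary labels in {-1, +1} and a NumPy feature matrix (names and the stump search are illustrative, not the course's exact formulation):

import numpy as np

def train_stump(X, y, w):
    # Pick the (feature, threshold, sign) stump with the lowest weighted error.
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, n_rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # start with uniform weights on the data points
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, sign = train_stump(X, y, w)
        err = max(err, 1e-12)                  # guard against division by zero for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak learner in the final vote
        pred = np.where(X[:, j] <= thr, sign, -sign)
        w = w * np.exp(-alpha * y * pred)      # upweight misclassified points, downweight correct ones
        w = w / w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def adaboost_predict(ensemble, X):
    # Sign of the alpha-weighted vote over all stumps.
    score = sum(a * np.where(X[:, j] <= thr, s, -s) for a, j, thr, s in ensemble)
    return np.sign(score)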

Projection matrix

square matrix that maps a vector onto a subspace by ... it onto the basis vectors of the subspace. In PCA, the ... matrix is used to ... the original data onto the principal components.
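
A minimal example assuming the basis vectors of the subspace are orthonormal (as the principal components from PCA are), so the projection matrix is V V^T:

import numpy as np

V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])   # orthonormal basis of a 2-D subspace of R^3, stored as columns

P = V @ V.T                  # projection matrix onto the column space of V
x = np.array([1.0, 2.0, 3.0])
print(P @ x)                 # [1. 2. 0.]: the component of x that lies inside the subspace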

Symmetric Positive Definite Matrix

square matrix where all the eigenvalues are positive, and the matrix is symmetric. In PCA, the covariance matrix is a ... (strictly, it is positive semi-definite), which means that it has real and nonnegative eigenvalues.

