INF264 EXAM

Loss Function (Cost Function)

A function that measures the difference between the predicted values and the actual values, used to guide the training of a model.
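
As a minimal sketch (the function name is illustrative), mean squared error is one common loss:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error: average squared gap between predictions and targets."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

print(mse_loss([1.0, 2.0, 3.0], [1.1, 1.9, 3.5]))  # 0.09
```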

Radial Basis Function

A function used in various types of neural networks and machine learning algorithms, particularly in kernel methods like SVMs, for classifying data that is not linearly separable.
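
A common choice is the Gaussian RBF, k(x, y) = exp(-gamma * ||x - y||^2); a minimal NumPy sketch (gamma is an illustrative width parameter):

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel: similarity decays with squared Euclidean distance."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.exp(-gamma * np.sum((x - y) ** 2))

print(rbf_kernel([0, 0], [0, 0]))  # 1.0: identical points
print(rbf_kernel([0, 0], [3, 4]))  # ~1.4e-11: distant points
```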

Sigmoid Function

A mathematical function having an "S" shaped curve, often used as the activation function in logistic regression and neural networks to introduce nonlinearity.
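
The standard form is sigmoid(x) = 1 / (1 + e^(-x)); a minimal sketch:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))    # 0.5
print(sigmoid(10.0))   # ~1.0
print(sigmoid(-10.0))  # ~0.0
```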

Activation Function

A mathematical function in neural networks that determines the output of a node, given an input or set of inputs.

F1 Score

A measure of a model's accuracy in classification, calculated as the harmonic mean of precision and recall. More informative than accuracy on imbalanced datasets because a model that predicts only the majority class gets a poor score, and we are often more interested in the minority class.
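
F1 = 2 * precision * recall / (precision + recall); a minimal sketch from confusion-matrix counts (the counts are illustrative):

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# A model that only predicts the majority (negative) class finds no positives,
# so its recall is 0 and F1 collapses even when accuracy looks high.
print(f1_score(tp=80, fp=20, fn=40))  # ~0.727
```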

Information Gain

A measure of the effectiveness of an attribute in classifying a dataset, used in decision tree algorithms.
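
Information gain is the parent's entropy minus the size-weighted entropy of the children after a split; a minimal sketch with illustrative labels:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label distribution, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the child splits."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

parent = [0, 0, 1, 1]
print(information_gain(parent, [[0, 0], [1, 1]]))  # 1.0: a perfect split
print(information_gain(parent, [[0, 1], [0, 1]]))  # 0.0: an uninformative split
```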

Bootstrap Sample

A sample drawn with replacement from a data set, used for estimating properties like the mean or variance of the population.
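
A minimal NumPy sketch of bootstrapping the mean (the data values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Each bootstrap sample draws n points WITH replacement, so some points repeat
# and others are left out; the spread of the resampled means estimates the
# variance of the sample mean.
boot_means = [rng.choice(data, size=len(data), replace=True).mean()
              for _ in range(1000)]
print(np.mean(boot_means), np.std(boot_means))
```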

Test Data

A subset of a dataset not used in training the model, but rather used to assess its performance and generalization ability.

Validation Data

A subset of a dataset used to provide an unbiased evaluation of a model fit during the training phase, used to tune hyperparameters and guide model selection.

Deep Learning

A subset of machine learning involving algorithms inspired by the structure and function of the brain called artificial neural networks, particularly effective in identifying patterns in unstructured data like images and sound.

Support Vector Machine (SVM)

A supervised learning model used for classification and regression analysis, which finds the best margin (or gap) that separates classes in the data.

Cross-Validation

A technique for assessing how a predictive model will perform in practice, by repeatedly partitioning the original dataset into a training set to fit the model and a held-out set to evaluate it, then averaging the results.
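
For example, k-fold cross-validation rotates which fold is held out; a minimal sketch using scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: each fold serves once as the held-out evaluation
# set while the remaining four folds are used for training.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores.mean())
```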

Undersampling

A technique for counteracting class imbalance in which examples of the majority class are removed (sampled down) to balance the class distribution.

Oversampling

A technique for counteracting class imbalance in which examples of the minority class are duplicated or otherwise resampled (sampled up) to balance the class distribution.

Rectified Linear Unit (ReLU)

A type of activation function commonly used in neural networks, defined as the positive part of its argument.
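
ReLU(x) = max(0, x); a one-line sketch:

```python
import numpy as np

def relu(x):
    """ReLU: passes positive inputs through, zeroes out negative ones."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
```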

Recall (Machine Learning)

Also known as sensitivity, a metric for classification models that measures the proportion of actual positives that are correctly identified as such. Recall is related to type 2 errors (false negatives).

Random Forest

An ensemble learning method for classification and regression that constructs multiple decision trees at training time and outputs the mode of the classes or the mean prediction of the individual trees. At each node in each decision tree, choose k features at random and pick the best split among them (typically k << d, where d is the total number of features).
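
A minimal scikit-learn sketch; the max_features argument corresponds to k above ("sqrt" sets k to the square root of d):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Each tree is trained on a bootstrap sample; at each split only a random
# subset of max_features features is considered.
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```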

Boosting

An ensemble technique that combines weak learners into a strong learner by focusing on examples that previous learners misclassified. Trains a sequence of learners so that subsequent models compensate for the errors made by earlier ones. Can reduce both bias and variance.

Missing Data (MCAR, MAR, MNAR)

Categories of missing data: MCAR (Missing Completely At Random), where missingness is independent of any factors; MAR (Missing At Random), where missingness is related to observed data; MNAR (Missing Not At Random), where missingness is related to unobserved data.

Types of Clustering Algorithms

Centroid-based: clusters defined through centers (e.g., k-means)
Density-based: through dense areas (e.g., DBSCAN)
Connectivity-based: through connectivity between points (e.g., hierarchical clustering)
Gaussian mixture models: through probability distributions

Basis Function

Functions that are combined in linear superposition to represent other functions in terms of a coordinate system or framework. They allow modeling nonlinearity in the data while keeping linearity in the parameters.

Frequentist View

Objective; handles aleatory uncertainty. Probabilities are frequencies from repetitions of experiments: P(A) is the proportion of times that A turns out to be true.

Ensemble Methods

Techniques that combine several machine learning models to produce better predictive performance than any single model alone.

Training Data

The dataset used to train a machine learning model, allowing the model to learn the relationships between inputs and outputs.

Occam's Razor

The simplest explanation is usually the most likely to be correct. This introduces an inductive bias: if two models perform equally well, always choose the simpler one.

Model Evaluation

The process of assessing the performance of a predictive model by comparing its predictions against actual outcomes.

Dimensionality Reduction

The process of reducing the number of random variables under consideration by obtaining a set of principal variables, often to improve model performance or data visualization.

Pruning

The process of removing parts of a decision tree or neural network to reduce complexity and avoid overfitting. For decision trees: replace a subtree with the majority label and check whether accuracy gets worse or better; if accuracy stays the same or improves, prune.

Imputation

The process of replacing missing data with substituted values in a dataset.

Feature Selection

The process of selecting a subset of relevant features for use in model construction to improve model accuracy or to reduce overfitting.

Back-Propagation

The method that calculates the gradient of the error function with respect to the neural network's weights, by repeatedly applying the chain rule backwards through the layers.
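
A minimal sketch for a single sigmoid unit with squared-error loss, applying the chain rule by hand (all values are illustrative):

```python
import numpy as np

# Forward: y_hat = sigmoid(w*x + b); loss L = (y_hat - y)^2.
# Backward (chain rule): dL/dw = 2*(y_hat - y) * y_hat*(1 - y_hat) * x.
x, y = 2.0, 1.0
w, b, lr = 0.5, 0.0, 0.1

for _ in range(100):
    y_hat = 1.0 / (1.0 + np.exp(-(w * x + b)))
    grad = 2 * (y_hat - y) * y_hat * (1 - y_hat)  # dL/d(pre-activation)
    w -= lr * grad * x                            # dL/dw
    b -= lr * grad                                # dL/db

print(y_hat)  # moves toward the target 1.0
```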

Bagging

Short for Bootstrap Aggregating, a method that improves the stability and accuracy of machine learning algorithms by combining multiple models trained on bootstrap samples. Does not change bias but reduces variance. Example: random forest.

Bayesian View

Subjective; probability as a degree of belief. Handles both aleatory and epistemic uncertainty: "I believe to the extent of P(A) that A is true."

Inductive Bias

The set of assumptions a learning algorithm uses to make predictions beyond the observed data.

Neural Network

A computational model inspired by the structure of the brain, consisting of layers of interconnected nodes (neurons) that can learn to recognize patterns and make decisions.

Overfitting

A modeling error that occurs when a function is too closely aligned to a limited set of data points, leading to poor predictive performance on new data. Indicated when training accuracy is much higher than validation accuracy. Mitigated by choosing a simpler model or adding regularization, e.g., limiting max depth in a decision tree or using a higher k in k-NN.

Underfitting

A modeling error that occurs when a machine learning model is too simple to capture the underlying pattern in the data, leading to poor performance. Indicated when both training and validation accuracy are low. Mitigated by transforming the data or using a more complex model.

Decision Tree

A model that uses a tree-like graph of decisions and their possible consequences, widely used in classification and regression tasks.

Curse of Dimensionality

A phenomenon where the performance of an algorithm degrades as the number of features or dimensions in the dataset increases.

Regression

A type of predictive modeling technique which investigates the relationship between a dependent (target) and independent variable(s) (predictors).

Classification

A type of supervised learning where the output is a category, such as 'spam' or 'not spam' in email filtering.

Adding more basis functions in a linear model...

Decreases bias (but typically increases variance).

Posterior distribution

The distribution of the parameters after seeing the data.

Generalization

The ability of a machine learning model to perform well on new, unseen data.

Regularization regression

To reduce overfitting, one may want to penalize complexity. Regularization is any modification to the objective function (or, more generally, the learning algorithm) that is intended to reduce generalization error but not training error.

Naive Bayes

A family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between features.
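
A minimal scikit-learn sketch using the Gaussian variant, which assumes each feature follows a per-class normal distribution:

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Prediction selects the class with the highest posterior probability.
clf = GaussianNB().fit(X, y)
print(clf.predict(X[:2]))
print(clf.predict_proba(X[:2]))  # posterior distribution over the classes
```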

Kernel

A function that computes inner products in a (possibly high-dimensional) feature space without explicitly mapping the data into that space, commonly used in support vector machines.

Bias-Variance Tradeoff

A fundamental problem in supervised learning where decreasing the bias increases the variance, and vice versa.

Mean Absolute Error (MAE)

A measure of errors between paired observations expressing the same phenomenon, calculated as the average of the absolute differences between predicted values and observed values. If being off by 20 is exactly twice as bad as being off by 10, MAE is the better choice, since it weights all errors linearly.

Hierarchical Clustering

A method of cluster analysis which seeks to build a hierarchy of clusters.

Precision machine learning

A metric for classification models, calculated as the number of true positive predictions divided by the total number of positive predictions. Precision is related to type 1 errors (false positives).

K-Means

A popular clustering algorithm that partitions a dataset into K clusters, each represented by the mean of its points.

K-Nearest Neighbors (k-NN)

A simple, instance-based learning algorithm that classifies new cases based on a similarity measure (e.g., distance functions).

Imbalanced Data

A situation in datasets where some classes are significantly more frequent than others, which can lead to poor performance in predictive modeling.

Root Mean Squared Error (RMSE)

A standard way to measure the error of a model in predicting quantitative data, calculated as the square root of the average squared differences between predicted and actual values. Use RMSE if you want to give more weight to large errors, since squaring penalizes them disproportionately.
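
A minimal sketch contrasting RMSE with the MAE above: the two error sets have the same MAE, but RMSE penalizes the single large error more heavily (function names are illustrative):

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

y_true = [0.0, 0.0, 0.0, 0.0]
y_small = [10.0, 10.0, 10.0, 10.0]  # four moderate errors
y_large = [40.0, 0.0, 0.0, 0.0]     # one large error, same total

print(mae(y_true, y_small), rmse(y_true, y_small))  # 10.0 10.0
print(mae(y_true, y_large), rmse(y_true, y_large))  # 10.0 20.0: RMSE punishes the outlier
```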

Linear Regression

A statistical method for modeling the relationship between a dependent variable and one or more independent variables; in the simplest case, y = ax + b.
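
A minimal NumPy sketch fitting y = ax + b by least squares (the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)  # true a = 2, b = 1, plus noise

a, b = np.polyfit(x, y, deg=1)  # least-squares fit of a degree-1 polynomial
print(a, b)                     # close to 2 and 1
```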

Logistic Regression

A statistical model that in its basic form uses a logistic function to model a binary dependent variable, often used for classification tasks.

Principal Component Analysis (PCA)

A statistical technique used to reduce the dimensionality of a dataset by transforming it to a new set of variables (principal components) that are linear combinations of the original variables, chosen to preserve as much variance as possible.
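
A minimal NumPy sketch reducing 3 dimensions to 2 via the covariance eigendecomposition (the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.1])  # unequal variances

# Center the data, then project onto the top eigenvectors of the covariance
# matrix: the directions of greatest variance.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending order
components = eigvecs[:, ::-1][:, :2]                         # top 2 components
X_reduced = Xc @ components
print(X_reduced.shape)  # (200, 2)
```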

Confusion Matrix

A table used to evaluate the performance of a classification algorithm, showing the number of correct and incorrect predictions made by the model.

No Free Lunch Theorem

A theorem in machine learning stating that no one algorithm works best for every problem, emphasizing the importance of choosing the right algorithm for each specific problem.

Unsupervised Learning

A type of machine learning that involves training a model on data with no pre-existing labels and allowing the model to act on that information without guidance.

Supervised Learning

A type of machine learning where the model is trained on a labeled dataset, learning to predict outputs from inputs.

Autoencoder

A type of neural network used to learn efficient codings of unlabeled data, typically for dimensionality reduction.

Clustering

A type of unsupervised learning that groups similar data points together based on certain characteristics or features. Examples: k-means, DBSCAN, single linkage, Gaussian mixture models.

Lloyd's Algorithm

An iterative heuristic procedure used to partition a space into a number of regions, commonly used in the k-means clustering algorithm.
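
A minimal NumPy sketch (it omits edge cases such as empty clusters):

```python
import numpy as np

def lloyds(X, k, iters=50, seed=0):
    """Alternate between assigning points to the nearest center and
    recomputing each center as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centers, labels = lloyds(X, k=2)
print(centers)  # one center near (0, 0), one near (5, 5)
```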

Gradient Descent

An optimization algorithm used to minimize some function by iteratively moving in the direction of the steepest descent, as defined by the negative of the gradient.
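
A minimal sketch minimizing f(x) = (x - 3)^2, whose gradient is 2(x - 3):

```python
x = 0.0
lr = 0.1  # learning rate (step size)
for _ in range(100):
    x -= lr * 2 * (x - 3)  # step against the gradient
print(x)  # converges to the minimum at 3.0
```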

Stochastic Gradient Descent

An optimization method for training a model that updates parameters using randomly selected subsets (mini-batches) of the data at each step, rather than the entire dataset.

Regularization

A technique that imposes penalties on model parameters to prevent overfitting and improve generalizability. It does not restrict model complexity directly, but punishes complex models, steering us toward less complex models that perform equally well.
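
A minimal sketch of an L2 (ridge) penalty added to a squared-error objective; lam is the regularization strength and all values are illustrative:

```python
import numpy as np

def ridge_loss(w, X, y, lam):
    """Squared error plus an L2 penalty: large weights are punished, steering
    the fit toward simpler (smaller-coefficient) models."""
    residual = X @ w - y
    return np.mean(residual ** 2) + lam * np.sum(w ** 2)

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, 0.1])
print(ridge_loss(w, X, y, lam=0.0))  # unregularized loss
print(ridge_loss(w, X, y, lam=1.0))  # same fit, higher penalized loss
```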

Bayesian prediction...

Gives a probability distribution over labels

Decision Boundary

In classification, the surface that separates different classes in the feature space, determined by the chosen model.

Entropy

In machine learning, a measure of the unpredictability or randomness in a dataset, often used in the context of decision trees and information theory.

Variance in data

In machine learning, the extent to which a model's predictions vary for a given data point, reflecting the model's sensitivity to fluctuations in the training dataset. One side of the bias-variance tradeoff: high variance with low bias gives overfitting.

Learning Rate (Step Size)

In optimization algorithms like gradient descent, the size of the steps taken towards the minimum of the function.

Solutions to missing data

Remove the data point (usually a bad option)
Impute (inference): substitute the average over the training set, or model the missing value from its relationship with the other features; note that imputation can bias your model
Use a method such as naive Bayes that handles missing values well

A naive bayes classifier...

Selects the class with the highest posterior probability.

Fairness

The concept of ensuring that a machine learning model does not create or reinforce unfair biases against certain groups or individuals.

Model Selection

The process of selecting one final machine learning model from among a set of candidate models.

Feature Extraction

The process of transforming raw data into a set of features that can be effectively used in machine learning.

Accuracy

The proportion of correct predictions made by a model, calculated as the number of correct predictions divided by the total number of predictions.

Epistemic Uncertainty

Type of uncertainty that results from lack of knowledge. We are able to obtain observations that can reduce this uncertainty, and two observers may have different uncertainty of this type.

Aleatory uncertainty

Type of uncertainty due to inherent randomness. We are not able to obtain observations that can reduce this uncertainty.

Type 2 error

A false negative conclusion (related to recall): the test result says you don't have coronavirus, but you actually do.

Type 1 error

A false positive conclusion (related to precision): the test result says you have coronavirus, but you actually don't.

Single Linkage

Defines the distance between two clusters as the shortest distance between any two points in the clusters. Tends to produce long chains (a -> b -> c -> d).

Complete Linkage

Defines the distance between two clusters as the maximum distance between any two points in the clusters. Forces "spherical" clusters with consistent "diameter".

Bayesians can model uncertainty due to...

lack of knowledge

Bias

Refers to systematic error: the error introduced by approximating a real-world problem that may not fit the assumptions of the algorithm. Sources include:
Inductive bias (assumptions made by the choice of algorithm)
Bias toward seen data (choices based on the data)
Human bias (choices we make based on feelings)
Sample/survivorship/measurement bias (introduced when collecting data)

Regularization deep learning

Used to reduce overfitting in deep networks. Techniques include:
Parameter norm regularization (a penalty term in the objective function)
Parameter sharing (forcing sets of parameters to be equal, as in CNNs)
Early stopping (stop when validation loss increases)
Dropout (for each minibatch, randomly ignore part of the input and hidden units)
Adversarial training

