INF264 EXAM
Loss Function (Cost Function)
A function that measures the difference between the predicted values and the actual values, used to guide the training of a model.
Radial Basis Function
A real-valued function whose value depends only on the distance from a center point, used in neural networks and kernel methods such as SVMs to classify data that is not linearly separable.
Sigmoid Function
A mathematical function having an "S" shaped curve, often used as the activation function in logistic regression and neural networks to introduce nonlinearity.
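In symbols: sigma(x) = 1 / (1 + e^(-x)), which maps any real input to the interval (0, 1).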
Activation Function
A mathematical function in neural networks that determines the output of a node, given an input or set of inputs.
F1 Score
A measure of a model's accuracy in classification, calculated as the harmonic mean of precision and recall. More informative than accuracy on imbalanced datasets because it does not reward a model that only predicts the majority class; we are often more interested in the minority class.
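In symbols: F1 = 2 * precision * recall / (precision + recall).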
Information Gain
A measure of the effectiveness of an attribute in classifying a dataset, used in decision tree algorithms.
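In symbols: IG(S, A) = H(S) - sum over values v of A of (|S_v| / |S|) * H(S_v), where H is the entropy and S_v is the subset of S where attribute A takes value v.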
Bootstrap Sample
A sample drawn with replacement from a data set, used for estimating properties like the mean or variance of the population.
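A minimal NumPy sketch of drawing a bootstrap sample (the data array and seed are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    data = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # illustrative dataset

    # Draw n indices with replacement, then estimate the mean from the resample
    indices = rng.choice(len(data), size=len(data), replace=True)
    bootstrap_sample = data[indices]
    print(bootstrap_sample.mean())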
Test Data
A subset of a dataset not used in training the model, but rather used to assess its performance and generalization ability.
Validation Data
A subset of a dataset held out from training, used to evaluate the model during development and to tune hyperparameters and select between models.
Deep Learning
A subset of machine learning involving algorithms inspired by the structure and function of the brain called artificial neural networks, particularly effective in identifying patterns in unstructured data like images and sound.
Support Vector Machine (SVM)
A supervised learning model used for classification and regression analysis, which finds the decision boundary with the largest margin (gap) separating the classes in the data.
Cross-Validation
A technique for assessing how a predictive model will perform in practice, by repeatedly partitioning the original dataset into complementary training and validation subsets and averaging the evaluation results over the splits.
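A minimal scikit-learn sketch of 5-fold cross-validation (the dataset and model are illustrative choices):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    # Each fold is held out once for evaluation while the rest trains the model
    scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)
    print(scores.mean())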
Undersampling
A technique that adjusts the class distribution of a dataset by removing samples from the majority class to counteract imbalance.
Oversampling
A technique that adjusts the class distribution of a dataset by duplicating or synthesizing samples from the minority class to counteract imbalance.
Rectified Linear Unit (ReLU)
A type of activation function commonly used in neural networks, defined as the positive part of its argument.
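In symbols: ReLU(x) = max(0, x).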
Recall (Machine Learning)
Also known as sensitivity, a metric for classification models that measures the proportion of actual positives that are correctly identified as such: recall = TP / (TP + FN). Low recall corresponds to type 2 errors (false negatives).
Random Forest
An ensemble learning method for classification and regression that constructs multiple decision trees at training time and outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees. At each node, choose k features at random and pick the best split among them; typically k << d, where d is the total number of features.
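A minimal scikit-learn sketch (the dataset is illustrative); max_features="sqrt" corresponds to choosing k roughly equal to sqrt(d) random features per split:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    # 100 trees; each split considers a random subset of about sqrt(d) features
    clf = RandomForestClassifier(n_estimators=100, max_features="sqrt")
    clf.fit(X, y)
    print(clf.score(X, y))  # training accuracy, for illustration only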
Boosting
An ensemble technique that combines weak learners to create a strong learner by focusing on examples that previous learners misclassified. Trains a sequence of learners so that each subsequent model compensates for the errors made by earlier ones. Can reduce both bias and variance.
Missing Data (MCAR, MAR, MNAR)
Categories of missing data: MCAR (Missing Completely At Random), where missingness is independent of any factors; MAR (Missing At Random), where missingness is related to observed data; MNAR (Missing Not At Random), where missingness is related to unobserved data.
Types of Clustering Algorithms
Centroid-based (clusters defined through centers); density-based (through dense regions); connectivity-based (through connectivity between points); Gaussian mixture models (through probability distributions).
Basis Function
Functions that are combined in linear superposition to represent other functions. They allow modeling nonlinearity in the data while keeping the model linear in its parameters.
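For example, polynomial regression y = w0 + w1*x + w2*x^2 is nonlinear in the input x but linear in the weights w, so it can still be fit with ordinary least squares.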
Frequentist View
Objective; handles aleatory uncertainty. Probabilities are frequencies from repetitions of an experiment: P(A) is the proportion of repetitions in which A turns out to be true.
Ensemble Methods
Techniques that combine several machine learning models to produce better predictive performance than any single model alone.
Training Data
The dataset used to train a machine learning model, allowing the model to learn the relationships between inputs and outputs.
Occam's Razor
The simplest explanation is usually the most likely to be correct. Introduces an inductive bias: if two models perform equally well, always choose the simpler one.
Model Evaluation
The process of assessing the performance of a predictive model by comparing its predictions against actual outcomes.
Dimensionality Reduction
The process of reducing the number of random variables under consideration by obtaining a set of principal variables, often to improve model performance or data visualization.
Pruning
The process of removing parts of a decision tree or neural network to reduce complexity and avoid overfitting. For decision trees: replace a subtree with its majority label and check whether validation accuracy gets worse or better; if accuracy stays the same or improves, prune.
Imputation
The process of replacing missing data with substituted values in a dataset.
Feature Selection
The process of selecting a subset of relevant features for use in model construction to improve model accuracy or to reduce overfitting.
Back-Propagation
The method that computes the gradient of the error function with respect to the neural network's weights by applying the chain rule backwards through the layers.
Bagging
Short for Bootstrap Aggregating, a method that improves the stability and accuracy of machine learning algorithms by training multiple models on bootstrap samples and combining their predictions. Does not change bias, but reduces variance. Example: random forest.
Bayesian View
Subjective; probability as a degree of belief. Handles both aleatory and epistemic uncertainty: "I believe to the extent P(A) that A is true."
Inductive Bias
The set of assumptions a learning algorithm uses to make predictions beyond the observed data.
Neural Network
A computational model inspired by the structure of the brain, consisting of layers of interconnected nodes (neurons) that can learn to recognize patterns and make decisions.
Overfitting
A modeling error that occurs when a function is too closely aligned to a limited set of data points, leading to poor predictive performance on new data. Indicated when training accuracy is much higher than validation accuracy. Mitigated by choosing a simpler model or adding regularization, e.g. limiting max depth in a decision tree or using a higher k in k-NN.
Underfitting
A modeling error that occurs when a machine learning model is too simple to capture the underlying pattern in the data, leading to poor performance. Indicated when both training and validation accuracy are low. Mitigated by transforming the data or using a more complex model.
Decision Tree
A model that uses a tree-like graph of decisions and their possible consequences, widely used in classification and regression tasks.
Curse of Dimensionality
A phenomenon where the performance of an algorithm degrades as the number of features or dimensions in the dataset increases.
Regression
A type of predictive modeling technique which investigates the relationship between a dependent (target) and independent variable(s) (predictors).
Classification
A type of supervised learning where the output is a category, such as 'spam' or 'not spam' in email filtering.
Adding more basis functions in a linear model...
Decreases bias (and typically increases variance).
Posterior distribution
The distribution of the parameters after seeing the data.
Generalization
The ability of a machine learning model to perform well on new, unseen data.
Regularization (Regression)
To reduce overfitting, one may want to penalize complexity: any modification to the objective function (or, more generally, the learning algorithm) that is intended to reduce generalization error but not training error.
Naive Bayes
A family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between features.
Kernel
A function that computes inner products between data points in a (possibly high-dimensional) feature space without explicitly mapping the data there, commonly used in support vector machines.
Bias-Variance Tradeoff
A fundamental problem in supervised learning where decreasing the bias increases the variance, and vice versa.
Mean Absolute Error (MAE)
A measure of errors between paired observations expressing the same phenomenon, calculated as the average of the absolute differences between predicted values and observed values. If being off by 20 is exactly twice as bad as being off by 10, MAE is the better choice.
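In symbols: MAE = (1/n) * sum over i of |y_i - yhat_i|.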
Hierarchical Clustering
A method of cluster analysis which seeks to build a hierarchy of clusters.
Precision (Machine Learning)
A metric for classification models, calculated as the number of true positive predictions divided by the total number of positive predictions: precision = TP / (TP + FP). Low precision corresponds to type 1 errors (false positives).
K-Means
A popular clustering algorithm that partitions a dataset into K clusters, each represented by the mean of its points.
K-Nearest Neighbors (k-NN)
A simple, instance-based learning algorithm that classifies new cases based on a similarity measure (e.g., distance functions).
Imbalanced Data
A situation in datasets where some classes are significantly more frequent than others, which can lead to poor performance in predictive modeling.
Root Mean Squared Error (RMSE)
A standard way to measure the error of a model in predicting quantitative data, calculated as the square root of the average squared differences between predicted and actual values. Use RMSE if you want to give more weight to large errors, since squaring penalizes them more heavily.
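In symbols: RMSE = sqrt((1/n) * sum over i of (y_i - yhat_i)^2).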
Linear Regression
A statistical method for modeling the relationship between a dependent variable and one or more independent variables, e.g. y = ax + b for a single predictor.
Logistic Regression
A statistical model that in its basic form uses a logistic function to model a binary dependent variable, often used for classification tasks.
Principal Component Analysis (PCA)
A statistical technique used to reduce the dimensionality of a dataset by transforming to a new set of variables (principal components) that are linear combinations of the original variables, chosen to preserve as much of the variance as possible.
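A minimal scikit-learn sketch of PCA (the dataset and number of components are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)
    # Project the 4-dimensional data onto the 2 directions of highest variance
    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)
    print(pca.explained_variance_ratio_)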
Confusion Matrix
A table used to evaluate the performance of a classification algorithm, showing the number of correct and incorrect predictions made by the model.
No Free Lunch Theorem
A theorem in machine learning stating that no one algorithm works best for every problem, emphasizing the importance of choosing the right algorithm for each specific problem.
Unsupervised Learning
A type of machine learning that involves training a model on data with no pre-existing labels and allowing the model to act on that information without guidance.
Supervised Learning
A type of machine learning where the model is trained on a labeled dataset, learning to predict outputs from inputs.
Autoencoder
A type of neural network used to learn efficient codings of unlabeled data, typically for dimensionality reduction.
Clustering
A type of unsupervised learning that groups similar data points together based on certain characteristics or features. Examples: k-means, DBSCAN, single linkage, Gaussian mixture models.
Lloyd's Algorithm
An iterative heuristic procedure used to partition a space into a number of regions, commonly used in the k-means clustering algorithm.
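A minimal NumPy sketch of Lloyd's iterations as used in k-means (the function name and convergence test are illustrative; assumes no cluster becomes empty):

    import numpy as np

    def lloyd_kmeans(X, k, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        # Initialize centroids as k distinct random data points
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assignment step: each point joins its nearest centroid
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Update step: move each centroid to the mean of its assigned points
            new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_centroids, centroids):  # converged
                break
            centroids = new_centroids
        return centroids, labels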
Gradient Descent
An optimization algorithm used to minimize some function by iteratively moving in the direction of the steepest descent, as defined by the negative of the gradient.
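Update rule: w <- w - eta * grad L(w), where eta is the learning rate and grad L(w) is the gradient of the loss at the current parameters.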
Stochastic Gradient Descent
An optimization method for training a model that updates parameters using a randomly selected subset (mini-batch) of the data at each step, rather than the entire dataset.
Regularization
Does not reduce model capacity directly, but penalizes complex models, steering learning toward simpler models that perform just as well. Any technique that imposes penalties on model parameters to prevent overfitting and improve generalizability.
Bayesian prediction...
Gives a probability distribution over labels
Decision Boundary
In classification, the surface that separates different classes in the feature space, determined by the chosen model.
Entropy
In machine learning, a measure of the unpredictability or randomness in a dataset, often used in the context of decision trees and information theory.
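In symbols: H(S) = - sum over classes i of p_i * log2(p_i), where p_i is the proportion of class i in S; entropy is 0 for a pure set and maximal for a uniform class distribution.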
Variance in data
In machine learning, the extent to which a model's predictions vary for a given data point, reflecting the model's sensitivity to fluctuations in the training dataset. Part of the bias-variance tradeoff: high variance with low bias leads to overfitting.
Learning Rate (Step Size)
In optimization algorithms like gradient descent, the size of the steps taken towards the minimum of the function.
Solutions to Missing Data
- Remove the data point (usually a bad option)
- Impute (infer the missing value): e.g. use the training-set average, or model it from its relationship with the other variables; note that imputation can bias the model
- Use naive Bayes, which handles missing values well
A naive Bayes classifier...
Selects the class with the highest posterior probability.
Fairness
The concept of ensuring that a machine learning model does not create or reinforce unfair biases against certain groups or individuals.
Model Selection
The process of selecting one final machine learning model from among a set of candidate models.
Feature Extraction
The process of transforming raw data into a set of features that can be effectively used in machine learning.
Accuracy
The proportion of correct predictions made by a model, calculated as the number of correct predictions divided by the total number of predictions.
Epistemic Uncertainty
The type of uncertainty that results from lack of knowledge. Observations can be obtained that reduce this uncertainty, and two observers may have different amounts of it.
Aleatory Uncertainty
The type of uncertainty due to inherent randomness. No observations can be obtained that reduce this uncertainty.
Type 2 Error
A false negative conclusion (related to recall): the test result says you don't have coronavirus, but you actually do.
Type 1 Error
A false positive conclusion (related to precision): the test result says you have coronavirus, but you actually don't.
Single Linkage
Defines the distance between clusters as the shortest distance between any two points in the clusters. Tends to produce long chains (a -> b -> c -> d).
Complete Linkage
Defines the distance between two clusters as the maximum distance between any two points in the clusters. Favors compact, roughly "spherical" clusters with consistent "diameter".
Bayesians can model uncertainty due to...
lack of knowledge
Bias
Refers to the error introduced by approximating a real-world problem that may not fit the assumptions of the algorithm. Forms of systematic error: inductive bias (assumptions made by the choice of algorithm); bias toward seen data (choices based on the data); human bias (choices we make based on feelings); sampling, survivorship, and measurement bias (introduced when collecting data).
Regularization (Deep Learning)
Used to reduce overfitting. Techniques include: parameter norm regularization (a penalty term in the objective function); parameter sharing (forcing sets of parameters to be equal, as in CNNs); early stopping (stop when the validation loss increases); dropout (for each minibatch, randomly ignore part of the input and hidden units); adversarial training.