BMEN35 All lectures
- Plot three different cost functions for binary classification?
They all look roughly alike, ranked by increasing slope: - Logistic loss - Hinge loss (slightly above the logistic loss for misclassified points) - Squared hinge loss (steep slope)
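As a rough sketch of the three curves, the losses can be evaluated on a few values of the margin y * f(x), assuming labels y in {-1, +1}. The function names and the chosen margin values are illustrative assumptions, not taken from the lecture.

```python
import math

def logistic_loss(margin):
    # ln(1 + exp(-margin)), where margin = y * f(x) and y is -1 or +1
    return math.log(1.0 + math.exp(-margin))

def hinge_loss(margin):
    # Zero once the margin is at least 1, linear below that
    return max(0.0, 1.0 - margin)

def squared_hinge_loss(margin):
    # Same zero region as the hinge loss, but grows quadratically
    return max(0.0, 1.0 - margin) ** 2

# Hypothetical margins: negative means misclassified
margins = [-2.0, -1.0, 0.0, 1.0, 2.0]
for m in margins:
    print(f"margin={m:+.1f}  logistic={logistic_loss(m):.3f}  "
          f"hinge={hinge_loss(m):.3f}  squared_hinge={squared_hinge_loss(m):.3f}")
```

Plotting these values against the margin gives the figure asked for: all three decrease as the margin grows, and the squared hinge loss rises fastest for negative margins.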
- Explain the concept of Bagging?
Bagging (bootstrap aggregating) is a machine learning ensemble method that involves training multiple models on different bootstrap samples of the training data (drawn with replacement) and averaging their predictions, or taking a majority vote for classification. The goal of bagging is to reduce the variance of the final model by aggregating the predictions of multiple models, which can be more robust than a single model.
- What is said about deidentification in GDPR?
GDPR defines de-identification as the process of rendering personal data into a form that is not directly or indirectly identifiable, such that the data cannot be linked to a specific individual without the use of additional information. This may involve the removal or masking of directly identifying information, as well as the use of techniques such as aggregation or perturbation to obscure the data.
- What is said about processing of personal data in GDPR?
GDPR requires that personal data may only be processed if there is a legal basis for the processing, such as the explicit consent of the data subject or the need to fulfill a contract. - Also storage of personal data for a short period of time, such as in a search engine's "intermediate" memory or "cache memory" constitutes a processing of personal data in the meaning of the GDPR. This broad definition makes it almost impossible to handle personal data in practice without processing it in the sense of the GDPR.
- Which are the requirements when using personal data in an open network according to GDPR?
GDPR requires that personal data that is shared in an open network, such as on the internet, must be adequately secured and that appropriate measures are in place to protect the data from unauthorized access or misuse. This may include measures such as encryption, access controls, and pseudonymization.
- What level of risk is acceptable?
In order to achieve effective anonymisation, the risk of identifying an individual needs to be reduced to a level that ensures GDPR's requirements for anonymised personal data.
- What does GDPR say about data sharing from a hospital to a company that only performs services to the hospital?
In this case, GDPR does not classify the supplier as a third party but as part of the healthcare provider.
- Explain Linear regression?
Linear regression is a statistical method used to model the linear relationship between a dependent variable and one or more independent variables. It is used to make predictions about the response variable based on the values of the predictor variables.
- Explain Logistic regression?
Logistic regression is a statistical method used to model the probability of a binary outcome, such as success or failure, based on one or more predictor variables. It is used to predict the probability of an event occurring, such as whether a customer will churn or not. In logistic regression, the output of a linear model is passed through the logistic function, which has the following form: p = 1 / (1 + e^(-y)), where y is the output of the linear model and p is the predicted probability.
- Is there a model involved in KNN?
I think so? You choose the hyperparameter k yourself, but no training is needed before a prediction is made; the stored training data itself acts as the model.
- What is meant by profiling in GDPR? Are there exceptions?
Profiling in GDPR refers to the automated processing of personal data to evaluate certain aspects of an individual's characteristics, such as their performance at work or their creditworthiness. There are exceptions to the prohibition on profiling in GDPR, such as when it is necessary for the performance of a contract or when it is carried out in the course of legitimate activities with appropriate safeguards.
- Explain Pseudo anonymisation?
Pseudo-anonymization is a technique that removes or masks directly identifying information from personal data, while retaining some indirectly identifying information. It is used to reduce the risk of re-identification, but is not a complete anonymization solution.
- Plot three different cost functions for regression?
Squared Error (SE): a quadratic, 'U'-shaped curve. Absolute Error (AE): an absolute-value, 'V'-shaped curve. Huber loss: a flatter quadratic near zero that continues as straight lines. This cost function is a combination of SE and AE. It is defined as SE for small errors and AE for large errors. It is less sensitive to outliers than SE and is more robust to large errors. However, it requires choosing the threshold between the two regimes, and unlike SE it has no closed-form solution for linear regression.
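A minimal sketch of the three regression losses as functions of the error e = y - prediction. The Huber convention used here (0.5 * e^2 inside the threshold `delta`) is one common choice and is an assumption; the course material may scale it differently.

```python
def squared_error(e):
    # 'U'-shaped: grows quadratically with the error
    return e ** 2

def absolute_error(e):
    # 'V'-shaped: grows linearly with the error
    return abs(e)

def huber_loss(e, delta=1.0):
    # Quadratic for small errors, linear for large ones (assumed convention)
    if abs(e) <= delta:
        return 0.5 * e ** 2
    return delta * (abs(e) - 0.5 * delta)

# Hypothetical errors, including one outlier-sized error
for e in [0.0, 0.5, 1.0, 3.0]:
    print(e, squared_error(e), absolute_error(e), huber_loss(e))
```

For the large error the squared error dominates, which is why it is the most sensitive of the three to outliers.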
- What does Stochastic gradient descent mean? What are Epochs and Batches?
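Stochastic gradient descent (SGD) is an optimization algorithm that uses small batches of data to compute the gradient of the cost function and update the model parameters. A batch (mini-batch) is the subset of the training data used for one gradient computation and parameter update, and an epoch is one complete pass through the training data, i.e. all batches have been used once. SGD runs for a specified number of epochs, cycling through the training data and making updates until convergence or a minimum is reached. SGD is efficient for large-scale machine learning tasks, but can be sensitive to the learning rate.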
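The relation between epochs and batches can be sketched with a small mini-batch SGD loop for a one-feature linear model with squared error. The data, the function name and the hyperparameter values below are illustrative assumptions.

```python
import random

def sgd_linear_regression(xs, ys, epochs=500, batch_size=2,
                          learning_rate=0.05, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):                      # one epoch = one pass over the data
        rng.shuffle(indices)                     # new random batch split every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error, computed on this batch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = sgd_linear_regression(xs, ys)
print(w, b)
```

With four data points and a batch size of two, each epoch consists of two parameter updates.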
- Why was the LSTM developed from the RNN?
The Long Short-Term Memory (LSTM) network was developed to address the problem of vanishing gradients in traditional RNNs. It does this by using special units called memory cells, which can store information for long periods of time, as well as gates that control the flow of information into and out of the cells.
- What is the main principle behind the k-NN method?
The main principle behind the k-NN (k-nearest neighbors) method is that the output value for a given data point is determined based on the output values of the k closest data points in the training set.
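A minimal sketch of k-NN classification by majority vote, assuming Euclidean distance. The toy points and labels are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    # Sort all training points by their distance to the query point
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Majority vote among the k closest points
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: two clusters
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))
print(knn_predict(points, labels, (5.5, 5.5)))
```

For regression, the vote would be replaced by the average of the k nearest output values.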
- Explain the risks of distinctiveness, linkability and inference in anonymisation described in GDPR?
The risks of distinctiveness, linkability, and inference in anonymization are related to the potential for personal data to be re-identified or linked to other data sources, even if it has been anonymized. - Distinctiveness refers to the risk that the data is unique or characteristic enough to be linked to a specific individual. - Linkability refers to the risk that the data can be linked to other data sources to re-identify the individual. - Inference refers to the risk that the data can be used to make inferences about the individual, even if the data itself does not directly identify the individual.
- What happens to variance when using bagging?
When using bagging, the variance of the final model is typically reduced compared to a single model. This is because bagging averages the predictions of multiple models, which can help reduce the impact of outliers and reduce overfitting.
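The variance reduction can be illustrated with a small simulation. It makes a strong simplifying assumption: each "model" is just the true value plus independent Gaussian noise. Real bagged models are correlated because they share training data, so the reduction in practice is smaller than shown here.

```python
import random
import statistics

rng = random.Random(0)

def noisy_model_prediction():
    # Stand-in for one trained model: true value 10 plus noise (assumption)
    return 10.0 + rng.gauss(0.0, 2.0)

def bagged_prediction(n_models):
    # Average the predictions of several (here independent) models
    return statistics.mean(noisy_model_prediction() for _ in range(n_models))

single = [noisy_model_prediction() for _ in range(2000)]
bagged = [bagged_prediction(25) for _ in range(2000)]

var_single = statistics.variance(single)
var_bagged = statistics.variance(bagged)
print(var_single, var_bagged)
```

Averaging does not change the expected prediction, so the bias stays the same while the spread of the predictions shrinks.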
- What is zero-padding in the context of CNN?
Zero-padding in the context of CNNs refers to the addition of extra rows and columns of zeros around the edges of the input data. This can be used to preserve the spatial dimensions of the data when applying convolutional filters, which can be useful for maintaining the spatial relationships between features in the data.
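A minimal sketch of zero-padding for a 2-D input stored as a list of rows. The helper name `zero_pad` is an assumption for illustration.

```python
def zero_pad(image, pad=1):
    # Width of the padded image: original width plus pad columns on each side
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]            # top rows of zeros
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)  # zeros left and right
    padded.extend([0] * width for _ in range(pad))        # bottom rows of zeros
    return padded

image = [[1, 2],
         [3, 4]]
for row in zero_pad(image):
    print(row)
```

With a 3x3 filter and a padding of 1, the output of the convolution has the same height and width as the unpadded input.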
- What is the difference between validation and test data?
- Validation data is used to tune the hyperparameters of a model and evaluate its performance during the training process. - Test data is used to evaluate the final performance of the model after it has been trained and any necessary hyperparameter tuning has been completed.
- Illustrate Binary cross-entropy loss and Logistic loss in a figure?
Binary cross-entropy loss is a loss function used for binary classification. With labels y in {0, 1} and predicted probability p for the positive class, it is -[y ln(p) + (1 - y) ln(1 - p)], so it penalizes confident wrong predictions heavily. Logistic loss is the same loss written for labels y in {-1, +1} and the output f(x) of the linear model (the log-odds): ln(1 + e^(-y f(x))). When p is obtained by passing f(x) through the logistic function, the two expressions give identical values. In a figure, the cross-entropy is plotted against the predicted probability and grows without bound as p moves toward the wrong label, while the logistic loss is plotted against the margin y f(x) and decreases smoothly toward zero for large positive margins.
- What is included in biometric data in GDPR?
Biometric data in GDPR refers to data that relates to the physical, physiological, or behavioral characteristics of an individual, and that can be used to identify that individual. This includes data such as fingerprints, facial images, and voice prints.
- What are convolutional layers with stride, and pooling layers?
Convolutional layers with stride refer to the use of a stride value greater than 1 when applying convolutional filters to the input data. This results in a downsampling of the data and can be used to reduce the size of the output and increase the computational efficiency of the network. Pooling layers are used to further reduce the size of the data by applying a down-sampling operation, such as max pooling or average pooling, to the output of the convolutional layers.
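The downsampling effect can be sketched with the usual output-size formula for a convolution and a 2x2 max pooling with stride 2. The function names and the example sizes are assumptions for illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    # Number of filter positions along one dimension
    return (input_size + 2 * padding - kernel_size) // stride + 1

def max_pool_2x2(feature_map):
    # Keep the largest value in each non-overlapping 2x2 window
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            window = [feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled

print(conv_output_size(28, kernel_size=3, stride=1, padding=1))
print(conv_output_size(28, kernel_size=3, stride=2, padding=1))
print(max_pool_2x2([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]))
```

Both a stride of 2 and a 2x2 pooling layer halve the height and width, but the strided convolution does so with learned filters while pooling has no parameters.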
- How is Gradient boosting different from Adaptive boosting?
Gradient boosting is a type of boosting algorithm that works by sequentially fitting weak learners (e.g., shallow decision trees) to the negative gradient of the loss with respect to the current ensemble's predictions (for squared error, the residuals) and adding them to the ensemble. This is different from AdaBoost, which refits weak learners to reweighted versions of the training data, where previously misclassified examples receive larger weights, and combines them in a weighted vote.
- Write an expression for a linear model?
The expression for a linear model is y = β0 + β1 * x1 + β2 * x2 + ... + βn * xn, where y is the dependent variable, x1, x2, ..., xn are the independent variables, and β0, β1, β2, ..., βn are the coefficients that represent the weights of the corresponding variables in the model (β0 is the intercept).
- What is the problem with specificity and unbalanced data sets?
Specificity, TN / (TN + FP), measures the ability of the model to identify negative examples. In an unbalanced data set where the negative class dominates, a model can reach a very high specificity (and accuracy) simply by predicting the majority class almost always, while still missing most of the minority (positive) class. Specificity alone can therefore give a misleading picture of performance and should be read together with sensitivity (recall) or precision.
- What is included in "Vårdinformation"?
"Vårdinformation" refers to the data and information that is used to support the delivery of healthcare in Sweden. It includes clinical data such as patient records, test results, and diagnoses, as well as administrative data such as financial and resource utilization data.
- Explain "Självkostnadskalkyl"
- A cost calculation has as its starting point allocating all costs to a calculation object, for example a patient. - A cost estimate consists of direct costs in terms of labor (salary) and materials as well as indirect costs (which are often significant in healthcare). - Direct costs can be easily attributed to a patient, while indirect costs are collected in cost units and distributed to the calculation objects/patients via allocation keys. - Costs can also be classified as variable and fixed costs. A variable cost varies with production volume (number of patients or care contacts), while a fixed cost is unaffected by production volume.
- Describe the potential when clinical information is combined with administrative information.
- An example where KPV has been used in combination with clinical data is within the emergency care organization. - A KPV study was carried out there, which led to more doctors being available for assessments at the emergency admission. - This meant that each doctor was given more time to make correct assessments, which also led to fewer patients having to be admitted to the hospital, where they would have increased care consumption. - This is a clear example where Region Halland was able to "invest" (more doctors available who cost more) to save, which led to better decisions and more efficient care consumption later in the care chain.
- Give four examples of resources in a healthcare organisation?
- Clinical resources, such as meetings with doctors and nurses, assistant nurses. - Clinical resources in the form of materials and equipment. - Administrative resources in demand, such as administrative staff and patient record systems. - Buildings and other overhead are consumed during a care cycle. - Preparedness that exists in healthcare to deal with variability. Variability exists both in the form of volume capacity and skill requirements. Therefore, for example, it is not possible to achieve full resource efficiency and flow efficiency at the same time.
- What is a convex and a non-convex cost function?
- Convex cost function: A function that curves upward everywhere (bowl-shaped), so that any local minimum is also a global minimum. - Non-convex cost function: A function that is not curved upward everywhere and can have multiple local minima. Convex cost functions are preferred in machine learning because they can be optimized more efficiently using gradient descent and other optimization algorithms, without the risk of getting stuck in a poor local minimum.
- What is overfitting/underfitting and what can you do about it (=what can be the problem)?
- Overfitting occurs when a model is too complex and is able to fit the training data extremely well, but performs poorly on new, unseen data. This happens because the model has learned the noise in the training data and is unable to generalize to new situations. To address overfitting, you can try reducing the complexity of the model, using regularization techniques, or adding more training data. - Underfitting occurs when a model is too simple and is unable to capture the underlying patterns in the data, leading to poor performance on both the training data and new, unseen data. To address underfitting, you can try increasing the complexity of the model, adding more input features, or using different algorithms.
- What is regression/classification? Give examples of such problems.
- Regression is a type of supervised learning in which a model is trained to predict a continuous output value based on one or more input features. For example, a model might be trained to predict the price of a house based on its size, number of bedrooms, and location. - Classification is another type of supervised learning in which a model is trained to predict a categorical output value based on one or more input features. For example, a model might be trained to predict whether an email is spam or not spam based on its content, sender, and subject line.
- What is the difference between Ridge (regularised) regression and support vector regression?
- Ridge regression, also known as L2 regularization, is a type of linear regression that adds a regularization term to the objective function in order to prevent overfitting. The regularization term is the sum of the squares of the coefficients, multiplied by a regularization parameter lambda. - Support vector regression is a type of support vector machine (SVM) that is used for regression tasks. It fits a function that ignores errors smaller than 𝜖 and penalizes larger errors linearly, while keeping the model as flat as possible. (From the lecture) Kernel ridge regression makes use of squared error loss, whereas support vector regression uses the 𝜖-insensitive loss. The 𝜖-insensitive loss is particularly interesting in the dual formulation, since it gives sparse 𝜶.
- What is learning rate and momentum in this context? Explain AdaGrad, RMSProp and ADAM
- The learning rate determines the size of the update to the model parameters at each iteration of the optimization algorithm. - Momentum is a technique that accelerates convergence by adding a fraction of the update from the previous iteration to the current update. AdaGrad, RMSProp, and ADAM are all variants of gradient descent that adapt the step size per parameter. AdaGrad scales the learning rate of each parameter by the accumulated sum of its squared historical gradients, while RMSProp uses an exponential moving average of the squared gradients instead, so that the learning rate does not keep shrinking toward zero. ADAM combines the RMSProp scaling with momentum (a moving average of the gradients themselves), which helps the optimization process avoid getting stuck in local minima.
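The update rules can be sketched on a one-dimensional toy cost J(x) = (x - 3)^2. The cost function, the function names and the hyperparameter values are assumptions chosen for illustration; the Adam update follows the commonly published form with bias correction.

```python
import math

def gradient(x):
    # Gradient of the toy cost J(x) = (x - 3)^2 (chosen for illustration)
    return 2.0 * (x - 3.0)

def momentum_descent(start, learning_rate=0.1, beta=0.9, steps=300):
    x, velocity = start, 0.0
    for _ in range(steps):
        # Keep a fraction beta of the previous update, add the new gradient step
        velocity = beta * velocity - learning_rate * gradient(x)
        x += velocity
    return x

def adam_descent(start, learning_rate=0.1, beta1=0.9, beta2=0.999,
                 eps=1e-8, steps=1000):
    x, m, v = start, 0.0, 0.0
    for t in range(1, steps + 1):
        g = gradient(x)
        m = beta1 * m + (1.0 - beta1) * g        # moving average of gradients (momentum)
        v = beta2 * v + (1.0 - beta2) * g * g    # moving average of squared gradients
        m_hat = m / (1.0 - beta1 ** t)           # bias correction for the first steps
        v_hat = v / (1.0 - beta2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return x

print(momentum_descent(0.0))
print(adam_descent(0.0))
```

Dropping the `m` average from `adam_descent` and using `g` directly would give an RMSProp-style update.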
- What are the links between guidelines, care quality, and financial outcome in general?
- Units (within healthcare operations) that run high quality operations generally have better financial outcomes. - Care must, as far as possible, be given as close to the patient's normal life situation as possible and allow a life that is affected to the smallest possible extent by the disease in question. - Care should be conducted "benefit-intensively". If the same benefit can be achieved without hospitalization, it should be preferred. - To ensure consistent and high quality, guidelines (based on knowledge and proven experience) are used for handling patients. - To follow guidelines is to conduct operations with high quality and thus reduced costs.
- What is meant by each of Descriptive, Diagnostic, Predictive, and Prescriptive analytics?
- What happened? Descriptive analytics refers to the process of summarizing and describing data and patterns in the data. It involves the use of statistical techniques and visualizations to understand the characteristics of the data and identify trends and patterns. - Why did it happen? Diagnostic analytics involves the use of data and analysis to identify the root cause of a problem or issue. - What is going to happen? Predictive analytics involves the use of data and statistical models to make predictions about future outcomes, such as the likelihood of a patient developing a particular condition or the likelihood of a financial transaction being fraudulent. - What should we do? Prescriptive analytics involves the use of data and analysis to recommend actions or decisions that can optimize a particular outcome, such as the most cost-effective treatment plan for a patient.
- The error for new data consists of three parts: Bias, Variance and Irreducible error. Explain these?
- Bias: Systematic error; the model is not flexible enough to match the data. - Variance: Variation between the models obtained when training is repeated on different data sets. - Irreducible error: Randomness in the data that cannot be predicted by any model.
- Explain the crafting process from different perspectives? Data to use? Over-/underfitting? Generalisation gap?
1. Start simple 2. Test vs baseline and intentional overfit 3. Compare difference between training error and validation error (=generalization gap) Train different ML approaches - Try different ML models, features and hyperparameters - Compare using validation data and evaluation metric - Baseline performance
- Give example of lower expected level of performance?
A baseline is a very simple model that serves as a lower expected level of the performance: - A baseline can for example be to randomly pick an output value from the training data and use that as the prediction.
- Name and explain three parts that may be included in a data platform for a health care organisation?
A data platform for a healthcare organization typically includes: 1. A data repository for storing and managing the organization's data. 2. A data analytics and visualization platform for analyzing and interpreting the data. 3. A data governance and security framework for protecting the data from unauthorized access or misuse.
- Explain the decision boundary characteristics?
A decision boundary is a line or curve that separates different categories or classes in a classification problem. It is learned by a machine learning algorithm based on the input data. Decision boundaries can be linear or non-linear and may be prone to overfitting or underfitting. In multiclass classification problems, there may be multiple decision boundaries to separate the different classes.
- What is meant by a high-bias model?
A high-bias model is a model that is oversimplified and has a high error on the training data. This can be caused by a lack of capacity or flexibility in the model, leading to underfitting.
- A splitting criteria needs a measure for how to punish misclassification. What is the difference between a linear cost (misclassification rate) and a nonlinear cost (e.g. Gini index)?
A linear cost (misclassification rate) only counts the fraction of points in a node that do not belong to the majority class, so it changes linearly with the class proportions and cannot distinguish between two splits that misclassify the same number of points. A nonlinear cost, such as the Gini index, is a concave function of the class proportions and therefore rewards splits that produce purer nodes, even when the misclassification rate is unchanged. This makes it a more sensitive criterion for growing the tree.
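A small worked example can make the difference concrete. The class counts below are hypothetical: 400 points of each class are split in two different ways that misclassify the same number of points, but one split produces a pure node.

```python
def misclassification_rate(counts):
    # Fraction of points in the node that are not in the majority class
    total = sum(counts)
    return 1.0 - max(counts) / total

def gini_index(counts):
    # 1 minus the sum of squared class proportions in the node
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def split_cost(nodes, impurity):
    # Impurity of each child node, weighted by the share of points it holds
    n = sum(sum(node) for node in nodes)
    return sum(sum(node) / n * impurity(node) for node in nodes)

# Each tuple is (count of class 0, count of class 1) in one child node
split_a = [(300, 100), (100, 300)]
split_b = [(200, 400), (200, 0)]   # second node is pure

for name, split in [("A", split_a), ("B", split_b)]:
    print(name,
          split_cost(split, misclassification_rate),
          split_cost(split, gini_index))
```

Both splits have the same misclassification rate, while the Gini index is lower for the split with the pure node.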
- Explain how a linear model can be extended with nonlinear versions of a feature. What are the risks when doing that?
A linear model can be extended with nonlinear versions of a feature by transforming the independent variables using nonlinear functions, such as polynomial or exponential functions. However, this can increase the risk of overfitting, which occurs when the model becomes too complex and starts to fit the noise in the data rather than the underlying pattern.
- What is a random forest and how is it trained?
A random forest is a type of ensemble model that is trained using bagging and decision trees. A decision tree is a tree-like model that makes predictions by repeatedly splitting the data on the value of one feature at a time. A random forest is trained by building multiple decision trees on different bootstrap samples of the training data, considering only a random subset of the features at each split, and averaging their predictions (or taking a majority vote). The random feature selection makes the trees less correlated, which reduces the variance further and improves the generalization performance of the model.
- Describe how the RNN is different and similar to a CNN?
A recurrent neural network (RNN) is similar to a CNN in that it shares parameters (across time steps, where a CNN shares them across spatial positions) and is trained with gradient descent and backpropagation. However, unlike a CNN, which passes data forward through its layers without feedback, an RNN processes data sequentially, using feedback connections to maintain a state that depends on the past input. This allows an RNN to process data with temporal dependencies, such as a time series or a sentence, and make predictions based on past input.
- Explain the concept of Boosting/AdaBoost? What does it mean that it is sequential?
Boosting is a machine learning ensemble method that involves training multiple models sequentially, with each model trying to correct the mistakes of the previous model. AdaBoost is a specific type of boosting algorithm that works by weighting the training examples based on their importance and re-fitting the model to the weighted examples. Boosting can be used to reduce the bias of a model by iteratively improving the model's performance on the training data.
- Explain why both bagging of decision trees and random forests can handle nonlinear decision boundaries.
Both bagging and random forests can handle nonlinear decision boundaries because they are trained using multiple models, which can capture complex relationships in the data. Decision trees, in particular, are able to model nonlinear relationships because they can split the data based on multiple features at different levels of the tree.
- What input data are a CNN and a RNN suited for, respectively. Explain how they are trained.
CNNs are well suited for processing data with a grid-like structure, such as images, while RNNs are well suited for processing sequential data, such as time series or text. Both CNNs and RNNs are trained using gradient descent and backpropagation, with the goal of minimizing a loss function that measures the difference between the predicted and true values.
- Describe the rationale behind convolutional neural networks, e.g., parameter sharing.
Convolutional neural networks (CNNs) are designed to process data with a grid-like structure, such as images. They use convolutional layers, which apply a set of filters to the input data, and pooling layers, which down-sample the data to reduce the dimensionality and increase the robustness of the network. One of the main advantages of CNNs is that they use parameter sharing, which means that the same set of filters is applied to different parts of the input data, reducing the number of parameters and improving the efficiency of the network.
- Explain Cost Per Patient (KPP) and its limitations?
Cost Per Patient (KPP) is a measure of the cost of providing healthcare to an individual patient over a certain period of time. It is often used as a way to compare the costs of different treatment options or to assess the efficiency of a healthcare organization. The KPP's biggest limitation lies in the fact that it only describes costs for a care session or a visit, which is of limited interest when care is to be developed at a system level where the entire care chain must be taken into account.
- Explain the dropout principle?
Dropout is a regularization technique that is used to prevent overfitting in neural networks. It works by randomly dropping out neurons during training, which forces the network to rely on multiple pathways for information rather than just a few. This can help prevent the network from overfitting to the training data and improve its generalization to new data.
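A minimal sketch of dropout applied to one layer's activations. It uses the "inverted dropout" variant, where kept activations are scaled up during training so that nothing needs to change at prediction time; this variant and the names below are assumptions, not necessarily what the lecture used.

```python
import random

def dropout(activations, drop_probability, rng, training=True):
    if not training:
        # At prediction time all neurons are used unchanged
        return list(activations)
    keep = 1.0 - drop_probability
    # Each neuron is kept with probability `keep` and rescaled by 1 / keep
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
activations = [1.0] * 10000
dropped = dropout(activations, drop_probability=0.5, rng=rng)
mean_after_dropout = sum(dropped) / len(dropped)
print(mean_after_dropout)
```

The rescaling keeps the expected activation the same with and without dropout, which is why the mean stays close to 1.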
- Give examples of limitations in datasets?
Examples of limitations in datasets include small sample size, lack of diverse samples, and noise or irrelevant features.
- Modified performance scoring, give two examples of modified scoring metrics?
Examples of modified performance scoring metrics include precision and recall, which are used to measure the accuracy of a classifier. Precision measures the proportion of correct positive predictions, while recall measures the proportion of actual positive cases that were correctly predicted.
- Explain the main principle behind the Gradient descent optimisation method?
Gradient descent is an optimization algorithm that iteratively updates model parameters in the direction of the negative gradient of the cost function, which is the direction in which the cost decreases fastest locally. The gradient determines the update direction, the learning rate scales the step size, and the process is repeated until convergence or a minimum is reached.
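A minimal sketch of the update rule on a one-dimensional toy cost J(theta) = (theta - 3)^2. The cost function, starting point and learning rate are assumptions for illustration.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    theta = start
    for _ in range(steps):
        # Step against the gradient, scaled by the learning rate
        theta = theta - learning_rate * gradient(theta)
    return theta

# Gradient of the toy cost J(theta) = (theta - 3)^2
theta_min = gradient_descent(lambda t: 2.0 * (t - 3.0), start=0.0)
print(theta_min)
```

With a much larger learning rate (above 1.0 for this cost) the iterates would overshoot and diverge instead of converging.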
- Explain the meaning of using Hold-out data?
Hold-out data refers to a portion of the dataset that is set aside and not used for training the model. It is used to evaluate the model's performance on unseen data and ensure that the model generalizes well to new examples.
- What does GDPR say about data sharing from a hospital to a company that develops their own products based on the data?
If, on the other hand, the supplier processes patients' personal data for purposes that it determines itself, for example for the development of its own products and services, the supplier becomes the personal data controller for processing. In this case, the supplier is classified as a third party, which is why a separate legal basis is required for sharing the patient data with the supplier.
- Illustrate how softmax divides a 2-dimensional feature space for five classes?
In a 2-dimensional feature space, the softmax function divides the space into multiple regions, each corresponding to a different class. For example, in a five-class classification problem, the feature space would be divided into five regions.
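As a sketch, five linear models (one per class) followed by softmax assign every point in the plane to the class with the highest probability. The weights and biases below are made-up values chosen so that each class wins in one region; they are not from the lecture.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability; the result is unchanged
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical parameters: one weight vector and bias per class
weights = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (0.0, 0.0)]
biases = [0.0, 0.0, 0.0, 0.0, 0.5]

def predict_class(x):
    logits = [w[0] * x[0] + w[1] * x[1] + b for w, b in zip(weights, biases)]
    probabilities = softmax(logits)
    predicted = max(range(len(probabilities)), key=probabilities.__getitem__)
    return predicted, probabilities

for point in [(3, 0), (0, 3), (-3, 0), (0, -3), (0, 0)]:
    print(point, predict_class(point)[0])
```

Because the logits are linear in x, the boundary between any two neighbouring regions is a straight line, so the five regions are separated by piecewise-linear boundaries.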
- Describe which layers that can be inside a feature extraction frontend and a classification backend in a CNN.
In a CNN, the feature extraction frontend typically consists of a series of convolutional and pooling layers that are used to extract features from the input data. The classification backend typically consists of fully connected layers that take the extracted features as input and output the final classification or prediction.
- How can we interpret the logistic function in a binary classification problem? As a probability for what? What is this probability at the decision boundary? What is the output of the linear model at the decision boundary?
In a binary classification problem, the logistic function can be interpreted as the probability that the input belongs to the positive class (that the event occurs). At the decision boundary the two classes are equally likely, so this probability is 0.5. The output of the linear model, the weighted sum of the predictor variables that is used as input to the logistic function, is 0 at the decision boundary, since 1 / (1 + e^0) = 0.5.
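A short numerical check of the decision boundary, using a linear model with made-up coefficients (the values of `weights` and `bias` are assumptions for illustration).

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical linear model: z = 2 * x1 - 1 * x2 - 1
weights = (2.0, -1.0)
bias = -1.0

def linear_output(x):
    return weights[0] * x[0] + weights[1] * x[1] + bias

on_boundary = (1.0, 1.0)       # 2 - 1 - 1 = 0
positive_side = (3.0, 1.0)     # 6 - 1 - 1 = 4

print(linear_output(on_boundary), logistic(linear_output(on_boundary)))
print(linear_output(positive_side), logistic(linear_output(positive_side)))
```

Points where the linear output is positive get a probability above 0.5 and are classified as the positive class; points where it is negative get a probability below 0.5.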
- ML and screening, explain the scoring scheme (both diagram and expert cost per patient).
In the context of screening, ML algorithms can analyze data to identify individuals at risk for a particular condition or disease. The scoring scheme for an ML-based screening process assigns a score or probability to each individual indicating their likelihood of having the condition or disease. Expert cost per patient refers to the time and resources required to evaluate each patient by a medical expert.
- Why is information-driven care not an IT-project?
Information-driven care is not an IT project because it involves the integration and use of various types of data and information to support clinical decision making and improve patient care. It involves the development and implementation of processes, systems, and tools that enable the collection, analysis, and dissemination of data and information across the healthcare organization.
- Explain why and when k-fold cross-validation is good? What about variance? What about large/small database?
K-fold cross-validation is a technique that is used to evaluate the performance of a machine learning model by dividing the data into k folds and training the model k times, each time using a different fold as the validation set and the remaining folds as the training set. This allows the model to be trained and evaluated on different subsets of the data, which helps to reduce the variance in the evaluation and provide a more robust estimate of the model's performance. K-fold cross-validation is particularly useful when the dataset is small, as it allows the model to be trained on a larger portion of the data, but it can also be used with larger datasets.
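The splitting step can be sketched as follows. The helper name `k_fold_indices` is an assumption, and in practice the data would usually be shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        folds.append((training, validation))
        start += size
    return folds

folds = k_fold_indices(10, 3)
for training, validation in folds:
    print("train:", training, "validate:", validation)
```

The model is trained once per fold, and the k validation errors are averaged to give the final estimate. Every data point is used for validation exactly once.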
- Make calculations of number of parameters in different neural networks? Examples 6.1 and 6.3 (be able to perform calculation).
One layer from 28x28=784 pixels to 10 classes (0-9) means 784*10+10=7850 parameters. Two layers with 200 hidden units means (784 · 200 + 200) + (200 · 10 + 10) = 159010 parameters. (example 6.1)
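The same calculation can be written as a small helper for fully connected (dense) networks, where each layer has one weight per input-output pair plus one bias per output. The function names are assumptions for illustration.

```python
def dense_layer_parameters(n_inputs, n_outputs):
    # Weights (n_inputs * n_outputs) plus one bias per output unit
    return n_inputs * n_outputs + n_outputs

def network_parameters(layer_sizes):
    # layer_sizes lists the number of units from input to output
    return sum(
        dense_layer_parameters(n_in, n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

print(network_parameters([784, 10]))        # one layer
print(network_parameters([784, 200, 10]))   # two layers, 200 hidden units
```

This reproduces the two numbers above; convolutional layers are counted differently, since their parameters depend on the filter size and number of channels rather than on the image size.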
- What is one-hot encoding?
One-hot encoding is a method of encoding categorical variables as numerical data. It involves creating a new binary column for each unique category and assigning a value of 1 to the column corresponding to the category and 0 to all other columns. This can be useful for feeding categorical data into a machine learning model that expects numerical input.
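A minimal sketch of one-hot encoding for a list of category labels. The column order (alphabetical) and the example values are assumptions for illustration.

```python
def one_hot_encode(values):
    # One column per unique category, in sorted order
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = [
        [1 if index[value] == i else 0 for i in range(len(categories))]
        for value in values
    ]
    return categories, encoded

categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)
for row in encoded:
    print(row)
```

Each row contains exactly one 1, so no artificial ordering between the categories is introduced.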
- What is meant by out-of-bag data?
Out-of-bag (OOB) data refers to the data that is not used to train a particular model in an ensemble of models trained using bagging. The OOB data can be used to estimate the generalization error of each model in the ensemble, which can be useful for model selection and evaluation.
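The split into in-bag and out-of-bag data for one ensemble member can be sketched like this. The helper name and the sample size are assumptions for illustration.

```python
import random

def bootstrap_split(n_samples, rng):
    # Draw n_samples indices with replacement: the bootstrap sample
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    # Indices never drawn are out-of-bag for this model
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(0)
in_bag, out_of_bag = bootstrap_split(1000, rng)
oob_fraction = len(out_of_bag) / 1000
print(oob_fraction)
```

For large data sets roughly a third of the points (about 1/e, i.e. 37 %) end up out-of-bag for each model, so every model has its own validation-like set without setting aside extra data.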
- Effects of over-/undertraining?
Overfitting occurs when a model is too complex and fits the training data too closely, resulting in poor generalization to new data. Underfitting occurs when a model is too simple and does not capture the underlying structure of the data, resulting in poor performance on the training data. To avoid overfitting and underfitting, it is important to find a balance between model complexity and performance through techniques such as regularization and cross-validation.
- Explain Patient Encounter Costing (PEC) and how it is different compared to KPP?
Patient Encounter Costing (PEC) is a method of cost accounting that aims to assign the costs of providing healthcare to individual patients or episodes of care. It is different from KPP in that it focuses on the costs associated with specific patient encounters, rather than the overall cost of caring for a patient over a longer period of time. This can be useful for identifying the costs of different types of care and for comparing the costs of different treatment options.
- Be able to interpret performance measures and confusion matrices in general.
Performance measures and confusion matrices in general can be used to assess the accuracy and reliability of a machine learning model. Common performance measures include accuracy, precision, recall, and F1 score, which can be calculated from the entries in the confusion matrix. The confusion matrix is a table that shows the number of true positive, true negative, false positive, and false negative predictions made by the model, and it can be used to compute the performance measures and identify areas of strength and weakness in the model.
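As a sketch, the common measures can be computed directly from the four entries of a binary confusion matrix. The counts used below are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)            # share of positive predictions that are correct
    recall = tp / (tp + fn)               # share of actual positives that are found
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),    # share of actual negatives that are found
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical confusion matrix entries
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this example the accuracy looks good mainly because of the many true negatives, while the recall shows that a third of the positive cases are missed.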
- Be able to interpret performance plots in general.
Performance plots in general can be interpreted by examining the trend of the plotted metric over time or as a function of some other parameter. For example, a learning curve plot that shows the training and test error as a function of training data size can be used to assess the model's generalization ability and identify signs of overfitting or underfitting.
- What is personal data according to GDPR?
Personal data according to the General Data Protection Regulation (GDPR) is any information that relates to an identified or identifiable natural person. This includes both directly identifying information, such as name and address, and indirectly identifying information, such as online identifiers or location data.
- How is PPV related to prevalence? Give an example.
Positive predictive value (PPV) is the probability that a person with a positive test result actually has the disease, and it depends on the prevalence of the disease in the tested population. If the prevalence is low, the PPV will also be low, even if the test has high sensitivity and specificity. This is because the few true positives are outnumbered by the false positives coming from the much larger healthy group. For example, with a sensitivity and specificity of 95% and a prevalence of 1%, out of 10,000 tested people about 95 are true positives and 495 are false positives, giving a PPV of about 16%.
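
The relation can be sketched with Bayes' rule. The sensitivity, specificity and prevalence values below are assumptions chosen for illustration, not course data.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from test characteristics and prevalence."""
    true_pos = sensitivity * prevalence                # diseased and test positive
    false_pos = (1 - specificity) * (1 - prevalence)   # healthy but test positive
    return true_pos / (true_pos + false_pos)


# Same test (95% sensitivity and specificity), different prevalences.
for prevalence in (0.001, 0.01, 0.1, 0.5):
    print(f"prevalence {prevalence:>5}: PPV = {ppv(0.95, 0.95, prevalence):.3f}")
# PPV is about 0.019, 0.161, 0.679 and 0.950 respectively
```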
- What is regularisation, how can it be done, and when is it used?
Regularization is a technique used to prevent overfitting in machine learning models by adding a penalty to the objective function during training. It can be done by adding a term to the loss function that penalizes large values of the model parameters, such as the L1 or L2 regularization. Regularization is used to improve the generalization performance of the model, which refers to its ability to make accurate predictions on unseen data.
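
A minimal sketch of a regularized cost for a linear model: mean squared error plus an L1 or L2 penalty. The name `regularized_cost` and the parameter `lam` are hypothetical, and for simplicity the penalty is applied to all parameters (in practice the intercept is often left out).

```python
def regularized_cost(theta, X, y, lam, penalty="l2"):
    """Mean squared error plus lam times an L1 or L2 penalty on theta."""
    residuals = [sum(t * xj for t, xj in zip(theta, row)) - yi
                 for row, yi in zip(X, y)]
    mse = sum(r * r for r in residuals) / len(y)
    if penalty == "l2":
        reg = sum(t * t for t in theta)      # penalizes large parameters
    else:
        reg = sum(abs(t) for t in theta)     # L1: tends to give sparse parameters
    return mse + lam * reg


X = [[1.0, 0.0], [1.0, 1.0]]   # toy data, first column is the intercept term
y = [1.0, 3.0]
print(regularized_cost([1.0, 2.0], X, y, lam=0.1))                 # perfect fit, cost is only the penalty
print(regularized_cost([1.0, 2.0], X, y, lam=0.1, penalty="l1"))
```

A larger `lam` pushes the parameters towards zero, trading a worse fit on the training data for a simpler model.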
- Samples or features for sound? Explain why features may be a good choice?
For sound, samples are the raw amplitude values of the digitised waveform, while features are quantities computed from those samples, such as the frequency content, amplitude, or duration of the sound. Features may be a good choice for representing sound data because they give a much more compact representation than the raw samples and capture important characteristics of the data that are not evident from the raw samples themselves.
- Give the names of three pretrained networks (no details required)?
Some examples of pretrained networks are VGG, ResNet, Xception, and Inception. These networks have been trained on large datasets and can be used as a starting point for training new models or for transferring knowledge from one domain to another.
- Give examples of spatial filters and what output they produce for a simple input.
Spatial filters are mathematical functions that are used to extract features from images. For example, a horizontal edge detector filter would highlight horizontal edges in an image and produce a corresponding output image.
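
A small sketch of two 3x3 edge filters applied to a simple 4x4 input (dark top half, bright bottom half). The helper `filter2d` is written here for illustration and computes the filter response only where the kernel fits inside the image.

```python
def filter2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
horizontal_edge = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

print(filter2d(image, horizontal_edge))  # [[3, 3], [3, 3]]: strong response at the horizontal edge
print(filter2d(image, vertical_edge))    # [[0, 0], [0, 0]]: no vertical edge in this image
```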
- Explain how the LSTM is working internally (principally, 5 steps)?
In principle, the LSTM works internally in five steps: 1. The forget gate decides how much of the previous cell state to keep. 2. The input gate decides how much new information to write to the cell. 3. A candidate cell value is computed from the current input and the previous hidden state. 4. The cell state is updated by combining the retained old state with the gated candidate. 5. The output gate decides how much of the updated cell state is passed on as the new hidden state. All gates are computed from the current input and the previous hidden state, which allows the LSTM to maintain a long-term memory and make predictions based on past input.
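
One time step of a standard LSTM cell can be sketched as below, with scalar states to keep it readable. The parameter names (`wf`, `uf`, `bf`, and so on) and their values are hypothetical, and the numbering of the steps is one common way to present the cell, which may differ from the lecture's.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step for scalar input x, hidden state h and cell state c."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])          # 1. forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])          # 2. input gate
    c_tilde = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])  # 3. candidate cell value
    c = f * c_prev + i * c_tilde                                   # 4. update the cell state
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])          # 5. output gate
    h = o * math.tanh(c)                                           #    new hidden state
    return h, c


# Hypothetical parameter values, chosen only to make the example run.
params = {"wf": 0.5, "uf": 0.1, "bf": 1.0,
          "wi": 0.4, "ui": 0.2, "bi": 0.0,
          "wc": 0.9, "uc": 0.3, "bc": 0.0,
          "wo": 0.6, "uo": 0.1, "bo": 0.0}

h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.2]:          # a short input sequence
    h, c = lstm_step(x, h, c, params)
    print(f"x={x:+.1f}  h={h:+.3f}  c={c:+.3f}")
```

Note that every gate only sees the current input and the previous hidden state; the cell state `c` is what carries information across many time steps.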
- Explain what is meant by the black-box problem in data-driven health?
The black-box problem in data-driven health refers to the challenge of understanding and interpreting the decision-making processes and algorithms used by machine learning models. These models can be complex and may be based on large amounts of data, which can make it difficult to understand how the model is making predictions or decisions. This can be a problem in healthcare applications, as it may be difficult to explain to patients or clinicians how the model arrived at a particular recommendation or decision.
- Write the expression for the closed form solution of theta for a linear model. Explain what is inside your matrices and vectors for a first order model.
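The closed form solution for theta in a linear model is theta = (X^T * X)^-1 * X^T * y. For a first order model y = theta_0 + theta_1 * x with n data points, X is an n x 2 matrix where each row is [1, x_i] (a column of ones for the intercept and a column with the input values), y is an n x 1 vector of the observed outputs, and theta = [theta_0, theta_1]^T is the vector of coefficients.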
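
A sketch of the closed form solution for a first order model, written out without any matrix library. With rows [1, x_i] in X, the matrix X^T X is 2x2 and can be inverted by hand; the function name and the toy data are made up for illustration.

```python
def fit_first_order(xs, ys):
    """Solve theta = (X^T X)^-1 X^T y for the model y = theta0 + theta1 * x."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]
    det = n * sxx - sx * sx            # determinant of X^T X
    theta0 = (sxx * sy - sx * sxy) / det
    theta1 = (n * sxy - sx * sy) / det
    return theta0, theta1


# Toy data generated from y = 1 + 2x without noise.
theta0, theta1 = fit_first_order([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(theta0, theta1)  # 1.0 2.0
```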
- Understand trends of cost and misclassification rate, with and without decaying learning rate, how to recognise overfitting and how to use dropout to avoid overfitting (Examples 6.2-4).
The cost and misclassification rate are important metrics for evaluating the performance of a neural network. A decaying learning rate can help improve the performance of the network by gradually reducing the size of the updates to the weights and biases. Overfitting can be recognised when the cost or misclassification rate on the training data keeps decreasing while the corresponding value on the validation data levels off or starts to increase. Dropout is a technique that helps prevent overfitting by randomly dropping out neurons during training, which forces the network to rely on multiple pathways for information rather than just a few.
- Which data is used to optimise hyperparameters?
Hyperparameters are optimized on validation data, that is, data held out from the training data and not used to fit the model parameters (either a single hold-out validation set or the validation folds in cross-validation). The hyperparameters are chosen to maximize the model's performance on this data, and the resulting model is then evaluated on the test set, which must not be used for tuning, to check that it generalizes well to new examples.
- What is the difference between Accuracy and F1 score (in words based on what they focus on).
The difference between accuracy and F1 score is that accuracy measures the fraction of all predictions that are correct, while the F1 score is the harmonic mean of precision and recall, which are measures of the model's ability to identify positive examples. Accuracy focuses on the overall performance of the model across all classes, whereas the F1 score focuses on the positive class, which makes it more informative when the classes are imbalanced.
- Illustrate generalisation gap in error as a function of model complexity? In error as a function of training data size?
The generalization gap is the difference between the model's performance on the training data and its performance on the test data. It can be illustrated by plotting the training error and the test error as functions of model complexity or training data size. As the model complexity increases, the training error typically keeps decreasing, while the test error first decreases and then increases again, so the gap grows and indicates that the model is overfitting to the training data. As the training data size increases, the training error typically increases and the test error decreases, so the gap shrinks.
- Explain the influence of the hyperparameter? What is the meaning of 1-NN, 20-NN or 100-NN? Its relation to over-/underfitting?
The hyperparameter in the k-NN method is the value of k, which determines the number of nearest neighbors to consider when making a prediction. A 1-NN model considers only the single nearest neighbor, a 20-NN model the 20 nearest neighbors, and a 100-NN model the 100 nearest neighbors. A small k such as 1 gives a flexible model that adapts to noise in individual training points, with a risk of overfitting. Increasing k averages over more neighbors, which reduces the complexity of the model and the risk of overfitting, but leads to underfitting if k is too large.
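
The effect of k can be sketched with a tiny one-dimensional k-NN classifier. The data, labels and function name are invented for illustration; the point at 3.0 is deliberately given a "noisy" label.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
train_y = ["a", "a", "b", "a", "b", "b", "b"]   # the label at 3.0 is treated as noise

print(knn_predict(train_x, train_y, 3.1, k=1))  # b: follows the single noisy neighbor
print(knn_predict(train_x, train_y, 3.1, k=3))  # a: the noise is voted down
print(knn_predict(train_x, train_y, 1.0, k=7))  # b: with k = n every query gets the majority class
```

With k=1 the prediction follows the noisy point (overfitting), while with k equal to the number of training points the input no longer matters (underfitting).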
- For a linear model, the Least-squares and the Maximum-likelihood derivations result in the exact same expression. Show in a figure how this can be interpreted for a linear model.
The least-squares and maximum-likelihood derivations for a linear model result in the same expression because, when the noise is assumed to be Gaussian, maximizing the likelihood is equivalent to minimizing the sum of squared differences between the observed values of the dependent variable and the predicted values of the model. This can be interpreted visually as a line that fits the data points as closely as possible. Note: both lead to the same closed form solution, hence the same final answer.
- What is the main principle behind a decision tree? When is it useful?
The main principle behind a decision tree is to recursively split the training data into smaller and smaller subsets based on the values of the input features, with the goal of maximizing the purity of the resulting subsets. Decision trees are useful for a variety of tasks, including classification, regression, and feature selection. They are particularly useful when the relationships between the features and the target variable are complex or non-linear, as decision trees are able to capture these relationships and make accurate predictions. Decision trees are also easy to interpret and can provide a clear visualization of the decision-making process, making them a useful tool for understanding and explaining the model's predictions. However, decision trees can be prone to overfitting if the tree becomes too deep and complex.
- Be able to interpret the meaning of an ROC_AUC or a Precision-Recall curve?
The receiver operating characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classification model. It plots the true positive rate (TPR) on the y-axis and the false positive rate (FPR) on the x-axis, and the area under the curve (AUC) is used to measure the model's overall performance. A higher AUC indicates a better model, with an AUC of 1.0 corresponding to a perfect model. The precision-recall curve is a graphical plot that illustrates the trade-off between precision and recall, with precision on the y-axis and recall on the x-axis.
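
One way to interpret the AUC is as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, so 0.5 corresponds to random guessing. The sketch below computes the AUC directly from that interpretation; the labels and scores are toy values.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75: three of the four positive/negative pairs are ordered correctly
```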
- What is meant by "there is a risk that parts of the care chain are made more efficient at the expense of others and that the local streamlining de facto reduces efficiency (resource consumption or patient benefit) over the entire care cycle."?
The statement refers to the potential unintended consequences of optimizing certain aspects of the healthcare system. For example, if a healthcare organization focuses on reducing costs in one part of the care process, it may lead to inefficiencies or reduced quality in other parts of the process, which could ultimately reduce overall efficiency and negatively impact patient outcomes.
- The structure of neural networks and their parameters
The structure of neural networks consists of layers of interconnected units called neurons, which process and transmit information. The parameters of a neural network refer to the values of the weights and biases that determine the connections between neurons and the strength of the signals they transmit.
- Explain why the training data and its variation are so important in clinical applications?
The training data and its variation are important in clinical applications because they are used to train and evaluate machine learning models that are used to support clinical decision making. The quality and diversity of the training data can significantly impact the performance of the model, and it is important to ensure that the training data is representative of the population that the model will be used on.
- There are two requirements related to profiling in GDPR? Which two?
There are two requirements related to profiling in GDPR: 1. The data subject must be provided with information about the profiling and its consequences. 2. The data subject (here, the patient) has the right not to be subject to decisions that are based solely on profiling and that have legal consequences for them or affect them to a similarly significant degree.
- Is there a risk that training data becomes a part of a ML model?
There is a risk that the training data becomes a part of the machine learning model if the model is trained on personal data and the resulting model is used to make predictions or decisions about individuals. In this case, the model may contain information about the characteristics and attributes of the individuals in the training data, which could potentially be used to re-identify the individuals or make inferences about them.
- Illustrate three different activations functions and give their names?
Three different activation functions are sigmoid, tanh, and ReLU. Sigmoid maps input values to a range of 0 to 1, tanh maps input values to a range of -1 to 1, and ReLU maps all negative input values to 0 and all positive input values to their original value.
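
The three functions can be written down in a few lines, which also makes their output ranges easy to check. This is only a sketch for scalar inputs.

```python
import math


def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squashes any real number into the range (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Zero for negative inputs, identity for positive inputs."""
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  tanh={tanh(z):+.3f}  relu={relu(z):.1f}")
```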
- Which derivatives are needed for calculating gradients in a 2-layer neural network (no calculations needed)?
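To calculate the gradients in a 2-layer neural network, we need the derivatives of the loss function with respect to the weights and biases of both layers. By the chain rule, these are built from the derivative of the loss with respect to the network output, the derivatives of the activation functions, and the derivatives of each layer's output with respect to its weights, biases, and inputs. Backpropagation computes them by propagating the error from the output layer back through the network, and the gradients are then used to update the weights and biases to minimize the error.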
- Calculate which of two splitting criteria is the best.
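To calculate which of two candidate splits is the best, compute an impurity measure such as the misclassification rate, Gini index, or entropy for the regions each split produces, weight each region by the number of data points in it, and choose the split with the lowest weighted impurity. If the choice is instead between impurity measures for a whole tree, the trees trained using each criterion can be compared on a separate validation dataset, and the criterion that leads to the best performance on the validation data is typically considered the best.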
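
Assuming the question is about comparing two candidate splits of the same data, the calculation could look like the sketch below, here using the Gini index as the impurity measure. The class labels and the two splits are invented for illustration.

```python
def gini(labels):
    """Gini index of one region: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_impurity(left, right):
    """Impurity of a split: Gini of each region weighted by its share of the data."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Two hypothetical ways of splitting the same 8 data points (4 "+" and 4 "-").
split_a = (["+", "+", "+", "-"], ["-", "-", "-", "+"])
split_b = (["+", "+", "-", "-"], ["+", "+", "-", "-"])

print(split_impurity(*split_a))  # 0.375
print(split_impurity(*split_b))  # 0.5
```

Split A gives the lower weighted impurity, so it would be preferred; split B leaves both regions as mixed as the original data.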
- Give example of achievable performance?
Comparing the model's performance with the achievable performance gives another reference point for assessing the quality of the model. The achievable performance can be based on what other state-of-the-art models achieve on the same or a similar problem, or on human performance: if humans can identify the correct class with an accuracy of 99%, that serves as a reference point for what we can expect to achieve from our model.
- Name two advantages and two drawbacks/risks with k-NN.
Two advantages of the k-NN method are: 1. It is very easy to implement. 2. New data can be added seamlessly, since there is no training step. Two drawbacks are: 1. Calculating the distances to all training points is computationally expensive for big datasets. 2. In high dimensions the distances between points become less informative, so the nearest neighbors are less meaningful (the curse of dimensionality).
- What is important to think about when initialising a neural network? For ReLU?
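When initializing a neural network, it is important to consider the scale of the input data, the complexity of the task, and the type of activation function being used. The weights should be initialized to small random values rather than to identical values, so that the neurons in a layer do not all learn the same thing. For ReLU (Rectified Linear Unit) activation, it is common to initialize the weights with small random values, for example with a standard deviation of about 0.01, and the biases with 0 or a small positive value so that the units start in the active (non-negative) range.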
- Explain what happens in examples similar to Example 4.2 and 4.4 in the book.
Don't know.
