4F-2 Data Analytics 2 - Part 2 Artificial Intelligence and Machine Learning

What are Support Vector Machine limitations/disadvantages?

• Number of features cannot exceed number of instances
• Uses only numeric or dummy-coded categorical data
• Decision boundary can be hard to interpret
• Can be outperformed by random forest and gradient boosting machines

ML Model Development Step 9: Evaluate (validate) model's performance

-If adequate data present
• Run model on separate "hold out" validation (tuning) data set
• Check performance of the preliminary model using methods described earlier
• May need different metrics for low incidence classes
• Determine what, if any, changes need to be made to hyperparameter settings, feature selection, and model complexity
• Some models will integrate checking performance and adjusting model
-If amount of data (instances) is suboptimal
• Data is split into training data set and testing data set (no up-front split of validation data set)
• Use cross-validation, bootstrapping or other equivalent method to train the model and evaluate it

ML Model Development Step 4: Partition data

-Randomly split data with no overlap between data sets • Potential for sampling errors between sets • Small data sets may not have much data to split -Method of partitioning data may differ when data sets are smaller

In ML when partitioning data what is the testing set?

-Testing Data Set 10-33% • Data used to evaluate the final trained model • Model is run on the data but is NOT trained on it • Model's performance on these data used as its final evaluation

In ML when partitioning data what is the training set?

-Training Data Set 67-80% • Data used to train the model

In ML when partitioning data what is the validation set?

-Validation (Tuning) Data Set (not always used) 10-15% • Data used to evaluate preliminary trained model • Model is run on the data but is NOT trained on it • Model's performance on these data used to optimize model before retraining on training data set again • May not be used if no hyperparameters or if insufficient data in general

What is Gini impurity in Decision Trees?

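A measure of how mixed the classes are at a node: Gini = 1 - sum of squared class proportions. Equals 0 for a single-class population and increases as classes become more evenly mixed.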
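As a small illustration of Gini impurity, the sketch below assumes the standard definition (1 minus the sum of squared class proportions). The function name `gini_impurity` and the example labels are hypothetical.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity = 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

pure = gini_impurity(["a", "a", "a", "a"])   # single-class population
mixed = gini_impurity(["a", "a", "b", "b"])  # two evenly mixed classes
```

A single-class node gives 0, and impurity grows as the classes become more evenly mixed.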

What is the average percentage of instances that are out-of-bag?

0.368 (36.8%) instances are never chosen and are out-of-bag.

What is the average percentage of instances selected for the bag?

0.632 (63.2%) instances are selected for the bag
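The 36.8% / 63.2% figures can be checked numerically. This is a sketch under the usual bootstrap assumption (n draws with replacement from n instances), where the chance an instance is never drawn is (1 - 1/n)^n, which approaches 1/e. The function names, trial count, and seed are illustrative.

```python
import math
import random

def oob_fraction_exact(n):
    """Probability that a given instance is never drawn in n draws with replacement."""
    return (1 - 1 / n) ** n

def oob_fraction_simulated(n, trials=200, seed=0):
    """Average fraction of instances left out of the bag over repeated bootstraps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        bag = {rng.randrange(n) for _ in range(n)}  # unique instances selected
        total += (n - len(bag)) / n                 # fraction never selected
    return total / trials

exact = oob_fraction_exact(1000)
simulated = oob_fraction_simulated(500)
limit = math.exp(-1)  # about 0.368
```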

ML Model Development Step 8: Train model on training data set

1 pass through of training the model = 1 epoch After model has been optimized, it may train again on the training data set

What is a Singular value decomposition in Dimensionality Reduction Methods in ML Unsupervised Algorithms?

A = U S V^T • A: the matrix being decomposed • U and V are orthogonal matrices • S is the diagonal matrix (of singular values)

What is a Naïve Bayes Classifier?

A classification algorithm that uses Bayes' Theorem to perform classification based on probability Assumes independence between features (hence naïve) Vector x represents some n features (independent variables) Assigns the current instance a probability for each of the K possible outcome classes Can predict binary, categorical and numerical data Gaussian Naïve Bayes: Use for numerical data with Gaussian distribution

What is ML logistic regression?

A classification method using a logistic function. (not a regression model) AKA logit regression output is categorical : classification method Called regression because mathematical formula similar to regression methods Common application: Sepsis prediction models

What is a policy in reinforcement learning?

A function that maps each state to an action. The optimal policy chooses actions that maximize the expected (cumulative or average) reward.

What is a Support Vector Machine?

A machine learning algorithm used for classification. Common uses: Fetal aneuploidy screening, Prediction of metastasis from gene profiles, Autoverification of GC/MS in the lab

What is the Silhouette coefficient?

A measure of how well each data point fits into its assigned cluster. Ranges from -1 to +1 1: clusters well apart 0: clusters overlapping (point lies between clusters) -1: clusters assigned incorrectly Calculated as (b - a) / max(a, b), where a is the average intracluster distance, b is the average intercluster distance, and max(a, b) is the larger of the two

What is Polynomial Regression?

A method for modeling non-linear (curved) data such as growth rates, progression of pandemics, etc. Input: Uses powers of a single input variable (independent variable x), e.g., x, x^2, x^3 Output: single output variable (𝑦𝑖); dependent variable Advantages: Better fit to non-linear data Limitations / Disadvantages: Higher risk of overfitting because it is sensitive to noise

What is Q-learning?

A method that finds the optimal action-selection policy for any given finite MDP.
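A minimal sketch of one tabular Q-learning update, assuming the standard rule Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',a') - Q(s,a)]. The learning rate `alpha`, discount factor `gamma`, and the state and action names are hypothetical values chosen for illustration.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to the table q (a dict keyed by (state, action))."""
    # Best estimated value available from the next state (0.0 if unseen)
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward the observed reward plus discounted future value
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

q_table = {}
actions = ["left", "right"]
first = q_update(q_table, "s0", "right", 1.0, "s1", actions)
```

Once the table has converged, the learned policy picks, in each state, the action with the highest Q value.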

What is a Markov Decision Process (MDP)?

A model used in reinforcement learning where states, actions, probabilities, rewards, and penalties are known.

What is a Recurrent Neural Network (RNN)?

A type of ANN that handles time-series data well

What is a Generative Adversarial Network (GAN)?

A type of neural network used for generating synthetic data. Used to generate DeepFake images Used to simulate cat-and-mouse fraud schemes

What is a Convolutional Neural Network (CNN)?

A type of neural network used for recognizing subpatterns and motifs in unstructured data, particularly images.

What is Long Short-Term Memory (LSTM)?

A variation of RNN with gated nodes for flexible representation of short-term or long-term data

What is coefficient of determination (r^2)?

A.k.a. goodness-of-fit Values between 0 and 1, expressed as percent (%); 100% is a perfectly fit model Represents percent variation in y that is explained by variation in x Proportion of the variance in the predicted variable accounted for by the model

What is Simple Linear Regression?

A.k.a. univariate regression Model finds the line of best fit (calibration) which can then be used for prediction of y based on x

What is a Multilayer Perceptron (MLP)?

A.k.a. vanilla neural network. Perceptron: iterative algorithm that determines best values for the coefficient vector Typical example of a shallow feed-forward network Fully connected multi-layer neural network • May have 1 or several hidden layers Steps • Starting with the input layer, propagate data forward to output layer • Calculate error of output (observed vs. expected) • Backpropagate the error to adjust weights and biases to minimize it Repeat steps over multiple epochs to learn ideal weights and biases

What does the FDA consider AI and machine learning?

The FDA regulates AI- and machine-learning-based software intended for medical purposes as Software as a Medical Device (SaMD).

What is generative AI?

AI and machine learning used to generate new content. Images, music, speech, code, video, text e.g., ChatGPT (text), DALL-E (images), CoPilot (code generation), VALL-E (audio) Contrast with other AI that performs data analysis, classifies, or chooses an action

What are the challenges of AI in cybersecurity?

AI can be hacked just like any other software Hacked systems have potential for unauthorized disclosure, patient harm Human autonomy ("human-in-the-loop") may help detect malfunctioning AI The NSCAI is a national effort established in 2018 by the John S. McCain National Defense Authorization Act to address AI cybersecurity and related national security concerns.

What is Augmented intelligence?

AI is used to assist and augment human work Maintains "human-in-the-loop"; human ultimately making decisions

What is AI Effect and Tesler's Theorem?

AI is whatever hasn't been done yet Optical character and voice recognition, automated pap smear and peripheral blood smear readers, bioinformatics pipelines → no longer considered AI even though they are

What is Autonomous intelligence?

AI makes decisions without human involvement

What are large language models (LLMs)?

AI systems that work with language and have been trained with more parameters. "Large" refers to training with more parameters • No specific cutoff for "large" vs. not large • Evolution of language models: N-grams → Recurrent Neural Networks (RNNs) → Transformers • Transformers developed by Google Brain in 2017: much faster than an RNN

What is Artificial Intelligence (AI)?

Ability of a computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.

What is Intelligibility in the context of AI?

Achieved through -transparency: Sufficient information published before the design or deployment of an AI technology, Describes how technology is designed, intended use, data used, etc. Also means that a person knows when AI is being used on them -eXplainability (XAI) Providing the human user an explanation of how the AI tool works

How do you correct a Model when training data is more complex than real-world data?

Add instances to avoid sampling error and reduce complexity

What is the Apriori algorithm Association Rules for ML Unsupervised Algorithms?

Algorithm for finding frequent item sets Uses a hash tree to count item sets navigating through data set in breadth first manner Common uses: Detecting adverse drug reactions

What is Machine Learning (ML)?

Algorithms which allow computers to learn without explicit programming.

What is ML Fully supervised learning?

All data labeled to same extent

What is variance in ML model evaluation?

Amount of imprecision (square of the standard deviation: σ → σ^2) Due to model's sensitivity to small fluctuations in the training set High variance: model is imprecise (and likely overfit) Low variance: model is precise (but may not be accurate and may be underfit) High variance + low bias: inconsistently accurate results

What is bias in ML model evaluation?

Amount of inaccuracy in the model's performance after training High bias: model is inaccurate (underfit) Low bias: model is accurate (but may be overfit) High bias + low variance: consistently inaccurate results

What is a feature?

An aspect (variable) of the training data a.k.a. vector, attribute In clustering and instance-based methods, called a dimension Features can be selected columns in the raw data or.. • Can be calculated or combinations of data from >=1 columns in the raw data • e.g., Body mass index is a calculation of mass (body weight) and height Geocoder: Type of feature that maps a data point to a map so that you know where the event happened

What is market basket analysis Association Rules for ML Unsupervised Algorithms?

Analyzing customer purchase patterns Customers who bought X also bought Y • X and Y are independent variables • E.g., customers who bought dog leashes also bought fireplace logs

What is CatBoost?

Another type of ensemble algorithm

What does Area Under the Curve (AUC) represent in ML?

Area under the ROC curve, measures discrimination. a.k.a. concordance (C) statistic AUC of 1 = perfect discrimination between groups Threshold for acceptable performance • Model to replace human: very high (near 1) • Model to assist human: 0.7-0.9 may be ok
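Because AUC is also the concordance (C) statistic, it can be sketched as the fraction of positive/negative pairs in which the positive instance receives the higher score (ties count as half). The function name and the example scores below are made up for illustration.

```python
def auc_concordance(scores_pos, scores_neg):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    concordant = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                concordant += 1.0   # positive ranked above negative
            elif p == q:
                concordant += 0.5   # tie counts as half
    return concordant / (len(scores_pos) * len(scores_neg))

perfect = auc_concordance([0.9, 0.8], [0.1, 0.2])
partial = auc_concordance([0.9, 0.3], [0.5, 0.1])
```

In the second call, three of the four pairs are ranked correctly, so discrimination is imperfect.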

What is ML Bias-variance tradeoff?

As complexity increases, bias decreases but variance increases Bias-Variance Trade-Off • Things that reduce variance increase bias • Things that reduce bias increase variance Total error = (bias)^2 + variance + irreducible error Can never get rid of irreducible error

ML Model Development Step 3: Prepare, Transform and Cleanse Data (instances)

Assess and/or create labels (supervised models only)
• Degree and extent of labeled data determines type of learning: informs model selection
• Features should have similar/same scales of measure or labels; labels should have similar/same names
• Manual labeling: Performed by subject matter experts, error prone, can introduce bias, terms may not be standardized
• Automated labeling: Can be performed using human-labeled data as a template
• Completely automated: No human-labeled data as a template
• Multiple-instance learning: Automatically learns important details of instances grouped under broad labels
Cleanse data
• May need to normalize normally distributed data to standard deviation units OR...
• Normalize to percentile span of data range if not normally distributed data
• Correct missing/erroneous data
• Remove outliers if possible without introducing bias

What is a ML confusion matrix Single class?

Assessment of model's "confusion" in analyzing instances (i.e., assigning instances to the wrong class or outcome) Evaluated for each outcome / class the model produces (e.g., classification) For binary outcome: simple 2 x 2 table # columns, # rows = # output classes
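For a binary outcome, the 2 x 2 table can be tallied as in the sketch below. The function name, the 0/1 label coding, and the example data are assumptions for illustration.

```python
def confusion_matrix_binary(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, actually positive
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        elif a == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

actual = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
cm = confusion_matrix_binary(actual, predicted)
```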

How does Simple Linear Regression work at a high level?

Assumes linear relationship between dependent variable (y) and the independent variable (x) Uses method of least squares to find the best line through the data Goal is to minimize the loss function The slope of the line is the regression coefficient
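The least-squares fit can be sketched with the closed-form formulas for slope and intercept, followed by r^2 as the proportion of variance in y explained by the line. Function names and data points are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                     # regression coefficient
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
```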

What is the goal of boosting methods in sequential ensembles?

Attempts to improve predictive flexibility of models Trains large # of constrained (weak learner) models (e.g., decision trees with limited depth) Combines all constrained models into a single strong learner Uses weighted voting (unlike parallel ensembles) Advantages • No data preprocessing required; handles missing data; very powerful method with good performance • Helps reduce high bias (inaccuracy)...more likely to see bias with shallow decision trees Limitations • Controversy over whether this method reduces or increases overfitting • Requires intense computation

What are the main categories of AI Ethics challenges?

Beneficence, Intelligibility, and Accountability

AI challenges with Ethics

Beneficence, Intelligibility (eXplainability and transparency), Accountability, Auditability

What is the output for ML Logistic regression?

Binary classification: Probability that an instance belongs or does not belong to a single class Multiclass classification: Probability that an instance belongs to one of n classes
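As a sketch of how a binary logistic regression model turns inputs into a probability, the logistic function is applied to a weighted sum of the features. The weights, bias, and feature values here are hypothetical, not fitted values.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Probability that the instance belongs to the positive class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)

# Hypothetical model with two features
p = predict_probability([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```

Applying a threshold (commonly 0.5, though that is a choice) converts the probability into a class label.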

What is the output of a Support Vector Machine?

Binary or multiclass classification. New instances are classified based on location with respect to the hyperplane

What are the common uses of hierarchical clustering?

Bioinformatics and genetics - relationships or classes of samples based on their genetic profiles, treatments, outcomes. Output is a dendrogram (tree diagram)

What method is used to construct sequential ensembles?

Boosting methods.

What is the input for ML Logistic regression?

Can be multiple independent variables (nominal, ordinal or metric)

What is an advantage of parallel ensembles - Random Forest?

Can determine relative importance of individual features for classification tasks by calculating weighted average of decreased class impurity (increased purity) produced by all nodes of the forest that use each feature

What are the limitations of logistic regression?

Cannot handle large numbers of features (input variables) Cannot handle situations where relationship between input and output variables is not constant Best for binary outcomes; may overfit if insufficient training data

What is the Input for Support Vector Machine?

Categorical variables must be converted to numeric (see feature engineering in supplemental material) Kernel and penalty hyperparameters

In ML model optimization what is regularization?

Category of methods that artificially force algorithm to build a less complex model (i.e., more generalizable, less likely to overfit data) • Places constraints on the model's variability in the setting of noisy data, similar to curve smoothing • Can perform feature selection by adjusting weights of some features to zero: feature elimination • Prevents parameters from getting too large: slightly higher bias (inaccuracy) but significantly less variance (more precise) • Minimizes the loss function for data

Statistical Causation

Cause-and-effect relationship • One thing causes another • Independent variable causes a particular value in a dependent variable • Criteria to satisfy if relationship is causal

How do you correct a Model that is too complex for the data?

Choose simpler model, Regularization

Criteria for causation and Description

Chronology (temporality): Effect has to occur AFTER the cause Dose-response relationship: Increasing exposure increases the risk Consistency: Effect is consistent when results are replicated in different settings using different methods Plausibility: Effect agrees with accepted understanding of medical processes Coherence: Compatible with existing theory and knowledge (which may be wrong)

What is the Average linkage method?

Clusters are merged based on the average similarity between all pairs of points in the clusters.

What is the Single linkage method?

Clusters are merged based on the similarity of the closest pair of points between the clusters. Can cause premature merging

What is the Complete linkage method?

Clusters are merged based on the similarity of the furthest pair of points between the clusters. Outliers may cause merging of close groups later than is optimal
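The average, single, and complete linkage rules differ only in how the pairwise distances between two clusters are summarized. A small sketch using Euclidean distance, with made-up points:

```python
import math

def pairwise_distances(c1, c2):
    """All distances between a point in c1 and a point in c2."""
    return [math.dist(p, q) for p in c1 for q in c2]

def single_linkage(c1, c2):
    return min(pairwise_distances(c1, c2))   # closest pair

def complete_linkage(c1, c2):
    return max(pairwise_distances(c1, c2))   # furthest pair

def average_linkage(c1, c2):
    d = pairwise_distances(c1, c2)
    return sum(d) / len(d)                   # mean over all pairs

cluster_a = [(0, 0), (1, 0)]
cluster_b = [(3, 0), (5, 0)]
```

At each step, agglomerative clustering merges the pair of clusters with the smallest distance under the chosen rule.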

What is Elastic Net regularization?

Combination of L1 and L2 regularization.

ML feature selection: Stepwise selection

Combination of forward and backward selection Forward feature added, then all included variables are evaluated for backward selection to validate the addition of the forward feature

How can we estimate the optimal value of K in K-Means Clustering?

Compare compactness of groups across a range of K values • Compute the sum of squared error (SSE) for all members of clusters as a measure of compactness • Change K and recalculate until the optimal K is identified
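The SSE compactness measure can be sketched as below. This only scores a given grouping; it does not run K-means itself, and the cluster assignments shown are hypothetical.

```python
def sse(clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for cluster in clusters:
        dims = len(cluster[0])
        centroid = [sum(p[d] for p in cluster) / len(cluster) for d in range(dims)]
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dims)) for p in cluster
        )
    return total

# Same four points grouped with K = 1 and with K = 2
k1 = sse([[(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]])
k2 = sse([[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]])
```

SSE generally keeps falling as K grows, so in practice one looks for the K beyond which the improvement levels off.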

What is calibration in model evaluation?

Comparison of observed vs. expected probability of class membership. Must be good if the model is to be used for prognostic models Assess with reliability diagrams [Dimitriadis et al. 2021] Observed event frequency plotted against predictive probability

What are the limitations/disadvantages of Artificial neural networks (ANNs)?

Computationally intensive Back-propagation and better processing technology have helped reduce the impact of this Unraveling the pathways after training is completed can be difficult to impossible: Black Box Problem

Machine learning algorithms

Computer systems that automatically learn from experience instead of being explicitly programmed

What are Artificial neural networks (ANNs)?

Connectionist systems that mimic human brain. Goal is to solve problems the way that a human brain would Do not separate memory and processing

Give an example of AI in Assistance with time-consuming tasks

Counting mitoses, Medical documentation, Prior authorization

What are the uses of clustering methods?

Creating hypotheses about data structure and simplifying data.

Why might critical input data be missing in AI training data?

Critical inputs may not be represented in AI training data because humans may not realize which data contributed to their decisions. Polanyi's Paradox refers to human decision-making that goes beyond explicit understanding or description.

What are some challenges in AI?

Cybersecurity risks, bad data/science, automation bias (Assumption that the computer is right, even when it doesn't make sense), inaccurate assumptions about data accuracy and representation

What is the overall process of a CNN?

Cycles of convolutions to pooled layers are repeated until a defined set of criteria are reached, then the data is sent to a feed-forward network for classification.

AI challenges with data for training model

Data for training model may be poor • Has bias and/or false beliefs • Lacks sufficient quantity and variability • Lacks complete and 100% accurate labels • Is missing critical input data

How can human bias and false beliefs be represented in data labels?

Data labels can represent human bias and false beliefs, such as in court sentences or hiring/firing decisions.

What is ML Unsupervised learning?

Data which have not been classified or labeled Goal: model discovers new (previously unknown) patterns or relationships

What are the pros of k-fold cross-validation?

Decreases risk of sampling error, overfitting, and underfitting.

Give an example of AI in Anomaly detection

Detecting errors in data

What are some common uses of K-Means Clustering?

Detection of cervical intraepithelial neoplasia, works better with numerical data.

What is Multiple Linear Regression?

Determine strength of relationship between >=2 independent variables and a dependent variable Estimate the value of the dependent variable at given values of the independent variables Input: >1 input independent variable (covariate; 𝑥1, 𝑥2, etc.); no collinearity between them Output: Still a single output variable (𝑦𝑖)

How does a Support Vector Machine work at a high level?

Determines optimal boundary between two classes in multidimensional (n-dimensional) space • 2 features, boundary is a line • 3 features, boundary is a plane (surface) • >3 features, boundary is a hyperplane and cannot be visualized

What cause AI models to be brittle?

Do not produce consistent output when given similar inputs Unable to see the forest for the trees (double-edged sword) • Purpose of models is to analyze details too complex for humans to synthesize, but... • Humans better able to factor in general features and/or situational awareness • Humans will ignore details in favor of general assessment

What is Ridge (L2) regularization?

Does not reduce features but gives some features less weight Better than L1 for maximizing performance of model on hold-out data Is differentiable, so gradient descent can be used for optimizing the objective function

Give an example of AI in Signal conversion

E.g., natural language processing, voice recognition, optical character recognition

What is the Centroid similarity method?

Each iteration merges the two clusters whose centroids (mean points) are most similar.

What is the advantage of ML Decision Trees?

Each node easily explainable as if-then cutoffs

How do you correct a Model that trains too long?

Early stopping (fewer epochs, cycles)

What are some non-mathematical regularization methods?

Early stopping, Computational data augmentation, Pruning decision trees.

What are some non-mathematical methods of regularization?

Early stopping, pruning decision trees.

What are Support Vector Machine advantages?

Efficient use of computer memory Works well with high dimensional data Less prone to overfitting

How does parallel ensembles - Random Forest work?

Ensemble of randomly selected decision trees (to make a "forest") run in parallel Performance is better when each decision tree in the forest is different (uncorrelated) Can be used for classification or regression Uses both bagging and random subspaces as methods

What are parallel ensembles - Random Forest?

Ensemble of randomly selected decision trees run in parallel. Common uses: 30-day hospital readmission algorithms

What are sequential (series) ensembles?

Ensembles where each model is run sequentially. Uses multiple short decision trees as weak models Each iteration focuses on learning from the mistakes of the one before it Constructed using boosting (not bagging) methods

What is machine learning validation?

Evaluating preliminary (non-final) model • Results of evaluation lead to tweaking (tuning) the model

What is the difference between hard and soft clustering?

Exclusive (hard) clustering: • An instance can only belong to 1 cluster • E.g., K-means clustering Fuzzy (soft) clustering: • An instance can have more than 1 cluster assignment

In ML what is Leave One Out Cross-Validation (LOOCV)?

Extreme case of k-fold where k = n (number of samples) Use when you have very small data sets where standard K-fold cross-validation will not work Can focus on noise unique to that dataset, so this method is NOT preferred
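A sketch of how k-fold splits are formed, with LOOCV as the special case k = n. The helper name `k_fold_indices` is hypothetical, and the folds here are contiguous; real workflows usually shuffle the instances first.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over range(n)."""
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop

loocv_folds = list(k_fold_indices(5, 5))    # k = n: one held-out instance per fold
three_folds = list(k_fold_indices(10, 3))
```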

AI challenges with Legal and Regulatory

FDA jurisdiction CLIA - challenges for laboratories (static models) Safety surrounding continuously retraining models

ML feature selection: Forced inclusion

Features are selected based on known association in prior studies

What is machine learning testing?

Final evaluation of a machine learning model where no further changes to the model are expected

What is transparency in AI?

For AI developers: Reasons for model's performance are known and understood For end-users (ethics): Sufficient information is published such that model's performance can be audited

What is null error rate?

For classification methods, rate of being wrong if you ALWAYS pick the majority class NER = (Total instances - Majority class instances) / Total instances • E.g., if the majority class has 105 instances out of 165 total instances: Null error rate = (165 - 105)/165 ≈ 36%
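The null error rate arithmetic can be reproduced directly. The label names are invented; the counts (105 majority out of 165) follow the example above.

```python
from collections import Counter

def null_error_rate(labels):
    """Error rate of a classifier that always predicts the majority class."""
    counts = Counter(labels)
    majority = max(counts.values())
    return (len(labels) - majority) / len(labels)

labels = ["yes"] * 105 + ["no"] * 60   # 165 instances in total
ner = null_error_rate(labels)          # 60 / 165
```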

What types of problems can ensemble methods be used for?

For classification, each model produces probability (vote) of a class then votes are counted For regression, each model produces metric (numeric) variable averaged across models

What is the output of k-Nearest Neighbor?

For each new instance (x), algorithm finds k training instances with closest distance to x and returns majority result • Class for classification problems • Mean for regression problems
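A minimal k-Nearest Neighbor sketch using Euclidean distance: find the k closest training instances, then return the majority class (classification) or the mean (regression). The function name, the `task` argument, and the toy data are assumptions.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3, task="classification"):
    """Predict for a new instance x from its k nearest training instances."""
    # Sort training instances by distance to x and keep the k closest
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    values = [y for _, y in neighbors]
    if task == "classification":
        return Counter(values).most_common(1)[0][0]   # majority class
    return sum(values) / len(values)                  # mean for regression

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
label = knn_predict(train_X, train_y, (0.5, 0.5), k=3)
```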

Using F score in multi-class evaluations

For multi-class evaluations • Compute the F1 score for each class alone then... • Macro F1: average all the per-class precisions and average all the per-class recalls and then use averages to compute overall F1 score • Weighted F1 score: Same as macro F1 except that each class is factored by a weight • Often used to compare classifiers, but some believe this is often used incorrectly
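The macro F1 described above (average the per-class precisions and recalls, then combine) can be sketched as follows. Note that another common convention averages the per-class F1 scores instead; this sketch follows the definition given here. Function names and labels are illustrative.

```python
def per_class_precision_recall(actual, predicted, cls):
    """Precision and recall treating `cls` as the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == cls and p == cls)
    fp = sum(1 for a, p in zip(actual, predicted) if a != cls and p == cls)
    fn = sum(1 for a, p in zip(actual, predicted) if a == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def macro_f1(actual, predicted):
    """F1 computed from the unweighted mean precision and mean recall."""
    classes = sorted(set(actual) | set(predicted))
    pairs = [per_class_precision_recall(actual, predicted, c) for c in classes]
    mean_p = sum(p for p, _ in pairs) / len(pairs)
    mean_r = sum(r for _, r in pairs) / len(pairs)
    if mean_p + mean_r == 0:
        return 0.0
    return 2 * mean_p * mean_r / (mean_p + mean_r)

score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```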

ML feature selection: Backward selection

For regression models Remove feature if it worsens the strength of the association Iteratively remove variables (features) until removal worsens the strength of the association

ML Model Development Step 1: Determine the problem to be answered

Foundational to rest of the steps in model development

In ML what other statistical methods can you use to screen for low incidence conditions?

Fβ score Matthews Correlation Coefficient: A measure of the quality of binary classification predictions. Stratified K-fold cross-validation: A technique for evaluating the performance of a machine learning model by dividing the data into K subsets and ensuring each subset has an equal distribution of classes.

ML Model Development Step 5: Select features (independent variables):

General rule: select <=1 feature for each 10 instances in the development data set Higher number of features: overfitting Each feature should be selected based on its likely association and/or ability to explain the dependent variable (outcome, condition) Manual selection by a human • Subject to human bias: constrains model • Features selected not always relevant or optimal Automatic selection by a computer algorithm (e.g., deep learning) • Some ML models automatically select features during primary model training

What is the role of the environment in reinforcement learning?

Gives rewards or penalties to the agent and transitions to a new state.

ML Model Development Step 10: Optimize Model (aka tuning)

Goal: minimize deviation from the correct output (minimize the loss function) Methods of optimization • Tune (adjust) hyperparameters • Dimensionality reduction (see unsupervised algorithms) • Regularization (see below) • Consider a different model if the current one is simply not working

What is the importance of good quality data in AI?

Good quality data is critical for AI models. Bad data leads to bad models. Some models need a large amount of training data to perform well.

What are layers in an ANN?

Groups of nodes with similar activation functions.

What are the advantages of Artificial neural networks (ANNs)?

Handles large amounts of complex data.

What is F Score in ML?

Harmonic mean of precision and recall. a.k.a. F1 score For binary classifier with no weights on importance of precision vs. recall Perfect F score = 1 Increasing precision decreases recall and vice versa
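A short sketch of the F1 calculation from confusion-matrix counts, using the harmonic mean of precision and recall. The counts in the example are made up.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for a binary classifier."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(tp=8, fp=2, fn=4)       # precision 0.8, recall 8/12
perfect_f1 = f1_score(tp=5, fp=0, fn=0)
```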

What are the causes of underfitting in ML?

High bias, low or high variance. More common at the beginning of model development prior to tuning. Causes related to model and its settings: too simple or rigid for the data (e.g., using linear model for complex data), training is paused too early, hyperparameters are suboptimal, not enough features selected, suboptimal features selected. Causes related to data: training data more simple than real-world data (underrepresented populations).

Overfitting vs Underfitting

High bias: underfit High variance: overfit

What is entropy in Decision Trees?

High when large number of evenly mixed classes
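To illustrate, the sketch below assumes the usual Shannon entropy in bits (negative sum of p·log2 p over class proportions). The function name and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class distribution at a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

single = entropy(["a", "a", "a", "a"])       # one class only
two_even = entropy(["a", "a", "b", "b"])     # two evenly mixed classes
four_even = entropy(["a", "b", "c", "d"])    # four evenly mixed classes
```

Entropy is 0 for a single class and rises as more classes are mixed evenly.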

What is goodness of fit in ML?

How closely a model's output values match the observed (true) values.

What are Expert Systems?

Human knowledge encoded in a knowledgebase.

What is traditional programming?

Human-specified rules analyze data according to known or suspected patterns.

What are the limitations/disadvantages of K-Means Clustering?

Ideal grouping will not occur when K(#) does not fit population characteristics Values of some dimensions (features) may be correlated (redundant) These can increase noise without improving clustering Identify and remove these as part of data cleanup prior to training

What are some common applications of CNNs?

Image analysis, image classification, classifying time-sequence or gene-sequence data.

What are the possible outputs of a CNN?

Image classification, image feature selection Saliency map: Feature (activation) map directly overlaid on the original image • Shows which features considered more relevant, but doesn't tell you why

What is deep learning in ANN?

Imitating human brain in data processing and decision-making patterns. Usually multiple hidden layers (definitions range from >1 or >3 up to hundreds or thousands) • Thousands to millions of interconnections; large number of non-linear computations Means more in-depth processing, not more in-depth knowledge

What issues can arise from incomplete, inaccurate, and variable data labels?

Incomplete, inaccurate, and variable data labels can cause problems in AI models. Different terms or metrics for same label due to human inconsistency

What is correlation coefficient (r)?

Indicates the strength of the relationship between two variables on a scatter plot. Perfectly positively correlated model: r = 1 No correlation whatsoever: r = 0 Perfectly negatively correlated model: r = -1 Correlation does not imply causation

What is machine learning?

Input and output data used to create a model/tool.

What is an outlier?

Instance which is significantly different from the remaining instances in the population Can skew results Different models have different sensitivities to outliers

What is k-Nearest Neighbor?

Instance-based method for classification problems. Common uses • Derive tumor infiltrating lymphocyte density • To predict treatment response

How does hierarchical clustering work at a high level?

Instances are grouped based on similarities and differences. Agglomerative clustering • Bottom up approach; most common • Each instance is a cluster and successively merged with most similar other cluster • Each feature of an instance population = dimension Divisive clustering • Top down approach; NOT common • Entire data is a cluster then divided into two clusters based on similarities and differences • Cycle is repeated until

What is K-Means Clustering?

Instances assigned to manually defined number (K) of clusters based on their similarity. NOT k-Nearest Neighbor

What is Probabilistic Clustering in Unsupervised ML?

Instances clustered based on probability that they belong to a particular distribution • Gaussian Mixture Model (GMM): Leveraged to determine which gaussian distribution an instance belongs to

What is the bootstrapped sample?

Instances in the 'bag' Instances never selected: "out-of-bag" (OOB) sample. Because you are always drawing from the entire original population, it is likely that the same instance may be selected more than once.

What challenges arise when data lacks quantity or variability?

Insufficient quantity or variability of data can be problematic for models, especially when finding less common patterns. Underrepresented populations can lead to non-generalizable rules and disparities in AI models.

What are parameters in machine learning?

Internal values inside machine learning that the model derives based on training data e.g., weights, bias values

What is noise in machine learning?

Irrelevant information or randomness in a data set.

How does GAN work at a high level?

It consists of a discriminator and a generator network trained in tandem. If Discriminator is correct, penalty to Generator If Discriminator wrong, penalty to Discriminator

How does RNN work at a high level?

It is a deep learning network where the output recurrently feeds back on itself to inform the next prediction Data from prior instances to be used as input for subsequent instances Network nodes can accumulate historical data: called context nodes • Have their own weights which are adjusted via backpropagation during training

How is the Silhouette coefficient calculated?

It is calculated as (b - a) / max(a, b), where a is the average intracluster distance and b is the average intercluster distance.
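
The formula can be sketched for a single instance as follows. This assumes Euclidean distance and takes b as the average distance to the nearest other cluster, which is the usual reading of "intercluster distance"; the function and argument names are hypothetical.

```python
import math

def silhouette(point, own_cluster, other_clusters):
    """Silhouette coefficient of one instance.

    own_cluster: the other members of the instance's cluster (excluding it).
    other_clusters: list of clusters, each a list of points.
    """
    def mean_dist(p, cluster):
        return sum(math.dist(p, q) for q in cluster) / len(cluster)

    a = mean_dist(point, own_cluster)                     # average intracluster distance
    b = min(mean_dist(point, c) for c in other_clusters)  # nearest other cluster
    return (b - a) / max(a, b)

score = silhouette((0, 0), [(0, 1), (1, 0)], [[(10, 0), (0, 10)]])
```

Values near 1 suggest the instance sits well inside its cluster; values near -1 suggest it may belong to a neighboring cluster.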

What is the role of the discriminator network in GAN?

It learns to discriminate between synthetic and real instances. With each iteration, increases ability to discriminate between synthetic instances from generator network and real instances

What is the role of the generator network in GAN?

It learns to generate instances that can fool the discriminator. With each iteration, increases its ability to fool the discriminator network

How is reinforcement learning different from supervised and unsupervised learning?

It uses unlabeled data like unsupervised learning but uses outcomes to affect model training like supervised learning.

ML feature selection: Forward selection

Iterative inclusion of new feature as long as it makes a contribution that explains the variation in the dependent variable Stops when no additional contributing variables (features) found

How is K-Means Clustering different from k-Nearest Neighbor?

K-Means Clustering assigns instances to clusters based on similarity, while k-Nearest Neighbor classifies instances based on their nearest neighbors.

What is the black box problem in AI transparency?

Lack of transparency in understanding the rules developed by the AI algorithm. May be indecipherable after model is trained, even to the developer(s) May not be able to determine why algorithm generated certain output May generally work well but some output may be inexplicably wrong

What is the output layer in an ANN?

Layer with nodes equal to the number of output categories in the data. # nodes = # output categories of data

What is the input layer in an ANN?

Layer with nodes equal to the number of selected features in the data. # nodes = # features selected in data

What are hidden layers in an ANN?

Layers between the input and output layers to process information Shallow networks usually have 1 Deep networks have >3

What is reinforcement learning?

Learning how to reach a goal through trial-and-error. Game playing, speech to text, financial trading.

What are some mathematical methods of regularization?

Least Absolute Shrinkage and Selection Operator (LASSO) (L1) regularization Ridge (L2) regularization Elastic Net regularization (combo of L1 and L2) Special regularization for neural networks

What is LASSO (L1) regularization?

Least Absolute Shrinkage and Selection Operator (LASSO) (L1) regularization • Modifies objective function by adding penalty hyperparameter (e.g., C) • If C=0, then hyperparameter has no effect on the algorithm • As value of C increases above zero, more parameters forced to be set to zero to minimize the objective • Can eliminate less essential features and produce a sparse model • Helps to increase model explainability by showing which features are essential

What does a lower value of K mean in K-Means Clustering?

Less granularity in the clustering.

ML Model Development Step 13: Post-Deployment Monitoring

Look for shifts and trends in model output that may represent increase in bias or variance Model stability • Ability of a model to produce similar output over a range of similar inputs, including inputs not previously seen but which are not substantially different from prior inputs • A stable model is a robust model • Unstable (brittle) models • Models that do not produce consistent output when given substantially similar inputs

What is the difference between Machine Learning and Traditional Statistics?

Machine Learning does not define explicit mathematical relationships between inputs and outputs, while Traditional Statistics does.

What is Strong AI?

Machine outperforms humans in many tasks

What is Narrow AI?

Machine performs a single specific task better than a human

What is General AI?

Machine performs any intellectual task with same accuracy as a human

Give an example of AI in Decision support

Making prior authorization decisions

What are regularization methods?

Mathematical and non-mathematical techniques to prevent overfitting.

What is a confusion matrix in multi-class classification?

Matrix evaluating model performance for each class. Number of columns and rows equal to the number of output classes. Model performance for each outcome/class produced.

What is the beneficence challenge in AI?

Maximizing benefits and minimizing risks and harms AI can propagate and exacerbate human bias because... • Human bias can infiltrate data used to build models • Mitigate through Diversity, Equity and Inclusiveness practices • Fair representation of all groups in data • Fair and equitable benefits, risks and harms across each group • Monitor for and mitigate bias Person should be able to choose whether his/her data are used in algorithm Protect human autonomy in decisions ("human-in-the-loop") • ACR and RSNA recommendation: do not approve autonomous AI until sufficient human-supervised AI experience obtained Research integrity (good science) Sustainability: Environmental, Workplace (ease of maintenance)

What is an ML loss function?

Measure of deviation from the correct output.

What is Root Mean Squared Error (RMSE)?

Measure of how well a regression model fits the data. RMSE = square root of [(1/n) × sum of (xi - xtrue)²] n: number of measurements; xi: observed measurement; xtrue: actual measurement Nearly guaranteed to be higher on test data than training data The lower the result, the better the fit (the more the observed values match expected values) If RMSE is much higher on test data than on training data, model has overfit the training data

What is Cohen's kappa in ML classification methods?

Measures classifier performance compared to random chance.
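
A short sketch of the standard kappa calculation for a binary classifier: kappa = (observed agreement - chance agreement) / (1 - chance agreement). The function name and the counts in the example are made up for illustration.

```python
def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa from the four cells of a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    # Observed agreement: fraction of instances the classifier got right.
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return (observed - expected) / (1 - expected)

kappa = cohens_kappa(tp=20, fp=5, fn=10, tn=15)
```

A kappa of 0 means performance no better than chance; 1 means perfect agreement.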

Give an example of AI in Predictions

Medical diagnoses and problems (active research area) Predicting patient volumes → adjust staffing (especially during system changes) Predicting optimal future state workflows / functional gaps in process redesign Predicting, detecting and subverting malware attacks

What is k-fold cross-validation in ML?

Method of choice for smaller data sets. Some say to use it for larger data sets as well

What is stochastic gradient descent?

Method to reduce loss function by descending the curve. Mathematically descends the curve of the loss function to a minimum value Effective training method for most models
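
A toy sketch of the idea, assuming a one-parameter model y = w * x and a squared-error loss (neither is specified by the card). Each update uses a single instance, which is what makes the descent "stochastic"; all names are illustrative.

```python
import random

def sgd_fit_slope(xs, ys, learning_rate=0.01, epochs=200, seed=0):
    """Fit y = w * x by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w = 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)                    # visit instances in random order
        for x, y in data:
            gradient = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
            w -= learning_rate * gradient    # step downhill on the loss curve
    return w

slope = sgd_fit_slope([1, 2, 3, 4], [2, 4, 6, 8])  # data generated with w = 2
```

The learning rate and epoch limit are hyperparameters; too large a learning rate can overshoot the minimum instead of settling into it.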

What is the AdaBoost sequential ensemble?

Misclassified data from each algorithm have weights INCREASED • Increases chance that the next algorithm/model will classify them correctly • Well-classified data may have weights decreased Developed for binary classification • Also multiclass classification with soft voting Advantages • Quick and easy to use • Works well with large data sets • Uses several algorithms to improve accuracy Limitations • Takes longer to train • Can be impossible to interpret

What is a model in machine learning?

Model = Algorithm + parameters When a model is used for classification, it is called a classifier Weak learner (weak model): model whose performance only slightly > random chance Good model: model that generalizes well (it performs the same on new data as it did on the training (and test) data)

What can happen in the accuracy paradox?

Model can correctly predict absence of the condition in 99% of cases - hooray! BUT... May completely fail to detect the condition being sought 100% failure of detecting the condition (but null error rate is only 1%)

What is underfitting in ML?

Model does not accurately predict output for the data fed to it.

What is ML discrimination capability?

Model's ability to discriminate between groups, classes or clusters. Can be used for both supervised and unsupervised models

AI challenges with brittle models

Models can be brittle • Produce variable output with consistent input • Degrade over time • Subject to adversarial examples • Small input changes hack the system (cybersecurity)

What are ensemble methods in ML?

Models that are groups (ensembles) of > 1 model to improve performance. Models should be different from one another for best advantage

What are the auditability challenges for AI?

Monitor tool for performance and to ensure ethics are followed Formal oversight mechanisms Responsibility: Person(s)/entity(ies) responsible for monitoring AI Responsiveness: Developers and users systematically examine the tool to determine whether it is responding adequately, appropriately and according to expectations AND respond when it is not working (fix or terminate AI program)

What does a higher value of K mean in K-Means Clustering?

More granularity in the clustering.

What is the ideal number of instances for input in RNN?

More than 10,000 instances

What is the ideal number of instances for input in a CNN?

More than 10,000 instances.

What are Decision Trees in ML?

Most common supervised classification method Goal: classify a set of training instances accurately • Examples: Classification & Regression Trees (CART), C4.5, ID3

ML Model Development Step 2: Gather appropriate data (instances)

Most important, time-consuming and expensive part of the process Data set = collection of instances Collect data from multiple sources to simulate real-world data for intended use • Genders, races, ethnicities, socioeconomic statuses, etc. Exception: Transfer learning • Bulk of data obtained from a different domain because more of it is available • Bulk of training done with this data • Use smaller set of real-world data from intended domain for tuning

What are some methods to evaluate ML models?

Most methods to evaluate models are for supervised models Some can be used for both supervised and unsupervised models A few are more often used for unsupervised models Unless noted to be for unsupervised models, the methods displayed are primarily (or only) for supervised models Not all methods of evaluating a model are shown

What is the recommended ratio of features to instances in a data set to avoid overfitting?

No more than 1 feature per every 10 instances.

What is irreducible error in machine learning?

Noise that can't be reduced by optimizing algorithms, but sometimes can be reduced by better cleaning of data. Due to: Inherent randomness, Misframed problem, Incomplete feature set

What are the characteristics of machine learning?

Not based on human knowledgebases or specified rules. Uses algorithms to learn repetitive data patterns. Can discover new patterns, make predictions Handles large data sets in complex settings

Does Machine Learning make assumptions about the characteristics and distribution of the data?

Not usually, Machine Learning does not make assumptions about characteristics and distribution of the data.

Is the reason for output clear and explainable in Machine Learning?

Not usually, the reason for output in Machine Learning is not always clear and explainable (black box problem). Traditional statistics usually clear

What is a label?

Observed value for a feature of an individual instance

What is overfitting in ML?

Occurs when statistical model exactly fits training data BUT... • Does not fit new data well (test or production data) Training set has low error rate but test set has high error rate = high variance • Most common problem for any statistical model using a training set

What is a ML epoch?

One pass through the training data

How do Artificial neural networks (ANNs) operate?

Operate via flow of signals through nets of connections, akin to biological networks Network trained to optimize signals of each node-to-node connection via adjusting weights (w) and biases (b) (no control over activation (a) functions)

What is Gradient boosting sequential ensemble?

Operates similarly to AdaBoost EXCEPT model trains on the residual errors of the previously run model Residual errors calculated using gradient descent algorithm to reduce the loss function

What is the Training mechanism of a Support Vector Machine?

Optimize boundary by maximizing margin between support vectors of different classes Focus on borderlines is UNIQUE (outliers automatically ignored)

What are ML Supervised Algorithms Classification Methods?

Output is a categorical (class) variable Used for predictions, classification, categorization Predicts probabilities that an instance belongs to a particular class (dependent variable) based on its features (independent variables) e.g., automated diagnosis, image classification

What is the most common problem for any statistical model using a training set?

Overfitting.

What is the Ward method?

Pair of clusters merged when the merge gives the smallest increase in within-cluster variability. Measured by... • Determining cluster center (centroid) then... • Computing squared distances between each instance and its cluster center • Subtracting squared distance of cluster A from squared distance of cluster B • Clusters with lowest sum of squared errors (SSE) are paired

What is a ML hyperparameter?

Parameter that is manually set prior to running the algorithm (not set by the model) Manually specify change or limits in input weights that loss function can make per training step These are adjusted during model optimization (tuning) Examples of hyperparameters: • Limit on total # epochs • K in K-means clustering

Give an example of AI in Classifications

Pattern detection (e.g., diagnoses), feature detection (images)

What is the role of the agent in reinforcement learning?

Perceives the state of the environment and executes actions based on the state.

ML Model Development Step 11: Test model on testing data set

Performed by model developers on final model after all training and validation (tuning) is completed Performance should be approximate to that of the validation data set(s) • Differences can be due to bias or variance • Check for... • Overfitting (low bias, high variance) • Inadequate sample size during training and validation • Correct methods used for evaluation of performance • Correct model used for data and problem

What is accountability in AI?

Person(s)/entity(ies) accountable when something goes wrong with AI Can be personal, organizational or regulatory Medicolegal liability • AI is not standard of care • Regulations not yet developed in US • EU paper that discusses that liability is based on physician using standard of care

What is the Receiver operating characteristic (ROC) curve?

Plot of sensitivity against (1-specificity).

How does k-Nearest Neighbor work at a high level?

Plots instances in multi-dimensional space NO MODEL (Instance-based method) Knowledge is stored in the structure of the mapped data (training data not discarded)
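
A minimal sketch of the instance-based idea, assuming Euclidean distance and a simple majority vote among the k nearest training instances (the card does not fix either choice). The function name and data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """training: list of (features, label) pairs; query: a feature tuple."""
    # No model is fitted: the stored training instances are searched directly.
    neighbors = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
            ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
label_near_origin = knn_predict(training, (0.5, 0.5))
label_far = knn_predict(training, (5.5, 5.5))
```

Because every prediction scans the stored data, the training instances cannot be discarded after "training".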

What is Polanyi's Paradox?

Polanyi's Paradox refers to human decision-making that goes beyond explicit understanding or description.

What are the uses and benefits of AI?

Predictions, Classifications, Decision support, Signal conversion, Problem-solving, Anomaly detection, Assistance with time-consuming tasks

What are Dimensionality Reduction Methods in ML Unsupervised Algorithms?

Principal Component Analysis (PCA) Common uses: Use to identify unimportant dimensions OR... • Select first several dimension components (highest ranked) for features selection only How it works (high level) • Principal component: axis through the data that is a function of contribution of variability in a population • Aligns with maximum variance in a population AND • Each principal component has to be orthogonal (at right angles) with all other principal components • >3 dimensions cannot be visualized • 1 principal component per dimension Input: Usually requires continuous metric dimensions (features) • Variations can use categorical variables

What is the Output for ML Decision Trees?

Probability of class membership of the new instance

What is feature engineering?

Process of transforming raw data into a data set • Transform categorical variables into metric continuous (numeric) • One-hot encoding • Transform categories into an array of binary switches, one item per category • Adds dimensionality to features (complexity) • Map ordinal values to numbers • Map categorical value to its statistic (e.g., mean) • Transform continuous metric (numeric) variables to categorical • Discretization (a.k.a. binning / bucketing) • Typically done by converting data based on numeric range into differently named bins or buckets • Normalization: convert actual range into normalized range (e.g., 0 to 1) • Standardization: rescale feature values to a mean of 0 and a standard deviation of 1 (z-scores)
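
The transformations above can be sketched in a few lines. This is an illustrative sketch only: the function names, the 0-to-1 target range for normalization, and the bin edges are assumptions, not part of the card.

```python
import statistics

def one_hot(values):
    """One binary switch per category, in sorted category order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def min_max_normalize(values):
    """Rescale the actual range onto 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def discretize(values, edges, labels):
    """Bin numeric values; labels[i] covers values below edges[i], the last label covers the rest."""
    binned = []
    for v in values:
        for edge, label in zip(edges, labels):
            if v < edge:
                binned.append(label)
                break
        else:
            binned.append(labels[-1])
    return binned

encoded = one_hot(["a", "b", "a", "c"])
scaled = min_max_normalize([10, 20, 30])
ages = discretize([5, 17, 40], edges=[13, 20], labels=["child", "teen", "adult"])
```

Note that one-hot encoding turns a single column into as many columns as there are categories, which is the added dimensionality mentioned above.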

What is back-propagation in ANN?

Process where ANN learns whether it made a mistake or not based on output • Adjusts internal parameters of transfer functions (nodes) using loss functions and stochastic gradient descent functions in waves propagating backwards from the output nodes to the input nodes • Helps speed up processing

How does K-Means Clustering work?

Randomly assign instances to clusters Compute distances of all instances in the cluster from the centroid (center of the cluster) using a defined distance metric Move each instance to the K-cluster whose centroid is closest Recalculate the centroid position REPEAT from above
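
The cycle can be sketched as below, assuming Euclidean distance and a simple re-seeding rule for clusters that end up empty (the re-seeding rule is an assumption added to keep the sketch runnable, not something the card specifies). Names are illustrative.

```python
import math
import random

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: randomly assign every instance to one of the k clusters.
    labels = [rng.randrange(k) for _ in points]
    for _ in range(max_iter):
        # Step 2: compute each centroid as the mean of its members.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:  # assumption: re-seed an empty cluster from a random instance
                members = [rng.choice(points)]
            centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        # Step 3: move each instance to the cluster with the closest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Step 4: stop when no instance changes cluster (centroids stop moving).
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(pts, k=2)
```

K is supplied by the user, which is why it counts as a hyperparameter rather than a learned parameter.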

How is a bootstrapped population created?

Randomly drawing instances from the original population Start with data set of n instances (original population) and create a new population of n instances (bootstrapped population) by: 1. Randomly drawing a single instance out of the original population and place it in the "bag"; chance of selection = 1/n. 2. Randomly draw another instance out of the original population as if the first instance had never been removed (or had been replaced in the original population); chance of selection still = 1/n per instance. Selected instance may be the same or different as one previously selected. 3. Repeat process until n draws have been completed.
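
The three steps above amount to n draws with replacement. A minimal sketch, with a hypothetical function name, that also collects the out-of-bag instances:

```python
import random

def bootstrap_sample(population, seed=None):
    """Return (bag, out_of_bag) for one bootstrapped population."""
    rng = random.Random(seed)
    n = len(population)
    # n draws with replacement: each draw picks any index with chance 1/n.
    drawn = [rng.randrange(n) for _ in range(n)]
    bag = [population[i] for i in drawn]
    # Instances never drawn form the out-of-bag (OOB) sample.
    drawn_set = set(drawn)
    oob = [population[i] for i in range(n) if i not in drawn_set]
    return bag, oob

bag, oob = bootstrap_sample(list(range(100)), seed=1)
```

For large n, roughly a third of the original instances (about 1/e, or 37%) are expected to land in the OOB sample, since the bag contains repeats.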

In ML how is the training data partitioned in k-fold cross-validation?

Randomly into equal # (k) subsamples (folds). -k is typically set between 5 and 10 -If k = 10, then hold out 1-fold as a validation data set and use the remaining 9 folds collectively as the training data set -Train the model on the collective 9 folds together then validate and calculate performance metrics on the remaining validation fold -Process is repeated until all combinations of training data and validation data are evaluated -Quality metrics performed on each validation fold are averaged across each k-run
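
A sketch of the partitioning step only (training and scoring are left to the caller). The generator name and the use of index lists are illustrative choices.

```python
import random

def k_fold_splits(n_instances, k, seed=0):
    """Yield (training_indices, validation_indices) for each of the k runs."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (near-)equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]  # the one held-out fold
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

splits = list(k_fold_splits(20, k=5))
```

Each instance appears in exactly one validation fold, so averaging the per-fold metrics uses every instance for validation once.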

What is a foundation model in generative AI?

Refers to AI model that can be used as a foundation for a range of other tools e.g., ChatGPT is a tool that uses the GPT-4 LLM as a foundation model

What is a feed-forward network?

Refers to any ANN (or portion of ANN) which is unidirectional from input to output. Oldest type of ANN

How do you correct a Model that has too many features (>1 per 10 instances)?

Remove irrelevant input features (avoid underfitting) Dimensionality reduction Increase total number of instances

ML Model Development Step 12: Deploy model

Remove model from training environment • Put model into the device or software where it will be used Types of deployed models • Static model: • Develop model via training --> stop training --> use model in static manner • Most common model used in medicine • Incremental or continuous model: • Incrementally or continuously retrained after deployment • Cannot be used in certain medical environments under federal law (e.g., laboratories) • Requires special monitoring and controlling to ensure that model is still performing accurately Verification • Performed by the local site using the final model • Check to ensure performance after transit and deployment in a new setting • Similar to verification of FDA-approved systems after they are received

What is a ML algorithm?

Repeatable process used to train a model from a given set of training data

What are the cons of k-fold cross-validation?

Requires training and validating the model k times. (i.e., if k=5, total # training runs = 5, total # validations = 5)

What is bootstrapping?

Resampling cross-validation method

How do Decision Trees work at a high level?

Root node -> internal node -> leaf node Each node is a feature that is examined • If value of feature is below a specified threshold, left branch is followed; otherwise, right branch • Threshold splits population by purest possible subsets of classes • Each internal node = 1 feature (independent variable) • Each leaf node = outcome class (dependent variable) Repeats recursively until purity cannot be improved "Growing the tree" = training
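
How a node might pick its threshold can be sketched as below. The card only says the split should give the purest possible subsets; Gini impurity is assumed here as the purity measure (it is the one CART uses), and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0.0 means the subset contains a single class."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) of the purest split on one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v < threshold]    # left branch
        right = [lab for v, lab in pairs if v >= threshold]  # right branch
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

threshold, impurity = best_threshold([1, 2, 3, 10, 11, 12],
                                     ["neg", "neg", "neg", "pos", "pos", "pos"])
```

Growing the tree repeats this search recursively on each branch until impurity can no longer be reduced.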

What are association rules in ML Unsupervised Algorithms?

Rule-based method to find relationships between independent variables. Generates associations, not causation Rated in terms of support and confidence -Support % total transactions from a transaction database that the rule satisfies -Confidence Degree of certainty of an association Data must be converted to categorical data to use this algorithm type Can be done through discretization = converting continuous data into bins
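
A small sketch of the two ratings, using the standard definitions: support is the fraction of transactions containing all items in the rule, and confidence of "A then B" is support(A and B) / support(A). The function names and the shopping-basket data are illustrative.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """How often the consequent appears among transactions containing the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

A rule can have high confidence but low support if it is reliable yet rarely applicable, so both ratings are usually reported together.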

What are expert systems?

Rules, relationships, ontologies explicitly coded or programmed into a knowledgebase. Rules engines of expressly programmed IF-THEN statements (e.g., MYCIN, Internist-I, CADUCEUS) Handles limited amounts of data compared to machine learning

What is Data Science?

Science of organizing and analyzing massive amounts of data.

What is ML transfer learning?

Separate category vs. subtype of supervised learning Data used for training the model are transferred from a different related domain • Data were developed for use in a domain different than the one intended for the model • Example: Using natural images from ImageNet to train models for medical images [Alzubaidi et al 2021] Coarse training done on transferred data Fine tune training with smaller data directly related to domain of use Reasons • Data are expensive • Higher quality and quantity data may be more available, cheaper in another domain

ML Model Development Step 7: Choose quality metrics based on model

Should be appropriate for machine learning method AND purpose of the model • e.g., looking for majority classes vs. minority classes (screening for low incidence conditions)

What is an instance in ML?

Single event in a data set # instances required to train a model depends on the problem and model used

What is the input of Simple Linear Regression?

Single input variable (x; independent variable)

What is the output of Simple Linear Regression?

Single output variable (y; dependent variable) Output is a continuous metric (numeric) variable

What is ML Weakly supervised learning?

Small amount of data have detailed labels; rest have fewer labels

What is ML Semi-supervised learning?

Some data are labeled while other data are not Unlabeled data may be auto-labeled to match patterns on labeled data

What is Deep Learning?

Specific set of ML tools designed to handle big data, such as specific neural networks.

How is RMSE calculated?

Square root of the average of the squared differences between observed and actual measurements.
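
The calculation can be written directly from that description. A minimal sketch with a hypothetical function name and made-up measurements:

```python
import math

def rmse(observed, actual):
    """Root mean squared error between observed and actual measurements."""
    n = len(observed)
    squared_diffs = [(o - a) ** 2 for o, a in zip(observed, actual)]
    return math.sqrt(sum(squared_diffs) / n)

error = rmse([2, 4, 6], [1, 4, 8])  # squared differences are 1, 0 and 4
```

Because the differences are squared before averaging, a few large errors raise RMSE more than many small ones.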

ML Model Development Step 6: Select machine learning model

Supervised (known data output) or Unsupervised (unknown data output)

What are the Limitations/Disadvantage of ML Decision Trees?

Susceptible to overfitting (high variance) because small variations in data can cause branches to be created that are not useful Mitigate by limited number of nodes allowed in a branch OR... By preventing nodes from being added unless they produce a statistically significant increase in purity (i.e., "pruning" the decision tree)

What are connections in an ANN?

Synapses or edges.

Shallow vs Deep Neural Networks

Table showing comparison

What are some common applications of RNN?

Text classification, natural language processing, anomaly detection in quality control data

What is a value in reinforcement learning?

The future reward received by taking an action in a particular state.

What is a possible cause of overfitting related to the data itself?

The training data being more complex than real-world data.

What is signal in machine learning?

The true underlying pattern you are trying to learn from the data. Well designed machine learning separates signal from noise

What is an advantage of CNNs compared to other neural networks?

They are less of a 'black box' and provide some insight into the chosen features.

What is a limitation/disadvantage of CNNs?

They still don't provide an explanation for why certain features were chosen.

Deep Neural Networks

Thousands to millions of interconnections for non-linear computations.

What is the purpose of a convolution (filter) layer in a CNN?

To amplify data (pixels for images) that match the pattern of weights, creating hot spots.
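
A one-dimensional sketch of the "hot spot" idea: a filter of weights slides across the data and the output peaks where the data match the weight pattern. This assumes a plain sliding dot product with no padding, which is how CNN libraries typically implement the operation; the function name and values are illustrative.

```python
def slide_filter(signal, weights):
    """Dot product of the filter with every window of the signal."""
    width = len(weights)
    return [
        sum(s * w for s, w in zip(signal[i:i + width], weights))
        for i in range(len(signal) - width + 1)
    ]

response = slide_filter([0, 0, 1, 2, 1, 0, 0], [1, 2, 1])
hot_spot = response.index(max(response))  # window where the pattern matches best
```

In a trained CNN the weights are learned rather than hand-picked, and the same sliding is done in two dimensions over image pixels.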

What is the goal of reinforcement learning?

To learn a policy (value function) that maximizes long-term reward.

What is the purpose of a pooling layer in a CNN?

To mask data except for the amplified values (hot spots), creating a feature (activation) map.

What is the purpose of a feed-forward network in a CNN?

To provide an overall classification of the data (image) based on the processed features.

What are some causes of overfitting related to the model and its settings?

Too many features selected for # instances in the data set (most common reason for overfitting) • > 1 feature selected per every 10 instances is too many • Excessive detail in training data Model selected is designed for data more complex than the data examined • e.g., using a neural network for linear relationships • too many parameters learned Model trains too long (too many epochs)

What is bagging in Parallel ensembles?

Train each model against random subset of the training data, i.e., bootstrap aggregating Attempts to reduce chance of overfitting complex models Trains large number of relatively unconstrained models (strong learners) in parallel Combines all models together to smooth out their predictions Bootstrap each model in the ensemble of models then base outcomes on -Total # outcomes OR -Aggregate probabilities of each prediction

What is random subspaces in Parallel ensembles?

Train each model with entire training data set but use random subsets of features per model Useful when number of features is large

How can the bootstrapped instances be used?

Train the model on the bagged instances Validate or test the model on the out-of-bag (OOB) instances Repeat the bootstrapping process many times and calculate performance metrics for each collection of out-of-bag (OOB) instances and then average them Advantages • Works well on small data sets; handles outliers well • Makes no assumptions about distribution of data (i.e., no assumptions about data being normally distributed) • Allows for calculation of standard errors and confidence intervals Limitations • Can require long computation time; will have margin of error

What are parallel ensembles in ML?

Training different models in parallel with each other. Methods to create diversity in parallel ensembles (helps decrease overfitting)

What are the inputs for k-Nearest Neighbor?

Training instances (100 to 100,000), distance function, and k (hyperparameters).

What is the Input for ML Decision Trees?

Training instances > 100 up to 1,000,000 ideal

What are the Inputs for parallel ensembles - Random Forest?

Training instances > 100 up to 1,000,000 ideal 3 main hyperparameters need to be set before running the model • Node size, # trees allowed, # features sampled • Trees are kept short (limited # nodes in branches) Data • Each sample population is a randomly bootstrapped sample with 2/3 used for training and 1/3 used for testing (bagging) • Each tree limits splitting strategy to random subset of features (random subspaces method)

What is ML Supervised learning?

Trains on classified and/or labeled data • Goal: train model to generate known answers, patterns or relationships

What are nodes in an ANN?

Transfer functions similar to neurons.

What are clustering methods?

Unsupervised methods to discover patterns in data. Goal: discover patterns or relationships (dimensions) between data instances in a population

What is an Autoencoder in Dimensionality Reduction Methods in ML Unsupervised Algorithms?

Use neural networks to compress data then recreate data as output Part of an artificial neural network that uses unsupervised learning to reduce data dimensions (reduce noise)

What are the common uses of ensemble methods?

Used on output of high variance (imprecision) and low bias (low inaccuracy) Group of simple models may outperform a single complex model Can be trained in parallel or in sequence (series)

What are regression methods used for in ML Supervised Algorithms?

Used when expected output (dependent variable) is continuous (numerical) Requires the least number of training instances Describes, estimates or predicts the linear relationship between >=2 numerical variables e.g., estimating life expectancy, staffing shortages, population growth prediction, pandemic spread

How does logistic regression work?

Uses simple logistic function to fit data to sigmoid output Goal is to maximize the likelihood of the observed data (maximum likelihood estimation), in contrast to linear regression methods, which minimize squared error
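
A sketch of the two pieces involved, assuming a single input feature and binary labels coded 0/1. The function names and the candidate weights are hypothetical; a real fit would search for the weights that maximize the log-likelihood.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(xs, ys, weight, bias):
    """Log of the probability the model assigns to the observed labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(weight * x + bias)  # predicted probability of class 1
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

xs, ys = [-2, -1, 1, 2], [0, 0, 1, 1]
good_fit = log_likelihood(xs, ys, weight=2.0, bias=0.0)
poor_fit = log_likelihood(xs, ys, weight=-2.0, bias=0.0)
```

The weights with the higher log-likelihood explain the labels better, so training moves toward `good_fit` and away from `poor_fit`.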

What is an independent variable in ML?

Value of variable is NOT dependent on any other variable Synonyms in AI/ML: Feature, Vector, Attribute, Predictor, Factor (categorical), Covariate (numerical)

What is a dependent variable in ML?

Value of variable is dependent on the value of a different variable Synonyms: Target, Class, Outcome, Response

What is convolution in the context of CNNs?

Mathematical operation on two functions to produce a 3rd function that describes how shape of one function is modified by the other

What are some special regularization techniques for neural networks?

Weight decay, Dropout, Batch normalization.

Related to Step 5 of ML development: too many features can overfit data

When a model is overfit, consider reducing the number of features Dimensionality reduction methods • Unsupervised machine learning model • Unique because used as an adjunct to another model • Goal: reduce the number of selected features (dimensions) by determining most important ones • See unsupervised ML methods below

When does the cycle in K-Means Clustering stop?

When centroids stop moving location (no more cluster moves).

When is the Monte Carlo method used in reinforcement learning?

When one or more elements are unknown, such as probabilities of an action leading to a state. It runs through the process many times, reaching each state multiple times and averaging the outcomes from all previous experiences in that state. Probabilities of an action leading to a state are then estimated from these runs

What is the accuracy paradox?

When the best classifier for the intended use has a higher error rate than the null error rate.

When does the accuracy paradox occur?

When the condition or outcome is a very low percentage of the overall dataset.
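
A worked example with invented counts: 1,000 instances, of which only 10 are positive. A "null" model that always predicts negative scores higher accuracy than a model that actually finds most of the positives.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical confusion-matrix counts for a 1% incidence condition.
null_model = dict(tp=0, tn=990, fp=0, fn=10)        # always predicts "negative"
screening_model = dict(tp=8, tn=960, fp=30, fn=2)   # finds 8 of the 10 positives

null_accuracy = accuracy(**null_model)               # 0.99
screening_accuracy = accuracy(**screening_model)     # 0.968
null_recall = recall(null_model["tp"], null_model["fn"])
screening_recall = recall(screening_model["tp"], screening_model["fn"])

print(null_accuracy, null_recall)            # high accuracy, finds no positives
print(screening_accuracy, screening_recall)  # lower accuracy, finds most positives
```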

When does the training of GAN stop?

When the discriminator can no longer distinguish between real and fake instances.

What are the Outputs for parallel ensembles - Random Forest?

When used for classification • Output of each model aggregated by "voting" • Each vote weighted equally (i.e., not weighted, unlike boosting) • Hard voting: each model votes for one class and the most frequently selected class wins • Soft voting: each model outputs a probability for each class; the probabilities are averaged per class and the class with the highest average probability is selected
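
The two voting schemes can give different answers. The sketch below uses invented class probabilities from three hypothetical trees: two trees lean weakly toward "no" while one is strongly confident in "yes".

```python
from collections import Counter

def hard_vote(predicted_labels):
    """Each model casts one equal vote; the most frequent class wins."""
    return Counter(predicted_labels).most_common(1)[0][0]

def soft_vote(probabilities):
    """Average each class's probability across models; pick the highest."""
    classes = probabilities[0].keys()
    averages = {
        c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes
    }
    return max(averages, key=averages.get), averages

# Hypothetical per-tree class probabilities.
tree_outputs = [
    {"yes": 0.90, "no": 0.10},
    {"yes": 0.40, "no": 0.60},
    {"yes": 0.45, "no": 0.55},
]
tree_labels = [max(p, key=p.get) for p in tree_outputs]  # ['yes', 'no', 'no']

hard_result = hard_vote(tree_labels)
soft_result, soft_averages = soft_vote(tree_outputs)
print(hard_result)  # no
print(soft_result)  # yes
```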

What are the advantages of logistic regression?

Widely used and easy to understand.

Can Machine Learning handle a large number of input variables?

Yes, Machine Learning can handle a large number of input variables. Traditional statistics usually cannot.

Can Machine Learning use complex multifactorial data?

Yes, Machine Learning can use complex multifactorial data. Traditional statistics usually cannot.

Does Traditional Statistics make assumptions about the characteristics and distribution of the data?

Yes, Traditional Statistics makes assumptions about characteristics and distribution of the data.

statistical association

a relationship or correlation between two or more independent variables that is unlikely to be due to chance

2x2 table can be used to model confusion matrix in ML

A two-way table where the explanatory and response variables each have two categories • Accuracy (Correct Classification Rate) = (TP + TN) / (TP + TN + FP + FN) • Misclassification/error rate (1 - accuracy) = (FP + FN) / (TP + TN + FP + FN) • Sensitivity (Recall, True Positive Rate) = TP / (TP + FN) • Specificity (True Negative Rate) = TN / (TN + FP) • Precision (Positive Predictive Value) = TP / (TP + FP)
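
The formulas above can be computed directly from the four cells of the table. The counts below are invented for illustration, and the function name is an assumption of this sketch.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard metrics from a 2x2 confusion matrix (assumes no zero denominators)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),   # recall, true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }

# Hypothetical counts: 100 instances in total.
metrics = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```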

What is Fβ score?

allows weighting of importance of precision vs. recall • If recall is 2x as important as precision, then β = 2 • Fβ score = F score when β = 1
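
For reference, the standard definition is Fβ = (1 + β²) × precision × recall / (β² × precision + recall). A short sketch with made-up precision and recall values:

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 weights recall more heavily, beta < 1 favors precision."""
    beta_sq = beta ** 2
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator

# Hypothetical model with perfect recall but mediocre precision.
f1 = f_beta(precision=0.5, recall=1.0, beta=1)
f2 = f_beta(precision=0.5, recall=1.0, beta=2)
print(round(f1, 3), round(f2, 3))  # 0.667 0.833
```

Because β = 2 treats recall as more important, the high-recall model scores better under F2 than under F1.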

What is a Support Vector in a Support Vector Machine?

Data point closest to the decision boundary (hardest to classify) • Has a direct bearing on the optimum location of the decision plane

