BCOR 2205 Final Exam Information and Analytics Management
Round 1 %
32%
Round 2 %
64%
LogLoss
A measure of accuracy; Rather than evaluating the model directly on whether it assigns cases (rows) to the correct label, the model is evaluated based on probabilities generated by the model and their distance from the correct answer; Lower scores are BETTER
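A minimal sketch of how LogLoss could be computed for a binary target, assuming the usual mean negative log-likelihood formula. The function name `log_loss`, the clipping constant `eps`, and the sample numbers are illustrative assumptions, not taken from the course material.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss; y_true holds 0/1 labels, y_prob the predicted P(label = 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels and two sets of predicted probabilities
labels = [1, 0, 1, 0]
confident_and_right = [0.9, 0.1, 0.8, 0.2]  # probabilities close to the true labels
hedging = [0.5, 0.5, 0.5, 0.5]              # every case scored as a coin flip

print(log_loss(labels, confident_and_right))
print(log_loss(labels, hedging))
```

The first score comes out lower than the second, because its probabilities sit closer to the correct answers.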
Algorithm
A step-by-step procedure for solving a problem or achieving a specific result. Used in computer programming, mathematics, engineering, etc.
Holdout Set
A subsection of a dataset used to provide a final estimate of the MLS model's performance after it has been trained and validated. Should never be used to make decisions about which algorithms to use or for improving or tuning algorithms
Machine Learning
A subset of AI; the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world
8 Criteria of Auto ML Excellence
Accuracy, Productivity, Ease of use, Understanding and learning, Resource availability, Process transparency (affects understanding and learning), Generalizability across contexts, Recommended action
Blending
After cross validation has run, models are internally sorted by cross validation score and the best models are then blended
Features
Can be thought of as the independent variables we will use to predict
Example of Discrete Data
Course Letter Grade
Supervised ML
Data scientist tells the machine what it wants it to learn (identifies target)
ML Life Cycle
Define Project and Objectives, Acquire and Explore Data, Model Data, Interpret and Communicate, Implement, Document and Maintain
False Negative Rate
FN/(FN+TP)
False Positive Rate
FP/(FP+TN)
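The two error-rate formulas can be checked with a short sketch. The function names and the confusion-matrix counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def false_negative_rate(fn, tp):
    """Share of actual positives that the model missed: FN/(FN+TP)."""
    return fn / (fn + tp)

def false_positive_rate(fp, tn):
    """Share of actual negatives that the model flagged as positive: FP/(FP+TN)."""
    return fp / (fp + tn)

# Hypothetical counts: 50 actual positives and 50 actual negatives
tp, fn, fp, tn = 40, 10, 5, 45

print(false_negative_rate(fn, tp))  # 10 of 50 positives missed
print(false_positive_rate(fp, tn))  # 5 of 50 negatives wrongly flagged
```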
Speed (DataRobot)
Fastest on left, slowest on right
Feature Effects
Feature Impact for specific feature values
ChatGPT
Large Language Model created by OpenAI (Generative Pretrained Transformer)
Accuracy (DataRobot)
Least accurate at top, most accurate at bottom
Artificial Intelligence
Machines that can perform tasks that are characteristic of human intelligence
Binary Data
Nominal attribute with only two categories/states
Example of Nominal Data
Occupation
Categorical Data
Qualitative; Described by words rather than numbers
Numerical Data
Quantitative: Arise from counting, measuring, or some kind of mathematical operation
Machine Learning Pipeline
Raw data to features to models to deploy in production to predictions
Example of Algorithm
Recipe for baking a cake, the method we use to solve long division problems, process of doing laundry, etc.
Business Problem Requirements
State the problem in language of business, specify action, include specifics, explain bottom line impact
Training Set
Subsection of a dataset from which the MLS uncovers or learns relationships between the features and the target variable
Validation Set
Subsection of a dataset to which we apply the MLS to see how accurately it identifies relationships between the known outcomes for the target variable and the dataset's other features
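A rough sketch of how a dataset might be partitioned into training, validation, and holdout sets. The 64/16/20 proportions, the `split_rows` helper, and the fixed seed are assumptions for illustration; the tool used in the course may partition differently.

```python
import random

def split_rows(rows, train_frac=0.64, valid_frac=0.16, seed=0):
    """Shuffle the rows once, then cut them into train / validation / holdout."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    holdout = shuffled[n_train + n_valid:]  # set aside until the very end
    return train, valid, holdout

rows = list(range(100))  # stand-in for 100 cases (rows)
train, valid, holdout = split_rows(rows)
print(len(train), len(valid), len(holdout))
```

Each row lands in exactly one subset, so the holdout rows never influence training or model selection.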
Specificity
TN/(TN+FP)
Accuracy
(TP+TN)/All Cases
Sensitivity
TP/(TP+FN)
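Specificity, accuracy, and sensitivity all come from the same four confusion-matrix counts, which the sketch below makes explicit. The helper name `summary_metrics` and the counts are hypothetical. Note that accuracy divides the sum TP + TN by all cases.

```python
def summary_metrics(tp, tn, fp, fn):
    """Return specificity, accuracy, and sensitivity from confusion-matrix counts."""
    all_cases = tp + tn + fp + fn
    return {
        "specificity": tn / (tn + fp),        # true negatives among actual negatives
        "accuracy": (tp + tn) / all_cases,    # correct predictions among all cases
        "sensitivity": tp / (tp + fn),        # true positives among actual positives
    }

# Hypothetical counts: 40 actual positives, 60 actual negatives
metrics = summary_metrics(tp=30, tn=50, fp=10, fn=10)
for name, value in metrics.items():
    print(name, round(value, 3))
```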
Learning Curves
Tells us whether additional cases will help; Shows how predictive ability changes with sample size
Over Training/Over Fitting
The model simply memorizes the training examples and cannot give correct outputs for patterns that were not in the training dataset
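An extreme toy illustration of memorization, assuming a made-up "model" that stores training rows in a lookup table. `MemorizingModel`, the data, and the labeling rule are all hypothetical. Real overfit models are subtler, but the symptom is the same: a high training score and a poor score on new cases.

```python
class MemorizingModel:
    """Hypothetical 'model' that stores training rows verbatim instead of learning a rule."""

    def fit(self, X, y):
        self.table = {tuple(x): label for x, label in zip(X, y)}
        return self

    def predict(self, X, default=0):
        # Rows never seen in training fall back to a fixed default guess
        return [self.table.get(tuple(x), default) for x in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy rule behind the data: the label is 1 when the single feature is greater than 5
X_train, y_train = [[1], [3], [6], [8]], [0, 0, 1, 1]
X_new, y_new = [[2], [7], [9], [10]], [0, 1, 1, 1]

model = MemorizingModel().fit(X_train, y_train)
print(accuracy(y_train, model.predict(X_train)))  # perfect on memorized rows
print(accuracy(y_new, model.predict(X_new)))      # much worse on unseen rows
```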
Feature Impact
The overall impact of a feature adjusted for the impact of the other features
Importance
The overall impact of a feature without consideration of the impact of other features
Target
The variable we are trying to predict and gain insights about
Cross Validation
Top four models; Only if validation set is <=10,000 rows
Large Language Model
Type of computer program that has been trained on a lot of text to understand and generate human-like text
Discrete Data
Under Categorical; Finite number of options
Continuous Data
Under Numerical; Infinite number of possible responses, like any point on a number line
Unsupervised ML
Up to the machine to decide what it wants to learn
Nominal Data
You can identify that groups are different, but there is no meaningful ranking
Text (Strings)
You specify a number of characters