Machine Learning Engineer Terms
Multinomial NB
A Naive Bayes variant that models features with a multinomial (discrete) distribution, used whenever a feature must be represented by a whole number (for example, in natural language processing, it can be the frequency of a term).
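A minimal sketch of how term counts feed a multinomial Naive Bayes score, in plain Python. The toy documents, the labels, and the function names `train_multinomial_nb` and `predict` are illustrative assumptions, and Laplace (add-one) smoothing is assumed so unseen terms do not zero out a class.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Collect document counts per class and term counts per class."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab, alpha

def predict(model, tokens):
    """Return the class with the highest log posterior for the tokens."""
    class_counts, term_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_terms = sum(term_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore terms never seen in training
            count = term_counts[label][tok]
            # Smoothed estimate of P(term | class); repeated terms add up.
            score += math.log((count + alpha) / (total_terms + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: tokenized documents and their classes.
docs = [["free", "prize", "free"], ["meeting", "agenda"],
        ["free", "offer"], ["project", "meeting"]]
labels = ["spam", "ham", "spam", "ham"]
model = train_multinomial_nb(docs, labels)
```

Calling `predict(model, ["free", "free", "offer"])` scores each class by summing one log-probability per term occurrence, which is why term frequency matters in this model.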
semi-supervised learning
A model that is trained on data in which some examples are labeled and some are not labeled.
Stemming
A process of reducing words to their respective root forms in order to better represent them in a text mining project.
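A toy sketch of the idea, assuming a small hand-picked suffix list. This is not the Porter algorithm or any library's stemmer; `SUFFIXES` and `naive_stem` are hypothetical names used only for illustration.

```python
# Hypothetical suffix list, checked in order (longer suffixes first).
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stems = [naive_stem(w) for w in ["jumping", "jumped", "jumps"]]
```

All three inputs collapse to the same stem, which is the point of stemming. A rule this crude also produces non-words (for example, it turns "running" into "runn"), which is typical of rule-based stemmers.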
Self-Training
A way to tackle semi-supervised learning: take any supervised method for classification or regression and modify it to work in a semi-supervised manner, taking advantage of both labeled and unlabeled data.
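One common form of this procedure can be sketched as a loop: fit on the labeled data, pseudo-label the unlabeled points the model is confident about, add them to the labeled set, and repeat. The sketch below assumes a 1-D nearest-centroid classifier as the supervised base method and a margin threshold as the confidence rule. Both are illustrative choices, as are all function names, and the code needs at least two classes.

```python
def fit_centroids(points, labels):
    """Supervised base learner: the mean of each class (1-D nearest centroid)."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_with_margin(centroids, x):
    """Return (label, margin): margin is how much closer the best centroid is."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin

def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label confident points and refit on the larger set."""
    labeled_x, labeled_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = fit_centroids(labeled_x, labeled_y)
        confident, remaining = [], []
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            (confident if margin >= min_margin else remaining).append((x, label))
        if not confident:
            break  # nothing new to learn from
        for x, label in confident:
            labeled_x.append(x)
            labeled_y.append(label)
        pool = [x for x, _ in remaining]
    return fit_centroids(labeled_x, labeled_y), pool

centroids, still_unlabeled = self_train(
    [0.0, 10.0], ["a", "b"], [1.0, 2.0, 9.0, 8.5, 5.1]
)
```

The point near the class boundary (5.1) never clears the margin threshold, so it stays unlabeled instead of being given a possibly wrong pseudo-label.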
Bernoulli NB
A Naive Bayes variant that models features with a binary (Bernoulli) distribution, useful when a feature can be present or absent.
How to Ensure you're not overfitting
Collect more data; reduce the number of features; use an ensemble method; use early stopping; use cross-validation.
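Of the techniques listed, cross-validation is the easiest to sketch without a library. The helper below, a hypothetical `k_fold_indices`, splits sample indices into k folds so that every sample is used for validation exactly once. Shuffling is omitted for brevity and would normally come first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs covering every sample once as validation."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 3))
```

A model would be trained on each `train_idx` and scored on the matching `val_idx`. A large gap between training and validation scores is a sign of overfitting.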
CNN (Convolutional Neural Network)
A convolutional neural network is a specialized type of artificial neural network that uses a mathematical operation called convolution in place of general matrix multiplication in at least one of its layers. Mainly used to analyze visual imagery.
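A minimal sketch of the sliding-window operation a convolutional layer applies, written with plain lists. It computes the "valid" output without flipping the kernel (strictly a cross-correlation, which is what deep learning libraries usually call convolution). The image and kernel values are made up for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Toy image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]  # responds where brightness increases left to right
feature_map = conv2d_valid(image, edge_kernel)
```

The feature map is nonzero only in the column where the brightness changes, which is how a learned kernel can act as an edge detector.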
Recall
True Positive Rate (also called sensitivity). TP / (TP + FN). Ratio of true positives to all actual positives (true positives plus false negatives).
Precision
Positive Predictive Value. TP / (TP + FP). Ratio of true positives to all predicted positives (true positives plus false positives).
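A short sketch computing precision, TP / (TP + FP), alongside recall, TP / (TP + FN), from paired true and predicted labels. The label lists are made-up examples, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Count TP, FP, and FN for the positive class and return both ratios."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
```

Here TP = 2, FP = 1, and FN = 2, so precision is 2/3 while recall is 1/2: the two metrics penalize different kinds of mistakes.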
Type 1 Error
False Positive. You predicted presence when it wasn't there.
Reinforcement Learning
Favorable or unfavorable outcomes (rewards or penalties) are given as a result of what the computer does. The aim is to maximize the total reward.
Supervised Learning
Machine learning model that is trained on labeled data.
Gini Impurity
Quantifies the amount of uncertainty at a single node: the probability that a random sample is classified incorrectly if you randomly pick a label according to the class distribution of the branch. The higher, the worse.
Information Gain
Quantifies how much a split at the node reduces the amount of uncertainty. The higher, the better.
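A small sketch of both quantities, assuming Gini impurity as the uncertainty measure (entropy is the other common choice) and a binary split. The gain is the parent's impurity minus the size-weighted impurity of the children. Function names and labels are illustrative.

```python
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0.0 means a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def information_gain(parent, left, right):
    """Impurity of the parent minus the weighted impurity of its two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted

parent = ["a", "a", "b", "b"]
perfect_split = information_gain(parent, ["a", "a"], ["b", "b"])
useless_split = information_gain(parent, ["a", "b"], ["a", "b"])
```

The parent has impurity 0.5. The perfect split removes all of it (gain 0.5), while the useless split leaves both children as mixed as the parent (gain 0.0).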
ROC curve
Receiver Operating Characteristic. It shows the model's tradeoff between true positive rate and false positive rate as the classification threshold varies.
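A sketch of how the points on an ROC curve can be computed: sweep a decision threshold over the model's scores and record the false positive rate and true positive rate at each step. The labels and scores are made up, and the code assumes both classes are present.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, from the strictest threshold to the loosest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
```

Lowering the threshold moves along the curve from (0, 0) toward (1, 1): each step catches more positives at the cost of more false positives.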
Bootstrap sample
Sample with replacement from the original sample, using the same sample size. Used to estimate population statistics from a small data sample.
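A minimal sketch using only the standard library: draw many resamples of the same size with replacement, compute the mean of each, and use the spread of those means as an estimate of the standard error. The data values, the resample count, and the seed are arbitrary assumptions.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Mean of each bootstrap resample (same size, drawn with replacement)."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

sample = [2.1, 2.5, 3.0, 3.2, 3.8, 4.1, 4.4, 5.0]
means = bootstrap_means(sample)
standard_error = statistics.stdev(means)  # spread of the resampled means
```

Because each resample is drawn with replacement, some original values appear more than once and others not at all, which is what makes the resampled means vary.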
RNN (Recurrent Neural Networks)
A type of neural network designed for sequential or time series data. RNNs are distinguished by their "memory": they use information from prior inputs to influence how the current input is processed and what output is produced.
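The "memory" can be sketched with a single scalar hidden state that is fed back in at every step. The weights `w_x` and `w_h` and the `tanh` activation are illustrative assumptions with fixed, untrained values; a real RNN uses weight matrices learned from data.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, bias=0.0):
    """Run a one-unit recurrent cell over a sequence and return every hidden state."""
    hidden = 0.0
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        hidden = math.tanh(w_x * x + w_h * hidden + bias)
        states.append(hidden)
    return states

states = rnn_forward([1.0, 0.0, 0.0])
```

Only the first input is nonzero, yet the later hidden states are still positive: the earlier input keeps influencing the output through the carried-over state.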
Unsupervised Learning
Machine learning model that is trained on unlabeled data.
Lemmatization
Grouping the inflected forms of a word together under their base dictionary form (the lemma).
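A toy sketch of the lookup idea, assuming a tiny hand-written table. Real lemmatizers rely on a full dictionary and usually on part-of-speech information; `LEMMAS` and `lemmatize` are hypothetical names for illustration only.

```python
# Hypothetical lemma table mapping inflected forms to dictionary forms.
LEMMAS = {
    "was": "be", "were": "be", "is": "be",
    "mice": "mouse", "ran": "run", "running": "run",
}

def lemmatize(word):
    """Return the dictionary form if known, otherwise the lowercased word."""
    word = word.lower()
    return LEMMAS.get(word, word)

lemmas = [lemmatize(w) for w in ["Mice", "were", "running"]]
```

Unlike suffix stripping, a dictionary lookup can map irregular forms such as "mice" or "were" to their lemmas, and the result is always a real word.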