ai hw7
supervised learning
algorithms that experience a dataset containing features, where each example is also associated with a label or target
classification
type of machine learning task where the computer program is asked to specify which of k categories some input belongs to
capacity
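a model's ability to fit a wide variety of functions. models with low capacity may struggle to fit the training set (underfitting); models with high capacity can overfit by memorizing properties of the training set that do not carry over to the test set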
unsupervised learning
"Any learning technique that has as its purpose to group or cluster items, objects, or individuals"; more generally, algorithms that experience a dataset containing many features, then learn useful properties of the structure of this dataset
data points
another name for an example
clustering
one of the tasks performed by unsupervised learning algorithms, which consists of dividing the dataset into clusters of similar examples
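ex. a minimal sketch of clustering, using k-means on 1-D data. k-means is only one possible clustering algorithm and is not named in these notes; the function name `kmeans_1d`, the toy points, and the starting centers are all assumptions made for illustration.

```python
def kmeans_1d(points, centers, steps=10):
    """Tiny k-means sketch for 1-D data; `centers` are the starting guesses."""
    centers = list(centers)
    clusters = []
    for _ in range(steps):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical toy data: two visibly separate groups, no labels given.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(clusters)  # the small values end up in one cluster, the large ones in the other
```

no labels are used anywhere: the groups come only from how close the examples are to each other.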
feature
a single measurable property of an example; a collection of features makes up an example (ex. the features of an image are usually the values of the pixels in that image)
example
a collection of features that have been quantitatively measured from some object or event that we want the machine learning system to process
dataset
a collection of many examples
machine learning
a form of applied statistics with increased emphasis on the use of computers to statistically estimate complicated functions (a machine learning algorithm is an algorithm that is able to learn from data)
reinforcement learning
algorithms that interact with an environment, so there is a feedback loop between the learning system and its experiences
the experience, E
machine learning algorithms can be broadly categorized as unsupervised or supervised by what kind of experience they are allowed to have during the learning process (most can be understood as being allowed to experience an entire dataset)
the task, T
machine learning allows us to tackle tasks that are too difficult to solve with fixed programs written and designed by humans. learning itself is not the task; learning is the means of gaining the ability to perform the task (ex. when teaching a robot to walk, walking is the task)
overfitting
occurs when the gap between the training error and test error is too large
underfitting
occurs when the machine learning model is not able to obtain a sufficiently low error value on the training set
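ex. a rough sketch contrasting underfitting and overfitting by comparing training error and test error. the two toy models (a constant predictor and a lookup-table memorizer), the data, and the helper name `mse` are assumptions chosen for illustration, not models from these notes.

```python
def mse(model, xs, ys):
    """Mean squared error of `model` on the given inputs and targets."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data that roughly follows y = 2x.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [0.1, 1.9, 4.2, 5.8]
test_x, test_y = [0.5, 1.5, 2.5], [1.0, 3.0, 5.0]

# Low capacity: always predict the mean training label, whatever the input.
mean_y = sum(train_y) / len(train_y)
constant = lambda x: mean_y

# Very high capacity: memorize every training pair, answer 0.0 for unseen inputs.
table = dict(zip(train_x, train_y))
memorizer = lambda x: table.get(x, 0.0)

const_train = mse(constant, train_x, train_y)  # stays high -> underfitting
memo_train = mse(memorizer, train_x, train_y)  # zero: training set is memorized
memo_test = mse(memorizer, test_x, test_y)     # large gap to memo_train -> overfitting

print(const_train, memo_train, memo_test)
```

the constant model cannot get a low error even on the training set, while the memorizer gets zero training error but a much larger test error.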
the performance measure, P
quantitative measure of a machine learning algorithm's performance (usually specific to the task)
label
the output value associated with an example in supervised learning; labelled data are groups of samples that have been tagged with one or more labels (y_i provides the label for example i)
accuracy
the proportion of examples for which the model produces the correct output
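ex. a small sketch of computing accuracy as (number of correct outputs) / (number of examples). the function name `accuracy` and the toy labels are assumptions for illustration.

```python
def accuracy(predictions, labels):
    """Proportion of examples for which the prediction equals the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical model outputs vs. true labels for four examples.
y_true = ["cat", "dog", "dog", "cat"]
y_pred = ["cat", "dog", "cat", "cat"]
acc = accuracy(y_pred, y_true)
print(acc)  # 3 of 4 predictions match, so this is 0.75
```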
hypothesis space
the set of functions that the learning algorithm is allowed to select as being the solution; choosing the hypothesis space is one way to control the capacity of a learning algorithm
target
the target variable is the variable of a dataset that you want to predict or gain a deeper understanding of; in supervised learning it is the value the model is trained to output
regression
type of machine learning task where the computer program is asked to predict a numerical value given some input
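ex. a minimal sketch of a regression task, assuming the simplest case of fitting a straight line by ordinary least squares. the function name `fit_line` and the toy data are assumptions; the notes do not name a specific regression method.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for 1-D inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)

# Predict a numerical value for a new input.
prediction = slope * 4.0 + intercept
print(slope, intercept, prediction)
```

the output is a number rather than one of k categories, which is what separates regression from classification.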
design matrix
a common way of describing a dataset: a matrix containing a different example in each row, where each column corresponds to a different feature
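ex. a small sketch of a design matrix stored as a list of rows. the feature names and values are made up for illustration.

```python
# Hypothetical toy dataset: 3 examples (rows), 2 features (columns).
feature_names = ["height_cm", "weight_kg"]
X = [
    [170.0, 65.0],  # example 0
    [182.0, 80.0],  # example 1
    [158.0, 52.0],  # example 2
]

num_examples = len(X)     # number of rows
num_features = len(X[0])  # number of columns

example_1 = X[1]                    # one row = one example
feature_0 = [row[0] for row in X]   # one column = one feature across all examples

print(num_examples, num_features)
print(example_1, feature_0)
```

every row must have the same length for the dataset to fit in a matrix, so each example has to be describable by the same set of features.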