IS - ML
accuracy is the
proportion of correct classifications
error rate is the
proportion of misclassifications
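A quick sketch of both measures, assuming the true labels and the predictions are given as two equally long lists (the function names and the toy labels are made up for illustration):

```python
def accuracy(y_true, y_pred):
    # proportion of predictions that match the true label
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def error_rate(y_true, y_pred):
    # proportion of misclassifications = 1 - accuracy
    return 1.0 - accuracy(y_true, y_pred)

y_true = [1, 0, 1, 1]  # hypothetical true labels
y_pred = [1, 0, 0, 1]  # hypothetical predictions
print(accuracy(y_true, y_pred), error_rate(y_true, y_pred))
```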
bias node value is always
1
output in naive bayes
assign a class with maximal probability for the new dataset
bayesian learning (Naive bayes classifier) steps
- Decide on features - do pre-processing: lowercase words, remove punctuation, stem, remove stop words... - Count occurrences (can use tf-idf) - compute probabilities (apply smoothing if needed) - apply naive Bayes to find the class with maximal probability
Preprocessing consists of
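The steps above can be sketched roughly like this. It is only a minimal illustration, assuming the words are already pre-processed, plain word counts are used (no tf-idf), and add-one (Laplace) smoothing is applied; `train`, `predict` and the toy data are hypothetical names, not from the course.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (words, label) pairs, already pre-processed."""
    class_counts = Counter()            # number of documents per class
    word_counts = defaultdict(Counter)  # word occurrences per class
    vocab = set()
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(words, model):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        # log P(class) + sum of log P(word | class), with add-one smoothing
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)  # class with maximal probability

# Hypothetical toy training data
docs = [
    ("win money now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting at noon".split(), "ham"),
    ("lunch meeting today".split(), "ham"),
]
model = train(docs)
print(predict("free money".split(), model))
```

Logarithms are used so that multiplying many small probabilities does not underflow; the class with the highest log score is the same class that has the highest probability.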
- Deciding on features - splitting data in training and testing parts
TN
I predicted negative and it was actually negative
classifier learns relations between
labels and features
how to evaluate ranking algorithm
precision @ k
Limitations of AI
- Limited ability to learn and adapt to new situations without instructions. - Difficulty in analyzing and interpreting unstructured data, such as text or images. - If the data used to train an AI system is biased, the system will also be biased. - Limited creativity. - Lack of common sense and lack of grounding (ability to connect concepts (red) to objects (rose) without instructions)
difference: validation set and test set
- Validation sets are used DURING TRAINING to adjust the model's parameters and see how the training process is going so far; - test sets are used AFTER training to evaluate the FINAL performance of the model on UNSEEN data.
unsupervised learning meaning
- When an AI system can look at data on its own and build rules; - NO CORRECT ANSWERS (no previous info) GIVEN!
autonomous weapons - problems
- can be used in an unethical way or against the law - thus, we need laws in order to develop and use them - If an autonomous weapon causes harm or damage, it is not clear who would be responsible for the consequences
why privacy is a huge concern in AI?
- collects personal data - might be sharing it with third parties (for ex companies) - may keep data forever - AI systems may be vulnerable to cyber attacks and information leaking
how do you convert text into text feature vector?
- do preprocessing (lowercase the words, remove punctuation, stem, remove stop words) - COUNT OCCURRENCES (TF-IDF)
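A small sketch of turning a sentence into a count vector. Assumptions: the stop-word list and the vocabulary are tiny made-up examples, stemming is left out to keep it short, and plain counts are used instead of tf-idf.

```python
import string
from collections import Counter

STOP_WORDS = {"the", "a", "is", "and", "of"}  # hypothetical tiny stop-word list

def preprocess(text):
    # lowercase, strip punctuation, split into words, drop stop words
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in STOP_WORDS]

def to_count_vector(text, vocabulary):
    # one dimension per vocabulary word, value = number of occurrences
    counts = Counter(preprocess(text))
    return [counts[w] for w in vocabulary]

vocabulary = ["cat", "dog", "sat"]  # hypothetical feature set
vector = to_count_vector("The cat sat, and the CAT purred.", vocabulary)
print(vector)  # [2, 0, 1]
```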
K-means clustering algorithm
- it is an unsupervised learning method, so it does not require labeled data. It identifies patterns and groups similar data points into k groups BASED ON THE DISTANCE BETWEEN EACH DATA POINT AND THE CENTROID (MEAN) OF ITS GROUP.
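A bare-bones sketch of the k-means loop, assuming the starting centroids are passed in by hand (real implementations usually pick them randomly) and squared Euclidean distance is used. The function name and the toy points are made up.

```python
def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # assignment step: put each point into the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # update step: move each centroid to the mean of its cluster
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # [(1.0, 1.5), (8.5, 8.0)]
```

Here k = 2 because two starting centroids are given; a different start can end in a different (worse) result, which is the local-minimum problem.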
Libet experiment - 2 conclusions
- our conscious intention for an action is NOT the cause of that action. - brain activity came first, before the individual WILLED anything to happen - the brain knew when you would click before you consciously did!
algorithms/methods for classification:
1. Artificial neural networks 2. linear classification 3. Bayes classifier 4. KNN 5. decision trees
ML Steps
1. Choose the features 2. Choose the model class 3. Choose a search method
Asimov's Laws
1. NOT LET TO INJURE - A robot may not injure a human being or, through inaction, allow a human being to come to harm. 2. OBEY ORDERS - A robot must obey orders given it by human beings except where such orders would conflict with the First Law. 3. PROTECT ITS OWN - A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
types of supervised learning
1. classification- assigning CLASS to each example 2. regression - assigning NUMBER to each example 3. collaborative filtering - recommendations 4. ranking
silhouette score: what is it, what it measures, what is being used to calculate it
1. for each point x it compares a, the MEAN DISTANCE between x and all other points of the same cluster, with b, the mean distance between x and the points of the nearest other cluster: s = (b - a) / max(a, b). 2. It measures how well-separated clusters are in a clustering algorithm. 3. It is calculated using intra-cluster distance and inter-cluster distance.
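A sketch of the silhouette value for a single point, assuming 1D points and the usual definition s = (b - a) / max(a, b), where a is the mean distance to the other points of the same cluster and b is the mean distance to the points of the nearest other cluster. Function names and numbers are made up.

```python
def mean_distance(x, others):
    return sum(abs(x - o) for o in others) / len(others)

def silhouette(x, same_cluster_others, other_clusters):
    a = mean_distance(x, same_cluster_others)               # intra-cluster distance
    b = min(mean_distance(x, c) for c in other_clusters)    # nearest other cluster
    return (b - a) / max(a, b)

# point 1.0 shares a cluster with 2.0; the other cluster is {8.0, 10.0}
s = silhouette(1.0, [2.0], [[8.0, 10.0]])
print(s)  # 0.875
```

The score of a whole clustering is then the average of this value over all points.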
A good clustering will produce high quality clusters in which:
1. the intra-cluster similarity is high (similarity of the points in one cluster) 2. the inter-cluster similarity is low (similarity of the points between clusters)
orthogonal
90 degree angle
how feature vector could look like?
<1,1,0,1,0>
K-nearest neighbor classifier
A SUPERVISED learning technique that classifies a new observation by finding similarities ("nearness"/ PROXIMITY) between this new observation and the existing data it finds the most similar case and copies its label
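A minimal sketch of k-nearest-neighbour classification, assuming Euclidean distance and a simple majority vote among the k closest labelled examples. `knn_predict` and the toy examples are hypothetical.

```python
import math
from collections import Counter

def knn_predict(query, examples, k=3):
    # examples: list of (feature_vector, label); no learning phase, just lookup
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

examples = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((8, 8), "B"), ((9, 8), "B"),
]
print(knn_predict((2, 2), examples, k=3))
print(knn_predict((8, 9), examples, k=3))
```

With k = 1 this copies the label of the single most similar case; a larger k makes the result less sensitive to noise.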
Silhouette Coefficient
A measure between -1 and 1 used to evaluate clustering algorithms; a higher value indicates that a point is well clustered, and vice versa
Design Matrix
In ML: a matrix that holds the dataset, with one row per example and one column per feature, so the number of columns is the number of dimensions (2D, 3D, 4D, ...) of the feature space we chose to represent the problem
ML is subset of .. and ...
AI and Data science
Feedforward Neural Network
a type of artificial neural network. - no loops - information flows ONLY forward
Bayes' rule vs naive Bayes classifier
Bayes' rule is a general math formula, while naive Bayes is a specific ML classification algorithm that is based on Bayes' rule.
ranking and ex
COMPARING ITEMS and ranking them based on keywords example: web search, image search
Searle's Chinese Room argument
The computer is just executing the program and cannot actually "understand" what it is doing, regardless of how "intelligently" the computer seems to act.
confusion matrix
a way to evaluate classification model. It is a table with false positive, true positive, ... etc
recall minimizes
FN (recall = TP / (TP + FN), so a high recall means few false negatives)
precision minimizes
FP
domain in rdfs is where the line is pointing
FROM
vagueness vs uncertainty: what is the difference
For ex, the concept of "tallness" is vague because it is difficult to say exactly how tall someone or something must be to be considered tall; while uncertainty is a lack of knowledge/confidence about a situation
FN
I predicted negative and it was actually positive
TP
I predicted positive and it was actually positive
FP
I predicted positive, but it was actually negative
stemming
In keyword searching, word endings are automatically removed (lines -> line): you replace a word with its stem
Deep learning is a subset of
ML
bayesian learning aka
Naive bayes classifier
Ethics and law: connections
Not all unethical behaviour is illegal; not all illegal behaviour is unethical; not all legal behaviour is ethical
give me an example of a vector in a feature space:
Number of positive words; length of the sentence; etc..
Binary classification
T/F; positive/negative values
range in rdfs is where the line is pointing
TO
Tf-idf =
Tf * idf
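A sketch of the product, assuming one common variant: tf = occurrences of the term divided by the document length, idf = log(number of documents / number of documents containing the term). Other variants exist; the function name and corpus are made up.

```python
import math

def tf_idf(term, doc, corpus):
    # doc is a list of words, corpus is a list of such documents
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)         # document frequency
    idf = math.log(len(corpus) / df) if df else 0.0  # rare words get a high idf
    return tf * idf

corpus = [["the", "cat"], ["the", "dog"]]
print(tf_idf("the", corpus[0], corpus))  # 0.0, "the" occurs in every document
print(tf_idf("dog", corpus[1], corpus))  # 0.5 * log(2)
```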
Veil of Ignorance
A thought experiment (John Rawls): you choose the rules of a society without knowing which position (wealth, gender, abilities, etc.) you will hold in it, so the rules you choose should be fair to everyone
feature set
The group of features your machine learning model trains on. For example, postal code, property size, and property condition might comprise a feature set for a model that predicts housing prices.
what is Syllabisation in natural language processing?
dividing a word into its individual syllables
Clustering and examples
discovering structure in data and grouping items example: when you go to "photos" and type "sunset" and get all pics of the sunset
Turing Test
a test of a computer's intelligence - if a human can't tell whether they are talking to a machine or another human, the machine is considered intelligent
End-to-end evaluation means that we evaluate the performance of an algorithm on .... , from ... to ...
a complete dataset, from the initial state to the final results
maximum margin hyperplane
a hyperplane that separates two classes of data points in a multi-dimensional space with the maximum possible distance between the hyperplane and the nearest data points of each class.
feature space and example
a space in which each dimension represents a feature that characterizes your data. For example, if your data is about people, your feature space might be Gender, Height, Weight, Age
"label" or "class" in a training set in classification is
an attribute that we want to teach ML to determine
ANN
artificial neural network
Classification and example
assign a CLASS aka LABEL to each example; for ex: 1. program which converts handwriting to digital text 2. face recognition 3. autonomous driving
(linear) regression and example
assign a NUMBER to each example; example: stock market
"features" in a training set in classification are
attributes that ML can learn from
sequential sampling aka
autoregressive sampling
why the Naive Bayes algorithm is called naive?
because it assumes conditional independence of all the features used for classification
hamming distance formula
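number of bits (positions) that differ / number of all bits (this is the normalised form; without the division it is the plain count of differing positions)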
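A short sketch of both forms of the Hamming distance, assuming two equally long binary feature vectors (the function names and vectors are illustrative):

```python
def hamming_distance(a, b):
    # number of positions where the two vectors differ
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

def normalised_hamming(a, b):
    # differing positions divided by all positions
    return hamming_distance(a, b) / len(a)

a = [1, 1, 0, 1, 0]
b = [1, 0, 0, 1, 1]
print(hamming_distance(a, b), normalised_hamming(a, b))  # 2 0.4
```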
Hyperplane
boundary that separates different classes or categories in a multi-dimensional space.
in naive Bayes, the formula we use is
chaining
unsupervised learning type
clustering: grouping items WITHOUT training data
what is Discretisation in natural language processing? + example
converting continuous data into discrete (categorical): for ex, when you have ppl with diff height, you create categories: ppl with height 150-160; then 160-170, etc
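The height example could be coded like this, assuming bins of 10 cm where the lower bound belongs to the bin (so 160 falls into 160-170); `height_bin` is a made-up helper:

```python
def height_bin(height_cm, width=10):
    # map a continuous height to a discrete category label
    lower = (height_cm // width) * width
    return f"{lower}-{lower + width}"

print(height_bin(157))  # 150-160
print(height_bin(160))  # 160-170
```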
K-nearest neighbor classifier is represented in
coordinate plane, also called FEATURE space (because each axis is a FEATURE and each dot is an example)
Supervised Learning
correct answers (previous information) are given for each example!
we don't use smoothing in
cosine similarity, hamming Distance and euclidean distance
we can find KNN with
cosine similarity, hamming Distance or euclidean distance
Euclidean distance formula in KNN - in your own words, what do you need to do?
for binary (0/1) feature vectors: count the number of positions where the 2 sentences differ, and then take the square root of it
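A sketch of the general formula (square root of the sum of squared differences). For 0/1 vectors every differing position contributes exactly 1, which is why counting the differences works there. The function name and vectors are illustrative.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# binary sentence vectors that differ in 2 positions -> sqrt(2)
print(euclidean_distance([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))
# ordinary numeric example
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```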
solution to avoid overfitting
cross validation
preprocessing steps in bayesian learning
data preparation - lowercase words, remove punctuation, do stemming, removing stop words
input in naive bayes
data which belongs to certain classes + one set of data with unknown class
Part of Speech Tagging
defining verbs, nouns, etc. in a sentence
Rule-based reasoning is just a
different way of proving entailment
regression distance (error value) is
distance from the datapoint to the regression line
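A sketch of how these distances are usually added up, assuming the vertical distance between each datapoint and the line y = slope * x + intercept, squared and summed (residual sum of squares). The names and numbers are made up.

```python
def residual_sum_of_squares(xs, ys, slope, intercept):
    # squared vertical distance of every datapoint to the regression line
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3]
ys = [2, 4, 7]
print(residual_sum_of_squares(xs, ys, slope=2, intercept=0))  # 1
```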
the trolley problem is used to
explore ethical dilemmas and decision-making
classifier divides the
feature space
we use Naive Bayes to
find class with maximal probability
Libet experiment findings leads us to argue that
free will is just an illusion, and our decisions are more like a "report" of what is already happening in the brain
When we use Naive Bayes to detect spam, the algorithm discards the order of the words and considers only their
frequency
silhouette score interval + what values here mean?
from -1 to 1. The higher the number, the better clustering
sci-fi ethics
future ethics
the ... the number for precision or recall, the better
higher
Validation/development set aka
hold out set
weird thing how machines fool ppl in turing test
humans associate intelligence during the Turing test with many other things which are not actually signs of intelligence, for example using " ;) ", filler words and many exclamation marks
sentiment analysis
identifying an attitude based on text
Named entity recognition
identifying and labelling named entities (for ex people, places, organisations, dates) in text
Alan Turing theory
if a computer can convince/ fool person that they are interacting with another human being, computer is intelligent and is able to "think"
what is a hyperplane in 2D space and 3D space?
in 2D, it is a 1D line that separates different categories. In 3D, it is a 2D plane
what is neighborhood distance?
in KNN it is a parameter that determines how far from a given point the algorithm should look for other data points aka "neighbors" for prediction
cosine similarity vs Hamming Distance vs euclidean distance: main difference
in cosine similarity, we want to MAXIMIZE the number because the bigger the similarity, the closer the neighbor. For Euclidean and Hamming distance, we want to MINIMIZE the number because it represents a distance: when the distance is smaller, they are more similar
"implication is transitive" means that
in relationship btw 3 statements: if A -> B and B -> C, then A -> C
layers of neural networks
input layer, hidden layer, output layer
Turing test - what is the argument based on which we can argue its quality?
it uses human intelligence as the standard for intelligence; incorrectly assumes that all intelligence will be similar to human intelligence
k meaning in k-NN algorithm
k defines how many neighbors will be checked to determine the classification of a new instance
when we have 0 in naive Bayes classifier table, we use
laplace, simple or naïve smoothing
in K nearest neighbor classifier, there is no
learning phase
hyperparameter in ML is a parameter whose value is used to control the ...
learning process
Lemmatization
like stemming, but more accurate (for ex would convert "better" to "good")
what is Grounding in natural language processing?
linking words to their meaning in the real world; true understanding of abstract concepts requires sensory experience! we have it; remember example with red rose
gradient descent
local search similar to hill-climbing, often pictured on a 3D error surface, used when searching for the point with the lowest error. We move step by step in the downhill direction and check the error after each step
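A tiny sketch of the idea with one parameter, assuming the error function is (w - 3)^2, whose gradient is 2 * (w - 3). The learning rate, step count and names are arbitrary choices for illustration.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    w = start
    for _ in range(steps):
        # take a small step against the gradient, i.e. downhill
        w -= learning_rate * gradient(w)
    return w

# error(w) = (w - 3)^2 has its lowest point at w = 3
w_min = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(w_min)
```

With a more complicated error surface the same loop can end in a local minimum instead of the global one.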
sequential sampling (autoregressive sampling) is
longitudinal - data is collected and analyzed over time, and researchers make decisions about what additional data to collect based on their findings from data they've already collected
error function other name
loss function
when you have a complex classification problem and need to fit it with a linear model,
modify the FEATURES so that you could still draw a linear model
the smaller Euclidean distance, the ...
nearer the neighbour is
collaborative filtering example
netflix, spotify recommendations
"K" in k-means clustering represents
number of clusters
cosine similarity formula in your own words when using it in KNN
(number of positions where both A = 1 and B = 1) / (sqrt(number of positions where A = 1) * sqrt(number of positions where B = 1))
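A sketch of cosine similarity as dot product divided by the product of the vector lengths. For 0/1 vectors the dot product is the number of positions where both are 1, and each length is the square root of the number of 1s. The function name and vectors are illustrative.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

a = [1, 1, 0, 1, 0]
b = [1, 0, 0, 1, 1]
print(cosine_similarity(a, b))  # about 0.667 = 2 / (sqrt(3) * sqrt(3))
```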
reinforcement learning has
occasional rewards
overfitting
occurs when a ML model matches the training data so closely that the model fails to make correct predictions on new data
parsing
grouping words in a sentence into phrases. The way a sentence is parsed determines its meaning.
Precision @ k formula is the same as
our regular precision formula
Classification is about ... and not
predicting the correct CLASS label and not about ACCURATELY estimating probabilities.
Pros and Cons of K-Means
pros: fast, can work with many examples; can work in high-dimensional spaces cons: 1. K is an input parameter - > estimating the right k is difficult 2. it gets stuck in local minima 3. Often no optimal clusters are found 4. Outliers might lead to problems
Stochastic - synonym
random
operators in knowledge graphs
rdf and rdfs
ontology-based languages are
rdfs and rdf
error value in regression aka
residual sum of squares
measure to evaluate clustering algorithm (for instance, k-means clustering) quality
silhouette score (intra- and inter-cluster similarity)
precision
the ratio of the cases that the algorithm predicted "positive" and it was accurate (TP) / all cases that the algorithm predicted "positive" (TP + FP); it shows how trustworthy the positive predictions are
why maximum margin hyperplane is useful?
since distance btw the hyperplane and the nearest data points of each class is maximized, it reduces the likelihood of misclassification.
cross validation
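splitting the training data into k groups; each group in turn serves as the validation set while the model is trained on the remaining groups, so validation is spread over all of the training data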
formula to calculate node's value in feedforward neural network
sum (child node value * weight)
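A sketch of one node's computation, assuming the bias node (value always 1) has its own weight and a simple step function is used as the activation; the weights, names and the choice of activation are made up for illustration.

```python
def node_value(inputs, weights, bias_weight=0.0):
    # weighted sum of the incoming node values; the bias node contributes 1 * bias_weight
    return sum(x * w for x, w in zip(inputs, weights)) + 1 * bias_weight

def step(z):
    # activation: the neuron "fires" (1) or not (0)
    return 1 if z >= 0 else 0

z = node_value([1.0, 0.0, 1.0], [0.5, -0.3, 0.25], bias_weight=0.25)
print(z, step(z))  # 1.0 1
```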
the row of (marginal and not) probability distribution ALWAYS
sums up to 1
ML categories
supervised, unsupervised, semi-supervised, reinforcement learning
when you create a classifier, you test the new instances with
test set
tf and idf meaning
tf - term frequency (how often the word occurs in a document); idf - inverse document frequency (how rare the word is across the collection of documents; words that occur in many documents get a low idf)
Naive Bayes can still perform quite well even if
the assumptions behind Naive Bayes are not satisfied
smaller value in Hamming Distance means that
the distance is smaller, they are more similar and it is a closer neighbor
what are the FEATURES when applying naive bayes when training with sentences?
the keywords (words)
what k means in k-fold cross-validation
the number of groups that a given data sample is to be split into.
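A sketch of how the k groups are used, assuming a simple split where every k-th item goes into the same group (real code would normally shuffle first); `k_fold_splits` is a made-up helper.

```python
def k_fold_splits(data, k):
    # split the data into k groups (folds)
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]  # one group is held out
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation

for training, validation in k_fold_splits(list(range(6)), k=3):
    print(training, validation)
```

Each example ends up in the validation group exactly once, and the k validation scores are usually averaged.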
hamming distance meaning
the number of pieces of data you would have to change to convert one dataset (for ex sentence) into another.
Linear classifier
the plane/line that separates the two sets of data
recall
the ratio of the cases that algorithm PREDICTED "positive" and it was accurate / all cases when the outcome was actually "positive"
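A sketch that counts TP, FP, FN and TN from labels and then applies precision = TP / (TP + FP) and recall = TP / (TP + FN). It assumes 1 means "positive"; the names and toy labels are made up.

```python
def confusion_counts(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)                    # 2 1 2 1
print(precision(tp, fp), recall(tp, fn))
```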
sigma means
the sum of
regression predicts .. , for ex
the value of a function, for ex house price given a huge dataset abt previous houses
Pragmatism in ethics rejects the idea that
there is any universal ethical principle
in ML, features are generic means that
they are general, global, designed to be used in different contexts
what we want to represent with ontologies/knowledge graphs?
to define relations; to classify
activation function of neuron is applied
to the output of a neuron in a neural network.
in nearest neighbour classification, the biggest problem is the noise and solution is ...
to use K Nearest neighbour classification
sets in ML:
training set Validation/development set (aka hold out set) Test set
It is more important to have a larger ... set than a larger ... set in order to avoid overfitting.
training, test
resolution rule is the same as
transitive implication
when the machine learning model description involves "auto-labelling" or smth like this, we can immediately assume that this is .. ML algorithm
unsupervised
semi-supervised learning
uses both labeled and unlabeled data to learn the relationships.
when better (higher) score of recall is more important than high score of precision? give an example
when for ex you want to build a better system for detecting cancer; because you want to minimize the cases of FN (when the system predicts negative, but it is actually positive)
Occam's Razor Principle
when many solutions are available for a given problem, choose the simplest one (using prior knowledge).
Moral Absolutism
when we as a society agree on ethical standards and then apply them across all cultures
activation function of neuron determines
whether the neuron should be activated or not
How to evaluate classification models?
with Confusion matrix (Precision and Recall)
Precision and Recall
with these 2 formulas you can evaluate classification model
if in the task they say "use boolean feature of word "artificial", it means that
you assign value of either 0 or 1 / T or F to it in the table
"find precision at 5" means that
you have to look at the FIRST 5 results and see which ones are correct, then write the value of correct / k
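A sketch of precision at k, assuming the results come as a ranked list and the correct (relevant) items are known as a set; the document ids are made up.

```python
def precision_at_k(ranked_results, relevant, k):
    # look only at the first k results and count how many are correct
    top_k = ranked_results[:k]
    return sum(1 for r in top_k if r in relevant) / k

ranked = ["d1", "d2", "d3", "d4", "d5", "d6"]
relevant = {"d1", "d3", "d4", "d6"}
print(precision_at_k(ranked, relevant, k=5))  # 0.6 (3 correct out of 5)
```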
main problems in AI ethics today
• Autonomous weapons • Privacy • (Racial) profiling
ways to compute similarity between examples based on their features:
• Hamming distance • Euclidean distance • Cosine similarity