AI Midterm Review
how does a multi-perceptron neural network work
since individual perceptrons are not interconnected in any way, they simply act as multiple individual perceptrons that happen to be acting in parallel with similar inputs (ex: one perceptron for male/female, another for tall/short, but neither interacts with the other)
in general, perceptrons are what kind of networks
single layer, feed forward networks with inputs flowing from left to right [no cross talk b/t perceptrons, trained by adjusting weights in a single layer]
in general, how does machine translation using machine learning work
starts by looking at millions of human-translated documents to learn patterns and uses statistics to translate text. The issue is it has no real understanding of what the text actually means
the late 1950's-1960s is the __ period of AI. What method was primarily used to make progress. What did it lead to
"early successes and optimism" period; weak general methods were used (simple methods like logic, searching, and state space models used to solve small instances of problems due to the exponential time complexity). Led to Heuristics ["rule of thumb"/simple guideline/estimate that is usually true as an attempt to capture "domain knowledge" to limit the amount of blind searching required (ex: in checkers, it is better to have more pieces than opponent on the board)]
what is the Fermi paradox
(by Enrico Fermi) the apparent contradiction b/t the lack of evidence of extraterrestrial civilizations and the Drake equation's high probability estimates of their existence, suggesting either that intelligent life is rare or that it has simply never contacted Earth
What is the Great Filter concept and who came up with it
(by Robin Hanson) argument that something is wrong with other arguments about the high probability of intelligent life, showing that there may be some filter reached by all intelligent life at some point which threatens its existence (could be ahead or behind us)
what is MNIST
(Modified National Institute of Standards and Technology) data set of handwritten images of digits 0-9. 50,000 training, 10,000 testing, and 10,000 validation images (three sets). Each image is 28x28 for 784 pixels (so a neural network of 784 inputs). Output is 10 neurons for 0-9 (correct output should be near 1 and the others near 0 because they are sigmoidal). Also has hidden layers of neurons for added depth of learning
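As a hedged illustration of those shapes (numpy only, with a synthetic stand-in image rather than the real MNIST data; the variable names are my own):

```python
import numpy as np

image = np.random.rand(28, 28)   # synthetic stand-in for one 28x28 MNIST digit
x = image.reshape(784)           # flattened: the network's 784 inputs
label = 7                        # hypothetical correct digit
y = np.zeros(10)                 # 10 output neurons, one per digit 0-9
y[label] = 1.0                   # correct output near 1, all others near 0
print(x.shape, y)                # (784,) [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
```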
what is optical computing
(or photonic computing) focuses on computer systems that use light instead of electricity (photons over electrons) for data transport and logic operations. Long-distance transport is already photonic (fiber optics, etc) but data manipulation is electronic, which slows data down a lot. Silicon photonics alleviates some slowdown by making internal operations at least a hybrid, where parts of the computer share info optically
in the 1942 story "Runaround" by Isaac Asimov, the Handbook of robotics has three laws of robotics, which are
1. a robot may not injure a human being or, through inaction, allow a human being to come to harm. 2. a robot must obey the orders given to it by human beings except where such orders would conflict with the first law. 3. a robot must protect its own existence as long as such protection does not conflict with the first or second laws
as of June 2019, how many floating point operations per second can the current best chip handle
148.6 petaflops (Rmax); a petaflop is 10^15 operations per second, so about 1.5 * 10^17 flops. It was predicted that 2021 would bring an exaflop (10^18 flops) machine
in __ (year), what two people published a simplified model of a neuron known as the __. What did this prove
1943, Warren McCulloch and Walter Pitts, McCulloch-Pitts Neuron (MCP neuron). Proved any logical problem could be encoded by an appropriate network of MCP neurons [the problem was that no learning rules existed at the time, though it proved in theory that a network of MCP neurons could solve any logic problem]
the term artificial intelligence was first defined in [year] at [place]
1956 at the Dartmouth Conference (10 scientists, worst "time and effort" estimate in history, trying to solve all of AI's goals in one summer)
in __ (year), who developed the perceptron learning rule. What did it do
1957, Frank Rosenblatt. [a perceptron is an MCP neuron] Allowed individual MCP neurons to be trained to recognize linearly classifiable problems (guaranteed to converge to a solution if one existed)
in __ (year), what two people published a paper called "Perceptrons" and what did it prove
1969, Marvin Minsky and Seymour Papert. Proved perceptron can't be trained to solve some very simple problems (like XOR) [severely limited] [possibly politically motivated, caused neural networks to be dormant for 15-20 years]
what are early examples of autonomous vehicles
1970's Stanford Cart by Hans Moravec; 1990's ALVINN (Autonomous Land Vehicle In a Neural Network) by Carnegie Mellon University
how are AND, OR, and NOT implemented as single perceptrons [L6, look at for pics and to practice making "circuits" with perceptrons like XOR]
2-input AND (X1 weight=1, X2 weight=1, bias input fixed at 1 with weight=-1.5), 2-input OR (X1 weight=1, X2 weight=1, bias input fixed at 1 with weight=-0.5), 1-input NOT (X1 weight=-1, bias input fixed at 1 with weight=0.5)
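A minimal sketch of these three gates in Python (the `perceptron` helper and the bias-trick encoding, with a fixed input X0 = 1 carrying weight -threshold, are my own framing, not course code):

```python
def perceptron(weights, inputs):
    """Fire (1) if the weighted sum, including the bias weight, is >= 0."""
    total = weights[0]  # bias weight times the fixed input X0 = 1
    for w, x in zip(weights[1:], inputs):
        total += w * x
    return 1 if total >= 0 else 0

AND_W = [-1.5, 1, 1]   # fires only when X1 + X2 >= 1.5
OR_W  = [-0.5, 1, 1]   # fires when X1 + X2 >= 0.5
NOT_W = [0.5, -1]      # fires only when X1 = 0

for x in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(x, perceptron(AND_W, x), perceptron(OR_W, x))
print(perceptron(NOT_W, (0,)), perceptron(NOT_W, (1,)))  # 1 0
```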
Raymond Kurzweil predicts human level AGI and the singularity will happen in
2029 and 2045, respectively
what is the existential risk from an artificial general intelligence
AI surpassing humanity in general intelligence could allow it to be more powerful and difficult to control, forcing humanity to rely on the "goodwill" of the machine super-intelligence in order to survive
what is the current state of natural language understanding
ASR (automatic speech recognition) and TTS (text to speech) are solved, but going between language and meaning is not (too ambiguous)
what is the number e. How does it impact the sigmoid equation when Z (sum of weighted inputs + bias) is a very large positive or negative number
Euler's number (2.718...), the base of the natural log. For large positive z, e^(-z) becomes approximately 0, so 1/(1+0)=1; for large negative z, e^(-z) becomes very large, so the output approaches 0
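A quick numeric check of that saturation behavior (a sketch of mine, not course code):

```python
import math

def sigmoid(z):
    """Logistic function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Large positive z: e^(-z) ~ 0, so the output approaches 1.
# Large negative z: e^(-z) is huge, so the output approaches 0.
for z in (-10, -1, 0, 1, 10):
    print(z, sigmoid(z))  # climbs from ~0.00005 through 0.5 to ~0.99995
```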
what is the way forward for AI with a classical approach
Doug Lenat has been working on the Cyc project for 35 years. Believes computers need 'common sense', programmed into them in a logic-based language one fact at a time
describe the math representation of an n-input perceptron, and in english
F(x) = 1 if the sum of all Wi * Xi (from i = 1 to n) is >= the threshold, else 0 [W is a weight and X is an input]. This means the perceptron output is 1 or 0 (fire or no fire), where the n inputs (X) are provided as real numbers. Weights are also provided to simulate the synapse (inhibitory or excitatory), where each weight is multiplied by the input on the appropriate line. The sum of weighted inputs is compared to a threshold (theta) that controls how sensitive the perceptron is to its inputs
the underlying perceptron algorithm was developed by __ in __ (year) and later implemented in custom-built hardware as the __ (name of machine)
Frank Rosenblatt in 1957. "Mark 1 Perceptron"
one of the best examples of AGI in fiction is the movie "2001: A Space Odyssey." What was the AI's name, when did the movie release, who made it
HAL 9000 Computer, in 1968, by Stanley Kubrick and Arthur C. Clarke (who popularized the idea of the geosynchronous communications satellite)
what does the research of machine learning in AI focus on
based on designing systems that learn by example and have a feedback mechanism to train them. Two approaches are biologically inspired ML (neural networks) and math/statistics inspired ML (hidden Markov models)
why is information processing growing exponentially
because individual pieces of technology S-curve in improvements (go from being invented to massive improvements quickly to leveling out), but as one tech levels out, another will pick up in speed
what data is used to determine N (number of civilizations) in Drake equation
avg rate of star formation, fraction of formed stars with planets, avg # of planets that can potentially support life, fraction of those planets that have developed life, fraction of those planets that have developed intelligent life, fraction of those civilizations that have developed external communication signals, and the length of time those signals are sent out for
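A worked multiplication, just to show how the seven factors combine; the input values below are illustrative assumptions of mine, not figures from the course:

```python
# N = R* x fp x ne x fl x fi x fc x L
R_star = 1.0    # avg rate of star formation (stars/year)
f_p    = 0.5    # fraction of stars with planets
n_e    = 2.0    # avg planets per such star that could support life
f_l    = 0.5    # fraction of those planets that develop life
f_i    = 0.1    # fraction of those that develop intelligent life
f_c    = 0.1    # fraction of those that send out signals
L      = 1000   # years the signals are sent out for

N = R_star * f_p * n_e * f_l * f_i * f_c * L
print(round(N, 1))  # 5.0 communicative civilizations under these made-up values
```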
the 1970's is the _ period of AI. Why is it called this? What outcomes/progress came from this period (2 general conclusions of the period). What did these conclusions lead to
Winter of AI; little progress had been made over the previous couple of decades in key areas (vision, speech recognition, etc), which led to pessimism. Some people considered it a dead end (a black hole of talent). Focus thus shifted to micro-worlds (small, limited problems). The two general conclusions were that knowledge is the key (the key to solving problems was to add more knowledge; idea of frames and scripts, which are structures of objects and scenarios to go with those objects) and that some small domains are valuable (solving small problems isn't trivial; they have value). These conclusions led to expert systems (apply knowledge to small problems)
[GOOD TEST QUESTION] in the last 10 years, three advances have led to the ascendency of artificial neural networks and much progress in pattern recognition problems that were previously intractable. What three advances made it more practical
advanced algorithms for deep learning [training networks with 5-10 hidden layers is practical], access to vast quantities of training data [Internet], and continued hardware improvements [GPU based parallel computing platforms have made available the massive amounts of computing power necessary to train multi-layer networks on large databases]
what is the effect of the learning rate and the initial weight values
they affect the speed at which you converge on a solution (lucky guesses can speed it up, but if a solution exists, it will be reached eventually regardless of the quality of the guess)
the perceptron algorithm is
an algorithm for supervised learning, modeled on a single neuron, capable of performing binary classification in linearly separable domains
what is the difference in calculation b/w perceptron and sigmoid neuron
both calculate the sum of the weights times the inputs (plus the threshold, or bias for the sigmoid), but instead of using the threshold comparator, the sigmoid uses the function sigma(z) = 1/(1 + e^(-z))
define AI
branch of computer science with the goal of constructing a general purpose intelligence - of constructing machines that are capable of doing all those things at which, at the present time, people are better
what is an expert system
computer program that functions at or near the level of a human expert in a narrowly defined domain. Focuses on surface knowledge (if/then rules) over deep understanding. The issue is these systems are brittle, failing when presented with a problem outside their narrow focus area, but they can at least explain their conclusions. Hard coded, so they can't learn
what are some applications of AI technology being worked on today
computer vision (recognize humans in general/individual, objects, and artwork), conversing fluently in human language [ASR (auto speech recognition, verbal to text), TTS (text to speech generation), NLU (natural language understanding, understanding topics of conversation)], natural language translation, gameplay (strategy, knowledge, other games), plan and reason the way people do (problem solving), display creativity, and innovate, drive a car
what does the research of classical AI focus on
developing systems composed of explicit rules that manipulate data in ways designed to produce seemingly intelligent behavior (programming)
how did Microsoft's Kinect work
didn't truly see; it was blind to most things besides people. It holds an internal "skeletal model" of people to know where joints are and how they move, and looks for pixels moving in ways that match its internal models of how humans move in order to track movements. It can also track changes in facial expressions in a similar fashion [human faces and bodies are all it can recognize]
how are neurons positioned in relation to each other
they do not have direct physical contact. There are tiny gaps called synapses between the dendrites of one neuron and the axons of others. The signaling neuron passes its signal across the gap by releasing neurotransmitter chemicals into the synapse
how has computer vision with AI evolved (what was a big early failure)
an early machine learning model was the perceptron; the more modern Kinect has models of how humans move and can detect changes in pixels that fit those models [issue is the system must have models of something or learn through training; it can't truly see/understand on its own]. A big early failure was "Freddy"
what is quantum computing
employs laws of quantum physics to implement a form of massive parallelism in which all solutions to a problem can be tested simultaneously (a quantum bit, or qubit, can be a 1, 0, or both at the same time)
what are the 1980's known as for AI
era of expert systems with a focus on practical problems, little progress on big issues
what are the 1990's-2000's known as for AI
era of incrementalism. Laid the seeds of future growth with little concrete progress toward big goals. Integrated automation of stock trading, credit card fraud detection, etc into the economy, but "grand challenges" were not the focus
what are the two states of a synapse
excitatory (signals arriving at this synapse increase the odds that the receiving neuron will fire) and inhibitory (signals arriving at this synapse decrease the odds of the neuron firing)
what two types of incorrect outputs can a perceptron produce
false positives (fires when it should not have) and false negatives (does not fire when it should have)
what are the characteristics of a perceptron learnable problem (for 2D and in general)
for 2D, it must be linearly separable, with a decision line of X(2) = (-W(1)/W(2))X(1) - (W(0)/W(2)), where the orientation of the decision surface completely describes the knowledge possessed by the perceptron. In general, for an N-input perceptron, the two output categories must be separable by an N-1 dimensional hyper-plane (point for one input, line for 2D, 2D plane for 3D, etc) [XOR and EQUAL operations cannot be solved by a perceptron]
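A small brute-force sketch (my own illustration, not course code) that searches a coarse grid of bias and weight values for a single 2-input perceptron: it finds a separating line for AND but none for XOR, since no line can separate XOR's outputs:

```python
import itertools

def fires(w0, w1, w2, x1, x2):
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0

def solvable(truth_table):
    grid = [i / 2 for i in range(-8, 9)]  # candidate values -4.0 to 4.0
    for w0, w1, w2 in itertools.product(grid, repeat=3):
        if all(fires(w0, w1, w2, x1, x2) == y
               for (x1, x2), y in truth_table.items()):
            return True
    return False

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(solvable(AND), solvable(XOR))  # True False
```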
how do weights affect a positive input value (and negative)
for positive inputs, positive weights are excitatory synapses, where large positive weights increase the effect of a positive input and small positive weights decrease the effect of a positive input. A weight of 0 makes the input ignored. Negative weights decrease the likelihood of the perceptron firing [negative inputs are the opposite of all these]
how is the learning gradient calculated. How does the learning rate come into play and how are the new weights formed
gradient(0) = SumFN(0) - SumFP(0); repeat for all others. Multiply the learning gradient by the learning rate (a constant value > 0.0 and <= 1.0 that impacts how fast the system learns, but not the final answer obtained; generally b/w 0.05 and 0.25, with 0.22 common). So delta(0) = 0.22 * gradient(0); repeat for all others. The revised (new) weights are found by adding the deltas to the current (old) weights, so W(0)new = W(0)old + delta(0)
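A toy numeric walk-through of one batch update (all values below are made up for illustration):

```python
learning_rate = 0.22
sum_fn  = [2.0, 1.0, 0.0]    # per-weight sums over false-negative input vectors
sum_fp  = [0.0, 1.0, 1.0]    # per-weight sums over false-positive input vectors
weights = [0.3, -0.1, 0.4]   # current weights W0..W2

gradient = [fn - fp for fn, fp in zip(sum_fn, sum_fp)]  # [2.0, 0.0, -1.0]
delta    = [learning_rate * g for g in gradient]        # [0.44, 0.0, -0.22]
weights  = [w + d for w, d in zip(weights, delta)]      # add, don't multiply
print([round(w, 2) for w in weights])                   # [0.74, -0.1, 0.18]
```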
what is the technological singularity (also called the singularity)
hypothesis that the invention of artificial super-intelligence will trigger runaway tech growth, resulting in unfathomable changes to human civilization due to the AGI continuously trying to improve itself [first coined in 1958 by John von Neumann. Ray Kurzweil predicts it will occur around 2045, while Vinge believes 2030]
how will a trained perceptron respond when presented with a new input vector (new input of X0, X1, ... that hasn't been used)
if input is on/above the decision surface, perceptron will fire (else not). There are infinite possible decision surfaces that can correctly classify the items (different initial weights and learning rates generate different surfaces)
what is the issue surrounding Nick Bostrom's Paperclip Maximizer thought experiment from 2003
illustrates the existential risk that AGI may pose to human beings when programmed to do even a simple, seemingly harmless task, showing the necessity of incorporating machine ethics into AI design
when and why was the Drake equation made
in 1961 by Frank Drake, for discussion at the first SETI (search for extraterrestrial intelligence) meeting
how can we know if our perceptron has really learned what we hoped
in general, you can't know because beyond N=3, we cannot visualize the N-1 dimensional hyperplane. Once the perceptron is fully trained with the training set, we give it a testing set (all different inputs) to see if it has truly learned
what does it mean for a neuron to fire
it sends an electro-chemical pulse down its axon to transmit signals to other neurons
is the perceptron learning process guaranteed to halt
it will be only if the problem is "perceptron learnable" (if it is theoretically possible for a perceptron to learn the training set, then the perceptron learning rule algorithm will converge on a solution)
according to Robin Hanson, if we find multi-cellular life in the oceans of Enceladus or Europa, or beneath the surface of Mars, what would that say about the long term survivability of humans
it would be evidence that life can easily develop, implying a low probability of human survival in the long term because the great filter would likely be ahead of us
what is an exascale machine
a machine able to support a billion billion (10^18) floating point operations per second
what has the last decade been for AI
machine learning, big data, and the internet gained traction; progress on issues thought to be intractable (vision, speech, self-driving vehicles, etc)
what is the Turing test and when was it made
made by Alan Turing in a 1950 paper. The idea is that if a machine can trick you into believing a digital conversation with it is actually with a real person, then the machine can be said to exhibit intelligent behavior [focuses solely on behavior; cannot test if AI is strong vs weak]
what are neurons and what are they made up of
nerve cells of the brain that have a nucleus with DNA to keep it alive. They have a large number of branched protrusions called dendrites that receive chemical signals from other neurons. They also have an axon (long, thin fiber-like appendage) which they use to send electro-chemical signals (there are branches at the end of the axon to send signals to other neurons)
can intelligent machines be constructed or exist
no one knows if they can be constructed, but they do exist (in the sense that humans are biological machines with intelligence (Descartes))
what is moore's law
observation that computing performance doubles every 18 months (a straight line on logarithmic paper b/c the growth is exponential)
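The doubling claim as arithmetic (a quick sketch; the sample durations are arbitrary):

```python
def moore_factor(years):
    """Performance multiplier after `years`, doubling every 18 months."""
    return 2 ** (years * 12 / 18)

for years in (3, 6, 15):
    print(years, moore_factor(years))  # 3 -> 4x, 6 -> 16x, 15 -> 1024x
```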
though machine learning and big data have allowed for tremendous progress in some fields like computer vision, they cannot hold simple conversations or understand simple stories. Essentially, machine learning algorithms are
pattern matchers that treat all problems as "recognition" problems (they can't understand how/why they generate an answer)
what does the process of perceptron learning involve
perceptron can only "know" its weights and threshold, so it learns through modifying these values
the fundamental difference between a perceptron and a sigmoid (logistic) neuron is with the outputs. What is the difference
perceptron output is 0 or 1 (discontinuous step function), while sigmoid is a continuous function where real valued outputs b/w 0 and 1 are possible
What is the drake equation
probabilistic argument used to estimate the number of active, communicative extraterrestrial civilizations in the Milky Way galaxy
describe how a perceptron is trained on a high level
starts by making random guesses to fire or not when presented with an input (step 1: assign random initial weights W0-Wn). Step 2: present all training set inputs to the perceptron and note any mistakes made, then adjust the weights based on the perceptron learning rule. Step 3: after adjustment, present the perceptron with the training inputs again and repeat step 2 if any incorrect results occur [continue until the entire training set is matched]
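A compact sketch of that loop in Python, shown learning 2-input OR; the batch update follows the perceptron learning rule described elsewhere in these notes, but the code itself is my own illustration:

```python
import random

def train(training_set, rate=0.22, max_epochs=1000):
    weights = [random.uniform(-1, 1) for _ in range(3)]  # step 1: random W0..W2
    for _ in range(max_epochs):
        gradient = [0.0, 0.0, 0.0]
        mistakes = 0
        for (x1, x2), target in training_set:        # step 2: present all inputs
            x = (1, x1, x2)                          # X0 = 1 carries the bias
            fired = 1 if sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0
            if fired != target:                      # note FN (+) and FP (-)
                mistakes += 1
                sign = 1 if target == 1 else -1
                gradient = [g + sign * xi for g, xi in zip(gradient, x)]
        if mistakes == 0:                            # entire training set matched
            return weights
        weights = [w + rate * g for w, g in zip(weights, gradient)]  # step 3
    return weights

OR_SET = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
print(train(OR_SET))  # e.g. a weight vector that classifies OR correctly
```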
perceptrons are trained using __ learning, which requires (what are the steps)
supervised learning, which requires a training set and a testing set
what are the two general research areas for modern day AI research
symbolic (classical) AI and machine learning
what types of problems are symbolic and machine learning AI used to solve and why
symbolic tends to be used for higher-level reasoning, while ML is used for low-level recognition/classification. This is because ML creates a "black box" in that it gets a result but cannot explain why/how it got the result, while symbolic AI is able to explain
though sigmoids are historic, one of the most popular modern-day activation functions is what
the rectifier, implemented by a ReLU (Rectified Linear Unit). Faster than sigmoids and simpler: F(x) = max(0, x)
why are neurons a good candidate for machine learning (in biological approach)
they are all or nothing, so they fire or don't fire with no in between, similar to binary of a computer
how can mathematical perceptron model be adjusted to have a fixed threshold
the threshold can be made 0 (so you only need to know if the sum is positive or negative to decide whether to fire) by adding an input X0 fixed at 1 before the first real input, with a weight equal to the negation of the threshold (W0 = -theta); this is just subtracting the threshold from both sides so one side becomes 0
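A tiny check of that algebra (illustrative values of my choosing):

```python
theta = 1.5
w = [1.0, 1.0]   # W1, W2
x = [1.0, 1.0]   # X1, X2

original   = sum(wi * xi for wi, xi in zip(w, x)) >= theta             # compare to theta
bias_trick = (-theta) * 1 + sum(wi * xi for wi, xi in zip(w, x)) >= 0  # X0=1, W0=-theta
print(original, bias_trick)  # True True -- the two forms always agree
```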
what are training and testing sets in supervised learning
training set is set of correctly pre-defined inputs used to teach perceptron to distinguish b/w two types of input (continue to use same training set until it is mastered). Testing set is another set of correctly pre-defined inputs used after training is complete to determine how accurately the system learned to classify inputs [the inputs found in the sets are disjoint]
what is the way forward for AI with a Machine learning approach
two approaches: model "wetware" of brain (Raymond Kurzweil thinks brain is just a large hierarchy of pattern matchers), or simply copy (upload) the brain [need 1-10 exaFLOPS to run whole brain]
what math can be used to simplify the question of if a perceptron will fire or not
use a dot product of the X and W vectors (X0 through Xn in a horizontal vector, W0 through Wn in a vertical vector; check whether it is >= 0)
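The same firing test as a dot product (numpy assumed just for the vector math):

```python
import numpy as np

w = np.array([-1.5, 1.0, 1.0])  # W0 (bias weight) through W2
x = np.array([1.0, 1.0, 1.0])   # X0 fixed at 1, then the real inputs
print(np.dot(w, x) >= 0)        # True: the AND perceptron fires on (1, 1)
```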
what is semi-conductor based computing
uses photolithography: a form of photography used to "manufacture" integrated circuits, enabling things like mass-produced phones. The design of a circuit is set up as a very large image (mask). Photoresist (a chemical that changes with light exposure) is used to copy the image at a very small scale (the shorter the wavelength of light, the smaller the image). Semi-conductor methods are reaching the limits of physics and are slowing in progress
how does the perceptron learning rule use misclassifications (false pos/neg errors)
uses them to compute a learning gradient that is used with the learning rate to adjust all weights. Add the false negative input vectors to the weights and subtract the false positive ones (do this at the end of each pass through the training set). We have FN (false negatives) with cardinality S (# of false negatives) [add up all their X0 values for SumFN0, and so on] (same thing for SumFP)
what is bio-computing
using "natural hardware," or mimic the way DNA works for information processing and storage (synthetic biology and DNA computers)
what is molecular level computing
using nanotechnology (engineering systems at the level of individual atoms). The idea is that atoms could be rearranged to do computing ('1' is the presence of a xenon atom, '0' is its absence)
what is the difference between weak and strong AI
weak AI refers to machines that behave in an intelligent manner (behave as if they are conscious), but can make no claim as to whether the underlying system is truly intelligent/conscious. Strong AI refers to machines that are truly intelligent/conscious [mimic vs reproducing consciousness]
contrast the symbolic and machine learning approaches to recognizing a character
while symbolic would attempt to generate a list of features that make a certain symbol look the way it does, ML would instead focus on making a machine that could be shown examples of the symbol and discover for itself the features that make it unique
what is the physical symbol system hypothesis/assumption in AI
you can capture some degree of intelligent behavior by storing and manipulating symbols (computers store 0s and 1s and can manipulate them)