OIDD255X Final Exam


Number of hidden layers

Number of hidden layers of neurons between the inputs and the outputs

Batch size

Number of training samples processed before the model parameters are updated

How many bits for a sound sample in an audio recording?

16 bits per sound sample

What is a bit?

A binary digit, a 0 or a 1 (on-off sequence in coding)

Why does Apple make its own chips?

Performance: Apple's chips are superior to anything Intel can supply. Privacy: Apple is the one company that can combine privacy and machine learning

What is advantageous about only using first party cookies?

It is advantageous for big tech companies such as Apple and Google, on whose sites users already do a lot of activity → now only first-party cookies can be used

Advantages of cloud computing

Flexibility and ability to scale; cost efficiency; workforce/labor sharing for specialized skills; cybersecurity benefits

How does reinforcement learning work?

Humans provide the rules and the objective. The machine uses these to figure out the best strategy by running a simulation thousands of times through trial and error (which moves or set of decisions maximizes the objective?)

What is the ISP network?

ISP network: connect from backbone to local areas (typically providing access to consumers)

What do opponents of net neutrality believe?

ISPs should be able to treat traffic differently: e.g., if an ISP recognizes that you like website 1, it could let you pay a premium to fast-track streaming of that site

Method of unsupervised learning: anomaly detection

Identification of rare events that deviate significantly from the majority of the data. Examples: parts failure detection, fraud detection
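A minimal z-score sketch of anomaly detection follows; it is one simple technique among many, and the transaction amounts and the threshold of 2.0 are hypothetical. Points more than `threshold` standard deviations from the mean are flagged as rare events.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold (hypothetical cutoff)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily transaction amounts; one is far from the rest.
amounts = [52, 48, 50, 47, 53, 49, 51, 50, 500]
print(find_anomalies(amounts))
```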

How are bits processed?

Bits are processed using transistors which act as switches (on/off)

Policy interventions

The Algorithmic Accountability Act of 2019

How many bits to encode a pixel in a color image?

24 bits per pixel (8 bits each for red, green and blue)
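To make the 24-bit figure concrete, the sketch below assumes the common RGB layout of 8 bits per color channel and packs one pixel into a single 24-bit integer; `pack_rgb` and `unpack_rgb` are hypothetical helper names.

```python
def pack_rgb(red, green, blue):
    """Pack three 8-bit channel values into one 24-bit integer."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits (0-255)")
    return (red << 16) | (green << 8) | blue

def unpack_rgb(pixel):
    """Recover the three 8-bit channels from a 24-bit pixel."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF

orange = pack_rgb(255, 165, 0)
print(orange, format(orange, "024b"))  # the pixel as a number and as 24 bits
print(unpack_rgb(orange))
```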

How many frames per second in a video?

24 frames per second

How large is a 3 min MP3 song on a phone?

About 3 MB on a phone

Classification error

Classification error = 1 - accuracy

Surrogate models

A simple model that replicates the behavior of a more complex model. Created by training an interpretable model on the inputs and predictions of the more complex model
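A minimal sketch of the idea, assuming a single numeric feature: `black_box` is a hypothetical stand-in for the complex model, and the surrogate is a one-rule decision stump fit to the black box's predictions (not to true labels). Fidelity measures how often the surrogate agrees with the black box.

```python
def black_box(x):
    """Hypothetical stand-in for a complex, hard-to-read model."""
    return 1 if x >= 12 or x == 5 else 0

def fit_stump_surrogate(xs, model):
    """Fit the rule 'predict 1 if x >= t' to the model's own outputs."""
    targets = [model(x) for x in xs]  # train on the model's predictions, not true labels
    candidates = sorted(set(xs)) + [max(xs) + 1]  # last candidate means "always predict 0"
    best_t, best_fidelity = None, -1.0
    for t in candidates:
        agree = sum((1 if x >= t else 0) == y for x, y in zip(xs, targets))
        fidelity = agree / len(xs)
        if fidelity > best_fidelity:
            best_t, best_fidelity = t, fidelity
    return best_t, best_fidelity

threshold, fidelity = fit_stump_surrogate(list(range(20)), black_box)
print(f"surrogate rule: predict 1 if x >= {threshold} (fidelity {fidelity:.2f})")
```

A real surrogate would usually be a shallow decision tree or linear model over many features; the stump keeps the example small while still showing the fidelity tradeoff (it misses the black box's odd behavior at x = 5).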

What is machine learning?

A subset of AI where massive amounts of data are used to produce an algorithmic prediction. Rather than explicitly coding decision rules into software, we provide examples of decisions being made under different data conditions

3 - computation

The computing demands of AI applications are outpacing Moore's Law. Deep learning hardware has become increasingly specialized. Example: Apple making its own chips

What is explainable AI?

AI methods whose results can be understood by human experts, who can see why the algorithm arrived at them. Contrasts with the black-box approach normally associated with ML

Economics of AI: how is it changing?

AI requires a significant amount of digital capital. Several forces are coming together to lower these barriers and accelerate AI adoption

AI progress given explainability

AI has advanced quickly in marketing and finance. Progress in other fields is likely to be governed by the ability to address factors such as bias and explainability

What is the goal of the agent in reinforcement learning?

The agent's goal is to maximize the sum of rewards (the cumulative reward)

How can we constrain the loss function?

The algorithm can be instructed to find the best solution while penalizing biased solutions → this pushes against unfair outcomes

What is a Q-table?

All possible state/action combinations are placed in a Q-table, with the rewards stored inside the table. State = position (which tile you are in) → any environment can be expressed as states. Actions = labeled as numbers
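The sketch below illustrates a Q-table on a hypothetical five-tile corridor, assuming tabular Q-learning with epsilon-greedy exploration (the learning rate, discount factor and exploration rate are illustrative). The table stores each state/action pair's estimated cumulative reward; the immediate rewards come from the environment's rules in `step`.

```python
import random

N_STATES = 5            # tiles 0..4 in a hypothetical corridor; tile 4 is the goal
ACTIONS = (0, 1)        # 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def step(state, action):
    """Environment rules: move one tile; reaching the goal pays 1 and ends the episode."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def greedy(q_row, rng):
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])  # break ties randomly

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q_table = [[0.0, 0.0] for _ in range(N_STATES)]  # rows = states, columns = actions
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # Explore with probability EPSILON, otherwise exploit the table
            action = rng.choice(ACTIONS) if rng.random() < EPSILON else greedy(q_table[state], rng)
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(q_table[next_state])
            q_table[state][action] += ALPHA * (reward + GAMMA * future - q_table[state][action])
            state, steps = next_state, steps + 1
    return q_table

q_table = train()
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```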

What is reinforcement learning? What are some examples?

An area of machine learning concerned with how software agents should take a sequence of actions to maximize some notion of cumulative reward. Main ideas: a sequence of actions and a cumulative reward; it is about making the best sequence of decisions. Examples: chess, video games, Uber maps, cooking recipes

Differing definitions of fairness

Anti-classification; demographic parity; predictive parity; individual fairness; equalized odds
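Several of these definitions compare rates across groups. The sketch below, with hypothetical records and group names, computes the per-group quantities behind demographic parity (selection rate), predictive parity (precision) and equalized odds (true and false positive rates). Anti-classification and individual fairness are not rate comparisons, so they are not shown.

```python
from collections import defaultdict

def group_rates(records):
    """records: iterable of (group, actual, predicted) with 0/1 labels."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, actual, predicted in records:
        key = ("t" if actual == predicted else "f") + ("p" if predicted == 1 else "n")
        counts[group][key] += 1
    rates = {}
    for group, c in counts.items():
        total = sum(c.values())
        rates[group] = {
            # Demographic parity compares how often each group is predicted positive
            "selection_rate": (c["tp"] + c["fp"]) / total,
            # Predictive parity compares precision
            "precision": c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0,
            # Equalized odds compares both TPR and FPR
            "tpr": c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0,
            "fpr": c["fp"] / (c["fp"] + c["tn"]) if c["fp"] + c["tn"] else 0.0,
        }
    return rates

records = [("A", 1, 1), ("A", 1, 1), ("A", 0, 1), ("A", 0, 0),
           ("B", 1, 1), ("B", 0, 0), ("B", 0, 0), ("B", 1, 0)]
rates = group_rates(records)
print(rates)
```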

ML and identifying boundaries in data

Anything below the boundary indicates one decision, while anything above it indicates the alternative decision. ML algorithms use gradient descent to find the right place to put these boundaries

What limitations of databases does the blockchain address?

Anything digital can be written and rewritten. The validity of information stored in a database requires trust between counterparties, or a centralized and trusted broker when parties do not know or trust one another (banks, platforms such as Airbnb and Twitter). This is interesting because many institutions exist to broker trust: the blockchain removes the need for such a trusted intermediary and removes the risk of records being altered

Applications of unsupervised learning

Grouping articles together in Google News; creating customer preference recommendations (Netflix)

How is blockchain immutable?

Approved transactions can never be rewritten. There are thousands of participants across the network, and the chain relies on a consensus protocol and proof of work

Examples of GAN usages

Art that copies someone's style; photorealistic images of fake people

Implications for competition and innovation

Markups and profit; makes industry entry more difficult; potentially shifts the locus of innovation

What is a NAP?

Network Access Point

Transparency

Showing people what features and weights go into the machine

2 - skills

Advances in AI technology can affect the supply of skills. They make deep learning more accessible (deep learning reduces the need for feature engineering, though ML engineers still need to tune these engines), and automated ML approaches further reduce the need for specialized skills

Applications of RL in business

Trading bots; portfolio management; price setting

For Lab 2, what is meant by "true positive", "false positive", "true negative", and "false negative".

True positive: the model predicted that the employee would leave the organization and they actually did. False positive: the model predicted that the employee would leave the organization but they stayed. True negative: the model predicted that the employee would stay in the organization and they did stay. False negative: the model predicted that the employee would stay in the organization but they actually left.
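Using the Lab 2 framing (1 = the employee leaves, 0 = the employee stays), the sketch below counts the four outcomes for a hypothetical set of predictions and derives accuracy, classification error, precision, recall and specificity from them.

```python
def confusion_counts(actual, predicted):
    """Count outcomes where 1 = employee leaves, 0 = employee stays."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)  # predicted leave, did leave
    fp = sum(a == 0 and p == 1 for a, p in pairs)  # predicted leave, stayed
    tn = sum(a == 0 and p == 0 for a, p in pairs)  # predicted stay, stayed
    fn = sum(a == 1 and p == 0 for a, p in pairs)  # predicted stay, left
    return tp, fp, tn, fn

# Hypothetical outcomes for ten employees
actual    = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]
tp, fp, tn, fn = confusion_counts(actual, predicted)

accuracy = (tp + tn) / len(actual)
classification_error = 1 - accuracy
precision = tp / (tp + fp)
recall = tp / (tp + fn)          # also called sensitivity or true positive rate
specificity = tn / (tn + fp)
print(tp, fp, tn, fn, accuracy, precision, recall, specificity)
```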

Connections between Covid and AI job substitution

Workforce reorganization; AI-based monitoring in the workplace; robots can help with social distancing

Regularization

methods to improve fit and particularly to avoid overfitting

Epochs

Number of complete passes through the training data set
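Batch size and epochs together determine how many times the model parameters are updated. A small sketch, assuming the last partial batch still triggers an update; the function name is hypothetical.

```python
import math

def updates_per_training_run(num_samples, batch_size, epochs):
    """Each epoch is one full pass over the data; parameters update once per batch."""
    batches_per_epoch = math.ceil(num_samples / batch_size)  # last batch may be smaller
    return batches_per_epoch * epochs

print(updates_per_training_run(num_samples=1000, batch_size=32, epochs=10))
```

With 1,000 samples and a batch size of 32, each epoch takes 32 batches (31 full ones plus a final batch of 8), so 10 epochs give 320 updates.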

What does a person contribute in deep learning networks?

provide data and choose the loss function

How does deep learning work?

1. Data is converted into embeddings (numeric representations).
2. At each layer, weights combine the inputs that are passed to the next layer.
3. Neurons have activation functions that convert node inputs into outputs that are passed on to the next layer.
4. The value from the last layer is used to generate a prediction.
5. Information about the prediction error is fed back through the network, and the weights are adjusted to minimize that error; repeating this finds the combination of weights whose predictions best fit the data.
6. Engineers choose the loss function to minimize.
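The forward half of these steps can be sketched for a tiny network with two inputs, two hidden neurons (ReLU activation) and one output neuron (sigmoid activation). The weights are hypothetical, and the embedding step is skipped by starting from numeric inputs.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    # Each hidden neuron: weighted sum of inputs plus bias, passed through its activation
    hidden = [relu(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(hidden_weights, hidden_biases)]
    # The last layer's value becomes the prediction (a probability via sigmoid)
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)

# Hypothetical weights; training would adjust these to reduce the loss.
p = forward([1.0, 2.0],
            hidden_weights=[[0.5, -0.25], [0.25, 0.5]],
            hidden_biases=[0.0, 0.0],
            output_weights=[1.0, -1.0],
            output_bias=0.0)
print(round(p, 4))
```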

How would you make a decision tree using training and test data?

1. Determine the outcome you are trying to predict.
2. Determine the data features that impact this outcome.
3. Build the decision tree by splitting on these features until you reach the outcome.
4. Using 70% of the data, train the model by running this data through the decision tree.
5. Using the remaining 30% of the data, test the model.

Structure of the internet (hierarchy of networks)

Backbone network → ISP network → local access network

ML bias: human processes

Bias generated through human processes: e.g., if recruiters tend to hire from Penn, the machine will favor candidates from Penn. It is not intentional but still has an impact

What is data privacy?

Data privacy: the right to be left alone when you want to be, the right not to be observed without your consent

Platforms and personal data: Web2

Data is voluntarily contributed to platforms. From an HR perspective, these data can be very useful for prediction and lead generation. There are still significant concerns about platform power and privacy violations, and a lack of transparency about how the data are used, packaged and sold

Why is a change to database technology so potentially important?

Databases are the factory floor of information-based businesses

How do decisions work in reinforcement learning?

Decisions are cast as state → action → reward. Agent = the software making decisions; Action = the choices available (up, down, left, right); Environment = the game / set of rules; Reward = depends on the actions taken

Controversies with generative AI

Deep fakes and shallow fakes: AI can create fake content, and people can also claim that real content is fake

Difference between ML and deep learning

Deep learning eliminates the need for manual feature extraction → the machine does this for you, even from unstructured data

Why is AI regulation complicated?

Different cultures place different weights on productivity, rights and fairness. Questions of fairness are often interwoven with questions of politics, power and competition. How much should AI be regulated? Is AI progress directly tied to economic and military power? If we constrain corporate AI innovation, are there geopolitical implications (China and Russia)?

What is dimensionality reduction?

Dimensionality reduction converts a high-dimensional data set into a smaller number of dimensions. Example: self-driving cars identifying only the important objects in an image (feature selection)

Difference between discriminative and generative model

Discriminative model: models the boundary between two classes. Generative model: models the distribution that produces the underlying data

What does the law say about AI bias?

Disparate treatment (intentional discrimination) and disparate impact (neutral policies that adversely impact some groups). The law bars policies based in prejudice but allows for the use of protected attributes in certain contexts (affirmative action)

Benefits of an organic network

Easy to run new internet-based applications (the ultimate platform). The internet does not discriminate between applications → new applications do not need the network's permission, and existing applications cannot prevent access to new applications

US-EU Safe Harbor Framework

The EU threatened to stop international data flows if necessary to protect EU citizens' privacy. The EU directive on data protection prohibits the transfer of personal data to non-EU nations that do not meet the EU's standard for privacy protection. Because of differences between the US and EU approaches, the directive could have significantly hampered the ability of US companies to engage in many trans-Atlantic transactions. The US Department of Commerce, in consultation with the European Commission, developed a safe harbor framework

How does ML differ from an expert system?

Expert systems do not need the final column (labels), but they need the relationship to be clearly describable by the expert. ML algorithms must have labels, but can work out the relationships themselves without an expert

What is explainability and why is it important?

Explainability: the ability to explain to another person, in human terms, why a model arrived at a conclusion. It allows us to trust decisions made by an algorithm. Big tech is investing heavily in explainable AI → people need to trust AI for it to grow and be implemented

How do we provide consent online?

Explicit (opt-in): user actively chooses to share information Implicit (opt-out): user assumed to agree to share unless specified otherwise

What is an F-score?

F1 score = 2 × (precision × recall) / (precision + recall): the harmonic mean of precision and recall
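A short sketch of the formula. Because F1 is a harmonic mean, it stays low when either precision or recall is low: precision 0.9 with recall 0.1 gives 0.18, while the plain arithmetic mean would be 0.5.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.75, 0.75))
print(f1_score(0.9, 0.1))
```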

Why is an F score important?

The F1 score combines precision and recall, two competing metrics (improving one often worsens the other), into a single number

Why does fairness come at a price? What does this mean?

Fairness and accuracy trade off → as fairness increases, accuracy tends to fall

What is the FCC?

Federal Communications Commission

Digital observations: Web1 - How does this work?

First-party cookies contain information for a single website and can be read only by that site. Third-party cookies are shared among websites (so sites can know where you have been, when and for how long) and are used for marketing. Web toolbars send information about browsing activities. Websites collect clickstream data

How can AI bias be fixed?

Fixing the training data; changing labels → feature engineering; manipulating data; reweighing observations; constraining the loss function; causal inference; transparency → Google "nutrition cards" for machine learning; policy interventions
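As one illustration of reweighing observations, the sketch below uses a common reweighing scheme in which each (group, label) combination gets weight P(group) × P(label) / P(group, label), so that group and outcome look independent in the weighted data. The data are hypothetical: group A's raw positive rate is 0.75 and group B's is 0.25, and both become 0.5 after weighting.

```python
from collections import Counter

def reweigh(groups, labels):
    """Weight = P(group) * P(label) / P(group, label) for each observation."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [(group_counts[g] / n) * (label_counts[y] / n) / (joint_counts[(g, y)] / n)
            for g, y in zip(groups, labels)]

groups = ["A"] * 4 + ["B"] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0]   # hypothetical past outcomes (1 = favorable)
weights = reweigh(groups, labels)

def weighted_positive_rate(group):
    pairs = [(w, y) for g, y, w in zip(groups, labels, weights) if g == group]
    return sum(w * y for w, y in pairs) / sum(w for w, _ in pairs)

print(weighted_positive_rate("A"), weighted_positive_rate("B"))
```

A model trained with these sample weights sees over-represented combinations (such as favorable outcomes in group A) counted less and under-represented ones counted more.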

Types of generative AI technologies

GANs; transformers; VAEs (variational autoencoders)

What are GANs and how do they work?

GANs = generative adversarial networks. A generator network generates fakes that replicate real data, while a discriminator network learns to tell real data from fakes using training data. Used to generate artificial content that is increasingly difficult to tell apart from real content

GDPR

GDPR: a right to be forgotten, a right to explanation, conservative opt-in policies

Key data privacy laws

GDPR (General Data Protection Regulation), PIPL (Personal Information Protection Law) and CCPA (California Consumer Privacy Act)

What is generative AI?

Generative AI is an innovative technology that generates artifacts that formerly relied on humans. Generative models can create new instances of data → AI becomes a creative engine. Example: putting in text and receiving an image, song or story

Method of unsupervised learning: collaborative filtering

Group customers according to historical preference data and recommend products based on similarities with other customers. Example: recommendation engines such as Netflix show recommendations
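A minimal user-based sketch, assuming cosine similarity over co-rated items and hypothetical ratings: other users' ratings of unseen shows are weighted by how similar those users are to the target user. Production recommenders typically mean-center ratings or factor the user-item matrix; this keeps only the core idea.

```python
import math

# Hypothetical users and their ratings (1-5) of shows they have watched.
ratings = {
    "ana":  {"Show A": 5, "Show B": 4, "Show C": 1},
    "ben":  {"Show A": 4, "Show B": 5, "Show C": 2, "Show D": 5},
    "cara": {"Show A": 1, "Show B": 2, "Show C": 5, "Show E": 5},
}

def cosine(u, v):
    """Cosine similarity over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def recommend(user, ratings):
    """Score items the user has not seen by similarity-weighted ratings from other users."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], their_ratings)
        for item, rating in their_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(recommend("ana", ratings))
```

Ana's tastes match Ben's more closely than Cara's, so Ben's favorite (Show D) ranks above Cara's (Show E) even though both rated their show 5.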

What is domain expertise?

Having expertise in some kind of area that is relevant to the decision being made → more important for some domains than others

Interpretable deep learning

Highlighting the pixels in an image that were most important (made the biggest difference) in producing the outcome. It still uses deep learning, but shows you what was most important in the data

What kind of questions do the TCP/IP rules answer?

How do I write down the address of the computer to send my packet to? Where do I send the next packet? How do I detect the beginning of a new packet? How do I detect an error in transmission?

Requirements of the US-EU safe harbor principles

Implicit (opt-out) consent for all information; explicit (opt-in) consent for specific kinds of information; notice of the purpose of use and the types of third parties the information is shared with; verification that all third parties the information is shared with also adhere to the consent and notice principles above; consumer access to all the information stored about them, with the ability to verify and correct it

What is the role of privacy in AI?

In the information age, privacy hinges on our ability to control how our data is stored, modified, and exchanged between different parties. Modern data-mining techniques allow corporations to identify, profile and affect people's lives without their consent

Disadvantages of cloud computing

Industry-specific concerns; privacy concerns; vendor lock-in; data security issues

History of the web

Information economy (Web 1): reading content. Platform economy (Web 2): reading and writing/sharing content. Token economy (Web 3): reading and writing content, and executing transactions

Information exchange between computers

Information exchange between two computers may pass through several other computers

What is structured data?

Information organized in a format so it is identifiable, storable, retrievable, and analyzable in a computer system.

What is the internet?

Internet is a very large network of computers that "speak" TCP/IP and are all connected

What is interpretability?

Interpretability: ability to predict what a model is going to do given a change in inputs → understand why a model is doing what it is doing (identifying patterns)

What are some interesting features about the blockchain?

It is immutable → transactions cannot be changed once confirmed (once written, it is the truth). It does not require a central authority or owner for verification

Issues that AI bring to the workplace

Less relational capital; AI governance; contracts and liability; employee culture and satisfaction; questions related to fairness

What do the lines and circles represent in a neural network?

Lines represent weights, circles are neurons

LIME

Local Interpretable Model-agnostic Explanations (LIME) generates a linear approximation to a more complex boundary for a local set of points. Image classification: it can pull out the features of an image that drove the classification ("this feature is why I thought this image was something else"), so we can interpret why the model arrived at a certain outcome
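A one-feature sketch of LIME's core step, assuming a hypothetical black box `black_box(x) = x ** 2`: evenly spaced perturbations around the point being explained, proximity weights from a Gaussian kernel, and a weighted least-squares line. The actual LIME library samples randomly and works over interpretable features such as image superpixels; the grid here just makes the result reproducible. Near x = 3 the local slope comes out as 6, the derivative of x².

```python
import math

def black_box(x):
    # Hypothetical complex model with a curved (nonlinear) response
    return x ** 2

def lime_1d(model, x0, radius=1.0, n=21, kernel_width=0.5):
    """Fit a proximity-weighted line y ~ a + b * (x - x0) around x0; return (a, b)."""
    offsets = [radius * (2 * i / (n - 1) - 1) for i in range(n)]     # perturbations in [-radius, radius]
    weights = [math.exp(-(d / kernel_width) ** 2) for d in offsets]  # nearby samples count more
    ys = [model(x0 + d) for d in offsets]
    w_sum = sum(weights)
    d_mean = sum(w * d for w, d in zip(weights, offsets)) / w_sum
    y_mean = sum(w * y for w, y in zip(weights, ys)) / w_sum
    cov = sum(w * (d - d_mean) * (y - y_mean) for w, d, y in zip(weights, offsets, ys))
    var = sum(w * (d - d_mean) ** 2 for w, d in zip(weights, offsets))
    slope = cov / var
    return y_mean - slope * d_mean, slope

intercept, slope = lime_1d(black_box, 3.0)
print(f"local explanation near x=3: slope {slope:.3f}")
```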

Methods of converting data sets into a decision

Logistic regression; random forests; decision trees; neural networks

Why is the difference between rule-based software and ML so critical?

Machine learning is much more accurate when dealing with very large data. It automates processes even when we never knew the decision logic → take the data and let the machine make the decision. It can often be faster, easier, and cheaper than encoding instructions. Like other automated models, ML models built for these tasks can learn to the level of the best expert and scale without limit

Limitations of RL

Needs massive amounts of simulated or archival data on decision sequences and outcomes. Requires a great deal of computation. The curse of dimensionality makes it hard to apply in offline contexts: easy for strictly digital applications, but robots in real life may break down and wear out. Assumes the world is Markovian (the next state depends only on the current state and action) → this is not always the case

Efforts to predict job impact with advancements in AI

McKinsey study: by 2030, intelligent agents and robots could eliminate as much as 30% of the world's human labor, displacing the jobs of as many as 800 million people. There is a growing division between wealth creation and distribution. Impacts on jobs will potentially differ across wage levels

In what contexts might explainability matter most and why?

Medicine. Equal employment guidelines: the burden of proof lies with organizations in lawsuits. Autonomous vehicle systems: when something goes wrong, we need to explain what went wrong. General Data Protection Regulation: data privacy law → people have a right to a meaningful explanation of the logic involved

What is data exploitation and how does it work?

Modern devices collect, store, use and sell terabytes of users' personal data. Many consumers are unaware of how much information their software and devices generate, process and share. The potential for exploitation only goes up as our reliance on digital technologies increases. Common theme across exploitation → misuse and abuse by powerful, centralized data entities

ML: Automating decision rules

Most existing enterprise software is the automation of decision rules. Decision rules (business logic) are encoded in software, the equivalent of an expert encoding their know-how in if-then statements → this is not machine learning

Future importance of cloud computing

Much AI-driven computing demand will be met using cloud resources

How many samples per second is music recorded?

Music recorded at 44,100 samples per second (44.1 kHz)

What is backpropagation?

Neural networks feed forward and backpropagate hundreds of times to tune the weights and find the combination that generates the smallest prediction error. During backpropagation, error information is fed back into the network, and the weights are adjusted through gradient descent
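A compact sketch of feedforward plus backpropagation for a network with one input, a small hidden layer of sigmoid neurons and a linear output, trained with per-sample gradient descent on squared error. The data (points on y = x²), layer size, learning rate and epoch count are all hypothetical choices.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, hidden=3, lr=0.1, epochs=2000, seed=0):
    """Return the mean squared error recorded during each epoch."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            # Feedforward
            h = [sigmoid(w * x + b) for w, b in zip(w1, b1)]
            y_hat = sum(w * hi for w, hi in zip(w2, h)) + b2
            err = y_hat - y
            total += err ** 2
            # Backpropagation: chain rule from the loss back to every weight
            d_out = 2 * err                                   # dLoss/dy_hat
            for j in range(hidden):
                d_hidden = d_out * w2[j] * h[j] * (1 - h[j])  # uses w2[j] before it is updated
                w2[j] -= lr * d_out * h[j]
                w1[j] -= lr * d_hidden * x
                b1[j] -= lr * d_hidden
            b2 -= lr * d_out
        losses.append(total / len(data))
    return losses

losses = train([(0.0, 0.0), (0.5, 0.25), (1.0, 1.0)])
print(f"first epoch MSE {losses[0]:.4f}, last epoch MSE {losses[-1]:.4f}")
```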

What are neurons in a neural network?

Neurons → look at the data coming in, and if it fits the category (the activation is strong enough), the signal is fired on to the next neuron

Disadvantages of an organic network

Nobody owns it and no one is in charge; net neutrality

Why is packet switching more efficient?

It is not cost-effective to maintain dedicated telephone-like connections between different computers

What is the objective of data scientists creating ML algorithm?

Objective is to fit the model to minimize prediction error

Use cases for deep learning

Oil & gas industry: optimize supply chains, predict machine failure. Construction industry: run simulations to find the fastest way to build projects (how to lay out pipes/concrete). Financial services: use deep learning text analytics to detect insider trading. Cybersecurity: detecting security threats. Social media: Pinterest suggested images

Challenge with reinforcement learning?

One challenge is teaching the agent to correlate an immediate action with delayed rewards → the Bellman equation helps the agent solve the problem in terms of the current reward and potential future rewards
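In symbols, assuming a deterministic environment where action a in state s leads to state s′ with reward r (γ, the discount factor, and α, the learning rate, are notation introduced here), the Bellman equation writes the value of an action as the immediate reward plus the discounted value of the best next action, and tabular Q-learning nudges each Q-table entry toward that target:

```latex
Q(s, a) = r + \gamma \max_{a'} Q(s', a')

Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

With γ between 0 and 1, a reward k steps in the future is worth γ^k today, which is how a delayed reward gets credited back to the earlier actions that led to it.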

Applications of blockchain beyond crypto

Organic produce; diamond certifications; insurance claims processing; cross-border payments; real estate deeds; birth certificates; bike sharing; global ID; voting

Packet switched networks

Packet-switched networks move data in packets, routed according to the destination address in each packet. When received, the packets are reassembled in the proper sequence (not as fast, but much more efficient)
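A toy sketch of splitting and reassembly, assuming each packet carries only a sequence number and a chunk of the message (real TCP/IP headers also carry addresses, checksums and more). The shuffle simulates packets arriving out of order over different routes.

```python
import random

def to_packets(message, size):
    """Split a message into numbered packets: (sequence number, chunk)."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets):
    """Packets may arrive out of order; sort by sequence number to rebuild the message."""
    return "".join(chunk for _, chunk in sorted(packets))

packets = to_packets("Packets can take different routes.", size=8)
random.Random(1).shuffle(packets)  # simulate out-of-order arrival
print(reassemble(packets))
```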

When is reinforcement learning useful?

Particularly useful when preparing training data is hard and the best strategy is difficult to know ahead of time. Example: how would you prepare training data for winning a chess match?

Why not always use explainable AI?

Interpretable algorithms carry a performance penalty (interpretable models tend to be less accurate). There is a tradeoff between predictive power and interpretability

What is compression and how is it calculated?

Photos, videos, etc. are compressed on an iPhone, usually using lossy formats that discard "unnecessary" information. Compression = 1 - (compressed file size / raw file size)
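Worked through with the figures from these cards (44,100 samples per second, 16 bits per sample, about 3 MB for a 3-minute MP3, 1 MB = 1,000,000 bytes) plus one assumption, stereo audio with two channels, the formula gives a compression of roughly 91%:

```python
SAMPLE_RATE = 44_100      # samples per second
BITS_PER_SAMPLE = 16
CHANNELS = 2              # assumption: stereo recording
SECONDS = 3 * 60          # a 3-minute song

raw_bytes = SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS * SECONDS // 8  # 8 bits per byte
mp3_bytes = 3_000_000     # about 3 MB on a phone
compression = 1 - mp3_bytes / raw_bytes
print(f"raw: {raw_bytes / 1e6:.1f} MB, compression: {compression:.0%}")
```

The uncompressed recording is about 31.8 MB, so the MP3 keeps less than a tenth of the raw data.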

How do pixels work?

In a black-and-white (1-bit) image, each pixel is a 1 or a 0: 1 meaning on, 0 meaning off. Color pixels need more bits per pixel.

What does it mean for ML to make a prediction?

Prediction is a general term for making a recommendation of many different kinds within an organization: Should we invite an applicant for an interview? Should we sell an asset? Should we provide this customer with an e-coupon? ML can also make numeric predictions: estimating the price of a house, demand for a product, or the number of riders on the subway

Challenges with collaborative filtering

Privacy; how to initialize the matrix; bias toward old preferences; does not work well with rapidly changing user preferences; unpredictable items are hard to recommend accurately

Privacy and HR

Privacy: data capture promises new battles over worker privacy. Facebook and access to images: drunk photos; photos of women vs. photos of women with children (the latter less likely to get hired). We trade privacy for discounts in many markets. As with credit histories, opting out may not be a choice

Problem with anti-classification

Problem: other factors in the model may correlate with the removed variables

Problem with demographic parity

Problem: qualified candidates miss out, artificial quota not based on performance or merit

What is proof of work?

Proof of work: a problem where the likelihood of solving it depends strictly on computer hardware. Proof of work makes it exceedingly expensive to recompute a chain quickly enough to satisfy the consensus protocol. Why is it hard? → to confirm a transaction block, the network is asked to solve a completely arbitrary mathematical problem that can only be solved through trial and error
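A toy proof-of-work sketch, assuming a simplified rule: find a nonce such that the SHA-256 hash of the block data plus the nonce starts with a given number of zero hex digits. Bitcoin's real rule (double SHA-256 against a numeric target) differs in detail, but the trial-and-error search is the same idea, and checking a claimed answer takes a single hash.

```python
import hashlib

def mine(block_data, difficulty=4):
    """Try nonces until the SHA-256 hex digest starts with `difficulty` zeros."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1  # no shortcut: just try the next candidate

nonce, digest = mine("Alice pays Bob 5", difficulty=4)
print(nonce, digest)
```

Each extra zero digit multiplies the expected number of attempts by 16, which is what makes recomputing a long chain so expensive.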

How is ML different from automated software rules

Rather than explicitly coding decision rules into software, we can provide examples of decisions being made under different data conditions and the machine will learn how to classify new data and make predictions.

Business applications of generative AI

Real estate: modeling houses automatically through AI based on client preferences. Finance: creating presentations based on data. Skincare: creating new formulas. Text generation: law industry. Marketing design: auto-generating logos. Creating new data for other AI initiatives. Creating code

Where is data-driven decision making useful in HR?

Recruiting; selection; onboarding; training; performance management; advancement; retention; employee benefits

What does pruning do?

Reduces model complexity by cutting off branches of the tree, which reduces overfitting

Who should deal with AI bias? Regulate it?

Regulators? Managers? Data scientists? AI councils? They have a responsibility to answer big human rights questions. AI education → people other than data scientists need to be aware of and think about these issues

How do rewards work in reinforcement learning?

Rewards are assigned based on performing a sequence of correct actions based on decisions

Receiver operating characteristic (ROC) curve

ROC curves visualize how an ML algorithm trades off errors as its discrimination threshold is varied → as you change the algorithm to favor one type of error, how does the other type of error change? The curve plots the true positive rate (y-axis) against the false positive rate (x-axis)
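Each point on an ROC curve corresponds to one threshold. The sketch below, using hypothetical scores and labels, computes (false positive rate, true positive rate) pairs for three thresholds; lowering the threshold raises both rates.

```python
def roc_points(scores, labels, thresholds):
    """Return (FPR, TPR) pairs; predict positive when score >= threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # hypothetical model scores
labels = [1,   1,   0,   1,   0,   1,   0,   0]    # actual classes
thresholds = [0.85, 0.5, 0.25]
for t, (fpr, tpr) in zip(thresholds, roc_points(scores, labels, thresholds)):
    print(f"threshold {t}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```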

What is a router and what does it do?

Routers connect the internet's individual networks (subnets). They cooperate to give each packet an end-to-end route and need to be very fast

Summary of difference between rules based and ML

Rules-based software allows us to automate tasks where a human can lay out the decision process. Machine learning enables automation of tasks where examples are available to reverse-engineer the best decision rules

Approaches to explainable AI

SHAP; LIME; surrogate models; interpretable deep learning

3 different scenarios of job displacement: What are the critical incentives? What is the policy preparation?

Scenario 1: efficiency gains with minor displacement. Scenario 2: significant displacement (20-30%) requiring massive reskilling for reemployment. Scenario 3: massive workforce replacement (70%+) by robots/algorithms

Screening - HR and AI

Screening: reducing information asymmetry by analyzing direct signals of job performance. Scale through archival data → Stack Overflow. Before: wait for an applicant to apply. Now: search out applicants who can ask and answer the relevant types of questions. Use granular data for better management of information work. Example: tracking the flow of information in service work

Method of unsupervised learning: clustering and segmentation

Segment customers or data points into different categories by grouping them into clusters based on similar characteristics. K-means clustering: creates k clusters and assigns points to clusters so as to minimize the sum of squared distances between each point and its cluster center
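A minimal one-dimensional k-means sketch (Lloyd's algorithm), assuming hypothetical customer spending values and hand-picked starting centers: each point is assigned to its nearest center, each center moves to the mean of its cluster, and the two steps repeat.

```python
def k_means(points, centers, iterations=10):
    """Alternate assignment and update steps; return final centers and clusters."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assignment step: nearest center by squared distance
            nearest = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each center to its cluster's mean (keep it if the cluster is empty)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

spending = [10, 12, 11, 50, 52, 49, 90, 95]  # hypothetical annual spend per customer
centers, clusters = k_means(spending, centers=[10, 50, 90])
print(centers, clusters)
```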

How do WANs work?

Several WAN connections converge at a Network Access Point (NAP). Backbone providers own and maintain devices at NAPs. Hundreds of fiber-optic cables run underground between cities

SHAP

Shapley Additive Explanations (SHAP) computes feature importance. SHAP computes predictions with and without each feature → pulls features out to see how the model's output changes
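SHAP is built on Shapley values. The brute-force sketch below computes exact Shapley values for a hypothetical two-feature model by averaging each feature's marginal contribution over every coalition of the other features, assuming that "removing" a feature means setting it to a baseline value. The SHAP library uses faster approximations; enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

def model(features):
    # Hypothetical model: a linear score plus an interaction term
    return 3 * features["income"] + 2 * features["tenure"] + features["income"] * features["tenure"]

def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names = list(instance)
    n = len(names)
    values = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for size in range(n):
            for coalition in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_f = {f: instance[f] if f in coalition or f == name else baseline[f] for f in names}
                without_f = {f: instance[f] if f in coalition else baseline[f] for f in names}
                total += weight * (model(with_f) - model(without_f))
        values[name] = total
    return values

instance = {"income": 2.0, "tenure": 1.0}
baseline = {"income": 0.0, "tenure": 0.0}
phi = shapley_values(model, instance, baseline)
print(phi)
```

The values add up to the difference between the prediction and the baseline prediction (7 + 3 = 10 here), which is the additivity the name refers to; the interaction term's contribution is split evenly between the two features.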

What is simulatability?

Simulatability: can you reproduce a decision? Decomposition into component parts → see inside the algorithm

Why might the model produce incorrect predictions for certain individuals in Lab 1?

The model can still produce incorrect predictions because it makes predictions from a pattern averaged over all the data provided. Data anomalies and outliers that do not fit the regression model will therefore receive incorrect predictions.

How did the model learn in Lab 1?

The model was fed massive amounts of structured data (health indicators), with a different weight placed on each indicator. The model learns by using these inputs to make predictions and adjusting the weights so that its predictions match the "Diabetes?" column.

What is the role of humans in reinforcement learning

The only thing humans provide are the rules and the objective

Where does ML bias come from?

Training data and human processes

ML Bias: training data

Training data → ML algorithms mimic human decisions, so if a machine is trained on biased data, it will learn to be biased. Structural bias: bias may not be intentional, e.g., it can come from a lack of adequate training data

What are transformers and what is an example of one?

Transformers → take into account a sequence of items (like words), are trained as language models and are fine-tuned for specific tasks. Uses: writing contracts, law industry, sports journalism, nuanced journalism. Example: GPT-3, which produces human-like text

What are transistors?

Transistors are small electronic devices with a semiconductor - can have millions in a very small space and operates like a switch

What is the problem blockchain solves?

We already have good data storage technologies (databases), but we need an alternative: because databases can be updated or deleted, their maintainer must be a trusted third party. The blockchain is a software protocol that achieves this trust algorithmically, so a trusted third party is not needed

Tradeoffs and challenges with ML systems

We need many examples from which to learn (lots of data). We need to define the label (what we are predicting) and the features (what we use to make predictions), and we need to define them very well (which is why data is so important)

Why do we need so many performance metrics?

We need to optimize for different things in different industries

What is the problem of overfitting?

We want the model to work on other data, not just the training data. Training an algorithm implies a tradeoff between performance on the training data and performance on other data sets

Some questions with AI and job displacement

What job skills or tasks are uniquely human? Are there any? What should policy makers be most concerned about? Are some countries better prepared to make these adjustments than others?

Competing stakeholders and metrics

What to prioritize: profits or fairness? What information to share with the public? Example: ProPublica and Northpointe bail risk scores → a problem for regulators

What is a WAN?

Wide Area Network

A publishing company uses a ML model to see if they should accept manuscripts or not. They use old books that were accepted and rejected as training data to decide if manuscripts should be accepted or not. Would this model be biased? What measure of accuracy / fairness would you use for this model?

Yes. Training data: the algorithm learns from past decisions made by humans, who do have biases; those who picked manuscripts may have picked ones from authors they knew or that appealed to them. Accuracy metric: precision (the share of books identified as good that actually are good). You could miss out on publishing a good book, but you don't want to publish a terrible book (classified as good but actually bad, i.e., a false positive). Fairness metric: predictive parity (precision rates are similar across all groups)

What is an integrated circuit?

a collection of thousands or millions of transistors placed on a small silicon chip (fundamental building blocks of computers)

What is the backbone network?

a set of interconnected Wide Area Networks (WAN) across the country (high speed, city-to-city, large service providers)

What is TCP/IP?

A set of rules for transmitting data between computers. The TCP/IP protocol is the addressing system → it allows any two computers on the internet to exchange data

What is an IP address?

A unique string of 32 1s and 0s (an IPv4 address). A domain name such as www.upenn.edu corresponds to an IP address
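The 32 bits are usually written as four 8-bit numbers in dotted-decimal form. A small sketch; the function name and the example address are hypothetical.

```python
def bits_to_dotted(bits):
    """Convert a 32-character string of 0s and 1s into dotted-decimal IPv4 notation."""
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    # Each group of 8 bits (one byte) becomes a number from 0 to 255
    return ".".join(str(int(bits[i:i + 8], 2)) for i in range(0, 32, 8))

address_bits = "11000000" + "10101000" + "00000001" + "00000001"
print(bits_to_dotted(address_bits))
```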

What is local access network?

access to individual computers (like Penn Net)

Bias in deep learning

Accurate prediction relies on all salient conditions being well represented in the training data. Imagine a training data set where all masked faces are women and all unmasked faces are men: the model could learn to treat the mask as a signal of gender

Feature engineering

adding more columns, changing columns, etc. This can be a difficult process because it requires domain expertise and is time-consuming and expensive.

What is a hash

an identifiable digital fingerprint for each block. When you change what is inside a block, the hash changes.
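A minimal sketch of the "fingerprint" idea, assuming SHA-256 from Python's `hashlib` (a common choice; the card does not name a specific hash function). The block contents are hypothetical strings.

```python
import hashlib

def block_fingerprint(data: str) -> str:
    """SHA-256 digest of the block contents, as 64 hex characters."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

original = block_fingerprint("Alice pays Bob 10")
tampered = block_fingerprint("Alice pays Bob 100")
print(original)
print(tampered)
print(original == tampered)  # False: a one-character change gives a different hash
```

The same input always gives the same hash, so anyone can recompute it and notice when a block's contents no longer match its fingerprint.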

What is Gradient Descent?

an iterative optimization algorithm used to find a local minimum of a function (or, run as gradient ascent, a local maximum). The algorithm searches for the best place to set the boundary through intelligent trial and error.
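A minimal one-variable sketch of gradient descent. The loss function f(x) = (x - 3)² is hypothetical, chosen because its minimum is obviously at x = 3; the `learning_rate` parameter is the step size described on the "Learning rate" card.

```python
def gradient_descent(grad, x0: float, learning_rate: float, steps: int) -> float:
    """Repeatedly step against the gradient to approach a local minimum."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # move downhill by learning_rate * slope
    return x

# Hypothetical loss f(x) = (x - 3)^2, minimized at x = 3; its gradient is 2(x - 3).
def loss_gradient(x: float) -> float:
    return 2 * (x - 3)

x_min = gradient_descent(loss_gradient, x0=0.0, learning_rate=0.1, steps=100)
print(round(x_min, 6))  # 3.0
```

With a learning rate that is too large (for this function, anything above 1.0), each step overshoots the minimum by more than it corrects, and x moves further away instead of converging.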

How do we get access to labelled (organized) data?

archival sources, customer contributions, ghost-work (workers being paid to label data)

Why is ML so important?

ML can automate a large class of tasks that were difficult to automate by hand-encoding business rules, and it is not specific to any one domain.

Bytes Megabit Megabyte

bit = 0 or 1
1 byte = 8 bits
1 megabit = 125,000 bytes = 1,000,000 bits
1 megabyte = 8 megabits = 1,000,000 bytes = 8,000,000 bits
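The conversions above as a small Python sketch. It assumes decimal (SI) prefixes, as the card does (1 mega = 1,000,000); binary units such as the mebibyte (1,048,576 bytes) would give slightly different numbers. The function names are illustrative.

```python
BITS_PER_BYTE = 8
MEGA = 1_000_000  # decimal (SI) prefix, matching the card

def megabytes_to_bits(megabytes: float) -> float:
    return megabytes * MEGA * BITS_PER_BYTE

def megabits_to_bytes(megabits: float) -> float:
    return megabits * MEGA / BITS_PER_BYTE

def megabytes_to_megabits(megabytes: float) -> float:
    return megabytes * BITS_PER_BYTE

print(megabits_to_bytes(1))      # 125000.0
print(megabytes_to_bits(1))      # 8000000
print(megabytes_to_megabits(3))  # 24  (e.g. a 3 MB file is 24 megabits)
```

The practical reason to keep the two units straight: network speeds are usually quoted in megabits per second, while file sizes are usually quoted in megabytes.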

Anti-classification

a "blind" process: drop or remove the sensitive variables (such as protected attributes) that add bias, so the model cannot use them

How do packets know where to go?

communication: TCP/IP is a set of rules for transmitting data between computers, and each computer has an IP address that tells packets where to go.

Why not always use deep learning?

complex and time-consuming; needs lots of data; can be resource-intensive; lacks interpretability

What is the role of AI in big tech?

vertical integration: going backward in the supply chain and building chips yourself

How do you improve image quality?

having more pixels (making boxes smaller)

Activation function

how node inputs are mapped to outputs
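A minimal sketch of how a single node uses an activation function, assuming two common choices, sigmoid and ReLU (the card does not name specific functions). The weights and inputs in the usage lines are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    """Squashes any input into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

def relu(z: float) -> float:
    """Passes positive inputs through unchanged and maps negatives to 0."""
    return max(0.0, z)

def node_output(inputs, weights, bias, activation):
    """A single node: weighted sum of inputs plus bias, then the activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

print(sigmoid(0))              # 0.5
print(relu(-2.0), relu(3.0))   # 0.0 3.0
print(node_output([1.0, 2.0], [0.5, 0.5], 0.0, relu))  # 1.5
```

Without a non-linear activation, stacking layers would still compute only a weighted sum, so the activation is what lets hidden layers capture more complex patterns.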

examples of unstructured data

images, sounds, text

What kind of AI did the diabetes prediction lab use

machine learning model — "logistic regression"

Disadvantages of ending net neutrality

monopolies: big companies could pay to fast-track their content, leaving small companies behind; censorship issues

What does it mean if there are more transistors in a chip?

more powerful chips

Performance metrics: specificity

negative instances correctly identified (not a cat, called it not a cat) Specificity = TN / (TN + FP)

What are deep learning networks

neural networks with an input layer, hidden layers and an output layer

Performance metrics: precision

positive identifications that were correct (fraction of times you called it a cat and it was actually a cat) Precision = TP / (TP + FP)

Performance metrics: sensitivity (recall)

positive instances correctly identified (fraction of times it was a cat and you called it a cat; the counterpart of precision) Recall = TP / (TP + FN)
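The precision, recall and specificity formulas (plus accuracy) computed from confusion-matrix counts in a short Python sketch. The counts for the cat classifier are hypothetical.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the card formulas from the four confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),    # of images called cats, fraction really cats
        "recall": tp / (tp + fn),       # of real cats, fraction called cats
        "specificity": tn / (tn + fp),  # of non-cats, fraction called non-cats
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Hypothetical counts for a cat classifier on 100 images.
metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the sketch prints precision 0.800, recall 0.889, specificity 0.818 and accuracy 0.850, showing how one model can look better or worse depending on which metric you pick.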

Predictive parity

precision rates are similar across all groups

Demographic parity

protected classes receive a positive outcome at the same rate as the unprotected class. Example: the ⅘ (four-fifths) rule for promotions of women vs. men
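A minimal sketch of checking demographic parity with the four-fifths rule: the protected group's selection rate should be at least 80% of the reference group's rate. The promotion counts are hypothetical.

```python
def selection_rate(selected: int, total: int) -> float:
    return selected / total

def passes_four_fifths_rule(protected_rate: float, reference_rate: float) -> bool:
    """True if the protected group's rate is at least 80% of the reference group's rate."""
    return protected_rate / reference_rate >= 0.8

# Hypothetical promotion data.
women_rate = selection_rate(selected=30, total=100)  # 0.30
men_rate = selection_rate(selected=50, total=100)    # 0.50
print(round(women_rate / men_rate, 2))                # 0.6
print(passes_four_fifths_rule(women_rate, men_rate))  # False
```

Note that this check looks only at outcomes, not at whether those outcomes were correct; that is what separates demographic parity from metrics like predictive parity or equalized odds.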

Recent AI applications and Moore's Law

recent AI applications are now outpacing Moore's Law. Tesla got a new chip with a 90x improvement in processing power; a 90x jump from the old chip to the new one is roughly a decade of Moore's Law.
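A quick worked check of the "90x ≈ a decade" claim in Python. The result depends on the doubling period you assume: the popular 18-month version gives roughly a decade, while the two-year period on the Moore's Law card gives closer to 13 years.

```python
import math

def doublings_needed(improvement: float) -> float:
    """How many times performance must double to reach the given improvement factor."""
    return math.log2(improvement)

def years_of_moores_law(improvement: float, years_per_doubling: float) -> float:
    return doublings_needed(improvement) * years_per_doubling

print(round(doublings_needed(90), 2))          # 6.49
print(round(years_of_moores_law(90, 2.0), 1))  # 13.0
print(round(years_of_moores_law(90, 1.5), 1))  # 9.7
```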

How do underwater cables send data overseas?

the cables sit on the floor of the ocean, which makes them vulnerable (an Asian exchange was taken out for three and a half days after an incident)

Learning rate

step size to use when adjusting model parameters

What is training data?

the data used to fit the model (the model learns from this data)

What does Net Neutrality mean?

the desire to have ISPs treat all content equally and not block, slow down or charge money for specific online content. This is controversial (censorship, regulation).

What is 5G?

the next evolution in wireless data transmission. It promises lower latency and higher bandwidth and is used in combination with 4G and other network protocols.

Moore's Law

the number of transistors in a dense integrated circuit (IC) doubles about every two years → computer chips roughly double their processing power every two years

What is the test data set?

the sample of data used to provide an unbiased evaluation of a final model fit on the training dataset. You normally start with labeled data and then divide it into training and test data (e.g., a 70/30 split).
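A standard-library sketch of the 70/30 split. The data and the function name are hypothetical; in practice libraries such as scikit-learn provide a ready-made `train_test_split`. A validation set can be carved out of the training portion the same way.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle labeled rows, then cut them into training and test sets."""
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical labeled data: (feature, label) pairs.
data = [(i, i % 2) for i in range(100)]
train, test = train_test_split(data)
print(len(train), len(test))  # 70 30
```

Shuffling first matters: if the rows were sorted (say, by date or by label), taking the first 70% unshuffled would give a test set that does not look like the training data.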

Applications of the blockchain

Credentialing, personal histories and recruiting information; could help navigate GDPR; faster cross-border payments; human capital passports

For Lab 1, under what conditions might you be more concerned about the false negative rate? How about false positive rate?

A healthcare manager might be more concerned about the false negative rate when dealing with elderly people or people at higher risk of serious illness if their diabetes is left undiagnosed or untreated. The manager would be more concerned about a high false positive rate when a positive result leads to giving patients medication or treatment that is unnecessary and potentially harmful.

Expert System

A rule-based system that acquires and stores human knowledge in the form of if/then rules
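A minimal sketch of an expert system's if/then rules in Python. The rules, thresholds and fact names below are entirely hypothetical; the point is that a human writes every rule by hand, and the program only checks which one applies (unlike ML, which learns the mapping from data).

```python
from typing import Callable, Dict, List, Tuple

# Each rule is (description, IF-condition, THEN-conclusion). All rules here are hypothetical.
Rule = Tuple[str, Callable[[Dict[str, float]], bool], str]

RULES: List[Rule] = [
    ("large transfer from new account",
     lambda f: f["amount"] > 10_000 and f["account_age_days"] < 30,
     "flag for manual review"),
    ("foreign transaction",
     lambda f: f["is_foreign"] == 1,
     "send confirmation text"),
]

def infer(facts: Dict[str, float], rules: List[Rule], default: str = "approve") -> str:
    """Fire the first rule whose IF-part matches the facts; otherwise use the default."""
    for _description, condition, conclusion in rules:
        if condition(facts):
            return conclusion
    return default

print(infer({"amount": 15_000, "account_age_days": 5, "is_foreign": 0}, RULES))
# flag for manual review
```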

true positive

fraction of times a positive instance is identified as positive

Why is the lack of a trusted third party important?

Can we trust third parties with our data? There are growing concerns about big tech. There is also a market failure where the right incentives for commercial investment do not exist.

What is causal inference?

Causal inference is the process of determining the independent, actual effect of a particular phenomenon that is a component of a larger system.

How is the software market changing

Changing the market: Google released TensorFlow, which black-boxes the complexity associated with building ML algorithms, so people can use it (install or download it) without worrying about the internals. TensorFlow is only one of several competing deep learning frameworks. Google then introduced Teachable Machine.

Circuit switched networks

Circuit switched networks: the whole line is closed off to a single user. They require dedicated point-to-point connections during calls (one to one).

Methods of unsupervised learning

Clustering/segmentation, dimensionality reduction, anomaly detection, collaborative filtering
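A minimal sketch of one clustering method, k-means, on one-dimensional data. The spend figures and starting centroids are hypothetical; note that no labels are given, and the algorithm finds the two groups on its own.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data: assign each point to its nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical unlabeled data: customer spend, with no labels attached.
spend = [1, 2, 3, 10, 11, 12]
centers, groups = kmeans_1d(spend, centroids=[1, 12])
print(centers)  # [2.0, 11.0]
print(groups)   # [[1, 2, 3], [10, 11, 12]]
```

The result depends on the starting centroids and the chosen number of clusters, which is one reason clustering output still needs human interpretation.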

What makes bias challenging in a ML context

Competing stakeholders and competing metrics; differing definitions of fairness in ML; fairness comes at a price; who should deal with AI bias? Should it be regulated? AI regulation is complicated.

When to use deep learning?

Complex tasks with hidden patterns and unstructured data (image classification, NLP, speech recognition). Tradeoff between accuracy and interpretability: high accuracy (complex model), but it is hard to understand how the model works. Time and resources: lots of processing power required. You need enough labeled data.

What predicts higher productivity

Connecting information work to revenue: access to information diffusion predicts individual productivity. Each additional keyword seen is associated with about $70 of extra revenue. Seeing information sooner also predicts higher productivity.

4 key inputs to AI adoption

Software, skills, computation, data

4 - data

Software, skills and computation are becoming commodities → competition comes down to data. Is data the new oil? A virtuous cycle of data collection can have competitive implications. The top 5 S&P firms are data platforms → never before have the top 5 firms all been in the same industry.

1 - software: what does this entail?

Software: creating ML algorithms from scratch requires substantial expertise. Programming a neural network from scratch is complex: it generally requires PhD-level training in computer science or a related field, and hiring for these technical and scientific skills is difficult.

How can we fix training data to remove bias?

Synthetically alter the training data; add more training data

What are the 2 kinds of discrimination?

Taste-based discrimination: prejudicial decision making. Statistical discrimination: decision makers consider protected attributes to achieve non-prejudicial goals (economically advantageous, but it creates a biased outcome).

false positive

fraction of times a negative instance is identified positive

How does wireless work?

The FCC manages the allocation of different frequency spectrums to companies through auctions. When companies win, they receive a slice of the spectrum, and companies that own parts of the wireless spectrum can use it for data transfer.

How does a blockchain work

The blockchain can be viewed as a ledger, where identical copies of this ledger are stored across all participating nodes of the network.
1. Transactions (differences) are recorded in a sequential chain of blocks. Each block contains data, a hash, and the hash of the previous block. The current state of the world is computed by summing up all the blocks.
2. Transactions between two parties are verified using public-key cryptography (as in PGP): the private key is used to sign (encode) data, and the public key is used to verify (decode) it. Since the data is locked with a private key, it must have come from the owner of that private key.
3. These transactions are then confirmed by the network and stored in the blockchain.

What is the blockchain

The blockchain is a way for two parties to exchange value without a trusted third party

Bias-variance tradeoff

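The challenge is capturing the underlying pattern (the model) without also capturing the noise: a model that is too simple misses the pattern (high bias), while a model that is too complex fits the noise as well (high variance).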

What is a consensus protocol?

The consensus protocol makes sure that every new block added to the blockchain is the one and only version of the truth agreed upon by all the nodes in the blockchain. Blocks are linked: if a block is tampered with, its hash changes, and every subsequent block would also have to change, because each one holds the hash of the previous block.
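A simplified sketch of why tampering is detectable in a hash-linked chain, assuming SHA-256 and hypothetical transaction strings. It deliberately leaves out everything that makes a real blockchain work across many nodes (signatures, mining or proof-of-work, the consensus protocol itself); it only shows the linking.

```python
import hashlib

GENESIS_PREV = "0" * 64  # hypothetical all-zero "previous hash" for the first block

def block_hash(data: str, prev_hash: str) -> str:
    """A block's hash covers its own data and the previous block's hash."""
    return hashlib.sha256((prev_hash + data).encode("utf-8")).hexdigest()

def build_chain(records):
    chain, prev = [], GENESIS_PREV
    for data in records:
        h = block_hash(data, prev)
        chain.append({"data": data, "prev_hash": prev, "hash": h})
        prev = h
    return chain

def is_valid(chain) -> bool:
    """Recompute every hash and check each block points at its predecessor's hash."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev or block["hash"] != block_hash(block["data"], prev):
            return False
        prev = block["hash"]
    return True

chain = build_chain(["Alice pays Bob 10", "Bob pays Carol 5", "Carol pays Dan 2"])
print(is_valid(chain))  # True
chain[1]["data"] = "Bob pays Carol 500"  # tamper with the middle block
print(is_valid(chain))  # False
```

Even if the tamperer also recomputes the middle block's hash, the next block's stored `prev_hash` no longer matches, so every later block would have to be rewritten too, and other nodes holding the original copy would disagree.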

Why is the internet not an intelligent network?

The internet was designed to be organic: its role is simply to move data packets from one computer to another.

false negative

fraction of times a positive instance is identified as negative

Data privacy and consent: laws?

There is no constitutional right to privacy of information. Existing laws protect our privacy against public intrusion, but not against private collection of information.

What is the goal of unsupervised learning

To find patterns in unlabeled data without human supervision. This is different from trying to predict an outcome

What is unsupervised learning?

Unsupervised learning is a machine learning technique in which the users do not need to supervise the model. Instead, it allows the model to work on its own to discover patterns and information that were previously undetected. It mainly deals with unlabeled data.

What is Deep reinforcement learning

Uses deep learning to approximate different states and actions rather than being given instructions. It takes high-dimensional data and generates abstractions: deep learning converts images into states and actions, and the system learns to play a game on its own without being given any rules.

Next generation: quantum computing

Uses qubits, a different way of holding information. This lets you process certain kinds of information far more efficiently than binary computation. There are still lots of problems; practical use is likely decades in the future.

AI developed device that works as a pesticide (kills weeds) - what would you want to know?

Use sensitivity/recall, because I would be worried about false negatives: pests that were there but went unidentified can cause a lot of harm.

How does machine learning work

Use training data to train the machine (learning). The machine automatically works out from these examples how the data should be mapped to decisions. This allows the machine to learn the optimal decision to make under different conditions.

When to use accuracy?

Use when false negatives and false positives have similar costs (consumer targeting)

When to use sensitivity / recall

Use when identifying positives is crucial and false negatives are unacceptable (identifying a deadly disease)

When to use precision

Use when false positives are unacceptable (labeling emails as spam: you would rather have some spam in your inbox than miss regular emails that were incorrectly sent to spam)

When to use specificity

Use when you do not want to raise a false alarm, i.e. where a false positive is unacceptable (a drug test in which anyone who tests positive goes to jail)

Who provides ISP network

Usually provided by local telephone or cable TV carriers (Comcast, Verizon). ISPs are consumer-facing, with salespeople, technicians, etc. It is important to know that these are physical cables owned by actual companies.

What are VAEs and applications

VAEs → variational autoencoders. These tools learn a latent and interpretable representation of input data, which allows construction of new output from these latent attributes. Applications: image segmentation (self-driving cars); taking something and adding to it → making someone's face older, adding glasses.

Ethical issues with the use of GAN technologies

Value of original work? Fake news Fake images of people

Examples of backbone network

WAN owners compete with each other - AT&T, Sprint

How does WEKA work?

WEKA is a machine learning software tool. Press "run" → predictions are made using the data provided.

Downsides of blockchain

Wasteful and costly in terms of energy. A blockchain can currently implement only a few transactions per second, compared with thousands of transactions per second for most financial databases.

If software, skills and computation become more widely available, what factors shape AI competition?

data

What is cloud computing?

data and computing occur on a network of remote computers, not on your own computer or laptop - Enables new models of computing

How does data travel over the network?

data is transmitted as a sequence of packets - Packet switched vs. circuit switched
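A toy Python illustration of packet switching: the message is cut into numbered packets that may arrive in any order, and the receiver uses the numbers to put them back together. Real TCP/IP packets carry headers with addresses and sequence information; this sketch reduces all of that to a single integer per packet, and the message is hypothetical.

```python
import random

def to_packets(message: str, size: int):
    """Split a message into numbered packets of at most `size` characters."""
    return [(seq, message[i:i + size]) for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets) -> str:
    """Packets may arrive in any order; sequence numbers restore the original order."""
    return "".join(chunk for _seq, chunk in sorted(packets))

packets = to_packets("Packets can take different routes.", size=8)
random.Random(0).shuffle(packets)  # simulate out-of-order arrival
print(reassemble(packets))         # Packets can take different routes.
```

Because no line is reserved end to end, many users' packets can share the same links, which is the key contrast with circuit switched networks.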

What is unstructured data

data that is not in columns nor sorted into features

What is the validation dataset?

data used to evaluate a model while fine-tuning its hyperparameters

What is wireless network?

different from physical cables, another way to send and receive data

Type 2 errors

false negative

Type 1 errors

false positive

Performance metrics: accuracy

fraction of labels correctly predicted Accuracy = correct predictions / total predictions = (TP + TN) / (TP+TN+FP+FN)

true negative

fraction of times a negative instance is identified as negative

Equalized odds

true positive and false positive rates are similar across groups
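A minimal sketch that computes, per group, the rates compared by equalized odds (true positive and false positive rates) and by predictive parity (precision). The labels, predictions and the 0.05 tolerance for "similar" are all hypothetical.

```python
def group_rates(y_true, y_pred):
    """True positive rate, false positive rate, and precision for one group."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "tpr": tp / (tp + fn),        # equalized odds compares this ...
        "fpr": fp / (fp + tn),        # ... and this across groups
        "precision": tp / (tp + fp),  # predictive parity compares this
    }

def similar(a: float, b: float, tolerance: float = 0.05) -> bool:
    """Hypothetical threshold for treating two rates as 'similar'."""
    return abs(a - b) <= tolerance

# Hypothetical labels (1 = positive) and model predictions for two groups.
group_a = group_rates(y_true=[1, 1, 1, 1, 0, 0, 0, 0], y_pred=[1, 1, 1, 0, 1, 0, 0, 0])
group_b = group_rates(y_true=[1, 1, 1, 1, 0, 0, 0, 0], y_pred=[1, 1, 0, 0, 1, 1, 0, 0])
print(group_a)  # {'tpr': 0.75, 'fpr': 0.25, 'precision': 0.75}
print(group_b)  # {'tpr': 0.5, 'fpr': 0.5, 'precision': 0.5}
print(similar(group_a["tpr"], group_b["tpr"]) and similar(group_a["fpr"], group_b["fpr"]))  # False
```

Here the model fails both definitions; in general the different fairness definitions can conflict, so satisfying one does not guarantee the others.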

Individual fairness

two individuals with similar features should have the same outcome. Counterfactual fairness: the outcome would be the same if demographic attributes were flipped.

How does data move overseas?

underwater cables and wireless

What are features?

what we use to make predictions

What is a label?

what we want to predict

