Analytics
Taxonomy of DSS?
- Communication-driven DSS - Data-driven DSS - Document-driven DSS - Knowledge-driven DSS - Model-driven DSS
What are two common methods to evaluate a classifier?
- Confusion matrices (aka contingency table) - Receiver Operating Characteristic (ROC) Curve
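As a minimal sketch of both methods, the Python below counts the four cells of a binary confusion matrix and then sweeps a decision threshold over classifier scores to get the points of an ROC curve. The function names and the sample labels and scores are made up for illustration, and the code assumes both classes appear in the data.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return tp, fp, fn, tn


def roc_points(y_true, scores):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
        # Assumes at least one positive and one negative example.
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


# Hypothetical labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(confusion_matrix(y_true, y_pred))  # (2, 1, 1, 2)
print(roc_points(y_true, scores))
```

Plotting the returned pairs with the false positive rate on the x-axis gives the ROC curve. A curve that hugs the top-left corner indicates a better classifier.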
Potential Benefits of DSS?
- Decision quality - improved communication - cost reduction - increased productivity - time savings - improved customer and employee satisfaction
What are some practical ML issues or concerns?
- Getting good data - choosing appropriate features - which algorithm to use (not possible to know beforehand.) - How to test it? - how to set user defined parameters
What are some disadvantages of using ML?
- Need lots of data - Gold standard data can be expensive to acquire - good data is hard to find - error prone: it is usually impossible to get perfect accuracy
What are some advantages of machine learning?
- Often more accurate than human-made rules - No need for human presence - Humans often cannot express their knowledge even when they perform a task well - Solutions can be adapted to particular cases - Handles problem sizes too big for humans to reason about
What are the components of a DSS?
- Specialized databases - Analytical models, decision maker insights and judgments -Interactive Graphical User Interface (GUI)
What are the 7 varieties of Data?
- Structured: pre-set format (ex: banking transaction) - Unstructured: no pre-set format (ex: web pages, social media. Currently most data is unstructured) - Semi-Structured: Unstructured data that can be put into a structure using format descriptions - Batch: Big chunks of data, time separated - Streaming: chunks of data, consistent feed - Real Time: analysis to be done immediately - Meta-data: definitions, mapping: i.e., data about data. This helps you to do pre-processing on info
With regard to ML, what are the 4 standard learning scenarios?
- Unsupervised learning: only unlabeled data. (grouping or clustering data based only on features) - Supervised learning: only labeled data (predict class/labels of an unseen item based on its features) - Semi-supervised learning: both labeled and unlabeled data (improve prediction by also using info from unlabeled data) -Reinforcement: agent interaction with environment (learns best behavior based on consequences of actions)
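To make the first two scenarios concrete, here is a small sketch on one-dimensional data. The supervised part uses a nearest-neighbor rule and the unsupervised part uses a two-cluster k-means-style loop. Both are stand-ins chosen for brevity, and every name and data value is hypothetical.

```python
def nearest_neighbor_predict(labeled, x):
    """Supervised: return the label of the labeled training point closest to x."""
    _, label = min(labeled, key=lambda pair: abs(pair[0] - x))
    return label


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two clusters."""
    low, high = min(values), max(values)  # initial cluster centers
    near_low, near_high = list(values), []
    for _ in range(iterations):
        # Assign each value to the closer center (ties go to the low cluster).
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:  # all values identical: nothing to split
            break
        # Move each center to the mean of its cluster.
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return sorted(near_low), sorted(near_high)


# Supervised: features come with labels.
labeled = [(1.0, "small"), (1.2, "small"), (5.0, "large"), (5.5, "large")]
print(nearest_neighbor_predict(labeled, 4.6))  # large

# Unsupervised: features only, groups are discovered.
print(two_means([1.0, 1.2, 0.8, 5.0, 5.5, 4.9]))
```

The supervised function needs the labels to answer at all, while the clustering function never sees a label and only groups values that lie close together.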
What is data mining?
-An attempt to discover patterns, trends, and correlations hidden in the data that can give a strategic business advantage. -Includes the analysis of huge databases/warehouses. - Can highlight buying patterns, reveal customer tendencies, cut redundant costs or uncover unseen profitable relationships and opportunities
What is Type II error?
-False Negative
What is Type I error?
-False Positive
What is the main purpose of data mining?
-Knowledge discovery (a component of some DSS)
What are the 4 characteristics of big data?
-Volume (data at rest): Terabytes to exabytes of existing data to process - Velocity (Data in motion): Streaming data, milliseconds to seconds to respond - Variety (Data in many forms): Structured, unstructured, text, multimedia - Veracity (Data in doubt): Uncertainty due to data inconsistency & incompleteness, ambiguities, latency, deception, model approximations (Big Data Typically has 2 of the above characteristics)
What do we mean by making a machine learn?
-a process of acquiring knowledge from observations/data and/or interactions/feedback from an environment - So we need algorithms that instruct machines how to acquire knowledge from data, not what the knowledge is
What business outcomes can benefit from Big Data?
1. Acquire, Grow & Retain Customers 2. Optimize Ops and reduce Fraud 3. Maximize Insights and improve economics 4. Transform Business Performance 5. Create New Business Models
What are things big data cannot help with?
1. Chance correlation 2. Meaning (can find relationships in data, but more is needed to determine meaning or cause) 3. Action (more data doesn't imply more knowledge) 4. Easily fooled (many data tools can be purposely fooled) 5. Data drift (incoming data can cause unintentional signals) 6. Feedback (reinforcing data) 7. Critical thought (scientific-sounding answers to vague questions) 8. New data (how to handle previously unseen data) 9. Realistic (big data is not a silver bullet)
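Item 1, chance correlation, can be illustrated with a short simulation: compare one random series against many other unrelated random series and report the strongest correlation found. This is only a sketch. The series length, the number of candidates, and the seed are arbitrary assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


random.seed(0)  # fixed seed so the run is repeatable

# One random "target" series and 1000 unrelated random candidate series.
target = [random.random() for _ in range(8)]
candidates = [[random.random() for _ in range(8)] for _ in range(1000)]

# The strongest correlation among the candidates arises purely by chance.
best = max(abs(pearson(target, c)) for c in candidates)
print(round(best, 2))
```

With this many candidates and so few data points, at least one unrelated series will look strongly correlated with the target. That is why a correlation found by searching a large dataset needs independent confirmation.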
What are the challenges with big data?
1. Data lacks integrity 2. Data lacks metadata 3. Back end is cheap 4. Front End is confusing 5. Analysts don't understand your question 6. Analysis is incomplete 7. Lacks a means to interpret the analyses 8. Not acting on the analyses
What is the Imitation game?
A test (also known as the Turing test) to see if something is artificially intelligent. If you can interact with something without being able to tell whether you are interacting with a human or a computer, then it passes the test
Compare ML vs AI vs Data Mining vs Statistics
- AI: Computers that behave and reason intelligently - ML: Automatically learn models of data for prediction (a subset of AI) - DM: Human-guided discovery of hidden patterns in a particular dataset - Stats: Quantify and summarize data
What are some common learning tasks?
- Classification: assign a category to each item - Regression: predict a real value for each item - Ranking: order items according to some criterion - Clustering: partition data into homogeneous regions - Dimensionality reduction: find a lower-dimensional feature space that preserves the most important properties of the data
What are some applications of ML?
Image recognition, Speech recognition, Chess, Cancer diagnosis, computer security, medical diagnosis
Why can Big Data happen now (8 reasons)?
Low cost, CPU Power, Fast Access, Cloud Computing, Distributed computing, Government investment, Open Source Software, Machine Learning
What does AI Include?
- Natural languages - Industrial robots - Expert systems - Intelligent agents
What is Machine Learning?
a computational method that uses experience to improve an algorithm's performance for the purpose of prediction
With regard to Machine Learning, what is experience?
The data the algorithm learns from. Learning is therefore a data-driven task, so statistics, probability, and optimization play a significant role
What is a Decision Support System (DSS)?
interactive software used to aid decision making by suggesting solutions to the problem at hand
What is the goal of AI?
to develop computers that can think, as well as see, hear, walk, talk, and feel.