WPC 300 Midterm
Data Models
1. Descriptive 2. Explanatory 3. Predictive 4. Prescriptive
Database Management System (DBMS)
1. Helps us create the structural rules that our data will adhere to 2. helps load data into framework we've already established 3. provides additional support such as tracking users and providing log in credentials performing maintenance and routine backups, as well as a host of additional admin. tasks that protect/secure data
problem finding
1. Tell an interesting and complete story • The problem you address should be meaningful • Solution could be reused for related problems • Assumptions, boundaries 2. Find an appropriate solution framework • Break down the problem into pieces • Iterative process (agile vs. waterfall) • Identify appropriate analytical/modeling techniques 3. Routinize the procedure • Documentation • Next similar question can be solved quickly • Build a system (macros, codes, programs)
Typical IT infrastrcuture
Data sources - Operational systems - Flat files ↓ Staging Area ↓ Warehouse - summary data - metadata - raw data ↓ Data Marts - Purchasing - Sales - inventory ↓ Users - analytics - reporting - mining
Logical functions
IF, OR, And, NOT
control group
In an experiment, the group that is not exposed to the treatment; contrasts with the experimental group and serves as a comparison for evaluating the effect of the treatment.
Time functions
TIME, HOUR, MINUTE, SECOND, NOW, YEAR, MONTH, DAY, DATE, DATEIF
Split a cell
To divide one cell into two or more cells.
Cloud Computing
Types: public, private, and hybrid - Types of cloud services: --- Infrastructure as a service (IaaS) --- Platform as a service (PaaS) --- Serverless computing --- Software as a service (SaaS) • Drawback: concerns of security, reliability
data transformation
clean data into proper format or structure for the purpose of querying and analysis
4 squares of data
collection cleaning communication analyses
Meta Data
data that describes other data
Data Extraction
extract data from primary/secondary source
Data load
load data into final target database, more specifically an operational data store, data mart, or data warehouse
A/B testing
method of comparing two versions of a webpage or app against each other to determine which one performs better
Two types of data
transactional and analytical
Cofounding/Spurious Correlation
two variables are correlated, but one does not cause the other
transactional
• Captures data describing and event • An exchange between actors • Real-time
Analytical (Data Warehouse)
• Captures data to support analysis and reporting • An aggregated view of the business • Historical
Simulated data
• Data based on assumptions and simulations • Used a lot in scheduling, routing, queuing
Heuristics
• Experience • Intuition • Rules of thumb • Think fast
Secondary data
• Firm's proprietary databases • Internet data (crawlers) -- https://scrapy.org/ -- Beautifulsoup • Stock/capital markets data (Compustat, CRSP, etc.) • Accounting disclosure data (Form 10K, 10Q)
Survey Biases
• Online Survey Selection Biases • Question order biases • Imbalance sampling • Prevent using leading and biased questions, but use neutral wording. Do not include your opinion in the question. • Don't overloaded your question. Always break questions into singles so your survey is measurable. • Don't ask the wrong questions • Don't survey the wrong people. Note - It is always a good practice to revise survey questions and to target a population that fits the survey's goal by clearly defining the respondent requirements so we can prevent survey biases from happening
Survey -- The Code of Ethics
• Respondents should be given the content, sponsorship (who is conducting the survey), and purpose of the survey so that they may make an informed judgement about whether they wish to participate. • Avoid use of methods that deliberately introduce bias into the results. Note - We should include a paragraph to explain who is conducting the survey, the content, and purpose of the survey
Primary data
• Surveys • Interviews (marketing firm's telephone interviews) • Used a lot in marketing research
4Vs of Big Data
• Volume • Velocity • Variety - different forms of data • Veracity - uncertainty of data
SQL SELECT statement
SELECT column_name(s) FROM table_name WHERE condition* GROUP BY column_name ORDER BY column_name DESC/ASC - *condition: column_name =, <>, >, >=, <, <= 'a value' or a number (need ' ' for test value, not numbers like 123) - Understand all the Keywords: DISTINCT, COUNT, MAX, MIN, AVG, SUM
ORDER BY
Sorts the result according to specified criteria
SQL
Structured Query Language
The Information Architecture of an Organization
Data entry → Transactional Database (stores real time transactional data) → Data extraction → Analytical Data Store (stores historical transactional and summary data) → Data Analysis
VLOOKUP
(lookup_value, table_array, col_index_num, range_lookup(optional)) - To find the salary for Elva, move the mouse to D2, type = VLOOKUP(C3,'Salary chart'!$A$2:$B$8,2) - Since I didn't enter the range_lookup value, Excel used the default value. - That worked because we just need an approximate match. - If we need an exact match, then we need to enter 0 as the range_lookup value.
The Relational Database
- A series of tables with logical associations between them - The associations (relationships) allow the data to be combined - Entity - Primary Key - Foreign Key - What does a row in a table mean? - Why does primary key need to be unique?
A Scientific Perspective to Problem Solving
- Heuristics vs Analytics
Analytics Process
- Identify a problem - Find dataset (Data collection) - Prepare data (ETL) - Analyze data with a data model (Descriptive, Explanatory, Predictive, and Prescriptive) - Interpret Results
The Transactional Database
- In business, a transaction is the exchange of information, goods, or services - For databases, a transaction is an action performed in a database management system - Operational databases deal with both: they store information about business transactions using database transactions Ex of Transactions - purchase of product - enroll in a course - hire an employee Data is in REAL TIME - reflects current state - how things are NOW
Transactional Database
- Supports management of an organization's data - For everyday transactions This is what is commonly thought of as "database management"
Analytical Data Warehouse
- Supports managerial decision-making - For periodic analysis This is the foundation for business intelligence
Decision Making Biases
- Understand situations where biases arise - Based on a situation, you should be able to identify what kind of cognitive bias that is - Differentiate the major kinds of biases discussed
identifier
- attribute that uniquely identifies one entity instance from other instances - FlightNo
attributes
- describe characteristics of an entity - FlightNo, From, To, Departs, Arrives
Descriptive Data Model
- insight into the past • What do we have? What happened? When? • Find Patterns; basic plots of variables; Correlation • Descriptive statistics
Predictive Data Model
- predict the future • What is likely to happen next? • Methods: Regression analysis; Data mining (Classification, decision trees, Association Rules (market-basket analysis), Text mining
Prescriptive Data Model
- provide advice • Recommended actions & timing; Possible automation of actions • Methods and Applications
entities
- something users want to track - flights, traveler, reservation
Explanatory Data Model
- understand a (causal) relationship • Why did it happen? How did it happen? • Does smoking cause cancer? • Methods and Applications
ETL (extraction, transformation, and loading)
A process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse time consuming and sometimes done in parallel. Several computation tools to develop robust systems
experiment
A research method in which an investigator manipulates one or more factors to observe the effect on some behavior or mental process
double-blind study
An experiment in which neither the participant nor the researcher knows whether the participant has received the treatment or the placebo
Entity data relationship model
Entities → Attributes → Identifier
Survey Examples
Examples of quantitative research that use survey methodology to answer questions about a population - Polls about public opinion - Public-health surveys - Market-research surveys - Government surveys and censuses Although censuses do not include a "sample", they do include other aspects of survey methodology, like questionnaires and interviewers.
Text functions
LEFT, RIGHT, MID, CONCATENATRE (can also be done with a "&"), Trim, LEN, Find, and TEXT (can be used to format a number and convert it to text)
Statistic Math Functions
MIN, MAX, AVERAGE, MODE, MEDIAN, COUNT, SUMIF, COUNTIF, SUM, SUMPRODUCT, RANDBETWEEN, and how to calculate the range (that will be calculated with MAX - MIN)
OLTP
Online Transaction Processing
PaaS (Platform as a Service)
PaaS includes infrastructure but also middleware, development tools, business intelligence (BI) services, database management systems, and more. PaaS is designed to support the complete web application lifecycle: building, testing, deploying, managing, and updating.
SaaS (Software as a Service)
Pay for software as you use it. Not installed locally, instead it is 'Hosted software e.g. Google Docs
Serverless computing
The cloud provider handles the setup, capacity planning, and server management for you.
OLAP
online analytical processing
IaaS (Infrastructure as a Service)
rent IT infrastructure—servers and virtual machines (VMs), storage, networks, operating systems—from a cloud provider on a pay as you-go basis
SELECT DISTINCT
specifies that the statement is going to be a query that returns unique values in the specified column(s)
randomized controlled experiment
study in which people are allocated at random (by chance alone) to receive one of several interventions one of these interventions is the standard control, and the other are treatments
placebo effect
the phenomenon in which the expectations of the participants in a study can influence their behavior
Analytics
the process of developing actionable decisions or recommendations for actions based on insights generated from historical data • Observe what has worked • Experimentation • Collect data and analyze data • Find patterns • Boundary and Generalizability • Think slow