Quantitative Methods for Business
The Quantitative Analysis Approach
(1) Defining the Problem, (2) Developing a Model, (3) Acquiring Input Data, (4) Developing a Solution, (5) Testing the Solution, (6) Analyzing the Results, (7) Implementing the Results
NoSQL: "Not Only SQL" database • NoSQL will likely be implemented alongside relational databases to support organization data needs.
- A nonrelational database that supports the storage of a wide range of data types: structured, semi-structured, and unstructured - Offers the flexibility, performance, and scalability to handle extremely high volumes of data.
- SELECT: specifies the attributes - FROM: specifies the tables (one or more) - WHERE: specifies selection criteria and/or conditions
3 key words used in Structured Query Language (SQL).
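A minimal sketch of the three keywords in action, using Python's built-in sqlite3 module; the table name (customers) and its columns are hypothetical examples, not part of the course material.

```python
# SELECT / FROM / WHERE demonstrated against a throwaway in-memory database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ann", "Dallas", 120.0), ("Bob", "Austin", 45.5), ("Cara", "Dallas", 300.0)],
)

# SELECT = attributes, FROM = table(s), WHERE = selection criteria
rows = conn.execute(
    "SELECT name, balance FROM customers WHERE city = 'Dallas'"
).fetchall()
print(rows)   # [('Ann', 120.0), ('Cara', 300.0)]
conn.close()
```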
continuous variable
A variable (such as age, test score, or height) that can take on a wide or infinite number of values.
Statistical Methods
Descriptive Statistics and Inferential Statistics are
Collecting Data, Presenting Data, Characterizing Data. Purpose: describe data
Descriptive statistics involves:
• Physical models • Scale models • Schematic models
Different types of models
Estimation, hypothesis testing. Purpose: make decisions about population characteristics
Inferential Statistics involves
Hypothesis testing
Assesses the likelihood that a claim about a population parameter is true. A population parameter could be the mean, standard deviation, etc.
A mathematical model of profit
Profit = Revenue - Expenses
Profit = Revenue - (Fixed cost + Variable cost)
Profit = (Selling price per unit)(Number of units sold) - [Fixed cost + (Variable cost per unit)(Number of units sold)]
Profit = sX - [f + vX]
Profit = sX - f - vX
where s = selling price per unit, v = variable cost per unit, f = fixed cost, X = number of units sold
Profit formula
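A worked example of Profit = sX - f - vX; the values of s, v, f, and X below are illustrative assumptions, not figures from the course material.

```python
# Plug illustrative numbers into Profit = sX - f - vX.
s = 10.0    # selling price per unit
v = 6.0     # variable cost per unit
f = 1000.0  # fixed cost
X = 400     # number of units sold

profit = s * X - f - v * X
print(profit)   # 10*400 - 1000 - 6*400 = 600.0
```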
Design, direct, and evaluate the scientific approach for decision making. Use research methods to: • Collect appropriate and accurate data to generate evidence • Inform and guide the design of databases • Analyze data • Predict and analyze outcomes • Examine patterns • Identify gaps
Statistics helps one to
- Use analytic methods to critically appraise existing literature and other evidence to determine and implement the best evidence for practices in business - Design and implement processes to evaluate outcomes of different alternatives for solving a problem
Statistics helps to
the collection, preparation, analysis, interpretation, and presentation of data. First: find the right data and prepare it for the analysis. Second: use the appropriate statistical tool, which depends on the data. Third: clearly communicate information with actionable business insights.
Statistics is the science that deals with
1. Define the null hypothesis 2. Define the alternate hypothesis 3. Compute the test statistic 4. Compare the test statistic with the critical value (or the p-value with the significance level) 5. Reject or fail to reject the null hypothesis
Steps of Hypothesis Testing
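A minimal sketch of the five steps as a one-sample t-test, assuming a made-up sample and a hypothesized mean of 50; the data values and the critical value are assumptions for illustration only.

```python
# One-sample t-test following the hypothesis-testing steps.
# H0: population mean = 50   H1: population mean != 50   (steps 1 and 2)
import math
import statistics

sample = [52.1, 49.8, 53.4, 51.0, 50.6, 52.9, 48.7, 51.8]   # hypothetical data
mu0 = 50.0

n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)                      # sample standard deviation
t_stat = (xbar - mu0) / (s / math.sqrt(n))        # step 3: compute test statistic

t_critical = 2.365   # two-tailed critical value, alpha = 0.05, df = 7 (from a t-table)
if abs(t_stat) > t_critical:                      # step 4: compare with critical value
    print(f"t = {t_stat:.2f}: reject H0")         # step 5: decision
else:
    print(f"t = {t_stat:.2f}: fail to reject H0")
```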
random cluster sampling
In a diagram of cluster sampling, the dots represent individuals within the population, grouped into clusters (circles). Every individual in each selected cluster is sampled to form the sample.
Structured Query Language (SQL) - Manipulate data using a relatively simple and intuitive approach - Specify the attributes, tables, and criteria the retrieved data must meet
The most popular query language used
subsetting • Subsetting can also be used to eliminate unwanted data such as observations that contain missing values, low-quality data, or outliers.
The process of extracting portions of a data set that are relevant to the analysis
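A small subsetting sketch with pandas; the DataFrame and its column names (region, sales) are hypothetical. It keeps only the rows relevant to the analysis and drops one with a missing value.

```python
# Subsetting: extract the portion of the data set relevant to the analysis.
import pandas as pd

df = pd.DataFrame({
    "region": ["East", "West", "East", "South"],
    "sales":  [250.0, None, 410.0, 95.0],
})

# keep only East-region rows whose sales value is not missing
east = df[(df["region"] == "East") & (df["sales"].notna())]
print(east)
```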
category scores • Example: In customer satisfaction surveys, we often use ordinal scales such as very dissatisfied, somewhat dissatisfied, neutral, somewhat satisfied, and very satisfied to indicate the level of satisfaction. In such cases, we can recode the categories numerically using numbers 1 through 5 with 1 being very dissatisfied and 5 being very satisfied.
This transformation allows the categorical variable to be treated as a numerical variable in certain analytical models.
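A sketch of recoding the ordinal satisfaction labels into category scores 1 through 5 with pandas; the response values are a hypothetical survey column.

```python
# Recode ordinal satisfaction labels as numeric category scores (1-5).
import pandas as pd

responses = pd.Series([
    "very satisfied", "neutral", "somewhat dissatisfied", "very dissatisfied",
])

scores = responses.map({
    "very dissatisfied": 1,
    "somewhat dissatisfied": 2,
    "neutral": 3,
    "somewhat satisfied": 4,
    "very satisfied": 5,
})
print(scores.tolist())   # [5, 3, 2, 1]
```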
1. Collecting Data • e.g., surveys 2. Presenting Data • e.g., charts & tables 3. Characterizing Data • e.g., the average
What is Statistics?
Statistics
is a useful tool for expressing data or characteristics in a scientific way.
Quantitative factors
are data that can be accurately calculated - Different investment alternatives - Interest rates - Inventory levels - Demand - Labor cost
Qualitative factors
are more difficult to quantify but affect the decision process - The weather - State and federal legislation - Technological breakthroughs
fixed cost / (selling price per unit - variable cost per unit)
break-even point formula
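A worked example of the break-even formula, reusing the same illustrative cost and price assumptions as the profit example above.

```python
# Break-even point = fixed cost / (selling price per unit - variable cost per unit)
f = 1000.0   # fixed cost
s = 10.0     # selling price per unit
v = 6.0      # variable cost per unit

break_even_units = f / (s - v)
print(break_even_units)   # 1000 / 4 = 250.0 units
```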
nominal and ordinal measurement scales
categorical variables are represented by
Data • Statistics is the language of data.
compilations of facts, figures, or other contents, both numerical and non-numerical.
omission
complete-case analysis - Exclude observations with missing values - Appropriate when the amount of missing data is small or concentrated in a small number of observations
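A sketch of omission (complete-case analysis) with pandas: observations with any missing value are simply dropped. The column names and values are hypothetical.

```python
# Omission: drop every row that contains a missing value.
import pandas as pd

df = pd.DataFrame({
    "age":    [34, None, 29, 41],
    "income": [52000, 61000, None, 48000],
})

complete_cases = df.dropna()   # keeps only rows with no missing values
print(complete_cases)          # rows 0 and 3 remain
```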
categorical variable • For example, categorical predictors include gender, material type, and payment method
contains a finite number of categories or distinct groups.
dimension table
describes business dimensions such as customer, product, location, and time
Outliers • It is noteworthy that in the presence of outliers, it is preferred to use the median instead of the mean to impute missing values.
extremely small or large values
fact table
stores facts about the business operation, often in quantitative format
Predictive Analytics
forecasting future outcomes based on patterns in the past data
numeric scales with fixed, uniform intervals throughout (limitation: no true zero point); ex: temperature on a thermometer
interval measurement scale
An enterprise data warehouse or data warehouse - Integrated and accurate - Supports managerial decision making - Organized around subjects such as sales, customers, or products - Historical and comprehensive view of the entire organization - Volume of data can become very large very quickly
is a central repository of data from multiple departments within an organization.
Database
is a collection of data logically organized to enable easy retrieval, management, and distribution of data.
Entity Relationship Diagram (ERD)
is a graphical representation used to illustrate the structure of the data.
data management
is a process that an organization uses to acquire, organize, store, manipulate, and distribute data.
Quantitative analysis
is a scientific approach to managerial decision making in which raw data are processed and manipulated to produce meaningful information
data mart - A subset of the enterprise data warehouse - Focuses on one particular subject or decision area
is a small-scale data warehouse.
Database Management System (DBMS)
is a software application for defining, manipulating, and managing data. - Examples: Oracle, IBM DB2, SQL Server, MySQL, Microsoft Access
data transformation
is the data conversion process from one format or structure to another.
Binning • It is important that the bins are consecutive and non-overlapping so that each numerical value falls into only one bin. • Binning can be an effective way to reduce noise in the data if we believe that all observations in the same bin tend to behave the same way.
is the process of transforming numerical variables into categorical variables by grouping the numerical values into a small number of groups or bins.
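A binning sketch using pandas.cut: a numerical variable is grouped into consecutive, non-overlapping bins. The bin edges and labels below are illustrative assumptions.

```python
# Binning: transform a numerical variable into a categorical one.
import pandas as pd

ages = pd.Series([19, 23, 35, 47, 52, 68])
age_group = pd.cut(
    ages,
    bins=[0, 25, 45, 65, 100],                        # consecutive, non-overlapping
    labels=["young", "adult", "middle-aged", "senior"],
)
print(age_group.tolist())   # ['young', 'young', 'adult', 'middle-aged', ...]
```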
most basic: classify responses into categories ex: gender, race, religion, marital status
nominal measurement scale
compares categories ex: movie ratings, surveys (question with scales like "highly satisfied" or "1-good, 2-fair...")
ordinal measurement scale
entity - Entities have relationships with one another: either 1:1, 1:M, or M:N • 1:1 - a customer bought one item • 1:M - a customer bought many items • M:N - many customers bought many items
persons, places, things, or events
composite primary key
primary key that consists of more than one attribute; used when none of the individual attributes alone can uniquely identify each instance of the entity
imputation
replace missing values with some reasonable values
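A sketch of imputation with pandas: the missing value is replaced with the median, which (per the outliers note above) is often preferred over the mean when outliers are present. The data values are hypothetical.

```python
# Imputation: replace missing values with a reasonable value (here, the median).
import pandas as pd

income = pd.Series([48000, 52000, None, 61000, 950000])   # 950000 is an outlier
imputed = income.fillna(income.median())                  # median of the non-missing values
print(imputed.tolist())
```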
ratio measurement scale
similar to interval scale, but have a true zero point (height, age, weight, length, etc.)
Data Modeling
the process of defining the structure of a database
Descriptive analytics
the study and consolidation of historical data
Prescriptive analytics
the use of optimization methods
Categorical variables (qualitative variables)
those that divide subjects into groups, but do not allow any sort of mathematical operations to be performed on the data
handling missing values and sub-setting data.
two important data preparation techniques
omission and imputation
two strategies for handling missing values
discrete variable • For example, the number of customer complaints or the number of flaws or defects
a variable that can take on only specific values, with no values possible between them
Acquiring input data
• Input data must be accurate - GIGO rule • Garbage in => Process => Garbage out
Each of the dimension tables has a 1:M relationship with the fact table
• Primary keys of the dimension table are also the foreign keys in the fact table • Combination of the primary keys of the dimension tables forms the composite primary key of the fact table
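A minimal star-schema sketch in Python's sqlite3: two dimension tables and a fact table whose composite primary key combines the dimension primary keys, which also act as foreign keys. The table and column names are illustrative, not from the course material.

```python
# Star schema: dimension tables + fact table with a composite primary key.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, sale_date    TEXT);

CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),  -- foreign key
    time_id    INTEGER REFERENCES dim_time(time_id),        -- foreign key
    units_sold INTEGER,
    revenue    REAL,
    PRIMARY KEY (product_id, time_id)                       -- composite primary key
);
""")
conn.close()
```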
Statistical Computer Packages
• SAS • SPSS • MINITAB • Excel are examples of
Sensitivity Analysis - Sensitive models should be very thoroughly tested
• determines how much the results will change if the model or input data changes
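A small sensitivity-analysis sketch: recompute the break-even point while varying the selling price to see how much the result changes. The numbers reuse the illustrative assumptions from the earlier examples.

```python
# Sensitivity analysis: how does the break-even point respond to price changes?
f, v = 1000.0, 6.0                        # fixed cost, variable cost per unit

for s in [8.0, 9.0, 10.0, 11.0, 12.0]:    # candidate selling prices per unit
    print(s, f / (s - v))                 # break-even units at each price
```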
Stratified sampling
• is a type of probability sampling in which the population is first divided into homogeneous subgroups (strata) • subjects are then selected randomly from each group (stratum) and combined to form a single sample. Common factors by which the population is separated include age, gender, income, race, religion, etc.
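A stratified-sampling sketch using only the standard library: the population is split into homogeneous strata (here, by gender) and a few subjects are drawn at random from each stratum. The population data are hypothetical.

```python
# Stratified sampling: random draw within each stratum, then combine.
import random

population = [
    ("Ann", "F"), ("Bea", "F"), ("Cam", "F"), ("Dee", "F"),
    ("Ed", "M"), ("Finn", "M"), ("Gus", "M"), ("Hal", "M"),
]

sample = []
for stratum in ("F", "M"):
    members = [p for p in population if p[1] == stratum]
    sample.extend(random.sample(members, k=2))   # 2 subjects per stratum
print(sample)
```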
Data Wrangling
• the process of retrieving, cleansing, integrating, transforming, and enriching data to support subsequent analysis. - Transform raw data into a format that is more appropriate and easier to analyze - Objectives: improving data quality, reducing the time/effort required to perform analytics, and revealing the true intelligence in the data
The most common type of database used in organizations is
• the relational database. - Consists of one or more logically related data files called tables or relations - Each table is a two-dimensional grid • Rows: records or tuples • Columns: fields or attributes, characteristics of a physical object, an event, a person
primary key (PK)
:attribute that uniquely identifies each instance of the entity; used to create a data structure called an index for fast data retrieval and searches
record, which represents an object, event, or person
A collection of related fields makes a
population (universe)
All Items of Interest
converted into numerical variables
In many analytical models, such as regression models, categorical variables must first be
Non-random sampling
Convenience sampling, volunteer sampling, quota sampling, purposeful sampling, and snowball sampling are examples of
deterministic model
Mathematical models that do not involve risk or chance
probabilistic models
Mathematical models that involve risk or chance
Sample
Portion of Population
Parameter
Summary Measure about Population
Sample Statistic
Summary Measure about Sample
star schema. - Specialized relational database model - Two types of tables: dimension and fact tables
A data mart conforms to a multidimensional data model called a
foreign key (FK)
an attribute in one entity that is the primary key of a related entity
Simple Random Sample
a sample in which (a) every member of the population has the same chance of being chosen, and (b) the members of the sample are chosen independently of each other.
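A simple random sample sketched with the standard library: every member of the (hypothetical) population has the same chance of being chosen, and members are chosen independently, without replacement.

```python
# Simple random sample of 10 members from a population of 100.
import random

population = list(range(1, 101))        # members labeled 1..100
srs = random.sample(population, k=10)   # equal chance, chosen without replacement
print(srs)
```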
Cluster sampling • The most common variables used to form clusters are geographical area, buildings, schools, etc.
a sampling technique in which the population is divided into already existing groupings (clusters), and then a sample of clusters is selected randomly from the population.
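A cluster-sampling sketch using the standard library: whole clusters (here, schools) are chosen at random, and every individual in the chosen clusters enters the sample. The cluster contents are hypothetical.

```python
# Cluster sampling: randomly select whole clusters, keep everyone in them.
import random

clusters = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3", "c4"],
}

chosen = random.sample(list(clusters), k=2)                  # pick 2 clusters at random
sample = [person for c in chosen for person in clusters[c]]  # all individuals in them
print(chosen, sample)
```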
mathematical model
a set of mathematical relationships
Instance
a single occurrence of an entity; represented as a record in a database
dummy variable • Oftentimes, a categorical variable is defined by more than two categories. Given k categories of a variable, the general rule is to create k - 1 dummy variables, using the last category as reference.
also referred to as an indicator or a binary variable, is commonly used to describe two categories of a variable.
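A sketch of creating dummy variables with pandas.get_dummies; the payment_method column is a hypothetical example. Note that drop_first drops the first (alphabetical) category as the reference rather than the last one mentioned on the card, but either convention yields k - 1 dummies.

```python
# Dummy (indicator) variables: k categories -> k - 1 binary columns.
import pandas as pd

df = pd.DataFrame({"payment_method": ["cash", "credit", "debit", "cash"]})

dummies = pd.get_dummies(df["payment_method"], drop_first=True)
print(dummies)   # 'cash' is the reference category; remaining columns: credit, debit
```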