Quantitative Methods for Business

The Quantitative Analysis Approach

(1) Defining the Problem, (2) Developing a Model, (3) Acquiring Input Data, (4) Developing a Solution, (5) Testing the Solution, (6) Analyzing the Results, (7) Implementing the Results

NoSQL: "Not Only SQL" database • NoSQL will likely be implemented alongside relational databases to support organization data needs.

- Nonrelational database that supports the storage of a wide range of data types. - Structured, semi-structured, and unstructured - Flexibility, performance, and scalability to handle extremely high volumes of data.

- SELECT: specifies the attributes - FROM: specifies the tables (one or more) - WHERE: specifies selection criteria and/or conditions

Three keywords used in Structured Query Language (SQL).
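As a rough illustration of how the three keywords fit together, the sketch below runs one query through Python's built-in sqlite3 module. The `customer` table, its columns, and the rows are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical in-memory table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ana", "Boston"), (2, "Ben", "Denver"), (3, "Cy", "Boston")],
)

# SELECT names the attributes, FROM names the table, WHERE filters the rows.
rows = conn.execute(
    "SELECT name, city FROM customer WHERE city = ? ORDER BY id",
    ("Boston",),
).fetchall()
conn.close()

print(rows)
```

Only the two Boston customers satisfy the WHERE condition, so only their name and city attributes are returned.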

continuous variable

A variable (such as age, test score, or height) that can take on a wide or infinite number of values.

Statistical Methods

Descriptive Statistics and Inferential Statistics are

Collecting Data, Presenting Data, Characterizing Data Purpose: Describe data

Descriptive statistics involves:

• Physical models • Scale models • Schematic models

Different types of models

Estimation, hypothesis testing Purpose: Make Decisions About Population Characteristics

Inferential Statistics involves

Hypothesis testing

Assesses how likely a claim about a population parameter is to be true, given the sample data. Population parameter: could be the mean, standard deviation, etc.

A mathematical model of profit

Profit = Revenue - Expenses

Profit = Revenue - (Fixed cost + Variable cost)
Profit = (Selling price per unit)(Number of units sold) - [Fixed cost + (Variable cost per unit)(Number of units sold)]
Profit = sX - [f + vX]
Profit = sX - f - vX
where s = selling price per unit, v = variable cost per unit, f = fixed cost, X = number of units sold

Profit formula
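The profit formula can be sketched as a small Python function. The parameter names follow the card's symbols (s, f, v, X); the numbers in the example call are made up for illustration.

```python
def profit(s, f, v, x):
    """Profit = sX - f - vX.

    s: selling price per unit, f: fixed cost,
    v: variable cost per unit, x: number of units sold.
    """
    return s * x - f - v * x


# Hypothetical figures: $10 price, $1,000 fixed cost, $5 variable cost, 300 units.
example_profit = profit(s=10, f=1000, v=5, x=300)
print(example_profit)  # 3000 - 1000 - 1500 = 500
```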

Design, direct, and evaluate the scientific approach for decision making Use research methods to: • Collect appropriate and accurate data to generate evidence • Inform and guide the design of databases • Analyze data • Predict and analyze outcomes • Examine patterns • Identify gaps

Statistics helps one to

- use analytic methods to critically appraise existing literature and other evidence to determine and implement the best evidence for practices in business - Design and implement processes to evaluate outcomes of different alternatives for solving a problem

Statistics helps to

the collection, preparation, analysis, interpretation, and presentation of data. First: find the right data and prepare it for the analysis. Second: use the appropriate statistical tool, which depends on the data. Third: clearly communicate information with actionable business insights.

Statistics is the science that deals with

1. Define the null hypothesis 2. Define the alternative hypothesis 3. Compute the test statistic 4. Compare the test statistic with the critical value (or the p-value with the significance level) 5. Reject or fail to reject the null hypothesis

Steps of Hypothesis Testing
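One way to walk through these steps in code is a two-sided one-sample z-test, sketched below with Python's standard library. The choice of a z-test, the assumption that the population standard deviation is known, the 0.05 significance level, and the sample values are all assumptions made for this illustration and are not taken from the card.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test, assuming a known population SD (sigma).

    Null hypothesis: the population mean equals mu0.
    Alternative hypothesis: the population mean differs from mu0.
    """
    n = len(sample)
    # Test statistic: distance of the sample mean from mu0 in standard errors.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Decision: reject the null hypothesis when the p-value is below alpha.
    return z, p_value, p_value < alpha


# Hypothetical sample of nine test scores; mu0 and sigma are assumed values.
scores = [52, 55, 49, 58, 54, 53, 57, 56, 51]
z, p_value, reject = z_test(scores, mu0=50, sigma=3)
print(round(z, 3), reject)
```

With these made-up numbers the test statistic is large, so the sketch rejects the null hypothesis. With a small sample and an unknown standard deviation, a t-test would usually be the more appropriate choice.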

random cluster sampling

The dots represent individuals within the population that are grouped into clusters (circles). Individuals in the entire cluster are sampled from the population to form the sample

Structured Query Language (SQL). - Manipulate data using a relatively simple and intuitive approach - Specify the attributes, tables, and criteria the retrieved data must meet

The most popular query language used

subsetting • Subsetting can also be used to eliminate unwanted data such as observations that contain missing values, low quality data, or outliers.

The process of extracting portions of a data set that are relevant to the analysis

category scores • Example: In customer satisfaction surveys, we often use ordinal scales such as very dissatisfied, somewhat dissatisfied, neutral, somewhat satisfied, and very satisfied to indicate the level of satisfaction. In such cases, we can recode the categories numerically using numbers 1 through 5 with 1 being very dissatisfied and 5 being very satisfied.

This transformation allows the categorical variable to be treated as a numerical variable in certain analytical models.
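A minimal sketch of category scores in Python follows, using the five satisfaction levels from the example. The `recode` helper and the sample responses are hypothetical.

```python
# Ordinal categories mapped to scores 1 through 5, as in the survey example.
SATISFACTION_SCORES = {
    "very dissatisfied": 1,
    "somewhat dissatisfied": 2,
    "neutral": 3,
    "somewhat satisfied": 4,
    "very satisfied": 5,
}


def recode(responses):
    """Return the numeric score for each response (case-insensitive)."""
    return [SATISFACTION_SCORES[r.strip().lower()] for r in responses]


# Hypothetical survey responses, for illustration only.
responses = ["Neutral", "very satisfied", "Somewhat dissatisfied"]
scores = recode(responses)
print(scores)  # [3, 5, 2]
```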

1. Collecting Data (e.g., survey) 2. Presenting Data (e.g., charts and tables) 3. Characterizing Data (e.g., average)

What is Statistics?

Statistics

are a useful tool for expressing data or characteristics in a scientific way.

Quantitative factors

are data that can be accurately calculated - Different investment alternatives - Interest rates - Inventory levels - Demand - Labor cost

Qualitative factors

are more difficult to quantify but affect the decision process - The weather - State and federal legislation - Technological breakthroughs

fixed cost / (selling price per unit - variable cost per unit)

break-even point formula
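The break-even formula can be checked with a short Python sketch. The function name and the example figures are assumptions for illustration; the guard reflects the fact that the formula only makes sense when the selling price exceeds the variable cost per unit.

```python
def break_even_units(f, s, v):
    """Break-even point in units: f / (s - v).

    f: fixed cost, s: selling price per unit, v: variable cost per unit.
    """
    if s <= v:
        raise ValueError("selling price must exceed variable cost per unit")
    return f / (s - v)


# Hypothetical figures: $1,000 fixed cost, $10 price, $5 variable cost.
units = break_even_units(f=1000, s=10, v=5)
print(units)  # 200.0

# At the break-even quantity, profit sX - f - vX is zero.
print(10 * units - 1000 - 5 * units)  # 0.0
```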

nominal and ordinal measurement scales

categorical variables are represented by

Data • Statistics is the language of data.

compilations of facts, figures, or other contents, both numerical and non-numerical.

omission

complete-case analysis - Exclude observations with missing values - Appropriate when the number of missing values is small or the missing values are concentrated in a small number of observations
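The omission strategy can be sketched in plain Python as follows. The records and field names are hypothetical, and missing values are represented by `None` as an assumption of this example.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},
    {"id": 3, "age": 45, "income": None},
    {"id": 4, "age": 29, "income": 48000},
]

# Complete-case analysis: keep only observations with no missing values.
complete_cases = [
    r for r in records if all(value is not None for value in r.values())
]

kept_ids = [r["id"] for r in complete_cases]
print(kept_ids)  # [1, 4]
```

Half of this tiny data set is dropped, which shows why omission is usually reserved for cases where few observations are affected.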

categorical variable • For example, categorical predictors include gender, material type, and payment method

contain a finite number of categories or distinct groups.

dimension table

describes business dimensions such as customer, product, location, and time

Outliers • It is noteworthy that in the presence of outliers, it is preferred to use the median instead of the mean to impute missing values.

extremely small or large values

fact table

contains facts about the business operation, often in a quantitative format

Predictive Analytics

forecasting future outcomes based on patterns in the past data

numeric scales, where intervals are fixed, uniform values throughout (limitation: no fixed 0 point) ex: thermometer

interval measurement scale

An enterprise data warehouse or data warehouse - Integrated and accurate - Supports managerial decision making - Organized around subjects such as sales, customers, or products - Historical and comprehensive view of the entire organization - Volume of data can become very large very quickly

is a central repository of data from multiple departments within an organization.

Database

is a collection of data logically organized to enable easy retrieval, management, and distribution of data.

Entity Relationship Diagram (ERD)

is a graphical representation used to illustrate the structure of the data.

data management

is a process that an organization uses to acquire, organize, store, manipulate, and distribute data.

Quantitative analysis

is a scientific approach to managerial decision making in which raw data are processed and manipulated to produce meaningful information

data mart - A subset of the enterprise data warehouse - Focuses on one particular subject or decision areas

is a small-scale data warehouse.

Database Management System (DBMS)

is a software application. - Defining, manipulating, and managing data - Examples: Oracle, IBM DB2, SQL Server, MySQL, Microsoft Access

data transformation

is the data conversion process from one format or structure to another.

Binning • It is important that the bins are consecutive and non-overlapping so that each numerical value falls into only one bin. • Binning can be an effective way to reduce noise in the data if we believe that all observations in the same bin tend to behave the same way.

is the process of transforming numerical variables into categorical variables by grouping the numerical values into a small number of groups or bins.
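A small sketch of binning with Python's `bisect` module is shown below. The age edges and labels are hypothetical. Each edge starts a new bin, so the bins are consecutive and non-overlapping and every value falls into exactly one of them.

```python
from bisect import bisect_right

# Hypothetical bin edges: each edge is the lower bound of the next bin.
EDGES = [18, 35, 65]
LABELS = ["under 18", "18-34", "35-64", "65 and over"]


def to_bin(value):
    """Return the label of the single bin that contains value."""
    return LABELS[bisect_right(EDGES, value)]


ages = [17, 18, 34, 35, 70]
binned = [to_bin(age) for age in ages]
print(binned)
# ['under 18', '18-34', '18-34', '35-64', '65 and over']
```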

most basic: classify responses into categories ex: gender, race, religion, marital status

nominal measurement scale

compares categories ex: movie ratings, surveys (question with scales like "highly satisfied" or "1-good, 2-fair...")

ordinal measurement scale

entity - Entities have relationships with one another: either 1:1, 1:M, or M:N • 1:1 - customer bought one item • 1:M - customer bought many items • M:N - many customers with many items

person, places, things, events

composite primary key

primary key that consists of more than one attribute; used when none of the individual attributes alone can uniquely identify each instance of the entity

imputation

replace missing values with some reasonable values
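As a hedged sketch of imputation, the snippet below fills missing values with either the mean or the median of the observed values, using Python's `statistics` module. The data are made up and include one deliberate outlier to show how much the two choices can differ.

```python
from statistics import mean, median

# Hypothetical observations; None marks a missing value, 500 is an outlier.
values = [10, 12, None, 11, 500]

observed = [v for v in values if v is not None]
mean_fill = mean(observed)      # pulled upward by the outlier
median_fill = median(observed)  # robust to the outlier

# Replace each missing value with the median of the observed values.
imputed = [median_fill if v is None else v for v in values]

print(mean_fill, median_fill)  # 133.25 11.5
print(imputed)                 # [10, 12, 11.5, 11, 500]
```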

ratio measurement scale

similar to interval scale, but have a true zero point (height, age, weight, length, etc.)

Data Modeling

the process of defining the structure of a database

Descriptive analytics

the study and consolidation of historical data

Prescriptive analytics

the use of optimization methods

Categorical variables (qualitative variables)

those that divide subjects into groups, but do not allow any sort of mathematical operations to be performed on the data

handling missing values and sub-setting data.

two important data preparation techniques

omission and imputation

two strategies for handling missing values

discrete variable • For example, the number of customer complaints or the number of flaws or defects

variable that has specific values and that cannot have values between these specific values

Acquiring input data

• Input data must be accurate - GIGO rule • Garbage in => Process => Garbage out

Each of the dimension tables has a 1:M relationship with the fact table

• Primary keys of the dimension table are also the foreign keys in the fact table • Combination of the primary keys of the dimension tables forms the composite primary key of the fact table
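The key relationships described above can be sketched with Python's sqlite3 module. The table names (`dim_product`, `dim_store`, `fact_sales`), columns, and rows are hypothetical, and a real star schema would typically include more dimensions such as customer and time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: each has its own primary key.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);

-- Fact table: the dimension keys are foreign keys here, and together
-- they form the composite primary key.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    units_sold INTEGER,
    PRIMARY KEY (product_id, store_id)
);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_store VALUES (10, 'Boston'), (20, 'Denver');
INSERT INTO fact_sales VALUES (1, 10, 5), (1, 20, 7), (2, 10, 3);
""")

# One product row matches many fact rows (1:M), so totals are grouped by product.
totals = conn.execute("""
    SELECT p.name, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
conn.close()

print(totals)  # [('Gadget', 3), ('Widget', 12)]
```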

Statistical Computer Packages

• SAS • SPSS • MINITAB • Excel are examples of

Sensitivity Analysis - Sensitive models should be very thoroughly tested

• determines how much the results will change if the model or input data changes
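A minimal sketch of a one-at-a-time sensitivity check follows, assuming the profit model Profit = sX - f - vX as the model under test. The base values and the 10 percent change are made-up figures.

```python
def profit(s, f, v, x):
    """Profit = sX - f - vX."""
    return s * x - f - v * x


def sensitivity(base, name, pct):
    """Return profit when one input is lowered and raised by pct percent."""
    low = dict(base, **{name: base[name] * (1 - pct / 100)})
    high = dict(base, **{name: base[name] * (1 + pct / 100)})
    return profit(**low), profit(**high)


# Hypothetical base case: $10 price, $1,000 fixed cost, $5 variable cost, 300 units.
base = {"s": 10.0, "f": 1000.0, "v": 5.0, "x": 300.0}
base_profit = profit(**base)

# Change only the selling price by 10 percent in each direction.
low, high = sensitivity(base, "s", 10)
print(base_profit, round(low, 2), round(high, 2))
```

In this made-up case a 10 percent change in price moves profit from roughly 200 to roughly 800 around a base of 500, which suggests the model is quite sensitive to the selling price.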

Stratified sampling

• is a type of probability sampling in which the population is first divided into homogeneous subgroups (strata) • subjects are then selected randomly from each group (stratum) and combined to form a single sample. Common factors used to divide the population are age, gender, income, race, religion, etc.
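Stratified sampling can be sketched in Python as below. The `stratified_sample` helper, the fixed seed, and the toy population are assumptions for illustration. The sketch draws the same number of subjects from each stratum, which is only one of several possible allocation rules.

```python
import random
from collections import defaultdict


def stratified_sample(population, key, per_stratum, seed=0):
    """Draw per_stratum subjects at random from each stratum and combine them."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for name in sorted(strata):
        # rng.sample raises ValueError if a stratum has fewer than per_stratum members.
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample


# Hypothetical population of (person id, age group) pairs.
population = [(i, "under 40") for i in range(6)] + [(i, "40 and over") for i in range(6, 10)]
sample = stratified_sample(population, key=lambda p: p[1], per_stratum=2)
print(sample)
```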

Data Wrangling

• the process of retrieving, cleansing, integrating, transforming, and enriching data to support subsequent analysis. - Transform raw data into a format that is more appropriate and easier to analyze - Objectives: improving data quality, reducing the time and effort required to perform analytics, and revealing the true intelligence in the data

The most common type of database used in organizations is

• the relational database. - Consists of one or more logically related data files called tables or relations - Each table is a two-dimensional grid • Rows: records or tuples • Columns: fields or attributes, characteristics of a physical object, an event, a person

primary key (PK)

attribute that uniquely identifies each instance of the entity; used to create a data structure called an index for fast data retrieval and searches

record, which represents an object, event, or person

A collection of related fields makes a

population (universe)

All Items of Interest

converted into numerical variables

In many analytical models, such as regression models, categorical variables must first be

Non-random sampling

Convenience Sampling Volunteer Sampling Quota Sampling Purposeful Sampling Snowball Sampling are examples of

deterministic model

Mathematical models that do not involve risk or chance

probabilistic models

Mathematical models that involve risk or chance

Sample

Portion of Population

Parameter

Summary Measure about Population

Sample Statistic

Summary Measure about Sample

star schema. - Specialized relational database model - Two types of tables: dimension and fact tables

A data mart conforms to a multidimensional data model called a

foreign key (FK)

a primary key of a related entity

Simple Random Sample

a sample in which (a) every member of the population has the same chance of being chosen, and (b) the members of the sample are chosen independently of each other.

Cluster sampling • The most common variables used in the clustering population are the geographical area, buildings, school, etc.

a sampling technique in which the population is divided into already existing groupings (clusters), and then a sample of clusters is selected randomly from the population.
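Cluster sampling can be sketched as follows. The `cluster_sample` helper, the seed, and the school clusters are hypothetical. The sketch keeps every member of each selected cluster, which is the one-stage form of the technique.

```python
import random


def cluster_sample(clusters, n_clusters, seed=0):
    """Pick n_clusters clusters at random and keep every member of each."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members


# Hypothetical clusters: students grouped by the school they attend.
clusters = {
    "School A": ["a1", "a2", "a3"],
    "School B": ["b1", "b2"],
    "School C": ["c1", "c2", "c3", "c4"],
    "School D": ["d1"],
}
chosen, members = cluster_sample(clusters, n_clusters=2)
print(chosen, members)
```

The contrast with stratified sampling is that here whole groups are selected or skipped, rather than individuals being drawn from every group.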

mathematical model

a set of mathematical relationships

Instance

a single occurrence of an entity; represented as a record in a database

dummy variable • Oftentimes, a categorical variable is defined by more than two categories. Given k categories of a variable, the general rule is to create k - 1 dummy variables, using the last category as reference.

also referred to as an indicator or a binary variable, is commonly used to describe two categories of a variable.
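The k - 1 rule can be illustrated with a short Python sketch. The `dummy_encode` helper and the payment-method categories are hypothetical. The last category in the list serves as the reference and is encoded as all zeros.

```python
def dummy_encode(values, categories):
    """Encode values with k - 1 dummy variables; the last category is the reference."""
    kept = categories[:-1]  # every category except the reference
    rows = []
    for value in values:
        if value not in categories:
            raise ValueError(f"unknown category: {value!r}")
        rows.append({c: int(value == c) for c in kept})
    return rows


# Hypothetical variable with k = 3 categories; "debit" is the reference.
categories = ["cash", "credit", "debit"]
encoded = dummy_encode(["cash", "debit", "credit"], categories)
print(encoded)
# [{'cash': 1, 'credit': 0}, {'cash': 0, 'credit': 0}, {'cash': 0, 'credit': 1}]
```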

