QBA Block 1-2
Algebraic Models
They specify a set of relationships in a very precise way, and their preciseness and lack of ambiguity are very appealing to people with a mathematical background.
Contingency table/Crosstabs
Traditional statistical terms for pivot tables that list counts
Likert scale
A rating scale, typically 1-5, coded from 1 = Strongly Disagree to 5 = Strongly Agree; the codes are ordinal.
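A minimal sketch of coding Likert responses as numbers. It assumes a 5-point item with the hypothetical labels below; the middle label ("Neutral") in particular varies between surveys.

```python
# Hypothetical coding scheme for a 5-point Likert item (labels are assumptions).
LIKERT_CODES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

def code_responses(responses):
    """Convert Likert text labels to their ordinal codes 1-5."""
    return [LIKERT_CODES[r] for r in responses]

print(code_responses(["Agree", "Strongly Disagree", "Neutral"]))  # [4, 1, 3]
```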
Histogram
a bar chart of the frequency table
Frequency Table
Indicates the number of observations in each category
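A frequency table can be sketched in Python with `collections.Counter`; the survey answers here are made-up sample data.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count) pairs, most frequent first."""
    return Counter(values).most_common()

# Hypothetical survey answers.
answers = ["Yes", "No", "Yes", "Unsure", "Yes", "No"]
for category, count in frequency_table(answers):
    print(f"{category}: {count}")
```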
Range
-Difference between the maximum and minimum values -Excel formula: =MAX(dataset)-MIN(dataset)
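The same range calculation as the Excel formula, sketched in Python on a small hypothetical data set:

```python
def data_range(values):
    """Range = largest value minus smallest value (Excel: =MAX(dataset)-MIN(dataset))."""
    return max(values) - min(values)

print(data_range([12, 7, 3, 21, 9]))  # 18
```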
Variables
-A variable is a specific attribute being observed/measured. -Each column represents a variable.
Observations
-An observation is a member of the population or sample. -Each row corresponds to an observation.
Quantitative Data types
-Continuous -Discrete -(aka numerical)
Pie Chart
-Cousin of the Histogram -Visualize frequency as a proportion of the entire data set.
The seven-step modeling process
-Define the problem -Collect and summarize Data -Develop a model -Verify the model -Select one or more suitable decisions -Present results to the organization -Implement the model and update it over time
Discrete
-Gaps in possible values
Continuous
-Infinite number of possible values -No gaps in possible values
Why Standard Deviation?
-Measured in the original units of the data -Same order of magnitude as the data -Easier to interpret and respond to -Excel formulae (sample): Variance =VAR(dataset), Standard deviation =STDEV(dataset) -Excel formulae (population): Variance =VARP(dataset), Standard deviation =STDEVP(dataset)
Standard Deviation
-A more intuitive measure of variability than the variance -Defined as the square root of the variance
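A sketch using Python's `statistics` module, whose `variance`/`stdev` are sample versions (dividing by n - 1, like Excel's VAR/STDEV) and whose `pvariance`/`pstdev` are population versions (dividing by N, like VARP/STDEVP). The data values are hypothetical.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical sample

# Sample versions (Excel VAR / STDEV): divide by n - 1
s2 = statistics.variance(data)
s = statistics.stdev(data)

# Population versions (Excel VARP / STDEVP): divide by N
sigma2 = statistics.pvariance(data)
sigma = statistics.pstdev(data)

# The standard deviation is the square root of the variance.
assert math.isclose(s, math.sqrt(s2))
assert math.isclose(sigma, math.sqrt(sigma2))
print(s2, s, sigma2, sigma)
```

Here the squared deviations from the mean (5.5) sum to 17.5, so the sample variance is 17.5 / 5 = 3.5 and the population variance is 17.5 / 6.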
The Mode
-Most frequently occurring observation -Represents location of greatest clustering -Remember multiple modes are possible -Can be used for both quantitative and qualitative data -Excel formula: =MODE(dataset)
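Because multiple modes are possible, a single-value function can hide some of them. In Python's `statistics` module, `mode` returns one value (the first mode encountered) while `multimode` returns all of them. The data below are hypothetical.

```python
import statistics

# Hypothetical data with two modes (3 and 7 each appear twice).
data = [3, 7, 1, 3, 7, 9]

print(statistics.mode(data))       # first mode encountered: 3
print(statistics.multimode(data))  # all modes: [3, 7]

# The mode also works for qualitative data.
print(statistics.mode(["red", "blue", "red"]))  # red
```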
Arithmetic Mean
-Most popular and useful measure of central location -Called "mean" or "average" -Mean = Sum of measurements / # of observations. In symbols: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ for a sample, or $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ for a population -Represents the "balance point" of a distribution -Influenced by extreme observations -Sum of deviations from the mean is always zero -Excel formula: =AVERAGE(dataset)
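A short Python sketch of the mean on hypothetical data, checking two of the properties listed: the deviations sum to zero, and one extreme value pulls the mean strongly.

```python
def mean(values):
    """Arithmetic mean: sum of measurements divided by number of observations."""
    return sum(values) / len(values)

data = [2, 4, 9]  # hypothetical
m = mean(data)    # 15 / 3 = 5.0
deviations = [x - m for x in data]  # -3.0, -1.0, 4.0
print(m, sum(deviations))  # sum of deviations from the mean is 0

# One extreme observation moves the mean a long way.
print(mean([2, 4, 9, 100]))  # 28.75
```

With non-integer data, floating-point rounding can make the sum of deviations a tiny nonzero number rather than exactly 0.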
Nominal
-Values are named categories with no natural order -A category name can be a number (e.g. a ZIP code)
Qualitative Data types
-Nominal -Ordinal -(aka categorical)
Creating frequency tables and histograms
-Place the cursor anywhere in the data set. -Define the data set by selecting StatTools/ DataSet Manager - Click OK for defaults. -Select the StatTools/SummaryGraphs/ Histogram... ribbon item. -A list of the numerical variables in the data set appears. Select the Proportion variable. -Set the X-axis as categorical or numeric.
Numerical Descriptive Measures
-Precise, objectively determined values to describe populations and samples -Easily manipulated, interpreted and compared -Permit more detailed analysis of data sets
Categorical
-Qualitative -Categorical quality of an observation -Mutually exclusive -Collectively exhaustive
The Median
-Represents the middle value in a series of observations -Middle in terms of order - low to high -Same number of data points above and below -Excel formula: =MEDIAN(dataset) -If N is odd: -median is the middle value -If N is even: -median is the average of the two middle values -Values of the other data points do not factor into the median calculation -Not influenced by extreme values
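The odd/even rule above, written out as a small Python function and tried on hypothetical data:

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if N is even)."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)  # order low to high
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:            # N odd: the middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # N even: average of the two middle values

print(median([7, 1, 3]))        # 3
print(median([7, 1, 3, 10]))    # 5.0
print(median([7, 1, 3, 1000]))  # 5.0 (the extreme value does not change it)
```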
Creating a Time Series Plot
-Select the StatTools/Time Series.../ Time Series Graph ribbon item. -In this example there is only one time-based variable to plot, "Total". -We also have the option of selecting a "date" variable for labeling the horizontal axis (see pull-down menu).
Guidelines for bins/classes
-Should be mutually exclusive -Should be collectively exhaustive For quantitative categories: -8-15 bins works best -Bins should have equal widths (round widths such as 5, 10, or 100 are easier to read)
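A minimal sketch of equal-width binning, assuming half-open bins [lower, lower + width) that start at a multiple of the chosen width so the edges are round numbers. The helper name and the scores are hypothetical; a real data set would usually be large enough to support 8-15 bins, whereas this tiny one only fills 5.

```python
import math

def equal_width_bins(values, width):
    """Group numeric values into equal-width, non-overlapping bins.

    Each bin covers [lower, lower + width), so every value lands in exactly
    one bin (mutually exclusive). Bins run from the smallest to the largest
    value with empty bins kept, so together they cover all the data
    (collectively exhaustive).
    """
    start = math.floor(min(values) / width) * width  # round down to a multiple of width
    n_bins = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[math.floor((v - start) / width)] += 1
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]

scores = [52, 67, 71, 75, 78, 83, 85, 88, 91, 94, 99]  # hypothetical
for lower, upper, count in equal_width_bins(scores, 10):
    print(f"{lower} to <{upper}: {count}")
```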
Pivot Tables
-Statisticians often refer to the resulting tables as contingency tables or crosstabs.
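A pivot table of counts (a crosstab) can be sketched in plain Python by counting each pair of category values. The region/product observations and the `crosstab` helper are hypothetical, for illustration only.

```python
from collections import Counter

def crosstab(pairs):
    """Count observations for every (row category, column category) combination."""
    counts = Counter(pairs)
    row_labels = sorted({r for r, _ in pairs})
    col_labels = sorted({c for _, c in pairs})
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}

# Hypothetical observations: (region, product purchased) for each customer.
rows = [("East", "A"), ("West", "B"), ("East", "A"),
        ("East", "B"), ("West", "A"), ("West", "B")]

for region, row in crosstab(rows).items():
    print(region, row)
```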
Samples
-Subset drawn from the population. -Size and method of choice is critical.
Variance
-The variance is the average of the squared deviations from the mean (the sample variance divides the sum by n - 1 instead of n). -There are two versions of the variance: the population variance and the sample variance. -The variance increases when there is more deviation around the mean -Large deviations from the mean contribute heavily to the variance because they are squared -Hard to relate to, due to squared units
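The two versions of the variance in symbols, using the notation of the mean ($\mu$ and $N$ for a population, $\bar{x}$ and $n$ for a sample); the standard deviation is the square root of each:

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2,
\qquad
\sigma = \sqrt{\sigma^2},\quad s = \sqrt{s^2}
```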
Scatterplots
-We are often interested in the relationship between two variables. -Plot a point for each observation, where the coordinates represent the values of the two variables. -After constructing a scatterplot, we can examine the scatter of points. -We look for any relationship between the two variables. -Is there a tendency for one variable to move in concert with, or in opposition to, the other variable?
Time Series Plots
-When we need to forecast future values of a time series, it is helpful to create a time series plot. -This is essentially a scatterplot, with the time series variable on the vertical axis and the time itself on the horizontal axis. -Also, to make patterns in the data more apparent, the points are usually connected with lines.
Relative Standing
-Where an observation falls in relation to all other observations -A percentile is a value in the data set such that a certain proportion of the data has values less than it -Measures include quartiles, deciles, percentiles, etc. -Breaks the data set into groups of equal size by number of points, not by range of values -Excel formulae: =PERCENTILE(data_array, k) with k a fraction from 0 to 1; =QUARTILE(data_array, quart) with quart from 0 to 4
Percentiles
-Break the data into 100 groups -E.g. the 10th, 50th, 80th, 95th percentiles -The result is a value, not a range
Quartiles
-Break the data into 4 quarters (1st through 4th) -The cut points between them are Q1, Q2 (the median), and Q3
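A sketch of quartiles and percentiles with Python's `statistics.quantiles`. With `method="inclusive"` it interpolates linearly between ranked values, which should match Excel's PERCENTILE and QUARTILE; other tools use other conventions and can give slightly different cut points. The data are hypothetical.

```python
import statistics

data = [15, 20, 35, 40, 50]  # hypothetical

# Quartile cut points Q1, Q2 (median), Q3, by linear interpolation between ranked values.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)  # 20.0 35.0 40.0

# Percentile cut points: 99 values that split the data into 100 groups.
pct = statistics.quantiles(data, n=100, method="inclusive")
print(pct[9], pct[49], pct[79], pct[94])  # 10th, 50th, 80th, 95th: 17.0 35.0 42.0 48.0
```

Each result is a single value, not a range. The 10th percentile, for example, lies 40% of the way from 15 to 20.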
What is QBA?
A collection of methods and applications that help decision-makers turn data into information they, and others, can use.
Negatively Skewed Histograms
A distribution is negatively skewed or skewed to the left if its longer tail is on the left.
Positively Skewed Histograms?
A distribution is positively skewed or skewed to the right if it has a single peak and the values of the distribution extend much farther to the right of the peak than to the left of the peak.
Symmetric Histograms
A distribution is symmetric if it has a single peak and looks approximately the same to the left and right of the peak.
Spreadsheet Model
A fairly recent alternative to the algebraic model. Instead of relating various quantities with algebraic equations and inequalities, we relate them in a spreadsheet with cell formulas. -Immediate feedback from the spreadsheet software
Cross-sectional data
All variables measured at one point in time - a snapshot
Ordinal
Category identifies ranked order of values
Inferential Statistics
Conclusions, estimates or forecasts about a population based on a representative sample data set.
Population
Entire group of interest in a statistical problem.
Linear relationship in a scatter plot?
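The points cluster around a straight line. The correlation coefficient measures how strong that linear pattern is; as a rough rule of thumb, a correlation of about 0.5 or more in magnitude (positive or negative) indicates a noticeable linear pattern.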
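A minimal sketch of the Pearson correlation coefficient, the quantity Excel computes with =CORREL(array1, array2). The advertising and sales figures are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-movement
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

advertising = [1, 2, 3, 4, 5]  # hypothetical
sales = [2, 4, 5, 4, 5]
print(round(correlation(advertising, sales), 3))  # 0.775

# Points exactly on a rising or falling line give r = 1 or r = -1.
print(correlation([1, 2, 3], [2, 4, 6]), correlation([1, 2, 3], [6, 4, 2]))
```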
Times Series Graph main characteristic
Looks like a scatterplot with time on the x-axis -Great for spotting seasonal patterns and changes in a variable over time.
Time-series data
Measure one or more variables at successive points in time.
Descriptive Statistics
Organize, summarize and present data in a meaningful and informative way.
Statistical Symbology
Population parameters -Typically uppercase English or lowercase Greek letters (e.g. N, μ, σ) Sample statistics -Typically lowercase English letters (e.g. n, x̄, s)
Graphical Models
Probably the most intuitive and least quantitative type of model. They attempt to portray graphically how the different elements of a problem are related: what affects what.
Numerical
Quantitative values -Real numbers -Arithmetic calculations valid
Bimodal Distributions
Some distributions have two or more peaks. This often indicates that the data come from two or more distinct populations.