QBA 2302 Exam 1
*Construct and analyze a box and whisker plot*
*Box and Whisker Plot- A graphic display that shows the general shape of a variable's distribution. It is based on five values: the minimum, the first quartile, the median, the third quartile, and the maximum. (The position of the median within the box indicates the skew)*
Quasi-Experiment
(one type of correlational study), they look like experiments on the surface, but lack random assignment of subjects. (Ex. asking for volunteers to participate)
*Display a frequency table using a bar or pie chart.*
*Bar Charts- an illustration of qualitative data representing a variable's categories on the x-axis as independent vertical bars and simple frequencies on the y-axis* & *Pie Chart- an illustration of qualitative data representing a variable's categories as portions of a circle (or slices of a pie).*
Qualitative Variables
Can be translated with either summaries or illustrations
Experimentation
If ________________ is possible, you always want to conduct an experiment (because it will always give you the highest quality answer to your questions)
representative
If a sample is highly _____________, conclusions drawn from the sample are likely to reflect conclusions that would have been drawn from the population.
representative
If a sample is not _____________, conclusions drawn from the sample are unlikely to reflect the population.
"tail"
If the ______ points towards positive numbers, it's positive skew. If the _____ points towards negative numbers, it's negative skew.
Illustrate
If you are trying to translate data for a client or customer, you will usually choose to ___________ the data rather than to summarize them
Illustrations
If you want to deliver a clear, powerful message about a variable, _____________ are the most useful
nominal, ordinal, interval, and ratio (in that order, each is more specific and has more requirements than the one before)
What are the 4 scales of measurement?
temperature in degrees Celsius
What is an example of Interval Data?
Rank data
What is an example of Ordinal Data?
temperature in degrees Kelvin
What is an example of Ratio Data?
addition and subtraction
What kinds of math do Interval Scales use?
multiplication and division
What kinds of math do Ratio Scales use?
robust
When a statistic has fewer assumptions, it is described as being more _________ than statistics with more assumptions.
correlational study
When an experiment is not possible, a researcher often uses a ____________________
Sample Mean
$\bar{x} = \frac{\sum x}{n}$, where $\bar{x}$ = sample mean, $n$ = number of values in the sample, $x$ = any particular value, and $\sum x$ (sigma) = sum of the x values in the sample
precise
You want to use the most __________ statistics whose assumptions are met.
Mean/Arithmetic Mean
____________ is affected by extreme values/outliers
Quantitative variables
______________ are best translated with illustrations.
Scatterplots
_______________ can be used to examine qualitative OR quantitative data, and can even be used to compare quantitative and qualitative data with each other
Representativeness
_________________ can never be addressed with numbers, it is a rational argument.
Normal Distribution
a bell-shaped curved line (a common shape that data take)
Variable
a collection of data with different values based upon its source, represented in a dataset as a column
Fractions
a common way of representing portions of wholes
A frequency
a count of how many times a value appears in a variable within a dataset; symbolized with an italicized "f" and sometimes called a "simple frequency"
Case
a group of data, collected across one or more variables from one source, represented in a dataset as a complete row
Parameter
a measurable characteristic of a population
Datum
a single value collected in the context of research, could be a number/letter/word
Dichotomous data
a special type of nominal data with only two possible values (ex. "male" or "female")
Hypothesis
a testable relationship between operational definitions that reflects a theory
1. You could be fooled by others with their poor translations, 2. You could fool others with your poor translations, 3. You could fool yourself with your poor translations
Reasons not to be misled by a pretty picture:
Theory
Specific beliefs about what might be true in a relationship between constructs.
Arithmetic Mean
The ______________ is the most widely reported measure of location
Central tendency
refers to a family of statistics that can be used to determine where the "middle" of a data set is (the approach you choose to find the "middle" depends on the data's shape --- this dependency is called an "assumption")
Relative Frequency
represented by the term "rel.f" (rel.f = f/n)
n
sample size (the number of cases in a particular sample)
Frequency Polygon
similar to a histogram, also shows the shape of a distribution (these are good to use when comparing two or more distributions)
Subscripts
smaller letters used beneath terms in formulas to denote where a variable came from
Range
the difference between the maximum and minimum values in a set of data (= Maximum Value - Minimum Value)
Treatment Condition
the group you chose to receive the treatment
Control condition
the group you did not choose to receive the treatment, they receive nothing different
Q2
the median of the entire data set
Q1
the median of the lower half of the data
Q3
the median of the upper half of the data
Median
the midpoint of the values after they have been ordered from the minimum to the maximum values
Cumulative Frequency
the running total of cases up to and including a given value, represented as "cum.f"
Class Frequency
the number of observations in each class
Cumulative Percentage
the percentage of cases up to and including a given value, represented as "cum.%"
Random Assignment
the process by which a sample is randomly split into two or more groups during an experiment
Percentage
the value of a proportion multiplied by 100, followed by the percentage symbol (%)
Dispersion
the variation or spread in a set of data
Operational Definition
the way we represent a construct in a dataset (this definition is sometimes referred to as the "operationalization" of a construct)
Relative Frequency Distributions
to find the relative frequencies, simply take the class frequency and divide by the total number of observations
Conditions
two or more groups that a sample is randomly split into during an experiment
Inferential Statistics
used to estimate properties of a population, the methods used to estimate a property of a population on the basis of a sample
Descriptive Statistics
used to organize data into a meaningful form, methods of organizing, summarizing, and presenting data in an informative way
Dataset
when multiple related data are collected in one place
Descriptive & Inferential
What are the 2 types of Statistics?
Likert-type Item
A particular type of question found on surveys
"i" >=
(highest value - lowest value)/k
*Differentiate between descriptive and inferential statistics.*
*Descriptive statistics: The techniques used to describe the important characteristics of a set of data. This includes organizing the data values into a frequency distribution, computing measures of location, and computing measures of dispersion and skewness.* vs. *Inferential statistics: The methods used to estimate a property of a population on the basis of a sample.*
*Summarize quantitative variables with frequency and relative frequency distributions.*
*Frequency Distribution: a grouping of quantitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class*, 4 Steps to make a Frequency Distribution: 1. Decide on the # of classes (use 2^k > n rule), 2. Determine the class interval ("i"), 3. Set the individual class limits, 4. Tally the data into classes and determine the number of observations in each class, *Relative Frequency Distribution: to find the relative frequencies, simply take the class frequency and divide by the total number of observations*
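The first two steps above can be sketched in Python. This is only an illustration, not part of the course material: the function names `number_of_classes` and `class_interval` and the sample numbers are assumptions chosen for the example.

```python
def number_of_classes(n: int) -> int:
    """Return the smallest k such that 2**k > n (the 2^k > n rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def class_interval(high: float, low: float, k: int) -> float:
    """Return the minimum class width: i >= (H - L) / k."""
    return (high - low) / k


# Hypothetical sample: 180 observations ranging from 294 to 3292.
k = number_of_classes(180)
i_min = class_interval(3292, 294, k)
print(k, i_min)
```

In practice the interval is usually rounded up to a convenient value before the class limits are set.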
*Summarize qualitative variables with frequency and relative frequency tables.*
*Frequency Table- A table containing all categories, simple frequencies, relative frequencies and cumulative frequencies for a given variable.* & *Relative Frequency- Count of how many times a value appears in a variable, calculated as a proportion of the total number of cases; the number of observations of a specific outcome compared to the total number of observations in the dataset*... *Did you notice how, in the frequency table, I put the classes in order already from most likely to least likely? This is because this is ordinal data. With ordinal data, make sure you always keep the order of the ordinal data in tables and graphs. If you are dealing with nominal data, I normally encourage you to organize graphic displays from greatest frequency to least frequency in graphs and alphabetically in the tables*
*Compute the mean of grouped data*
*Geometric Mean- Variation of mean used for finding the average change of percentages, ratios, indexes, or growth rates over time*... *the first step is to determine the midpoint (also called a class mark) of each interval, or class. These midpoints must then be multiplied by the frequencies of the corresponding classes. The sum of the products divided by the total number of values will be the value of the mean*
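The grouped-data procedure described above (midpoint times frequency, summed, divided by the total count) might look like this in Python. The function name `grouped_mean` and the class limits are hypothetical, chosen only to show the arithmetic.

```python
def grouped_mean(classes):
    """Estimate the mean of grouped data.

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples.
    """
    total = sum(f for _, _, f in classes)
    # Multiply each class midpoint by its frequency, then add the products.
    weighted_sum = sum(((low + high) / 2) * f for low, high, f in classes)
    return weighted_sum / total


# Hypothetical classes: midpoints 5, 15, 25 with frequencies 2, 5, 3.
example = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]
print(grouped_mean(example))
```

The result is an estimate, because every observation in a class is treated as if it sat at the class midpoint.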
*Display a frequency distribution using a histogram or frequency polygon.*
*Histograms- an illustration of quantitative data representing the range of a variable's values on the x-axis and frequencies of those ranges on the y-axis with no gaps between the bars.*, *Frequency Polygon- an illustration of quantitative data representing bins as segments on a line graph*, *(Bin- A range of values represented as a single bar in a histogram)*
*Compute and interpret the mean, the median, and the mode*
*Median- the midpoint of the values after they have been ordered from the minimum to the maximum values*, *Mode- the value of the observation that appears most frequently* , *Arithmetic Mean- (This is what is typically referred to as the mean) the average of a set of numerical values, calculated by adding all values together and dividing by the number of values in a set*
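As a quick check of these three definitions, here is a small sketch using Python's standard `statistics` module. The data values are made up for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical values

mean = statistics.mean(data)      # sum of values divided by the count
median = statistics.median(data)  # midpoint of the ordered values
mode = statistics.mode(data)      # most frequent value

# The mode is not always unique; multimode returns every tied value.
all_modes = statistics.multimode([1, 1, 2, 2])

print(mean, median, mode, all_modes)
```

With an even number of values, the median is the average of the two middle values (here 3 and 5).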
*Distinguish between nominal, ordinal, interval, and ratio levels of measurement. (And Dichotomous)*
*Nominal level of measurement: Data recorded at the nominal level of measurement is represented as labels or names. They have no order. They can only be classified and counted.*, *Ordinal level of measurement: Data recorded at the ordinal level of measurement is based on a relative ranking or rating of items based on a defined attribute or qualitative variable. Variables based on this level of measurement are only ranked or counted.*, *Interval level of measurement: For data recorded at the interval level of measurement, the interval or the distance between values is meaningful. The interval level of measurement is based on a scale with a known unit of measurement.*, *Ratio level of measurement: Data recorded at the ratio level of measurement are based on a scale with a known unit of measurement and a meaningful interpretation of zero on the scale.*
*Identify and compute measures of position*
*Percentile - a portion of a whole represented as its share of 100 (eg 50th percentile)*, *Decile - a portion of a whole represented as its share of 10 (eg 50th percentile is equal to 5th decile)*, *Quartile - a portion of a whole represented as its share of 4 (eg 50th percentile is equal to 5th decile is equal to 2nd quartile or median)*
*Classify variables as qualitative or quantitative, and discrete or continuous.*
*Qualitative variables: When data refers to qualities usually represented in words or letters.*, *Quantitative variables: when a variable can be reported numerically and mathematic functions can be performed*, *Discrete variable: can assume only certain values, and there are "gaps" between the values*, *Continuous variable Can assume any value within a specific range.*
*Construct a scatterplot for 2 quantitative variables*
*Scatterplot- an illustration of two qualitative or quantitative variables where one variable is represented on the x-axis and the other variable is represented on the y axis. Each case is represented with a mark at the intersection of its scores on those two variables* 2^k > n, where k = about the number of bins you should use, n = the number of observations in the sample, i = the width of the bin, H = the highest value, L = the lowest value
*Describe the Shape of Data*
*Shape- Distribution shape is characterized by the number of peaks, symmetry, skewness and uniformity.*, *Normal- a common shape in which data are found resembling a bell or hill*, *Uniform- Data is spread equally across the entire range.*, *Symmetric-the data on the left side of the distribution perfectly mirrors the data on the right side of the distribution. If folded in half, it would be a mirror itself*
Geometric Mean
*The ______________ is used to find the rate of change from one period to another*, GM = [the nth root of (value at end of period/value at start of period)] - 1
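The rate-of-change formula above can be sketched in Python as follows. The function name `average_rate_of_change` and the figures are assumptions, and `n` is taken to be the number of periods between the start and end values.

```python
def average_rate_of_change(start_value, end_value, n):
    """GM = nth root of (end / start), minus 1."""
    return (end_value / start_value) ** (1 / n) - 1


# Hypothetical example: a value grows from 100 to 121 over 2 periods.
rate = average_rate_of_change(100, 121, 2)
print(round(rate, 4))
```

Applying the resulting rate for `n` periods in a row takes the start value to the end value, which is what makes it an average rate.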
*Define statistics and provide an example of how statistics is applied.*
*The science of collecting, organizing, analyzing, and interpreting data for the purpose of making more effective decisions.* How Statistics is Applied -->
*Compute a weighted mean.*
*Weighted Mean- Variation of arithmetic mean used when observations carry different weights (contributions to overall average).* $\bar{x}_w = \frac{w_1x_1 + w_2x_2 + \dots + w_nx_n}{w_1 + w_2 + \dots + w_n}$
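A minimal Python sketch of the weighted mean: the function name `weighted_mean` and the numbers are hypothetical (for instance, grade points weighted by credit hours).

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical example: grade points 4.0, 3.0, 2.0 with credit hours 3, 3, 4.
print(weighted_mean([4.0, 3.0, 2.0], [3, 3, 4]))
```

When all the weights are equal, this reduces to the ordinary arithmetic mean.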
Scatter Diagram
*a graphical tool to portray the relationship between two variables or bivariate data (both variables have to be measured with interval or ratio level scale -- must be quantitative)* - If the scatter of points moves from the *lower left to the upper right*, the variables under consideration are *directly or positively related* - If the scatter of points moves from the *upper left to the lower right*, the variables are *inversely or negatively related* - If the scatter of points are *all over the place*, there is *no correlation*
Frequency Table
*a grouping of qualitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class*, *Mutually exclusive = the data fit in just one class (ex. can't have Waco and Texas)*, *Collectively exhaustive = there is a class for each value*
Coefficient of Skewness
*a measure of the symmetry of a distribution* - Pearson's Coefficient of Skewness: sk = 3(Mean - Median)/s - The coefficient of skewness can range from -3 to +3 - A value near -3 indicates considerable negative skewness - A value of 1.63 indicates moderate positive skewness - A value of 0 means the distribution is symmetrical
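Pearson's coefficient can be computed directly from the formula above. This sketch assumes `s` is the sample standard deviation (Python's `statistics.stdev`). The function name and data are illustrative only.

```python
import statistics


def pearson_skewness(data):
    """sk = 3 * (mean - median) / s, with s the sample standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)
    return 3 * (mean - median) / s


symmetric = [1, 2, 3, 4, 5]      # mean equals median
right_tailed = [1, 2, 3, 4, 10]  # one large value pulls the mean up

print(pearson_skewness(symmetric))
print(pearson_skewness(right_tailed))
```

The second data set gives a positive value because its mean (4) is greater than its median (3).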
Bar Chart
*illustrates data by displaying categories on the x-axis as independent bars and simple frequencies on the y-axis*, The most common illustration for qualitative data, they represent simple frequencies as the height of a bar
Pie Chart
*illustrates data by displaying frequencies and relative frequencies or percentages as portions of a circle*, unless you specifically want this feature of a pie chart, you should stick with a bar chart; each number on a pie slice represents "f", while the percentage represents "rel.f" expressed as a percentage
Scatterplots
*place one variable of interest on the x-axis and a second variable of interest on the y-axis. This allows the viewer to see the relationship between the two variables*, used when you want to compare two variables to see how one variable behaves in relation to another (ex. as one variable increases, what does the other do), each point in a scatterplot represents the scores of two variables for a single case
Frequency Polygons
*similar to histograms in that they contain the same information (frequencies are represented on the y-axis, and bins are computed to group the data into manageable chunks), but instead of using bars to represent frequencies with bins, a line is drawn*, do not illustrate data as clearly as histograms, so unless you have a specific reason to use a polygon, you should use a histogram (ex. of a specific reason would be to compare two frequency distributions)
Histogram
*strongly resemble bar charts, but with no gaps between bars*, the most common illustration for a single quantitative variable, there are no gaps between the bars because each bar does not represent a unique and discrete variable, instead each bar represents a range of values called a bin
Geometric Mean
*the nth root of the product of n positive values*, $GM = \sqrt[n]{(x_1)(x_2)\cdots(x_n)}$
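A short Python sketch of this definition follows. The function name `geometric_mean` is an assumption for illustration. The standard library also offers `statistics.geometric_mean` (Python 3.8+), which should give the same result.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.prod(values) ** (1 / len(values))


# Hypothetical example: the product 2 * 8 = 16, and its square root is 4.
print(geometric_mean([2, 8]))
```

For the same values the arithmetic mean is 5, so the geometric mean is the smaller of the two here.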
Arithmetic Mean (Mean)
- Must be interval or ratio level - All values are included in computing the mean - The mean is unique (only one per data set) - The sum of the deviations of each value from the mean is zero
Median
- Not affected by extremely large or small values - Can be computed for ordinal level data or higher
Mode
- Not always unique - Used to find most typical value - Can be used on all levels of measurement
Major Characteristics of the Variance:
- all observations are used in the calculation - the units are somewhat difficult to work with (they are the original units squared) - the units are in real world metrics - it cannot be negative
Major Characteristics of the Range:
- only 2 values are used in its calculation - it is influenced by extreme values - it is easy to compute and understand
one
How many populations are present in an experiment?
How to Create a Frequency Table:
1. Get a list of every unique value in your qualitative variable & put them in order (if your data are text based -- ex. apple, banana, this should be alphabetical order; if numeric, you should use numeric order), 2. Once you have your list of values in a meaningful order, determine the frequencies and relative frequencies of each of those values, and enter them into your table
Steps to Find the Standard Deviation:
1. Is it a sample or a population? 2. Find the average 3. Subtract the average from each observation (this gives you each observation's "deviation" from the average) 4. Square each "deviation" to give you the squared deviation from the mean 5. Add all the squared deviations to give you the sum of squared deviations from the mean 6. Divide by N if it is a population. Divide by (n-1) if it is a sample. 7. Take the square root of the quotient (the square root of the quotient is the standard deviation, roughly the typical distance of an observation from the mean)
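The steps above translate almost line for line into Python. This is a sketch for study purposes. The function name `standard_deviation`, the `is_sample` flag, and the data are assumptions.

```python
import math


def standard_deviation(data, is_sample=True):
    """Follow the listed steps to compute a standard deviation."""
    n = len(data)
    mean = sum(data) / n                          # step 2: find the average
    deviations = [x - mean for x in data]         # step 3: deviation of each value
    squared = [d ** 2 for d in deviations]        # step 4: square each deviation
    sum_of_squares = sum(squared)                 # step 5: add them up
    divisor = n - 1 if is_sample else n           # steps 1 and 6: sample or population
    return math.sqrt(sum_of_squares / divisor)    # step 7: square root


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5
print(standard_deviation(data, is_sample=False))
print(standard_deviation(data, is_sample=True))
```

The value before the square root in step 7 is the variance, which is why the variance is expressed in squared units.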
Common Measures of Location/Central Tendency:
1. Mean, 2. Median, 3. Mode
How to Construct a Frequency Table:
1. Sort data into classes, 2. Count the number in each class and report as the class frequency, 3. Convert each frequency to a relative frequency
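These three steps could be sketched in Python with `collections.Counter`. The function name `frequency_table` and the fruit data are made up. A running cumulative frequency (cum.f) is included as well, since the notes list it as a column of a frequency table.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (category, f, rel_f, cum_f), sorted by category."""
    counts = Counter(values)  # steps 1 and 2: sort into classes and count
    n = len(values)
    rows = []
    cumulative = 0
    for category in sorted(counts):
        f = counts[category]
        cumulative += f
        rows.append((category, f, f / n, cumulative))  # step 3: rel.f = f / n
    return rows


fruit = ["apple", "banana", "apple", "cherry", "apple", "banana"]
for row in frequency_table(fruit):
    print(row)
```

The categories are sorted alphabetically here. For ordinal data you would keep the variable's own order instead.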
an experiment, a quasi-experiment, or a correlational study
3 Major Approaches to Testing a Hypothesis in a Business Context:
Median
50% of values are larger than this, and 50% of values are lower than this
Construct
A characteristic or property of interest in a population that cannot be measured directly *a proposed attribute of a person that often cannot be measured directly, but can be assessed using a number of indicators*
Likert-type Scale
A grouping of multiple Likert-type Items (Ex. "Strongly Agree, Agree, Disagree")
Statistic
A measurable characteristic of a sample
population, sample
All = ______________, Some = ______________
guessing
Basing decisions on faulty translations is no better than ____________
Formulas
Constants are usually used in ______________
*Explain why knowledge of statistics is important.*
Data are collected everywhere and statistical knowledge is required to make that information useful, statistical techniques are used to make professional and personal decisions, a knowledge of statistics is needed to understand the world and your career *Statistics will help you make more effective personal and professional decisions*
multiple
Data will be either Nominal, Ordinal, Interval, or Ratio (one of the 4), but not ______________!
Skew
Data with ________ are data that deviate from the normal distribution.
Histogram
Each class is depicted as a rectangle, with the height of the bar representing the number in each class
assumptions
For a statistic to be meaningful, its ________________ must be met.
The Empirical Rule (aka the "68-95-99.7 Rule")
For a symmetrical, bell-shaped frequency distribution, approximately 68% of the observations will lie within plus/minus one standard deviation of the mean, about 95% of the observations will lie within plus/minus two standard deviations of the mean, and practically all (99.7%) will lie within three standard deviations of the mean
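To see what the rule means for a particular variable, the three intervals can be computed from a mean and standard deviation. In this sketch the function name `empirical_rule_intervals` and the values 100 and 15 are assumptions for illustration.

```python
def empirical_rule_intervals(mean, sd):
    """Return the mean +/- 1, 2, and 3 standard deviation intervals."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }


# Hypothetical bell-shaped variable with mean 100 and standard deviation 15.
for share, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"about {share} of observations fall between {low} and {high}")
```

These percentages only apply when the distribution is symmetrical and bell-shaped.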
Discrete Data
Have only specific meaningful values (Ex. the number of applicants for a specific position -- you can't have a fraction of a person)
one
How many populations are present in a correlational study?
two or more
How many populations are present in a quasi-experiment?
Variables
In formulas, fractions are used to illustrate algebraic relationships between ___________ instead of portions of whole.
Ratio
Interval data with a meaningful zero point -- they have all the properties of interval data (meaningful ordered labels with consistent distances between values) plus a meaningful zero. (quantitative) *Based on a scale with a known unit of measurement and a meaningful interpretation of zero. Indicates difference, direction of difference, amount of difference, and has an absolute zero * ~ Can be classified, ranked, counted, added, subtracted, multiplied, and divided ~
nominal
It is inappropriate to use the median with ____________ data, because this type of data have no order.
Population Mean
$\mu = \frac{\sum x}{N}$, where $\mu$ (mu) = population mean, $N$ = number of values in the population, $x$ = any particular value, and $\sum x$ (sigma) = sum of the x values in the population
level of the subject
Random assignment must take place at the _____________________, or it is still a quasi-experiment (not a real experiment)
Not starting with zero on a chart/graph (this exaggerates the differences between regions)
One of the most common ways to mislead people with graphics
Interval
Ordinal data with meaningful distances between values -- they have all the properties of ordinal data (meaningful labels that are ordered) plus meaningful distances (quantitative) *Distance between values is meaningful. This level of measurement is based on a scale with a known unit of measurement. Indicates difference, direction of difference, and amount of difference* ~ Can be classified, ranked, counted, added, subtracted ~
Lower Outlier Limit =
Q1 - 1.5 (IQR)
Upper Outlier Limit =
Q3 + 1.5 (IQR)
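The two outlier limits can be sketched in Python as below. The quartiles follow the "median of the lower half / upper half" definitions used in these notes, and IQR is taken as Q3 - Q1. The function names and data are hypothetical, and other quartile conventions (including those used by some software) can give slightly different numbers.

```python
import statistics


def quartiles(data):
    """Q1, Q2, Q3 as medians of the lower half, the whole set, and the upper half."""
    ordered = sorted(data)
    n = len(ordered)
    lower_half = ordered[: n // 2]         # excludes the middle value when n is odd
    upper_half = ordered[(n + 1) // 2:]
    return (
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
    )


def outlier_limits(data):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


data = [1, 2, 3, 4, 5, 6, 7, 8, 50]  # hypothetical values
low, high = outlier_limits(data)
outliers = [x for x in data if x < low or x > high]
print(low, high, outliers)
```

Any value below the lower limit or above the upper limit is flagged as an outlier.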
discrete
Qualitative Data (nominal and ordinal) are always ____________
discrete or continuous
Quantitative Data (interval and ratio) may be ___________________
backup choice
Quasi-Experiments and Correlational Studies are always a _________________ (but are sometimes the only choice)
sex, gender, race
What are some examples of Nominal Data?
Proportions
The decimal equivalents of the fractional values
Rational argument
The link between a construct and an operational definition, just like the link between a population and a sample, cannot be demonstrated with numbers, it too is a ___________________
Experiment
The most rigorous approach, the only approach that allows you to conclude something causes something else. Unfortunately it is also the most difficult approach to implement.
Statistics
The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions
Skewness
The shape of data - Symmetric (left side matches the right) - Positively Skewed (the mean is greater than the median, tapers off to positive numbers) - Negatively Skewed (the mean is lower than the median, tapers off to negative numbers) - Bimodal (higher probability on the exterior of the distribution, lower probability on the interior of the data)
Populations
The theoretical group that you want to draw conclusions about
Mode
The value of the observation that occurs the most frequently
Median
This is unique to a set of data
Proportions
To avoid confusion, most statisticians writing formulas represent most fractions as _____________
Summaries
Translating many numbers into fewer numbers
Illustrations
Translating many numbers into pictures
1. Bar Charts, 2. Pie Charts
Two Common Illustrations for Qualitative Data:
1. Histograms, 2. Frequency Polygons
Two Most Common Illustrations for Quantitative Variables:
Datasets
Variables are usually parts of ______________
causality
We can never make conclusions about __________ from correlational studies
Interval
We will consider survey data to be ____________
Correlational study
any study where multiple variables are measured at the same time without special treatment by the researcher
Continuous Data
can be subdivided infinitely without losing their meaning (Ex. money)
Negative Skew
data trails off to the left
Positive Skew
data trails off to the right
Nominal
data with meaningful labels -- the symbols used as data represent something in the real world (qualitative) *Data recorded is represented as labels or names. They have no order. They can only be classified or counted, indicates difference* ~ Can be classified & counted ~
Bar Charts have gaps between the bars because the data they represent are _____________ values
discrete, unique
Quantitative Data
quantities (usually represented as numbers)
Relative Frequency
each of the class frequencies is divided by the total number of observations, shows the fraction of the total number of observations in each class
Chebyshev's Theorem
for any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least 1-1/k^2, where k is any value greater than 1
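Chebyshev's bound is easy to evaluate for a given k. The function name `chebyshev_lower_bound` below is an assumption used only for this sketch.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


for k in (2, 3):
    print(k, chebyshev_lower_bound(k))
```

For k = 2 the bound is at least 75%, which is lower than the roughly 95% quoted by the 68-95-99.7 rule, because Chebyshev's theorem makes no assumption about the shape of the distribution.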
Samples
groups of subjects drawn from a population at random; because samples are much smaller than populations, we can more realistically collect measurements from the samples
"That doesn't look right" sense
helps you avoid making silly mistakes
Formulas
how statistical concepts are concisely represented
Weighted Mean
is found by multiplying each observation (x) by its corresponding weight (w)
Cumulative Frequency Distribution
is used to determine the number of observations that lie above or below a particular value in a dataset
2^k > n rule
k is the number of classes, n is the number of observations in the dataset
Qualitative Data
labels or qualities (usually represented as words or letters)
Data
more than one datum
Ordinal
nominal data with a meaningful order -- they have all the properties of nominal data (meaningful labels) plus ordering (qualitative) *Based on relative ranking or rating of items based on a defined attribute or qualitative variable. Variables on this level of measurement are only ranked or counted. Indicates difference and direction of difference* ~ Can be classified, counted, and ranked ~
N
population size (the total number of cases within a particular population)