Business Stats Test 1
Relative Frequency Distribution shows
the fraction or percentage of observations in each class interval
Advantage of stem and leaf over frequency distribution
the identity of each observation is not lost, which helps in understanding the distribution
Median
the midpoint of the values after they have been ordered from the minimum to the maximum values
5 statistics needed to construct a box plot
the minimum value, Q1, the median, Q3, and the maximum value
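A minimal Python sketch of these five statistics. The function name `five_number_summary` is hypothetical, and quartile conventions differ between textbooks and software; this sketch assumes the (n + 1) position method, which is the default in `statistics.quantiles`.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # Assumption: quartiles are located with the (n + 1) position method,
    # the default ("exclusive") method of statistics.quantiles.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, q2, q3, max(values))

print(five_number_summary([2, 4, 6, 8, 10, 12, 14]))
# (2, 4.0, 8.0, 12.0, 14)
```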
Quantitative Variable
information is reported numerically ex: balance in your checking account, minutes remaining in class
An advantage of a cumulative frequency polygon over a histogram
it can show the total number of observations up to a particular class boundary
Advantage of the standard deviation over the variance
it is in the same units as the data
Mode
the value of the observation that appears most frequently
Raw Data
ungrouped data which is to be organized into a frequency distribution
Properties of median
unique for each data set; not affected by extremely large or small values; valuable measure of central tendency; can be computed for an open-ended frequency distribution
Why does sample variance use one less than sample size
dividing by the sample size n tends to underestimate the population variance; dividing by n - 1 corrects this
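A small Python sketch comparing the two divisors on the same made-up data. The function name `variance` and the sample values are assumptions for illustration only.

```python
def variance(values, sample=True):
    """Variance with divisor n - 1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

data = [2, 4, 6, 8]                  # made-up values, mean is 5
print(variance(data, sample=False))  # 5.0 (divide by n)
print(variance(data))                # 6.666666666666667 (divide by n - 1)
```

Dividing by n - 1 always gives the larger of the two results, which is the direction of the correction.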
What is the purpose of a measure of location
to pinpoint the center of a distribution of data
What characteristic of a data set makes the median the best measure of the center of the data?
One or two very large or very small values
Quantitative Variable Classification
Discrete and Continuous
Box Plot
A graphical display, based on quartiles, that helps us picture a set of data
Shape
characteristic of distribution
Coefficient of skewness range
-3 up to 3
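The card does not say which coefficient is meant. Assuming it is Pearson's coefficient of skewness, sk = 3(mean - median)/s, which is the one with this range, a rough Python sketch (the function name is hypothetical):

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return 3 * (mean - median) / s

# Made-up data with one large value pulling the mean above the median.
print(pearson_skewness([1, 2, 3, 4, 10]) > 0)  # True (positively skewed)
```

For this data the value is roughly 0.85; a symmetric data set gives 0.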
Formula to determine the number of classes
2^k>n
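A short Python sketch of the rule: pick the smallest k for which 2^k exceeds the number of observations n. The function name `number_of_classes` is hypothetical.

```python
def number_of_classes(n):
    """Smallest k such that 2 ** k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

print(number_of_classes(80))  # 7, because 2**6 = 64 is not > 80 but 2**7 = 128 is
```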
Steps to Relative Frequency Table
Divide the data into classes; count the observations in each class; calculate the fraction of observations in each class
Pie Chart
A chart that shows the proportion or percent that each class represents of the total number of frequencies
Inferential Statistics
A decision, estimate, prediction, or generalization about a population based on a sample.
Bar Chart
A graph that shows qualitative classes on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are proportional to the heights of the bars
Histogram
A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars.
Frequency Table
A grouping of qualitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class
Frequency Distribution
A grouping of quantitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class
Parameter
A measurable characteristic of a population, such as the mean or dispersion
Statistic
A measurable characteristic of a sample
Two major characteristics of mean
ALL values are used; the sum of the deviations from the mean is 0
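A quick check of the second property in Python, using made-up values:

```python
values = [3, 8, 4]                       # made-up data
mean = sum(values) / len(values)         # uses every value
deviations = [x - mean for x in values]  # distance of each value from the mean

print(mean, deviations, sum(deviations))
# 5.0 [-2.0, 3.0, -1.0] 0.0
```

With values that are not exactly representable in floating point, the sum may come out as a tiny nonzero number rather than exactly 0.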
Frequency Polygon
Consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies
Frequency Distribution Steps
Decide the # of classes; determine the class width; set the individual class limits; tally the # of observations in each class
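The four steps could be sketched in Python roughly as follows. This is one possible set of choices, not the only one: it assumes integer data, picks the number of classes with the 2^k > n rule, rounds the width up to a whole number, starts the first class at the minimum, and lets each class include its lower limit but not its upper limit. The function name is hypothetical.

```python
import math

def frequency_distribution(values):
    """Return a list of (lower limit, upper limit, frequency) tuples."""
    n = len(values)
    # Step 1: number of classes, smallest k with 2 ** k > n.
    k = 1
    while 2 ** k <= n:
        k += 1
    # Step 2: class width, at least (H - L) / k, rounded up.
    low, high = min(values), max(values)
    width = math.ceil((high - low) / k)
    if low + k * width <= high:  # make sure the largest value fits
        width += 1
    # Steps 3 and 4: set the limits and tally the observations.
    counts = [0] * k
    for x in values:
        counts[(x - low) // width] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i]) for i in range(k)]

data = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31]  # made-up data
print(frequency_distribution(data))
# [(12, 17, 2), (17, 22, 2), (22, 27, 3), (27, 32, 3)]
```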
Three measures of location
Arithmetic mean, median, mode
Measures of location are referred to as
Averages
Relative Frequency
Captures the relationship between a class total and the total number of observations
Empirical Rule
For a symmetrical, bell-shaped frequency distribution, approximately 68% of the observations will lie within plus and minus one standard deviation of the mean; about 95% of the observations will lie within plus and minus two standard deviations of the mean; and practically all (99.7%) will lie within plus and minus three standard deviations of the mean.
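A small Python sketch that turns the rule into intervals for a given mean and standard deviation. The function name and the example numbers (mean 100, standard deviation 15) are assumptions for illustration, and the percentages only apply when the distribution is symmetrical and bell-shaped.

```python
def empirical_rule_intervals(mean, sd):
    """Map each approximate coverage to the interval mean +/- k * sd."""
    coverage = {1: "about 68%", 2: "about 95%", 3: "about 99.7%"}
    return {label: (mean - k * sd, mean + k * sd) for k, label in coverage.items()}

for label, interval in empirical_rule_intervals(100, 15).items():
    print(label, interval)
# about 68% (85, 115)
# about 95% (70, 130)
# about 99.7% (55, 145)
```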
Common Displays of Frequency Distribution
Histograms, frequency polygons, cumulative frequency distributions
Types of Variables
Qualitative and Quantitative
Measures of position
Quartiles, deciles, percentiles
Three measures of dispersion
Range, Variance, Standard Deviation
Mode is especially useful in summarizing what kind of data
Nominal-level
Class Interval
Obtained by subtracting the lower limit of a class from the lower limit of the next class
Which two of the following practices are commonly used in setting class limits for a frequency distribution?
Placing "excess" interval width equally in the two tails of the distribution; rounding the class size up
Stem-and-leaf Display
A statistical technique used to display quantitative information: the leading digit(s) form the stem (placed along the vertical axis) and the trailing digit forms the leaf (placed along the horizontal axis)
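A rough Python sketch of building such a display. It assumes non-negative whole numbers with a leaf unit of 1, so the last digit is the leaf and everything before it is the stem; the function name and data are made up.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values into {stem: [leaves]}."""
    stems = defaultdict(list)
    for x in sorted(values):
        stems[x // 10].append(x % 10)  # stem = leading digits, leaf = last digit
    return dict(stems)

display = stem_and_leaf([96, 93, 88, 117, 127, 95, 113, 96, 108, 94])
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
#  8 | 8
#  9 | 3 4 5 6 6
# 10 | 8
# 11 | 3 7
# 12 | 7
```

Every original value can be read back from the display, which is the advantage over a frequency distribution.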
Two ethical approaches to the use of statistics
requires objective and honest communication of results; must maintain an independent and principled point of view
Arithmetic Mean
requires the interval scale and is calculated by summing the values and dividing by the # of values
Class Midpoint
Halfway between the lower limits of two consecutive classes, found by averaging those two lower limits
Class Frequency
The number of observations in each class
Dispersion
The variation/spread in data
Negatively Skewed Distribution
There are a small number of observations that are much lower in value than most of the data
A frequency table shows what kind of data
qualitative (nominal)
Population
a collection of all possible individuals, objects, or measurements of interest
Sample
a portion, or part, of the population of interest
In sample variance, dividing by n-1 corrects what
a tendency to underestimate population variance
A frequency distribution groups..
quantitative data into classes showing the number of observations per class
What kind of data is shown in a histogram
quantitative data/variables (interval or ratio level)
Ratio Level
the highest level of measurement; practically all quantitative data are at this level. data classifications are ordered according to the amount of the characteristic they possess, equal differences in the characteristic are represented by equal differences in the numbers assigned to classifications, and the zero point and the ratio between two values are meaningful
two reasons to study dispersion
allows the comparison of the spread in two or more distributions; a small value for dispersion indicates that the data are closely clustered around the center
To divide data with a high value of H and a low value of L into k classes, the interval must be
at least (H-L)/k
Continuous Variable
can assume any value within a specified range ex: the pressure in a tire, the height of students in a class
Discrete Variables
can only assume certain values and there are usually "gaps" between values ex: the number of bedrooms in a house
Ordinal Level
data can be ranked or arranged in some order, but the differences between the data values cannot be determined or are meaningless
Interval Level
data classifications are ordered according to the amount of the characteristic they possess; equal differences in the characteristic are represented by equal differences in the measurements
Nominal Level
data that is classified into categories and cannot be arranged in any particular order
Level of Measurement
dictates the calculation that can be done to summarize and present the data. used to determine the statistical tests that should be performed on the data
example of ordinal
during a taste test of 4 soft drinks, Mellow Yellow was ranked number 1, Sprite number 2, Seven Up number 3, and Orange number 4
Convert a frequency distribution to relative frequency
each class frequency is divided by total number of observations
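A minimal Python sketch of the conversion. The function name and the class labels and counts are made up for illustration.

```python
def relative_frequencies(class_frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(class_frequencies.values())
    return {label: f / total for label, f in class_frequencies.items()}

print(relative_frequencies({"A": 10, "B": 30, "C": 60}))
# {'A': 0.1, 'B': 0.3, 'C': 0.6}
```

The relative frequencies always add up to 1 (or 100%).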
A frequency distribution displays info of what level of data
ratio (quantitative)
example of nominal
eye color, gender, religious affiliation
Population Mean
for ungrouped data the sum of all the population values divided by the total number of population values
Sample mean
for ungrouped data, is the sum of all the sample values divided by the number of sample values
Dot plots
groups the data as little as possible, so the identity of an individual observation is not lost; each observation is displayed as a dot along a horizontal number line; used for smaller sets of data
Formula to determine the class interval/width
i ≥ (H - L)/k
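A short Python sketch of the formula, where H is the highest value, L the lowest value, and k the number of classes. Rounding up to the next whole number is an assumption here; in practice the width is often rounded up further to a convenient number. The function name is hypothetical.

```python
import math

def class_width(high, low, k):
    """Smallest whole-number width i with i >= (H - L) / k."""
    return math.ceil((high - low) / k)

print(class_width(98, 21, 6))  # 13, since (98 - 21) / 6 is about 12.83
```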
Histogram has what advantage over the frequency polygon
it shows the class width directly as a rectangle with the height representing the # of observations
Three weaknesses of the Range
may be unduly influenced by an unusually large value; may be unduly influenced by an especially small value; only two values from the data set are used
Why is dispersion important
measures of location do not tell us about the spread or clustering of data
what is the best measure of "average" income in a country where most of the households have annual incomes of about 40,000 but a small number of households have incomes above 1,000,000
median
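A small Python illustration of why. The ten households are made up to match the description: nine with incomes of 40,000 and one with 1,000,000.

```python
import statistics

incomes = [40_000] * 9 + [1_000_000]  # made-up data matching the question

print(statistics.mean(incomes))    # 136000, pulled up by the one large value
print(statistics.median(incomes))  # 40000.0, the typical household
```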
Descriptive Statistics
methods of organizing, summarizing, and presenting data in an informative way
example of ratio
monthly income of surgeons, or distance traveled by manufacturers representatives per month
three reasons the mode is not a good measure of average
no observation occurs more than once; the data is bimodal; the most frequent observation is much higher or lower than most of the data values
Four levels of measurement
nominal, ordinal, interval, ratio
"frequency" in a frequency distribution refers to what
number of "observations" in each of the classes into which the data is divided
Mean and standard deviation calculation for grouped data is..
only an estimate of the corresponding actual value
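A rough Python sketch of the grouped-data calculation, assuming the usual approach of letting each class midpoint M stand in for every observation in its class: mean = sum(f * M) / n and s = sqrt(sum(f * (M - mean)^2) / (n - 1)). The function name and the numbers are made up.

```python
import math

def grouped_mean_and_sd(midpoints, frequencies):
    """Estimate the mean and sample standard deviation from grouped data."""
    n = sum(frequencies)
    # Each class midpoint stands in for every observation in that class.
    mean = sum(f * m for m, f in zip(midpoints, frequencies)) / n
    ss = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies))
    return mean, math.sqrt(ss / (n - 1))

mean, sd = grouped_mean_and_sd([10, 20, 30], [2, 5, 3])
print(mean)  # 21.0
```

Because the individual values inside each class are unknown, the results are estimates; here the standard deviation comes out at roughly 7.38.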
Variance
overcomes the problem of negative deviations by squaring them; uses all of the values in a data set
a relative frequency converts the class frequency to what
percentage or proportion
Why take sample instead of population
prohibitive cost of a census; destruction of the item being studied may be required; not possible to test or inspect all members of the population being studied
What kind of data can a median be computed for
ratio, interval, and ordinal
4 Shapes
symmetric, positively skewed, negatively skewed, bimodal
dispersion
tells us about the spread of data
example of interval
temperature on the Fahrenheit scale, women's dress sizes
How is the number of observations for each class plotted on a frequency polygon
the # on the vertical axis and the class midpoint on the horizontal axis
Zero Point
the value 0 represents the absence of the characteristic, so the ratio between two numbers is meaningful; a property of ratio-level data
Qualitative Variable
the characteristic being studied is nonnumeric ex: gender, religion, type of automobile owned, state of birth, eye color
What info is on the vertical axis of a bar chart
the class frequencies or the relative class frequencies
when is the mode used to measure the "average" of a set of data
the data is symmetrically distributed but has one very high value
Weighted mean
the denominator is always the sum of the weights; used with data that has repeated values, such as a frequency distribution
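A minimal Python sketch: multiply each value by its weight, add the products, and divide by the sum of the weights. The function name and the numbers are made up.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Made-up example: the value 10 occurs 3 times, 20 once, 30 once.
print(weighted_mean([10, 20, 30], [3, 1, 1]))  # 16.0
```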
Range
the difference between the largest and the smallest values in a data set
Sample Standard Deviation
used as an estimator of the population standard deviation
Geometric Mean
useful in finding the average change of percentages, ratios, indexes, or growth rates over time; it will always be less than or equal to the arithmetic mean
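A short Python sketch, assuming the usual definition for positive values: the nth root of the product of n values. The function name and the growth factors (5% and 15% increases written as 1.05 and 1.15) are made up.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

growth_factors = [1.05, 1.15]  # made-up: a 5% increase, then a 15% increase
gm = geometric_mean(growth_factors)
am = sum(growth_factors) / len(growth_factors)

print(gm <= am)  # True
```

Here the geometric mean is a little under 1.10, the arithmetic mean.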
Piled on dots
when there are identical observations that are too close to be shown individually, the dots are piled on top of each other