Business Stats 1

Quantitative Data

indicate how many or how much: 1) discrete, if measuring how many; 2) continuous, if measuring how much. Quantitative data are always numeric. Ordinary arithmetic operations are meaningful for quantitative data.

coefficient of variation

indicates how large the standard deviation is in relation to the mean. In words: the standard deviation is about __% of the mean
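
As a rough illustration of the "__% of the mean" wording, the sketch below computes the coefficient of variation as (standard deviation / mean) × 100. The data values and the function name are hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    return statistics.stdev(values) / mean * 100

data = [46, 54, 42, 46, 32]  # hypothetical sample (mean 44, s = 8)
cv = coefficient_of_variation(data)
print(f"The standard deviation is about {cv:.1f}% of the mean")
```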

Expected value of x bar

is equal to the population mean

Expected value of p bar

is equal to the population proportion

The standard deviation of p bar

is referred to as the standard error of the proportion

Range

difference between the largest and smallest data values • It is the simplest measure of variability • It is sensitive to the smallest and largest data values

Interquartile Range

difference between the third quartile and the first quartile • It is the range for the middle 50% of the data • It overcomes the sensitivity to extreme data values

Percent Frequency

relative frequency multiplied by 100

Median for an even number of observations

the median is the average of the middle two values in ascending order

Median

value in the middle when the data items are arranged in ascending order. -->Whenever a data set has extreme values, the median is the preferred measure of central location -->The median is the measure of location most often reported for annual income and property value data -->A few extremely large incomes or property values can inflate the mean

Interval estimate of a population mean

x bar ± margin of error

(Data Sources) In Experimental Studies

the variable of interest is first identified --> then one or more other variables are identified and controlled so that data can be obtained about how they influence the variable of interest.

Extra info on sampling distribution of x bar

• A finite population is treated as being infinite if n/N is less than or equal to .05. • sqrt((N − n)/(N − 1)) is the finite population correction factor. • The standard deviation of x bar is referred to as the standard error of the mean. • When the population has a normal distribution, the sampling distribution of x bar is normally distributed for any sample size. • In most applications, the sampling distribution of x bar can be approximated by a normal distribution whenever the sample size is 30 or more. • In cases where the population is highly skewed or outliers are present, samples of size 50 may be needed. • The sampling distribution of x bar can be used to provide probability information about how close the sample mean x bar is to the population mean μ.
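
A minimal sketch of how the n/N ≤ .05 rule and the finite population correction factor might be applied when computing the standard error of the mean. It assumes the usual formula σ/sqrt(n) for the infinite case; the function name and the numbers are hypothetical.

```python
import math

def standard_error_of_mean(sigma, n, N=None):
    """Standard error of x bar; N is the population size (None = infinite)."""
    se = sigma / math.sqrt(n)
    # Apply the correction factor only when the sample is more than
    # 5% of a finite population.
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(standard_error_of_mean(10, 100))           # treated as infinite
print(standard_error_of_mean(10, 100, N=1000))   # n/N = .10, correction applied
print(standard_error_of_mean(10, 100, N=10000))  # n/N = .01, no correction
```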

Dot Plot

• A horizontal axis shows the range of data values • Then each data value is represented by a dot placed above the axis

Percentiles

A percentile provides information about how the data are spread over the interval from the smallest value to the largest value. -The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more

Consistency

A point estimator is consistent if the values of the point estimator tend to become closer to the population parameter as the sample size becomes larger. In other words, a large sample size tends to provide a better point estimate than a small sample size

point estimator (Sample vs Pop)

A sample statistic is referred to as the point estimator of the corresponding population parameter

Applications of Statistics

Accounting -Public accounting firms use statistical sampling procedures when conducting audits for their clients Economics -Economists use statistical information in making forecasts about the future of the economy. Marketing -Electronic scanners at retail checkout counters are used to collect data for marketing research. Production -A variety of statistical quality control charts are used to monitor the output of a production process.

Cross-sectional Data

are collected at the same or approximately the same point in time. ex. the number of building permits issued in February 2010 in each of the counties of Ohio

Time-series data

are collected over several time periods. ex. the number of building permits issued in Lucas County, Ohio in each of the last 36 months

unbiased

When the expected value of the point estimator equals the population parameter, we say the point estimator is unbiased.

Weighted Mean

When the mean is computed by giving each data value a weight that reflects its importance. • When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value
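
A short sketch of the weighted mean, computed as the sum of (weight × value) divided by the sum of the weights. The purchase data and the function name are hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

cost_per_pound = [3.00, 3.40, 2.80]  # hypothetical data values
pounds_bought = [1200, 500, 2750]    # hypothetical weights
average_cost = weighted_mean(cost_per_pound, pounds_bought)
print(round(average_cost, 3))
```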

Interval Estimate

can be computed by adding and subtracting a margin of error to the point estimate. Point Estimate +/− Margin of Error The purpose of an interval estimate is to provide information about how close the point estimate is to the value of the parameter.

Sample Survey

collecting data for a sample

Census

collecting data for the entire population

Existing Sources (data sources)

companies' internal records, Internet, government agencies, etc.

Descriptive measures for the relationship between two variables

covariance and correlation coefficient

Standard Deviation

positive square root of the variance -->It is measured in the same units as the data, making it more easily interpreted than the variance

Exploratory data analysis

procedures enable us to use simple arithmetic and easy-to-draw pictures to summarize data. Simply sort the data values into ascending order and identify the five-number summary and then construct a box plot.

Guidelines for Determining the Width of Each Class

+ Use classes of equal width • Class width formula

Summarizing Quantitative Data

+Frequency Distribution • Relative Frequency and Percent Frequency Distributions • Dot Plot • Histogram • Cumulative Distributions • Ogive

Things to note about Cumulative Distributions

-->The last entry in a cumulative frequency distribution always equals the total number of observations -->The last entry in a cumulative relative frequency distribution always equals 1.00 -->The last entry in a cumulative percent frequency distribution always equals 100
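
The three "last entry" facts can be checked with a small sketch. The class frequencies below are hypothetical.

```python
from itertools import accumulate

frequencies = [2, 13, 16, 7, 7, 5]  # hypothetical class frequencies
total = sum(frequencies)

cumulative = list(accumulate(frequencies))
cumulative_relative = [c / total for c in cumulative]
cumulative_percent = [100 * c / total for c in cumulative]

# The last entries equal the number of observations, 1.00, and 100.
print(cumulative[-1], cumulative_relative[-1], cumulative_percent[-1])
```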

Systematic Sampling

-If a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population. -We randomly select one of the first N/n elements from the population list. -We then select every (N/n)th element that follows in the population list.
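
The procedure can be sketched as follows, taking the sampling interval to be k = N/n (population size divided by sample size), rounded down. The function name and the `seed` parameter are illustrative assumptions.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start among the first k elements, then every k-th element."""
    N = len(population)
    k = N // n                # sampling interval; assumes n <= N
    rng = random.Random(seed)
    start = rng.randrange(k)  # random index among the first k elements
    return [population[start + i * k] for i in range(n)]

accounts = list(range(1, 101))  # hypothetical population list, N = 100
sample = systematic_sample(accounts, 10, seed=42)
print(sample)
```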

Convenience Sampling

-It is a nonprobability sampling technique. Items are included in the sample without known probabilities of being selected. -The sample is identified primarily by convenience.

Sampling recommendation

-It is recommended that probability sampling methods (simple random, stratified, cluster, or systematic) be used. -For these methods, formulas are available for evaluating the "goodness" of the sample results in terms of the closeness of the results to the population parameters being estimated. -An evaluation of the goodness cannot be made with non-probability (convenience or judgment) sampling methods.

Sampling from an infinite population

-Sometimes we want to select a sample, but find it is not possible to obtain a list of all elements in the population. -As a result, we cannot construct a frame for the population. -Hence, we cannot use the random number selection procedure. -Most often this situation occurs in infinite population cases. Populations are often generated by an ongoing process where there is no upper limit on the number of units that can be generated.

Judgement Sampling

-The person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population. -It is a nonprobability sampling technique.

Stratified Random Sampling

-The population is first divided into groups of elements called strata. -Each element in the population belongs to one and only one stratum. -Best results are obtained when the elements within each stratum are as much alike as possible -A simple random sample is taken from each stratum. -Formulas are available for combining the stratum sample results into one population parameter estimate.

Cluster Sampling

-The population is first divided into separate groups of elements called clusters. -Ideally, each cluster is a representative small-scale version of the population (i.e. heterogeneous group). -A simple random sample of the clusters is then taken. -All elements within each sampled (chosen) cluster form the sample.

(Sampling distribution of x bar) Standard deviation of the sampling of distribution of x bar

...

Nominal Scale

1) Data are labels or names used to identify an attribute of the element; 2) A nonnumeric label or numeric code may be used. ex. Students of a university are classified by the school in which they are enrolled using a nonnumeric label such as Business, Humanities, Education, and so on.

three steps necessary to define the classes for a frequency distribution with quantitative data are

1) Determine the number of non-overlapping classes 2) Determine the width of each class 3) Determine the class limits

Statistical Studies

1) Experimental; 2) Observational.

Categorical Data

1) Labels or names used to identify an attribute of elements; 2) Often referred to as qualitative data; 3) Use either the nominal or ordinal scale of measurement; 4) Can be either numeric or nonnumeric; 5) Appropriate statistical analyses are rather limited.

Ratio Scale

1) The data have all the properties of interval data and the ratio of two values is meaningful; 2) This scale must contain a zero value that indicates that nothing exists for the variable at the zero point. ex. Melissa's college record shows 36 credit hours earned, while Kevin's - 72 credit hours. Kevin has twice as many credit hours earned as Melissa.

Ordinal Scale

1) The data have the properties of nominal data and the order or rank of the data is meaningful; 2) A nonnumeric label or numeric code may be used. ex. Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior.

Interval Scale

1) The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure; 2) Interval data are always numeric. ex. Melissa has an SAT score of 1205, while Kevin has an SAT score of 1090. Melissa scored 115 points more than Kevin.

The scale

1) determines the amount of information contained in the data; 2) indicates the data summarization and statistical analyses that are most appropriate

Calculating Percentiles

1. Arrange the data in ascending order 2. Compute index i, the position of the pth percentile i = (p/100)n 3. If i is not an integer, round up. The pth percentile is the value in the ith position. 4. If i is an integer, the pth percentile is the average of the values in positions i and i +1
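
The four steps above can be sketched as a small function. It assumes p is a whole number strictly between 0 and 100; the salary figures and the function name are hypothetical. Integer arithmetic is used to test whether i = (p/100)n is an integer, which avoids floating-point rounding surprises.

```python
def percentile(values, p):
    """pth percentile using the index rule i = (p/100) * n."""
    data = sorted(values)                   # step 1: ascending order
    n = len(data)
    whole, remainder = divmod(p * n, 100)   # step 2: i = whole + remainder/100
    if remainder:                           # step 3: i is not an integer
        return data[whole]                  # rounded-up position, as a 0-based index
    # step 4: i is an integer, so average positions i and i + 1
    return (data[whole - 1] + data[whole]) / 2

salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]  # hypothetical data
print(percentile(salaries, 85))  # i = 10.2, so the 11th value
print(percentile(salaries, 50))  # i = 6, so the average of the 6th and 7th values
```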

Empirical Rule

A statistical rule stating that for a normal distribution, almost all data will fall within three standard deviations of the mean. Broken down, the empirical rule shows that 68% will fall within the first standard deviation, 95% within the first two standard deviations, and 99.7% will fall within the first three standard deviations of the mean.

Data set

All the data collected in a particular study

Skewness

An important measure of the shape of a distribution

Detecting Outliers

An outlier is an unusually small or unusually large value in a data set • A data value with a z-score less than -3 or greater than +3 might be considered an outlier It might be: • an incorrectly recorded data value • a data value incorrectly included in the data set • a correctly recorded data value that belongs in the data set

Chebyshev's Theorem

At least (1 - 1/z^2) of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1. Chebyshev's theorem requires z > 1, but z need not be an integer
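
A quick sketch of the bound 1 − 1/z² for a few values of z, including a non-integer one. The function name is hypothetical.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of items within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

for z in (1.5, 2, 3):
    print(f"z = {z}: at least {chebyshev_lower_bound(z):.1%} of the items")
```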

Guidelines for determining the class limits

Class limits must be chosen so that each data item belongs to one and only one class • The lower class limit identifies the smallest possible data value assigned to the class • The upper class limit identifies the largest possible data value assigned to the class • The appropriate values for the class limits depend on the level of accuracy of the data

Summarizing Relationship between two variables

Crosstabulation and a scatter diagram are two methods for summarizing the data for two variables simultaneously

Sampling from a finite population

Finite populations are often defined by lists such as: • Organization membership roster • Credit card account numbers • Inventory product numbers • A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.

Efficiency

Given the choice of two unbiased estimators of the same population parameter, we would prefer to use the point estimator with the smaller standard deviation, since it tends to provide estimates closer to the population parameter The point estimator with the smaller standard deviation is said to have greater relative efficiency than the other

Interval Estimate of a Population Mean: σ (standard deviation of pop) Unknown

If an estimate of the population standard deviation σ cannot be developed prior to sampling, we use the sample standard deviation s to estimate σ . In this case, the interval estimate for μ is based on the t distribution. • (We'll assume for now that the population is normally distributed.)

Unbiased (point estimators)

If the expected value of the sample statistic is equal to the population parameter being estimated, the sample statistic is said to be an unbiased estimator of the population parameter

Population Parameters

If the measures are computed for data from a population

Sample Statistics

If the measures are computed for data from a sample

Interval estimate of a population mean: standard deviation of pop (sigma) known

In order to develop an interval estimate of a population mean, the margin of error must be computed using either: • the population standard deviation σ, or • the sample standard deviation s. σ is rarely known exactly, but often a good estimate can be obtained based on historical data or other information. • In most applications, a sample size of n = 30 is adequate. • If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended. • If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice. (The same sample-size guidelines apply when σ is unknown.)
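
A minimal sketch of the σ-known case, assuming the standard margin of error z × σ/sqrt(n), where z is the standard normal value for the chosen confidence level. The 95% default, the function name, and the sample figures are assumptions for illustration.

```python
import math
from statistics import NormalDist

def mean_interval_sigma_known(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) for x bar ± z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

lower, upper = mean_interval_sigma_known(x_bar=82, sigma=20, n=100)
print(round(lower, 2), round(upper, 2))
```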

Random Sample (Infinite pop)

In the case of an infinite population, we must select a random sample in order to make valid statistical inferences about the population from which the sample is taken. It is a sample selected such that the following conditions are satisfied. • Each element selected comes from the population of interest. • Each element is selected independently.

Formula Class Width

(Largest Data Value - Smallest Data Value) / Number of Classes
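
A small sketch of the formula. The subtraction is done before the division, and the result is often rounded up to a convenient width. The data values and the function name are hypothetical.

```python
import math

def class_width(values, number_of_classes):
    """Approximate class width: (largest - smallest) / number of classes."""
    return (max(values) - min(values)) / number_of_classes

audit_days = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
              22, 23, 22, 21, 33, 28, 14, 18, 16, 13]  # hypothetical data
width = class_width(audit_days, 5)
print(width, "rounded up to", math.ceil(width))
```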

Sample Size for an Interval Estimate of a Population Mean

Let E = the desired margin of error. E is the amount added to and subtracted from the point estimate to obtain an interval estimate. If a desired margin of error is selected prior to sampling, the sample size necessary to satisfy the margin of error can be determined. The Necessary Sample Size equation requires a value for the population standard deviation σ. If σ is unknown, a preliminary or planning value for σ can be used in the equation. 1. Use the estimate of the population standard deviation computed in a previous study. 2. Use a pilot study to select a preliminary sample and use the sample standard deviation from that sample. 3. Use judgment or a "best guess" for the value of σ.
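
The card does not spell out the equation. The sketch below assumes the standard form n = (z × σ / E)², rounded up to the next whole number, with z taken from the standard normal distribution. The function name, the 95% default, and the planning value for σ are illustrative assumptions.

```python
import math
from statistics import NormalDist

def necessary_sample_size(sigma, E, confidence=0.95):
    """Smallest n whose margin of error z * sigma / sqrt(n) is at most E."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / E) ** 2)

# Planning value sigma = 20 (for example, from a previous study), E = 2.
n_needed = necessary_sample_size(sigma=20, E=2)
print(n_needed)
```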

Descriptive Statistics

Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand. Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.

Sampling with replacement (finite pop)

Replacing each sampled element before selecting subsequent elements. Sampling without replacement is the procedure used most often.

Histograms showing Skewness

Symmetric • Left tail is the mirror image of the right tail Examples: heights and weights of people Moderately Skewed Left • A longer tail to the left Example: exam scores Highly Skewed Right • A very long tail to the right Example: executive salaries

Skewness

Symmetric (not skewed) • Skewness is zero • Mean and median are equal Moderately Skewed Left • Skewness is negative • Mean will usually be less than the median Moderately Skewed Right • Skewness is positive • Mean will usually be more than the median Highly Skewed Right • Skewness is positive (often above 1.0) • Mean will usually be more than the median

Correlation Coefficient

The coefficient can take on values between -1 and +1 • Values near -1 indicate a strong negative linear relationship • Values near +1 indicate a strong positive linear relationship • The closer the correlation is to zero, the weaker the relationship
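
As an illustration, the sample covariance and correlation coefficient can be computed by hand as below. The sketch assumes the sample formulas (dividing by n − 1) and r = s_xy / (s_x × s_y); the paired data and the function names are hypothetical.

```python
import math

def sample_covariance(x, y):
    """Sum of (x - x bar)(y - y bar) divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]       # hypothetical x values
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]   # hypothetical y values
cov = sample_covariance(commercials, sales)
r = correlation(commercials, sales)
print(round(cov, 2), round(r, 2))  # r near +1: strong positive linear relationship
```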

Median for odd number of observations

The median is the middle value

Form of the Sampling Distribution of p bar

The sampling distribution of p bar can be approximated by a normal distribution whenever the sample size is large enough to satisfy the two conditions: np ≥ 5 and n(1 - p) ≥ 5, because when these conditions are satisfied, the probability distribution of x in the sample proportion, p bar = x/n, can be approximated by a normal distribution (and because n is a constant).

Observation

The set of measurements obtained for a particular element

Z-Scores

The z-score is often called the standardized value. It denotes the number of standard deviations a data value is from the mean. An observation's z-score is a measure of the relative location of the observation in a data set -A data value less than the sample mean will have a z-score less than zero -A data value greater than the sample mean will have a z-score greater than zero -A data value equal to the sample mean will have a z-score of zero
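
A brief sketch of the computation z = (x − x bar) / s, using the sample mean and sample standard deviation. The data and the function name are hypothetical.

```python
import statistics

def z_scores(values):
    """Number of sample standard deviations each value lies from the sample mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

class_sizes = [46, 54, 42, 46, 32]  # hypothetical data (mean 44, s = 8)
scores = z_scores(class_sizes)
print(scores)  # values below the mean get negative z-scores
```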

Central Limit Theorem

When the population from which we are selecting a random sample does not have a normal distribution, the central limit theorem is helpful in identifying the shape of the sampling distribution of x bar. In selecting random samples of size n from a population, the sampling distribution of the sample mean x bar can be approximated by a normal distribution as the sample size becomes large.

Variable

a characteristic of interest for the elements.

Pie Chart

a commonly used graphical device for presenting relative frequency and percent frequency distributions for categorical data -->First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class -->Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.

Point estimation

a form of statistical inference. we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter. • We refer to x bar as the point estimator of the population mean μ. • s is the point estimator of the population standard deviation σ. • p bar is the point estimator of the population proportion p.

Ogive

a graph of a cumulative distribution • The data values are shown on the horizontal axis • Shown on the vertical axis are the cumulative frequencies, cumulative relative frequencies, or cumulative percent frequencies • The frequency (one of the above) of each class is plotted as a point • The plotted points are connected by straight lines

Scatter Diagram

a graphical presentation of the relationship between two quantitative variables • One variable is shown on the horizontal axis and the other variable is shown on the vertical axis • The general pattern of the plotted points suggests the overall relationship between the variables • A trendline provides an approximation of the relationship (positive angles up, negative angles down, no apparent relationship straight across)

Box Plot

a graphical summary of data that is based on a five-number summary A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3 -->Box plots provide another way to identify outliers A box is drawn with its ends located at the first and third quartiles • A vertical line is drawn in the box at the location of the median (second quartile) Limits are located (but not drawn in the plot) using the interquartile range (IQR): • The lower limit is located 1.5(IQR) below Q1; • The upper limit is located 1.5(IQR) above Q3. Data outside these limits are considered outliers.
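
The limit calculation can be sketched as below. Q1 and Q3 are passed in rather than computed, since quartile conventions vary between textbooks and software. The data, quartile values, and function names are hypothetical.

```python
def box_plot_limits(q1, q3):
    """Lower and upper limits located 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values falling outside the box plot limits."""
    lower, upper = box_plot_limits(q1, q3)
    return [x for x in values if x < lower or x > upper]

salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]  # hypothetical data
print(box_plot_limits(3465, 3600))
print(find_outliers(salaries, q1=3465, q3=3600))
```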

Frame

a list of the elements that the sample will be selected from.

Correlation

a measure of linear association and not necessarily causation. Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

Covariance

a measure of the linear association between two variables • Positive values indicate a positive relationship • Negative values indicate a negative relationship

Variance

a measure of variability that utilizes all the data. The variance is the average of the squared differences between each data value and the mean. It is based on the difference between the value of each observation and the mean (x bar for a sample, μ for a population). • The variance is useful in comparing the variability of two or more variables

Sample

a subset of the population

Relative Frequency

fraction or proportion of the total number of data items belonging to the class

Bar Chart

graphical device for depicting qualitative data. Fixed width of bars. • On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes • A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis)

Most common numerical descriptive statistic

mean (average)

(Data Sources) in Observational (non experimental) Studies

no attempt is made to control or influence the variables of interest (for example, survey)

Statistics

numerical facts that help us understand a variety of situations : - averages; - medians; - percents; - index numbers. also: the art and science of collecting, analyzing, presenting, and interpreting data.

Interval Estimate of a Population Proportion

p bar ± Margin of Error. The sampling distribution of p bar plays a key role in computing the margin of error for this interval estimate. The sampling distribution of p bar can be approximated by a normal distribution whenever np ≥ 5 and n(1 - p) ≥ 5.
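
A minimal sketch, assuming the usual margin of error z × sqrt(p bar (1 − p bar) / n) and using p bar in place of the unknown p when checking the two normal-approximation conditions. The function name, the 95% default, and the survey figures are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_interval(p_bar, n, confidence=0.95):
    """Return (lower, upper) for p bar ± margin of error."""
    # Check the normal-approximation conditions with p bar standing in for p.
    if n * p_bar < 5 or n * (1 - p_bar) < 5:
        raise ValueError("normal approximation conditions are not met")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - margin, p_bar + margin

low, high = proportion_interval(p_bar=0.44, n=900)
print(round(low, 4), round(high, 4))
```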

Open-end Class

requires only a lower class limit or an upper class limit

Stem and Leaf Display

shows both the rank order and shape of the distribution of the data • It is similar to a histogram on its side, but it has the advantage of showing the actual data values • The first digits of each data item are arranged to the left of a vertical line • To the right of the vertical line we record the last digit for each item in rank order • Each line in the display is referred to as a stem • Each digit on a stem is a leaf

Cumulative Frequency Distributions

shows the number of items with values less than or equal to the upper limit of each class

Cumulative percent frequency distribution

shows the percentage of items with values less than or equal to the upper limit of each class

Cumulative relative frequency distribution

shows the proportion of items with values less than or equal to the upper limit of each class

Percent Frequency Distribution

tabular summary of a set of data showing the percent frequency for each class

Relative Frequency Distribution

tabular summary of a set of data showing the relative frequency for each class

Crosstabulation

tabular summary of data for two variables Crosstabulation can be used when: • one variable is qualitative, the other is quantitative • both variables are qualitative • both variables are quantitative

Frequency Distribution

tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes. The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data

Elements

the entities on which data are collected

Data

the facts and figures collected, analyzed, and summarized for presentation and interpretation

Properties of point estimators

the following properties associated with good point estimators. • Unbiased • Efficiency • Consistency

Sampled population

the population from which the sample is actually taken. Whenever a sample is used to make inferences about a population, we should make sure that the targeted population and the sampled population are in close agreement.

Sampled Population

the population from which the sample is drawn.

Target population

the population we want to make inferences about.

Sampling distribution of x bar

the probability distribution of all possible values of the sample mean (x bar).

Sampling distribution of p bar

the probability distribution of all possible values of the sample proportion p bar.

Statistical Inference

the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population

Population

the set of all elements of interest in a study

Mode

value that occurs with greatest frequency -The greatest frequency can occur at two or more different values -If the data have exactly two modes, the data are bimodal -If the data have more than two modes, the data are multimodal -Caution: If the data are bimodal or multimodal, Excel's MODE function will incorrectly identify a single mode

Measures of Location

• Mean • Median • Mode • Percentiles • Quartiles

Quartiles

• Quartiles are specific percentiles • First Quartile: 25th percentile • Second Quartile: 50th percentile = Median • Third Quartile: 75th percentile

Measures of dispersion (Variability)

• Range • Interquartile Range • Variance • Standard Deviation • Coefficient of Variation

Other Sampling Methods

• Stratified Random Sampling • Cluster Sampling • Systematic Sampling • Convenience Sampling • Judgment Sampling

Mean

• The mean provides a measure of central location • The mean of a data set is the average of all the data values • The sample mean (x bar) is the point estimator of the population mean μ

Introduction to Sampling and Sampling distributions

• The reason we select a sample is to collect data to answer a research question about a population. • The sample results provide only estimates of the values of the population characteristics. • The reason is simply that the sample contains only a portion of the population. • With proper sampling methods, the sample results can provide "good" estimates of the population characteristics.

T Distribution

• The t distribution is a family of similar probability distributions. • A specific t distribution depends on a parameter known as the degrees of freedom. • Degrees of freedom refer to the number of independent pieces of information that go into the computation of s. • A t distribution with more degrees of freedom has less dispersion. • As the degrees of freedom increases, the difference between the t distribution and the standard normal probability distribution becomes smaller and smaller.

Histogram

• The variable of interest is placed on the horizontal axis • A rectangle is drawn above each class interval with its height corresponding to the interval's frequency, relative frequency, or percent frequency • A histogram has no natural separation between rectangles of adjacent classes

Grouped Data

• The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for the grouped data • To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class • We compute a weighted mean of the class midpoints using the class frequencies as weights • Similarly, in computing the variance and standard deviation, the class frequencies are used as weights
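
The grouped-data approximations can be sketched as follows, assuming sample data (so the variance divides by n − 1). The class midpoints, frequencies, and function name are hypothetical.

```python
import math

def grouped_mean_and_variance(midpoints, frequencies):
    """Approximate sample mean and variance from class midpoints and frequencies."""
    n = sum(frequencies)
    mean = sum(f * m for m, f in zip(midpoints, frequencies)) / n
    variance = sum(f * (m - mean) ** 2
                   for m, f in zip(midpoints, frequencies)) / (n - 1)
    return mean, variance

midpoints = [5, 15, 25]   # hypothetical class midpoints
frequencies = [2, 5, 3]   # hypothetical class frequencies
mean, variance = grouped_mean_and_variance(midpoints, frequencies)
print(mean, round(variance, 2), round(math.sqrt(variance), 2))
```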

Guidelines for determining number of classes

• Use between 5 and 20 classes • Data sets with a larger number of elements usually require a larger number of classes • Smaller data sets usually require fewer classes The goal is to use enough classes to show the variation in the data, but not so many classes that some contain only a few data items

Some examples of on-going processes, with infinite populations, are:

• parts being manufactured on a production line • transactions occurring at a bank • telephone calls arriving at a technical help desk • customers entering a store

