Statistical Quantitative Analysis

A confidence interval is...

A confidence interval is a range of numbers constructed around a point estimate such as a sample mean or sample proportion. It is used to provide a range estimate of a population parameter, based on a sample statistic.

Histogram

A graphical representation of a frequency distribution table, drawn as adjacent bars whose heights show the frequency of each class.

Percentage Polygon

A line graph connecting the midpoints of the tops of the columns of a histogram based on percentages instead of counts.

Parameter vs. Statistic

A measurable characteristic of a population, such as a mean or standard deviation, is called a parameter; but a measurable characteristic of a sample is called a statistic.

Population vs. Sample

A population includes each element from the set of observations that can be made. A sample consists only of observations drawn from the population. There are different formulas for variance and standard deviation depending on whether you are measuring them for a population or a sample.

A sampling distribution drawn from a normally distributed population....

A sampling distribution for samples drawn from a normally-distributed population is always normally distributed, regardless of sample size.

Scatterplot

A scatterplot is a graphic tool used to display the relationship between two quantitative variables. A scatterplot consists of an X axis (the horizontal axis), a Y axis (the vertical axis), and a series of dots. Each dot on the scatterplot represents one observation from a data set. The position of the dot on the scatterplot represents its X and Y values.

Sampling frame

A set of information used to identify a sample population for statistical treatment; a list of all the people that are in the population.

Type 2 Error

The error that occurs in hypothesis testing when one fails to reject a null hypothesis that is actually false. The alternative hypothesis is wrongly passed over even though the observed effect is real and not due to chance.

Frequency Distribution Table

A table that displays the frequency of various outcomes in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way, the table summarizes the distribution of values in the sample.

Type 1 Error

A type of error that occurs when a null hypothesis is rejected although it is true. The alternative hypothesis is wrongly accepted even though the observed result is attributable to chance.

Variance

The average of the squared differences from the mean. The variance is one of several descriptors of a probability distribution. An equivalent measure is its square root, called the standard deviation, which has the same units as the data and is therefore directly comparable to deviations from the mean.

The empirical rule for a normal distribution

Approximately 68% of the values are within +/- 1 standard deviation of the mean. Approximately 95% of the values are within +/- 2 standard deviations of the mean. Approximately 99.7% of the values are within +/- 3 standard deviations of the mean.
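The empirical rule percentages can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the variable name `within` is illustrative and not part of the original material.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
z = NormalDist(mu=0, sigma=1)

# Share of values within k standard deviations of the mean, for k = 1, 2, 3
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within +/- {k} sd: {share:.1%}")
```

Because any normal distribution can be standardized, the same shares apply whatever the mean and standard deviation are.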

Scatterplot Strengths

Be able to look at a scatterplot and describe the apparent relationship between the two plotted variables (i.e., no relationship, strong positive ("increasing") relationship, weak positive ("increasing") relationship, strong negative ("decreasing") relationship, or weak negative ("decreasing") relationship).

Frequency distribution table (FDT) classes

Class interval width can be approximated by (highest value - lowest value)/(number of classes). The typical number of classes is 5-15.
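As a rough illustration of the class-width rule, the sketch below builds a small frequency distribution table in Python. The data values and the choice of 5 classes are made up for the example, and rounding the width up to a whole number is just one convenient convention.

```python
import math

# Hypothetical data set, for illustration only
values = [12, 15, 21, 22, 27, 31, 34, 38, 41, 49]
num_classes = 5

# Approximate class width: (highest value - lowest value) / (number of classes)
raw_width = (max(values) - min(values)) / num_classes
width = math.ceil(raw_width)  # round up to a convenient whole-number width

# Count how many values fall in each class, starting at the lowest value
lowest = min(values)
counts = [0] * num_classes
for v in values:
    idx = min((v - lowest) // width, num_classes - 1)
    counts[idx] += 1

for i, count in enumerate(counts):
    start = lowest + i * width
    print(f"{start} to under {start + width}: {count}")
```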

Descriptive Statistics

Collecting, summarizing, presenting, and analyzing data & identifying trends & patterns

Confidence intervals become _____ as sample size increases.

Confidence intervals become narrower as sample size increases.
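To see why the interval narrows, here is a small Python sketch. It assumes a z-based interval for a mean with a known population standard deviation of 10 (an illustrative value); the function name `ci_width` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sigma, n, confidence=0.95):
    """Full width of a z-based confidence interval for a mean (sigma assumed known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)

# Same population, increasing sample sizes
widths = {n: ci_width(sigma=10, n=n) for n in (25, 100, 400)}

for n, w in widths.items():
    print(f"n = {n}: width = {w:.2f}")
```

Since the width is proportional to 1/√n, quadrupling the sample size cuts the width in half.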

Convenience Sample

Convenience sampling is a non-probability sampling technique where subjects are selected because of their convenient accessibility and proximity to the researcher.

Coverage Error

Coverage error is a bias in a statistic that occurs when the target population does not coincide with the population actually sampled. The source of the coverage error may be an inadequate sampling frame or flaws in the implementation of the data collection.

Null Hypothesis

Denoted by H0, a type of hypothesis used in statistics that proposes that no real effect or difference exists in a set of given observations, so that any observed effect is due to chance. The null hypothesis typically states that no relationship exists between variables, or that a parameter equals a specific value such as zero. It is presumed to be true until statistical evidence leads to its rejection in favor of an alternative hypothesis.

Kurtosis

Describes how sharply the distribution rises as it approaches its center. Datasets with positive kurtosis numbers have a more sharply peaked distribution. Datasets with a negative kurtosis number have wider, non-peaked distributions.

Be able to identify the correct null or alternative hypothesis

For example, suppose we wanted to determine whether a coin was fair and balanced. A null hypothesis might be that half the flips would result in Heads and half in Tails. The alternative hypothesis might be that the number of Heads and Tails would be very different. Symbolically, these hypotheses would be expressed as H0: p = 0.5 and Ha: p ≠ 0.5. Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given this result, we would be inclined to reject the null hypothesis. That is, we would conclude that the coin was probably not fair and balanced.
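The coin example can be made concrete with an exact two-sided binomial test. This is a minimal sketch using only Python's standard library; the helper `binom_pmf` is a name introduced for the illustration, and an exact binomial test is one of several reasonable ways to test this hypothesis.

```python
from math import comb

n, heads = 50, 40  # 50 flips, 40 Heads observed

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k Heads in n flips when P(Heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two-sided p-value under H0: p = 0.5. Add up the probability of every
# outcome at least as far from the expected 25 Heads as the observed 40.
distance = abs(heads - n / 2)
p_value = sum(binom_pmf(k, n) for k in range(n + 1) if abs(k - n / 2) >= distance)

print(f"p-value = {p_value:.6f}")
```

A p-value this small would be well below any common significance level such as 0.05, which matches the conclusion that the coin is probably not fair.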

Why to put two percentage polygons on the same graph

It is a good way to compare the distribution of two or more variables.

The reasons for sampling

It is often impossible or too expensive to gather information on an entire population. Moreover, sometimes the process of measurement itself destroys or harms the sample, so it wouldn't make sense to test/measure the entire population.

Know that multiple regression has _____.

Know that multiple regression has two or more x variables

Know the four assumptions of linear regression ("LINE")

o Linear relationship between the x and y variables
o Residuals are Independent of each other
o Residuals are Normally distributed
o Residuals have Equal variance (relatively constant variance at different levels of the x variable)

Measurement Error

Measurement errors occur when the response provided differs from the real value; such errors may be attributable to the respondent, the interviewer, the questionnaire, the collection method or the respondent's record-keeping system. Such errors may be random or they may result in a systematic bias if they are not random.

Nonresponse Error

Non-response errors occur when the survey fails to get a response to one, or possibly all, of the questions.

Quartiles

Quartiles are the values that divide a list of numbers into quarters. First put the list of numbers in order. Then cut the list into four equal parts. The Quartiles are at the "cuts."

Sampling Error

Sampling error is the deviation of the selected sample from the true characteristics, traits, behaviors, qualities or figures of the entire population.

Since the sample statistics are normally distributed with a normally distributed population or approximately normally distributed with any population and a sample size of ≥ 30 or so, we can...

Since the sample statistics are normally distributed with a normally distributed population or approximately normally distributed with any population and a sample size of ≥ 30 or so, we can do probability calculations related to the distribution of sample statistics using the standard normal distribution table (z-table) or the Excel formulas for normal distribution probability calculations

Skewness

Skewness measures the degree to which data values are symmetrical about the mean. A dataset with a skew value of zero is perfectly symmetrical. In general, right-skewed data has a positive skew number, and left-skewed data has a negative skew number.

Things that limit the representativeness of our sample

Survey Errors

Interquartile Range

The interquartile range (IQR) spans from Q1 to Q3. To calculate it, subtract Quartile 1 from Quartile 3.
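Quartiles and the interquartile range can be computed with Python's `statistics.quantiles`. The data below are made up, and the `"inclusive"` method is only one of several quartile conventions; textbooks and spreadsheet functions may give slightly different cut points for the same list.

```python
from statistics import quantiles

# Hypothetical list of numbers; sorting puts them in order first
data = sorted([7, 2, 9, 4, 12, 5, 4, 10, 8])

# n=4 cuts the ordered list into four parts and returns the three cut points
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Interquartile range: Quartile 3 minus Quartile 1
iqr = q3 - q1

print(q1, q2, q3, iqr)
```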

Mean

The "mean" is the "average" you're used to, where you add up all the numbers and then divide by the number of numbers.

Median

The "median" is the "middle" value in the list of numbers. To find the median, your numbers have to be listed in numerical order, so you may have to rewrite your list first.

Mode

The "mode" is the value that occurs most often. If no number is repeated, then there is no mode for the list.
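The three measures of central tendency can be compared on one small data set. This sketch uses Python's standard `statistics` module with made-up numbers.

```python
from statistics import mean, median, mode

# Hypothetical list of numbers
data = [3, 5, 5, 6, 8, 9]

avg = mean(data)       # add up all the numbers, divide by how many there are
middle = median(data)  # middle of the ordered list (average of the two middle values here)
most = mode(data)      # the value that occurs most often

print(avg, middle, most)
```

One caveat: when no value repeats, recent Python versions have `mode` return the first value rather than reporting "no mode", so that case needs a separate check.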

Norm.Inv

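The Excel NORM.INV function (NORMINV in older versions) calculates the inverse of the cumulative normal distribution function for a supplied probability and a supplied distribution mean and standard deviation: NORM.INV( probability, mean, standard_dev ). It returns the value x such that the probability of a value at or below x equals the supplied probability.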
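For readers working outside Excel, a roughly equivalent calculation is available in Python's standard library through `NormalDist.inv_cdf`. The mean of 72.5 and standard deviation of 10 are illustrative values only.

```python
from statistics import NormalDist

# Normal distribution with an illustrative mean and standard deviation
dist = NormalDist(mu=72.5, sigma=10)

# Value below which 97.5% of the distribution lies
x = dist.inv_cdf(0.975)

print(round(x, 2))
```

Feeding the result back into `dist.cdf(x)` recovers the original probability, which is what "inverse" means here.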

Standard Deviation

The standard deviation is a measure of how spread out numbers are. Its symbol is σ (the Greek letter sigma). It is the square root of the variance, so it can be read as roughly the typical distance of a value from the mean.
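The population and sample versions of variance and standard deviation differ only in the divisor (n for a population, n - 1 for a sample). The sketch below shows both on a made-up data set using Python's `statistics` module.

```python
from statistics import pvariance, pstdev, variance, stdev

# Hypothetical data; the mean is 5 and the squared deviations sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)   # divide by n:     32 / 8
pop_sd = pstdev(data)       # square root of the population variance
samp_var = variance(data)   # divide by n - 1: 32 / 7
samp_sd = stdev(data)       # square root of the sample variance

print(pop_var, pop_sd, samp_var, samp_sd)
```

The sample versions come out slightly larger because dividing by n - 1 compensates for estimating the mean from the same data.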

Alternative Hypothesis

The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample observations are influenced by some non-random cause.

Difference between Probability Sample and Non-probability Sample

The big difference is that in probability sampling every member of the population has a known, nonzero chance of being selected, so results are more likely to accurately reflect the entire population. In non-probability sampling, the chances of selection are unknown.

The confidence level (1-α ) is the probability of....

The confidence level (1-α ) is the probability of NOT making a type 1 error, that is, the probability of NOT rejecting the null hypothesis when it is actually true and should not be rejected

The confidence level (denoted as 1-α) represents...

The confidence level (denoted as 1-α) represents the probability that the procedure used to build the confidence interval produces an interval containing the population parameter. So, for example, a 95% confidence interval is constructed so that 95% of the intervals built this way from repeated samples would contain the population parameter.

A Sampling distribution

The distribution of a given sample statistic (such as a mean or proportion) that would result if you took all possible samples of a given size from a population.

The line is fit to...

The line is fit to the data to minimize the sum of the squared distances between the actual values of the dependent variable and the predicted values of the dependent variable
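For a single x variable, the least-squares slope and intercept have closed-form formulas. The sketch below applies them to made-up data; the variable names are illustrative.

```python
# Hypothetical (x, y) observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope = sum of cross-deviations / sum of squared x-deviations
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Sum of squared distances between actual and predicted y values
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

print(slope, intercept, sse)
```

Any other line through these points would give a larger sum of squared errors than `sse`.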

The sample mean is an "unbiased estimator" of...

The sample mean is an "unbiased estimator" of the population mean. That means that the mean sample mean (the mean of the distribution of sample means) is always equal to the population mean if all possible samples of a given size are taken.

The sample proportion is an "unbiased estimator" of...

The sample proportion is an "unbiased estimator" of the population proportion. That means the mean sample proportion (the mean of the distribution of sample proportions) is always equal to the population proportion if all possible samples of a given size are taken.

How the Central Limit Theorem applies to sampling distributions:

The sampling distribution of the sample mean for samples drawn from any population (regardless of the population distribution) approximates the normal distribution as long as the sample size is large enough. A sample size of 30 is typically used as the guideline for "large enough."
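A quick simulation can illustrate the idea. The sketch below draws repeated samples of size 30 from a strongly right-skewed exponential population (mean 1, standard deviation 1); the population, seed, and number of samples are all assumptions chosen for the example.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # size of each sample
num_samples = 5000  # number of samples drawn

# Mean of each sample drawn from a skewed (exponential) population
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)  # should be close to the population mean, 1
sd_of_means = stdev(sample_means)   # should be close to 1 / sqrt(30)

print(round(mean_of_means, 3), round(sd_of_means, 3))
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is far from normal.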

The size of standard errors ____ as sample size increases.

The size of standard errors decreases as sample size increases

Coefficient of variation

The standard deviation divided by the mean, and it is typically displayed as a percentage. The coefficient of variation represents the ratio of the standard deviation to the mean, and it is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from each other.
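A short Python sketch shows how the coefficient of variation compares two series with very different means. The numbers and the helper name `coefficient_of_variation` are hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

# Two hypothetical data series with very different means
cv_a = coefficient_of_variation(sd=10, mean=100)  # larger sd, much larger mean
cv_b = coefficient_of_variation(sd=2, mean=10)    # smaller sd, smaller mean

print(f"Series A: {cv_a:.0f}%  Series B: {cv_b:.0f}%")
```

Series B has the smaller standard deviation but the larger relative variation, which is the comparison the coefficient of variation is designed to reveal.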

The standard deviation of sample means is called...

The standard deviation of sample means is called the standard error of the mean. It is equal to the population standard deviation divided by the square root of the sample size: σ/√n

The standard deviation of sample proportions is called...

The standard deviation of sample proportions is called the standard error of the proportion and is equal to √((π(1-π))/n) where π is the population proportion and n is the sample size.
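Both standard error formulas, σ/√n for the mean and √((π(1-π))/n) for the proportion, translate directly into code. In this sketch the function names and the example inputs are assumptions for illustration.

```python
from math import sqrt

def se_mean(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def se_proportion(pi, n):
    """Standard error of the proportion: sqrt(pi * (1 - pi) / n)."""
    return sqrt(pi * (1 - pi) / n)

# Population standard deviation 10, samples of size 25
se_m = se_mean(10, 25)

# Population proportion 0.5, samples of size 100
se_p = se_proportion(0.5, 100)

print(se_m, se_p)
```

Both functions divide by a quantity that grows with n, which is why standard errors shrink as the sample size increases.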

Standard normal distribution

The standard normal distribution is one special distribution out of the infinite number of normal distributions. The standard normal distribution has a mean of 0 and a standard deviation of 1. Any normal distribution can be standardized to the standard normal distribution by a simple formula. This is why typically the only normal distribution with tabled values is that of the standard normal distribution. This type of table is sometimes referred to as a table of z-scores.

Central Tendency

The term central tendency refers to the "middle" value or perhaps a typical value of the data, and is measured using the mean, median, or mode. Each of these measures is calculated differently, and the one that is best to use depends upon the situation.

Z-Score

The z-score for a particular value is the difference between the value and the mean, expressed in terms of standard deviations. So, the formula is (value - mean)/(standard deviation).
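The z-score formula is a one-liner in code. The sketch below uses illustrative numbers (a value of 85 from a distribution with mean 72.5 and standard deviation 10); the function name `z_score` is hypothetical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

z = z_score(85, 72.5, 10)
print(z)
```

A positive result means the value is above the mean, a negative result means it is below, and zero means it equals the mean.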

Normal Distributions

There are an infinite number of normal distributions. A normal distribution is defined by a particular function in which two values have been determined: the mean and the standard deviation. The mean is any real number that indicates the center of the distribution. The standard deviation is a positive real number that is a measurement of how spread out the distribution is. Once we know the values of the mean and standard deviation, the particular normal distribution that we are using has been completely determined.

Norm.Dist

The Excel NORM.DIST function requires four arguments: "x," "mean," "standard deviation" and "cumulative." The first argument, x, is the observed value from our distribution. The mean and standard deviation are self-explanatory. The last argument, "cumulative," works the same way as in the NORM.S.DIST function: TRUE returns the cumulative probability of a value at or below x, and FALSE returns the height of the density curve at x.
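Outside Excel, Python's `statistics.NormalDist` offers comparable calculations: `cdf` corresponds to the cumulative case and `pdf` to the non-cumulative case. The mean, standard deviation, and x value here are illustrative.

```python
from statistics import NormalDist

# Illustrative distribution: mean 72.5, standard deviation 10
dist = NormalDist(mu=72.5, sigma=10)
x = 85

cumulative = dist.cdf(x)  # probability of a value at or below x
density = dist.pdf(x)     # height of the normal curve at x

print(round(cumulative, 4), round(density, 5))
```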

The key goal or principle of sampling

To acquire a sample that is representative of the population. That is, conclusions made about the sample will also apply to the population.

Inferential Statistics

Using data collected from a small group to draw conclusions about a larger group & assessing relationships between variables

What is the probability of getting the results observed from this sample if there is actually no relationship between age and systolic blood pressure in the population? a. .017% b. 40.07% c. 4.33% d. .909%

a. .017%

What percent of the variation in systolic blood pressure can be explained by a person's age? a. 40.07% b. 63.3% c. 17.31% d. .017%

a. 40.07%

The following scenario is the basis for questions #4 - #6: A researcher is interested in learning more about the online purchasing habits of retired Americans. She surveys 50 randomly-selected retired Americans and finds that they spend an average of $243.65 per month on online purchases, with a standard deviation of $56.99. Based on that sample, she wishes to build a 95% confidence interval for the population mean. How many degrees of freedom are there? a. 49 (This refers to the degrees of freedom in the t-distribution - always sample size minus one) b. 95 c. 1 d. 5

a. 49 (This refers to the degrees of freedom in the t-distribution - always sample size minus one)

A sample of 200 accounting majors at AACSB-accredited schools is randomly selected and used to create a 90% confidence interval for the mean GPA of accounting majors at AACSB-accredited schools. The confidence interval is calculated as 2.98 to 3.42. What does the confidence interval mean? a. It is 90% likely that the true population mean is between 2.98 and 3.42 b. The population mean may be expected to be between 2.98 and 3.42 90% of the time c. 90% of accounting majors at AACSB-accredited schools have GPAs between 2.98 and 3.42 d. 90% of samples of 200 accounting majors will have means between 2.98 and 3.42

a. It is 90% likely that the true population mean is between 2.98 and 3.42

In hypothesis testing, a Type I error is committed when a. you reject a null hypothesis that is true b. you don't reject a null hypothesis that is true c. you reject a null hypothesis that is false d. you don't reject a null hypothesis that is false.

a. you reject a null hypothesis that is true

When taking samples of size 25 from a population with a mean of 72.5, a standard deviation of 10, what is the mean of the sampling distribution? a. We don't know; the mean has to be calculated for each sample b. 72.5 c. 10⁄√25 d. 10

b. 72.5

The standard deviation of the sample mean in a sampling distribution is called the ________. a. unbiased estimator b. standard error of the mean c. central limit theorem d. t-statistic

b. standard error of the mean

Sampling distributions describe the distribution of ________. a. parameters b. statistics c. both parameters and statistics d. neither parameters nor statistics

b. statistics

The following scenario is the basis for questions #4 - #6: A researcher is interested in learning more about the online purchasing habits of retired Americans. She surveys 50 randomly-selected retired Americans and finds that they spend an average of $243.65 per month on online purchases, with a standard deviation of $56.99. Based on that sample, she wishes to build a 95% confidence interval for the population mean. What distribution will she need to use to estimate the confidence interval? a. normal distribution b. t-distribution c. z-distribution (standard normal) d. poisson distribution e. binomial distribution

b. t-distribution

In a simple linear regression equation the slope (b1) represents a. predicted value of Y when X = 0 b. the estimated average change in Y per unit change in X c. the predicted value of Y d. variation around the line of regression

b. the estimated average change in Y per unit change in X

Which of the following statements about the sampling distribution of the sample mean is incorrect? a. The sampling distribution of the sample mean is approximately normal whenever the sample size is sufficiently large ( n ≥ 30 ) b. The sampling distribution of the sample mean is generated by repeatedly taking samples of size n and computing the sample means c. The standard deviation of the sampling distribution of the sample mean is equal to σ d. The mean of the sampling distribution of the sample mean is equal to μ

c. The standard deviation of the sampling distribution of the sample mean is equal to σ

A sample of 15 mid-career project managers is randomly selected and used to create a 95% confidence interval for the mean salary for mid-career project managers. The 95% confidence interval is calculated as $43,573 to $63,455. What is an important potential problem with this confidence interval? a. The interval looks too wide for a 95% confidence interval b. A 95% confidence interval can never be accurately constructed from a sample size of 15 c. Unless the population is approximately normally-distributed the sample size is too small to yield a valid confidence interval (Confidence intervals are only valid if the population is normally distributed, or if the sample size is greater than 30 or so. This is a sample size of 15, and salary data is usually right skewed) d. The central limit theorem suggests that the confidence interval is too wide

c. Unless the population is approximately normally-distributed the sample size is too small to yield a valid confidence interval (Confidence intervals are only valid if the population is normally distributed, or if the sample size is greater than 30 or so. This is a sample size of 15, and salary data is usually right skewed)

Based on his model, each additional year of age is associated with a systolic blood pressure increase of how much? a. 100.00251 b. 0.000174 c. 0.63297 d. 0.90910

d. 0.90910

The following scenario is the basis for questions #4 - #6: A researcher is interested in learning more about the online purchasing habits of retired Americans. She surveys 50 randomly-selected retired Americans and finds that they spend an average of $243.65 per month on online purchases, with a standard deviation of $56.99. Based on that sample, she wishes to build a 95% confidence interval for the population mean. What is alpha in this scenario? a. 10% b. 95% c. 20% d. 5%

d. 5%
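For the retired-Americans scenario above, the interval itself could be computed as sketched below. The critical value 2.0096 is an assumption taken from a standard t-table for 49 degrees of freedom at 95% confidence (two-tailed); Python's standard library does not provide t-distribution quantiles, so it is hard-coded here.

```python
from math import sqrt

n = 50                # sample size
sample_mean = 243.65  # average monthly online spending
sample_sd = 56.99     # sample standard deviation

df = n - 1            # degrees of freedom for the t-distribution
t_crit = 2.0096       # assumed t-table value for df = 49, alpha = 0.05 (two-tailed)

standard_error = sample_sd / sqrt(n)
margin = t_crit * standard_error

lower = sample_mean - margin
upper = sample_mean + margin

print(f"95% CI: ${lower:.2f} to ${upper:.2f}")
```

The t-distribution is used rather than z because the population standard deviation is unknown and is estimated from the sample.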

Which of the following is NOT one of the assumptions that must be met for a linear regression analysis to be valid? a. The relationship between the independent and dependent variables is linear b. Independent residuals (errors) c. Residuals (errors) are normally distributed d. Residuals (errors) average to a positive number e. Residuals (errors) have equal variance

d. Residuals (errors) average to a positive number

How is a sample mean related to a population mean? a. The sample mean is equal to the population mean if the sample size is at least 30 b. The sample mean is usually less than the population mean c. The sample mean is usually more than the population mean d. The sample mean is an unbiased estimator of the population mean ("unbiased" means that the mean sample mean is equal to the population mean)

d. The sample mean is an unbiased estimator of the population mean ("unbiased" means that the mean sample mean is equal to the population mean)

In hypothesis testing, a Type II error is committed when a. you reject a null hypothesis that is true b. you don't reject a null hypothesis that is true c. you reject a null hypothesis that is false d. you don't reject a null hypothesis that is false.

d. you don't reject a null hypothesis that is false.

Know the reasons that the normal distribution is very important in statistics

o Many continuous variables common in business have distributions that are normal or very close to normal
o The normal distribution may be used to approximate various discrete probability distributions (such as the binomial distribution)
o It is possible to calculate probabilities related to the normal and standard normal distribution using either a z-table or Excel formulas
o The normal distribution provides the basis for classical inferential statistics because of its relationship to the central limit theorem

Know the fundamental concept of hypothesis testing:

o We calculate the value of a test statistic from the sample.
o Then, assuming the null hypothesis is true, we estimate the probability of getting a test statistic at least as far from the mean of the sampling distribution as the one we calculated. This is the p-value.
o If the p-value is low (below 1 - our selected confidence level, that is, below α), we reject the null hypothesis (because we got a test statistic that would be really unlikely if the null hypothesis were true).

Know the properties of a normal distribution:

o Bell-shaped and symmetrical (mean and median are equal)
o Kurtosis = 0
o Skewness = 0
o Interquartile range is approximately 1.33 standard deviations (more precisely, about 1.35)
o Overall range is approximately 6 standard deviations
o Empirical rule percentages (described in Chapter 3 bullet points above)

The probability of a type 1 error is __.

α

