STATS Final
Variables
_________ are the characteristics of the individuals of the population being studied.
Interpretation of a Confidence Interval
A (1 − α)·100% confidence interval indicates that (1 − α)·100% of all simple random samples of size n from the population whose parameter is unknown will result in an interval that contains the parameter.
designed experiment
A ___________________ allows the researcher to claim causation between an explanatory variable and a response variable
Trial
A binomial experiment is performed a fixed number of times. What is each repetition of the experiment called?
Normally distributed; normal probability distribution
A continuous random variable is ___ or has a ___, if its relative frequency histogram has the shape of a normal curve.
either a finite or a countable number of
A discrete random variable has _____ values.
Binomial Probability Distribution
A discrete probability distribution that describes probabilities for experiments in which there are two mutually exclusive (disjoint) outcomes
Random Variable
A numerical measure of the outcome of a probability experiment; so its value is determined by chance
Population Arithmetic Mean
A parameter that is computed using data from all the individuals in a population
Subjective Probability
A probability that is determined based on personal judgement
1. The probability of two or more successes in any sufficiently small subinterval is 0. For example, the fixed interval might be any time between 0 and 5 minutes. A subinterval could be any time between 1 and 2 seconds. 2. The probability of success is the same for any two intervals of equal length. 3. The number of successes in any interval is independent of the number of successes in any other interval provided the intervals are not overlapping
A random variable X, the number of successes in a fixed interval, follows a Poisson process provided the following conditions are met.
Sample Arithmetic Mean
A statistic that is computed using data from individuals in a sample
Simulation
A technique used to recreate a random event
Kth percentile
A value such that k percent of the observations are less than or equal to the value
Individual
A(n) _________ is a person or object that is a member of the population being studied.
Relative Frequency Distribution
lists each category of data together with the relative frequency.
Unusual Event
An event that has a low probability of occurring
Subjective Method
According to a sports analyst, the probability that a football team will win the next game is 0.31.
68%; 95%; 99.7%
According to the Empirical Rule, if a distribution is bell-shaped, then approximately _____ of the data will lie within 1 standard deviation of the mean; approximately _____ of the data will lie within 2 standard deviations of the mean; approximately ____ of the data will lie within 3 standard deviations of the mean
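The Empirical Rule percentages are rounded areas under the normal curve. As an illustration (not part of the original card), the sketch below computes the exact areas with Python's `math.erf`, using the identity P(|Z| < k) = erf(k/√2) for a standard normal Z; the helper name `within_k_sd` is hypothetical.

```python
import math

def within_k_sd(k: float) -> float:
    """Exact probability that a normal value lies within k standard
    deviations of its mean: P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```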
Event
Any collection of outcomes from a probability experiment
Law of Large Numbers
As the number of repetitions of a probability experiment increases, the proportion with which a certain outcome is observed gets closer to the probability of the outcome.
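A simulation (in the sense of the "Simulation" card) makes the Law of Large Numbers visible. This minimal sketch assumes a fair coin and a fixed random seed; the function name `proportion_of_heads` is hypothetical. As the number of flips grows, the printed proportion settles near 0.5.

```python
import random

def proportion_of_heads(n_flips: int, seed: int = 1) -> float:
    """Flip a simulated fair coin n_flips times and return the proportion of heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

for n in (10, 100, 1_000, 100_000):
    print(n, proportion_of_heads(n))
```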
0.95
As the number of samples increases, the proportion of 95% confidence intervals that include the population proportion approaches ______.
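The confidence-interval interpretation can be checked by simulation: draw many samples from a population whose proportion p is known, build an interval from each, and count how often the interval captures p. This sketch assumes the normal-approximation interval p̂ ± 1.96·√(p̂(1 − p̂)/n) and the hypothetical values p = 0.4 and n = 200 (so np(1 − p) = 48 ≥ 10); the function name `coverage` is also hypothetical.

```python
import math
import random

def coverage(p: float, n: int, reps: int, seed: int = 2) -> float:
    """Fraction of 95% intervals p_hat ± 1.96*SE that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))  # successes in one sample
        p_hat = x / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - 1.96 * se <= p <= p_hat + 1.96 * se:
            hits += 1
    return hits / reps

print(coverage(p=0.4, n=200, reps=5_000))  # close to 0.95
```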
The standard error of the mean decreases.
As the sample size n increases, what happens to the standard error of the mean?
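The standard error of the mean is σ/√n, so it shrinks as n grows; quadrupling n halves it. A minimal sketch, borrowing σ = 15 from a later card purely as an example value:

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

for n in (4, 16, 64, 256):
    print(n, standard_error(15, n))
# 4 7.5
# 16 3.75
# 64 1.875
# 256 0.9375
```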
No; the variance is computed from squared deviations about the mean, which are never negative, divided by N (or n − 1), so it cannot be negative.
Can the variance of a data set ever be negative? Explain.
The results might differ because there is always a chance that the sample surveyed is unlike the population.
Contact a local hospital and ask them the percentage of the population that is blood type O. Why might the results differ?
Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw a conclusion and answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.
Define Statistics
Ordinal
Determine the level of measurement of the variable. Position of runners in a race
Ratio
Determine the level of measurement of the variable. Weight of a child
Interval
Determine the level of measurement of the variable. Years of election
The variable is continuous because it is not countable.
Determine whether the quantitative variable is discrete or continuous. Volume of a sound
The value is a statistic because the respondents who were full-time college students aged 18 to 22 are a sample.
Determine whether the underlined value is a parameter or a statistic. In a national survey on substance abuse, 66.4% of respondents who were full-time college students aged 18 to 22 reported using alcohol within the past month.
The value is a statistic because the 1,502 adults 18 years of age or older are a sample
Determine whether the underlined value is a parameter or a statistic. A telephone interview of 1,502 adults 18 years of age or older found that only 69% could identify the current vice president.
Quantitative; it is a numerical value
Determine whether the variable is qualitative or quantitative. Grams of carbohydrates in a donut
Class width ≈ (largest value − smallest value) / number of classes, rounded up to a convenient number
Determining Class Width Formula
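A small sketch of the class-width calculation. Rounding up to the next whole number is assumed here as one common reading of "a convenient number"; the function name and the data are hypothetical.

```python
import math

def class_width(data: list[float], num_classes: int) -> int:
    """(largest - smallest) / number of classes, rounded up to the next whole number."""
    raw = (max(data) - min(data)) / num_classes
    return math.ceil(raw)

scores = [52, 61, 67, 70, 74, 78, 83, 88, 91, 98]  # hypothetical data
print(class_width(scores, 5))  # (98 - 52) / 5 = 9.2, rounded up to 10
```

Rounding up (rather than to the nearest value) keeps the largest observation inside the last class.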
A population is the entire group that is being studied while a sample is a subset of the population that is being studied.
Explain the difference between a population and a sample.
The word "average" is ambiguous and can refer to any measure of center. It is better to use the specific measure of center you intend (mean, median, or mode).
Explain why it is misleading to use the term "average" to describe your typical bowling score.
mean= median
For a distribution that is symmetric...
Mean < Median
For a distribution that is skewed left...
Mean > median
For a distribution that is skewed right...
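The three mean/median cards can be checked on small data sets with Python's `statistics` module. The data below are hypothetical: one large value stretches the right tail, its mirror image has a long left tail, and the last set is symmetric.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # long right tail
left_skewed = [-x for x in right_skewed]        # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, statistics.mean(data), statistics.median(data))
# right: mean 4.7 > median 3.0
# left: mean -4.7 < median -3.0
# symmetric: mean 4 = median 4
```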
longer than
For a distribution that is skewed left, the left whisker is _____ the right whisker.
Left of Center
For a distribution that is skewed right, the median is _____ of the box.
the same length
For a distribution that is symmetric, the left whisker is ______ as the right whisker.
The General Addition Rule
For any 2 events, E and F, P (E or F) = P(E) + P(F) - P(E and F)
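The General Addition Rule can be verified by counting outcomes. This sketch assumes one roll of a fair die, with E = "even number" and F = "greater than 3" as example events; exact fractions avoid rounding.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair die
E = {2, 4, 6}                    # even number
F = {4, 5, 6}                    # number greater than 3

def P(event: set) -> Fraction:
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))

lhs = P(E | F)                   # P(E or F), computed directly from the union
rhs = P(E) + P(F) - P(E & F)     # General Addition Rule
print(lhs, rhs)  # 2/3 2/3
```

Subtracting P(E and F) removes the double count of outcomes 4 and 6, which belong to both events.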
10
For the shape of the distribution of the sample proportion to be approximately normal, it is required that np(1−p)≥______.
A random sample of 100 adults aged 18 years or older were given a list of ice cream flavors and were asked to list which flavors they liked. The responses are given below. Chocolate 45 Strawberry 42 Vanilla 23 Mocha 19 Which of the following graphs would be most appropriate for visually displaying the results?
Frequency Bar Graph
P(E) + P(F)
If E and F are disjoint events, then P(E or F) =
Inferential; makes a prediction
If a polling organization claimed that the results of the survey indicate that 9% of adults in the country believe that the action is acceptable in certain situations, would you say this statement is descriptive or inferential? Why? The statement is _____ because it _____.
Normal
If a random variable X is normally distributed, what will be the shape of the distribution of the sample mean?
1.96
If a 95% confidence interval constructed from a sample proportion does not include the population proportion, then the sample proportion is more than ______ standard errors from the population proportion.
Independent
If the sample size n is small relative to the population size N (n ≤ 0.05N), treat the events as ___.
Less than
If the normality requirement is not satisfied (that is, np(1−p) is not at least 10), then a 95% confidence interval about the population proportion will include the population proportion in ________ 95% of the intervals.
1
In a relative frequency distribution, what should the relative frequencies add up to?
Interquartile Range (IQR)
In a typical box plot, the length of the box indicates which measure of spread?
Experiment
In probability, a(n) ________ is any process that can be repeated in which the results are uncertain.
False. In statistics, results are not reported with 100% certainty. Because statistical studies draw on samples, and because there is variation within groups, results cannot be reported with 100% certainty.
In statistics, results are always reported with 100% certainty. Choose the correct answer below.
Empirical Method
On the basis of a survey of 1000 families with eight children, the probability of a family having eight girls is 0.0064.
Empirical Method
On the basis of clinical trials, the probability of efficacy of a new drug is 0.82.
Multiplication Rule for Independent Events
P(E and F) = P(E) · P(F)
Complement Rule
P(E^c) = 1 − P(E)
Conditional Probability
P(F | E) is read "the probability of event F given event E." It is the probability that event F occurs, given that event E has occurred.
Conditional Probability Rule
P(F | E) = P(E and F) / P(E)
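The conditional probability rule, the independence test, and the multiplication rule can all be checked by enumerating a sample space. This sketch assumes two fair dice; the events E, F, G and the helper `both` are hypothetical examples.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely ordered pairs.
S = list(product(range(1, 7), repeat=2))

def P(event) -> Fraction:
    """Classical probability of the outcomes in S satisfying the predicate `event`."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def E(s):
    return s[0] == 6              # first die shows 6

def F(s):
    return s[0] + s[1] >= 10      # sum is at least 10

def G(s):
    return s[1] % 2 == 0          # second die is even

def both(a, b):
    """Predicate for the event 'a and b'."""
    return lambda s: a(s) and b(s)

p_F_given_E = P(both(E, F)) / P(E)   # conditional probability rule
print(p_F_given_E, P(F))             # 1/2 1/6: knowing E changes P(F)

# E and G are independent, so P(G | E) = P(G) and the multiplication rule holds.
print(P(both(E, G)) / P(E) == P(G))  # True
print(P(both(E, G)) == P(E) * P(G))  # True
```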
Probability Distribution
Provides the possible values of the random variable and their corresponding probabilities.
B, C, A
Put the following in order from the most area in the tails of the distribution to the least. (a) Standard Normal Distribution (b) Student's t-Distribution with 25 degrees of freedom. (c) Student's t-Distribution with 45 degrees of freedom.
Practical Significance
Refers to the idea that although small differences between the statistic and parameter stated in the null hypothesis are statistically significant, the difference may not be large enough to cause concern or be considered important.
Random Process
Represents scenarios where the outcome of any particular trial of an experiment is unknown, but the proportion (or relative frequency) a particular outcome is observed approaches a specific value.
False. Statistical studies are concerned with both describing the variability in the data and understanding the sources of variability in data. Understanding the sources allows researchers to control it and reach better conclusions.
Statistical studies are not concerned with understanding the sources of variability in data, only with describing the variability in the data. Choose the correct answer below.
Since the sample size is large enough (n = 47 ≥ 30), the population distribution does not need to be normal. The sampling distribution of x̄ is approximately normal with mean μ_x̄ = 62 and standard deviation σ_x̄ = 15/√47 ≈ 2.188.
Suppose a simple random sample of size n=47 is obtained from a population with μ=62 and σ=15. (a) What must be true regarding the distribution of the population in order to use the normal model to compute probabilities regarding the sample mean? (b) Assuming the normal model can be used, describe the sampling distribution of x̄.
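To see why normality of the population is not required here, the sketch below simulates many samples of size 47 from a deliberately non-normal population. The uniform population is an assumption for illustration only: a uniform(a, b) distribution has standard deviation (b − a)/√12, so a half-width of 15·√3 gives μ = 62 and σ = 15.

```python
import math
import random
import statistics

MU, SIGMA, N = 62, 15, 47
# Assumed uniform (non-normal) population with mean 62 and SD 15.
half_width = SIGMA * math.sqrt(3)
a, b = MU - half_width, MU + half_width

rng = random.Random(3)  # fixed seed for reproducibility
sample_means = [
    statistics.fmean(rng.uniform(a, b) for _ in range(N))
    for _ in range(10_000)
]

print(statistics.fmean(sample_means))  # close to 62
print(statistics.stdev(sample_means))  # close to 15 / sqrt(47), about 2.188
```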
0.95
Suppose the proportion of a population that has a certain characteristic is 0.95. The mean of the sampling distribution of p̂ from this population is μ_p̂ = ______.
True. Statistical studies typically look at samples rather than entire populations. Since each study is likely to draw different samples, it is quite possible that each study ends up with different results, due to variability in the data.
Suppose three different individuals conduct the same statistical study, such as estimating the average commute time of students at a college. It is possible that all three studies end up with different results. Choose the correct answer below.
Null
The ___ hypothesis, denoted H0, is a statement to be tested, and is a statement of no change, no effect, or no difference.
True
True or False: Two events E and F are independent if P(E | F) = P(E).
1. The area corresponds to the proportion of the population with the characteristic. 2. The area corresponds to the probability that a randomly selected individual from the population has the characteristic.
The area under a normal curve corresponding to a certain characteristic of the normal random variable may be interpreted in any of the following ways.
1/2
The area under the normal curve to the right of μ equals _______.
Sample Space
The collection of all possible outcomes for that experiment
1. Identify the research objective. 2. Collect the data needed to answer the research question(s). 3. Describe the data. 4. Perform inference.
The methods of statistics follow a process. Place the steps in the correct order.
Mode
The observation that occurs most frequently in the data set
Classical Method
The probability of having eight girls in an eight-child family is 0.00390625.
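The classical method counts favorable outcomes among equally likely outcomes. This sketch assumes each child is equally likely to be a boy or a girl, so all 2^8 = 256 birth sequences are equally likely; only one of them is all girls.

```python
from fractions import Fraction
from itertools import product

# Every ordered sequence of 8 births, assuming boy/girl are equally likely.
outcomes = list(product("BG", repeat=8))  # 2**8 = 256 outcomes
favorable = [o for o in outcomes if all(c == "G" for c in o)]

p = Fraction(len(favorable), len(outcomes))
print(p, float(p))  # 1/256 0.00390625
```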
Level of Significance
The probability of making a Type I error
Variance
The square of the standard deviation
Zero
The sum of the deviations about the mean always equals...
Point Estimate
The value of a statistic that estimates the value of a parameter
Median
The value that lies in the middle of the data when arranged in ascending order.
Random
The word _____ suggests an unpredictable result or outcome.
symmetric about 0.
The Student's t-distribution is ___
Sample Proportion; x/n
The _____ _____, denoted p̂, is given by the formula p̂ = _____, where x is the number of individuals with a specified characteristic in a sample of n individuals.
Lower; upper
The ______ class limit is the smallest value within the class and the ______ class limit is the largest value within the class.
Level of Confidence; (1 − α)·100%
The _______ represents the expected proportion of intervals that will contain the parameter if a large number of different samples of size n is obtained. It is denoted _______.
False
True or False: A 95% confidence interval may be interpreted by saying there is a 95% probability that the interval includes the unknown parameter.
False
True or False: The standard deviation can be negative.
False
True or False: The standard deviation is a resistant measure of spread.
True. A relative frequency histogram will have a different scale on the y-axis but the same shape as a regular histogram.
True or false? A histogram and a relative frequency histogram, constructed from the same data, always have the same basic shape.
Independent
Two events E and F are ________ if the occurrence of event E in a probability experiment does not affect the probability of event F.
All the observations have the same value.
What can be said about a set of data with a standard deviation of 0?
An event is unusual if it has a low probability of occurring. The choice of a cutoff should consider the context of the problem.
What does it mean for an event to be unusual? Why should the cutoff for identifying unusual events not always be 0.05?
A prospective study collects the data over time.
What does it mean when an observational study is prospective?
A retrospective study requires that individuals look back in time or require the researcher to look at existing records.
What does it mean when an observational study is retrospective?
The graph of the normal curve slides right.
What happens to the graph of the normal curve as the mean increases?
The graph of the normal curve compresses and becomes steeper.
What happens to the graph of the normal curve as the standard deviation decreases?
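Case-control studies are observational studies that are retrospective, meaning that they require individuals to look back in time or require the researcher to look at existing records. Individuals who have a certain characteristic (the cases) are matched with individuals who do not have the characteristic (the controls).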
What is a case-control study?
A confounding variable is an explanatory variable that was considered in a study whose effect cannot be distinguished from a second explanatory variable in the study.
What is a confounding variable?
A designed experiment is when a researcher assigns individuals to a certain group, intentionally changing the value of an explanatory variable, and then recording the value of the response variable for each group.
What is a designed experiment?
A lurking variable is an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. In addition, lurking variables are typically related to explanatory variables in the study.
What is a lurking variable?
An observational study measures the value of the response variable without attempting to influence the value of either the response or explanatory variables.
What is an observational study?
Make an assumption about reality, and collect sample evidence to determine whether it contradicts the assumption.
What is at the "heart" of hypothesis testing in statistics?
Cross-sectional studies are observational studies that collect information about individuals at a specific point in time or over a very short period of time.
What is a cross-sectional study?
Confounding in a study occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study.
What is meant by confounding?
0.95
When constructing 95% confidence intervals for the mean when the parent population is right skewed and the sample size is small, the proportion of intervals that include the population mean approaches _____ as the sample size, n, increases.
Below
When constructing 95% confidence intervals for the mean when the parent population is right skewed and the sample size is small, the proportion of intervals that include the population mean is (above, below, equal to) 0.95.
Two-Tailed Hypothesis Testing Using Confidence Intervals
When testing H0: p=p0 versus H1: p≠p0, if a (1−α)⋅100% confidence interval contains p0, we do not reject the null hypothesis. However, if the confidence interval does not contain p0,we conclude that p≠p0 at the level of significance, α.
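The decision rule above can be sketched in a few lines. This assumes the normal-approximation interval p̂ ± z·√(p̂(1 − p̂)/n) with z = 1.96 for α = 0.05; the function name and the counts are hypothetical.

```python
import math

def reject_h0_via_ci(x: int, n: int, p0: float, z_crit: float = 1.96) -> bool:
    """Return True if H0: p = p0 is rejected, i.e., p0 lies outside
    the interval p_hat ± z_crit * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    lower, upper = p_hat - margin, p_hat + margin
    return not (lower <= p0 <= upper)

# Hypothetical data, testing p0 = 0.5 at alpha = 0.05.
print(reject_h0_via_ci(540, 1000, 0.5))  # True: interval is about (0.509, 0.571)
print(reject_h0_via_ci(520, 1000, 0.5))  # False: interval is about (0.489, 0.551)
```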
Neither study is always superior to the other. Both have advantages and disadvantages that depend on the situation.
Which is the superior observational study?
Min. Value, Q1, Median, Q3, Max. Value
Which measures are used in the 5- number summary?
1. The area under the normal curve to the right of the mean is 0.5. 2. The graph of a normal curve is symmetric. 3. The high point is located at the value of the mean.
Which of the following are properties of the normal curve?
If the differences were not squared, then the sum of all deviations from the mean would always be zero since the positive deviations are balanced by the negative deviations.
Why does the formula for calculating the sample variance, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, involve squaring the difference between each value and the mean?
If the formula involved division by n, the sample variance would be biased and consistently underestimate the population variance
Why does the formula for calculating the sample variance, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, involve division by $n - 1$ instead of $n$?
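Both variance cards can be checked by simulation. The sketch below assumes a hypothetical population of 10,000 normal values and draws many samples of size 5 with replacement. Averaged over samples, division by n falls short of the population variance (by a factor near (n − 1)/n), while division by n − 1 lands close to it. It also confirms that the unsquared deviations sum to zero.

```python
import random
import statistics

rng = random.Random(4)  # fixed seed for reproducibility

# Hypothetical population; its variance (divide by N) is the target.
population = [rng.gauss(0, 10) for _ in range(10_000)]
sigma2 = statistics.pvariance(population)

def var_divide_by_n(sample):
    """Variance using division by n (biased)."""
    m = statistics.fmean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

def var_divide_by_n_minus_1(sample):
    """Variance using division by n - 1 (the sample variance s^2)."""
    m = statistics.fmean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

# Many small samples, drawn with replacement so the draws are independent.
n, reps = 5, 20_000
samples = [rng.choices(population, k=n) for _ in range(reps)]

biased_avg = statistics.fmean(var_divide_by_n(s) for s in samples)
unbiased_avg = statistics.fmean(var_divide_by_n_minus_1(s) for s in samples)
print(sigma2, biased_avg, unbiased_avg)  # biased_avg sits near sigma2 * (n - 1) / n

# The unsquared deviations cancel: their sum is 0 up to floating-point rounding.
mean = statistics.fmean(population)
dev_sum = sum(x - mean for x in population)
print(dev_sum)
```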
Use the results of the sample to conjecture the percentage of the population that has type O blood. Is this an example of descriptive or inferential statistics?
Inferential; the sample result (___%) is used to draw a conclusion about the population.
Descriptive; Inferential
____________ statistics consists of organizing and summarizing information collected, while _________________ statistics uses methods that generalize results obtained from a sample to the population and measure the reliability of the results.
Pareto Chart
a bar graph whose bars are drawn in decreasing order of frequency or relative frequency.
Pie Chart
a circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category.
Parameter
A(n) ___________ is a numerical summary of a population.
Classes
are the categories by which data are grouped
Histogram
constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same, and the rectangles touch each other.
Resistant
if the observations that are extreme relative to the data do not affect its value substantially
Arithmetic Mean
is computed by adding all the values of the variable in the data set and dividing by the number of observations.
Bar Graph
is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency of the category on the other axis. Rectangles of equal width are drawn for each category. The height of each rectangle represents the category's frequency or relative frequency.
Frequency Distribution
lists each category of data and the number of occurrences for each category of data.
Z- Score
represents the distance that a data value is from the mean in terms of the number of standard deviations. z= (value-mean)/ std. dev.
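A direct translation of the z-score formula, with hypothetical numbers (a score of 85 or 55 when the mean is 70 and the standard deviation is 10):

```python
def z_score(value: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

print(z_score(85, 70, 10))  # 1.5
print(z_score(55, 70, 10))  # -1.5
```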
Mean of a Discrete Random Variable
$\mu_X = \sum x \cdot P(x)$, the sum of each value of X times its probability
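A worked example of the mean of a discrete random variable, assuming X is the number of heads in three fair coin flips (a hypothetical distribution chosen for illustration). Exact fractions keep the arithmetic readable.

```python
from fractions import Fraction

# Hypothetical distribution: X = number of heads in 3 fair coin flips.
distribution = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# A valid probability distribution has probabilities that sum to 1.
assert sum(distribution.values()) == 1

mean = sum(x * p for x, p in distribution.items())  # mu_X = sum of x * P(x)
print(mean)  # 3/2
```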
Class Width
the difference between consecutive lower class limits.
Range
the difference between the largest and smallest data value.
Relative Frequency
the proportion (or percent) of observations within a category and is found using the formula relative frequency = frequency / (sum of all frequencies).
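Frequency and relative frequency distributions for categorical data can be built with `collections.Counter`. The responses below are hypothetical (one flavor per person, so the relative frequencies add up to 1); listing categories by `most_common()` also gives the decreasing order used in a Pareto chart.

```python
from collections import Counter

# Hypothetical categorical data: favorite flavor reported by 10 people.
responses = ["chocolate", "vanilla", "chocolate", "strawberry", "chocolate",
             "vanilla", "mocha", "chocolate", "strawberry", "vanilla"]

freq = Counter(responses)  # frequency distribution
total = sum(freq.values())
rel_freq = {cat: count / total for cat, count in freq.items()}  # relative frequencies

for cat, count in freq.most_common():  # decreasing frequency (Pareto order)
    print(f"{cat:<10} {count:>2} {rel_freq[cat]:.2f}")
```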
Interquartile Range (IQR)
the range of the middle 50% of the observations in a data set. IQR= Q3-Q1
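A sketch of the five-number summary and IQR using `statistics.quantiles`. Quartile conventions differ: this assumes `method="inclusive"`, which is only one of several, so hand methods taught in a course (for example, taking medians of the lower and upper halves) can give slightly different Q1 and Q3. The data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Min, Q1, median, Q3, max, with quartiles from statistics.quantiles(method="inclusive")."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

data = [1, 3, 4, 6, 7, 8, 9, 12, 15]  # hypothetical, 9 observations
mn, q1, med, q3, mx = five_number_summary(data)
print(mn, q1, med, q3, mx, "IQR =", q3 - q1)  # 1 4.0 7.0 9.0 15 IQR = 5.0
```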
Population Standard Deviation
the square root of the sum of squared deviations about the population mean divided by the number of observations in the population.
Sample Standard Deviation
the square root of the sum of squared deviations about the sample mean divided by n-1
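Python's `statistics` module provides both versions: `pstdev` divides by N (population) and `stdev` divides by n − 1 (sample). With the hypothetical data below, the squared deviations about the mean 5 sum to 32, so the population value is √(32/8) = 2 and the sample value is √(32/7).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean 5

pop_sd = statistics.pstdev(data)  # treat the data as the whole population: divide by N
samp_sd = statistics.stdev(data)  # treat the data as a sample: divide by n - 1
print(pop_sd, samp_sd)            # samp_sd is slightly larger
```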
Statistic
A(n) _________ is a numerical summary of a sample.