Statistics - Final 1
How do you calculate a z-score?
(Value-Mean)/Standard Deviation
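The formula can be sketched as a small Python helper. The function name `z_score` and the sample values (a score of 130 or 70 with mean 100 and standard deviation 15) are illustrative assumptions, not part of the card.

```python
def z_score(value, mean, std_dev):
    """Return how many standard deviations `value` lies from `mean`."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev

# Illustrative values: a score of 130 when the mean is 100 and the SD is 15
print(z_score(130, 100, 15))   # 2.0
print(z_score(70, 100, 15))    # -2.0
```

A positive result means the value is above the mean; a negative result means it is below.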
Heights of adult women have a mean of 63.6 in. and a standard deviation of 2.5 in. What does Chebyshev's Theorem say about the percentage of women with heights between 56.1 in. and 71.1 in.?
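At least 89%. The interval is the mean plus or minus 3 standard deviations (63.6 ± 7.5), and 1 - 1/3^2 = 8/9, or about 0.889.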
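A minimal sketch of the Chebyshev calculation for this card, assuming a hypothetical helper named `chebyshev_lower_bound`. The bound 1 - 1/k^2 is a minimum, so the result reads as "at least" that proportion.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2

mean, sd = 63.6, 2.5
low, high = 56.1, 71.1

# Both endpoints are the same distance from the mean: about 3 standard deviations
k = (high - mean) / sd
print(round(chebyshev_lower_bound(k), 3))   # 0.889
```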
Adult IQ scores have a bell-shaped distribution with a mean of 100 and a standard deviation of 15. Use the Empirical Rule to find the percentage of adults with scores between 70 and 130.
About 95% (70 to 130 is within 2 standard deviations of the mean)
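The Empirical Rule's 95% is a rounded figure. As a rough check, assuming the scores are exactly normal, Python's standard-library `statistics.NormalDist` gives the area between 70 and 130:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Area under the normal curve within 2 standard deviations of the mean
within_2_sd = iq.cdf(130) - iq.cdf(70)
print(round(within_2_sd, 4))   # 0.9545
```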
A normal distribution of scores has a standard deviation of 10. Find the z-scores corresponding to each of the following values: - A score that is 20 points above the mean - A score that is 10 points below the mean
- 20 points above the mean: z = 20/10 = 2 - 10 points below the mean: z = -10/10 = -1
The Wechsler Adult Intelligence Test Scale is composed of a number of subtests. On one subtest, the raw scores have a mean of 35 and a standard deviation of 6. Assuming these raw scores form a normal distribution: - What number represents the 65th percentile (what number separates the lower 65% of the distribution)? - What number represents the 90th percentile?
- z = 0.385 = (x - 35)/6, so x = 37.31 - z = 1.285 = (x - 35)/6, so x = 42.71
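The percentiles can also be checked without a z-table. This sketch uses `statistics.NormalDist.inv_cdf` from the Python standard library. Because it uses unrounded z-values (about 0.3853 and 1.2816), the 90th percentile comes out slightly below the table-based 42.71.

```python
from statistics import NormalDist

subtest = NormalDist(mu=35, sigma=6)

# inv_cdf returns the raw score below which the given proportion of scores falls
p65 = subtest.inv_cdf(0.65)
p90 = subtest.inv_cdf(0.90)
print(round(p65, 2), round(p90, 2))   # 37.31 42.69
```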
Interpret the following z-scores: 3.2, -4, 1
- 3.2 standard deviations above the mean - 4 standard deviations below the mean - 1 standard deviation above the mean
What are important components of a good experimental design?
- Random Assignment: To ensure that the experiment does not systematically favor one experimental condition over another - Blocking: Using extraneous variables to create groups (blocks) that are similar. All experimental conditions + treatments are then tried in each block. - Direct Control: Holding extraneous variables constant so that their effects are not confounded with those of the experimental conditions. - Replication: Ensuring that there is an adequate number of observations for each experimental condition.
List and define sources of bias
- Selection/Undercoverage: Tendency for samples to differ from the population as a result of systematic exclusion of some part of the population. (Often caused by using volunteers or self-selected individuals) - Measurement/Response: The method of observation tends to produce values that differ from the true value. (Can be caused by the wording of a question, or by the appearance or behavior of the person asking the questions) - Nonresponse: Data are not obtained from all individuals selected for inclusion in the sample, so the sample differs from the population. (Often caused by selected individuals who cannot be reached or who refuse to respond)
When assessing the goodness of fit of a regression line, it is important to consider several pieces of information. No single characteristic of the data is sufficient for a good assessment. Consider the characteristics below. How does each contribute to an assessment of fit? (What would indicate that you have a "good" best-fit line?) The shape of the scatter plot The correlation coefficient The standard deviation of the residuals The coefficient of determination Residual Plot
- The shape of the scatter plot = Linear pattern - The correlation coefficient = R is close to -1 or 1 - The standard deviation of the residuals = Low standard deviation - The coefficient of determination = R^2 is close to 1 - Residual Plot = No pattern (scattered)
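The numeric items in this list can be computed directly. The sketch below fits a least-squares line by hand and reports r, R^2, and the standard deviation of the residuals. The function name `fit_summary` and the five data points are hypothetical, chosen only to illustrate a nearly linear pattern.

```python
from math import sqrt

def fit_summary(xs, ys):
    """Least-squares line plus the numeric fit measures listed above."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_resid = sum(e ** 2 for e in residuals)
    return {
        "r": sxy / sqrt(sxx * syy),          # correlation coefficient
        "r_squared": 1 - ss_resid / syy,     # coefficient of determination
        "s_e": sqrt(ss_resid / (n - 2)),     # standard deviation of the residuals
        "residuals": residuals,              # plot these against x to look for patterns
    }

# Hypothetical data with a strong positive linear pattern
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
summary = fit_summary(xs, ys)
print(round(summary["r"], 3), round(summary["r_squared"], 3), round(summary["s_e"], 3))
```

The scatter plot and residual plot still need to be inspected visually; a high R^2 alone does not rule out a curved pattern.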
List the steps of the Data Analysis Process
1. Understand the nature of the problem 2. Decide what to measure and how to measure it 3. Collect Data 4. Summarize data + perform preliminary analysis 5. Formal data analysis 6. Interpretation of results
Define Observation
A study in which the investigator observes characteristics of a sample from one or more existing populations, without manipulating any factors
Name; displays categorical or numerical data; describe the distributions (center, shape, spread, and any unusual features)
Bar chart; categorical; comment on interesting features
The gender of a person buying guitar strings.
Categorical
The order in which students hand in their tests.
Categorical or Discrete
Determine whether the following variables are categorical, discrete numerical, or continuous numerical
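Categorical: Values are categories or labels (e.g., yes or no) Discrete: Numerical values that can be counted (isolated points on the number line) Continuous: Numerical values that can fall anywhere in a range (measurements)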
What are Chebyshev's Rule and the Empirical Rule used for?
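Both estimate the percentage of observations within a given number of standard deviations of the mean. Chebyshev's = Can be used on any distribution Empirical = Used only on approximately normal (bell-shaped) distributions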
Name; displays categorical or numerical data; describe the distributions (center, shape, spread, and any unusual features)
Comparative stem and leaf
Amount of fluid dispensed by a drink machine at Runza.
Continuous numerical
The height of a 1 year old Panda bear.
Continuous numerical
Number of students in Statistics who are traveling over break.
Discrete
A report gave average math and verbal SAT scores for three language groups, shown in the following tables. Average Math SAT: English, 521; English and another language, 513; other language, 521. Average Verbal SAT: English, 519; English and another language, 486; other language, 462. Construct a comparative bar chart of the average verbal and math scores for the three language groups.
Do it, then check on your practice final
Consider the following set of data. 46, 49, 62, 41, 19, 77, 71, 30, 53, 53, 67, 43, 48, 28, 54. a. Create a 5-number summary of the set of data. b. Construct a modified box plot of the data. Then create a histogram of the same set of data. Note how the center, shape, and spread of the data show up in each graphical display. c. Are there any mild or extreme outliers in the data set? How do you know?
Do it, then check on your practice final
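One way to check parts (a) and (c) by computer is sketched below. It assumes the common textbook convention of taking the quartiles as the medians of the lower and upper halves (excluding the overall median when n is odd), with mild outliers beyond 1.5 IQR and extreme outliers beyond 3 IQR from the quartiles. Other software may use a different quartile rule and give slightly different values. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

data = [46, 49, 62, 41, 19, 77, 71, 30, 53, 53, 67, 43, 48, 28, 54]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 else s[half:]   # skip the middle value when n is odd
    return s[0], median(lower), median(s), median(upper), s[-1]

minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1

# Cutoffs for a modified box plot
mild_low, mild_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
extreme_low, extreme_high = q1 - 3 * iqr, q3 + 3 * iqr

extreme = [x for x in data if x < extreme_low or x > extreme_high]
mild = [x for x in data
        if (x < mild_low or x > mild_high) and x not in extreme]

print((minimum, q1, med, q3, maximum))   # (19, 41, 49, 62, 77)
print(mild_low, mild_high)               # 9.5 93.5
print(mild, extreme)                     # [] []
```

Under these assumptions every observation falls between 9.5 and 93.5, so the data set has no mild or extreme outliers.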
Name; displays categorical or numerical data; describe the distributions (center, shape, spread, and any unusual features)
Dotplot; numerical; center about 500; shape is approximately normal; spread from 350 to 700; some possible outliers
Write a few sentences describing the differences and similarities between the three language groups as shown in the bar chart.
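English: math and verbal scores are closest together (521 vs. 519). Math was higher than verbal in all three groups. The biggest gap between the two scores is in the group that speaks a language other than English (521 vs. 462).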
True/False A study is an observational study if the investigator observes the behavior of a response variable when one or more factors are manipulated.
False
True/False By definition, a simple random sample of size n is any sample that is selected in a manner to guarantee every individual in the population has an equal chance of selection.
False
True/False Clusters are non-overlapping subgroups of a population that have been identified as homogeneous.
False
True/False In a well-designed experiment, the factors are confounded whenever possible.
False
True/False Increasing sample size will generally eliminate bias in a sample.
False
True/False Response bias can occur when responses are not actually obtained from all individuals selected for inclusion in the sample.
False
True/False Stratified sampling is a sampling method that in no way involves simple random sampling.
False
Name; displays categorical or numerical data; describe the distributions (center, shape, spread, and any unusual features)
Histogram Numerical
In a study of male/female differences in carnivores, the heights of the canine teeth in the lower jaws were measured. The displays below are graphic representations of these data.
Look at practice final
What are the most common measures of central tendency and variability used in statistics?
Mean and Standard Deviation
Suppose an article was published on the amount of time Americans spent drinking coffee. One person was asked to note his starting and completion time for enjoying 30 cups of coffee. The resulting times (in minutes) were summarized using the mean, median, and standard deviation. Mean = 7.854 Median = 7.423 s = 2.129
On average, he spends almost 8 minutes on each cup of coffee. The time for an individual cup typically differs from this value by approximately 2 minutes.
Experiment
One or more variables (factors) are manipulated to observe the effect on a response variable - Look at cause + effect
Define a simple random sample
A sample of size n selected so that every possible sample of size n has an equal chance of being selected
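A minimal sketch of drawing a simple random sample with Python's standard-library `random` module. `sample` draws without replacement, so every subset of size n from the list is equally likely to be chosen. The population of 30 student labels and the seed value are hypothetical.

```python
import random

# Hypothetical sampling frame: one label per individual in the population
population = [f"student_{i}" for i in range(1, 31)]

rng = random.Random(2024)             # fixed seed only so the draw can be repeated
sample = rng.sample(population, k=5)  # n = 5, drawn without replacement
print(sample)
```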
A friend of yours, who is not taking statistics, wonders why it is that anyone would choose to take a sample. "Obviously," she says, "you would get better information from a census." In a short paragraph, explain why statisticians take samples rather than taking a census.
Samples are often preferred to a census because: - Some measurements require destroying the item - It is difficult to find the entire population - Resources are limited. Mention that sampling saves time and money and can be just as accurate.
Is the histogram symmetrical, skewed to the right, or skewed to the left?
Skewed to the left
Define variability
The extent to which data points differ from each other
Response Variable
The variable that is measured as the outcome and is thought to be affected by the explanatory variable(s)
Explanatory Variable
The variables that have values that are controlled by the experimenter (factors)
True/False A placebo is identical in appearance to the treatment of interest, but contains no active ingredients.
True
True/False Blocking is a technique that can be used to filter out the effects of extraneous factors.
True
True/False Selection bias can occur if volunteers only are used in a study.
True
By definition, strata are groups of population units that a. form well defined subpopulations. b. are selected for the study from the sampling frame. c. are selected for the study by a random sampling process. d. are typically heterogeneous. e. respond in characteristic ways to the explanatory variable.
a. form well defined subpopulations
When constructing a modified box plot, one must find the upper and lower mild outlier cutoffs. For these data, the upper mild outlier cutoff would be: a. 57.0 b. 58.5 c. 60.0 d. 61.5 e. 63.0
b. 58.5
Considering the graphic displays, the best description of these data would be: a. Skewed left b. Skewed right c. Symmetric d. Bimodal e. Light tailed
b. Skewed to the right
Approximately what percentage of the variation in umbilical lead concentrations can be explained by the linear model? a. 67.3% b. 36.22% c. 45.3% d. 1.49% e. 8.80%
c. 45.3%
The median of the lower canine tooth heights is: a. 10 b. 11 c. 12 d. 13 e. 14
d. 13
Which of the following indicates that an association between x and y is positive? a. A positive coefficient of determination b. A positive standard deviation about the least squares line c. A positive intercept of the least squares line d. A positive Pearson's correlation coefficient e. A positive residual sum of squares
d. A positive Pearson's correlation coefficient