Statistics - Final 1

How do you calculate a z-score?

(Value-Mean)/Standard Deviation
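As a minimal sketch, the formula above can be written as a small Python function. The name `z_score` and its parameter names are illustrative choices, not part of the study set.

```python
def z_score(value, mean, std_dev):
    """Return how many standard deviations `value` lies from `mean`."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev


# An IQ score of 130 with mean 100 and standard deviation 15
print(z_score(130, 100, 15))  # 2.0
```

A positive result means the value is above the mean; a negative result means it is below.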

Heights of adult women have a mean of 63.6 in. and a standard deviation of 2.5 in. What does Chebyshev's Theorem say about the percentage of women with heights between 56.1 in. and 71.1 in.?

At least 89% (56.1 and 71.1 are 3 standard deviations from the mean, and 1 - 1/3^2 ≈ 0.889)
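The Chebyshev calculation for this card can be sketched in Python as follows. The helper name `chebyshev_lower_bound` is hypothetical; the bound itself, 1 - 1/k^2 for k > 1, is the standard statement of the theorem.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of any distribution within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2


mean, sd = 63.6, 2.5
k = (71.1 - mean) / sd  # the interval 56.1 to 71.1 is the mean plus or minus about 3 sd
print(round(100 * chebyshev_lower_bound(k), 1))  # 88.9, i.e. "at least 89%"
```

Note that the result is a lower bound, not an exact percentage, because the theorem makes no assumption about the shape of the distribution.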

Adult IQ scores have a bell-shaped distribution with a mean of 100 and a standard deviation of 15. Use the Empirical Rule to find the percentage of adults with scores between 70 and 130.

Approximately 95% (70 and 130 are 2 standard deviations from the mean)
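The Empirical Rule figure is a rounded value. A quick check, assuming the scores are exactly normal and using `statistics.NormalDist` from the Python standard library (the helper `share_between` is an illustrative name):

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)


def share_between(dist, low, high):
    """Proportion of a normal distribution that falls between low and high."""
    return dist.cdf(high) - dist.cdf(low)


print(round(100 * share_between(iq, 70, 130), 1))  # 95.4, which the rule rounds to 95%
```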

A normal distribution of scores has a standard deviation of 10. Find the z-scores corresponding to each of the following values: - A score that is 20 points above the mean - A score that is 10 points below the mean

- 20 points above the mean: z = 20/10 = 2
- 10 points below the mean: z = -10/10 = -1

The Wechsler Adult Intelligence Scale is composed of a number of subtests. On one subtest, the raw scores have a mean of 35 and a standard deviation of 6. Assuming these raw scores form a normal distribution: - What number represents the 65th percentile (what number separates the lower 65% of the distribution)? - What number represents the 90th percentile?

- 65th percentile: 0.385 = (x - 35)/6, so x = 37.31
- 90th percentile: 1.285 = (x - 35)/6, so x = 42.71
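These percentiles can be checked without a z-table. The sketch below assumes an exactly normal distribution and uses `statistics.NormalDist.inv_cdf` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

subtest = NormalDist(mu=35, sigma=6)

# inv_cdf returns the score below which the given proportion of the distribution falls
p65 = subtest.inv_cdf(0.65)
p90 = subtest.inv_cdf(0.90)

print(round(p65, 2))  # 37.31
print(round(p90, 2))  # 42.69
```

The 90th percentile comes out slightly below the hand answer of 42.71 because the hand calculation uses the table value z = 1.285, while the more precise value is about 1.2816.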

Interpret the following z-scores: 3.2, -4, 1

- 3.2: 3.2 standard deviations above the mean
- -4: 4 standard deviations below the mean
- 1: 1 standard deviation above the mean

What are important components of a good experimental design?

- Random Assignment: To ensure that the experiment does not systematically favor one experimental condition over another.
- Blocking: Using extraneous variables to create groups (blocks) that are similar. All experimental conditions and treatments are then tried in each block.
- Direct Control: Holding extraneous variables constant so that their effects are not confounded with those of the experimental conditions.
- Replication: Ensuring that there is an adequate number of observations for each experimental condition.

List and define sources of bias

- Selection/Undercoverage: Tendency for samples to differ from the population as a result of systematic exclusion of some part of the population. (Often caused by using volunteers or self-selected individuals.)
- Measurement/Response: The method of observation tends to produce values that differ from the true value. (Can be caused by confusing wording of a question, or by the appearance or behavior of the person asking the questions.)
- Nonresponse: Data are not obtained from all individuals selected for inclusion in the sample, so the sample differs from the population.

When assessing the goodness of fit of a regression line, it is important to consider several pieces of information. No single characteristic of the data is sufficient for a good assessment. Consider the characteristics below. How does each contribute to an assessment of fit? (What would indicate that you have a "good" best-fit line?)
- The shape of the scatter plot
- The correlation coefficient
- The standard deviation of the residuals
- The coefficient of determination
- Residual plot

- The shape of the scatter plot = Linear pattern
- The correlation coefficient = r is close to -1 or 1
- The standard deviation of the residuals = Low standard deviation
- The coefficient of determination = r^2 is close to 1
- Residual plot = No pattern (scattered)
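The numerical parts of this checklist can be computed directly. The following is a minimal sketch in plain Python; the function name `fit_summary` and the small data set are made up for illustration and do not come from the practice final.

```python
from math import sqrt


def fit_summary(xs, ys):
    """Least-squares line plus the fit measures listed above."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_resid = sum(e ** 2 for e in residuals)

    return {
        "slope": slope,
        "intercept": intercept,
        "r": sxy / sqrt(sxx * syy),      # correlation coefficient
        "r_squared": 1 - ss_resid / syy,  # coefficient of determination
        "s_e": sqrt(ss_resid / (n - 2)),  # standard deviation of the residuals
        "residuals": residuals,           # plot these against xs to look for patterns
    }


# Hypothetical data with a strong linear pattern
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
summary = fit_summary(xs, ys)
print(round(summary["r"], 3), round(summary["r_squared"], 3), round(summary["s_e"], 3))
```

The scatter plot and residual plot still have to be inspected by eye; a high r^2 alone does not rule out a curved pattern.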

List the steps of the Data Analysis Process

1. Understand the nature of the problem
2. Decide what to measure and how to measure it
3. Collect data
4. Summarize data and perform preliminary analysis
5. Formal data analysis
6. Interpretation of results

Define Observation

A study in which the investigator observes changes in characteristics

Name; displays categorical or numerical data; describe the distributions (center, shape, spread, and any unusual features)

Bar chart; categorical; comment on interesting features

The gender of a person buying guitar strings.

Categorical

The order in which students hand in their tests.

Categorical or Discrete

Determine whether the following variables are categorical, discrete numerical, or continuous numerical

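- Categorical: values are labels or categories (e.g., yes or no)
- Discrete: numerical values that can be counted
- Continuous: numerical values that can fall anywhere in a range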

What are Chebyshev's Rule and the Empirical Rule used for?

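Both describe the percentage of observations within a given number of standard deviations of the mean. Chebyshev's = Can be used on any distribution. Empirical = Used on approximately normal (bell-shaped) distributions.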

Name; displays categorical or numerical data; describe the distributions (center, shape, spread, and any unusual features)

Comparative stem and leaf

Amount of fluid dispensed by a drink machine at Runza.

Continuous numerical

The height of a 1 year old Panda bear.

Continuous numerical

Number of students in Statistics who are traveling over break.

Discrete

A report gave average math and verbal SAT scores for three language groups shown in the following tables. Average Math SAT English - 521 English and another language - 513 Other language - 521 Average Verbal SAT English - 519 English and another language - 486 Other language - 462 Construct a comparative bar chart for the average verbal and math scores for the three languages.

Do it, then check on your practice final

Consider the following set of data. 46, 49, 62, 41, 19, 77, 71, 30, 53, 53, 67, 43, 48, 28, 54. a. Create a 5-number summary of the set of data. b. Construct a modified box plot of the data. Then create a histogram of the same set of data. Note how the center, shape, and spread of the data show up in each graphical display. c. Are there any mild or extreme outliers in the data set? How do you know?

Do it, then check on your practice final
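One way to check parts a and c by computer is sketched below. It assumes the common textbook convention that the quartiles are the medians of the lower and upper halves of the sorted data (excluding the overall median when n is odd), and that mild and extreme outlier cutoffs sit 1.5 and 3 IQRs beyond the quartiles. Other quartile conventions can give slightly different values, so compare against the practice final. The function names are illustrative.

```python
from statistics import median

data = [46, 49, 62, 41, 19, 77, 71, 30, 53, 53, 67, 43, 48, 28, 54]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return s[0], median(lower), median(s), median(upper), s[-1]


def outliers(values):
    """Return (mild, extreme) outliers based on 1.5*IQR and 3*IQR cutoffs."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    extreme = [x for x in values if x < q1 - 3 * iqr or x > q3 + 3 * iqr]
    mild = [
        x for x in values
        if (x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr) and x not in extreme
    ]
    return mild, extreme


print(five_number_summary(data))  # (19, 41, 49, 62, 77)
print(outliers(data))             # ([], [])
```

Under these assumptions the IQR is 21, so the mild cutoffs are 9.5 and 93.5, and no observation falls outside them.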

Name; displays categorical or numerical data; describe the distributions (center, shape, spread, and any unusual features)

Dotplot; numerical; center about 500; shape approximately normal; spread from 350 to 700; some possible outliers

Write a few sentences describing the differences and similarities between the three language groups as shown in the bar chart.

- English: the math and verbal averages are closest together
- Math averages were higher than verbal averages in all three groups
- Other language: there is a big gap between the math and verbal averages

True/False A study is an observational study if the investigator observes the behavior of a response variable when one or more factors are manipulated.

False

True/False By definition, a simple random sample of size n is any sample that is selected in a manner to guarantee every individual in the population has an equal chance of selection.

False

True/False Clusters are non-overlapping subgroups of a population that have been identified as homogeneous.

False

True/False In a well-designed experiment, the factors are confounded whenever possible.

False

True/False Increasing sample size will generally eliminate bias in a sample.

False

True/False Response bias can occur when responses are not actually obtained from all individuals selected for inclusion in the sample.

False

True/False Stratified sampling is a sampling method that in no way involves simple random sampling.

False

Name; displays categorical or numerical data; describe the distributions (center, shape, spread, and any unusual features)

Histogram; numerical

In a study of male/female differences in carnivores, the height of the canine teeth in the lower jaws were measured. The data below are graphic representations of these data.

Look at practice final

What are the most common measures of central tendency and variability used in statistics?

Mean and Standard Deviation

Suppose an article was published on the amount of time Americans spent drinking coffee. One person was asked to note his starting and completion time for enjoying 30 cups of coffee. The resulting times (in minutes) were summarized using the mean, median, and standard deviation. Mean = 7.854 Median = 7.423 s = 2.129

On average, he spends almost 8 minutes on each cup of coffee. The time for an individual cup typically differs from this value by approximately 2 minutes.

Experiment

One or more variables are manipulated in order to look at cause and effect

Define a simple random sample

A sample of size n selected so that every possible sample of size n has the same chance of being chosen
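A minimal sketch of drawing a simple random sample in Python, using `random.Random.sample` from the standard library, which draws without replacement so that every subset of size n is equally likely. The function name `simple_random_sample`, the `seed` parameter, and the numbered population are illustrative assumptions.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so that every size-n subset is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(list(population), n)


# Hypothetical sampling frame: individuals numbered 1 to 500
frame = range(1, 501)
chosen = simple_random_sample(frame, 10, seed=42)
print(sorted(chosen))
```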

A friend of yours, who is not taking statistics, wonders why it is that anyone would choose to take a sample. "Obviously," she says, "you would get better information from a census." In a short paragraph, explain why it is that statisticians take samples rather than taking a census.

Samples are often better than a census because:
- Some measurements require destroying the item
- It is difficult to find the entire population
- Resources are limited
Mention that a sample saves time and money and can be just as accurate.

Is the histogram symmetrical, skewed to the right, or skewed to the left?

Skewed to the left

Define variability

The extent to which data points differ from each other

Response Variable

The variable that is measured as the outcome and is thought to be related to the explanatory variable

Explanatory Variable

The variables that have values that are controlled by the experimenter (factors)

True/False A placebo is identical in appearance to the treatment of interest, but contains no active ingredients.

True

True/False Blocking is a technique that can be used to filter out the effects of extraneous factors.

True

True/False Selection bias can occur if volunteers only are used in a study.

True

By definition, strata are groups of population units that a. form well defined subpopulations. b. are selected for the study from the sampling frame. c. are selected for the study by a random sampling process. d. are typically heterogeneous. e. respond in characteristic ways to the explanatory variable.

a. form well defined subpopulations

When constructing a modified box plot, one must find the upper and lower mild outlier cutoffs. For these data, the upper mild outlier cutoff would be: a. 57.0 b. 58.5 c. 60.0 d. 61.5 e. 63.0

b. 58.5

Considering the graphic displays, the best description of these data would be: a. Skewed left b. Skewed right c. Symmetric d. Bimodal e. Light tailed

b. Skewed right

Approximately what percentage of the variation in umbilical lead concentrations can be explained by the linear model? a. 67.3% b. 36.22% c. 45.3% d. 1.49% e. 8.80%

c. 45.3%

The median of the lower canine tooth heights is: a. 10 b. 11 c. 12 d. 13 e. 14

d. 13

Which of the following indicates that an association between x and y is positive? a. A positive coefficient of determination b. A positive standard deviation about the least squares line c. A positive intercept of the least squares line d. A positive Pearson's correlation coefficient e. A positive residual sum of squares

d. A positive Pearson's correlation coefficient

