Stats Midterm
After a tsunami, a disaster area is divided into 200 equal grids. Forty of the grids are selected, and every occupied household in the grid is interviewed to help focus relief efforts on what residents require the most.
Cluster sampling, since the area is divided into grids, some of those grids are selected, and everyone in the selected grids is interviewed. Bias: the selected grids may not be representative in terms of damage, because certain grids may have been much more severely damaged than others and may have fewer occupied households.
Questioning students as they leave an athletic facility, a researcher asks 357 students about their drinking habits.
Convenience sampling, because students are chosen due to convenience of location. Bias: the question is of a personal nature, which may influence responses, and because the students are simply the ones who are easy to reach, the sample may not be representative of the population.
Mode
The data entry that occurs with the greatest frequency.
Experiment vs. Observational study
Experiment: a treatment is applied to part of a population and responses are observed. Observational study: a researcher measures characteristics of interest of part of a population but does not change existing conditions.
Stratified sample
Guarantees that members of each group within a population will be sampled.
After constructing a relative frequency distribution summarizing IQ scores of college students, what should be the sum of the relative frequencies?
If percentages are used, the sum should be 100%. If proportions are used, the sum should be 1.
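As a quick check of this rule, here is a minimal Python sketch. The IQ scores and the class width of 10 are made up for illustration and are not data from the course. Exact fractions are used so the sum is not blurred by floating-point rounding.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical IQ scores, invented for illustration only.
iq_scores = [95, 102, 110, 98, 121, 105, 99, 130, 115, 108]

# Group scores into classes of width 10 (90-99, 100-109, ...),
# keyed by each class's lower limit.
class_counts = Counter((score // 10) * 10 for score in iq_scores)

# Relative frequency = class frequency / total number of entries.
total = len(iq_scores)
relative_freqs = {
    lower: Fraction(count, total)
    for lower, count in sorted(class_counts.items())
}

for lower, rel in relative_freqs.items():
    print(f"{lower}-{lower + 9}: {float(rel):.2f} ({float(rel) * 100:.0f}%)")

# As proportions the total is 1; as percentages it is 100%.
proportion_sum = sum(relative_freqs.values())
print("sum of proportions:", float(proportion_sum))
print("sum of percentages:", float(proportion_sum * 100))
```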
What are some benefits of using graphs of frequency distributions?
It can be easier to identify patterns of a data set by looking at a graph of the frequency distribution
Skewed data
The median is the best measure of central tendency for skewed data, because it is less affected by extreme values than the mean.
What are some benefits of representing data sets using frequency distributions?
Organizing the data into a frequency distribution can make patterns within the data more evident.
Registration numbers for a marathon
Qualitative because registration numbers are attributes or labels.
Species of bird at a feeder
Qualitative, because species are attributes or labels.
Time to finish a marathon
Quantitative, because time is found by measuring or counting.
Price of a new computer
Quantitative because price is found by measuring or counting
Times to run 100 meters
Quantitative, because times are numerical measurements.
Replication
Repetition of an experiment under the same or similar conditions. Replication is important because it enhances the validity of the results.
The heights of half of the students in a class
Sample, because the collection of heights of half of the students is a subset of the heights of all students in the class.
The number of pets for 20 households in a town with 300 households
Sample, because the collection of the number of pets for 20 households is a subset of all households in the town.
In a poll of a sample of 12,000 adults in a certain city, 12% said they left for work before 6am.
Statistic, because the 12% describes a sample of 12,000 adults, which is a subset of all adults in the city.
Explain how the interquartile range of a data set can be used to identify outliers.
The interquartile range (IQR) of a data set can be used to identify outliers because data values that are greater than Q3 + 1.5(IQR) or less than Q1 - 1.5(IQR) are considered outliers.
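The rule can be sketched in Python as follows. The data values and the helper name `iqr_outliers` are hypothetical, chosen for illustration. Quartiles are computed with `statistics.quantiles` and its default method. Textbooks and calculators use slightly different quartile conventions, so Q1 and Q3 may differ a little from hand calculations on other data sets.

```python
import statistics

def iqr_outliers(data):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    # quantiles(n=4) returns the three cut points Q1, Q2, Q3.
    q1, _q2, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Anything outside the fences is flagged as an outlier.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

# Hypothetical data set with one suspiciously large value.
values = [2, 4, 5, 7, 8, 9, 10, 11, 12, 14, 40]
low, high, flagged = iqr_outliers(values)
print("fences:", low, high)
print("outliers:", flagged)
```

For this data set Q1 = 5 and Q3 = 12, so IQR = 7 and the fences are -5.5 and 22.5. Only 40 falls outside them.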
A survey of 2208 adults in a country found that 76% think that militant terrorists are a major threat to the well-being of their country.
The number is a sample statistic because it describes the people in a sample, which is a subset of all of the people in the country.
Explain the relationship between variance and standard deviation. Can either of these measures be negative?
The standard deviation is the positive square root of the variance. Neither the standard deviation nor the variance can be negative, because squared deviations can never be negative.
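A small worked example may make the relationship concrete. The data values below are hypothetical, and the sketch treats them as a whole population (dividing by N).

```python
import math
import statistics

# Hypothetical population data, invented for illustration.
data = [4, 8, 6, 5, 3, 7]

mean = statistics.mean(data)

# Each deviation may be negative, but its square never is.
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance: the average of the squared deviations.
population_variance = sum(squared_deviations) / len(data)

# Standard deviation: the positive square root of the variance.
population_sd = math.sqrt(population_variance)

print("squared deviations:", squared_deviations)
print("variance:", round(population_variance, 4))
print("standard deviation:", round(population_sd, 4))
```

Because the variance is an average of non-negative numbers, it is at least 0, and so is its square root. Both equal 0 only when every data entry is the same.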
A sample statistic will not change from sample to sample
The statement is false. A sample statistic can change from sample to sample
What are the two main branches of statistics?
The two main branches of statistics are descriptive statistics and inferential statistics.
Median
The value that lies in the middle of the data when the data set is ordered.
An outlier is any number above Q3 or below Q1.
This statement is false. A true statement is "An outlier is any number above Q3 + 1.5(IQR) or below Q1 - 1.5(IQR)."
Systematic sample
A sample formed by ordering a population in some way and then selecting members of the population at regular intervals.
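One way to sketch this in Python is shown below. The function name `systematic_sample`, the interval `k`, and the numbered population are assumptions made for illustration. The starting point is chosen at random from the first k members, and every k-th member is taken after that.

```python
import random

def systematic_sample(population, k, rng=random):
    """Pick a random start among the first k members, then take every k-th."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    start = rng.randrange(k)  # random index in 0 .. k-1
    return population[start::k]

# Hypothetical ordered population: members numbered 1 to 100.
population = list(range(1, 101))

# A seeded generator is used only so the example is repeatable.
sample = systematic_sample(population, k=10, rng=random.Random(0))
print(sample)
```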
The mean is the measure of central tendency most likely to be affected by an outlier.
True
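A short illustration with made-up numbers: adding one extreme value moves the mean a lot while the median barely changes. The salary figures are hypothetical.

```python
import statistics

# Hypothetical salaries in thousands of dollars.
salaries = [30, 32, 35, 36, 38]
with_outlier = salaries + [250]  # add one extreme value

mean_shift = statistics.mean(with_outlier) - statistics.mean(salaries)
median_shift = statistics.median(with_outlier) - statistics.median(salaries)

print("mean:  ", statistics.mean(salaries), "->", round(statistics.mean(with_outlier), 2))
print("median:", statistics.median(salaries), "->", statistics.median(with_outlier))
print("mean shift:", round(mean_shift, 2), "median shift:", median_shift)
```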
Given a data set, how do you know whether to calculate σ or s?
When given a data set, one would have to determine if it represented the population or if it was a sample taken from the population. If the data are a population, then σ is calculated. If the data are a sample, then s is calculated
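In Python's standard library the two choices correspond to `statistics.pstdev` (population, divides by N) and `statistics.stdev` (sample, divides by n - 1). The exam scores below are hypothetical and serve only to show that the two results differ.

```python
import math
import statistics

# Hypothetical exam scores, invented for illustration.
scores = [72, 85, 90, 66, 78]
n = len(scores)

# Treat the scores as the entire population: sigma divides by N.
sigma = statistics.pstdev(scores)

# Treat the scores as a sample from a larger population: s divides by n - 1.
s = statistics.stdev(scores)

print("population standard deviation (sigma):", round(sigma, 3))
print("sample standard deviation (s):        ", round(s, 3))
```

For the same data, s is slightly larger than σ because the sum of squared deviations is divided by a smaller number.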
What is the difference between a parameter and a statistic?
A parameter is a numerical description of a population characteristic. A statistic is a numerical description of a sample characteristic
How is a sample related to a population?
A sample is a subset of a population.
How is a stem-and-leaf plot similar to a dot plot?
Both plots show how data are distributed, can be used to identify unusual data values, and can be used to determine specific data entries.
Census vs. sampling
Census includes the entire population. Sampling includes only part of the population
A pharmaceutical company wants to test the effectiveness of a new allergy drug. The company identifies 250 females 30-35 years old who suffer from severe allergies. The subjects are randomly assigned into two groups. One group is given the new allergy drug and the other is given a placebo that looks exactly like the new allergy drug. After six months, the subjects' symptoms are studied and compared. Answer parts (a) through (c) below
a) The experimental units are the 250 females 30-35 years old who suffer from severe allergies. The treatment is the new allergy drug. b) There may be bias on the part of the researcher if the researcher knows which patients were given the real drug. c) The study would be a double-blind study if neither the researcher nor the patients knew which patients received the real drug and which received the placebo.
In 1965, researchers used random digit dialing to call 1400 people and ask what obstacles kept them from attending sporting events
a) Simple random sampling was used, since each number had an equal chance of being dialed, so all samples of 1400 phone numbers had an equal chance of being selected. b) Bias: individuals may not have been available, they may have refused to participate, and telephone sampling only includes people with telephones.
In a poll, 1,001 women in a country were asked whether they favor or oppose the use of "federal tax dollars to fund medical research using stem cells obtained from human embryos." Among the respondents, 46% said that they were in favor. Identify the population and the sample.
Population: all women in the country. Sample: the 1,001 women selected.
Describe the relationship between quartiles and percentiles
Quartiles are special cases of percentiles. Q1 is the 25th percentile, Q2 is the 50th percentile, and Q3 is the 75th percentile.
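This correspondence can be checked with `statistics.quantiles`, which returns quartiles when `n=4` and percentiles when `n=100`. The eleven data values are hypothetical. Other quartile conventions can give slightly different numbers, but the quartile-to-percentile correspondence holds within any one convention.

```python
import statistics

# Hypothetical ordered data set with 11 entries.
data = [12, 15, 17, 20, 22, 25, 28, 30, 33, 36, 40]

# n=4 gives the 3 cut points Q1, Q2, Q3.
quartiles = statistics.quantiles(data, n=4)

# n=100 gives the 99 cut points P1 .. P99 (index 0 holds P1).
percentiles = statistics.quantiles(data, n=100)
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

print("Q1, Q2, Q3:   ", quartiles)
print("P25, P50, P75:", [p25, p50, p75])
print("median:       ", statistics.median(data))
```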
The average score for a class of 28 students taking a calculus midterm exam was 72%
Parameter, because the average describes all 28 students in the class.
In a study of all 2890 students at a college, it is found that 40% own a vehicle.
Parameter, because the value is a numerical measurement describing a characteristic of a population
The age of each resident in an apartment building
Population, because it is a collection of ages for all people in the apartment building
A polling organization contacts 2750 adult women who are 30 to 70 years of age and live in the United States and asks whether or not they had received a mammogram during the past year.
Population: adult women who are 30-70 years of age and live in the US. Sample: the 2750 women 30-70 years of age living in the US who were contacted.
Advantage of using a stem-and-leaf plot instead of a histogram
Advantage: a stem-and-leaf plot retains the individual data values, whereas a histogram does not. Disadvantage: a histogram can easily organize data sets of any size, whereas a stem-and-leaf plot cannot.
Every 30th person entering a library is asked to choose his or her favorite author from a list of five different authors that includes a description of each.
Systematic sampling is used, because every 30th person is selected. Bias: the wording of the descriptions may direct someone to a particular author, and if there is a regular pattern to people entering, the sample may not be representative.