Chapter 1 Introduction to Statistics
Which of the following describe discrete data? a. The numbers of people surveyed in each of the next several Gallup polls. b. The exact heights of individuals in a sample of several statistics students. c. The number of Super Bowl football games that must be played before one of the teams scores exactly 75 points.
(A) a and c. Both are counts; the exact heights in b are measurements, which are continuous.
Voluntary Response Sample: What is a voluntary response sample, and why is such a sample generally not suitable for a statistical study?
(A) A voluntary response sample is a sample in which the subjects themselves decide whether to be included in the study. A voluntary response sample is generally not suitable for a statistical study because the sample may have a bias resulting from participation by those with a special interest in the topic being studied.
Correlation and Causation: What is meant by the statement that "correlation does not imply causation"?
(A) Correlation is an association between two variables. For example, a statistical study may justify the statement that there is a correlation between the number of cigarettes smoked and pulse rate, but it would not justify a statement that the number of cigarettes smoked causes a person's pulse rate to change. Statements about causality can be justified by physical evidence, not by statistical analysis.
Source of data: In concluding a statistical study, why is it important to consider the source of the data?
(A) It's important to determine whether the source of the data would benefit from the conclusions of the study. If so, the conclusions may be biased and should be viewed with skepticism.
On the day of the last presidential election, ABC News organized an exit poll in which specific polling stations were randomly selected and all voters were surveyed as they left the premises.
Cluster
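A minimal sketch of cluster sampling, using hypothetical station names and voter IDs (nothing here comes from the actual ABC News poll): the random step picks whole polling stations, and then every voter at each chosen station is included.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Randomly pick whole clusters, then keep every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical data: 10 polling stations with 5 voters each.
stations = {f"station_{i}": [f"voter_{i}_{j}" for j in range(5)] for i in range(10)}

rng = random.Random(42)  # fixed seed only so the sketch is reproducible
sample = cluster_sample(stations, 3, rng)
for station, voters in sample.items():
    print(station, len(voters))
```

The contrast with stratified sampling is that here only some clusters are sampled, but each sampled cluster is surveyed completely.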
Currently the House of Representatives has 435 members.
Discrete
Body temperatures (in degrees Fahrenheit) listed in Data Set 3 of Appendix B
Interval
A study was conducted of all 2223 passengers aboard the Titanic when it sank.
Parameter
Satellites are used to collect sample data used to estimate deforestation rates. The Forest Resources Assessment of the UN Food and Agriculture Organization uses a method of selecting a sample of a 10-km-wide square at every 1° intersection of latitude and longitude.
Systematic
At a national conference of the American Appliances Association, a market researcher plans to conduct a survey of conference attendees. She uses the list of attendee names and selects every 20th name. Is the result a simple random sample? Why or why not? In general, what is a simple random sample?
(A) No. It is not a simple random sample because it is a systematic sample. In a simple random sample of n subjects, the subjects are selected in such a way that every possible sample of the same size n has the same chance of being chosen.
Falsifying Data: A researcher at the Sloan-Kettering Cancer Research Center was once criticized for falsifying data. Among his data were figures obtained from 6 groups of mice, with 20 individual mice in each group. The following values were given for the percentage of successes in each group: 53%, 58%, 63%, 46%, 48%, 67%. What's wrong with those values?
(A) Each percentage should correspond to a whole number of mice, but none of them do. For example, 53% of 20 is 0.53 × 20 = 10.6 mice, which is impossible. With groups of 20, every success percentage must be a multiple of 5%.
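The whole-number check can be sketched directly. This is only an arithmetic illustration of the answer above; the helper names are hypothetical, and integer arithmetic is used so that floating-point rounding cannot mislabel a valid percentage.

```python
# Success percentages reported for the 6 groups of 20 mice.
percentages = [53, 58, 63, 46, 48, 67]
GROUP_SIZE = 20

def implied_count(pct, group_size):
    """Number of mice implied by pct% of a group of group_size."""
    return pct * group_size / 100

def is_possible(pct, group_size):
    """True only when pct% of the group is a whole number of mice (integer math avoids float rounding)."""
    return (pct * group_size) % 100 == 0

for pct in percentages:
    status = "possible" if is_possible(pct, GROUP_SIZE) else "impossible"
    print(f"{pct}% of {GROUP_SIZE} = {implied_count(pct, GROUP_SIZE):.1f} mice ({status})")

# Every achievable percentage for a group of 20: k successes give 100k/20 = 5k percent.
achievable = [100 * k // GROUP_SIZE for k in range(GROUP_SIZE + 1)]
print(achievable)
```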
Statistical Significance versus Practical significance: What is the difference between statistical significance and practical significance? Can a statistical study have statistical significance, but not practical significance?
(A) Statistical significance is achieved when statistical methods lead to the conclusion that some treatment or finding is effective, meaning the results would be unlikely to occur by chance. Practical significance concerns whether the effect is large enough to matter: common sense might suggest that the treatment or finding does not make enough of a difference to justify its use or to be practical. Yes, it is possible for a study to have statistical significance but not practical significance.
The Associated Press provided an article with a headline stating that ATV accidents killed 704 people in the last year. The article noted that this is a new record high and compared it to the 617 ATV deaths the year before that. Other data about the frequency of injuries were included. What important value was not included? Why is it important?
(A) Without knowing the number of ATVs in use, the number of ATV drivers, or the amount of ATV usage, the figure of 704 deaths has no context. Some measure of exposure should be given so that the reader can understand the rate of ATV fatalities.
What is a parameter, and what is a statistic?
A parameter is a numerical measurement describing some characteristic of a population. A statistic is a numerical measurement describing some characteristic of a sample.
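A small sketch of the distinction, using an entirely hypothetical population of 12 club members' ages: the parameter is computed from every member, while the statistic is computed from a random sample and changes from sample to sample.

```python
import random
import statistics

# Hypothetical population: ages of all 12 members of a small club.
population = [23, 35, 41, 29, 52, 38, 44, 31, 27, 60, 48, 36]

# Parameter: describes the whole population, so it is a fixed value.
population_mean = statistics.mean(population)

# Statistic: describes a random sample, so it varies with the sample drawn.
rng = random.Random(1)
sample = rng.sample(population, 4)
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```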
Determine if results have statistical and practical significance: In a study of the Gender Aide method of gender selection, 1000 users of the method gave birth to 540 boys and 460 girls. There is about a 1% chance that such extreme results would occur if the method had no effect.
Because there is only about a 1% chance that such results would occur if the method had no effect, the method appears to have statistical significance. It does not appear to have practical significance, because 540 boys out of 1000 births is a rate of 54%, only slightly above the 50% rate expected by chance.
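The "about 1%" figure can be checked with an exact binomial calculation, assuming that "no effect" means each birth is independently a boy with probability 0.5. This is a sketch of that check, not the source's own computation; the two-sided value counts results at least as far from 500 in either direction.

```python
from math import comb

def binomial_tail_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

n, boys = 1000, 540
one_sided = binomial_tail_at_least(boys, n, 0.5)  # 540 or more boys
two_sided = 2 * one_sided  # symmetric when p = 0.5: also counts 460 or fewer boys
print(f"P(X >= 540) = {one_sided:.4f}, two-sided = {two_sided:.4f}")
```

The two-sided probability comes out close to the 1% quoted in the exercise, while the one-sided tail is smaller.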
When making random guesses for difficult multiple-choice test questions with possible answers of a, b, c, d, and e, we expect to get about 20% of the answers correct. The Ashton Prep Program claims to have developed a better method of guessing. In a test of that program, guesses were made for 100 answers, and 23 were found to be correct. There is a 23% chance of getting such results if the program has no effect.
Because there is a 23% chance of getting such results if the program has no effect, it does not appear to have statistical significance. Because the success rate of 23% is not that much better than the 20% rate, it does not appear to have practical significance.
Most people have IQ scores between 70 and 130. For $32, you can purchase a computer program from Highiqpro.com that is claimed to increase your IQ score by 10 to 20 points. The program claims to be "the only proven IQ increasing software in the brain training market," but the author of your text could find no data supporting that claim, so let's suppose that these results were obtained: In a study of 12 subjects using the program, the average increase in IQ is 3 IQ points. There is a 25% chance of getting such results if the program has no effect.
Because there is a 25% chance of getting such results if the program has no effect, it does not appear to have statistical significance. Because the average increase in IQ from the study was only 3 IQ points, while the program claimed to improve your IQ by 10 to 20 points, it does not appear to have practical significance.
Determine if results have statistical and practical significance: In a study of the Marisa Waite diet, four subjects lost an average of 45 pounds. It is found that there is about a 30% chance of getting such results with a diet that has no effect.
Because there is a 30% chance of getting such results with a diet that has no effect, it does not appear to have statistical significance, but the average loss of 45 pounds does appear to have practical significance.
For the HIV vaccine study in which one twin receives the DNA vaccine and the other twin receives the adenoviral vector vaccine, blinding will be used. What is blinding, and why is it important in this experiment?
Blinding is a method whereby a subject (or a person who evaluates results) in an experiment does not know whether the subject is treated with the DNA vaccine or the adenoviral vector vaccine. It is important to use blinding so the results are not somehow distorted by knowledge of the particular treatment used.
You want to conduct a study to determine whether fruit consumption leads to reduced weight. Why would an experiment be better than an observational study?
By doing an experiment, the chance of having the results affected by some variable that is not included in the study is minimized.
The New York State Department of Transportation evaluated the quality of the New York State Thruway by testing core samples collected at regular intervals of 1 mile.
Systematic
The author collected sample data by randomly selecting 12 different pages from Harry Potter and the Sorcerer's Stone and then finding the number of words in each sentence on each of those pages.
Cluster
The author collected sample data by randomly selecting 20 different pages from a printed version of the Merriam-Webster dictionary and then counting the number of defining words on each of those pages.
Cluster
Currently, there is no approved vaccine for the prevention of West Nile virus. A clinical trial of a possible vaccine is being planned to include subjects treated with the vaccine while other subjects are given a placebo.
Completely randomized design
From Data Set 16 in Appendix B we see that an earthquake had a measurement of 0.70 on the Richter scale.
Continuous because it's a measurement
George Washington was 188 cm tall.
Continuous because it's a measurement
a. Exact braking distances of cars, measured on a scale from 100 ft to 200 ft.
Continuous, because the number of possible values is infinite and cannot be counted. The word "exact" means that fractional values are possible.
Discrete or continuous: In Data Set 13 of Appendix B, the measured chest deceleration of a Honda Civic in a crash test is 39 g, where g is the acceleration due to gravity.
Continuous, because it's a measurement
In 1936, Literary Digest magazine mailed questionnaires to 10 million people and obtained 2,266,566 responses. The responses indicated that Alf Landon would win the presidential election, but Franklin D. Roosevelt actually won the election.
Convenience
The CBS news station in New York City often obtains opinions by interviewing neighbors of a person who is the focus of a news story.
Convenience
The sexuality of women was discussed in Shere Hite's book "Women and Love: A Cultural Revolution." Her conclusions were based on sample data that consisted of 4500 mailed responses from 100,000 questionnaires that were sent to women.
Convenience
Data Set 13 in Appendix B includes crash results from 21 different cars.
Discrete
From Data Set 17 in Appendix B we see that a male spoke 13,825 words in one day.
Discrete
The Honda Civic has 4 cylinders.
Discrete, because the number of cylinders is a whole number that results from a counting process.
c. The numbers of students now in statistics classes.
Discrete because the number of possible values is finite
d. The number of attempts required to roll a single die and get an outcome of 7.
Discrete, because the possible numbers of attempts (1, 2, 3, ...) are countable, even though there is no upper limit.
b. Braking distances of cars, measured on a scale from 100 ft to 200 ft and rounded to the nearest foot.
Discrete, because the number of possible values is finite. Rounding to the nearest foot means that fractional values cannot occur, which is what makes the data discrete.
When collecting data from different sample locations in a lake, a researcher uses the "line transect method" by stretching a rope across the lake and collecting samples at every interval of 5 meters.
Systematic
You want to conduct a study to determine whether fruit consumption leads to reduced weight. Why would an experiment be better than an observational study?
Experiments are often better than observational studies because experiments typically reduce the chance of having the results affected by some variable that is not part of a study.
The author conducted a survey of the students in all of his classes. He asked the students to indicate whether they are left-handed or right-handed. Is this convenience sample likely to provide results that are typical of the population? Are the results likely to be good or bad? Does the quality of the results in this survey reflect the quality of convenience samples in general?
Yes. There is no reason to believe that students in the author's classes are left-handed or right-handed at a rate different from that of the general population, so the results are likely to be good. However, the quality of these results does not reflect the quality of convenience samples in general; convenience samples often produce results that are not representative of the population.
Years in which U.S. presidents were inaugurated
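Interval, because differences between years are meaningful (for example, the number of years between two inaugurations), but there is no natural zero starting point, so ratios of years are meaningless.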
Lisinopril is a drug designed to lower blood pressure. In a clinical trial of Lisinopril, blood pressure levels of subjects are measured before and after they have been treated with the drug.
Matched pairs
The HIV Trials Network is conducting a study to test the effectiveness of two different experimental HIV vaccines. Subjects will consist of 80 pairs of twins. For each pair of twins, one of the subjects will be treated with the DNA vaccine and the other twin will be treated with the adenoviral vector vaccine.
Matched pairs design
At a national conference of the American Appliances Association, a market researcher plans to conduct a survey of conference attendees. She uses the list of attendee names and selects every 20th name. Is the result a simple random sample? Why or why not? In general, what is a simple random sample?
No. Not every sample of the same size has the same chance of being selected. For example, a sample consisting of the first two names has no chance of being selected. This is a systematic sample. A simple random sample of n items is selected in such a way that every possible sample of the same size n has the same chance of being selected.
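A sketch contrasting the two methods, assuming a hypothetical list of 200 attendees and a random starting point within the first 20 names (the exercise does not say how the starting point is chosen). Enumerating every possible systematic sample shows that none contains both of the first two names, while a simple random sample can be any set of 10 names.

```python
import random
from math import comb

def systematic_sample(items, k, start):
    """Every k-th item, beginning at index start (0 <= start < k)."""
    return items[start::k]

def simple_random_sample(items, n, rng):
    """Draw n items without replacement; every subset of size n is equally likely."""
    return rng.sample(items, n)

# Hypothetical attendee list of 200 names.
attendees = [f"attendee_{i:03d}" for i in range(1, 201)]

rng = random.Random(7)
start = rng.randrange(20)  # assumed random starting point among the first 20 names
systematic = systematic_sample(attendees, 20, start)
srs = simple_random_sample(attendees, len(systematic), rng)

# There are only 20 possible systematic samples, one per starting point.
all_systematic = [set(systematic_sample(attendees, 20, s)) for s in range(20)]
first_two = {attendees[0], attendees[1]}
print(any(first_two <= s for s in all_systematic))  # no systematic sample holds both names
print(len(all_systematic), "systematic samples vs", comb(200, 10), "simple random samples of size 10")
```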
In a study sponsored by Coca-Cola, 12,500 people were asked what contributes most to their happiness, and 77% of the respondents said that it was their family or partner.
No treatment was given to the respondents. This is observational.
Car models (Chevrolet Aveo, Honda Civic, ..., Buick Lucerne) used for crash testing, as listed in Data Set 13 of Appendix B
Nominal because car models are labels
Determine if nominal, ordinal, interval, or ratio: Colors of M&M's (red, orange, yellow, brown, blue, green) listed in Data Set 20 in Appendix B
Nominal because colors are labels.
The mean IQ score for subjects taking the Wechsler Adult Intelligence Scale IQ test is 100.
Parameter
There are 50 state capitals in the United States.
Parameter
The average (mean) atomic weight of all elements in the periodic table is 134.355 unified atomic mass units.
Parameter, because the average is calculated over all elements (i.e., over the entire population).
Researchers at the National Cancer Institute studied meat consumption and its relationship to mortality. Approximately one-half million people were surveyed, and they were then followed for a period of 10 years.
Prospective study
The Nurses' Health Study was started in 1976 with 121,700 female registered nurses who were between the ages of 30 and 55. The subjects were surveyed in 1976 and every two years thereafter. This study is ongoing.
Prospective study
How do quantitative data and categorical data differ?
Quantitative (or numerical) data consist of numbers representing counts or measurements, while categorical data consist of names or labels that are not numbers representing counts or measurements.
In a Pew Research Center Poll, 1007 adults were called after their telephone numbers were randomly generated by a computer, and 85% of the respondents were able to correctly identify what Twitter is.
Random
In a clinical trial of the cholesterol drug Lipitor, subjects were partitioned into groups given a placebo or Lipitor doses of 10 mg, 20 mg, 40 mg, or 80 mg. The subjects were randomly assigned to the different treatment groups (based on data from Pfizer, Inc).
Random
A group of students develops a scale for rating the quality of cafeteria food, with 0 representing "neutral: not good and not bad." Bad meals are given negative numbers and good meals are given positive numbers, with the magnitude of the number corresponding to the degree of badness or goodness. The first three meals are rated as 2, 4, and -5. What is the level of measurement for such ratings? Explain your choice.
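Ordinal. Although the ratings are numbers, they only put the meals in order from worst to best; differences between ratings are not likely to be meaningful (the difference between ratings of 1 and 0 probably does not equal the difference between ratings of 5 and 4). Rating scales like this typically measure non-numeric concepts such as satisfaction, happiness, or discomfort.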
Depths (km) of earthquakes listed in Data Set 16 of Appendix B
Ratio, because the measurement has a natural zero starting point and has all of the characteristics of the interval level
In order to study the seriousness of drinking and driving, a researcher obtains records from past car crashes. Drivers are partitioned into a group that had no alcohol consumption and another group that did have evidence of alcohol consumption at the time of the crash.
Retrospective study
According to the New York State Unified Court System, names of potential jurors are selected from a variety of different sources. When a trial requires a jury, names from the list are randomly selected in a way that is equivalent to writing the names on slips of paper, mixing them in a bowl, and selecting the required number of potential jurors.
Simple random sample
In the last general election, 132,312 adults voted in Dutchess County, New York. You plan to conduct a post-election survey of 500 of those voters. After obtaining a list of those who voted, you number the list from 1 to 132,312, and then you use a computer to randomly generate 500 numbers between 1 and 132,312. Your sample consists of the voters corresponding to the selected numbers.
Simple random sample
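The computer step described above can be sketched with Python's standard `random.sample`, which draws without replacement, so every set of 500 distinct voter numbers is equally likely. The seed is an arbitrary choice made only to keep the sketch reproducible.

```python
import random

NUM_VOTERS = 132_312  # size of the numbered voter list
SAMPLE_SIZE = 500

rng = random.Random(2024)  # seeded only so the sketch is reproducible
# Sampling without replacement: each of the 500 numbers is distinct.
selected = sorted(rng.sample(range(1, NUM_VOTERS + 1), SAMPLE_SIZE))
print(selected[:10])
```

The voters whose list positions match `selected` form the simple random sample.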
Among the flights included in the sample of flights in Data Set 15 of Appendix B, 21% arrived late.
Statistic
In a random sample of households, it was found that 47% of the sampled households had high-definition TVs.
Statistic
Statistic or Parameter: In an AAA Foundation for Traffic Safety survey, 21% of the respondents said that they recently texted or e-mailed while driving.
Statistic
The average (mean) volume of the brains included in Data Set 6 of Appendix B is 1126.0 cm³.
Statistic
In a study of treatments for back pain, 641 subjects were randomly assigned to the four different treatment groups of individualized acupuncture, standardized acupuncture, simulated acupuncture, and usual care (based on data from "A Randomized Trial Comparing Acupuncture, Simulated Acupuncture, and Usual Care for Chronic Low Back Pain," by Cherkin et al., Archives of Internal Medicine, Vol. 169, No. 9).
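Random, because the subjects were randomly assigned to the treatment groups; no population was first divided into strata to be sampled.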
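The trial's actual allocation procedure is not described here, so this is only a hypothetical sketch of one common way to randomly assign subjects: shuffle the subject IDs, then deal them into the four groups so the group sizes differ by at most one.

```python
import random

groups = ["individualized acupuncture", "standardized acupuncture",
          "simulated acupuncture", "usual care"]
subject_ids = list(range(1, 642))  # 641 hypothetical subject IDs

rng = random.Random(9)
rng.shuffle(subject_ids)  # random order, so assignment does not depend on any subject trait
# Deal the shuffled IDs round-robin into the groups.
assignment = {g: subject_ids[i::len(groups)] for i, g in enumerate(groups)}
for g, members in assignment.items():
    print(g, len(members))
```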
In a clinical trial of the cholesterol drug Lipitor, 188 subjects were given 20 mg doses of the drug, and 3.7% of them experienced nausea (based on data from Pfizer, Inc).
Subjects were treated with medicine. This is an experiment.
In the Born Loser cartoon strip by Art Sansom, Brutus expresses joy over an increase in temperature from 1 to 2 degrees. When asked what is so good about 2 degrees, he answered that "it's twice as warm as this morning." Explain why Brutus is wrong yet again.
Temperature in degrees is at the interval level of measurement, which has no natural zero starting point: 0 degrees does not mean the absence of heat. Ratios such as "twice as warm" are therefore meaningless, so Brutus is wrong when he says it's "twice as warm as this morning."
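One way to see this is to convert to kelvins, a scale that does have a true zero. The cartoon does not state the units, so this sketch assumes degrees Fahrenheit; on the Kelvin scale the "doubling" shrinks to a change of about 0.2%.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (a ratio scale with a true zero)."""
    return (f - 32) * 5 / 9 + 273.15

morning, later = 1.0, 2.0  # assumed to be degrees Fahrenheit
ratio_f = later / morning  # 2.0, but a ratio on an interval scale has no meaning
ratio_k = fahrenheit_to_kelvin(later) / fahrenheit_to_kelvin(morning)
print(ratio_f, round(ratio_k, 4))
```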
Identifying the population: In a Gallup poll of 1010 adults in the United States, 55% of the respondents said that they used local TV stations daily as a source of news. Is the 1010 value a statistic or a parameter? Is the 55% value a statistic or a parameter? Describe the population.
The 1010 value is the number of adults in the sample, so it describes the sample and is a statistic. The 55% value is computed from the sample of 1010 respondents, so it is also a statistic. The population consists of all adults in the United States.
Data Set 5 in Appendix B lists blood lead levels represented as 1 for low, 2 for medium, and 3 for high. The average (mean) of the 121 blood lead levels is 1.53.
The low, medium, and high values are ordinal data. Calculating an average of ordinal data makes no sense, because differences between the codes 1, 2, and 3 are not meaningful.
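A sketch with hypothetical data (not the 121 values from Data Set 5) shows why the mean is unreliable here: any coding that keeps the order low < medium < high is an equally valid ordinal coding, yet the mean changes with the choice, while the median category does not.

```python
import statistics

# Hypothetical blood lead levels for 10 subjects.
levels = ["low", "low", "medium", "low", "high", "medium", "low", "high", "low", "medium"]

coding_a = {"low": 1, "medium": 2, "high": 3}
coding_b = {"low": 1, "medium": 2, "high": 10}  # same order, different spacing

codes_a = [coding_a[x] for x in levels]
codes_b = [coding_b[x] for x in levels]

print(statistics.mean(codes_a), statistics.mean(codes_b))  # the means disagree
# median_low returns an actual data value, so it maps back to a category.
print(statistics.median_low(codes_a), statistics.median_low(codes_b))
```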
As of this writing, the New York Yankees were the last team to win the World Series, and the numbers of the starting lineup are 2, 18, 25, 13, 20, 55, 24, 33, and 53. The average (mean) of those numbers is 27.0.
The numbers are labels that identify the players, so they are nominal data with no numerical meaning. It makes no sense to calculate an average of nominal data.
In a preelection survey of likely voters, political parties of respondents are identified as 1 for a Democrat, 2 for a Republican, 3 for an Independent, and 4 for anything else. The average (mean) is calculated for 850 respondents and the result is 1.7.
The numbers assigned to the parties are labels, so the data are nominal. It makes no sense to calculate the average of labels.
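A sketch with hypothetical responses (not the 850 from the survey): relabeling the parties with a different but equally arbitrary set of numbers changes the "average," while the mode, the meaningful summary for nominal data, stays the same.

```python
import statistics

# Hypothetical party affiliations for 8 respondents.
parties = ["Democrat", "Republican", "Democrat", "Independent",
           "Democrat", "Other", "Republican", "Democrat"]

coding_1 = {"Democrat": 1, "Republican": 2, "Independent": 3, "Other": 4}  # coding from the survey
coding_2 = {"Republican": 1, "Democrat": 2, "Other": 3, "Independent": 4}  # an equally arbitrary relabeling

mean_1 = statistics.mean(coding_1[p] for p in parties)
mean_2 = statistics.mean(coding_2[p] for p in parties)
print(mean_1, mean_2)  # different "averages" for the same respondents
print(statistics.mode(parties))  # most common category
```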
A student of the author listed his adult friends, and then he surveyed a simple random sample of them. What is the population from which the simple random sample was selected? Are the results likely to be representative of the general population of adults in the United States? Why or why not?
The population consists of the adult friends on the list. The simple random sample is selected from the population of adult friends on the list, so the results are not likely to be representative of the much larger general population of adults in the United States.
Arm Circumference: From Data Set 16 in Appendix B we see that a female had an arm circumference of 27.5 cm.
The value 27.5 is from a continuous data set because the arm circumference measured in cm of a female can be any value on a continuous scale and the number of possible values cannot be counted.
What's wrong with this picture? The Newport Chronicle ran a survey by asking readers to call in their response to this question: "Do you support the development of atomic weapons that could kill millions of innocent people?" It was reported that 20 readers responded and that 87% said "no," while 13% said "yes." Identify 4 major flaws in this survey.
The wording of the question is biased and tends to encourage negative responses. The sample size of 20 is too small. Survey respondents are self-selected instead of being selected by the newspaper. If 20 readers respond, the percentages should be multiples of 5, so 87% and 13% are not possible results.
The Milgram Research Company wants to study reactions to stress, so it administers surveys in which the person asking the questions pretends to become very angry with the survey subject. At one point, the surveyor screams at the subject and asks how anyone could have such "stupid" opinions.
The subjects receive a treatment (the surveyor's display of anger), so this is an experiment.
Nine-year-old Emily Rosa was an author of an article in the Journal of the American Medical Association after she tested professional touch therapists. Using a cardboard partition, she held her hand above one of the therapist's hands, and the therapist was asked to identify the hand that Emily chose.
This is an observational study because the therapists were not given any treatment. Their responses were observed.
Bayer HealthCare LLC produces low-dose aspirin pills designed to contain 81 mg of aspirin. Because each pill contains other ingredients, including corn starch, talc, and propylene glycol, it is difficult to check whether manufactured pills contain 81 mg of aspirin. A quality control plan is to select every 1000th pill, which is then tested for the correct amount of aspirin.
This sample is not a simple random sample. Because every 1000th pill is selected, some samples have no chance of being selected. For example, a sample consisting of two consecutive pills has no chance of being selected, and this violates the requirement of a simple random sample.
Mall managers commonly research how customers use the malls. The author was approached by a pollster at the Galleria Mall in Dutchess County, New York. The pollster was obviously selecting subjects who appeared to be approachable.
This sample is not a simple random sample. Not every sample has the same chance of being selected. For example, a sample that includes people who do not appear to be approachable has no chance of being selected. This is an example of a convenience sample.
From Data Set 1 in Appendix B we see that a female had an arm circumference of 27.5 cm.
Continuous, because it's a measurement
Data Set 15 in Appendix B lists flight numbers of 48 different flights, and the average (mean) of those flight numbers is 11.0.
A flight number is a label for the route the plane is taking. Because it is a label, the data are nominal, and it makes no sense to calculate an average.
The movie Avatar was given a rating of 4 stars on a scale of 5 stars.
Ordinal, because it's a ranking
Blood lead levels of low, medium, and high used to describe the subjects in Data Set 5 of Appendix B
Ordinal, because low, medium, and high are rankings; even when they are coded as 1, 2, and 3, the differences between the codes have no meaning.
Volumes (cm³) of brains listed in Data Set 6 of Appendix B
Ratio, because the measurement has a natural zero starting point and has all of the characteristics of the interval level