Chapter 1 - Introduction to Statistics

Replication

is the repetition of an experiment on more than one subject.

Observational study

observing or measuring specific characteristics without attempting to modify the subjects being studied.

Voluntary response (or self-selected) samples

often have bias (those with special interest are more likely to participate).

Claim: "reduces plaque by over 400%." Why is this wrong?

A reduction of 100% would eliminate all plaque, so it is not possible to reduce plaque by more than 100%.

One goal of statistics

to describe and understand sources of variability.

Self-Interest Study

when the study is conducted by those who will benefit from the results.

Placebo effect

when an untreated subject reports improvement in symptoms (just because they expect to feel better)

Several studies showed that when eating a diet low in red meat, subjects had decreased cholesterol. High cholesterol levels have been associated with increased risk of heart disease and stroke. A poultry farmer's organization financed this research. What is wrong with this study?

​Self-interest study

In a study of a weight loss​ program, 5 subjects lost an average of 42 lbs. It is found that there is about a 25​% chance of getting such results with a diet that has no effect. Does the weight loss program have practical​ significance?

​Yes, the program is practically significant because the amount of lost weight is large enough to be considered practically significant.

"Random" Sample

"random" means each subject in a group has an equal chance of being selected. ("equally likely")

Cluster vs. Stratified

- Stratified Sampling: Randomly chooses SOME from EACH GROUP.
-- At a different school, an administrator randomly selects 10 students from each class.
- Cluster Sampling: Randomly chooses WHOLE GROUPS.
-- A school administrator randomly selects 10 classes from across the campus & administers a survey to all the students in those classes.

Basics of Collecting Data

- The method used to collect sample data influences the quality of the statistical analysis. - If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them!

EXAMPLE Illustrating Simple Random Sampling: Suppose a study group has 5 students: Bob, Patricia, Mike, Jan, and Maria. Two of the students must go to the board to demonstrate a homework problem. Choose a "random" sample of size 2 from the 5 students: Patricia, Maria? Bob, Jan?

- To ensure "equally likely," you need a method that chooses from the whole group equally:
-- Put each name on a slip of paper, mix well, and pick without looking, or
-- Assign a number to each name, then use a random number table or a random number generator on a computer/calculator to pick numbers between one and five in "random" order.
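
The numbering method above can be sketched in Python. This is a minimal illustration, assuming Python's standard `random` module plays the role of the random number generator; the helper name `pick_students` is made up for this example.

```python
import random

# The five students from the example, numbered 1 through 5.
students = {1: "Bob", 2: "Patricia", 3: "Mike", 4: "Jan", 5: "Maria"}

def pick_students(k, rng):
    """Draw k distinct student numbers, each equally likely, and return the names."""
    chosen_numbers = rng.sample(sorted(students), k)  # sampling without replacement
    return [students[n] for n in chosen_numbers]

rng = random.Random()  # pass a seed, e.g. random.Random(1), for a repeatable draw
chosen = pick_students(2, rng)
print(chosen)
```

Because the draw is made without replacement, the same student cannot be sent to the board twice.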

Potential Pitfalls - Misleading Conclusions

- Two variables that may seem linked should not be assumed to "cause" each other. - We cannot conclude that one causes the other when in fact the variables are only correlated or associated. Correlation does not imply causality.

Designed Experiment

- apply some treatment and then observe its effects on the subjects. (subjects in experiments are called experimental units)

Qualitative/Categorical (attribute) data

- consists of names or labels (representing categories) - Example: Smoker/ Non-smoker. - Example: Your favorite music artists. - NOTE: Categorical data are usually "words" but can be numbers - IF those numbers are just labels: **Example: Student ID number or phone number. **Example: Ticket number on raffle ticket stub.

Quantitative (or numerical) data

- consists of numbers representing counts or measurements. - Example: The weights of supermodels. - Example: Your travel time to school/work.

Blinding

- is a technique in which the subject doesn't know whether he or she is receiving a treatment or a placebo. - Blinding allows us to compare: Is the treatment significantly different from a placebo effect?

Continuous (numerical) data

- numbers that must be measured on some continuous scale that covers a range of values without gaps, interruptions, or jumps. - Example: The amount of milk that a cow produces per day, i.e., the measured amount could be 2 gallons, 3 gallons, or anything in between, such as 2.34315 gallons.

Confounding

- occurs in an experiment when the experimenter is not able to distinguish between the effects of different factors. - Confounding happens when too many variables are not controlled for.

Sampling error

- the difference between a sample result and the true population result. - Some sampling error is natural - because with a sample, there are always people from the population we didn't include. - The worst sampling error happens with Non-random sampling: Using a sampling method that is not random, such as using a convenience sample or a voluntary response sample

Discrete data

- when the possible values are either a 'countable' number or can be listed as individual values. - Example: The number of eggs that a hen lays (i.e., the list of possible values is 0, 1, 2, 3, . . .) - Example: Adult shoe sizes (i.e., the list of possible values is 5, 5 ½, 6, 6 ½ ...)

Non-sampling error

- when sample data is incorrectly collected, recorded, or analyzed. (Such as by selecting a biased sample, using a defective instrument, or copying the data incorrectly) - This is the type of error we try to control by careful survey or experimental design.

Cluster Sampling

1) Divide the population area into sections (or clusters). 2) Randomly select some of those clusters, and 3) Survey all members from the selected clusters.
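
The three steps above can be sketched as follows. This is only an illustration under assumed data: the six classes of four students each, and the function name `cluster_sample`, are hypothetical.

```python
import random

# Hypothetical population: 6 classes (clusters) of 4 students each.
classes = {f"class_{c}": [f"class_{c}_student_{s}" for s in range(1, 5)]
           for c in range(1, 7)}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then keep every member of each picked cluster."""
    picked = rng.sample(sorted(clusters), n_clusters)
    return picked, [member for name in picked for member in clusters[name]]

picked_classes, surveyed = cluster_sample(classes, 2, random.Random())
print(picked_classes, len(surveyed))
```

The randomness applies only to which clusters are chosen; inside a chosen cluster nobody is left out.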

Stratified Sampling

1) Subdivide the population into at least two subgroups based on some shared characteristic, then 2) Randomly choose some from each group.
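
A minimal sketch of the two steps above, assuming made-up subgroups (students grouped by class year) and a hypothetical helper named `stratified_sample`:

```python
import random

# Hypothetical strata: students grouped by class year.
strata = {
    "freshman": [f"F{i}" for i in range(1, 31)],
    "sophomore": [f"S{i}" for i in range(1, 26)],
    "junior": [f"J{i}" for i in range(1, 21)],
}

def stratified_sample(groups, per_group, rng):
    """Randomly choose per_group members from every subgroup."""
    return {name: rng.sample(members, per_group) for name, members in groups.items()}

sample = stratified_sample(strata, 10, random.Random())
for name, chosen in sample.items():
    print(name, chosen)
```

Every subgroup contributes members here, which is the difference from cluster sampling, where whole groups are either fully in or fully out.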

Conducting a Statistical Study

1. Determine the design of the study.
   a. State the question to be studied.
   b. Determine the population and variables.
   c. Determine the sampling method.
2. Collect the data.
3. Organize the data.
4. Analyze the data to answer the question.

Summary: Three very important considerations in the design of experiments are the following:

1. Use randomization to assign subjects to different groups.
2. Use replication by repeating the experiment on enough subjects so that effects of treatment or other factors can be clearly seen.
3. Control the effects of variables by using such techniques as blinding and a completely randomized experimental design.

What is a voluntary response​ sample?

A sample in which the subjects themselves decide whether to be included in the study.

A researcher was once criticized for falsifying data. Among his data were figures obtained from 8 groups of mice​, with 20 individual mice in each group. These values were given for the percentage of successes in each​ group: 53%,​ 58%, 63%, ​46%, 48%, 67%, 54%, 42%. ​What's wrong with those​ values?

All percentages of success should be multiples of 5. The given percentages cannot be correct. (1/20 * 100 = 5)
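
The check behind this answer can be sketched in Python. The function name `possible_with_n` is made up for illustration; it asks whether any whole number of successes out of n mice would round to the reported percentage.

```python
def possible_with_n(percent, n):
    """True if some whole count k out of n rounds to the reported whole percent."""
    return any(round(100 * k / n) == percent for k in range(n + 1))

reported = [53, 58, 63, 46, 48, 67, 54, 42]  # percentages given in the question
impossible = [p for p in reported if not possible_with_n(p, 20)]
print(impossible)
```

With 20 mice per group each success is worth 5 percentage points, so any reported value that is not a multiple of 5 ends up in the `impossible` list.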

census

Collection of data from every member of a population

EXAMPLE Observational Study vs. Designed Experiment Xylitol is a food additive that has been tested in preventing dental caries (cavities). Unknown to the children, a total of 45 Peruvian children were given milk with and 35 children were given milk without xylitol. After one year of use, the number of cavities for each subject was recorded by dentists who were not informed of the children's groupings. Observational Study or Designed Experiment?

Designed Experiment

EXAMPLE Observational Study vs. Designed Experiment Xylitol is a food additive that has been tested in preventing dental caries (cavities). Unknown to the children, a total of 45 Peruvian children were given milk with and 35 children were given milk without xylitol. After one year of use, the number of cavities for each subject was recorded by dentists who were not informed of the children's groupings. The data variable recorded was? Number of cavities: Continuous or Discrete?

Discrete

Confounding: Researchers wanted to determine the long-term benefits of the influenza vaccine on seniors aged 65 or older. The research group looked at records of 36,000 seniors: Group 1 were seniors who chose to get a flu vaccine, Group 2 were seniors who chose not to get the flu vaccine. After observing the seniors for 10 years, it was determined that seniors who got flu shots were 27% less likely to be hospitalized for pneumonia or influenza, and 48% less likely to die from pneumonia or influenza. Possible Confounding Variables? (Are there characteristics that might influence the result that were not controlled for?)

Insurance/Income, General Health, Environment/Exposures

Survey questions may be misleading if they are​ "loaded." To what does​ "loaded" refer?

Intentionally worded to elicit a desired response

Determine whether the sampling method described below appears to be sound or is flawed. In a survey of 768 human resource​ professionals, each was asked about the importance of the appearance of a job applicant. The survey subjects were randomly selected by pollsters from a reputable market research firm.

It appears to be sound because the data are not biased in any way.

Determine whether the sampling method described below appears to be sound or is flawed. In a survey of 781 subjects, each was asked how often he or she drank milk. The survey subjects were internet users who responded to a question that was posted on a news website.

It is flawed because it is a voluntary response sample.

EXAMPLE Observational Study vs. Designed Experiment Researcher Joachin Schuz wanted "to investigate cancer risk among Danish cellular telephone users". His research group kept track of 420,095 people who first subscribed to a cell phone service between 1982 and 1995. In 2002, they recorded the number of people out of the 420,095 who had a brain tumor and compared the rate of brain tumors in this group to the rate of brain tumors in the general population. They concluded "cellular telephone use was not associated with increased risk for brain tumors." Observational Study or Designed Experiment?

Observational Study

EXAMPLE Observational Study or Designed Experiment? A total of 974 homeless women in the LA area were surveyed to determine their level of satisfaction with the healthcare provided by shelter clinics versus the healthcare provided by government clinics. The women reported greater quality satisfaction with the shelter and outreach clinics compared to the government clinics.

Observational Study

EXAMPLE Observational Study or Designed Experiment? A total of 974 homeless women in the LA area were surveyed to determine their level of satisfaction with the healthcare provided by shelter clinics versus the healthcare provided by government clinics. The women reported greater quality satisfaction with the shelter and outreach clinics compared to the government clinics. Satisfaction Level: Level of Measurement?

Ordinal

Descriptive Statistics

Organizing, summarizing & displaying the data you have.

Refer to the given table of measurements below. Is there some meaningful way in which the IQ scores are matched with the corresponding brain volumes? If they are matched, does it make sense to use the difference between each IQ score and brain volume that is in the same column? Why or why not?
Subject               1     2     3     4     5
IQ                    98    80    90    88    102
Brain volume (cm^3)   1038  1142  1099  1113  1072

Yes, each IQ score is matched with the brain volume in the same​ column, because they are measurements obtained from the same person. ​No, it does not make sense to use the difference between each IQ score and brain volume in the same​ column, because IQ scores and brain volumes use different units of measurement.

Given the data in the table below, what issue can be addressed by conducting a statistical analysis of the pulse rates?
Pulse Rate (beats per minute)
Male     62  74  65  60  61
Female   64  64  72  85  80

The data can be used to address the issue of whether males and females have pulse rates with the same average (mean) values.
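
As a first descriptive step, the two means can be computed directly from the pulse rates in the question. This sketch only summarizes the ten listed values; deciding whether the difference reflects males and females in general would need inferential methods that this card does not cover.

```python
from statistics import mean

# Pulse rates (beats per minute) from the question.
male = [62, 74, 65, 60, 61]
female = [64, 64, 72, 85, 80]

male_mean = mean(male)      # 322 / 5
female_mean = mean(female)  # 365 / 5
print(male_mean, female_mean, female_mean - male_mean)
```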

Examples: Identify each quantitative variable as discrete or continuous.

- The number of people in a car: DISCRETE - you count the number of people: 0, 1, 2, 3, ...
- The gallons of gas bought in a week: CONTINUOUS - you measure the gallons of gas. It takes on values in between: to fill from 12 gallons to 13 gallons we pass through all the amounts in between.
- The time it takes to drive from home to school: CONTINUOUS - you measure the amount of time. It takes on values in between: 2 minutes, 20.5 minutes, 48.8 minutes.
- The number of trips to school per week: DISCRETE - you count the number of trips: 0, 1, 2, 3, ...

Example 3: The Gallup corporation collected data from 1013 adults in the United States. Results showed that 66% of the respondents worried about identity theft. The study concluded that more than half of all U.S. adults worry about identity theft. What is the population of interest?

The population consists of all adults in the United States.

A particular country has 40 total states. If the areas of 30 states are added and the sum is divided by 30​, the result is 211,081 square kilometers. Determine whether this result is a statistic or a parameter.

The result is a statistic because it describes a characteristic of a sample.

Example 3: The Gallup corporation collected data from 1013 adults in the United States. Results showed that 66% of the respondents worried about identity theft. The study concluded that more than half of all U.S. adults worry about identity theft. What was the sample?

The sample consists of the 1013 polled adults.

statistics

The science of ...planning studies and experiments, ...obtaining data, ... then organizing, summarizing, presenting & analyzing the data in order to draw conclusions or make predictions.

Example 3: The Gallup corporation collected data from 1013 adults in the United States. Results showed that 66% of the respondents worried about identity theft. The study concluded that more than half of all U.S. adults worry about identity theft. Is this last statement an example of descriptive or inferential statistics?

The statement is using the sample data as a basis for drawing a conclusion about the whole population. -> Inferential Statistics

Example In the largest public health experiment ever conducted, 200,745 children were given the Salk vaccine, while another 201,229 children were given a placebo. Observational study or Experiment?

The vaccine injections constitute a treatment, so this is an example of an experiment.

Determine whether the source given below has the potential to create a bias in a statistical study. A certain medical organization tends to oppose the use of meat and dairy products in our​ diets, and that organization has received hundreds of thousands of dollars in funding from an animal rights foundation.

There does appear to be a potential to create a bias. There is an incentive to produce results that are in line with the​ organization's creed and that of its funders.

Determine whether the source given below has the potential to create a bias in a statistical study. Washington University obtained word counts from the most popular novels of the past five years.

There does not appear to be a potential to create a bias. The organization would not gain from putting a spin on the results.

Example The Pew Research Center surveyed 2252 adults and found that 59% regularly use WiFi to connect to the internet. Observational study or Experiment?

This is an observational study because the adults had no treatment applied to them.

What is the goal of learning​ statistics?

To learn to distinguish between statistical conclusions that are likely to be valid and those that are seriously flawed

Convenience Sampling

Use results that are easy to get.

Inferential Statistics

Using sample data to make inferences about the entire population.

Double-Blind

When blinding occurs at two levels: - The subject doesn't know whether he or she is receiving the treatment or a placebo. - The experimenter doesn't know whether he or she is administering the treatment or placebo.

Randomization (in Experimental Groups)

When subjects are randomly assigned to the different treatment groups.
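
One way to carry out such an assignment is to shuffle the subjects and deal them out to the groups in turn. This is a sketch under assumptions: the 20 subject IDs, the two group names, and the helper `randomize` are all hypothetical.

```python
import random

def randomize(subjects, group_names, rng):
    """Shuffle the subjects, then deal them out to the groups in turn."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {name: [] for name in group_names}
    for i, subject in enumerate(shuffled):
        groups[group_names[i % len(group_names)]].append(subject)
    return groups

subjects = [f"subject_{i}" for i in range(1, 21)]  # hypothetical subject IDs
groups = randomize(subjects, ["treatment", "placebo"], random.Random())
print({name: len(members) for name, members in groups.items()})
```

Dealing after a shuffle keeps the group sizes balanced while leaving each subject's group to chance rather than to the experimenter's choice.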

EXAMPLE Observational Study vs. Designed Experiment Xylitol is a food additive that has been tested in preventing dental caries (cavities). Unknown to the children, a total of 45 Peruvian children were given milk with and 35 children were given milk without xylitol. After one year of use, the number of cavities for each subject was recorded by dentists who were not informed of the children's groupings. The treatment was?

Xylitol

EXAMPLE Observational Study vs. Designed Experiment Xylitol is a food additive that has been tested in preventing dental caries (cavities). Unknown to the children, a total of 45 Peruvian children were given milk with and 35 children were given milk without xylitol. After one year of use, the number of cavities for each subject was recorded by dentists who were not informed of the children's groupings. Did the experiment include blinding?

Yes, Double-Blinding

In a study of a weight loss​ program, 40 subjects lost a mean of 2.3 lbs after 12 months. Methods of statistics can be used to show that if this diet had no​ effect, the likelihood of getting these results is roughly 3 chances in 1000. Does the weight loss program have statistical​ significance? Does the weight loss program have practical​ significance? Does the weight loss program have statistical​ significance?

Yes, because the results are unlikely to occur by chance
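
The reasoning can be written as a simple comparison. Note the assumption: the 0.05 cutoff below is a common convention, not something stated in these notes, and the function name is made up for illustration.

```python
ALPHA = 0.05  # assumed cutoff; a common convention, not stated in these notes

def statistically_significant(chance_if_no_effect, alpha=ALPHA):
    """Results count as significant when they would rarely occur by chance alone."""
    return chance_if_no_effect < alpha

print(statistically_significant(3 / 1000))  # the 40-subject study in this card
print(statistically_significant(0.25))      # the 5-subject study from the other weight loss card
```

Statistical significance says nothing about whether the effect is large enough to matter; that is the separate question of practical significance.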

parameter

a numerical measurement describing some characteristic of a population.

statistic

a numerical measurement describing some characteristic of a sample.
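
The parameter/statistic distinction can be illustrated with made-up data. Everything here is hypothetical: a simulated population of 500 students' weekly work hours and a random sample of 50 of them.

```python
import random
from statistics import mean

# Hypothetical population: hours worked per week by all 500 students on a campus.
rng = random.Random(0)
population = [rng.randint(0, 40) for _ in range(500)]

parameter = mean(population)       # describes the whole population
sample = rng.sample(population, 50)
statistic = mean(sample)           # describes only the 50 sampled students

print(parameter, statistic)
```

The two values are usually close but not equal; the gap between them is an instance of sampling error.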

In the data table below, the x-values are the weights (in pounds) of cars and the y-values are the corresponding highway fuel consumption amounts (in mi/gal).
Weight (lb)                        4086  3392  4157  3678  3590
Highway Fuel Consumption (mi/gal)  26    32    28    28    30
Comment on the source of the data if you are told that car manufacturers supplied the values. Is there an incentive for car manufacturers to report values that are not accurate?

Yes, because​ consumers, in​ general, would prefer to buy a car with a higher level of fuel efficiency. In this​ case, the source of the data would be suspect with a potential for bias.

Step 3: Conclude

- Statistical Thinking: Using critical thinking to form conclusions and identify the practical implications of data.

population

The complete collection of all measurements or data that are being considered

Potential Pitfalls - Nonresponse/Missing Data

- Occurs when someone either refuses to respond to a survey question or is unavailable. - Non-Response: People who refuse to talk to pollsters often have a view of the world that is markedly different than those who will let pollsters into their homes. - Missing Data: Example - U.S. Census suffers from missing people (tend to be homeless or low income).

Potential Pitfalls - Order of Questions

- Questions can be unintentionally biased by the order of the items being considered. - Example: "Would you say traffic contributes more or less to air pollution than industry?" Results: traffic - 45%; industry - 27%. - When the order was reversed, the results were: industry - 57%; traffic - 24%.

Analyzing Data - Potential Pitfalls

- Self-Reported Results
- Small Samples
- Loaded Questions
- Conclusions about "Cause"
- Order of Questions
- Non-Response/Missing Data
- Assuming Precise Numbers and Percentages are correct
- We cannot conclude that one causes the other. Correlation does not imply causality.

Beyond the Basics of Collecting Data

- Sources of Data: -- Observational studies -- Designed experiments

EXAMPLE Qualitative vs. Quantitative Variables Determine whether the variables are Qualitative, Quantitative Continuous, or Quantitative Discrete.

(a) Type of car a person drives: QUALITATIVE (b) Distance a car travels on one full tank of gas: QUANTITATIVE CONTINUOUS (c) Number of times your Internet service goes down in the next 30 days: QUANTITATIVE DISCRETE (d) Shirt numbers on athletes' uniforms: QUALITATIVE

EXAMPLE Qualitative vs. Quantitative Variables Determine whether the following variables are qualitative or quantitative.

(a) Type of car a person drives: QUALITATIVE. (b) Distance a car travels on one full tank of gas: QUANTITATIVE (c) Number of times your Internet service goes down in the next 30 days: QUANTITATIVE (d) Shirt numbers on athletes' uniforms: QUALITATIVE

EXAMPLE: Levels of Measurement Determine whether the following data types are Nominal, Ordinal, Interval or Ratio?

(a) Year a chosen book was published: INTERVAL (b) Price of college textbooks: RATIO (c) Gender (Male/Female): NOMINAL (d) High temperature for cities on Dec. 25: INTERVAL (e) Computer Experience (Beginner, Intermediate, Advanced): ORDINAL (f) Customer rating of service (Very Poor, Poor, OK, Good, Great): ORDINAL (g) The number of trips to school you make per week: RATIO

Methods of Sampling - Summary

- "Simple" Random - Systematic - Convenience - Stratified - Cluster

Simple Random Sample

- A simple random sample is an organized method to choose "randomly" from the WHOLE GROUP (with no sub-groupings) in such a way that: - Every possible subset (sample) is just as likely to be chosen as any other subset (sample).
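
To see what "every possible subset is just as likely" means, the sketch below lists every sample of size 2 from the five-student study group used in the earlier example. Each sample should then have the same chance of being the one drawn.

```python
from itertools import combinations
from fractions import Fraction

students = ["Bob", "Patricia", "Mike", "Jan", "Maria"]

# Every possible sample of size 2 from the group of 5.
possible_samples = list(combinations(students, 2))
chance_each = Fraction(1, len(possible_samples))

print(len(possible_samples), chance_each)
for pair in possible_samples:
    print(pair)
```

There are 10 possible pairs, so under simple random sampling each pair is chosen with probability 1/10.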

Levels of Measurement

- Another way to classify data is to use levels of measurement.
- Nominal - categories only
- Ordinal - categories with some natural order
- Interval - quantitative, but with no natural zero point
-- Primarily: Temperature & Calendar year are quantitative data at the "interval" level of measurement
- Ratio - quantitative with a natural zero point
-- Most other quantitative variables are at "ratio" level
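
A short sketch of why the natural zero point matters. The temperatures and weights are made-up values, and the pound-per-kilogram factor is approximate. For interval data, the ratio of two values changes when the scale changes, so "twice as much" has no meaning; for ratio data it stays the same.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Interval level: the ratio of two temperatures depends on the scale used.
print(20 / 10, c_to_f(20) / c_to_f(10))

# Ratio level: the ratio of two weights is the same in any unit.
LB_PER_KG = 2.20462  # approximate conversion factor
print(20 / 10, (20 * LB_PER_KG) / (10 * LB_PER_KG))
```

20 degrees Celsius looks like "twice" 10 degrees Celsius, but the same two temperatures in Fahrenheit are 68 and 50, a ratio of 1.36.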

Potential Pitfalls - Precise Numbers

- Because a figure is precise, many people incorrectly assume that it is also accurate. - Example: The population consists of all 241,472,385 adults in the United States.

data

- Collections of observations, such as measurements, counts, or survey responses - A key aspect of data is that it varies.

Step 1: Prepare

- Context: What is the goal of the study? What data will we need? - Source of the Data/Sampling Method: Is the data source objective or biased? -- Does the method chosen greatly influence the validity of the conclusion?

Types of Studies (time perspective)

- Cross-sectional study (In Present)
-- Data is observed, measured, or collected at one point in time.
- Retrospective (or case-control) study (Over the Past)
-- Data is collected from the past by going back in time (examine records, interview, etc.).
- Prospective (or longitudinal or cohort) study (Into the Future)
-- Data is collected into the future from groups sharing common factors (called cohorts).

Potential Pitfalls - Percentages

- Example: Continental Airlines ran an ad claiming "We've already improved 100% in the last six months" with respect to lost baggage.
-- Does this mean Continental made no baggage mistakes at all?
- Example: In a sample of 20 people it was reported that 46.8% were born in California.
-- If the sample size is 20, what results are actually possible? 1 out of 20 = 5%, 2 out of 20 = 10%, 3 out of 20 = 15%, 4 out of 20 = 20%, etc.
-- Is 46.8% possible from a sample of 20? No.

Potential Pitfalls - Small Samples

- Example: When high school attendance data was broken down by region, one region had only 3 students that had been suspended from school. - The summary report said 67% of students suspended had been previously suspended at least twice. (2 out of a sample size of only 3)

Step 2: Analyze

- Explore the data with a graph. Every analysis should begin with appropriate graphs (Chapter 2). - Apply appropriate statistical methods. With technology, good analysis does not require strong computational skills, but it does require the choice of appropriate methods and correct use of sound statistical processes. (Chapters 3-12)

Potential Pitfalls - Loaded Questions

- If survey questions have biased wording, the results of a study can be misleading. - Example: - 97% yes: "Should the Mayor have line item veto in city budgets to eliminate waste?" - 57% yes: "Should the Mayor have line item veto in city budgets?"

Potential Pitfalls - Self-Reported Data

- Inaccuracies can happen when asking subjects to report their own results. - Example - If you ask people what they weigh, you are likely to get their desired weight rather than their actual weight. - Example - People with low incomes are less likely to report their true incomes.

Errors

- No matter how well you plan and execute the sample collection process, there is likely to be some error in the results. - Sampling error - Non-sampling error

A primary goal of statistics

Learn about a large group by examining data from some of its members.

EXAMPLE Observational Study or Designed Experiment? A total of 974 homeless women in the LA area were surveyed to determine their level of satisfaction with the healthcare provided by shelter clinics versus the healthcare provided by government clinics. The women reported greater quality satisfaction with the shelter and outreach clinics compared to the government clinics. The data variable recorded was?

Level of satisfaction

In a study of a weight loss​ program, 40 subjects lost a mean of 2.3 lbs after 12 months. Methods of statistics can be used to show that if this diet had no​ effect, the likelihood of getting these results is roughly 3 chances in 1000. Does the weight loss program have statistical​ significance? Does the weight loss program have practical​ significance? Does the weight loss program have practical significance?

No, someone starting a weight loss program would likely want to lose considerably more than 2.3 lb.

In a study of a weight loss​ program, 5 subjects lost an average of 42 lbs. It is found that there is about a 25​% chance of getting such results with a diet that has no effect. Does the weight loss program have statistical​ significance?

No, the program is not statistically significant because the results are likely to occur by chance.

Summary - Levels of Measurement

- Nominal - no particular order (QUALITATIVE)
- Ordinal - categories with some natural order (QUALITATIVE)
- Interval - Temperature & Calendar year (QUANTITATIVE)
- Ratio - all other numerical data (QUANTITATIVE)

EXAMPLE Observational Study vs. Designed Experiment Xylitol is a food additive that has been tested in preventing dental caries (cavities). Unknown to the children, a total of 45 Peruvian children were given milk with and 35 children were given milk without xylitol. After one year of use, the number of cavities for each subject was recorded by dentists who were not informed of the children's groupings. The data variable recorded was?

Number of cavities

EXAMPLE Observational Study vs. Designed Experiment Researcher Joachin Schuz wanted "to investigate cancer risk among Danish cellular telephone users". His research group kept track of 420,095 people who first subscribed to a cell phone service between 1982 and 1995. In 2002, they recorded the number of people out of the 420,095 who had a brain tumor and compared the rate of brain tumors in this group to the rate of brain tumors in the general population. They concluded "cellular telephone use was not associated with increased risk for brain tumors." Cross-Sectional, Retrospective, or Prospective?

Prospective

EXAMPLE Observational Study or Designed Experiment? A total of 974 homeless women in the LA area were surveyed to determine their level of satisfaction with the healthcare provided by shelter clinics versus the healthcare provided by government clinics. The women reported greater quality satisfaction with the shelter and outreach clinics compared to the government clinics. Satisfaction Level: Qualitative or Quantitative?

Qualitative

EXAMPLE Observational Study vs. Designed Experiment Xylitol is a food additive that has been tested in preventing dental caries (cavities). Unknown to the children, a total of 45 Peruvian children were given milk with and 35 children were given milk without xylitol. After one year of use, the number of cavities for each subject was recorded by dentists who were not informed of the children's groupings. The data variable recorded was? Number of cavities: Qualitative or Quantitative?

Quantitative

Working with Quantitative Data

Quantitative data can be further described by distinguishing between discrete and continuous types.

Which of the following is NOT a voluntary response​ sample?

Quiz scores from a college level statistics course are analyzed to determine student progress.

Systematic Sampling

Randomly select some starting point and then select every kth element in the population.
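
A minimal sketch of this rule, assuming a hypothetical population of 100 numbered elements and k = 10. The helper name `systematic_sample` is made up, and restricting the random start to the first k elements is one common way to apply the rule.

```python
import random

def systematic_sample(population, k, rng):
    """Pick a random start among the first k elements, then take every kth element."""
    start = rng.randrange(k)
    return population[start::k]

population = list(range(1, 101))  # hypothetical IDs 1..100
sample = systematic_sample(population, 10, random.Random())
print(sample)
```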

sample

Subcollection of members selected from a population

EXAMPLE 2: Parameter versus Statistic

Suppose a SAMPLE OF 250 CRC STUDENTS is obtained, and from this sample we find that 86.3% have a job. This value represents a STATISTIC because it is a numerical summary based on a SAMPLE.

EXAMPLE 1: Parameter versus Statistic

Suppose the percentage of ALL STUDENTS on your campus who have a job is 84.9%. Population or Sample? -> POPULATION Parameter or Statistic? -> PARAMETER

