Random sample
(The best sample!) Each individual member has an equal chance of being selected. A random sample avoids bias but is usually expensive.
Discrete data
- data that result when the number of possible values is either a finite number or a 'countable' number (0, 1, 2, 3, ...). Example: The number of eggs that a hen lays
Nonsampling error
- sample data incorrectly collected, recorded, or analyzed (such as by selecting a biased sample, using a defective instrument, or copying the data incorrectly).
Sampling error
- the difference between a sample result and the true population result; such an error results from chance sample fluctuations
Example: Suppose that you are in a class of 20 students and are trying to select 10 of them to work with you on a group project. Imagine that the students are seated in two rows of 10 students each. To select a sample (10 students), you toss a coin. If it comes up heads, you use the 10 students in the first row; if it comes up tails, you use the 10 students in the second row.
1) Does every student have an equal chance of being selected? - Yes. The coin is the random element, and each student has a 50% probability of being selected.
2) Is it possible to get a sample with 5 students from the first row and 5 students from the second row? - No. The rule is to choose either the first row OR the second row, so you cannot obtain just ANY sample of 10 students from this class (the population). The sample is therefore random, but it is not a SIMPLE random sample.
3) How can it be made a SIMPLE random sample? - There are many possibilities. For example, assign a number to every student, write the numbers on cards, mix the cards in a box, and randomly select 10 cards. Then any student can be selected, and any combination of 10 students is possible.
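The two selection schemes in this example can be sketched in Python. This is only an illustration: the student numbers 1 to 20, the row layout, and the function names are assumptions, not part of the original example.

```python
import random

# Hypothetical class: students numbered 1-20, seated in two rows of 10.
ROW_1 = list(range(1, 11))
ROW_2 = list(range(11, 21))

def coin_toss_sample(rng):
    """Random, but not simple random: only two samples can ever occur."""
    return ROW_1 if rng.random() < 0.5 else ROW_2

def simple_random_sample(rng, n=10):
    """Simple random sample: every group of n students is equally likely."""
    return rng.sample(ROW_1 + ROW_2, n)

rng = random.Random(0)  # fixed seed only so that the run is repeatable
print(coin_toss_sample(rng))
print(sorted(simple_random_sample(rng)))
```

With the coin toss, each student still has a 50% chance of selection, yet a mixed sample such as five students from each row can never appear. The card-drawing scheme corresponds to `simple_random_sample`.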
Abuses of Percentages Ex: Decrease $200 by 10%, then increase the result by 10%
A decrease by a certain percentage followed by an increase by the same percentage (applied to the reduced value) does not return the number you started from. This happens because the reference value changed!
Decrease $200 by 10%: 0.10 x $200 = $20, so $200 - $20 = $180.
Increase the result by 10%: 0.10 x $180 = $18, so $180 + $18 = $198, which is different from the original $200!
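The same arithmetic can be checked with a short Python sketch. The helper name `change_by_percent` is made up for illustration.

```python
def change_by_percent(value, percent):
    """Return value changed by the given percent (negative means decrease)."""
    return value * (100 + percent) / 100

start = 200
reduced = change_by_percent(start, -10)    # 10% of 200 is 20
restored = change_by_percent(reduced, 10)  # 10% of 180 is only 18
print(reduced, restored)  # 180.0 198.0
```

The second step takes 10% of the reduced value, not of the original $200, which is why the result falls $2 short.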
Pie charts
A good indicator of something being wrong is when the percentages do not sum up to 100%, like in the pie chart below. Here, people were asked which potential candidates they viewed favorably, but they could name more than one. The categories are thus not mutually exclusive, and the chart makes no sense.
Precise Numbers
Because a figure is precise, many people incorrectly assume that it is also accurate. Reporting a result with many decimal places sounds scientific, but the figure is still only an estimate. Example: Men's Health magazine: "48.2% of men think that women should offer to pay their share on a first date." It means the same as "about half."
Multistage Sampling
Collect data by using some combination of the basic sampling methods
In a large lecture room class of 300 students, a sample of 10 was taken to determine the male/female make up of the class. Which misuse of statistics does this represent? A. Percentages. B. Precise numbers. C. Missing data. D. Small samples.
D. Small Samples
Cluster sample
Divide the population area into sections (or clusters); randomly select some of those clusters; choose all members from selected clusters
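A minimal Python sketch of cluster sampling, assuming a made-up population of four city blocks with three households each. The names `cluster_sample` and `blocks` are hypothetical.

```python
import random

def cluster_sample(clusters, k, rng):
    """Randomly select k clusters, then keep ALL members of each one."""
    chosen = rng.sample(sorted(clusters), k)  # sorted() gives a list of cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical population: four city blocks with three households each.
blocks = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2", "B3"],
    "C": ["C1", "C2", "C3"],
    "D": ["D1", "D2", "D3"],
}

chosen, sample = cluster_sample(blocks, 2, random.Random(0))
print(chosen, sample)
```

The randomness applies only to which clusters are picked; inside a selected cluster nobody is left out.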
Sampling Method Voluntary response samples
Does the method chosen greatly influence the validity of the conclusion? Voluntary response samples (in which the respondents select themselves) are often biased, because those with a special interest are more likely to participate. The results are usually not representative. Do NOT use voluntary response samples in scientific work!
Pictographs
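Picture graphs can exaggerate differences: doubling both the height and the width of a picture to show a doubled value makes the picture look four times as large.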
Small Samples
Example: Basing a school suspension rate on a sample of only three students. Conclusions should not be based on samples that are far too small.
Order of Questions
Example:
"Would you say that traffic contributes more or less to air pollution than industry?" Results: 45% blamed traffic, 27% blamed industry.
"Would you say that industry contributes more or less to air pollution than traffic?" Results: 24% blamed traffic, 57% blamed industry.
Possibility of Lying
How many people would answer honestly the following questions? "Have you ever used illegal drugs?" "Do you favor a constitutional amendment that would outlaw most abortions?" "Have you had more than one sexual partner in the past 6 months?" "Have you ever driven a motor vehicle while intoxicated?"
Which of the statements use percentages correctly, and which use them incorrectly?
1. He was 180% sure that this was the correct answer.
2. The population on the island increased by 250% in 4 years.
3. 150% of ARCC Students had to pay a tuition increase.
4. With this exercise machine you can increase the amount of weight you are able to lift by 200%.
5. The new Honda Civic has 270% more trunk volume than the new Mazda Miata.
6. She lost 130% of body fat.
7. This glass of orange juice has 300% of the daily-recommended dosage of Vitamin C.
Incorrect use: 1, 3, 6 (in a strictly biological sense, #7 is probably incorrect as well). Correct use: 2, 4, 5; mathematically, #7 is also a correct use of percentages.
Source of data
Is the source objective or biased? Is there something to gain or lose by distorting results? Example: A car insurance company advertises that their new customers saved an average of $350 by switching to this company's policy.
Nonresponse
People who refuse to talk to pollsters have a view of the world around them that is markedly different from that of people who will let poll-takers into their homes.
The Hawaii State Senate held hearings when it was considering a law requiring that motorcyclists wear helmets. Some motorcyclists testified that they had been in crashes in which helmets would not have been helpful. Which important group was unable to testify?
People who were killed in motorcycle crashes in which a helmet might have saved their lives could not testify.
USA Today conducted a poll of 800 divorced people who were asked if they wanted to marry again. It was reported that "overall, 58% of divorced people say they don't want to get married again." What are the population and the sample in this survey?
Population: All divorced people (in the U.S.) Sample: The 800 divorced people who were surveyed.
Triola received a survey from the investment firm Merrill Lynch. It was designed to gauge his satisfaction as a client, and it had specific questions rating the author's personal Financial Consultant. The cover letter included this statement: "Your responses are extremely valuable to your Financial Consultant...We will share your name and response with your Financial Consultant." What is wrong with this survey?
Since the name will be shared with the consultant, most people who have a negative opinion of their financial consultant would not complete the survey.
How to tell between statistic and parameter
Step 1: Ask yourself: is this obviously a fact about the whole population? Sometimes that's easy to figure out. For example, with small populations you usually have a parameter, because the groups are small enough to measure: 10% of US senators voted for a particular measure. There are only 100 US senators; you can count how every single one of them voted.
Step 2: Ask yourself: is this obviously a fact about a very large population? If it is, you have a statistic. For example, 45% of Jacksonville, Florida residents report that they have been to at least one Jaguars game. It's very doubtful that anyone polled in excess of a million people for this data. They took a sample, so they have a statistic.
If in doubt, think about the time and cost involved in surveying an entire population. If you can't imagine anyone wanting to spend the time or the money to survey a large number (or impossible number) of people in a certain group, then you are almost certainly looking at a statistic.
Systematic sample
Surveying or drawing every nth person or item on a list or production line. (The starting point should be selected at random.)
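A minimal Python sketch of systematic sampling, assuming a made-up list of 100 people identified by the numbers 1 to 100. The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(items, step, rng):
    """Take every step-th item, starting from a randomly chosen position."""
    start = rng.randrange(step)  # random start within the first `step` items
    return items[start::step]

# Hypothetical list of 100 people identified by the numbers 1-100.
people = list(range(1, 101))
sample = systematic_sample(people, 10, random.Random(0))
print(sample)
```

Only the starting position is random; after that, the selection follows a fixed pattern.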
At a security checkpoint to a government facility, every 10th individual was more thoroughly searched than the others. What type of sampling is this?
Systematic
Stratified sample
The population is divided into groups (strata; singular: stratum) whose members have a characteristic in common, for example age, gender, college major, or income. Then a random sample is taken from each group.
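A minimal Python sketch of stratified sampling, assuming a made-up population of 30 students split evenly across three college majors. The names `stratified_sample` and `students` are hypothetical, and drawing the same number from every stratum is just one possible choice.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group members by stratum, then draw a random sample from every group."""
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample

# Hypothetical population: (student id, college major) pairs, 10 per major.
majors = ["math", "biology", "art"]
students = [(i, majors[i % 3]) for i in range(30)]

sample = stratified_sample(students, lambda s: s[1], 2, random.Random(0))
print(sample)
```

Unlike a cluster sample, every group is represented, and only some members of each group are chosen.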
Voluntary response sample
The respondents select themselves. Do NOT use voluntary response samples in scientific work
Nonzero axes
To correctly interpret a graph, you must analyze the numerical information given in the graph, so as not to be misled by the graph's shape. READ labels and units on the axes.
Loaded Questions
Too little money is being spent on "welfare" versus too little money is being spent on "assistance to the poor." Results: 19% versus 63%
Convenience sample
Use results that are easy to get. Usually, the results will be affected by bias. Try to AVOID convenience samples in scientific work
Context of Data (determines the type of statistical analysis that should be used)
What do the values represent? Why were they collected? Example: In lists of male and female ages on marriage licenses it makes a difference whether the ages were collected for each gender at random or paired according to who was marrying whom.
Parameter What is the characteristic?
a numerical measurement describing some characteristic of a POPULATION
Statistic What is the characteristic?
a numerical measurement describing some characteristic of a SAMPLE
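The difference between a parameter and a statistic can be sketched in Python with simulated data. The population below is entirely made up (1,000 weights drawn from a normal distribution), and the variable names are only illustrative.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed only so that the run is repeatable

# Hypothetical population: 1,000 made-up weights in kilograms.
population = [rng.gauss(70, 10) for _ in range(1000)]
sample = rng.sample(population, 50)  # a simple random sample of 50

parameter = statistics.mean(population)  # describes the POPULATION
statistic = statistics.mean(sample)      # describes the SAMPLE
sampling_error = statistic - parameter   # caused by chance sample fluctuations

print(round(parameter, 2), round(statistic, 2), round(sampling_error, 2))
```

In practice the parameter is usually unknown, which is why the statistic is used to estimate it.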
Bias Voluntary response samples
a systematic difference between the results obtained by the sample and the actual truth about the whole population. Voluntary response samples (in which the respondents select themselves) are often biased, because those with a special interest are more likely to participate. The results are usually not representative. Do NOT use voluntary response samples in scientific work!
Which is an example of quantitative data? a. Weights of high school students. b. Genders of actors and actresses. c. Colors of the rainbow. d. Consumer ratings of a particular automobile (below average, average, and above average)
a. Weights of high school students
Experiment
apply some treatment and then observe its effects on the subjects; (subjects in experiments are called experimental units)
Which is NOT an example of continuous data? a. Temperature on a thermometer. b. Number of students in an algebra class. c. Mean weight of 100 flour sacks. d. Amount of water pumped from a pond per day.
b. Number of students in an algebra class
Questions on a survey are scored with integers 1 thru 5 with 1 representing Strongly Disagree and 5 Strongly Agree. This is an example of what kind of measurement? a. Nominal. b. Ratio. c. Ordinal. d. Interval.
c. Ordinal
Missing Data
can dramatically affect results. US Census suffers from missing people (homeless or low income).
Percentages: When can they exceed 100%?
can exceed 100% only when the context is a change or comparison.
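A percent change is one such case. The Python sketch below uses made-up island population figures (1,000 growing to 3,500) and a hypothetical helper `percent_change`.

```python
def percent_change(old, new):
    """Percent change from old to new, relative to the old value."""
    return (new - old) * 100 / old

# Hypothetical island population: 1,000 people growing to 3,500.
print(percent_change(1000, 3500))  # 250.0
```

A share of a whole, such as the percentage of students who paid a tuition increase, cannot exceed 100%, because the part can never be larger than the whole.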
Nominal
categories only (Example: Survey responses yes, no, undecided)
Ordinal
categories with some order (Example: Course grades A, B, C, D, or F)
Census
collection of data from every member of a population.
Categorical (or qualitative or attribute)
data consists of names or labels (representing categories). Example: The genders (male/female) of professional athletes
Quantitative (or numerical) What can it be also?
data consists of numbers representing counts or measurements. Example: The weights of supermodels. Quantitative data can be discrete or continuous.
Continuous (numerical)
data result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps or interruptions. Example: The amount of milk that a cow produces; e.g. 2.343115 gallons per day
Ratio
differences and a natural starting point (Example: Prices of college textbooks) Tip: try a "ratio" test: If one number is twice the other, is the quantity being measured also twice the other quantity? If yes, the data are at the ratio level.
Interval
differences but no natural starting point (Example: Years 1000, 2000, 1776, and 1492)
Causation Examples Pg 6
event B happens BECAUSE OF event A
experimental units
the subjects in an experiment, to which a treatment is applied and on which its effects are observed
Data
is a collection of information that has meaning. "Data" is the plural of the word "datum" (a single piece of information).
Statistics descriptive statistics inferential statistics
is a science which helps to answer two questions: 1) How can we extract meaning from a collection of data? (That is, organize, describe, and summarize when we have ALL the data.) - descriptive statistics 2) How do we infer information about the whole population when we only have SOME data? - inferential statistics
Population
is the entire group of individuals or objects that we want information about. (NOT just the ones we reach!)
Sample
is the part of the population that is actually observed /surveyed. (The ones we reach.)
Random Sample
members from the population are selected in such a way that each individual member in the population has an equal chance of being selected
Observational study
observing and measuring specific characteristics without attempting to modify the subjects being studied
Simple Random Sample
a sample of n subjects selected in such a way that every possible sample of the same size n has the same chance of being chosen
Probability Sample
selecting members from a population in such a way that each member of the population has a known (but not necessarily the same) chance of being selected
Self-Interest Study
when assessing validity of a study, consider whether the sponsor might influence the results.
Correlation What is it? What does it not imply? Examples Pg 6
whenever event A happens, it is highly likely that event B happens. Correlation does not imply causality: a statement about causality can be justified by physical evidence, not by statistical analysis.