STAT 2040 Test 1

Pareto Charts

A special type of graph, often presented as vertical bar graphs or "pie charts" that help us to prioritize categories of data. They are helpful when we have many categories of data that we are trying to sort through, looking for what is most important or frequent versus what is less important or frequent. They always list the categories in order from the most frequent to the least.

Sample

A sub collection of members drawn from a "population" and used to draw conclusions or make inferences about the population. In statistics, the most common approach is to use data from a "sample" in order to make inferences or draw conclusions about the larger "population." Example: We might give an experimental drug to 1,000 people (our "sample") in order to draw conclusions about what would happen if we made the drug available to all people everywhere (our "population").

Frequency Table

A table in which we list classes (categories) of values, along with the frequencies (counts) of the number of values that fall into each class.

Center

A value that indicates where the middle of the data set is located

Mean (average)

AKA average. The number obtained by adding the values and dividing by the number of values.
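
As a minimal sketch of this definition, the snippet below adds up a small list and divides by the count. The values are hypothetical and chosen only for illustration.

```python
# Hypothetical data values (not from the course material).
values = [3, 7, 8, 10, 12]

# Mean = sum of the values divided by the number of values.
mean = sum(values) / len(values)
print(mean)
```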

Napoleon I, Emperor of France (aka short, dead dude), 1769-1821

"A picture is worth a thousand words."

Isaac Newton (1642-1727)

"If I have ever made any valuable discoveries, it has been owing more to patient attention, than to any other talent."

Benjamin Disraeli (1804-81) British Prime Minister

"There are three kinds of lies: lies, damned lies, and statistics."

Frequency Tables

- Be sure the classes are mutually exclusive (i.e., they must not overlap).
- Include all intermediate classes, even if the frequency is zero (i.e., don't skip a class/category just because the frequency is zero).
- Use the same width for all classes.

Steps in calculating Median

1. Arrange the data in order from low-to-high (or high-to-low).
2. Pick the middle value. If there is an odd number of data values, simply take the middle value by itself. If there is an even number of data values, take the average of the two middle values.
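
The two steps above can be sketched as a short function. The function name `median` and the test lists are illustrative assumptions, not part of the course material.

```python
def median(values):
    """Return the median by sorting and picking the middle value(s)."""
    ordered = sorted(values)          # step 1: arrange from low to high
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))      # odd number of values
print(median([4, 1, 3, 2]))   # even number of values
```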

Measures of Variation

1. Range 2. Standard Deviation (the most important) 3. Variance

Methods for obtaining Random Samples

1. Simple Random Sampling 2. Systematic Sampling 3. Stratified Sampling 4. Cluster Sampling

Bad Statistics

1. The self-selected survey (voluntary response sample)
2. Studies that use small samples and/or samples that do not provide a true representation of the population being studied
3. Surveys with confusing or misleading questions
4. Presentations that include misleading data and graphs
5. Using precise numbers that are not accurate
6. Distorted percentages
7. Partial pictures (not the whole story)
8. Deliberate distortions or outright lies

Statistics

Either a specific number or a method of analysis. Used to explore and explain things not explainable by the physical sciences, such as human behavior, nature, and medicine.

Variation

A measure of the amount that the values vary among themselves

Convenience Sampling

A nonrandom sampling method. Bad statistics. Consists of subjects that were selected simply because they were readily available or easy to get a hold of. You see this all the time on TV news, where the reporter is on the street asking people what they think about something or other. This is very bad statistics and you should not pay much attention to information or data collected in this manner.

Specific Number

A number that represents some measure of a set of data. Example: The average hourly wage in Washington County is $15/hour. Example: 23% of the people polled believe there are too many polls.

Statistic

A numerical measurement describing something about a "sample" Example: If our "sample" consists of 30 DSC students, then one "statistic" would be the average GPA of our sample of 30 DSC students. Another "statistic" would be the percentage of female students among our sample of 30 DSC students. As long as we are talking about only our sample, any numerical measurement that describes something about that sample (not the entire population) would be considered a "statistic".

Parameter

A numerical measurement describing something about an entire "population". Example: If our "population" consists of all DSC students, then one "parameter" would be the average GPA of all DSC students. Another "parameter" would be the percentage of female students among all DSC students. As long as we are talking about the entire "population" we are interested in, any numerical measurement (e.g. average, percentage, etc.) that describes something about the entire population (not just a sample from the population) would be considered a "parameter".

Histogram

A picture or graph of a Frequency Table. More specifically, it is a bar graph in which the horizontal scale represents the classes/categories (from our frequency table) and the vertical scale represents frequencies (from our frequency table)

Simple Random Sample

A sample selected in such a way that every possible sample of size n has the same chance of being chosen Example: If you wanted to have a simple random sample of 30 students to represent the population of all DSC students, you would have to get every student's name and then randomly pick 30 students from the total student body (e.g. put all student names in a barrel, shake it up and then draw out 30 names.) The key to a simple random sample is that you are taking the sample from all members of the population, which means that you have to identify every member of the population (no shortcuts)
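
The "names in a barrel" idea can be sketched with Python's standard `random.sample`, which draws without replacement so that every group of 30 is equally likely. The roster below is hypothetical; a real study would need the actual list of every member of the population.

```python
import random

# Hypothetical roster standing in for the full list of students.
population = [f"student_{i:04d}" for i in range(1, 2001)]

# Draw 30 distinct names; every possible set of 30 has the same chance.
sample = random.sample(population, 30)
print(sample[:5])
```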

Scatter Diagram

A simple point graph used to see the relationship between two factors/variables.

Data from an Experiment

An "experiment" requires that we apply some sort of treatment and then see what kind of effect we get. Experiments usually involve two groups: the treatment or test group, and the placebo or control group. Example: we conduct an experiment to test the effectiveness of a new drug by giving the drug to one group (the treatment or test group) and giving a "sugar pill," or something that looks like the drug but really isn't any kind of drug at all, to another group called the placebo or control group. We then see if the test group did significantly better than the placebo group. Experiments need to be carefully planned or designed in order to get accurate results.

Medicine

Because we are all different, the physical sciences alone cannot fully explain how our bodies react to drugs and other stimuli, so statistics plays a valuable role in medicine.

Ratio data

Like interval data with the addition that there is now an absolute or true starting point where zero truly means there is no quantity present. Ratio data can be fully manipulated using mathematics. We can add/subtract or multiply/divide, or whatever. Examples: Money (e.g. prices of college textbooks.) $50 is half of $100 and $0 is truly zero or no money. Distance (e.g. miles from home to school). 20 miles is really twice as far as 10 miles and zero distance is truly no distance.

Presentations that include misleading data and graphs

Take a look at the following graphs and pictures and see if you can find the deception in each Hint: focus on the numerical information given, which is usually accurate, versus the general shape of the graph/picture which often misleads us.

Data from Observation

The "observation" method of data collection suggests that we only observe what we are studying, without in any way trying to change or affect it. For example, we might use the "observation" method of study by making a survey and asking people in a mall questions about what kind of pizza they like best and why. Surveys and actually sitting there and watching things are types of "observation."

Sample Data 2 2 5 1 2 6 3 3 4 2 4 0 5 7 7 5 6 6 8 10 7 2 2 10 5 8 2 5 4 2 6 2 6 1 7 2 7 2 3 8 1 5 2 5 2 14 2 2 6 3 1 7

The classes of categories that can be set up:

Class    Frequency
0-2      20
3-5      14
6-8      15
9-11     2
12-14    1
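
As a rough check of the table, the sketch below counts how many of the sample data values fall into each class. The data string is copied from the "Sample Data" card; the variable names are illustrative.

```python
# The 52 sample values from the "Sample Data" card.
raw = ("2 2 5 1 2 6 3 3 4 2 4 0 5 7 7 5 6 6 8 10 7 2 2 10 5 8 "
       "2 5 4 2 6 2 6 1 7 2 7 2 3 8 1 5 2 5 2 14 2 2 6 3 1 7")
data = [int(x) for x in raw.split()]

# Classes given as (lower class limit, upper class limit).
classes = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14)]

# Frequency of a class = number of values between its limits, inclusive.
frequencies = [sum(1 for x in data if low <= x <= high) for low, high in classes]

for (low, high), count in zip(classes, frequencies):
    print(f"{low}-{high}: {count}")
print("total:", sum(frequencies))
```

Because the classes do not overlap and cover every value, the frequencies add up to the number of data values.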

Census

The collection of data from every member of a population. Although it is rare, due to the time and expense involved, to collect data from every member of a "population," when and if we do, this collection of data is called a "Census." Example: Once every decade the U.S. Government conducts a census of its citizens, attempting to collect data from everyone that lives in the United States.

Population

The complete collection (every member) of the things we are studying. Example: If we are studying Grizzly Bears, then the "population" is every Grizzly Bear everywhere. DO NOT confuse "population" with a "sample." We rarely have data on every member of a population, so we often use "statistics" (methods of analysis) to analyze a "sample" in order to understand things (make inferences) about the population. For example, we might study 12 grizzly bears (a "sample") in order to make inferences about all grizzly bears (the "population")

Steps in designing an experiment

a) Identify your objective and identify the relevant population
b) Collect (representative) sample data from your test group and placebo group
c) Use a random procedure in selecting subjects for your treatment and placebo group to avoid bias
d) Analyze the data and form conclusions

parameter

population is to ____________ as __________ is to population

statistic

sample is to _________ as __________ is to sample

Interval Data

Like ordinal data but with the additional property that the difference between any two data values is meaningful (evenly spaced). However, there is no natural zero starting point at which there is zero quantity. Example: Calendar years. The difference between the years 1990 and 2000 is the same as the difference between the years 1980 and 1990 (a difference of 10 years in each case). We can add and subtract years. However, we CANNOT SAY that the year 2000 is two times (or twice as much time) as the year 1000. Also, the year 0 does not represent the starting point of time. Fahrenheit temperature is another example of an interval scale because 100 degrees F is NOT twice as hot as 50 degrees F, and 0 degrees F does NOT represent the absence of all heat.

Standard Deviation

Measure of the average amount the data varies from the sample mean, with an adjustment for the size of the sample. Approximately the average deviation from the mean
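
A minimal sketch of the calculation, assuming the "adjustment for the size of the sample" means dividing by n - 1 (the usual sample formula). The data values are hypothetical. The result is compared with `statistics.stdev` from Python's standard library, which uses the same n - 1 convention.

```python
import math
import statistics

# Hypothetical sample values, chosen so the arithmetic is easy to follow.
values = [2, 4, 4, 5, 10]

n = len(values)
mean = sum(values) / n

# Squared distance of each value from the sample mean.
squared_deviations = [(x - mean) ** 2 for x in values]

# Assumption: sample variance divides by n - 1 rather than n.
variance = sum(squared_deviations) / (n - 1)
std_dev = math.sqrt(variance)

print(std_dev, statistics.stdev(values))
```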

Levels of Measurement

Nominal Ordinal Interval Ratio

Quantitative Data

Numbers that represent counts or measurements. If you can express the data as a number then it is usually (not always) quantitative data Example: income levels, ages, weights, and lengths can all be expressed as meaningful numbers. These are all examples of quantitative data. Gender, opinions, and relationships cannot be expressed as numbers and ARE NOT quantitative data. Even some numbers, such as zip codes or phone numbers ARE NOT quantitative data because you cannot mathematically manipulate them (e.g., add or subtract them)

Cans of Coke are opened and the volume measured a) Observation Study b) Experiment Study

Observation

Why Statistics? #1 Practical Reason

People will try to sell you all kinds of things using statistics, from vitamins to investments to political agendas. If you do not have a working knowledge of statistics you are fodder for the merciless.

Types of Data

Quantitative Data & Qualitative Data

Discrete data Continuous data

Quantitative Data can be subdivided into two groups:

The self-selected survey (voluntary response sample)

Respondents themselves decide whether to be included. For example, you get an email asking you to respond to a survey on something (often in order to save the world from evil).

Distorted percentages

For example, if an airline was losing 80% of the luggage they processed and they improved this to "only" losing 40%, they might claim something like a 100% improvement in baggage handling. This may sound very impressive and make you think they are doing a great job. Yet, in fact, if they are still losing 40% of passengers' luggage they are actually doing a very poor job. Don't fall for misleading statistics, especially when someone is trying to sell you something.
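
The arithmetic behind this example can be sketched as follows. The card does not say how the airline would arrive at "100%"; measuring the drop against the new loss rate instead of the old one is only one guess at how such a figure could be produced, and is labeled as an assumption in the code.

```python
lost_before = 80  # percent of luggage lost before the change (from the example)
lost_after = 40   # percent of luggage lost after the change (from the example)

drop = lost_before - lost_after              # drop in percentage points

# Reduction measured against the old loss rate.
relative_to_old = drop / lost_before * 100

# Assumption: the same drop measured against the NEW loss rate,
# one possible way to inflate the claim.
relative_to_new = drop / lost_after * 100

print(f"Drop in loss rate: {drop} percentage points")
print(f"Relative to the old rate: {relative_to_old:.0f}%")
print(f"Relative to the new rate: {relative_to_new:.0f}%")
print(f"Luggage still lost: {lost_after}%")
```

Whichever base is used, the last line is the number that matters to a passenger.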

Frequencies or counts

Get these by going back to the data list and counting how many zeros, ones and twos we had (20 of them), etc. Notice: if we add up all the frequencies (20+14+15+2+1) we get 52, which is exactly how much data we had.

Qualitative (or categorical or attribute) data

Can be separated into different categories that are distinguished by some nonnumeric characteristics (e.g., male/female)

Round-off Rule for Measures of Center

Carry one more decimal place than is present in the original set of values.
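
A small sketch of the rule, assuming the original values are whole numbers, so the mean is reported to one decimal place. The data values are hypothetical.

```python
# Hypothetical whole-number data (zero decimal places in the original values).
values = [1, 2, 2, 4, 5, 6, 7]

mean = sum(values) / len(values)

# Original data has 0 decimal places, so carry 1 decimal place in the result.
rounded_mean = round(mean, 1)
print(rounded_mean)
```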

An ABC News reporter polls people as they pass him/her on the street

Convenience

1. Observation 2. Experiments

Data comes from one of two sources:

Ordinal data

Data that may be arranged in some order, but the precise differences between values either cannot be determined or are meaningless. Examples: Poor/Average/Good/Excellent; letter grades (A, B, C, D, F); subcompact, compact, mid-size, and full-size automobiles.

Nominal Data

Data that consists of names, labels, or categories. Cannot be arranged in any meaningful order. You cannot say that one value is bigger, better, or greater than any other value. Examples: gender (male/female); party affiliation (Democrat/Republican/Independent); zip codes.

Partial Pictures (not the whole story)

Example: An overseas automaker makes the (true) claim that "90% of all our cars sold in the USA in the last 10 years are still on the road." This claim is designed, of course, to make you think they have really good quality cars. And, in fact, their stated claim is true. However, what they don't tell you is that they have only been selling cars in the USA for the last 3 years. Watch out for misleading statistics, especially when someone is trying to sell you something.

A new drug for treating insomnia is tested by recording its effects on students

Experiment

The effectiveness of multimedia teaching is tested using a sample of students who complete a course of study using the multimedia approach

Experiment

Data

Factual information (as measurements or statistics) used as a basis for reasoning, discussion, or calculation

Using precise numbers that are not accurate

For example, if I gave you a number like 1257391.546 you might assume it was quite accurate because it seems so precise (not rounded). Yet, in fact, this number may have been generated from very poor/inaccurate data. Don't be fooled, investigate the actual data.

Method of Analysis

Has to do with planning experiments, collecting data, organizing and summarizing data, analyzing and presenting data, and interpreting and drawing conclusions from data. Example: Linear regression is one type of statistical analysis that is used to examine the relationship between things (variables).

Surveys with confusing or misleading questions

Here is an example of two different ways to ask the "same" question and how the phrasing of the question can alter the response. Is it really the same question being asked? a) Should the president have the line item veto? (response was 57% yes) b) Should the president have the line item veto to eliminate waste? (response was 97% yes)

The Newport Chronicle, a newspaper in New England reported that pregnant mothers can increase their chances of having healthy babies by eating lobsters. That claim is based on a study showing that babies born to lobster-eating mothers have fewer health problems than babies born to mothers who don't eat lobster.

Hint: In statistics we can "prove" a relationship between two things, in this case, healthy babies and lobster-eating mothers, but that DOES NOT mean that one causes the other. In fact, this study did show a statistical relationship, but as we will learn later in the course, that relationship does NOT IMPLY causality. Can you think of any reasons why lobster-eating mothers might have healthier-than-average babies besides the fact that they eat lobsters? One theory that might explain this relationship is that lobster is quite expensive and therefore, those that eat lobster are probably well-off and can probably afford the best health care. This might be a better explanation of the results than to suggest that eating lobsters is good prenatal care. Perhaps you can think of other reasons for these results.

In a study of College campus crimes committed by students high on alcohol or drugs, a mail survey of 1875 students was conducted. A USA Today article noted, "8% of the students responding anonymously say they've committed a campus crime, and 62% of that group say they did so under the influence of alcohol or drugs."

Hint: They never told us the actual number of students who responded to the survey. By telling us they sent it to 1875 students it makes it sound like they had a large sample, but in fact, they never said how many responded. What if only 5 or 6 students responded - would that be representative of the population of all college students? Also, they use a percentage of a percentage (62% of 8% = 5%), but it kind of fools you, at first, into thinking that 62% of college students committed a crime while under the influence, while in fact only about 5% of those responding to the survey said they committed a crime while under the influence. This actually appeared in USA Today, yet it is quite misleading. Perhaps you can find even more things wrong with this study (e.g., what constitutes a "crime"? Does it include parking tickets?).
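
The "percentage of a percentage" in this hint is a simple multiplication, sketched below. Only the two percentages quoted in the article are used; the number of respondents stays unknown, as the hint points out.

```python
committed_crime = 0.08    # share of respondents who reported a campus crime
under_influence = 0.62    # share of THAT group who were under the influence

# Share of all respondents who reported a crime committed under the influence.
share_of_respondents = committed_crime * under_influence

print(f"{share_of_respondents:.1%} of respondents")
```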

A survey includes this item: "Enter your height in inches _______" What might be some of the problems in asking this question?

Hint: You can probably think of a number of reasons yourself. One problem might be that people usually think of their height in terms of feet and inches (e.g., 5' 10") and may have trouble figuring out how many inches that is. Also, many people tend to exaggerate their heights, so if you were really looking for accurate information, you might want to actually measure people rather than ask them how tall they are.

Skewed

If data is not symmetrical and causes the mean, median, and mode to not be equal.

No Mode

If no value occurs more than once there is

Multimodal

If three or more values (but not all values) occur with exactly the same frequency, but more than any other values, then we say the data is

Bimodal

If two values occur with exactly the same frequency, but more than any other values, then we say the data is

Sample Error

If we had a sample of 30 DSC students and took the average GPA for all 30 students, and then compared it to the average GPA for our population (all DSC students), we would surely find at least some small difference. This exists no matter how well we do our study.

Non-sampling Error

Incorrectly collecting data, making a mistake recording the data, doing a poor job analyzing the data, etc. It's controllable and can be avoided by understanding how to do statistics and then doing it right.

Qualitative Data

Information that can be put into categories or distinguished by some nonnumeric characteristic. Examples: gender (male/female); age categories (not individual ages, but age brackets); party affiliation (Democrat/Republican/Independent); zip codes; Social Security numbers.

Sampling

Is cost-effective and convenient, but there is a problem we have to live with: there is always some difference between the sample result and the true population result.

Outliers

Sample values that are a lot smaller or larger than most of the other sample values. A value located very far away from almost all of the other values. An extreme value. They can have a dramatic effect on the mean, standard deviation, and on the scale of the histogram so that the true nature of the distribution is obscured.
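
To see how much a single extreme value can move the mean, the sketch below compares a small hypothetical data set with and without one outlier, using `mean` and `median` from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical values clustered around 3.
typical = [2, 3, 3, 4, 3]

# The same values plus one extreme value.
with_outlier = typical + [45]

print("without outlier:", mean(typical), median(typical))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

In this sketch the mean jumps sharply once the extreme value is added, while the median stays where it was.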

The Gallup Organization plans to conduct a poll of NYC residents living within the "212" area code. Computers are used to randomly generate telephone numbers that are automatically called

Simple Random Sample

Stratified Sampling

Start by dividing the population into at least two subgroups or strata (example: divide people into male and female groups/strata). Next, draw a random sample from each of the subgroups (example: randomly select 30 men and 30 women).
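
A minimal sketch of the procedure, assuming the population has already been split into two strata. The member names and stratum sizes are hypothetical.

```python
import random

# Hypothetical strata: the population already divided into two subgroups.
strata = {
    "men": [f"man_{i}" for i in range(500)],
    "women": [f"woman_{i}" for i in range(500)],
}

# Draw a separate random sample of 30 from EACH stratum.
sample = {name: random.sample(members, 30) for name, members in strata.items()}

for name, chosen in sample.items():
    print(name, len(chosen))
```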

Cluster Sampling

Start by dividing the population into sections or clusters. Next, randomly select some of those clusters. Finally, include in your sample all members of the clusters you selected. Example: I want to take a poll to understand how people in St. George are going to vote in the next election. Using the Cluster Sampling technique, I divide up all of St. George into neighborhood blocks. Next, I randomly select some number of blocks, say 10 blocks. Finally, I visit or call every single person or home that is in each of the blocks I selected.
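
The neighborhood-block example can be sketched as follows. The number of blocks and residents per block are hypothetical; the point is that whole clusters are chosen at random and then every member of a chosen cluster is included.

```python
import random

# Hypothetical neighborhood blocks: 50 blocks with 20 residents each.
blocks = {b: [f"block{b}_resident{r}" for r in range(20)] for b in range(50)}

# Randomly select 10 whole blocks (clusters).
chosen_blocks = random.sample(sorted(blocks), 10)

# Include EVERY resident of each selected block.
sample = [person for b in chosen_blocks for person in blocks[b]]

print(len(chosen_blocks), "blocks,", len(sample), "people")
```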

Systematic Sampling

Start with a list of every member of the population. Next, randomly pick a starting point (e.g. close your eyes, open the list and randomly point to a name.) Finally, select every Kth member (e.g. like every 10th name or every 100th name or whatever.) Example: If you wanted to have a systematic sample of 100 people living in St. George, you might casually open the phone book, randomly point to a name, start there and then pick every 50th name until you get your sample of 100 people.
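
A simplified sketch of the phone-book example. The list of 5,000 names is hypothetical, and as a simplifying assumption the random starting point is restricted to the first k entries so that stepping through the list yields exactly 100 names.

```python
import random

# Hypothetical list of 5,000 names standing in for the phone book.
names = [f"name_{i}" for i in range(5000)]
k = 50  # select every 50th name

# Assumption: start at a random position among the first k entries.
start = random.randrange(k)

# Every k-th name from the starting point onward.
sample = names[start::k]

print(start, len(sample))
```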

A GM researcher has partitioned all registered cars into categories of subcompact, compact, mid-size, intermediate, and full-size. She is surveying 200 randomly selected car owners from each category.

Stratified

The Washington County Commissioner of Jurors obtains a list of 42,763 car owners and constructs a pool of jurors by selecting every 100th name on that list.

Systematic

Class Width

The difference between two consecutive lower class limits (or two consecutive class boundaries). IT IS NOT the difference between the upper and lower class limits.

Class    Frequency   Width (next lower limit - this lower limit)
0-2      20          3 - 0 = 3
3-5      14          6 - 3 = 3
6-8      15          9 - 6 = 3
9-11     2           12 - 9 = 3
12-14    1

Ratio Level of Measurement

The interval level modified to include the natural zero starting point (where zero indicates that none of the quantity is present). For values at this level, differences and ratios are meaningful. Example: Prices of college textbooks

Upper Class Limits

The largest numbers that can actually belong to the different classes or categories. Example: These would be 2, 5, 8, 11, 14.

Class    Frequency
0-2      20
3-5      14
6-8      15
9-11     2
12-14    1

Median

The middle value. When the data values are arranged in order from low-to-high (or high-to-low)

Distribution

The nature or shape of the distribution of data (such as bell-shaped, uniform, or skewed)

Class Boundaries

The numbers exactly half-way between the classes or categories (no gaps). Example: These would be -0.5, 2.5, 5.5, 8.5, 11.5, 14.5.

Class    Frequency   Lower boundary   Upper boundary
0-2      20          -0.5             2.5
3-5      14          2.5              5.5
6-8      15          5.5              8.5
9-11     2           8.5              11.5
12-14    1           11.5             14.5
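
One way to compute these boundaries is sketched below: take half of the gap between one class's upper limit and the next class's lower limit, then extend each limit outward by that amount. The variable names are illustrative.

```python
# Classes given as (lower class limit, upper class limit).
classes = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14)]

# Gap between the first class's upper limit and the second class's lower limit.
gap = classes[1][0] - classes[0][1]
half_gap = gap / 2

# Each lower limit moves down by half the gap; the last upper limit moves up.
boundaries = [low - half_gap for low, _ in classes] + [classes[-1][1] + half_gap]
print(boundaries)
```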

Class Midpoints

The numbers exactly in the middle of each class or category.

Class    Midpoint   Frequency
0-2      1          20
3-5      4          14
6-8      7          15
9-11     10         2
12-14    13         1
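
Each midpoint is the average of a class's lower and upper limits, as in this short sketch.

```python
# Classes given as (lower class limit, upper class limit).
classes = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14)]

# Midpoint = (lower limit + upper limit) / 2 for each class.
midpoints = [(low + high) / 2 for low, high in classes]
print(midpoints)
```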

Range

The simplest measure of variation Range=highest value-lowest value
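
A one-line illustration of the formula, using hypothetical values.

```python
# Hypothetical data values.
values = [2, 5, 1, 14, 0, 7]

# Range = highest value - lowest value.
data_range = max(values) - min(values)
print(data_range)
```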

Lower Class Limits

The smallest numbers that can actually belong to the different classes or categories. Example: These would be 0, 3, 6, 9, 12.

Class    Frequency
0-2      20
3-5      14
6-8      15
9-11     2
12-14    1

Midrange

The value midway between the highest and lowest values in the data set. Midrange = (highest value + lowest value) / 2
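
A short sketch with hypothetical values. Note the parentheses: the two extremes are added first and the sum is then divided by 2.

```python
# Hypothetical data values.
values = [2, 5, 1, 14, 0, 7]

# Midrange = (highest value + lowest value) / 2
midrange = (max(values) + min(values)) / 2
print(midrange)
```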

Mode

The value that occurs the most frequently
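
The sketch below finds the mode(s) of a list and also covers the "no mode," "bimodal," and "multimodal" cases described on the other cards. The function name `modes` and the example lists are illustrative assumptions; the function expects a non-empty list.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent values (empty if no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value occurs more than once: no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 2, 5]))          # one mode
print(modes([1, 1, 2, 2, 3]))    # two modes (bimodal)
print(modes([1, 2, 3]))          # no mode
```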

Blinding (or "blind" study) Double-Blind Study

There are a couple of common methods for controlling for "effects" including the placebo effect

Double-Blind Study

This is when both the subject and those administering the study (i.e., giving out the drug or placebo) do not know whether the treatment is real or just a placebo. Only the researchers who do the statistics at the end of the study know who got the real treatment and who got the placebo.

One common "effect" or interference that can happen when doing an experiment is called the "Placebo effect"

This is when the experiment itself causes some effect. In other words, the mere fact that someone or something knows they are part of an experiment alters their normal behavior and thus gives us inaccurate results. If you saw the 2nd Jurassic Park movie you might remember the comment about the "Heisenberg principle," which, when applied to experiments, suggests that it is impossible to study something without having some effect on what you are studying. This is another example of a sort of "placebo effect."

Blinding (or "blind" study)

This is when the subjects do not know whether they are getting the real "treatment" or just the placebo (like a sugar pill).

Random Sampling

To have a smaller set of subjects that we study in order to understand the larger populations.

Deliberate Distortions or Outright lies

Unfortunately, there are people willing to tell outright lies and use statistics to make those lies appear legitimate. If you are interested in learning how to better detect these untruths, here are some references you can check out: Tainted Truth by Cynthia Crossen; How to Lie with Statistics by Darrell Huff; The Figure Finaglers by Robert Reichard.

Human Behavior

Useful for understanding marketing, business, consumer behavior, social science, psychology, and politics.

Nature

Useful for understanding natural phenomena and animal behavior.

Measures of Center

When describing data, it is one of the simplest and most meaningful things we can know. 1. Mean (average) 2. Median 3. Mode 4. Midrange

Controlling for Effects

When doing an experiment we must be careful to avoid any interference from things outside of what we are studying. This basically means that the ONLY thing we want to be different between our test group and our control group is the treatment itself. We don't want anything (physical or psychological) to interfere. These "effects" or unwanted interference can be controlled through good experimental design

Discrete Data

When the number of possible values is 'countable' or finite Example: The number of eggs a chicken lays is "discrete". You get 1,2,3 or more eggs. The number of eggs is always finite. You can never get an in-between number of eggs, like 1.23654 eggs.

Continuous Data

When the number of possible values is infinite. Scales that cover a range of values, without gaps, produce continuous data. Example: a thermometer is an example of a scale without gaps that covers a range of temperatures. There can be an infinite number of temperatures between say 0 degrees and 120 degrees Fahrenheit. This is because you can have any number of in-between temperatures such as 101.54658 degrees.

Random Sample

When we select our sample from members of the population in such a way that each has an equal chance of being selected. Random samples give us the best representation of a population. Therefore, only random samples are acceptable when doing good studies. The letter "n" is used to represent the number of subjects in a sample. If you have a sample that was NOT randomly selected then your data is unacceptable and totally useless.

Classes of Categories

You decide what these classes are, but you should normally have an odd number of categories (5 or 7 is good). Notice how each category is the same size (difference between low-end and high-end is always 2 in this case). Also notice that none of the categories overlaps any other category- they are all separate.

