Stats Exam 2

Sometimes, incorrect statistical conclusions are due to incorrect arithmetic. This happens particularly often when calculating what two things?

Arithmetic mistakes happen particularly often when calculating rates and percentages.

Bar charts and pie charts are most useful for what kind of variables?

Bar charts and pie charts are most useful for categorical variables.

What chart makes it easier for us to compare categories? What are some advantages of this type of chart?

Bar graphs make it easiest for us to compare categories. These are easy to draw, there is a natural way to order categories with a bar graph, and we can visually compare different categories, even those not positionally next to each other.

What are some important things to know about how the variables in a study were measured?

When looking at a study, it is important to know how the variables were defined, if the variable is a valid way to measure the property it claims to measure, and how accurate the measurements are.

What three things do we look for when studying line graphs?

When studying a line graph, we look for overall patterns/trends, deviations from that pattern, and seasonal variation.

One way to determine whether or not we can trust the numbers reported by a study is to check them against _____ sources to determine if they are ______.

data in reliable sources, plausible

In the U.S. Census Bureau document America's Families and Living Arrangements: 2011, we find these data on the marital status of American women aged 15 years and older as of 2011: 1. How many women were not married in 2011? 2. Would it also be correct to use a pie chart?

1. The number of women who were not married is 60,031. 2. Yes, it would also be appropriate to use a pie chart. We are displaying the distribution of a categorical variable, and the categories shown together represent the whole (they include all possible marital statuses, so no category is left out).

Make sure you are able to read a pie chart. For example, given the pie chart on slide 7, you should be able to answer questions such as:

1. What percent of people aged 25 and older have less than a high school education? 13.3% of people aged 25 and older have less than a high school education. 2. The majority of people aged 25 and older have what level of education? The majority of people aged 25 and older have a high school education as their highest level of education.

One way to describe our data is to draw a density curve. What is a density curve and what is the general idea behind how to find the density curve for a set of data?

A density curve is a way of describing the overall pattern of a distribution with a smooth curve. In general, a density curve is created by drawing a smooth curve through the tops of the bars of a histogram, making sure to draw it such that the area under the curve is exactly 1.

What type of graph do we use to show how quantitative variables change over time?

A line graph shows how quantitative variables change over time.

What does it mean if a measure is reliable?

A measurement is reliable if its random error is small.

What does it mean if a measurement has predictive validity?

A measurement of a property has predictive validity if it can be used to predict success on tasks that are related to the property measured.

What is more valid, a rate or a count? Why?

A rate is generally more valid than a count. A rate lets us see how things are changing/compare values in different situations. If you were a doctor and wanted to know whether or not there was a flu outbreak, it doesn't help much to know there are 300 flu cases. You want to know what percent of people are sick with the flu. If there are 300 cases of the flu in the entire country, you wouldn't be that worried, but if there were 300 cases of the flu at TAMU, you may be much more concerned.

What does a standard score do? What is another name for a standard score? How do we calculate a standard score?

A standard score expresses an observation in terms of the number of standard deviations it is above or below the mean. Standard scores are also called z-scores. We calculate a standard score by using this formula: (observation - mean)/standard deviation.

What does it mean if a variable is a valid measurement?

A variable is a valid measure of a property if it is relevant and appropriate as a representation of that property.

Eleanor flips a coin 6 times and gets HTHTTH. Brittany flips a coin 6 times and gets HHHTTT. Alec flips a coin 6 times and gets HHHHHH. Which of these outcomes was most likely to happen? Which of these outcomes was least likely to happen?

All of these outcomes have the same probability.
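A quick way to check this answer, assuming a fair coin and independent flips: every exact sequence of 6 flips has probability (1/2)^6 = 1/64, no matter how "patterned" it looks. The function name `sequence_probability` below is made up for this sketch.

```python
from fractions import Fraction

def sequence_probability(sequence, p_heads=Fraction(1, 2)):
    """Probability of one exact H/T sequence, assuming independent flips."""
    prob = Fraction(1)
    for flip in sequence:
        # Multiply by P(heads) for an H and by P(tails) for a T.
        prob *= p_heads if flip == "H" else 1 - p_heads
    return prob

for seq in ("HTHTTH", "HHHTTT", "HHHHHH"):
    print(seq, sequence_probability(seq))  # each line ends with 1/64
```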

What does it mean if something has a probability of zero? Give an example of something that may have a probability of zero?

An outcome with a probability of 0 never occurs. An example of something with a probability of 0 is rolling a 6 sided die and getting a 7.

What does it mean if something has a probability of one? Give an example of something that may have a probability of one?

An outcome with a probability of 1 always occurs. An example of something with a probability of 1 is rolling a 6 sided die and getting a number between 1 and 6.

What does the area under a density curve represent?

Areas under a density curve represent proportions of the total number of observations.

Pie charts and bar charts show us the distribution of a categorical variable. What charts can we use to show the distribution of numeric variables?

The chart we use to show the distribution of numeric variables is a histogram.

What is the distribution of a variable?

The distribution of a variable tells us what values it takes and how often it takes those values.

What are the four probability rules? Which two of these rules must we check to make sure we have a valid probability distribution?

The first probability rule is that any probability is a number between 0 and 1. The second probability rule is that all possible outcomes together must have a probability of 1. The third probability rule is that the probability that an event does not occur is 1 minus the probability that the event does occur. The fourth probability rule is that if two events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities. In order to check that we have a valid probability distribution, we must check the first two rules (all probabilities between 0 and 1 and that the sum of all the probabilities is 1).
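The two-rule check described above can be sketched in a few lines. This is only an illustration: the function name `is_valid_distribution` and the small rounding tolerance are assumptions, not part of the course material.

```python
import math

def is_valid_distribution(probabilities, tolerance=1e-9):
    """Check rule 1 (each value is between 0 and 1) and rule 2 (values sum to 1)."""
    all_in_range = all(0 <= p <= 1 for p in probabilities)
    # Allow a tiny tolerance because decimal fractions are not exact in floating point.
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=tolerance)
    return all_in_range and sums_to_one

print(is_valid_distribution([0.5, 0.3, 0.2]))   # True
print(is_valid_distribution([0.7, 0.4, -0.1]))  # False: a probability is negative
print(is_valid_distribution([0.5, 0.3, 0.1]))   # False: probabilities sum to 0.9
```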

When was the first time randomness was studied? When did we begin studying probability theory?

The first time randomness was studied was in the 17th century, when gamblers in France wanted to know how they should bet. We began studying probability theory in the 17th century as well.

What numbers make up the five-number summary? What is the graphical representation of the five-number summary called?

The five number summary is made up of the minimum, first quartile, median, third quartile, and maximum. We graphically represent this with a boxplot.

What are the 4 steps for exploring data with a single, quantitative variable?

The four steps for exploring data with a single, quantitative variable are: 1) plot your data; 2) look for overall patterns and striking deviations; 3) choose a numeric summary (five-number or mean/standard deviation) to describe the data; and 4) describe the overall pattern with a smooth curve.

What is the idea behind standard deviation? How do you calculate standard deviation?

The idea of the standard deviation is to give the average distance of the observations from the mean. It is calculated by taking the square root of the variance. That means that you find the distance each observation is from the mean, square each of these distances, add all of the squared distances up, divide by n-1, and take the square root.
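The steps in this answer can be followed one at a time in code. The sketch below uses the three tape-measure readings (60, 63, and 61.5 inches) that appear in another card of this set; the function names are made up for illustration.

```python
import math

def sample_variance(values):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_distances = [(x - mean) ** 2 for x in values]
    return sum(squared_distances) / (n - 1)

def sample_standard_deviation(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

measurements = [60, 63, 61.5]
print(sample_variance(measurements))            # 2.25
print(sample_standard_deviation(measurements))  # 1.5
```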

A professor wants to measure how well students understood the topics given in lecture by using an in class quiz. Identify the instrument, unit of measurement, and variable.

The instrument is the quiz. The unit of measurement is points. The variable is score on the quiz in points.

The value we measure is actually made up of three parts. What are these three parts?

The measured value is made up of the true value, the bias, and the random error.

Using a density curve, how do we find the median and the quartiles?

The median is the point with half of the observations on either side. In a density curve, this is the point where half of the area lies to the right of it and half of the area lies to the left of it. The quartiles are found by determining the points that divide the area under the curve into quarters. The first quartile is the point where 25% of the area is to the left of it and the third quartile is the point where 75% of the area is to the left of it.

What is the most common numerical way to describe a distribution?

The most common numerical way to describe a distribution is the combination of the mean and the standard deviation.

A survey of college freshmen in 2007 asked what field they planned to study. Of those surveyed, 12.8% were arts and humanities majors, 17.7% were business majors, 9.2% were education majors, 19.3% were engineering, biological sciences, or physical sciences majors, 14.5% were professional majors, and 11.1% were social science majors. Given the data presented (using no additional categories), what type of chart (bar chart or pie chart) is appropriate to use for this data? What could we add to this data to make it appropriate to use either type of chart?

Because the categories given do not include all possible categories, it would not be appropriate to use a pie chart of this data as presented. It would only be appropriate to use a bar chart. If we added a category for other majors, then it would be appropriate to use either type of chart.
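One way to see that the listed categories do not cover the whole is to add up the percentages from the question. This is a small arithmetic sketch; the dictionary keys are shortened labels chosen for illustration.

```python
shares = {
    "arts and humanities": 12.8,
    "business": 17.7,
    "education": 9.2,
    "engineering/biological/physical sciences": 19.3,
    "professional": 14.5,
    "social science": 11.1,
}

# Round to one decimal place to hide floating-point noise in the sum.
total = round(sum(shares.values()), 1)
other = round(100 - total, 1)
print(total, other)  # 84.6 15.4
```

Since the listed fields add up to 84.6%, an "other" category of 15.4% would be needed before the parts make up the whole.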

Who sold the most CDs on the Web just before listeners began to download their favorites rather than buy CDs? The pictogram below graphs the shares of the market leaders in 1997 and says what percentage of all online sales they had. Does the graph fairly represent the data? Explain your answer.

Because this is a pictogram, the graph doesn't fairly represent the data. The overall difference in area between the pictures is harder for the human eye to see and it does not do a good job of representing the data.

What do we know about chance behavior in the short run? What do we know about chance behavior in the long run?

Chance behavior is unpredictable in the short run, so we don't know anything about chance behavior in the short run. Chance behavior is regular and predictable in the long run.

Does correlation take into account the difference between the explanatory and the response variable? Will the correlation change if we switch the two?

Correlation does not take into account the difference between the explanatory and the response variable. It will not change if we switch the two.

Why do we use data tables?

Data tables are used to summarize large amounts of information. We use them to show us what is going on with the data overall, instead of what is going on with each individual.

When making classes for histograms, we need to make sure they are exclusive and exhaustive. What does this mean?

Exclusive means that there should be no overlap between groups (one individual can't be placed into multiple groups). Exhaustive means that there is a place for every data point; every individual falls into a group.

Using a density curve, how do we find the mean?

The mean is slightly harder than the median to find just by looking at a density curve. The mean of the density curve is the balancing point: the point at which the curve would balance if it were made of a solid material.

For each of the following situations, do you expect the standard score to be greater than zero, equal to zero, or less than zero? 1. The observed value is the same as the mean. 2. The observed value is less than the mean. 3. The observed value is greater than the mean.

1. The standard score equals zero. 2. The standard score is less than zero. 3. The standard score is greater than zero.

If numbers are too consistent, this may lead us to suspect what?

If numbers are too consistent, we may suspect fraud.

What does it mean if the standard deviation is 0? What values can the standard deviation never be? What does it mean if one set of numbers has a larger standard deviation than another set of numbers?

If the standard deviation is 0, that means there is no spread and all the observations have the same value. Standard deviations can never be negative (they must be greater than or equal to 0). If one set of numbers has a larger standard deviation than another, that means its values are more spread out.

If we change the mean of a normal distribution, what happens? If we change the standard deviation of a normal distribution, what happens?

If we change the mean of a normal distribution, we change its location. If we change the standard deviation of a normal distribution, we change its shape (how spread out the curve is).

How do we draw a boxplot?

In a boxplot, a center box spans the quartiles. A line drawn across this box marks the median. Lines extend from the box out to the smallest and largest observations (the minimum and the maximum).

In each of the following situations, what type of chart or graph would be most appropriate: 1. Graphing the number of students at Texas A&M each year from 1950 to 2016? 2. You collect data from 8 different countries (United States, Iceland, Sweden, Canada, Greenland, Switzerland, Finland, and Denmark) about what percent of people in each country speak at least two languages fluently. 3. You do a survey of students at A&M and ask students whether or not they want the library to be open for longer hours. 50% say yes, 43% say no, and 7% are undecided.

1. Line graph. 2. Bar graph. 3. Pie chart or bar graph.

When we make a histogram, what do we do? What are the steps to making a histogram?

In order to make a histogram, we have to group nearby values together to make the histogram easy to read. First we divide the range of the data into classes or groups of equal width, then we count the number of individuals in each class/group, and finally we draw the histogram.

When making a data table, is it better to present data as counts (for example, the number of people in a category) or as rates (for example, the percent of people in a category)?

It is better to present the data as rates, because they are more informative to someone reading the data table.

What is more reliable, taking one measurement or taking the average of several measurements?

It is more reliable to take the average of several measurements.

Phil wins the lottery in 2008. He wins the lottery again in 2013. Is it unlikely that Phil won the lottery twice? Is it unlikely that someone won the lottery twice?

It is unlikely that Phil won the lottery twice, but it is not unlikely that someone won the lottery twice.

In order to use an association for predictive reasons, do we need to know that one variable causes the change in the other (is causation necessary)?

No, causation is not necessary for us to be able to use an association for predictive reasons.

With normal distributions, do we expect there to be many outliers?

No, we do not expect there to be outliers with normal distributions.

When defining classes, can the classes be of unequal widths?

No, the classes can't be of unequal widths, because this changes how the graph is interpreted (our eyes respond to the areas of the bars in a histogram).

In a histogram, should there be any space between the class bars?

No, when drawing a histogram, there shouldn't be any space between the class bars.

What are three terms we use to describe normal curves (or normal distributions)?

Normal distributions are symmetric, single-peaked, and bell-shaped.

What type of error do we commonly see in data tables?

One error we commonly see in data tables is roundoff error.

Give an example in which you would rely on a probability found as a long-term proportion from data on many trials. Give an example in which you would rely on your own personal probability.

One time you would rely on a probability found as a long-term proportion from data on many trials is if you wanted to know the probability of rolling a die and getting a 1. One time you would rely on your own personal probability is if you wanted to know the probability of you personally getting in a car crash.

The figure below is a line graph of the average cost of imported oranges each month from July 1995 to April 2012. These data are the price in U.S. dollars per metric ton. Looking at the graph, what is the overall pattern/trend in the prices of oranges? Are there any striking deviations from this overall pattern? Is there any seasonal variation? If you do believe there is seasonal variation, describe the kind of seasonal variation in the graph.

There appears to be an overall increasing trend in the prices of oranges over time. There appears to be a deviation from this pattern in the first five or so years, when the overall price seems to remain fairly constant. There does appear to be seasonal variation. The prices of the oranges appear to increase and then decrease within any given year, which most likely is related to the growing season of oranges.

A pictogram is a variation of a bar chart. Why is it generally a bad idea to use a pictogram?

Pictograms are misleading. Bar graphs are a better idea because in a bar graph all of the bars are the same width, which means when a person is reading it, they only have to compare the height of the different bars. However, in a pictogram, both the heights and the widths of the picture are different for each category, which makes it difficult for people to see the true difference between the categories.

What are some of the disadvantages of using a pie chart? What about pie charts makes them hard for people to visually read?

Pie charts are hard to draw by hand, have no natural way to order the categories, and make it hard to compare the sizes of different categories. It is harder for us to visually compare angles (which is how pie charts are drawn) than lengths, so it is hard for people to visually read a pie chart. One way to help make a pie chart easier to read is to add the percentages falling into each category next to the wedge representing that category.

What are pie charts used to show? What do the wedges within the pie chart represent?

Pie charts are used to show how a whole is divided into parts. The entire circle represents the whole and the wedges within the pie chart represent the parts. The size of each wedge represents what portion of the whole falls into that category.

Review chapter 13

Q 6 and 7 and 8 and 14 and 15 and 18 and 22

What does it mean if something is random?

Random is a word to describe events that are unpredictable in the short run, but have a pattern in the long run. We call a phenomenon random if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions.

What study design is the best for establishing causation? What can we do if this study type is not ethical or feasible?

Randomized comparative experiments are the best for establishing causation. If a randomized comparative experiment is not ethical or feasible, we have some criteria we can use to try to establish causation. These include strong association, consistent association, a dose response (higher doses are associated with higher responses), the cause preceding the effect (the cause happens before the effect), and a plausible cause (some sort of biologic or scientific reason that it makes sense).

The Wechsler Adult Intelligence Scale (WAIS) is an IQ test. Scores on the WAIS for the 20 to 34 age group are approximately Normally distributed with mean 110 and standard deviation 15. Scores for the 60 to 64 age group are approximately Normally distributed with mean 90 and standard deviation 15. Sarah, who is 30, scores 130 on the WAIS. Her mother, who is 60, takes the test and scores 110. Express both scores as standard scores that show where each woman stands within her own age group. Who scored higher relative to her age group, Sarah or her mother?

Sarah's standard score is 1.3. Her mother's standard score is 1.3. Because their standard scores are the same, they performed the same relative to their age group (neither one scored higher relative to their age group).
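The two standard scores can be checked with the formula (observation - mean)/standard deviation, using the age-group means and standard deviation given in the question. The helper name `standard_score` is made up for this sketch.

```python
def standard_score(observation, mean, standard_deviation):
    """Number of standard deviations the observation lies above or below the mean."""
    return (observation - mean) / standard_deviation

sarah = standard_score(130, mean=110, standard_deviation=15)
mother = standard_score(110, mean=90, standard_deviation=15)
print(round(sarah, 2), round(mother, 2))  # 1.33 1.33
```

Both women are 20 points above their own group's mean, so both scores come out to 20/15, which rounds to the 1.3 quoted in the answer.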

What is seasonal variation? What is seasonal adjustment? What is an example of a time when you would need to use seasonal adjustment?

Seasonal variation is a pattern that repeats itself at known regular intervals of time. Seasonal adjustment is when the expected seasonal variation is removed before the data is published. Examples where you would need to use seasonal adjustment are graphing unemployment rates or prices of gasoline over time.

You know that your true height is 59 inches. You measure yourself using a tape measure three times, getting measurements of 60 inches, 63 inches, and 61.5 inches. Are these measurements biased? What is the variance of these measurements?

Since all of these measurements are above the true value, it seems like they may be biased (they are systematically different from the true value, in the same direction). The variance of these values is 2.25.

What is the 68-95-99.7 rule?

The 68-95-99.7 rule states that in any normal distribution, approximately 68% of the observations fall within one standard deviation of the mean, approximately 95% of the observations fall within two standard deviations of the mean, and approximately 99.7% of the observations fall within three standard deviations of the mean.
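The three percentages in the rule are rounded values. As a rough check, the proportion of a normal distribution within k standard deviations of the mean can be computed from the error function in Python's standard `math` module; the function name `proportion_within` is an assumption made for this sketch.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 68.3, 95.4, and 99.7 percent.
    print(k, round(100 * proportion_within(k), 1))
```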

What is the most common type of regression line? How is this line drawn?

The most common regression line is the least-squares regression line. This line is drawn by minimizing the sum of the squared vertical distances from the line to the actual observed values.
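For a single explanatory variable, the line that minimizes the sum of squared vertical distances has a well-known closed form: slope = sum of (x - mean of x)(y - mean of y) divided by sum of (x - mean of x) squared, and intercept = mean of y minus slope times mean of x. The sketch below applies it to a small made-up data set; the data and the function name are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, chosen only to illustrate the calculation.
slope, intercept = least_squares_line([1, 2, 3, 4], [2, 4, 5, 7])
print(round(slope, 2), round(intercept, 2))  # 1.6 0.5
```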

What is a percentile? What percentile is the median? What percentile is the first quartile? What percentile is the third quartile?

The cth percentile of a distribution is a value such that c percent of the observations lie below it and the rest lie above. The median is the 50th percentile. The first quartile is the 25th percentile. The third quartile is the 75th percentile.

The temperature on Monday was 35 degrees. The temperature on Tuesday was 50 degrees. What is the percent change?

The percent change is 15/35 * 100, or a 42.86% increase.
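The same calculation as a small function, dividing the amount of change by the starting value and multiplying by 100. The function name `percent_change` is made up for this sketch.

```python
def percent_change(start, end):
    """Amount of change divided by the starting value, times 100."""
    return (end - start) / start * 100

print(round(percent_change(35, 50), 2))  # 42.86
```

A negative result would indicate a percent decrease.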

What is the probability of an outcome happening?

The probability of any outcome of a random phenomenon is a number between 0 and 1 that describes the proportion of times the outcome would occur in a very long series of repetitions.

What does standard deviation measure? You should only use standard deviation when you use what to measure the center of a distribution?

The standard deviation measures spread around the mean. You should only use the standard deviation when you use the mean to measure the center of a distribution.

What are three possible explanations for an association between two variables (see slide 15)

The three possible explanations for an association between two variables are causation, common response, and confounding.

What are the three principles for making good graphs?

The three principles for making good graphs are making sure the graph has labels and legends (tell what variables are plotted, their units, and the source of the data), making sure the data stands out (do not use unnecessary grids or background art and make sure the placement of the labels doesn't interfere with reading the data), and paying attention to what people will see when they read the graph (be careful with scales and don't use pictograms or 3D effects that will confuse the reader).

What three things are necessary for a clear data table?

The three things that are necessary for a clear data table are a main heading giving the subject and the date of the data, labels within the table to identify the variables and the units they are measured in, and the source of the data.

What are the first and third quartiles? How do we calculate them?

The first and third quartiles are the midpoints of each half. They divide the data in quarters. Like when finding the median, you start by arranging the observations in order from smallest to largest. The first quartile is the median of the observations that are to the left of the overall median. The overall median is not included in these numbers. The third quartile is the median of the observations that are to the right of the overall median. Again, the overall median is not included in these numbers.

Three students take a quiz. Their scores are 9, 8, and 7.5. What are each of their standard scores?

Their standard scores are 1.1, -0.2, and -0.9, respectively.
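These values can be reproduced with Python's standard `statistics` module, assuming the sample standard deviation (dividing by n - 1) is the one intended, as in the rest of this set.

```python
import statistics

scores = [9, 8, 7.5]
mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)

# Standard score = (observation - mean) / standard deviation, rounded to one decimal.
standard_scores = [round((x - mean) / sd, 1) for x in scores]
print(standard_scores)  # [1.1, -0.2, -0.9]
```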

In order to determine whether or not we believe the numbers posted by a study or experiment, we need to know what? (What are the things we should check for when deciding whether or not to believe reported numbers?)

There are a number of things we should check before believing reported numbers. We should check if there are missing parts of the story, whether or not the numbers are consistent and plausible, whether or not the numbers are too good to be true, if the arithmetic is correct, and whether or not the people who did the study have a hidden agenda.

Why do some people worry about risks that almost never occur, but ignore other risks that are much more plausible? Give an example of this occurring.

There are many reasons why some people worry about risks that almost never occur, but ignore more plausible risks. One reason is that we feel safer about risks we can control. Another is that humans are bad at comprehending small probabilities, so we tend to overestimate small risks and underestimate larger risks. Another reason is that sometimes these probabilities are determined from complex studies, which people find harder to trust. An example of this is that very few people would leave a sleeping infant home alone for ten minutes while they went to run errands, even though the risk of a car crash is higher than the risks the child would face sleeping at home.

How many classes should a histogram have? What happens if there are too many or too few classes?

There is no one correct way to determine how many classes a histogram should have. A good general rule is to use between 10 and 20 classes. Too few classes will give a "skyscraper" histogram, with all the values in a few classes with tall bars. Too many classes will give a "pancake" histogram, with most classes having one or no observations.

When choosing how to numerically describe a distribution, what is the first thing you should do?

When deciding how to numerically describe a distribution, the first thing you should do is start with a graph of your data.

Suppose you had data regarding the GPAs of all the students at Texas A&M University. Would this data be better displayed as a histogram or a stem and leaf plot?

This data would be better displayed as a histogram. There would be too many data points for a stem and leaf plot.

How do we calculate the percentage change?

To calculate the percentage change, we divide the amount of change by the starting value and then multiply by 100.

How do you make a stemplot? In a stemplot, what is the stem and what is the leaf?

To make a stemplot, you must separate each observation into a stem and a leaf. Then you write the stems in a vertical column with the smallest at the top and draw a vertical line at the right of this column. Finally, you write each leaf in the row to the right of its stem, in increasing order out from the stem. The stem consists of all but the final (rightmost) digit and the leaf is the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.

We can completely describe a normal distribution using what two things?

We can completely describe a normal distribution with the mean and the standard deviation.

What three things do we use to describe the overall pattern of a scatterplot?

We describe the overall pattern of a scatterplot by describing its form, direction, and strength.

What graph do we use to show the relationship between two quantitative variables? How do you make this graph? In this graph, what does each point represent?

We use a scatterplot to show the relationship between two quantitative variables. To make this graph, values of one variable are plotted on the horizontal axis and values of the other variable are plotted on the vertical axis. Each point on a scatterplot represents one individual in the data and their observed values for both of the variables.

When creating a line graph of how price of gas changes over time, what would you plot on the X (horizontal) and Y (vertical) axes?

When creating a line graph, you put time on the X (horizontal) axis and the variable you are measuring on the Y (vertical) axis.

You decide to study the average temperature in Chicago each month for many years. Do you expect a line graph of the data to show seasonal variation? Describe the kind of seasonal variation you expect to see.

Yes, you would expect a line graph to show seasonal variation. You would expect the average temperatures to be lowest in the winter and highest in the summer, so you expect to see average temperatures to increase during the first half of the year and decrease during the second half of the year.

How can you reduce the bias of a measurement?

You can reduce the bias of a measurement by using a better instrument.

How can you reduce the variability of a measurement?

You can reduce the variability of a measurement by taking the average of several measurements.

When should you pay special attention to ensure that the numbers reported are correct?

You should pay special attention to numbers if you believe the person publishing them had some sort of hidden agenda. Numbers can be manipulated to show what we want them to.

For each of the following sets of numbers, calculate the five-number summary and draw a boxplot (on the exam I will not ask you to draw a boxplot, since it is multiple choice, but you may be required to pick the correct boxplot for a set of values): 1. 4, 12, 7, 19, 10, 7, 5, 8, 12, 13, 21, 5 2. 3, 6, 18, 7, 13, 6, 12, 18, 9, 14 3. 9, 4, 13, 18, 15, 7, 12, 11, 9, 8, 14 4. 7, 3, 13, 11, 8, 10, 2, 4, 8

1. minimum = 4, first quartile = 6, median = 9, third quartile = 12.5, maximum = 21 2. minimum = 3, first quartile = 6, median = 10.5, third quartile = 14, maximum = 18 3. minimum = 4, first quartile = 8, median = 11, third quartile = 14, maximum = 18 4. minimum = 2, first quartile = 3.5, median = 8, third quartile = 10.5, maximum = 13
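These answers can be checked with a short sketch that follows the quartile method used in this set: sort the values, then take the median of the lower half and the median of the upper half, leaving the overall median out of both halves when the number of observations is odd. The function name is made up, and the sketch assumes at least two observations. Note that statistical software often uses slightly different quartile rules, so other tools may give different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]   # observations to the left of the overall median
    upper = ordered[-half:]  # observations to the right of the overall median
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )

data_sets = [
    [4, 12, 7, 19, 10, 7, 5, 8, 12, 13, 21, 5],
    [3, 6, 18, 7, 13, 6, 12, 18, 9, 14],
    [9, 4, 13, 18, 15, 7, 12, 11, 9, 8, 14],
    [7, 3, 13, 11, 8, 10, 2, 4, 8],
]
for data in data_sets:
    print(five_number_summary(data))
```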

Look @ chapter 10

question 21 and slides 16 & 17 FROM CH 10 POWERPOINT

When should you use the mean/standard deviation to describe a distribution? When should you use the five-number summary to describe a distribution?

You should use the mean/standard deviation to describe a distribution when the distribution is reasonably symmetric and there are no outliers. You should use the five-number summary to describe a distribution when the distribution is skewed or has outliers.

