Stats Exam 2


Ex: the distribution is right skewed

the mean is greater than the median

Ex: the distribution is left skewed

the mean is less than the median

Ex: the distribution is symmetric

the mean is the same as the median
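The three cards above can be checked on small data sets. This is a minimal sketch: the numbers are made up for illustration, `compare_center` is a hypothetical helper name, and the mean/median relationship is a typical pattern for skewed data rather than a guarantee.

```python
from statistics import mean, median

def compare_center(values):
    """Describe how the mean compares with the median for a list of numbers."""
    m, med = mean(values), median(values)
    if m > med:
        return "mean > median"
    if m < med:
        return "mean < median"
    return "mean = median"

# Made-up data sets chosen to show each shape
right_skewed = [1, 2, 2, 3, 3, 4, 20]      # one large value pulls the mean up
left_skewed = [1, 17, 18, 18, 19, 19, 20]  # one small value pulls the mean down
symmetric = [1, 2, 3, 4, 5]

print(compare_center(right_skewed))
print(compare_center(left_skewed))
print(compare_center(symmetric))
```

The mean moves toward the long tail because it uses every value, while the median only depends on the middle of the ordered list.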

what does standard deviation measure?

the spread around the mean

what percentile is the median

50th

one way to describe our data is to draw a density curve. what is a density curve and what is the general idea behind how to find the density curve for a set of data

A density curve is a way of describing the overall pattern of a distribution with a smooth curve. In general, a density curve is created by drawing a smooth curve through the tops of the bars of a histogram, making sure to draw it such that the area under the curve is exactly 1

what do standard scores do

A standard score expresses an observation in terms of the number of standard deviations it is above or below the mean.

how do you choose the classes for a stemplot?

The classes of a stemplot are the stems. You don't choose the classes, they are given to you. However, you can adjust the stems slightly by rounding the data differently. This is typically done when the data have too many digits.

what do the wedges within the pie chart represent?

The entire circle represents the whole and the wedges within the pie chart represent the parts. The size of each wedge represents what portion of the whole falls into that category.

what are the first and third quartiles and how do we calculate them?

The first and third quartiles are the midpoints of each half of the data. Together with the median, they divide the data into quarters. As when finding the median, you start by arranging the observations in order from smallest to largest. The first quartile is the median of the observations that are to the left of the overall median. The overall median is not included in these numbers. The third quartile is the median of the observations that are to the right of the overall median. Again, the overall median is not included in these numbers.
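The quartile rule above can be sketched in Python. This is one way to write it, assuming at least two observations; the function name `quartiles` and the sample data are made up for illustration. Other quartile conventions exist (software packages often interpolate), so results can differ slightly from library functions.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    The overall median is left out of both halves when n is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # observations left of the overall median
    upper = ordered[half + (n % 2):]  # observations right of the overall median
    return median(lower), median(upper)

data = [1, 3, 4, 7, 8, 10, 12]  # made-up values; the overall median is 7
q1, q3 = quartiles(data)
print(q1, q3)
```

With seven observations, the lower half is 1, 3, 4 and the upper half is 8, 10, 12, so the quartiles are the middle values of those two groups.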

what numbers make up the five number summary

The five number summary is made up of the minimum, first quartile, median, third quartile, and maximum

Advantage of a stemplot

The main advantage of a stemplot is that you don't lose the data: you still have access to each exact data value. Stemplots are also faster to draw than histograms and easier to make because we don't have to make a choice about the classes.

Disadvantage of a stemplot

The main disadvantage is that it doesn't work well with large data sets because then there are too many leaves on each stem.

what are the three terms we use to describe normal curves (or normal distributions)?

they are -symmetric -single peaked -bell shaped

what is the idea behind standard deviation?

to give the average distance of the observations from the mean

what is the purpose of making graphs

to help us understand the data.

what are pie charts used to show?

to show how a whole is divided into parts

What three things are necessary for a clear data table?

*main HEADING giving the subject and the date of the data *LABELS within the table to identify the VARIABLES and the UNITS they are measured in *and the SOURCE of the data.

When making classes for histograms, we need to make sure they are exclusive and exhaustive. What does this mean?

-Exclusive means that there should be no overlap between groups (one individual can't be placed into multiple groups). -Exhaustive means that there is a place for every data point; every individual falls into a group.

What are the steps to making a histogram?

-First we must divide the range of the data into classes or groups of equal width, -then we count the number of individuals in each class/group, and finally we draw the histogram.
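The first two steps above (divide the range into equal-width classes, then count) can be sketched as follows. The helper name `histogram_counts`, the choice of four classes, and the data are all made up for illustration, and the sketch assumes the values are not all identical. Each class includes its left edge and excludes its right edge, except the last class, which also holds the maximum, so the classes are exclusive and exhaustive.

```python
def histogram_counts(values, num_classes):
    """Count observations in equal-width classes covering the data's range."""
    low, high = min(values), max(values)
    width = (high - low) / num_classes
    counts = [0] * num_classes
    for v in values:
        # The maximum would fall just past the last class, so clamp it into the last one
        index = min(int((v - low) / width), num_classes - 1)
        counts[index] += 1
    return low, width, counts

data = [2, 3, 5, 7, 8, 8, 9, 12, 14, 15, 18, 22]  # made-up values
low, width, counts = histogram_counts(data, 4)

# Draw a rough text histogram: one star per observation, no gaps between classes
for i, count in enumerate(counts):
    start = low + i * width
    print(f"{start:5.1f} to {start + width:5.1f} | {'*' * count}")
```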

A pictogram is a variation of a bar chart. Why is it generally a bad idea to use a pictogram

-Pictograms are misleading. -Bar graphs are a better idea because in a bar graph all of the bars are the same width, which means when a person is reading it, they only have to compare the height of the different bars. -However, in a pictogram, both the heights and the widths of the picture are different for each category, which makes it difficult for people to see the true difference between the categories.

using a density curve, how do we find the median and the quartiles?

-The median is the point with half of the observations on either side. In a density curve, this is the point where half of the area lies to the right of it and half of the area lies to the left of it. -The quartiles are found by determining the points that divide the area under the curve into quarters. -The first quartile is the point where 25% of the area is to the left of it and the third quartile is the point where 75% of the area is to the left of it.
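For a normal density curve, these area-based points can be computed with Python's standard library. This is a sketch under an assumed example: the mean of 100 and standard deviation of 15 are made-up parameters, not values from the course.

```python
from statistics import NormalDist

# Hypothetical normal density curve (mean and standard deviation are assumptions)
curve = NormalDist(mu=100, sigma=15)

# inv_cdf(p) returns the point with a proportion p of the area to its left
first_quartile = curve.inv_cdf(0.25)
median = curve.inv_cdf(0.50)
third_quartile = curve.inv_cdf(0.75)

print(round(first_quartile, 2), round(median, 2), round(third_quartile, 2))

# Check the definition: 25% of the area lies to the left of the first quartile
print(round(curve.cdf(first_quartile), 2))
```

Because a normal curve is symmetric, the median sits at the mean and the two quartiles are the same distance from it on either side.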

when creating a line graph of how prices of gas change over time, what would you plot on the X and Y axis?

-Time on the X axis -The variable you are measuring (the price of gas) on the Y axis

What happens if there are too many or too few classes?

-Too few classes will give a "skyscraper" histogram, with all the values in a few classes with tall bars. -Too many classes will give a "pancake" histogram, with most classes having one or no observations.

What three things do we use to describe the overall pattern of a graph/distribution?

-center -spread -shape

what are some advantages of a bar graph

-easy to draw -there is a natural way to order categories with a bar graph -we can visually compare different categories, even those not positionally next to each other.

What is an example of a time when you would need to use seasonal adjustment?

-graphing unemployment rates -prices of gasoline over time

What are the three principles for making good graphs?

-making sure the graph has labels and legends (tell what variables are plotted, their units, and the source of the data) -data stands out (do not use unnecessary grids or background art and make sure the placement of the labels doesn't interfere with reading the data) -paying attention to what people will see when they read the graph (be careful with scales and don't use pictograms or 3D effects that will confuse the reader).

what things should we look for when making graphs of data?

-overall patterns and any deviations from those patterns (which could be a sign of an outlier)

What three things do we look for when studying line graphs?

-overall patterns/trends -deviations from that pattern -seasonal variation

what is the leaf

-the final digit -each leaf contains only a single digit

You decide to study the average temperature in Chicago each month for many years. Do you expect a line graph of the data to show seasonal variation? Describe the kind of seasonal variation you expect to see

-yes, a line graph of the data will show seasonal variation -temperatures are lowest in the winter and highest in the summer -you can expect the average temperature to increase during the first half of the year and decrease during the second

What are the four steps for exploring data with a single, quantitative variable?

1) plot your data; 2) look for overall patterns and striking deviations; 3) choose a numeric summary (five-number or mean/standard deviation) to describe the data; and 4) describe the overall pattern with a smooth curve.

why do we use data tables?

Data tables are used to summarize large amounts of information. We use them to show us what is going on with the data overall, instead of what is going on with each individual.

using a density curve, how do we find the mean?

The mean of a density curve is slightly harder to find just by looking at it. The mean of the density curve is the balancing point: the point at which the curve would balance if it were made of a solid material.

Histograms on the exam

I won't ask you to draw a histogram on the exam, but given a set of data, you may be asked to determine which histogram is correct.

what does it mean if one set of numbers has a larger standard deviation than another set of numbers?

If one set of numbers has a larger standard deviation than another, that means its values are more spread out around the mean.

How do we draw a boxplot

In a boxplot, a center box spans the quartiles. A line drawn across this box marks the median. Lines extend from the box out to the smallest and largest observations (the minimum and the maximum).

how do you calculate standard deviation?

It is calculated by taking the square root of the variance. That means that you find the distance of each observation from the mean, square each of these distances, add all of the squared distances up, divide by n-1, and take the square root.
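The steps on this card translate directly into code. This is a minimal sketch that assumes at least two observations; the function name `sample_std_dev` and the data are made up for illustration.

```python
from math import sqrt

def sample_std_dev(values):
    """Standard deviation with the n-1 divisor, following the steps on the card."""
    n = len(values)
    mean = sum(values) / n
    # Distance of each observation from the mean, squared
    squared_distances = [(x - mean) ** 2 for x in values]
    # Add the squared distances and divide by n-1 to get the variance
    variance = sum(squared_distances) / (n - 1)
    return sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5
print(round(sample_std_dev(data), 3))
```

For this data the squared distances add up to 32, so the variance is 32/7 and the standard deviation is its square root. Python's built-in `statistics.stdev` uses the same n-1 divisor.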

In a histogram, should there be any space between the class bars?

No

when defining classes, can the classes be of unequal widths?

No, because this changes how the graph is interpreted (our eyes respond to the areas of the bars in a histogram)

what are some of the disadvantages of using a pie chart and what makes them hard for people to visually read?

Pie charts are hard to draw by hand, have no natural way to order the categories, and make it hard to compare the sizes of different categories. It is harder for us to visually compare angles (which is how pie charts are drawn) than lengths, so it is hard for people to visually read a pie chart. One way to help make a pie chart easier to read is to add the percentage falling into each category next to the wedge representing that category.

what is the median

The median represents the midpoint of the data: it is the point that divides the data in half because half of the data points are below the median and half of the data points are above the median.

what values can the standard deviation never be?

Standard deviations can NEVER be negative: they must be greater than or equal to 0.

how do you interpret stemplots

Stemplots are simply histograms rotated by 90 degrees. You can interpret a stemplot the same way you interpret a histogram: look for patterns, outliers, and describe the center, shape, and spread.

what is the 68-95-99.7 rule?

The 68-95-99.7 rule states that in any normal distribution, approximately 68% of the observations fall within one standard deviation of the mean, approximately 95% of the observations fall within two standard deviations of the mean, and approximately 99.7% of the observations fall within three standard deviations of the mean.
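The three percentages can be checked numerically. This sketch uses the standard library's `NormalDist`; the helper name `share_within` is made up for illustration, and the default mean of 0 and standard deviation of 1 are arbitrary since the rule holds for any normal distribution.

```python
from statistics import NormalDist

def share_within(k, mean=0.0, sd=1.0):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The computed proportions are slightly different from the rounded figures in the rule, which is why the card says "approximately".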

what is a percentile

The cth percentile of a distribution is a value such that c percent of the observations lie below it and the rest lie above

what is the stem

The stem consists of all but the final (rightmost) digit -stems can have as many digits as needed

How many classes should a histogram have?

There is no one correct way to determine how many classes a histogram should have. A good general rule is to use between 10 and 20 classes

Suppose you had data regarding the GPAs of all the students at Texas A&M University. Would this data be better displayed as a histogram or a stem and leaf plot?

This data would be better displayed as a histogram. There would be too many data points for a stem and leaf plot.

how do we calculate the median

To find the median, first arrange all observations in order of size from smallest to largest. After they are all in order, find the midpoint of the data values. If there are an odd number of observations, the median is the center observation (the (n+1)/2 observation). If there are an even number of observations, the median is the average of the two center observations.
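The odd and even cases above can be sketched as a short function. The name `median` and the sample lists are made up for illustration; Python's `statistics.median` follows the same rule.

```python
def median(values):
    """Median of a non-empty list, following the odd/even rule on the card."""
    ordered = sorted(values)  # arrange from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the center observation, which is the (n+1)/2-th counting from 1
        return ordered[middle]
    # Even n: the average of the two center observations
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([5, 1, 3]))     # odd number of observations
print(median([4, 1, 3, 2]))  # even number of observations
```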

how do you make a stemplot

To make a stemplot, you must separate each observation into a stem and a leaf. Then you write the stems in a vertical column with the smallest at the top and draw a vertical line at the right of this column. Finally, you write each leaf in the row to the right of its stem, in increasing order out from the stem.
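The procedure above can be sketched for non-negative whole numbers, where the leaf is the final digit and the stem is everything before it. The function name `stemplot` and the data are made up for illustration. Stems with no leaves are still printed so that gaps in the data remain visible.

```python
from collections import defaultdict

def stemplot(values):
    """Return the rows of a stemplot for non-negative whole numbers."""
    leaves = defaultdict(list)
    for v in sorted(values):  # sorting puts the leaves in increasing order
        stem, leaf = divmod(v, 10)  # leaf = final digit, stem = the rest
        leaves[stem].append(leaf)
    rows = []
    # Write every stem from smallest to largest, including empty ones
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {row}".rstrip())
    return rows

data = [23, 12, 15, 21, 23, 47]  # made-up values
for row in stemplot(data):
    print(row)
```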

how do we calculate standard score

We calculate a standard score by using this formula: (observation - mean)/standard deviation.
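As a quick worked example of the formula, here is a sketch with made-up numbers: a hypothetical test with mean 500 and standard deviation 100. The function name `standard_score` is an illustration only, and it assumes the standard deviation is greater than zero.

```python
def standard_score(observation, mean, standard_deviation):
    """Number of standard deviations an observation lies above or below the mean."""
    return (observation - mean) / standard_deviation

# Hypothetical test scores: mean 500, standard deviation 100
print(standard_score(600, 500, 100))  # above the mean, so positive
print(standard_score(500, 500, 100))  # equal to the mean, so zero
print(standard_score(350, 500, 100))  # below the mean, so negative
```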

how do we define the center of a distribution?

We define the center of the distribution as its midpoint: the point in the graph where roughly half of the observations are smaller and roughly half are larger.

when looking at the spread of a distribution, what do we do about outliers?

We do not include outliers when looking at the spread of the distribution. We define the spread of a distribution by giving the smallest and largest values, ignoring any outliers.

bar charts and pie charts are most useful for what kind of variables?

categorical variables

what do we do when we see an outlier?

When we see an outlier, we should look for an explanation. Is there some reason we would expect this value to be an outlier? Is there a chance it is a mistake? -Once we determine it is an outlier, we should make note that it exists, but not include it when discussing the overall pattern of the data.

when looking at the graph "Marital status of women as of 2011," would it also be correct to use a pie chart? -never married -married -widowed -divorced

Yes, it would also be appropriate to use a pie chart. We are displaying the distribution of a categorical variable, and the categories shown together represent the whole (they include all possible marital statuses, so no categories are left out).

What is seasonal variation?

a pattern that repeats itself at known regular intervals of time

What kind of graph "You collect data from 8 different countries (United States, Iceland, Sweden, Canada, Greenland, Switzerland, Finland, and Denmark) about what percent of people in each country speak at least two languages fluently."

bar graph

what chart makes it easier for us to compare categories?

bar graph

look at slide 16 and 17 Chapter 10

better to plot the percentage increase from the previous period

what is the graphical representation of the five number summary called?

boxplot

how do you calculate a mean

by adding up all the values and dividing by n.

Pie charts and bar charts show us the distribution of a categorical variable. What charts can we use to show the distribution of NUMERIC variables?

histograms

what does it mean if the standard deviation is 0

it means that there is no spread and all of the observations have the same value

what kind of graph for graphing the number of students at Texas A&M each year from 1950 to 2016?

line graph

what type of graph do we use to show how quantitative variables change over time?

line graph

symmetric but not normal

looks like two conjoined mountains

what two things can we use to completely describe a normal distribution?

mean standard deviation

Skewed to the left

means that the left side of the histogram extends out much further than the right (tail to the left).

Skewed to the right

means that the right side of the histogram extends out much further than the left (sometimes we call this a tail to the right)

Symmetric

means the right and left sides of the histogram are approximately mirror images of each other.

do we expect outliers with normal distributions

no

What kind of graph for "You do a survey of students at A&M and ask students whether or not they want the library to be open for longer hours. 50% say yes, 43% say no, and 7% are undecided"

pie chart or bar graph

When making a data table, is it better to present data as counts (like the number of people in a category) or as rates (the percent of people in a category)

rates, because they are more informative to someone reading the data table. *kind of gives a summary

what does the area under a density curve represent?

represents proportions of the total number of observations

what type of error do we commonly see in data tables?

roundoff error

For each of the following situations, do you expect the standard score to be greater than zero, equal to zero, or less than zero? 1. The observed value is the same as the mean

standard score equals zero

For each of the following situations, do you expect the standard score to be greater than zero, equal to zero, or less than zero? 1. The observed value is greater than the mean

standard score is greater than zero

For each of the following situations, do you expect the standard score to be greater than zero, equal to zero, or less than zero? 1. The observed value is less than the mean

standard score is less than zero

What three words do we use to describe the shape of a distribution.

symmetric (roughly symmetric/ approximately symmetric), skewed to the right, or skewed to the left.

what is a mean

the arithmetic average of a set of observations.

what is the most common numerical way to describe a distribution?

the combination of the mean and the standard deviation.

When choosing how to numerically describe a distribution, what is the first thing you should do?

the first thing you do is start with a graph of your data

If we change the mean of a normal distribution, what happens?

the location changes

You should only use standard deviation when you use what to measure the center of a distribution?

the mean

If we change the standard deviation of a normal distribution, what happens?

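the spread changes, which changes the shape: a larger standard deviation gives a shorter, wider curve and a smaller one gives a taller, narrower curve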

When we make a histogram, what do we do?

we have to group nearby values together into classes to make the histogram easy to read

When should you use the mean/standard deviation to describe a distribution?

when the distribution is reasonably symmetric and there are no outliers

when should you use the five number summary to describe a distribution

when the distribution is skewed or has outliers

What is seasonal adjustment?

when the expected seasonal variation is removed before the data is published

what is another name for standard score

z-score

seasonal variation no trend

zig zag up and down (looks like giant cones or mountains)

