Unit 2

The mean cost for a pint of strawberries is $1.80, with a standard deviation of $0.40. What is the value of the variance?

(0.40) squared = 0.16

Using the number of minutes per call in last month's cell phone bill, David calculated the upper quartile to be 19 minutes and the lower quartile to be 12 minutes. Any value greater than ______ minutes is an outlier.

IQR = 19 - 12 = 7. Upper limit: 19 + 1.5(7) = 29.5

Rick is an engineer testing the stress required to break samples of steel. He measured the failure stress of 50 samples and found the mean failure stress to be 350 MPa, with an SD of 25 MPa. If the distribution is normal, the percentage of the data that lies within two SDs of the mean is approximately:

95%. The normal distribution follows the empirical rule, which tells us that within 2 SDs of the mean we should find about 95% of the data.

Stack Plot

A bar graph or line chart that is subdivided into its components so that the comparisons as well as the totals can be seen. (Totals together help to find the best graph)

Five Number Summary

A brief overview of a data set consisting of the minimum, the first quartile, the median, the third quartile and the maximum.
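As an illustration only, the five number summary can be sketched in Python. The function name `five_number_summary` and the sample data are made up for this example, and the "inclusive" quartile method is an assumption: textbooks and software use slightly different quartile conventions, so results may differ from a hand calculation.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(data)
    # Assumption: the "inclusive" method; other quartile conventions exist.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical data set, not taken from the flashcards.
summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```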

Dotplot

A distribution in which each data value is represented by a dot above that value on an axis.

Frequency Polygon

A distribution of data that shows both a histogram and its line chart on the same set of axes.

Histogram

A distribution of data that shows the frequency of different ranges of values. Each frequency is the height of a bar.

Bar Graph

A distribution of qualitative data that displays bars that are proportional in length to the frequency or relative frequency of a particular data value.

Pie Chart / Circle Graph

A distribution of qualitative data that shows the relative frequency of each category as a sector of a circle.

Stem and Leaf Plot / Stemplot

A distribution of quantitative data that shows natural numerical breaks in the data as categories called stems and individual values as leaves.

Line Chart

A distribution of quantitative data that shows the frequency of different intervals of data. The frequencies are indicated by heights of dots which are connected to each other.

Multiple Line Charts

A distribution that shows the values of more than one data set in line charts. This is advantageous because it's clearer than trying to compare multiple histograms on the same set of axes.

Sampling Distribution of Sample Means

A distribution that shows the means from all possible samples of a given size. Example: to build the sampling distribution for samples of size 4, you take a sample, calculate its mean, and plot it with a dot. Then you take another sample, calculate the mean again, plot that, and so forth.

Uniform Distribution

A distribution where all values are equally likely. The graph is flat and has no peaks.

Unimodal/Single Peaked Distribution

A distribution where one value or bin contains more data than the other values or bins.

Skewed Left (Negatively Skewed) Distribution

A distribution where the majority of values are high and there are only a few low values that form a tail to the left of the median.

Skewed Right (Positively Skewed) Distributions

A distribution where the majority of values are low and there are only a few high values that form a tail to the right of the median.

Skewed Distribution

A distribution where the majority of values are on one side of the distribution and there are only a few values on the other.

Symmetric Distribution

A distribution where the mean and median are the same. It will appear to have a mirror line at the median of the distribution.

Multimodal Distribution

A distribution where there are many values or bins that contain more data than other nearby bins, usually separated by gaps.

Bimodal Distribution

A distribution where there are two distinct values or bins that contain more data than the others, usually separated by a gap.

Misleading Graphic

A graph meant to mislead a reader or make a reader feel or believe a certain way.

Pictograph

A graphic display that uses pictures of physical objects rather than dots or bars to indicate the relative size of numbers.

Modified Boxplot

A graphical display showing a modified version of the five number summary. If a distribution has outliers, then the whiskers only extend to the highest and lowest points that are not outliers.

Time Series Diagram

A graphical display that shows the values a variable takes over time.

Boxplot/Box and Whisker Plot

A graphical distribution of the 5 number summary. The box in the middle contains the middle 50% of the values and the whiskers extend out to the maximum and minimum values from the quartiles.

Stack Line Chart

A line chart where the lines represent cumulative amounts rather than individual amounts. These are typically done with different colored ribbons to make it clearer that we are talking about totals.

Variable

A measurable factor, characteristic, or attribute of an individual or a system. Example: we may be interested in the variable of height for a group of people. It varies from person to person because people have different heights.

Standard Normal Distribution

A normal distribution of z-scores. The mean is zero, and the standard deviation is 1.

Summation Notation

A notation that uses the Greek letter sigma to state that values should be added together.

Outlier

A point that is so large or small as to be unusual, given the rest of the data points.

68-95-99.7 Rule

A rule that applies to normal distributions, stating that 68% of all data points fall within one standard deviation of the mean, 95% of all data points fall within two standard deviations of the mean and 99.7% of all data points fall within 3 standard deviations of the mean.

Normal Distribution / Gaussian Distribution / Bell Curve

A single-peaked, symmetric distribution that follows a specific bell-shaped pattern. The mean, median, and mode are all the same. Normal distributions are not all centered at the same place, nor are they all equally spread out, so we need to know the mean and standard deviation in order to completely describe one. A large portion of the data is located near the center. The statement "The normal distribution is an example of a bimodal distribution" is false.

Frequency Table

A table showing the values of the data, and their respective frequencies

Standard Normal Table/ Z-Table

A table that calculates the percent of values below a particular z score.

Central Limit Theorem

A theorem that explains the shape of a sampling distribution of sample means. It states that if the sample size is large (generally n>30) and the standard deviation of the population is finite, then the distribution of sample means will be approximately normal.
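The theorem can be explored with a small simulation. The sketch below is only an approximation of the idea: it draws many random samples rather than all possible samples, and the population (an exponential distribution with mean 1), the sample size of 40, the seed, and the helper name `sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

def sample_means(draw_value, sample_size, num_samples):
    """Take num_samples random samples and return the mean of each one."""
    means = []
    for _ in range(num_samples):
        sample = [draw_value() for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

# Assumed population: exponential with mean 1 and SD 1 (strongly skewed right).
rng = random.Random(0)  # fixed seed so the run is repeatable
means = sample_means(lambda: rng.expovariate(1.0), sample_size=40, num_samples=2000)

center = statistics.fmean(means)  # expected to land near the population mean, 1
spread = statistics.stdev(means)  # expected to land near 1 / sqrt(40)
print(center, spread)
```

Even though the assumed population is skewed, a histogram of `means` should look roughly bell shaped, which is what the theorem describes.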

Standard Deviation

A typical amount by which we would expect a data point to differ from the mean. Typically about half to two thirds of the data points fall within one standard deviation of the mean. In Excel: =STDEV(range), where range is the highlighted cells.

Standard Scores/ Z Scores

A value that explains how many standard deviations away from the mean an observation is. It can be positive (if the value is above the mean) or negative (if the value is below the mean). Z-scores allow one-to-one comparisons between values from different distributions.

Weighted Mean/Average

A way of calculating a mean when not all the values count for the same amount. Each value should be multiplied by its weight and the products added together; then divide that sum by the sum of the weights.

Distribution

A way to visually display the values a variable takes and how often it takes each value. Each distribution has its own situation for which it is ideal. The data will determine which distribution is best to use. Examples: Frequency tables, Qualitative Data, Quantitative data, Mathematical rules.

Pancake Effect

Bins that are too narrow can create the pancake effect, displaying too many bins with almost nothing in them.

Skyscraper Effect

Bins that are too wide can create the skyscraper effect: with too few bins and lots of data in each, you don't get an accurate sense of the shape of the distribution.

Mean daily sales for the first month was $200, with SD of $30. On the 15th the shop sold $245. The mean daily sales for the second month was $220, with SD of $50. On the 15th the shop sold $270. Which month had a higher z-score for sales on the 15th, and what is the value of the z-score?

First month: z = (245 - 200) / 30 = 1.5. Second month: z = (270 - 220) / 50 = 1. The first month had the higher z-score, 1.5.
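The same comparison can be written as a short Python sketch. The function name `z_score` is made up for this example; the numbers come from the sales question above.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

first_month = z_score(245, 200, 30)
second_month = z_score(270, 220, 50)

print(first_month)   # 1.5
print(second_month)  # 1.0
print("first" if first_month > second_month else "second")  # first
```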

Frequency

How often a data value, or range of values occurs.

Interquartile Range Formula

IQR = Q3-Q1

1.5 x IQR Rule

If a point is larger than Q3 + 1.5 x IQR or smaller than Q1 - 1.5 x IQR, then it is an outlier.
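A minimal sketch of this rule in Python, using the quartiles from the cell phone question (Q1 = 12, Q3 = 19). The function names `iqr_fences` and `is_outlier` are hypothetical, chosen only for this illustration.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_limit, upper_limit) for the k x IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(x, q1, q3):
    """True if x is strictly beyond either limit."""
    lower, upper = iqr_fences(q1, q3)
    return x < lower or x > upper

lower, upper = iqr_fences(12, 19)
print(lower, upper)            # 1.5 29.5
print(is_outlier(30, 12, 19))  # True
print(is_outlier(25, 12, 19))  # False
```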

Ogive

A graph of cumulative frequency, so it increases from left to right. If there is no data in a particular bucket, you get a flat line or no increase.

Range Formula

Maximum value - minimum value

Measure of Center

Mean : More versatile measure of center

Binning

Method of deciding what widths of categories should be used on a histogram

Data Set

Not just a list of numbers or values; there is some context associated with it, usually units or the type of measurement used, and perhaps some kind of descriptor.

Outliers

Points in a data set that are so high or so low as to be unusual, given the rest of the values.

Percentile

Relative cumulative frequency; the percent of data points at or below a particular value. Measures what percent of data points fall in a bin or below that bin.

Stem and leaf

Shows the data in stem-and-leaf form. For example, a stem of 7 represents the tens digit, so 70 seconds. Leaves of 1 and 2 then mean 70 + 1 = 71 seconds and 70 + 2 = 72 seconds, so we can say 2 children took a bit over 70 seconds to complete the cube.

At a nearby frozen yogurt shop, the mean cost of a pint of frozen yogurt is $1.50 with a standard deviation of $0.10. Assuming the data is normally distributed, approximately what percent of customers are willing to pay between $1.20 and $1.80 for a pint of yogurt?

Since we know that the mean is 1.50 and the standard deviation is 0.10, we can note that both 1.20 and 1.80 are 3 standard deviations (3 x 0.10 = 0.30) away from 1.50. Assuming the data is normal, this interval should contain about 99.7% of the data.
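The 99.7% figure can be checked numerically. This sketch uses `NormalDist` from Python's standard `statistics` module; the helper name `share_between` is made up for the example, and it assumes the prices are exactly normally distributed.

```python
from statistics import NormalDist

def share_between(low, high, mean, sd):
    """Fraction of a normal distribution that lies between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)

# Yogurt question: $1.20 to $1.80 is the mean plus or minus 3 standard deviations.
share = share_between(1.20, 1.80, 1.50, 0.10)
print(round(share, 3))  # 0.997
```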

Measure of Spread

Standard Deviation : Because the mean is going to be your measure of center, use the standard deviation as the measure of spread.

Measures of Variation/Spread

Statistical measures that indicate how close values are to the center of the distribution. For every measure of variation, a large number indicates the data are very spread out, and a small number indicates the values are very close together. Examples: A high value means that the data set is not consistent, that it's more spread out. A low value indicates that the values are not very spread out, that they're tightly clustered together.

Weighted Mean Formula

Sum of (weight x value), divided by the sum of the weights.
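As a rough sketch, the formula translates to Python as follows. The function name `weighted_mean` and the scores and weights are hypothetical, invented only to show the arithmetic.

```python
def weighted_mean(values, weights):
    """Sum of (weight x value), divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical example: the first score counts twice as much as the others.
scores = [80, 90, 70]
weights = [2, 1, 1]
print(weighted_mean(scores, weights))  # (160 + 90 + 70) / 4 = 80.0
```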

Pie Chart: Relative Frequency

Take each number and divide it by the total number to get its relative frequency. Multiply each relative frequency by 360 to get the angle of its sector in degrees.
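A small sketch of that calculation, assuming the category counts are stored in a dictionary. The name `pie_angles` and the counts are invented for illustration.

```python
def pie_angles(counts):
    """Map each category to its sector angle in degrees (relative frequency x 360)."""
    total = sum(counts.values())
    return {category: count * 360 / total for category, count in counts.items()}

# Hypothetical category counts.
angles = pie_angles({"red": 10, "blue": 30, "green": 20})
print(angles)  # {'red': 60.0, 'blue': 180.0, 'green': 120.0}
```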

Bar Graphs : Relative Frequency

The percent of the values that are in each category. Take each number and divide it by the total number.

Mean of a Distribution of Sample Means

The average of all possible means from all possible samples of a given size. It will be equal to the mean of the original population.

Mean

The average value of a data set. It is obtained by dividing the sum of the values by the number of values in the set. In the presence of outliers, the mean won't give an accurate representation of center. Use Median in cases like this.

Median Class

The bin that contains the median value. This is the most precise measurement we can obtain when we are looking at data that have already been categorized.

Sample Distribution of Yogurt Sales

The central limit theorem can be applied as long as the sample size is large enough and there are enough of those samples taken. If you have these criteria met, then the sampling distribution will tend toward normality.

Range

The difference between the largest and smallest number in a data set.

Interquartile Range

The difference between the third and first quartiles. It represents the range in which the middle 50% of the data points lie. It is better than the standard deviation for describing skewed data sets.

Measure of Center x2

The mean and median can be used to summarize any quantitative data. If the data set is finite, there is always a defined mean and median. There is not always a mode.

Let x stand for the number of minutes spent at the mall. 100 people are sampled at a time. For the sampling distribution, the mean is 46 minutes and the standard deviation is 0.4 minutes. What is the mean and standard deviation of the population?

The mean of the sampling distribution is equal to the population mean, which is 46 minutes. The standard deviation of the sampling distribution is equal to the population standard deviation divided by the square root of the sample size: 0.4 = population SD divided by the square root of 100, so population SD = 0.4 x 10 = 4 minutes.
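The relationship used in this answer can be sketched in Python in both directions. The function names are hypothetical; the numbers are the ones from the mall question.

```python
import math

def sd_of_sample_means(population_sd, sample_size):
    """SD of the sampling distribution = population SD / sqrt(sample size)."""
    return population_sd / math.sqrt(sample_size)

def population_sd_from_sample_means(sampling_sd, sample_size):
    """The same relationship rearranged to recover the population SD."""
    return sampling_sd * math.sqrt(sample_size)

population_sd = population_sd_from_sample_means(0.4, 100)
print(math.isclose(population_sd, 4.0))                      # True
print(math.isclose(sd_of_sample_means(4.0, 100), 0.4))       # True
```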

Center

The middle of the data set. There are many measures of center.

Mode

The most frequently appearing number in a set of quantitative data or most frequently occurring category in a set of qualitative data.

First/Lower Quartile

The number at which approximately 25% of the data set falls at or below that value.

Second Quartile/Middle Quartile/Median

The number at which approximately 50% of the data set falls at or below that value.

Third/Upper Quartile

The number at which approximately 75% of the data set falls at or below that value.

Cumulative Frequency

The number of data points that fall within or below a given bin of data. (How many times it happened)

Spread

The numerical description of how close the numbers are to the center.

Relative Cumulative Frequency

The percent of data points that fall within or below a given bin of data. Calculated the same way as relative frequency.
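To make the calculation concrete, here is a minimal sketch that turns a list of bin frequencies into cumulative and relative cumulative frequencies. The function names and the bin counts are made up for illustration.

```python
from itertools import accumulate

def cumulative_frequencies(frequencies):
    """Running total of the bin frequencies, from the lowest bin upward."""
    return list(accumulate(frequencies))

def relative_cumulative_frequencies(frequencies):
    """Each running total divided by the total number of data points."""
    total = sum(frequencies)
    return [running / total for running in accumulate(frequencies)]

# Hypothetical bin counts; the third bin is empty.
bins = [2, 5, 0, 3]
print(cumulative_frequencies(bins))           # [2, 7, 7, 10]
print(relative_cumulative_frequencies(bins))  # [0.2, 0.7, 0.7, 1.0]
```

The empty third bin leaves the running total unchanged, so a plot of these values would stay flat across that bin.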

Relative Frequency

The percent of the data points that take a particular value. This is obtained by dividing the frequency of each value by the total number of data points.

Shape

The qualitative description of the clustering of data points in a certain location when the data are graphed.

Outcomes

The singular result of a chance experiment.

Variance

The square of standard deviation. While it has some uses in statistics, it is not a practical unit of measurement. It is calculated the same way as standard deviation, but without the square root.
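A quick check of the relationship in Python, using the standard `statistics` module. The data list and the helper name `variance_from_sd` are invented for this example, and the sample versions (`stdev`, `variance`) are used here as an assumption; the population versions `pstdev` and `pvariance` are related in the same way.

```python
import math
import statistics

def variance_from_sd(sd):
    """Variance is the square of the standard deviation."""
    return sd ** 2

# Strawberry question: an SD of $0.40 gives a variance of 0.16.
print(math.isclose(variance_from_sd(0.40), 0.16))  # True

# Hypothetical data set: squaring the SD recovers the variance.
data = [12, 15, 11, 19, 14, 16]
sd = statistics.stdev(data)
var = statistics.variance(data)
print(math.isclose(sd ** 2, var))  # True
```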

Standard Deviation of a Distribution of Sample Means

The standard deviation of all possible means from all possible samples of a given size. It will be equal to the standard deviation of the original population, divided by the square root of the sample size.

Data Analysis

The understanding of the key features of a set of data: shape, center, spread, and outliers.

Median

The value that is in the middle of a data set when the set is arranged from least to greatest.

Quartiles

The values that divide the data set into four equal partitions.

Scales

The way an axis on a graph is measured. Inappropriate scaling can lead to a misleading graph.

Back to Back Stem and Leaf Plot

Two stem and leaf plots on the same set of stems. This allows us to compare the distributions of two different categories.

Perceptual Distortion

Using area or three-dimensional visual tricks to make certain values appear bigger or smaller than they are.

Mean Formula

(x1 + x2 + x3 + ... + xn), divided by n

Z Score Formula

Z-score for a sample: z = (x - x̄) / s. Z-score for a population: z = (x - μ) / σ. Here x is the raw score, x̄ or μ is the mean, and s or σ is the standard deviation. In words: raw score minus mean, divided by the standard deviation.

Event

An outcome or set of outcomes.

