First 45

On the test, he will ask for the five-number summary in the following order: xmin, QL, M, QU, xmax

...

The intercept of a regression line tells a person the predicted mean y-value when the x-value is _______.

0

The complement of "at least one" is ___.

"none."

Value of P(Ā)

1-P(A)

If E represents any event and Ec represents the complement of E, then P(Ec)=__________.

1-P(E)

Volume of water in a swimming pool

Continuous because it is not countable

RATIO

Determine which of the four levels of measurement​ (nominal, ordinal,​ interval, ratio) is most appropriate. Ages of Children: 4, 5, 6, 7 and 8

Class

Each raw data value is placed into a quantitative or qualitative category called a class

THEORETICAL

IT IS BASED ON A PREDICTABLE OUTCOME

Define the Mode.

Is the value that occurs with the greatest frequency.

A small p-value does what?

It discredits the null hypothesis.

A ____________________ is a bar graph in which the bars are drawn in decreasing order of frequency or relative frequency.

Pareto chart

n

Sample size

For data sets having a distribution that is approximately​ bell-shaped, _______ states that about​ 68% of all data values fall within one standard deviation from the mean.

The empirical rule

Listed below are the jersey numbers of 11 players randomly selected from the roster of a championship sports team. What do the results tell us?

The jersey numbers are nominal data and they do not measure or count​ anything, so the resulting statistics are meaningless.

Upper Class Limits

The largest numbers that can belong to the different classes.

Under what conditions is the median preferred?

The median is preferred when the data is strongly skewed or has outliers.

Standardized Score

The number of standard deviations that a piece of data lies above or below the mean. Z = (X - μ) / σ

Is the length of a newborn baby discrete or continuous?

The random variable is continuous.

Is the length of a song discrete or continuous?

The random variable is continuous.

Σ is called ___ and means ___

Uppercase sigma, and means the "sum of terms [xi]"

1. The sample of paired (x, y) data is a random sample. 2. Visual examination of the scatterplot shows that the points approximate a straight-line pattern. 3. Remove outliers.

What are the requirements for a regression line?

In a symmetric and bell-shaped distribution, are the mean, median and mode the same?

Yes.

measure of center

a value at the center or middle of a data set; there are several ways to determine the center, with different definitions such as mean, median, mode, and midrange

The heights of the bars of a histogram correspond to

frequency values

σ²

population variance

The symbol for sample standard deviation is

s

When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the​ mean

z-score.

Census

A census is the collection of data from every member of the population. It is not a measure of center.

Fill in the blank. A​ _______ histogram has the same shape and horizontal scale as a​ histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.

A relative frequency histogram has the same shape and horizontal scale as a histogram but the vertical scale is marked with relative frequencies instead of actual frequencies.

POPULATION

ANY NUMBER FROM A PARAMETER IS A

Class Midpoints

Add the lower class limit and the upper class limit, then divide by 2 (e.g., 60 + 69 = 129, and 129 / 2 = 64.5).

Changing the width of bins in a histogram _______.

Changes the shape of the histogram

Determine whether the given value is from a discrete or continuous set: "The height of a 2-year-old maple tree is 28.3 ft."

Continuous

Statistic

Describes characteristics of a sample

Determine whether the value is from a discrete or continuous data: Number of cars owned is 7

Discrete

A(n) _______ is any collection of outcomes from a probability experiment.

Event

The histogram to the right represents the weights​ (in pounds) of members of a certain​ high-school math team. How many team members are included in the​ histogram?

In the histogram​ above, the​ y-axis is ticked for every​ 1-person increase in​ frequency, so the number of team members in each class is given by the height of the bar in ticks.

x

Is the variable usually used to represent the individual data values.

DESCRIPTIVE STATISTICS

Methods used that summarize or describe characteristics of data are called​?

What is the term for a group of objects or people to be studied? Estimator Sample Census Population

Population

A(n) __________ is a numerical measure of the outcome of a probability experiment.

Random variable

Which statement is NOT true regarding the mean?

The mean is always the best measure of center.

mode

The mode of a variable is the most frequent observation of the variable that occurs in the data set. *If no observation occurs more than once, then there is NO MODE.

Data for two variables.

What is bivariate data?

When is a point influential?

When omitting the observation would result in a very different regression equation

sample arithmetic mean, x̄ (pronounced "x bar")

computed using sample data; the sample mean is a statistic

Quartiles

measures of location, denoted Q1, Q2, Q3, which divide a set of data into four groups with about 25% of the values in each group.

The measure of center that is the value that occurs with the greatest frequency is the

mode

A(n) ___ distribution has a "bell" shape.

normal

Frequency values

...

For a particular regression analysis, it is found that SST = 900.0 and SSE = 400.00. Calculate the coefficient of determination

0.556 (by the regression identity SST = SSR + SSE, SSR = 900 - 400 = 500, so r² = SSR/SST = 500/900 ≈ 0.556)
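
The arithmetic behind this card can be checked with a short sketch. The function name `coefficient_of_determination` is made up for illustration; the only fact it relies on is the identity SST = SSR + SSE.

```python
def coefficient_of_determination(sst: float, sse: float) -> float:
    """Return r^2 = SSR / SST, using the identity SST = SSR + SSE."""
    ssr = sst - sse  # regression (explained) sum of squares
    return ssr / sst


r_squared = coefficient_of_determination(sst=900.0, sse=400.0)
print(round(r_squared, 3))  # 0.556
```

Read this as: about 55.6% of the variation in y is explained by the regression line.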

Median

50% of data is above and 50% of data is below; resistant, little change

Coefficient of variation

CV = (s / x̄) × 100%

variance

THE SQUARE OF THE STANDARD DEVIATION.

Continuous

Many possible values

Which of the statements below is true concerning bar graphs?

The height of each bar represents the category's frequency or relative frequency.

Which measure of variation is very sensitive to extreme values?

range

3 important measures of variation

range, standard deviation, and variance.

Given a collection of paired sample data, the _________ y=b0 + b1x algebraically describes the relationship between the two variables, x and y.

regression equation

In a​ _______ distribution, the frequency of a class is replaced with a proportion or percent.

relative frequency​

n means

sample size

You are given information about a straight line. Determine whether the line slopes upward, slopes downward, or is horizontal. The equation of the line is y = 10 - 12x.

slope is downward

z-score transformation

statistical technique that uses the mean and standard deviation to transform each raw score into a standard score

Fill in the blank. Class width is found by​ _______.

subtracting a lower class limit from the next consecutive lower class limit.

Class width is found by _____________

subtracting the lower class limit from the next consecutive lower class limit

(Σxi)/N means

the sum of all x values divided by N; this is the population mean

The mean measures

the center of distribution

the higher the standard deviation

the more spaced out and dispersed the bell shape.

In a symmetric and​ bell-shaped distribution,

the​ mean, median, and mode are the same

Bars in a histogram_________?

touch

bimodal

two data values occur with the same greatest frequency

When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a ______.

z-score

"At least one" is equivalent to ____.

"one or more"

The average score on an aptitude test is 80 with a standard deviation of 10. One person scored a 65. What is that person's z-score?

-1.5
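
As a quick check of this answer, here is a minimal sketch of the z-score formula z = (x - mean) / standard deviation. The function name `z_score` is an illustrative choice, not something from the course materials.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


z = z_score(x=65, mean=80, std_dev=10)
print(z)  # -1.5
```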

The empirical rule is also known as the

68%-95%-99.7% rule

Fill in the blank. ​A(n) _______ uses line segments to connect points located directly above class midpoint values.

A(n) frequency polygon uses line segments to connect points located directly above class midpoint values.

When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the​ mean, we call the new value a​ _______.

Z-score

A data value is considered ___ if its z-score is less than -2 or greater than 2.

unusual

relative frequency polygon

A variation of the basic frequency polygon that uses relative frequencies (proportions or percentages) for the vertical scale. To compare two data sets, graph two relative frequency polygons on the same axes.

frequency distribution

helps us understand the nature of the distribution of a data set.

You roll a six-sided die 56 times and land on an ace (a one) 7 times. You want to test the hypothesis that the die does not come up with an ace one-sixth of the time. Determine the null hypothesis.

H0: p=1/6

Exists between two variables when the values of one variable are somehow associated with the values of the other variable.

Define correlation.

A​ _______ helps us understand the nature of the distribution of a data set.

Frequency Distribution

Determine which of the 4 levels of measure is the most appropriate: Years of elections: 1988, 1990, 1992, 1994, and 1996

Interval

Look at the #42 charts and answer the questions: Is there strong evidence suggesting that the data are not from a population having a normal distribution?

No, the distribution is not dramatically far from being a normal distribution with a "bell" shape, so there is not strong evidence against a normal distribution.

Is it OK to say "average" instead of mean?

No.

Determine whether the given description corresponds to an experiment or an observational study: A stock analyst selects a stock from a group of twenty for investment by choosing the stock with the greatest earnings per share reported for the last quarter.

Observational study

Identify the study as an observational study or a designed experiment. An educational researcher used school records to determine that, in one school district, 84% of children living in two-parent homes graduated high school while 75% of children living in single-parent homes graduated high school.

Observational study

Which of the following is always true? -For skewed data, the mode is farther out in the longer tail than the median. -The mean and median should be used to identify the shape of the distribution. -Data skewed to the right have a longer left tail than right tail. -In a symmetric and bell-shaped distribution, the mean, median, and mode are the same.

In a symmetric and bell-shaped distribution, the mean, median, and mode are the same

Identify which type of sampling is used: The name of each contestant is written on a separate card, the cards are placed in a bag, and three names are picked from the bag Simple Random Cluster Convenience Stratified Systematic

Simple Random

To determine customer opinion of their musical variety, Sony random selected 110 concerts during a certain week and surveys all concert goers. What type of sampling is this?

Cluster

Find the median for the given sample data. The salaries of ten randomly selected doctors are shown below. 150,000 143,000 165,000 238,000 215,000 129,000 139,000 723,000 217,000 166,000

$165,500
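
The median above can be reproduced with Python's standard `statistics` module. This is only a sketch of the check; the variable name `salaries` is an illustrative choice.

```python
from statistics import median

# The ten doctor salaries from the card, in dollars.
salaries = [150_000, 143_000, 165_000, 238_000, 215_000,
            129_000, 139_000, 723_000, 217_000, 166_000]

# With an even count, the median is the mean of the two middle sorted values
# (here 165,000 and 166,000).
median_salary = median(salaries)
print(median_salary)  # 165500.0
```

Note that the $723,000 outlier has no effect on the result, which is why the median is preferred for data like this.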

Z score rules

...

If the sample is collected without replacement, which of the following conditions regarding the population must be met to apply the Central Limit Theorem for Sample Proportions? 50 times bigger 5 times bigger 10 times bigger 100 times bigger

10 times bigger

Look at #23 chart and answer the question: The histogram represents ___ debate team members.

11

On a test, 74% of the questions are answered correctly. If 111 questions are correct, how many questions are on the test? 37 questions 67 questions 150 questions 82 questions

150 questions

How to know significant difference in coefficient of variation

5%

which percent of observations are expected to lie within 1 standard deviation of the mean?

68% (about 34% on each side of the mean)

Suppose we have 10 exam scores for an introductory statistics course. The scores are 78, 98, 94, 89, 86, 77, 76, 80, 75, 68. The median score is

79

Determine whether the given value is from a discrete or continuous data set. When a car is randomly selected, it is found to have an engine with 6 cylinders.

A discrete data set because there are a finite number of possible values.

Define measure of center.

A value at the center or middle of a data set.

Which of the following is NOT a principle of​ probability

All events are equally likely in any probability procedure

No, because the frequencies are roughly equal across the voltage classes.

Does the result appear to have a normal​ distribution? Why or why​ not?

​A(n) _______ uses line segments to connect points located directly above class midpoint values.

Frequency Polygon

You are receiving a large shipment of batteries and want to test their lifetimes. Explain why you would want to test a sample of batteries rather than the entire population. If you test all the batteries you cannot form any conclusions about the population. If you test all the batteries to failure you would have no batteries to sell. The percentage of defective batteries can change in the time it takes you to test all the batteries.

If you test all the batteries to failure you would have no batteries to sell.

Descriptive Statistics

Methods & tools that summarize or describe relevant characteristics of data.

descriptive statistics

Methods used that summarize or describe characteristics of data

Fill in the blank. ​_______ are sample values that lie very far away from the majority of the other sample values.

Outliers are sample values that lie very far away from the majority of the other sample values.

"Relative frequency" is the same as which of the following?

Proportion

Which of the following corresponds to the case when every sample of size n has the same chance of being​ chosen?

Simple Random Sample

Experiments used to produce empirical probabilities are called what?

Simulations

Why is it important to learn about bad​ graphs?

So that we can critically analyze a graph to determine whether it is misleading

kth percentile

The ..., denoted Pk, of a set of data is a value such that k percent of the observations are less than or equal to the value.

Census

The collection of data from every member of the population.

Computing GPA

The grading system assigns quality points to letter grades as follows: A = 4; B = 3; C = 2; D = 1; F = 0. Use the formula for the weighted mean, where w = the number of credits for each course and x = the quality points that replace each letter grade.
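
A minimal sketch of the weighted-mean calculation, x̄ = Σ(w·x) / Σw, applied to a GPA. The course list below is made up for illustration, and the names `QUALITY_POINTS` and `gpa` are hypothetical.

```python
# Quality points per letter grade, as given on the card.
QUALITY_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def gpa(courses):
    """Weighted mean of quality points; courses is a list of (grade, credits)."""
    total_points = sum(QUALITY_POINTS[grade] * credits for grade, credits in courses)
    total_credits = sum(credits for _, credits in courses)
    return total_points / total_credits


# Hypothetical transcript: (letter grade, credit hours).
example_courses = [("A", 3), ("B", 4), ("C", 3), ("A", 1)]
print(round(gpa(example_courses), 2))  # 3.09
```

The credits act as the weights, so a 4-credit B pulls the GPA more than a 1-credit A.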

In a relative frequency​ distribution, what should the relative frequencies add up​ to?

The relative frequencies should add up to 1

Lower Class Limits

The smallest numbers that can belong to the different classes.

properties of standard deviation

The units of the standard deviation are the same as the units of the original data, the standard deviation is a measure of variation of all data values from the mean, the value of the standard deviation is never negative

Look at #10 charts and answer the two questions: Based on the distribution, do the weights appear to be reported or actually measured? What can be said about the accuracy of the results?

The weights appear to be reported because there are disproportionately more 0s and 5s. They are likely not very accurate because they appear to be reported.

Close to 0.

What is the value of Σ(Zx*Zy) if the points follow no linear pattern?

NOT a property of the standard​ deviation

When comparing variation in samples with very different​ means, it is good practice to compare the two sample standard deviations.

What is the formula to determine the x-value from z-score?

x = μ + zσ (the mean plus z multiplied by the standard deviation)
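
A small sketch of this conversion, x = mean + z × standard deviation. The function name `x_from_z` and the example numbers (mean 80, standard deviation 10) are illustrative assumptions.

```python
def x_from_z(z: float, mean: float, std_dev: float) -> float:
    """Recover the raw value x from a z-score: x = mean + z * std_dev."""
    return mean + z * std_dev


# Illustrative numbers: a z-score of -1.5 on a test with mean 80 and SD 10.
print(x_from_z(z=-1.5, mean=80, std_dev=10))  # 65.0
```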

Look at #49 chart and answer the question: Does the graph distort the data? Why or why not?

Yes, because the graph incorrectly uses objects of volume to represent the data.

A value at the center or middle of a data set is a(n)

measure of center

OUTLIER

In modified boxplots, a data value is a(n) ___ if it is above Q3 + (1.5)(IQR) or below Q1 - (1.5)(IQR)
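
The 1.5 × IQR rule on this card can be sketched as follows. The function names and the quartile values Q1 = 72 and Q3 = 78 are assumptions chosen for illustration.

```python
def outlier_fences(q1: float, q3: float) -> tuple:
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value: float, q1: float, q3: float) -> bool:
    """True if value lies below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper


# Illustrative quartiles: IQR = 6, so the fences are 63.0 and 87.0.
print(outlier_fences(72, 78))   # (63.0, 87.0)
print(is_outlier(90, 72, 78))   # True
print(is_outlier(80, 72, 78))   # False
```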

A(n) ____ distribution has a "bell" shape.

normal

If the data points fall in a ____, the correlation is equal to zero.

random pattern

A ___ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.

scatterplot

A histogram aids in analyzing the ___ of the data.

shape of the distribution

LOOK AT REAL LINE WITH LOF, LIF, UIF, UOF NUMBERS, AND OUTLIERS IN SLIDE 87!!!

study for midterm

Class width is found by

subtracting a lower class limit from the next consecutive lower class limit

Class width

the difference between consecutive lower class limits; e.g., for the classes 60-69, 70-79, 80-89 the width would be 10

It can be slope.

What else can the intercept (from the excel) be?

A range of values used to estimate a variable.

What is a prediction interval?

When the original data values are arranged in order of increasing (or decreasing) magnitude, the middle value is called the _________

median

The ___ for a procedure consists of all possible simple events or all outcomes that cannot be broken down further.

sample space

When determining whether there is a correlation between two variables, one should use a ______ to explore the data visually.

scatter-plot

Class width is found by​

subtracting a lower class limit from the next consecutive lower class limit

Quartile

Q1= 25th percentile; Q2= 50th percentile; Q3= 75th percentile

Inter-quartile range

Q3 minus Q1

1. Should not have any obvious pattern. 2. Should not become wider (or thinner) when viewed from left to right.

What are the criteria for a residual plot?

If the sum of the squares of the residuals is the smallest sum possible.

What is the least-squares property?

5 number summary

on a box plot, minimum, Q1, median, Q3, maximum

A magician claims he can cause a coin to come up heads more than 50% of the time. A coin is flipped 50 times, and 44 heads come up. Determine the null hypothesis.

p=0.50

A​ _______ is a plot of paired data​ (x,y) and is helpful in determining whether there is a relationship between the two variables.

scatterplot

The symbol for population variance

sigma squared, σ²

The symbol for population standard deviation is

sigma, σ

A data value is considered _______ if its z-score is less than -2 or greater than 2.

significantly low or significantly high

the symbol for population standard deviation is

σ (lowercase sigma)

the symbol of population variance is

σ² (lowercase sigma squared)

Standard deviation of a population

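Uses a slightly different formula than the sample standard deviation: the sum of squared deviations from μ is divided by N rather than n - 1.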

In the binomial probability formula, the variable x represents the ___.

the number of successes

We utilize statistical graphs

to look for features that reveal some useful or interesting characteristics of the data set

Which of the following is a common distortion that occurs in graphs? a. Using bars to represent the frequency of data values. b. Using points above the class midpoints at the heights of the class frequencies. c. Labeling both axes d. Using a two-dimensional object to represent data that are one-dimensional in nature.

d.

A ___ is a graph of each data value plotted as a point.

dotplot

When given the mean and SD...how to find if data is unusual?

find usual min and max and compare

Box-and-Whiskers Plot

graph representing information about the five-number summary and outliers for a given data set

dotplot

is a graph of each data value plotted as a point

Cluster sample

is obtained by dividing the population into groups and selecting all individuals from within a random sample of the groups

Side by Side Bar Graphs

is used when comparing data from two or more different data sets

In a boxplot of a distribution that is skewed right, the median is ___ of the box.

to the left of the center

If the data set is skewed (left or right), and/or there are outliers, then

the best measure of center is the median, and the best measure of dispersion is IQR/2 = (Q3 - Q1)/2

census

the collection of data from every member of the population. It is not a measure of center.

The standard deviation is used in conjunction with the ______ to numerically describe distributions that are bell shaped

mean

Which of the following is NOT a value in the​ 5-number summary?

mean

The standard deviation is used in conjunction with the​ ______ to numerically describe distributions that are bell shaped. The​ ______ measures the center of the​ distribution, while the standard deviation measures the​ ______ of the distribution.

mean, mean, spread

Percentiles

measures of location, denoted by P_...which divide a set of data into 100 groups with about 1% of the values in each group. One type of quantiles or fractiles which partition data into groups with roughly the same number of values in each group. Measure of location.

The measure of center that is the value that occurs with the greatest frequency is the​ _______.

mode

the measure of center that is the value that occurs with the greatest frequency is the _______?

mode

For a scatterplot, when the slope of the line in the plot is negative, the correlation is ____.

negative

What does w denote?

Denotes weights, which are assigned to different data values.

There is a positive linear correlation

If the scatterplot shows a distinct straight-line, or linear, pattern, what can we say? As the x-values increase, the corresponding y-values also increase. ex; r = .851

Look at #38 chart and question and answer the questions: Construct a scatterplot on the calculator. Does there appear to be a correlation between the president's height and his opponent's height?

No, there does not appear to be a correlation because there is no general pattern to the data.

Refer to the table summarizing service times​ (seconds) of dinners at a fast food restaurant. How many individuals are included in the​ summary? Is it possible to identify the exact values of all of the original service​ times?

No. The data values in each class could take on any value between the class​ limits, inclusive.

Determine which of the four levels of measurement​ (nominal, ordinal,​ interval, ratio) is most appropriate. Favorite films

Nominal

μ

Represent the mean of all values in a population.

P(A) + P(Ā) = 1 is one way to express the ____.

Rule of complementary events.

Statistics

The science of collecting, organizing, summarizing, and analyzing information to draw conclusions and answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.

VARIANCE

The square of the standard deviation is called the​ __

What does Σx represent?

The sum of all data values.

influential points.

What are outliers and special points called?

Weighted Mean

When different (x) data values are assigned different weights (w).

Methods used that summarize or describe characteristics of data are called______ statistics.

descriptive

A ___ helps us understand the nature of the distribution of a data set.

frequency distribution

A ____ indicates the shape and nature of the distribution of a data set.

frequency distribution

A ________ helps us understand the nature of the distribution of a data set

frequency distribution

A(n) ___ uses line segments to connect points located directly above class midpoint values.

frequency polygon

Coefficient of variation, or CV

For a set of nonnegative sample or population data, the coefficient of variation (CV), expressed as a percent, describes the standard deviation relative to the mean: CV = (s / x̄) × 100%. Used when means are substantially different, or when the samples use different scales or measurement units. Round CV to one decimal place.
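
A minimal sketch of the CV calculation for sample data, using the standard `statistics` module. The data values and the function name `coefficient_of_variation` are made up for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(sample) -> float:
    """CV = (s / x-bar) * 100, with s the sample standard deviation."""
    return stdev(sample) / mean(sample) * 100


# Illustrative sample: mean 5, sample standard deviation sqrt(32 / 7).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(coefficient_of_variation(sample), 1))  # 42.8
```

Because the CV is a unitless percent, it can compare variation between samples measured in different units, such as heights in inches and weights in pounds.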

A ____ correlation means that if one variable gets bigger, the other variable tends to get smaller.

negative

Fill in the blank. ​A(n) _______ distribution has a​ "bell" shape.

normal

Outliers

numbers that lie far from all the other numbers (CVDOT)

A ___ histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.

relative frequency

Unbiased estimator

The sample variance s² is an unbiased estimator of the population variance σ². Values of s² tend to target the value of σ² instead of systematically tending to overestimate or underestimate σ².

What does the z-score number represent?

the number of standard deviations from the mean. Aka standardized scores.

no mode

when no data value is repeated

When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a _________

z-score

The sum of the deviations about the mean always equals

zero

Qualitative

Categorical data

What is the most common trick to mislead readers of bar graphs?

Change the scale of the vertical axis so that it does not start at 0.

Identify the variable as either continuous or discrete: The number of limbs on a randomly selected oak tree.

Discrete

The mean represents the typical value in a set of data for what type of distribution?

For distributions that are roughly symmetric

Round-off rules

For mean, median and midrange, carry one more decimal place than is present in the original set of values. For mode leave as is without rounding. Example: mean of 2, 3 and 5 is 3.3333333. Round to 3.3.

When events A and B are said to be independent, what does that mean?

Knowledge that event B occurred does not change the probability of event A occurring.

When analyzing two quantitative variables, what is the first thing that should be done?

Make a scatterplot.

Suppose a fair die is rolled ten times and the result is recorded each time. Does this constitute a binomial experiment? Why or why not?

No, because there are more than two outcomes for each trial.

Is this a property of the standard deviation? When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.

No, it is not a good practice to compare the two sample standard dev. in samples with very different means.

relationship between median, mean, and distribution shape... 1) Skewed left 2) Symmetric 3) Skewed right

1) mean < median 2) mean = median 3) mean > median

The computed mean and the actual mean are considered close if the difference is less than ____ of the actual mean. Otherwise, the means are said to be __________ different.

1. 5% 2. substantially.

How to Calculate Quartiles

1. Arrange the data in ascending order. 2. Determine the median (M) = Q2. 3. Divide the data set into halves: the observations below M and the observations above M. The first quartile (Q1) is the median of the bottom half, and the third quartile (Q3) is the median of the top half.
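
The steps above can be sketched as a short function. It assumes that when the number of observations is odd, the median itself is left out of both halves; textbooks and software differ on this convention, so treat it as one reasonable reading. The function name `quartiles` and the sample data are illustrative.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumes at least two values. For an odd count, the middle value is
    excluded from both halves.
    """
    data = sorted(values)            # step 1: ascending order
    n = len(data)
    half = n // 2
    lower = data[:half]              # observations below M
    upper = data[half + 1:] if n % 2 else data[half:]  # observations above M
    return median(lower), median(data), median(upper)


print(quartiles([7, 1, 3, 9, 5, 11, 13]))    # (3, 7, 11)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))   # (2.5, 4.5, 6.5)
```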

A management survey for a company surveyed 235 employees. 44.7% of the employees surveyed were females. The number of males would be: 130 105 13 Unable to determine

130

The following frequency distribution analyzes the scores on a math test. scores: number of students 40-59: 2 60-75: 4 76-82: 6 83-94: 15 95-99: 5 Find the midpoint of the class interval 40-59.

49.5

The following are speeds (mi/h) of cars measured with a radar gun. Determine the 5-number summary and boxplot for the data given below. 70, 70, 71, 72, 72, 73, 73, 75, 76, 76, 77, 78, 78, 79, 80 The 5-number summary is - - - - - Create a boxplot on the calculator

70, 72, 75, 78, 80
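
For checking the answer without a calculator, here is a sketch that uses `statistics.quantiles` from the Python standard library. Its default "exclusive" method agrees with the card for this data set, but other quartile conventions can give slightly different values on other data. The variable names are illustrative.

```python
from statistics import quantiles

# Car speeds (mi/h) from the card.
speeds = [70, 70, 71, 72, 72, 73, 73, 75, 76, 76, 77, 78, 78, 79, 80]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(speeds, n=4)
five_number_summary = (min(speeds), q1, q2, q3, max(speeds))
print(five_number_summary)  # (70, 72.0, 75.0, 78.0, 80)
```

The box of the boxplot spans Q1 to Q3 (72 to 78), with a line at the median (75) and whiskers out to 70 and 80.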

Below are 36 sorted ages of an acting award winner Find P90 using the method presented in the textbook. 18 19 20 22 25 27 27 33 38 41 42 43 46 51 53 54 55 56 57 58 62 63 65 69 70 71 72 72 74 74 74 76 80 80 80

76

UNUSUAL

A data value is considered __________ if its​ z-score is less than minus 2 or greater than 2

Since, in general, the longer a car is owned the more miles it travels one can say there is a _______ between age of a car and mileage.

A positive association

Z-Score

A z score​ (or standardized​ value) is the number of standard deviations that a given value x is above or below the mean. A negative z score corresponds to an x value less than the mean. A positive z score corresponds to an x value greater than the mean. The more negative the z​ score, the further the x value is below the mean. The more positive the z​ score, the further the x value is above the mean.

relative frequency

A​ ________________ __________________ histogram has the same shape and horizontal scale as a​ histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.

True hypothesis Which of the following describes the value of the z-test statistic that is likely to result? Explain your choice. (i) The z-test statistic will be close to 0. (ii) The z-test statistic will be far from 0.

Statement I is more accurate. The null hypothesis is true, so the test statistic is likely to be close to what the null hypothesis predicts.

An instructor at the College of Lake County is interested in the average number of days that CLC math students are absent from class during a semester. She selects a random sample of students from the college and for each measures the number of days that the student was absent. Her sample produces an average number of days absent of 3.5 days. This value is an example of a:

Statistic

Determine whether the given value is a statistic or a parameter. "A health and fitness club surveys 40 randomly selected members and found that the average weight of those questioned is "

Statistic

The normal quantile plot shown to the right represents duration times​ (in seconds) of eruptions of a certain geyser from the accompanying data set. Examine the normal quantile plot and determine whether it depicts sample data from a population with a normal distribution.

The distribution is normal. The points are reasonably close to a straight line and do not show a systematic pattern that is not a straight-line pattern.

Look at #51 chart and answer the questions: Compare the results.

The distribution of pulse rates for men is concentrated, centered around 60, whereas the distribution of pulse rates for women is more spread out, centered around 70.

If all the data values in a population are converted to z-scores, the distribution of z-scores will have what mean?

The mean of the z-scores will be zero.

Is the time required to download a file from the Internet discrete or continuous?

The random variable is continuous.

Is the number of bald eagles in the country discrete or continuous?

The random variable is discrete.

In this section we use r to denote the value of the linear correlation coefficient. Why do we refer to this correlation coefficient as being​ linear?

The term linear refers to a straight​ line, and r measures how well a scatterplot fits a​ straight-line pattern.

A scatterplot of the (x, y) values after each of the y-coordinate values has been replaced by the residual value y - ŷ (where ŷ denotes the predicted value of y). That is, a residual plot is a graph of the points (x, y - ŷ).

What is a residual plot?

Grouped Frequency Distribution

When the range of the data is large, the data must be grouped into classes that are more than one unit in width, in what is called a grouped frequency distribution.

dot plot

consists of a graph in which each data value is plotted as a point along a scale of values. dots representing equal values are stacked

A ___ random variable has infinitely many values associated with measurements.

continuous

A __________ random variable has infinitely many values which can be plotted on a number line in an uninterrupted fashion.

continuous

Find the sample variance and standard deviation. 23​, 11​, 5​, 9​, 10

Do on a calculator: sample variance s^2 = 45.8, sample standard deviation s ≈ 6.8
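
As a cross-check on the calculator, here is a minimal sketch using Python's standard `statistics` module. It assumes the five values are a sample, so the variance divides by n - 1.

```python
import statistics

data = [23, 11, 5, 9, 10]

# Sample variance: sum of squared deviations from the mean, divided by n - 1
sample_variance = statistics.variance(data)

# Sample standard deviation: square root of the sample variance
sample_sd = statistics.stdev(data)

print(round(sample_variance, 1), round(sample_sd, 1))
```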

A national consumer magazine reported the following correlations. The correlation between car weight and car reliability is -0.30. The correlation between car weight and annual maintenance cost is 0.20. Which of the following statements are true? I. Heavier cars tend to be less reliable. II. Heavier cars tend to cost more to maintain. III. Car weight is related more strongly to reliability than to maintenance cost. a. I only b. II only c. III only d. I and II only e. I, II, and III

e.

In a television​ advertisement, a company called​ "Waist Away" claimed the workout program on their set of DVDs would help people lose weight more than any other DVD workout program. To test this​ claim, an independent​ company, called​ "Slim Down," selected one other DVD program. They then randomly assigned half the volunteers to the Waist Away program and the other half to the Slim Down program. Each participant was weighed before they started the program and then regularly participated in their assigned program for one month. After one​ month, each participant was weighed again. The percent of weight lost was recorded for each​ person, where negative values indicated a weight gain. What type of study was​ performed?

experiment

An insurance company crashed four cars of the same model at 5 miles per hour. The costs of repair for each of the four crashes were ​$437​, ​$411​, ​$487​, and ​$248 . Compute the​ mean, median, and mode cost of repair.

mean: 395.75 median: 424 mode: none (no value repeats)
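
A small sketch of the same computation with the standard `statistics` module. The `has_mode` check is just one illustrative way to detect that no value repeats.

```python
import statistics

costs = [437, 411, 487, 248]

mean_cost = statistics.mean(costs)      # sum of values / number of values
median_cost = statistics.median(costs)  # mean of the two middle values (even count)

# multimode returns every value tied for the highest frequency;
# if that is every distinct value, nothing repeats and there is no mode
modes = statistics.multimode(costs)
has_mode = len(modes) < len(set(costs))

print(mean_cost, median_cost, has_mode)
```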

A value at the center or middle of a data set is a(n) ---

measure of center

A data value is considered ___ if its z-score is greater than or equal to -2 and less than or equal to 2.

ordinary

In a scatter-plot, a(n) _________ is a point lying far away from the other data points.

outlier

Assume that the proportion of people who live after suffering an aneurysm is 0.79. Suppose there is a new medicine that is used to increase the survival rate. Use the parameter p to represent the population proportion of people who survive after an aneurysm. For a hypothesis test of the medicine's effectiveness, researchers use a null hypothesis of p=0.79. What is the correct alternative hypothesis?

p>0.79

A ___ variable is a variable that has a single numerical value, determined by chance, for each outcome of a procedure.

random

_______ is used when subjects are assigned to different groups through a process of random selection

randomization

What measure of variation is sensitive to extreme values?

range

kth percentile

denoted Pk, of a set of data is a value such that k percent of the observations are less than or equal to the value. Percentiles divide a set of data that is written in ascending order into 100 parts; thus 99 percentiles can be determined. Ex. P1 divides the bottom 1% of the observations from the top 99%, P2 divides the bottom 2% of the observations from the top 98% and so on.

Methods used that summarize or describe characteristics of data are called ______ statistics

descriptive

Methods used that summarize or describe characteristics of data are called - statistics.

descriptive

A _________ experiment allows the researcher to claim causation between an explanatory variable and a response variable

designed

Range is the

difference between the largest data value and the smallest

A ___ random variable has either a finite or a countable number of values.

discrete

A __________ random variable has either a finite or countable number of values.

discrete

________ result when the number of possible values is either a finite number or a 'countable' number

discrete data

Events that are ____ cannot occur at the same time.

disjoint (Disjoint events are mutually exclusive and cannot occur at the same time.)

Stratified sample

is obtained by dividing the population into homogeneous groups and randomly selecting individuals from each group

Resistant means

that the measure of central tendency is not altered significantly by extreme values

The four levels of measurement that are commonly used for classifying data are ratio, _________, ________, and _________. a. interval, normal, ordinary b. nominal, ordinal, interval c. nominal, ordinal, categorical d. normal, ordinal, interval

nominal, ordinal, interval

The sample mean is a

statistic

Class width is found by​ _______.

subtracting a lower class limit from the next consecutive lower class limit

x̄ (x with a line above) = Σx / n

sum of all data values / number of data values

Whenever a data value is less than the​ mean, _______.

the corresponding z-score is negative.

For data sets having a distribution that is approximately​ bell-shaped,_________ states that about 68% of all data values fall within one standard deviation from the mean

the empirical Rule

The Empirical Rule

the empirical rule can be used to determine the percentage of data that lie within k standard deviations of the mean. To help organize the empirical rule and make the analysis​ easier, draw a​ bell-shaped curve, as shown to the right. The line in the center of the curve represents the mean. The other lines are each​ 1, 2, and 3 standard deviations away from the mean.

percentiles

The percent of data below a point. It is better to be in the 99th percentile because 99 percent of people are below you. Formula: P = B / T * 100, where B is the number of data points below and T is the total number of data points. Round normally.
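
A minimal sketch of the B / T * 100 formula. The function name `percentile_rank` and the scores are made up for illustration, and "round normally" is taken to mean rounding halves up.

```python
import math

def percentile_rank(value, data):
    """Percent of data points strictly below `value`, rounded half up."""
    below = sum(1 for x in data if x < value)  # B: number of data points below
    total = len(data)                          # T: total number of data points
    percent = below * 100 / total
    return math.floor(percent + 0.5)           # round half up, not banker's rounding

# Hypothetical exam scores
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(85, scores))  # 6 of the 10 scores are below 85
```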

mean formula

the sum of all data values/ the number of data values

The bars in a histogram ___.

touch (without gaps)

Ogive

useful for determining the number of values below some particular value. it's a line graph that depicts cumulative frequencies. uses class boundaries along the horizontal scale (X) and cumulative frequencies along vertical scale (Y)

The square of the standard deviation is called the --

variance

population z-score

z = (x - µ) / σ

What is the formula for the z-score?

z = (x - μ) / σ: the x value minus the mean (mu), divided by the standard deviation (sigma). The numerator x - μ is a *deviation score*. The denominator expresses the deviation in standard deviation units.
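
A short sketch of the z-score formula, rounded to 2 decimal places. The mean of 100 and standard deviation of 15 are hypothetical, and the "unusual" cutoff of 2 follows the rule used elsewhere in this set.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return round((x - mean) / sd, 2)

def is_unusual(z):
    """Unusual if the z-score is less than -2 or greater than 2."""
    return z < -2 or z > 2

# Hypothetical population: mean 100, standard deviation 15
z_low = z_score(85, 100, 15)
z_high = z_score(131, 100, 15)
print(z_low, is_unusual(z_low))
print(z_high, is_unusual(z_high))
```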

The​ _______ represents the number of standard deviations an observation is from the mean.

z-score

When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a_____.

z-score

When a data value is converted to a standardized scale representing the number of st. dev. the data value lies from the mean, we call the new value a __.

z-score.

Frequency Polygon

​A(n) __________________ ______________ uses line segments to connect points located directly above class midpoint values.

Which of the following is always​ true?

A. In a symmetric and bell-shaped distribution, the mean, median, and mode are the same. <-- Correct B. The mean and median should be used to identify the shape of the distribution. C. For skewed data, the mode is farther out in the longer tail than the median. D. Data skewed to the right have a longer left tail than right tail.

Identify the symbols used for each of the​ following: (a) sample standard​ deviation; (b) population standard​ deviation; (c) sample​ variance; (d) population variance.

a. The symbol for sample standard deviation is s. b. The symbol for population standard deviation is σ. c. The symbol for sample variance is s^2. d. The symbol for population variance is σ^2.

What is an influential point?

An influential point is a point that changes the regression equation by a large amount.

Which of the following is NOT a value in the​ 5-number summary?

Mean

Formula to Find the Mean

Mean= Sum of all data values/number of data values.

Mean

Most often called average. The measure of center found by adding the data values and dividing the total by the number of data values. Means drawn from the same population tend to vary less than other measures of center. Uses every data value. Disadvantage: just one outlier can change the value of the mean substantially. So it is NOT a RESISTANT measure of center.

Researchers collect data by interviewing athletes who have won Olympic gold medals from 1992 to 2016. Identify the type of study. Retrospective Cross-sectional Prospective None of these

Retrospective

Distribution Shape and Boxplot

Right Skewed: If the median is to the left of the center of the box, the right whisker is longer than the left one. Symmetric: If the median is at or near the center of the box, the whiskers are of equal lengths. Left Skewed: If the median is to the right of the center of the box, the left whisker is longer than the right one.

Rounding rule:

Round z-scores to 2 decimal places

The Range Rule of Thumb roughly estimates the standard deviation of a data set as _______.

s ≈ range/4

The _______ of a probability experiment is the collection of all possible outcomes.

Sample space

Days before a presidential election, an article based on a nationwide random sample of registered voters reported the following statistic, "52% (±3%) of registered voters will vote for Robert Smith." What is the "±3%" called?

The "±3%" is called the margin of error.

Look at the #43 charts and answer the question: Which graph is more effective in showing the relative importance of the causes of work-related deaths?

The Pareto chart is better because it more clearly draws attention to the main cause of work-related death.

Histogram

The histogram is a graph that displays the data by using contiguous vertical bars (unless the frequency of a class is 0) of various heights to represent the frequencies of the classes.

Relative Frequency Graphs

The histogram, the frequency polygon, and the ogive shown previously were constructed by using frequencies in terms of the raw data. These distributions can be converted to distributions using proportions instead of raw data as frequencies. These types of graphs are called relative frequency graphs.

Determine which of the 4 levels of measurement is the most appropriate for the data below: Years in which a war was started

The interval level of measurement is the most appropriate because the data can be ordered and differences can be found and are meaningful, but there is no natural starting point.

The accompanying data were collected from a statistics class. The column heads give the variable, and each of the rows represents a student in the class. Suppose you decided to code eye color using 1 for Blue and 0 for Not Blue. What would be the label at the top of the column?

The label would be blue.

Mode

The measure of center that is the value that occurs with the greatest frequency is the​ _______. The most frequently occurring score(s) in a distribution.

Which measure of center (mean or median) is resistant? Explain what it means for that measure to be resistant.

The median is resistant because it is not sensitive to extreme values in the data set. If the largest observation was doubled, for example, the median would not change since that largest value does not factor into its computation.

The median

The median of a variable is the value that lies in the middle of the data when arranged in ascending order. We use M to represent the median.

6. When you need to compute a raw score that represents the minimum or maximum score needed to answer a question, look for the percentage in the question, e.g. "What raw scores form the boundaries of the middle 60% of the distribution?"

The middle 60% straddles the mean and can be divided into 2 equal percentages: 30% and 30%. Look for the value closest to .3000 in the *mean to z column* and locate the z-score in that row. Then use that z-score in the formula for computing a raw score: X = μ + zσ
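
The table lookup can be sketched in code. This is an illustration only: the mean of 100 and standard deviation of 15 are hypothetical, and `NormalDist().inv_cdf` stands in for the printed z table (30% between the mean and z corresponds to a cumulative area of 0.80).

```python
from statistics import NormalDist

mu, sigma = 100, 15  # hypothetical population mean and standard deviation

# 30% of the area lies between the mean and z, so 50% + 30% = 80% lies below z
z = NormalDist().inv_cdf(0.80)

# Raw-score boundaries of the middle 60%: X = mu ± z * sigma
lower = mu - z * sigma
upper = mu + z * sigma

print(round(z, 2), round(lower, 1), round(upper, 1))
```

A printed table gives z to two decimals, so hand answers can differ slightly from the software values.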

Ring sizes typically range from about 3 to about 14. Based on what you know about gender differences, if we randomly select a person, are the event that ring size is smaller than size 5 and that the person is a male independent or associated? Explain.

The two events are associated because men on average have larger hands than women and this affects the probability of being smaller than size 5.

Cumulative Frequency Distribution

A cumulative frequency distribution is a distribution that shows the number of data values less than or equal to a specific value (usually an upper boundary).

Fill in the blank. A​ _______ helps us understand the nature of the distribution of a data set.

A frequency distribution helps us understand the nature of the distribution of a data set.

Identify the type of observational study used: A town obtains current employment data by polling 10,000 of its citizens this month. Prospective Retrospective Cross-sectional None of these

Cross-sectional

Interval

Determine which of the four levels of measurement​ (nominal, ordinal,​ interval, ratio) is most appropriate. Monthly temperatures: 65° F, 70° F, 75° F, 80° F, and 85° F Choose the correct answer below: Ratio Nominal Interval Ordinal

Determine whether the given value is from a discrete or continuous set " The total number of phone calls a sales representative makes in a month is 425."

Discrete

True or​ False: A data set will always have exactly one mode.

False. The mode of a variable is the most frequent observation of the variable that occurs in the data set. To compute the mode, tally the number of observations that occur for each data value. The data value that occurs most often is the mode. A set of data can have no mode, one mode, or more than one mode. If no observation occurs more than once, the data have no mode.

A researcher hypothesizes more than 85% of Americans own a cell phone. Which of the following would be an example of researchers making a Type II Error?

From a study conducted, researchers failed to reject their null hypothesis. In fact 90% of Americans own cell phones.

Frequency

How many times a person falls into a category

Researchers wondered if brain size has an effect on a person's IQ. From a sample of 20 individuals, the equation of the least-squares regression line is y = 71.8 + 0.0286x, where x represents the size of a brain in cubic centimeters and y represents IQ. What is the interpretation of the slope?

IQ is predicted to increase by 0.0286 for every 1 cubic centimeter increase in brain size.
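
A tiny sketch of what the slope means, using the regression line from the card. The brain sizes 1000 and 1001 cubic centimeters are arbitrary values chosen to be 1 unit apart.

```python
def predicted_iq(brain_size_cc):
    """Least-squares line from the card: y = 71.8 + 0.0286x."""
    return 71.8 + 0.0286 * brain_size_cc

# Increasing x by 1 cubic centimeter changes the prediction by the slope
difference = predicted_iq(1001) - predicted_iq(1000)
print(round(difference, 4))
```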

The length of the box in a boxplot is proportional to which of the following?

IQR

UNION

IS THE SET OF ALL OUTCOMES FROM BOTH EVENTS, INCLUDING THE OUTCOMES THEY HAVE IN COMMON.

SAMPLE SPACE

IS THE SET OF ALL THE POSSIBLE OUTCOMES.

PROBABILITY

IT IS A PREDICTION OF A CERTAIN OUTCOME

A bar chart and a Pareto chart both use bars to show frequencies of categories of categorical data. What characteristic distinguishes a Pareto chart from a bar chart, and how does that characteristic help us in understanding the data?

In a Pareto chart, the bars are always arranged in descending order according to frequencies. The Pareto chart helps us understand data by drawing attention to the more important categories, which have the highest frequencies.

When a person stands trial for murder, the jury is instructed to assume that the defendant is innocent. Is this claim of innocence an example of a null hypothesis, or is it an example of an alternative hypothesis?

It is a null hypothesis, since it is assumed to be true until evidence can prove otherwise.

Upper Class limits

The largest number in each class

For what types of associations are regression models useful?

Linear

Which measure of variation is very sensitive to extreme values?

Range

The _______ is a tool for making predictions about future observed values and is a useful way of summarizing a linear relationship.

Regression equation

In a​ _______ distribution, the frequency of a class is replaced with a proportion or percent.

Relative Frequency Distribution

z-score (often called the standardized value)

Represents the distance that a data value is from the mean in terms of the number of standard deviations. (It is obtained by subtracting the mean from the data value and dividing this result by the standard deviation) The z-score is unitless. It has a mean 0 and standard deviation 1. The z-score is often called the standardized value.

Cumulative Frequency

The cumulative frequency for a class is the sum of the frequencies for that class and all previous classes (example pg.49).

Cumulative Frequency

The cumulative frequency is the sum of the frequencies accumulated up to the upper boundary of a class in the distribution.

Identify which of the following statements is not a requirement for a probability density function or state that they all are.

The curve must be symmetric and centered at zero

Look at #25 chart and answer the questions: Construct a histogram on the calculator. Are the data reported or measured?

The data appears to be measured. The heights occur with roughly the same frequency.

State whether the data described below are discrete or continuous and explain why: The exact ages in hours of different cockroaches found in a certain city

The data are continuous because the data can take any value in an interval

Look at #48 chart and answer the questions: What impression does the graph create? Does the graph depict the data fairly?

The graph creates the impression that men have salaries that are more than twice the salaries of women. No, because the vertical scale does not start at zero.

​No, there does not appear to be a correlation because there is no general pattern to the data.

The heights of a certain​ country's presidents and their main opponents in the election campaign have been constructed into a scatterplot (above). Does there appear to be a​ correlation?

One common system for computing a grade point average (GPA) assigns 4 points to an A, 3 points to a B, 2 points to a C, 1 point to a D, and 0 points to an F. What is the GPA of a student who gets an A in a 3-credit course, a B in each of two 2-credit courses, a C in a 3-credit course, and a D in a 2-credit course?

The grade point average is 2.7.
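
The GPA is a weighted mean with credits as weights. A minimal sketch, assuming the A was earned in a 3-credit course (the credit count that reproduces the 2.7 answer):

```python
# (grade points, credits) for each course: A, B, B, C, D
courses = [(4, 3), (3, 2), (3, 2), (2, 3), (1, 2)]

# Weighted mean: sum of (weight * value) divided by the sum of the weights
total_points = sum(points * credits for points, credits in courses)
total_credits = sum(credits for _, credits in courses)
gpa = total_points / total_credits

print(round(gpa, 1))
```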

Under what conditions is the mean preferred?

The mean is preferred when the data is relatively symmetric.

In determining the mean age of all students at your school, you survey 30 students and find the mean of their ages. Is this mean x̄ or μ?

The mean is x̄ (a sample mean).

If each monthly cell phone bill in the country were​ doubled, how would the mean of the cell phone bills be​ affected?

The mean of the cell phone bills would double.

Suppose, on the warmest day of the month, the daily high temperature in a city is accidentally recorded as 700 instead of 70 degrees Fahrenheit. Compare the effect this mistake will have on the mean monthly high temperature to the effect on the median monthly high temperature.

The mean will increase significantly, but the median will not change as a result of the mistake.

Definition of Mean (or Arithmetic Mean)?

The measure of center found by adding the data values and dividing the total by the number of data values.

What is mean of a set of data?

The measure of center found by adding the data values and dividing the total by the number of data values.

Definition of Midrange

The measure of center that is the value midway between the maximum and minimum values in the original data set. It is found by adding the maximum data value to the minimum data value and then dividing the sum by 2.

Definition of Median

The measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.

What is the Median?

The measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.

Suppose a researcher is testing someone to see if he or she can tell Soda X from Soda Y, and the researcher is using 20 trials, half with Soda X and half with Soda Y. The null hypothesis is that the person is guessing. The alternative is one-sided, Ha: p0>0.5. The person gets 13 right out of 20. The p-value comes out to be 0.090. Explain the meaning of the p-value.

The probability that a person will get 13 or more right, if the person is truly guessing, is about 9%.
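
The quoted p-value of 0.090 is consistent with a one-proportion z-test using the normal approximation without a continuity correction. That is an assumption about how it was computed; an exact binomial calculation would give a different figure. A sketch under that assumption:

```python
from math import sqrt
from statistics import NormalDist

n = 20          # number of trials
successes = 13  # correct identifications
p0 = 0.5        # proportion expected under the null hypothesis (guessing)

p_hat = successes / n                  # sample proportion
std_error = sqrt(p0 * (1 - p0) / n)    # standard error under the null
z = (p_hat - p0) / std_error           # test statistic

# One-sided p-value: area to the right of z under the standard normal curve
p_value = 1 - NormalDist().cdf(z)

print(round(z, 2), round(p_value, 3))
```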

Is the time it takes for a light bulb to burn out discrete or continuous?

The random variable is continuous.

Is the number of socks in a drawer discrete or continuous?

The random variable is discrete.

What makes the range less desirable than the standard deviation as a measure of​ dispersion?

The range does not use all the observations.

Interquartile range (IQR)

The range of the middle 50% of the observations in a data set. The difference between the upper quartile and the lower quartile. IQR = Q3 - Q1 Interpretation of the interquartile range is similar to that of the range and standard deviation. That is, the more spread a set of data has, the higher the interquartile range will be.

Determine which of the levels of measurement is most appropriate for the data below: Brain volumes measured in cubic cm

The ratio level of measurement is the most appropriate because the data can be ordered, differences can be found and are meaningful, and there is a natural starting point

What does a correlation coefficient of 0 indicate?

There is no linear relationship between the two quantitative variables.

A person was trying to figure out the probability of getting two heads when flipping two coins. She flipped two coins 10 times, and in 6 of these 10 times, both coins landed heads. On the basis of this outcome, she claims that the probability of two heads is 6/10, or 60%. Is this an example of an empirical probability or a theoretical probability? Explain.

This is an example of empirical probability because it is based on an experiment.

Fill in the blank. We utilize statistical​ _______ to look for features that reveal some useful or interesting characteristics of the data set.

We utilize statistical graphs to look for features that reveal some useful or interesting characteristics of the data set.

If R^2 is 1 or very near 1, it's a good fit. If it's close to 0, it's a poor fit.

What is a good fit for R^2? What isn't a good fit?

A prediction interval is used for an estimate of a value of a variable. A confidence interval is used for an estimate of a value of a population parameter.

What is the difference between the confidence interval and the prediction interval?

The population of ages at inauguration of all U.S. Presidents who had professions in the military is​ 62, 46,​ 68, 64, 57. Why does it not make sense to construct a histogram for this data​ set?

With a data set that is so​ small, the true nature of the distribution cannot be seen with a histogram.

Look at #7 charts and answer the question: Do cigarette filters appear to be effective?

Yes, because the relative frequency of the higher tar classes is greater for nonfiltered cigarettes.

To describe the exact position of a score within a distribution, a z-score transforms each x-value into a signed number (positive or negative).

All z-scores for values above the mean are positive and all z-scores for values below the mean are negative. The number tells the distance between the score and the mean in terms of the number of standard deviations.

Typically, the direction (>, <, or ≠) used in the _______ hypothesis is determined from the question of interest.

alternative

pictographs

drawings of objects, are often misleading because they can create false impressions that distort differences

​A(n) _______ uses line segments to connect points located directly above class midpoint values.

frequency polygon

quantitative data

Measures how much, such as weights of high school students. Displayed with dot plots, histograms, and stem plots.

What measure of central tendency best describes the​ "center" of the​ distribution when the graph is skewed

median

________ are sample values that lie very far away from the majority of the other sample values

outliers

A -- histogram has the same shape and horizontal scale as a histogram but the vertical scale is marked with relative frequencies instead of actual frequencies.

relative frequency

The ____ and the ____ of a correlation coefficient describe the direction and the magnitude of the relationship between two variables

sign; absolute value

Class width is found by -------.

subtracting a lower class limit from the next consecutive lower class limit

midrange

the value exactly in between the highest and lowest: (maximum + minimum) / 2

P (A or B) indicates ____.

the probability that in a single trial, event A occurs, event B occurs, or they both occur.

P(A or B) indicates​

the probability that in a single​ trial, event A​ occurs, event B​ occurs, or they both occur

s

the sample standard deviation symbol is (the sample variance symbol is s^2)

A data value is considered - if its z-score is less than -2 or greater than 2.

unusual

A data value is considered ______ if its z-score is less than -2 or greater than 2.

unusual

A data value is considered _________ if the z-score is less than -2 or greater than 2.

unusual

Inferential statistics

uses methods that generalize results obtained from a sample to the population and measure the reliability of the results

Weighted mean

Used when different x values are assigned different weights, w. Multiply each weight w by the corresponding value x, then add the products, and finally divide that total by the sum of the weights, Σw.

When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a --

z-score

Find the population mean or sample mean as indicated. ​Population: 2​, 1​, 11​, 15​, 6

µ= 7

frequency polygon

​A(n) __________________ ______________ uses line segments to connect points located directly above class midpoint values.

In the data table below, the x-values are the weights (in pounds) of cars and the y-values are the corresponding highway fuel consumption amounts (in mi/gal). Weight (lb): 4088 3358 4133 3650 3545. Highway Fuel Consumption (mi/gal): 26 31 29 29 30. Comment on the source of the data if you are told that car manufacturers supplied the values. Is there an incentive for car manufacturers to report values that are not accurate?

​Yes, because​ consumers, in​ general, would prefer to buy a car with a higher level of fuel efficiency. In this​ case, the source of the data would be suspect with a potential for bias.

If the standard deviation of a variable is 10​, what is the​ variance?

100

Number of notes in a song...

Discrete because it's countable

what values can't be probabilities?

Values less than 0 or greater than 1. A probability must satisfy 0 ≤ P(A) ≤ 1.

Listed below are the top 10 annual salaries (in millions of dollars) of TV personalities. Find the range, variance, and standard deviation for the sample data. Given that these are the top 10 salaries, do we know anything about the variation of salaries of TV personalities in general? 40 38 36 29 17 15 13 9 8.6 8.0 The range of the sample data is $-- million. The variance of the sample data is -----. The standard deviation of the sample data is $----- million. Is the standard deviation of the sample a good estimate of the variation of salaries of TV personalities in general?

Range: $32 million. Variance: 168.94. Standard deviation: $13.00 million. No, because the sample is not representative of the whole population.
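
These three values can be checked with a short sketch using the standard `statistics` module. The salaries are treated as a sample, so the variance divides by n - 1.

```python
import statistics

# Top 10 annual salaries, in millions of dollars
salaries = [40, 38, 36, 29, 17, 15, 13, 9, 8.6, 8.0]

data_range = max(salaries) - min(salaries)  # largest value minus smallest value
variance = statistics.variance(salaries)    # sample variance
std_dev = statistics.stdev(salaries)        # sample standard deviation

print(round(data_range, 2), round(variance, 2), round(std_dev, 2))
```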

Median

"Middle value." The measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude. Denoted by x-tilde. Sort the values. If the number of data values is odd, the median is the number located in the exact middle of the list. If even, it is the mean of the two middle numbers. Properties: does not change by large amounts when we include just a few extreme values. It is a RESISTANT measure of center. Does not use every data value.

Variance is

(standard deviation)^2

Important Properties of the Median

- The median does not change by large amounts when we include just a few extreme values (so the median is a resistant measure of center). - The median does not use every data value.

IQR (Interquartile Range)

-MEASURE OF DISPERSION (VARIABILITY). Remember, if data is symmetric, the best measure of central tendency is the mean, while the best measure of dispersion is the standard deviation. HOWEVER, if data is skewed or contains outliers, the best measure of central tendency is the median, and the best measure of dispersion is the IQR. -DEFINITION: the range of the middle 50% of the observations in a data set: IQR = Q3 - Q1. But if the data set is skewed and/or has outliers: THE BEST MEASURE OF DISPERSION: IQR/2 = (Q3-Q1)/2

Check recording 12 minutes for a step by step process on how to approach a problem!!!!

...

The heights of the bars of a histogram correspond to ______ values?

...

If the probability that it will rain tomorrow is 0.30, the probability that it will not rain tomorrow is what?

0.70

How to check outliers with Quartiles Rule

1. Determine the lower (Q1 = QL) and upper (Q3 = QU) quartiles. 2. Compute the IQR. 3. Determine the lower and upper fences: a. LIF = QL - 1.5(IQR) b. UIF = QU + 1.5(IQR) c. LOF = QL - 3(IQR) d. UOF = QU + 3(IQR)
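
The fence steps can be sketched as a small function. The quartiles are passed in directly because textbooks and software compute quartiles in slightly different ways; the function name `fences` and the example quartiles are made up for illustration.

```python
def fences(ql, qu):
    """Inner and outer fences from the lower (QL) and upper (QU) quartiles."""
    iqr = qu - ql  # step 2: interquartile range
    return {
        "LIF": ql - 1.5 * iqr,  # lower inner fence
        "UIF": qu + 1.5 * iqr,  # upper inner fence
        "LOF": ql - 3 * iqr,    # lower outer fence
        "UOF": qu + 3 * iqr,    # upper outer fence
    }

# Hypothetical quartiles: QL = 10, QU = 20, so IQR = 10
result = fences(10, 20)
print(result)
```

A data value outside the inner fences is a candidate outlier; the outer fences sit farther out, at 3 IQRs beyond the quartiles.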

1. If the computed linear correlation coefficient r lies in the left/right tail beyond the leftmost/rightmost critical value. Or if the |r| exceeds value on table A-6. 2. If the computed linear correlation lies between the two critical values.

1. How do we know that there is a correlation (using r)? 2. How do we know if there is no correlation?

What are two important properties of x̃?

1. The median does not change by large amounts when we include just a few outliers. 2. The median does not use every data value.

Heights of adult males are known to have a normal distribution. A researcher claims to have randomly selected adult males and measured their heights with the resulting relative frequency distribution as shown here. Identify two major flaws with these results.

1. The sum of the relative frequencies is 125​%, but it should be​ 100%, with a small possible​ round-off error. 2. All of the relative frequencies appear to be roughly the same. If they are from a normal​ distribution, they should start​ low, reach a​ maximum, and then decrease.

median rules

1. if data values = odd, then median is in exact middle of list 2. if data values = even, median is found by computing mean of two middle numbers

A professor has recorded exam grades for 10 students in his​ class, but one of the grades is no longer readable. If the mean score on the exam was 82 and the mean of the 9 readable scores is 86​, what is the value of the unreadable​ score?

10 × 82 = 820 (total of all 10 scores). 9 × 86 = 774 (total of the 9 readable scores). 820 - 774 = 46. The unreadable score is 46.
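
The same reasoning as a short sketch: a mean times its count gives a total, and the missing score is the difference between the two totals.

```python
total_all = 10 * 82       # mean of all 10 scores times 10
total_readable = 9 * 86   # mean of the 9 readable scores times 9

# Whatever is left over must be the unreadable score
missing_score = total_all - total_readable
print(missing_score)
```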

Look at #24 chart and answer the questions: What is the class width? What are the approximate lower and upper class limits of the first class?

20 Lower class: 105 Upper class: 125

Refer to the data set of times, in minutes, required for an airplane to taxi out for takeoff, listed below. Find the mean and median. How is it helpful to find the mean? 13 19 12 39 13 36 36 47 12 19 18 26 13 45 28 15 14 47 42 17 38 16 41 13 18 43 50 28 33 17 17 38 12 17 48 34 41 25 42 10 28 39 28 48 46 36 18 17 Find the mean and median of the data set using a calculator or similar data analysis technology. The mean of the data set is --- minutes. The median of the data set is -- minutes. How is it helpful to find the mean?

28.2 28 The mean taxi out time is important for calculating and scheduling the arrival time.
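
As a check on the calculator result, here is a minimal sketch using Python's standard `statistics` module; the variable name `taxi_out` is an assumption made for illustration.

```python
from statistics import mean, median

# Taxi-out times (minutes) copied from the card above.
taxi_out = [13, 19, 12, 39, 13, 36, 36, 47, 12, 19, 18, 26,
            13, 45, 28, 15, 14, 47, 42, 17, 38, 16, 41, 13,
            18, 43, 50, 28, 33, 17, 17, 38, 12, 17, 48, 34,
            41, 25, 42, 10, 28, 39, 28, 48, 46, 36, 18, 17]

print(round(mean(taxi_out), 1))  # 28.2
print(median(taxi_out))          # 28.0 (mean of the two middle values)
```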

Look at #2 chart and answer the questions: What is the class width? What are the class midpoints? What are the class boundaries?

3 51, 54, 57, 60, 63, 66, 69 49.5, 52.5, 55.5, 58.5, 61.5, 64.5, 67.5, 70.5

Statistics are sometimes used to compare or identify authors of different works. The lengths of the first 10 words in a book by Terry are listed with the first 10 words in a book by David. Find the mean and median for each of the two samples, then compare the two sets of results. Terry: 2 2 11 5 2 3 3 2 9 4 David: 3 4 3 3 3 2 3 2 1 1 The mean number of letters per word in Terry's book is --- The median number of letters per word in Terry's book is - The mean number of letters per word in David's book is -- The median number of letters per word in David's book is - Compare the two sets of results. Does there appear to be a difference?

4.3 3 2.5 3 Yes. Based on the results, words in Terry's book are longer than the words in David's book.

empirical rule

68% of all data is within one standard deviation of the mean 95% of all data is within 2 standard deviations of the mean 99.7% of all data is within 3 standard deviations

Waiting times (in minutes) of customers at a bank where all customers enter a single waiting line and a bank where customers wait in individual lines at three different teller windows are listed below. Find the coefficient of variation for each of the two sets of data, then compare the variation. Bank A (single line): 6.5 6.5 6.6 6.8 7.1 7.3 7.3 7.7 7.7 7.8 Bank B (individual lines): 4.3 5.3 5.8 6.2 6.7 7.7 7.7 8.6 9.3 9.9 Is there a difference in variation between the two data sets?

7.2 25.3 The waiting times at Bank A have considerably less variation than the waiting times at Bank B
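
The two percentages can be reproduced with a short sketch, assuming the coefficient of variation is the sample standard deviation divided by the sample mean, times 100. The helper name `coefficient_of_variation` is hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample CV as a percentage: (s / x-bar) * 100."""
    return stdev(values) / mean(values) * 100

bank_a = [6.5, 6.5, 6.6, 6.8, 7.1, 7.3, 7.3, 7.7, 7.7, 7.8]
bank_b = [4.3, 5.3, 5.8, 6.2, 6.7, 7.7, 7.7, 8.6, 9.3, 9.9]

print(round(coefficient_of_variation(bank_a), 1))  # 7.2
print(round(coefficient_of_variation(bank_b), 1))  # 25.3
```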

The following are the duration times (in minutes) of all missions flown by a space shuttle. Use the given data to construct a boxplot and identify the 5-number summary. 9 7459 8861 10,024 10,100 10,118 11,453 11,523 11,841 The five number summary is - - - - - Make a boxplot on the calculator

9, 8682, 10062, 11453, 11841

A Type II Error is made... A Type II Error is made when there's not enough evidence to reject the null hypothesis and the null hypothesis is true. A Type II Error is made when there's evidence to reject the null hypothesis, but the null hypothesis is true. A Type II Error is made when there's not enough evidence to reject the null hypothesis, but the null hypothesis is not true. A Type II Error is made anytime we do not reject the null hypothesis.

A Type II Error is made when there's not enough evidence to reject the null hypothesis, but the null hypothesis is not true.

What type of effect can outliers have on a regression line?

A big effect

Fill in the blank. A histogram aids in analyzing the​ _______ of the data.

A histogram aids in analyzing the shape of the distribution of the data.

What is a value at the center or middle of a data set?

A measure of center.

Whenever a data value is less than the​ mean, _______

A negative​ z-score indicates a data value is less than the mean.

MEASURE OF CENTER

A value at the center or middle of a data set is​ a(n) _________.

How do you find the midrange?

Add the Max and min data value and then divide the sum by 2.

Which word is associated with multiplication when computing probabilities?

And

Correlation does not imply: Linearity Bias Causation Significance

Causation

Center

Center value of data (CVDOT)

Which of the following is NOT a procedure for determining whether it is reasonable to assume that sample data are from a normally distributed population? a. Visual inspection of a Histogram to determine if its roughly "bell shaped" b. Constructing a probability plot (QQ) c. Identifying the outliers. d. Checking that the probability of an event is 0.05 or less.

Checking that the probability of an event is 0.05 or less.

Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "An education researcher randomly selects 48 middle schools and interviews all the teachers at each school."

Cluster

Identify which type of sampling is being used: An education researcher randomly selects 48 middle schools and interviews all the teachers at each school.

Cluster

Standardized Distribution

Composed of scores that have been transformed to create predetermined values for the mean and standard deviation. They are used to make dissimilar distributions comparable.

Of the following, which is the only method of data collection suitable for making conclusions about causal relationships?

Controlled experiments

Identify which type of sampling is being used: A researcher interviews 19 work colleagues who work in his building.

Convenience

Discrete

Countable number

Determine whether the given value is a discrete or continuous variable: People are asked to state how many times in the last month they visited their family doctor Continuous Discrete

Discrete

The distribution appears to be skewed to the left ​(or negatively ​skewed).

Does the graph suggest that the distribution is​ skewed? If​ so, how?

What must be true for a sample to be considered a simple random​ sample?

Every possible sample of that size must have the same chance of being selected.

What does it mean if a statistic is​ resistant?

Extreme values​ (very large or​ small) relative to the data do not affect its value substantially

Identify the given statement as either true or false. The standard deviation is a resistant measure of spread.

False

Which of the following is true for a normal probability density curve?

For a normal probability density curve, as x gets larger and larger, the graph approaches but never reaches the horizontal axis.

It has been noted that people who go to church frequently tend to have lower blood pressure than people who don't go to church. Does this mean you can lower your blood pressure by going to church? Why or why not? Explain.

Going to church may not cause lower blood pressure. Just because two variables are related does not show that one caused the other.

A student wondered if more than 10% of students enrolled in an introductory Chemistry class dropped before the midterm. He noticed that 2 out of 15 of his friends in the class dropped before the midterm. Based on his sample, he performs a hypothesis test. Which of the following statements is true?

He should not make a conclusion about all students in the introductory Chemistry class since he took a convenience sample.

Difference in Pareto and Bar charts

In a Pareto​ chart, the bars are always arranged in descending order according to frequencies. The Pareto chart helps us understand data by drawing attention to the more important​ categories, which have the highest frequencies.

Suppose every student in a class is surveyed and it is found that​ 75% of the class plans to take another math class. It is reported that​ 75% of all students at the school plan to take another math class. Is this an example of descriptive or inferential​ statistics? Explain.

Inferential​ statistics; the results of the class sample are extended to make a generalization about the population of all students at the school.

When computing the correlation coefficient, what is the effect of changing the order of the variables on r?

It has no effect on r.

What is the first step in almost every investigation of data?

Make an appropriate graph.

Which of the following is NOT a value in the 5-number summary? -Median -Mean -Minimum -Q1

Mean

Which of the following is NOT needed to construct a​ boxplot?

Mean

What is the formula to calculate mean?

Mean = Σx / n

What are the measures of center?

Mean, median, mode and midrange.

The interquartile range tells us how much space the _____ of the data occupy.

Middle 50%

Class boundaries

The numbers at the midpoints of the gaps between consecutive classes. For the classes 60-69 and 70-79, the upper class boundaries are 69.5 and 79.5.

When summarizing graphs of categorical data, report the _______ and describe the _______.

Mode(s), variability

If events A and B are independent, what must be done to find the probability of event A AND B?

Multiply the probability of A and the probability of B.

When two events have no outcomes in common, they are called what?

Mutually exclusive

Under what conditions can extrapolation be used to make predictions beyond the range of the data?

Never

When can a correlation coefficient based on an observational study be used to support a claim of cause and effect?

Never

Are any of the measures of dispersion among the​ range, the​ variance, and the standard​ deviation, resistant? Explain.

No, all of these measures of dispersion are affected by extreme values.

Days before a presidential election, a nationwide random sample of registered voters was taken. Based on this random sample, it was reported that "52% of registered voters plan on voting for Robert Smith with a margin of error of ±3%." The margin of error was based on a 95% confidence level. Can we say with 95% confidence that Robert Smith will win the election if he needs a simple majority of votes to win?

No, because 50% is within the bounds of the confidence interval.

Three cards are drawn without replacement from a standard deck, and the number of kings is noted. Does this constitute a binomial experiment? Why or why not?

No, because the probability of getting a king is not the same for each of the three draws.

When two dice are rolled, the sum is between 2 and 12 inclusive. A student simulates the rolling of two dice and finding the sum by randomly generating integers between 2 and 12. Does this simulation behave in a way that is similar to actual dice? Why or why not?

No; The student's simulation will generate the sums with equal probability when in fact the sums are not equally likely.
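
To see why the sums are not equally likely, one can enumerate the 36 outcomes of two fair dice. This is a minimal sketch; the names `sum_counts` and `dice_probs` are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice.
sum_counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dice_probs = {s: Fraction(c, 36) for s, c in sorted(sum_counts.items())}

print(dice_probs[2], dice_probs[7], dice_probs[12])  # 1/36 1/6 1/36
# Drawing an integer uniformly from 2..12 would instead give every sum 1/11.
```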

A frequency distribution lists the ______ of occurrences of each category of​ data, while a relative frequency distribution lists the __________ of occurrences of each category of data.

Number; Proportion

What are two basic types of variables in statistics?

Numerical and categorical

Quantitative

Numerical data

Listed below are blood groups of O, A, B, and AB of randomly selected blood donors. Construct the relative frequency distribution. A O O O O A A O A O O A A O O AB O A B A AB O A O B O O AB A A AB A O A B O AB O O O Find the relative frequency for O, A, B, AB

O: 47.5% A: 32.5% B: 7.5% AB:12.5%
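
The tally behind these percentages can be sketched as follows; `donors` and `rel_freq` are names chosen for illustration only.

```python
from collections import Counter

# Blood groups copied from the card above (40 donors).
donors = ("A O O O O A A O A O O A A O O AB O A B A "
          "AB O A O B O O AB A A AB A O A B O AB O O O").split()

counts = Counter(donors)
# Relative frequency (%) = 100 * class frequency / sum of all frequencies.
rel_freq = {g: 100 * counts[g] / len(donors) for g in ("O", "A", "B", "AB")}
print(rel_freq)  # {'O': 47.5, 'A': 32.5, 'B': 7.5, 'AB': 12.5}
```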

ADDITION

OR REFERS TO ______ RULE.

A study is conducted to measure​ children's growth rates without any treatment applied to the children. What best classifies this​ study?

Observational

​_______ are sample values that lie very far away from the majority of the other sample values.

Outlier

____ are sample values that lie very far away from the majority of the other sample values.

Outliers

Determine whether the given value is a statistic or a parameter. "After inspecting all 45,000kg of meat stores at the Wurst Sausage Company, it was found that 20,000kg of the meat was spoiled."

Parameter

Determine whether the given value is a statistic or a parameter. "After taking the first exam, 15 of the students dropped the class."

Parameter

What is a numerical value that characterizes some aspect of a population? Statistic Census Parameter Estimator

Parameter

Suppose a researcher is testing someone to see whether she or he can tell Soda X from Soda Y, and the researcher is using 22 trials, half with Soda X and half with Soda Y. The null hypothesis is that the person is guessing. Suppose person A gets 19 right out of 22, and person B gets 15 right out of 22. Which will have a smaller p-value, and why?

Person A will have a smaller p-value because that person's number of successes is further from the hypothesized number of successes.
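
A rough numerical check, assuming a one-sided test in which guessing means each trial is a success with probability 0.5 (the card does not specify the test, so this is an assumption). The helper `upper_tail_p` is hypothetical.

```python
from math import comb

def upper_tail_p(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(successes, trials + 1))

p_a = upper_tail_p(19, 22)  # person A: 19 right out of 22
p_b = upper_tail_p(15, 22)  # person B: 15 right out of 22
print(p_a < p_b)  # True
```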

Which of the following is a reason we can never draw cause-and-effect conclusions from observational studies?

Potential confounding variables may explain the differences between groups rather than the treatment variable.

A(n) ______can be used to compute probabilities of continuous random variables.

Probability density function

Favorite rock group is qualitative or quantitative?

Qualitative because it is an attribute classification

Classify the data as either qualitative or quantitative. The following table gives the top five movies at the box office this week. Rank-last week-movie title-studio-sales (millions$) What kind of data is provided by the information in the first column?

Quantitative

Determine whether the data are qualitative or quantitative. "the number of seats in a movie theater"

Quantitative

AND

REFERS TO MULTIPLICATION. PROBABILITY OF EVENTS( A AND B) FOR INDEPENDENT EVENTS P(A AND B) = P(A)*P(B)

Relative Frequency =

RF = Frequency / Sum of all Frequencies

Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A man experienced a tax audit. The tax department claimed that the man was audited because he was randomly selected from all the taxpayers.

Random

Yes. The frequencies start low, reach a maximum, then become low again, and are roughly symmetric about the maximum frequency. The Histogram would be bell-shaped, and NOT skewed.

Refer to the frequency distribution (above) of 25 home voltage measurements below, with a lower class limit of 127.7 volts, and a class width of 0.2 volt. Does the result appear to have a normal​ distribution? Why or why​ not?

No. The data values in each class could take on any value between the class​ limits, inclusive.

Refer to the table summarizing service times​ (seconds) of dinners at a fast food restaurant. How many individuals are included in the​ summary? Is it possible to identify the exact values of all of the original service​ times?

A medical study was investigating if getting a flu shot actually reduced the risk of developing the flu. A hypothesis test is performed. Which of the following will result in a Type I error?

Researchers said the flu shot reduced the risk of developing the flu when it actually didn't.

In a poll of 50,000 randomly selected college students, 74% answered "yes" when asked "Do you have a television in your dorm room?" Identify the sample and population.

Sample: the 50,000 selected college students; population: all college students

A​ _______ is a plot of paired data​ (x,y) and is helpful in determining whether there is a relationship between the two variables.

Scatterplot

A histogram aids in analyzing the ______ of the data

Shape of the distribution

Jan performed a study and obtained a p-value of 1.24. What conclusion should Jan make?

She made an error since it is not possible to get a p-value of 1.24.

Why is it important to learn about bad graphs?

So that we can critically analyze a graph to determine whether it is misleading.

Standard deviation measures the _____ of the distribution

Spread

To compute the variance, what should one do?

Square the standard deviation.

A z-score represents how many ______________ a data value is above or below the ______________.

Standard deviations, mean

What is the standard deviation of the sampling distribution called?

Standard error

A health and fitness club surveys 40 randomly selected members and found that the average weight of those questioned is 157 lb. Is this value a statistic or a parameter?

Statistic

What is an important difference between statistics and parameters?

Statistics are knowable, but parameters are typically unknown.

Checking for Outliers by Using Quartiles

Step 1 Determine the first and third quartiles of the data. Step 2 Compute the interquartile range. Step 3 Determine the fences. Fences serve as cutoff points for determining outliers. Lower Fence = Q1 - 1.5(IQR) Upper Fence = Q3 + 1.5(IQR) Step 4 If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier.
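
Steps 2 through 4 can be sketched as below. The quartiles are passed in rather than computed, because textbooks and software use slightly different quartile conventions; the function names and the sample data are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower_fence, upper_fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Hypothetical data and quartiles, chosen only to illustrate the steps.
sample = [2, 10, 12, 13, 14, 15, 16, 18, 40]
print(outlier_fences(12, 16))         # (6.0, 22.0)
print(find_outliers(sample, 12, 16))  # [2, 40]
```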

Σ

Sum of all data values

IQ scores are measured with a test designed so that the mean is 108 and the standard deviation is 17. Consider the group of IQ scores that are unusual. What are the z scores that separate the unusual IQ scores from those that are​ usual? What are the IQ scores that separate the unusual IQ scores from those that are​ usual? (Consider a value to be unusual if its z score is less than −2 or greater than​ 2.) What are the z scores that separate the unusual IQ scores from those that are​ usual? What are the IQ scores that separate the unusual IQ scores from those that are​ usual?

The lower z-score boundary is -2 and the upper z-score boundary is 2. Using X = μ + σZ, the lower IQ bound is X = 108 + 17(-2) = 74 and the upper IQ bound is X = 108 + 17(2) = 142.
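
The conversion from a z-score back to a raw score can be sketched as a one-line function; the name `value_from_z` is an assumption for illustration.

```python
def value_from_z(mu, sigma, z):
    """Convert a z-score back to a raw value: x = mu + sigma * z."""
    return mu + sigma * z

print(value_from_z(108, 17, -2))  # 74
print(value_from_z(108, 17, 2))   # 142
```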

S=RANGE/4

The Range Rule of Thumb roughly estimates the standard deviation of a data set as​

Fill in the blank. The bars in a histogram​ _______.

The bars in a histogram touch.

Touch

The bars in a histogram __________.

Class Midpoint

The class midpoint Xm is obtained by adding the lower and upper boundaries and dividing by 2, or adding the lower and upper limits and dividing by 2: Xm = (lower + upper) / 2

State whether the data described below are discrete or continuous: The number of programs installed on various computers

The data are discrete because the data can only take on specific values

Ordinal

The level of measurement of: Positions of runners in a race is ______________. Interval Ordinal Ratio Nominal

Which of the following is NOT a property of the linear correlation coefficient r? -The value of r is always between -1 and 1 inclusive. -The value of r is not affected by the choice of x or y. -The value of r measures the strength of a linear relationship. -The linear correlation coefficient r is robust. That is, a single outlier will not affect the value of r.

The linear correlation coefficient is robust. That is, a single outlier will not affect the value of r.

A highly selective boarding school will only admit students who place at least 2 standard deviations above the mean on a standardized test that has a mean of 200 and a standard deviation of 24. What is the minimum score that an applicant must make on the test to be​ accepted?

The minimum score that an applicant must make on the test to be accepted is 248

If we collect a large sample of blood platelet counts and if our sample includes a single​ outlier, how will that outlier appear in a​ histogram?

The outlier will appear as a bar far from all of the other bars with a height that corresponds to a frequency of 1.

Class Midpoints

The values in the middle of the classes. Each class midpoint is found by adding the lower class limit to the upper class limit and dividing it by 2 (example pg. 47)

Look at #40 and answer the questions: Construct a time-series graph (line graph) on the calculator. What is the trend? How does this trend compare to the trend for drive-in movie theaters?

There appears to be an upward trend, unlike drive-in movie theatres, which have a downward trend.

A friend flips a coin 10 times and says that the probability of getting a head is 40% because he got four heads. Is the friend referring to an empirical probability or a theoretical probability? Explain.

This is an example of empirical probability because it is based on an experiment.

True or​ False: Chebyshev's inequality applies to all distributions regardless of​ shape, but the empirical rule holds only for distributions that are bell shaped

True, Chebyshev's inequality is less precise than the empirical​ rule, but will work for any​ distribution, while the empirical rule only works for​ bell-shaped distributions
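
A small sketch of the comparison. The bound 1 - 1/k² is the standard statement of Chebyshev's inequality; it is not spelled out on the card, and the function name is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is informative only for k > 1")
    return 1 - 1 / k**2

# Empirical-rule percentages apply to bell-shaped distributions only.
empirical = {2: 0.95, 3: 0.997}
for k in (2, 3):
    print(k, chebyshev_lower_bound(k), empirical[k])
```

For k = 2 the guaranteed fraction is 0.75, which is weaker than the 95% the empirical rule gives for bell-shaped data, but it holds for any distribution.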

A categorical variable is only called bimodal under what circumstances?

Two categories are nearly tied for most frequent outcomes.

The existence of multiple mounds in a distribution is sometimes a sign of which of the following?

Two very different groups have been combined into a single collection

Construct a scatter diagram using the data table to the right. This data is from a study comparing the amount of tar and carbon monoxide​ (CO) in cigarettes. Use tar for the horizontal scale and use carbon monoxide​ (CO) for the vertical scale. Determine whether there appears to be a relationship between cigarette tar and CO.

Use excel - highlight all the values - insert "scattergram"

Which of the following is a common distortion that occurs in​ graphs?

Using a​ two-dimensional object to represent data that are​ one-dimensional in nature

Which of the following is not something that one looks for when studying scatterplots?

Variation

The study of statistics rests on what two major concepts?

Variation and data

Which characteristic of data is a measure of the amount that the data values​ vary?

Variations

1. The sample of paired (x,y) data is a simple random sample of quantitative data. 2. Visual examination of the scatter plot must confirm that the points approximate a straight-line pattern. 3. Outliers must be removed if they are known to be errors.

What are the requirements that should be satisfied before finding r?

To make predictions for the value of one of the variables given some specific value of the other variable.

What can we use the regression equation for?

Negative correlation

What can we say about r = -.965 As the x values increase, the y values decrease.

No correlation

What can we say about r = 0? No distinct pattern between x and y.

To measure how points are configured among four quadrants.

What can we use Σ(Zx*Zy) for?

3 sig fig.

What do we round b1 and b0 (the slope and y-intercept of the regression line) to?

What is a No Mode Data Set?

When no data value is repeated, we say that there is no mode.

What is the difference between a random sample and a simple random​ sample?

With a random​ sample, each individual has the same chance of being selected. With a simple random​ sample, all samples of the same size have the same chance of being selected.

Look at #39 chart and answer the questions? Construct a scatterplot on the calculator. Is there a relationship between cigarette tar and CO?

Yes, as the amount of tar increases the amount of carbon monoxide also increases.

When using the addition rule

always be careful to avoid​ double-counting outcomes.

A political pollster reports that her candidate has a 5% lead in the polls. This is an example of

an Observational Study

Outliers

are sample values that lie very far away from the majority of the other sample values.

Variables

are the characteristics of the individuals within the population

In a probability histogram, there is a correspondence between ___.

area and probability.

Why is range not a good measure?

because it only tells you how wide the data are, not whether the values are bunched together or dispersed, and it ignores how many values (n or N) there are

The U.S. Department of Housing and Urban Development(HUD) uses the median to report the average price of a home in the United States. Why do you think HUD uses the median?

because the data are skewed right

If the data set is symmetric or approximately symmetric, and no outliers, then

best measure of center: mean best measure of dispersion: standard deviation

round-off rule

carry one more decimal place than is present in original set of values; because values of the mode are the same as some of the original data values, they can be left as is without any rounding

A ________ is the collection of data from every member of the population. sample census placebo statistic

census

Which of the following is NOT a measure of center?

census

Height is a

continuous variable

Methods used that summarize or describe characteristics of data are called ___ statistics.

descriptive

sample space

for a procedure consists of all possible simple events or all outcomes that cannot be broken down any further

midrange

halfway between the highest and lowest values; formula: (max + min) / 2

relative frequency histogram

has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies

A scatterplot

is a plot of paired data​ (x,y) and is helpful in determining whether there is a relationship between the two variables.

How to know if a certain data is "unusual"?

it is more than two standard deviations above or below the mean

Which measures of center and spread are not resistant?

mean, range and standard deviation

What are the four levels of measurement?

nominal ordinal interval ratio

p-values

only a small P-value, such as .05 or less (5% chance or less) suggests that the sample results are not likely to occur by chance when there is no linear correlation, so a small P-value supports a conclusion that there is a linear correlation between the two variables.

Raw score

original, unchanged scores that are the direct result of measurement. A test score that has not been transformed or converted in any way.

Population mean is a

parameter

Below are 36 sorted ages of an acting award winner. Find the percentile corresponding to age 59 using the method presented in the textbook. 16,17,17,21,22,27,30,33,37,37,40,42,43,48,54,56,57,59,59,60,60,62,62,64,65,65,68,70,70,72,72,73,74,77,78,80

Percentile of value x = (number of values less than x) / (total number of values) × 100. For this problem x = 59. The number of values less than 59 is 17 and the total number of values is 36, so the percentile of 59 is (17 / 36) × 100 ≈ 47.
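
The counting rule can be sketched as follows, assuming the result is rounded to the nearest whole number; the function name `percentile_of_value` is hypothetical.

```python
def percentile_of_value(values, x):
    """Percentile of x = (number of values less than x) / n * 100, rounded."""
    below = sum(1 for v in values if v < x)
    return round(below / len(values) * 100)

ages = [16, 17, 17, 21, 22, 27, 30, 33, 37, 37, 40, 42,
        43, 48, 54, 56, 57, 59, 59, 60, 60, 62, 62, 64,
        65, 65, 68, 70, 70, 72, 72, 73, 74, 77, 78, 80]

print(percentile_of_value(ages, 59))  # 47
```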

When drawings of objects are used to depict data, false impressions can be made. These drawings are called -.

pictographs

the 68-95-99.7% rule applies for

roughly all bell-shaped curves

What is the symbol for sample standard deviation?

s

nonzero axis

some graphs are misleading because one or both of the axes begin at a value other than zero, so differences are exaggerated

The ____ linear relationship is indicated by a correlation coefficient of -1 or 1.

strongest

The larger the standard deviation means...

that observations are more distant from the typical value, and therefore more dispersed

Sampling bias means

that the technique used to obtain the sample's individuals tends to favor one part of the population over another

Midrange

the value midway between the maximum and minimum values in the original data set. (Max data value + minimum data value)/2. Properties: it is very sensitive to extremes.

Mode

the value that occurs with the greatest frequency. Bimodal, multimodal, no mode. Only measure of center that can be used with data at the nominal level of measurement.

What is the square of the standard deviation called?

the variance (s²)

What is the purpose of z-scores?

to describe the exact location of each score in a distribution; -always refers to population (must use a different formula for samples).

The bars in a histogram

touch

A data value is considered _______ if its z-score is less than -2 or greater than 2.

unusual

The square of the standard deviation is called the​ _______.

variance (variance = standard deviation²)

how to tell which histogram has the highest standard deviation

whichever graph is more spread out

How do you calculate Mean from a frequency distribution?

x̄ = Σ (f * x) / Σf

What is the formula to find a weighted mean?

x̄ = Σ(w*x) / Σw
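
The weighted mean and the mean from a frequency distribution share one formula, with frequencies acting as weights. A minimal sketch, using a hypothetical frequency table and a hypothetical function name:

```python
def weighted_mean(values, weights):
    """Sum of w * x divided by the sum of w."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical frequency table: class midpoints and their frequencies.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
print(weighted_mean(midpoints, frequencies))  # 16.0
```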

What is the formula to find the mean of a set of sample values?

x̄ = Σx / n

Determine the regression equation for the data. Round the final values to three significant digits, if necessary. x= 0, 3, 4, 5, 12 Y= 8,2,6,9,12

y hat= 4.88 + 0.525x
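
The slope and intercept can be checked with the usual least-squares formulas. This is a sketch with a hypothetical function name, not the calculator procedure the card assumes.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    b0 = sy / n - b1 * sx / n                       # y-intercept
    return b0, b1

b0, b1 = least_squares_line([0, 3, 4, 5, 12], [8, 2, 6, 9, 12])
print(f"y-hat = {b0:.3g} + {b1:.3g}x")  # y-hat = 4.88 + 0.525x
```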

symmetric data

you could fold graph down the middle and it would be the same on both sides; bell curve

sample z-score

z = (x - x̄) / s

What is the symbol for population variance?

σ²

A bar chart and a Pareto chart both use bars to show frequencies of categories of categorical data. What characteristic distinguishes a Pareto chart from a bar​ chart, and how does that characteristic help us in understanding the​ data?

In a Pareto​ chart, the bars are always arranged in descending order according to frequencies. The Pareto chart helps us understand data by drawing attention to the more important​ categories, which have the highest frequencies.

Listed below are the playing times (in seconds) of songs that were popular at the time of this writing. Find the (a) mean, (b) median, (c) mode, and (d) midrange for the given sample data. Is there one time that is very different from the others? 444 237 232 246 246 297 277 223 239 211 262 254 258 a. The mean is -- seconds. b. The median is ---- seconds. c. The mode is ---- seconds. d. The midrange is ------ seconds. Is there one time that is very different from the others?

a. 263.5 b. 246 c. 246 d. 327.5 Yes; the time of 444 seconds is very different from the others.

Listed below are head injury measurements from small cars that were tested in crashes. The measurements are in "hic," which is a measurement of standard "head injury criterion," (lower "hic" values correspond to safer cars). The listed values correspond to cars A, B, C, D, E, F, and G, respectively. Find the a. mean, b. median, c. midrange, and d. mode for the data. Also complete parts e. and f. 393 365 489 327 510 539 355 a. Find the mean . b. Find the median. c. Find the midrange. d. Find the mode. e. Which car appears to be the safest? f. Based on these limited results, do small cars appear to have about the same risk of head injury in a crash?

a. 425.4 b. 393 c. 433 d. There is no mode e. car D f. No, because the data values differ substantially.
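
Parts a through d can be verified with Python's `statistics` module (`multimode` needs Python 3.8 or later). Treating "every value occurs once" as "no mode" is the convention used on these cards; the variable names are illustrative.

```python
from statistics import mean, median, multimode

hic = [393, 365, 489, 327, 510, 539, 355]  # cars A through G

print(round(mean(hic), 1))        # 425.4
print(median(hic))                # 393
print((max(hic) + min(hic)) / 2)  # 433.0

modes = multimode(hic)
# If every value is returned, no value repeats, so there is no mode.
print("no mode" if len(modes) == len(hic) else modes)  # no mode
```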

Frequency Distribution

A frequency distribution is the organization of raw data in table form, using classes and frequencies.

Measure of Center

A value at the center or middle of a data set. There are several different ways to determine the center, so there are different definitions of measures of center, including the mean, median, mode, and midrange.

____ measure the strength of association between two variables

Correlation coefficients

Fill in the blank. The heights of the bars of a histogram correspond to​ _______ values

Frequency

Relative Frequency Distribution

Lists each category of data together with the relative frequency

We utilize statistical​ _______ to look for features that reveal some useful or interesting characteristics of the data set.

graphs

Generally, the correlation coefficient of a ____ is denoted by r, and the correlation coefficient of a ____ is denoted by ρ or R.

sample; population

Frequency Distribution (or frequency table)

shows how a data set is partitioned among all of several categories (or classes) by listing all of the categories along with the number of data values in each of the categories

Descriptive statistics

summarize or describe relevant characteristics of data.

Distribution

the nature or shape of the spread of the data over the range of values (such as bell-shaped, uniform, or skewed) (CVDOT)

Inferential statistics

used to make inferences, or generalizations, about a population

frequency polygon

uses line segments connected to points located directly above class midpoint values

Which characteristic of data is a measure of the amount that the data values vary? a. variation b. distribution c. time d. center

variation

Parameter

Describes characteristics of a population

The heights of the bars of a histogram correspond to ________ values

frequency

dotplot

A​ _______ is a graph of each data value plotted as a point.

Describe sampling without replacement.

Draw a notecard, note the name, do not replace the notecard and draw again. It is not possible the same student could be picked twice.

Arithmetic mean

of a variable is computed by adding all the values of the variable in the data set and dividing by the number of observations

When finding the median of a set of data, you should always do what first?

Put the data in order! The median will be wrong otherwise.

Methods used that summarize or describe characteristics of data are called​ _______ statistics.

descriptive

mode can be used for both

quantitative and qualitative

Bar Graph requires

title, axis labels, and clear scale

Which of the following is the probability that something in the sample space will occur?

1

Rejecting the null hypothesis when the null hypothesis is true is called _____________.

A type 1 error

Methods used that summarize or describe characteristics of data are called ______ statistics

Descriptive

After constructing any relative frequency distribution, what should be the sum of the relative frequencies?

1 or 100%

The variance for a sample was found to be 49. What is the sample standard deviation?

7 (square root of 49)

What does Ā (A with an overbar) denote?

Ā denotes the complement of event A, meaning that Ā consists of all outcomes in which event A does not occur.

Substitute the x-value into the regression equation: ŷ = b0 + b1x (the y-intercept plus the slope times x).

How do you find ŷ (y-hat)?

Why is it important to learn about bad graphs?

So that we can critically analyze a graph to determine whether it is misleading

Determining outliers

Standardized values (z-scores) can be used to identify outliers. It is recommended to treat any data value with a z-score less than -3 or greater than +3 as an outlier. Such data values can then be reviewed for accuracy and to determine whether they belong in the data set.
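
A minimal sketch of the z-score outlier rule described above. The data set and the function name `z_score_outliers` are made up for illustration, and the population standard deviation is used as an assumption (a sample standard deviation would work the same way).

```python
from statistics import mean, pstdev

def z_score_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    m = mean(values)
    sd = pstdev(values)  # population SD (assumption); use stdev() for a sample
    return [v for v in values if abs((v - m) / sd) > threshold]

# Hypothetical data: twenty ordinary values and one extreme value.
data = [10] * 20 + [100]
outliers = z_score_outliers(data)
print(outliers)  # the flagged values should then be reviewed for accuracy
```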

A _____ exists between two variables when the values of one variable are somehow associated with the values of the other variable

correlation

A _______ exists between two variables when the values of one variable are somehow associated with the values of the other variable.

correlation

The value of a ____ ranges between -1 and 1.

correlation coefficient

midrange

the measure of center (MOC) that is the value midway between the maximum and minimum values in the original data set; found by (maximum data value + minimum data value) / 2

A​ _______ is a graph of each data value plotted as a point.

dotplot

How to find the median when N or n is odd

find the middle value

For a scatterplot, the strongest correlations (r = 1.0 and r = -1.0 ) occur when data points fall exactly on a ____.

straight line

The greater the absolute value of a correlation coefficient, the ____ the linear relationship.

stronger

Class width is found by ___.

subtracting a lower class limit from the next consecutive lower class limit

How to find the median when N or n is even

take the mean of the middle 2 values

relative frequency ​distribution

the frequency of a class is replaced with a proportion or percent.

mode

the most frequently occurring data value and is the appropriate measure of center for nominal data.

Does the frequency distribution appear to have a normal​ distribution? Explain.

​Yes, because the frequencies start​ low, proceed to one or two high​ frequencies, then decrease to a low​ frequency, and the distribution is approximately symmetric.

The Empirical Rule applies to distributions that are ________.

Symmetric and unimodal

Suppose we have 10 exam scores for an introductory statistics course. The scores are 68, 88, 84, 99, 96, 77, 76, 80, 75, 68. One of the intervals of interest is the interval 70 to 90 (where a score of 70 is included in this interval, but 90 is not). Based on the given information, what is the relative frequency for the interval 70 to 90 in this particular class?

.60
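
The arithmetic behind this answer can be checked with a short sketch. The helper name `relative_frequency` is an assumption for illustration; the scores and the interval (70 included, 90 excluded) come from the question.

```python
scores = [68, 88, 84, 99, 96, 77, 76, 80, 75, 68]

def relative_frequency(values, lower, upper):
    """Share of values that fall in the half-open interval [lower, upper)."""
    in_class = [v for v in values if lower <= v < upper]
    return len(in_class) / len(values)

rel_freq = relative_frequency(scores, 70, 90)
print(rel_freq)  # 6 of the 10 scores fall in the interval
```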

Look at #3 chart and answer the questions: What is the class width? What are the class midpoints? What are the class boundaries?

Class width: 3. Class midpoints: 65.45, 68.45, 71.45, 74.45, 77.45, 80.45, 83.45, 86.45, 89.45, 92.45. Class boundaries: 63.95, 66.95, 69.95, 72.95, 75.95, 78.95, 81.95, 84.95, 87.95, 90.95, 93.95.

Fill in the blank. A​ _______ is a graph of each data value plotted as a point.

A dotplot is a graph of each data value plotted as a point.

SUBSET

ALL THE MEMBERS OF ONE SET BELONG TO ANOTHER.

In statistics, variables are ______.

Characteristics of people or things

Class Boundaries

Class boundaries are the numbers used to separate the classes so that there are no gaps in the frequency distribution.

Which sampling method divides the population up into​ sections, randomly selects some of those​ sections, then chooses all the members from the selected sections to​ study?

Cluster

Determine whether the given value is from a discrete or continuous data set. The time it takes a computer to complete a task. Continuous Discrete

Continuous

Determine whether the given variable is discrete or continuous: The weight of a randomly selected suitcase at O'Hare airport.

Continuous

Identify the variable as either continuous or discrete: The height of a randomly selected maple tree.

Continuous

Identify the variable as either discrete or continuous. The temperature of a randomly selected cup of coffee.

Continuous

Height of a child...

Continuous because it is not countable

Identify which type of sampling is used: To avoid working late, a quality control analyst simply inspects the first 100 items produced in a day Systematic Stratified Convenience Cluster Simple Random

Convenience

A study of an association between which ear is used for cell phone calls and whether the subject is​ left-handed or​ right-handed began with a survey​ e-mailed to 5000 people belonging to an otology online​ group, and 717 surveys were returned.​ (Otology relates to the ear and​ hearing.) What percentage of the 5000 surveys were​ returned? Does that response rate appear to be​ low? In​ general, what is a problem with a very low response​ rate?

About 14% (717/5000 ≈ 0.143). It appears to be low. It creates a serious potential for getting a biased sample that consists of those with a special interest in the topic.

z-score

Describes the exact location of a score in a distribution relative to the mean. Aka Standard Score; how many standard deviations you are away from the norm. Used to make different distributions, or metric scales, comparable.

Suppose every student in a class is surveyed and it is reported that​ 75% of the class plans to take another math class. Is this an example of descriptive or inferential​ statistics? Explain.

Descriptive​ statistics; The results of the class sample are described without making any generalizations about the population of all students at the school.

What does it mean if a statistic is resistant?

Extreme values (very large or small) relative to the data do NOT affect its value substantially

outliers

Extreme values that don't appear to belong with the rest of the data.

Which measure of variation is very sensitive to extreme​ values?

Extreme values will affect the value of the range.

Here are 3 boxplots of weekly gas prices at a service station in the United States (price in $ per gallon). Compare the distribution of prices over the three years.

Gas prices have been increasing on average over the 3-year period, and the variation overall has been increasing as well. The distribution has been right-skewed, and there were 3 potential outliers in 2005.

We utilize statistical_____ to look for features that reveal some useful or interesting characteristics of the data set

Graphs

Piechart

Has sectors, each is proportional to each frequency of a category. Requires: key (legend), title, colors

A student wondered if more than 10% of students enrolled in an introductory Chemistry class dropped before the midterm. Suppose he performed a hypothesis test to test his claim. In the context of the problem, what would happen if the student made a Type I Error?

He claims that more than 10% of students in the introductory Chemistry class dropped before the midterm when, in fact, 10% (or less) actually dropped.

Refer to the accompanying data set and use the 30 screw lengths to construct a frequency distribution. Begin with a lower class limit of 0.720 ​in., and use a class width of 0.010 in. The screws were labeled as having a length of 3/4 in.

Length (in.): frequency. 0.720-0.729: 2; 0.730-0.739: 3; 0.740-0.749: 11; 0.750-0.759: 11; 0.760-0.769: 3

Frequency Distribution

Lists all categories of data and number of occurrences for each data category

Use the given qualitative data to construct the relative frequency distribution. The 2445 people aboard a ship that sank include 325 male survivors, 1661 males who died, 322 female survivors, and 137 females who died. Find the relative frequency for male survivors, males who died, female survivors, and females who died.

Male Survivors: 13.3% Males who died: 67.9% Female survivors: 13.2% Females who died: 5.6%
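
As a rough check of these percentages, each count can be divided by the total and rounded to one decimal place. The dictionary keys and variable names below are illustrative; the counts come from the question.

```python
counts = {
    "Male survivors": 325,
    "Males who died": 1661,
    "Female survivors": 322,
    "Females who died": 137,
}

total = sum(counts.values())  # everyone aboard the ship

# Relative frequency as a percentage, rounded to one decimal place.
relative = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in relative.items():
    print(f"{name}: {pct}%")
```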

The following data represent the weights​ (in grams) of a simple random sample of a candy. 0.90 0.87 0.83 0.92 0.90 0.86 0.86 0.87 0.81 0.84 Determine the shape of the distribution of weights of the candies by drawing a frequency histogram and computing the mean and the median. Which measure of central tendency best describes the weight of the​ candy?

Mean: 0.866 Median: 0.865 Which tendency described the weight of the candy better? A: Mean

The null hypothesis is always a statement about what?

Population parameter

EVENT

SUBSET OF SAMPLE SPACE.

The data are continuous because the data can take on any value in an interval.

State whether the data described below are discrete or​ continuous, and explain why. The exact ages in hours of different cockroaches found in a certain city.

The data are continuous because the data can take on any value in an interval (no set distance between chairs).

State whether the data described below are discrete or​ continuous, and explain why. The exact distances (in centimeters) between the chairs in a college classroom.

μ

THE SYMBOL FOR THE POPULATION MEAN IS

TOUCH

The bars in a histogram​ __________.

Frequency

The frequency of a class then is the number of data values contained in a specific class.

Which of the following is NOT a characteristic of the mean?

The mean is called the average by statisticians.

What does P(B|A) represent?

The probability of event B occurring after it is assumed that event A has already occurred.

How to Find the Median

To find the median, first sort the values (arrange them in order), and then follow one of these two procedures: 1.) If the number of data values is odd, the median is the number located in the exact middle of the sorted list. 2.) If the number of data values is even, the median is found by computing the mean of the two middle numbers in the sorted list.
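
The two-case procedure above can be sketched as a small function. The name `find_median` and the sample lists are hypothetical, chosen only to show the odd and even cases.

```python
def find_median(values):
    """Sort the values, then take the middle one (odd n) or average the middle two (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: exact middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle values

odd_case = find_median([5, 1, 3])       # sorted: 1, 3, 5
even_case = find_median([4, 1, 3, 2])   # sorted: 1, 2, 3, 4
print(odd_case, even_case)
```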

What is a designed experiment?

When a researcher assigns individuals to a certain group intentionally changing the value of an explanatory variable, and then recording the value of the response for each group

True or​ False: When comparing two​ populations, the larger the standard​ deviation, the more dispersion the distribution​ has, provided that the variable of interest from the two populations has the same unit of measure.

True, because the standard deviation describes how​ far, on​ average, each observation is from the typical value. A larger standard deviation means that observations are more distant from the typical​ value, and​ therefore, more dispersed.

Three cards are drawn with replacement from a standard deck, and the number of kings is noted. Does this constitute a binomial experiment? Why or why not?

Yes, because there are three independent draws. For each draw there are two outcomes (king and not king) and a constant probability of getting a king.

Look at #26 chart and answer the questions: Construct a histogram on the calculator. Do the data appear to have a distribution that is approximately normal?

Yes. It is approximately normal.

Events that are disjoint

cannot occur at the same time

The heights of the bars of a histogram correspond to _____________ values.

frequency

The heights of the bars of a histogram correspond to - values.

frequency

A correlation of 0 does not mean zero relationship between two variables; rather, it means zero ____.

linear relationship

midrange formula

(maximum data value + minimum data value) / 2

Variance

measure of variation equal to the square of the standard deviation.

Variation

a measure of how much the data values vary (CVDOT)

mode

most common number in a set of data

A magician claims he can cause a coin to come up heads more than 50% of the time. A coin is flipped 50 times, and 44 heads come up. Determine the alternative hypothesis.

p>0.50

variance

total spread of data; isn't very useful. 2 types: population variance (sigma squared, σ²) and sample variance (s squared, s²)

A large amount of scatter in a scatterplot is an indication that the association between the two variables is _______.

weak

mean

what we expect to happen; the average. Sample mean: x̄ (x-bar); population mean: μ (mu)

when to use mode for best measure of central tendency

when data is nominal or ordinal

multimodal

when more than two data values occur with same greatest frequency; each one is a mode

Relative frequency formula

(class frequency / total of all frequencies) × 100%

Ordinal

Determine which of the four levels of measurement​ (nominal, ordinal,​ interval, ratio) is most appropriate. Ranks of scores in a tournament. Nominal Interval Ratio Ordinal

1. When you need to find a proportion between a negative (-) & positive (+) z-score:

Go to the *mean-to-z column* for each z; find the proportions and add them together.

What type of data values are quantitative and the number of values is finite or countable? Interval Discrete Categorical Continuous

Discrete

The empirical rule

For data sets having a distribution that is approximately​ bell-shaped, _______ states that about​ 68% of all data values fall within one standard deviation from the mean.

Range rule of thumb

For many data sets, the vast majority of sample values lie within 2 standard deviations of the mean.

The coefficient of determination. It is the proportion of the variation in y that is explained by the linear relationship between x and y. r^2 = explained variation / total variation

What is r^2?

The regression line; regression equation.

What is the best fitting line (10.3) that fits a scatterplot of sample data? And its equation?

The frequency distribution refers to a data set of 30 screw lengths. The screws had been labeled as having a length of 3-3/4 in. It begins with a lower class limit of 3.720 inches and uses a class width of 0.010 inches. If displayed in a histogram, the data would have a left tail, that is, they would be "skewed to the left" or "negatively skewed".

What is the class width? Would a Histogram be normal (bell shaped) or described as another term? If so, please define.

In a​ graph, if one or both axes begin at some value other than​ zero, the differences are exaggerated. This bad graphing method is known as

a nonzero axis.

In a​ graph, if one or both axes begin at some value other than​ zero, the differences are exaggerated. This bad graphing method is known as​ _______.

a nonzero axis.

A conditional probability of an event is

a probability obtained with knowledge that some other event has already occurred

Z-scores are turned into

a standard score. The purpose of z-scores is to identify and describe the exact location of each score in a distribution & to standardize an entire distribution to understand & compare scores from different tests.

Measure of center

a value at the center or middle of a data set

measure of center

a value at the center or middle of a data set. There are several different ways to determine the​ center, so there are different definitions of measures of​ center, including the​ mean, median,​ mode, and midrange

numerical summary of data is said to be resistant if...

extreme values (very large or small) relative to the data do not affect its value substantially

IQ scores that separate the unusual IQ scores from those that are usual

find the minimum and maximum usual values based on the mean and SD (mean ± 2 standard deviations)

The heights of the bars of a histogram correspond to ___ values.

frequency

We utilize statistical - to look for the features that reveal some useful or interesting characteristics of the data set.

graphs

The margin of error is _____________ the width of the confidence interval.

half

The table shows the magnitudes of the earthquakes that have occurred in the past 10 years. Use the frequency distribution to construct a histogram. Does the histogram appear to be​ skewed? If​ so, identify the type of skewness.

The histogram has a longer right tail, so it is skewed to the right.

right skew

high outliers; most data is on the left. EX: salaries in the United States; a lot of people make similar amounts of money, but a few people make millions or billions

measures of spread

How far off will we be? Includes range, variance, and standard deviation; all are nonresistant to outliers; use with symmetric data with no outliers

Two events A and B are ___ if the occurrence of one does not affect the probability of the occurrence of the other.

independent

Biased samples

Internet polls, in which people online can decide whether to respond; mail-in polls, in which subjects can decide whether to reply; telephone call-in polls, in which newspaper, radio, or television announcements ask that you voluntarily call a special number to register your opinion

standard deviation

is a measure of how much data values deviate from the mean. A measure of variability that describes an average distance of every score from the mean.

A parameter

is a numerical summary of a population

A statistic

is a numerical summary of a sample

Population arithmetic mean, μ (pronounced "mew")

is computed using all the individuals in a population. The population mean is a parameter

The Pearson product-moment correlation coefficient only measures ____ relationships.

linear

When performing a linear regression analysis, it is important that the relationship between the two quantitative variables be _______.

linear

The ________ measures the strength of the linear correlation between the paired quantitative x- and y-values in a sample.

linear correlation coefficient r

The __________ measures the strength of the linear correlation between the paired quantitative x- and y- values in a sample.

linear correlation coefficient r

A z score (or standard score or standardized value) is the number of standard deviations, s or σ, that a given value x is above or below the mean, x̄ or μ. The z score is calculated by using one of the equations shown below.

Sample: z = (x − x̄) / s. Population: z = (x − μ) / σ.

left skew

low outliers; most data is on the right side. EX: GPAs; most people are around the same GPA, and a few are very low

range

maximum-minimum

What measure of central tendency best describes the​ "center" of the​ distribution when the graph is symmetrical

mean

Population arithmetic mean, and its symbol

mean computed by using all individuals in a population; the symbol is μ ("mew")

Sample arithmetic mean

mean using sample data, symbol is "x-bar"

measures of center

mean, median, mode, midrange

A concrete mix is designed to withstand 3000 pounds per square inch​ (psi) of pressure. The following data represent the strength of nine randomly selected casts​ (in psi). 3970​, 4100​, 3200​, 3100​, 2950​, 3840​, 4100​, 4030​, 3650 Compute the​ mean, median and mode strength of the concrete​ (in psi).

mean: 3660 median: 3840 mode: 4100
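
These three values can be reproduced with Python's standard `statistics` module. This is only a sketch for checking the card; the variable names are illustrative.

```python
import statistics

# Strengths of the nine casts, in psi, as listed in the question.
strengths = [3970, 4100, 3200, 3100, 2950, 3840, 4100, 4030, 3650]

mean_psi = statistics.mean(strengths)      # sum of the values divided by 9
median_psi = statistics.median(strengths)  # middle value of the sorted list
mode_psi = statistics.mode(strengths)      # most frequent value

print(mean_psi, median_psi, mode_psi)
```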

A value at the center or middle of a data set is a(n) _____.

measure of center

A value at the center or middle of a data set is​ a

measure of center

A value at the center or middle of a data set is​ a(n) _______.

measure of center

When an odd number of data values are arranged in order, the _________ is the middle value.

median

What are two measures of the center of a distribution?

median and mean

Which measures of central tendencies are resistant

median and mode

Descriptive statistics

methods and tools that summarize or describe relevant characteristics of data.

inter-quartile range contains the

middle 50% of all observations

Formula for Midrange

midrange= (maximum data value + minimum data value) / 2

A certain group of test subjects had pulse rates with a mean of 84.1 beats per minute and a standard deviation of 14.0 beats per minute. Would it be unusual for one of the test subjects to have a pulse rate of 92.1 beats per​ minute? Recall that if the standard deviation is​ known, it can be used to find rough estimates of the minimum and maximum​ "usual" sample values by using the following equations.

minimum "usual" value = (mean)-2(standard deviation) maximum "usual" value = (mean)+2(standard deviation)
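
Applying these two equations to the pulse-rate question can be sketched as follows. The function name `usual_range` is an assumption for illustration; the mean, standard deviation, and test value come from the question.

```python
def usual_range(mean, sd):
    """Range rule of thumb: usual values lie within 2 standard deviations of the mean."""
    return mean - 2 * sd, mean + 2 * sd

low, high = usual_range(84.1, 14.0)        # roughly 56.1 to 112.1 beats per minute
is_unusual = not (low <= 92.1 <= high)

print(round(low, 1), round(high, 1), is_unusual)
```

Because 92.1 lies between the two limits, it would not be considered unusual under this rule.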

The measure of center that is the value that occurs with greatest frequency is the _________

mode

The measure of center that is the value that occurs with the greatest frequency is the -

mode

The measure of center that is the value that occurs with the greatest frequency is the ____.

mode

The measure of center that is the value that occurs with the greatest frequency is the _____.

mode

Sample Means of the same populations are more what?

more consistent than other types of measure of centers

A distribution of data is symmetric if

the left half of its histogram is roughly a mirror image of its right half. In this​ case, the​ mean, median, and mode are the same.

Two events A and B are independent if

the occurrence of one does not affect the probability of the occurrence of the other.

Standard deviation allows you

to see how spread out or concentrated the data in a bell curve are; you should be able to pick which graphs go with which μ, x̄ ("x-bar"), and σ

The bars in a histogram​ _______.

touch. A histogram is a graph consisting of bars of equal width drawn adjacent to each other (without gaps). Therefore, the bars touch.

A data value is considered​ _______ if its​ z-score is less than −2 or greater than 2.

unusual

When calculating standard deviation

use the calculator and the handouts for ch. 3

Median is, symbol is

value that lies in the middle of the data when arranged in ascending order. M is the symbol

Mode

the most frequent observation of a variable; a data set can have no mode, a single mode, two modes (bimodal), or several modes (multimodal)

The square of the standard deviation is called the ____________

variance

The square of the st. dev. is called the ___.

variance.

The square of the standard deviation is called the

variance.

Which characteristic of data is a measure of the amount that the data values vary?

variation

The ____ linear relationship is indicated by a correlation coefficient equal to 0.

weakest

When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the​ mean, we call the new value a​ _______.

z-score.

Solve the problem. A variable x has a mean, μ, of 10 and a standard deviation, σ, of 7. Determine the standardized version of x.

z= (x-10)/7

Which z-score has the smallest p-value? z=0.51 z=−1.58 z=−2.37 z=−3.49

z=−3.49

Σxi

{sum of}{all x values}

What is the symbol used to represent the population​ mean?

μ

What is the formula to find the mean of all values in a population?

μ = Σx / N

What is the symbol for population standard deviation?

σ

Relative Frequency

Proportion of observations within a category

Is the number of hits to a website in a day discrete or continuous?

The random variable is discrete.

Is the number of people in line at a box office to purchase theater tickets discrete or continuous?

The random variable is discrete.

Is the number of people with blood type A in a random sample of 45 people discrete or continuous?

The random variable is discrete.

Determine which of the four levels of measurement​ (nominal, ordinal,​ interval, ratio) is most appropriate for the data below. Length of the side of a square in cm

The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is a natural starting zero point.

explanatory variable; response variable.

The regression equation expresses a relationship between x (called the __________) and y (_______________).

A particular country has 60 total states. If the areas of all 60 states are added and then the sum is divided by 60, the result is 193,950 square kilometers. Determine whether this result is a statistic or a parameter

The result is a parameter because it describes some characteristic of a population

Deviation score

score minus the mean = how much the score deviates from the mean.

A histogram aids in analyzing the ___ of the data.

shape of the distribution

A histogram aids in analyzing the _______ of the data.

shape of the distribution

Z-scores that separate usual from unusual

-2 to 2

Identify the level of measurement of the data, and explain what's wrong with the calculation: In a survey, the respondents are identified as 100 for "yes", 200 for "no", 300 for "maybe", and 400 for anything else. The average is calculated for 652 respondents and the result is 256.1

-The data are at the nominal level of measurement -Such data are not counts or measures of anything, so it makes no sense to compute their average

Section 2.2 Homework

...

Identify the lower class​ limits, upper class​ limits, class​ width, class​ midpoints, and class boundaries for the given frequency distribution. Also identify the number of individuals included in the summary. 1. Identify the lower class limits. 2. Identify the upper class limits. 3. Identify the class width. 4. Identify the class midpoints. 5. Identify the class boundaries. 6. Identify the number of individuals included in the summary.

1. 100, 200, 300, 400, 500 2. 199, 299, 399, 499, 599 3. 100 4. 149.5, 249.5, 349.5, 449.5, 549.5 5. 99.5, 199.5, 299.5, 399.5, 499.5, 599.5 6. 140
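
A short sketch of how the width, midpoints, and boundaries follow from the class limits in this answer. The variable names are illustrative; the limits are the ones listed above.

```python
lower_limits = [100, 200, 300, 400, 500]
upper_limits = [199, 299, 399, 499, 599]

# Class width: difference between consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class midpoint: mean of the lower and upper limit of each class.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries: split the gap between one class's upper limit
# and the next class's lower limit in half.
gap = lower_limits[1] - upper_limits[0]
boundaries = [lo - gap / 2 for lo in lower_limits] + [upper_limits[-1] + gap / 2]

print(class_width, midpoints, boundaries)
```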

Binomial probability distribution

1. 2 outcomes (yes or no answer) 2. fixed number of trials 3. same probability of success on each trial 4. independent trials (the outcome of one doesn't affect another)

TV viewing example: Compute Quartiles

1. Put the data in ascending order. 2. Find the quartiles. a. Median = Q2: n = 20 data values, so M = mean of the middle two data values, so Q2 = M = 30.5. b. Bottom half (n = 10): the median of that half is Q1 = mean of its middle two data values, so Q1 = 23. c. Upper half (n = 10): the median of that half is Q3 = mean of its middle two data values, so Q3 = 36.5.

How do the five numbers describe data set:

1. The median describes the middle of the data set. 2. Information about the spread: because you have Q3 and Q1, you have the IQR, and you can get a measure of dispersion (variation) by dividing the IQR by 2. 3. xmin and xmax give you information about the distribution and about whether or not you have outliers.

5 Number summary

1. Minimum 2. First​ quartile, Q1 3. Second​ quartile, Q2​ (same as the​ median) 4. Third​ quartile, Q3 5. Maximum
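
A minimal sketch of a five-number summary using the "median of each half" approach from the quartile card above. The data set is hypothetical, and excluding the middle value from both halves when n is odd is one common convention (textbooks and software differ on this point).

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle value so neither half contains it (assumed convention).
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

# Hypothetical data, not taken from the study set.
summary = five_number_summary([7, 1, 3, 9, 5, 11, 13, 15])
print(summary)
```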

Cans of regular soda have volumes with a mean of 12.31 oz. and a standard deviation of 0.11 oz. Is it unusual for a can to contain 12.41 oz of soda? Minimum "usual" value= -- oz Maximum "usual" value= --- oz Is 12.41 oz an "unusual" volume?

12.09 12.53 No, because it is between the minimum and maximum "usual" value.

Find the population mean or sample mean as indicated. ​Sample: 22​, 18​, 6​, 13​, 6

13

95% of values in a normal distribution fall within

2 standard deviations of the mean. [95 − 68 = 27; 27/2 = 13.5] >> (13.5% | 34% | 34% | 13.5%)

The following frequency distribution shows the number of years of service for employees of the Alpha Corporation: Class Limits (years of service): frequency (# of employees) 1-5: 5 6-10: 20 11-15: 25 16-20: 10 21-25: 5 26-30: 3 What is the class width?

5

Which of the following is used to summarize two potentially related categorical variables?

A two-way table

Relative Frequencies A histogram is a graph consisting of bars of equal width drawn adjacent to each other​ (without gaps). A relative frequency histogram has the same shape and horizontal scale as a​ histogram, but the vertical scale is marked with relative frequencies​ (as percentages or​ proportions) instead of actual frequencies.

A​ _______ histogram has the same shape and horizontal scale as a​ histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.

scatterplot

A​ _______ is a plot of paired data​ (x,y) and is helpful in determining whether there is a relationship between the two variables.

scatterplot

A​ _____________________ is a plot of paired data​ (x,y) and is helpful in determining whether there is a relationship between the two variables.

A mutual fund rating agency ranks a​ fund's performance by using one to five stars. A​ one-star mutual fund is in the bottom​ 20% of its investment​ class; a​ five-star mutual fund is in the top​ 20% of its investment class. Interpret the meaning of a​ four-star mutual fund.

A​ four-star fund is in the 4th quintile of the funds. That​ is, it is above the bottom​ 60%, but below the top​ 20% of the ranked funds.

The manufacturer of a certain vehicle recovery system claims that the probability that a stolen vehicle using its product will be recovered is 87%. What is the probability that exactly 9 out of 10 independently stolen vehicles with this product will be recovered?

B(n,p,x) = b(10, .87, 9).
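
The notation b(10, .87, 9) stands for the binomial probability of exactly 9 successes in 10 trials with success probability 0.87. A rough sketch of evaluating it with the binomial formula follows; the function name `binomial_pmf` is an assumption for illustration.

```python
from math import comb

def binomial_pmf(n, p, x):
    """P(exactly x successes in n independent trials), each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

prob = binomial_pmf(10, 0.87, 9)
print(round(prob, 4))  # about 0.37
```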

What are Pareto charts?

Bar charts that are sorted from most frequent to least frequent

Denotes the Median.

Fill in the blank. In a​ graph, if one or both axes begin at some value other than​ zero, the differences are exaggerated. This bad graphing method is known as​ _______.

In a​ graph, if one or both axes begin at some value other than​ zero, the differences are exaggerated. This bad graphing method is known as a nonzero axis.

Suppose babies born after a gestation period of 32 to 35 weeks have a mean weight of 2500 grams and a standard deviation of 600 grams while babies born after a gestation period of 40 weeks have a mean weight of 2900 grams and a standard deviation of 390 grams. If a 35​-week gestation period baby weighs 2750 grams and a 41​-week gestation period baby weighs 3150 ​grams, find the corresponding​ z-scores. Which baby weighs more relative to the gestation​ period?

The baby born in week 41 weighs relatively more since its z-score, 0.64, is larger than the z-score of 0.42 for the baby born in week 35.
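
The comparison can be checked with a short z-score sketch. The function and variable names are illustrative; the means, standard deviations, and weights come from the question.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_week_35 = z_score(2750, 2500, 600)  # baby from the 32-to-35-week group
z_week_41 = z_score(3150, 2900, 390)  # baby compared against the 40-week group

print(round(z_week_35, 2), round(z_week_41, 2))
heavier_relative = "week 41" if z_week_41 > z_week_35 else "week 35"
```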

Categorical Frequency Distribution

The categorical frequency distribution is used for data that can be placed in specific categories, such as nominal- or ordinal-level data. For example, data such as political affiliation, religious affiliation, or major field of study would use categorical frequency distributions.

Which of the following statements correctly describes the complement of event E?

The complement of event E is the set of outcomes which are in the sample space but not in event E.

Find the mean of the data summarized in the given frequency distribution. Compare the computed mean to the actual mean of 51.1 miles per hour.

The computed mean is not close to the actual mean because the difference between the means is more than 5%.

Whenever a data value is less than the mean,_____.

The corresponding z-score is negative.

Are the data reported or​ measured?

The data appears to be measured. The heights occur with roughly the same frequency or The data appears to be reported. Certain heights occur a disproportionate number of times.

State whether the data described below are discrete or continuous, and explain why: The temperatures (in degrees Fahrenheit) of pizzas fresh from the oven

The data are continuous because the data can take on any value in an interval

If the standard deviation for a data set is zero, what can you conclude about the data?

The data values must all be equal.

Look at #31 chart and answer the questions: Construct a histogram on the calculator. Which part of the histogram depicts flights that arrived early, and which part depicts flights that arrived late?

The two leftmost bars depict flights that have arrived early, and the other bars to the right depict flights that arrived late.

A magazine advertisement claims that wearing a magnetized bracelet will reduce arthritis pain in those who suffer from arthritis. A medical researcher tests this claim with 233 arthritis sufferers randomly assigned to wear either a magnetized bracelet or a placebo bracelet. The researcher records the proportion of each group who report relief from arthritis pain after 6 weeks. After analyzing the data, he fails to reject the null hypothesis. What are valid interpretations of his findings?

There were no statistically significant differences between the magnetized bracelets and the placebos in reducing arthritis pain. There's insufficient evidence that the magnetized bracelets are effective at reducing arthritis pain.

Why are percentages or rates often better than counts for making comparisons?

They take into account possible differences among the sizes of the groups.

Which of the following is NOT true about statistical graphs? a. Similar graphs can be constructed in order to compare data sets. b. They utilize areas or volumes for data that are one-dimensional in nature. c. They can be used to consider the overall shape of the distribution. d. They can be used to identify extreme data values.

b. They utilize areas or volumes for data that are one-dimensional in nature. (Using 2- or 3-dimensional pictures to represent 1-dimensional data is poor practice and distorts the data.)

The interquartile range (IQR) is the difference between the _______ quartile and the _______ quartile.

Third, first

With a height of 70 ​in, Roger was the shortest president of a particular club in the past century. The club presidents of the past century have a mean height of 75.1 in and a standard deviation of 2.4 in. a. What is the positive difference between Roger​'s height and the​ mean? b. How many standard deviations is that​ [the difference found in part​ (a)]? c. Convert Roger​'s height to a z score. d. If we consider​ "usual" heights to be those that convert to z scores between −2 and​ 2, is Roger​'s height usual or​ unusual?

a. To find the positive difference between Roger's height and the mean, subtract the mean from Roger's height and take the absolute value of the difference: |70 in - 75.1 in| = 5.1 in. b. To determine how many standard deviations the difference is, divide the difference, 5.1, by the standard deviation, 2.4: 5.1 / 2.4 ≈ 2.13 standard deviations. c. A z score is the number of standard deviations that a given value x is above or below the mean. It is found using the following expressions: for a sample, z = (x - x̄) / s; for a population, z = (x - μ) / σ. The club is a population. Therefore, Roger's height converts to the z score (70 - 75.1) / 2.4 ≈ -2.13. d. Because the z score of -2.13 is less than -2, Roger's height is unusual.

Why is random assignment used to assign people to treatment groups and control groups in a controlled experiment?

To make the groups as similar as possible, minimizing bias.

True or false? A histogram and a relative frequency histogram, constructed from the same data, always have the same basic shape.

True. A relative frequency histogram will have a different scale on the y-axis but the same shape as a regular histogram.

Find the (a) mean, (b) median, (c) mode, and (d) midrange for the given sample data. An experiment was conducted to determine whether a deficiency of carbon dioxide in the soil affects the phenotype of peas. Listed below are the phenotype codes where 1 = smooth-yellow, 2 = smooth-green, 3 = wrinkled-yellow, and 4 = wrinkled-green. Do the results make sense? 1 1 4 4 1 4 1 1 3 2 2 2 1 2 (a) The mean phenotype code is -- (b) The median phenotype code is - (c) The mode phenotype code is - (d) The midrange of the phenotype codes is --. Do the measures of center make sense?

a. 2.1 b. 2 c. 1 d. 2.5 Only the mode makes sense since the data is nominal.

Listed below are the top 10 annual salaries (in millions of dollars) of TV personalities. Find the (a) mean, (b) median, (c) mode, and (d) midrange for the given sample data in millions of dollars. Given that these are the top 10 salaries, do we know anything about the salaries of TV personalities in general? Are such top 10 lists valuable for gaining insight into the larger population? 39, 35.5, 34.3, 27.5, 15, 12.4, 11.1, 9.8, 10.4, 7.6 a. The mean is --- b. The median is -- c. Select the correct choice below and fill in any answer boxes in your choice. d. The midrange is -- Given that these are the top 10 salaries, do we know anything about the salaries of TV personalities in general? Are such top 10 lists valuable for gaining insight into the larger population.

a. 20.26 b. 13.7 c. There is no mode. d. 23.3 Since the sample values are the 10 highest, they give almost no information about the salaries of TV personalities in general. No, because such top 10 lists represent an extreme subset of the population rather than the larger population.

Listed below are the durations (in hours) of a simple random sample of all flights of a space shuttle. Find the (a) mean, (b) median, (c) mode, and (d) midrange for the given sample data. Is there a duration time that is very unusual? How might that duration time be explained? 69 98 240 197 167 258 187 379 262 240 382 328 226 244 0 a. The mean is -- hours. b. The median is -- hours. c. The mode is --- hours. d. The midrange is --- hours. Is there a duration time that is very unusual? How might that duration time be explained?

a. 218.5 b. 240 c. 240 d. 191 Yes, the time of 0 hours is very unusual. It could represent a flight that was aborted.

The graph to the right compares teaching salaries of women and men at private colleges and universities. What impression does the graph​ create? Does the graph depict the data​ fairly? If​ not, construct a graph that depicts the data fairly. a.What impression does the graph​ create? b.Does the graph depict the data​ fairly?

a. The graph creates the impression that men have salaries that are more than twice the salaries of women. b. ​No, because the vertical scale does not start at zero.

Identify the symbols used for each of the following: (a) sample standard deviation; (b) population standard deviation; (c) sample variance; (d) population variance. a. The symbol for sample standard deviation is - b. The symbol for population standard deviation is - c. The symbol for sample variance is - d. The symbol for population variance is -

a. s b. σ (sigma) c. s^2 d. σ^2

Arithmetic mean

The sum of all the data values divided by the number of data values

mean

an average; arithmetic mean of a set of data = the measure of center found by adding the data values and dividing the total by the number of data values

the symbol for sample standard deviation is

s

The table below shows the frequency distribution of the rainfall on 52 consecutive Saturdays in a certain city. Use the frequency distribution to construct a histogram. Do the data appear to have a distribution that is approximately​ normal?

​No, it is not symmetric.

What measure of variation is very sensitive to extreme values?

The Range.

frequency

The _______ for a particular class is the number of original values that fall into that class.

Determine whether the data described below are qualitative or quantitative and explain why: The types of climates for different regions (tropical, arid, temperate, etc.)

The data are qualitative because they don't measure or count anything

Listed below are body temperatures ​(°​F) of healthy adults. Why is it that a graph of these data would not be very effective in helping us understand the​ data? 98.6 98.6 98.0 98.0 99.0 98.4 98.4 98.4 98.4 98.6

The data set is too small for a graph to reveal important characteristics of the data.

Range

The difference between the maximum data value and the minimum data value. Very sensitive to extreme values and isn't as useful as other measures of variation.

Class Width

The difference between two consecutive lower class limits or two consecutive lower class boundaries in a frequency distribution (example pg. 47).

Below are the range and standard deviation for a set of data. Use the range rule of thumb and compare it to the standard deviation listed below. Does the range rule of thumb produce an acceptable approximation? Suppose a researcher deems the approximation as acceptable if it has an error less than 15%. Range = 38 Standard deviation = 11.045

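The estimated standard deviation is 9.5 (take the range and divide by 4: 38 / 4 = 9.5). The error relative to the listed standard deviation is |9.5 - 11.045| / 11.045 ≈ 14%, which is less than 15%, so the approximation is acceptable.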
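A short sketch of the range rule of thumb check, assuming the error is measured relative to the listed standard deviation. The helper names `range_rule_estimate` and `percent_error` are made up for illustration.

```python
def range_rule_estimate(data_range):
    """Range rule of thumb: standard deviation is roughly range / 4."""
    return data_range / 4


def percent_error(estimate, actual):
    """Size of the error as a percentage of the actual value (assumed convention)."""
    return abs(estimate - actual) / actual * 100


estimate = range_rule_estimate(38)        # range from the question
error = percent_error(estimate, 11.045)   # listed standard deviation
acceptable = error < 15                   # researcher's 15% threshold

print(estimate, round(error, 1), acceptable)
```

With these inputs the error comes out just under the 15% threshold, so the estimate counts as acceptable under the stated criterion.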

A community college faculty is negotiating a new contract with the school board. The distribution of faculty salaries is skewed right by several faculty members who make over $100,000 per year. If the faculty want to give the community the impression that they deserve higher salaries, should they advertise the mean or median of their current salaries?

The faculty should use the median to make their argument. The median will be lower than the mean since the mean is influenced by the few extremely high salaries.

Suppose you are testing someone to see whether he or she can tell butter from margarine when it is spread on toast. You use many bite-sized pieces selected randomly, half from buttered toast and half from toast with margarine. The taster is blindfolded. The null hypothesis is that the taster is just guessing and should get about half right. When you reject the null hypothesis when it is actually true, that is often called the first kind of error. The second kind of error is when the null is false and you fail to reject. Report the first kind of error and the second kind of error.

The first kind of error is saying the person can tell butter from margarine when in fact he or she cannot. The second kind of error is saying the person cannot tell butter from margarine when in fact he or she can.

The frequency distribution (above) represents frequencies of actual low temperatures recorded during the course of a 31-day month. Use the frequency distribution histogram to determine if the distribution is approximately normal.

Yes, it is approximately normal. The bars in a histogram are always touching, and an (approximately) normal histogram is bell-shaped.

Look at the #46 charts and answer the questions: Applying a loose interpretation of the requirements for a normal distribution, does the data appear to be normally distributed? Why or why not?

The frequency polygon appears to roughly approximate a normal distribution because the frequencies increase to a maximum, then decrease, and the graph is roughly symmetric.

What is an ogive?

A graph that represents the cumulative frequency or cumulative relative frequency for the class

The ___ of a discrete random variable represents the mean value of the outcomes.

expected value

What is marginal change?

The marginal change in a variable is the amount that it changes when the other variable changes by exactly one unit.

The value that measures how much variation in the response variable is explained by the explanatory variable is called the _______.

Coefficient of determination

5-number summary

1. Minimum. 2. First quartile, Q1. 3. Second quartile, Q2 (same as the median). 4. Third quartile, Q3. 5. Maximum.

Standardizing a distribution has two steps:

1. The original raw scores are transformed to z-scores. 2. The z-scores are transformed to new X values so that the specified mean (μ) and standard deviation (σ) are attained.
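The two steps can be sketched as follows. This is an illustration under assumptions: the raw scores are hypothetical, the function name `restandardize` is made up, and the scores are treated as a population (so `pstdev` is used).

```python
from statistics import mean, pstdev


def restandardize(scores, new_mean, new_sd):
    """Step 1: convert raw scores to z-scores. Step 2: map z-scores to the new scale."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation (assumption)
    z_scores = [(x - mu) / sigma for x in scores]
    return [new_mean + z * new_sd for z in z_scores]


raw = [2, 4, 6, 8, 10]  # hypothetical raw scores
rescaled = restandardize(raw, new_mean=100, new_sd=15)

# The rescaled scores now have the requested mean and standard deviation.
print(round(mean(rescaled), 6), round(pstdev(rescaled), 6))
```

Each score keeps its relative position (its z-score) in the new distribution; only the scale changes.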

Find the mean for the given sample data. Unless otherwise specified, round your answer to one more decimal place than that used for the observations. The grocery expenses for six families were 55.72, 55.08, 76.11, 54.18, 63.56, 85.72 Compute the mean grocery bill. Round your answer to the nearest cent.

$65.06

Scores of an IQ test have a​ bell-shaped distribution with a mean of 100 and a standard deviation of 15. Use the empirical rule to determine the following. ​(a) What percentage of people has an IQ score between 85 and 115​? ​(b) What percentage of people has an IQ score less than 55 or greater than 145​? ​(c) What percentage of people has an IQ score greater than 130​?

(a) 68% (b) 0.3% (c) 2.5%
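As a rough check, the empirical-rule answers can be compared with exact normal-curve areas. This sketch assumes the IQ scores follow an exactly normal distribution, which the question only describes as bell-shaped; it uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

within_1sd = iq.cdf(115) - iq.cdf(85)          # between 85 and 115
beyond_3sd = iq.cdf(55) + (1 - iq.cdf(145))    # below 55 or above 145
above_2sd = 1 - iq.cdf(130)                    # above 130

for label, p in [("85 to 115", within_1sd),
                 ("<55 or >145", beyond_3sd),
                 (">130", above_2sd)]:
    print(f"{label}: {p:.2%}")
```

The exact areas land close to the rounded empirical-rule figures of 68%, 0.3%, and 2.5%; the rule is an approximation, so small differences are expected.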

What are three important properties of the Mean?

1. Sample means drawn from the same population tend to vary less than other measures of center. 2. The mean of a data set uses every data value. 3. A disadvantage of the mean is that just one outlier can change the value of the mean substantially.

Explain the meaning of the accompanying percentiles. ​(a) The 5th percentile of the head circumference of males 3 to 5 months of age in a certain city is 41.5 cm. ​(b) The 90th percentile of the waist circumference of females 2 years of age in a certain city is 49.8 cm. ​(c) Anthropometry involves the measurement of the human body. One goal of these measurements is to assess how body measurements may be changing over time. The following table represents the standing height of males aged 20 years or older for various age groups in a certain city in 2015. Based on the percentile measurements of the different age​ groups, what might you​ conclude?

(a) 5% of 3- to 5-month-old males have a head circumference that is 41.5 cm or less. (b) 90% of 2-year-old females have a waist circumference that is 49.8 cm or less. (c) At each percentile, the heights generally decrease as the age increases. Assuming that an adult male does not grow after age 20, the percentiles imply that adults born in 1990 are generally taller than adults at the same age who were born in 1950.

sample size

(n); the number of data values

z score

(or standardized value) the number of standard deviations that a given value x is above or below the mean. It converts a value to a standardized scale. Round-off to two decimal places.

3. When you need to find the P that is *greater* than a positive Z or a negative Z you will go to the:

*tail column*. Easy way to remember is it's the only one that doesn't include the mean.

Important Properties of the Midrange

- Because the midrange uses only the maximum and minimum values, it is very sensitive to those extremes. - In practice, the midrange is rarely used, but it has three redeeming features: 1.) The midrange is very easy to compute. 2.) The midrange helps reinforce the very important point that there are several different ways to define the center of a data set. 3.) The value of the midrange is sometimes used incorrectly for the median, so confusion can be reduced by clearly defining the midrange along with the median.

Important Properties of the Mean

- Sample means drawn from the same population tend to vary less than other measures of center. - The mean of a data set uses every data value. - A disadvantage of the mean is that just one extreme value (outlier) can change the value of the mean substantially. (Since the mean cannot resist substantial changes caused by extreme values, we say that the mean is not a resistant measure of center.)

IQ scores are measured with a test designed so that the mean is 93 and the standard deviation is 19. Consider the group of IQ scores that are unusual. What are the z scores that separate the unusual IQ scores from those that are usual? What are the IQ scores that separate the unusual IQ scores from those that are usual? (Consider a value to be unusual if its z score is less than -2 or greater than 2.) The lower z score boundary is? The higher z score boundary is? The lower bound IQ score is? The higher bound IQ score is?

-2, 2, 55, 131
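A small sketch of how the boundary scores follow from the z-score cutoffs: each boundary is the mean plus or minus two standard deviations. The function name `usual_range` is hypothetical.

```python
def usual_range(mean, sd, z_cutoff=2):
    """Values whose z scores fall between -z_cutoff and +z_cutoff."""
    return mean - z_cutoff * sd, mean + z_cutoff * sd


low, high = usual_range(mean=93, sd=19)
print(low, high)  # prints: 55 131
```

Any score below the lower bound or above the upper bound has a z score outside -2 to 2 and is treated as unusual under this convention.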

Five-Number Summary

Five numbers used to summarize a data set: 1. SDV = minimum = xmin 2. Lower quartile = QL = Q1 = P25 3. Middle quartile = median = M = Q2 = P50 4. Upper quartile = QU = Q3 = P75 5. LDV = maximum = xmax

Section 2.3 Homework

...

Section 2.4 Homework

...

z-scores

A z-score represents the distance that a data value is from the mean in terms of the number of standard deviations. We find it by subtracting the mean from the data value and dividing this result by the standard deviation.

A frequency table of grades has five classes​ (A, B,​ C, D,​ F) with frequencies of 3​, 10, 14​, 8​, and 2 respectively. What are the relative frequencies of the five​ classes?

.08 .27 .38 .22 .05
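The relative frequencies above can be reproduced with a few lines: each class frequency is divided by the total count. The dictionary name `grades` is illustrative.

```python
# Class frequencies from the question.
grades = {"A": 3, "B": 10, "C": 14, "D": 8, "F": 2}

total = sum(grades.values())
relative = {grade: round(count / total, 2) for grade, count in grades.items()}

print(total)     # prints: 37
print(relative)
```

Relative frequencies should add up to 1 (or 100%), apart from small rounding differences.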

In a recent year the magnitudes (Richter scale) of 10,594 earthquakes were recorded. The mean is 1.315 and the standard deviation is 0.589. Consider the magnitudes that are unusual. What are the magnitudes that separate the unusual magnitudes from those that are usual? (Consider a value to be unusual if its z score is less than -2 or greater than 2.) The lower bound earthquake magnitude is? The higher bound earthquake magnitude is?

.137 2.493

Characteristics of mean

1. The mean is relatively reliable. 2. The mean takes every data value into account. 3. The mean is sensitive to outliers.

3 Properties of Standard Scores

1. The mean of a set of z-scores is always 0. 2. The standard deviation of a set of standardized scores is always 1. 3. The distribution of a set of standardized scores has the same shape as the original scores; only the scaling is different.

Listed below are the amounts of mercury (in parts per million, or ppm) found in tuna sushi sampled at different stores. Find the range, variance, and standard deviation for the set of data. What would be the values of the measures of variation if the tuna sushi contained no mercury? 0.93 0.38 0.87 0.59 0.68 0.15 0.41 The range of the sample data is -- ppm. Sample variance = --- ppm^2 Sample standard deviation = ---- ppm What would be the values of the measures of variation if the tuna sushi contained no mercury?

.78 .078 .280 The measures of variation would all be 0.
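A minimal sketch of the three measures of variation for the mercury data, using the sample (n - 1) formulas from Python's `statistics` module. Variable names are illustrative.

```python
from statistics import stdev, variance

mercury = [0.93, 0.38, 0.87, 0.59, 0.68, 0.15, 0.41]  # ppm, from the question

data_range = max(mercury) - min(mercury)
sample_var = variance(mercury)  # sample variance, divides by n - 1
sample_sd = stdev(mercury)      # sample standard deviation

print(round(data_range, 2), round(sample_var, 3), round(sample_sd, 3))

# If every sample contained no mercury, all values would be 0 and nothing would vary.
no_mercury = [0.0] * len(mercury)
print(max(no_mercury) - min(no_mercury), variance(no_mercury), stdev(no_mercury))
```

The second print illustrates the last part of the answer: identical values give a range, variance, and standard deviation of 0.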

The sum of the deviations about the mean always equals

0, because the deviations of observations greater than the mean exactly offset the deviations of observations less than the mean (any small nonzero result comes only from rounding).

The data represents the daily rainfall (in inches) for one month. Construct a frequency distribution beginning with a lower class limit of 0.00 and use a class width of 0.20. Does the frequency distribution appear to be roughly a normal distribution? 0.39 0 0 0.28 0 0.56 0 0.18 0 0 1.36 0 0.16 0 0.01 0 0.16 0 0.11 0.42 0 0.01 0 0.27 0 0.11 0 0 0.15 0 Find the frequencies for daily rainfall in ranges: 0.00-0.19 0.20-0.39 0.40-0.59 0.60-0.79 0.80-0.99 1.00-1.19 1.20-1.39 Does the frequency distribution appear to be roughly a normal distribution?

0.00-0.19: 24; 0.20-0.39: 3; 0.40-0.59: 2; 0.60-0.79: 0; 0.80-0.99: 0; 1.00-1.19: 0; 1.20-1.39: 1. No, the distribution is not symmetric, and the frequencies do not start off low.

John sets up a one sample z-test for proportions with a significance level of 0.05. He then performs the test and rejects the null hypothesis. The probability he correctly rejected the null hypothesis is 0.80. What is the probability of a Type I Error occurring? a. 0.05 b. 0.80 c. 0.20 d. A Type I Error cannot occur when the null hypothesis is rejected.

0.05

Which of the following values of the correlation coefficient indicates the weakest linear relationship between two variables?

0.1 (a value of 0 indicates no linear correlation; a value of 1 indicates perfect positive linear correlation)

Compute the coefficient of determination. Round your answer to four decimal places. A regression equation is obtained for a set of data points. It is found that the total sum of squares is 26.961, the regression sum of squares is 15.044, and the error sum of squares is 11.917.

0.5580 (r^2: coefficient of determination = SSR/SST) (Total sum of squares, SST: the total variation in the observed values of the response variable) (error sum of squares, SSE: the variation in the observed values of the response variable not explained by the regression) (Regression sum of squares, SSR: the variation in the observed values of the response variable explained by the regression)
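The calculation can be sketched directly from the sums of squares in the question. The variable names are illustrative; the identity SST = SSR + SSE is used as a sanity check.

```python
sst = 26.961  # total sum of squares
ssr = 15.044  # regression sum of squares (explained variation)
sse = 11.917  # error sum of squares (unexplained variation)

# The explained and unexplained parts should add up to the total.
assert abs((ssr + sse) - sst) < 1e-9

r_squared = ssr / sst
print(f"{r_squared:.4f}")  # prints: 0.5580
```

Equivalently, r^2 = 1 - SSE/SST gives the same value, since SSR = SST - SSE.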

Eric randomly surveyed 150 adults from a certain city and asked which team in a contest they were rooting for, either North High School or South High School. From the results of his survey, Eric obtained a 95% confidence interval of (0.52,0.68) for the proportion of all adults in the city rooting for North High. What proportion of the 150 adults in the survey said they were rooting for North High School?

0.60

Finding quartiles

1) Arrange the data in ascending order 2) Determine the median, M, or second quartile, Q2. 3) Determine the first and third quartiles, Q1 and Q3, by dividing the data set into two halves; the bottom half will be the observations below (to the left of) the location of the median. The first quartile is the median of the bottom half and the third quartile is the median of the top half.

Steps for determining a box plot

1) Determine the lower and upper fences: Lower fence = Q1 - 1.5(IQR), Upper fence = Q3 + 1.5(IQR). 2) Draw vertical lines at Q1, M, and Q3. Enclose these lines in a box. 3) Label the lower and upper fences. 4) Draw a line from Q1 to the smallest data value that is larger than the lower fence. Draw a line from Q3 to the largest data value that is smaller than the upper fence. 5) Any data values that are outliers (less than the lower fence or greater than the upper fence) get marked with an asterisk (*).
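The fence and whisker steps above can be sketched as follows. The data set is hypothetical, its quartiles (Q1 = 10, Q3 = 20) come from the median-of-halves method, and the function names are made up for illustration.

```python
def fences(q1, q3):
    """Lower and upper fences: 1.5 IQRs beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def split_outliers(data, q1, q3):
    """Separate values inside the fences from outliers beyond them."""
    low, high = fences(q1, q3)
    inside = [x for x in data if low <= x <= high]
    outliers = [x for x in data if x < low or x > high]
    return inside, outliers


data = [5, 10, 12, 15, 18, 20, 60]  # hypothetical data: Q1 = 10, M = 15, Q3 = 20
inside, outliers = split_outliers(data, q1=10, q3=20)

print(fences(10, 20))            # prints: (-5.0, 35.0)
print(min(inside), max(inside))  # whisker ends; prints: 5 20
print(outliers)                  # prints: [60]
```

The whiskers stop at the most extreme values still inside the fences, and anything beyond a fence is plotted separately as an outlier.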

Normal Distribution Characteristics

1) The frequencies start low, then increase to one or two high frequencies, then decrease to a low frequency. 2) The distribution is approximately symmetric, with frequencies preceding the maximum being roughly a mirror image of those that follow the maximum (example pg. 50).

Name procedures you could follow to obtain a simple random sample of 5 students?

1) List each name on a separate piece of paper, place them all in a hat, and pick five. 2) Number the names from 1 to 427 and use a random number table to produce 5 different three-digit numbers corresponding to the names selected.

How to draw a B&W Plot

1. Determine the five-number summary (xmin, QL, M, QU, xmax). 2. Determine the outliers using the quartiles method. 3. Determine the adjacent values: S = smallest data value that is larger than the LIF; L = largest data value that is smaller than the UIF (S will be less than QL, and L will be larger than QU). 4. Draw a horizontal number line and mark QL, M, QU, S, and L. 5. Draw vertical lines at QL, M, and QU, and enclose these lines in a box. 6. Connect QL to S and QU to L with whiskers. 7. Plot outliers: MO with * and EO with o. If the data set does not have outliers (simple b&w plot): S = xmin (smallest data value) and L = xmax (largest data value).

Heights of statistics students were obtained by a teacher as part of an experiment conducted for the class. The last digits of those heights are listed below. Construct a frequency distribution with 10 classes. 0 0 0 0 0 0 0 0 0 1 2 2 3 4 4 4 5 5 5 5 5 5 5 5 5 6 6 8 8 8 9 1. Find the frequency of each last digit (0 through 9). 2. Based on the distribution, do the heights appear to be reported or actually measured? 3. What can be said about the accuracy of the results?

1. Frequency: 9 1 2 1 3 9 2 0 3 1 2. The heights appear to be reported because there are disproportionately more 0s and 5s. 3. They are likely not very accurate because they appear to be reported.
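A short sketch of the last-digit tally using `collections.Counter`, with the digits copied from the question. The variable names are illustrative.

```python
from collections import Counter

last_digits = [int(d) for d in
               "0 0 0 0 0 0 0 0 0 1 2 2 3 4 4 4 5 5 5 5 5 5 5 5 5 6 6 8 8 8 9".split()]

counts = Counter(last_digits)
frequencies = [counts[digit] for digit in range(10)]  # one class per digit 0-9

print(frequencies)  # prints: [9, 1, 2, 1, 3, 9, 2, 0, 3, 1]
```

Measured heights would be expected to end in each digit about equally often, so the spikes at 0 and 5 are what suggests rounding by the people reporting.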

Properties of z Scores

1. The number of standard deviations that a given value x is above or below the mean. 2. expressed as numbers with no units of measurement. 3. a data value is unusual if its z score is less than -2 or greater than +2. 4. If an individual data value is less than the mean, its z score is a negative number.

In a boxplot, potential outliers are points that are more than ___ IQRs from the edges of the box.

1.5

Use the magnitude (Richter Scale) of the earthquakes listed in the data set below. Find the mean and median of this data set. Is the magnitude of an earthquake measuring 7.0 on the Richter scale an outlier (data value that is very far away from the others) when considered in the context of the sample data given in this data set? Explain. 0.73 2.49 1.03 0.36 2.34 2.32 2.97 1.34 1.12 2.16 1.69 2.87 0.02 1.17 0.23 0.87 2.31 0.89 2.39 2.58 2.99 1.36 2.31 2.12 1.11 1.96 0.11 0.17 2.15 2.42 0.89 0.43 1.54 2.37 0.13 0.66 2.86 1.77 0.55 2.32 1.38 2.05 1.53 0.55 1.89 0.55 2.85 2.98 2.19 0.15 Find the mean and median of the data set using a calculator or similar data analysis technology. The mean of the data set is ----- The median of the data set is -------- Is the magnitude of an earthquake measuring 7.0 on the Richter scale an outlier when considered in the context of the sample data given?

1.564 1.615 Yes, because this value is very far away from all of the other data values.

Find the sample standard deviation for the given data. Round your final answer to one more decimal place than that used for the observations. The manager of a small dry cleaner employs six people. As part of their personnel file, she asked each one to record to the nearest one-tenth of a mile the distance they travel one way from home to work. The six distances are listed below. 24.6 14.1 39.9 48.0 18.5 17.1

13.78 mi

Six different second-year medical students at Bellevue Hospital measured the blood pressure of the same person. The systolic readings (in mmHg) are listed below. Find the range, variance, and standard deviation for the given sample data. If the subject's blood pressure remains constant and the medical students correctly apply the same measurement technique, what should be the value of the standard deviation? Range= - mmHg Sample variance= --- mmHg^2 Sample standard deviation= --- mmHg What should be the value of the standard deviation?

15 34.4 5.9 Ideally, the standard deviation would be zero because all the measurements should be the same.

The data represents the body mass index (BMI) values for 20 females. Construct a frequency distribution beginning with a lower class limit of 15.0 and use a class width of 6.0. Does the frequency distribution appear to be roughly a normal distribution? 17.7 33.5 26.9 22.5 24.9 28.9 22.8 18.3 27.8 22.6 19.2 22.4 21.2 37.7 40.4 27.7 44.9 30.3 29.1 21.7 Find the frequency for body mass indexes between: 15.0-20.9 21.0-26.9 27.0-32.9 33.0-38.9 39.0-44.9 Does the frequency distribution appear to be roughly a normal distribution?

15.0-20.9: 3; 21.0-26.9: 8; 27.0-32.9: 5; 33.0-38.9: 2; 39.0-44.9: 2. No, although the frequencies start low, increase to some maximum, then decrease, the distribution is not symmetric.

Twenty percent of adults in a particular community have at least a bachelor's degree. Suppose x is a binomial random variable that counts the number of adults with at least a bachelor's degree in a random sample of 100 adults from the community. If you are using the binomial probability formula, which of the following is the most efficient way to calculate the probability that fewer than 98 adults have a bachelor's degree, P(x<98)?

1−P(x=98)−P(x=99)-P(x=100)
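A quick sketch showing why the complement is the efficient route: it needs three terms of the binomial formula instead of ninety-eight. The helper `binom_pmf` is written here for illustration using `math.comb` (Python 3.8+).

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability formula: P(x = k) for n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


n, p = 100, 0.20  # sample size and proportion from the question

# Efficient route: subtract the three excluded outcomes from 1.
via_complement = 1 - sum(binom_pmf(k, n, p) for k in (98, 99, 100))

# Long route: add up all 98 outcomes from x = 0 through x = 97.
direct = sum(binom_pmf(k, n, p) for k in range(98))

print(abs(via_complement - direct) < 1e-9)  # prints: True
```

Both routes agree, but the complement version evaluates the formula only three times.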

A nurse measured the blood pressure of each person who visited her clinic. Following is a relative-frequency histogram for the systolic blood pressure readings for those people aged between 25 and 40. Use the histogram to answer the question. The blood pressure readings were given to the nearest whole number. Given that 300 people were aged between 25 and 40, approximately how many had a systolic blood pressure reading between 140 and 149 inclusive?

24 (.08*300)

Fuel consumption is commonly measured in miles per gallon (mi/gal). An agency designed new fuel consumption tests to be used starting with 2008 car models. Listed below are randomly selected amounts by which the measured MPG ratings decreased because of the new 2008 standards. Find the range, variance, and standard deviation for the sample data. Is the decrease of 4 mi/gal unusual? Why or why not? 2 2 3 1 4 2 4 1 2 2 2 2 1 2 2 2 2 2 2 2 The range of the sample data is - mi/gal. The variance of the sample data is -- The standard deviation of the sample data is --- mi/gal. Is the largest decrease, 4 mi/gal, unusual? Why or why not?

3 .6 .8 The decrease of 4 mi/gal is unusual because it is more than two standard deviations from the mean.

Look at #4 chart and answer the questions: What is the class width? What are the class midpoints? What are the class boundaries?

Class width: 3. Class midpoints: 6.45, 9.45, 12.45, 15.45, 18.45. Class boundaries: 4.95, 7.95, 10.95, 13.95, 16.95, 19.95.

Using the information in the table on home sale prices in the city of Summerhill for the month of June, determine the width of each class. Class limits(sale price in thousands of $): Frequency(# homes sold) 80.0-110.9: 2 111.0-141.9: 5 142.0-172.9: 7 173.0-203.9: 10 204.0-234.9: 3 235.0-265.9: 1

31

Given the following frequency distribution, how many data values were more than 28.5? Class boundaries: Frequency 13.5-18.5: 4 18.5-23.5: 9 23.5-28.5: 12 28.5-33.5: 15 33.5-38.5: 17

32

Listed below are the numbers of manatee deaths caused each year by collisions with watercraft. The data are listed in order for each year of the past decade. Find the range, variance, and standard deviation of the data set. What important feature of the data is not revealed through the different measures of variation? 90 73 83 96 73 87 75 66 98 66 The range of the sample data is -- deaths. The variance of the sample data is --- deaths^2. The standard deviation of the sample data is --- deaths. What important feature of the data is not revealed through the different measures of variation?

32 138.7 11.8 The measures of variation reveal nothing about the pattern over time.

Listed below are the arrival delay times (in minutes) of randomly selected airplane flights from one airport to another. Negative values correspond to flights that arrived early, before the scheduled arrival time, and positive values represent lengths of delays. Find the range, variance, and standard deviation for the set of data. Some of the sample values are negative, but can the standard deviation ever be negative? -14 -10 5 4 -32 -11 -5 The range of the sample data is -- minutes. The variance of the sample data is ---- minutes^2 The standard deviation of the sample data is --- minutes. Some of the sample values are negative, but can the standard deviation ever be negative?

37 156.7 12.5 No, because the squared value in the standard deviation formula cannot be negative.

Listed below are the durations (in hours) of a simple random sample of all flights of a space shuttle program. Find the range, variance, and standard deviation for the sample data. Is the lowest duration time unusual? Why or why not? 80 96 234 198 164 270 199 370 259 230 380 335 225 247 0 The range of the sample data is -- hours. The variance of the sample data is ----. The standard deviation of the sample data is ---- hours. Is the lowest duration time unusual? Why or why not?

380 10987.3 104.8 Yes, because it is more than two standard deviations below the mean.

Find the mean of the data summarized in the given frequency distribution. Compare the computed mean to the actual mean of 50.8 miles per hour. Speed: 42-45 46-49 50-53 54-57 58-61 Frequency: 29 12 7 4 2 The mean of the frequency distribution is --- miles per hour. Which of the following best describes the relationship between the computed mean and the actual mean?

46.9 miles per hour. The computed mean is not close to the actual mean because the difference between the means is more than 5%.
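
The mean of a frequency distribution is found by treating every value in a class as if it equals the class midpoint. A minimal sketch of that calculation for the speed data, with illustrative names:

```python
# Class limits and frequencies from the card.
classes = [(42, 45), (46, 49), (50, 53), (54, 57), (58, 61)]
frequencies = [29, 12, 7, 4, 2]

# Each class is represented by its midpoint.
midpoints = [(low + high) / 2 for low, high in classes]

grouped_mean = (
    sum(f * x for f, x in zip(frequencies, midpoints)) / sum(frequencies)
)

actual_mean = 50.8
percent_difference = abs(grouped_mean - actual_mean) / actual_mean * 100

print(round(grouped_mean, 1), percent_difference > 5)
```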

A company advertises a mean lifespan of 1000 hours for a particular type of light bulb. If you were in charge of quality control at the​ factory, would you prefer that the standard deviation of the lifespans for the light bulbs be 5 hours or 50​ hours? Why?

5 hours would be preferable since a smaller standard deviation indicates more consistency.

The following are amounts of time (minutes) spent on hygiene and grooming in the morning by survey respondents. Determine the 5-number summary and construct a boxplot for the data given below. 5 6 7 9 10 10 11 15 19 19 21 23 35 39 43 46 57 64 The 5-number summary is - - - - - Make a boxplot on the calculator

5, 10, 19, 39, 64
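
A sketch of one way to get the 5-number summary in code. It assumes the "median of each half" convention for quartiles (the overall median is left out of both halves when the count is odd); calculators and software packages may use slightly different quartile rules.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Skip the middle value when n is odd.
    upper_half = data[half:] if n % 2 == 0 else data[half + 1:]
    return (
        data[0],
        statistics.median(lower_half),
        statistics.median(data),
        statistics.median(upper_half),
        data[-1],
    )

grooming_minutes = [5, 6, 7, 9, 10, 10, 11, 15, 19, 19,
                    21, 23, 35, 39, 43, 46, 57, 64]
summary = five_number_summary(grooming_minutes)
print(summary)
```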

Find the mean of the data summarized in the given frequency distribution. Compare the computed mean to the actual mean of 51.4 degrees. Low Temp: 40-44 45-49 50-54 55-59 60-64 Frequency: 1 6 13 4 1 The mean of the frequency distribution is --- degrees. Which of the following best describes the relationship between the computed mean and the actual mean?

51.6 degrees. The computed mean is close to the actual mean because the difference between the means is less than 5%.

Find the third quartile Q3 of the list of 24 sorted values shown below. 30 32 35 36 37 38 43 47 47 47 48 49 52 55 55 59 59 59 61 64 69 71 75 78 The third quartile Q3 is

60
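
The answer follows from the locator rule L = (k/100) * n. Here L = 0.75 * 24 = 18 is a whole number, so Q3 is the average of the 18th and 19th sorted values. The sketch below assumes that textbook rule and an integer k from 1 to 99; the function name is illustrative.

```python
import math

def percentile_value(sorted_values, k):
    """Value of the k-th percentile using the locator L = (k / 100) * n."""
    n = len(sorted_values)
    if (k * n) % 100 == 0:
        # L is a whole number: average the L-th and (L + 1)-th values.
        position = k * n // 100
        return (sorted_values[position - 1] + sorted_values[position]) / 2
    # Otherwise round L up and take the value at that position.
    position = math.ceil(k * n / 100)
    return sorted_values[position - 1]

values = [30, 32, 35, 36, 37, 38, 43, 47, 47, 47, 48, 49,
          52, 55, 55, 59, 59, 59, 61, 64, 69, 71, 75, 78]
q3 = percentile_value(values, 75)
print(q3)  # 60.0
```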

Use the regression equation to predict the y-value corresponding to the given x-value. Round your answer to the nearest tenth. The regression equation relating attitude rating (x) and job performance rating (y) for ten randomly selected employees of a company is y hat = 11.7+1.02x. Predict the job performance rating for an employee whose attitude rating is 67.

80.0
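
The prediction is a direct substitution into the regression equation: 11.7 + 1.02(67) = 80.04, which rounds to 80.0. As a small sketch (the function name is illustrative):

```python
def predict_job_performance(attitude_rating):
    """Predicted job performance rating from the card's regression line."""
    return 11.7 + 1.02 * attitude_rating

prediction = round(predict_job_performance(67), 1)
print(prediction)  # 80.0
```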

Below are the range and standard deviation for a set of data. Use the range rule of thumb and compare it to the standard deviation listed below. Does the range rule of thumb produce an acceptable approximation? Suppose a researcher deems the approximation as acceptable if it has an error less than 15%. Range= 39 Standard Deviation= 10.949 The estimated standard deviation is ----- Is this an acceptable approximation?

9.750. Yes, because the error of the range rule of thumb's approximation is less than 15%.
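
The range rule of thumb estimates the standard deviation as range / 4. A short sketch of the estimate and the 15% error check from the card, with illustrative variable names:

```python
data_range = 39
actual_sd = 10.949

# Range rule of thumb: s is roughly range / 4.
estimated_sd = data_range / 4

relative_error = abs(estimated_sd - actual_sd) / actual_sd
acceptable = relative_error < 0.15  # the researcher's 15% threshold

print(estimated_sd, round(relative_error * 100, 1), acceptable)
```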

Heights of men on a basketball team have a bell-shaped distribution with a mean of 176 cm and a standard deviation of 5 cm. Using the empirical rule, what is the approximate percentage of the men between the following values? a. 161 cm and 191 cm b. 171 cm and 181 cm a. ---% of the men are between 161 and 191 cm. b. ----% of the men are between 171 cm and 181 cm.

a. 99.7% b. 68%
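
Both parts come from counting how many standard deviations each endpoint is from the mean: 161 and 191 are 3 standard deviations away, 171 and 181 are 1. A minimal sketch of that lookup, assuming a symmetric interval around the mean; the names are illustrative.

```python
# Empirical rule percentages for 1, 2, and 3 standard deviations.
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_percentage(mean, sd, low, high):
    """Approximate percent of bell-shaped data between low and high."""
    k_low = round((mean - low) / sd, 9)
    k_high = round((high - mean) / sd, 9)
    if k_low != k_high or k_low not in EMPIRICAL_RULE:
        raise ValueError("interval must be mean +/- 1, 2, or 3 standard deviations")
    return EMPIRICAL_RULE[k_low]

part_a = empirical_percentage(176, 5, 161, 191)
part_b = empirical_percentage(176, 5, 171, 181)
print(part_a, part_b)  # 99.7 68.0
```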

Explain the difference between a bar graph and a Pareto chart.

A Pareto chart is a particular type of bar graph in which the bars are drawn in decreasing order of height.

A manufacturer of bolts has a​ quality-control policy that requires it to destroy any bolts that are more than 2 standard deviations from the mean. The​ quality-control engineer knows that the bolts coming off the assembly line have mean length of 8 cm with a standard deviation of 0.05 cm. For what lengths will a bolt be​ destroyed?

A bolt will be destroyed if the length is less than 7.9 cm or greater than 8.1 cm.

A difference between two groups in an observational study that can explain why the outcomes were very different between the groups is called what?

A confounding variable

An indication of no linear relationship between two variables would be:

A correlation coefficient equal to 0

​Yes, as the weight increases the highway mileage decreases.

A given data table lists weights​ (pounds) and highway mileage amounts​ (mpg) for seven automobiles, and has been formatted into a scatterplot (above). Is there a linear relationship between weight and highway​ mileage?

Histogram

A histogram is a bar graph in which the horizontal scale represents classes of data values and the vertical scale represents frequencies. The heights of the bars correspond to the frequency values, and the bars are drawn adjacent to each other (without gaps). A histogram shows the shape of the distribution, the location of the center, and the spread of the data, and it helps identify outliers. Class boundaries or class midpoints are used to mark the bars on the horizontal scale.

B. The sample is a voluntary response sample, so there is a good chance that the results do not reflect the population.

A magazine ran a survey about a web site for downloading music. Readers could register their responses on the​ magazine's web site. Identify what is wrong. Choose the correct answer below: A. The magazine has an interest in the survey​ results, so the source of the survey is questionable. B. The sample is a voluntary response​ sample, so there is a good chance that the results do not reflect the population. C. The sample is a​ census, so there is a good chance that the results do not reflect the population. D. It is likely that the survey used a loaded​ question, so the results of the survey are not reliable.

A. The sample is a voluntary response​ sample, so there is a good chance that the results do not reflect the population.

A magazine ran a survey about a web site for downloading music. Readers could register their responses on the​ magazine's web site. Choose the correct answer below. A. The sample is a voluntary response​ sample, so there is a good chance that the results do not reflect the population. B. It is likely that the survey used a loaded​ question, so the results of the survey are not reliable. C. The magazine has an interest in the survey​ results, so the source of the survey is questionable. D. The sample is a​ census, so there is a good chance that the results do not reflect the population.

How do a parameter and a statistic differ?

A parameter is a numerical measurement of a population; a statistic is a numerical measurement of a sample

What is a placebo and what purpose does it serve in an experiment?

A placebo is a fake treatment that looks like the treatment being tested in the experiment. Placebos blind subjects so they do not know whether or not they are receiving the treatment.

If foreign investment fell by 100%, it would be totally eliminated. It is not possible for it to fall by more than 100%.

A report about the decline of Western investment in third world countries included this: "After years of daily flights, several European airlines halted passenger service. Foreign investment fell 300 percent during the 1990s." What is wrong with this​ statement?

Systematic Sampling

A researcher selects every 732nd social security number and surveys the corresponding person. Which type of sampling did the researcher use?

What is a voluntary response sample?

A sample in which the subjects themselves decide whether to be included in the study

Fill in the blank. A​ _______ is a plot of paired data​ (x,y) and is helpful in determining whether there is a relationship between the two variables.

A scatterplot is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.

What is a scatterplot and how does it help​ us?

A scatterplot is a graph of paired​ (x, y) quantitative data. It provides a visual image of the data plotted as​ points, which helps show any patterns in the data.

What is a scatterplot? What type of data is required for a scatterplot? What characteristic of the data can be better understood by looking at a scatterplot?

A scatterplot is a plot of paired quantitative data, and each pair of data is plotted as a single point. A scatterplot requires paired quantitative data. The configuration of the plotted points can help us determine whether there is some relationship between the two variables.

The Pareto chart is more effective because it displays the information in descending order.

A study was conducted to determine how people get jobs. The table below lists data from 400 randomly selected subjects. Compare the pie chart to the Pareto chart given on the left. Can you determine which graph is more effective in showing the relative importance of job​ sources?

pie chart

a graph that depicts qualitative data as slices of a circle in which the size of each slice is proportional to the frequency count for the category

Measure of center

A value at the center or middle of a data set is​ a(n) _______

Relative Frequency Distribution

A variation of the basic frequency distribution. In a relative frequency distribution, the frequency of a class is replaced with a relative frequency (a proportion) or a percentage frequency (a percent). The sum of the relative frequencies in a relative frequency distribution must be close to 1 (or 100%). *NOTE: when percentage frequencies are used, the relative frequency distribution is sometimes called a percentage frequency distribution. (example pg.49)

A bar chart and a Pareto chart both use bars to show frequencies of categories of categorical data. What characteristic distinguishes a Pareto chart from a bar​ chart, and how does that characteristic help us in understanding the​ data? A bar chart uses bars of equal width to show frequencies of categorical data. The vertical scale represents frequencies or relative frequencies. The horizontal scale identifies the different categories of qualitative data. When one wants a bar chart to draw attention to the more important​ categories, one can use a Pareto​ chart, which is a bar chart for categorical​ data, with the added stipulation that the bars are arranged in descending order according to frequencies. The bars decrease in height from left to right.

A. In a Pareto​ chart, the bars are always arranged in descending order according to frequencies. The Pareto chart helps us understand data by drawing attention to the more important​ categories, which have the highest frequencies.

The table below shows the frequency distribution of red blood cell counts in 81 males. Red_blood_cell_count Frequency 3.00-3.49 1 3.50-3.99 6 4.00-4.49 11 4.50-4.99 19 5.00-5.49 20 5.50-5.99 15 6.00-6.49 9 6.50-6.99 3 Use the frequency distribution to construct a histogram. Using a loose interpretation of the requirements for a normal​ distribution, does the histogram appear to depict data that have a normal​ distribution? Why or why​ not?

A. The histogram appears to roughly approximate a normal distribution. The frequencies generally increase to a maximum and then​ decrease, and the histogram is symmetric.

Which of the following is NOT a characteristic of the​ mean?

A. The mean is sensitive to outliers. B. The mean is relatively reliable. C. The mean takes every data value into account. D. The mean is called the average by statisticians.<--correct answer

Which of the following is NOT true about statistical​ graphs?

A. They utilize areas or volumes for data that are​ one-dimensional in nature.<-- Correct answer B. They can be used to identify extreme data values. C. Similar graphs can be constructed in order to compare data sets. D. They can be used to consider the overall shape of the distribution.

Which of the following is NOT a principle of probability? a. All events are equally likely in any probability procedure. b. The probability of any event is between 0 and 1 inclusive. c. The probability of an impossible event is 0. d. The probability of an event that is certain to occur is 1.

All events are equally likely in any probability procedure.

Frequency Distribution

A table that lists each category (class) of data along with the number of data values (frequency) in it.

What can be said about a set of data with a standard deviation of​ 0?

All the observations are the same value.

According to the Empirical Rule, ________ will be within two standard deviations of the mean.

Approximately 95% of the observations

In the 2008 presidential election, 55% of the voters voted for a certain candidate. What is the probability that 75 out of 100 independently chosen voters voted for this candidate?

B(n,p,x) = b(100, .55, 75)
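
The notation b(100, .55, 75) stands for the binomial probability of exactly x = 75 successes in n = 100 independent trials with success probability p = 0.55. A minimal sketch of the formula C(n, x) * p^x * (1 - p)^(n - x) using only the standard library; the function name is illustrative.

```python
from math import comb

def binomial_pmf(n, p, x):
    """P(X = x) for a binomial random variable with n trials, success prob p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

probability = binomial_pmf(100, 0.55, 75)
print(probability)
```

Because 75 is about four standard deviations above the expected count of 55, the probability is very small.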

INTERSECTION

WHAT BOTH SETS HAVE IN COMMON IS THE _____

What are two commonly used graphs to display the distribution of a sample of categorical data?

Bar graph and pie chart

Why, in a frequency distribution, do we use the class midpoint when calculating mean?

Because we don't know the exact values that fall into a particular class, we pretend that all values are equal to the class midpoint.

Which of the accompanying boxplots likely has the data with the larger standard​ deviation? Why?

Boxplot II likely has the data with the larger standard deviation because the boxplot appears to have a greater​ spread, which likely results in a larger standard deviation.

Look at #50 chart and answer the questions: In what way might the graph be deceptive? How much greater is the braking distance of Car A than the braking distance of Car C?

By starting the horizontal axis at 100, the graph cut off portions of the bars. The braking distance of Car A is about 30% greater than the braking distance of Car C.

Identify which type of sampling is used: random, systematic, convenience, stratified, or cluster. To determine customer opinion of their check-in service, American Airlines randomly selects 60 flights during a certain week and surveys all passengers on the flights.

Cluster

Which car would a customer buy based on standard deviation, range, mean, and median?

Car 2, because it has a lower sample standard deviation, hence more predictable gas mileage

The histogram to the right represents the weights (in pounds) of members of a certain high-school math team. What is the class width? What are the approximate lower and upper class limits of the first class? Class width = (max value - min value) / number of classes

Class width is the difference between two consecutive lower class limits (or two consecutive lower class boundaries) in a frequency distribution. The lower (and upper) class limits are the smallest (and largest) numbers that can belong to the different classes. The first lower class limit is approximately 90, and the second lower class limit is approximately 120. Determine the distance between them: 120 - 90 = 30. Therefore, the class width is 30. The approximate lower class limit of the first class is the first approximate lower class limit found above (approximately 90). The upper class limit of the first class is approximately equal to the second lower class limit, 120. Therefore, the approximate lower and upper class limits of the first class are 90 and 120, respectively.

Identify the variable as either continuous or discrete: The number of freshmen entering a randomly selected college in a certain year.

Discrete

Identify which type of sampling is being used: To avoid working late, a quality control analyst simply inspects the first 100 items produced in a day.

Convenience

Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "A researcher interviews 19 work colleagues who work in his building."

Convenience

Which of the following is NOT one of three common errors involving correlation? - Correlation does not imply causality. - The conclusion that correlation implies causality. - The use of data based on averages. - Mistaking no linear correlation with no correlation.

Correlation does not imply causality

The probability of event B​ occurring, given that event A has already occurred.

DESCRIBE WHAT P(B|A) MEANS.

Standard Deviation

Denoted by s, is a measure of how much data values deviate away from the mean. Most common measure of variation in statistics. Usually positive, zero only when all the data values are the same number. Never negative. Larger values of s indicate greater amounts of variation. Can increase dramatically with the inclusion of one or more outliers.

Suppose every student in a class is surveyed and it is reported that 75% of the class plans to take another math class. Is this an example of descriptive or inferential statistics? Explain.

Descriptive statistics; the results of the class sample are described without making any generalizations about the population of all students at the school.

B. The data are qualitative because they don't measure or count anything.

Determine whether the data described below are qualitative or quantitative and explain why. The types of food served by restaurants (Italian, Chinese, fast, etc.) Choose the correct answer below. A. The data are quantitative because they don't measure or count anything. B. The data are qualitative because they don't measure or count anything. C. The data are quantitative because they consist of counts or measurements. D. The data are qualitative because they consist of counts or measurements.

A. The data are qualitative because they don't measure or count anything.

Determine whether the data described below are qualitative or quantitative and explain why. The types of movies (drama, comedy, etc.) Choose the correct answer below. A. The data are qualitative because they don't measure or count anything. B. The data are quantitative because they consist of counts or measurements. C. The data are qualitative because they consist of counts or measurements. D. The data are quantitative because they don't measure or count anything.

The given description corresponds to an observational study.

Determine whether the given description corresponds to an observational study or an experiment. In a study of 413 women with a particular​ disease, the subjects were photographed daily.

The given value is a PARAMETER for the month because the data collected represent a POPULATION.

Determine whether the given value is a statistic or a parameter. A homeowner measured the voltage supplied to his home on all 30 days of a given month, and the average​ (mean) value is 113.3 volts.

The given value is a parameter for the month because the data collected represent a population.

Determine whether the given value is a statistic or a parameter. A homeowner measured the voltage supplied to his home on all 30 days of a given month, and the average​ (mean) value is 139.8 volts.

Statistic because the value is a numerical measurement describing a characteristic of a sample.

Determine whether the given value is a statistic or a parameter. A sample of seniors is selected and it is found that 25% own a computer.

The value is a PARAMETER because the value is a numerical measurement describing a characteristic of a POPULATION (refers to "all").

Determine whether the given value is a statistic or a parameter. In a study of all 3473 professors at a college, it was found that 50% own a television.

The ordinal level of measurement is most appropriate because the data can be ordered, but differences cannot be found or are meaningless.

Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate for the data below. Explain why. Ratings of hotels on a scale from 0 stars to 4 stars.

D. The interval level of measurement is most appropriate because the data can be ordered, differences can be found and are meaningful, and there is no natural starting point.

Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate for the data below. Please explain why. Body temperature in degrees Fahrenheit. Choose the correct answer below. A. The ordinal level of measurement is most appropriate because the data can be ordered, but differences (obtained by subtraction) cannot be found or are meaningless. B. The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is a natural starting point. C. The nominal level of measurement is most appropriate because the data cannot be ordered. D. The interval level of measurement is most appropriate because the data can be ordered, differences can be found and are meaningful, and there is no natural starting point.

Quartiles (most common percentiles) --> resistant to extreme values

Divide data sets into fourths, or four equal parts. The first quartile, denoted Q1, divides the bottom 25% of the data from the top 75%. The second quartile divides the bottom 50% of the data from the top 50%, so the second quartile is equivalent to the 50th percentile, which is equivalent to the median. Finally, the third quartile divides the bottom 75% of the data from the top 25%, so the third quartile is equivalent to the 75th percentile.

Yes, it appears that births occur on the days of the week with frequencies that are about the same.

Does it appear that births occur on the days of the week with equal​ frequency in the cumulative frequency (above)? Let the frequencies be substantially different if any frequency is at least twice any other frequency.

The histogram has a longer right tail, so the distribution of the data is skewed to the right.

Does the histogram appear to be​ skewed?

Describe sampling with replacement.

Draw a notecard, note the name, replace the notecard and draw again. It is possible the same student could be picked twice.

For data sets having a distribution that is approximately​ bell-shaped, _______ states that about​ 68% of all data values fall within one standard deviation from the mean.

Empirical Rule

In regression, what is predicting outside the range of the x-values from the sample data called?

Extrapolation

Class Width

Finally, the class width for a class in a frequency distribution is found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class

Formula for Weighted Mean

First multiply each weight (w) by the corresponding value (x), then add the products, and finally divide that total by the sum of the weights.
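
In symbols, the weighted mean is Σ(w·x) / Σw. The steps can be sketched as below; the grade points and credit hours are hypothetical numbers chosen only to illustrate the calculation.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical example: grade points weighted by credit hours.
grade_points = [4, 3, 2]
credit_hours = [3, 3, 4]
gpa = weighted_mean(grade_points, credit_hours)
print(round(gpa, 2))  # 2.9
```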

Empirical rule

For data sets having a distribution that is approximately bell-shaped these properties apply: About 68% of all values fall within 1 standard deviation of the mean. About 95% of all values fall within 2 standard deviations of the mean. About 99.7% of all values fall within 3 standard deviations of the mean.

Attempting to use the regression equation to make predictions beyond the range of the data is called _______.

extrapolation

The U.S. Department of Housing and Urban Development​ (HUD) uses the median to report the average price of a home in the United States. Why do you think HUD uses the​ median?

HUD uses the median because the data are skewed right

Look at the #44 charts and answer the questions: If someone would like to get a job, what seems to be the most effective approach?

Help-wanted ads (H)

1. If the computed linear correlation coefficient r lies in the left tail beyond the leftmost critical value or in the right tail beyond the rightmost critical value (|r| > critical value), reject Ho and conclude that there is sufficient evidence to support the claim of a linear correlation. 2. If r lies between the two critical values (|r| ≤ critical value), fail to reject Ho and conclude that there is not sufficient evidence to support the claim of a linear correlation.

How do we know there is a correlation and when to reject Ho?
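
The decision rule on this card reduces to comparing |r| with the critical value. A minimal sketch, assuming the critical value has already been looked up in a table for the sample size and significance level; the numbers passed in below are hypothetical.

```python
def correlation_decision(r, critical_value):
    """Apply the critical-value rule for a linear correlation test."""
    if abs(r) > critical_value:
        return "reject H0"          # sufficient evidence of linear correlation
    return "fail to reject H0"      # not sufficient evidence

# Hypothetical values for illustration only.
strong = correlation_decision(-0.91, 0.632)
weak = correlation_decision(0.25, 0.632)
print(strong, "|", weak)
```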

3

How many decimals do we round r to?

When examining the shape of a distribution of numerical data, which of the following is not one of the three basic characteristics of a distribution's shape?

How many numbers are in the data set.

VENN DIAGRAM

INTERSECTION, UNION, COMPLEMENT

Researchers wondered if brain size has an effect on a person's IQ. From a sample of 20 individuals, the equation of the least-squares regression line is y = 71.8 + 0.0286x, where x represents the size of a brain in cubic centimeters and y represents IQ. What is the interpretation of the y-intercept?

IQ is predicted to be 71.8 for a brain size of 0 cubic centimeters.

In a typical​ boxplot, the length of the box indicates which measure of​ spread?

IQR

COMPLEMENT

IS ALL THE NUMBERS THAT DON'T BELONG TO THE SET.

Class Width: 6 Class Midpoints: 6.95, 12.95, 18.95, 24.95, 30.95 Class Boundaries: 3.95, 9.95, 15.95, 21.95, 27.95, 33.95

Identify the class​ width, class​ midpoints, and class boundaries for the given frequency distribution (above).

Lower Class Limits: 25, 30, 35, 40, 45, 50, 55 Upper Class Limits: 29, 34, 39, 44, 49, 54, 59 Class Width: 5 Class Midpoints: 27, 32, 37, 42, 47, 52, 57 Class Boundaries: 24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5, 59.5 Number of individuals included in the summary: 93

Identify the lower class​ limits, upper class​ limits, class​ width, class​ midpoints, and class boundaries for the given frequency distribution (above). Also identify the number of individuals included in the summary.
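
Given the class limits listed in the answer, the width, midpoints, and boundaries can be derived mechanically. This is a sketch with illustrative names; the count of individuals (93) needs the frequencies from the table, which are not reproduced on the card, so it is not computed here.

```python
lower_limits = [25, 30, 35, 40, 45, 50, 55]
upper_limits = [29, 34, 39, 44, 49, 54, 59]

# Width: difference between consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Midpoint: average of a class's lower and upper limits.
midpoints = [(low + high) / 2 for low, high in zip(lower_limits, upper_limits)]

# Boundaries split the gap between one class's upper limit and the
# next class's lower limit in half.
gap = lower_limits[1] - upper_limits[0]
boundaries = [low - gap / 2 for low in lower_limits] + [upper_limits[-1] + gap / 2]

print(class_width, midpoints, boundaries)
```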

Systematic Sampling

Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A researcher selects every 762nd social security number and surveys the corresponding person. What type of sampling did the researcher use? Random Convenience Systematic Stratified Cluster

It is questionable that the sponsor is a candy company because this sponsor can be greatly affected by the conclusion.

Identify what is wrong: Several studies showed that after eating chocolate​, subjects had increased blood levels of antioxidants. Antioxidants have been associated with decreased risk of heart disease. A candy company financed this research.

Cluster

Identify which type of sampling is used: random, systematic, convenience, stratified, or cluster. To determine customer opinion of their check-in service, American Airlines randomly selects 30 flights during a certain week and surveys all passengers on the flights.

In a​ graph, if one or both axes begin at some value other than​ zero, the differences are exaggerated. This bad graphing method is known as​ _______.

a non-zero axis

Cluster

Identify which type of sampling is​ used: random,​ systematic, convenience,​ stratified, or cluster. To determine customer opinion of their check-in service, American Airlines randomly selects 60 flights during a certain week and surveys all passengers on the flights. Which type of sampling is​ used? Cluster Stratified Systematic Random Convenience

Determining z-score

If a data value is larger than the mean, the z-score will be positive. If a data value is smaller than the mean, the z-score will be negative. If the data value equals the mean, the z-score will be zero. Z-scores measure the number of standard deviations an observation is above or below the mean. Ex. A z-score of 1.24 is interpreted as "the data value is 1.24 standard deviations above the mean," so it is GREATER than the mean. Ex. A z-score of -0.5 means the data value is half a standard deviation below the mean, so it is LESS than the mean. Ex. A z-score of 0 indicates that the value of the observation is EQUAL to the mean.
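
The sign rules above follow directly from the formula z = (x - mean) / standard deviation. A small sketch; the mean of 176 and standard deviation of 5 are illustrative values only.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Illustrative values: mean 176, standard deviation 5.
above = z_score(181, 176, 5)    # value greater than the mean
below = z_score(173.5, 176, 5)  # value less than the mean
equal = z_score(176, 176, 5)    # value equal to the mean
print(above, below, equal)  # 1.0 -0.5 0.0
```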

After constructing a relative frequency distribution summarizing IQ scores of college students, what should be the sum of the relative frequencies?

If percentages are used, the sum should be 100%. If proportions are used, the sum should be 1.

Which of the following is NOT a requirement in determining whether there is a linear correlation between two variables? - Any outliers must be removed if they are known to be errors. - If r>1, then there is a positive linear correlation. - The sample of paired data is a simple random sample of quantitative data. - A scatterplot should visually show a straight-line pattern.

If r>1, then there is a positive linear correlation

That the products Zx*Zy tend to be positive. If the line is downhill, it's the opposite: the products tend to be negative.

When using z-scores and Σ(ZxZy), if the points approximate an uphill line, what does this tell us?
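
The link between the z-score products and the correlation coefficient is r = Σ(Zx·Zy) / (n - 1) when sample standard deviations are used. A minimal sketch with hypothetical data: an uphill pattern gives mostly positive products and a positive r, a downhill pattern gives the opposite.

```python
import statistics

def correlation_from_z_scores(xs, ys):
    """Linear correlation r computed as sum(zx * zy) / (n - 1)."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    ]
    return sum(products) / (n - 1)

# Hypothetical data: a perfectly uphill and a perfectly downhill line.
xs = [1, 2, 3, 4, 5]
uphill = correlation_from_z_scores(xs, [2, 4, 6, 8, 10])
downhill = correlation_from_z_scores(xs, [10, 8, 6, 4, 2])
print(round(uphill, 3), round(downhill, 3))  # 1.0 -1.0
```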

​No, a graph cannot help to overcome the deficiency. If the sample is a bad​ sample, there are no graphs or other techniques that can be used to salvage the data.

If we have a large voluntary response sample consisting of weights of subjects who chose to respond to a survey posted on the​ Internet, can a graph help to overcome the deficiency of having a voluntary response​ sample?

C. ​No, a graph cannot help to overcome the deficiency. If the sample is a bad​ sample, there are no graphs or other techniques that can be used to salvage the data.

If we have a large voluntary response sample consisting of weights of subjects who chose to respond to a survey posted on the​ Internet, can a graph help to overcome the deficiency of having a voluntary response​ sample? Choose the correct answer below. A. No, a graph cannot help to overcome the deficiency. Before​ graphing, all inaccurate values and outliers must be removed from the data set. B. ​Yes, a graph can help to overcome the deficiency. Certain graphs that hide any specific values in the​ data, such as pie​ charts, can be used to hide deficiencies in the sampling technique. C. ​No, a graph cannot help to overcome the deficiency. If the sample is a bad​ sample, there are no graphs or other techniques that can be used to salvage the data. D. ​Yes, a graph can help to overcome the deficiency. Any graph that is given with a sufficiently accurate description of any deficiencies in the sampling technique is no longer considered biased.

A bar chart and a Pareto chart both use bars to show frequencies of categorical data. What characteristic distinguishes a Pareto chart from a bar chart, and how does that characteristic help us in understanding the data?

In a Pareto chart, the bars are always arranged in descending order according to frequencies. The Pareto chart helps us understand data by drawing attention to the more important categories, which have the highest frequencies.

Random Sampling

In a poll conducted by a certain research​ center, 1175 adults were called after their telephone numbers were randomly generated by a​ computer, and 34% were able to correctly identify the president. Which type of sampling did the research center​ use? Cluster sampling Stratified sampling Convenience sampling Systematic sampling Random sampling

If we do not reject the null hypothesis, is it valid to say that we accept the null hypothesis? Why or why not?

No, we have only shown that we do not have enough evidence to reject it.

Random Sampling

In a poll conducted by a certain research​ center, 1288 adults were called after their telephone numbers were randomly generated by a​ computer, and 36% were able to correctly identify the secretary of state. Which type of sampling did the research center use? Random Cluster Stratified Systematic Convenience

Fill in the blank. In a​ _______ distribution, the frequency of a class is replaced with a proportion or percent.

In a relative frequency distribution, the frequency of a class is replaced with a proportion or percent.

A. The given description corresponds to an experiment.

In a study of 442 children with a particular​ disease, the subjects were given certain drugs to determine if the drugs have an effect on the disease. Does the given description correspond to an observational study or an​ experiment? A. The given description corresponds to an experiment. B. The given description corresponds to an observational study. C. The given description does not provide enough information to answer this question.

Yes, misconduct appears to be a major factor because the majority of retractions were due to misconduct.

In a study of retractions in biomedical​ journals: 405 were due to​ error, 194 were due to​ plagiarism, 888 were due to​ fraud, 291 were due to duplications of​ publications, and 273 had other causes. Does the Pareto chart (above) showing such retractions, appear to show misconduct​ (fraud, duplication,​ plagiarism) as a major​ factor? Please explain.
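
The Pareto ordering and the "majority" claim in this card can be checked with a few lines of Python. This is only a sketch using the counts quoted in the question; the dictionary keys and variable names are illustrative.

```python
# Retraction counts as given in the question.
retractions = {
    "error": 405,
    "plagiarism": 194,
    "fraud": 888,
    "duplication": 291,
    "other": 273,
}

# A Pareto chart orders the bars from highest to lowest frequency.
pareto_order = sorted(retractions.items(), key=lambda item: item[1], reverse=True)

# The question defines misconduct as fraud, duplication, and plagiarism.
misconduct = sum(retractions[c] for c in ("fraud", "duplication", "plagiarism"))
total = sum(retractions.values())
misconduct_share = misconduct / total

for category, count in pareto_order:
    print(f"{category:12s} {count}")
print(f"misconduct share: {misconduct_share:.1%}")
```

A share above 50% is what supports calling misconduct the majority cause.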

Which of the following is always true? a. For skewed data, the mode is farther out in the longer tail than the median. b. Data skewed to the right have a longer left tail than right tail. c. The mean and median should be used to identify the shape of the distribution. d. In a symmetric and bell-shaped distribution, the mean, median, and mode are the same.

In a symmetric and bell-shaped distribution, the mean, median, and mode are the same.

Which of the following is always​ true?

In a symmetric and​ bell-shaped distribution, the​ mean, median, and mode are the same

a nonzero axis

In a​ graph, if one or both axes begin at some value other than​ zero, the differences are exaggerated. This bad graphing method is known as​ __ ______________ ________.

Explain the difference between a​ single-blind and a​ double-blind experiment.

In a​ single-blind experiment, the subject does not know which treatment is received. In a​ double-blind experiment, neither the subject nor the researcher in contact with the subject knows which treatment is received.

A(an) ______ is a person or object that is a member of the population being studied

Individual

IQR

Interquartile range: the middle 50% of the data. Formula: Q3 − Q1. It is a measure of spread that is resistant to outliers.

What is a lurking variable?

A lurking variable is an explanatory variable that was not considered in the study but that affects the value of the response variable in the study. In addition, lurking variables are typically related to explanatory variables in the study.

No. The data values in each class could take on any value between the class​ limits, inclusive.

Is it possible to identify the exact values of all of the original service​ times?

Suppose the equation of a least-squares regression line is y = −3.17 −2.4x. What can be said about the correlation coefficient?

It is negative, but its exact value cannot be determined from the given information.

A student randomly sampled 15 senior male students and 15 senior female students and found their grade point average through their junior year. She obtained the accompanying scatterplot. The correlation coefficient between sex and grade point average is approximately -0.254. What does this mean?

It means nothing, as the correlation coefficient should not be interpreted when one or both of the variables are categorical.

In an editorial, the Poughkeepsie Journal printed this statement: "The median price- the price exactly in between the highest and lowest -...." Does this statement correctly describe the median? Why or why not?

No. It describes the midrange, not the median.

Refer to the accompanying data set and use the 30 screw lengths to construct a frequency distribution. Begin with a lower class limit of 2.470 ​in., and use a class width of 0.010 in. The screws were labeled as having a length of 2 1/2 in.

Length: 2.470 - 2.479 2.480 - 2.489 2.490 - 2.499 2.500 - 2.509 2.510 - 2.519 Frequency: 1 7 9 10 3

How to solve for a relative frequency table.

Line up both frequency tables and enter 0 wherever a class has no data. Find the total of each table. Divide each class frequency by its table's total and multiply by 100.

Below are 36 sorted ages of an acting award winner. Find P10 using the method presented in the textbook.

Look at notes for print out of help to work out this problem.

Find the third quartile Q3 of the list of 24 sorted values shown below. 27 31 35 35 36 38 39 45 46 48 49 51 52 52 54 56 57 64 68 71 78 79 80 82

Look at notes for print out of help to work out this problem.

population z-score

z = (x − μ) / σ, where μ = population mean and σ = population standard deviation.

looking for B in percentile formula

If the result is not a whole number, always round up (for example, 7.2 rounds up to 8, so use the 8th value). If the result is a whole number, use the average of that value and the next one above it.
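
The rounding rule on this card can be sketched in Python as follows. The function name `percentile_value` is made up for illustration, and the locator is assumed to be k/100 × n with 0 < k < 100; check this against the textbook's own formula before relying on it. The data are the 24 sorted values from the Q3 card.

```python
import math

def percentile_value(sorted_values, k):
    """Return the k-th percentile (0 < k < 100) using the round-up rule."""
    n = len(sorted_values)
    locator = k * n / 100
    if locator != int(locator):
        # Not a whole number: round up and take the value at that position.
        position = math.ceil(locator)
        return sorted_values[position - 1]
    # Whole number: average that value and the next one above it.
    position = int(locator)
    return (sorted_values[position - 1] + sorted_values[position]) / 2

values = [27, 31, 35, 35, 36, 38, 39, 45, 46, 48, 49, 51,
          52, 52, 54, 56, 57, 64, 68, 71, 78, 79, 80, 82]

q3 = percentile_value(values, 75)   # locator 18, so average the 18th and 19th values
p30 = percentile_value(values, 30)  # locator 7.2, so use the 8th value
print(q3, p30)
```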

All methods used for visualizing distributions are based on which of the following?

Make a mark that indicates how many times each value occurred in the data set.

Identify which of these designs is most appropriate for the given​ experiment: completely randomized​ design, randomized block​ design, or matched pairs design. A drug is designed to treat insomnia. In a clinical trial of the​ drug, amounts of sleep each night are measured before and after subjects have been treated with the drug.

Matched pairs design

Formula for the Mean From a Frequency Distribution

Mean from a frequency distribution: (1) Find the class midpoint of each class. (2) Multiply each frequency by its class midpoint. (3) Add the products; this sum is the numerator. (4) Divide by the sum of the frequencies.
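
As a rough illustration of these steps, the sketch below applies them to the screw-length frequency distribution that appears earlier in this set (class limits 2.470 to 2.519 in., frequencies 1, 7, 9, 10, 3). The variable names are illustrative only.

```python
# (lower class limit, upper class limit, frequency) for each class.
classes = [
    (2.470, 2.479, 1),
    (2.480, 2.489, 7),
    (2.490, 2.499, 9),
    (2.500, 2.509, 10),
    (2.510, 2.519, 3),
]

# Step 1: the class midpoint is the average of the two class limits.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
frequencies = [freq for _, _, freq in classes]

# Steps 2 and 3: multiply each frequency by its midpoint and add the products.
numerator = sum(f * x for f, x in zip(frequencies, midpoints))

# Step 4: divide by the sum of the frequencies.
mean_estimate = numerator / sum(frequencies)
print(round(mean_estimate, 4))
```

The result is an approximation of the mean, because every value in a class is treated as if it sat at the class midpoint.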

An insurance company crashed four cars of the same model at 5 miles per hour. The costs of repair for each of the four crashes were ​$433​, ​$440, ​$495​, and ​$207 . Compute the​ mean, median, and mode cost of repair.

Mean: $393.75. Median: $436.50. Mode: none.
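
These three values can be verified with Python's standard library. The sketch treats a data set in which every value occurs exactly once as having no mode, which is the convention this card uses.

```python
from collections import Counter
from statistics import mean, median

costs = [433, 440, 495, 207]

mean_cost = mean(costs)
median_cost = median(costs)  # average of the two middle values after sorting

# A value is a mode only if it occurs more often than once.
counts = Counter(costs)
highest = max(counts.values())
modes = [] if highest == 1 else [v for v, c in counts.items() if c == highest]

print(mean_cost, median_cost, modes)
```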

A value at the center or middle of a data set is an _________?

Measure of center

There are many potential pitfalls that can cause problems when analyzing data. Which of these choices are not classified as a potential pitfall? Order of survey questions Nonresponse Self-reported data Measured data

Measured data

What is an observational study?

An observational study measures the value of the response variable without attempting to influence the value of either the response or explanatory variables.

The value that would be right in the middle if you were to sort the data from smallest to largest is called the ______.

Median.

descriptive

Methods used that summarize or describe characteristics of data are called​ _______ statistics

A linear regression line was constructed relating two variables, x, and y where X is the independent variable and y is the response (dependent variable). The slope was found to be 20 and the intercept was found to be -4. Based on this information, predict the value of y when x = 2

C. 36 (not 12). The predicted value is y = −4 + 20(2) = 36.

In publishing the results of some research work, the following values of the correlation coefficient were listed. Which one would appear to be incorrect? A. 1.2 B. 0.90 C. -0.8 D. 0

A. 1.2 (not C, −0.8). A correlation coefficient must lie between −1 and 1.

Look at #29 chart and answer the questions: Construct a histogram on the calculator. Do the data appear to have a distribution that is approximately normal?

No, it is not symmetric

Is it possible to identify the exact values of all of the original service​ times?

No, the data values in each class could take on any value between the class limits, inclusive.

Look at #5 chart and answer the question: Does the frequency distribution appear to have a normal distribution?

No, the distribution does not appear to be normal.

A magazine published a list consisting of the state tax on each gallon of gas. If we add the 50 state tax amounts and then divide by​ 50, we get 27.3 cents. Is the value of 27.3 cents the mean amount of state sales tax paid by all U.S.​ drivers? Why or why​ not?

No, the value of 27.3 cents is not the mean because the 50 amounts are all weighted equally in the​ calculation, but some states consume more gas than​ others, so the mean amount of state sales tax should be calculated using a weighted mean.

If we find that there is a linear correlation between the concentration of carbon dioxide in our atmosphere and the global​ temperature, does that indicate that changes in the concentration of carbon dioxide cause changes in the global​ temperature?

No. The presence of a linear correlation between two variables does not imply that one of the variables is the cause of the other variable.

A psychology student wishes to investigate differences in political opinions between business majors and political science majors at her college. She randomly selects 100 students from the 260 business majors and 100 students from the 180 political science majors. Does this sampling plan result in a random sample? Simple random sample? Explain.

No; no. The sample is not random because political science majors have a greater chance of being selected than business majors. It is not a simple random sample because some samples are not possible, such as a sample consisting of 50 business majors and 150 political science majors.

Can the variance of a data set ever be negative? Explain.

No; since the variance is based on the squared deviations from the mean and N, it cannot be negative.

A marketing firm does a survey to find out how many people use a product. To accomplish this, they select a random sample of one hundred consumers and record how many use the product. Is this an observational study or an experiment?

Observational study

Suppose that you need to create a list of n values that have a specific known mean. Some of the n values can be freely selected. How many of the n values can be freely assigned before the remaining values are​ determined? (The result is referred to as the number of degrees of​ freedom.)

Of the n​ values, n−1 can be freely selected because the remaining​ value(s) can be expressed in terms of the assigned values and the known mean.

Comparing deviations

Only compare two sample standard deviations when the sample means are approximately the same.

Determine which of the four levels of measurement is most appropriate: Students' grades, A, B, or C, on a test. Interval Nominal Ordinal Ratio

Ordinal

In your own words, describe to someone who knows only a little statistics how to recognize when an observation is an outlier. What action(s) should be taken with an outlier?

Outliers are observed values far from the main group of data. In a histogram they are separated from the others by space. Outliers must be looked at in closer context to know how to treat them. If they are mistakes, they might be removed or corrected. If they are not mistakes, you might do the analysis twice, once with and once without the outliers.

Which measure of the center (mean or median) is more resistant to outliers, and what does "resistant to outliers" mean?

The median is more resistant to outliers. "Resistant to outliers" means that the measure is not strongly affected by extreme values, so adding or removing an outlier changes it very little.

After inspecting all of 55,000 kg of meat stored at the Wurst Sausage Company, it was found that 45,000 kg of the meat was spoiled. Is this value a statistic or a parameter?

Parameter

Determine whether the given value is a statistic or a parameter. Thirty percent of all dog owners poop scoop after their dog. Statistic Parameter

Parameter

Determine whether the given value is a statistic or a parameter: In a study of all 3153 seniors at a college, it is found that 50% own a computer

Parameter because the value is a numerical measurement describing a characteristic of a population

The most common correlation coefficient, called the ____ correlation coefficient, measures the strength of the linear association between variables.

Pearson product-moment

Among fatal plane crashes that occurred during the past 70 years, 269 were due to pilot error, 54 were due to other human error, 665 were due to weather, 85 were due to mechanical problems, and 479 were due to sabotage. Construct the relative frequency distribution. What is the relative frequency for pilot error, other human error, weather, mechanical problems, and sabotage? Round to one decimal place. What is the most serious threat to aviation safety, and can anything be done about it?

Pilot error: 17.3%. Other human error: 3.5%. Weather: 42.8%. Mechanical problems: 5.5%. Sabotage: 30.9%. Weather is the most serious threat to aviation safety. Weather monitoring systems could be improved.
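
The percentages on this card come from dividing each count by the total number of crashes and multiplying by 100. A minimal sketch, with illustrative names:

```python
# Crash counts by cause, as given in the question.
crashes = {
    "pilot error": 269,
    "other human error": 54,
    "weather": 665,
    "mechanical problems": 85,
    "sabotage": 479,
}

total = sum(crashes.values())

# Relative frequency (%) = class frequency / total * 100, rounded to one decimal place.
relative = {cause: round(100 * count / total, 1) for cause, count in crashes.items()}

most_serious = max(relative, key=relative.get)
print(relative)
print(most_serious)
```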

When we used the z-score method, we found that 77 was the only outlier, and it was an extreme one. But, what if we use the quartiles method?

Q1 = 23, Q2 = (blank in the notes), Q3 = 36.5, IQR = 13.5. Lower inner fence (LIF) = 2.75, upper inner fence (UIF) = 56.75, lower outer fence (LOF) = −17.5, upper outer fence (UOF) = 77. Observations between LIF and UIF are usual. Observations between LOF and LIF, or between UIF and UOF, are mild outliers (MO). Observations below LOF or above UOF are extreme outliers (EO). 77 is at the border between MO and EO, so we can consider it either a mild or an extreme outlier. This example shows that the z-score method is better than the quartiles method because it is more specific, whereas the quartiles method leaves you to designate 77 as one or the other.
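
The fence values can be reproduced from Q1 = 23 and IQR = 13.5, assuming the usual multipliers of 1.5 × IQR for the inner fences and 3 × IQR for the outer fences. The `classify` helper is hypothetical, and how it treats a value that lands exactly on a fence is a choice made here, not something the notes specify.

```python
q1 = 23
iqr = 13.5
q3 = q1 + iqr  # third quartile implied by Q1 and the IQR

lower_inner = q1 - 1.5 * iqr
upper_inner = q3 + 1.5 * iqr
lower_outer = q1 - 3 * iqr
upper_outer = q3 + 3 * iqr

def classify(x):
    """Label an observation as usual, mild outlier, or extreme outlier."""
    if x < lower_outer or x > upper_outer:
        return "extreme"
    if x < lower_inner or x > upper_inner:
        return "mild"
    return "usual"

print(lower_outer, lower_inner, upper_inner, upper_outer)
print(classify(77))  # 77 sits exactly on the upper outer fence
```

With strict inequalities, 77 is labeled mild; switching to `>=` would label it extreme, which is the ambiguity the card describes.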

Population of country of origin is qualitative or quantitative?

Quantitative because it is a numerical measure

_______ divide data sets in fourths.

Quartiles

Identify which of these types of sampling is​ used: random,​ systematic, convenience,​ stratified, or cluster. A large company wants to administer a satisfaction survey to its current customers. Using their customer​ database, the company randomly selects 60 customers and asks them about their level of satisfaction with the company.

Random

Identify which type of sampling is being used: A pollster uses a computer to generate 500 random numbers, then interviews the voters corresponding to those numbers.

Random

Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A woman is selected by a marketing company to participate in a paid focus group. The company says that the woman was selected because she was randomly chosen from all adults.

Random Sampling

Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. In a poll conducted by a certain research center, 718 adults were called after their telephone numbers were randomly generated by a computer, and 89% were able to correctly identify the attorney general.

Random sampling

______ is used when subjects are assigned to different groups through a process of random selection.

Randomization

What purpose does randomization serve in an experiment?

Randomization ensures that the effect of factors whose levels cannot be controlled is minimized.

In statistics, what is true of randomness?

Randomness is hard to achieve without help from a computer or some other randomizing device.

Below are the jersey numbers of 11 players randomly selected from a football team. Find the​ range, variance, and standard deviation for the given sample data. What do the results tell​ us? 26, 49, 12, 77, 55, 59, 40, 92, 70, 99, 27

Range = 87. Sample standard deviation = 27.9. Sample variance = 776.5 (squaring the rounded standard deviation, 27.9, gives 778.4). Jersey numbers are nominal data that are just replacements for names, so the resulting statistics are meaningless.
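
The arithmetic for this card can be checked with Python's `statistics` module, which uses the sample (n − 1) formulas for `variance` and `stdev`. Note that squaring an already rounded standard deviation gives a slightly different variance than computing the variance directly from the data.

```python
from statistics import stdev, variance

jerseys = [26, 49, 12, 77, 55, 59, 40, 92, 70, 99, 27]

data_range = max(jerseys) - min(jerseys)
sample_variance = variance(jerseys)  # divides by n - 1
sample_sd = stdev(jerseys)           # square root of the sample variance

print(data_range, round(sample_variance, 1), round(sample_sd, 1))
```

The numbers can be computed, but as the card says, they carry no meaning for nominal data such as jersey numbers.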

The data are discrete because the data can only take on specific values.

State whether the data described below are discrete or​ continuous, and explain why. The numbers of employees working at different companies.

x̄

Represents the mean of a set of sample values.

N

Represents the number of data values in a population.

A distribution of a variable in which most of the values are relatively small but that also has a few very large values is called ________.

Right-skewed

B. It is questionable that the sponsor is a fitness equipment company because this sponsor can be greatly affected by the conclusion.

Several studies showed that after regular exercise on a treadmill, subjects had lowered blood pressure. High blood pressure has been associated with increased risk of heart disease and stroke. A fitness equipment company financed this research. Choose the correct answer below. A. It is not possible to take accurate measurements. B. It is questionable that the sponsor is a fitness equipment company because this sponsor can be greatly affected by the conclusion. C. The data used in the studies is not reliable because it was not measured by the administrator. D. Since the research is composed of voluntary response samples, there may be key data points missing.

When is B&W Plot simple or extended?

Simple: Data set does not contain outliers Extended: Data set contains outliers (MO or EO)

The x-values listed below are the nicotine amounts (in mg) in different 100 mm filtered, non-"light" menthol cigarettes. The y-values are the nicotine amounts (in mg) in different king-size nonfiltered, nonmenthol, and non-"light" cigarettes. x: 1.1, 0.8, 0.9, 1.0, 1.1. y: 1.1, 1.3, 1.2, 1.1, 1.6. If suitable methods of statistics are used, it can be concluded that the average (mean) nicotine amount of the 100 mm filtered, non-"light" menthol cigarettes is less than the average (mean) nicotine amount of the king-size nonfiltered, nonmenthol, and non-"light" cigarettes. Can it be concluded that the first type of cigarette is safe? Why or why not?

Since the first type of cigarette contains less nicotine than the second type of​ cigarette, the first type is safer.​ However, it cannot be concluded that it is safe.

Lower Class Limits

The smallest numbers that can belong to the different classes.

The data are discrete because the data can only take on specific values.

State whether the data described below are discrete or​ continuous, and explain why. The numbers of children in families.

A sample of 120 employees of a company is selected, and the average age is found to be 37 years. Is this value a statistic or a parameter?

Statistic

Determine whether the given value is a statistic or a parameter. "A sample of 120 employees of a company is selected, and the average age is found to be 37 years"

Statistic

Finding Quartiles

Step 1 Arrange the data in ascending order. Step 2 Determine the median, M, or second quartile, Q2 . Step 3 Divide the data set into halves: the observations below (to the left of) M and the observations above M. The first quartile, Q1 , is the median of the bottom half, and the third quartile, Q3 , is the median of the top half.
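
The three steps can be sketched in Python as below. The function name `quartiles` is illustrative, and the sketch assumes that when the number of observations is odd, the median itself is left out of both halves; some textbooks include it instead. The data are the 24 sorted values from the Q3 card earlier in this set.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)            # Step 1: ascending order
    n = len(data)
    q2 = median(data)                # Step 2: the median is Q2
    half = n // 2
    lower = data[:half]              # Step 3: observations below the median
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), q2, median(upper)

values = [27, 31, 35, 35, 36, 38, 39, 45, 46, 48, 49, 51,
          52, 52, 54, 56, 57, 64, 68, 71, 78, 79, 80, 82]

q1, q2, q3 = quartiles(values)
print(q1, q2, q3)
print("IQR:", q3 - q1)
```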

Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "49, 34, and 48 students are selected from the Sophomore, Junior, and Senior classes with 496, 348, and 481 students respectively."

Stratified

Identify which of these types of sampling is used: random, systematic, convenience, stratified, or cluster. To determine her breathing rate, Carrie divides up her day into three parts: morning, afternoon, and evening. She then measures her breathing rate at 4 randomly selected times during each part of the day.

Stratified

Identify which type of sampling is being used: 49, 34, and 48 students are selected from the Sophomore, Junior, and Senior classes with 496, 348, and 481 students respectively.

Stratified

To determine her air quality, Carrie divides up her day into three parts, morning, afternoon, and evening. She then measures her air quality at 4 randomly selected times during each part of the day. What type of sampling is this?

Stratified

Which sampling method subdivides the population into categories sharing similar characteristics and then selects a sample from each​ subdivision?

Stratified

What is meant by confounding?

Confounding in a study occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study.

A graphical display of a data set is given. State whether the distribution is (roughly) symmetric, right skewed, or left skewed. Two dice were rolled and the sum of the two numbers was recorded. This procedure was repeated 400 times. The results are shown in the relative frequency histogram below.

Symmetric

A tax auditor selects every 1000th income tax return that is received. Identify which of these types of sampling is used Stratified Systematic Simple Random Cluster Convenience

Systematic

Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A researcher selects every 221st social security number and surveys the corresponding person.

Systematic

Identify the type of sampling used: random, systematic, convenience, stratified, or cluster. To estimate the percentage of defects in a recent manufacturing batch, a quality control manager at Toshiba selects every 20th laptop that comes off the assembly line starting with the second until she obtains a sample of 100 laptops.

Systematic

Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "A sample consists of every 49th student from a group of 496 students."

Systematic

Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "A tax auditor selects every 1000th income tax return that is received."

Systematic

DISJOINT

THE EVENTS HAVE NOTHING IN COMMON (THEY CANNOT OCCUR TOGETHER). FOR DISJOINT EVENTS, P(A OR B) = P(A) + P(B).

Open-ended Distribution

An open-ended class has no specific beginning value or no specific ending value. A frequency distribution with an open-ended class is called an open-ended distribution.

5. When you need to find the z-score that forms the boundary between 2 areas under the bell curve i.e. between top 20% & bottom 80% use:

The *Tail column* & find the proportion closest to the percentage e.g. the proportion closest to .2000; the z-score in that row is the z-score that forms that boundary.
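
The table lookup described here can be cross-checked in software. The sketch below swaps the printed table for the inverse CDF of the standard normal distribution from Python's `statistics.NormalDist`; it is a check on the lookup, not the method the notes describe. The 80%/20% split is the example from the card.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Boundary between the bottom 80% and the top 20% of the bell curve.
z_boundary = standard_normal.inv_cdf(0.80)

# Area in the upper tail beyond that boundary (should be close to 0.2000).
tail_area = 1 - standard_normal.cdf(z_boundary)

print(round(z_boundary, 2), round(tail_area, 4))
```

A printed table only lists z-scores to two decimal places, so the table answer is the row whose tail proportion is closest to .2000.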

Interquartile range

The ... IQR, is the range of the middle 50% of the observations in a data set. That is, the IQR is the difference between the third and first quartiles and is found using the formula IQR = Q3 − Q1.

Q1 Q2 Q3

The 1st quartile, denoted Q1, divides the bottom 25% of the data from the top 75%. Therefore, the 1st quartile is equivalent to the 25th percentile. The 2nd quartile divides the bottom 50% of the data from the top 50% of the data, so that the 2nd quartile is equivalent to the 50th percentile, which is equivalent to the median. The 3rd quartile divides the bottom 75% of the data from the top 25% of the data, so that the 3rd quartile is equivalent to the 75th percentile.

Look at the #45 charts and answer the questions: Compare the pie chart found above to the Pareto chart given on the left. Can you determine which graph is more effective in showing the relative importance of job sources?

The Pareto chart is more effective.

S=Range/4

The Range Rule of Thumb roughly estimates the standard deviation of a data set as​ _______

Look at #47 charts and answer the question: Applying a strict interpretation of the requirements for a normal distribution, do the depths appear to be normally distributed? Why or why not?

The frequency polygon does not appear to approximate a normal distribution because the frequencies do not increase to a maximum and then decrease, and the graph is not symmetric.

Frequency Polygon

The frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. The frequencies are represented by the heights of the points.

The (frequency) distribution appears to be SKEWED TO THE RIGHT ​(or positively ​skewed).

The given data represent the number of people from a​ town, aged​ 25-64, who subscribe to a certain print magazine. The frequency polygon graph (above) suggests the distribution is ____________ ____ ______ __________?

Determine whether the given value is a statistic or a parameter: A homeowner measured the voltage supplied to his home on all 30 days of a given month, and the average (mean) value is 131.6 volts

The given value is a parameter for the month because the data collected represented a population

Suppose you construct a graph to compare the student populations of the five largest high schools in your city and choose to depict the populations with school buildings of various sizes. If the school buildings are drawn so that the length and the width are each in proportion to the population of the corresponding schools, is the resulting graph misleading? Why or why not?

The graph will be misleading since the student populations are one-dimensional data, but the graph uses a two-dimensional school building to represent it.

In a study designed to test the effectiveness of a medication as a treatment for lower back​ pain, 1643 patients were randomly assigned to one of three​ groups: (1) the 547 subjects in the placebo group were given pills containing no​ medication; (2) 550 subjects were in a group given pills with the medication taken at regular​ intervals; (3) 546 subjects were in a group given pills with the medication to be taken when needed for pain relief. In what specific way was replication applied in the​ study?

The group sample sizes are all large so the researchers could see the effects of the treatment.

Heights of statistics students were obtained by a teacher as part of an experiment conducted for the class. The last digit of those heights are listed below. Construct a frequency distribution with 10 classes. Based on the​ distribution, do the heights appear to be reported or actually​ measured? What can be said about the accuracy of the​ results?

The heights appear to be reported because there are disproportionately more 0s and 5s. They are likely not very accurate because they appear to be reported.

Fill in the blank. The heights of the bars of a histogram correspond to​ _______ values.

The heights of the bars of a histogram correspond to frequency values.

The histogram has a LONGER RIGHT TAIL, so the distribution of the data is SKEWED TO THE RIGHT.

The histogram has a ____________ __________ ________, so the distribution of the data is ____________ ____ ______ __________.

The table shows the magnitudes of the earthquakes that have occurred in the past 10 years. Use the frequency distribution to construct a histogram. Does the histogram appear to be​ skewed? If​ so, identify the type of skewness.

The histogram has a longer right tail, so the distribution of the data is skewed to the right.

The histogram represents 17 debate team members.

The histogram (above) represents the weights​ (in pounds) of members of a certain​ high-school debate team. How many team members are included in the​ histogram (above)?

Look at #27 chart and answer the questions: Construct a histogram on the calculator. Does the histogram appear to depict data that have a normal distribution?

The histogram appears to depict a normal distribution. The frequencies generally increase to a maximum and then decrease, and the histogram is roughly symmetric.

Look at #30 charts and answer the questions: Construct a histogram on the calculator. Does the histogram appear to depict data that have a normal distribution?

The histogram appears to roughly approximate a normal distribution. The frequencies generally increase to a maximum and then decrease and the histogram is symmetric.

Look at #28 chart and answer the questions: Construct a histogram on the calculator. Does the histogram appear to depict data that have a normal distribution?

The histogram appears to roughly approximate a normal distribution. The frequencies generally increase to a maximum and then decrease, and the histogram is symmetric.

The histogram represents 27 debate team members.

The histogram below represents the weights (in pounds) of members of a certain high-school debate team. How many team members are included in the histogram?

The histogram to the right represents the weights​ (in pounds) of members of a certain​ high-school programming team. How many team members are included in the​ histogram?

The histogram represents 18 programming team members. The x-axis shows weight in pounds and the y-axis shows frequency, so adding the frequencies of all the bars gives the total number of members.

Explain the circumstances for which the interquartile range is the preferred measure of dispersion. What is an advantage that the standard deviation has over the interquartile​ range?

The interquartile range is preferred when the data are skewed or have outliers. An advantage of the standard deviation is that it uses all the observations in its computation.

Name two measures of the variation of a distribution, and state the conditions under which each measure is preferred for measuring the variability of a single data set.

The interquartile range is preferred when the data is strongly skewed or has outliers. The standard deviation is preferred when the data is relatively symmetric.

Lower Class Limit

The lower class limit represents the smallest data value that can be included in the class.

Which of the following is NOT a characteristic of the mean? -The mean is relatively reliable. -The mean is called the average by statisticians. -The mean is sensitive to outliers. -The mean takes every data value into account.

The mean is called average by statisticians.

A histogram of a set of data indicates that the distribution of the data is skewed right. Which measure of central tendency will likely be larger, the mean or the median? Why?

The mean will likely be larger because the extreme values in the right tail tend to pull the mean in the direction of the tail.

What is the Midrange of a data set?

The measure of center that is the value midway between the max and min values in the original data set.

Which statement is NOT true regarding the median?

The median is always one of the values in the data set.

How can you tell from a boxplot if the distribution is symmetric?

The median is in the center of the box, and the left and right whiskers are approximately the same length.

How can you tell from a boxplot if the distribution is​ symmetric?

The median is in the center of the​ box, and the left and right whiskers are approximately the same length.

Definition of Mode

The most frequently occurring data value; it is the appropriate measure of center for nominal data. (A data set can have one mode, more than one mode, or no mode.)

If you flip a fair coin repeatedly and the first four results are tails, are you more likely to get heads on the next flip, more likely to get tails again, or equally likely to get heads or tails?

The next flip is equally likely to be heads or tails because each flip is independent of the others and the coin does not "keep track" of the past results.

Determine which of the four levels of measurement​ (nominal, ordinal,​ interval, ratio) is most appropriate for the data below. Social security numbers

The nominal level of measurement is most appropriate because the data cannot be ordered.

A teacher wants to find out whether the chance of drawing a Queen is 7.7%. In the last 5 minutes of class, he has all the students draw cards replacing the previous card and shuffling between each draw until the end of class and then report their results to him. Which condition(s) for use of the binomial model is/are not met?

The number of trials is fixed.

Class Boundaries

The numbers used to separate the classes, but without the gaps created by class limits (example pg. 47).

If an observation has a z-score of 0, this means which of the following?

The observation is equal to the mean.

Ogive

The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.

The accompanying data represent the percentage of recent high school graduates (graduated within 12 months before the given year-end) who enrolled in college in the fall. Construct a time-series plot and comment on any trends. Choose the correct comment below.

The percentage of high school graduates who enrolled in college has generally​ increased, though there have been some down years.

Suppose a researcher is testing someone to see whether she or he can tell Soda X from Soda Y, and the researcher is using 22 trials, half with Soda X and half with Soda Y. The null hypothesis is that the person is guessing. About how many should the researcher expect the person to get right under the null hypothesis that the person is guessing?

The person should get 11 right.

B. With a data set that is so​ small, the true nature of the distribution cannot be seen with a histogram.

The population of ages at inauguration of all U.S. Presidents who had professions in the military is​ 62, 46,​ 68, 64, 57. Why does it not make sense to construct a histogram for this data​ set? Choose the correct answer below. A. Adequate class boundaries for a histogram cannot be found with this data set. B. With a data set that is so​ small, the true nature of the distribution cannot be seen with a histogram. C. There must be an even number of data values in the data set to create a histogram. D. This data set would yield a histogram that is not​ bell-shaped.

Gaps

The presence of _______ can show that we have data from two or more different populations (example pg. 51).

A researcher is testing someone who claims to have ESP by having that person predict whether a coin will come up heads or tails. The null hypothesis is that the person is guessing and does not have ESP, and the population proportion of success is 0.50. The researcher tests the claim with a hypothesis test, using a significance level of 0.05. Fill in the blanks below with an accurate statement about the potential conclusion of this test.

The probability of concluding that the person has ESP when in fact she or he does not have ESP is 0.05.

What is the significance level of a test?

The probability of rejecting the null hypothesis when, in fact, the null hypothesis is true

Chebyshev's theorem.

The proportion of any set of data lying within K standard deviations of the mean is always at least 1-1/K^2, where K is any positive number greater than 1. Theorem applies to ANY data set. Results are only approximate. Results are lower limits ("at least"), so it has limited usefulness.
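The bound in Chebyshev's theorem is simple to evaluate directly. The following is a minimal sketch; the function name `chebyshev_lower_bound` is an illustrative choice, not something from the course material.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        # The theorem is stated only for k greater than 1.
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


# At least 75% of any data set lies within 2 standard deviations of the mean,
# and at least about 89% lies within 3.
for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 3))
```

Because these are lower limits, the actual proportion in a given data set is often much higher.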

Which of the following is the best explanation to what should happen to the proportion of heads as the number of coin flips increases?

The proportion should get closer to 0.5 as the number of flips increases.

Which is relatively better: a score of 90 on a psychology test or a score of 47 on an economics test? Scores on the psychology test have a mean of 93 and a standard deviation of 12. Scores on the economics test have a mean of 52 and a standard deviation of 4.

The psychology test score is relatively better because its z score is greater than the z score for the economics test score.

Which is relatively better: a score of 58 on a psychology test or a score of 49 on an economics test? Scores on the psychology test have a mean of 85 and a standard deviation of 10. Scores on the economics test have a mean of 58 and a standard deviation of 3.

The psychology test score is relatively better because its z score is greater than the z score for the economics test score.

Determine whether the description below corresponds to an observational study or an experiment. In a study sponsored by a company, 11,079 people were asked what contributes most to their anxiety, and 37% of the respondents said that it was their health.

The study is an observational study because the survey subjects were not given any treatment.

Which of the following does the confidence level measure?

The success rate of the method of finding confidence intervals

Determine whether the sample described below is a simple random sample. In the last year, 123,423 adults got married in a county. A researcher plans to conduct a survey of 800 of those newlyweds. After obtaining a list of those who got married, he numbers the list from 1 to 123,423, and then he uses a computer to randomly generate 800 numbers between 1 and 123,423. His sample consists of the newlyweds corresponding to the selected numbers.

The sample is a simple random sample because every sample of size 800 has the same chance of being selected.

Determine whether the sample described below is a simple random sample. In order to test for a difference in the way that workers and non-workers purchase magazines, a research institution polls exactly 638 adult workers and 638 adult non-workers randomly selected from adults in the United States.

The sample is not a simple random sample because not every sample of size 1276 has the same chance of being selected.

Determine whether the sample described below is a simple random sample. A quality control engineer selects every 5000th hairdryer that is produced.

The sample is not a simple random sample because every sample of the same size does not have the same chance of being selected.

The histogram to the right represents the weights​ (in pounds) of members of a certain​ high-school programming team. How many team members are included in the​ histogram?

The sample size can be found by adding the heights of all the bars in the histogram.

Which of the following conditions regarding sample size must be met to apply the Central Limit Theorem for Sample Proportions? -The sample size is large enough that the sample expects at least 10 successes and 10 failures. -The sample size must be at least ½ the population size. -The sample size must be at least 1/10 the population size. -The sample size is large enough that the sample expects at least 50 successes and 50 failures.

The sample size is large enough that the sample expects at least 10 successes and 10 failures.
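The condition above can be checked with two multiplications. This sketch assumes the expected counts are computed as n·p and n·(1 − p); the function name and the example values are illustrative only.

```python
def meets_success_failure_condition(n: int, p: float, minimum: int = 10) -> bool:
    """Check that a sample of size n expects at least `minimum` successes and failures."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= minimum and expected_failures >= minimum


# Illustrative values: 150 respondents with a proportion of 0.60
# gives about 90 expected successes and 60 expected failures.
print(meets_success_failure_condition(150, 0.60))  # True
# 15 respondents with a proportion of 0.50 gives only 7.5 of each.
print(meets_success_failure_condition(15, 0.50))   # False
```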

What does n denote?

The sample size, which is the number of data values.

Describe the sample standard deviation in words rather than with a formula.

The sample standard deviation is the square root of the quotient of the sum of the squared deviations from the mean and (n - 1).
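The verbal description above translates step by step into code. This is a sketch for illustration; the data reuse the test scores from the stem-and-leaf question in this set, treated here as a sample.

```python
import math
import statistics


def sample_std(values):
    """Square root of (sum of squared deviations from the mean) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sum_squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_squared_deviations / (n - 1))


scores = [67, 73, 85, 75, 89, 89, 88, 90, 98, 100]
print(sample_std(scores))
# The standard library computes the same quantity.
print(statistics.stdev(scores))
# Identical data values have no spread, so the result is zero.
print(sample_std([5, 5, 5]))
```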

A community college school board is negotiating a new contract with the college faculty. The distribution of faculty salaries is skewed right by several faculty members who make over $100,000 per year. If the school board wants to give the community the impression that the faculty are already overpaid, should they advertise the mean or median of the faculty salaries?

The school board should use the mean to make their argument. The mean will be higher than the median since it will be influenced by the few high salaries.

A histogram aids in analyzing the​ _______ of the data.

The shape of the distribution

The standard deviation is used in conjunction with the _____ to numerically describe distributions that are bell shaped. The ____ measures the center of the​ distribution, while the standard deviation measures the ____ of the distribution.

The standard deviation is used in conjunction with the MEAN to numerically describe distributions that are bell shaped. The MEAN measures the center of the​ distribution, while the standard deviation measures the SPREAD of the distribution.

If all the data values in a set are identical, what can you conclude about the standard deviation?

The standard deviation is zero.

Allie calculated a correlation coefficient of -0.5. She made a mistake in her calculation since the correlation coefficient cannot be negative.

The statement is false.

Alex calculated a correlation coefficient of -1.5. He made a mistake in his calculation since the correlation coefficient has to be between -1 and 1.

The statement is true.

The lengths of the rows are similar to the heights of bars in a​ histogram; longer rows of data correspond to higher frequencies. Generally, stem-and-leaf plot(s) are a (visual) 90 degree rotation, representative of a histogram (lengths being equal to heights).

The stem-and-leaf plot (above) shows the test scores 67, 73, 85, 75, 89, 89, 88, 90, 98, 100. How does the​ stem-and-leaf plot show the distribution of these​ data?

The average on a exam is 72 with a standard deviation of 6. A student scores a 66 on the exam. Which of the following is correct?

The student's score is 1 standard deviation below the exam average

Indicate whether the study is an observational study or a controlled experiment. A group of boys is randomly divided into two groups. One group watches violent cartoons for one hour, and the other group watches cartoons without violence for one hour. The boys are then observed to see how many violent actions they take in the next two hours, and the two groups are compared.

The study is a controlled experiment.

Exists between two variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line.

What is linear correlation?

Identify the requirements for a discrete probability distribution.

The sum of the probabilities must equal one. Each probability must be between zero and one inclusive.

Two symbols are used for the mean: μ and x̄. Which represents a parameter and which a statistic?

The symbol μ represents a parameter and x̄ represents a statistic.

The distribution appears to be skewed to the right ​(or positively ​skewed).

The frequency polygon (above) represents data from the frequency distribution of the number of people from a town, aged 25-64, who subscribe to a certain print magazine. Does the graph (above) suggest that the distribution is skewed? If so, how?

Which of the following is not a requirement of the binomial probability distribution? a. Each trial must have all outcomes classified into two categories. b. The trials must be dependent. c. The procedure has a fixed number of trials. d. The probability of a success remains the same in all trials.

The trials must be dependent. (For a binomial distribution, the trials must be independent.)

Which of the following is not a criterion for the binomial distribution?

The trials must be dependent.

When two dice are rolled, is the event "the first die shows a number greater than 2 on top" independent of the event "the second die shows a number greater than 2 on top?"

The two events are independent because the result of the first die does not affect the result of the second die.

The tallest living man has a height of 243 cm. The tallest living woman is 234 cm tall. Heights of men have a mean of 173 cm and a standard deviation of 7 cm. Heights of women have a mean of 162 cm and a standard deviation of 5 cm. Relative to the population of the same​ gender, who is​ taller? Explain.

The two heights are from very different populations, so a comparison requires that the heights be standardized by converting them to z scores. The variables z, x, x̄, s, μ, σ correspond to the z score, data value in question, sample mean, sample standard deviation, population mean, and population standard deviation, respectively. Use the population z score formula z = (x - μ)/σ. Man: z = (243 - 173)/7 = 10. Woman: z = (234 - 162)/5 = 14.4. The greater z score corresponds to the greater relative height, so the woman is relatively taller because her z score is greater.
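The comparison above can be reproduced with a one-line function. This is a minimal sketch using the population formula z = (x − μ)/σ and the heights given in the question; the function name is illustrative.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


man = z_score(243, mean=173, std_dev=7)
woman = z_score(234, mean=162, std_dev=5)

print(man, woman)
# The larger z score marks the relatively taller person.
print("woman" if woman > man else "man")
```

The same function applies to the test-score comparisons in this set: a score below the mean gives a negative z score, and the less negative one is relatively better.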

Upper Class Limit

The upper class limit represents the largest data value that can be included in the class.

When applying the Central Limit Theorem for Sample Proportions, which of the following can be substituted for p when calculating the standard error if the value of p is unknown? The value of the sample proportion The value of the sample standard deviation The value of the sample mean None of these. The standard error cannot be computed if the value for p is unknown.

The value of the sample proportion

What determines the exact shape of a Normal distribution?

The values of the mean and the standard deviation

One of the tallest living men has a height of 240 cm. One of the tallest living women is 227 cm tall. Heights of men have a mean of 177 cm and a standard deviation of 6 cm. Heights of women have a mean of 163 cm and a standard deviation of 5 cm. Relative to the population of the same gender, who is taller? Explain.

The woman is relatively taller because the z score for her height is greater than the z score for the man's height.

If your score on your next statistics test is converted to a z score, which of these z scores would you prefer. -2.00, -1.00, 0, 1.00, 2.00? Why?

The z score of 2.00 is most preferable because it is 2.00 standard deviations above the mean and would correspond to the highest of the five different possible test scores.

If your score on your next statistics test is converted to a z score, which of these z scores would you prefer: −2.00, −1.00, 0, 1.00, 2.00? Why?

The z score of 2.00 is most preferable because it is 2.00 standard deviations above the mean and would correspond to the highest of the five different possible test scores.

If your score on your next statistics test is converted to a z​ score, which of these z scores would you​ prefer: −2.00, −​1.00, ​0, 1.00,​ 2.00? Why?

The z score of 2.00 is most preferable because it is 2.00 standard deviations above the mean and would correspond to the highest of the five different possible test scores.

If your score on your next statistics test is converted to a z​ score, which of these z scores would you​ prefer: −​2.00, −​1.00, ​0, 1.00,​ 2.00? Why?

The z score of 2.00 is most preferable because it is 2.00 standard deviations above the mean and would correspond to the highest of the five different possible test scores.

unitless

The z-score is ... It has mean 0 and standard deviation 1.

If you calculate the z-score for your height in inches, what unit is used on the z-score?

The z-score will have no units.

If someone's gross annual income has a z-score of positive 2, what can be concluded?

Their income is 2 standard deviations above the mean income

If​ someone's gross annual income has a​ z-score of positive​ 2, what can be​ concluded?

Their income is 2 standard deviations above the mean income.

Given below are the numbers of indoor movie​ theaters, listed in order by row for each year. Use the given data to construct a​ time-series graph. What is the​ trend? How does this trend compare to the trend for​ drive-in movie​ theaters? What is the​ trend? How does this trend compare to the trend for​ drive-in movie​ theaters?

There appears to be an upward​ trend, unlike​ drive-in movie​ theaters, which have a downward trend.

Gina calculated a correlation coefficient between hours studied and grade point average as +0.75. Which of the following is a correct statement based on this correlation coefficient?

There is a fairly strong positive relationship between hours studied and grade point average, indicating that grade point averages tend to be higher for students who study more.

Which of the following is NOT true about statistical​ graphs?

They utilize areas or volumes for data that are​ one-dimensional in nature.

For a data set of weights​ (pounds) and highway fuel consumption amounts​ (mpg) of six types of​ automobile, the linear correlation coefficient is found and the​ P-value is 0.025. Write a statement that interprets the​ P-value and includes a conclusion about linear correlation.

The​ P-value indicates that the probability of a linear correlation coefficient that is at least as extreme is 2.5%, which is low, so there is sufficient evidence to conclude that there is a linear correlation between weight and highway fuel consumption in automobiles.

For a data set of brain volumes ​(cm3​) and IQ scores of four ​males, the linear correlation coefficient is found and the​ P-value is 0.336. Write a statement that interprets the​ P-value and includes a conclusion about linear correlation.

The P-value indicates that the probability of a linear correlation coefficient that is at least as extreme is 33.6%, which is high, so there is not sufficient evidence to conclude that there is a linear correlation between brain volume and IQ score in males.

Quartiles

This divides data sets into fourths, or four equal parts.

A magician claims that he has a fair coin—"fair" because both sides, heads and tails, are equally likely to land face up when the coin is flipped. He tells you that if you flip the coin eight times, the probability of getting eight heads is 1/256. Is this an example of a theoretical probability or an empirical probability? Explain.

This is an example of theoretical probability because it is not based on an experiment.
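The 1/256 figure follows from multiplying the probability of heads across eight independent flips. A short sketch, using exact fractions to avoid rounding:

```python
from fractions import Fraction

# A fair coin: heads and tails are equally likely on every flip.
p_heads = Fraction(1, 2)

# Flips are independent, so the probabilities multiply.
p_eight_heads = p_heads ** 8

print(p_eight_heads)  # 1/256
```

No coin is flipped to obtain this value, which is why it counts as a theoretical rather than an empirical probability.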

Indicate whether the following study is an observational study or a controlled experiment. Records of patients who have had broken ankles are examined to see whether those who had physical therapy achieved more ankle mobility than those who did not.

This is an observational study. Since the researchers did not assign subjects to the control or treatment group beforehand, they did not satisfy a key feature of controlled experiments

A company was conducting a survey to investigate​ people's spending habits and how they may have changed in recent years. One question on the survey​ was, "Did you spend​ more/less/the same amount of money this year as you did in​ 2007, the year the recession began in earnest in this​ country?" Is this question​ biased? If​ so, what answer does it​ favor?

This question is biased toward​ "spend less," since it mentions the recent recession. Many people would feel that they should answer that they spent​ less, since the country is in a recession.

Indicate whether the study is an observational study or a controlled experiment. A researcher was interested in the effects of exercise on academic performance in elementary school children. She went to the recess area of an elementary school and identified some students who were exercising vigorously and some who were not. The researcher then compared the grades of the exercisers with the grades of those who did not exercise.

This study is an observational study.

Stratified

To determine her air quality​, Samantha divides up her day into three​ parts: morning,​ afternoon, and evening. She then measures her air quality at 33 randomly selected times during each part of the day. What type of sampling is used? Stratified Random Convenience Systematic Cluster

Stratified

To determine her heart rate​, a subject divides their day into three​ parts: morning,​ afternoon, and evening. They then measure their heart rate at 22 randomly selected times during each part of the day. What type of sampling was used? Random Stratified Cluster Convenience Systematic

Decide if the following statement is true or false and explain your answer. P(Z<2.50) = P(Z ≤ 2.50)

True; these two probabilities are equal because there is no area under the standard normal curve associated with a single value.

When making predictions based on regression lines, which of the following is not listed as a consideration? -Use the regression equation for predictions only if the graph of regression line on the scatter-plot confirms that the regression line fits the point reasonably well. -Use the regression equation for prediction only if the linear correlation coefficient r indicates that there is a linear correlation between two variables. -Use the regression line for prediction only if the data go far beyond the scope of the available sample data. -If the regression equation does not appear to be useful for making predictions, the best predicted value of a variable is its point estimate.

Use the regression line for prediction only if the data go far beyond the scope of the available sample data

An instructor at the College of Lake County is interested in the average number of days that CLC math students are absent from class during a semester. Let X = number of days that a CLC math student is absent. Then X is an example of a:

Variable

The correlation coefficient makes sense only if the trend is linear and the _______.

Variables are numerical

Which characteristic of data is a measure of the amount that the data values vary?

Variation

Which characteristic of data is a measure of the amount that the data values​ vary?

Variation

COMPLEMENT RULE

WHEN EVENTS DON'T OCCUR, USE P(NOT A) = 1-P(A)

CONTINUOUS DATA

WOULD BE ON A THERMOMETER.

Days before a presidential election, a nationwide random sample of registered voters was taken. Based on this random sample, it was reported that "52% of registered voters plan on voting for Robert Smith with a margin of error of ±3%." The margin of error was based on a 95% confidence level. Fill in the blanks to obtain a correct interpretation of this confidence interval. We are ______ confident that the _______ of registered voters _______ planning on voting for Robert Smith is between _______ and _______.

We are 95% confident that the percentage of registered voters in the nation planning on voting for Robert Smith is between 49% and 55%.
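The endpoints in this interpretation come from subtracting and adding the margin of error to the reported percentage. A minimal sketch, working in whole percentage points; the function name is illustrative.

```python
def interval_from_margin(estimate: int, margin: int) -> tuple[int, int]:
    """Return (lower, upper) as estimate - margin and estimate + margin."""
    return estimate - margin, estimate + margin


# 52% with a margin of error of 3 percentage points.
lower, upper = interval_from_margin(52, 3)
print(f"between {lower}% and {upper}%")  # between 49% and 55%
```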

Eric randomly surveyed 150 adults from a certain city and asked which team in a contest they were rooting for, either North High School or South High School. From the results of his survey, Eric obtained the following 95% confidence interval for the proportion of all adults in the city rooting for North High, (0.52,0.68). Interpret this confidence interval.

We are 95% sure that between 52% and 68% of all adults in this city will root for North High School.

Which of the following statements about correlation is true? -We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values increase. -We say that there is a positive correlation between x and y if there is no distinct pattern in the scatter-plot. -We say that there is a negative correlation between x and y if the x-values increase as the corresponding y-values increase. -We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values decrease.

We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values increase.

Construct the cumulative frequency distribution that corresponds to the given frequency distribution. Weight (oz): Number of stones 1.2-1.6: 5 1.7-2.1: 2 2.2-2.6: 5 2.7-3.1: 5 3.2- 3.6: 13

Weight (oz): Cumulative Frequency 1.2-1.6: 5 1.7-2.1: 7 2.2-2.6: 12 2.7-3.1: 17 3.2- 3.6: 30
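Each cumulative frequency is a running total of the class frequencies. A short sketch using the frequencies from the question:

```python
from itertools import accumulate

classes = ["1.2-1.6", "1.7-2.1", "2.2-2.6", "2.7-3.1", "3.2-3.6"]
frequencies = [5, 2, 5, 5, 13]

# Running totals: each entry adds the current class to all earlier classes.
cumulative = list(accumulate(frequencies))

for label, total in zip(classes, cumulative):
    print(f"{label}: {total}")
```

The last cumulative frequency, 30, equals the total number of stones.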

1. The value of r is always between -1 and 1 inclusive: -1 ≤ r ≤ 1. 2. If all values of either variable are converted to a different scale, the value of r does not change. 3. The value of r is not affected by the choice of x or y. Interchange all x- and y-values and the value of r will not change. 4. r measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear. 5. r is very sensitive to outliers in the sense that a single outlier can dramatically affect its value.

What are the properties of r?

1. Assuming that correlation implies causality. 2. Using data based on averages: averages suppress individual variation and may inflate the correlation coefficient. 3. Ignoring the property of linearity: if there is no linear correlation, there might be some other correlation that is not linear.

What are the three most common errors made in interpreting correlation results?

Nonlinear relationship

What can we say about r = -.087

That if x (the length) increases by 1 cm, the predicted height of the person increases by 3.22 cm.

What does ^y=80.9 + 3.22x tell us?

Expresses a linear relationship between a response variable y and two or more predictor variables (x1, x2, ..., xk). The general form of a multiple regression equation obtained from sample data is: ^y = b0 + b1x1 + b2x2 + ... + bkxk

What is a multiple regression equation?

For a pair of sample x, y values, the residual is the difference between the observed sample value of y and the y value that is predicted by using the regression equation: residual = observed y - predicted y = y - ^y

What is a residual?
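The slope interpretation and the residual definition can both be checked against the regression equation ^y = 80.9 + 3.22x from this set. This is a sketch only; the observed pair (x = 25 cm, y = 165 cm) is a hypothetical value chosen for illustration, not data from the source.

```python
def predict(x: float) -> float:
    """Predicted height from the regression equation y-hat = 80.9 + 3.22x."""
    return 80.9 + 3.22 * x


def residual(x: float, observed_y: float) -> float:
    """Observed y minus the y predicted by the regression equation."""
    return observed_y - predict(x)


# Raising x by 1 cm raises the prediction by the slope, 3.22 cm.
print(round(predict(26) - predict(25), 2))

# Hypothetical observation: x = 25 cm with an observed height of 165 cm.
print(round(residual(25, 165), 2))
```

A positive residual means the observed value lies above the regression line, and a negative residual means it lies below.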

The number that measures how well paired sample data fit a straight-line pattern when graphed.

What is the linear correlation coefficient r?

[St.Dev. 1]+[St.Dv. 2] = [34.7 + 13.5] = 48.2% probability

What is the probability that a randomly selected time falls between 40 and 42 seconds?

C. If the device eliminated all bike thefts, it would reduce odds of bike theft by 100%, so the 300% figure is misleading.

What is wrong with this statement: An ad for a device used to discourage bike thefts stated: "This device reduces your odds of bike theft by 300 percent." Choose the correct answer below. A. If bike thefts fell by 100%, it would be cut in half. Thus, a decrease of 200% means that it would be totally eliminated, and a decrease of more than 200% is impossible. B. The actual amount of the decrease in bike thefts is less than 100%. C. If the device eliminated all bike thefts, it would reduce odds of bike theft by 100%, so the 300% figure is misleading. D. The statement does not mention the initial amount of bike thefts.

​Ideally, the standard deviation would be zero because all the measurements should be the same.

What should be the value of the standard​ deviation?

Z-SCORE

When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the​ mean, we call the new value a

Z-score

When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the​ mean, we call the new value a​ _______.

Which of the following is NOT a property of the standard deviation? -The value of the standard deviation is never negative. -The standard deviation is a measure of variation of all data values from the mean. -When comparing variation in samples with different means, it is good practice to compare the two sample standard deviations. -The units of the standard deviation are the same as the units of the original data.

When comparing variation in samples with different means, it is good practice to compare the two sample standard deviations.

Which of the following is NOT a property of the standard deviation? a. When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations. b. The value of the standard deviation is never negative. c. The st. dev. is a measure of variation of all data values from the mean. d. The units of the st. dev. are the same as the units of the original data.

When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.

Which of the following is NOT a property of the standard​ deviation?

When comparing variation in samples with very different​ means, it is good practice to compare the two sample standard deviations.

A histogram of a set of data indicates that the distribution of the data is skewed right. Which measure of central tendency will likely be​ larger, the mean or the​ median? Why?

When data are either skewed left or skewed​ right, there are extreme values in the​ tail, which tend to pull the mean in the direction of the tail. If the distribution of the data is skewed​ right, there are large observations in the right tail. These observations tend to increase the value of the​ mean, while having little effect on the median.

Fill in the blank. When drawings of objects are used to depict​ data, false impressions can be made. These drawings are called​ _______.

When drawings of objects are used to depict​ data, false impressions can be made. These drawings are called pictographs.

When is a Data Set Multimodal?

When more than two data values occur with the same greatest frequency, each one is a mode and the data set is said to be multimodal.

In a graph, if one or both axes begin at some value other than zero, the differences are exaggerated. This bad graphing method is known as ---.

a nonzero axis

When it refers to a normal distribution does the term "normal" have the same meaning as in ordinary language? What criterion can be used to determine whether the data depicted in a histogram have a distribution that is approximately a normal distribution? Is this criterion totally objective, or does it involve subjective judgement?

When referring to a normal distribution, the term normal has a meaning that is different from its meaning in ordinary language. A normal distribution is characterized by a histogram that is approximately bell-shaped. Determination of whether a histogram is approximately bell-shaped does require some subjective judgment.

Round-Off Rule for Measures of Variation

When rounding the value of a measure of variation, carry one more decimal place than is present in the original set of data.

Raw data

When the data are in original form, they are called raw data

Identify when the interquartile range is better than the standard deviation as a measure of dispersion and explain its advantage.

When the distribution is skewed left or right or contains some extreme observations, then the interquartile range is preferred since it is resistant.

Ungrouped Frequency Distribution

When the range of the data values is relatively small, a frequency distribution can be constructed using single data values for each class. This type of distribution is called an ungrouped frequency distribution

When is a Data Set Bimodal?

When two data values occur with the same greatest frequency, each one is a mode and the data set is bimodal.

A negative​ z-score indicates a data value is less than the mean.

Whenever a data value is less than the​ mean,

RANGE

Which measure of variation is very sensitive to extreme​ values?

Range

Which measure of variation is very sensitive to extreme​ values?

The mean is called the average by statisticians

Which of the following is NOT a characteristic of the​ mean?

Quantitative

Which of the following is NOT a level of​ measurement? Ordinal Nominal Ratio Quantitative

Quantitative

Which of the following is NOT a level of​ measurement? Quantitative Nominal Ordinal Ratio

C. Utilizing valid statistical methods and correct sampling techniques

Which of the following is NOT a misuse of​ statistics? A. Concluding that a variable causes another variable because they have some correlation B. Misleading graphs C. Utilizing valid statistical methods and correct sampling techniques D. Making conclusions about a population based on a voluntary response sample

D. Utilizing valid statistical methods and correct sampling techniques

Which of the following is NOT a misuse of​ statistics? A. Misleading graphs B. Making conclusions about a population based on a voluntary response sample C. Concluding that a variable causes another variable because they have some correlation D. Utilizing valid statistical methods and correct sampling techniques

When comparing variation in samples with very different​ means, it is good practice to compare the two sample standard deviations.

Which of the following is NOT a property of the standard​ deviation?

MEAN

Which of the following is NOT a value in the​ 5-number summary?

B. Quiz scores from a college level statistics course are analyzed to determine student progress. Not voluntary (and no bias).

Which of the following is NOT a voluntary response​ sample? A. A radio station asks for​ call-in responses to a question concerning city recycling. B. Quiz scores from a college level statistics course are analyzed to determine student progress. C. A local dentist asks her patients to fill out a questionnaire and mail it back to determine the quality of the care received during an office visit. D. A survey is taken at a mall by asking passersby if they will fill out the survey.

A. Quiz scores from a college level statistics course are analyzed to determine student progress.

Which of the following is NOT a voluntary response​ sample? A. Quiz scores from a college level statistics course are analyzed to determine student progress. B. A radio station asks for​ call-in responses to a question concerning city recycling. C. A survey is taken at a mall by asking passersby if they will fill out the survey. D. A local dentist asks her patients to fill out a questionnaire and mail it back to determine the quality of the care received during an office visit.

When thinking about the variability of a categorical distribution, it is sometimes useful to think of the word_______.

diversity

B. In a symmetric and​ bell-shaped distribution, the​ mean, median, and mode are the same.

Which of the following is always​ true?. A. For skewed​ data, the mode is farther out in the longer tail than the median. B. In a symmetric and​ bell-shaped distribution, the​ mean, median, and mode are the same. C. The mean and median should be used to identify the shape of the distribution. D. Data skewed to the right have a longer left tail than right tail.

The frequency distribution below shows arrival delays for airplane flights. Arrival_delay_(min) Frequency (-60)-(-31) 11 (-30)-(-1) 28 0-29 11 30-59 0 60-89 2 Use the frequency distribution to construct a histogram. Which part of the histogram depicts flights that arrived​ early, and which part depicts flights that arrived​ late?

Which part of the histogram depicts flights that arrived​ early, and which part depicts flights that arrived​ late? The two leftmost bars depict flights that arrived​ early, and the other bars to the right depict flights that arrived late.

The population of ages at inauguration of all U.S. Presidents who had professions in the military is 62, 46, 68, 64, 57. Why does it not make sense to construct a histogram for this data set?

With a data set that is so small, the true nature of the distribution cannot be seen with a histogram.

Does the result appear to have a normal​ distribution? Why or why​ not?

Yes, because the frequencies start low, reach a maximum, then become low again, and are roughly symmetric about the maximum frequency.

Look at #6 chart and answer the question: Does the frequency distribution appear to have a normal distribution? Explain.

Yes, because the frequencies start low, proceed to one or two high frequencies, then decrease to a low frequency, and the distribution is approximately symmetric.

Look at #11 at the charts and answer the question: Does the result appear to have a normal distribution? Why or why not?

Yes, because the frequencies start low, reach a maximum, then become low again, and are roughly symmetric about the maximum frequency.

Does the frequency distribution appear to have a normal​ distribution? Explain. Temperature_ (F) Frequency 35-39 1 40-44 3 45-49 10 50-54 12 55-59 10 60-64 2 65-69 1

Yes, because the frequencies start​ low, proceed to one or two high​ frequencies, then decrease to a low​ frequency, and the distribution is approximately symmetric.

The graph to the right uses cylinders to represent barrels of oil consumed by two countries. Does the graph distort the data or does it depict the data​ fairly? Why or why​ not? If the graph distorts the​ data, construct a graph that depicts the data fairly.

Yes, because the graph incorrectly uses objects of volume to represent the data.

Look at the #41 charts and answer the questions: Does the configuration of the points appear to suggest that the volumes are from a population with a normal distribution? Are there any outliers?

Yes, the population appears to have a normal distribution because the dotplot resembles a "bell" shape. Yes, the volume of 50 oz appears to be an outlier because it is far away from the other volumes.

An education expert is researching teaching methods and wishes to interview teachers from a particular school district. She randomly selects ten schools from the district and interviews all of the teachers at the selected schools. Does this sampling plan result in a random sample? Simple random sample? Explain

Yes; no. The sample is random because all teachers have the same chance of being selected. It is not a simple random sample because some samples are not possible, such as a sample that includes teachers from schools that were not selected.

Suppose a student earns a 75 on his statistics​ exam, and his grade has a​ z-score of 1.5. Since the class did not perform well on the​ exam, the professor announces that she will adjust the grades by adding 10 points to each score. How will this adjustment change the​ student's z-score?

Your​ z-score will not change since the adjustment shifts the entire distribution of scores but does not change the relative position of your score in the class.
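
A quick way to check this claim is to standardize a score before and after the shift. The sketch below uses made-up exam scores (the list `scores` and the helper `z_score` are illustrative assumptions, not part of the original problem) and treats the class as a population.

```python
from statistics import mean, pstdev

def z_score(x, values):
    """Number of standard deviations x lies above or below the mean of values."""
    return (x - mean(values)) / pstdev(values)

# Hypothetical class scores, for illustration only.
scores = [55, 60, 62, 68, 75]
shifted = [s + 10 for s in scores]  # professor adds 10 points to everyone

z_before = z_score(75, scores)
z_after = z_score(85, shifted)

# The shift moves the mean by 10 and leaves the spread alone,
# so both z-scores come out the same.
print(z_before, z_after)
```

Adding a constant changes the mean by that constant but does not change the standard deviation, which is why the two values agree.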

99.7% within 3 standard deviations

(99.7% − 95%) / 2 = 4.7% / 2 = 2.35%, so about 2.35% of values fall in each outer band, between 2 and 3 standard deviations from the mean on either side.

Mean from frequency distribution

Σ(f·x) / Σf. First multiply each frequency by its class midpoint, then add the products. Divide that total by the sum of the frequencies.
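
As a rough sketch of that calculation, the code below applies the formula to the daily low temperature table that appears elsewhere in this set (classes 40-42 through 58-60). The function name `mean_from_frequency_table` is an assumption made for illustration.

```python
# Class midpoints and frequencies from the daily low temperature table
# (classes 40-42, 43-45, ..., 58-60).
midpoints = [41, 44, 47, 50, 53, 56, 59]
frequencies = [1, 3, 5, 11, 7, 7, 1]

def mean_from_frequency_table(midpoints, frequencies):
    """Approximate the mean as sum(f * x) / sum(f), with x the class midpoint."""
    total = sum(f * x for f, x in zip(frequencies, midpoints))
    return total / sum(frequencies)

approx_mean = mean_from_frequency_table(midpoints, frequencies)
print(round(approx_mean, 2))
```

The result is only an approximation of the true mean, because every value in a class is treated as if it sat at the class midpoint.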

median

Of a data set: the measure of center (MOC) that is the middle value when the original data values are arranged in order of increasing or decreasing magnitude.

Identify the class​ width, class​ midpoints, and class boundaries for the given frequency distribution. Daily Low Temperature ​ (degrees°​F) 40-42 43-45 46-48 49-51 52-54 55-57 58-60 Frequency 1 3 5 11 7 7 1 a) What is the class​ width? b) What are the class​ midpoints? c) What are the class​ boundaries?

a) 3 b) 41, 44, 47, 50, 53, 56, 59 c) 39.5​, 42.5​, 45.5​, 48.5​, 51.5​, 54.5​, 57.5​, 60.5
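
The three answers can be reproduced from the class limits alone. This is a minimal sketch, assuming the classes are given as lists of lower and upper limits (the variable names are illustrative).

```python
# Lower and upper class limits for the daily low temperature table.
lower = [40, 43, 46, 49, 52, 55, 58]
upper = [42, 45, 48, 51, 54, 57, 60]

# Class width: difference between two consecutive lower class limits.
width = lower[1] - lower[0]

# Class midpoint: average of the lower and upper limit of each class.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]

# Class boundaries: split the gap between one class's upper limit
# and the next class's lower limit in half.
gap = lower[1] - upper[0]
boundaries = [lo - gap / 2 for lo in lower] + [upper[-1] + gap / 2]

print(width, midpoints, boundaries)
```

Note that the width is not the upper limit minus the lower limit of one class (42 − 40 = 2), which is a common mistake.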

Identify the class​ width, class​ midpoints, and class boundaries for the given frequency distribution. Height​ (inches) 65.0-68.9 69.0-72.9 73.0-76.9 77.0-80.9 81.0-84.9 85.0-88.9 89.0-92.9 93.0-96.9 97.0-100.9 101.0-104.9 Frequency 4 25 9 1 0 0 0 0 0 1 a) What is the class​ width? b) What are the class​ midpoints? ​(Use ascending order. Round to two decimal places as​ needed.) c) What are the class​ boundaries? ​(Use ascending order. Round to two decimal places as​ needed.)

a) 4 b) 66.95​, 70.95​, 74.95, 78.95, 82.95​, 86.95​, 90.95​, 94.95​, 98.95​, 102.95 c) 64.95​, 68.95​, 72.95, 76.95​, 80.95​, 84.95​, 88.95​, 92.95​, 96.95​, 100.95​, 104.95

Identify the class width and class midpoints. Height (inches) Frequency 59.0-61.9 4 62.0-64.9 25 65.0-67.9 9 68.0-70.9 1 71.0-73.9 0 74.0-76.9 0 77.0-79.9 0 80.0-82.9 0 83.0-85.9 0 86.0-88.9 1

a). What is the class​ width? 3 b.)What are the class​ midpoints? 60.45, 63.45​, 66.45​, 69.45 ​,72.45 ​,75.45​, 78.45​, 81.45​,84.45, 87.45

Which of the following is NOT true about statistical graphs? a. They utilize areas or volumes for data that are one-dimensional in nature. b. Similar graphs can be constructed in order to compare data sets. c. They can be used to identify extreme data values. d. They can be used to consider the overall shape of the distribution.

a.

A​ _______ is a graph of each data value plotted as a point.

dot plot

A ________ is a graph of each data value plotted as a point

dotplot

The classical approach to probability requires that the outcomes are

equally likely

Listed below are the measured radiation emissions (in W/kg) corresponding to cell phones: A, B, C, D, E, F, G, H, I, J, and K respectively. The media often present reports about the dangers of cell phone radiation as a cause of cancer. Cell phone radiation must be 1.6 W/kg or less. Find the a. mean, b. median, c. midrange, d. mode for the data. Also complete part e. 1.47 1.46 1.38 0.26 0.57 0.92 0.44 0.67 0.55 0.36 1.56 a. find the mean b. Find the median. c. Find the midrange. d. Find the mode. e. If you are planning to purchase a cell phone, are any of the measures of center the most important statistic? Is there another statistic that is most relevant? If so, which one?

a. .876 b. .67 c. .91 d. There is no mode. e. The maximum data value is the most relevant statistic, because it is closest to the limit of 1.6 W/kg and that cell phone should be avoided.
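
For checking answers like these, here is a small sketch using Python's standard `statistics` module on the same radiation data. Treating "no mode" as "no value repeats" is the convention assumed here.

```python
from collections import Counter
from statistics import mean, median

emissions = [1.47, 1.46, 1.38, 0.26, 0.57, 0.92,
             0.44, 0.67, 0.55, 0.36, 1.56]

center_mean = round(mean(emissions), 3)
center_median = median(emissions)
midrange = round((min(emissions) + max(emissions)) / 2, 2)

# A mode exists only if some value occurs more than once.
highest_count = max(Counter(emissions).values())
has_mode = highest_count > 1

print(center_mean, center_median, midrange, has_mode)
```

The maximum, `max(emissions)`, is the value to compare against the 1.6 W/kg limit in part e.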

A particular group of men have heights with a mean of 173 cm and a standard deviation of 7 cm. Richard had a height of 199 cm. a. What is the positive difference between Richard's height and the mean? b. How many standard deviations is that [the difference found in part (a)]? c. Convert Richard's height to a z score. d. If we consider "usual" heights to be those that convert to z scores between -2 and 2, is Richard's height usual or unusual?

a. 26 cm b. 3.71 c. 3.71 d. Unusual

With a height of 61 in., George was the shortest president of a particular club in the past century. The club presidents of the past century have a mean height of 65.6 in and a standard deviation of 1.7 in. a. What is the positive difference between George's height and the mean? b. How many standard deviations is that [the difference found in part (a)]? c. Convert George's height to a z score. d. If we consider "usual" heights to be those that convert to z scores between -2 and 2, is George's height usual or unusual?

a. 4.6 in. b. 2.71 c. -2.71 d. Unusual
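
The four parts of these height problems follow the same steps each time, which can be sketched as below. The helper names `z_score` and `is_usual` are assumptions for illustration; the numbers are Richard's and George's from the problems in this set.

```python
def z_score(x, mu, sigma):
    """z = (x - mu) / sigma: standard deviations above (+) or below (-) the mean."""
    return (x - mu) / sigma

def is_usual(z):
    """'Usual' values are those with z scores between -2 and 2."""
    return -2 <= z <= 2

# Richard: 199 cm, group mean 173 cm, standard deviation 7 cm.
z_richard = z_score(199, 173, 7)
# George: 61 in, club mean 65.6 in, standard deviation 1.7 in.
z_george = z_score(61, 65.6, 1.7)

for name, z in [("Richard", z_richard), ("George", z_george)]:
    print(name, round(abs(z), 2), round(z, 2), "usual" if is_usual(z) else "unusual")
```

Part (b) is the absolute value of the z score, and part (c) keeps the sign, so a value below the mean gets a negative z score.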

An insurance institute conducted tests with crashes of new cars traveling at 6 mi/h. The total cost of the damages was found for a simple random sample of the tested cars and listed below. Find the (a) mean, (b) median, (c) mode, and (d) midrange for the given sample data. Do the different measures of center differ very much? $7,526 $4,949 $9,127 $6,403 $4,287 a. The mean is ---. b. The median is ---. c. Find the mode. d. The midrange is ----. Do the different measures of center differ very much?

a. $6,458.40 b. $6,403 c. There is no mode. d. $6,707. The different measures of center do not differ by very large amounts.

The graph to the right shows the braking distances for different cars measured under the same conditions. Describe the ways in which this graph might be deceptive. How much greater is the braking distance of Car A than the braking distance of Car​ C? Draw the graph in a way that depicts the data more fairly. a. In what way might the graph be​ deceptive? b. How much greater is the braking distance of Car A than the braking distance of Car​ C?

a. By starting the horizontal axis at​ 100, the graph cuts off portions of the bars. b. The braking distance of Car A is about 40​% greater than the braking distance of Car C.

Response bias

exists when the answers on a survey do not reflect the true feelings of the respondent

Nonresponse bias

exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do

a. A statistics class with 36 students is arranged so that there are 6 rows with 6 students in each​ row, and the rows are numbered from 1 through 6. A die is rolled and a sample consists of all students in the row corresponding to the outcome of the die. b. For the same class described in part​ (a), the 36 student names are written on 36 individual index cards. The cards are shuffled and six names are drawn from the top. c. For the same class described in part​ (a), the six youngest students are selected.

a. This sample is not a simple random sample. It is a random sample. b. This sample is a simple random sample. It is a random sample. c. This sample is not a simple random sample. It is not a random sample.

With a height of 70 in, Roger was the shortest president of a particular club in the past century. The club presidents of the past century have a mean height of 75.1 in and a standard deviation of 2.4 in. a. What is the positive difference between Roger's height and the mean? b. How many standard deviations is that [the difference found in part (a)]? c. Convert Roger's height to a z score. d. If we consider "usual" heights to be those that convert to z scores between −2 and 2, is Roger's height usual or unusual? **When you enter the formulas, change the negative signs on all answers. They should all be positive. Instead of -9.1, the answer would be 9.1

a. To find the positive difference between Roger's height and the mean, subtract the mean from Roger's height and take the absolute value of the difference: |70 in − 75.1 in| = 5.1 in. b. To determine how many standard deviations the difference is, divide the difference, 5.1, by the standard deviation, 2.4: 5.1/2.4 = 2.13. c. A z score is the number of standard deviations that a given value x is above or below the mean. Use the formula from the notes; since the club is a population, use the population formula: z = (x − μ)/σ = (70 − 75.1)/2.4 = −2.13. d. The z score is −2.13. Since "usual" heights are considered to be those that convert to z scores between −2 and 2, Roger's height is unusual.

Use the pulse rates​ (beats per​ minute) of males in the accompanying data set to construct a frequency distribution. Begin with a lower class limit of 40 and use a class width of 10. Do the pulse rates of males appear to have a normal​ distribution? b.) Do the pulse rates of males appear to have a normal​ distribution?

a.) Pulse Rate Frequency 40​-49 2 50​-59 21 60​-69 55 70​-79 41 80​-89 27 90​-99 5 100​-109 2 b.)The pulse rates of males appear to have a normal distribution because the frequencies start low, increase, and then decrease; and are roughly symmetric.

The table below shows the magnitudes of the earthquakes that have occurred in the past 10 years. magnitude Frequency 5.0-5.9 6 6.0-6.9 6 7.0-7.9 13 8.0-8.9 4 9.0-9.9 2 Use the frequency distribution to construct a histogram. Using a loose interpretation of the requirements for a normal​ distribution, does the histogram appear to depict data that have a normal​ distribution? Why or why​ not?

a.) Does the histogram appear to depict data that have a normal​ distribution? The histogram appears to roughly approximate a normal distribution. The frequencies generally increase to a maximum and then​ decrease, and the histogram is symmetric.

xi means

all x values

outliers

are sample values that lie very far away from the majority of the other sample values.

Standard Deviation

Roughly the average distance from the mean; the square root of the variance; used much more often. Population standard deviation: σ. Sample standard deviation: s.

Which of the following is always true? a. For skewed data, the mode is farther out in the longer tail than the median. b. In a symmetric and bell-shaped distribution, the mean, median, and mode are the same. c. Data skewed to the right have a longer left tail than right tail. d. The mean and median should be used to identify the shape of the distribution.

b.

A graphical display of a data set is given. Identify the overall shape of the distribution as (roughly) bell-shaped, triangular, uniform, reverse J-shaped, J-shaped, right skewed, left skewed, bimodal, or multimodal. A relative frequency histogram for the heights of a sample of adult women is shown below.

bell shaped

Before using the normal model to represent a data set, first check that the shape of the data's distribution is what shape?

both symmetric and unimodal

Which of the following is NOT a measure of center? a. mode b. mean c. census d. median

c.

Which of the following is NOT a measure of center? -census -mean -median -mode

census

Which of the following is NOT a measure of​ center?

census

Time

changing characteristics of the data over time (CVDOT)

____ is the difference between two consecutive lower class limits or two consecutive lower class boundaries in a frequency distribution.

class width

The process of representing categorical variables with numbers (such as letting a 1 represent "smoker" and a 0 represent "non-smoker") is called _______.

coding

The ___________ for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean.

coefficient of variation
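
A minimal sketch of the coefficient of variation, assuming a small made-up data set treated as a population (so `pstdev` is used; for a sample, `stdev` would take its place):

```python
from statistics import mean, pstdev

# Hypothetical nonnegative data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def coefficient_of_variation(values):
    """Standard deviation relative to the mean, expressed as a percent."""
    return pstdev(values) / mean(values) * 100

cv_percent = coefficient_of_variation(data)
print(cv_percent)
```

Because the result has no units, it can be used to compare variation between data sets whose means are very different.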

A ___ probability of an event is a probability obtained with knowledge that some other event has already occurred.

conditional (knowledge)

Descriptive statistics

consists of organizing and summarizing information

2. When you need to find a proportion between 2 positive OR 2 negative z-scores, you:

consult the *mean to z column* for both. Find proportions & subtract the smaller from the larger.

Data are more than just numbers, because data have ______.

context

Which of the following is NOT a characteristic of the mean? a. The mean is sensitive to outliers. b. The mean is relatively reliable. c. The mean takes every data value into account. d. The mean is called the average by statisticians.

d.

Which of the following is NOT a characteristic of the mean? a.) The mean is sensitive to outliers. b.) The mean is relatively reliable. c.) The mean takes every data value into account. d.) The mean is called the average by statisticians.

d.) The mean is called the average by statisticians.

mode

data set = the value that occurs with the greatest frequency

Methods used that summarize or describe characteristics of data are called _______ statistics.

descriptive

If every x value is transformed into a z-score, then the distribution of z-scores will have what following properties regarding shape, mean, and standard deviation?

distribution of z-scores will have exactly the same shape as original distribution of scores; z-score mean will always have mean of 0 & z-scores will always have standard deviation of 1.
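
These properties can be checked numerically. The sketch below standardizes a small hypothetical data set (treated as a population) and confirms that the z-scores have mean 0 and standard deviation 1.

```python
from statistics import mean, pstdev

# Hypothetical data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)
sigma = pstdev(data)

# Transform every x value into a z-score.
z_scores = [(x - mu) / sigma for x in data]

z_mean = mean(z_scores)
z_sd = pstdev(z_scores)
print(z_mean, z_sd)
```

The ordering and relative spacing of the values are unchanged, which is why the shape of the distribution stays the same.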

Below are 36 sorted ages of an acting award winner. Find P30 using the method presented in the textbook. 18, 18, 19, 21, 22, 25, 26, 26, 29, 31, 32, 34, 37, 41, 42, 42, 43, 45, 47, 49, 51, 51, 51, 52, 55, 58, 58, 59, 62, 63, 64, 65, 67, 74, 74, 76

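Compute the locator L = (k/100) · n, where n is the total number of values in the data set and k is the percentile being used. Here n = 36 and k = 30, so L = (30/100)(36) = 10.8. Since L is not a whole number, round up to L = 11. The 11th sorted value is 32, so P30 = 32.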
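
The locator method can be sketched in code as follows. The round-up branch is the one used in the answer above; the branch for a whole-number L (average the Lth and next values) is an assumption based on the usual textbook version of this method. The 22nd age is garbled in the original list and is taken as 51 here, which does not affect P30.

```python
import math

ages = [18, 18, 19, 21, 22, 25, 26, 26, 29, 31, 32, 34,
        37, 41, 42, 42, 43, 45, 47, 49, 51, 51, 51, 52,
        55, 58, 58, 59, 62, 63, 64, 65, 67, 74, 74, 76]

def percentile(sorted_values, k):
    """Value of the kth percentile (0 < k < 100) using the locator L = (k/100) * n."""
    n = len(sorted_values)
    locator = k / 100 * n
    if locator == int(locator):
        # Assumed rule: whole-number L means average the Lth and (L+1)th values.
        i = int(locator)
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    # Otherwise round L up and take that position (1-based).
    return sorted_values[math.ceil(locator) - 1]

p30 = percentile(ages, 30)
print(p30)
```

Software packages often use other percentile conventions, so results from a calculator or spreadsheet may differ slightly from this hand method.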

In modified boxplots, a data value is a(n) - if it is above Q3 + (1.5)(IQR) or below Q1 - (1.5)(IQR)

outlier
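
The outlier rule for modified boxplots amounts to two cutoffs, often called fences. This sketch takes Q1 and Q3 as given, since textbooks and software differ on how quartiles are computed; the data and quartile values are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values below the lower fence or above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Hypothetical data with assumed quartiles Q1 = 5 and Q3 = 9.
values = [2, 5, 6, 7, 8, 9, 30]
fences = outlier_fences(5, 9)
outliers = find_outliers(values, 5, 9)
print(fences, outliers)
```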

In modified boxplots, a data value is a(n) _______ if it is above Q3 + (1.5)(IQR) or below Q1 − (1.5)(IQR).

outlier

In modified​ box plots, a data value is​ a(n) _______ if it is above Q3+​(1.5)(IQR) or below Q1−​(1.5)(IQR).

outlier

In modified boxplots, a data value is a(n) _______ if it is above Q3 + (1.5)(IQR) or below Q1 − (1.5)(IQR).

outlier

In modified boxplots, a data value is a(n) _______ if it is above Q3 + (1.5)(IQR) or below Q1 − (1.5)(IQR).

outlier

- are sample values that lie very far away from the majority of the other sample values.

outliers

Correlation is affected by ____.

outliers

________are sample values that lie very far away from the majority of the other sample values.

outliers

unusual

outside 2 standard deviations

A health and fitness club reviews the weights of all of their members, and found that the average weight was 148 lb. Is this value a statistic or a parameter?

parameter

When drawings of objects are used to depict​ data, false impressions can be made. These drawings are called​ _______.

pictographs

When drawings of objects are used to depict​ data, false impressions can be made. These drawings are called

pictographs.

When drawings of objects are used to depict​ data, false impressions can be made. These drawings are called​ _______.

pictographs.

The most appropriate graphical display of categorical data is

pie chart

In statistics, the data we work with is just one part of a bigger picture called the ____.

population

N means

population

ρ = [1 / N] Σ { [ (xi - μx) / σx][ (Yi - μY) / σy ] } is the equation for what?

population correlation coefficient

What do each of the symbols mean in the equation ρ = [1 / N] Σ { [ (xi - μx) / σx][ (Yi - μY) / σy ] }? ρ = ____ N = ____ ∑ = ____ xi = ____ μx = ____ σx = ____ yi = ____ μy = ____

population correlation coefficient; population observation number; sum of; observation x; x population mean; population standard deviation; observation y; y population mean

"Mu" [µ] means

population mean

σ

population standard deviation

A ___________ is the complete collection of all measurements or data collected, whereas, a __________ is a subcollection of members selected from the complete collection. population; sample sample; population sample; census population; parameter

population; sample

A ____ correlation means that if one variable gets bigger, the other variable tends to get bigger.

positive

r = Σ(xy) / √[ ( Σ x² )( Σ y² ) ] is the formula for what?

product-moment correlation coefficient

What do each of the symbols mean in the formula r = Σ(xy) / √[ ( Σx² )( Σy² ) ] r = ____ ∑ = ____ x = ____ (formula) y = ____ (formula)

product-moment correlation coefficient; sum of; x = (xi − x̄); y = (yi − ȳ)

percentile

provides information about how the data are spread over the interval from the smallest value to the largest value. (Recall the median divides the lower 50% of a set of data from the upper 50%. The median is a special case of a general concept called the percentile.)

Mode is primarily a measure of

qualitative central tendency

Obtain the linear correlation coefficient for the data. Round your answer to three decimal places. Managers rate employees according to job performance (x) and attitude (y). The results for several randomly selected employees are given below. X: 59, 63, 65, 69, 58, 77, 76, 69, 70, 64, Y:72, 76, 78, 82, 75, 87, 92, 83, 87, 78

r=0.863 (edit-list-stat-linreg(ax+b)

What measure of variation is very sensitive to extreme values?

range

range rule of thumb to find SD

range/4

Modified Boxplots

regular boxplot constructed with these modifications: 1. A special symbol used to identify outliers. 2. The solid horizontal line extends only as far as the minimum data value that is not an outlier and the maximum data value that is not an outlier.

A​ _______ histogram has the same shape and horizontal scale as a​ histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.

relative frequency

In a -- distribution, the frequency of a class is replaced with a proportion or percent.

relative frequency

In a ___ distribution, the frequency of a class is replaced with a proportion or percent.

relative frequency

A​ _______ histogram has the same shape and horizontal scale as a​ histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.

relative frequency histogram A relative frequency histogram has the same shape and horizontal scale as a​ histogram, but the vertical scale is marked with relative frequencies​ (as percentages or​ proportions) instead of actual frequencies.
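
Converting a frequency distribution to relative frequencies is a single division per class. The sketch below uses the earthquake magnitude table from this set; the variable names are illustrative.

```python
# Earthquake magnitude classes and their frequencies.
classes = ["5.0-5.9", "6.0-6.9", "7.0-7.9", "8.0-8.9", "9.0-9.9"]
frequencies = [6, 6, 13, 4, 2]

total = sum(frequencies)

# Relative frequency = class frequency / sum of all frequencies.
relative = [f / total for f in frequencies]

for label, r in zip(classes, relative):
    print(label, f"{r:.1%}")
```

The relative frequencies add up to 1 (or 100%), apart from small rounding differences when they are displayed.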

stem plots

represents quantitative data by separating each value into two parts: the stem (the leftmost digit) and the leaf ( the rightmost digit)

Because the median is not affected by the size of an outlier and does not change even if a particular outlier is replaced by an even more extreme value, we say the median is _____ to outliers.

resistant

How to solve for a cumulative frequency distribution.

Rewrite each class as "less than" the lower class limit of the next class. Example: for classes 20-29 and 30-39, the first row becomes "less than 30". Each cumulative frequency is the sum of the frequencies of all classes below the number next to the phrase "less than".
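
A running total does the work here. This sketch builds the cumulative table from the male pulse rate distribution given elsewhere in this set, labeling each row with the next class's lower limit as in the "less than 30" example; the variable names are assumptions.

```python
from itertools import accumulate

# Pulse rate classes 40-49, 50-59, ..., 100-109 with class width 10.
lower_limits = [40, 50, 60, 70, 80, 90, 100]
frequencies = [2, 21, 55, 41, 27, 5, 2]
width = 10

# Running sum of the frequencies.
cumulative = list(accumulate(frequencies))

# Each row reads "less than <lower limit of the next class>".
labels = [f"less than {lo + width}" for lo in lower_limits]
table = list(zip(labels, cumulative))

for label, count in table:
    print(label, count)
```

The last cumulative frequency always equals the total number of data values.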

In a​ boxplot, if the median is to the left of the center of the box and the right whisker is substantially longer than the left​ whisker, the distribution is skewed_______

right

The Range Rule of Thumb roughly estimates the standard deviation of a data set as _______.

s = range / 4
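
As a small illustration, the sketch below applies the rule to the crash damage costs listed in this set and prints the actual sample standard deviation alongside it for comparison.

```python
from statistics import stdev

# Crash damage costs (in dollars) from the insurance institute problem.
costs = [7526, 4949, 9127, 6403, 4287]

def range_rule_estimate(values):
    """Rough estimate of the standard deviation: s is about range / 4."""
    return (max(values) - min(values)) / 4

estimate = range_rule_estimate(costs)
actual = stdev(costs)
print(estimate, round(actual, 1))
```

The rule is only a rough estimate, so the two numbers are not expected to match closely, especially for very small samples.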

The Range Rule of Thumb roughly estimates the st. dev. of a data set as ___.

s = range/4

the symbol for sample variance is

s squared (s²)

The Range Rule of Thumb roughly estimates the standard deviation of a data set as​ _______.

s= range/4

The Range Rule of Thumb is a rough estimate of the standard deviation of a data set as _______

s=range/4

The Range Rule of Thumb roughly estimates the standard deviation of a data set as​ _______.

s=range/4

the symbol for sample variance is

s^2

The _______ is/are a subset of the population that is being studied.

sample

r = [ 1 / (n - 1) ] Σ { [ (xi - x̄) / sx ] [ (yi - ȳ) / sy ] } is the equation for what?

sample correlation coefficient

What do each of the variables mean in the equation r = [ 1 / (n - 1) ] Σ { [ (xi - x̄) / sx ] [ (yi - ȳ) / sy ] }? r = ____ n = ____ ∑ = ____ xi = ____ x̄ = ____ sx = ____ yi = ____ ȳ = ____ sy = ____

sample correlation coefficient; sample observation number; sum of; x observation; x mean; x standard deviation; y observation; y mean; y standard deviation
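
The equation translates almost term for term into code: standardize each x and each y, multiply the pairs, add the products, and divide by n − 1. This is a sketch on made-up paired data; the function name `sample_correlation` is an assumption.

```python
from statistics import mean, stdev

def sample_correlation(xs, ys):
    """r = [1 / (n - 1)] * sum of (z-score of x) * (z-score of y)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = sample_correlation(xs, ys)
print(round(r, 3))
```

A value of r close to 1 or −1 indicates a strong linear relationship, and a value near 0 indicates little or none.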

"x-bar" means

sample mean

s

sample standard deviation

Biased estimator

The sample standard deviation, s, is a biased estimator of the population standard deviation, σ. Values of s do NOT target the value of σ; they generally tend to underestimate it.

What is s2 the symbol for?

sample variance

The correlation becomes weaker as the data points become more s____.

scattered

A - is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.

scatterplot

A _______ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.

scatterplot

Fill in the blank. A histogram aids in analyzing the​ _______ of the data.

shape of the distribution A histogram is a visual tool used to represent and analyze data. It is basically a graphic version of a frequency​ distribution, and it can show the​ center, variation, and the shape of the distribution of the data.

z-score

Standardized score; compares a data point to its peers. Formula: z = (data value − mean) / standard deviation. A positive z-score means above average; a negative z-score means below average. The units for z-scores are standard deviations, so values from different data sets can be compared. Any z-score above 2 or below −2 is unusual.

4. When you need to find the P for an area *greater than* a negative Z or *Less than* a positive Z use:

the *Body column*. Because the body column includes the mean & the tail.

For data sets having a distribution that is approximately bell-shaped, ______ states that about 68% of all data values fall within one standard deviation from the mean.

the Empirical Rule

For data sets having a distribution that is approx. bell-shaped, ___________ states that about 68% of all data values fall within one standard deviation from the mean.

the Empirical Rule

For data sets having a distribution that is approximately bell-shaped, ______ states that about 68% of all data values fall within one standard deviation from the mean.

the Empirical Rule

Percentile

The kth percentile, denoted Pk, of a set of data is a value such that k percent of the observations are less than or equal to it. It is like class rank, but percentiles run from low to high: the 5th percentile is the value that 5% of the population is at or below, and the 95th percentile is the value that 95% of individuals are at or below.

Whenever a data value is less than the mean, ------------ (Hint: pertaining to z-score)

the corresponding z-score is negative

Whenever a data value is less than the mean, ______.

the corresponding z-score is negative

Whenever a data value is less than the​ mean

the corresponding z-score is negative

Whenever a data value is less than the​ mean, _______.

the corresponding z-score is negative

Normal. When graphed, a normal distribution has a "bell" shape. Characteristics of the bell shape are (1) the frequencies increase to a maximum, and then decrease, and (2) symmetry, with the left half of the graph roughly a mirror image of the right half.

​A(n) _______ distribution has a​ "bell" shape.

Fill in the blank. ​A(n) _______ distribution has a​ "bell" shape.

​A(n) normal distribution has a​ "bell" shape.

Suppose the list below shows how many text messages Elyse sent each day for the last 10 days. If Elyse wants to know how many text messages she typically sends each​ day, which measure of central tendency better describes the typical number of text messages per​ day?

​Median; The median of 27.5 is a better representative of the center since it is resistant to the one extreme value. The mean of 33.3 is not representative of the typical number of texts since only one number is larger than the mean.

If we have a large voluntary response sample consisting of weights of subjects who chose to respond to a survey posted on the​ Internet, can a graph help to overcome the deficiency of having a voluntary response​ sample?

​No, a graph cannot help to overcome the deficiency. If the sample is a bad​ sample, there are no graphs or other techniques that can be used to salvage the data.

The table provided below shows paired data for the heights of a certain​ country's presidents and their main opponents in the election campaign. Construct a scatterplot. Does there appear to be a​ correlation?

​No, there does not appear to be a correlation because there is no general pattern to the data.

Construct a scatter diagram using the data table to the right. This data is from a study comparing the amount of tar and carbon monoxide​ (CO) in cigarettes. Use tar for the horizontal scale and use carbon monoxide​ (CO) for the vertical scale. Determine whether there appears to be a relationship between cigarette tar and CO.

​Yes, as the amount of tar increases the amount of carbon monoxide also increases.

Does the frequency distribution appear to have a normal​ distribution? Explain. Temperature ​(degrees°​F) 40-44 45-49 50-54 55-59 60-64 65-69 70-74 Frequency 1 3 10 12 10 2 1

​Yes, because the frequencies start​ low, proceed to one or two high​ frequencies, then decrease to a low​ frequency, and the distribution is approximately symmetric. Bell-shaped.

The graph to the right uses cylinders to represent barrels of oil consumed by two countries. Does the graph distort the data or does it depict the data​ fairly? Why or why​ not? If the graph distorts the​ data, construct a graph that depicts the data fairly. Does the graph distort the​ data? Why or why​ not?

​Yes, because the graph incorrectly uses objects of volume to represent the data.

