First 20

A manufacturer of bolts has a​ quality-control policy that requires it to destroy any bolts that are more than 2 standard deviations from the mean. The​ quality-control engineer knows that the bolts coming off the assembly line have mean length of 8 cm with a standard deviation of 0.05 cm. For what lengths will a bolt be​ destroyed?

A bolt will be destroyed if the length is less than 7.9 cm or greater than 8.1 cm.
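
The cutoffs above are just the mean minus and plus 2 standard deviations. A minimal sketch of that arithmetic, where the function name `destroy_limits` is made up for illustration:

```python
# Cutoffs for "more than k standard deviations from the mean".
# destroy_limits is a hypothetical helper, not part of the original card.
def destroy_limits(mean, sd, k=2):
    """Return (lower, upper); lengths outside this interval are destroyed."""
    return mean - k * sd, mean + k * sd

# Values from the card: mean 8 cm, standard deviation 0.05 cm.
lower, upper = destroy_limits(8.0, 0.05)
print(round(lower, 2), round(upper, 2))
```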

UNUSUAL

A data value is considered __________ if its z-score is less than −2 or greater than 2.

The complement of "at least one" is ___.

"none."

"At least one" is equivalent to ____.

"one or more"

Scores of an IQ test have a​ bell-shaped distribution with a mean of 100 and a standard deviation of 15. Use the empirical rule to determine the following. ​(a) What percentage of people has an IQ score between 85 and 115​? ​(b) What percentage of people has an IQ score less than 55 or greater than 145​? ​(c) What percentage of people has an IQ score greater than 130​?

(a) 68% (b) 0.3% (c) 2.5%
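
One way to check these three answers is to code the empirical rule as a lookup table. This is a rough sketch; the helper names are invented for illustration, and the percentages are the usual 68-95-99.7 approximations rather than exact normal probabilities.

```python
# Approximate percentage of data within k standard deviations of the mean
# for a bell-shaped distribution (empirical rule).
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}

def pct_within(k):
    """Percent of values within k standard deviations of the mean."""
    return WITHIN[k]

def pct_outside(k):
    """Percent of values more than k standard deviations from the mean."""
    return 100.0 - WITHIN[k]

def pct_above(k):
    """Percent of values more than k standard deviations ABOVE the mean."""
    return pct_outside(k) / 2

# IQ example: mean 100, standard deviation 15.
part_a = pct_within(1)    # between 85 and 115
part_b = pct_outside(3)   # below 55 or above 145
part_c = pct_above(2)     # above 130
print(part_a, round(part_b, 1), part_c)
```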

Determine whether the given value is from a discrete or continuous data set. When a car is randomly selected, it is found to have an engine with 6 cylinders.

A discrete data set because there are a finite number of possible values.

What is an ogive?

A graph that represents the cumulative frequency or cumulative relative frequency for the class

What is a value at the center or middle of a data set?

A measure of center.

A particular country has 60 total states. If the areas of all 60 states are added and the sum is divided by 60, the result is 193,950 square kilometers. Determine whether this result is a statistic or a parameter.

The result is a parameter because it describes some characteristics of a population

Determine whether the sample described below is a simple random sample. In the last year, 123,423 adults got married in a county. A researcher plans to conduct a survey of 800 of those newlyweds. After obtaining a list of those who got married, he numbers the list from 1 to 123,423, and then he uses a computer to randomly generate 800 numbers between 1 and 123,423. His sample consists of the newlyweds corresponding to the selected numbers.

The sample is a simple random sample because every sample of size 800 has the same chance of being selected.

Determine whether the sample described below is a simple random sample. In order to test for a difference in the way that workers and non-workers purchase magazines, a research institution polls exactly 638 adult workers and 638 adult non-workers randomly selected from adults in the United States.

The sample is not a simple random sample because every sample of size 1276 does not have the same chance of being selected.

Determine whether the sample described below is a simple random sample. A quality control engineer selects every 5000th hairdryer that is produced.

The sample is not a simple random sample because every sample of the same size does not have the same chance of being selected.

What does n denote?

The sample size, which is the number of data values.

Statistics

The science of collecting, organizing, summarizing, and analyzing information to draw conclusions and answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.

What is the formula for the z-score?

z = (x − μ) / σ: the x value minus the mean (μ, "mu"), divided by the standard deviation (σ, "sigma"). The numerator x − μ is a *deviation score*. The denominator expresses the deviation in standard deviation units.

The​ _______ represents the number of standard deviations an observation is from the mean.

z-score

When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a ______.

z-score

When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a_____.

z-score

When a data value is converted to a standardized scale representing the number of st. dev. the data value lies from the mean, we call the new value a __.

z-score.

When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the​ mean, we call the new value a​ _______.

z-score.

The sum of the deviations about the mean always equals

zero

Σxi

the sum of all x values

Find the population mean or sample mean as indicated. ​Population: 2​, 1​, 11​, 15​, 6

µ= 7

What is the symbol used to represent the population​ mean?

μ

What is the formula to find the mean of all values in a population?

μ = Σx / N

What is the symbol for population standard deviation?

σ

What is the symbol for population variance?

σ²

Suppose the list below shows how many text messages Elyse sent each day for the last 10 days. If Elyse wants to know how many text messages she typically sends each​ day, which measure of central tendency better describes the typical number of text messages per​ day?

​Median; The median of 27.5 is a better representative of the center since it is resistant to the one extreme value. The mean of 33.3 is not representative of the typical number of texts since only one number is larger than the mean.

In the data table below, the x-values are the weights (in pounds) of cars and the y-values are the corresponding highway fuel consumption amounts (in mi/gal). Weight (lb): 4088, 3358, 4133, 3650, 3545. Highway Fuel Consumption (mi/gal): 26, 31, 29, 29, 30. Comment on the source of the data if you are told that car manufacturers supplied the values. Is there an incentive for car manufacturers to report values that are not accurate?

​Yes, because​ consumers, in​ general, would prefer to buy a car with a higher level of fuel efficiency. In this​ case, the source of the data would be suspect with a potential for bias.

On a test, 74% of the questions are answered correctly. If 111 questions are correct, how many questions are on the test? 37 questions 67 questions 150 questions 82 questions

150 questions

95% of values in a normal distribution fall within

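2 standard deviations of the mean. Since 95% − 68% = 27% and 27% / 2 = 13.5%, each band between 1 and 2 standard deviations from the mean holds about 13.5%: 13.5% | 34% | 34% | 13.5%.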

A company advertises a mean lifespan of 1000 hours for a particular type of light bulb. If you were in charge of quality control at the​ factory, would you prefer that the standard deviation of the lifespans for the light bulbs be 5 hours or 50​ hours? Why?

5 hours would be preferable since a smaller standard deviation indicates more consistency.

What percent of observations are expected to lie within 1 standard deviation of the mean?

68% (about 34% on each side of the mean)

The Empirical Rule is also known as

68%-95%-99.7% rule

For data sets having a distribution that is approx. bell-shaped, ___________ states that about 68% of all data values fall within one standard deviation from the mean.

the Empirical Rule

Percentile

The kth percentile, denoted Pk, of a set of data is a value such that k percent of the observations are less than or equal to it. It works like class rank, except that percentiles count from low to high: the 5th percentile is the value that 5% of the population is at or below, and the 95th percentile is the value that 95% of individuals are at or below.

If the data set is skewed (left or right), and/or there are outliers, then

the best measure of center is the median, and the best measure of dispersion is IQR/2 = (Q3 − Q1)/2

The mean measures..

the center of distribution

Whenever a data value is less than the mean, ______.

the corresponding z-score is negative

Whenever a data value is less than the​ mean, _______.

the corresponding z-score is negative.

For data sets having a distribution that is approximately​ bell-shaped,_________ states that about 68% of all data values fall within one standard deviation from the mean

the empirical Rule

The Empirical Rule

the empirical rule can be used to determine the percentage of data that lie within k standard deviations of the mean. To help organize the empirical rule and make the analysis​ easier, draw a​ bell-shaped curve, as shown to the right. The line in the center of the curve represents the mean. The other lines are each​ 1, 2, and 3 standard deviations away from the mean.

the higher the standard deviation

the more spaced out and dispersed the bell shape.

What does the z-score number represent?

the number of standard deviations from the mean. Aka standardized scores.

In the binomial probability formula, the variable x represents the ___.

the number of successes

P (A or B) indicates ____.

the probability that in a single trial, event A occurs, event B occurs, or they both occur.

s

the symbol for the sample standard deviation (the sample variance is s²)

What is the square of the standard deviation called?

the variance (s²)

What is the purpose of z-scores?

to describe the exact location of each score in a distribution; -always refers to population (must use a different formula for samples).

Standard deviation allows you

to see how spread out or concentrated the data in a bell curve are. You should be able to pick which graphs go with which µ, "x-bar", and σ.

The bars in a histogram ___.

touch (without gaps)

A data value is considered ___ if its z-score is less than -2 or greater than 2.

unusual

A data value is considered ______ if its z-score is less than -2 or greater than 2.

unusual

A data value is considered _________ if the z-score is less than -2 or greater than 2.

unusual

A data value is considered​ _______ if its​ z-score is less than −2 or greater than 2.

unusual

Inferential statistics

uses methods that generalize results obtained from a sample to the population and measure the reliability of the results

Median, and its symbol

The value that lies in the middle of the data when arranged in ascending order. M is the symbol.

Mode

The most frequent observation of a variable. A data set can have no mode, a single mode, two modes (bimodal), or more than two modes (multimodal).

The square of the standard deviation is called the _______.

variance

the square of a standard deviation is called the

variance

The square of the st. dev. is called the ___.

variance.

when to use mode for best measure of central tendency

when data is nominal or ordinal

how to tell which histogram has the highest standard deviation

whichever graph is more spread out

How do you calculate Mean from a frequency distribution?

x̄ = Σ (f * x) / Σf

What is the formula to find a weighted mean?

x̄ = Σ(w*x) / Σw
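
The frequency-distribution mean and the weighted mean are the same computation, with frequencies playing the role of weights. A small sketch; the midpoints and frequencies below are made-up numbers, not data from these cards:

```python
def weighted_mean(values, weights):
    """x-bar = sum(w * x) / sum(w); with frequencies as weights this is sum(f * x) / sum(f)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical class midpoints and class frequencies.
midpoints = [5, 15, 25]
frequencies = [2, 3, 5]
freq_mean = weighted_mean(midpoints, frequencies)
print(freq_mean)  # (2*5 + 3*15 + 5*25) / 10 = 18.0
```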

What is the formula to find the mean of a set of sample values?

x̄ = Σx / n

sample z-score

z = (x - x̄) / s

population z-score

z = (x - µ) / σ
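
Both z-score formulas have the same shape, so one function covers them: pass µ and σ for a population, or x̄ and s for a sample. A minimal sketch (the function names are illustrative), using the IQ figures from an earlier card:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def is_unusual(z):
    """Rule of thumb from these cards: unusual if z < -2 or z > 2."""
    return z < -2 or z > 2

# IQ scores: mean 100, standard deviation 15.
z_145 = z_score(145, 100, 15)
z_85 = z_score(85, 100, 15)
print(z_145, is_unusual(z_145))  # 3.0 True
print(z_85, is_unusual(z_85))    # -1.0 False
```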

AND

REFERS TO MULTIPLICATION. FOR INDEPENDENT EVENTS, P(A AND B) = P(A) * P(B)

For a distribution that is skewed right, the median is ______ of the box.

left of the center

The standard deviation is used in conjunction with the ______ to numerically describe distributions that are bell shaped

mean

What measure of central tendency best describes the​ "center" of the​ distribution when the graph is symmetrical

mean

Population arithmetic mean, and its symbol

mean computed by using all individuals in a population; the symbol is μ ("mu")

Sample arithmetic mean

mean using sample data, symbol is "x-bar"

The standard deviation is used in conjunction with the​ ______ to numerically describe distributions that are bell shaped. The​ ______ measures the center of the​ distribution, while the standard deviation measures the​ ______ of the distribution.

mean, mean, spread

Which measures of center and spread are not resistant?

mean, range and standard deviation

A concrete mix is designed to withstand 3000 pounds per square inch​ (psi) of pressure. The following data represent the strength of nine randomly selected casts​ (in psi). 3970​, 4100​, 3200​, 3100​, 2950​, 3840​, 4100​, 4030​, 3650 Compute the​ mean, median and mode strength of the concrete​ (in psi).

mean: 3660 median: 3840 mode: 4100

An insurance company crashed four cars of the same model at 5 miles per hour. The costs of repair for each of the four crashes were ​$437​, ​$411​, ​$487​, and ​$248 . Compute the​ mean, median, and mode cost of repair.

mean: 395.75 median: 424 mode: none (no value repeats)
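
The two worked answers above can be checked with Python's standard library. This is only a sketch; `modes` is a helper written for this note, and it follows the cards' convention that data with no repeated value has no mode.

```python
from collections import Counter
from statistics import mean, median

def modes(data):
    """Return the most frequent value(s); empty list if nothing repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no observation occurs more than once: no mode
    return [value for value, count in counts.items() if count == top]

concrete = [3970, 4100, 3200, 3100, 2950, 3840, 4100, 4030, 3650]  # psi
repairs = [437, 411, 487, 248]                                      # dollars

print(mean(concrete), median(concrete), modes(concrete))
print(mean(repairs), median(repairs), modes(repairs))
```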

A value at the center or middle of a data set is a(n)

measure of center

A value at the center or middle of a data set is a(n) _____.

measure of center

quantitative data

Measures how much, such as the weights of high school students. Displayed with dot plots, histograms, and stem plots.

What measure of central tendency best describes the​ "center" of the​ distribution when the graph is skewed

median

Which measures of central tendencies are resistant

median and mode

inter-quartile range contains the

middle 50% of all observations

The measure of center that is the value that occurs with the greatest frequency is the

mode

The measure of center that is the value that occurs with the greatest frequency is the ____.

mode

The measure of center that is the value that occurs with the greatest frequency is the _____.

mode

OUTLIER

In modified boxplots, a data value is a(n) ______ if it is above Q3 + (1.5)(IQR) or below Q1 − (1.5)(IQR).

Below are 36 sorted ages of an acting award winner. Find P30 using the method presented in the textbook. 18, 18, 19, 21, 22, 25, 26, 26, 29, 31, 32, 34, 37, 41, 42, 42, 43, 45, 47, 49, 51, 51, 51, 52, 55, 58, 58, 59, 62, 63, 64, 65, 67, 74, 74, 76

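Compute the locator L = (k / 100) × n, where n is the total number of values in the data set and k is the percentile being used. Here n = 36 and k = 30, so L = (30 / 100) × 36 = 10.8. Because L is not a whole number, round it up to L = 11. P30 is the 11th value in the sorted list: P30 = 32.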
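
The locator steps can be sketched in code as below. Two assumptions are labeled here and in the comments: the whole-number case (average the Lth and (L+1)th values) is the usual companion rule to the round-up step and is not spelled out on the card, and the entry printed as "5" in the sorted list is read as 51.

```python
import math

def percentile(sorted_data, k):
    """kth percentile via the locator L = (k / 100) * n, for integer 0 < k < 100."""
    if not 0 < k < 100:
        raise ValueError("k must be between 0 and 100 (exclusive)")
    n = len(sorted_data)
    whole, remainder = divmod(k * n, 100)  # integer arithmetic avoids float round-off
    if remainder == 0:
        # Assumed rule: L is whole, so average the Lth and (L+1)th values.
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    # L is not whole: round up and take that value (1-based position).
    return sorted_data[math.ceil(k * n / 100) - 1]

ages = [18, 18, 19, 21, 22, 25, 26, 26, 29, 31, 32, 34, 37, 41, 42, 42, 43, 45,
        47, 49, 51, 51, 51, 52, 55, 58, 58, 59, 62, 63, 64, 65, 67, 74, 74, 76]
# The 22nd entry is assumed to be 51; it does not affect P30.
p30 = percentile(ages, 30)
print(p30)  # L = 10.8 rounds up to 11; the 11th value is 32
```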

The four levels of measurement that are commonly used for classifying data are ratio, _________, ________, and _________. interval, normal, ordinary nominal, ordinal, interval nominal, ordinal, categorical normal, ordinal, interval

nominal, ordinal, interval

A(n) ____ distribution has a "bell" shape.

normal

Arithmetic mean

of a variable is computed by adding all the values of the variable in the data set and dividing by the number of observations

A data value is considered ___ if its z-score is greater than or equal to -2 and less than or equal to 2.

ordinary

Raw score

original, unchanged scores that are the direct result of measurement. A test score that has not been transformed or converted in any way.

In a scatter-plot, a(n) _________ is a point lying far away from the other data points.

outlier

In modified boxplots, a data value is a(n) _______ if it is above Q3 + (1.5)(IQR) or below Q1 − (1.5)(IQR).

outlier

In modified boxplots, a data value is a(n) _______ if it is above Q3 + (1.5)(IQR) or below Q1 − (1.5)(IQR).

outlier

Population mean is a

parameter

Below are 36 sorted ages of an acting award winner. Find the percentile corresponding to age 59 using the method presented in the textbook. 16,17,17,21,22,27,30,33,37,37,40,42,43,48,54,56,57,59,59,60,60,62,62,64,65,65,68,70,70,72,72,73,74,77,78,80

Percentile of value x = (number of values less than x) / (total number of values) × 100. For this problem x = 59. How many values are less than 59? 17. What is the total number of values? 36. Percentile of 59 = (17 / 36) × 100 ≈ 47.
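
The same count-and-divide step as a short sketch. Rounding to the nearest whole number is an assumption about the textbook method; the function name is illustrative.

```python
def percentile_of_value(data, x):
    """(number of values less than x) / (total number of values) * 100, rounded."""
    below = sum(1 for value in data if value < x)
    return round(below / len(data) * 100)

ages = [16, 17, 17, 21, 22, 27, 30, 33, 37, 37, 40, 42, 43, 48, 54, 56, 57, 59,
        59, 60, 60, 62, 62, 64, 65, 65, 68, 70, 70, 72, 72, 73, 74, 77, 78, 80]
rank_59 = percentile_of_value(ages, 59)
print(rank_59)  # 17 of 36 values are below 59
```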

N means

population size

"Mu" [µ] means

population mean

σ

population standard deviation

σ²

population variance

A ___________ is the complete collection of all measurements or data collected, whereas, a __________ is a subcollection of members selected from the complete collection. population; sample sample; population sample; census population; parameter

population; sample

Find the population mean or sample mean as indicated. ​Sample: 22​, 18​, 6​, 13​, 6

13

A management survey for a company surveyed 235 employees. 44.7% of the employees surveyed were females. The number of males would be: 130 105 13 Unable to determine

130

Correlation does not imply: Linearity Bias Causation Significance

Causation

Determine whether the given value is from a discrete or continuous set " The total number of phone calls a sales representative makes in a month is 425."

Discrete

Determine whether the value is from a discrete or continuous data: Number of cars owned is 7

Discrete

What type of data values are quantitative and the number of values is finite or countable? Interval Discrete Categorical Continuous

Discrete

Explain the meaning of the accompanying percentiles. ​(a) The 5th percentile of the head circumference of males 3 to 5 months of age in a certain city is 41.5 cm. ​(b) The 90th percentile of the waist circumference of females 2 years of age in a certain city is 49.8 cm. ​(c) Anthropometry involves the measurement of the human body. One goal of these measurements is to assess how body measurements may be changing over time. The following table represents the standing height of males aged 20 years or older for various age groups in a certain city in 2015. Based on the percentile measurements of the different age​ groups, what might you​ conclude?

(a) 5% of 3- to 5-month-old males have a head circumference that is 41.5 cm or less. (b) 90% of 2-year-old females have a waist circumference that is 49.8 cm or less. (c) At each percentile, the heights generally decrease as the age increases. Assuming that an adult male does not grow after age 20, the percentiles imply that adults born in 1990 are generally taller than adults at the same age who were born in 1950.

Variance is

(standard deviation)^2

3. When you need to find the P that is *greater* than a positive Z or a negative Z you will go to the:

*tail column*. Easy way to remember is it's the only one that doesn't include the mean.

IQR (Interquartile Range)

A measure of dispersion (variability). Definition: the range of the middle 50% of the observations in a data set, IQR = Q3 − Q1. Remember: if the data are symmetric, the best measure of central tendency is the mean and the best measure of dispersion is the standard deviation. However, if the data are skewed or contain outliers, the best measure of central tendency is the median and the best measure of dispersion is based on the IQR: IQR/2 = (Q3 − Q1)/2.

Identify the level of measurement of the data, and explain what's wrong with the calculation: In a survey, the respondents are identified as 100 for "yes", 200 for "no", 300 for "maybe", and 400 for anything else. The average is calculated for 652 respondents and the result is 256.1

-The data are at the nominal level of measurement -Such data are not counts or measures of anything, so it makes no sense to compute their average

Five-Number Summary

Five numbers used to summarize the data set: 1. SDV = minimum = xmin 2. Lower quartile = QL = Q1 = P25 3. Middle quartile = median = M = Q2 = P50 4. Upper quartile = QU = Q3 = P75 5. LDV = maximum = xmax

Check recording 12 minutes for a step by step process on how to approach a problem!!!!

...

So, on the test he will ask to find the five number summary: in the following order: xmin, QL,M,QU,xmax

...

Z score rules

...

z-scores

Represents the distance that a data value is from the mean in terms of the number of standard deviations. We find it by subtracting the mean from the data value and dividing this result by the standard deviation.

The sum of the deviations about the mean always equals

0, because the deviations of observations greater than the mean offset the deviations of observations less than the mean and cancel out to zero (or very close to zero when the mean has been rounded)

Finding quartiles

1) Arrange the data in ascending order 2) Determine the median, M, or second quartile, Q2. 3) Determine the first and third quartiles, Q1 and Q3, by dividing the data set into two halves; the bottom half will be the observations below (to the left of) the location of the median, and the top half will be the observations above (to the right of) it. The first quartile is the median of the bottom half and the third quartile is the median of the top half.

Steps for determining a box plot

1) Determine the lower and upper fences: Lower fence = Q1 − 1.5(IQR), Upper fence = Q3 + 1.5(IQR) 2) Draw vertical lines at Q1, M, and Q3. Enclose these lines in a box. 3) Label the lower and upper fences 4) Draw a line from Q1 to the smallest data value that is larger than the lower fence. Draw a line from Q3 to the largest data value that is smaller than the upper fence. 5) Any data values that are outliers (less than the lower fence or greater than the upper fence) get marked with an asterisk (*)

relationship between median, mean, and distribution shape... 1) Skewed left 2) Symmetric 3) Skewed right

1) mean < median 2) mean = median 3) mean > median

Name procedures you could follow to obtain a simple random sample of 5 students?

1)List each name on a separate piece of paper; place them all in a hat and pick five 2) Number the names from 1 to 427 and use a random number table to produce 5 different three digit numbers corresponding to the names selected

The computed mean and the actual mean are considered close if the difference is less than ____of the actual mean. Otherwise the means are said to be __________ different.

1. 5% 2. substantially.

How to Calculate Quartiles

1. Arrange data in ascending order 2. Determine the median (M) = Q2 3. Divide the data set into halves: the observations below M and the observations above M. The first quartile (Q1) is the median of the bottom half, and the third quartile (Q3) is the median of the top half.
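
The median-of-halves procedure can be sketched as follows. Note that statistical software often uses other quartile conventions, so results may differ slightly from library functions; the sample lists here are made up for illustration.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method on these cards."""
    ordered = sorted(data)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # observations below the median's location
    upper_half = ordered[(n + 1) // 2:]   # observations above the median's location
    return median(lower_half), median(ordered), median(upper_half)

# Hypothetical data sets with an even and an odd number of values.
even_q = quartiles([1, 3, 5, 7, 9, 11, 13, 15])
odd_q = quartiles([5, 1, 4, 2, 3])
print(even_q)  # Q1, Q2, Q3 for the even-sized set
print(odd_q)   # the middle value 3 is excluded from both halves
```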

TV viewing example: Compute Quartiles

1. Put the data in ascending order. 2. Find the quartiles. a. Median = Q2: there are n = 20 data values, so M is the mean of the middle two data values, and Q2 = M = 30.5. b. Bottom half (n = 10): Q1 is the median of that half, the mean of its middle two data values, so Q1 = 23. c. Upper half (n = 10): Q3 is the median of that half, the mean of its middle two data values, so Q3 = 36.5.

How to check outliers with Quartiles Rule

1. Determine the lower (Q1 = QL) and upper (Q3 = QU) quartiles 2. Compute the IQR 3. Determine the lower and upper fences: a. LIF = QL − 1.5(IQR) b. UIF = QU + 1.5(IQR) c. LOF = QL − 3(IQR) d. UOF = QU + 3(IQR)
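
As a rough illustration of the fence formulas, the sketch below plugs in the quartiles from the TV viewing example (Q1 = 23, Q3 = 36.5). The function name and the dictionary layout are choices made for this note.

```python
def fences(q1, q3):
    """Inner fences at 1.5 * IQR and outer fences at 3 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return {
        "LIF": q1 - 1.5 * iqr,
        "UIF": q3 + 1.5 * iqr,
        "LOF": q1 - 3 * iqr,
        "UOF": q3 + 3 * iqr,
    }

tv_fences = fences(23, 36.5)   # IQR = 13.5
print(tv_fences)

# A value is flagged as an outlier if it falls outside the inner fences.
def is_outlier(x, q1, q3):
    f = fences(q1, q3)
    return x < f["LIF"] or x > f["UIF"]
```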

How to draw a B&W Plot

1. Determine the five-number summary (xmin, QL, M, QU, xmax). 2. Determine the outliers using the quartiles method. 3. Determine the adjacent values: S = smallest data value that is larger than the LIF; L = largest data value that is smaller than the UIF. (S will be less than QL, and L will be larger than QU.) 4. Draw a horizontal number line and mark QL, M, QU, S, and L. 5. Draw vertical lines at QL, M, and QU, and enclose these lines in a box. 6. Connect QL to S and QU to L with whiskers. 7. Plot outliers: MO with * and EO with o. If the data set does not have outliers (simple b&w plot): S = xmin (smallest data value) and L = xmax (largest data value).

How do the five numbers describe data set:

1. The median describes the middle of the data set. 2. Info about the spread: because you have Q3 and Q1, you have the IQR, and you can get a measure of dispersion (variation) by dividing the IQR by 2. 3. xmin and xmax give you info about the distribution, including whether or not you have outliers.

5 Number summary

1. Minimum 2. First​ quartile, Q1 3. Second​ quartile, Q2​ (same as the​ median) 4. Third​ quartile, Q3 5. Maximum

Standardizing a distribution has two steps:

1. The original raw scores are transformed to z-scores. 2. The z-scores are transformed to new X values so that the specified mean (μ) and standard deviation (σ) are attained.

What are three important properties of the Mean?

1. Sample means drawn from the same population tend to vary less than other measures of center. 2. The mean of a data set uses every data value. 3. A disadvantage of the mean is that just one outlier can change the value of the mean substantially.

3 Properties of Standard Scores

1. The mean of a set of z-scores is always 0. 2. The standard deviation of a set of standardized scores is always 1. 3. The distribution of a set of standardized scores has the same shape as the original scores; the scaling is just different.
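
The first two properties are easy to confirm numerically. A small sketch using the population 2, 1, 11, 15, 6 from an earlier card; because of floating-point round-off, the results are compared against 0 and 1 with a small tolerance rather than exactly.

```python
from statistics import mean, pstdev

data = [2, 1, 11, 15, 6]       # population from an earlier card, mean 7
mu = mean(data)
sigma = pstdev(data)           # population standard deviation

z_scores = [(x - mu) / sigma for x in data]

z_mean = mean(z_scores)        # should be 0 up to round-off
z_sd = pstdev(z_scores)        # should be 1 up to round-off
print(round(z_mean, 10), round(z_sd, 10))
```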

What are two important properties of x̃?

1. The median does not change by large amounts when we include just a few outliers. 2. The median does not use every data value.

A professor has recorded exam grades for 10 students in his​ class, but one of the grades is no longer readable. If the mean score on the exam was 82 and the mean of the 9 readable scores is 86​, what is the value of the unreadable​ score?

Total of all 10 scores: 10 × 82 = 820. Total of the 9 readable scores: 9 × 86 = 774. Unreadable score: 820 − 774 = 46.
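
The same reasoning works for any "one missing value" problem: the total implied by the overall mean minus the total of the known values. A minimal sketch with an illustrative function name:

```python
def missing_score(n_total, mean_all, mean_known):
    """Recover one missing value from the overall mean and the mean of the rest."""
    total_all = n_total * mean_all            # sum of all n scores
    total_known = (n_total - 1) * mean_known  # sum of the n - 1 known scores
    return total_all - total_known

unreadable = missing_score(10, 82, 86)
print(unreadable)  # 820 - 774
```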

If the standard deviation of a variable is 10​, what is the​ variance?

100

Number of notes in a song...

Discrete b/c it's countable

MEASURE OF CENTER

A value at the center or middle of a data set is​ a(n) _________.

Measure of center

A value at the center or middle of a data set is​ a(n) _______

Define measure of center.

A value at the center or middle of a data set.

SUBSET

ALL THE MEMBERS OF ONE SET BELONG TO ANOTHER SET.

POPULATION

ANY NUMBER FROM A PARAMETER IS A

How do you find the midrange?

Add the Max and min data value and then divide the sum by 2.

Which of the following is NOT a principle of probability? a. All events are equally likely in any probability procedure. b. The probability of any event is between 0 and 1 inclusive. c. The probability of an impossible event is 0. d. The probability of an event that is certain to occur is 1.

All events are equally likely in any probability procedure.

What can be said about a set of data with a standard deviation of​ 0?

All the observations are the same value.

Which word is associated with multiplication when computing probabilities?

And

A mutual fund rating agency ranks a​ fund's performance by using one to five stars. A​ one-star mutual fund is in the bottom​ 20% of its investment​ class; a​ five-star mutual fund is in the top​ 20% of its investment class. Interpret the meaning of a​ four-star mutual fund.

A​ four-star fund is in the 4th quintile of the funds. That​ is, it is above the bottom​ 60%, but below the top​ 20% of the ranked funds.

INTERSECTION

WHAT BOTH SETS HAVE IN COMMON IS THE _____

Why, in a frequency distribution, do we use the class midpoint when calculating mean?

Because we don't know the exact values that fall into a particular class, we treat all values in the class as equal to the class midpoint.

Which of the accompanying boxplots likely has the data with the larger standard​ deviation? Why?

Boxplot II likely has the data with the larger standard deviation because the boxplot appears to have a greater​ spread, which likely results in a larger standard deviation.

Which car would a customer buy based on standard deviation, range, mean, and median?

Car 2, because it has a lower sample standard deviation, hence more predictable gas mileage

Qualitative

Categorical data

Which of the following is NOT a procedure for determining whether it is reasonable to assume that sample data are from a normally distributed population? a. Visual inspection of a Histogram to determine if its roughly "bell shaped" b. Constructing a probability plot (QQ) c. Identifying the outliers. d. Checking that the probability of an event is 0.05 or less.

Checking that the probability of an event is 0.05 or less.

Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "An education researcher randomly selects 48 middle schools and interviews all the teachers at each school."

Cluster

To determine customer opinion of their musical variety, Sony randomly selected 110 concerts during a certain week and surveyed all concertgoers. What type of sampling is this?

Cluster

Standardized Distribution

Composed of scores that have been transformed to create predetermined values for the mean and standard deviation. They are used to make dissimilar distributions comparable.

Determine whether the given value is from a discrete or continuous data set. The time it takes a computer to complete a task. Continuous Discrete

Continuous

Determine whether the given value is from a discrete or continuous set "The height of 2-year-old maple tree is 28.3 ft."

Continuous

Height of a child...

Continuous because it is not countable

Volume of water in a swimming pool..

Continuous because it is not countable

Identify which type of sampling is used: To avoid working late, a quality control analyst simply inspects the first 100 items produced in a day Systematic Stratified Convenience Cluster Simple Random

Convenience

Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "A researcher interviews 19 work colleagues who work in his building."

Convenience

Which of the following is NOT one of three common errors involving correlation? - Correlation does not imply causality. - The conclusion that correlation implies causality. - The use of data based on averages. - Mistaking no linear correlation with no correlation.

Correlation does not imply causality

Discrete

Countable number

Identify the type of observational study used: A town obtains current employment data by polling 10,000 of its citizens this month. Prospective Retrospective Cross-sectional None of these

Cross-sectional

variance

DEALS WITH STANDARD DEVIATION.

The probability of event B​ occurring, given that event A has already occurred.

DESCRIBE WHAT P(B|A) MEANS.

Denotes the Median.

What does w denote?

Denotes weights, which are assigned to different data values.

Parameter

Describes characteristics of a population

Statistic

Describes characteristics of a sample

z-score

Describes the exact location of a score in a distribution relative to the mean. Aka Standard Score; how many standard deviations you are away from the norm. Used to make different distributions, or metric scales, comparable.

Suppose every student in a class is surveyed and it is reported that​ 75% of the class plans to take another math class. Is this an example of descriptive or inferential​ statistics? Explain.

Descriptive​ statistics; The results of the class sample are described without making any generalizations about the population of all students at the school.

Determine whether the given value is a discrete or continuous variable: People are asked to state how many times in the last month they visited their family doctor Continuous Discrete

Discrete

Quartiles (most common percentiles) --> resistant to extreme values

Divide data sets into fourths, or four equal parts. The first quartile, denoted Q1, divides the bottom 25% of the data from the top 75%. The second quartile divides the bottom 50% of the data from the top 50%, so the second quartile is equivalent to the 50th percentile, which is equivalent to the median. Finally, the third quartile divides the bottom 75% of the data from the top 25%, so the third quartile is equivalent to the 75th percentile.

What must be true for a sample to be considered a simple random​ sample?

Every possible sample of that size must have the same chance of being selected.

What does it mean if a statistic is resistant?

Extreme values (very large or small) relative to the data do NOT affect its value substantially

outliers

Extreme values that don't appear to belong with the rest of the data.

What does it mean if a statistic is​ resistant?

Extreme values​ (very large or​ small) relative to the data do not affect its value substantially

Identify the given statement as either true or false. The standard deviation is a resistant measure of spread.

False

True or​ False: A data set will always have exactly one mode.

False. The mode of a variable is the most frequent observation of the variable that occurs in the data set. To compute the mode, tally the number of observations that occur for each data value. The data value that occurs most often is the mode. A set of data can have no mode, one mode, or more than one mode. If no observation occurs more than once, the data have no mode.

EMPIRICAL RULE

For data sets having a distribution that is approximately​ bell-shaped, _______ states that about​ 68% of all data values fall within one standard deviation from the mean

The empirical rule

For data sets having a distribution that is approximately​ bell-shaped, _______ states that about​ 68% of all data values fall within one standard deviation from the mean.

1. When you need to find a proportion between a negative (-) & positive (+) z-score:

Go to the *mean-to-z column* for each z-score; find the two proportions and add them together.

The U.S. Department of Housing and Urban Development​ (HUD) uses the median to report the average price of a home in the United States. Why do you think HUD uses the​ median?

HUD uses the median because the data are skewed right

VENN DIAGRAM

INTERSECTION, UNION, COMPLEMENT

In a typical​ boxplot, the length of the box indicates which measure of​ spread?

IQR

COMPLEMENT

IS ALL THE NUMBERS THAT DON'T BELONG TO THE SET.

UNION

IS ALL THE NUMBERS THAT BELONG TO EITHER SET, INCLUDING THE NUMBERS THEY HAVE IN COMMON.

SAMPLE SPACE

IS THE SET OF ALL THE POSSIBLE OUTCOMES.

PROBABILITY

IT IS A PREDICTION OF A CERTAIN OUTCOME

THEORETICAL

IT IS BASED ON A PREDICTABLE OUTCOME

Determining z-score

If a data value is larger than the mean, the z-score will be positive. If a data value is smaller than the mean, the z-score will be negative. If the data value equals the mean, the z-score will be zero. Z-scores measure the number of standard deviations an observation is above or below the mean. Ex. A z-score of 1.24 is interpreted as "the data value is 1.24 standard deviations above the mean," or GREATER than the mean. Ex. A z-score of -0.5 is interpreted as "the data value is half a standard deviation below the mean," or LESS than the mean. Ex. A z-score of 0 indicates that the value of the observation is EQUAL to the mean.

After constructing a relative frequency distribution summarizing IQ scores of college​ students, what should be the sum of the relative​ frequencies?

If percentages are​ used, the sum should be​ 100%. If proportions are​ used, the sum should be 1

Which of the following is NOT a requirement in determining whether there is a linear correlation between two variables? -Any outliers must be removed if they are known to be errors. -If r>1, then there is a positive linear correlation. -The sample of paired data is a simple random sample of quantitative data. -A scatterplot should visually show a straight-line pattern.

If r>1, then there is a positive linear correlation

Which of the following is always true? -For skewed data, the mode is farther out in the longer tail than the median. -The mean and median should be used to identify the shape of the distribution. -Data skewed to the right have a longer left tail than right tail. -In a symmetric and bell-shaped distribution, the mean, median, and mode are the same.

In a symmetric and bell-shaped distribution, the mean, median, and mode are the same

Which of the following is always true? a. For skewed data, the mode is farther out in the longer tail than the median. b. Data skewed to the right have a longer left tail than right tail. c. The mean and median should be used to identify the shape of the distribution. d. In a symmetric and bell-shaped distribution, the mean, median, and mode are the same.

In a symmetric and bell-shaped distribution, the mean, median, and mode are the same.

Which of the following is always​ true?

In a symmetric and​ bell-shaped distribution, the​ mean, median, and mode are the same

Suppose every student in a class is surveyed and it is found that​ 75% of the class plans to take another math class. It is reported that​ 75% of all students at the school plan to take another math class. Is this an example of descriptive or inferential​ statistics? Explain.

Inferential​ statistics; the results of the class sample are extended to make a generalization about the population of all students at the school.

Determine which of the 4 levels of measure is the most appropriate: Years of elections: 1988, 1990, 1992, 1994, and 1996

Interval

What is a lurking variable?

Is an explanatory variable that was not considered in the study, but that affects the value of the response variable in the study. In addition, lurking variables are typically related to explanatory variables in the study

Define the Mode.

Is the value that occurs with the greatest frequency.

x

Is the variable usually used to represent the individual data values.

population z-score

z = (x - μ) / σ, where μ = Mean and σ = Standard Deviation

Continuous

Many possible values

Which of the following is NOT a value in the 5-number summary? -Median -Mean -Minimum -Q1

Mean

Which of the following is NOT a value in the​ 5-number summary?

Mean

Which of the following is NOT needed to construct a​ boxplot?

Mean

What is the formula to calculate mean?

Mean = Σx / n

What are the measures of center?

Mean, median, mode, and midrange.

An insurance company crashed four cars of the same model at 5 miles per hour. The costs of repair for each of the four crashes were ​$433​, ​$440, ​$495​, and ​$207 . Compute the​ mean, median, and mode cost of repair.

Mean: $393.75 Median: $436.50 Mode: None
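
The answer above can be checked with a short sketch using Python's standard library. The variable names are illustrative assumptions, and "no mode" is taken to mean that no value occurs more than once.

```python
from collections import Counter
from statistics import mean, median

# Repair costs from the four crashes, in dollars
costs = [433, 440, 495, 207]

mean_cost = mean(costs)      # sum of the values divided by how many there are
median_cost = median(costs)  # average of the two middle values after sorting

# A value is a mode only if it occurs more than once and at least as often as any other
counts = Counter(costs)
top = max(counts.values())
modes = [value for value, count in counts.items() if count == top] if top > 1 else []

print(mean_cost, median_cost, modes)
```

Every cost occurs exactly once, so the list of modes is empty.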

Population of country of origin is qualitative or quantitative?

Quantitative because it is a numerical measure

_______ divide data sets in fourths.

Quartiles

The following data represent the weights​ (in grams) of a simple random sample of a candy. 0.90 0.87 0.83 0.92 0.90 0.86 0.86 0.87 0.81 0.84 Determine the shape of the distribution of weights of the candies by drawing a frequency histogram and computing the mean and the median. Which measure of central tendency best describes the weight of the​ candy?

Mean: 0.866 Median: 0.865 The measure of central tendency that best describes the weight of the candy is the mean.

There are many potential pitfalls that can cause problems when analyzing data. Which of these choices are not classified as a potential pitfall? Order of survey questions Nonresponse Self-reported data Measured data

Measured data

What is an observational study?

Measures the value of the response variable without attempting to influence the value of either the response or explanatory variables

descriptive

Methods used that summarize or describe characteristics of data are called​ _______ statistics

DESCRIPTIVE STATISTICS

Methods used that summarize or describe characteristics of data are called​?

Are any of the measures of dispersion among the​ range, the​ variance, and the standard​ deviation, resistant? Explain.

No, all of these measures of dispersion are affected by extreme values.

Is this a property of the standard deviation? When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.

No, it is not good practice to compare the two sample standard deviations in samples with very different means.

Is it OK to say "average" instead of mean?

No.

A psychology student wishes to investigate differences in political opinions between business majors and political science majors at her college. She randomly selects 100 students from the 260 business majors and 100 students from the 180 political science majors. Does this sampling plan result in a random sample? Simple random sample? Explain.

No; no. The sample is not random because political science majors have a greater chance of being selected than business majors. It is not a simple random sample because some samples are not possible, such as a sample consisting of 50 business majors and 150 political science majors.

Determine which of the four levels of measurement​ (nominal, ordinal,​ interval, ratio) is most appropriate. Favorite films

Nominal

Quantitative

Numerical data

ADDITION

OR REFERS TO ______ RULE.

Determine whether the given description corresponds to an experiment or an observational study: A stock analyst selects a stock from a group of twenty for investment by choosing the stock with the greatest earnings per share reported for the last quarter.

Observational study

Determine which of the four levels of measurement is most appropriate: Students' grades, A, B, or C, on a test. Interval Nominal Ordinal Ratio

Ordinal

____ are sample values that lie very far away from the majority of the other sample values.

Outliers

Determine whether the given value is a statistic or a parameter. "After inspecting all 45,000kg of meat stores at the Wurst Sausage Company, it was found that 20,000kg of the meat was spoiled."

Parameter

Determine whether the given value is a statistic or a parameter. "After taking the first exam, 15 of the students dropped the class."

Parameter

Determine whether the given value is a statistic or a parameter. Thirty percent of all dog owners poop scoop after their dog. Statistic Parameter

Parameter

Determine whether the given value is a statistic or a parameter: In a study of all 3153 seniors at a college, it is found that 50% own a computer

Parameter because the value is a numerical measurement describing a characteristic of a population

When we used the z-score method, we found that 77 was the only outlier, and it was an extreme one. But, what if we use the quartiles method?

Q1=23 Q2= Q3=36.5 IQR=13.5 LIF=2.75 UIF=56.75 LOF=-17.5 UOF=77. OBS. BETWEEN LIF AND UIF = USUAL. OBS. BETWEEN LOF AND LIF = MO. OBS. BETWEEN UIF AND UOF = MO. OBS. BELOW LOF OR ABOVE UOF = EO. 77 is at the border between MO and EO, so we can consider it either a mild or an extreme outlier. This example shows that the z-score method is better than the quartiles method because it is more specific, while the quartiles method leaves you the choice to designate the value as one or the other.
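
A minimal sketch of the fence arithmetic, taking Q1 = 23 and IQR = 13.5 from the card. It assumes the abbreviations stand for lower/upper inner fence (LIF, UIF) and lower/upper outer fence (LOF, UOF), that inner fences sit 1.5 IQRs and outer fences 3 IQRs beyond the quartiles, and that MO and EO mean mild and extreme outlier. The function names are hypothetical.

```python
def fences(q1, iqr):
    """Inner and outer fences, assuming the 1.5*IQR and 3*IQR conventions."""
    q3 = q1 + iqr
    return {
        "LIF": q1 - 1.5 * iqr,  # lower inner fence
        "UIF": q3 + 1.5 * iqr,  # upper inner fence
        "LOF": q1 - 3 * iqr,    # lower outer fence
        "UOF": q3 + 3 * iqr,    # upper outer fence
    }

def classify(x, f):
    """Label a value as usual, a mild outlier, or an extreme outlier."""
    if x < f["LOF"] or x > f["UOF"]:
        return "extreme outlier"
    if x < f["LIF"] or x > f["UIF"]:
        return "mild outlier"
    return "usual"

f = fences(23, 13.5)
print(f)
print(classify(77, f))
```

Because the comparisons are strict, a value sitting exactly on an outer fence, such as 77 here, is labeled mild. Treating it as extreme instead is an equally defensible convention.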

Inter-quartile range

Q3 minus Q1

Favorite rock group is qualitative or quantitative?

Qualitative because it is an attribute classification

Determine whether the data are qualitative or quantitative. "the number of seats in a movie theater"

Quantitative

Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. In a poll conducted by a certain research center, 718 adults were called after their telephone numbers were randomly generated by a computer, and 89% were able to correctly identify the attorney general.

Random sampling

Which measure of variation is very sensitive to extreme values?

Range

which measure of variation is very sensitive to extreme values?

Range

μ

Represents the mean of all values in a population.

z-score (often called the standardized value)

Represents the distance that a data value is from the mean in terms of the number of standard deviations. (It is obtained by subtracting the mean from the data value and dividing this result by the standard deviation) The z-score is unitless. It has a mean 0 and standard deviation 1. The z-score is often called the standardized value.

x̄

Represents the mean of a set of sample values.

N

Represents the number of data values in a population.

Researchers collect data by interviewing athletes who have won Olympic gold medals from 1992 to 2016. Identify the type of study. Retrospective Cross-sectional Prospective None of these

Retrospective

Distribution Shape and Boxplot

Right Skewed: If the median is to the left of the center of the box, the right whisker is longer than the left one Symmetric: If the median is at or near the center of the box, the whiskers are of equal lengths Left Skewed: If the median is to the right of the center of the box, the left whisker is longer than the right one.

Rounding rule:

Round z-scores to 2 decimal places

P(A) + P(Ā) = 1 is one way to express the ____.

Rule of complementary events.

The Range Rule of Thumb roughly estimates the standard deviation of a data set as _______.

S= RANGE/4

EVENT

SUBSET OF SAMPLE SPACE.

n

Sample size

Identify which type of sampling is used: The name of each contestant is written on a separate card, the cards are placed in a bag, and three names are picked from the bag Simple Random Cluster Convenience Stratified Systematic

Simple Random

When is B&W Plot simple or extended?

Simple: Data set does not contain outliers Extended: Data set contains outliers (MO or EO)

The x-values below are the nicotine amounts (in mg) in different 100 mm filtered, non-"light" menthol cigarettes. The y-values are the nicotine amounts (in mg) in different king-size nonfiltered, nonmenthol, and non-"light" cigarettes. x: 1.1, 0.8, 0.9, 1.0, 1.1. y: 1.1, 1.3, 1.2, 1.1, 1.6. If suitable methods of statistics are used, it can be concluded that the average (mean) nicotine amount of the 100 mm filtered, non-"light" menthol cigarettes is less than the average (mean) nicotine amount of the king-size nonfiltered, nonmenthol, and non-"light" cigarettes. Can it be concluded that the first type of cigarette is safe? Why or why not?

Since the first type of cigarette contains less nicotine than the second type of​ cigarette, the first type is safer.​ However, it cannot be concluded that it is safe.

Standard deviation measures the _____ of the distribution

Spread

Determining outliers

Standardized values (z-scores) can be used to identify outliers. It is recommended to treat any data value with a z-score less than -3 or greater than +3 as an outlier. Such data values can then be reviewed for accuracy and to determine whether they belong in the data set.

Determine whether the given value is a statistic or a parameter. "A health and fitness club surveys 40 randomly selected members and found that the average weight of those questioned is "

Statistic

Determine whether the given value is a statistic or a parameter. "A sample of 120 employees of a company is selected, and the average age is found to be 37 years"

Statistic

Finding Quartiles

Step 1 Arrange the data in ascending order. Step 2 Determine the median, M, or second quartile, Q2 . Step 3 Divide the data set into halves: the observations below (to the left of) M and the observations above M. The first quartile, Q1 , is the median of the bottom half, and the third quartile, Q3 , is the median of the top half.
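
The three steps above can be sketched in Python as follows. The function name `quartiles` and the sample data are assumptions for illustration. When the number of observations is odd, this sketch leaves the median out of both halves, which is one common reading of "below M" and "above M"; some textbooks and software use other conventions.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    values = sorted(data)             # Step 1: ascending order
    n = len(values)
    q2 = median(values)               # Step 2: the median M, also Q2
    lower = values[: n // 2]          # Step 3: observations below M
    upper = values[(n + 1) // 2 :]    #         observations above M
    return median(lower), q2, median(upper)

# Hypothetical data sets: one with an odd count, one with an even count
odd_result = quartiles([7, 1, 3, 9, 5, 11, 13])
even_result = quartiles([1, 2, 3, 4, 5, 6, 7, 8])
print(odd_result, even_result)
```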

Checking for Outliers by Using Quartiles

Step 1 Determine the first and third quartiles of the data. Step 2 Compute the interquartile range. Step 3 Determine the fences. Fences serve as cutoff points for determining outliers. Lower Fence = Q1 - 1.5(IQR) Upper Fence = Q3 + 1.5(IQR) Step 4 If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier.
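
A minimal sketch of steps 2 through 4, assuming the quartiles from step 1 are already known. The function name `find_outliers` and the sample data are hypothetical.

```python
def find_outliers(data, q1, q3):
    """Return the values that fall outside the 1.5*IQR fences."""
    iqr = q3 - q1                      # Step 2: interquartile range
    lower_fence = q1 - 1.5 * iqr       # Step 3: fences
    upper_fence = q3 + 1.5 * iqr
    # Step 4: anything below the lower fence or above the upper fence
    return [x for x in data if x < lower_fence or x > upper_fence]

# Hypothetical sorted sample whose quartiles are Q1 = 3 and Q3 = 11
sample = [1, 3, 5, 7, 9, 11, 40]
outliers = find_outliers(sample, q1=3, q3=11)
print(outliers)
```

Here the IQR is 8, so the fences are -9 and 23, and only 40 lies outside them.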

Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "49, 34, and 48 students are selected from the Sophomore, Junior, and Senior classes with 496, 348, and 481 students respectively"

Stratified

To determine her air quality, Carrie divides up her day into three parts, morning, afternoon, and evening. She then measures her air quality at 4 randomly selected times during each part of the day. What type of sampling is this?

Stratified

What is meant by confounding?

Confounding in a study occurs when the effects of TWO or MORE explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study.

Σ

Sum of all data values

A tax auditor selects every 1000th income tax return that is received. Identify which of these types of sampling is used Stratified Systematic Simple Random Cluster Convenience

Systematic

Identify the type of sampling used: random, systematic, convenience, stratified, or cluster. To estimate the percentage of defects in a recent manufacturing batch, a quality control manager at Toshiba selects every 20th laptop that comes off the assembly line starting with the second until she obtains a sample of 100 laptops.

Systematic

Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "A sample consists of every 49th student from a group of 496 students."

Systematic

Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "A tax auditor selects every 1000th income tax return that is received."

Systematic

μ

THE SYMBOL FOR THE POPULATION MEAN IS

DISJOINT

THEY HAVE NOTHING IN COMMON. FOR DISJOINT EVENTS, P(A OR B) = P(A) + P(B)

5. When you need to find the z-score that forms the boundary between 2 areas under the bell curve i.e. between top 20% & bottom 80% use:

The *Tail column* & find the proportion closest to the percentage e.g. the proportion closest to .2000; the z-score in that row is the z-score that forms that boundary.
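
As a cross-check on the table lookup, the same boundary can be computed directly. This sketch swaps the printed z-table for `statistics.NormalDist` from Python's standard library (available from Python 3.8), which is an assumption about tooling and not the table method the card describes.

```python
from statistics import NormalDist

# The boundary between the bottom 80% and the top 20% of a standard normal curve
# is the z-score with cumulative proportion 0.80 below it.
z = NormalDist().inv_cdf(0.80)
z_rounded = round(z, 2)  # z-scores are rounded to 2 decimal places

print(z_rounded)
```

The result, 0.84, is the z-score whose tail proportion in a typical table is closest to .2000.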

Interquartile range

The ... IQR, is the range of the middle 50% of the observations in a data set. That is, the IQR is the difference between the third and first quartiles and is found using the formula IQR = Q3 - Q1

kth percentile

The ... denoted, Pk , of a set of data is a value such that k percent of the observations are less than or equal to the value.

Q1 Q2 Q3

The 1st quartile, denoted Q1, divides the bottom 25% of the data from the top 75%. Therefore, the 1st quartile is equivalent to the 25th percentile. The 2nd quartile divides the bottom 50% of the data from the top 50% of the data, so that the 2nd quartile is equivalent to the 50th percentile, which is equivalent to the median. The 3rd quartile divides the bottom 75% of the data from the top 25% of the data, so that the 3rd quartile is equivalent to the 75th percentile.

S=RANGE/4

The Range Rule of Thumb roughly estimates the standard deviation of a data set as​

S=Range/4

The Range Rule of Thumb roughly estimates the standard deviation of a data set as​ _______

What measure of variation is very sensitive to extreme values?

The Range.

Suppose babies born after a gestation period of 32 to 35 weeks have a mean weight of 2500 grams and a standard deviation of 600 grams while babies born after a gestation period of 40 weeks have a mean weight of 2900 grams and a standard deviation of 390 grams. If a 35​-week gestation period baby weighs 2750 grams and a 41​-week gestation period baby weighs 3150 ​grams, find the corresponding​ z-scores. Which baby weighs more relative to the gestation​ period?

The baby born in week 41 weighs relatively more since its z-score, 0.64, is larger than the z-score of 0.42 for the baby born in week 35.
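
The two z-scores in this answer can be reproduced with the formula z = (value - mean) / standard deviation. The variable names below are illustrative assumptions.

```python
# Baby born at 35 weeks: group mean 2500 g, standard deviation 600 g
z_week_35 = (2750 - 2500) / 600

# Baby born at 41 weeks: group mean 2900 g, standard deviation 390 g
z_week_41 = (3150 - 2900) / 390

# Round to two decimal places, as is usual for z-scores
print(round(z_week_35, 2), round(z_week_41, 2))
```

Both babies are 250 grams above their group mean, but the smaller standard deviation in the second group makes that gap count for more.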

Whenever a data value is less than the mean,_____.

The corresponding z-score is negative.

State whether the data described below are discrete or continuous and explain why: The exact ages in hours of different cockroaches found in a certain city

The data are continuous because the data can take any value in an interval

State whether the data described below are discrete or continuous, and explain why: The temperatures (in degrees Fahrenheit) of pizzas fresh from the oven

The data are continuous because the data can take any value in an interval

State whether the data described below are discrete or continuous: The number of programs installed on various computers

The data are discrete because the data can only take on specific values

Determine whether the data described below are qualitative or quantitative and explain why: The types of climates for different regions (tropical, arid, temperate, etc.)

The data are qualitative because they don't measure or count anything

A community college faculty is negotiating a new contract with the school board. The distribution of faculty salaries is skewed right by several faculty members who make over​ $100,000 per year. If the faculty want to give the community the impression that they deserve higher​ salaries, should they advertise the mean or median of their current​ salaries?

The faculty should use the median to make their argument. The median will be lower than the mean since the mean is influenced by the few extremely high salaries.

Determine whether the given value is a statistic or a parameter: A homeowner measured the voltage supplied to his home on all 30 days of a given month, and the average (mean) value is 131.6 volts

The given value is a parameter for the month because the data collected represented a population

Explain the circumstances for which the interquartile range is the preferred measure of dispersion. What is an advantage that the standard deviation has over the interquartile​ range?

The interquartile range is preferred when the data are skewed or have outliers. An advantage of the standard deviation is that it uses all the observations in its computation.

Determine which of the 4 levels of measurement is the most appropriate for the data below: Years in which a war was started

The interval level of measurement is the most appropriate because the data can be ordered, differences can be found and are meaningful, and there is no natural starting point

Which of the following is NOT a property of the linear correlation coefficient r? -The value of r is always between -1 and 1 inclusive. -The value of r is not affected by the choice of x or y. -The value of r measures the strength of a linear relationship. -The linear correlation coefficient r is robust. That is, a single outlier will not affect the value of r.

The linear correlation coefficient is robust. That is, a single outlier will not affect the value of r.

Which of the following is NOT a characteristic of the mean? -The mean is relatively reliable. -The mean is called the average by statisticians. -The mean is sensitive to outliers. -The mean takes every data value into account.

The mean is called average by statisticians.

Which of the following is NOT a characteristic of the mean?

The mean is called the average by statisticians.

If each monthly cell phone bill in the country were​ doubled, how would the mean of the cell phone bills be​ affected?

The mean of the cell phone bills would double.

A histogram of a set of data indicates that the distribution of the data is skewed right. Which measure of central tendency will likely be larger, the mean or the median? Why?

The mean will be likely larger BECAUSE the extreme values in the right tail tend to pull up the mean in the direction of the tail

What is the mean of a set of data?

The measure of center found by adding the data values and dividing the total by the number of data values.

What is the Median?

The measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.

What is the Midrange of a data set?

The measure of center that is the value midway between the max and min values in the original data set.

How can you tell from a boxplot if the distribution is​ symmetric?

The median is in the center of the​ box, and the left and right whiskers are approximately the same length.

Which measure of center​ (mean or​ median) is​ resistant? Explain what it means for that measure to be resistant.

The median is resistant because it is not sensitive to extreme values in the data set. If the largest observation was​ doubled, for​ example, the median would not change since that largest value does not factor into its computation.

The median

The median of a variable is the value that lies in the middle of the data when arranged in ascending order. We use M to represent the median.

6. When you need to compute a raw score that represents the minimum or maximum score needed to answer a question, look for the percentage in the question, e.g. "What raw scores form the boundaries of the middle 60% of the distribution?":

The middle 60% straddles the mean and can be divided into 2 equal percentages: 30% and 30%. You look for the value closest to .3000 in the *mean to z column* and locate the z-score in that row. Then you use that z-score in the formula we use to compute a raw score: X = μ + zσ
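
A hedged sketch of the same procedure in Python. It replaces the table lookup with `statistics.NormalDist().inv_cdf` (an assumption about tooling), and the mean of 100 and standard deviation of 15 are example values chosen only for illustration. The function name `middle_bounds` is hypothetical.

```python
from statistics import NormalDist

def middle_bounds(mu, sigma, middle=0.60):
    """Raw scores that bound the middle `middle` share of a normal distribution."""
    tail = (1 - middle) / 2              # share left in each tail (0.20 here)
    z = NormalDist().inv_cdf(1 - tail)   # z with middle/2 (0.30) between the mean and z
    # X = mu + z*sigma, applied with -z and +z
    return mu - z * sigma, mu + z * sigma

low, high = middle_bounds(100, 15)
print(round(low, 1), round(high, 1))
```

The two boundaries are symmetric about the mean because the same z-score is used with opposite signs.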

A highly selective boarding school will only admit students who place at least 2 standard deviations above the mean on a standardized test that has a mean of 200 and a standard deviation of 24. What is the minimum score that an applicant must make on the test to be​ accepted?

The minimum score that an applicant must make on the test to be accepted is 248

mode

The mode of a variable is the most frequent observation of the variable that occurs in the data set. *If no observation occurs more than once, then there is NO MODE

Determine which of the four levels of measurement​ (nominal, ordinal,​ interval, ratio) is most appropriate for the data below. Social security numbers

The nominal level of measurement is most appropriate because the data cannot be ordered.

Standardized Score

The number of standard deviations that a piece of data lies above or below the mean. Z = (X - μ) / σ

What does P(B|A) represent?

The probability of event B occurring after it is assumed that event A has already occurred.

Which is relatively better: a score of 58 on a psychology test or a score of 49 on an economics test? Scores on the psychology test have a mean of 85 and a standard deviation of 10. Scores on the economics test have a mean of 58 and a standard deviation of 3.

The psychology test score is relatively better because its z score is greater than the z score for the economics test score.

What makes the range less desirable than the standard deviation as a measure of​ dispersion?

The range does not use all the observations.

Interquartile range (IQR)

The range of the middle 50% of the observations in a data set. The difference between the upper quartile and the lower quartile. IQR = Q3 - Q1 Interpretation of the interquartile range is similar to that of the range and standard deviation. That is, the more spread a set of data has, the higher the interquartile range will be.

Determine which of the levels of measurement is most appropriate for the data below: Brain volumes measured in cubic cm

The ratio level of measurement is the most appropriate because the data can be ordered, differences can be found and are meaningful, and there is a natural starting point

Determine which of the four levels of measurement​ (nominal, ordinal,​ interval, ratio) is most appropriate for the data below. Length of the side of a square in cm

The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is a natural starting zero point.

VARIANCE

The square of the standard deviation is called the​ __

The standard deviation is used in conjunction with the _____ to numerically describe distributions that are bell shaped. The ____ measures the center of the​ distribution, while the standard deviation measures the ____ of the distribution.

The standard deviation is used in conjunction with the MEAN to numerically describe distributions that are bell shaped. The MEAN measures the center of the​ distribution, while the standard deviation measures the SPREAD of the distribution.

Determine whether the description below corresponds to an observational study or an experiment. In a study sponsored by a company, 11,079 people were asked what contributes most to their anxiety, and 37% of the respondents said that it was their health.

The study is an observational study because the survey subjects were not given any treatment.

What does Σx represent?

The sum of all data values. (All frequencies added together)

Which of the following is not a requirement of the binomial probability distribution? a. Each trial must have all outcomes classified into two categories. b. The trials must be dependent. c. The procedure has a fixed number of trials. d. The probability of a success remains the same in all trials.

The trials must be dependent. (For a binomial distribution, the trials must be independent.)

properties of standard deviation

The units of the standard deviation are the same as the units of the original data, the standard deviation is a measure of variation of all data values from the mean, the value of the standard deviation is never negative

If your score on your next statistics test is converted to a z​ score, which of these z scores would you​ prefer: −​2.00, −​1.00, ​0, 1.00,​ 2.00? Why?

The z score of 2.00 is most preferable because it is 2.00 standard deviations above the mean and would correspond to the highest of the five different possible test scores.

unitless

The z-score is ... It has mean 0 and standard deviation 1.

If​ someone's gross annual income has a​ z-score of positive​ 2, what can be​ concluded?

Their income is 2 standard deviations above the mean income.

Which of the following is NOT true about statistical graphs? a. Similar graphs can be constructed in order to compare data sets. b. They utilize areas or volumes for data that are one-dimensional in nature. c. They can be used to consider the overall shape of the distribution. d. They can be used to identify extreme data values.

They utilize areas or volumes for data that are one-dimensional in nature. (Utilizing 2- or 3-dimensional pictures to represent 1-dimensional data is poor practice and distorts the data.)

Quartiles

This divides data sets into fourths, or four equal parts.

A company was conducting a survey to investigate​ people's spending habits and how they may have changed in recent years. One question on the survey​ was, "Did you spend​ more/less/the same amount of money this year as you did in​ 2007, the year the recession began in earnest in this​ country?" Is this question​ biased? If​ so, what answer does it​ favor?

This question is biased toward​ "spend less," since it mentions the recent recession. Many people would feel that they should answer that they spent​ less, since the country is in a recession.

Which of the following is NOT a property of the standard deviation? a. When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations. b. The value of the standard deviation is never negative. c. The standard deviation is a measure of variation of all data values from the mean. d. The units of the standard deviation are the same as the units of the original data.

When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.

With a height of 70 ​in, Roger was the shortest president of a particular club in the past century. The club presidents of the past century have a mean height of 75.1 in and a standard deviation of 2.4 in. a. What is the positive difference between Roger​'s height and the​ mean? b. How many standard deviations is that​ [the difference found in part​ (a)]? c. Convert Roger​'s height to a z score. d. If we consider​ "usual" heights to be those that convert to z scores between −2 and​ 2, is Roger​'s height usual or​ unusual?

a. To find the positive difference between Roger's height and the mean, subtract the mean from Roger's height and find the absolute value of the difference: |70 in - 75.1 in| = 5.1 in. b. To determine how many standard deviations the difference is, divide the difference, 5.1, by the standard deviation, 2.4: 5.1 / 2.4 ≈ 2.13 standard deviations. c. A z score is the number of standard deviations that a given value x is above or below the mean. It is found using one of the following expressions: for a sample, z = (x - x̄) / s; for a population, z = (x - μ) / σ. The club is a population. Therefore, Roger's height converts to a z score of (70 - 75.1) / 2.4 ≈ -2.13. d. Because -2.13 is less than -2, Roger's height is unusual.

True or​ False: Chebyshev's inequality applies to all distributions regardless of​ shape, but the empirical rule holds only for distributions that are bell shaped

True, Chebyshev's inequality is less precise than the empirical​ rule, but will work for any​ distribution, while the empirical rule only works for​ bell-shaped distributions
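
To make "less precise" concrete, the sketch below compares the Chebyshev lower bound, 1 - 1/k² for k > 1, with the usual empirical-rule percentages of about 68%, 95%, and 99.7% within 1, 2, and 3 standard deviations. The function name `chebyshev_lower_bound` and the `EMPIRICAL` dictionary are hypothetical names introduced here.

```python
def chebyshev_lower_bound(k):
    """Minimum share of data within k standard deviations of the mean, any distribution."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is informative only for k > 1")
    return 1 - 1 / k**2

# Approximate empirical-rule shares for bell-shaped data only
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 3), EMPIRICAL[k])
```

For k = 2, Chebyshev guarantees at least 75% for any distribution, while the empirical rule gives about 95% when the data are bell-shaped.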

True or​ False: When comparing two​ populations, the larger the standard​ deviation, the more dispersion the distribution​ has, provided that the variable of interest from the two populations has the same unit of measure.

True, because the standard deviation describes how​ far, on​ average, each observation is from the typical value. A larger standard deviation means that observations are more distant from the typical​ value, and​ therefore, more dispersed.

Σ is called ___ and means ___.

Uppercase sigma, and means the "sum of terms [xi]"

When making predictions based on regression lines, which of the following is not listed as a consideration? -Use the regression equation for predictions only if the graph of the regression line on the scatterplot confirms that the regression line fits the points reasonably well. -Use the regression equation for predictions only if the linear correlation coefficient r indicates that there is a linear correlation between the two variables. -Use the regression line for predictions only if the data go far beyond the scope of the available sample data. -If the regression equation does not appear to be useful for making predictions, the best predicted value of a variable is its point estimate.

Use the regression line for prediction only if the data go far beyond the scope of the available sample data

Which characteristic of data is a measure of the amount that the data values vary?

Variation

COMPLEMENT RULE

WHEN EVENTS DON'T OCCUR USE P(NOT A) = 1 - P(A)
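A minimal sketch of the complement rule, P(not A) = 1 − P(A), applied to an "at least one" question. The coin-flip example and the function name are assumptions chosen for illustration.

```python
def complement(p):
    """Probability that an event does NOT occur, given that P(A) = p."""
    if not 0 <= p <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    return 1 - p

# "At least one head" in three fair, independent coin flips is the
# complement of "no heads at all".
p_no_heads = 0.5 ** 3
p_at_least_one_head = complement(p_no_heads)
print(p_at_least_one_head)
```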

CONTINUOUS DATA

WOULD BE ON A THERMOMETER.

Which of the following statements about correlation is true? -We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values increase. -We say that there is a positive correlation between x and y if there is no distinct pattern in the scatterplot. -We say that there is a negative correlation between x and y if the x-values increase as the corresponding y-values increase. -We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values decrease.

We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values increase.

[St.Dev. 1]+[St.Dv. 2] = [34.7 + 13.5] = 48.2% probability

What is the probability that a randomly selected time falls between 40 and 42 seconds?

Z-score

When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the​ mean, we call the new value a​ _______.

What is a designed experiment?

When a researcher assigns individuals to a certain group, intentionally changing the value of an explanatory variable, and then records the value of the response variable for each group

Which of the following is NOT a property of the standard deviation? -The value of the standard deviation is never negative. -The standard deviation is a measure of variation of all data values from the mean. -When comparing variation in samples with different means, it is good practice to compare the two sample standard deviations. -The units of the standard deviation are the same as the units of the original data.

When comparing variation in samples with different means, it is good practice to compare the two sample standard deviations.

If the data set is symmetric or approximately symmetric, and no outliers, then

best measure of center: mean best measure of dispersion: standard deviation

The ________ measures the strength of the linear correlation between the paired quantitative x- and y-values in a sample.

linear correlation coefficient r
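As a rough illustration of what r measures, the sketch below computes the Pearson linear correlation coefficient directly from its definition. The function name and the sample pairs are hypothetical.

```python
from math import sqrt

def linear_correlation(xs, ys):
    """Pearson linear correlation coefficient r for paired sample data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(linear_correlation(xs, [2, 4, 6, 8, 10]))   # y rises as x rises
print(linear_correlation(xs, [10, 8, 6, 4, 2]))   # y falls as x rises
```

Values of r near 1 or −1 indicate a strong linear correlation, and values near 0 indicate a weak one. The sketch assumes neither list is constant, since that would make the denominator zero.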

A histogram of a set of data indicates that the distribution of the data is skewed right. Which measure of central tendency will likely be​ larger, the mean or the​ median? Why?

The mean will likely be larger. When data are either skewed left or skewed right, there are extreme values in the tail, which tend to pull the mean in the direction of the tail. If the distribution of the data is skewed right, there are large observations in the right tail. These observations tend to increase the value of the mean, while having little effect on the median.
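The pull of the right tail on the mean can be seen with a small made-up data set. The values below are hypothetical and serve only to illustrate the effect.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are moderate, one is very large.
home_prices = [150, 160, 170, 180, 190, 200, 950]   # thousands of dollars

# The single large value drags the mean upward but leaves the median alone.
print(mean(home_prices), median(home_prices))
```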

A negative​ z-score indicates a data value is less than the mean.

Whenever a data value is less than the​ mean,

RANGE

Which measure of variation is very sensitive to extreme​ values?

The mean is called the average by statisticians

Which of the following is NOT a characteristic of the​ mean?

When comparing variation in samples with very different​ means, it is good practice to compare the two sample standard deviations.

Which of the following is NOT a property of the standard​ deviation?

MEAN

Which of the following is NOT a value in the​ 5-number summary?

B. In a symmetric and​ bell-shaped distribution, the​ mean, median, and mode are the same.

Which of the following is always true? A. For skewed data, the mode is farther out in the longer tail than the median. B. In a symmetric and bell-shaped distribution, the mean, median, and mode are the same. C. The mean and median should be used to identify the shape of the distribution. D. Data skewed to the right have a longer left tail than right tail.

What is the difference between a random sample and a simple random​ sample?

With a random​ sample, each individual has the same chance of being selected. With a simple random​ sample, all samples of the same size have the same chance of being selected.

What is the formula to determine the x-value from z-score?

x = μ + zσ (the mean plus z multiplied by the standard deviation)
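A minimal sketch of the conversion from a z-score back to a raw value, x = μ + zσ. The function name and the example numbers (mean 100, standard deviation 15) are illustrative assumptions.

```python
def x_from_z(z, mean, std_dev):
    """Convert a z-score back to a raw value: x = mean + z * std_dev."""
    return mean + z * std_dev

# Hypothetical example: scores with mean 100 and standard deviation 15.
print(x_from_z(2, 100, 15))      # two standard deviations above the mean
print(x_from_z(-1.5, 100, 15))   # one and a half standard deviations below
```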

In a symmetric and bell-shaped distribution, are the mean, median and mode the same?

Yes.

An education expert is researching teaching methods and wishes to interview teachers from a particular school district. She randomly selects ten schools from the district and interviews all of the teachers at the selected schools. Does this sampling plan result in a random sample? Simple random sample? Explain

Yes; no. The sample is random because all teachers have the same chance of being selected. It is not a simple random sample because some samples are not possible, such as a sample that includes teachers from schools that were not selected.

Suppose a student earns a 75 on his statistics​ exam, and his grade has a​ z-score of 1.5. Since the class did not perform well on the​ exam, the professor announces that she will adjust the grades by adding 10 points to each score. How will this adjustment change the​ student's z-score?

Your​ z-score will not change since the adjustment shifts the entire distribution of scores but does not change the relative position of your score in the class.
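The sketch below checks this numerically. The class scores are hypothetical (the original problem does not list them), so the resulting z-score is not 1.5; the point is only that it is the same before and after the 10-point adjustment.

```python
from statistics import mean, pstdev

def z_score(x, data):
    """z-score of x relative to data, treating data as the whole population."""
    return (x - mean(data)) / pstdev(data)

# Hypothetical class scores; the student's score of 75 is among them.
scores = [50, 55, 60, 65, 75]
adjusted = [s + 10 for s in scores]   # the professor adds 10 points to everyone

before = z_score(75, scores)
after = z_score(85, adjusted)         # the student's score also rises by 10
print(before, after)
```

Adding a constant raises the mean by that constant and leaves the standard deviation unchanged, so the numerator and denominator of the z-score are both unaffected.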

99.7% within 3 standard deviations

(99.7% − 95%) / 2 = 4.7% / 2 = 2.35% in each band between 2 and 3 standard deviations from the mean, one on each side

Z-scores are turned into

a standard score. The purpose of z-scores is to identify and describe the exact location of each score in a distribution & to standardize an entire distribution to understand & compare scores from different tests.

Arithmetic mean

adding all the data values and dividing by the number of values

xi means

all x values

To describe the exact position of a score within a distribution, z-score must transform each x-value into a signed number; positive or negative.

all z-scores above the mean are positive and all z-scores below the mean are negative. The number tells the distance between the score and the mean in terms of the number of standard deviations.

Variables

are the characteristics of the individuals within the population

In a probability histogram, there is a correspondence between ___.

area and probability.

Why is range not a good measure?

because it tells you only how wide the data are, not whether the values are clustered or dispersed within that span, nor how many values (n or N) there are

The U.S. Department of Housing and Urban Development(HUD) uses the median to report the average price of a home in the United States. Why do you think HUD uses the median?

because the data are skewed right

A ________ is the collection of data from every member of the population. -sample -census -placebo -statistic

census

Which of the following is NOT a measure of center?

census

Which of the following is NOT a measure of center? -census -mean -median -mode

census

____ is the difference between two consecutive lower class limits or two consecutive lower class boundaries in a frequency distribution.

class width

Sample arithmetic mean, x̄ (pronounced "x-bar")

computed using sample data; the sample mean is a statistic

A ___ probability of an event is a probability obtained with knowledge that some other event has already occurred.

conditional (knowledge)

Descriptive statistics

consists of organizing and summarizing information

2. When you need to find a proportion between 2 positive OR 2 negative z-scores, you:

consult the *mean to z column* for both. Find proportions & subtract the smaller from the larger.

A ___ random variable has infinitely many values associated with measurements.

continuous

A _______ exists between two variables when the values of one variable are somehow associated with the values of the other variable.

correlation

kth percentile

denoted Pk, of a set of data is a value such that k percent of the observations are less than or equal to the value. Percentiles divide a set of data that is written in ascending order into 100 parts; thus 99 percentiles can be determined. Ex. P1 divides the bottom 1% of the observations from the top 99%, P2 divides the bottom 2% of the observations from the top 98% and so on.

Methods used that summarize or describe characteristics of data are called ___ statistics.

descriptive

A _________ experiment allows the researcher to claim causation between an explanatory variable and a response variable

designed

Range is the

difference between the largest data value and the smallest

A ___ random variable has either a finite or a countable number of values.

discrete

Events that are ____ cannot occur at the same time.

disjoint (Disjoint events are mutually exclusive and cannot occur at the same time.)

If every x value is transformed into a z-score, then the distribution of z-scores will have what following properties regarding shape, mean, and standard deviation?

distribution of z-scores will have exactly the same shape as original distribution of scores; z-score mean will always have mean of 0 & z-scores will always have standard deviation of 1.
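These properties can be verified on a small data set. The raw scores below are hypothetical and are treated as a complete population, so the population standard deviation is used.

```python
from statistics import mean, pstdev

# Hypothetical raw scores, treated as a complete population.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
mu = mean(raw)
sigma = pstdev(raw)

# Standardize every score: z = (x - mu) / sigma.
z_scores = [(x - mu) / sigma for x in raw]

# The z-scores keep the original ordering and relative spacing of the data,
# but their mean is 0 and their standard deviation is 1.
print(z_scores)
print(mean(z_scores), pstdev(z_scores))
```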

Find the sample variance and standard deviation. 23​, 11​, 5​, 9​, 10

Use a calculator: s² = 45.8 and s ≈ 6.8.
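As an alternative to a calculator, the same quantities can be computed with Python's standard `statistics` module. This is a sketch for checking the work, not the method the course prescribes.

```python
from statistics import mean, stdev, variance

data = [23, 11, 5, 9, 10]

sample_mean = mean(data)           # sum of the values divided by n
sample_variance = variance(data)   # divides by n - 1, the sample formula
sample_std_dev = stdev(data)       # square root of the sample variance

print(sample_mean, round(sample_variance, 1), round(sample_std_dev, 1))
```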

Response bias

exists when the answers on a survey do not reflect the true feelings of the respondent

Nonresponse bias

exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do

The ___ of a discrete random variable represents the mean value of the outcomes.

expected value
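A minimal sketch of the calculation: the expected value is the sum of each outcome multiplied by its probability. The function name and the coin-flip distribution are illustrative assumptions.

```python
def expected_value(distribution):
    """Mean of a discrete random variable given {value: probability} pairs."""
    total_probability = sum(distribution.values())
    if abs(total_probability - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(value * prob for value, prob in distribution.items())

# Hypothetical example: number of heads in two fair coin flips.
heads_in_two_flips = {0: 0.25, 1: 0.5, 2: 0.25}
print(expected_value(heads_in_two_flips))
```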

In a television​ advertisement, a company called​ "Waist Away" claimed the workout program on their set of DVDs would help people lose weight more than any other DVD workout program. To test this​ claim, an independent​ company, called​ "Slim Down," selected one other DVD program. They then randomly assigned half the volunteers to the Waist Away program and the other half to the Slim Down program. Each participant was weighed before they started the program and then regularly participated in their assigned program for one month. After one​ month, each participant was weighed again. The percent of weight lost was recorded for each​ person, where negative values indicated a weight gain. What type of study was​ performed?

experiment

numerical summary of data is said to be resistant if...

extreme values (very large or small) relative to the data do not affect its value substantially

How to find the median when N or n is odd

put the data in order and find the middle value

The heights of the bars of a histogram correspond to ___ values.

frequency

A ____ indicates the shape and nature of the distribution of a data set.

frequency distribution

Box-and-Whiskers Plot

graph representing information about the five-number summary and outliers for a given data set

Two events A and B are ___ if the occurrence of one does not affect the probability of the occurrence of the other.

independent

Biased samples

internet polls, in which people online can decide whether to respond; mail-in polls, in which subjects can decide whether to reply; telephone call-in polls, in which newspaper, radio, or television announcements ask that you voluntarily call a special number to register your opinion

A parameter

is a numerical summary of a population

A statistic

is a numerical summary of a sample

Population arithmetic mean, μ (pronounced "mew")

is computed using all the individuals in a population. The population mean is a parameter.

Cluster sample

is obtained by dividing the population into groups and selecting all individuals from within a random sample of the groups

Stratified sample

is obtained by dividing the population into homogeneous groups and randomly selecting individuals from each group

Resistant means

that a measure of central tendency is not substantially altered by extreme values

percentile

provides information about how the data are spread over the interval from the smallest value to the largest value. (Recall that the median divides the lower 50% of a set of data from the upper 50%. The median is a special case of a general concept called the percentile.)

When finding the median of a set of data, what should you always do first?

put the data in order! The median will be wrong otherwise

Mode is primarily a measure of

qualitative central tendency

mode can be used for both

quantitative and qualitative

A ___ variable is a variable that has a single numerical value, determined by chance, for each outcome of a procedure.

random

What measure of variation is very sensitive to extreme values?

range

A ___ histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.

relative frequency

In a ___ distribution, the frequency of a class is replaced with a proportion or percent.

relative frequency

In a​ boxplot, if the median is to the left of the center of the box and the right whisker is substantially longer than the left​ whisker, the distribution is skewed_______

right

the 68-95-99.7% rule applies for

roughly all bell-shaped curves

The symbol for sample standard deviation is

s

What is the symbol for sample standard deviation?

s

The Range Rule of Thumb roughly estimates the standard deviation of a data set as _______.

s = range / 4
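The sketch below applies the range rule of thumb to a hypothetical sample and prints the computed sample standard deviation next to it for comparison. The estimate is rough by design, so the two numbers are not expected to match.

```python
from statistics import stdev

def range_rule_estimate(data):
    """Rough estimate of the standard deviation: (max - min) / 4."""
    return (max(data) - min(data)) / 4

# Hypothetical sample used only to compare the estimate with the computed value.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(range_rule_estimate(sample), stdev(sample))
```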

The Range Rule of Thumb roughly estimates the st. dev. of a data set as ___.

s = range/4

The symbol for sample variance is

s^2

n means

sample size

"x-bar" means

sample mean

The ___ for a procedure consists of all possible simple events or all outcomes that cannot be broken down further.

sample space

s

sample standard deviation

What is s^2 the symbol for?

sample variance

When determining whether there is a correlation between two variables, one should use a ______ to explore the data visually.

scatter-plot

A ___ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.

scatterplot

Deviation score

score minus the mean = how much the score deviates from the mean.

A histogram aids in analyzing the ___ of the data.

shape of the distribution

The symbol for population standard deviation is

σ (lowercase sigma)

The symbol for population variance is

σ^2 (lowercase sigma squared)

z-score transformation

statistical technique that uses the mean and standard deviation to transform each raw score into a standard score

LOOK AT REAL LINE WITH LOF, LIF, UIF, UOF NUMBERS, AND OUTLIERS IN SLIDE 87!!!

study for midterm

Class width is found by ___.

subtracting a lower class limit from the next consecutive lower class limit

x̄ (x with a line above) = Σx / n

sum of all data values divided by the number of data values

(Σxi)/N means

the sum of all x values divided by N, the population size; this is the population mean

How to find the median when N or n is even

put the data in order and take the mean of the middle 2 values
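Taking the middle value, or the mean of the two middle values, is the procedure for the median. A minimal sketch covering both the odd and the even case follows; the function is written out by hand for illustration, although Python's `statistics.median` does the same job.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)          # the data must be put in order first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]        # odd n: the single middle value
    return (ordered[middle - 1] + ordered[middle]) / 2   # even n

print(median([23, 11, 5, 9, 10]))   # odd n
print(median([23, 11, 5, 9]))       # even n
```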

The larger the standard deviation means...

that observations are more distant from the typical value, and therefore more dispersed

Sampling bias means

that the technique used to obtain the sample's individuals tends to favor one part of the population over another

4. When you need to find the P for an area *greater than* a negative Z or *Less than* a positive Z use:

the *Body column*. Because the body column includes the mean & the tail.

For data sets having a distribution that is approximately bell-shaped, ______ states that about 68% of all data values fall within one standard deviation from the mean.

the Empirical Rule

