First 20
A manufacturer of bolts has a quality-control policy that requires it to destroy any bolts that are more than 2 standard deviations from the mean. The quality-control engineer knows that the bolts coming off the assembly line have mean length of 8 cm with a standard deviation of 0.05 cm. For what lengths will a bolt be destroyed?
A bolt will be destroyed if the length is less than 7.9 cm or greater than 8.1 cm.
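The bolt answer is just mean ± 2 standard deviations. A minimal Python sketch of that cut-off check, assuming the mean and standard deviation are known exactly; the helper names `destruction_limits` and `is_destroyed` are hypothetical.

```python
def destruction_limits(mean: float, sd: float, k: float = 2.0) -> tuple[float, float]:
    """Return the cut-offs (mean - k*sd, mean + k*sd)."""
    return mean - k * sd, mean + k * sd


def is_destroyed(length: float, mean: float, sd: float, k: float = 2.0) -> bool:
    """True when a length lies more than k standard deviations from the mean."""
    lower, upper = destruction_limits(mean, sd, k)
    return length < lower or length > upper


# Bolt example: mean 8 cm, standard deviation 0.05 cm.
lower, upper = destruction_limits(8.0, 0.05)
print(f"Destroy bolts shorter than {lower:.2f} cm or longer than {upper:.2f} cm")
print(is_destroyed(7.85, 8.0, 0.05), is_destroyed(8.02, 8.0, 0.05))
```

The same two-standard-deviation cut-off is what the "unusual" z-score cards below describe.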
UNUSUAL
A data value is considered __________ if its z-score is less than -2 or greater than 2.
The complement of "at least one" is ___.
"none."
"At least one" is equivalent to ____.
"one or more"
Scores of an IQ test have a bell-shaped distribution with a mean of 100 and a standard deviation of 15. Use the empirical rule to determine the following. (a) What percentage of people has an IQ score between 85 and 115? (b) What percentage of people has an IQ score less than 55 or greater than 145? (c) What percentage of people has an IQ score greater than 130?
(a) 68% (b) 0.3% (c) 2.5%
Determine whether the given value is from a discrete or continuous data set. When a car is randomly selected, it is found to have an engine with 6 cylinders.
A discrete data set, because there is a finite number of possible values.
What is an ogive?
A graph that represents the cumulative frequency or cumulative relative frequency for each class.
What is a value at the center or middle of a data set?
A measure of center.
A particular country has 60 states. If the areas of all 60 states are added and the sum is divided by 60, the result is 193,950 square kilometers. Determine whether this result is a statistic or a parameter.
The result is a parameter, because it describes a characteristic of a population (all 60 states).
Determine whether the sample described below is a simple random sample. In the last year, 123,423 adults got married in a county. A researcher plans to conduct a survey of 800 of those newlyweds. After obtaining a list of those who got married, he numbers the list from 1 to 123,423, and then he uses a computer to randomly generate 800 numbers between 1 and 123,423. His sample consists of the newlyweds corresponding to the selected numbers.
The sample is a simple random sample because every sample of size 800 has the same chance of being selected.
Determine whether the sample described below is a simple random sample. In order to test for a difference in the way that workers and non-workers purchase magazines, a research institution polls exactly 638 adult workers and 638 adult non-workers randomly selected from adults in the United States.
The sample is not a simple random sample because not every sample of size 1276 has the same chance of being selected.
Determine whether the sample described below is a simple random sample. A quality control engineer selects every 5000th hairdryer that is produced.
The sample is not a simple random sample because not every sample of the same size has the same chance of being selected.
What does n denote?
The sample size, which is the number of data values.
Statistics
The science of collecting, organizing, summarizing, and analyzing information to draw conclusions and answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.
What is the formula for the z-score?
z = (x - μ) / σ: the data value x minus the mean μ (mu), divided by the standard deviation σ (sigma). The numerator x - μ is a *deviation score*; the denominator expresses that deviation in standard deviation units.
The _______ represents the number of standard deviations an observation is from the mean.
z-score
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a_____.
z-score
The sum of the deviations about the mean always equals
zero
Σxi
The sum of all x values
Find the population mean or sample mean as indicated. Population: 2, 1, 11, 15, 6
μ = 35/5 = 7
What is the symbol used to represent the population mean?
μ
What is the formula to find the mean of all values in a population?
μ = Σx / N
What is the symbol for population standard deviation?
σ
What is the symbol for population variance?
σ²
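Putting the population formulas together, here is a minimal sketch that computes μ, σ², and σ for the population 2, 1, 11, 15, 6 from the card above, dividing by N because the values are treated as an entire population. It also shows that the deviations x - μ sum to zero. The function name is illustrative only.

```python
import math


def population_summary(values: list[float]) -> tuple[float, float, float]:
    """Return (mu, variance, sd), treating `values` as the whole population."""
    N = len(values)
    mu = sum(values) / N                            # μ = Σx / N
    deviations = [x - mu for x in values]           # deviation scores x - μ
    variance = sum(d * d for d in deviations) / N   # σ² = Σ(x - μ)² / N
    return mu, variance, math.sqrt(variance)


population = [2, 1, 11, 15, 6]
mu, var, sigma = population_summary(population)
print(mu, var, round(sigma, 3))
print(sum(x - mu for x in population))              # deviations about the mean sum to 0
```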
Suppose the list below shows how many text messages Elyse sent each day for the last 10 days. If Elyse wants to know how many text messages she typically sends each day, which measure of central tendency better describes the typical number of text messages per day?
Median; The median of 27.5 is a better representative of the center since it is resistant to the one extreme value. The mean of 33.3 is not representative of the typical number of texts since only one number is larger than the mean.
In the data below, the x-values are the weights (in pounds) of cars and the y-values are the corresponding highway fuel consumption amounts (in mi/gal). Weight (lb): 4088, 3358, 4133, 3650, 3545. Highway fuel consumption (mi/gal): 26, 31, 29, 29, 30. Comment on the source of the data if you are told that car manufacturers supplied the values. Is there an incentive for car manufacturers to report values that are not accurate?
Yes, because consumers, in general, would prefer to buy a car with a higher level of fuel efficiency. In this case, the source of the data would be suspect with a potential for bias.
On a test, 74% of the questions are answered correctly. If 111 questions are correct, how many questions are on the test? 37 questions 67 questions 150 questions 82 questions
150 questions
95% of values in a normal distribution fall within
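2 standard deviations of the mean. From 1 to 2 standard deviations on each side lies (95% - 68%)/2 = 13.5%, so the pattern from -2 to +2 standard deviations is 13.5% | 34% | 34% | 13.5%.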
A company advertises a mean lifespan of 1000 hours for a particular type of light bulb. If you were in charge of quality control at the factory, would you prefer that the standard deviation of the lifespans for the light bulbs be 5 hours or 50 hours? Why?
5 hours would be preferable since a smaller standard deviation indicates more consistency.
What percentage of observations are expected to lie within 1 standard deviation of the mean?
68% (34% on each side of the mean)
The empirical rule is also known as the
68%-95%-99.7% rule
For data sets having a distribution that is approx. bell-shaped, ___________ states that about 68% of all data values fall within one standard deviation from the mean.
the Empirical Rule
Percentile
The kth percentile, denoted Pk, of a set of data is a value such that k percent of the observations are less than or equal to it. It is like class rank, but percentiles count from low to high: at the 5th percentile, 5% of the population has this value or less; at the 95th percentile, 95% of individuals have this value or less (only the top 5% are higher).
If the data set is skewed (left or right), and/or there are outliers, then
the best measure of center is the median, and the best measure of dispersion is IQR/2 = (Q3 - Q1)/2.
The mean measures..
the center of distribution
Whenever a data value is less than the mean, ______.
the corresponding z-score is negative
The Empirical Rule
The empirical rule can be used to determine the percentage of data that lie within k = 1, 2, or 3 standard deviations of the mean. To help organize the empirical rule and make the analysis easier, draw a bell-shaped curve, as shown to the right. The line in the center of the curve represents the mean; the other lines are 1, 2, and 3 standard deviations away from the mean on each side.
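A minimal Python sketch of the empirical rule, using the IQ example (mean 100, standard deviation 15). It lists the intervals mean ± k·SD for k = 1, 2, 3 with the rule's approximate percentages and, for comparison, the exact normal-curve share computed from `math.erf`. The function names are illustrative, not from the course materials.

```python
import math

# Approximate shares quoted by the empirical rule for k = 1, 2, 3 standard deviations.
EMPIRICAL_SHARES = {1: 0.68, 2: 0.95, 3: 0.997}


def empirical_intervals(mean: float, sd: float) -> dict[int, tuple[float, float, float]]:
    """For k = 1, 2, 3 return (mean - k*sd, mean + k*sd, approximate share inside)."""
    return {k: (mean - k * sd, mean + k * sd, p) for k, p in EMPIRICAL_SHARES.items()}


def normal_share_within(k: float) -> float:
    """Exact normal-curve share within k SDs of the mean: P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k, (low, high, share) in empirical_intervals(100, 15).items():
    print(f"k={k}: {low:g} to {high:g} holds about {share:.1%} "
          f"(exact normal: {normal_share_within(k):.2%})")

# Half of what lies outside ±2 SD is above mean + 2 SD (IQ above 130).
print(f"Above 130: about {(1 - EMPIRICAL_SHARES[2]) / 2:.1%}")
```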
the higher the standard deviation
the more spaced out and dispersed the bell shape.
What does the z-score number represent?
the number of standard deviations from the mean. Aka standardized scores.
In the binomial probability formula, the variable x represents the ___.
the number of successes
P (A or B) indicates ____.
the probability that in a single trial, event A occurs, event B occurs, or they both occur.
s²
the sample variance symbol is
What is the square of the standard deviation called?
the variance (s²)
What is the purpose of z-scores?
to describe the exact location of each score in a distribution. The formula z = (x - μ) / σ refers to a population; for a sample, use z = (x - x̄) / s.
Standard deviation allows you
to see how spread out or concentrated the data in a bell curve are. You should be able to pick which graphs go with which μ (or x̄) and σ.
The bars in a histogram ___.
touch (without gaps)
A data value is considered ___ if its z-score is less than -2 or greater than 2.
unusual
Inferential statistics
uses methods that generalize results obtained from a sample to the population and measure the reliability of the results
Median, and its symbol
The value that lies in the middle of the data when arranged in ascending order. Its symbol is M.
Mode
The most frequent observation of a variable. A data set can have no mode, a single mode, two modes (bimodal), or more than two modes (multimodal).
The square of the standard deviation is called the _______.
variance
when to use mode for best measure of central tendency
when data is nominal or ordinal
how to tell which histogram has the highest standard deviation
Whichever histogram is more spread out
How do you calculate Mean from a frequency distribution?
x̄ = Σ (f * x) / Σf
What is the formula to find a weighted mean?
x̄ = Σ(w*x) / Σw
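Both formulas are the same computation with different weights: class frequencies f in one case, assigned weights w in the other. A minimal sketch; the frequency table and the grade/credit-hour numbers below are hypothetical, chosen only to show the arithmetic.

```python
def frequency_mean(midpoints: list[float], frequencies: list[int]) -> float:
    """x̄ = Σ(f · x) / Σf, where each x is a class midpoint."""
    return sum(f * x for x, f in zip(midpoints, frequencies)) / sum(frequencies)


def weighted_mean(values: list[float], weights: list[float]) -> float:
    """x̄ = Σ(w · x) / Σw."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical frequency table: classes 0-9, 10-19, 20-29 (midpoints 4.5, 14.5, 24.5).
print(frequency_mean([4.5, 14.5, 24.5], [3, 5, 2]))
# Hypothetical grades (A = 4, B = 3, C = 2) weighted by credit hours 3, 4, 2.
print(weighted_mean([4, 3, 2], [3, 4, 2]))
```

Because the frequency version uses midpoints instead of the raw values, it gives an approximation of the true mean whenever the raw data are unavailable.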
What is the formula to find the mean of a set of sample values?
x̄ = Σx / n
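For comparison with the population version, a minimal sketch of x̄ and the sample standard deviation s, using the sample 22, 18, 6, 13, 6 from a later card. Dividing by n - 1 for s² is the standard convention for a sample variance; the function name is illustrative.

```python
import math


def sample_summary(values: list[float]) -> tuple[float, float, float]:
    """Return (x̄, s², s), treating `values` as a sample from a larger population."""
    n = len(values)
    x_bar = sum(values) / n                                # x̄ = Σx / n
    s2 = sum((x - x_bar) ** 2 for x in values) / (n - 1)   # s² divides by n - 1
    return x_bar, s2, math.sqrt(s2)


sample = [22, 18, 6, 13, 6]
x_bar, s2, s = sample_summary(sample)
print(x_bar, s2, round(s, 2))
```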
sample z-score
z = (x - x̄) / s
population z-score
z = (x - µ) / σ
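The two z-score formulas differ only in which mean and standard deviation you plug in. A minimal sketch, with hypothetical helper names, that computes z and applies the "unusual" cut-off (z less than -2 or greater than 2) from the cards above, using IQ scores (mean 100, SD 15):

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """z = (x - mean) / sd. Pass (μ, σ) for a population or (x̄, s) for a sample."""
    return (x - mean) / sd


def classify(z: float) -> str:
    """'unusual' when z < -2 or z > 2, otherwise 'ordinary'."""
    return "unusual" if z < -2 or z > 2 else "ordinary"


for iq in (85, 130, 145):
    z = z_score(iq, 100, 15)
    print(iq, round(z, 2), classify(z))
```

A z-score of exactly 2 (IQ 130) still counts as ordinary, since the rule uses strict inequalities.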
AND
REFERS TO MULTIPLICATION. FOR THE PROBABILITY OF EVENTS A AND B, IF THE EVENTS ARE INDEPENDENT, P(A AND B) = P(A) * P(B).
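For independent events the "and" rule multiplies probabilities, and the "at least one" cards pair with it through the complement: P(at least one) = 1 - P(none). A small sketch with exact fractions, assuming independent trials with the same success probability; the fair-coin numbers are only an illustration and the function names are hypothetical.

```python
from fractions import Fraction


def p_all_independent(*probabilities: Fraction) -> Fraction:
    """P(A and B and ...) for independent events: multiply the individual probabilities."""
    result = Fraction(1)
    for p in probabilities:
        result *= p
    return result


def p_at_least_one(p: Fraction, n: int) -> Fraction:
    """P(at least one success in n independent trials) = 1 - P(none) = 1 - (1 - p)**n."""
    return 1 - (1 - p) ** n


half = Fraction(1, 2)
print(p_all_independent(half, half))  # two heads in two fair coin tosses
print(p_at_least_one(half, 3))        # at least one head in three fair tosses
```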
For a distribution that is skewed right, the median is ______ of the box.
to the left of the center
The standard deviation is used in conjunction with the ______ to numerically describe distributions that are bell shaped
mean
What measure of central tendency best describes the "center" of the distribution when the graph is symmetrical
mean
Population arithmetic mean, and its symbol
The mean computed by using all individuals in a population; its symbol is μ ("mu").
Sample arithmetic mean
The mean computed using sample data; its symbol is x̄ ("x-bar").
The standard deviation is used in conjunction with the ______ to numerically describe distributions that are bell shaped. The ______ measures the center of the distribution, while the standard deviation measures the ______ of the distribution.
mean, mean, spread
Which statistics are not resistant?
The mean, range, and standard deviation
A concrete mix is designed to withstand 3000 pounds per square inch (psi) of pressure. The following data represent the strength of nine randomly selected casts (in psi). 3970, 4100, 3200, 3100, 2950, 3840, 4100, 4030, 3650 Compute the mean, median and mode strength of the concrete (in psi).
mean: 3660 median: 3840 mode: 4100
An insurance company crashed four cars of the same model at 5 miles per hour. The costs of repair for each of the four crashes were $437, $411, $487, and $248 . Compute the mean, median, and mode cost of repair.
mean: 395.75 median: 424 mode: none (no value repeats)
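A minimal check of both exercises with Python's `statistics` module. The mode is computed by hand with `collections.Counter` so that a data set with no repeated values reports no mode, matching the card on modes; that convention is an assumption of this sketch, and the function name is hypothetical.

```python
import statistics
from collections import Counter


def center_summary(values):
    """Return (mean, median, modes); modes is None when no value occurs more than once."""
    counts = Counter(values)
    top = max(counts.values())
    modes = sorted(v for v, c in counts.items() if c == top) if top > 1 else None
    return statistics.mean(values), statistics.median(values), modes


concrete = [3970, 4100, 3200, 3100, 2950, 3840, 4100, 4030, 3650]
repairs = [437, 411, 487, 248]
print(center_summary(concrete))
print(center_summary(repairs))
```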
A value at the center or middle of a data set is a(n)
measure of center
quantitative data
Measures how much, such as the weights of high school students. Displayed with dot plots, histograms, and stem plots.
What measure of central tendency best describes the "center" of the distribution when the graph is skewed
median
Which measures of central tendencies are resistant
median and mode
inter-quartile range contains the
middle 50% of all observations
The measure of center that is the value that occurs with the greatest frequency is the
mode
OUTLIER
In modified boxplots, a data value is a(n) ______ if it is above Q3 + (1.5)(IQR) or below Q1 - (1.5)(IQR).
Below are 36 sorted ages of an acting award winner. Find P30 using the method presented in the textbook. 18, 18, 19, 21, 22, 25, 26, 26, 29, 31, 32, 34, 37, 41, 42, 42, 43, 45, 47, 49, 51, 5, 51, 52, 55, 58, 58, 59, 62, 63, 64, 65, 67, 74, 74, 76
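Next, compute L = (k/100) × n, where n is the total number of values in the data set and k is the percentile being used. Here n = 36 and k = 30, so L = (30/100) × 36 = 10.8. Because L is not a whole number, round up to L = 11; the 11th value in the sorted list is 32, so P30 = 32.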
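The locator rule above can be written as a short function. This sketch rounds L up when it is not a whole number, as in the card; when L is whole it averages the Lth and (L+1)th values, which is the usual companion rule but is stated here as an assumption. The ages are copied exactly as listed in the card, and the function expects them already in ascending order rather than sorting them itself. The name `percentile_value` is hypothetical.

```python
def percentile_value(sorted_values: list[float], k: int) -> float:
    """P_k by the locator method; expects values in ascending order and 0 < k < 100."""
    n = len(sorted_values)
    scaled = k * n                      # 100 * L, kept as an integer to avoid float rounding
    if scaled % 100 != 0:               # L is not a whole number: round up
        L = scaled // 100 + 1
        return sorted_values[L - 1]     # positions are 1-based, list indices 0-based
    L = scaled // 100                   # L is whole: average the Lth and (L+1)th values
    return (sorted_values[L - 1] + sorted_values[L]) / 2


# The 36 ages as listed in the card above.
ages = [18, 18, 19, 21, 22, 25, 26, 26, 29, 31, 32, 34, 37, 41, 42, 42, 43, 45,
        47, 49, 51, 5, 51, 52, 55, 58, 58, 59, 62, 63, 64, 65, 67, 74, 74, 76]
print(percentile_value(ages, 30))   # L = 10.8, rounded up to 11
```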
The four levels of measurement that are commonly used for classifying data are ratio, _________, ________, and _________. Choices: (a) interval, normal, ordinary (b) nominal, ordinal, interval (c) nominal, ordinal, categorical (d) normal, ordinal, interval
nominal, ordinal, interval
A(n) ____ distribution has a "bell" shape.
normal
Arithmetic mean
of a variable is computed by adding all the values of the variable in the data set and dividing by the number of observations
A data value is considered ___ if its z-score is greater than or equal to -2 and less than or equal to 2.
ordinary
Raw score
original, unchanged scores that are the direct result of measurement. A test score that has not been transformed or converted in any way.
In a scatter-plot, a(n) _________ is a point lying far away from the other data points.
outlier
In modified boxplots, a data value is a(n) _______ if it is above Q3 + (1.5)(IQR) or below Q1 - (1.5)(IQR).
outlier
Population mean is a
parameter
Below are 36 sorted ages of an acting award winner. Find the percentile corresponding to age 59 using the method presented in the textbook. 16,17,17,21,22,27,30,33,37,37,40,42,43,48,54,56,57,59,59,60,60,62,62,64,65,65,68,70,70,72,72,73,74,77,78,80
Percentile of a value = (number of values less than x) / (total number of values) × 100. For this problem, x = 59. How many values are less than 59? 17. What is the total number of values? 36. Percentile of 59 = (17/36) × 100 ≈ 47, so age 59 is at the 47th percentile.
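The same count-below rule as a short function, applied to the 36 ages in the card above. Rounding the result to the nearest whole number is an assumption about the textbook's convention, and `percentile_of` is a hypothetical name.

```python
def percentile_of(values: list[float], x: float) -> int:
    """Percentile of x = (number of values less than x) / (total number of values) * 100,
    rounded to the nearest whole number."""
    below = sum(1 for v in values if v < x)
    return round(below / len(values) * 100)


ages = [16, 17, 17, 21, 22, 27, 30, 33, 37, 37, 40, 42, 43, 48, 54, 56, 57, 59,
        59, 60, 60, 62, 62, 64, 65, 65, 68, 70, 70, 72, 72, 73, 74, 77, 78, 80]
print(percentile_of(ages, 59))
```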
N means
the population size (the number of values in a population)
"Mu" [µ] means
population mean
σ
population standard deviation
σ²
population variance
A ___________ is the complete collection of all measurements or data collected, whereas a __________ is a subcollection of members selected from the complete collection. Choices: (a) population; sample (b) sample; population (c) sample; census (d) population; parameter
population; sample
Find the population mean or sample mean as indicated. Sample: 22, 18, 6, 13, 6
13
A management survey for a company surveyed 235 employees. 44.7% of the employees surveyed were females. The number of males would be: 130 105 13 Unable to determine
130
Correlation does not imply: Linearity Bias Causation Significance
Causation
Determine whether the given value is from a discrete or continuous set " The total number of phone calls a sales representative makes in a month is 425."
Discrete
Determine whether the value is from a discrete or continuous data: Number of cars owned is 7
Discrete
What type of data values are quantitative and the number of values is finite or countable? Interval Discrete Categorical Continuous
Discrete
Explain the meaning of the accompanying percentiles. (a) The 5th percentile of the head circumference of males 3 to 5 months of age in a certain city is 41.5 cm. (b) The 90th percentile of the waist circumference of females 2 years of age in a certain city is 49.8 cm. (c) Anthropometry involves the measurement of the human body. One goal of these measurements is to assess how body measurements may be changing over time. The following table represents the standing height of males aged 20 years or older for various age groups in a certain city in 2015. Based on the percentile measurements of the different age groups, what might you conclude?
(a)5% of 3- to 5-month-old males have a head circumference that is 41.5 cm or less (b)90% of 2-year-old females have a waist circumference that is 49.8 cm or less (c)At each percentile, the heights generally decrease as the age increases. Assuming that an adult male does not grow after age 20, the percentiles imply that adults born in 1990 are generally taller than adults at the same age who were born in 1950.
Variance is
(standard deviation)²
3. When you need to find the proportion (p) that is *greater* than a positive z, or *less* than a negative z, you will go to the:
*tail column*. An easy way to remember: it's the only column that doesn't include the mean.
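Printed z tables give these columns directly, but the same numbers can be computed from the standard normal cumulative proportion, Φ(z) = (1 + erf(z/√2))/2. A sketch of the tail column using Python's `math.erf`; the helper names are hypothetical.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative proportion below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def tail_proportion(z: float) -> float:
    """Tail-column value: proportion above a positive z, or below a negative z."""
    return 1 - phi(abs(z))


print(round(tail_proportion(2.0), 4), round(tail_proportion(-2.0), 4))
```

The exact tail beyond z = 2 is about 2.28%, which the empirical rule rounds to 2.5%.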
IQR (Interquartile Range)
-MEASURE OF DISPERSION (VARIABILITY). Remember: if the data are symmetric, the best measure of central tendency is the mean, and the best measure of dispersion is the standard deviation (the IQR, Q3 - Q1, can also be reported). HOWEVER, if the data are skewed or contain outliers, the best measure of central tendency is the median, and the best measure of dispersion is the IQR. -DEFINITION: the range of the middle 50% of the observations in a data set: IQR = Q3 - Q1. If the data set is skewed and/or has outliers, THE BEST MEASURE OF DISPERSION: IQR/2 = (Q3 - Q1)/2.
Identify the level of measurement of the data, and explain what's wrong with the calculation: In a survey, the respondents are identified as 100 for "yes", 200 for "no", 300 for "maybe", and 400 for anything else. The average is calculated for 652 respondents and the result is 256.1
-The data are at the nominal level of measurement -Such data are not counts or measures of anything, so it makes no sense to compute their average
Five-Number Summary
-Five numbers used to summarize the data set: 1. SDV (smallest data value) = MINIMUM = xmin 2. Lower quartile = QL = Q1 = P25 3. Middle quartile = Median = M = Q2 = P50 4. Upper quartile = QU = Q3 = P75 5. LDV (largest data value) = MAXIMUM = xmax
Check recording 12 minutes for a step by step process on how to approach a problem!!!!
...
So, on the test he will ask to find the five number summary: in the following order: xmin, QL,M,QU,xmax
...
Z score rules
...
z-scores
Represents the distance that a data value is from the mean in terms of the number of standard deviations. We find it by subtracting the mean from the data value and dividing this result by the standard deviation.
The sum of the deviations about the mean always equals
0, because the deviations of observations greater than the mean exactly offset the deviations of observations less than the mean (any nonzero total comes only from rounding).
Finding quartiles
1) Arrange the data in ascending order. 2) Determine the median, M, or second quartile, Q2. 3) Determine the first and third quartiles, Q1 and Q3, by dividing the data set into two halves: the bottom half is the observations below (to the left of) the location of the median, and the top half is the observations above it. The first quartile is the median of the bottom half, and the third quartile is the median of the top half.
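A minimal sketch of the median-of-halves method, applied to the concrete-strength data from the earlier card. Following the steps above, the bottom and top halves are the observations strictly below and above the median's location, so when n is odd the middle value belongs to neither half. Other textbooks and software use different quartile conventions, so results can differ slightly. The sketch assumes at least two values, and the function names are hypothetical.

```python
import statistics


def quartiles(data: list[float]) -> tuple[float, float, float]:
    """Return (Q1, M, Q3) using the median-of-halves method."""
    xs = sorted(data)                  # 1) ascending order
    n = len(xs)
    half = n // 2
    bottom = xs[:half]                 # observations below the median
    top = xs[half + n % 2:]            # observations above it (skips the middle value when n is odd)
    return statistics.median(bottom), statistics.median(xs), statistics.median(top)


def five_number_summary(data: list[float]) -> tuple[float, float, float, float, float]:
    """(xmin, Q1, M, Q3, xmax)."""
    q1, m, q3 = quartiles(data)
    return min(data), q1, m, q3, max(data)


concrete = [3970, 4100, 3200, 3100, 2950, 3840, 4100, 4030, 3650]
print(five_number_summary(concrete))
```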
Steps for determining a box plot
1) Determine the lower and upper fences: Lower fence = Q1 - 1.5(IQR); Upper fence = Q3 + 1.5(IQR). 2) Draw vertical lines at Q1, M, and Q3. Enclose these lines in a box. 3) Label the lower and upper fences. 4) Draw a line from Q1 to the smallest data value that is larger than the lower fence. Draw a line from Q3 to the largest data value that is smaller than the upper fence. 5) Any data values that are outliers (less than the lower fence or greater than the upper fence) get marked with an asterisk (*).
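A sketch of the fence and whisker steps above, reusing the median-of-halves quartiles and the concrete-strength data from the earlier card. Treating a value that sits exactly on a fence as inside (not an outlier) is an assumption; the helper names are hypothetical.

```python
import statistics


def quartiles(data: list[float]) -> tuple[float, float, float]:
    """(Q1, M, Q3) by the median-of-halves method."""
    xs = sorted(data)
    half = len(xs) // 2
    return (statistics.median(xs[:half]),
            statistics.median(xs),
            statistics.median(xs[half + len(xs) % 2:]))


def box_plot_parts(data: list[float]) -> dict:
    """Fences, whisker ends, and outliers for a box plot."""
    q1, m, q3 = quartiles(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]   # on a fence counts as inside
    return {
        "Q1": q1, "M": m, "Q3": q3,
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),   # where the lines from Q1 and Q3 end
        "outliers": sorted(x for x in data if x < lower_fence or x > upper_fence),
    }


concrete = [3970, 4100, 3200, 3100, 2950, 3840, 4100, 4030, 3650]
print(box_plot_parts(concrete))
```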
relationship between median, mean, and distribution shape... 1) Skewed left 2) Symmetric 3) Skewed right
1) mean < median 2) mean = median 3) mean > median
What procedures could you follow to obtain a simple random sample of 5 students from a list of 427 names?
1) List each name on a separate piece of paper, place them all in a hat, and pick five. 2) Number the names from 1 to 427 and use a random number table to produce 5 different three-digit numbers corresponding to the names selected.
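The second procedure is easy to do with software instead of a random number table. A minimal sketch, assuming the names have been numbered 1 to 427 as in the answer; `random.sample` draws without replacement, so every set of 5 labels is equally likely. The `seed` parameter is only there to make a run repeatable, and the function name is hypothetical.

```python
import random


def simple_random_sample(population_size: int, k: int, seed=None) -> list[int]:
    """Draw k distinct labels from 1..population_size; every k-subset is equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), k))


print(simple_random_sample(427, 5))
```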
The computed mean and the actual mean are considered close if the difference is less than ____of the actual mean. Otherwise the means are said to be __________ different.
1. 5% 2. substantially.
How to Calculate Quartiles
1. Arrange the data in ascending order. 2. Determine the median (M) = Q2. 3. Divide the data set into halves: the observations below M and the observations above M. The first quartile (Q1) is the median of the bottom half, and the third quartile (Q3) is the median of the top half.
TV viewing example: Compute Quartiles
1. Put the data in ascending order. 2. Find the quartiles: a. Median = Q2. There are n = 20 data values, so M = (sum of the middle two data values)/2; SO, Q2 = M = 30.5. b. Bottom half (n = 10): its median is Q1 = (sum of its middle two values)/2; SO, Q1 = 23. c. Upper half (n = 10): its median is Q3 = (sum of its middle two values)/2; SO, Q3 = 36.5.
How to check outliers with Quartiles Rule
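1. Determine the lower (Q1 = QL) and upper (Q3 = QU) quartiles. 2. Compute IQR = QU - QL. 3. Determine the lower and upper fences: a. LIF (lower inner fence) = QL - 1.5(IQR) b. UIF (upper inner fence) = QU + 1.5(IQR) c. LOF (lower outer fence) = QL - 3(IQR) d. UOF (upper outer fence) = QU + 3(IQR)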
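A sketch of the inner/outer fence rule, assuming the usual convention (consistent with the MO/EO labels in the box-and-whisker steps) that a value beyond an inner fence is a mild outlier and a value beyond an outer fence is an extreme outlier. It uses QL = 23 and QU = 36.5 from the TV-viewing example; the values being classified are hypothetical, as are the function names.

```python
def fences(q_lower: float, q_upper: float) -> dict[str, float]:
    """Inner fences at 1.5·IQR and outer fences at 3·IQR beyond the quartiles."""
    iqr = q_upper - q_lower
    return {
        "LIF": q_lower - 1.5 * iqr, "UIF": q_upper + 1.5 * iqr,   # inner fences
        "LOF": q_lower - 3 * iqr,   "UOF": q_upper + 3 * iqr,     # outer fences
    }


def outlier_type(x: float, f: dict[str, float]) -> str:
    """'extreme' beyond an outer fence, 'mild' between the inner and outer fences."""
    if x < f["LOF"] or x > f["UOF"]:
        return "extreme"   # EO, plotted with o
    if x < f["LIF"] or x > f["UIF"]:
        return "mild"      # MO, plotted with *
    return "not an outlier"


f = fences(23, 36.5)        # QL and QU from the TV-viewing example
print(f)
for x in (30, 60, 80):      # hypothetical values to classify
    print(x, outlier_type(x, f))
```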
How to draw a B&W Plot
1. Determine the five-number summary (xmin, QL, M, QU, xmax). 2. Determine the outliers using the quartiles method. 3. Determine the adjacent values: S = smallest data value that is larger than LIF; L = largest data value that is smaller than UIF (S will be less than QL, and L will be larger than QU). 4. Draw a horizontal number line and mark QL, M, QU, S, and L. 5. Draw vertical lines at QL, M, and QU, and enclose these lines in a box. 6. Connect QL to S and QU to L with whiskers. 7. Plot outliers: mild outliers (MO) with * and extreme outliers (EO) with o. If the data set has no outliers (simple B&W plot): S = xmin (smallest data value) and L = xmax (largest data value).
How do the five numbers describe data set:
1. The median describes the middle of the data set. 2. Info about the spread: since you have Q1 and Q3, you have the IQR, and dividing the IQR by 2 gives a measure of dispersion (variation). 3. xmin and xmax give you info about the distribution, including whether or not you may have outliers.
5 Number summary
1. Minimum 2. First quartile, Q1 3. Second quartile, Q2 (same as the median) 4. Third quartile, Q3 5. Maximum
Standardizing a distribution has two steps:
1. The original raw scores are transformed to z-scores. 2. The z-scores are transformed to new X values so that the specified mean (μ) and standard deviation (σ) are attained.
What are three important properties of the Mean?
1. Sample means drawn from the same population tend to vary less than other measures of center. 2. The mean of a data set uses every data value. 3. A disadvantage of the mean is that just one outlier can change its value substantially.
3 Properties of Standard Scores
1. The mean of a set of z-scores is always 0. 2. The standard deviation of a set of standardized scores is always 1. 3. The distribution of a set of standardized scores has the same shape as the original scores; only the scaling is different.
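The three properties can be checked numerically. This sketch standardizes the population 2, 1, 11, 15, 6 from an earlier card using its population mean and standard deviation, then applies the second standardizing step to reach a new mean of 100 and standard deviation of 15 (target values chosen only for illustration; function names are hypothetical).

```python
import statistics


def standardize(values: list[float]) -> list[float]:
    """Step 1: raw scores -> z-scores using the population mean and SD."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


def rescale(z_scores: list[float], new_mean: float, new_sd: float) -> list[float]:
    """Step 2: z-scores -> new X values with the chosen mean and SD (X = mean + z * SD)."""
    return [new_mean + z * new_sd for z in z_scores]


raw = [2, 1, 11, 15, 6]
zs = standardize(raw)
print(round(statistics.fmean(zs), 10), round(statistics.pstdev(zs), 10))

new_scores = rescale(zs, 100, 15)
print([round(x, 1) for x in new_scores])   # same ordering and shape as raw, new scale
print(round(statistics.fmean(new_scores), 6), round(statistics.pstdev(new_scores), 6))
```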
What are two important properties of x̃?
1. The median does not change by large amounts when we include just a few outliers. 2. The median does not use every data value.
A professor has recorded exam grades for 10 students in his class, but one of the grades is no longer readable. If the mean score on the exam was 82 and the mean of the 9 readable scores is 86, what is the value of the unreadable score?
10 × 82 = 820 (total of all 10 scores) and 9 × 86 = 774 (total of the 9 readable scores). 820 - 774 = 46, so the unreadable score is 46.
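The same reasoning works for any single missing value: the overall mean fixes the grand total, and subtracting the known total leaves the missing score. A minimal sketch; the function name is hypothetical.

```python
def missing_value(n_all: int, mean_all: float, n_known: int, mean_known: float) -> float:
    """Solve for the one unknown value: (n_all * mean_all) - (n_known * mean_known)."""
    total_all = n_all * mean_all          # sum of all scores, from the overall mean
    total_known = n_known * mean_known    # sum of the readable scores
    return total_all - total_known


print(missing_value(10, 82, 9, 86))
```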
If the standard deviation of a variable is 10, what is the variance?
100
Number of notes in a song...
Discrete, because it is countable
MEASURE OF CENTER
A value at the center or middle of a data set is a(n) _________.
Define measure of center.
A value at the center or middle of a data set.
SUBSET
ALL THE NUMBERS OF ONE SET BELONG TO ANOTHER SET.
POPULATION
A PARAMETER IS ANY NUMBER THAT DESCRIBES A ______.
How do you find the midrange?
Add the Max and min data value and then divide the sum by 2.
Which of the following is NOT a principle of probability? a. All events are equally likely in any probability procedure. b. The probability of any event is between 0 and 1 inclusive. c. The probability of an impossible event is 0. d. The probability of an event that is certain to occur is 1.
All events are equally likely in any probability procedure.
What can be said about a set of data with a standard deviation of 0?
All the observations are the same value.
Which word is associated with multiplication when computing probabilities?
And
A mutual fund rating agency ranks a fund's performance by using one to five stars. A one-star mutual fund is in the bottom 20% of its investment class; a five-star mutual fund is in the top 20% of its investment class. Interpret the meaning of a four-star mutual fund.
A four-star fund is in the 4th quintile of the funds. That is, it is above the bottom 60%, but below the top 20% of the ranked funds.
INTERSECTION
THE NUMBERS BOTH SETS HAVE IN COMMON ARE THE ______.
Why, in a frequency distribution, do we use the class midpoint when calculating mean?
Because we don't know the exact values that fall into a particular class, we treat every value in the class as equal to the class midpoint.
Which of the accompanying boxplots likely has the data with the larger standard deviation? Why?
Boxplot II likely has the data with the larger standard deviation because the boxplot appears to have a greater spread, which likely results in a larger standard deviation.
Which car would a customer buy, based on the standard deviation, range, mean, and median?
Car 2, because it has a lower sample standard deviation, hence more predictable gas mileage
Qualitative
Categorical data
Which of the following is NOT a procedure for determining whether it is reasonable to assume that sample data are from a normally distributed population? a. Visual inspection of a Histogram to determine if its roughly "bell shaped" b. Constructing a probability plot (QQ) c. Identifying the outliers. d. Checking that the probability of an event is 0.05 or less.
Checking that the probability of an event is 0.05 or less.
Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "An education researcher randomly selects 48 middle schools and interviews all the teachers at each school."
Cluster
To determine customer opinion of their musical variety, Sony randomly selected 110 concerts during a certain week and surveyed all concertgoers. What type of sampling is this?
Cluster
Standardized Distribution
Composed of scores that have been transformed to create predetermined values for the mean and standard deviation. Standardized distributions are used to make dissimilar distributions comparable.
Determine whether the given value is from a discrete or continuous data set. The time it takes a computer to complete a task. Continuous Discrete
Continuous
Determine whether the given value is from a discrete or continuous data set: "The height of a 2-year-old maple tree is 28.3 ft."
Continuous
Height of a child...
Continuous because it is not countable
Volume of water in a swimming pool..
Continuous because it is not countable
Identify which type of sampling is used: To avoid working late, a quality control analyst simply inspects the first 100 items produced in a day Systematic Stratified Convenience Cluster Simple Random
Convenience
Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "A researcher interviews 19 work colleagues who work in his building."
Convenience
Which of the following is NOT one of three common errors involving correlation? - Correlation does not imply causality. - The conclusion that correlation implies causality. - The use of data based on averages. - Mistaking no linear correlation with no correlation.
Correlation does not imply causality
Discrete
Countable number
Identify the type of observational study used: A town obtains current employment data by polling 10,000 of its citizens this month. Prospective Retrospective Cross-sectional None of these
Cross-sectional
variance
THE SQUARE OF THE STANDARD DEVIATION.
The probability of event B occurring, given that event A has already occurred.
DESCRIBE WHAT P(B|A) MEANS.
x̃
Denotes the Median.
What does w denote?
Denotes weights, which are assigned to different data values.
Parameter
Describes characteristics of a population
Statistic
Describes characteristics of a sample
z-score
Describes the exact location of a score in a distribution relative to the mean. Also called a standard score: how many standard deviations a value is from the mean. Used to make different distributions, or metric scales, comparable.
Suppose every student in a class is surveyed and it is reported that 75% of the class plans to take another math class. Is this an example of descriptive or inferential statistics? Explain.
Descriptive statistics; The results of the class sample are described without making any generalizations about the population of all students at the school.
Determine whether the given value is a discrete or continuous variable: People are asked to state how many times in the last month they visited their family doctor Continuous Discrete
Discrete
Quartiles (most common percentiles) --> resistant to extreme values
Divide data sets into fourths, or four equal parts. The first quartile, denoted Q1, divides the bottom 25% of the data from the top 75%. The second quartile divides the bottom 50% of the data from the top 50%, so the second quartile is equivalent to the 50th percentile, which is equivalent to the median. Finally, the third quartile divides the bottom 75% of the data from the top 25%, so the third quartile is equivalent to the 75th percentile.
What must be true for a sample to be considered a simple random sample?
Every possible sample of that size must have the same chance of being selected.
What does it mean if a statistic is resistant?
Extreme values (very large or small) relative to the data do NOT affect its value substantially
outliers
Extreme values that don't appear to belong with the rest of the data.
Identify the given statement as either true or false. The standard deviation is a resistant measure of spread.
False
True or False: A data set will always have exactly one mode.
False. The mode of a variable is the most frequent observation of the variable in the data set. To compute the mode, tally the number of observations that occur for each data value; the data value that occurs most often is the mode. A set of data can have no mode, one mode, or more than one mode. If no observation occurs more than once, the data have no mode.
EMPIRICAL RULE
For data sets having a distribution that is approximately bell-shaped, _______ states that about 68% of all data values fall within one standard deviation from the mean
1. When you need to find a proportion between a negative (-) & positive (+) z-score:
Go to the *mean-to-z column* for each z, find the proportions, and add them together.
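The mean-to-z column equals Φ(|z|) - 0.5, so the proportion between a negative and a positive z is the sum of the two mean-to-z values. A minimal sketch using `math.erf`, with illustrative names:

```python
import math


def mean_to_z(z: float) -> float:
    """Mean-to-z column: proportion between the mean (z = 0) and z, i.e. Φ(|z|) - 0.5."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def between(z_negative: float, z_positive: float) -> float:
    """Proportion between a negative and a positive z: add the two mean-to-z proportions."""
    return mean_to_z(z_negative) + mean_to_z(z_positive)


print(round(between(-1.0, 1.0), 4))
print(round(between(-2.0, 1.0), 4))
```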
The U.S. Department of Housing and Urban Development (HUD) uses the median to report the average price of a home in the United States. Why do you think HUD uses the median?
HUD uses the median because home prices are skewed right: a few very expensive homes pull the mean up but barely affect the median.
VENN DIAGRAM
INTERSECTION, UNION, COMPLEMENT
In a typical boxplot, the length of the box indicates which measure of spread?
IQR
COMPLEMENT
IS ALL THE OUTCOMES IN THE SAMPLE SPACE THAT DON'T BELONG TO THE SET.
UNION
IS ALL THE ELEMENTS THAT BELONG TO EITHER SET, INCLUDING THE ELEMENTS THEY HAVE IN COMMON (COUNTED ONCE).
SAMPLE SPACE
IS THE SET OF ALL THE POSSIBLE OUTCOMES.
PROBABILITY
IT IS A NUMBER FROM 0 TO 1 THAT MEASURES HOW LIKELY A CERTAIN OUTCOME IS.
THEORETICAL
IT IS BASED ON KNOWING ALL THE POSSIBLE, EQUALLY LIKELY OUTCOMES RATHER THAN ON OBSERVED RESULTS.
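Sample space, event, complement, union, intersection, and theoretical probability can all be illustrated with Python sets. This sketch assumes a single roll of a fair six-sided die (equally likely outcomes); the events `A` and `B` and the helper `theoretical_probability` are hypothetical examples, not part of the original notes.

```python
from fractions import Fraction

# Illustrative sample space: one roll of a fair six-sided die.
SAMPLE_SPACE = frozenset({1, 2, 3, 4, 5, 6})


def theoretical_probability(event):
    """P(E) = (number of outcomes in E) / (number of outcomes in S), for equally likely outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


A = {2, 4, 6}  # event: roll an even number
B = {4, 5, 6}  # event: roll a number greater than 3

complement_A = SAMPLE_SPACE - A  # outcomes in S that are not in A
union_AB = A | B                 # outcomes in A, in B, or in both
intersection_AB = A & B          # outcomes in both A and B

print(theoretical_probability(A))                # 1/2
print(theoretical_probability(complement_A))     # 1/2, since P(A) + P(not A) = 1
print(theoretical_probability(union_AB))         # 2/3
print(theoretical_probability(intersection_AB))  # 1/3
```

Using `Fraction` keeps the probabilities exact, so rules such as P(A or B) = P(A) + P(B) − P(A and B) hold without rounding error.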
Determining z-score
If a data value is larger than the mean, its z-score is positive. If a data value is smaller than the mean, its z-score is negative. If the data value equals the mean, its z-score is zero. Z-scores measure the number of standard deviations an observation is above or below the mean. Ex. A z-score of 1.24 is interpreted as "the data value is 1.24 standard deviations above (GREATER than) the mean." Ex. A z-score of −0.5 means the data value is half a standard deviation below (LESS than) the mean. Ex. A z-score of 0 indicates that the value of the observation is EQUAL to the mean.
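A small sketch of the sign rules above, with z-scores rounded to 2 decimal places. The mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration, as are the helper names `z_score` and `describe`.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (z > 0) or below (z < 0) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


def describe(z):
    """Plain-language reading of a z-score, rounded to 2 decimal places."""
    z = round(z, 2)
    if z > 0:
        return f"{z:.2f} standard deviations above the mean"
    if z < 0:
        return f"{-z:.2f} standard deviations below the mean"
    return "equal to the mean"


# Hypothetical data: mean 100, standard deviation 15.
for x in (118.6, 92.5, 100):
    print(x, "->", describe(z_score(x, 100, 15)))
```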
After constructing a relative frequency distribution summarizing IQ scores of college students, what should be the sum of the relative frequencies?
If percentages are used, the sum should be 100%. If proportions are used, the sum should be 1
Which of the following is NOT a requirement in determining whether there is a linear correlation between two variables? -Any outliers must be removed if they are known to be errors. -If r>1, then there is a positive linear correlation. -The sample of paired data is a simple random sample of quantitative data. -A scatter-plot should visually show a straight-line pattern.
If r>1, then there is a positive linear correlation
Which of the following is always true? -For skewed data, the mode is farther out in the longer tail than the median. -The mean and median should be used to identify the shape of the distribution. -Data skewed to the right have a longer left tail than right tail. -In a symmetric and bell-shaped distribution, the mean, median, and mode are the same.
In a symmetric and bell-shaped distribution, the mean, median, and mode are the same
Suppose every student in a class is surveyed and it is found that 75% of the class plans to take another math class. It is reported that 75% of all students at the school plan to take another math class. Is this an example of descriptive or inferential statistics? Explain.
Inferential statistics; the results of the class sample are extended to make a generalization about the population of all students at the school.
Determine which of the 4 levels of measurement is the most appropriate: Years of elections: 1988, 1990, 1992, 1994, and 1996
Interval
What is a lurking variable?
An explanatory variable that was not considered in the study but that affects the value of the response variable in the study. In addition, lurking variables are typically related to explanatory variables in the study.
Define the Mode.
Is the value that occurs with the greatest frequency.
x
Is the variable usually used to represent the individual data values.
population z-score
z = (x − μ) / σ, where μ = population mean and σ = population standard deviation
Continuous
Infinitely many possible values (any value within an interval), usually obtained by measuring
Which of the following is NOT a value in the 5-number summary? -Median -Mean -Minimum -Q1
Mean
Which of the following is NOT needed to construct a boxplot?
Mean
What is the formula to calculate mean?
Mean = Σx / n
What are the measures of center?
Mean, median, mode, and midrange.
An insurance company crashed four cars of the same model at 5 miles per hour. The costs of repair for each of the four crashes were $433, $440, $495, and $207 . Compute the mean, median, and mode cost of repair.
Mean: $393.75; Median: $436.50; Mode: none (no cost occurs more than once)
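The three measures of center for the crash-test repair costs can be checked with Python's `statistics` module. One wrinkle: `multimode` returns every value when all values are tied, so the hypothetical helper `modes_or_none` treats "every value occurs once" as no mode, matching the textbook convention. Running it prints $393.75, $436.50, and None.

```python
from statistics import mean, median, multimode

costs = [433, 440, 495, 207]  # repair costs in dollars, from the crash-test exercise


def modes_or_none(data):
    """Return the list of modes, or None when no value occurs more than once (data must be non-empty)."""
    modes = multimode(data)  # every value tied for the highest count
    if data.count(modes[0]) == 1:
        return None  # all values occur exactly once, so there is no mode
    return modes


print(f"Mean:   ${mean(costs):.2f}")
print(f"Median: ${median(costs):.2f}")
print(f"Mode:   {modes_or_none(costs)}")
```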
Population of country of origin is qualitative or quantitative?
Quantitative because it is a numerical measure
_______ divide data sets in fourths.
Quartiles
The following data represent the weights (in grams) of a simple random sample of a candy. 0.90 0.87 0.83 0.92 0.90 0.86 0.86 0.87 0.81 0.84 Determine the shape of the distribution of weights of the candies by drawing a frequency histogram and computing the mean and the median. Which measure of central tendency best describes the weight of the candy?
Mean: 0.866 g; Median: 0.865 g. Because the mean and median are nearly equal, the distribution is roughly symmetric, so the mean best describes the weight of the candy.
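A histogram is the proper tool for judging shape, but comparing the mean and median gives a quick cross-check. This is a heuristic sketch only: the function `rough_shape` and its `tolerance` parameter are hypothetical, and the cutoff of 0.005 g is an arbitrary choice. For the candy weights it reports a mean near 0.866 and a median near 0.865.

```python
from statistics import mean, median

weights = [0.90, 0.87, 0.83, 0.92, 0.90, 0.86, 0.86, 0.87, 0.81, 0.84]  # grams


def rough_shape(data, tolerance):
    """Heuristic only: compare mean and median to guess the shape of the distribution."""
    m, md = mean(data), median(data)
    if abs(m - md) <= tolerance:
        return "roughly symmetric"
    # A mean pulled above the median suggests a long right tail, and vice versa.
    return "skewed right" if m > md else "skewed left"


print(round(mean(weights), 3), round(median(weights), 3))
print(rough_shape(weights, tolerance=0.005))
```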
There are many potential pitfalls that can cause problems when analyzing data. Which of these choices is not classified as a potential pitfall? Order of survey questions, Nonresponse, Self-reported data, Measured data
Measured data
What is an observational study?
A study that measures the value of the response variable without attempting to influence the value of either the response or explanatory variables.
descriptive
Methods that summarize or describe characteristics of data are called _______ statistics.
Are any of the measures of dispersion among the range, the variance, and the standard deviation, resistant? Explain.
No, all of these measures of dispersion are affected by extreme values.
Is this a property of the standard deviation? When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.
No. When the means are very different, comparing the two sample standard deviations can be misleading; compare the coefficients of variation instead.
Is it OK to say "average" instead of mean?
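No. "Average" is ambiguous because it is sometimes used for other measures of center, such as the median.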
A psychology student wishes to investigate differences in political opinions between business majors and political science majors at her college. She randomly selects 100 students from the 260 business majors and 100 students from the 180 political science majors. Does this sampling plan result in a random sample? Simple random sample? Explain.
No; no. The sample is not random because political science majors have a greater chance of being selected than business majors. It is not a simple random sample because some samples are not possible, such as a sample consisting of 50 business majors and 150 political science majors.
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Favorite films
Nominal
Quantitative
Numerical data
ADDITION
OR REFERS TO ______ RULE.
Determine whether the given description corresponds to an experiment or an observational study: A stock analyst selects a stock from a group of twenty for investment by choosing the stock with the greatest earnings per share reported for the last quarter.
Observational study
Determine which of the four levels of measurement is most appropriate: Students' grades, A, B, or C, on a test. Interval Nominal Ordinal Ratio
Ordinal
____ are sample values that lie very far away from the majority of the other sample values.
Outliers
Determine whether the given value is a statistic or a parameter. "After inspecting all 45,000kg of meat stores at the Wurst Sausage Company, it was found that 20,000kg of the meat was spoiled."
Parameter
Determine whether the given value is a statistic or a parameter. "After taking the first exam, 15 of the students dropped the class."
Parameter
Determine whether the given value is a statistic or a parameter. Thirty percent of all dog owners poop scoop after their dog. Statistic Parameter
Parameter
Determine whether the given value is a statistic or a parameter: In a study of all 3153 seniors at a college, it is found that 50% own a computer
Parameter because the value is a numerical measurement describing a characteristic of a population
When we used the z-score method, we found that 77 was the only outlier, and it was an extreme one. But, what if we use the quartiles method?
Q1 = 23, Q3 = 36.5, IQR = 13.5. Inner fences: LIF = Q1 − 1.5(IQR) = 2.75 and UIF = Q3 + 1.5(IQR) = 56.75. Outer fences: LOF = Q1 − 3(IQR) = −17.5 and UOF = Q3 + 3(IQR) = 77. Observations between LIF and UIF are usual; observations between LOF and LIF or between UIF and UOF are mild outliers (MO); observations below LOF or above UOF are extreme outliers (EO). 77 lies exactly on the border between MO and EO, so we can consider it either a mild or an extreme outlier. This example shows that the z-score method is more specific here than the quartiles method, which leaves the choice of designation to you.
Inter-quartile range
Q3 minus Q1
Favorite rock group is qualitative or quantitative?
Qualitative because it is an attribute classification
Determine whether the data are qualitative or quantitative. "the number of seats in a movie theater"
Quantitative
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. In a poll conducted by a certain research center, 718 adults were called after their telephone numbers were randomly generated by a computer, and 89% were able to correctly identify the attorney general.
Random sampling
Which measure of variation is very sensitive to extreme values?
Range
μ
Represents the mean of all values in a population.
z-score (often called the standardized value)
Represents the distance that a data value is from the mean in terms of the number of standard deviations. (It is obtained by subtracting the mean from the data value and dividing the result by the standard deviation.) The z-score is unitless, and the z-scores of a data set have mean 0 and standard deviation 1.
x̄
Represents the mean of a set of sample values.
N
Represents the number of data values in a population.
Researchers collect data by interviewing athletes who have won Olympic gold medals from 1992 to 2016. Identify the type of study. Retrospective Cross-sectional Prospective None of these
Retrospective
Distribution Shape and Boxplot
Right skewed: the median is to the left of the center of the box, and the right whisker is longer than the left one. Symmetric: the median is at or near the center of the box, and the whiskers are about equal in length. Left skewed: the median is to the right of the center of the box, and the left whisker is longer than the right one.
Rounding rule:
Round z-scores to 2 decimal places
P(A) + P(Ā) = 1, where Ā is the complement of A, is one way to express the ____.
Rule of complementary events.
The Range Rule of Thumb roughly estimates the standard deviation of a data set as _______.
s ≈ RANGE/4
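A quick sketch comparing the range rule of thumb with the actual sample standard deviation. The data set is hypothetical, and `range_rule_sd` is an illustrative helper name; the rule gives 4.0 here while the sample standard deviation is about 5.26, which is why it is only a rough estimate.

```python
from statistics import stdev


def range_rule_sd(data):
    """Range rule of thumb: s is roughly (maximum - minimum) / 4."""
    return (max(data) - min(data)) / 4


# Hypothetical sample, used only to compare the estimate with the actual sample SD.
data = [12, 15, 17, 18, 20, 22, 25, 28]
print("range rule estimate:", range_rule_sd(data))
print("sample standard deviation:", round(stdev(data), 2))
```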
EVENT
SUBSET OF SAMPLE SPACE.
n
Sample size
Identify which type of sampling is used: The name of each contestant is written on a separate card, the cards are placed in a bag, and three names are picked from the bag Simple Random Cluster Convenience Stratified Systematic
Simple Random
When is B&W Plot simple or extended?
Simple: Data set does not contain outliers Extended: Data set contains outliers (MO or EO)
The x-values (x: 1.1, 0.8, 0.9, 1.0, 1.1) are the nicotine amounts (in mg) in different 100 mm filtered, non-"light" menthol cigarettes. The y-values (y: 1.1, 1.3, 1.2, 1.1, 1.6) are the nicotine amounts (in mg) in different king-size nonfiltered, nonmenthol, and non-"light" cigarettes. If suitable methods of statistics are used, it can be concluded that the average (mean) nicotine amount of the 100 mm filtered, non-"light" menthol cigarettes is less than the average (mean) nicotine amount of the king-size nonfiltered, nonmenthol, and non-"light" cigarettes. Can it be concluded that the first type of cigarette is safe? Why or why not?
No. Since the first type of cigarette contains less nicotine than the second type, it may be safer, but it cannot be concluded that it is safe.
Standard deviation measures the _____ of the distribution
Spread
Determining outliers
Standardized values (z-scores) can be used to identify outliers. It is recommended to treat any data value with a z-score less than -3 or greater than +3 as an outlier. Such data values can then be reviewed for accuracy and to determine whether they belong in the data set.
Determine whether the given value is a statistic or a parameter. "A health and fitness club surveys 40 randomly selected members and found that the average weight of those questioned is "
Statistic
Determine whether the given value is a statistic or a parameter. "A sample of 120 employees of a company is selected, and the average age is found to be 37 years"
Statistic
Finding Quartiles
Step 1 Arrange the data in ascending order. Step 2 Determine the median, M, or second quartile, Q2 . Step 3 Divide the data set into halves: the observations below (to the left of) M and the observations above M. The first quartile, Q1 , is the median of the bottom half, and the third quartile, Q3 , is the median of the top half.
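The three steps translate directly into code. This sketch assumes that when n is odd the median M is excluded from both halves; other textbooks and software (including Python's `statistics.quantiles`) use different conventions, so results can differ slightly. The function name and sample data are hypothetical.

```python
from statistics import median


def quartiles(data):
    """Q1, Q2 (the median M) and Q3 using the 'median of each half' steps.

    Assumption: when n is odd, M itself is left out of both halves.
    """
    values = sorted(data)              # Step 1: arrange in ascending order
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    q2 = median(values)                # Step 2: the median M is Q2
    lower = values[: n // 2]           # Step 3: observations below M
    upper = values[(n + 1) // 2:]      #         observations above M
    return median(lower), q2, median(upper)


# Hypothetical data set with n = 9.
print(quartiles([3, 7, 8, 5, 12, 14, 21, 13, 18]))
```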
Checking for Outliers by Using Quartiles
Step 1 Determine the first and third quartiles of the data. Step 2 Compute the interquartile range. Step 3 Determine the fences. Fences serve as cutoff points for determining outliers. Lower Fence = Q1 - 1.5(IQR) Upper Fence = Q3 + 1.5(IQR) Step 4 If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier.
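The fence steps can be sketched as follows, taking Q1 and Q3 as already computed. The outer fences at 3(IQR), used to separate mild from extreme outliers, are an assumption based on the common convention, and a value exactly on an outer fence is counted as mild here; that tie-breaking is a choice, not a rule. The quartile values and helper names are hypothetical.

```python
def fences(q1, q3, k=1.5):
    """Lower and upper fences: Q1 - k(IQR) and Q3 + k(IQR)."""
    iqr = q3 - q1  # Step 2: interquartile range
    return q1 - k * iqr, q3 + k * iqr


def classify(x, q1, q3):
    """Step 4, extended with outer fences at 3(IQR) to separate mild from extreme outliers."""
    inner_low, inner_high = fences(q1, q3, 1.5)
    outer_low, outer_high = fences(q1, q3, 3.0)
    if inner_low <= x <= inner_high:
        return "usual"
    if outer_low <= x <= outer_high:
        return "mild outlier"
    return "extreme outlier"


# Hypothetical quartiles: Q1 = 23, Q3 = 36.5 (IQR = 13.5).
print(fences(23, 36.5))  # inner fences
print(classify(77, 23, 36.5))
```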
Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "49, 34, and 48 students are selected from the Sophomore, Junior, and Senior classes with 496, 348, and 481 students, respectively"
Stratified
To determine her air quality, Carrie divides up her day into three parts, morning, afternoon, and evening. She then measures her air quality at 4 randomly selected times during each part of the day. What type of sampling is this?
Stratified
What is meant by confounding?
Confounding occurs when the effects of TWO or MORE explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study.
Σ
Summation: add up all the values that follow it (e.g., Σx is the sum of all data values)
A tax auditor selects every 1000th income tax return that is received. Identify which of these types of sampling is used Stratified Systematic Simple Random Cluster Convenience
Systematic
Identify the type of sampling used: random, systematic, convenience, stratified, or cluster. To estimate the percentage of defects in a recent manufacturing batch, a quality control manager at Toshiba selects every 20th laptop that comes off the assembly line starting with the second until she obtains a sample of 100 laptops.
Systematic
Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "A sample consists of every 49th student from a group of 496 students."
Systematic
μ
THE SYMBOL FOR THE POPULATION MEAN IS
DISJOINT
THEY HAVE NO OUTCOMES IN COMMON, SO THEY CANNOT OCCUR AT THE SAME TIME. FOR DISJOINT EVENTS, P(A OR B) = P(A) + P(B).
5. When you need to find the z-score that forms the boundary between 2 areas under the bell curve, e.g. between the top 20% & bottom 80%, use:
The *Tail column*: find the proportion closest to the percentage, e.g. the proportion closest to .2000; the z-score in that row is the z-score that forms that boundary.
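In code, the tail-column search becomes an inverse-CDF call. This sketch assumes a normal distribution and uses `statistics.NormalDist.inv_cdf`; the helper name `z_boundary` is hypothetical. For the top 20% it gives about 0.84, the same z the table row closest to .2000 shows.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_boundary(top_proportion):
    """z-score with `top_proportion` of the area above it (the 'tail' area)."""
    if not 0 < top_proportion < 1:
        raise ValueError("proportion must be strictly between 0 and 1")
    return STD_NORMAL.inv_cdf(1 - top_proportion)


# Boundary between the top 20% and the bottom 80%.
print(round(z_boundary(0.20), 2))
```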
Interquartile range
The ... IQR, is the range of the middle 50% of the observations in a data set. That is, the IQR is the difference between the third and first quartiles and is found using the formula IQR = Q3 − Q1.
kth percentile
The ... denoted, Pk , of a set of data is a value such that k percent of the observations are less than or equal to the value.
Q1 Q2 Q3
The 1st quartile, denoted Q1, divides the bottom 25% of the data from the top 75%. Therefore, the 1st quartile is equivalent to the 25th percentile. The 2nd quartile divides the bottom 50% of the data from the top 50% of the data, so that the 2nd quartile is equivalent to the 50th percentile, which is equivalent to the median. The 3rd quartile divides the bottom 75% of the data from the top 25% of the data, so that the 3rd quartile is equivalent to the 75th percentile.
Suppose babies born after a gestation period of 32 to 35 weeks have a mean weight of 2500 grams and a standard deviation of 600 grams while babies born after a gestation period of 40 weeks have a mean weight of 2900 grams and a standard deviation of 390 grams. If a 35-week gestation period baby weighs 2750 grams and a 41-week gestation period baby weighs 3150 grams, find the corresponding z-scores. Which baby weighs more relative to the gestation period?
The baby born in week 41 weighs relatively more since its z-score, 0.64, is larger than the z-score of 0.42 for the baby born in week 35.
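The comparison in the answer can be reproduced directly. This sketch uses the exercise's figures (the 41-week baby is compared against the 40-week mean and standard deviation, as the exercise does); `z_score` is an illustrative helper name. It prints 0.42 and 0.64 and names the 41-week baby.

```python
def z_score(x, mean, sd):
    """Standardized value: (x - mean) / sd."""
    return (x - mean) / sd


# Figures from the exercise (the 41-week baby is compared against the 40-week figures).
z_35_week = z_score(2750, mean=2500, sd=600)
z_41_week = z_score(3150, mean=2900, sd=390)

print(round(z_35_week, 2), round(z_41_week, 2))
heavier = "41-week baby" if z_41_week > z_35_week else "35-week baby"
print("relatively heavier:", heavier)
```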
Whenever a data value is less than the mean,_____.
The corresponding z-score is negative.
State whether the data described below are discrete or continuous and explain why: The exact ages in hours of different cockroaches found in a certain city
The data are continuous because the data can take any value in an interval
State whether the data described below are discrete or continuous, and explain why: The temperatures (in degrees Fahrenheit) of pizzas fresh from the oven
The data are continuous because the data can take any value in an interval
State whether the data described below are discrete or continuous: The number of programs installed on various computers
The data are discrete because the data can take only specific, countable values
Determine whether the data described below are qualitative or quantitative and explain why: The types of climates for different regions (tropical, arid, temperate, etc.)
The data are qualitative because they don't measure or count anything
A community college faculty is negotiating a new contract with the school board. The distribution of faculty salaries is skewed right by several faculty members who make over $100,000 per year. If the faculty want to give the community the impression that they deserve higher salaries, should they advertise the mean or median of their current salaries?
The faculty should use the median to make their argument. The median will be lower than the mean since the mean is influenced by the few extremely high salaries.
Determine whether the given value is a statistic or a parameter: A homeowner measured the voltage supplied to his home on all 30 days of a given month, and the average (mean) value is 131.6 volts
The given value is a parameter for the month because the data collected represented a population
Explain the circumstances for which the interquartile range is the preferred measure of dispersion. What is an advantage that the standard deviation has over the interquartile range?
The interquartile range is preferred when the data are skewed or have outliers. An advantage of the standard deviation is that it uses all the observations in its computation.
Determine which of the 4 levels of measurement is the most appropriate for the data below: Years in which a war was started
The interval level of measurement is the most appropriate because the data can be ordered and differences are meaningful, but there is no natural starting point
Which of the following is NOT a property of the linear correlation coefficient r? -The value of r is always between -1 and 1 inclusive. -The value of r is not affected by which variable is called x and which is called y. -The value of r measures the strength of a linear relationship. -The linear correlation coefficient r is robust. That is, a single outlier will not affect the value of r.
The linear correlation coefficient is robust. That is, a single outlier will not affect the value of r.
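A short demonstration of why r is not robust. The paired data are hypothetical; adding a single outlying pair turns a moderately strong positive r (about 0.775) into a negative one (about −0.761). The hand-written `pearson_r` uses the standard definition, and swapping x and y leaves it unchanged, matching another property in the list.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation coefficient r from paired data (needs at least two distinct x and y values)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired data, then the same data with one outlying pair added.
xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
xs_out, ys_out = xs + [10], ys + [-5]

print(round(pearson_r(xs, ys), 3))          # moderately strong positive r
print(round(pearson_r(xs_out, ys_out), 3))  # one extra point flips the sign
```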
Which of the following is NOT a characteristic of the mean? -The mean is relatively reliable. -The mean is called the average by statisticians. -The mean is sensitive to outliers. -The mean takes every data value into account.
The mean is called the average by statisticians.
If each monthly cell phone bill in the country were doubled, how would the mean of the cell phone bills be affected?
The mean of the cell phone bills would double.
A histogram of a set of data indicates that the distribution of the data is skewed right. Which measure of central tendency will likely be larger, the mean or the median? Why?
The mean will likely be larger BECAUSE the extreme values in the right tail tend to pull the mean up in the direction of the tail.
What is the mean of a set of data?
The measure of center found by adding the data values and dividing the total by the number of data values.
What is the Median?
The measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.
What is the Midrange of a data set?
The measure of center that is the value midway between the max and min values in the original data set.
How can you tell from a boxplot if the distribution is symmetric?
The median is in the center of the box, and the left and right whiskers are approximately the same length.
Which measure of center (mean or median) is resistant? Explain what it means for that measure to be resistant.
The median is resistant because it is not sensitive to extreme values in the data set. If the largest observation was doubled, for example, the median would not change since that largest value does not factor into its computation.
The median
The median of a variable is the value that lies in the middle of the data when arranged in ascending order. We use M to represent the median.
6. When you need to compute a raw score that represents the minimum or maximum score needed to answer a question, look for the percentage in the question, e.g. "What raw scores form the boundaries of the middle 60% of the distribution?":
The middle 60% straddles the mean and can be divided into two equal percentages: 30% below and 30% above the mean. Look for the value closest to .3000 in the *mean to z column* and locate the z-score in that row. Then use that z-score (negative for the lower boundary, positive for the upper boundary) in the formula for a raw score: X = μ + zσ
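The same procedure in code, assuming a normal distribution. The mean of 100 and standard deviation of 15 are hypothetical, as is the helper name `middle_boundaries`. The exact z (about 0.8416) gives boundaries of about 87.38 and 112.62; the rounded table value z = 0.84 would give 87.40 and 112.60.

```python
from statistics import NormalDist


def middle_boundaries(mu, sigma, middle):
    """Raw scores bounding the middle `middle` proportion of a normal distribution."""
    half = middle / 2                     # e.g. 0.60 -> 0.30 on each side of the mean
    z = NormalDist().inv_cdf(0.5 + half)  # z whose mean-to-z area equals `half`
    # X = mu + z * sigma, using -z for the lower boundary and +z for the upper one.
    return mu - z * sigma, mu + z * sigma


# Hypothetical distribution: mean 100, standard deviation 15.
low, high = middle_boundaries(mu=100, sigma=15, middle=0.60)
print(round(low, 2), round(high, 2))
```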
A highly selective boarding school will only admit students who place at least 2 standard deviations above the mean on a standardized test that has a mean of 200 and a standard deviation of 24. What is the minimum score that an applicant must make on the test to be accepted?
The minimum score that an applicant must make on the test to be accepted is 248
mode
The mode of a variable is the most frequent observation of the variable that occurs in the data set. *If no observation occurs more than once, then there is NO MODE.
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate for the data below. Social security numbers
The nominal level of measurement is most appropriate because the data are only labels and cannot be meaningfully ordered.
Standardized Score
The number of standard deviations that a piece of data lies above or below the mean. Z = (X - μ) / σ
What does P(B|A) represent?
The probability of event B occurring after it is assumed that event A has already occurred.
Which is relatively better: a score of 58 on a psychology test or a score of 49 on an economics test? Scores on the psychology test have a mean of 85 and a standard deviation of 10. Scores on the economics test have a mean of 58 and a standard deviation of 3.
The psychology test score is relatively better because its z score, (58 − 85)/10 = −2.70, is greater than the z score for the economics test score, (49 − 58)/3 = −3.00.
What makes the range less desirable than the standard deviation as a measure of dispersion?
The range does not use all the observations.
Interquartile range (IQR)
The range of the middle 50% of the observations in a data set. The difference between the upper quartile and the lower quartile. IQR = Q3 - Q1 Interpretation of the interquartile range is similar to that of the range and standard deviation. That is, the more spread a set of data has, the higher the interquartile range will be.
Determine which of the levels of measurement is most appropriate for the data below: Brain volumes measured in cubic cm
The ratio level of measurement is the most appropriate because the data can be ordered, differences can be found and are meaningful, and there is a natural starting point
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate for the data below. Length of the side of a square in cm
The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is a natural starting zero point.
VARIANCE
The square of the standard deviation is called the __
The standard deviation is used in conjunction with the _____ to numerically describe distributions that are bell shaped. The ____ measures the center of the distribution, while the standard deviation measures the ____ of the distribution.
The standard deviation is used in conjunction with the MEAN to numerically describe distributions that are bell shaped. The MEAN measures the center of the distribution, while the standard deviation measures the SPREAD of the distribution.
Determine whether the description below corresponds to an observational study or an experiment. In a study sponsored by a company, 11,079 people were asked what contributes most to their anxiety, and 37% of the respondents said that it was their health.
The study is an observational study because the survey subjects were not given any treatment.
What does Σx represent?
The sum of all data values.
Which of the following is not a requirement of the binomial probability distribution? a. Each trial must have all outcomes classified into two categories. b. The trials must be dependent. c. The procedure has a fixed number of trials. d. The probability of a success remains the same in all trials.
The trials must be dependent. (For a binomial distribution, the trials must be independent.)
properties of standard deviation
The units of the standard deviation are the same as the units of the original data, the standard deviation is a measure of variation of all data values from the mean, the value of the standard deviation is never negative
If your score on your next statistics test is converted to a z score, which of these z scores would you prefer: −2.00, −1.00, 0, 1.00, 2.00? Why?
The z score of 2.00 is most preferable because it is 2.00 standard deviations above the mean and would correspond to the highest of the five different possible test scores.
unitless
The z-score is ... It has mean 0 and standard deviation 1.
If someone's gross annual income has a z-score of positive 2, what can be concluded?
Their income is 2 standard deviations above the mean income.
Which of the following is NOT true about statistical graphs? a. Similar graphs can be constructed in order to compare data sets. b. They utilize areas or volumes for data that are one-dimensional in nature. c. They can be used to consider the overall shape of the distribution. d. They can be used to identify extreme data values.
They utilize areas or volumes for data that are one-dimensional in nature. (Using 2- or 3-dimensional pictures to represent 1-dimensional data is poor practice and distorts the data.)
Quartiles
This divides data sets into fourths, or four equal parts.
A company was conducting a survey to investigate people's spending habits and how they may have changed in recent years. One question on the survey was, "Did you spend more/less/the same amount of money this year as you did in 2007, the year the recession began in earnest in this country?" Is this question biased? If so, what answer does it favor?
This question is biased toward "spend less," since it mentions the recent recession. Many people would feel that they should answer that they spent less, since the country is in a recession.
Which of the following is NOT a property of the standard deviation? a. When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations. b. The value of the standard deviation is never negative. c. The st. dev. is a measure of variation of all data values from the mean. d. The units of the st. dev. are the same as the units of the original data.
When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.
With a height of 70 in, Roger was the shortest president of a particular club in the past century. The club presidents of the past century have a mean height of 75.1 in and a standard deviation of 2.4 in. a. What is the positive difference between Roger's height and the mean? b. How many standard deviations is that [the difference found in part (a)]? c. Convert Roger's height to a z score. d. If we consider "usual" heights to be those that convert to z scores between −2 and 2, is Roger's height usual or unusual?
a. To find the positive difference between Roger's height and the mean, subtract and take the absolute value: |70 in − 75.1 in| = 5.1 in. b. To determine how many standard deviations the difference is, divide it by the standard deviation: 5.1/2.4 ≈ 2.13 standard deviations. c. A z score is the number of standard deviations that a given value x is above or below the mean: z = (x − x̄)/s for a sample or z = (x − μ)/σ for a population. The club presidents form a population, so Roger's z score is z = (70 − 75.1)/2.4 ≈ −2.13. d. Because −2.13 is less than −2, Roger's height is unusual.
True or False: Chebyshev's inequality applies to all distributions regardless of shape, but the empirical rule holds only for distributions that are bell shaped
True, Chebyshev's inequality is less precise than the empirical rule, but will work for any distribution, while the empirical rule only works for bell-shaped distributions
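A sketch contrasting the two guarantees. Chebyshev's bound 1 − 1/k² holds for any distribution, while the normal-distribution proportions apply only to bell-shaped data; `chebyshev_minimum` and `normal_within` are hypothetical helper names. For k = 2 it prints at least 75.0% versus 95.4%, and for k = 3 at least 88.9% versus 99.7%.

```python
from statistics import NormalDist


def chebyshev_minimum(k):
    """Chebyshev's inequality: at least 1 - 1/k**2 of ANY distribution lies within k SDs (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k ** 2


def normal_within(k):
    """Exact proportion within k SDs for a normal (bell-shaped) distribution."""
    nd = NormalDist()
    return nd.cdf(k) - nd.cdf(-k)


for k in (2, 3):
    print(f"k={k}: Chebyshev at least {chebyshev_minimum(k):.1%}, normal {normal_within(k):.1%}")
```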
True or False: When comparing two populations, the larger the standard deviation, the more dispersion the distribution has, provided that the variable of interest from the two populations has the same unit of measure.
True, because the standard deviation describes how far, on average, each observation is from the typical value. A larger standard deviation means that observations are more distant from the typical value, and therefore, more dispersed.
Σ is called ___ and means ___
Uppercase sigma; it means "the sum of the terms xᵢ"
When making predictions based on regression lines, which of the following is not listed as a consideration? -Use the regression equation for predictions only if the graph of the regression line on the scatter-plot confirms that the regression line fits the points reasonably well. -Use the regression equation for prediction only if the linear correlation coefficient r indicates that there is a linear correlation between two variables. -Use the regression line for prediction only if the data go far beyond the scope of the available sample data. -If the regression equation does not appear to be useful for making predictions, the best predicted value of a variable is its point estimate.
Use the regression line for prediction only if the data go far beyond the scope of the available sample data
Which characteristic of data is a measure of the amount that the data values vary?
Variation
COMPLEMENT RULE
WHEN AN EVENT DOES NOT OCCUR, USE P(NOT A) = 1 − P(A)
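A short sketch of the complement rule, applied to the "at least one" case (whose complement is "none"). The probability 0.1 and the three independent trials are made-up illustration values, not from the course.

```python
def complement(p_a: float) -> float:
    """Complement rule: P(not A) = 1 - P(A)."""
    return 1 - p_a


# Hypothetical example: each of 3 independent trials succeeds with probability 0.1.
p_none = (1 - 0.1) ** 3            # P(no successes in 3 trials) = 0.9 ** 3
p_at_least_one = complement(p_none)
print(f"{p_at_least_one:.3f}")     # prints 0.271
```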
CONTINUOUS DATA
EXAMPLE: A TEMPERATURE READ FROM A THERMOMETER (IT CAN TAKE ANY VALUE IN A RANGE).
Which of the following statements about correlation is true? -We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values increase. -We say that there is a positive correlation between x and y if there is no distinct pattern in the scatter-plot. -We say that there is a negative correlation between x and y if the x-values increase as the corresponding y-values increase. -We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values decrease.
We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values increase.
[St. Dev. 1] + [St. Dev. 2] = [34% + 13.5%] = 47.5% probability (empirical rule)
What is the probability that a randomly selected time falls between 40 and 42 seconds?
Z-SCORE
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a
What is a designed experiment?
An experiment in which a researcher assigns individuals to groups, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group
Which of the following is NOT a property of the standard deviation? -The value of the standard deviation is never negative. -The standard deviation is a measure of variation of all data values from the mean. -When comparing variation in samples with different means, it is good practice to compare the two sample standard deviations. -The units of the standard deviation are the same as the units of the original data.
When comparing variation in samples with different means, it is good practice to compare the two sample standard deviations.
If the data set is symmetric or approximately symmetric and has no outliers, then
best measure of center: mean; best measure of dispersion: standard deviation
The ________ measures the strength of the linear correlation between the paired quantitative x- and y-values in a sample.
linear correlation coefficient r
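A minimal sketch of how the linear correlation coefficient r can be computed from paired data, written out by hand so each step is visible. The function name `pearson_r` and the sample pairs are illustrative only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired sample data (x, y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # y rises with x: positive correlation
print(pearson_r([1, 2, 3], [3, 2, 1]))        # y falls as x rises: negative correlation
```

r is always between −1 and 1; values near ±1 indicate a strong linear relationship, and the sign gives the direction.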
A histogram of a set of data indicates that the distribution of the data is skewed right. Which measure of central tendency will likely be larger, the mean or the median? Why?
When data are either skewed left or skewed right, there are extreme values in the tail, which tend to pull the mean in the direction of the tail. If the distribution of the data is skewed right, there are large observations in the right tail. These observations tend to increase the value of the mean, while having little effect on the median.
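A quick check of the skewed-right reasoning using Python's standard `statistics` module. The data set is made up to have a long right tail.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one large value sits in the right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 20]

print(f"mean = {mean(right_skewed):.2f}, median = {median(right_skewed)}")
# prints: mean = 4.78, median = 3
```

The single large value pulls the mean toward the right tail, while the median stays at the middle observation.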
A negative z-score indicates a data value is less than the mean.
Whenever a data value is less than the mean,
RANGE
Which measure of variation is very sensitive to extreme values?
The mean is called the average by statisticians
Which of the following is NOT a characteristic of the mean?
When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.
Which of the following is NOT a property of the standard deviation?
MEAN
Which of the following is NOT a value in the 5-number summary?
B. In a symmetric and bell-shaped distribution, the mean, median, and mode are the same.
Which of the following is always true? A. For skewed data, the mode is farther out in the longer tail than the median. B. In a symmetric and bell-shaped distribution, the mean, median, and mode are the same. C. The mean and median should be used to identify the shape of the distribution. D. Data skewed to the right have a longer left tail than right tail.
What is the difference between a random sample and a simple random sample?
With a random sample, each individual has the same chance of being selected. With a simple random sample, all samples of the same size have the same chance of being selected.
What is the formula to determine the x-value from z-score?
$x = \mu + z\sigma$ (mean plus z multiplied by the standard deviation)
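The formula above simply runs the z-score calculation backwards. A tiny sketch, with a hypothetical helper name and example values (mean 100, standard deviation 15) chosen for illustration:

```python
def x_from_z(z: float, mu: float, sigma: float) -> float:
    """Invert the z-score formula: x = mu + z * sigma."""
    return mu + z * sigma


# Example values assumed for illustration: mean 100, standard deviation 15.
print(x_from_z(2, 100, 15))   # prints 130
print(x_from_z(-1, 100, 15))  # prints 85
```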
In a symmetric and bell-shaped distribution, are the mean, median and mode the same?
Yes.
An education expert is researching teaching methods and wishes to interview teachers from a particular school district. She randomly selects ten schools from the district and interviews all of the teachers at the selected schools. Does this sampling plan result in a random sample? Simple random sample? Explain
Yes; no. The sample is random because all teachers have the same chance of being selected. It is not a simple random sample because some samples are not possible, such as a sample that includes teachers from schools that were not selected.
Suppose a student earns a 75 on his statistics exam, and his grade has a z-score of 1.5. Since the class did not perform well on the exam, the professor announces that she will adjust the grades by adding 10 points to each score. How will this adjustment change the student's z-score?
The student's z-score will not change, since the adjustment shifts the entire distribution of scores but does not change the relative position of the student's score in the class.
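A sketch that checks this numerically: adding the same constant to every score shifts the mean by that constant but leaves the standard deviation unchanged, so the z-score stays put. The class scores are hypothetical, and the whole class is treated as a population (`pstdev`).

```python
from statistics import mean, pstdev


def z_of(value, data):
    """z-score of value relative to data, treating data as the whole population."""
    return (value - mean(data)) / pstdev(data)


scores = [50, 60, 65, 70, 75]        # hypothetical class scores
curved = [s + 10 for s in scores]    # professor adds 10 points to every score

before = z_of(75, scores)            # student's score before the adjustment
after = z_of(85, curved)             # same student after the adjustment
print(f"before = {before:.4f}, after = {after:.4f}")
```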
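99.7% within 3 standard deviations
(99.7% − 95%) / 2 = 4.7% / 2 = 2.35% of the data lie between 2 and 3 standard deviations from the mean on each side (2.35% on the left, 2.35% on the right).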
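The same subtraction-and-halving idea gives the percentage in every band of the empirical rule. A minimal sketch using only the 68%, 95%, and 99.7% figures:

```python
# Empirical rule: percent of data within k standard deviations of the mean (bell-shaped data).
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}

band_1 = WITHIN[1] / 2                  # mean to 1 SD, one side
band_2 = (WITHIN[2] - WITHIN[1]) / 2    # 1 SD to 2 SD, one side
band_3 = (WITHIN[3] - WITHIN[2]) / 2    # 2 SD to 3 SD, one side
beyond_3 = (100 - WITHIN[3]) / 2        # beyond 3 SD, one side

for label, pct in (("0-1 SD", band_1), ("1-2 SD", band_2),
                   ("2-3 SD", band_3), ("beyond 3 SD", beyond_3)):
    print(f"{label}: {pct:.2f}%")
```

Adding adjacent bands answers questions like "mean to 2 SD on one side": 34% + 13.5% = 47.5%.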
Z-scores are turned into
a standard score. The purpose of z-scores is to identify and describe the exact location of each score in a distribution & to standardize an entire distribution to understand & compare scores from different tests.
Arithmetic mean
the sum of all data values divided by the number of data values
xᵢ means
the individual x values (x₁, x₂, ..., xₙ)
To describe the exact position of a score within a distribution, z-score must transform each x-value into a signed number; positive or negative.
all z-scores above the mean are positive and all z-scores below the mean are negative. The number tells the distance between the score and the mean in terms of the number of standard deviations.
Variables
are the characteristics of the individuals within the population
In a probability histogram, there is a correspondence between ___.
area and probability.
Why is range not a good measure?
Because it uses only the largest and smallest values: it tells you how wide the data span, but not whether the values in between are clustered together or spread out, and it ignores how many observations (n or N) there are.
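Two made-up data sets illustrate the point: they have the same range but clearly different spread, which the standard deviation detects and the range does not.

```python
from statistics import stdev

clustered = [1, 5, 5, 5, 9]   # hypothetical: most values bunched at the center
spread = [1, 1, 5, 9, 9]      # hypothetical: most values pushed to the extremes

for label, data in (("clustered", clustered), ("spread", spread)):
    data_range = max(data) - min(data)
    print(f"{label}: range = {data_range}, s = {stdev(data):.3f}")
```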
The U.S. Department of Housing and Urban Development(HUD) uses the median to report the average price of a home in the United States. Why do you think HUD uses the median?
because the data are skewed right
A ________ is the collection of data from every member of the population. sample census placebo statistic
census
Which of the following is NOT a measure of center? -census -mean -median -mode
census
____ is the difference between two consecutive lower class limits or two consecutive lower class boundaries in a frequency distribution.
class width
Sample arithmetic mean, x̄ (pronounced "x bar")
is computed using sample data; the sample mean is a statistic
A ___ probability of an event is a probability obtained with knowledge that some other event has already occurred.
conditional (knowledge)
Descriptive statistics
consists of organizing and summarizing information
2. When you need to find a proportion between 2 positive OR 2 negative z-scores, you:
consult the *mean to z column* for both. Find proportions & subtract the smaller from the larger.
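The table procedure can be mirrored with Python's `statistics.NormalDist` (Python 3.8+). The helper names below are hypothetical; `mean_to_z` plays the role of the "mean to z" column, and the z-values 0.5 and 1.5 are example inputs.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, standard deviation 1


def mean_to_z(z: float) -> float:
    """Area between the mean (z = 0) and z, like the table's 'mean to z' column."""
    return abs(STD_NORMAL.cdf(z) - 0.5)


def proportion_between_same_side(z1: float, z2: float) -> float:
    """Proportion between two z-scores that are both positive or both negative:
    look up 'mean to z' for each, then subtract the smaller from the larger."""
    a, b = mean_to_z(z1), mean_to_z(z2)
    return max(a, b) - min(a, b)


p = proportion_between_same_side(0.5, 1.5)
print(f"{p:.4f}")
```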
A ___ random variable has infinitely many values associated with measurements.
continuous
A _______ exists between two variables when the values of one variable are somehow associated with the values of the other variable.
correlation
kth percentile
denoted Pk, of a set of data is a value such that k percent of the observations are less than or equal to the value. Percentiles divide a set of data that is written in ascending order into 100 parts; thus 99 percentiles can be determined. Ex. P1 divides the bottom 1% of the observations from the top 99%, P2 divides the bottom 2% of the observations from the top 98% and so on.
Methods used that summarize or describe characteristics of data are called ___ statistics.
descriptive
A _________ experiment allows the researcher to claim causation between an explanatory variable and a response variable
designed
Range is the
difference between the largest data value and the smallest
A ___ random variable has either a finite or a countable number of values.
discrete
Events that are ____ cannot occur at the same time.
disjoint (Disjoint events are mutually exclusive and cannot occur at the same time.)
If every x value is transformed into a z-score, then the distribution of z-scores will have what following properties regarding shape, mean, and standard deviation?
The distribution of z-scores has exactly the same shape as the original distribution of scores; the z-scores always have a mean of 0 and a standard deviation of 1.
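A small sketch of the z-score transformation on a made-up population, confirming the mean of 0 and standard deviation of 1. Each z is a linear rescaling of its x, so the ordering and relative spacing of the values (the shape) are preserved.

```python
from statistics import mean, pstdev

population = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical population data
mu, sigma = mean(population), pstdev(population)

# Transform every raw score into a z-score.
z_scores = [(x - mu) / sigma for x in population]

print(z_scores)
print(f"mean of z = {mean(z_scores)}, sd of z = {pstdev(z_scores)}")
```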
Find the sample variance and standard deviation. 23, 11, 5, 9, 10
do on calc
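As a check on the calculator answer, here is the computation both by the definition (divide by n − 1 for a sample) and with the standard `statistics` module:

```python
from statistics import stdev, variance

data = [23, 11, 5, 9, 10]

# By the definition: s^2 = sum((x - x_bar)^2) / (n - 1)
n = len(data)
x_bar = sum(data) / n
manual_s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)

s2 = variance(data)   # sample variance
s = stdev(data)       # sample standard deviation
print(f"s^2 = {s2:.1f}, s = {s:.2f}")  # prints: s^2 = 45.8, s = 6.77
```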
Response bias
exists when the answers on a survey do not reflect the true feelings of the respondent
Nonresponse bias
exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do
The ___ of a discrete random variable represents the mean value of the outcomes.
expected value
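A minimal sketch of the expected value, E(X) = Σ x·P(x), for a discrete random variable. The distribution used (number of heads in two fair coin tosses) is an illustration, not from the course.

```python
from math import isclose


def expected_value(dist):
    """E(X) = sum of x * P(x) over every value x of a discrete random variable."""
    if not isclose(sum(dist.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())


# Hypothetical distribution: number of heads in two fair coin tosses.
heads = {0: 0.25, 1: 0.5, 2: 0.25}
print(expected_value(heads))  # prints 1.0
```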
In a television advertisement, a company called "Waist Away" claimed the workout program on their set of DVDs would help people lose weight more than any other DVD workout program. To test this claim, an independent company, called "Slim Down," selected one other DVD program. They then randomly assigned half the volunteers to the Waist Away program and the other half to the Slim Down program. Each participant was weighed before they started the program and then regularly participated in their assigned program for one month. After one month, each participant was weighed again. The percent of weight lost was recorded for each person, where negative values indicated a weight gain. What type of study was performed?
experiment
A numerical summary of data is said to be resistant if...
extreme values (very large or small) relative to the data do not affect its value substantially
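A quick illustration of resistance with made-up data: replacing one value with an extreme one moves the mean a lot but leaves the median unchanged.

```python
from statistics import mean, median

original = [10, 12, 13, 15, 16]
with_outlier = [10, 12, 13, 15, 160]  # hypothetical data-entry error: 16 typed as 160

for label, data in (("original", original), ("with outlier", with_outlier)):
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")
```

The median is resistant; the mean is not.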
How to find the median when N or n is odd
find the middle value (after putting the data in order)
The heights of the bars of a histogram correspond to ___ values.
frequency
A ____ indicates the shape and nature of the distribution of a data set.
frequency distribution
Box-and-Whiskers Plot
graph representing information about the five-number summary and outliers for a given data set
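A sketch of the numbers behind a box plot: the five-number summary plus inner and outer fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR, Q1 − 3·IQR, Q3 + 3·IQR, labeled here LIF, UIF, LOF, UOF). It assumes `statistics.quantiles` with its default "exclusive" method; textbooks and calculators compute quartiles in slightly different ways, so your course's values may differ a little. The data set is hypothetical.

```python
from statistics import quantiles


def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum (quartiles via the 'exclusive' method)."""
    q1, q2, q3 = quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)


def fences(q1, q3):
    """Lower/upper inner fences (1.5 * IQR) and outer fences (3 * IQR)."""
    iqr = q3 - q1
    return {"LOF": q1 - 3 * iqr, "LIF": q1 - 1.5 * iqr,
            "UIF": q3 + 1.5 * iqr, "UOF": q3 + 3 * iqr}


data = [1, 2, 3, 4, 5, 6, 7, 8, 30]   # hypothetical data with one large value
low, q1, med, q3, high = five_number_summary(data)
f = fences(q1, q3)
outliers = [x for x in data if x < f["LIF"] or x > f["UIF"]]
print((low, q1, med, q3, high), f, outliers)
```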
Two events A and B are ___ if the occurrence of one does not affect the probability of the occurrence of the other.
independent
Biased samples
Internet polls, in which people online can decide whether to respond; mail-in polls, in which subjects can decide whether to reply; telephone call-in polls, in which newspaper, radio, or television announcements ask that you voluntarily call a special number to register your opinion
A parameter
is a numerical summary of a population
A statistic
is a numerical summary of a sample
Population arithmetic mean, μ (pronounced "mew")
is computed using all the individuals in a population. The population mean is a parameter
Cluster sample
is obtained by dividing the population into groups and selecting all individuals from within a random sample of the groups
Stratified sample
is obtained by dividing the population into homogeneous groups and randomly selecting individuals from each group
Resistant means
a measure of central tendency is resistant if extreme values do not change its value significantly
percentile
provides information about how the data are spread over the interval from the smallest value to the largest value. (Recall that the median divides the lower 50% of a set of data from the upper 50%. The median is a special case of a general concept called the percentile.)
When finding the median of a set of data, what should you always do first?
Put the data in order! Otherwise the median will be wrong.
Mode is primarily a measure of
qualitative central tendency
mode can be used for both
quantitative and qualitative
A ___ variable is a variable that has a single numerical value, determined by chance, for each outcome of a procedure.
random
What measure of variation is very sensitive to extreme values?
range
A ___ histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.
relative frequency
In a ___ distribution, the frequency of a class is replaced with a proportion or percent.
relative frequency
In a boxplot, if the median is to the left of the center of the box and the right whisker is substantially longer than the left whisker, the distribution is skewed_______
right
the 68-95-99.7% rule applies for
distributions that are approximately bell-shaped
The symbol for sample standard deviation is
s
The Range Rule of Thumb roughly estimates the standard deviation of a data set as _______.
s ≈ range / 4
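A sketch of the Range Rule of Thumb next to the actual sample standard deviation, reusing the data from the variance exercise (23, 11, 5, 9, 10). The function name is hypothetical.

```python
from statistics import stdev


def range_rule_of_thumb(data):
    """Rough estimate of the standard deviation: s is about (max - min) / 4."""
    return (max(data) - min(data)) / 4


sample = [23, 11, 5, 9, 10]
print(f"estimate = {range_rule_of_thumb(sample)}, actual s = {stdev(sample):.2f}")
```

With only five values the estimate (4.5) is well below the actual s (about 6.77), a reminder that the rule is only a rough approximation.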
the symbol for sample variance is
s^2
n means
sample size (the number of observations in a sample)
"x-bar" means
sample mean
The ___ for a procedure consists of all possible simple events or all outcomes that cannot be broken down further.
sample space
s
sample standard deviation
What is s² the symbol for?
sample variance
When determining whether there is a correlation between two variables, one should use a ______ to explore the data visually.
scatter-plot
A ___ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.
scatterplot
Deviation score
score minus the mean = how much the score deviates from the mean.
A histogram aids in analyzing the ___ of the data.
shape of the distribution
the symbol for population standard deviation is
σ (lowercase sigma)
the symbol of population variance is
σ² (lowercase sigma squared)
z-score transformation
statistical technique that uses the mean and standard deviation to transform each raw score into a standard score
LOOK AT REAL LINE WITH LOF, LIF, UIF, UOF NUMBERS, AND OUTLIERS IN SLIDE 87!!!
study for midterm
Class width is found by ___.
subtracting a lower class limit from the next consecutive lower class limit
$\bar{x} = \frac{\sum x}{n}$ (x with a line above equals sigma of x over n)
sum of all data values divided by the number of data values
(Σxi)/N means
sum of all x values divided by N; this is the population mean, μ
How to find the median when N or n is even
take the mean of the middle 2 values
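The odd and even cases for the median fit in one short function. This is an illustrative sketch; Python's `statistics.median` already does the same thing.

```python
def median(values):
    """Median: sort first, then take the middle value (odd n)
    or the mean of the two middle values (even n)."""
    ordered = sorted(values)          # always put the data in order first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the middle 2 values


print(median([7, 1, 3]))     # prints 3
print(median([4, 1, 3, 2]))  # prints 2.5
```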
The larger the standard deviation means...
that observations are more distant from the typical value, and therefore more dispersed
Sampling bias means
that the technique used to obtain the sample's individuals tends to favor one part of the population over another
4. When you need to find the P for an area *greater than* a negative Z or *Less than* a positive Z use:
the *Body column*. Because the body column includes the mean & the tail.
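A sketch confirming the "body" rule with `statistics.NormalDist` (Python 3.8+): the area greater than a negative z equals the area less than the matching positive z, and both exceed 0.5 because they include the mean. The helper names and z = 1 are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_greater_than(z: float) -> float:
    return 1 - STD_NORMAL.cdf(z)


def area_less_than(z: float) -> float:
    return STD_NORMAL.cdf(z)


print(f"{area_greater_than(-1.0):.4f}")  # prints 0.8413 (body: tail plus the mean)
print(f"{area_less_than(1.0):.4f}")      # prints 0.8413 (same body, by symmetry)
```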
For data sets having a distribution that approximately bell-shaped, ______ states that about 68% of all data values fall within one standard deviation from the mean.
the Empirical Rule