STAT 231 MIDTERM REVIEW
uniform
(or rectangular) distribution in which every outcome is equally likely
Solve the problem. A group of 49 randomly selected students has a mean age of 22.4 years with a standard deviation of 3.8. Construct a 98% confidence interval for the population mean. (20.3, 24.5) (21.1, 23.7) (19.8, 25.1) (18.8, 26.3)
(21.1, 23.7)
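The interval above can be checked with a short sketch. It assumes the large-sample z-interval (n = 49 is above 30, so s stands in for the population standard deviation); the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Values from the card: n = 49 students, sample mean 22.4, s = 3.8
n, xbar, s = 49, 22.4, 3.8
conf = 0.98

# Two-sided critical value: area (1 - conf) / 2 in each tail
z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)

# Margin of error E = z * s / sqrt(n)
margin = z_crit * s / sqrt(n)
ci_low, ci_high = xbar - margin, xbar + margin

print(round(ci_low, 1), round(ci_high, 1))
```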
Empirical study
Study in which the outcomes are different every trial
Target population
The population whose attributes we are interested in.
convenience
choosing members at convenience
The ________ "r" is to ___, the more ______ and strong the relationship is.
closer, 1 or -1, linear
population
collection of all outcomes, responses, measurements, or counts that are of interest
types of form
linear curvilinear clusters
Solve the problem. Decide if the events A and B are mutually exclusive or not mutually exclusive. A die is rolled. A: The result is a 3. B: The result is an odd number. not mutually exclusive mutually exclusive
not mutually exclusive
Mutually Exclusive Events
don't share any outcomes; cannot occur at the same time
Simple Event
event that consists of a single outcome
Systematic Sample
every member of the population is assigned a number and ordered. A starting member is chosen at random, and then every kth member is selected at a regular interval
Example: Researchers choose a random sample of 1,760 eligible U.S. voters and collect data on their political preferences. This is an example of a _________ __________
sample survey
Poll a sample of individuals with the following q's: while watching TV, do you eat more snacks (a) more than usual (b) less than usual (c) same amount This is an example of a
sample survey
Randomization
subjects are randomly assigned to different groups
sample
subset, or part of the population
mode
the MOST frequently occurring value
When calculating the median and "N" is EVEN, the median is
the mean of the 2 center observations
Q3
the median from Q2 to the end of the set
finding the line that best fits the pattern of the linear relationship
linear regression
Range
maximum - minimum
N = 32, so the median will be the
mean of 2 observations
We use ________ and _______ as measures of center and spread ONLY for reasonably symmetric distributions with no outliers
mean, standard deviation
Find the mean, variance, and standard deviation of the binomial distribution for which n=250 and p=0.69.
mean=172.5 variance=53.475 standard deviation=7.3127 (see test 3 #4)
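A minimal sketch of the binomial formulas behind this answer (mean = np, variance = np(1 - p), standard deviation = square root of the variance); the variable names are illustrative.

```python
from math import sqrt

# Values from the card: n = 250 trials, success probability p = 0.69
n, p = 250, 0.69

mean = n * p                # mu = n * p
variance = n * p * (1 - p)  # sigma^2 = n * p * q, with q = 1 - p
sd = sqrt(variance)         # sigma

print(round(mean, 1), round(variance, 3), round(sd, 4))
```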
Solve the problem. Identify the level of measurement for data that are the number of milligrams of tar in 79 cigarettes.
ratio
cumulative frequency
the sum of the frequencies in that class & previous classes
T/F the lurking variable has an effect on both the explanatory and response variables
true
Solve the problem. What method of data collection would you use to collect data for a study where you would like to determine the chance getting three girls in a family of three children?
use a simulation
Solve the problem. Find the z-score for the value 88, when the mean is 95 and the standard deviation is 7. z = -1.14 z = -1.00 z = 0.85 z = -0.85
z = -1.00
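The z-score is the number of standard deviations a value lies from the mean. A tiny sketch with the card's numbers:

```python
# Values from the card: x = 88, mean = 95, standard deviation = 7
x, mean, sd = 88, 95, 7

# z = (x - mean) / sd
z = (x - mean) / sd

print(z)
```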
State the type of correlation (positive, negative, or zero) you would expect from the set of data. a. The amount of time a candle has been burning and the height of the candle. b. The number of siblings a student has and their grade point average.
a. negative b. zero (test 2 #9-10)
Parameter
the numerical description of a population characteristic
matched-pair design
where subjects are paired up according to a similarity. One subject in each pair is randomly selected to receive one treatment while the other subject receives a different treatment.
In regards to the "4 possibilities for role-type classifications" what would light type and nearsightedness be?
C --> C
In regards to the "4 possibilities for role-type classifications" (2 variables) what would gender and test scores be?
C --> Q
Take category or label values, and place an individual into one of several groups
Categorical
Predictive problem
Involves trying to estimate the future value of a variable.
If the graph is skewed right...
Mean is larger than median
If the graph is skewed LEFT
Mean is smaller than median
The groups receiving different treatments
Treatment groups
A. To make inferences about a population based on information from a random sample
What is the objective of statistics? A. To make inferences about a population based on information from a random sample B. To make inferences about a sample with a high degree of reliability C. To state, beyond a shadow of a doubt, a conclusion about a population D. to make inferences about a random sample based on information from the population E. To use numbers in as many different ways as possible
Descriptive statistics
branch of statistics that involves the organization, summarization, and display of data
Solve the problem. Classify the statement as an example of classical probability, empirical probability, or subjective probability. In California's Pick Three lottery, a person selects a 3-digit number. The probability of winning California's Pick Three lottery is 1/1000 empirical probability classical probability subjective probability
classical probability
Solve the problem. Classify the statement as an example of classical probability, empirical probability, or subjective probability. The probability that a newborn baby is a boy is 1/2. empirical probability classical probability subjective probability
classical probability
Solve the problem. At a local community college, five statistics classes are randomly selected and all of the students from each class are interviewed. What sampling technique is used? systematic random convenience stratified cluster
cluster
Compound Events
concerning 2 or more events and their relationship
Confounding Variable
confuses the study and affects your results because it is related to variables of interest in the study
Qualitative data
consist of attributes, labels, or nonnumerical entries. Ex: Major, Place of birth, Eye color
Solve the problem. The average age of the students in a statistics class is 22 years. Does this statement describe:
descriptive statistics?
Deviation
deviation of an entry (x) in a population data set is the difference between the entry and the mean
Sampling Error
differences that can occur between the sample and population
a return rate that after a certain point fails to increase proportionately to additional outlays of investment
diminishing returns
When describing the relationship between 2 quantitative variables using a scatterplot, we look at:
direction form strength outliers
Solve the problem. Identify the level of measurement for data that are the temperature of 90 refrigerators. nominal interval ordinal ratio
interval
Census
is a count or measure of an entire population. Taking a census provides complete information, but it is often costly and difficult to perform.
Solve the problem. An elementary school claims that the standard deviation in reading scores of its fourth grade students is less than 3.45. Determine whether the hypothesis test for this claim is left-tailed, right-tailed, or two-tailed. right-tailed two-tailed left-tailed
left-tailed
ogive (cumulative frequency graph)
line graph that displays the cumulative frequency of each class at its upper class boundary. The upper boundaries are marked on the horizontal axis, and the cumulative frequencies are marked on the vertical axis
cumulative frequency graph
line graph that represents the cumulative frequency
A _________ variable is a variable that is not among the explanatory or response variables in a study, but could substantially affect your interpretation of the relationship among those variables.
lurking
Find the mean and standard deviation of the sampling distribution of sample means when the mean=50, standard deviation=6, and n=25.
mean = 50 standard error of the mean = 1.2 (see test 3 #15)
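A short sketch of the sampling-distribution formulas used here: the mean of the sample means equals the population mean, and the standard error is sigma divided by the square root of n. Names are illustrative.

```python
from math import sqrt

# Values from the card: population mean 50, standard deviation 6, n = 25
mu, sigma, n = 50, 6, 25

mean_of_sample_means = mu          # mu_xbar = mu
standard_error = sigma / sqrt(n)   # sigma_xbar = sigma / sqrt(n)

print(mean_of_sample_means, standard_error)
```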
We use _________ and __________ as measures of center and spread for all other cases.
median, IQR
Solve the problem. Identify the level of measurement for data that are a list of 1247 social security numbers. ratio ordinal nominal interval
nominal
Solve the problem. Identify the level of measurement for data that are the nationalities listed in a recent survey (for example, Asian, European, or Hispanic).
nominal
Ex 1: In a recent survey, 834 employees in the United States were asked if they thought their jobs were highly stressful. Of the 834 respondents, 517 said yes. Identify the population and the sample. Describe the sample data set
population: responses of all employees in the US sample: responses of the 834 employees surveyed data set: 517 who said yes & 317 who said no; there are 834 total and 517 said yes, so 834 (all) - 517 (yes) = 317 (no)
an increase in one of the variables is associated with an increase in the other
positive relationship
Solve the problem. The names of 70 contestants are written on 70 cards. The cards are placed in a bag, and three names are picked from the bag. What sampling technique was used?
random
Identify the data set's level of measurement. number of milligrams of tar in 85 cigarettes
ratio
Identify the data set's level of measurement. the lengths (in minutes) of the top ten movies with respect to ticket sales in 2007
ratio
Solve the problem. The numbers of touchdowns scored by a major university in five randomly selected games are given below. Identify the level of measurement. 1 5 4 5 5
ratio
The dependent variable; the outcome of the study
response variable
choosing the individuals from the population that will be included in the sample
sampling
list of potential individuals to be sampled - doesn't match the population of interest; may/may not be biased
sampling frame
Complete the frequency distribution (midpoint, relative frequency, and cumulative frequency) and use the distribution to construct a frequency histogram. Class; f(x) 0-7; 8 8-15; 8 16-23; 3 24-31; 3 32-39; 3
see quiz 2 #2
Complement of Event E
set of all outcomes in a sample space that are not included in event E E' (E prime)
occurs whenever including a lurking variable causes you to rethink the direction of an association
Simpson's paradox
SAT math scores of 1,000 future engineers and scientists is an example of
skewed left distribution
"SALARY.... Most people earn the low/medium range of salaries except CEOs, athletes, etc." This is an example of a _______ __________ ___________________
skewed right distribution
Age of death from trauma (car accidents, murder, suicide) is an example of
skewed right distribution
outlier
the most removed from the entire set
Statistics
the science of collecting, organizing, analyzing, and interpreting data in order to make a decision
Goal of Statistics
to collect data from a small part (subset) of a larger group so that we can learn something about the larger group
Solve the problem. The data below are the final exam scores of 10 randomly selected statistics students and the number of hours they studied for the exam. Find the equation of the regression line for the given data. hrs x: 3 5 2 8 2 4 4 5 6 Scores Y: 65 80 60 88 66 78 85 90 90 71 y= -56.11x - 5.044 y= 56.11x - 5.044 y= 5.044x + 56.11 y= -5.044x + 56.11
y= 5.044x + 56.11
individuals report variables' values themselves, frequently by giving their opinions
sample survey
Solve the problem. The data below are the temperatures on randomly chosen days during a summer class and the number of absences on those days. Calculate the correlation coefficient, r. temp x: 74 87 93 92 90 100 77 102 82 # of absences y: 5 9 12 12 10 17 6 17 7 0.881 0.819 0.980 0.890
0.980
Solve the problem. A study of 1000 randomly selected flights of a major airline showed that 782 of the flights arrived on time. What is the probability of a flight arriving on time? 500/109 391/500 109/500 500/391
391/500
Solve the problem. The data below are the ages and systolic blood pressures (measured in millimeters of mercury) of 9 randomly selected adults. Find the standard error of estimate, se, given that age x: 38 41 45 48 51 53 57 61 65 pressure y: 116 120 123 131 142 145 148 150 152 5.572 3.099 6.981 4.199
4.199
Given the equation of a regression line is y = 5x - 6, what is the best predicted value for y given x = 10? Assume that the variables x and y have a significant correlation. 44 55 9 56
44
is unitless; just measures the strength of a linear relationship
"r"
Solve the problem. A multiple regression equation is y = -35,000 + 130x1 + 20,000x2, where x1 is a person's age, x2 is the person's grade point average in college, and y is the person's income. Predict the income for a person who is 26 years old and had a college grade point average of 2.3. $485,299 $84,380 $49,380 $14,380
$14,380
Solve the problem. A random sample of 10 parking meters in a beach community showed the following incomes for a day. Assume the incomes are normally distributed. 3.60 4.50 2.8 6.3 2.6 5.2 6.75 4.25 8 3 Find the 95% confidence interval for the true mean. ($1.35, $2.85) ($2.11, $5.34) ($3.39, $6.01) ($4.81, $6.31)
($3.39, $6.01)
Solve the problem. Find the z-score for which 99% of the distribution's area lies between -z and z. (-1.645, 1.645) (-2.575, 2.575) (-1.96, 1.96) (-1.28, 1.28)
(-2.575, 2.575)
Sample variance
(1/(n-1)) * sum of (x - mean)^2
Solve the problem. In a recent study of 42 eighth graders, the mean number of hours per week that they watched television was 19.6 with a standard deviation of 5.8 hours. Find the 98% confidence interval for the population mean. (19.1, 20.4) (17.5, 21.7) (18.3, 20.9) (14.1, 23.2)
(17.5, 21.7)
Solve the problem. Construct a 98% confidence interval for the population mean, μ. Assume the population has a normal distribution. A study of 14 bowlers showed that their average score was 192 with a standard deviation of 8. (186.3, 197.7) (328.3, 386.9) (115.4, 158.8) (222.3, 256.1)
(186.3, 197.7)
Solve the problem. Construct a 95% confidence interval for the population mean, μ. Assume the population has a normal distribution. A random sample of 16 fluorescent light bulbs has a mean life of 645 hours with a standard deviation of 31 hours. (321.7, 365.8) (628.5, 661.5) (531.2, 612.9) (876.2, 981.5)
(628.5, 661.5)
Solve the problem. Construct a 95% confidence interval for the population mean, μ. Assume the population has a normal distribution. A sample of 25 randomly selected students has a mean test score of 81.5 with a standard deviation of 10.2. (87.12, 98.32) (77.29, 85.71) (66.35, 69.89) (56.12, 78.34)
(77.29, 85.71)
Solve the problem. A random sample of 40 students has a test score with average = 81.5 and s = 10.2. Construct the confidence interval for the population mean, μ, if c = 0.90. (66.3, 89.1) (78.8, 84.2) (51.8, 92.3) (71.8, 93.5)
(78.8, 84.2)
Choose the portion of the sentence that illustrates descriptive statistics. Students who came into office hours (a) had a mean score of 88 on the first test. It would appear that (b) attending office hours improves performance.
(a) (see test 1 #1)
Solve the problem. Calculate the correlation coefficient, r, for the data below. x: -12 -10 -3 -6 -8 -9 -7 -5 -4 -11 y: 14 -3 11 0 1 4 8 -2 9 10 -0.104 -0.549 -0.581 -0.132
-0.104
Solve the problem. Determine the standardized test statistic, z, to test the claim about the population proportion p = 0.250 given n = 48 and p̂ = 0.231. Use α = 0.01. -1.18 -0.304 -2.87 -0.23
-0.304
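A sketch of the one-proportion z statistic, reading 0.250 as the claimed population proportion and 0.231 as the sample proportion (that reading is an assumption about the card's notation). The standard error uses the claimed proportion.

```python
from math import sqrt

# Values from the card: claimed proportion 0.250, sample proportion 0.231, n = 48
p0, p_hat, n = 0.250, 0.231, 48

# z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

print(round(z, 3))
```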
Solve the problem. Use a standard normal table to find the z-score that corresponds to the cumulative area of 0.01. -0.255 0.255 -2.33 2.33
-2.33
Solve the problem. Use the regression equation to predict the value of y for x = 3.2. Assume that the variables x and y have a significant correlation. x: -5 -3 4 1 -1 -2 0 2 3 -4 y: 11 6 -6 -1 3 4 1 -4 -5 8 -5.274 0.541 6.790 4.311
-5.274
Interval Level of Measurement
-data can be ordered -meaningful differences between data entries can be calculated -zero entry is just a position
Ratio Level of Measurement
-zero means nonexistent -meaningful differences are multiples of other data entries
Solve the problem. The distribution of blood types for 100 Americans is listed in the table. If one donor is selected at random, find the probability of selecting a person with blood type A+ or A-. Blood type: o+ o- A+ A- B+ B- AB+ AB- 37 6 34 6 10 2 4 1 .4 .34 .6 .45
.4
A coin is tossed. Find the probability that the result is heads. .1 1 0.5 0.9
.5
Solve the problem. At the local racetrack, the favorite in a race has odds 3:2 in favor of winning. What is the probability that the favorite wins the race? 0.8 0.2 0.6 0.4
.6
Solve the problem. The events A and B are mutually exclusive. If P(A) = 0.2 and P(B) = 0.1, what is P(A and B)? 0.3 0.02 0 0.5
0
Solve the problem. The events A and B are mutually exclusive. If P(A) = 0.2 and P(B) = 0.1, what is P(A and B)? 0.5 0.3 0 0.02
0
Solve the problem. Assume that the heights of men are normally distributed with a mean of 67.9 inches and a standard deviation of 2.8 inches. If 64 men are randomly selected, find the probability that they have a mean height greater than 68.9 inches. 0.8188 0.0021 9.9671 0.9005
0.0021
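This answer comes from the sampling distribution of the mean, not the distribution of individual heights. A minimal sketch using the standard library's normal distribution; names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the card: mean 67.9 in, standard deviation 2.8 in, n = 64 men
mu, sigma, n = 67.9, 2.8, 64

# Sample means are normal with the same mean and standard error sigma / sqrt(n)
sampling_dist = NormalDist(mu, sigma / sqrt(n))

# P(sample mean > 68.9) is the upper-tail area
p_greater = 1 - sampling_dist.cdf(68.9)

print(round(p_greater, 4))
```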
Suppose you are using α = 0.01 to test the claim that μ ≤ 29 using a P-value. You are given the sample statistics n = 40, x̄ = 30.8, and s = 4.3. Find the P-value. 0.1030 0.0211 0.0040 0.9960
0.0040
Solve the problem. Determine the margin of error if the grade point averages for 10 randomly selected students from a class of 125 students has a mean of x̄ = 2.7. Assume the grade point average of the 125 students has a mean of μ = 2.9. 0.2 2.6 -0.2 2.8
0.2
Solve the problem. A survey of 250 homeless persons showed that 86 were veterans. Find a point estimate, p̂, for the population proportion of homeless persons who are veterans. 0.344 0.524 0.256 0.656
0.344
Solve the problem. The distribution of Master's degrees conferred by a university is listed in the table. (assume that a student majors in only one subject) Major Frequency Mathematics 230 English 206 Engineering 86 Business 176 Education 222 What is the probability that a randomly selected student with a Master's degree majored in English or Mathematics? Round your answer to three decimal places. 0.224 0.474 0.526 0.250
0.474
Solve the problem. Use the standard normal distribution to find P(0 < z < 2.25). 0.5122 0.7888 0.4878 0.8817
0.4878
Solve the problem. Find the standard error of estimate, se, for the data below, given that y = -2.5x x: -1 -2 -3 -4 y: 2 6 7 10 0.532 0.349 0.675 0.866
0.866
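A sketch of the standard error of estimate for this card, taking the given line y = -2.5x as the fitted regression line. The formula is se = sqrt(sum of squared residuals / (n - 2)); names are illustrative.

```python
from math import sqrt

# Data from the card
x = [-1, -2, -3, -4]
y = [2, 6, 7, 10]

# Predicted values from the given line y-hat = -2.5x
predicted = [-2.5 * xi for xi in x]

# Sum of squared residuals, then divide by n - 2 degrees of freedom
sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
se = sqrt(sse / (len(x) - 2))

print(round(se, 3))
```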
Solve the problem. The table lists the smoking habits of a group of college students. no yes Heavy man 135 52 5 woman 187 21 5 If a student is chosen at random, find the probability of getting someone who is a man or a non-smoker. Round your answer to three decimal places. 0.941 0.820 0.948 0.936
0.936
Solve the problem. If one card is drawn from a standard deck of 52 playing cards, what is the probability of drawing a red card? 1/2 1/13 1/52 1/4
1/2
Solve the problem. If one card is drawn from a standard deck of 52 playing cards, what is the probability of drawing a red card? 1/2 1/52 1/4 1/13
1/2
Solve the problem. Use Bayes' theorem to solve this problem. A storeowner purchases stereos from two companies. From Company A, 600 stereos are purchased and 11% are found to be defective. From Company B, 350 stereos are purchased and 1% are found to be defective. Given that a stereo is defective, find the probability that it came from Company A. 77/139 132/139 7/139 12/139
132/139
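Bayes' theorem here reduces to counting defective stereos from each company. A minimal sketch with exact fractions; names are illustrative.

```python
from fractions import Fraction

# Values from the card
bought_a, defect_rate_a = 600, Fraction(11, 100)
bought_b, defect_rate_b = 350, Fraction(1, 100)

# Expected defective units from each company
defective_a = bought_a * defect_rate_a   # 66
defective_b = bought_b * defect_rate_b   # 7/2

# P(A | defective) = defective from A / all defective
posterior_a = defective_a / (defective_a + defective_b)

print(posterior_a)
```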
Solve the problem. Identify the midpoint of the first class. Weight (in Pounds) Frequ. 135-139 6 140-144 4 145-149 11 150-154 15 155-160 8 137 139 135 11
137
Solve the problem. The cholesterol levels (in milligrams per deciliter) of 30 adults are listed below. Find Q1. 154 156 165 165 170 171 172 180 184 185 189 189 190 192 195 198 198 200 200 200 205 205 211 215 220 220 225 238 255 265 171 180 184.5 200
180
Provide an appropriate response. The top 14 speeds, in miles per hour, for Pro-Stock drag racing over the past two decades are listed below. Find the median speed. 118.1 202.2 190.1 201.4 191.3 201.4 192.2 201.2 193.2 201.2 194.5 199.2 196.0 196.2 195.8 201.2 196.7 196.1
196.1
Solve the problem. The top 14 speeds, in miles per hour, for Pro-Stock drag racing over the past two decades are listed below. Find the median speed. 181.1 202.2 190.1 201.4 191.3 201.4 192.2 201.2 193.2 201.2 194.5 199.2 196.0 196.2 201.2 192.2 196.7 196.1
196.1
Solve the problem. The top 14 speeds, in miles per hour, for Pro-Stock drag racing over the past two decades are listed below. Find the median speed. 181.1 202.2 190.1 201.4 191.3 201.4 192.2 201.2 193.2 201.2 194.5 199.2 196.0 196.2 201.2 196.1 196.7 192.2
196.1
Solve the problem. Find the value of E, the margin of error, for c = 0.90, n = 10 and s = 3.6. 2.06 2.09 0.66 1.57
2.09
Solve the problem. The average IQ of students in a particular calculus class is 110, with a standard deviation of 5. The distribution is roughly bell-shaped. Use the Empirical Rule to find the percentage of students with an IQ above 120. 15.85% 2.5% 13.5% 11.15%
2.5%
Solve the problem. The average IQ of students in a particular calculus class is 110, with a standard deviation of 5. The distribution is roughly bell-shaped. Use the Empirical Rule to find the percentage of students with an IQ above 120. 2.5% 11.15% 15.85% 13.5%
2.5%
Provide an appropriate response. Grade points are assigned as follows: A = 4, B = 3, C = 2, D = 1, and F = 0. Grades are weighted according to credit hours. If a student receives an A in a four-unit class, a D in a two-unit class, a B in a three-unit class and a C in a three-unit class, what is the student's grade point average?
2.75
Solve the problem. Grade points are assigned as follows: A = 4, B = 3, C = 2, D = 1, and F = 0. Grades are weighted according to credit hours. If a student receives an A in a four-unit class, a D in a two-unit class, a B in a three-unit class and a C in a three-unit class, what is the student's grade point average? 2.75 2.50 3.00 1.75
2.75
A freshman's first semester grades are as follows: Biology: - Grade: C - Grade Points Earned: 2.0 - Credits: 4 Calculus 1: - Grade: B - Grade Points Earned: 3.0 - Credits: 4 Spanish: - Grade: B+ - Grade Points Earned: 3.3 - Credits: 3 Philosophy: - Grade: A- - Grade Points Earned: 3.6 - Credits: 3 Creative Writing: - Grade: A - Grade Points Earned: 4.0 - Credits: 1 Using the credits as weights, use a weighted mean to calculate the student's grade point average for the first semester. Round to the nearest hundredth.
2.98 (see test 1 #19)
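A sketch of the weighted mean for this card: multiply each grade-point value by its credits, sum, and divide by total credits. The list layout is just one illustrative way to hold the data.

```python
# (grade points, credits) pairs from the card
grades = [
    (2.0, 4),  # Biology, C
    (3.0, 4),  # Calculus 1, B
    (3.3, 3),  # Spanish, B+
    (3.6, 3),  # Philosophy, A-
    (4.0, 1),  # Creative Writing, A
]

# Weighted mean = sum(x * w) / sum(w)
total_points = sum(points * credits for points, credits in grades)
total_credits = sum(credits for _, credits in grades)
gpa = total_points / total_credits

print(round(gpa, 2))
```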
A restaurant offers a $12 dinner special that has 5 choices of an appetizer, 10 choices for an entree, and 4 choices for a dessert. How many different meals are available when you select an appetizer, an entree, and a dessert?
200 meals (see quiz 4)
The box-and-whisker plot (test 2 #4) shows the cost per DVD for a sample of 44 DVDs. How many DVDs cost between $14 and $20?
22 DVDs
Solve the problem. A city in the Pacific Northwest recorded its highest temperature at 74 degrees Fahrenheit and its lowest temperature at 23 degrees Fahrenheit for a particular year. Use this information to find the upper and lower limits of the first class if you wish to construct a frequency distribution with 10 classes. 23-27 18-28 23-28 23-29
23-28 (range = 74 - 23 = 51; 51/10 = 5.1, rounded up to a class width of 6)
Solve the problem. Given that P(A or B)=1/6, P(A)=1/7, and P(A and B)=1/8, find P(B) 73/168 1/16 31/168 25/168
25/168
Solve the problem. The grade point averages for 10 students are listed below. Find the range of the data. 2.0 3.2 1.8 2.9 .9 4.0 3.3 2.9 3.6 .8 3.2 1.4 2.45 2.8
3.2
Solve the problem. Find the critical value, tc for c = 0.99 and n = 10. 2.2821 2.262 1.833 3.250
3.250
Solve the problem. For the following data set, approximate the sample standard deviation. Height (in inches) Frequency 50-52 5 53-55 8 56-58 12 59-61 13 62-64 11 2.57 .98 1.86 3.85
3.85
Solve the problem. If an individual is selected at random, what is the probability that he or she has a birthday in July? Ignore leap years. 31/365 1/365 12/365 364/365
31/365
Solve the problem. If a couple plans to have five children, how many gender sequences are possible? 5 32 25 3125
32
Solve the problem. SAT verbal scores are normally distributed with a mean of 446 and a standard deviation of 91. Use the Empirical Rule to determine what percent of the scores lie between 355 and 446. 47.5% 34% 68% 49.9%
34%
Solve the problem. The ages of 10 grooms at their first marriage are listed below. Find the midquartile. 35.1 24.3 46.6 41.6 32.9 26.8 39.8 21.5 45.7 33.9 43.7 34.5 34.1 34.2
34.2
B. Stratified
35 sophomores, 39 juniors, and 29 seniors are randomly selected from 475 sophomores, 517 juniors, and 550 seniors at a certain high school. What sampling technique is used? A. Simple random B. Stratified C. Convenience
Solve the problem. The birth weights for twins are normally distributed with a mean of 2353 grams and a standard deviation of 647 grams. Use z-scores to determine which birth weight could be considered unusual. 1200g 2000g 3600g 2353g
3600 g
Professor Duckett has 4 Celtics jerseys, 7 Celtics hats, and 13 Celtics t-shirts. How many different outfits can he choose from when he goes to the game?
364 (test 2 #13)
Solve the problem. A study of 1000 randomly selected flights of a major airline showed that 782 of the flights arrived on time. What is the probability of a flight arriving on time? 391/500 500/391 109/500 500/109
391/500
Solve the problem. A researcher found a significant relationship between a person's age, x1, the number of hours a person works per week, x2, and the number of accidents, y, the person has per year. The relationship can be represented by the multiple regression equation y = -3.2 + 0.012x1 + 0.23x2. Predict the number of accidents per year (to the nearest whole number) for a person whose age is 41 and who works 31 hours per week. 5 6 4 3
4
Solve the problem. A tire company finds the lifespan for one brand of its tires is normally distributed with a mean of 47,500 miles and a standard deviation of 3000 miles. If the manufacturer is willing to replace no more than 10% of the tires, what should be the approximate number of miles for a warranty? 51,340 52,435 42,565 43,660
43,660
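The warranty mileage is the 10th percentile of the lifespan distribution. The sketch below uses the exact inverse normal CDF, so it lands a few miles below the card's 43,660, which comes from the rounded table value z = -1.28.

```python
from statistics import NormalDist

# Values from the card: mean 47,500 miles, standard deviation 3,000 miles
lifespan = NormalDist(mu=47500, sigma=3000)

# Mileage below which only 10% of tires fail
warranty_miles = lifespan.inv_cdf(0.10)

# Same calculation with the rounded table value z = -1.28
table_estimate = 47500 + (-1.28) * 3000

print(round(warranty_miles), round(table_estimate))
```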
The median is the value that separates data into the bottom ______ and top 50%
50%
Solve the problem. Seven guests are invited for dinner. How many ways can they be seated at a dinner table if the table is straight with seats only on one side? 40,320 720 4 5040
5040
Solve the problem. Identify the midpoint of the first class. Height (in inches) Frequency 50-52 5 53-55 8 56-58 12 59-61 13 62-64 11 52 50 51 49.5
51
Provide an appropriate response. Use the ogive below to approximate the cumulative frequency for 24 hours. Student Answer: 75 27 17 63
63
Solve the problem. Lengths of pregnancies of humans are normally distributed with a mean of 268 days and a standard deviation of 16 days. Use the Empirical Rule to determine the percentage of women whose pregnancies are between 252 and 284 days. 50% 68% 95% 99.7%
68%
Here are the number of hours that nine students spend on the computer on a typical day: 1 6 7 5 5 8 11 12 15 The median number of hours spent on the computer is:
7
Provide an appropriate response. Find the sample standard deviation. 2 6 15 9 11 22 1 4 8 19
7.1
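A sketch of the sample standard deviation for this data, computed from the (n - 1) formula and cross-checked against the standard library's `statistics.stdev`.

```python
from math import sqrt
from statistics import stdev

# Data from the card
data = [2, 6, 15, 9, 11, 22, 1, 4, 8, 19]

n = len(data)
mean = sum(data) / n

# Sample variance divides the sum of squared deviations by n - 1
sample_variance = sum((x - mean) ** 2 for x in data) / (n - 1)
sample_sd = sqrt(sample_variance)

print(round(sample_sd, 1), round(stdev(data), 1))
```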
Use the grouped data formulas to find the indicated mean or standard deviation. A random sample of 30 high school students is selected. Each student is asked how many hours he or she spent on the Internet during the previous week. The results are shown in the histogram. Estimate the sample mean. 8.1 8.3 7.9 7.7
7.9
Provide an appropriate response. Use the histogram below to approximate the mode heart rate of adults in the gym. a) 70 b) 2 c) 55 d) 42
70
Provide an appropriate response. The scores of the top ten finishers in a recent golf tournament are listed below. Find the median score. 67 67 68 71 72 72 72 72 73 76
72
Solve the problem. The data below are the number of absences and the final grades of 9 randomly selected students from a statistics class. What is the best predicted value for y given Assume that the variables x and y have a significant correlation. # of absences x: 0 3 6 4 9 2 15 8 Final grade y: 98 86 80 82 71 92 55 76 76 78 79 77
77
Solve the problem. The data below are the number of absences and the final grades of 9 randomly selected students from a statistics class. What is the best predicted value for y given x = 7? Assume that the variables x and y have a significant correlation. # of absences x: 0 3 6 4 9 2 15 8 5 Final Grade y: 98 86 80 82 71 92 76 78 79 77
77
Solve the problem. The lengths of pregnancies of humans are normally distributed with a mean of 268 days and a standard deviation of 15 days. A baby is premature if it is born three weeks early. What percentage of babies are born prematurely? 6.81% 8.08% 10.31% 9.21%
8.08%
Solve the problem. In a survey of 2480 golfers, 15% said they were left-handed. The survey's margin of error was 3%. Find the confidence interval for p. 84.5% 98.5% 95% 80%
84.5%
Find the weighted mean of the data. The scores and the percents of the final grade for a statistics student are shown below. What is the student's mean score? HW: 85, 5% Quizzes: 80, 35% Project: 100, 20% Speech: 90, 15% Final exam: 93, 25%
89% (see quiz 2)
If Q3 is the median of the top 1/2 of the data, since there are 16 observations in that half, Q3 is the mean of the (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) & (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) observations in that half
8th and 9th observations
Use the grouped data formulas to find the indicated mean or standard deviation. The manager of a bank recorded the amount of time a random sample of customers spent waiting in line during peak business hours one Monday. The frequency distribution below summarizes the results. Approximate the sample mean. Round your answer to one decimal place. 9.0 13.5 9.2 7.7
9.2
Solve the problem. The data below are the temperatures on randomly chosen days during a summer class and the number of absences on those days. Construct a 95% prediction interval for y, the number of days absent, given x = 95 degrees, y = 0.449x - 30.27 and se = 0.934. temp x: 72 85 91 90 88 98 75 100 80 # of absences y: 3 7 10 10 8 15 4 15 5 4.321 < y < 6.913 3.176 < y < 5.341 9.957 < y < 14.813 6.345 < y < 8.912
9.957 < y < 14.813
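A sketch of the prediction interval, using the regression line and se given on the card. The critical value 2.365 is an assumption taken from a t table (95% confidence, n - 2 = 7 degrees of freedom), since the standard library has no t distribution.

```python
from math import sqrt

# Temperatures from the card; line and se are as given there
temps = [72, 85, 91, 90, 88, 98, 75, 100, 80]
slope, intercept = 0.449, -30.27
se = 0.934
x0 = 95
t_crit = 2.365  # assumed t-table value for c = 0.95, d.f. = 7

n = len(temps)
x_mean = sum(temps) / n
sxx = sum((x - x_mean) ** 2 for x in temps)  # sum of squared deviations of x

# Point prediction and margin E = t * se * sqrt(1 + 1/n + (x0 - x_mean)^2 / Sxx)
y_hat = slope * x0 + intercept
margin = t_crit * se * sqrt(1 + 1 / n + (x0 - x_mean) ** 2 / sxx)
pi_low, pi_high = y_hat - margin, y_hat + margin

print(round(pi_low, 3), round(pi_high, 3))
```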
Provide an appropriate response. The mean score of a competency test is 77, with a standard deviation of 4. Use the Empirical Rule to find the percentage of scores between 69 and 85. (Assume the data set has a bell-shaped distribution.) 95% 68% 50% 99.7%
95%
Solve the problem. A competency test has scores with a mean of 69 and a standard deviation of 4. A histogram of the data shows that the distribution is normal. Use the Empirical Rule to find the percentage of scores between 61 and 77. 95% 68% 99.7% 50%
95%
Solve the problem. Assume that the heights of men are normally distributed with a mean of 69.0 inches and a standard deviation of 2.8 inches. The U.S. Marine Corps requires that men have heights between 64 and 78 inches. Find the percentage of men meeting these height requirements. 31.12% 96.26% 99.93% 3.67%
96.26%
For distributions with a "Normal shape" (kinda shaped like a unimodal, symmetric bell curve) ----approximately ______% falls within 3 SD of the mean
99.7%
Right-skewed data
A data set where mean >> median
Attribute
A function of variates collected in a study
When the experiment is repeated many times the proportion of intervals that will contain the true value will converge to p%
A p% confidence interval for a variable satisfies _____________________________________________________________.
No evidence for association
A relative risk value close to 1 indicates _________________________.
Strong evidence for association
A relative risk value far from 1 indicates _________________________.
Sampling Error
A sampling error is the difference between results of a sample and those of the population.
Simulation
A simulation is the use of a mathematical or physical model to reproduce the conditions of a situation or process. Collecting data often involves the use of computers. Simulations allow you to study situations that are impractical or even dangerous to create in real life, and often they save time and money. Ex: Automobile manufacturers use simulations with dummies to study the effects of crashes on humans.
D. No, because the probability of success is different for each trial
A state lottery randomly chooses 6 balls numbered from 1 through 43 without replacement. You choose 6 numbers and purchase a lottery ticket. The random variable represents the number of matches on your ticket to the numbers drawn in the lottery. Determine whether this experiment is binomial A. No, there are more than two outcomes for each trial B. Yes, there are a fixed number of trials and the trials are independent of each other C. Yes, the probability of success is the same for each trial D. No, because the probability of success is different for each trial
C. 0.555 B. 0.5 D. 0.6 D. The conditional distribution of opinion given gender
A student organization is trying to decide whether or not to offer more movies on campus. A random sample of 1000 students was asked if they were in favor of more movies on campus. The results by gender are shown in the table below In Favor 330 (Male) 225 (Female) No Opinion 165 (Male) 180 (Female) Opposed 55 (Male) 45 (Female) What proportion of the sampled students is in favor of more movies on campus? A. 0.33 B. 0.5 C. 0.555 D. 0.6 What proportion of the sampled females is in favor of more movies on campus? A. 0.33 B. 0.5 C. 0.555 D. 0.6 What proportion of sampled males is in favor of more movies on campus? A. 0.33 B. 0.5 C. 0.555 D. 0.6 To answer the original question regarding whether or not to offer more movies on campus, which distribution should the student organization study? A. The joint distribution of gender and opinion B. The marginal distribution of gender C. The conditional distribution of gender given opinion D. The conditional distribution of opinion given gender
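The three proportions in the movie-survey card can be reproduced from the two-way table. This is a small sketch; the dictionary layout and variable names are illustrative.

```python
# Counts from the card's two-way table (gender by opinion).
counts = {
    "male": {"favor": 330, "no_opinion": 165, "opposed": 55},
    "female": {"favor": 225, "no_opinion": 180, "opposed": 45},
}

total = sum(sum(row.values()) for row in counts.values())

# Marginal proportion in favor: all "favor" counts over the whole sample.
p_favor = sum(row["favor"] for row in counts.values()) / total

# Conditional proportions: restrict the denominator to one gender.
p_favor_given_female = counts["female"]["favor"] / sum(counts["female"].values())
p_favor_given_male = counts["male"]["favor"] / sum(counts["male"].values())

print(p_favor, p_favor_given_female, p_favor_given_male)
```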
The experiment is binomial. Success in this experiment is selecting a worker who is reducing the amount of vacation.
A survey asks 1200 workers, "Has the economy forced you to reduce the amount of vacation you plan to take this year?" Forty-six percent of those surveyed say they are reducing the amount of vacation. Twenty workers participating in the survey are randomly selected. The random variable represents the number of workers who are reducing the amount of vacation. Decide whether the experiment is a binomial experiment.
Convenience Sample
A type of sample that often leads to biased studies (so it is not recommended) is a convenience sample. A convenience sample consists only of members of the population that are easy to get.
Outlier
Any point outside the whiskers of a box plot is an __________________________.
Q1 [32] - 1.5 (9.5) [IQR] = 17.75 Q3 [41.5] + 1.5 (9.5) [IQR] = 55.75 What does this tell us?
Anything below 17.75 and above 55.75 is a suspected outlier.
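The 1.5 × IQR rule from the card above, written out as a small sketch. The helper name `is_suspected_outlier` is illustrative.

```python
# Quartiles from the card.
q1, q3 = 32, 41.5
iqr = q3 - q1  # interquartile range

# Fences: values beyond these are flagged as suspected outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def is_suspected_outlier(value):
    """Return True if value lies outside the 1.5 * IQR fences."""
    return value < lower_fence or value > upper_fence

print(lower_fence, upper_fence)
```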
Shrinks
As p increases the 100p% likelihood interval __________________________.
Shoe sizes - since the shoe sizes of women are usually smaller than those of men, we'd expect 2 places where the bars are higher. This is an example of...
Bimodal graph
The key to establishing ____________ is to rule out the possibility of any ________ variable, or in other words, to ensure that individuals differ only with respect to the values of the ________ variable
Causation, lurking, explanatory
Median
Central observation when they are arranged in increasing order, or average of two central ones
Quantitative Data
Consists of numerical measurements or counts
The group that tries to quit w/o drugs or therapy
Control group
Stresses that the values of the experiment's explanatory variables have been assigned by researchers as opposed to occurring naturally
Controlled experiment
Null hypothesis
Conventional wisdom, if there is not evidence to reject it we fail to reject it.
Data at the interval level of measurement
Data at the interval level of measurement can be ordered, and meaningful differences between data entries can be calculated. At the interval level, a zero entry simply represents a position on a scale; the entry is not an inherent zero
Data at the ratio level of measurement
Data at the ratio level of measurement are similar to data at the interval level, with the added property that a zero entry is an inherent zero. A ratio of two data entries can be formed so that one data entry can be meaningfully expressed as a multiple of another.
C. 19.5 (Lies in the middle of the 20 values)
Data on the mileage of 20 randomly selected cars is listed below (ordered for convenience) 12 13 15 16 16 17 18 18 19 19 20 20 22 23 24 26 26 27 27 29 What is the median mileage for these 20 cars? A. 17.5 B. 19 C. 19.5 D. 20
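With an even number of observations, the median is the mean of the two central values. A quick sketch with the mileage data from the card:

```python
from statistics import median

mileage = [12, 13, 15, 16, 16, 17, 18, 18, 19, 19,
           20, 20, 22, 23, 24, 26, 26, 27, 27, 29]

# For n = 20 the two central observations are the 10th and 11th sorted values.
ordered = sorted(mileage)
middle_pair = ordered[9:11]
median_mileage = median(mileage)

print(middle_pair, median_mileage)
```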
The events ARE NOT mutually exclusive, since there IS AT LEAST 1 PRESIDENTIAL CANDIDATE WHO LOST THE POPULAR VOTE and LOST THE ELECTION
Determine whether the events in the accompanying Venn diagram are mutually exclusive
The events ARE NOT mutually exclusive, since there are SOME movies that are rated PG-13 and RECEIVE MOSTLY POSITIVE REVIEWS
Determine whether the events in the accompanying Venn diagram are mutually exclusive
D. False, the probability that A or B will occur is P(A or B) = P(A) + P(B) - P(A and B)
Determine whether the following statement is true or false The probability that event A or event B will occur is P(A or B) = P(A) + P(B) - P(A or B) A. True B. False, the probability that A or B will occur is P(A or B) = P(A) * P(B) C. False, the probability that A or B will occur is P(A or B) = P(A) + P(B) D. False, the probability that A or B will occur is P(A or B) = P(A) + P(B) - P(A and B)
A. The random variable is discrete, because it has a countable number of possible outcomes
Determine whether the random variable x is discrete or continuous Let x represent the number of fish caught during a fishing tournament A. The random variable is discrete, because it has a countable number of possible outcomes B. The random variable is continuous, because it has an uncountable number of possible outcomes C. The random variable is continuous, because it has a countable number of possible outcomes D. The random variable is discrete, because it has an uncountable number of possible outcomes
D. The random variable is discrete, because it has a countable number of possible outcomes
Determine whether the random variable x is discrete or continuous Let x represent the number of people with blood type A in a random sample of 21 people A. The random variable is continuous, because it has a countable number of possible outcomes B. The random variable is continuous, because it has an uncountable number of possible outcomes C. The random variable is discrete, because it has an uncountable number of possible outcomes D. The random variable is discrete, because it has a countable number of possible outcomes
D. The random variable is discrete, because it has a countable number of possible outcomes
Determine whether the random variable x is discrete or continuous Let x represent the number of statistics students now reading a book A. The random variable is continuous, because it has a countable number of possible outcomes B. The random variable is continuous, because it has an uncountable number of possible outcomes C. The random variable is discrete, because it has an uncountable number of possible outcomes D. The random variable is discrete, because it has a countable number of possible outcomes
A. The complement of "at least one" is "none". So, the probability of getting at least one item is equal to 1 - P(none of the items)
Explain how the complement can be used to find the probability of getting at least one item of a particular type A. The complement of "at least one" is "none". So, the probability of getting at least one item is equal to 1 - P(none of the items) B. The complement of "at least one" is "all". So, the probability of getting at least one item is equal to 1 - P(all items) C. The complement of "at least one" is "all". So, the probability of getting at least one item is equal to P(all items) - 1 D. The complement of "at least one" is "none". So, the probability of getting at least one item is equal to P(none of the items) - 1
D. The two events are independent because the occurrence of one does not affect the probability of the occurrence of the other
For the given pair of events, classify the two events as independent or dependent Finding that your cell phone works Finding that your DVD player works A. The two events are dependent because the occurrence of one affects the probability of the occurrence of the other B. The two events are independent because the occurrence of one affects the probability of the occurrence of the other C. The two events are dependent because the occurrence of one does not affect the probability of the occurrence of the other D. The two events are independent because the occurrence of one does not affect the probability of the occurrence of the other
Population attributes
Greek letters denote ____________________________, unknown constants.
The buyer of a local hiking club store recommends against buying the new digital altimeters because they vary more than the old altimeters, which had a standard deviation of one yard. Write the null and alternative hypotheses.
H0: Standard deviation ≤ 1 Ha: Standard deviation > 1 (see 7.1 in class practice)
The statement represents a claim. Write its complement and state which is H0 and which is Ha. µ ≤ 0.93
H0: µ ≤ 0.93 (claim) Ha: µ > 0.93 (see 7.1 in class practice)
People in an experiment who behave differently from how they would normally behave
Hawthorne effect
B. Each trial is independent of other trials if the outcome of one trial does not affect the outcome of any of the other trials
In a binomial experiment, what does it mean to say that each trial is independent of other trials A. Each trial is independent of other trials if no more than one trial occurs at a time B. Each trial is independent of other trials if the outcome of one trial does not affect the outcome of any of the other trials C. Each trial is independent of other trials if the sum of all the possible trial outcomes equals 1 D. Each trial is independent of other trials if the outcome of one trial affects the outcome of another trial
A. No, because the total probability is not equal to 1
Is the probability distribution a discrete distribution A. No, because the total probability is not equal to 1 B. Yes, because the distribution is symmetric C. No, because some of the probabilities have values greater than 1 or less than 0 D. Yes, because the probabilities sum to 1 and are all between 1 and 0, inclusive
The relative likelihood function
L(theta)/L(thetahat)
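To make the relative likelihood concrete, here is a rough sketch for a binomial model, where the binomial coefficient cancels in the ratio. The data (y = 12 successes in n = 40 trials) are hypothetical, and the grid scan is just one simple way to approximate a 100p% likelihood interval {theta : R(theta) >= p}.

```python
def relative_likelihood(theta, y, n):
    """R(theta) = L(theta) / L(theta_hat) for binomial data, with theta_hat = y / n."""
    theta_hat = y / n

    def likelihood(t):
        # The binomial coefficient is omitted because it cancels in the ratio.
        return t ** y * (1 - t) ** (n - y)

    return likelihood(theta) / likelihood(theta_hat)

def likelihood_interval(p, y, n, steps=10_000):
    """Approximate {theta : R(theta) >= p} by scanning a grid on (0, 1)."""
    inside = [i / steps for i in range(1, steps)
              if relative_likelihood(i / steps, y, n) >= p]
    return min(inside), max(inside)

y, n = 12, 40  # hypothetical data, not from the review sheet
interval_10 = likelihood_interval(0.10, y, n)
interval_50 = likelihood_interval(0.50, y, n)
print(interval_10, interval_50)
```

The 50% interval sits inside the 10% interval, which illustrates why the interval shrinks as p increases.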
A. 67 mph [(18x72)+(30x64)]/48 = 67 (48 = total number of people ticketed)
Last weekend police ticketed 18 men whose mean speed was 72 miles per hour, and 30 women going an average of 64 miles per hour. Overall, what was the mean speed of all the people ticketed? A. 67 mph B. 68 mph C. It cannot be determined D. None of those E. 69 mph
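The overall mean is a weighted mean of the group means, weighted by group size. A short sketch of the calculation from the card:

```python
# (group size, group mean speed in mph) for men and women, from the card.
groups = [(18, 72), (30, 64)]

total_people = sum(count for count, _ in groups)
overall_mean = sum(count * mean for count, mean in groups) / total_people

print(overall_mean)
```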
A brewery claims that the mean amount of beer in their bottles is at least 12 ounces. Determine whether the hypothesis test for this claim is left-tailed, right-tailed, or two-tailed. Sketch the distribution of the test statistic and label the P-value region.
Left-tailed (see 7.1 in class practice)
C. Drawing one card from a standard deck, not replacing it, and then selecting another card
List an example of two events that are dependent A. Rolling a die twice B. Selecting a ball numbered 1 through 12 from a bin, replacing it, and then selecting a second numbered ball from the bin C. Drawing one card from a standard deck, not replacing it, and then selecting another card D. Tossing a coin and getting a head, and then rolling a 6-sided die and obtaining a 6
B. Rolling a die twice
List an example of two events that are independent A. Not putting money in a parking meter and getting a parking ticket B. Rolling a die twice C. A father having hazel eyes and a daughter having hazel eyes D. Selecting a queen from a standard deck, not replacing it, and then selecting a queen from the deck
If the graph is symmetric...
Mean = median
Which 2 measures should be used when the data have NO outliers? 1) Mean and SD 2) Mean and IQR 3) IQR and SD 4) Median and IQR
Mean and SD
a + b*xbar
Mean of Y = a + bX with E(X) = xbar
Range
Measure of dispersion calculated as the maximum - minimum value.
Sample correlation coefficient
Measures both the strength and direction of the linear relationship between two random variables.
Example: A dietician obtains the amounts of sugar from 100 centigrams in each of 10 different cereals - 3 24 30 47 43 47 23 44 24 39 Which is the best measure of center? Why?
Median because there's an outlier
What are the 5 things included in a boxplot?
Min Q1 M Q3 Max
To model a normal problem with known mean and known variance, we use ____________________ as the pivotal quantity.
N(0,1)
Relative Risk
P(A|B)/P(A|B') for categorical variables A, B.
Quartiles
Q1, Q2, Q3
Determine whether the data are qualitative or quantitative and determine the level of measurement of the data set (i.e. nominal, ordinal, interval, or ratio). The top five music albums for 2012 are listed: 1. Adele "21" 2. Michael Buble "Christmas" 3. Drake "Take Care" 4. Taylor Swift "Red" 5. One Direction "Up All Night"
Qualitative Ordinal because the data can be ranked (see quiz 1)
Ex 1: The table shows sports-related head injuries treated in U.S. emergency rooms during a recent five-year span for several sports. Which data are qualitative data, and which are quantitative data? (Source: BMC Emergency Medicine)
Qualitative: name of sport Quantitative: number of head injuries treated
Quantitative data
Quantitative data consist of numbers that are measurements or counts. Ex: Age, Weight of a letter, Temperature
Best way to prevent treatment groups of individuals from differing from each other in ways other than the treatment assigned
Randomized assignment to treatment
The most reliable way to determine whether the explanatory variable is actually causing changes in the response variable
Randomized controlled double blind experiment
Discrete data
Refers to numerical data that can be counted
Continuous data
Refers to numerical data that can be measured
Observational study
Study where data collectors have no control over the conditions of the experiment
Experimental study
Study where data collectors have some control over the conditions of the experiment
Those ppl whom no specific treatment was imposed
Subjects
The results of rolling a 6-sided die 1,000 times is an example of
Symmetric uniform distribution
What do you think is the shape of the distribution of the age at which a child takes its first steps?
Symmetric-Unimodal
Z/root(Chi-squared(n)/n)
T follows a T-distribution with n df if it is the independent ratio ________________________________.
To model a normal problem with n observations and unknown variance, we use ________________________ as the pivotal quantity.
T(n-1)
D. Breed
The SPCA collects the following data about the dogs they house. Which is categorical? A. Weight B. Number of days housed C. Veterinary costs D. Breed E. Age
E. Timeplot (Timeplots show how data has changed over time, and unlike bar charts, do not have categories)
The SPCA has kept these data records for the past 20 years. If they want to show the trend in the number of dogs they have housed, what kind of plot should they make? A. Bar graph B. Histogram C. Boxplot D. Pie chart E. Timeplot
A. preserves the individual data values
The advantage of making a stem-and-leaf display instead of a dotplot is that a stem-and-leaf display... A. preserves the individual data values B. satisfies the area principle C. A stem-and-leaf display is for quantitative data, while a dotplot shows categorical data D. none of these E. shows the shape of the distribution better than a dotplot
mean
The average of a set of numbers
D. An expected value of 0 means that the average money gained is equal to the average money spent, representing the break-even point
The expected value of an accountant's profit and loss analysis is 0. Explain what this means A. Since the expected value cannot be less than 0, an expected value of 0 means that the average money gained is equal to or less than the average money spent B. An expected value cannot be equal to 0 C. An expected value of 0 means that there was not any money gained or spent D. An expected value of 0 means that the average money gained is equal to the average money spent, representing the break-even point
Four Levels of Measurement
The four levels of measurement, in order from lowest to highest, are nominal, ordinal, interval, and ratio
D. Yes. The probability of the event is close to 0
The frequency distribution shows the number of voters (in millions) according to age. Consider the event below. Can it be considered unusual? A voter chosen at random is between 21 and 24 years old A. No. The probability of the event is not close to 0 B. No. The probability of the event is not close to 1 C. Yes. The probability of the event is close to 1 D. Yes. The probability of the event is close to 0
The MLE of g(theta) = g(thetahat)
The invariance property of the MLE says ___________________________________.
Min value >= Q1-1.5IQR
The lower whisker of a box and whisker plot ends at ___________________________.
B. The 50 monthly interest rates have a distribution which is skewed to the right. (Mean is larger than the median)
The mean and the median of a sample of 50 home mortgage monthly interest rates are 6.5 and 5.0 percent, respectively. Comment on the distribution of the 50 home mortgage interest rates. A. The 50 monthly interest rates have a distribution which is skewed to the left. B. The 50 monthly interest rates have a distribution which is skewed to the right. C. A few mortgages are very low, which makes the median smaller than the mean. D. Half of the mortgages are at rates greater than 6.5 percent. E. A few mortgages are at very low rates, pulling the mean up.
C. You should play the first game because the probability of winning a game with 1:10 odds of winning is 1/11, which is less than 1/10
The probability of winning an instant prize game is 1/10. The odds of winning a different instant prize game are 1:10. If you want the best chance of winning, which game should you play? A. You should play the first game because the probability of winning a game with 1:10 odds of winning is 1/9, which is greater than 1/10 B. You should play the second game because the probability of winning a game with 1:10 odds of winning is 1/9, which is greater than 1/10 C. You should play the first game because the probability of winning a game with 1:10 odds of winning is 1/11, which is less than 1/10 D. You should play the second game because the probability of winning a game with 1:10 odds of winning is 1/11, which is less than 1/10
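Odds of a:b in favor of winning correspond to a probability of a / (a + b). The sketch below uses exact fractions to compare the two games; the helper name `odds_to_probability` is illustrative.

```python
from fractions import Fraction

def odds_to_probability(wins, losses):
    """Convert odds of wins:losses in favor of winning into a probability."""
    return Fraction(wins, wins + losses)

p_first = Fraction(1, 10)              # probability given directly
p_second = odds_to_probability(1, 10)  # odds of 1:10

best_game = "first" if p_first > p_second else "second"
print(p_first, p_second, best_game)
```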
Variate
The property of the unit of interest in the study
C. Skewed to the right (Most of the data would be focused in the lower range, with few in the upper, making this skewed to the right)
The salaries of MLB players range from several hundred thousand dollars per year to very few earning in the millions. Suppose a histogram is made of all last year's salaries of major league baseball players. Which shape would best describe the shape of this histogram? A. Skewed to the left B. Bell shaped C. Skewed to the right D. Bimodal
Solve the problem. The mean age of bus drivers in Chicago is 56.9 years. If a hypothesis test is performed, how should you interpret a decision that fails to reject the null hypothesis? There is not sufficient evidence to reject the claim μ = 56.9. There is sufficient evidence to support the claim μ = 56.9. There is not sufficient evidence to support the claim μ = 56.9. There is sufficient evidence to reject the claim μ = 56.9.
There is not sufficient evidence to reject the claim μ = 56.9.
Solve the problem. The mean age of bus drivers in Chicago is greater than 56.2 years. If a hypothesis test is performed, how should you interpret a decision that fails to reject the null hypothesis? There is sufficient evidence to support the claim μ > 56.2. There is not sufficient evidence to reject the claim μ > 56.2. There is sufficient evidence to reject the claim μ > 56.2. There is not sufficient evidence to support the claim μ > 56.2.
There is not sufficient evidence to support the claim μ > 56.2.
Solve the problem. The mean score for all NBA games during a particular season was less than 91 points per game. If a hypothesis test is performed, how should you interpret a decision that fails to reject the null hypothesis? There is sufficient evidence to reject the claim μ < 91. There is sufficient evidence to support the claim μ < 91. There is not sufficient evidence to reject the claim μ < 91. There is not sufficient evidence to support the claim μ < 91
There is not sufficient evidence to support the claim μ < 91
Solve the problem. The mean IQ of statistics teachers is greater than 130. If a hypothesis test is performed, how should you interpret a decision that fails to reject the null hypothesis? There is sufficient evidence to reject the claim μ > 130. There is not sufficient evidence to support the claim μ > 130. There is not sufficient evidence to reject the claim μ > 130. There is sufficient evidence to support the claim μ > 130.
There is not sufficient evidence to support the claim μ > 130.
Solve the problem. The mean age of bus drivers in Chicago is 47.4 years. If a hypothesis test is performed, how should you interpret a decision that rejects the null hypothesis? There is sufficient evidence to support the claim μ = 47.4. There is not sufficient evidence to reject the claim μ = 47.4. There is not sufficient evidence to support the claim μ = 47.4. There is sufficient evidence to reject the claim μ = 47.4.
There is sufficient evidence to reject the claim μ = 47.4.
The mean score for all NBA games during a particular season was less than 104 points per game. If a hypothesis test is performed, how should you interpret a decision that rejects the null hypothesis?
There is sufficient evidence to support the claim µ<104 (see 7.1 in class practice)
Solve the problem. The mean IQ of statistics teachers is greater than 160. If a hypothesis test is performed, how should you interpret a decision that rejects the null hypothesis? There is not sufficient evidence to support the claim μ > 160. There is sufficient evidence to support the claim μ > 160. There is not sufficient evidence to reject the claim μ > 160. There is sufficient evidence to reject the claim μ > 160.
There is sufficient evidence to support the claim μ > 160.
Use the 4 conditions of a Binomial Experiment to decide if the following situation represents a Binomial Experiment. If it satisfies all conditions, state the values of p, q, n, and all possible values of the random variable. A survey found that 49% of US households own a dedicated gaming console. Eight US households are randomly selected. The random variable represents the number of US households that own a dedicated console.
This is a binomial experiment because: 1. There is a fixed number of trials 2. Each trial has only two outcomes, success or failure 3. The trials are independent and the probability of success is the same for each trial 4. Random variable x = the number of successes n=8, p=0.49, q=0.51 Possible x values: 0, 1, 2, 3, 4, 5, 6, 7, 8 (see quiz 5)
The log likelihood function
To take derivatives and find the MLE of theta, we typically first find __________________________________.
A researcher claims that 71% of voters favor gun control. Determine whether the hypothesis test for the claim is left-tailed, right-tailed, or two tailed. Sketch the distribution of the test statistic and label the P-value region.
Two-tailed (see 7.1 in class practice)
The mean age of bus drivers in Chicago is 51.3 years. Identify the type I and type II errors for the hypothesis test of this claim.
Type I: rejecting H0: µ=51.3 when µ=51.3 Type II: failing to reject H0: µ=51.3 when µ≠51.3 (see 7.1 in class practice)
The mean IQ of statistics teachers is greater than 130. Identify the type I and type II errors for the hypothesis test of this claim.
Type I: rejecting H0: µ≤130 when µ≤130 Type II: failing to reject H0: µ≤130 when µ>130 (see 7.1 in class practice)
Solve the problem. For a sample of 20 IQ scores the mean score is 105.8. The standard deviation, σ, is 15. Determine whether a normal distribution or a t-distribution should be used or whether neither of these can be used to construct a confidence interval. Use a t-distribution Use a normal distribution Neither a normal distribution nor a t-distribution can be used.
Use a normal distribution.
D. It would be unusual because the probability of having no HD televisions is less than 0.05
Using the data given below, determine whether it would be unusual for a household to have no HD televisions A. It would not be unusual because 52 people have no HD televisions in the town B. It would not be unusual because the probability of having no HD televisions is more than 0.05 C. It would be unusual because 52 people have no HD televisions in the town D. It would be unusual because the probability of having no HD televisions is less than 0.05
A fast food outlet chain claims that the mean waiting time in line is less than 3.8 minutes. A random sample of 60 customers has a mean of 3.7 minutes with a population standard deviation of 0.6 minutes. If alpha = 0.05, use the P-value to test the fast food outlet's claim.
We are testing for µ and the standard deviation is known. Sample size, n, is 60 ≥ 30. Use z-test. H0: µ ≥ 3.8 Ha: µ < 3.8 (claim) - Left-tailed test alpha = 0.05 z = (3.7 - 3.8)/(0.6/√60) = -1.29 P-value = normalcdf(-1×10^99, -1.29, 0, 1) = 0.0985 Since the P-value 0.0985 > alpha = 0.05, fail to reject H0. There is not enough evidence to support the fast food outlet's claim that the mean waiting time is less than 3.8 minutes. (see 7.2 in class practice)
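The left-tailed z-test above can be reproduced with the standard library instead of a calculator's normalcdf. This is a sketch; it uses the unrounded z, so the P-value may differ slightly from the rounded 0.0985.

```python
import math
from statistics import NormalDist

# Values from the card: H0: mu >= 3.8, Ha: mu < 3.8 (claim).
mu_0, x_bar, sigma, n, alpha = 3.8, 3.7, 0.6, 60, 0.05

# Standardized test statistic for a mean with known population sigma.
z = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Left-tailed test: the P-value is the area to the left of z.
p_value = NormalDist().cdf(z)
reject_h0 = p_value <= alpha

print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject_h0}")
```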
B. The probability of each value of the discrete random variable is between 0 and 1, and the sum of all the probabilities is 1
What are two conditions that determine a probability distribution A. The probability of each value of the discrete random variable is greater than 0 and less than 1, and the sum of all the probabilities can be any amount B. The probability of each value of the discrete random variable is between 0 and 1, and the sum of all the probabilities is 1 C. The probability of each value of the discrete random variable is between 0 and 1, and the sum of all the probabilities can be any amount D. The probability of each value of the discrete random variable is greater than 0 and less than 1, and the sum of all the probabilities is 1
C. The probability of event B occurring, given that event A has occurred
What does the notation P(B|A) mean A. The probability of both event A and event B occurring B. The probability of event A occurring, given that event B has occurred C. The probability of event B occurring, given that event A has occurred D. The probability of event B occurring, divided by the probability of event A occurring
C. A discrete probability distribution lists each possible value a random variable can assume, together with its probability
What is a discrete probability distribution A. A discrete probability distribution lists each possible value a random variable can assume B. A discrete probability distribution exclusively lists probabilities C. A discrete probability distribution lists each possible value a random variable can assume, together with its probability D. None of the above
A. The outcome of a probability experiment is often a count or a measure. When this occurs, the outcome is called a random variable
What is a random variable A. The outcome of a probability experiment is often a count or a measure. When this occurs, the outcome is called a random variable B. A variable is random when it has a finite or countable number of possible outcomes that can be listed C. A variable is random when it has an uncountable number of possible outcomes D. The outcome of a probability experiment is often a category or label. When this occurs, the outcome is called a random variable
An outcome is the result of a single probability experiment. An event is a set of one or more possible outcomes.
What is the difference between an outcome and an event? A. An outcome is the result of a single probability experiment. An event is the set of all possible outcomes B. An event is the result of a single probability experiment. An outcome is the set of all possible events C. An outcome is the result of a single probability experiment. An event is a set of one or more possible outcomes. D. An event is the result of a single probability experiment. An outcome is a set of one or more possible events
C. Two events are independent when the occurrence of one event does not affect the probability of the occurrence of the other event. Two events are dependent when the occurrence of one event affects the probability of the occurrence of the other event
What is the difference between independent and dependent events? A. Two events are independent when the occurrence of one event affects the probability of the occurrence of the other event. Two events are dependent when the occurrence of one event does not affect the probability of the occurrence of the other event B. Two events are independent if only one of the two events can occur. Two events are dependent if they can occur at the same time C. Two events are independent when the occurrence of one event does not affect the probability of the occurrence of the other event. Two events are dependent when the occurrence of one event affects the probability of the occurrence of the other event D. Two events are independent if they occur at the same time. Two events are dependents if only one of the two events can occur.
frequency polygon
a line graph that emphasizes the continuous change in frequencies. x-axis = midpoints, y-axis = frequencies *graph must extend to and end at zero
Randomization
a process of randomly assigning subjects to different treatment groups
In a sample of 2,016 US adults, 383 said Franklin Roosevelt was the best president since WWII. Two US adults are selected at random without replacement. a. Find the probability that both adults think Roosevelt was the best president. b. Find the probability that neither adult thinks Roosevelt was the best president. c. Find the probability that at least one of the adults thinks Roosevelt was the best president.
a. 0.036 b. 0.656 c. 0.344 (see quiz 4)
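Because the two adults are selected without replacement, the second probability is conditional on the first selection. A sketch of the three parts using exact fractions (variable names are illustrative):

```python
from fractions import Fraction

total, fans = 2016, 383   # sample size and number who chose Roosevelt
others = total - fans

# Multiplication rule for dependent events (no replacement).
p_both = Fraction(fans, total) * Fraction(fans - 1, total - 1)
p_neither = Fraction(others, total) * Fraction(others - 1, total - 1)

# "At least one" is the complement of "neither".
p_at_least_one = 1 - p_neither

print(round(float(p_both), 3), round(float(p_neither), 3),
      round(float(p_at_least_one), 3))
```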
A bag of marbles contains 7 red marbles, 5 blue marbles, and 9 green marbles. Two marbles are selected from the bag without replacement. a. What is the probability of getting a red and a blue? b. What is the probability of getting two reds? c. What is the probability of getting no greens? d. What is the probability of getting at least 1 green?
a. 0.083 (red first, then blue; 0.167 if either order counts) b. 0.1 c. 0.314 d. 0.686 (test 2 #16)
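The marble probabilities follow from the multiplication rule without replacement. In this sketch, part a is computed both for a fixed order (red, then blue) and for either order, since the wording of the question does not say which is intended.

```python
from fractions import Fraction

red, blue, green = 7, 5, 9
total = red + blue + green  # 21 marbles

# a. red then blue, and the either-order version (twice as likely).
p_red_then_blue = Fraction(red, total) * Fraction(blue, total - 1)
p_red_and_blue_any_order = 2 * p_red_then_blue

# b. two reds in a row.
p_two_reds = Fraction(red, total) * Fraction(red - 1, total - 1)

# c. no greens: both draws come from the 12 non-green marbles.
non_green = red + blue
p_no_green = Fraction(non_green, total) * Fraction(non_green - 1, total - 1)

# d. at least one green is the complement of no greens.
p_at_least_one_green = 1 - p_no_green

print(p_red_then_blue, p_two_reds, p_no_green, p_at_least_one_green)
```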
The table (quiz 4 #4) shows data for students at University of Oklahoma Health Science Center. a. Find the probability that a randomly selected student is male, given that the student is a nursing major. b. Find the probability that the student is male or a nursing major.
a. 0.115 b. 0.533
The distribution of cholesterol levels in teenage boys is approximately normal with a mean of 170 and a standard deviation of 30. a. Find the probability that a teenage boy has a cholesterol level less than 144. b. Find the probability that a teenage boy has a cholesterol level greater than 240.5. c. In a sample of 30 teenage boys, how many would you expect to have a cholesterol level less than 144?
a. 0.1922 b. 0.0094 c. 5.7660 (see test 3 #12)
Jaylen Brown currently has an abysmal free throw percentage of 59%. Assume that making or missing a free throw has no effect on his next free throw. Suppose he takes his 17 free throws in his next game. a. Find the probability that he makes exactly 10 free throws. b. Find the probability that he makes at least 14 free throws. c. Find the probability that he makes less than 14 free throws.
a. 0.1936 b. 0.039 c. 0.961 (see test 3 #5)
Use the table (test 2 #14) to find each probability. a. Probability of selecting a junior. b. Probability of not selecting a junior. c. probability of selecting a student that plays sports, given that the student is a senior. d. Probability that the student is a freshman and plays sports. e. Probability that the student is a freshman or plays sports.
a. 0.239 b. 0.761 c. 0.272 d. 0.107 e. 0.531
Sixty percent of US adults trust national newspapers to present the news fairly and accurately. You randomly select 9 US adults. Let x equal the number of adults from the 9 selected that think newspapers present the news fairly. a. Find P(x=5), the probability that exactly 5 of the 9 adults thought news was presented fairly. b. Find the probability that at least 6 of the 9 adults thought the news was presented fairly.
a. 0.2508 b. 0.4826 (see quiz 5)
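Both parts are binomial probabilities with n = 9 and p = 0.60. A short Python sketch of the binomial formula (the helper name `binom_pmf` is just an illustration, not a calculator function):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 9, 0.60
p_exactly_5 = binom_pmf(5, n, p)
# "At least 6" means 6, 7, 8, or 9 successes.
p_at_least_6 = sum(binom_pmf(k, n, p) for k in range(6, n + 1))

print(round(p_exactly_5, 4), round(p_at_least_6, 4))  # 0.2508 0.4826
```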
The table below shows the results of an experiment relating the height from which a ball is dropped, x, to the height of its first bounce, y. Drop height (x): 100 90 80 70 60 Bounce height (y): 26 23 21 18 16 a. Calculate the correlation coefficient and interpret your results. b. Give the equation for the line of regression. c. Make a prediction for the bounce height if the ball is dropped from a height of 50 cm or explain why it is not meaningful to do so.
a. 0.998 Strong positive correlation b. y = 0.25x + 0.8 c. Not meaningful because this is outside the original range of data.
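The correlation coefficient and regression line can be checked by hand from the sums of squares. A minimal Python sketch using the five data pairs from the problem (variable names are illustrative):

```python
from math import sqrt

x = [100, 90, 80, 70, 60]   # drop heights
y = [26, 23, 21, 18, 16]    # bounce heights

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / sqrt(sxx * syy)            # correlation coefficient
slope = sxy / sxx                    # m in y = mx + b
intercept = mean_y - slope * mean_x  # b in y = mx + b

print(round(r, 3), round(slope, 2), round(intercept, 1))  # 0.998 0.25 0.8
```

Because the x-values only span 60 to 100, plugging in x = 50 would be extrapolation, which is why part c is answered "not meaningful".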
State if the random variable is continuous or discrete. a. The number of red Sour Patch Kids in a bag of candy. b. The vertical height of an Olympic pole jumper.
a. Discrete b. Continuous (see test 3 #1-2)
In a survey of US men, the heights in the 20-29 age group were normally distributed with a mean of 69.4 inches and a standard deviation of 2.9 inches. Use your calculator to find the probability that a randomly selected study participant has a height that is a. less than 66 inches b. between 66 and 72 inches c. more than 72 inches
a. normalcdf(-1x10^99, 66, 69.4, 2.9) = 0.1205 b. normalcdf(66, 72, 69.4, 2.9) = 0.6945 c. normalcdf(72, 1x10^99, 69.4, 2.9) = 0.1850 (see quiz 6)
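The calculator's normalcdf can be approximated in Python with the error function. This is a sketch, not the calculator's actual implementation; the helper name `normal_cdf` is an assumption:

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

mean, sd = 69.4, 2.9
p_less_66 = normal_cdf(66, mean, sd)
p_between = normal_cdf(72, mean, sd) - normal_cdf(66, mean, sd)
p_more_72 = 1 - normal_cdf(72, mean, sd)

# The three results are roughly 0.1205, 0.6945, and 0.1850, and they sum to 1.
print(p_less_66, p_between, p_more_72)
```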
A probability experiment consists of rolling a 6-sided die and spinning a spinner that has four colors: red, blue, green, and yellow. You are equally likely to land on each color. a. Create a tree diagram to describe the sample space. b. What is the probability of rolling a number less than 6 on the die and having the spinner land on yellow?
a. see quiz 4 #2 b. 0.208
The caloric contents and sodium contents of 10 hotdogs are listed in the data below. Calories, x: 150 170 120 120 90 180 170 140 90 110 Sodium, y: 420 470 350 360 270 550 530 460 380 330 a. Give the equation for the regression line. Round values to the nearest hundredth. b. Use the regression line to make a prediction for the sodium content of a hotdog with 140 calories or explain why it is not meaningful to make such a prediction. c. Use the regression line to make a prediction for the sodium content of a hotdog with 210 calories or explain why it is not meaningful to make such a prediction.
a. y = 2.47x + 80.81 b. 426.61 mg c. This is not meaningful because 210 is outside of the original data values. (see quiz 3)
SAT writing scores are normally distributed with mean = 448 and standard deviation = 114. a. Use the z-table to determine what percentage of scores were greater than 500. b. If 1000 SAT writing scores are randomly selected, how many of the scores would be greater than 500?
a. z = (500 - 448)/114 ≈ 0.46, P(z > 0.46) = 1 - 0.6772 = 0.3228, about 32.28% b. About 323 (see quiz 6)
Find the critical value and rejection region for the type of z-test with level of significance alpha. a. Right-tailed test, alpha = 0.01 b. Left-tailed test, alpha = 0.05 c. Two-tailed test, alpha = 0.01
a. z0 = 2.33 Rejection region: z > 2.33 b. z0 = -1.645 Rejection region: z < -1.645 c. -z0 = -2.575, z0 = 2.575 Rejection region: z < -2.575, z > 2.575 (see 7.2 in class practice)
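The critical values can also be computed instead of read from the z-table. A sketch using Python's standard library; note that the table values 2.33 and 2.575 are rounded table readings, so they differ slightly from the computed ones:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

right_tailed = z.inv_cdf(1 - 0.01)     # alpha = 0.01, all of alpha in the right tail
left_tailed = z.inv_cdf(0.05)          # alpha = 0.05, all of alpha in the left tail
two_tailed = z.inv_cdf(1 - 0.01 / 2)   # alpha = 0.01 split between both tails

print(round(right_tailed, 3), round(left_tailed, 3), round(two_tailed, 3))
# 2.326 -1.645 2.576
```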
Represents the distribution of a quantitative variable by visually displaying the 5 number summary and any observations that were classified as a suspected outlier using the 1.5 IQR criterion
boxplot
Solve the problem. Compare the scores: a score of 220 on a test with a mean of 200 and a standard deviation of 21 and a score of 90 on a test with a mean of 80 and a standard deviation of 8. a) A score of 220 with a mean of 200 and a standard deviation of 21 is better. b) The two scores are statistically the same. c) A score of 90 with a mean of 80 and a standard deviation of 8 is better. d) You cannot determine which score is better from the given information.
c) A score of 90 with a mean of 80 and a standard deviation of 8 is better.
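Scores from different tests are compared by converting each to a z-score. A quick Python check (the function name `z_score` is illustrative):

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_first = z_score(220, 200, 21)   # about 0.95
z_second = z_score(90, 80, 8)     # 1.25

# The larger z-score is the better relative performance.
print(round(z_first, 2), round(z_second, 2))  # 0.95 1.25
```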
Interval Data
can be ordered and differences have meaning ex. temperature scales
Ordinal Data
can be put into order ex. top 10 cities in the US or letter grades
When we are given a histogram of the data (w/o actual data) we _________ (can/cannot) determine the ___________ (mean/median/mode), but only determine what could be a possible value for the ___________ and what values of the __________.
cannot, median, median, median
Solve the problem. Classify the statement as an example of classical probability, empirical probability, or subjective probability. In California's Pick Three lottery, a person selects a 3-digit number. The probability of winning California's Pick Three lottery is 1/1000. empirical probability classical probability subjective probability
classical probability
Identify the sampling technique used. A researcher for an airline interviews all of the passengers on five randomly selected flights.
cluster
Identify the sampling technique used. A researcher randomly selected 25 of the nation's middle schools and interviewed all of the teachers at each school.
cluster
Solve the problem. A researcher for an airline interviews all of the passengers on five randomly selected flights. What sampling technique is used? systematic random convenience stratified cluster
cluster
Solve the problem. At a local community college, five statistics classes are randomly selected and all of the students from each class are interviewed. What sampling technique is used?
cluster
Qualitative Data
consists of attributes, labels, or nonnumerical entries (quality)
Census
consists of data from an entire population. But, unless a population is small, it is usually impractical to obtain all the population data. In most studies, information must be obtained from a random sample.
Data
consists of information coming from observations, counts, measurements, or responses
Census
count or measure of an entire population
Sampling
count or measure part of the population
When calculating the median, what's the 2nd step
counting the number of observations
two major branches of stat
descriptive inferential
From past figures, it is predicted that 19% of the registered voters in California will vote in the June primary.
inferential statistics (a prediction about future voters is drawn from past data)
Random sample
every member of the population has an equal chance of being selected.
Simple Random Sample
every possible sample of the same size has the same chance of being selected from the population
Decide which method of data collection you would use to collect data for the study. Specify either observational study, experiment, simulation, or survey. A study where a drug was given to 23 patients and a placebo to another group of 23 patients to determine if the drug has an effect on a patient's illness
experiment
Recruit participants; while they're being interviewed, 1/2 sit in a waiting room w/ snacks and TV on. Other 1/2 sit in waiting room with just snacks. Researchers determine whether ppl consume more snacks in the TV setting. This is an example of an
experiment
researchers interfere and they assign the values of the explanatory variables to the individuals
experiment
The independent variable; the variable that claims to explain, predict, or affect the response
explanatory variable
Solve the problem. Given H0: µ ≥ 18 and P = 0.085. Do you reject or fail to reject H0 at the 0.05 level of significance? reject H0 fail to reject H0 not sufficient information to decide
fail to reject H0
Solve the problem. Given H0: μ = 25, Ha: μ ≠ 25, and P = 0.028. Do you reject or fail to reject H0 at the 0.01 level of significance? reject H0 fail to reject H0 not sufficient information to decide
fail to reject H0
T/F Association implies causation
false
T/F every relationship between 2 quantitative variables has a linear form
false
a graph's general shape
form
We want to explore if the score on a test is affected by the test-taker's gender. Which is the: - explanatory variable - response variable
gender - explanatory test score - response
Inferential Statistics
generalize the information learned from the sample to the entire population
blocks
groups of subjects with similar characteristics. A commonly used experimental design is a randomized block design. The experimenter divides the subjects with similar characteristics into blocks, and then, within each block, randomly assigns subjects to treatment groups. Ex: An experimenter who is testing the effects of a new weight loss drink may first divide the subjects into age categories, and then, within each age group, randomly assigned subjects to either the treatment group or the control group
Match the statement with the appropriate letter. a. Nominal b. Ordinal c. Interval d. Ratio e. Observational f. Experiment g. Simulation h. Survey i. Control j. Placebo k. Double Blind l. Selection Bias m. Response Bias i. The level of measurement for qualitative data. ii. A fake treatment given to the control in an experiment. iii. The level of measurement for data where the data can be ordered, but the differences between data has no meaning. iv. A model used to reproduce conditions that would be difficult to observe otherwise. v. Issues that occur as a result of errors in the measurement of data. vi. The level of measurement where difference between data values has meaning, but zero does not mean the absence of value.
i. a) Nominal ii. j) Placebo iii. b) Ordinal iv. g) Simulation v. m) Response Bias vi. c) Interval (see test 1 #5-10)
Solve the problem. Classify the events as dependent or independent. The events of getting two aces when two cards are drawn from a deck of playing cards and the first card is replaced before the second card is drawn. independent dependent
independent
A recent test in a statistics class had a mean score of 78 and a standard deviation of 8.5. Use your calculator to find the score that is needed to be in the top 15%.
invnorm(.85, 78, 8.5) = 86.8097 (see test 3 #13)
Sampling
is a count or measure of part of a population and is more commonly used in statistical studies. To collect unbiased data, a researcher must ensure that the sample is representative of the population. Appropriate sampling techniques must be used to ensure the inferences about the population are valid.
simple random sample
is a sample in which every possible sample of the same size has the same chance of being selected. -Random numbers can be generated by random number table, a software program, or calculator. Assign a number to each member of the population. -Members of the population that correspond to those numbers become members of the sample
among all the lines that look good on your data, choose the one that has the smallest sum of squared vertical deviations
least squares criterion
Classical Probability
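theoretical probability when all outcomes are equally likely: P(E) = number of outcomes in E / total number of outcomes; the long term expected results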
For the dot plot below, what is the maximum and what is the minimum entry? max: 14; min: 12 max: 54; min: 12 max: 54; min: 15 max: 17; min: 12
max: 17; min: 12
In a histogram, CENTER refers to the...
median
Parameters
numerical summaries of a population
Observations that fall outside the overall pattern
outliers
correlation is heavily influenced by ________
outliers
Solve the problem. What method of data collection would you use to collect data for a study where a drug was given to 57 patients and a placebo to another group of 57 patients to determine if the drug has an effect on a patient's illness? use sampling use a simulation take a census perform an experiment
perform an experiment
Two types of statistical data sets studied in STAT 231
population sample
Subjective Probability
probability based in intuition, educated guesses, and estimates made by someone who is knowledgeable in the field.
any plan that relies on random selection
probability sampling plan
1. It is a measure of spread, used when the mean is the measure of center 2. The only way it equals 0 is if all the observations have the same value 3. It is strongly influenced by outliers
properties of standard deviation
the values of the variables of interest are recorded forward in time
prospective
2 types of observational studies
prospective retrospective
Nominal Level of Measurement
qualitative ONLY (no math)
Solve the problem. Classify the number of seats in a movie theater as qualitative data or quantitative data. qualitative data quantitative data
quantitative data
Calculate the correlation coefficient and describe the type of correlation (strong/weak, positive/negative) for the data below. Interpret what the correlation coefficient tells you. Earnings per share, x: 2.79 5.10 4.53 3.06 3.70 2.20 Dividends per share, y: 0.52 2.40 1.46 0.88 1.04 0.22
r = 0.967 Strong positive correlation (see quiz 3)
Identify the sampling technique used. A lobbyist for a major airspace firm assigns a number to each legislator and then uses a computer to randomly generate ten numbers. The lobbyist contacts the legislators corresponding to these numbers.
random
Solve the problem. A lobbyist for a major airspace firm assigns a number to each legislator and then uses a computer to randomly generate ten numbers. The lobbyist contacts the legislators corresponding to these numbers. What sampling technique was used?
random
the technique that specifies the dependence of the response
regression
Replication
repetition of an experiment on more than one subject in order to measure variation within and between groups
the values of the variables are recorded backward in time
retrospective
Solve the problem. A car maker claims that its new sub-compact car gets better than 52 miles per gallon on the highway. Determine whether the hypothesis test for this is left-tailed, right-tailed, or two-tailed. left-tailed two-tailed right-tailed
right-tailed
inferential statistics
the branch of statistics that involves using sample data to draw conclusions about ("infer") a population. A basic tool in the study of inferential statistics is probability.
Use a stem and leaf plot to display the data. The data represent the scores of a biology class on a midterm exam. 75, 85, 90, 80, 87, 67, 82, 88, 95, 91, 73, 80, 83, 92, 94, 68, 75, 91, 79, 95, 87, 76, 91, 85
see quiz 2 #3
Below are the NBA MVP winners from the last 10 years. Make a Pareto Chart for the data. 2016-17 Russell Westbrook 2015-16 Stephen Curry 2014-15 Stephen Curry 2013-14 Kevin Durant 2012-13 LeBron James 2011-12 LeBron James 2010-11 Derrick Rose 2009-10 LeBron James 2008-09 LeBron James 2007-08 Kobe Bryant
see test 1 #25
Sample Space
set of all outcomes for a probability experiment
if people are sampled completely at random; doesn't mean it's equally representative
simple random sample
Scenario: Kidney Stones - Treatment A and B - 700 subjects with kidney stones participated - Found that success rate of treatment A was higher than B - 2 groups of patients: (1) those with large stones (2) those with small stones. -Results: treatment B was more effective. This study was an example of ___________ ___________ The _______ _______ is the ________ variable
Simpson's paradox, kidney stone, lurking
Here are the number of hours that nine students spend on the computer on a typical day: 1 6 7 5 5 8 11 12 15 --------------------- The median # of hours is 7...why?
since n=9, the median is the (9+1)/2 = 5th observation in the ordered list (1 5 5 6 7 8 11 12 15), which is 7.
Age of death from natural causes (heart disease, cancer, etc) is an example of
skewed left distribution
Prices of 1,000 California homes is an example of
skewed right distribution
a measure of spread; gives the average distance between a data point and the mean
standard deviation definition
collecting data
study design
Stratified Sample
subdivide the population into at least two different subgroups so that subjects within the same subgroup share similar characteristics, then select a random sample from each subgroup.
Cluster Sampling
subdivide the population into sections, then randomly select entire clusters and choose all the members from those selected clusters.
Inferential statistics
the branch of statistics that involves using a sample to draw conclusions about a population
Scenario: A psychologist selects a sample of kids ages 6-13 and measures each child's shoe size and vocabulary. What's the lurking variable?
the child's age
Population
the collection of all outcomes, responses, measurements, or counts that are of interest (Ex: everyone in a classroom)
Population
the complete collection of all elements, objects, individuals, or events to be studied
Q1
the median from the beginning to Q2
Independent Events
the occurrence of one event does NOT affect the probability of the occurrence of another event
Range of Probabilities Rule
the probability of an event E is between 0 and 1, inclusive: 0 ≤ P(E) ≤ 1
Statistics
the science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data.
midpoint
the sum of the lower and upper limits of the class divided by two
median
the value that lies in the middle
Solve the problem. A researcher claims that 73% of voters favor gun control. Determine whether the hypothesis test for this claim is left-tailed, right-tailed, or two-tailed. right-tailed left-tailed two-tailed
two-tailed
a sample that produces data that represents the population
unbiased sample
Solve the problem. What method of data collection would you use to collect data for a study where a political pollster wishes to determine if his candidate is leading in the polls?
use sampling
if r is close to zero it is a...
weak linear relationship
Solve the problem. Given a sample with r = -0.765, n = 22, and α = 0.02, determine the critical values t0 necessary to test the claim ρ = 0. ± 2.080 ± 2.831 ± 2.528 ± 1.721
± 2.528
Solve the problem. Given a sample with r = 0.321, n = 30, and α = 0.10, determine the critical values t0 necessary to test the claim ρ = 0. ± 1.311 ± 0.683 ± 1.701 ± 2.462
±1.701
Solve the problem. Find the area under the standard normal curve between z = 0 and z = 3. 0.4641 0.9987 0.0010 0.4987
0.4987
Solve the problem. The distribution of Master's degrees conferred by a university is listed in the table. (assume that a student majors in only one subject) Major Frequency Mathematics 216 English 207 Engineering 85 Business 175 Education 215 What is the probability that a randomly selected student with a Master's degree majored in Business, Education or Engineering? Round your answer to three decimal places. 0.334 0.290 0.529 0.471
0.529
Solve the problem. The distribution of Master's degrees conferred by a university is listed in the table. (assume that a student majors in only one subject) major Frequency mathematics 216 English 207 Engineering 79 Business 179 Education 226 What is the probability that a randomly selected student with a Master's degree majored in Business, Education or Engineering? Round your answer to three decimal places. 0.532 0.282 0.468 0.337
0.532
Solve the problem. A group of students were asked if they carry a credit card. The responses are listed in the table. class has cc no cc freshman 13 47 sophomore 22 18 If a student is selected at random, find the probability that he or she owns a credit card given that the student is a sophomore. Round your answer to three decimal places. 0.629 0.450 0.550 0.220
0.550
Solve the problem. A community college student interviews everyone in a statistics class to determine who owns a car. What sampling technique is used?
convenience
where individuals happen to be at the right place at the right time to suit the schedule of the researcher; biased
convenience sample
Solve the problem. Classify the events as dependent or independent. Event A: A red candy is selected from a package with 30 colored candies and eaten. Event B: A blue candy is selected from the same package and eaten. dependent independent
dependent
Solve the problem. Classify the events as dependent or independent. Events A and B where P(A) = 0.8, P(B) = 0.1, and P(A and B) = 0.07 dependent independent
dependent
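Two events are independent exactly when P(A and B) = P(A) · P(B). A one-line Python check of the numbers in this card (`isclose` is used because of floating-point rounding):

```python
from math import isclose

p_a, p_b, p_a_and_b = 0.8, 0.1, 0.07

# Independent events satisfy the multiplication rule P(A and B) = P(A) * P(B).
independent = isclose(p_a * p_b, p_a_and_b)

print(independent)  # False, since 0.8 * 0.1 = 0.08, not 0.07, so the events are dependent
```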
Randomization
process of randomly assigning subjects to different treatment groups.
What are the 3 study designs
Observational Experiment Sample survey
What are the two types of charts used for a Categorical variable?
Pie chart bar chart
Upper quartile
Data value below which 75% of the data lies
Percentile
Data values which divide data into 100 parts; p% of the data lies below the percentile qp.
Example: recruit participants - ask them to recall, for each hour of the previous day, whether they were watching TV, and what snacks they consumed each hour. Determine if food consumption was higher during the TV times
Retrospective study
Response variable
Dependent variable in a causal problem.
Sampling error
Difference between the sample and study population
Relative frequency equation
Frequency/sum of all frequencies x 100
What are the 3 numerical measures?
Mean Median Mode
Blind study
Test patients are unaware of certain conditions in the experiment
A. 2000
The IQR for the following data is approximately... Minimum: 2000 Q1: 3500 Median: 4500 Q3: 5500 Maximum: 8000 A. 2000 B. 4000 C. 6000 D. 7000 E. None of these
Discrete Data
data that can be counted (0, 1, 2, 3). never has values that are decimals or fractions
example: -0.74 what's the correlation?
moderately strong, negative, and linear
When calculating the median, whats the first step
putting the observations in order
Solve the problem. Calculate the correlation coefficient, r, for the data below. x: -10 -8 -1 -4 -6 -7 -5 -3 -2 -9 y: 2 -3 -15 -10 -0.885 -0.778 -0.995 -0.671
-0.995
Solve the problem. Find the critical value for a left-tailed test with α = 0.025 and n = 50. -2.575 -1.96 -1.645 -2.33
-1.96
Solve the problem. Construct a 95% prediction interval for y given x = -3.5, y = 2.097x - 0.552 and se = 0.976. x: -5 -3 4 1 -1 -2 0 2 3 -4 y: -10 -8 9 1 -2 -6 -1 3 6 -8 -3.187 < y < -2.154 -10.367 < y < -5.417 -4.598 < y < -1.986 -12.142 < y < -6.475
-10.367 < y < -5.417
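The prediction interval is ŷ ± E with E = t_c · s_e · sqrt(1 + 1/n + (x0 − x̄)² / Σ(x − x̄)²). A Python sketch of that arithmetic; the critical value t_c = 2.306 is an assumption taken from a t-table (c = 0.95, d.f. = n − 2 = 8), since the standard library has no t-distribution:

```python
from math import sqrt

x = [-5, -3, 4, 1, -1, -2, 0, 2, 3, -4]
x0 = -3.5
s_e = 0.976      # standard error of estimate, given in the problem
t_c = 2.306      # assumed t-table value for c = 0.95 and d.f. = 8

n = len(x)
mean_x = sum(x) / n
sxx = sum((v - mean_x) ** 2 for v in x)

y_hat = 2.097 * x0 - 0.552   # point estimate from the regression line
margin = t_c * s_e * sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx)
lower, upper = y_hat - margin, y_hat + margin

# Roughly -10.37 < y < -5.42; the last digit depends on intermediate rounding.
print(round(lower, 2), round(upper, 2))
```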
You wish to test the claim that μ = 940 at a level of significance of α = 0.01 and are given sample statistics n = 35, x̄ = 910 and s = 82. Compute the value of the standardized test statistic. Round your answer to two decimal places. -3.82 -2.16 -5.18 -4.67
-2.16
Solve the problem. Given a sample with r = -0.541, n = 20, and α = 0.01, determine the standardized test statistic t necessary to test the claim ρ = 0. Round answers to three decimal places. -5.132 -3.251 -4.671 -2.729
-2.729
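The test statistic for a correlation coefficient is t = r / sqrt((1 − r²)/(n − 2)). A short Python check with the values from this card:

```python
from math import sqrt

r, n = -0.541, 20

# Standardized test statistic for H0: rho = 0, with n - 2 degrees of freedom.
t = r / sqrt((1 - r**2) / (n - 2))

print(round(t, 3))  # -2.729
```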
skewed left
-the tail is on the left side
Solve the problem. A coffee machine dispenses normally distributed amounts of coffee with a mean of 12 ounces and a standard deviation of 0.2 ounce. If a sample of 9 cups is selected, find the probability that the mean of the sample will be greater than 12.1 ounces. 0.9332 0.2123 0.0668 0.3216
0.0668
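Sample means follow a normal distribution with standard deviation σ/√n (the standard error). A sketch of this card's calculation using Python's standard library:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12, 0.2, 9

# Sampling distribution of the sample mean: same center, standard error sigma / sqrt(n).
sampling_dist = NormalDist(mu, sigma / sqrt(n))
p_mean_above = 1 - sampling_dist.cdf(12.1)   # z = (12.1 - 12) / (0.2 / 3) = 1.5

print(round(p_mean_above, 4))  # 0.0668
```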
Solve the problem. The lengths of pregnancies of humans are normally distributed with a mean of 268 days and a standard deviation of 15 days. Find the probability of a pregnancy lasting less than 250 days. 0.1151 0.0066 0.1591 0.0606
0.1151
Of the cartons produced by a company, 5% have a puncture, 8% have a smashed corner, and 0.4% have both a puncture and a smashed corner. Find the probability that a randomly selected carton has a puncture or has a smashed corner.
0.126 (see quiz 4)
Solve the problem. A survey of 100 fatal accidents showed that 13 were alcohol related. Find a point estimate for p, the population proportion of accidents that were alcohol related. 0.87 0.149 0.13 0.115
0.13
Solve the problem. A survey of 2650 golfers showed that 392 of them are left-handed. Find a point estimate for p, the population proportion of golfers that are left-handed. 0.129 0.174 0.852 0.148
0.148
Solve the problem. A random sample of 150 students has a grade point average with a standard deviation of 0.78. Find the margin of error if c = 0.98. 0.12 0.08 0.11 0.15
0.15
Solve the problem. A group of students were asked if they carry a credit card. The responses are listed in the table. has cc no cc 24 36 37 3 If a student is selected at random, find the probability that he or she owns a credit card given that the student is a freshman. Round your answer to three decimal places. 0.400 0.240 0.600 0.393
0.400
Use the z-table (quiz 5 #3) to find the shaded area.
0.4878
Solve the problem. A group of students were asked if they carry a credit card. The responses are listed in the table. class cc carrier no cc Freshman 13 47 Sophomore 22 18 If a student is selected at random, find the probability that he or she owns a credit card given that the student is a sophomore. Round your answer to three decimal places. 0.550 0.450 0.220 0.629
0.550
Solve the problem. Find the area of the indicated region under the standard normal curve. 0.309 0.6562 1.309 0.3438
0.6562
Solve the problem. A group of students were asked if they carry a credit card. The responses are listed in the table class has cc no cc freshman 11 49 sophomore 27 13 0.289 0.270 0.711 0.980
0.711
Solve the problem. Find the area under the standard normal curve to the right of z = -1.25. 0.7193 0.6978 0.5843 0.8944
0.8944
Solve the problem. Find the standardized test statistic t for a sample with n = 15, x̄ = 5.4, s = 0.8, and α = 0.05 if H0: μ ≤ 5.1. Round your answer to three decimal places. 1.728 1.452 1.631 1.312
1.452
Solve the problem. A researcher found a significant relationship between a person's age, x1, the number of hours a person works per week, x2, and the number of accidents, y, the person has per year. The relationship can be represented by the multiple regression equation y = -3.2 + 0.012x1 + 0.23x2. Predict the number of accidents per year (to the nearest whole number) for a person whose age is 37 and who works 54 hours per week. 10 11 9 12
10
Solve the problem. The access code to a house's security system consists of five digits. How many different codes are available if each digit can be repeated? 3125 5 100,000 32
100,000
Solve the problem. Use Bayes' theorem to solve this problem. A storeowner purchases stereos from two companies. From Company A, 550 stereos are purchased and 1% are found to be defective. From Company B, 850 stereos are purchased and 6% are found to be defective. Given that a stereo is defective, find the probability that it came from Company A. 66/113 11/113 102/113 17/113
11/113
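Bayes' theorem here is P(A | defective) = P(A)·P(D | A) / [P(A)·P(D | A) + P(B)·P(D | B)]. A Python sketch with exact fractions (variable names are illustrative):

```python
from fractions import Fraction

count_a, count_b = 550, 850
p_def_given_a = Fraction(1, 100)   # 1% of Company A stereos are defective
p_def_given_b = Fraction(6, 100)   # 6% of Company B stereos are defective

total = count_a + count_b
p_a, p_b = Fraction(count_a, total), Fraction(count_b, total)

# Bayes' theorem: share of all defective stereos that came from Company A.
p_a_given_def = (p_a * p_def_given_a) / (p_a * p_def_given_a + p_b * p_def_given_b)

print(p_a_given_def)  # 11/113
```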
Solve the problem. The Environmental Protection Agency must visit nine factories for complaints of air pollution. In how many different ways can a representative visit five of these to investigate this week? 45 362,880 15,120 5
15,120
Solve the problem. For the following data, approximate the mean number of phone calls per day. 8-11 31 12-15 34 16-19 28 20-23 30 24-27 6 16 15 14 26 17
16
B. 175 219.5 299 350 549 (Five-number summary consists of the minimum, 1st Quartile, Median, 3rd Quartile, and maximum)
175 199 205 234 259 275 299 304 317 345 355 384 549 What is the five-number summary? A. 175 234 299 345 549 B. 175 219.5 299 350 549 C. 175 219.5 299 350 384 D. 175 234 299 331 549
C. One outlier: 549
175 199 205 234 259 275 299 304 317 345 355 384 549 IQR = Q3 - Q1 = 350 - 219.5 = 130.5 Which of the following is true? A. No outliers present B. One outlier: 175 C. One outlier: 549 D. Two outliers: 175 and 549
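The five-number summary and the 1.5 IQR outlier check can be reproduced in a few lines. This sketch assumes the median-of-halves convention for quartiles (the median is left out of both halves when n is odd), which matches answer B above; other quartile conventions give slightly different Q1 and Q3:

```python
from statistics import median

data = sorted([175, 199, 205, 234, 259, 275, 299, 304, 317, 345, 355, 384, 549])
n = len(data)
half = n // 2

q2 = median(data)
q1 = median(data[:half])            # lower half, median excluded when n is odd
q3 = median(data[half + n % 2:])    # upper half, median excluded when n is odd
iqr = q3 - q1

# 1.5 IQR criterion: anything beyond the fences is a suspected outlier.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in data if v < low_fence or v > high_fence]

print(data[0], q1, q2, q3, data[-1])  # 175 219.5 299 350.0 549
print(outliers)                       # [549]
```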
bimodal
2 numbers are most occurring
Solve the problem. A card is drawn from a standard deck of 52 playing cards. Find the probability that the card is an ace or a king. 4/13 2/13 8/13 1/13
2/13
Solve the problem. The grade point averages for 10 students are listed below. Find the range of the data. 2.0 3.2 1.8 2.9 .9 4.0 3.3 2.9 3.6 .8 2.8 1.4 2.45 3.2
3.2
Provide an appropriate response. The mean SAT verbal score is 478, with a standard deviation of 98. Use the Empirical Rule to determine what percent of the scores lie between 380 and 478. (Assume the data set has a bell-shaped distribution.) 34% 47.5% 68% 49.9%
34%
Solve the problem. SAT verbal scores are normally distributed with a mean of 426 and a standard deviation of 94. Use the Empirical Rule to determine what percent of the scores lie between 332 and 426. 49.9% 34% 68% 47.5%
34%
Solve the problem. A random sample of 40 students has a test score average with a standard deviation of 11.7. Find the margin of error if c = 0.98. 1.81 1.85 4.31 0.68
4.31
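The margin of error for a mean is E = z_c · s / √n. A quick Python check; z_c = 2.33 is the rounded z-table value for c = 0.98 and is an assumption here (a more precise 2.326 gives about 4.30):

```python
from math import sqrt

z_c = 2.33      # assumed z-table critical value for c = 0.98
s, n = 11.7, 40

margin = z_c * s / sqrt(n)

print(round(margin, 2))  # 4.31
```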
Provide an appropriate response. A teacher gives a 20-point quiz to 10 students. The scores are listed below. What percentile corresponds to the score of 12? 80 8 10 7 15 16 12 19 14 9 13 12 25 40
40
Provide an appropriate response. The lengths of phone calls from one household (in minutes) were 2, 4, 6, 7, and 8 minutes. Find the midrange for this data. 2 minutes 5 minutes 6 minutes 10 minutes
5 min
Solve the problem. For the following data set, approximate the sample standard deviation of phone calls per day. 8-11 18 12-15 23 16-19 38 20-23 47 24-27 32 2.9 18.8 3.2 5.1
5.1
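For grouped data, each class is represented by its midpoint and weighted by its frequency. A Python sketch of the sample standard deviation for this frequency table (variable names are illustrative):

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each class
classes = [(8, 11, 18), (12, 15, 23), (16, 19, 38), (20, 23, 47), (24, 27, 32)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]
n = sum(freqs)

mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
# Sample variance: divide by n - 1, not n.
variance = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1)
sd = sqrt(variance)

print(round(mean, 1), round(sd, 1))  # 18.8 5.1
```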
Solve the problem. The scores of the top ten finishers in a recent LPGA Valley of the Stars Tournament are listed below. (Source: Los Angeles Times) 71 67 67 72 73 68 72 72 Find the mode score. 67 73 72 76
72
Solve the problem. IQ test scores are normally distributed with a mean of 100 and a standard deviation of 15. Find the x-score that corresponds to a z-score of -1.645. 82.3 75.3 91.0 79.1
75.3
Solve the problem. Use the ogive below to approximate the number in the sample. 28 100 80 341
80
Solve the problem. Given H0: p = 0.85 and α = 0.10, which level of confidence should you use to test the claim? 80% 95% 99% 90%
90%
For distributions with a "Normal shape" (kinda shaped like a unimodal graph) ----approximately ______% falls within 2 SD of the mean
95%
Confounding Variable
A confounding variable occurs when an experimenter cannot tell the difference between the effects of different factors on the variable. Ex: To attract more customers, a coffee shop owner experiments by remodeling the shop using bright colors. At the same time, a shopping mall nearby has its grand opening. If business increases, it cannot be determined whether it was from the remodel or the opening of the shopping mall.
B. $14 (12+15+17+22+14)/5 = 16
A consumer group surveyed the prices for white cotton extra-long twin sheet sets in five different department stores and reported the average price as $16. We visited four of the five stores, and found the prices to be $12, $15, $17, and $22. Assuming that the consumer group is correct, what is the price of the item at the store that we did not visit? A. $10 B. $14 C. $15 D. $17
Left-skewed data
A data set where mean < median
E. B and C
A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Which variables are quantitative? A. Type of car the student owns B. Number of credit hours taken during that semester C. The time the student waited in line at the bookstore to pay for his/her textbooks D. Home state of the student E. B and C
event
A subset of a sample space; may consist of 1 or more outcomes
Survey
A survey is an investigation of one or more characteristics of a population. Most often, surveys are carried out on people by asking them questions. The most common types of surveys are done by interview, Internet, phone, or mail. In designing a survey, it is important to word the questions so that they do not lead to biased results, which are not representative of a population. Ex: A survey is conducted on a sample of female physicians to determine whether the primary reason for their career choice is financial stability.
Higher kurtosis than normal
An S-shaped Q-Q plot shows the data has ___________________________________.
Wider
As the confidence coefficient of a confidence interval increases the intervals get _______________.
A candidate for governor of a particular state claims to be favored by at least half of the voters. State this claim mathematically. Write the null and alternative hypotheses. Identify which hypothesis is the claim.
Claim: p≥0.5 H0: p≥0.5 Ha: p<0.5 Claim is H0 (see 7.1 in class practice)
The dean of a major university claims that the mean time for students to earn a Master's degree is at most 4.2 years. Write the null and alternative hypotheses. Identify which hypothesis is the claim.
Claim: µ≤4.2 H0: µ≤4.2 Ha: µ>4.2 Claim is H0 (see 7.1 in class practice)
Measures of dispersion
Determine how variable the data is
Measures of location
Determine where the centre of the data is
These events ARE mutually exclusive, since it IS NOT possible for a voter to have legally voted for the president in both South Carolina and Texas
Determine whether the following events are mutually exclusive. Event A: Randomly select a voter who legally voted for the president in South Carolina. Event B: Randomly select a voter who legally voted for the president in Texas.
A. -1.5, because probability values cannot be less than 0 D. 64/25, because probability values cannot be greater than 1
Determine which numbers could not be used to represent the probability of an event (2 answers) A. -1.5, because probability values cannot be less than 0 B. 320/1058, because probability values cannot be in fraction form C. 0.0002, because probability values must be rounded to two decimal places D. 64/25, because probability values cannot be greater than 1 E. 33.3%, this is because probability values cannot be greater than 1 F. 0, because probability values must be greater than 0
Skewness
Determines if the data is symmetric
Study error
Difference between the target population and study population
If neither the subjects nor the researchers know who was assigned what treatment
Double blind
Unit
Each member of a study population
The P-value for a hypothesis test is P = 0.034. Do you reject or fail to reject H0 when the level of significance is alpha=0.01
Fail to reject because P > alpha (0.034 > 0.01) (see 7.2 in class practice)
B. Yes, because the probabilities sum to 1 and are all between 1 and 0, inclusive
Is the distribution a discrete probability distribution A. No, because some of the probabilities have values greater than 1 or less than 0 B. Yes, because the probabilities sum to 1 and are all between 1 and 0, inclusive C. No, because the total probability is not equal to 1 D. Yes, because the distribution is symmetric
difference between parameter and statistic
It is important to note that a sample statistic can differ from sample to sample, whereas a population parameter is constant for a population.
Has no modes, no value around which the observations are concentrated.
Uniform
Suppose P(A)=0.3, P(B)=0.4, and P(A and B)=0.13 a. State if A and B are independent or dependent events explain why. b. State if A and B are mutually exclusive events or not and explain why.
a. Dependent because 0.3 x 0.4 ≠ .13 so the outcome of event A influences the outcome of event B b. No because 0.13 is not 0, so there is a slight chance it could be both A and B (see test 2 #15)
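Both checks in the answer above reduce to comparing P(A and B) against a reference value. A minimal sketch, using the probabilities from the card (the variable names are illustrative):

```python
import math

p_a, p_b, p_a_and_b = 0.3, 0.4, 0.13

# Independent events satisfy P(A and B) = P(A) * P(B); here 0.12 != 0.13.
independent = math.isclose(p_a * p_b, p_a_and_b)

# Mutually exclusive events satisfy P(A and B) = 0.
mutually_exclusive = math.isclose(p_a_and_b, 0.0, abs_tol=1e-12)

print(independent, mutually_exclusive)
```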
Empirical Probability
calculated from the results of an experiment
symmetric
can be divided and create 2 symmetrical sides
Solve the problem. From past figures, it is predicted that 43% of the registered voters in California will vote in the June primary. Does this statement describe: descriptive statistics? inferential statistics?
inferential statistics
Solve the problem. The chances of winning the California Lottery are one chance in twenty-two million. Does this statement describe: inferential statistics? Descriptive statistics?
inferential statistics
measures the variability of a distribution by giving us the range covered by the middle 50% of the data
inter-quartile range
Solve the problem. Identify the level of measurement for data that are the temperature of 90 refrigerators.
interval
Given H0: μ ≤ 25 and Ha: μ > 25, determine whether the hypothesis test is left-tailed, right-tailed, or two-tailed. left-tailed right-tailed two-tailed
right-tailed
Round-Off Rule
round off to one more decimal place than occurs in the values of the variable
example: r = 0.931 what's the correlation?
strong and linear
Identify the sampling technique used. Every fifth person boarding a plane is searched thoroughly.
systematic
may not be subject to any clear bias, but it wouldn't be as safe as taking a random sample
systematic sampling
"N", when calculating the median, refers to
the number of observations
Solve the problem. The data below are the number of absences and the final grades of 9 randomly selected students from a statistics class. Find the equation of the regression line for the given data. # of absences x: 0 3 6 4 9 2 15 8 5 Final grade y: 98 86 80 82 71 92 55 76 82 y= 96.14x - 2.75 y= -96.14x + 2.75 y= -2.75x + 96.14 y= -2.75x - 96.14
y= -2.75x + 96.14
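The regression line above can be reproduced with the usual least-squares formulas, b = Sxy / Sxx and a = ȳ - b·x̄. This is a sketch only: it assumes the nine final grades read 98, 86, 80, 82, 71, 92, 55, 76, 82, and the function name `least_squares_line` is chosen for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

absences = [0, 3, 6, 4, 9, 2, 15, 8, 5]
grades = [98, 86, 80, 82, 71, 92, 55, 76, 82]  # assumed reading of the data

intercept, slope = least_squares_line(absences, grades)
print(round(slope, 2), round(intercept, 2))
```

With these values the slope and intercept round to -2.75 and 96.14, matching the listed answer.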
Population
μ
Solve the problem. A coin is tossed. Find the probability that the result is heads. 0.9 1 0.5 0.1
0.5
U-shaped
The Q-Q plot for an exponential distribution is _______________________________.
In a split stemplot, the first leaf holds numbers ________ and the second leaf holds numbers _________
0-4, 5-9
Solve the problem. If one card is drawn from a standard deck of 52 playing cards, what is the probability of drawing an ace? 1/2 1/4 1/52 1/13
1/13
Mean Absolute Deviation
1/n*(sum of absolute differences from the mean)
Data measured over time
A run chart is a good way to record ____________________________.
B. Continuous, because distance is a random variable that is uncountable
Decide whether the graph represents a discrete random variable or a continuous random variable Distance a baseball travels after being hit A. Discrete, because distance is a random variable that is countable B. Continuous, because distance is a random variable that is uncountable
Interquartile Range (IQR)
IQR=Q3-Q1
Kurtosis
Indicates the frequency of extreme observations in the data
K=3
Kurtosis of a normal random variable
Left-skewed
Longer left tail
Right-skewed
Longer right tail
What is the formula for locating the median if N is an ODD NUMBER
(N+1)/2 (the position of the median in the ordered data)
n, 2n
The mean and variance of a Chi-squared(n) distribution are ______________ and __________________.
Median value
The middle line on a boxplot represents the _____________________________ of the data.
Class with the highest jump to the next class
The mode can be found from an ECDF graph by choosing the _______________________________________.
Study population
The set of elements from which a sample is actually selected
Provide an appropriate response. In a random sample, 10 students were asked to compute the distance they travel one way to school to the nearest tenth of a mile. The data is listed below. a) If a constant value k is added to each value, how will the standard deviation be affected? b) If each value is multiplied by a constant k, how will the standard deviation be affected? 1.1 5.2 3.6 5.0 4.8 1.8 2.2 5.2 1.5 0.8
a) The standard deviation will not be affected. b) The standard deviation will be multiplied by |k|.
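Adding a constant shifts every value equally and leaves the spread alone, while multiplying by k scales the standard deviation by |k|. A quick numerical check on the distances from the card, with k = 3 picked arbitrarily for illustration:

```python
import math
import statistics

distances = [1.1, 5.2, 3.6, 5.0, 4.8, 1.8, 2.2, 5.2, 1.5, 0.8]
k = 3.0  # hypothetical constant, chosen only for this demonstration

sd = statistics.stdev(distances)
sd_shifted = statistics.stdev([d + k for d in distances])  # add k to each value
sd_scaled = statistics.stdev([d * k for d in distances])   # multiply each value by k

print(math.isclose(sd_shifted, sd), math.isclose(sd_scaled, abs(k) * sd))
```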
Stratified sampling
The type of sampling where researchers divide the population into subgroups, then randomly select a proportional number of individuals from those groups is called...
Max value <= Q3+1.5IQR
The upper whisker of a box and whisker plot ends at __________________________.
The MLE of theta
The value of theta that maximizes the likelihood function of theta given the data is called ______________________.
Scatter plot
This graph can be used to check for a sample correlation by looking at the linear pattern of dots.
Placebo Effect
This occurs when a subject reacts favorably to a placebo when in fact the subject has been given a fake treatment.
Estimation problem
This type of problem involves guessing the most likely value of a variable from the data.
Three Key Elements of a Well-Designed Experiment
Three key elements of a well-designed experiment are control, randomization, and replication. Because experiments can be ruined by a variety of factors, being able to control these influential factors is important.
b^2*(s_x)^2
Variance of Y = a + bX with Var(X) = (s_x)^2
Stratified Sampling
When it is important for the sample to have members from each segment of the population, you should use stratified sampling. Depending on the focus of the study, members of the same population are divided into two or more subsets, called strata, that share a similar characteristic. A sample is then randomly selected from each of the strata. Using a stratified sample ensures that each segment of the population is represented. Ex: To collect a stratified sample of the number of people who live in Calcasieu Parish households, you could divide the households into socioeconomic levels then randomly select households from each level.
A. II and III
Which is true of the data shown in the histogram? (Histogram is skewed left) I. The distribution is skewed to the right II. The mean is probably smaller than the median III. We should use the median and IQR to summarize these data
D. II and IV
Which of the following is a basic experimental principle? I. Including men and women in the experiment II. Randomization III. Having at least three treatments IV. Replication
The explanatory variable goes on the ____ axis
X
The response variable goes on the _____ axis
Y
the slope and intercept of the least square regression line are found using this equation (2/2)
a = ȳ - b·x̄ (ȳ and x̄ are the sample means of y and x)
Variable
a characteristic of interest
Statistic
a numerical description of a sample characteristic
relative frequency
a portion or percent of the data that falls in the class; to find the relative frequency, divide f (frequency) by n (sample size)
frequency distribution
a table that shows classes or intervals of data entries with a count of the number of entries in each class; the frequency (f) of a class is the # of data entries in that class
Single Blinding
a technique in which the subject doesn't know whether he or she is receiving a treatment or a placebo
Unbiased Samples
allow each subject in the population an equal chance of being selected
Probability Experiment
an action, or trial, through which specific results (counts, measurements, or responses) are obtained
Solve the problem. Find the range of the data set represented by the graph. [Graph not shown: vertical axis from 0 to 20, horizontal axis from 1 to 7] a) 20 b) 17 c) 6 d) 5
c) 6
Solve the problem. Based on previous clients, a marriage counselor concludes that the majority of marriages that begin with cohabitation before marriage will result in divorce. Does this statement describe inferential statistics or descriptive statistics? inferential statistics descriptive statistics
inferential statistics
descriptive statistics
is the branch of statistics that involves the organization, summarization, and display of data. Ex: Tables, charts, averages
replication
is the repetition of an experiment under the same or similar conditions.
parameter
numerical description of a population characteristic. Ex: Average age of all people in the United States
an observation is considered a suspected outlier if it is:
less than Q1 - 1.5 (IQR) more than Q3 + 1.5 (IQR)
Skewed Left Distribution
longer left tail
Skewed Right
longer right tail
Ratio
same as interval except zero is a value that has meaning ex. distance
not all relationships can be classified as positive or negative
neither positive nor negative
Double Blinding
neither subjects nor researchers know who is receiving a treatment and who is receiving a placebo
Recruit participants for a study. Give them journals to record hour by hour their activities for the following day, including when they watch TV and when they consume snacks. Determine if snack consumption is higher during TV times. This is an example of an _______ ________
observational study
the values of the variables of interest are recorded as they naturally occur
observational study
Identify the data set's level of measurement. manuscripts rated "acceptable" or "unacceptable"
ordinal
Identify the data set's level of measurement. the final grades (A, B, C, D, and F) for students in a statistics class
ordinal
Identify the data set's level of measurement. the ratings of a movie ranging from "poor" to "good" to "excellent"
ordinal
Solve the problem. Identify the level of measurement for data that are the numbers on the shirts of a girl's soccer team. ratio ordinal nominal interval
nominal
In a histogram, the SPREAD refers to the...
range
Event
set of outcomes for a statistical experiment
Intersecting Events
share at least one outcome; overlapping events
Decide which method of data collection you would use to collect data for the study. Specify either observational study, experiment, simulation, or survey. A study where you would like to determine the chance getting three girls in a family of three children
simulation
Outcome
the result of a single trial in a probability experiment
skewed right
the tail is on the right side
Classical (theoretical) probability
used when each outcome in a sample space is equally likely to occur P(E)= # of outcomes in E/ total sample space
cluster
uses clusters (naturally occurring sub groups)
stratified
uses strata (a shared characteristic)
Data
values (observations) the variable can assume
where individuals have selected themselves to be included; guaranteed to be biased
volunteer sample
sample size
which is the number of subjects in a study, is another important part of experimental design.
sample
x bar
Use the z table to find the z score which has an area closest to 0.3 to the left under the normal curve. Find the corresponding x value for a normal distribution with a mean of 28 and a standard deviation of 3.4.
x=26.2150 (see test 3 #14)
equation of a straight line
y = a + bx
Solve the problem. Find the equation of the regression line for the given data. x: -5 -3 4 1 -1 -2 0 2 3 -4 y: 11 6 -6 -1 3 4 1 -4 -5 8 y= -1.885x + 0.758 y= -0.758x - 1.885 y= 0.758x + 1.885 y= 1.885x - 0.758
y= -1.885x + 0.758
Solve the problem. Seven guests are invited for dinner. How many ways can they be seated at a dinner table if the table is straight with seats only on one side? 4 720 40,320 5040
5040
Solve the problem. A single six-sided die is rolled. Find the probability of rolling a number less than 3. 0.25 0.333 0.5 0.1
0.333
For distributions with a "Normal shape" (kinda shaped like a unimodal graph): approximately ______% of the observations fall within 1 SD of the mean
68%
Solve the problem. Assume that blood pressure readings are normally distributed with a mean of 116 and a standard deviation of 4.8. If 36 people are randomly selected, find the probability that their mean blood pressure will be less than 118. 0.8615 0.9938 0.0062 0.8819
0.9938
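For a sample mean, the standard deviation in the z-score is replaced by the standard error, SD / sqrt(n). A minimal sketch of the calculation above using Python's standard-library `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 116, 4.8, 36

# Standard error of the sample mean
standard_error = sigma / sqrt(n)

# P(sample mean < 118) under the sampling distribution of the mean
prob = NormalDist(mu, standard_error).cdf(118)
print(round(prob, 4))
```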
Solve the problem. For the following data, approximate the mean number of phone calls per day. Phone calls (per day) / Frequency: 8-11 48 12-15 16 16-19 42 20-23 34 24-27 45 16 37 19 17 18
18
S-shaped
The Q-Q plot for a uniform distribution is ___________________________.
Solve the problem. Find the critical value for a two-tailed test with α = 0.01 and n = 30. ±1.96 ±2.575 ±2.33 ±1.645
±2.575
Solve the problem. A student receives test scores of 62, 83, and 91. The student's final exam score is 88 and homework score is 76. Each test is worth 20% of the final grade, the final exam is 25% of the final grade, and the homework grade is 15% of the final grade. What is the student's mean score in the class? 90.6 76.6 80.6 85.6
80.6
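A weighted mean multiplies each score by its weight and divides by the total weight. A short sketch with the numbers from the card (variable names are illustrative):

```python
scores = [62, 83, 91, 88, 76]             # three tests, final exam, homework
weights = [0.20, 0.20, 0.20, 0.25, 0.15]  # share of the final grade

weighted_mean = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
print(round(weighted_mean, 1))
```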
Solve the problem. SAT verbal scores are normally distributed with a mean of 450 and a standard deviation of 100. Use the Empirical Rule to determine what percent of the scores lie between 250 and 550. 83.9% 68% 34% 81.5%
81.5%
Example: researchers want to determine if ppl tend to snack more while they watch TV.
Prospective example
In regards to the "4 possibilities for role-type classifications" what would Time and driving test outcome be?
Q --> C
In regards to the "4 possibilities for role-type classifications" what would SAT score and GPA of freshman be?
Q --> Q
In interquartile range, finding the median of the lower 50% is finding...
Q1
IQR
Q3 - Q1
How do you find the IQR?
Q3-Q1
T/F: in a randomized controlled experiment, we can draw causal conclusions
True
assesses the strength of a linear relationship; denoted by "r"
correlation coefficient
Continuous Data
data that can be measured, includes fractions and decimals
Nominal Data
data that can be placed into categories but cannot be ordered or ranked ex. favorite breakfast cereal
Qualitative Data
data that can be placed into categories based on some characteristic or quality
Quantitative Data
data that is numerical in nature
Frequency Histogram
A bar graph that represents the frequency distribution of a data set. (Bars must touch) x-axis= class boundaries y-axis= frequencies
Q-Q plot
Plots of the data's quantiles against the quantiles of the N(0,1) distribution
Sample
Portion of a population used to make predictions about its properties
Cholesterol levels of 1,000 adults - we'd expect cholesterol levels of adults to consist of a few low numbers and a few very high #'s, with most in the middle
Unimodal graph
I and II
Which of the following summaries are changed by adding a constant to each data value? I. The mean II. The median III. The standard deviation
Solve the problem. The data below are the ages and systolic blood pressures (measured in millimeters of mercury) of 9 randomly selected adults. Calculate the correlation coefficient, r. Age x: 41 44 48 51 54 56 60 64 68 Pressure, y: 111 115 118 126 137 140 143 145 147 0.908 0.890 0.960 0.998
0.960
Solve the problem. Calculate the correlation coefficient, r, for the data below. x -9 -7 0 -3 -5 -6 -4 -2 -1 -8 y -2 0 17 9 6 2 7 11 14 0 0.990 0.819 0.792 0.881
0.990
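The correlation coefficient above is r = Sxy / sqrt(Sxx * Syy). A minimal sketch using only the standard library; the function name `pearson_r` is an assumption for illustration:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient from sums of squared deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

x = [-9, -7, 0, -3, -5, -6, -4, -2, -1, -8]
y = [-2, 0, 17, 9, 6, 2, 7, 11, 14, 0]

r = pearson_r(x, y)
print(round(r, 3))
```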
Solve the problem. The table lists the smoking habits of a group of college students. no yes Heavy man 135 41 5 woman 187 21 5 If a student is chosen at random, find the probability of getting someone who is a man or a woman. Round your answer to three decimal places. 0.918 0.197 0.803 1
1
Ex 2: Decide whether each number describes a population parameter or a sample statistic. 1. A survey of several hundred collegiate student-athletes in the United States found that, during the season of their sport, the average time spent on athletics by student-athletes is 50 hours per week. (Source: Penn Schoen Berland) 2. The freshman class at a university has an average SAT math score of 514. 3. In a random check of several hundred retail stores, the Food and Drug Administration found that 34% of the stores were not storing fish at the proper temperature.
1. The average of 50 hours is based on a subset of the population, so it is a sample statistic. 2. The average SAT math score of 514 is based on the entire freshman class at this particular university, so it is a population parameter. 3. The 34% of stores not storing fish at the proper temperature is based on a subset of the population, so it is a sample statistic.
Ex 3: For each study, identify the population and the sample. Then determine which part of the study represents the descriptive branch of statistics. What conclusions might be drawn from the study using inferential statistics?
1. A study of 2560 U.S. adults found that of adults not using the Internet, 23% are from households earning less than $30,000 annually, as shown in the figure. Population: responses of all U.S. adults. Sample: responses of the 2560 U.S. adults in the study. Descriptive: 23% are from households earning less than $30,000 annually. Inference: lower-income households typically cannot afford Internet access and are less likely to have it. 2. A study of 300 Wall Street analysts found that the percentage who incorrectly forecasted high-tech earnings in a recent year was 44%. Population: high-tech earnings forecasts of all Wall Street analysts. Sample: forecasts of the 300 Wall Street analysts in the study. Descriptive: the percentage who incorrectly forecasted high-tech earnings in a recent year was 44%. Inference: the stock market is difficult to forecast, even for professionals.
Designing a Statistical Study
1. Identify the variable(s) of interest (the focus) and the population of the study. 2. Develop a detailed plan for collecting data. If you use a sample, make sure the sample is representative of the population. 3. Collect data. 4. Describe the data, using the descriptive statistics techniques. 5. Interpret the data and make decisions about the population using inferential statistics. 6. Identify any possible errors.
Use the four conditions of a Binomial Experiment to determine if the situation represents a Binomial experiment. Specifically state why it does or does not meet each of the four conditions. Selecting 5 cards, one at a time without replacement, from a standard deck of cards. The random variable is the number of red cards obtained.
1. Set number of trials (5) 2. Success or failure (success = red cards) 3. x = number of successes (number of red cards) 4. Independent - This is not binomial because without replacement the probability changes and the chance of success depends on the cards already chosen (see test 3 #3)
Solve the problem. Identify the class width used in the frequency distribution. Miles (per day) Frequency 1 - 6 28 7 - 12 21 13 - 18 8 19 - 24 11 7 6 5 12
6
The distribution of ages for the winners of the Tour de France from 1903 to 2012 is approximately bell shaped. The mean age is 28.1 years with a standard deviation of 3.4 years. Find the z score for the age of Bradley Wiggins who won in 2012 at the age of 32.
1.15 (see quiz 3)
Solve the problem. The heights (in inches) of 10 adult males are listed below. Find the sample standard deviation. 70 72 71 70 69 73 69 68 70 71 1.49 2.38 3 70
1.49
Solve the problem. Find the standardized test statistic t for a sample with n = 25, x = 21, s = 3, and α = 0.005 if Ha: μ > 20. Round your answer to three decimal places. 1.997 1.239 1.667 1.452
1.667
Solve the problem. You wish to test the claim that μ > 23 at a level of significance of α = 0.05 and are given sample statistics n = 50, x = 23.3, and s = 1.2. Compute the value of the standardized test statistic. Round your answer to two decimal places. 3.11 0.98 2.31 1.77
1.77
Solve the problem. You wish to test the claim that μ > 6 at a level of significance of α = 0.05 and are given sample statistics n = 50, x = 6.3, and s = 1.2. Compute the value of the standardized test statistic. Round your answer to two decimal places. 2.31 1.77 3.11 0.98
1.77
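The standardized test statistic for a mean is (sample mean - hypothesized mean) / (s / sqrt(n)). A brief sketch, with the helper name `standardized_statistic` assumed for illustration:

```python
from math import sqrt

def standardized_statistic(sample_mean, mu0, s, n):
    """Distance of the sample mean from mu0, in standard-error units."""
    return (sample_mean - mu0) / (s / sqrt(n))

stat = standardized_statistic(sample_mean=6.3, mu0=6, s=1.2, n=50)
print(round(stat, 2))
```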
Solve the problem. Given a sample with r = 0.321, n = 30, and α = 0.10, determine the standardized test statistic t necessary to test the claim ρ = 0. Round answers to three decimal places. 1.793 2.561 3.198 2.354
1.793
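Testing ρ = 0 uses t = r / sqrt((1 - r²) / (n - 2)) with n - 2 degrees of freedom. A minimal sketch for the values above (the function name is illustrative):

```python
from math import sqrt

def t_for_correlation(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / sqrt((1 - r ** 2) / (n - 2))

t_r = t_for_correlation(r=0.321, n=30)
print(round(t_r, 3))
```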
Find the standardized test statistic t for a sample with n = 12, x = 18.2, s = 2.2, and α = 0.01 if H0: μ = 17. Round your answer to three decimal places. 1.890 2.001 1.991 2.132
1.890
The cholesterol levels (in milligrams per deciliter) of 30 adults are listed below. Find Q1. 154 156 165 165 170 171 172 180 184 185 189 189 190 192 195 198 198 200 200 200 205 205 211 215 220 220 225 238 255 265
180
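One common convention, matching the "median of the lower 50%" description used elsewhere in these notes, splits the ordered data in half and takes the median of each half. The sketch below applies that convention to the cholesterol data and then the 1.5 × IQR rule for suspected outliers; other quartile conventions can give slightly different values.

```python
from statistics import median

cholesterol = [154, 156, 165, 165, 170, 171, 172, 180, 184, 185,
               189, 189, 190, 192, 195, 198, 198, 200, 200, 200,
               205, 205, 211, 215, 220, 220, 225, 238, 255, 265]

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded if n is odd)."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return median(lower), median(upper)

q1, q3 = quartiles(cholesterol)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
suspects = [v for v in cholesterol if v < low_fence or v > high_fence]
print(q1, q3, iqr, suspects)
```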
Solve the problem. A sample of candies have weights that vary from 2.35 grams to 4.75 grams. Use this information to find the upper and lower limits of the first class if you wish to construct a frequency distribution with 12 classes. 2.35- 2.75 2.35- 2.65 2.35- 2.54 2.35- 2.55
2.35- 2.54
Solve the problem. The data below are the final exam scores of 10 randomly selected statistics students and the number of hours they studied for the exam. Find the standard error of estimate, se, given that y = 5.044x + 56.11. hrs x: 3 5 2 8 2 4 4 5 6 3 scores y: 65 80 60 88 66 78 85 90 90 71 9.875 7.913 8.912 6.305
6.305
Solve the problem. How many ways can five people, A, B, C, D, and E, sit in a row at a movie theater if C must sit to the right of but not necessarily next to B? 48 20 60 24
60
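By symmetry, C sits to the right of B in exactly half of the 5! = 120 arrangements. A brute-force check that enumerates every seating:

```python
from itertools import permutations

people = "ABCDE"

# Count rows in which C's seat index is greater than B's (C is to the right of B)
count = sum(1 for row in permutations(people)
            if row.index("C") > row.index("B"))
print(count)
```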
Provide an appropriate response. The test scores of 30 students are listed below. Find P30. 31 41 45 48 52 55 56 63 65 67 67 69 70 70 74 75 78 79 79 80 81 83 85 85 87 90 92 95 99 67 56 90 63
63
Event C has 5 outcomes NO, because event C has MORE THAN one outcome
A computer is used to randomly select a number between 1 and 9, inclusive. Event C is selecting a number less than 6. Event C has __ outcome(s). Is the event a simple event?
Symmetric data
A data set where the mean and median are approximately equal.
D. All homes in the posh California neighborhood D. The sales price of a home
A real estate broker wishes to estimate the average sales price of homes in a posh California neighborhood. To do so, she samples 50 recently sold homes in the neighborhood and finds the average sales price of the 50 homes to be $370,000. The population of interest to the broker is... A. The 50 recently sold homes B. All homes in the U.S. C. All homes in California D. All homes in the posh California neighborhood E. All homes with a sales price of $350,000 or more The variable of interest is... A. The size of a home B. The average age of the home C. The number of homes in the neighborhood D. The sales price of a home
B. Has one factor (shampoo type) blocked by gender and whether hair is dyed
A researcher wants to compare the effect of a new type of shampoo on hair condition. The researcher believes that men and women may react to the shampoo differently. Additionally, the researcher believes that the shampoo will react differently on hair that is dyed. The subjects are split into four groups: men who dye their hair; men who do not dye their hair; women who dye their hair; women who do not dye their hair. Subjects in each group are randomly assigned to the new shampoo and the old shampoo. This experiment... A. Has two factors (shampoo type and whether hair is dyed) blocked by gender B. Has one factor (shampoo type) blocked by gender and whether hair is dyed C. Has three factors (shampoo type, gender, whether hair is dyed) D. Is completely randomized E. Has two factors (gender and whether hair is dyed) blocked by shampoo type
No evidence of a linear relationship
A value of rxy close to 0 indicates ____________.
Stronger evidence of a linear relationship
A value of rxy closer to 1 indicates ____________.
A. Material for insulators B. 2 - Baking temperature and cooling method C. 8 D. Likeliness to break during adverse weather
Ceramics engineers are testing a new formulation for the material used to make insulators for power lines. They will try baking the insulators at four different temperatures, followed by either slow or rapid cooling. They want to try every combination of the baking and cooling options to see which produces insulators least likely to break during adverse weather conditions A. What are the experimental units? B. How many factors are there? C. How many treatments are there? D. What is the response variable?
Causation
Ceteris paribus a change in the explanatory variable results in a change in the response variable.
Alternate hypothesis
Challenging hypothesis to H0; if we get enough evidence we may reject H0 and accept this alternate hypothesis.
Use the minimum and maximum data entries and the number of classes to determine the class width, lower class limits, and upper class limits: min = 9 max = 64 number of classes = 7
Class width: (64-9)/7 ≈ 7.86, rounded up to 8 Lower limits: 9, 17, 25, 33, 41, 49, 57 Upper limits: 16, 24, 32, 40, 48, 56, 64 (see quiz 2)
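The class-limit steps above can be sketched as follows. This assumes integer-valued data, so each upper limit is one less than the next lower limit; the function name `class_limits` is chosen for illustration.

```python
from math import ceil

def class_limits(minimum, maximum, num_classes):
    """Class width plus lower and upper class limits for integer data."""
    # Round the width up; note some textbooks also bump an exact integer
    # quotient up to the next whole number, which this sketch does not do.
    width = ceil((maximum - minimum) / num_classes)
    lowers = [minimum + i * width for i in range(num_classes)]
    uppers = [lo + width - 1 for lo in lowers]
    return width, lowers, uppers

width, lowers, uppers = class_limits(9, 64, 7)
print(width, lowers, uppers)
```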
Data at the nominal (name) level of measurement
Data at the nominal level of measurement are qualitative only. Data at this level are categorized using names, labels, or qualities. No mathematical computations can be made at this level.
Data at the ordinal (order/rank) level of measurement
Data at the ordinal level of measurement are qualitative or quantitative. Data at this level can be arranged in order, or ranked, but differences between data entries are not meaningful
Lower quartile
Data value below which 25% of the data lies
A. The two events are independent because the occurrence of one does not affect the probability of the occurrence of the other
For the given pair of events, classify the two events as independent or dependent. Randomly selecting a consumer from California. Randomly selecting a consumer who owns a television. A. The two events are independent because the occurrence of one does not affect the probability of the occurrence of the other B. The two events are dependent because the occurrence of one does not affect the probability of the occurrence of the other C. The two events are dependent because the occurrence of one affects the probability of the occurrence of the other D. The two events are independent because the occurrence of one affects the probability of the occurrence of the other
A. The two events are independent because the occurrence of one does not affect the probability of the occurrence of the other
For the given pair of events, classify the two events as independent or dependent. Randomly selecting a fan at the Super Bowl. Randomly selecting a football player at the Super Bowl. A. The two events are independent because the occurrence of one does not affect the probability of the occurrence of the other B. The two events are dependent because the occurrence of one does not affect the probability of the occurrence of the other C. The two events are dependent because the occurrence of one affects the probability of the occurrence of the other D. The two events are independent because the occurrence of one affects the probability of the occurrence of the other
The statement represents a claim. Write its complement and state which is H0 and which is Ha. µ = 8.3
H0: µ = 8.3 (claim) Ha: µ ≠ 8.3 (see 7.1 in class practice)
The mean age of bus drivers in Chicago is 48.6 years. Write the null and alternative hypotheses.
H0: µ=48.6 Ha: µ≠48.6 (see 7.1 in class practice)
The mean IQ of statistics teachers is greater than 160. Write the null and alternative hypotheses.
H0: µ≤160 Ha: µ>160 (see 7.1 in class practice)
Breaking the range of values into intervals and count how many observations fall into each interval
Histogram
What are the two types of charts used for a Quantitative variable?
Histogram Stemplot
fundamental counting principle
If one event can occur in m ways and a second can occur in n ways, the number of ways the 2 events can occur in sequence is m × n
Relative frequency
In a density histogram the area of the rectangle for each class is the ____________________________________.
Experiment and its processes
In an experiment, a researcher deliberately applies a treatment before observing the responses. A treatment is applied to part of the population, called a treatment group, and responses are observed. Another part of the population may be used as a control group, in which no treatment is applied. The subjects in both groups are called experimental units. In many cases, subjects in the control group are given a placebo, which is a harmless, fake treatment that is made to look like the real treatment. The responses of both groups can be compared and studied.
Observational Study
In an observational study, a researcher does not influence the responses. A researcher observes and measures characteristics of interest of part of a population but does not change the existing conditions.
Explanatory variable
Independent variable in a causal problem.
Mode
Most frequently occurring data value in a data set
Binary data
Non-numerical data that has two categories
Problem, Plan, Data, Analysis, Conclusion
PPDAC stands for _________________________________.
State whether the information given is a statistic or a parameter and say why. The median height of the entire Phoenix Mercury WNBA team is 73 inches.
Parameter because it is describing the population of the Phoenix Mercury team. (see test 1 #2)
When patients improve because they believe they are receiving treatment even though they are not actually receiving treatment
Placebo effect
Solve the problem. Suppose you want to test the claim that μ ≠ 3.5. Given a sample size of n = 33 and a level of significance of α = 0.05 when should you reject H0 ? Reject H0 if the standardized test statistic is greater than 2.33 or less than -2.33 Reject H0 if the standardized test statistic is greater than 2.575 or less than -2.575. Reject H0 if the standardized test statistic is greater than 1.96 or less than -1.96. Reject H0 if the standardized test statistic is greater than 1.645 or less than -1.645
Reject H0 if the standardized test statistic is greater than 1.96 or less than -1.96
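The cutoffs offered in this question can be reproduced with a short Python sketch. It uses the standard library's NormalDist in place of a z-table; the helper name two_tailed_critical is illustrative and not from the course materials. For a two-tailed test, alpha is split evenly between the two tails.

```python
from statistics import NormalDist

def two_tailed_critical(alpha):
    """Positive critical z for a two-tailed test: area alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# Each answer choice corresponds to a different significance level.
for alpha in (0.10, 0.05, 0.02, 0.01):
    print(alpha, round(two_tailed_critical(alpha), 3))
```

With alpha = 0.05 this gives about 1.96, so H0 is rejected when the test statistic falls outside ±1.96.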
Solve the problem. Suppose you want to test the claim that μ < 65.4. Given a sample size of n = 35 and a level of significance of α = 0.10, when should you reject H0? Reject H0 if the standardized test statistic is less than -2.575. Reject H0 if the standardized test statistic is less than -1.96. Reject H0 if the standardized test statistic is less than -1.28. Reject H0 if the standardized test statistic is less than -1.645.
Reject H0 if the standardized test statistic is less than -1.28.
The P-value for a hypothesis test is P = 0.006. Do you reject or fail to reject H0 when the level of significance is alpha=0.01?
Reject because P < alpha (0.006 < 0.01) (see 7.2 in class practice)
The owner of a professional basketball team claims that the mean attendance at games is over 22,000 and therefore the team needs a new arena. Determine whether the hypothesis test for this claim is left-tailed, right-tailed, or two-tailed. Sketch the distribution of the test statistic and label the P-value region.
Right-tailed (see 7.1 in class practice)
Solve the problem. For the mathematics part of the SAT the mean is 514 with a standard deviation of 113, and for the mathematics part of the ACT the mean is 20.6 with a standard deviation of 5.1. Bob scores a 660 on the SAT and a 27 on the ACT. Use z-scores to determine on which test he performed better. ACT SAT
SAT
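A small Python sketch of the comparison, assuming the usual z-score formula z = (x - mean) / sd; the function and variable names are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

sat_z = z_score(660, 514, 113)   # Bob's SAT math score
act_z = z_score(27, 20.6, 5.1)   # Bob's ACT math score

# The larger z-score marks the better relative performance.
better = "SAT" if sat_z > act_z else "ACT"
print(round(sat_z, 2), round(act_z, 2), better)
```

The SAT z-score is slightly larger (about 1.29 versus 1.25), which is why SAT is the answer.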
Measure of spread
SD and IQR
Population
Set of all observations of interest for a study
Interpreting a histogram means knowing the 4 features, which are
Shape Center Spread Outliers
K<3
Shows a data set has fewer extreme observations than normal
K>3
Shows a data set has more extreme observations than normal
Whenever including an omitted variable causes us to rethink the direction of an association, this is called ____________ __________
Simpson's paradox
For the given data , construct a frequency distribution and frequency histogram of the data using five classes. Describe the shape of the histogram as symmetric, uniform, skewed left, or skewed right. Data set: California Pick Three Lottery 8 6 7 6 0 9 1 7 8 4 1 5 7 5 9 7 5 3 9 9 8 8 3 9 8 8 9 0 2 7 skewed right symmetric uniform skewed left
Skewed Left
Sample standard deviation
Sqrt(sample variance)
Test the claim that µ ≤ 40, given that standard deviation = 4.3, alpha = 0.01, and the sample statistics are n = 40 and mean = 41.8.
Standardized test statistic ≈ 2.65 Critical value = 2.33 Reject H0 There is enough evidence to reject the claim. (see 7.2 in class practice)
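The arithmetic behind this answer can be checked with a minimal Python sketch. It assumes a right-tailed z-test with known standard deviation (H0: µ ≤ 40, Ha: µ > 40); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 40, 4.3, 40, 41.8, 0.01

# Standardized test statistic: z = (xbar - mu0) / (sigma / sqrt(n))
z = (xbar - mu0) / (sigma / sqrt(n))

# Right-tailed test: reject H0 when z exceeds the (1 - alpha) quantile.
critical = NormalDist().inv_cdf(1 - alpha)
reject = z > critical

print(round(z, 2), round(critical, 2), reject)
```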
Explain how the question may be biased and suggest a way to reword the question to remove bias. Why does eating whole grain foods improve your health?
The question implies that whole grain foods are good for your health. "How does eating whole grain foods impact your health?" (see quiz 1)
Quitting methods
Treatments
None
Your stats teacher tells you your test score was the 3rd Quartile for the class. Which is true? I. You got 75% on the test II. You can't really tell what this means without knowing the standard deviation III. You can't really tell what this means unless the class distribution is nearly symmetric
Observational Study
a study in which the researcher draws conclusions strictly by observing what is happening or what has happened ex. observing how children interact on the playground
Experiment
a study in which the researcher manipulates one (or more) of the variable(s) and determines how the change influences the response variable(s). The researcher controls the experiment by applying a treatment ex. different soil content used to test plant growth
Sample
a subset of members selected from the population
Sample
a subset of the population (Ex: a few in the larger classroom)
Blinding
a technique where the subject does not know whether they are receiving a treatment or a placebo
Blinding
a technique where the subjects do not know whether they are receiving a treatment or a placebo.
Random Variable
a variable whose value is determined by chance
Use the given frequency distribution to find the (a) class width. (b) class midpoints of the first class. (c) class boundaries of the first class. Phone Calls (per day) (a) 3 (b) 9.5 (c) 7.5-11.5 (a) 3 (b) 10.5 (c) 8-11 (a) 4 (b) 10.5 (c) 8-11 (a) 4 (b) 9.5 (c) 7.5-11.5
(a) 4 (b) 9.5 (c) 7.5-11.5
Solve the problem. Two high school students took equivalent language tests, one in German and one in French. The student taking the German test, for which the mean was 66 and the standard deviation was 8, scored an 82, while the student taking the French test, for which the mean was 27 and the standard deviation was 5, scored a 35. Compare the scores. a)A score of 82 with a mean of 66 and a standard deviation of 8 is better. b)The two scores are statistically the same. c)You cannot determine which score is better from the given information. d)A score of 35 with a mean of 27 and a standard deviation of 5 is better.
a)A score of 82 with a mean of 66 and a standard deviation of 8 is better.
SAT scores have a bell-shaped distribution with a mean score of 1490 with a standard deviation of 220. a. Use the Empirical Rule to determine the percent of scores that fall between 1270 and 1710. b. Use the Empirical Rule to determine the percent of scores that fall between 1050 and 1490.
a. 68% b. 47.5% (see test 1 #21)
Use the graphs of normal distributions A, B, and C (test 3 #6-7) a. Which normal distribution has the largest mean? b. Which normal distribution has the largest standard deviation?
a. B b. A
Identify the sampling technique in each situation. a. Questioning students as they leave a university library, a researcher asks 358 students about their drinking habits. b. Chosen at random, 580 customers at a car dealership are contacted and asked their opinions of the service they received. c. Soybeans are planted on a 48-acre field. The field is divided into one acre subplots. A sample is taken from each subplot to estimate the harvest.
a. Convenience sampling b. Simple random sampling c. Stratified sampling (see quiz 1)
State if the probabilities are classical, empirical, or subjective. a. The probability that a randomly selected student makes use of office hours if 8 out of 93 students made use of office hours this past week. b. The probability that Matt will be lonely during office hours next week. c. The probability of being dealt a pair of kings in a game of poker.
a. Empirical b. Subjective c. Classical (see test 2 #12)
The number of points scored by Kyrie Irving in every game of the 2017 NBA playoffs are listed below: 23, 37, 13, 28, 24, 22, 16, 27, 11, 23, 29, 42, 24, 24, 19, 38, 40, 26 a. Calculate the mean, median, and mode of the data. Round to the nearest hundredth. b. Calculate the range, standard deviation, and variance of the data. Use your calculator and round to the nearest hundredth. c. Calculate the coefficient of variation. Round to the nearest hundredth. d. What is the level of measurement for the data? Explain your answer. e. Use the data to construct a frequency distribution with 5 classes. f. Construct a Relative Frequency Histogram for the data. g. Create a stem and leaf plot for the data. h. Suppose you wanted to create a pie chart from the data. Calculate the central angle for each category. Round answers to the nearest hundredth. - Boston: 17 championships - LA: 16 championships - Chicago: 6 championships
a. Mean = 25.89, Median = 24, Mode = 24 b. Range = 31, SD = 8.56, Variance = 73.32 (population values, since every game is listed) c. 33.06% d. Ratio, because differences between values are meaningful and a value of 0 means no points were scored (a true zero), whereas at the interval level 0 is only a position on the scale. e. see test 1 #15 f. see test 1 #16 g. see test 1 #17 h. Boston: 156.92 deg, LA: 147.69 deg, Chicago: 55.38 deg (see test 1 #11-18)
Determine whether or not the Central Limit Theorem can be applied to the distribution of sample means and state the reason why. a. The mean height of all NBA players is 79 inches with a standard deviation of 5 inches. 10 NBA players are randomly selected and the mean of the 10 heights is calculated. b. The salaries at a Fortune 500 company are skewed right with a mean of $85,000 and a standard deviation of $11,000. 50 employees are randomly selected and the mean of the salaries is calculated.
a. No because n<30 and the population is not normal. b. Yes because n is greater than or equal to 30 (see test 3 #16-17)
A survey of 55 US law firms found that the average hourly billing rate was $425. a. Identify the population and the sample. b. Determine which part of the survey represents the descriptive branch of statistics and make an inference based on the survey.
a. Population: the hourly billing rates of all US law firms Sample: the hourly billing rates of the 55 US law firms surveyed b. Descriptive: 55 US law firms had an average billing rate of $425 Inference: The average billing rate of US law firms is about $425. (see quiz 1)
In a class of 32 students, 11 students made use of office hours and 7 students made an appointment at the math center. 4 of the students who used office hours also made an appointment at the math center. a. Create a venn diagram or two-way table to model the situation. b. What is the probability of selecting a student who used office hours but did not make an appointment? c. What is the probability of selecting a student who did not get extra help?
a. See test 2 #17 b. 0.219 c. 0.563
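Parts b and c follow from the Venn-diagram counts, as in this minimal Python sketch. Exact fractions are used so no rounding is hidden; the variable names are illustrative.

```python
from fractions import Fraction

total = 32
office_hours = 11
math_center = 7
both = 4

# Students who used office hours but made no math center appointment.
office_only = office_hours - both

# Students who got any extra help: |A or B| = |A| + |B| - |A and B|.
any_help = office_hours + math_center - both
no_help = total - any_help

p_office_only = Fraction(office_only, total)
p_no_help = Fraction(no_help, total)
print(p_office_only, float(p_office_only))
print(p_no_help, float(p_no_help))
```

The two probabilities are 7/32 = 0.21875 and 18/32 = 0.5625, which round to the three-decimal answers on the card.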
You flip three coins. Let the random variable X represent the number of heads that come up. a. Use a tree diagram to find the sample space for flipping three coins. b. Create a probability distribution for the random variable X. c. What is the expected number of heads obtained from flipping three coins?
a. See test 2 #18 b. See test 2 #18 c. 1.5
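As a rough check of parts a through c, the following Python sketch enumerates the sample space instead of drawing a tree diagram. It assumes fair coins; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: every sequence of H/T for three flips (8 outcomes).
sample_space = list(product("HT", repeat=3))

# Probability distribution of X = number of heads.
counts = Counter(outcome.count("H") for outcome in sample_space)
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

# Expected value: sum of x * P(x).
expected_heads = sum(x * p for x, p in distribution.items())
print(distribution, expected_heads)
```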
State the type of sample (simple random, stratified, cluster, systematic, or convenience). a. Assigning each student a number and choosing every third number after the number 5. b. Randomly choosing 5 classrooms in Bannow and asking all the students in each room to complete a survey.
a. Systematic b. Cluster (see test 1 #3-4)
The average time spent sleeping (in hours) for a group of medical residents at a hospital can be approximated by a normal distribution with a mean of 6.1 hours and a standard deviation of 1.0 hours. a. What is the shortest time spent sleeping that would still place a resident in the top 5% of sleeping times? b. Between what two values does the middle 50% of the sleep times lie?
a. invnorm(.95, 6.1, 1) = 7.74 b. invnorm(.25, 6.1, 1) = 5.43 invnorm(.75, 6.1, 1) = 6.77 - Between 5.43 hours and 6.77 hours. (see quiz 6)
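For readers without the calculator, the invnorm calls have a close counterpart in Python's standard library: NormalDist.inv_cdf returns the value with a given area to its left. A minimal sketch, with illustrative variable names:

```python
from statistics import NormalDist

sleep = NormalDist(mu=6.1, sigma=1.0)

# a. Top 5% starts where 95% of the area lies to the left.
top_5_cutoff = sleep.inv_cdf(0.95)

# b. The middle 50% lies between the 25th and 75th percentiles.
lower_quartile = sleep.inv_cdf(0.25)
upper_quartile = sleep.inv_cdf(0.75)

print(round(top_5_cutoff, 2), round(lower_quartile, 2), round(upper_quartile, 2))
```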
The number of points scored by Kyrie Irving in every game of the 2017 NBA playoffs are listed below: 23, 27, 13, 28, 24, 22, 16, 37, 11, 23, 29, 42, 24, 24, 19, 38, 40, 26 a. Give the five number summary for the data and construct a box and whisker plot. b. Use the interquartile range (IQR) to identify any outliers. c. Find the value that corresponds to the 78th percentile. What percentile is 22?
a. min=11, Q1=22, Q2=24, Q3=29, max=42 see test 2 #1 b. IQR=7 Outliers: 11, 40, 42 c. 78th percentile = 37 22 = 27.8th percentile
Consider the data set below. 39, 36, 30, 27, 26, 24, 28, 35, 39, 60, 50, 41, 35, 32, 51 a. Find the Five Number Summary. b. Create a box and whisker plot for the data.
a. min=24, Q1=28, Q2=35, Q3=41, max=60 b. see quiz 3 #3
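The two five-number-summary cards above can be checked with a short Python sketch. It assumes the textbook convention in which Q1 and Q3 are the medians of the lower and upper halves (the overall median is left out of both halves when n is odd), and the 1.5 × IQR rule for outliers. Other software may use a different quartile convention. The function names are illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + (n % 2):]  # skip the middle value when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def iqr_outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return sorted(x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr)

points = [23, 27, 13, 28, 24, 22, 16, 37, 11, 23, 29, 42, 24, 24, 19, 38, 40, 26]
other = [39, 36, 30, 27, 26, 24, 28, 35, 39, 60, 50, 41, 35, 32, 51]

print(five_number_summary(points), iqr_outliers(points))
print(five_number_summary(other))
```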
Assume that the salaries of elementary school teachers in the US are normally distributed with a mean of $31,000 and a standard deviation of $2500. Suppose a teacher is selected at random. a. Sketch the distribution. b. Find the probability that he or she makes less than $28,000. c. Find the probability that he or she makes more than $35,000. d. Find the probability that he or she makes between $29,000 and $33,000.
a. see test 3 #11 b. normalcdf(-1x10^99, 28000, 31000, 2500) = 0.1151 c. normalcdf(35000, 1x10^99, 31000, 2500) = 0.0548 d. normalcdf(29000, 33000, 31000, 2500) = 0.5763
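The normalcdf entries can be mirrored in Python. In this sketch, normalcdf is a hypothetical helper written to match the calculator's argument order (lower, upper, mean, sd); it is not a standard-library function.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean, sd):
    """Area under Normal(mean, sd) between lower and upper (hypothetical helper)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

BIG = 1e99  # stands in for infinity, as on the calculator

p_below_28k = normalcdf(-BIG, 28000, 31000, 2500)
p_above_35k = normalcdf(35000, BIG, 31000, 2500)
p_between = normalcdf(29000, 33000, 31000, 2500)

print(round(p_below_28k, 4), round(p_above_35k, 4), round(p_between, 4))
```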
Law of Large Numbers
as an experiment is repeated over and over, the empirical probability of an event approaches the theoretical (actual) probability of the event
Law of Large Numbers
as the number of trials for an experiment increases, the empirical probability of an event approaches the theoretical probability of the event
simple random
assigns every member a number & then uses a random number table to select
systematic
assigns numbers & then systematically selects numbers (for example, every kth member)
Descriptive Statistics
numerical summaries of a sample
systematic sample
a sample in which each member of the population is assigned a number. The members of the population are ordered in some way, a starting number is randomly selected, and then sample members are selected at regular intervals from the starting number. Ex: In the Calcasieu Parish example you could assign a different number to each household, randomly choose a starting number, then select every 100th household
sample space
set of all possible outcomes { }
Determine whether the data are qualitative or quantitative. the number of seats in a movie theater
quantitative
Solve the problem. Classify the number of seats in a movie theater as qualitative data or quantitative data.
quantitative data
Solve the problem. A local bank needs information concerning the checking account balances of its customers. A random sample of 15 accounts was checked. The mean balance was $686.75 with a standard deviation of $256.20. Find a 98% confidence interval for the true mean. Assume that the account balances are normally distributed. ($513.17, $860.33) ($238.23, $326.41) ($487.31, $563.80) ($326.21, $437.90)
($513.17, $860.33)
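A minimal Python sketch of how this interval arises. Because n = 15 is small and only the sample standard deviation is known, a t-interval is assumed; the critical value 2.624 (df = 14, 98% confidence) is taken from a t-table rather than computed, since the standard library has no t-distribution.

```python
from math import sqrt

n, xbar, s = 15, 686.75, 256.20
t_critical = 2.624  # assumption: t-table value for df = 14, 98% confidence

# Margin of error: E = t * s / sqrt(n)
margin = t_critical * s / sqrt(n)
interval = (round(xbar - margin, 2), round(xbar + margin, 2))
print(interval)
```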
Solve the problem. Find the z-scores for which 98% of the distribution's area lies between -z and z. (-0.99,0.99) (-1.96, 1.96) (-1.645, 1.645) (-2.33, 2.33)
(-2.33, 2.33)
Suppose you are using α = 0.05 to test the claim that μ ≠ 34 using a P-value. You are given the sample statistics n = 35, x̄ = 33.1, and s = 2.7. Find the P-value. 0.0244 0.1003 0.0488 0.0591
0.0488
Rank the probabilities of 10%, 1/5, 0.06 from the least likely to occur to the most likely to occur. 0.06, 10%, 1/5 0.06, 1/5, 10% 10%, 1/5, 0.06 1/5, 10%, 0.06
0.06, 10%, 1/5
Solve the problem. The distribution of cholesterol levels in teenage boys is approximately normal with mean= 170 and standard deviation= 30 (Source: U.S. National Center for Health Statistics). Levels above 200 warrant attention. Find the probability that a teenage boy has a cholesterol level greater than 200. 0.3419 0.1587 0.8413 0.2138
0.1587
Solve the problem. A delivery route must include stops at three cities. If the route is randomly selected, find the probability that the cities will be arranged in alphabetical order. Round your answer to three decimal places. 0.03703704 0.16666667 0.33333333 0.125
0.16666667
Solve the problem. The distribution of Master's degrees conferred by a university is listed in the table. Major Frequency Mathematics 216 English 207 Engineering 86 Business 176 Education 204 What is the probability that a randomly selected student graduating with a Master's degree has a major of Education? Round your answer to three decimal places. 0.298 0.771 0.005 0.229
0.229
Solve the problem. The lengths of pregnancies are normally distributed with a mean of 264 days and a standard deviation of 15 days. If 36 women are randomly selected, find the probability that they have a mean pregnancy between 264 days and 266 days. 0.5517 0.7881 0.2881 0.2119
0.2881
Solve the problem. A group of students were asked if they carry a credit card. The responses are listed in the table. class cc carrier no cc Freshman 19 41 Sophomore 40 0 If a student is selected at random, find the probability that he or she is a freshman given that the student owns a credit card. Round your answers to three decimal places. 0.322 0.678 0.190 0.317
0.322
Find the standard error of estimate, se, for the data below, given that y = -1.885x + 0.758. x: -5 -3 4 1 -1 -2 0 2 3 -4 y: 11 6 -6 -2 3 4 1 -4 -5 8 0.613 0.011 0.312 0.981
0.613
Solve the problem. A manager wishes to determine the relationship between the number of miles (in hundreds of miles) the manager's sales representatives travel per month and the amount of sales (in thousands of dollars) per month. Calculate the correlation coefficient, r. Miles traveled x 5 6 13 10 11 18 6 4 14 Sales y: 41 43 88 72 75 71 58 65 130 0.561 0.791 0.632 0.717
0.632
An airline knows from experience that the distribution of the number of suitcases that get lost each week on a certain route is approximately normal with mean= 15.5 and standard deviation= 3.6. What is the probability that during a given week the airline will lose between 10 and 20 suitcases? 0.1056 0.8314 0.4040 0.3944
0.8314
Solve the problem. The data below are the final exam scores of 10 randomly selected statistics students and the number of hours they studied for the exam. Calculate the correlation coefficient r. hrs x: 5 7 4 10 4 6 6 7 8 5 Scores y: 66 81 61 89 67 79 86 91 91 5 0.654 0.761 0.847 0.991
0.847
Find the area under the standard normal curve to the left of z = 1.25. 0.7682 0.8944 0.1056 0.2318
0.8944
Solve the problem. An airline knows from experience that the distribution of the number of suitcases that get lost each week on a certain route is approximately normal with mean = 15.5 and standard deviation= 3.6. What is the probability that during a given week the airline will lose less than 20 suitcases? 0.1056 0.3944 0.8944 0.4040
0.8944
Solve the problem. Find the area of the indicated region under the standard normal curve. 0.0968 0.0823 0.9032 0.9177
0.9032
Solve the problem. IQ test scores are normally distributed with a mean of 99 and a standard deviation of 11. An individual's IQ score is found to be 109. Find the z-score corresponding to this value. -1.10 0.91 -0.91 1.10
0.91
Solve the problem. A coffee machine dispenses normally distributed amounts of coffee with a mean of 12 ounces and a standard deviation of 0.2 ounce. If a sample of 9 cups is selected, find the probability that the mean of the sample will be less than 12.1 ounces. 0.3216 0.0668 0.9332 0.2123
0.9332
Solve the problem. Find the area under the standard normal curve to the left of z = 1.5 0.1599 0.7612 0.9332 0.0668
0.9332
Solve the problem. The data below are the temperatures on randomly chosen days during a summer class and the number of absences on those days. What is the best predicted value for y given x = 95? Assume that the variables x and y have a significant correlation. Temp x: 72 85 91 90 88 98 75 100 # of absences y: 3 7 10 10 8 15 4 15 15 13 12 14
12
Solve the problem. The data below are the temperatures on randomly chosen days during a summer class and the number of absences on those days. What is the best predicted value for y given x = 95 Assume that the variables x and y have a significant correlation. Temp x: 72 85 91 90 88 98 75 100 80 # of absences y: 3 7 10 10 8 15 4 15 5 15 13 12 14
12
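The prediction can be reproduced by fitting the least-squares line to the nine data pairs in the question and evaluating it at x = 95. This is a sketch under that assumption, using the usual formulas slope = Sxy / Sxx and intercept = ȳ - slope · x̄; the variable names are illustrative.

```python
temps = [72, 85, 91, 90, 88, 98, 75, 100, 80]
absences = [3, 7, 10, 10, 8, 15, 4, 15, 5]

n = len(temps)
mean_x = sum(temps) / n
mean_y = sum(absences) / n

# Sums of cross-products and squared deviations.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temps, absences))
sxx = sum((x - mean_x) ** 2 for x in temps)

slope = sxy / sxx
intercept = mean_y - slope * mean_x

predicted = intercept + slope * 95
print(round(slope, 3), round(intercept, 2), round(predicted, 2))
```

The fitted value at 95 degrees is a little over 12, which matches the answer choice 12.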
Provide an appropriate response. The mean score of a placement exam for entrance into a math class is 80, with a standard deviation of 10. Use the Empirical Rule to find the percentage of scores that lie between 60 and 80. (Assume the data set has a bell-shaped distribution.) 95% 34% 47.5% 68%
47.5%
Solve the problem. How many ways can five people, A, B, C, D, and E, sit in a row at a movie theater if A and B must sit together? 120 48 12 24
48
Solve the problem. How many ways can five people, A, B, C, D, and E, sit in a row at a movie theater if A and B must sit together? 24 48 120 12
48
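One way to confirm 48 is to list every seating and keep those where A and B are adjacent, then compare with the shortcut of treating AB as a single block (4! arrangements of the blocks, times 2 internal orders). A brute-force Python sketch with illustrative names:

```python
from itertools import permutations
from math import factorial

people = "ABCDE"

# Keep only seatings where A and B occupy neighbouring seats.
together = [p for p in permutations(people)
            if abs(p.index("A") - p.index("B")) == 1]
count = len(together)

# Shortcut: glue A and B into one block, then order the pair inside it.
shortcut = factorial(4) * 2
print(count, shortcut)
```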
Solve the problem. If a couple has nine boys and two girls, how many gender sequences are possible? 11 55 16 8
55
Provide an appropriate response. Find the range of the data set represented by the graph. 6 20 5 17
6
Solve the problem. For the following data, approximate the mean miles per day. Miles (per day) Frequency 1-2 23 3-4 16 5-6 26 7-8 30 9-10 29 6 5 25 7
6
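The approximation uses class midpoints weighted by frequency. A minimal Python sketch, with illustrative names:

```python
# (class limits, frequency) pairs from the table in the question.
classes = [((1, 2), 23), ((3, 4), 16), ((5, 6), 26), ((7, 8), 30), ((9, 10), 29)]

total_frequency = sum(freq for _, freq in classes)

# Each class is represented by its midpoint (lower + upper) / 2.
weighted_sum = sum((low + high) / 2 * freq for (low, high), freq in classes)

grouped_mean = weighted_sum / total_frequency
print(round(grouped_mean, 2))
```

The result is about 5.92, so 6 is the closest answer choice.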
Solve the problem. If a couple has seven boys and eight girls, how many gender sequences are possible? 16 15 6435 8
6435
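Gender-sequence questions like these count the ways to choose which birth positions hold the boys, which is a combination. A short Python check using math.comb, with illustrative variable names:

```python
from math import comb

# 15 children, choose the 7 positions occupied by boys.
seven_boys_eight_girls = comb(15, 7)

# 11 children, choose the 2 positions occupied by girls.
nine_boys_two_girls = comb(11, 2)

print(seven_boys_eight_girls, nine_boys_two_girls)
```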
The experiment is binomial. A success in this experiment is a baby recovering fully.
About 40% of babies born with a certain ailment recover fully. A hospital is caring for five babies born with this ailment. The random variable represents the number of babies that recover fully. Decide whether the experiment is a binomial experiment
Has 2 modes around which the observations are concentrated.
Bimodal
When reaching a conclusion in a hypothesis test, what is the relationship between the P-value and the significance level?
If the P-value is less than or equal to alpha, you reject the null hypothesis. If the P-value is greater than alpha, you fail to reject the null hypothesis. (see 7.1 in class practice)
B. P(A and B) = 0 because A and B cannot occur at the same time
If two events are mutually exclusive, why is P(A and B) = 0 A. P(A and B) = 0 because A and B each have the same probability B. P(A and B) = 0 because A and B cannot occur at the same time C. P(A and B) = 0 because A and B are independent D. P(A and B) = 0 Because A and B are complements of each other
Descriptive problem
Involves finding the value of a parameter or attribute.
Causal problem
Involves seeing if one variable is "caused by" or correlated with another.
Ordinal data
Non-numerical data that has an underlying order
Failure to submit to assigned treatment
Noncompliance
Estimates
Normal letters indicate __________________________, actual numbers calculated from the sample.
Percentiles on the ECDF axis
One can find quantiles of a distribution graph using ______________________.
Fill in the missing value of the probability distribution (test 2 #19) and use your calculator to find the variance and standard deviation of the probability distribution.
P(5) = 0.3 Standard Deviation = 5.679 Variance = 32.251
______ ___________ emphasize how the different categories relate to each other
Pie chart
What are the two types of variables?
Quantitative categorical
Estimators
Random variables representing properties of the sample.
A manufacturer claims that the mean lifetime of its fluorescent bulbs is 1500 hours. A homeowner selects 40 bulbs and finds the mean lifetime to be 1480 hours with a population standard deviation of 80 hours. Test the manufacturers claim. Use alpha = 0.05.
Standardized test statistic ≈ -1.58 Critical value z0 = ±1.96 Fail to reject H0 At the 5% level of significance, there is not enough evidence to reject the manufacturer's claim µ = 1500. (see 7.2 in class practice)
Determine whether the numerical value is a parameter or a statistic and explain your reasoning. A survey of 1004 US adults found that 52% think China's emergence as a world power is a major threat to the well-being of the United States.
Statistic because 52% is referring to a sample of 1004 US adults (see quiz 1)
D. 54 A. Less than the median A. 51 A. Skewed left (mean is smaller than the median)
Stem Leaf 3 44 4 022 5 11111335555 6 00222233 The median is... A. 51 B. 52 C. 53 D. 54 E. 55 The mean is... A. Less than median B. Larger than median C. Equal to median The mode is... A. 51 B. 52 C. 53 D. 54 E. 55 The distribution is... A. Skewed left B. Skewed right C. Symmetric
SAT scores are normally distributed. In a recent year, the mean score was 1498 and the standard deviation was 316. Student A received a score of 1240 and Student B received a score of 2200. Calculate the z score for each student's score.
Student A = -0.82 Student B = 2.22 (see quiz 5)
It is the sum of the squares of n independent standard normal (Z) random variables.
W follows a Chi-squared(n) distribution if ______________________________________.
Cluster Sample
When the population falls into naturally occurring subgroups, each having similar characteristics, a cluster sample may be the most appropriate. To select cluster samples, divide the population into groups, called clusters, and select all the members in one or more (but not all) of the clusters Ex: In the Calcasieu Parish example you could divide the households into clusters according to zip codes, then select all the households in one or more, but not all, zip codes
D. A and C
Which is a quantitative variable? A. Salary B. Religious affiliation C. Grams of fat in a cheeseburger D. A and C E. None of these
B. The scores of students on a very easy exam in which most score perfectly, but a few do very poorly (When the mean is smaller than the median, that typically means a data set that is skewed to the left, leaving the bulk of the scores in the upper range)
Which of the following is likely to have a mean that is smaller than the median? A. The salaries of all NFL players B. The scores of students on a very easy exam in which most score perfectly but a few do very poorly C. Test scores on a standardized test D. The scores of students on a very difficult exam in which most score poorly, but a few do very well
Find the P-value for the hypothesis test with the standardized test statistic z. Decide whether to reject H0 for the level of significance alpha. a. Right-tailed test; z = 0.52; alpha = 0.05 b. Two-tailed test; z = 1.95; alpha = 0.05
a. normalcdf(.52, 1x10^99, 0, 1) = 0.3015 Fail to reject because P > alpha (0.3015 > 0.05) b. P = 2(area of standard test statistic) = 2(0.0256) = 0.0512 Fail to reject because P > alpha (0.0512 > 0.05) (see 7.2 in class practice)
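Both P-values can be checked with the standard normal distribution in Python. This is a sketch with illustrative variable names; the two-tailed P-value doubles the area in one tail.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
alpha = 0.05

# a. Right-tailed test: area to the right of z = 0.52.
p_right = 1 - std_normal.cdf(0.52)

# b. Two-tailed test: twice the area beyond |z| = 1.95.
p_two = 2 * (1 - std_normal.cdf(1.95))

print(round(p_right, 4), p_right <= alpha)
print(round(p_two, 4), p_two <= alpha)
```

Neither P-value is at or below 0.05, so both tests fail to reject H0.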
The mean number of green jelly beans in a bag of jelly beans is 26 per bag with a standard deviation of 4. A sample of 64 bags of jelly beans is taken from the population. a. What is the probability that the mean number of green jelly beans per bag for the sample is between 20 and 28 green jelly beans? b. What is the probability that the mean number of jelly beans for the sample is greater than 27 that week? c. What is the probability that a SINGLE bag of jelly beans selected has more than 27 green jelly beans?
a. normalcdf(20, 28, 26, 4/8) = 0.999968 ≈ 1 b. normalcdf(27, 1x10^99, 26, 4/8) = 0.0228 c. normalcdf(27, 1x10^99, 26, 4) = 0.4013 (see test 3 #18)
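The difference between parts b and c is the standard deviation used: sample means vary with standard error sigma / sqrt(n), while a single bag varies with sigma itself. A Python sketch, assuming normal models for both as the calculator entries do; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 26, 4, 64

sample_mean = NormalDist(mu, sigma / sqrt(n))  # standard error = 4/8 = 0.5
single_bag = NormalDist(mu, sigma)

p_a = sample_mean.cdf(28) - sample_mean.cdf(20)  # mean between 20 and 28
p_b = 1 - sample_mean.cdf(27)                    # mean greater than 27
p_c = 1 - single_bag.cdf(27)                     # one bag greater than 27

print(round(p_a, 6), round(p_b, 4), round(p_c, 4))
```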
the slope and intercept of the least square regression line are found using this equation (1/2):
b = r (sY/sX)
frequency histogram
bar graph that represents the frequency distribution
empirical (statistical) probability
based on observations obtained from probability experiments P(E)= frequency of E/ total frequency =f/n
a sample that produces data that's not representative of the population
biased sample
Solve the problem. The top 14 speeds, in miles per hour, for Pro-Stock drag racing over the past two decades are listed below. Find the mode speed. 181.1 202.2 190.1 201.4 191.3 201.4 192.2 201.2 193.2 201.2 194.5 199.2 196.0 196.2 bimodal 201.2 no mode 201.4
bimodal
Ordinal Level of Measurement
both qualitative & quantitative -can be put into order
Double blind study
both the observer and the participants are unaware of certain conditions in the experiment
Identify the data set's level of measurement. the nationalities listed in a recent survey (for example, Asian, European, or Hispanic).
nominal
random
every member of a population has an equal chance of being selected
Random Sample
every member of the population has an equal chance of being selected
Identify whether the statement describes inferential statistics or descriptive statistics. There is a relationship between smoking cigarettes and getting emphysema.
inferential statistics
Solve the problem. Based on previous clients, a marriage counselor concludes that the majority of marriages that begin with cohabitation before marriage will result in divorce. Does this statement describe inferential statistics or descriptive statistics?
inferential statistics
Solve the problem. Decide if the events A and B are mutually exclusive or not mutually exclusive. A card is drawn from a standard deck of 52 playing cards. A: The result is a 7. B: The result is a jack. mutually exclusive not mutually exclusive
mutually exclusive
Solve the problem. Decide if the events A and B are mutually exclusive or not mutually exclusive. A die is rolled. A: The result is an odd number. B: The result is an even number. mutually exclusive not mutually exclusive
mutually exclusive
an increase in one variable is associated with a decrease in the other
negative relationship
Double-blind experiment
neither the experimenter nor the subjects know if the subjects are receiving a treatment or a placebo
double-blind experiment
neither the experimenter nor the subjects know whether the subjects are receiving a treatment or a placebo. The experimenter is informed after all the data have been collected. This type of experimental design is preferred by researchers.
Solve the problem. Given the size of a human's brain, x, and their score on an IQ test, y, would you expect a positive correlation, a negative correlation, or no correlation? no correlation negative correlation positive correlation
no correlation
Identify the data set's level of measurement. hair color of women on a high school tennis team
nominal
Use your calculator to find P(z<-2.58 or z>2.58). State what you type into your calculator. Sketching a diagram may be useful.
normalcdf(-1x10^99, -2.58, 0, 1) = 0.0049 normalcdf(2.58, 1x10^99, 0, 1) = 0.0049 0.0049 + 0.0049 = 0.0098 (see quiz 5)
Solve the problem. Decide if the events A and B are mutually exclusive or not mutually exclusive. A student is selected at random. A: The student is taking a math course. B: The student is a business major. not mutually exclusive mutually exclusive
not mutually exclusive
statistic
numerical description of a sample characteristic. Ex: Average age of people from a sample of three states.
Confounding variable
occurs when an experimenter cannot tell the difference between the effects of different factors on a variable
Biased Samples
omit a portion of the population
Solve the problem. Identify the level of measurement for data that are the ratings of a movie ranging from poor to good to excellent.
ordinal
1. Find the mean of the data set 2. Find the difference between each observation and the mean 3. Square each # in the new data set 4. Add up the new data set and divide the numbers by n-1 (the sample minus 1) These are the steps to find the....?
standard deviation steps
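A minimal Python sketch of these steps, ending with the square root that turns the sample variance into a standard deviation. The function name and the data values are made up for illustration.

```python
import math

def sample_std_dev(data):
    """Sample standard deviation, following the steps one at a time."""
    n = len(data)
    mean = sum(data) / n                       # find the mean
    deviations = [x - mean for x in data]      # difference from the mean
    squared = [d ** 2 for d in deviations]     # square each difference
    variance = sum(squared) / (n - 1)          # sum and divide by n - 1
    return math.sqrt(variance)                 # square root of the variance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
print(sample_std_dev(values))
```

Stopping before the square root gives the sample variance instead.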
Identify the sampling technique used. A market researcher randomly selects 200 drivers under 55 years of age and 200 drivers over 55 years of age.
stratified
Solve the problem. A market researcher randomly selects 200 drivers under 35 years of age and 100 drivers over 35 years of age. What sampling technique was used?
stratified
Solve the problem. A market researcher randomly selects 200 drivers under 35 years of age and 100 drivers over 35 years of age. What sampling technique was used? random stratified systematic cluster convenience
stratified
Solve the problem. Thirty-five sophomores, 35 juniors and 49 seniors are randomly selected from 230 sophomores, 280 juniors and 577 seniors at a certain high school. What sampling technique is used? random systematic stratified convenience cluster
stratified
completely randomized design
subjects are assigned to different treatment groups through random selection. In some experiments, it may be necessary for the experimenter to use blocks, which are groups of subjects with similar characteristics. A commonly used experimental design is a randomized block design.
In a histogram, SHAPE refers to
symmetry/skewness, modality
Q2
the median of the data set
Test the claim about the population mean µ at the level of significance alpha. Assume the population is normally distributed. Claim: µ ≠ 35; alpha 0.05; standard deviation = 2.7 Sample statistics: mean = 34.1; n = 35
z = -1.97. Claim: µ ≠ 35; H0: µ = 35; Ha: µ ≠ 35 (two-tailed). P = 2(area to the left of z = -1.97) = 2(0.0244) = 0.0488. Reject H0 because 0.0488 < 0.05. At the 5% level of significance, there is sufficient evidence to support the claim that the mean is not 35. (see 7.2 in class practice)
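The test statistic and P-value for this card can be reproduced with a short Python sketch. The variable names and the normal_cdf helper are assumptions for illustration. The card rounds z to -1.97 before looking up the area, so the unrounded P-value here differs slightly in the fourth decimal place.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu0, sigma, xbar, n, alpha = 35, 2.7, 34.1, 35, 0.05

z = (xbar - mu0) / (sigma / math.sqrt(n))   # standardized test statistic
p_value = 2 * normal_cdf(-abs(z))           # two-tailed: double the tail area
reject_h0 = p_value < alpha                 # reject H0 when P < alpha

print(round(z, 2), round(p_value, 4), reject_h0)
```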
Provide an appropriate response. Find the z-score for the value 55, when the mean is 58 and the standard deviation is 3. z = 0.90 z = -0.90 z = -1.00 z = -1.33
z= -1.00
Solve the problem. Find the z-score for the value 70, when the mean is 76 and the standard deviation is 2. z= -.89 z= -3.00 z= .89 z= -3.50
z= -3.00
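Both z-score cards use the same formula, z = (value - mean) / standard deviation. A minimal Python sketch, with z_score as a hypothetical helper name:

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

print(z_score(55, 58, 3))  # -1.0
print(z_score(70, 76, 2))  # -3.0
```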
Use the Standard Normal Table to find the z value that would have an area of 0.9916 to the left of it under the standard normal curve.
z=2.39 (see quiz 6)
Solve the problem. Suppose you are using α = 0.01 to test the claim that μ ≤ 29 using a P-value. You are given the sample statistics n = 40, x = 30.8, and s = 4.3. Find the P-value. 0.1030 0.0211 0.0040 0.9960
0.0040
the _______ of the relationship is determined by how _______ the data follow the _____ of the relationship.
strength, closely, form
Replication
the repetition of an experiment under the same or similar conditions
Intersection
the set of outcomes that is contained in both event A and event B at the same time. Intersection = and, ∩ (upside-down u)
Union
the set of outcomes that is contained in event A or event B or both. Union = or, ∪ (u)
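Python's built-in sets mirror these two operations directly. The sketch below reuses the die-roll card from earlier in this review (A: the result is a 3, B: the result is an odd number); the variable names are assumptions for illustration.

```python
sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a die
A = {3}                             # event A: the result is a 3
B = {1, 3, 5}                       # event B: the result is an odd number

intersection = A & B                # outcomes in both A and B
union = A | B                       # outcomes in A or B or both
mutually_exclusive = A.isdisjoint(B)

print(sorted(intersection))  # [3]
print(sorted(union))         # [1, 3, 5]
print(mutually_exclusive)    # False
```

The events share the outcome 3, so the intersection is not empty and the events are not mutually exclusive.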
Exp(2)
A Chi-squared(2) RV has the same pdf as __________________________.
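Why Exp(2)? A short derivation, assuming Exp(theta) denotes the exponential distribution with mean theta (the card does not state which parameterization it uses):

```latex
% General Chi-squared(k) pdf
f(x; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, \quad x > 0

% Set k = 2 and use \Gamma(1) = 1
f(x; 2) = \frac{1}{2}\, e^{-x/2}, \quad x > 0

% Exponential pdf with mean \theta (assumed parameterization)
g(x; \theta) = \frac{1}{\theta}\, e^{-x/\theta}, \quad x > 0

% With \theta = 2 the two pdfs coincide
g(x; 2) = \frac{1}{2}\, e^{-x/2} = f(x; 2)
```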
100p% Likelihood interval
A ______________________ is the set of all theta for which the RLF is at least p.
Pareto Chart
A bar graph for qualitative data, with the bars arranged in descending order according to frequencies
Inherent Zero
An inherent zero is a zero that implies "none." For example, the amount of money you have in a savings account could be zero dollars. The zero represents no money; it is an inherent zero. A temperature of 0°C does not represent a condition in which no heat is present. The 0°C temperature is simply a position on the Celsius scale; it is not an inherent zero.
In 2014, the mean starting salary for an undergraduate student graduating with a degree in mathematics or statistics was $53,000 with a standard deviation of $4,000. 35 students graduated with math degrees from Fairfield in 2014. Use Chebychev's Theorem to estimate at least how many Fairfield math grads had a starting salary between $45,000 and $61,000.
At least 26 grads had a starting salary between $45,000 and $61,000. (see test 1 #20)
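The arithmetic behind this answer, as a minimal Python sketch. Chebyshev's Theorem says at least 1 - 1/k^2 of the data lie within k standard deviations of the mean (for k > 1); the function and variable names are assumptions for illustration.

```python
def chebyshev_min_fraction(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2

mean, std_dev, n = 53000, 4000, 35
low, high = 45000, 61000

k = (high - mean) / std_dev            # the interval is mean +/- k standard deviations
fraction = chebyshev_min_fraction(k)   # minimum fraction inside the interval
min_count = fraction * n               # minimum number of grads inside the interval

print(k, fraction, min_count)  # 2.0 0.75 26.25
```

The interval is two standard deviations on each side of the mean, so at least 75% of the 35 grads fall inside it. The card reports 26.25 as at least 26.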
Chi-squared(1)
For large n, -2logR(theta) converges to _________________________.
N(n,2n)
For large n, Chi-squared(n) is approximately distributed as ______________________.
Normal Distribution
Bell Shaped; data is symmetric around the center
This is an example of EMPIRICAL probability, since THE STATED PROBABILITY IS CALCULATED BASED ON OBSERVATIONS FROM THE COMPANY RECORDS
Classify the following example as classical, empirical or subjective. Explain why. According to company records, the probability that a washing machine will need repairs during a six-year period is 0.09
This is an example of CLASSICAL probability, since EVERY COMBINATION OF 6 NUMBERS HAS AN EQUAL CHANCE OF BEING DRAWN
Classify the following example as classical, empirical or subjective. Explain why. The probability of choosing 6 numbers from 1 to 52 that match the 6 numbers drawn by a certain lottery is 1/20358520 = 0.00000005
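The denominator in this card is the number of ways to choose 6 numbers from 52 when order does not matter. A minimal Python check using the standard library's math.comb; the variable names are assumptions for illustration.

```python
import math

combinations = math.comb(52, 6)   # ways to choose 6 numbers from 52
probability = 1 / combinations    # one winning combination, all equally likely

print(combinations)  # 20358520
print(probability)
```

The probability is roughly 0.00000005, matching the card.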
Statistical Inference
Conclusion drawn about a population from a sample
The events are DEPENDENT because the outcome of returning a rental movie after the due date AFFECTS the probability of the outcome of receiving a late fee
Determine whether the events are independent or dependent: returning a rented movie after the due date and receiving a late fee.
These events ARE NOT mutually exclusive since IT IS POSSIBLE TO SELECT A FEMALE HISTORY MAJOR WHO IS 21 YEARS OLD
Determine whether the following events are mutually exclusive. Event A: Randomly select a female history major. Event B: Randomly select a history major who is 21 years old.
C. True
Determine whether the statement is true or false. If two events are mutually exclusive, they have no outcomes in common. A. False, if two events are mutually exclusive, they have some outcomes in common B. False, if two events are mutually exclusive, they have every outcome in common C. True
Measures of center (2)
Mean and median
1
The area under a density histogram is ___________.