STAT 231 MIDTERM REVIEW

uniform

(or rectangular) distribution in which all classes have about the same frequency

Solve the problem. A group of 49 randomly selected students has a mean age of 22.4 years with a standard deviation of 3.8. Construct a 98% confidence interval for the population mean. (20.3, 24.5) (21.1, 23.7) (19.8, 25.1) (18.8, 26.3)

(21.1, 23.7)
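
The interval above can be checked with a short sketch. It assumes the normal (z) critical value is appropriate because n = 49 is large; the function name `z_interval` is illustrative and not part of the course material.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, s, n, confidence):
    """Confidence interval for a population mean using a z critical value."""
    # Two-sided critical value, e.g. about 2.33 for 98% confidence
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_c * s / sqrt(n)
    return mean - margin, mean + margin

low, high = z_interval(mean=22.4, s=3.8, n=49, confidence=0.98)
print(round(low, 1), round(high, 1))
```

Rounded to one decimal place, the endpoints match the answer choice (21.1, 23.7).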

Empirical study

Study in which the outcomes are different every trial

Target population

The population whose attributes we are interested in.

convenience

choosing members at convenience

The ________ "r" is to -1 or 1, the more ______ and strong the relationship is.

closer, linear

population

collection of all outcomes, responses, measurements, or counts that are of interest

types of form

linear curvilinear clusters

Solve the problem. Decide if the events A and B are mutually exclusive or not mutually exclusive. A die is rolled. A: The result is a 3. B: The result is an odd number. not mutually exclusive mutually exclusive

not mutually exclusive

Mutually Exclusive Events

don't share any outcomes; cannot occur at the same time

Simple Event

event that consists of a single outcome

Systematic Sample

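every member of the population is assigned a number and the members are ordered. A starting member is randomly selected, and then members are selected at regular intervals from that starting point (for example, every 10th member)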

Example: Researchers choose a random sample of 1,760 eligible U.S. voters and collect data on their opinions regarding their political preferences. This is an example of a _________ __________

sample survey

Poll a sample of individuals with the following q's: while watching TV, do you eat more snacks (a) more than usual (b) less than usual (c) same amount This is an example of a

sample survey

Randomization

subjects are randomly assigned to different groups

sample

subset, or part of the population

mode

the MOST occurring

When calculating the median and "N" is EVEN, the median is

the mean of the 2 center observations

Q3

the median from Q2 to the end of the set

finding the line that best fits the pattern of the linear relationship

linear regression

Range

maximum - minimum

N = 32, so the median will be the

mean of 2 observations

We use ________ and _______ as measures of center and spread ONLY for reasonably symmetric distributions with no outliers

mean, standard deviation

Find the mean, variance, and standard deviation of the binomial distribution for which n=250 and p=0.69.

mean=172.5 variance=53.475 standard deviation=7.3127 (see test 3 #4)
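
The three values follow from the binomial formulas mean = np, variance = np(1 - p), and standard deviation = sqrt(np(1 - p)). A minimal sketch of that arithmetic (variable names are illustrative):

```python
from math import sqrt

n, p = 250, 0.69

mean = n * p                 # expected number of successes
variance = n * p * (1 - p)   # np(1 - p)
std_dev = sqrt(variance)

print(mean, round(variance, 3), round(std_dev, 4))
```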

Solve the problem. Identify the level of measurement for data that are the number of milligrams of tar in 79 cigarettes.

ratio

cumulative frequency

the sum of the frequencies in that class & previous classes

T/F the lurking variable has an effect on both the explanatory and response variables

true

Solve the problem. What method of data collection would you use to collect data for a study where you would like to determine the chance of getting three girls in a family of three children?

use a simulation

Solve the problem. Find the z-score for the value 88, when the mean is 95 and the standard deviation is 7. z = -1.14 z = -1.00 z = 0.85 z = -0.85

z = -1.00

State the type of correlation (positive, negative, or zero) you would expect from the set of data. a. The amount of time a candle has been burning and the height of the candle. b. The number of siblings a student has and their grade point average.

a. negative b. zero (test 2 #9-10)

Parameter

the numerical description of a population characteristic

matched-pair design

where subjects are paired up according to a similarity. One subject in each pair is randomly selected to receive one treatment while the other subject receives a different treatment.

In regards to the "4 possibilities for role-type classifications" what would light type and nearsightedness be?

C --> C

In regards to the "4 possibilities for role-type classifications" (2 variables) what would gender and test scores be?

C --> Q

Take category or label values, and place an individual into one of several groups

Categorical

Predictive problem

Involves trying to estimate the future value of a variable.

If the graph is skewed right...

Mean is larger than median

If the graph is skewed LEFT

Mean is smaller than median

The groups receiving different things

Treatment groups

A. To make inferences about a population based on information from a random sample

What is the objective of statistics? A. To make inferences about a population based on information from a random sample B. To make inferences about a sample with a high degree of reliability C. To state, beyond a shadow of a doubt, a conclusion about a population D. to make inferences about a random sample based on information from the population E. To use numbers in as many different ways as possible

Descriptive statistics

branch of statistics that involves the organization, summarization, and display of data

Solve the problem. Classify the statement as an example of classical probability, empirical probability, or subjective probability. In California's Pick Three lottery, a person selects a 3-digit number. The probability of winning California's Pick Three lottery is 1/1000 empirical probability classical probability subjective probability

classical probability

Solve the problem. Classify the statement as an example of classical probability, empirical probability, or subjective probability. The probability that a newborn baby is a boy is 1/2. empirical probability classical probability subjective probability

classical probability

Solve the problem. At a local community college, five statistics classes are randomly selected and all of the students from each class are interviewed. What sampling technique is used? systematic random convenience stratified cluster

cluster

Compound Events

concerning 2 or more events and their relationship

Confounding Variable

confuses the study and affects your results because it is related to variables of interest in the study

Qualitative data

consist of attributes, labels, or nonnumerical entries. Ex: Major, Place of birth, Eye color

Solve the problem. The average age of the students in a statistics class is 22 years. Does this statement describe:

descriptive statistics?

Deviation

deviation of an entry (x) in a population data set is the difference between the entry and the mean

Sampling Error

differences that can occur between the sample and population

a return rate that after a certain point fails to increase proportionately to additional outlays of investment

diminishing returns

When describing the relationship between 2 quantitative variables using a scatterplot, we look at:

direction form strength outliers

Solve the problem. Identify the level of measurement for data that are the temperature of 90 refrigerators. nominal interval ordinal ratio

interval

Census

is a count or measure of an entire population. Taking a census provides complete information, but it is often costly and difficult to perform.

Solve the problem. An elementary school claims that the standard deviation in reading scores of its fourth grade students is less than 3.45. Determine whether the hypothesis test for this claim is left-tailed, right-tailed, or two-tailed. right-tailed two-tailed left-tailed

left-tailed

ogive (cumulative frequency graph)

line graph that displays the cumulative frequency of each class at its upper class boundary. The upper boundaries are marked on the horizontal axis, and the cumulative frequencies are marked on the vertical axis

cumulative frequency graph

line graph that represents the cumulative frequency

A _________ variable is a variable that is not among the explanatory or response variables in a study, but could substantially affect your interpretation among those variables.

lurking

Find the mean and standard deviation of the sampling distribution of sample means when the mean=50, standard deviation=6, and n=25.

mean = 50 standard error of the mean = 1.2 (see test 3 #15)

We use _________ and __________ as measures of center and spread for all other cases.

median, IQR

Solve the problem. Identify the level of measurement for data that are a list of 1247 social security numbers. ratio ordinal nominal interval

nominal

Solve the problem. Identify the level of measurement for data that are the nationalities listed in a recent survey (for example, Asian, European, or Hispanic).

nominal

Ex 1: In a recent survey, 834 employees in the United States were asked if they thought their jobs were highly stressful. Of the 834 respondents, 517 said yes. Identify the population and the sample. Describe the sample data set

population: responses of all employees in the US; sample: responses of the 834 employees in the US who were surveyed; data set: 517 who said yes & 317 who said no. There are 834 total and we know 517 said yes, so the calculation is 834 (all) - 517 (yes) = 317 (nos)

an increase in one of the variables is associated with an increase in the other

positive relationship

Solve the problem. The names of 70 contestants are written on 70 cards. The cards are placed in a bag, and three names are picked from the bag. What sampling technique was used?

random

Identify the data set's level of measurement. number of milligrams of tar in 85 cigarettes

ratio

Identify the data set's level of measurement. the lengths (in minutes) of the top ten movies with respect to ticket sales in 2007

ratio

Solve the problem. The numbers of touchdowns scored by a major university in five randomly selected games are given below. Identify the level of measurement. 1 5 4 5 5

ratio

The dependent variable; the outcome of the study

response variable

choosing the individuals from the population that will be included in the sample

sampling

list of potential individuals to be sampled; it may not match the population of interest, so the resulting sample may or may not be biased

sampling frame

Complete the frequency distribution (midpoint, relative frequency, and cumulative frequency) and use the distribution to construct a frequency histogram. Class; f(x) 0-7; 8 8-15; 8 16-23; 3 24-31; 3 32-39; 3

see quiz 2 #2

Complement of Event E

set of all outcomes in a sample space that are not included in event E; written E' (E prime)

occurs whenever including a lurking variable causes you to rethink the direction of an association

Simpson's paradox

SAT math scores of 1,000 future engineers and scientists is an example of

skewed left distribution

"SALARY.... Most people earn the low/medium range of salaries except CEOs, athletes, etc." This is an example of a _______ __________ ___________________

skewed right distribution

Age of death from trauma (car accidents, murder, suicide) is an example of

skewed right distribution

outlier

the most removed from the entire set

Statistics

the science of collecting, organizing, analyzing, and interpreting data in order to make a decision

Goal of Statistics

to collect data from a small part (subset) of a larger group so that we can learn something about the larger group

Solve the problem. The data below are the final exam scores of 10 randomly selected statistics students and the number of hours they studied for the exam. Find the equation of the regression line for the given data. hrs x: 3 5 2 8 2 4 4 5 6 Scores Y: 65 80 60 88 66 78 85 90 90 71 y= -56.11x - 5.044 y= 56.11x - 5.044 y= 5.044x + 56.11 y= -5.044x + 56.11

y= 5.044x + 56.11

individuals report variables' values themselves, frequently by giving their opinions

sample survey

Solve the problem. The data below are the temperatures on randomly chosen days during a summer class and the number of absences on those days. Calculate the correlation coefficient, r. temp x: 74 87 93 92 90 100 77 102 82 # of absences y: 5 9 12 12 10 17 6 17 7 0.881 0.819 0.980 0.890

0.980
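
The coefficient for this card can be reproduced with the computational formula for Pearson's r. The sketch below uses the temperatures and absences listed in the question; the function name `pearson_r` is illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient via the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

temps = [74, 87, 93, 92, 90, 100, 77, 102, 82]
absences = [5, 9, 12, 12, 10, 17, 6, 17, 7]

r = pearson_r(temps, absences)
print(round(r, 3))
```

The result is about 0.980, a strong positive linear relationship.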

Solve the problem. A study of 1000 randomly selected flights of a major airline showed that 782 of the flights arrived on time. What is the probability of a flight arriving on time? 500/109 391/500 109/500 500/391

391/500

Solve the problem. The data below are the ages and systolic blood pressures (measured in millimeters of mercury) of 9 randomly selected adults. Find the standard error of estimate, se, given that age x: 38 41 45 48 51 53 57 61 65 pressure y: 116 120 123 131 142 145 148 150 152 5.572 3.099 6.981 4.199

4.199

Given the equation of a regression line is y = 5x - 6, what is the best predicted value for y given x = 10? Assume that the variables x and y have a significant correlation. 44 55 9 56

44

is unitless; just measures the strength of a linear relationship

"r"

Solve the problem. A multiple regression equation is y = -35,000 + 130x1 + 20,000x2, where x1 is a person's age, x2 is the person's grade point average in college, and y is the person's income. Predict the income for a person who is 26 years old and had a college grade point average of 2.3. $485,299 $84,380 $49,380 $14,380

$14,380

Solve the problem. A random sample of 10 parking meters in a beach community showed the following incomes for a day. Assume the incomes are normally distributed. 3.60 4.50 2.8 6.3 2.6 5.2 6.75 4.25 8 3 Find the 95% confidence interval for the true mean. ($1.35, $2.85) ($2.11, $5.34) ($3.39, $6.01) ($4.81, $6.31)

($3.39, $6.01)
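
With only 10 observations and an unknown population standard deviation, this interval uses the t-distribution. The sketch below takes the critical value 2.262 (9 degrees of freedom, 95% confidence) from a standard t-table rather than computing it, which is an assumption of this example.

```python
from math import sqrt
from statistics import mean, stdev

incomes = [3.60, 4.50, 2.8, 6.3, 2.6, 5.2, 6.75, 4.25, 8, 3]

# Tabulated t critical value for df = n - 1 = 9 at 95% confidence (assumed from a t-table)
T_CRITICAL = 2.262

x_bar = mean(incomes)
s = stdev(incomes)                      # sample standard deviation (n - 1 divisor)
margin = T_CRITICAL * s / sqrt(len(incomes))
low, high = x_bar - margin, x_bar + margin

print(round(low, 2), round(high, 2))
```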

Solve the problem. Find the z-score for which 99% of the distribution's area lies between -z and z. (-1.645, 1.645) (-2.575, 2.575) (-1.96, 1.96) (-1.28, 1.28)

(-2.575, 2.575)

Sample variance

(1/(n - 1)) * sum of (x - mean)^2

Solve the problem. In a recent study of 42 eighth graders, the mean number of hours per week that they watched television was 19.6 with a standard deviation of 5.8 hours. Find the 98% confidence interval for the population mean. (19.1, 20.4) (17.5, 21.7) (18.3, 20.9) (14.1, 23.2)

(17.5, 21.7)

Solve the problem. Construct a 98% confidence interval for the population mean, μ. Assume the population has a normal distribution. A study of 14 bowlers showed that their average score was 192 with a standard deviation of 8. (186.3, 197.7) (328.3, 386.9) (115.4, 158.8) (222.3, 256.1)

(186.3, 197.7)

Solve the problem. Construct a 95% confidence interval for the population mean, μ. Assume the population has a normal distribution. A random sample of 16 fluorescent light bulbs has a mean life of 645 hours with a standard deviation of 31 hours. (321.7, 365.8) (628.5, 661.5) (531.2, 612.9) (876.2, 981.5)

(628.5, 661.5)

Solve the problem. Construct a 95% confidence interval for the population mean, μ. Assume the population has a normal distribution. A sample of 25 randomly selected students has a mean test score of 81.5 with a standard deviation of 10.2. (87.12, 98.32) (77.29, 85.71) (66.35, 69.89) (56.12, 78.34)

(77.29, 85.71)

Solve the problem. A random sample of 40 students has a test score with average = 81.5 and s = 10.2. Construct the confidence interval for the population mean, μ, if c = 0.90. (66.3, 89.1) (78.8, 84.2) (51.8, 92.3) (71.8, 93.5)

(78.8, 84.2)

Choose the portion of the sentence that illustrates descriptive statistics. Students who came into office hours (a) had a mean score of 88 on the first test. It would appear that (b) attending office hours improves performance.

(a) (see test 1 #1)

Solve the problem. Calculate the correlation coefficient, r, for the data below. x: -12 -10 -3 -6 -8 -9 -7 -5 -4 -11 y: 14 -3 11 0 1 4 8 -2 9 10 -0.104 -0.549 -0.581 -0.132

-0.104

Solve the problem. Determine the standardized test statistic, z, to test the claim about the population proportion p = 0.250 given n = 48 and p̂ = 0.231. Use α = 0.01. -1.18 -0.304 -2.87 -0.23

-0.304
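
The statistic comes from z = (p̂ - p) / sqrt(p(1 - p) / n), where p is the claimed proportion and p̂ is the sample proportion. A minimal sketch, with an illustrative function name:

```python
from math import sqrt

def proportion_z(p_hat, p0, n):
    """Standardized test statistic for a claim about a population proportion."""
    standard_error = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

z = proportion_z(p_hat=0.231, p0=0.250, n=48)
print(round(z, 3))
```

Here the standard error is sqrt(0.25 * 0.75 / 48) = 0.0625, so z = -0.019 / 0.0625.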

Solve the problem. Use a standard normal table to find the z-score that corresponds to the cumulative area of 0.01. -0.255 0.255 -2.33 2.33

-2.33

Solve the problem. Use the regression equation to predict the value of y for x = 3.2. Assume that the variables x and y have a significant correlation. x: -5 -3 4 1 -1 -2 0 2 3 -4 y: 11 6 -6 -1 3 4 1 -4 -5 8 -5.274 0.541 6.790 4.311

-5.274

Interval Level of Measurement

-data can be ordered -meaningful differences between data entries can be calculated -zero entry is just a position

Ratio Level of Measurement

-zero entry is an inherent zero (zero means nonexistent) -a ratio of two data entries can be formed, so one entry can be expressed as a multiple of another

Solve the problem. The distribution of blood types for 100 Americans is listed in the table. If one donor is selected at random, find the probability of selecting a person with blood type A+ or A-. Blood type: o+ o- A+ A- B+ B- AB+ AB- 37 6 34 6 10 2 4 1 .4 .34 .6 .45

.4

A coin is tossed. Find the probability that the result is heads. .1 1 0.5 0.9

.5

Solve the problem. At the local racetrack, the favorite in a race has odds 3:2 in favor of winning. What is the probability that the favorite wins the race? 0.8 0.2 0.6 0.4

.6

Solve the problem. The events A and B are mutually exclusive. If P(A) = 0.2 and P(B) = 0.1, what is P(A and B)? 0.3 0.02 0 0.5

0

Solve the problem. The events A and B are mutually exclusive. If P(A) = 0.2 and P(B) = 0.1, what is P(A and B)? 0.5 0.3 0 0.02

0

Solve the problem. Assume that the heights of men are normally distributed with a mean of 67.9 inches and a standard deviation of 2.8 inches. If 64 men are randomly selected, find the probability that they have a mean height greater than 68.9 inches. 0.8188 0.0021 9.9671 0.9005

0.0021
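
This card relies on the sampling distribution of the mean: for samples of size n, the sample mean has standard deviation σ / sqrt(n). A sketch of the calculation using the standard library's `NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 67.9, 2.8, 64

# Sampling distribution of the sample mean: same center, spread sigma / sqrt(n)
sampling_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

p_greater = 1 - sampling_dist.cdf(68.9)
print(round(p_greater, 4))
```

The z-score here is (68.9 - 67.9) / 0.35, about 2.86, which leaves roughly 0.0021 in the upper tail.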

Suppose you are using α = 0.01 to test the claim that μ ≤ 29 using a P-value. You are given the sample statistics n = 40, x̄ = 30.8, and s = 4.3. Find the P-value. 0.1030 0.0211 0.0040 0.9960

0.0040

Solve the problem. Determine the margin of error if the grade point averages for 10 randomly selected students from a class of 125 students has a mean of x̄ = 2.7. Assume the grade point average of the 125 students has a mean of μ = 2.9. 0.2 2.6 -0.2 2.8

0.2

Solve the problem. A survey of 250 homeless persons showed that 86 were veterans. Find a point estimate p, for the population proportion of homeless persons who are veterans. 0.34400002 0.524 0.256 0.65599998

0.34400002

Solve the problem. The distribution of Master's degrees conferred by a university is listed in the table. (assume that a student majors in only one subject) Major Frequency Mathematics 230 English 206 Engineering 86 Business 176 Education 222 What is the probability that a randomly selected student with a Master's degree majored in English or Mathematics? Round your answer to three decimal places. 0.224 0.474 0.526 0.250

0.474

Solve the problem. Use the standard normal distribution to find P(0 < z < 2.25). 0.5122 0.7888 0.4878 0.8817

0.4878

Solve the problem. Find the standard error of estimate, se, for the data below, given that y = -2.5x x: -1 -2 -3 -4 y: 2 6 7 10 0.532 0.349 0.675 0.866

0.866
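
The standard error of estimate is sqrt(sum of squared residuals / (n - 2)). The sketch below applies that formula to the four points using the regression line y = -2.5x given in the question; the helper name `predict` is illustrative.

```python
from math import sqrt

xs = [-1, -2, -3, -4]
ys = [2, 6, 7, 10]

def predict(x):
    """Regression line stated in the question."""
    return -2.5 * x

# Sum of squared residuals (observed y minus predicted y)
sse = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
s_e = sqrt(sse / (len(xs) - 2))

print(round(s_e, 3))
```

The residuals are -0.5, 1, -0.5, and 0, so the sum of squares is 1.5 and s_e = sqrt(1.5 / 2).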

Solve the problem. The table lists the smoking habits of a group of college students. no yes Heavy man 135 52 5 woman 187 21 5 If a student is chosen at random, find the probability of getting someone who is a man or a non-smoker. Round your answer to three decimal places. 0.941 0.820 0.948 0.936

0.936

Solve the problem. If one card is drawn from a standard deck of 52 playing cards, what is the probability of drawing a red card? 1/2 1/13 1/52 1/4

1/2

Solve the problem. If one card is drawn from a standard deck of 52 playing cards, what is the probability of drawing a red card? 1/2 1/52 1/4 1/13

1/2

Solve the problem. Use Bayes' theorem to solve this problem. A storeowner purchases stereos from two companies. From Company A, 600 stereos are purchased and 11% are found to be defective. From Company B, 350 stereos are purchased and 1% are found to be defective. Given that a stereo is defective, find the probability that it came from Company A. 77/139 132/139 7/139 12/139

132/139
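
One way to see this result is to count expected defective units from each company and take Company A's share. The sketch uses exact fractions so the answer can be compared directly with the choices; variable names are illustrative.

```python
from fractions import Fraction

# Expected defective stereos from each company
defective_a = 600 * Fraction(11, 100)   # 66
defective_b = 350 * Fraction(1, 100)    # 7/2

# Bayes' theorem: P(A | defective) = defective from A / all defective
p_a_given_defective = defective_a / (defective_a + defective_b)
print(p_a_given_defective)
```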

Solve the problem. Identify the midpoint of the first class. Weight (in Pounds) Frequ. 135-139 6 140-144 4 145-149 11 150-154 15 155-160 8 137 139 135 11

137

Solve the problem. The cholesterol levels (in milligrams per deciliter) of 30 adults are listed below. Find Q1. 154 156 165 165 170 171 172 180 184 185 189 189 190 192 195 198 198 200 200 200 205 205 211 215 220 220 225 238 255 265 171 180 184.5 200

180

Provide an appropriate response. The top 14 speeds, in miles per hour, for Pro-Stock drag racing over the past two decades are listed below. Find the median speed. 118.1 202.2 190.1 201.4 191.3 201.4 192.2 201.2 193.2 201.2 194.5 199.2 196.0 196.2 195.8 201.2 196.7 196.1

196.1

Solve the problem. The top 14 speeds, in miles per hour, for Pro-Stock drag racing over the past two decades are listed below. Find the median speed. 181.1 202.2 190.1 201.4 191.3 201.4 192.2 201.2 193.2 201.2 194.5 199.2 196.0 196.2 201.2 192.2 196.7 196.1

196.1

Solve the problem. The top 14 speeds, in miles per hour, for Pro-Stock drag racing over the past two decades are listed below. Find the median speed. 181.1 202.2 190.1 201.4 191.3 201.4 192.2 201.2 193.2 201.2 194.5 199.2 196.0 196.2 201.2 196.1 196.7 192.2

196.1

Solve the problem. Find the value of E, the margin of error, for c = 0.90, n = 10 and s = 3.6. 2.06 2.09 0.66 1.57

2.09

Solve the problem. The average IQ of students in a particular calculus class is 110, with a standard deviation of 5. The distribution is roughly bell-shaped. Use the Empirical Rule to find the percentage of students with an IQ above 120. 15.85% 2.5% 13.5% 11.15%

2.5%

Solve the problem. The average IQ of students in a particular calculus class is 110, with a standard deviation of 5. The distribution is roughly bell-shaped. Use the Empirical Rule to find the percentage of students with an IQ above 120. 2.5% 11.15% 15.85% 13.5%

2.5%

Provide an appropriate response. Grade points are assigned as follows: A = 4, B = 3, C = 2, D = 1, and F = 0. Grades are weighted according to credit hours. If a student receives an A in a four-unit class, a D in a two-unit class, a B in a three-unit class and a C in a three-unit class, what is the student's grade point average?

2.75

Solve the problem. Grade points are assigned as follows: A = 4, B = 3, C = 2, D = 1, and F = 0. Grades are weighted according to credit hours. If a student receives an A in a four-unit class, a D in a two-unit class, a B in a three-unit class and a C in a three-unit class, what is the student's grade point average? 2.75 2.50 3.00 1.75

2.75

A freshman's first semester grades are as follows: Biology: - Grade: C - Grade Points Earned: 2.0 - Credits: 4 Calculus 1: - Grade: B - Grade Points Earned: 3.0 - Credits: 4 Spanish: - Grade: B+ - Grade Points Earned: 3.3 - Credits: 3 Philosophy: - Grade: A- - Grade Points Earned: 3.6 - Credits: 3 Creative Writing: - Grade: A - Grade Points Earned: 4.0 - Credits: 1 Using the credits as weights, use a weighted mean to calculate the student's grade point average for the first semester. Round to the nearest hundredth.

2.98 (see test 1 #19)
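
A weighted mean is the sum of value times weight divided by the sum of the weights. A minimal sketch for the semester above, with illustrative names:

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the total weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

grade_points = [2.0, 3.0, 3.3, 3.6, 4.0]   # Biology, Calculus 1, Spanish, Philosophy, Creative Writing
credit_hours = [4, 4, 3, 3, 1]

gpa = weighted_mean(grade_points, credit_hours)
print(round(gpa, 2))
```

The weighted total is 44.7 grade points over 15 credits.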

A restaurant offers a $12 dinner special that has 5 choices of an appetizer, 10 choices for an entree, and 4 choices for a dessert. How many different meals are available when you select an appetizer, an entree, and a dessert?

200 meals (see quiz 4)

The box-and-whisker plot (test 2 #4) shows the cost per DVD for a sample of 44 DVDs. How many DVDs cost between $14 and $20?

22 DVDs

Solve the problem. A city in the Pacific Northwest recorded its highest temperature at 74 degrees Fahrenheit and its lowest temperature at 23 degrees Fahrenheit for a particular year. Use this information to find the upper and lower limits of the first class if you wish to construct a frequency distribution with 10 classes. 23-27 18-28 23-28 23-29

23-28

Solve the problem. Given that P(A or B)=1/6, P(A)=1/7, and P(A and B)=1/8, find P(B) 73/168 1/16 31/168 25/168

25/168
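
The answer follows from rearranging the addition rule, P(A or B) = P(A) + P(B) - P(A and B), to solve for P(B). A sketch with exact fractions:

```python
from fractions import Fraction

p_a_or_b = Fraction(1, 6)
p_a = Fraction(1, 7)
p_a_and_b = Fraction(1, 8)

# Addition rule solved for P(B)
p_b = p_a_or_b - p_a + p_a_and_b
print(p_b)
```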

Solve the problem. The grade point averages for 10 students are listed below. Find the range of the data. 2.0 3.2 1.8 2.9 .9 4.0 3.3 2.9 3.6 .8 3.2 1.4 2.45 2.8

3.2

Solve the problem. Find the critical value, tc for c = 0.99 and n = 10. 2.2821 2.262 1.833 3.250

3.250

Solve the problem. For the following data set, approximate the sample standard deviation. Height (in inches) Frequency 50-52 5 53-55 8 56-58 12 59-61 13 62-64 11 2.57 .98 1.86 3.85

3.85
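
For grouped data, each class is represented by its midpoint and weighted by its frequency. The sketch below approximates the sample standard deviation for the height table in the question; variable names are illustrative.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each class
classes = [(50, 52, 5), (53, 55, 8), (56, 58, 12), (59, 61, 13), (62, 64, 11)]

midpoints = [(low + high) / 2 for low, high, _ in classes]
freqs = [f for _, _, f in classes]

n = sum(freqs)
mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
# Sample variance with the n - 1 divisor
variance = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1)
std_dev = sqrt(variance)

print(round(std_dev, 2))
```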

Solve the problem. If an individual is selected at random, what is the probability that he or she has a birthday in July? Ignore leap years. 31/365 1/365 12/365 364/365

31/365

Solve the problem. If a couple plans to have five children, how many gender sequences are possible? 5 32 25 3125

32

Solve the problem. SAT verbal scores are normally distributed with a mean of 446 and a standard deviation of 91. Use the Empirical Rule to determine what percent of the scores lie between 355 and 446. 47.5% 34% 68% 49.9%

34%

Solve the problem. The ages of 10 grooms at their first marriage are listed below. Find the midquartile. 35.1 24.3 46.6 41.6 32.9 26.8 39.8 21.5 45.7 33.9 43.7 34.5 34.1 34.2

34.2

B. Stratified

35 sophomores, 39 juniors, and 29 seniors are randomly selected from 475 sophomores, 517 juniors, and 550 seniors at a certain high school. What sampling technique is used? A. Simple random B. Stratified C. Convenience

Solve the problem. The birth weights for twins are normally distributed with a mean of 2353 grams and a standard deviation of 647 grams. Use z-scores to determine which birth weight could be considered unusual. 1200g 2000g 3600g 2353g

3600 g

Professor Duckett has 4 Celtics jerseys, 7 Celtics hats, and 13 Celtics t-shirts. How many different outfits can he choose from when he goes to the game?

364 (test 2 #13)

Solve the problem. A study of 1000 randomly selected flights of a major airline showed that 782 of the flights arrived on time. What is the probability of a flight arriving on time? 391/500 500/391 109/500 500/109

391/500

Solve the problem. A researcher found a significant relationship between a person's age, x1, the number of hours a person works per week, x2, and the number of accidents, y, the person has per year. The relationship can be represented by the multiple regression equation y = -3.2 + 0.012x1 + 0.23x2. Predict the number of accidents per year (to the nearest whole number) for a person whose age is 41 and who works 31 hours per week. 5 6 4 3

4

Solve the problem. A tire company finds the lifespan for one brand of its tires is normally distributed with a mean of 47,500 miles and a standard deviation of 3000 miles. If the manufacturer is willing to replace no more than 10% of the tires, what should be the approximate number of miles for a warranty? 51,340 52,435 42,565 43,660

43,660
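
Replacing no more than 10% of tires means setting the warranty at the 10th percentile of the lifespan distribution. A sketch using the inverse normal CDF:

```python
from statistics import NormalDist

lifespan = NormalDist(mu=47_500, sigma=3_000)

# Mileage below which only 10% of tires are expected to fail
warranty_miles = lifespan.inv_cdf(0.10)
print(round(warranty_miles))
```

The exact percentile is a few miles below 43,660; the card's value comes from rounding the z-score to -1.28 in a standard normal table, giving 47,500 - 1.28 × 3,000 = 43,660.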

The median is the value that separates data into the bottom ______ and top 50%

50%

Solve the problem. Seven guests are invited for dinner. How many ways can they be seated at a dinner table if the table is straight with seats only on one side? 40,320 720 4 5040

5040

Solve the problem. Identify the midpoint of the first class. Height (in inches) Frequency 50-52 5 53-55 8 56-58 12 59-61 13 62-64 11 52 50 51 49.5

51

Provide an appropriate response. Use the ogive below to approximate the cumulative frequency for 24 hours. Student Answer: 75 27 17 63

63

Solve the problem. Lengths of pregnancies of humans are normally distributed with a mean of 268 days and a standard deviation of 16 days. Use the Empirical Rule to determine the percentage of women whose pregnancies are between 252 and 284 days. 50% 68% 95% 99.7%

68%

Here are the number of hours that nine students spend on the computer on a typical day: 1 6 7 5 5 8 11 12 15 The median number of hours spent on the computer is:

7

Provide an appropriate response. Find the sample standard deviation. 2 6 15 9 11 22 1 4 8 19

7.1
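
The sample standard deviation uses the n - 1 divisor, which is what `statistics.stdev` computes. A quick check of the value above:

```python
from statistics import stdev

data = [2, 6, 15, 9, 11, 22, 1, 4, 8, 19]

sample_sd = stdev(data)   # divides the sum of squared deviations by n - 1
print(round(sample_sd, 1))
```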

Use the grouped data formulas to find the indicated mean or standard deviation. A random sample of 30 high school students is selected. Each student is asked how many hours he or she spent on the Internet during the previous week. The results are shown in the histogram. Estimate the sample mean. 8.1 8.3 7.9 7.7

7.9

Provide an appropriate response. Use the histogram below to approximate the mode heart rate of adults in the gym. a) 70 b) 2 c) 55 d) 42

70

Provide an appropriate response. The scores of the top ten finishers in a recent golf tournament are listed below. Find the median score. 67 67 68 71 72 72 72 72 73 76

72

Solve the problem. The data below are the number of absences and the final grades of 9 randomly selected students from a statistics class. What is the best predicted value for y given x = 7? Assume that the variables x and y have a significant correlation. # of absences x: 0 3 6 4 9 2 15 8 Final grade y: 98 86 80 82 71 92 55 76 76 78 79 77

77

Solve the problem. The data below are the number of absences and the final grades of 9 randomly selected students from a statistics class. What is the best predicted value for y given x = 7? Assume that the variables x and y have a significant correlation. # of absences x: 0 3 6 4 9 2 15 8 5 Final Grade y: 98 86 80 82 71 92 76 78 79 77

77

Solve the problem. The lengths of pregnancies of humans are normally distributed with a mean of 268 days and a standard deviation of 15 days. A baby is premature if it is born three weeks early. What percentage of babies are born prematurely? 6.81% 8.08% 10.31% 9.21%

8.08%

Solve the problem. In a survey of 2480 golfers, 15% said they were left-handed. The survey's margin of error was 3%. Find the confidence interval for p. 84.5% 98.5% 95% 80%

84.5%

Find the weighted mean of the data. The scores and the percents of the final grade for a statistics student are shown below. What is the student's mean score? HW: 85, 5% Quizzes: 80, 35% Project: 100, 20% Speech: 90, 15% Final exam: 93, 25%

89% (see quiz 2)

If Q3 is the median of the top 1/2 of the data, since there are 16 observations in that half, Q3 is the mean of the ____ and ____ observations in that half

8th and 9th observations

Use the grouped data formulas to find the indicated mean or standard deviation. The manager of a bank recorded the amount of time a random sample of customers spent waiting in line during peak business hours one Monday. The frequency distribution below summarizes the results. Approximate the sample mean. Round your answer to one decimal place. 9.0 13.5 9.2 7.7

9.2

Solve the problem. The data below are the temperatures on randomly chosen days during a summer class and the number of absences on those days. Construct a 95% prediction interval for y, the number of days absent, given x = 95 degrees, y = 0.449x - 30.27 and se = 0.934. temp x: 72 85 91 90 88 98 75 100 80 # of absences y: 3 7 10 10 8 15 4 15 5 4.321 < y < 6.913 3.176 < y < 5.341 9.957 < y < 14.813 6.345 < y < 8.912

9.957 < y < 14.813

Provide an appropriate response. The mean score of a competency test is 77, with a standard deviation of 4. Use the Empirical Rule to find the percentage of scores between 69 and 85. (Assume the data set has a bell-shaped distribution.) 95% 68% 50% 99.7%

95%

Solve the problem. A competency test has scores with a mean of 69 and a standard deviation of 4. A histogram of the data shows that the distribution is normal. Use the Empirical Rule to find the percentage of scores between 61 and 77. 95% 68% 99.7% 50%

95%

Solve the problem. Assume that the heights of men are normally distributed with a mean of 69.0 inches and a standard deviation of 2.8 inches. The U.S. Marine Corps requires that men have heights between 64 and 78 inches. Find the percentage of men meeting these height requirements. 31.12% 96.26% 99.93% 3.67%

96.26%
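
A quick way to check this kind of normal-probability answer is with `statistics.NormalDist` from the Python standard library. The variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=69.0, sigma=2.8)   # men's heights in inches, from the card

# P(64 <= X <= 78) is the difference of the two cumulative probabilities.
p_eligible = heights.cdf(78) - heights.cdf(64)
print(f"{p_eligible:.2%}")
```

The result may differ from 96.26% in the second decimal place, because a z-table rounds each z-score to two decimals before looking up the area.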

For distributions with a "Normal shape" (unimodal, symmetric, bell-shaped), approximately ______% of observations fall within 3 SD of the mean

99.7%
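
The Empirical Rule percentages used in the cards above (68%, 95%, 99.7%) are rounded areas under the standard normal curve. A small sketch, with illustrative names:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Proportion of a normal distribution within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} SD: {p:.1%}")
```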

Right-skewed data

A data set where mean >> median

Attribute

A function of variates collected in a study

When the experiment is repeated many times, the proportion of intervals that contain the true value converges to p%

A p% confidence interval for a parameter satisfies _____________________________________________________________.

No evidence for association

A relative risk value close to 1 indicates _________________________.

Strong evidence for association

A relative risk value far from 1 indicates _________________________.

Sampling Error

A sampling error is the difference between results of a sample and those of the population.

Simulation

A simulation is the use of a mathematical or physical model to reproduce the conditions of a situation or process. Collecting data often involves the use of computers. Simulations allow you to study situations that are impractical or even dangerous to create in real life, and often they save time and money. Ex: Automobile manufacturers use simulations with dummies to study the effects of crashes on humans.

D. No, because the probability of success is different for each trial

A state lottery randomly chooses 6 balls numbered from 1 through 43 without replacement. You choose 6 numbers and purchase a lottery ticket. The random variable represents the number of matches on your ticket to the numbers drawn in the lottery. Determine whether this experiment is binomial A. No, there are more than two outcomes for each trial B. Yes, there are a fixed number of trials and the trials are independent of each other C. Yes, the probability of success is the same for each trial D. No, because the probability of success is different for each trial

C. 0.555 B. 0.5 D. 0.6 D. The conditional distribution of opinion given gender

A student organization is trying to decide whether or not to offer more movies on campus. A random sample of 1000 students was asked if they were in favor of more movies on campus. The results by gender are shown in the table below In Favor 330 (Male) 225 (Female) No Opinion 165 (Male) 180 (Female) Opposed 55 (Male) 45 (Female) What proportion of the sampled students is in favor of more movies on campus? A. 0.33 B. 0.5 C. 0.555 D. 0.6 What proportion of the sampled females is in favor of more movies on campus? A. 0.33 B. 0.5 C. 0.555 D. 0.6 What proportion of sampled males is in favor of more movies on campus? A. 0.33 B. 0.5 C. 0.555 D. 0.6 To answer the original question regarding whether or not to offer more movies on campus, which distribution should the student organization study? A. The joint distribution of gender and opinion B. The marginal distribution of gender C. The conditional distribution of gender given opinion D. The conditional distribution of opinion given gender
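
The three proportions in this card come from different denominators: the whole sample for the marginal proportion, and one gender's total for each conditional proportion. A short sketch using the counts from the table (the variable names are illustrative):

```python
# Counts copied from the card: opinion -> (male, female).
counts = {
    "In Favor": (330, 225),
    "No Opinion": (165, 180),
    "Opposed": (55, 45),
}

total_male = sum(m for m, _ in counts.values())      # 550
total_female = sum(f for _, f in counts.values())    # 450
total = total_male + total_female                    # 1000

favor_male, favor_female = counts["In Favor"]

p_favor = (favor_male + favor_female) / total        # marginal proportion
p_favor_given_female = favor_female / total_female   # conditional on gender
p_favor_given_male = favor_male / total_male

print(p_favor, p_favor_given_female, p_favor_given_male)  # 0.555 0.5 0.6
```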

The experiment is binomial. Success in this experiment is selecting a worker who is reducing the amount of vacation.

A survey asks 1200 workers, "Has the economy forced you to reduce the amount of vacation you plan to take this year?" Forty-six percent of those surveyed say they are reducing the amount of vacation. Twenty workers participating in the survey are randomly selected. The random variable represents the number of workers who are reducing the amount of vacation. Decide whether the experiment is a binomial experiment.

Convenience Sample

A type of sample that often leads to biased studies (so it is not recommended) is a convenience sample. A convenience sample consists only of members of the population that are easy to get.

Outlier

Any point outside the whiskers of a box plot is an __________________________.

Q1 - 1.5(IQR) = 32 - 1.5(9.5) = 17.75 and Q3 + 1.5(IQR) = 41.5 + 1.5(9.5) = 55.75. What does this tell us?

Anything below 17.75 or above 55.75 is a suspected outlier.
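
The fence calculation can be sketched as a small helper. The function name `outlier_fences` and the sample observations are hypothetical, made up only to show the flagging step; Q1 = 32 and Q3 = 41.5 are from the card.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower_fence, upper_fence = outlier_fences(32, 41.5)
print(lower_fence, upper_fence)  # 17.75 55.75

# Hypothetical observations, invented only to show the flagging step.
sample = [15, 30, 36, 41, 58]
suspected = [x for x in sample if x < lower_fence or x > upper_fence]
print(suspected)  # [15, 58]
```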

Shrinks

As p increases the 100p% likelihood interval __________________________.

Shoe sizes - since the shoe sizes of women are usually smaller than those of men, we'd expect 2 places where the bars are higher. This is an example of...

Bimodal graph

The key to establishing ____________ is to rule out the possibility of any ________ variable, or in other words, to ensure that individuals differ only with respect to the values of the ________ variable

Causation, lurking, explanatory

Median

Central observation when they are arranged in increasing order, or average of two central ones

Quantitative Data

Consists of numerical measurements or counts

The group that tries to quit w/o drugs or therapy

Control group

Stresses that the values of the experiment's explanatory variables have been assigned by researchers rather than occurring naturally

Controlled experiment

Null hypothesis

The conventional wisdom (status quo). If there is not enough evidence against it, we fail to reject it.

Data at the interval level of measurement

Data at the interval level of measurement can be ordered, and meaningful differences between data entries can be calculated. At the interval level, a zero entry simply represents a position on a scale; the entry is not an inherent zero.

Data at the ratio level of measurement

Data at the ratio level of measurement are similar to data at the interval level, with the added property that a zero entry is an inherent zero. A ratio of two data entries can be formed so that one data entry can be meaningfully expressed as a multiple of another.

C. 19.5 (Lies in the middle of the 20 values)

Data on the mileage of 20 randomly selected cars is listed below (ordered for convenience) 12 13 15 16 16 17 18 18 19 19 20 20 22 23 24 26 26 27 27 29 What is the median mileage for these 20 cars? A. 17.5 B. 19 C. 19.5 D. 20

The events ARE NOT mutually exclusive, since there IS AT LEAST 1 PRESIDENTIAL CANDIDATE WHO LOST THE POPULAR VOTE and LOST THE ELECTION

Determine whether the events in the accompanying Venn diagram are mutually exclusive

The events ARE NOT mutually exclusive, since there are SOME movies that are rated PG-13 and RECEIVE MOSTLY POSITIVE REVIEWS

Determine whether the events in the accompanying Venn diagram are mutually exclusive

D. False, the probability that A or B will occur is P(A or B) = P(A) + P(B) - P(A and B)

Determine whether the following statement is true or false The probability that event A or event B will occur is P(A or B) = P(A) + P(B) - P(A or B) A. True B. False, the probability that A or B will occur is P(A or B) = P(A) * P(B) C. False, the probability that A or B will occur is P(A or B) = P(A) + P(B) D. False, the probability that A or B will occur is P(A or B) = P(A) + P(B) - P(A and B)

A. The random variable is discrete, because it has a countable number of possible outcomes

Determine whether the random variable x is discrete or continuous Let x represent the number of fish caught during a fishing tournament A. The random variable is discrete, because it has a countable number of possible outcomes B. The random variable is continuous, because it has an uncountable number of possible outcomes C. The random variable is continuous, because it has a countable number of possible outcomes D. The random variable is discrete, because it has an uncountable number of possible outcomes

D. The random variable is discrete, because it has a countable number of possible outcomes

Determine whether the random variable x is discrete or continuous Let x represent the number of people with blood type A in a random sample of 21 people A. The random variable is continuous, because it has a countable number of possible outcomes B. The random variable is continuous, because it has an uncountable number of possible outcomes C. The random variable is discrete, because it has an uncountable number of possible outcomes D. The random variable is discrete, because it has a countable number of possible outcomes

D. The random variable is discrete, because it has a countable number of possible outcomes

Determine whether the random variable x is discrete or continuous Let x represent the number of statistics students now reading a book A. The random variable is continuous, because it has a countable number of possible outcomes B. The random variable is continuous, because it has an uncountable number of possible outcomes C. The random variable is discrete, because it has an uncountable number of possible outcomes D. The random variable is discrete, because it has a countable number of possible outcomes

A. The complement of "at least one" is "none". So, the probability of getting at least one item is equal to 1 - P(none of the items)

Explain how the complement can be used to find the probability of getting at least one item of a particular type A. The complement of "at least one" is "none". So, the probability of getting at least one item is equal to 1 - P(none of the items) B. The complement of "at least one" is "all". So, the probability of getting at least one item is equal to 1 - P(all items) C. The complement of "at least one" is "all". So, the probability of getting at least one item is equal to P(all items) - 1 D. The complement of "at least one" is "none". So, the probability of getting at least one item is equal to P(none of the items) - 1

D. The two events are independent because the occurrence of one does not affect the probability of the occurrence of the other

For the given pair of events, classify the two events as independent or dependent Finding that your cell phone works Finding that your DVD player works A. The two events are dependent because the occurrence of one affects the probability of the occurrence of the other B. The two events are independent because the occurrence of one affects the probability of the occurrence of the other C. The two events are dependent because the occurrence of one does not affect the probability of the occurrence of the other D. The two events are independent because the occurrence of one does not affect the probability of the occurrence of the other

Population attributes

Greek letters denote ____________________________, unknown constants.

The buyer of a local hiking club store recommends against buying the new digital altimeters because they vary more than the old altimeters, which had a standard deviation of one yard. Write the null and alternative hypotheses.

H0: Standard deviation ≤ 1 Ha: Standard deviation > 1 (see 7.1 in class practice)

The statement represents a claim. Write its complement and state which is H0 and which is Ha. µ ≤ 0.93

H0: µ ≤ 0.93 (claim) Ha: µ > 0.93 (see 7.1 in class practice)

People in an experiment who behave differently from how they would normally behave

Hawthorne effect

B. Each trial is independent of other trials if the outcome of one trial does not affect the outcome of any of the other trials

In a binomial experiment, what does it mean to say that each trial is independent of other trials A. Each trial is independent of other trials if no more than one trial occurs at a time B. Each trial is independent of other trials if the outcome of one trial does not affect the outcome of any of the other trials C. Each trial is independent of other trials if the sum of all the possible trial outcomes equals 1 D. Each trial is independent of other trials if the outcome of one trial affects the outcome of another trial

A. No, because the total probability is not equal to 1

Is the probability distribution a discrete distribution A. No, because the total probability is not equal to 1 B. Yes, because the distribution is symmetric C. No, because some of the probabilities have values greater than 1 or less than 0 D. Yes, because the probabilities sum to 1 and are all between 1 and 0, inclusive

The relative likelihood function

L(theta)/L(thetahat)
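
A minimal sketch of the relative likelihood function for a binomial model, assuming y successes in n trials so that thetahat = y/n. The data (12 successes in 40 trials), the function names, and the grid search are all hypothetical illustrations. It also shows the behaviour stated elsewhere in these notes: the 100p% likelihood interval {theta : R(theta) >= p} gets narrower as p increases.

```python
def relative_likelihood(theta, y, n):
    """R(theta) = L(theta) / L(theta_hat) for y successes in n binomial trials."""
    theta_hat = y / n  # the MLE for a binomial proportion

    def lik(t):
        # The binomial coefficient cancels in the ratio, so it is omitted.
        return t ** y * (1 - t) ** (n - y)

    return lik(theta) / lik(theta_hat)

def likelihood_interval(p, y, n, steps=10_000):
    """Approximate {theta : R(theta) >= p} by scanning a grid on (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    inside = [t for t in grid if relative_likelihood(t, y, n) >= p]
    return min(inside), max(inside)

# Hypothetical data, not from the notes: 12 successes in 40 trials.
y, n = 12, 40
interval_10 = likelihood_interval(0.10, y, n)   # 10% likelihood interval
interval_50 = likelihood_interval(0.50, y, n)   # 50% likelihood interval
print(interval_10, interval_50)
```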

A. 67 mph [(18x72)+(30x64)]/48 = 67 (48 = total number of people ticketed)

Last weekend police ticketed 18 men whose mean speed was 72 miles per hour, and 30 women going an average of 64 miles per hour. Overall, what was the mean speed of all the people ticketed? A. 67 mph B. 68 mph C. It cannot be determined D. None of those E. 69 mph

A brewery claims that the mean amount of beer in their bottles is at least 12 ounces. Determine whether the hypothesis test for this claim is left-tailed, right-tailed, or two-tailed. Sketch the distribution of the test statistic and label the P-value region.

Left-tailed (see 7.1 in class practice)

C. Drawing one card from a standard deck, not replacing it, and then selecting another card

List an example of two events that are dependent A. Rolling a die twice B. Selecting a ball numbered 1 through 12 from a bin, replacing it, and then selecting a second numbered ball from the bin C. Drawing one card from a standard deck, not replacing it, and then selecting another card D. Tossing a coin and getting a head, and then rolling a 6-sided die and obtaining a 6

B. Rolling a die twice

List an example of two events that are independent A. Not putting money in a parking meter and getting a parking ticket B. Rolling a die twice C. A father having hazel eyes and a daughter having hazel eyes D. Selecting a queen from a standard deck, not replacing it, and then selecting a queen from the deck

If the graph is symmetric...

Mean = median

Which 2 measures should be used together when the data have NO outliers? 1) Mean and SD 2) Mean and IQR 3) IQR and SD 4) Median and IQR

Mean and SD

a + b*xbar

Mean of Y = a + bX with E(X) = xbar

Range

Measure of dispersion calculated as the maximum - minimum value.

Sample correlation coefficient

Measures both the strength and direction of the linear relationship between two random variables.

Example: a dietician obtains the amounts of sugar (in centigrams) from 100 centigrams in each of 10 different cereals - 3 24 30 47 43 47 23 44 24 39 Which is the best measure of center? Why?

Median because there's an outlier
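
A short sketch comparing the two measures of center on the cereal data. The second list is a hypothetical variation (the low value 3 replaced by 0) added only to show that the median does not depend on how extreme the outlier is, while the mean does.

```python
from statistics import mean, median

sugar = [3, 24, 30, 47, 43, 47, 23, 44, 24, 39]  # values from the card

sugar_mean = mean(sugar)
sugar_median = median(sugar)
print(sugar_mean, sugar_median)  # 32.4 34.5

# Hypothetical variation: make the low outlier even more extreme.
more_extreme = [0 if x == 3 else x for x in sugar]
print(mean(more_extreme), median(more_extreme))  # 32.1 34.5
```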

What are the 5 things included in a boxplot?

Min Q1 M Q3 Max

To model a normal problem with unknown mean and known variance, we use ____________________ as the pivotal quantity.

N(0,1)

Relative Risk

P(A|B)/P(A|B') for categorical variables A, B.

Quartiles

Q1, Q2, Q3

Determine whether the data are qualitative or quantitative and determine the level of measurement of the data set (i.e. nominal, ordinal, interval, or ratio). The top five music albums for 2012 are listed: 1. Adele "21" 2. Michael Buble "Christmas" 3. Drake "Take Care" 4. Taylor Swift "Red" 5. One Direction "Up All Night"

Qualitative; ordinal, because the data can be ranked (see quiz 1)

Ex 1: The table shows sports-related head injuries treated in U.S. emergency rooms during a recent five-year span for several sports. Which data are qualitative data, and which are quantitative data? (Source: BMC Emergency Medicine)

Qualitative: name of sport Quantitative: number of head injuries treated

Quantitative data

Quantitative data consist of numbers that are measurements or counts. Ex: Age, Weight of a letter, Temperature

Best way to prevent treatment groups of individuals from differing from each other in ways other than the treatment assigned

Randomized assignment to treatment

The most reliable way to determine whether the explanatory variable is actually causing changes in the response variable

Randomized controlled double blind experiment

Discrete data

Refers to numerical data that can be counted

Continuous data

Refers to numerical data that can be measured

Observational study

Study where data collectors have no control over the conditions of the experiment

Experimental study

Study where data collectors have some control over the conditions of the experiment

Those people on whom no specific treatment was imposed

Subjects

The results of rolling a 6-sided die 1,000 times is an example of

Symmetric uniform distribution

What do you think is the shape of the distribution of the age at which a child takes its first steps?

Symmetric-Unimodal

Z/root(Chi-squared(n)/n)

T follows a T-distribution with n df if it is the independent ratio ________________________________.

To model a normal problem with n observations and unknown variance, we use ________________________ as the pivotal quantity.

T(n-1)

D. Breed

The SPCA collects the following data about the dogs they house. Which is categorical? A. Weight B. Number of days housed C. Veterinary costs D. Breed E. Age

E. Timeplot (Timeplots show how data has changed over time, and unlike bar charts, do not have categories)

The SPCA has kept these data records for the past 20 years. If they want to show the trend in the number of dogs they have housed, what kind of plot should they make? A. Bar graph B. Histogram C. Boxplot D. Pie chart E. Timeplot

A. preserves the individual data values

The advantage of making a stem-and-leaf display instead of a dotplot is that a stem-and-leaf display... A. preserves the individual data values B. satisfies the area principle C. A stem-and-leaf display is for quantitative data, while a dotplot shows categorical data D. none of these E. shows the shape of the distribution better than a dotplot

mean

The average of a set of numbers

D. An expected value of 0 means that the average money gained is equal to the average money spent, representing the break-even point

The expected value of an accountant's profit and loss analysis is 0. Explain what this means A. Since the expected value cannot be less than 0, an expected value of 0 means that the average money gained is equal to or less than the average money spent B. An expected value cannot be equal to 0 C. An expected value of 0 means that there was not any money gained or spent D. An expected value of 0 means that the average money gained is equal to the average money spent, representing the break-even point

Four Levels of Measurement

The four levels of measurement, in order from lowest to highest, are nominal, ordinal, interval, and ratio

D. Yes. The probability of the event is close to 0

The frequency distribution shows the number of voters (in millions) according to age. Consider the event below. Can it be unusual? A voter chosen at random is between 21 and 24 years old A. No. The probability of the event is not close to 0 B. No. The probability of the event is not close to 1 C. Yes. The probability of the event is close to 1 D. Yes. The probability of the event is close to 0

The MLE of g(theta) = g(thetahat)

The invariance property of the MLE says ___________________________________.

Min value >= Q1-1.5IQR

The lower whisker of a box and whisker plot ends at ___________________________.

B. The 50 monthly interest rates have a distribution which is skewed to the right. (Mean is larger than the median)

The mean and the median of a sample of 50 home mortgage monthly interest rates are 6.5 and 5.0 percent, respectively. Comment on the distribution of the 50 home mortgage interest rates. A. The 50 monthly interest rates have a distribution which is skewed to the left. B. The 50 monthly interest rates have a distribution which is skewed to the right. C. A few mortgages are very low, which makes the median smaller than the mean. D. Half of the mortgages are at rates greater than 6.5 percent. E. A few mortgages are at very low rates, pulling the mean up.

C. You should play the first game because the probability of winning a game with 1:10 odds of winning is 1/11, which is less than 1/10

The probability of winning an instant prize game is 1/10. The odds of winning a different instant prize game are 1:10. If you want the best chance of winning, which game should you play? A. You should play the first game because the probability of winning a game with 1:10 odds of winning is 1/9, which is greater than 1/10 B. You should play the second game because the probability of winning a game with 1:10 odds of winning is 1/9, which is greater than 1/10 C. You should play the first game because the probability of winning a game with 1:10 odds of winning is 1/11, which is less than 1/10 D. You should play the second game because the probability of winning a game with 1:10 odds of winning is 1/11, which is less than 1/10
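
The key step in this card is converting odds of winning a:b into the probability a / (a + b). A small sketch with exact fractions; the helper name `odds_to_probability` is an illustrative choice.

```python
from fractions import Fraction

def odds_to_probability(wins, losses):
    """Convert odds of winning 'wins:losses' into P(win) = wins / (wins + losses)."""
    return Fraction(wins, wins + losses)

p_first = Fraction(1, 10)               # first game: probability given directly
p_second = odds_to_probability(1, 10)   # second game: odds of winning are 1:10

print(p_second)            # 1/11
print(p_first > p_second)  # True
```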

Variate

The property of the unit of interest in the study

C. Skewed to the right (Most of the data would be focused in the lower range, with few in the upper, making this skewed to the right)

The salaries of MLB players range from several hundred thousand dollars per year to very few earning in the millions. Suppose a histogram is made of all last year's salaries of major league baseball players. Which shape would best describe the shape of this histogram? A. Skewed to the left B. Bell shaped C. Skewed to the right D. Bimodal

Solve the problem. The mean age of bus drivers in Chicago is 56.9 years. If a hypothesis test is performed, how should you interpret a decision that fails to reject the null hypothesis? There is not sufficient evidence to reject the claim μ = 56.9. There is sufficient evidence to support the claim μ = 56.9. There is not sufficient evidence to support the claim μ = 56.9. There is sufficient evidence to reject the claim μ = 56.9.

There is not sufficient evidence to reject the claim μ = 56.9.

Solve the problem. The mean age of bus drivers in Chicago is greater than 56.2 years. If a hypothesis test is performed, how should you interpret a decision that fails to reject the null hypothesis? There is sufficient evidence to support the claim μ > 56.2. There is not sufficient evidence to reject the claim μ > 56.2. There is sufficient evidence to reject the claim μ > 56.2. There is not sufficient evidence to support the claim μ > 56.2.

There is not sufficient evidence to support the claim μ > 56.2.

Solve the problem. The mean score for all NBA games during a particular season was less than 91 points per game. If a hypothesis test is performed, how should you interpret a decision that fails to reject the null hypothesis? There is sufficient evidence to reject the claim μ < 91. There is sufficient evidence to support the claim μ < 91. There is not sufficient evidence to reject the claim μ < 91. There is not sufficient evidence to support the claim μ < 91

There is not sufficient evidence to support the claim μ < 91

Solve the problem. The mean IQ of statistics teachers is greater than 130. If a hypothesis test is performed, how should you interpret a decision that fails to reject the null hypothesis? There is sufficient evidence to reject the claim μ > 130. There is not sufficient evidence to support the claim μ > 130. There is not sufficient evidence to reject the claim μ > 130. There is sufficient evidence to support the claim μ > 130.

There is not sufficient evidence to support the claim μ > 130.

Solve the problem. The mean age of bus drivers in Chicago is 47.4 years. If a hypothesis test is performed, how should you interpret a decision that rejects the null hypothesis? There is sufficient evidence to support the claim μ = 47.4. There is not sufficient evidence to reject the claim μ = 47.4. There is not sufficient evidence to support the claim μ = 47.4. There is sufficient evidence to reject the claim μ = 47.4.

There is sufficient evidence to reject the claim μ = 47.4.

The mean score for all NBA games during a particular season was less than 104 points per game. If a hypothesis test is performed, how should you interpret a decision that rejects the null hypothesis?

There is sufficient evidence to support the claim µ<104 (see 7.1 in class practice)

Solve the problem. The mean IQ of statistics teachers is greater than 160. If a hypothesis test is performed, how should you interpret a decision that rejects the null hypothesis? There is not sufficient evidence to support the claim μ > 160. There is sufficient evidence to support the claim μ > 160. There is not sufficient evidence to reject the claim μ > 160. There is sufficient evidence to reject the claim μ > 160.

There is sufficient evidence to support the claim μ > 160.

Use the 4 conditions of a Binomial Experiment to decide if the following situation represents a Binomial Experiment. If it satisfies all conditions, state the values of p, q, n, and all possible values of the random variable. A survey found that 49% of US households own a dedicated gaming console. Eight US households are randomly selected. The random variable represents the number of US households that own a dedicated console.

This is a binomial experiment because: 1. There is a set number of trials; 2. There is a possibility of success or failure; 3. The outcomes are independent; 4. Random variable x = the number of successes. n = 8, p = 0.49, q = 0.51. Possible x values: 0, 1, 2, 3, 4, 5, 6, 7, 8 (see quiz 5)

The log likelihood function

To take derivatives and find the MLE of theta, we typically first find __________________________________.

A researcher claims that 71% of voters favor gun control. Determine whether the hypothesis test for the claim is left-tailed, right-tailed, or two tailed. Sketch the distribution of the test statistic and label the P-value region.

Two-tailed (see 7.1 in class practice)

The mean age of bus drivers in Chicago is 51.3 years. Identify the type I and type II errors for the hypothesis test of this claim.

Type I: rejecting H0: µ=51.3 when µ=51.3 Type II: failing to reject H0: µ=51.3 when µ≠51.3 (see 7.1 in class practice)

The mean IQ of statistics teachers is greater than 130. Identify the type I and type II errors for the hypothesis test of this claim.

Type I: rejecting H0: µ≤130 when µ≤130 Type II: failing to reject H0: µ≤130 when µ>130 (see 7.1 in class practice)

Solve the problem. For a sample of 20 IQ scores the mean score is 105.8. The standard deviation, σ, is 15. Determine whether a normal distribution or a t-distribution should be used or whether neither of these can be used to construct a confidence interval. Use a t-distribution Use a normal distribution Neither a normal distribution nor a t-distribution can be used.

Use a normal distribution.

D. It would be unusual because the probability of having no HD televisions is less than 0.05

Using the data given below, determine whether it would be unusual for a household to have no HD televisions A. It would not be unusual because 52 people have no HD televisions in the town B. It would not be unusual because the probability of having no HD televisions is more than 0.05 C. It would be unusual because 52 people have no HD televisions in the town D. It would be unusual because the probability of having no HD televisions is less than 0.05

A fast food outlet chain claims that the mean waiting time in line is less than 3.8 minutes. A random sample of 60 customers has a mean of 3.7 minutes with a population standard deviation of 0.6 minutes. If alpha = 0.05, use the P-value to test the fast food outlet's claim.

We are testing µ and the population standard deviation is known. The sample size is n = 60 ≥ 30, so use a z-test. H0: µ ≥ 3.8; Ha: µ < 3.8 (claim), a left-tailed test with alpha = 0.05. z = -1.29. P-value = normalcdf(-1×10^99, -1.29, 0, 1) = 0.0985. Since 0.0985 > 0.05, fail to reject H0. There is not enough evidence to support the fast food outlet's claim that the mean waiting time is less than 3.8 minutes. (see 7.2 in class practice)
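
The same left-tailed z-test can be sketched with the standard library in place of the calculator's normalcdf. The variable names are illustrative; the numbers are those given in the problem.

```python
import math
from statistics import NormalDist

mu_0 = 3.8      # boundary value in H0: mu >= 3.8
x_bar = 3.7     # sample mean
sigma = 0.6     # population standard deviation (known)
n = 60
alpha = 0.05

z = (x_bar - mu_0) / (sigma / math.sqrt(n))   # standardized test statistic
p_value = NormalDist().cdf(z)                 # left-tailed: area to the left of z

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

Because z is not rounded to -1.29 before the lookup, the P-value may differ from 0.0985 in the fourth decimal place; the decision is the same.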

B. The probability of each value of the discrete random variable is between 0 and 1, and the sum of all the probabilities is 1

What are two conditions that determine a probability distribution A. The probability of each value of the discrete random variable is greater than 0 and less than 1, and the sum of all the probabilities can be any amount B. The probability of each value of the discrete random variable is between 0 and 1, and the sum of all the probabilities is 1 C. The probability of each value of the discrete random variable is between 0 and 1, and the sum of all the probabilities can be any amount D. The probability of each value of the discrete random variable is greater than 0 and less than 1, and the sum of all the probabilities is 1

C. The probability of event B occurring, given that event A has occurred

What does the notation P(B|A) mean A. The probability of both event A and event B occurring B. The probability of event A occurring, given that event B has occurred C. The probability of event B occurring, given that event A has occurred D. The probability of event B occurring, divided by the probability of event A occurring

C. A discrete probability distribution lists each possible value a random variable can assume, together with its probability

What is a discrete probability distribution A. A discrete probability distribution lists each possible value a random variable can assume B. A discrete probability distribution exclusively lists probabilities C. A discrete probability distribution lists each possible value a random variable can assume, together with its probability D. None of the above

A. The outcome of a probability experiment is often a count or a measure. When this occurs, the outcome is called a random variable

What is a random variable A. The outcome of a probability experiment is often a count or a measure. When this occurs, the outcome is called a random variable B. A variable is random when it has a finite or countable number of possible outcomes that can be listed C. A variable is random when it has an uncountable number of possible outcomes D. The outcome of a probability experiment is often a category or label. When this occurs, the outcome is called a random variable

An outcome is the result of a single probability experiment. An event is a set of one or more possible outcomes.

What is the difference between an outcome and an event? A. An outcome is the result of a single probability experiment. An event is the set of all possible outcomes B. An event is the result of a single probability experiment. An outcome is the set of all possible events C. An outcome is the result of a single probability experiment. An event is a set of one or more possible outcomes. D. An event is the result of a single probability experiment. An outcome is a set of one or more possible events

C. Two events are independent when the occurrence of one event does not affect the probability of the occurrence of the other event. Two events are dependent when the occurrence of one event affects the probability of the occurrence of the other event

What is the difference between independent and dependent events? A. Two events are independent when the occurrence of one event affects the probability of the occurrence of the other event. Two events are dependent when the occurrence of one event does not affect the probability of the occurrence of the other event B. Two events are independent if only one of the two events can occur. Two events are dependent if they can occur at the same time C. Two events are independent when the occurrence of one event does not affect the probability of the occurrence of the other event. Two events are dependent when the occurrence of one event affects the probability of the occurrence of the other event D. Two events are independent if they occur at the same time. Two events are dependent if only one of the two events can occur.

frequency polygon

a line graph that emphasizes the continuous change in frequencies. x-axis = midpoints, y-axis = frequencies. *The graph must extend to and end at zero on both sides.

Randomization

a process of randomly assigning subjects to different treatment groups

In a sample of 2,016 US adults, 383 said Franklin Roosevelt was the best president since WWII. Two US adults are selected at random without replacement. a. Find the probability that both adults think Roosevelt was the best president. b. Find the probability that neither adult thinks Roosevelt was the best president. c. Find the probability that at least one of the adults thinks Roosevelt was the best president.

a. 0.036 b. 0.656 c. 0.344 (see quiz 4)
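A minimal sketch for checking this card in Python, assuming nothing beyond the counts given in the question. Sampling without replacement multiplies conditional probabilities, and "at least one" is the complement of "neither". The function name `two_draw_probabilities` is illustrative only.

```python
from fractions import Fraction

def two_draw_probabilities(total, favorable):
    """Probabilities for two draws without replacement from a finite group."""
    other = total - favorable
    # Second factor uses one fewer person in both numerator and denominator.
    both = Fraction(favorable, total) * Fraction(favorable - 1, total - 1)
    neither = Fraction(other, total) * Fraction(other - 1, total - 1)
    at_least_one = 1 - neither  # complement rule
    return both, neither, at_least_one

both, neither, at_least_one = two_draw_probabilities(2016, 383)
print(round(float(both), 3), round(float(neither), 3), round(float(at_least_one), 3))
```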

A bag of marbles contains 7 red marbles, 5 blue marbles, and 9 green marbles. Two marbles are selected from the bag without replacement. a. What is the probability of getting a red and a blue? b. What is the probability of getting two reds? c. What is the probability of getting no greens? d. What is the probability of getting at least 1 green?

a. 0.083 (red first, then blue; 0.167 if either order counts) b. 0.1 c. 0.314 d. 0.686 (test 2 #16)

The table (quiz 4 #4) shows data for students at University of Oklahoma Health Science Center. a. Find the probability that a randomly selected student is male, given that the student is a nursing major. b. Find the probability that the student is male or a nursing major.

a. 0.115 b. 0.533

The distribution of cholesterol levels in teenage boys is approximately normal with a mean of 170 and a standard deviation of 30. a. Find the probability that a teenage boy has a cholesterol level less than 144. b. Find the probability that a teenage boy has a cholesterol level greater than 240.5. c. In a sample of 30 teenage boys, how many would you expect to have a cholesterol level less than 144?

a. 0.1922 b. 0.0094 c. 5.7660 (see test 3 #12)

Jaylen Brown currently has an abysmal free throw percentage of 59%. Assume that making or missing a free throw has no effect on his next free throw. Suppose he takes 17 free throws in his next game. a. Find the probability that he makes exactly 10 free throws. b. Find the probability that he makes at least 14 free throws. c. Find the probability that he makes fewer than 14 free throws.

a. 0.1936 b. 0.039 c. 0.961 (see test 3 #5)
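The binomial calculation behind this card can be sketched as follows, assuming independent shots with a fixed success probability of 0.59. The helper `binom_pmf` is a hypothetical name. Parts b and c are complements, so they must sum to 1.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 17, 0.59
exactly_10 = binom_pmf(10, n, p)
at_least_14 = sum(binom_pmf(k, n, p) for k in range(14, n + 1))
fewer_than_14 = 1 - at_least_14  # complement of "at least 14"

print(round(exactly_10, 4), round(at_least_14, 3), round(fewer_than_14, 3))
```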

Use the table (test 2 #14) to find each probability. a. Probability of selecting a junior. b. Probability of not selecting a junior. c. probability of selecting a student that plays sports, given that the student is a senior. d. Probability that the student is a freshman and plays sports. e. Probability that the student is a freshman or plays sports.

a. 0.239 b. 0.761 c. 0.272 d. 0.107 e. 0.531

Sixty percent of US adults trust national newspapers to present the news fairly and accurately. You randomly select 9 US adults. Let x equal the number of adults from the 9 selected that think newspapers present the news fairly. a. Find P(x=5), the probability that exactly 5 of the 9 adults thought the news was presented fairly. b. Find the probability that at least 6 of the 9 adults thought the news was presented fairly.

a. 0.2508 b. 0.4826 (see quiz 5)

The table below shows the results of an experiment relating the height from which a ball is dropped, x, to the height of its first bounce, y. Drop height (x): 100 90 80 70 60 Bounce height (y): 26 23 21 18 16 a. Calculate the correlation coefficient and interpret your results. b. Give the equation for the line of regression. c. Make a prediction for the bounce height if the ball is dropped from a height of 50 cm or explain why it is not meaningful to do so.

a. 0.998 Strong positive correlation b. y = 0.25x + 0.8 c. Not meaningful because this is outside the original range of data.
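As a rough check on parts a and b, the least-squares slope, intercept, and correlation coefficient can be computed from sums of squared deviations. This is a sketch using only the five data pairs in the card; `fit_line` is an illustrative name, not a calculator or course function.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

drop = [100, 90, 80, 70, 60]
bounce = [26, 23, 21, 18, 16]
slope, intercept, r = fit_line(drop, bounce)
print(round(slope, 2), round(intercept, 2), round(r, 3))

# Part c: 50 lies outside the observed x-range, so the line should not be used there.
can_predict_50 = min(drop) <= 50 <= max(drop)
```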

State if the random variable is continuous or discrete. a. The number of red Sour Patch Kids in a bag of candy. b. The vertical height of an Olympic pole jumper.

a. Discrete b. Continuous (see test 3 #1-2)

In a survey of US men, the heights in the 20-29 age group were normally distributed with a mean of 69.4 inches and a standard deviation of 2.9 inches. Use your calculator to find the probability that a randomly selected study participant has a height that is a. less than 66 inches b. between 66 and 72 inches c. more than 72 inches

a. normalcdf(-1x10^99, 66, 69.4, 2.9) = 0.1205 b. normalcdf(66, 72, 69.4, 2.9) = 0.6945 c. normalcdf(72, 1x10^99, 69.4, 2.9) = 0.1850 (see quiz 6)
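For readers without a graphing calculator, the same areas can be sketched with Python's standard-library `statistics.NormalDist`, whose `cdf` method plays the role of `normalcdf` with a lower bound of negative infinity. The variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=69.4, sigma=2.9)

below_66 = heights.cdf(66)                          # P(x < 66)
between_66_72 = heights.cdf(72) - heights.cdf(66)   # P(66 < x < 72)
above_72 = 1 - heights.cdf(72)                      # P(x > 72)

print(round(below_66, 4), round(between_66_72, 4), round(above_72, 4))
```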

A probability experiment consists of rolling a 6-sided die and spinning a spinner that has four colors; red, blue, green, and yellow. You are equally likely to land on each color. a. create a tree diagram to describe the sample space. b. What is the probability of rolling a number less than 6 on the die and having the spinner land on yellow?

a. see quiz 4 #2 b. 0.208

The caloric contents and sodium contents of 10 hotdogs are listed in the data below. Calories, x: 150 170 120 120 90 180 170 140 90 110 Sodium, y: 420 470 350 360 270 550 530 460 380 330 a. Give the equation for the regression line. Round values to the nearest hundredth. b. Use the regression line to make a prediction for the sodium content of a hotdog with 140 calories or explain why it is not meaningful to make such a prediction. c. Use the regression line to make a prediction for the sodium content of a hotdog with 210 calories or explain why it is not meaningful to make such a prediction.

a. y = 2.47x + 80.81 b. 426.61 mg c. This is not meaningful because 210 is outside of the original data values. (see quiz 3)

SAT writing scores are normally distributed with mean = 448 and standard deviation = 114. a. Use the z-table to determine what percentage of scores were greater than 500. b. If 1000 SAT writing scores are randomly selected, how many of the scores would be greater than 500?

a. z ≈ 0.46, so P(x > 500) ≈ 0.3228 (about 32.28%) b. About 323 (see quiz 6)

Find the critical value and rejection region for the type of z-test with level of significance alpha. a. Right-tailed test, alpha = 0.01 b. Left-tailed test, alpha = 0.05 c. Two-tailed test, alpha = 0.01

a. z0 = 2.33 Rejection region: z > 2.33 b. z0 = -1.645 Rejection region: z < -1.645 c. -z0 = -2.575, z0 = 2.575 Rejection region: z < -2.575, z > 2.575 (see 7.2 in class practice)

Represents the distribution of a quantitative variable by visually displaying the 5 number summary and any observations that were classified as a suspected outlier using the 1.5 IQR criterion

boxplot

Solve the problem. Compare the scores: a score of 220 on a test with a mean of 200 and a standard deviation of 21 and a score of 90 on a test with a mean of 80 and a standard deviation of 8. a) A score of 220 with a mean of 200 and a standard deviation of 21 is better. b) The two scores are statistically the same. c) A score of 90 with a mean of 80 and a standard deviation of 8 is better. d) You cannot determine which score is better from the given information.

c) A score of 90 with a mean of 80 and a standard deviation of 8 is better.

Interval Data

can be ordered and differences have meaning ex. temperature scales

Ordinal Data

can be put into order ex. top 10 cities in the US or letter grades

When we are given a histogram of the data (w/o actual data) we _________ (can/cannot) determine the ___________ (mean/median/mode), but only determine what could be a possible value for the ___________ and what values of the __________.

cannot, median, median, median

Solve the problem. Classify the statement as an example of classical probability, empirical probability, or subjective probability. In California's Pick Three lottery, a person selects a 3-digit number. The probability of winning California's Pick Three lottery is 1/1000. empirical probability classical probability subjective probability

classical probability

Identify the sampling technique used. A researcher for an airline interviews all of the passengers on five randomly selected flights.

cluster

Identify the sampling technique used. A researcher randomly selected 25 of the nation's middle schools and interviewed all of the teachers at each school.

cluster

Solve the problem. A researcher for an airline interviews all of the passengers on five randomly selected flights. What sampling technique is used? systematic random convenience stratified cluster

cluster

Solve the problem. At a local community college, five statistics classes are randomly selected and all of the students from each class are interviewed. What sampling technique is used?

cluster

Qualitative Data

consists of attributes, labels, or nonnumerical entries (quality)

Census

consists of data from an entire population. But, unless a population is small, it is usually impractical to obtain all the population data. In most studies, information must be obtained from a random sample.

Data

consists of information coming from observations, counts, measurements, or responses

Census

count or measure of an entire population

Sampling

count or measure part of the population

When calculating the median, what's the 2nd step

counting the number of observations

two major branches of stat

descriptive inferential

From past figures, it is predicted that 19% of the registered voters in California will vote in the June primary.

inferential statistics (a prediction about the population is drawn from past data)

Random sample

every member of the population has an equal chance of being selected.

Simple Random Sample

every possible sample of the same size has the same chance of being selected from the population

Decide which method of data collection you would use to collect data for the study. Specify either observational study, experiment, simulation, or survey. A study where a drug was given to 23 patients and a placebo to another group of 23 patients to determine if the drug has an effect on a patient's illness

experiment

Recruit participants; while they're being interviewed, 1/2 sit in a waiting room w/ snacks and TV on. Other 1/2 sit in waiting room with just snacks. Researchers determine whether ppl consume more snacks in the TV setting. This is an example of an

experiment

researchers interfere and they assign the values of the explanatory variables to the individuals

experiment

The independent variable; the variable that claims to explain, predict, or affect the response

explanatory variable

Solve the problem. Given H0: μ ≥ 18 and P = 0.085. Do you reject or fail to reject H0 at the 0.05 level of significance? reject H0 fail to reject H0 not sufficient information to decide

fail to reject H0

Solve the problem. Given H0: μ = 25, Ha: μ ≠ 25, and P = 0.028. Do you reject or fail to reject H0 at the 0.01 level of significance? reject H0 fail to reject H0 not sufficient information to decide

fail to reject H0

T/F Association implies causation

false

T/F every relationship between 2 quantitative variables has a linear form

false

a graph's general shape

form

We want to explore if the score on a test is affected by the test-taker's gender. Which is the: - explanatory variable - response variable

gender - explanatory test score - response

Inferential Statistics

generalize the information learned from the sample to the entire population

blocks

groups of subjects with similar characteristics. A commonly used experimental design is a randomized block design. The experimenter divides the subjects with similar characteristics into blocks, and then, within each block, randomly assigns subjects to treatment groups. Ex: An experimenter who is testing the effects of a new weight loss drink may first divide the subjects into age categories, and then, within each age group, randomly assign subjects to either the treatment group or the control group

Match the statement with the appropriate letter. a. Nominal b. Ordinal c. Interval d. Ratio e. Observational f. Experiment g. Simulation h. Survey i. Control j. Placebo k. Double Blind l. Selection Bias m. Response Bias i. The level of measurement for qualitative data. ii. A fake treatment given to the control in an experiment. iii. The level of measurement for data where the data can be ordered, but the differences between data has no meaning. iv. A model used to reproduce conditions that would be difficult to observe otherwise. v. Issues that occur as a result of errors in the measurement of data. vi. The level of measurement where difference between data values has meaning, but zero does not mean the absence of value.

i. a) Nominal ii. j) Placebo iii. b) Ordinal iv. g) Simulation v. m) Response Bias vi. c) Interval (see test 1 #5-10)

Solve the problem. Classify the events as dependent or independent. The events of getting two aces when two cards are drawn from a deck of playing cards and the first card is replaced before the second card is drawn. independent dependent

independent

A recent test in a statistics class had a mean score of 78 and a standard deviation of 8.5. Use your calculator to find the score that is needed to be in the top 15%.

invnorm(.85, 78, 8.5) = 86.8097 (see test 3 #13)
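The calculator's `invnorm` corresponds to the inverse CDF of the normal distribution. A small sketch with the standard library, assuming test scores are normally distributed as the card implies: the top 15% starts at the 85th percentile.

```python
from statistics import NormalDist

scores = NormalDist(mu=78, sigma=8.5)

# Top 15% means 85% of the area lies below the cutoff.
cutoff = scores.inv_cdf(0.85)
print(round(cutoff, 4))
```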

Sampling

is a count or measure of part of a population and is more commonly used in statistical studies. To collect unbiased data, a researcher must ensure that the sample is representative of the population. Appropriate sampling techniques must be used to ensure the inferences about the population are valid.

simple random sample

is a sample in which every possible sample of the same size has the same chance of being selected. -Random numbers can be generated by random number table, a software program, or calculator. Assign a number to each member of the population. -Members of the population that correspond to those numbers become members of the sample

among all the lines that look good on your data, choose the one that has the smallest sum of squared vertical deviations

least squares criterion

Classical Probability

long term expected results

For the dot plot below, what is the maximum and what is the minimum entry? max: 14; min: 12 max: 54; min: 12 max: 54; min: 15 max: 17; min: 12

max: 17; min: 12

In a histogram, CENTER refers to the...

median

Parameters

numerical summaries of a population

Observations that fall outside the overall pattern

outliers

correlation is heavily influenced by ________

outliers

Solve the problem. What method of data collection would you use to collect data for a study where a drug was given to 57 patients and a placebo to another group of 57 patients to determine if the drug has an effect on a patient's illness? use sampling use a simulation take a census perform an experiment

perform an experiment

Two types of statistical data sets studied in STAT 231

population sample

Subjective Probability

probability based in intuition, educated guesses, and estimates made by someone who is knowledgeable in the field.

any plan that relies on random selection

probability sampling plan

1. it's a measure of spread (the mean is a measure of center) 2. the only way it's equal to 0 is if all the observations have the same value 3. it's strongly influenced by outliers

properties of standard deviation

the values of the variables of interest are recorded forward in time

prospective

2 types of observational studies

prospective retrospective

Nominal Level of Measurement

qualitative ONLY (no math)

Solve the problem. Classify the number of seats in a movie theater as qualitative data or quantitative data. qualitative data quantitative data

quantitative data

Calculate the correlation coefficient and describe the type of correlation (strong/weak, positive/negative) for the data below. Interpret what the correlation coefficient tells you. Earnings per share, x: 2.79 5.10 4.53 3.06 3.70 2.20 Dividends per share, y: 0.52 2.40 1.46 0.88 1.04 0.22

r ≈ 0.967 Strong positive correlation (see quiz 3)

Identify the sampling technique used. A lobbyist for a major airspace firm assigns a number to each legislator and then uses a computer to randomly generate ten numbers. The lobbyist contacts the legislators corresponding to these numbers.

random

Solve the problem. A lobbyist for a major airspace firm assigns a number to each legislator and then uses a computer to randomly generate ten numbers. The lobbyist contacts the legislators corresponding to these numbers. What sampling technique was used?

random

the technique that specifies the dependence of the response

regression

Replication

repetition of an experiment on more than one subject in order to measure variation within and between groups

the values of the variables are recorded backward in time

retrospective

Solve the problem. A car maker claims that its new sub-compact car gets better than 52 miles per gallon on the highway. Determine whether the hypothesis test for this is left-tailed, right-tailed, or two-tailed. left-tailed two-tailed right-tailed

right-tailed

inferential statistics

the branch of statistics that involves using sample data to draw conclusions about ("infer" something about) a population. A basic tool in the study of inferential statistics is probability

Use a stem and leaf plot to display the data. The data represent the scores of a biology class on a midterm exam. 75, 85, 90, 80, 87, 67, 82, 88, 95, 91, 73, 80, 83, 92, 94, 68, 75, 91, 79, 95, 87, 76, 91, 85

see quiz 2 #3

Below are the NBA MVP winners from the last 10 years. Make a Pareto Chart for the data. 2016-17 Russell Westbrook 2015-16 Stephen Curry 2014-15 Stephen Curry 2013-14 Kevin Durant 2012-13 LeBron James 2011-12 LeBron James 2010-11 Derrick Rose 2009-10 LeBron James 2008-09 LeBron James 2007-08 Kobe Bryant

see test 1 #25

Sample Space

set of all outcomes for a probability experiment

if people are sampled completely at random; doesn't mean it's equally representative

simple random sample

Scenario: Kidney Stones - Treatment A and B - 700 subjects with kidney stones participated - Found that success rate of treatment A was higher than B - 2 groups of patients: (1) those with large stones (2) those with small stones. -Results: treatment B was more effective. This study was an example of ___________ ___________ The _______ _______ is the ________ variable

Simpson's paradox, kidney stone size, lurking

Here are the number of hours that nine students spend on the computer on a typical day: 1 6 7 5 5 8 11 12 15 --------------------- The median # of hours is 7...why?

since n = 9, the median is the (9+1)/2 = 5th observation in the ordered list (1 5 5 6 7 8 11 12 15), which is 7.

Age of death from natural causes (heart disease, cancer, etc) is an example of

skewed left distribution

Prices of 1,000 California homes is an example of

skewed right distribution

a measure of spread; gives the average distance between a data point and the mean

standard deviation definition

collecting data

study design

Stratified Sample

subdivide the population into at least two different subgroups so that subjects within the same subgroup share similar characteristics, then select a random sample from each subgroup.

Cluster Sampling

subdivide the population into sections, then randomly select entire clusters and choose all the members from those selected clusters.

Inferential statistics

the branch of statistics that involves using a sample to draw conclusions about a population

Scenario: A psychologist selects a sample of kids ages 6-13 and measures each child's shoe size and vocabulary. What's the lurking variable?

the child's age

Population

the collection of all outcomes, responses, measurements, or counts that are of interest (Ex: everyone in a classroom)

Population

the complete collection of all elements, objects, individuals, or events to be studied

Q1

the median from the beginning to Q2

Independent Events

the occurrence of one event does NOT affect the occurrence of another event

Range of Probabilities Rule

the probability of an event E is between 0 and 1, inclusive: 0 ≤ P(E) ≤ 1

Statistics

the science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data.

midpoint

the sum of the lower and upper limits of the class divided by two

median

the value that lies in the middle

Solve the problem. A researcher claims that 73% of voters favor gun control. Determine whether the hypothesis test for this claim is left-tailed, right-tailed, or two-tailed. right-tailed left-tailed two-tailed

two-tailed

a sample that produces data that represents the population

unbiased sample

Solve the problem. What method of data collection would you use to collect data for a study where a political pollster wishes to determine if his candidate is leading in the polls?

use sampling

if r is close to zero it is a...

weak linear relationship

Solve the problem. Given a sample with r = -0.765, n = 22, and α = 0.02, determine the critical values t0 necessary to test the claim ρ = 0. ± 2.080 ± 2.831 ± 2.528 ± 1.721

± 2.528

Solve the problem. Given a sample with r = 0.321, n = 30, and α = 0.10, determine the critical values t0 necessary to test the claim ρ = 0. ± 1.311 ± 0.683 ± 1.701 ± 2.462

±1.701

Solve the problem. Find the area under the standard normal curve between z = 0 and z = 3. 0.4641 0.9987 0.0010 0.4987

0.4987

Solve the problem. The distribution of Master's degrees conferred by a university is listed in the table. (assume that a student majors in only one subject) Major Frequency Mathematics 216 English 207 Engineering 85 Business 175 Education 215 What is the probability that a randomly selected student with a Master's degree majored in Business, Education or Engineering? Round your answer to three decimal places. 0.334 0.290 0.529 0.471

0.529

Solve the problem. The distribution of Master's degrees conferred by a university is listed in the table. (assume that a student majors in only one subject) major Frequency mathematics 216 English 207 Engineering 79 Business 179 Education 226 What is the probability that a randomly selected student with a Master's degree majored in Business, Education or Engineering? Round your answer to three decimal places. 0.532 0.282 0.468 0.337

0.532

Solve the problem. A group of students were asked if they carry a credit card. The responses are listed in the table. class has cc no cc Freshman 13 47 Sophomore 22 18 If a student is selected at random, find the probability that he or she owns a credit card given that the student is a sophomore. Round your answer to three decimal places. 0.629 0.450 0.550 0.220

0.550

Solve the problem. A community college student interviews everyone in a statistics class to determine who owns a car. What sampling technique is used?

convenience

where individuals happen to be at the right place at the right time to suit the schedule of the researcher; biased

convenience sample

Solve the problem. Classify the events as dependent or independent. Event A: A red candy is selected from a package with 30 colored candies and eaten. Event B: A blue candy is selected from the same package and eaten. dependent independent

dependent

Solve the problem. Classify the events as dependent or independent. Events A and B where P(A) = 0.8, P(B) = 0.1, and P(A and B) = 0.07 dependent independent

dependent

What are the 3 study designs

Observational Experiment Sample survey

What are the two types of charts used for a Categorical variable?

Pie chart bar chart

Upper quartile

Data value below which 75% of the data lies

Percentile

Data values which divide data into 100 parts; p% of the data lies below the percentile q_p.

Example: recruit participants - ask them to recall, for each hour of the previous day, whether they were watching TV, and what snacks they consumed each hour. Determine if food consumption was higher during the TV times

Retrospective study

Response variable

Dependent variable in a causal problem.

Sampling error

Difference between the sample and study population

Relative frequency equation

Frequency/sum of all frequencies x 100

What are the 3 numerical measures?

Mean Median Mode

Blind study

Test patients are unaware of certain conditions in the experiment

A. 2000

The IQR for the following data is approximately... Minimum: 2000 Q1: 3500 Median: 4500 Q3: 5500 Maximum: 8000 A. 2000 B. 4000 C. 6000 D. 7000 E. None of these

Discrete Data

data that can be counted (0, 1, 2, 3). never has values that are decimals or fractions

example: -0.74 what's the correlation?

moderately strong, negative, and linear

When calculating the median, whats the first step

putting the observations in order

Solve the problem. Calculate the correlation coefficient, r, for the data below. x: -10 -8 -1 -4 -6 -7 -5 -3 -2 -9 y: 2 -3 -15 -10 -0.885 -0.778 -0.995 -0.671

-0.995

Solve the problem. Find the critical value for a left-tailed test with α = 0.025 and n = 50. -2.575 -1.96 -1.645 -2.33

-1.96

Solve the problem. Construct a 95% prediction interval for y given x = -3.5, ŷ = 2.097x - 0.552 and se = 0.976. x: -5 -3 4 1 -1 -2 0 2 3 -4 y: -10 -8 9 1 -2 -6 -1 3 6 -8 -3.187 < y < -2.154 -10.367 < y < -5.417 -4.598 < y < -1.986 -12.142 < y < -6.475

-10.367 < y < -5.417
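A sketch of how this prediction interval can be reproduced, using the standard margin-of-error formula E = t_c · s_e · sqrt(1 + 1/n + n(x0 − x̄)² / (nΣx² − (Σx)²)). The critical value `t_c = 2.306` is an assumption read from a t-table (95% confidence, d.f. = n − 2 = 8), since the Python standard library has no t-distribution.

```python
from math import sqrt

xs = [-5, -3, 4, 1, -1, -2, 0, 2, 3, -4]
x0 = -3.5       # x-value at which to predict
s_e = 0.976     # standard error of estimate, given in the card
t_c = 2.306     # assumed t-table value: 95% confidence, d.f. = 10 - 2 = 8

n = len(xs)
sum_x = sum(xs)
sum_x2 = sum(x * x for x in xs)
x_bar = sum_x / n

y_hat = 2.097 * x0 - 0.552  # point prediction from the given regression line
margin = t_c * s_e * sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2))
lower, upper = y_hat - margin, y_hat + margin

print(round(lower, 3), round(upper, 3))
```

Small differences in the third decimal place can come from rounding the slope, intercept, or critical value.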

You wish to test the claim that μ = 940 at a level of significance of α = 0.01 and are given sample statistics n = 35, x̄ = 910 and s = 82. Compute the value of the standardized test statistic. Round your answer to two decimal places. -3.82 -2.16 -5.18 -4.67

-2.16

Solve the problem. Given a sample with r = -0.541, n = 20, and α = 0.01, determine the standardized test statistic t necessary to test the claim ρ = 0. Round answers to three decimal places. -5.132 -3.251 -4.671 -2.729

-2.729
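The test statistic for a correlation coefficient is t = r / sqrt((1 − r²)/(n − 2)). A minimal sketch with the values from this card; the function name `correlation_t_statistic` is illustrative.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """Standardized test statistic for a correlation coefficient (d.f. = n - 2)."""
    return r / sqrt((1 - r ** 2) / (n - 2))

t = correlation_t_statistic(-0.541, 20)
print(round(t, 3))
```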

skewed left

-the tail is on the left side

Solve the problem. A coffee machine dispenses normally distributed amounts of coffee with a mean of 12 ounces and a standard deviation of 0.2 ounce. If a sample of 9 cups is selected, find the probability that the mean of the sample will be greater than 12.1 ounces. 0.9332 0.2123 0.0668 0.3216

0.0668
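This card relies on the sampling distribution of the mean: for samples of size n, the sample mean has standard deviation σ/√n. A sketch with the standard library, assuming the population is normal as stated in the question:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12, 0.2, 9

# Distribution of the sample mean: same center, standard error sigma / sqrt(n).
sample_mean = NormalDist(mu=mu, sigma=sigma / sqrt(n))
prob_above = 1 - sample_mean.cdf(12.1)

print(round(prob_above, 4))
```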

Solve the problem. The lengths of pregnancies of humans are normally distributed with a mean of 268 days and a standard deviation of 15 days. Find the probability of a pregnancy lasting less than 250 days. 0.1151 0.0066 0.1591 0.0606

0.1151

Of the cartons produced by a company, 5% have a puncture, 8% have a smashed corner, and 0.4% have both a puncture and a smashed corner. Find the probability that a randomly selected carton has a puncture or has a smashed corner.

0.126 (see quiz 4)

Solve the problem. A survey of 100 fatal accidents showed that 13 were alcohol related. Find a point estimate for p, the population proportion of accidents that were alcohol related. 0.87 0.149 0.13 0.115

0.13

Solve the problem. A survey of 2650 golfers showed that 392 of them are left-handed. Find a point estimate for p, the population proportion of golfers that are left-handed. 0.129 0.174 0.852 0.148

0.148

Solve the problem. A random sample of 150 students has a grade point average with a standard deviation of 0.78. Find the margin of error if c = 0.98. 0.12 0.08 0.11 0.15

0.15
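The margin of error here is E = z_c · s / √n. The sketch below assumes, as the answer choices suggest, that a z critical value is used with the sample standard deviation because n is large; `margin_of_error` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(confidence, s, n):
    """Margin of error for a mean using a z critical value (assumes large n)."""
    # Two-tailed critical value: area (1 + c) / 2 lies to the left of z_c.
    z_c = NormalDist().inv_cdf((1 + confidence) / 2)
    return z_c * s / sqrt(n)

margin = margin_of_error(0.98, 0.78, 150)
print(round(margin, 2))
```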

Solve the problem. A group of students were asked if they carry a credit card. The responses are listed in the table. has cc no cc 24 36 37 3 If a student is selected at random, find the probability that he or she owns a credit card given that the student is a freshman. Round your answer to three decimal places. 0.400 0.240 0.600 0.393

0.400

Use the z-table (quiz 5 #3) to find the shaded area.

0.4878

Solve the problem. A group of students were asked if they carry a credit card. The responses are listed in the table. class cc carrier no cc Freshman 13 47 Sophomore 22 18 If a student is selected at random, find the probability that he or she owns a credit card given that the student is a sophomore. Round your answer to three decimal places. 0.550 0.450 0.220 0.629

0.550

Solve the problem. Find the area of the indicated region under the standard normal curve. 0.309 0.6562 1.309 0.3438

0.6562

Solve the problem. A group of students were asked if they carry a credit card. The responses are listed in the table class has cc no cc freshman 11 49 sophomore 27 13 0.289 0.270 0.711 0.980

0.711

Solve the problem. Find the area under the standard normal curve to the right of z = -1.25. 0.7193 0.6978 0.5843 0.8944

0.8944

Solve the problem. Find the standardized test statistic t for a sample with n = 15, x̄ = 5.4, s = 0.8, and α = 0.05 if H0: μ ≤ 5.1. Round your answer to three decimal places. 1.728 1.452 1.631 1.312

1.452

Solve the problem. A researcher found a significant relationship between a person's age, x1, the number of hours a person works per week, x2, and the number of accidents, y, the person has per year. The relationship can be represented by the multiple regression equation y = -3.2 + 0.012x1 + 0.23x2. Predict the number of accidents per year (to the nearest whole number) for a person whose age is 37 and who works 54 hours per week. 10 11 9 12

10

Solve the problem. The access code to a house's security system consists of five digits. How many different codes are available if each digit can be repeated? 3125 5 100,000 32

100,000

Solve the problem. Use Bayes' theorem to solve this problem. A storeowner purchases stereos from two companies. From Company A, 550 stereos are purchased and 1% are found to be defective. From Company B, 850 stereos are purchased and 6% are found to be defective. Given that a stereo is defective, find the probability that it came from Company A. 66/113 11/113 102/113 17/113

11/113
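Bayes' theorem for this card can be sketched with exact fractions. The prior for each company is its share of the 1,400 stereos purchased; the variable names are illustrative.

```python
from fractions import Fraction

count_a, count_b = 550, 850
defect_rate_a, defect_rate_b = Fraction(1, 100), Fraction(6, 100)

total = count_a + count_b
p_a, p_b = Fraction(count_a, total), Fraction(count_b, total)

# Total probability of a defective stereo, then Bayes' theorem.
p_defective = p_a * defect_rate_a + p_b * defect_rate_b
p_a_given_defective = p_a * defect_rate_a / p_defective

print(p_a_given_defective)
```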

Solve the problem. The Environmental Protection Agency must visit nine factories for complaints of air pollution. In how many different ways can a representative visit five of these to investigate this week? 45 362,880 15,120 5

15,120

Solve the problem. For the following data, approximate the mean number of phone calls per day. 8-11 31 12-15 34 16-19 28 20-23 30 24-27 6 16 15 14 26 17

16

B. 175 219.5 299 350 549 (Five-number summary consists of the minimum, 1st Quartile, Median, 3rd Quartile, and maximum)

175 199 205 234 259 275 299 304 317 345 355 384 549 What is the five-number summary? A. 175 234 299 345 549 B. 175 219.5 299 350 549 C. 175 219.5 299 350 384 D. 175 234 299 331 549

C. One outlier: 549

175 199 205 234 259 275 299 304 317 345 355 384 549 IQR = Q3 - Q1 = 350 - 219.5 = 130.5 Which of the following is true? A. No outliers present B. One outlier: 175 C. One outlier: 549 D. Two outliers: 175 and 549
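A sketch of the five-number summary and the 1.5 IQR outlier check for this data set. It assumes the quartile convention used in the five-number-summary card (Q1 and Q3 are the medians of the lower and upper halves, excluding the overall median when n is odd); other software may use slightly different quartile rules.

```python
from statistics import median

data = sorted([175, 199, 205, 234, 259, 275, 299, 304, 317, 345, 355, 384, 549])

half = len(data) // 2
lower_half = data[:half]
# Skip the middle value when the number of observations is odd.
upper_half = data[half + 1:] if len(data) % 2 else data[half:]

q1, q2, q3 = median(lower_half), median(data), median(upper_half)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print((data[0], q1, q2, q3, data[-1]), iqr, outliers)
```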

bimodal

2 numbers are most occurring

Solve the problem. A card is drawn from a standard deck of 52 playing cards. Find the probability that the card is an ace or a king. 4/13 2/13 8/13 1/13

2/13

Solve the problem. The grade point averages for 10 students are listed below. Find the range of the data. 2.0 3.2 1.8 2.9 .9 4.0 3.3 2.9 3.6 .8 2.8 1.4 2.45 3.2

3.2

Provide an appropriate response. The mean SAT verbal score is 478, with a standard deviation of 98. Use the Empirical Rule to determine what percent of the scores lie between 380 and 478. (Assume the data set has a bell-shaped distribution.) 34% 47.5% 68% 49.9%

34%

Solve the problem. SAT verbal scores are normally distributed with a mean of 426 and a standard deviation of 94. Use the Empirical Rule to determine what percent of the scores lie between 332 and 426. 49.9% 34% 68% 47.5%

34%

Solve the problem. A random sample of 40 students has a test score average with a standard deviation of 11.7. Find the margin of error if c = 0.98. 1.81 1.85 4.31 0.68

4.31

Provide an appropriate response. A teacher gives a 20-point quiz to 10 students. The scores are listed below. What percentile corresponds to the score of 12? 80 8 10 7 15 16 12 19 14 9 13 12 25 40

40

Provide an appropriate response. The lengths of phone calls from one household (in minutes) were 2, 4, 6, 7, and 8 minutes. Find the midrange for this data. 2 minutes 5 minutes 6 minutes 10 minutes

5 min

Solve the problem. For the following data set, approximate the sample standard deviation of phone calls per day. 8-11 18 12-15 23 16-19 38 20-23 47 24-27 32 2.9 18.8 3.2 5.1

5.1
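For grouped data, the sample standard deviation is approximated by treating every observation in a class as if it sat at the class midpoint. A minimal sketch with the frequency table from this card (variable names are illustrative):

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each class
classes = [(8, 11, 18), (12, 15, 23), (16, 19, 38), (20, 23, 47), (24, 27, 32)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

n = sum(freqs)
mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
# Sample variance divides by n - 1.
variance = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1)
std_dev = sqrt(variance)

print(round(mean, 1), round(std_dev, 1))
```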

Solve the problem. The scores of the top ten finishers in a recent LPGA Valley of the Stars Tournament are listed below. (Source: Los Angeles Times) 71 67 67 72 73 68 72 72 Find the mode score. 67 73 72 76

72

Solve the problem. IQ test scores are normally distributed with a mean of 100 and a standard deviation of 15. Find the x-score that corresponds to a z-score of -1.645. 82.3 75.3 91.0 79.1

75.3
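
Converting between z-scores and x-values uses x = mean + z * SD and z = (x - mean) / SD. A minimal sketch, with hypothetical function names:

```python
def z_to_x(z, mean, sd):
    """Value x that sits z standard deviations from the mean."""
    return mean + z * sd

def x_to_z(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

x = z_to_x(-1.645, mean=100, sd=15)
print(round(x, 1))
```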

Solve the problem. Use the ogive below to approximate the number in the sample. 28 100 80 341

80

Solve the problem. Given H0: p = 0.85 and α = 0.10, which level of confidence should you use to test the claim? 80% 95% 99% 90%

90%

For distributions with a "Normal shape" (roughly bell-shaped and unimodal), approximately ______% of the observations fall within 2 SD of the mean

95%

Confounding Variable

A confounding variable occurs when an experimenter cannot tell the difference between the effects of different factors on the variable. Ex: To attract more customers, a coffee shop owner experiments by remodeling the shop using bright colors. At the same time, a shopping mall nearby has its grand opening. If business increases, it cannot be determined whether it was from the remodel or the opening of the shopping mall.

B. $14, because (12+15+17+22+14)/5 = 16

A consumer group surveyed the prices for white cotton extra-long twin sheet sets in five different department stores and reported the average price as $16. We visited four of the five stores, and found the prices to be $12, $15, $17, and $22. Assuming that the consumer group is correct, what is the price of the item at the store that we did not visit? A. $10 B. $14 C. $15 D. $17

Left-skewed data

A data set where mean << median

E. B and C

A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Which variables are quantitative? A. Type of car the student owns B. Number of credit hours taken during that semester C. The time the student waited in line at the bookstore to pay for his/her textbooks D. Home state of the student E. B and C

event

A subset of a sample space; may consist of 1 or more outcomes

Survey

A survey is an investigation of one or more characteristics of a population. Most often, surveys are carried out on people by asking them questions. The most common types of surveys are done by interview, Internet, phone, or mail. In designing a survey, it is important to word the questions so that they do not lead to biased results, which are not representative of a population. Ex: A survey is conducted on a sample of female physicians to determine whether the primary reason for their career choice is financial stability.

Higher kurtosis than normal

An S-shaped Q-Q plot shows the data has ___________________________________.

Wider

As the confidence coefficient of a confidence interval increases the intervals get _______________.

A candidate for governor of a particular state claims to be favored by at least half of the voters. State this claim mathematically. Write the null and alternative hypotheses. Identify which hypothesis is the claim.

Claim: p≥0.5 H0: p≥0.5 Ha: p<0.5 Claim is H0 (see 7.1 in class practice)

The dean of a major university claims that the mean time for students to earn a Master's degree is at most 4.2 years. Write the null and alternative hypotheses. Identify which hypothesis is the claim.

Claim: µ≤4.2 H0: µ≤4.2 Ha: µ>4.2 Claim is H0 (see 7.1 in class practice)

Measures of dispersion

Determine how variable the data is

Measures of location

Determine where the centre of the data is

These events ARE mutually exclusive, since it IS NOT possible for a voter to have legally voted for the president in both South Carolina and Texas

Determine whether the following events are mutually exclusive. Event A: Randomly select a voter who legally voted for the president in South Carolina. Event B: Randomly select a voter who legally voted for the president in Texas.

A. -1.5, because probability values cannot be less than 0 D. 64/25, because probability values cannot be greater than 1

Determine which numbers could not be used to represent the probability of an event (2 answers) A. -1.5, because probability values cannot be less than 0 B. 320/1058, because probability values cannot be in fraction form C. 0.0002, because probability values must be rounded to two decimal places D. 64/25, because probability values cannot be greater than 1 E. 33.3%, this is because probability values cannot be greater than 1 F. 0, because probability values must be greater than 0

Skewness

Determines if the data is symmetric

Study error

Difference between the target population and study population

If neither the subjects nor the researchers know who was assigned what treatment

Double blind

Unit

Each member of a study population

The P-value for a hypothesis test is P = 0.034. Do you reject or fail to reject H0 when the level of significance is alpha=0.01

Fail to reject because P > alpha (0.034 > 0.01) (see 7.2 in class practice)

B. Yes, because the probabilities sum to 1 and are all between 1 and 0, inclusive

Is the distribution a discrete probability distribution A. No, because some of the probabilities have values greater than 1 or less than 0 B. Yes, because the probabilities sum to 1 and are all between 1 and 0, inclusive C. No, because the total probability is not equal to 1 D. Yes, because the distribution is symmetric

difference between parameter and statistic

It is important to note that a sample statistic can differ from sample to sample, whereas a population parameter is constant for a population.

Has no modes, no value around which the observations are concentrated.

Uniform

Suppose P(A)=0.3, P(B)=0.4, and P(A and B)=0.13 a. State if A and B are independent or dependent events explain why. b. State if A and B are mutually exclusive events or not and explain why.

a. Dependent because 0.3 x 0.4 ≠ .13 so the outcome of event A influences the outcome of event B b. No because 0.13 is not 0, so there is a slight chance it could be both A and B (see test 2 #15)
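
The two checks in this card can be written out directly: A and B are independent when P(A) * P(B) equals P(A and B), and mutually exclusive when P(A and B) is 0. The function names below are made up for this sketch.

```python
from math import isclose

def are_independent(p_a, p_b, p_a_and_b):
    """Independent events satisfy P(A and B) = P(A) * P(B)."""
    return isclose(p_a * p_b, p_a_and_b, abs_tol=1e-9)

def are_mutually_exclusive(p_a_and_b):
    """Mutually exclusive events cannot occur together, so P(A and B) = 0."""
    return isclose(p_a_and_b, 0.0, abs_tol=1e-9)

print(are_independent(0.3, 0.4, 0.13))   # 0.3 * 0.4 = 0.12, not 0.13
print(are_mutually_exclusive(0.13))
```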

Empirical Probability

calculated from the results of an experiment

symmetric

can be divided and create 2 symmetrical sides

Solve the problem. From past figures, it is predicted that 43% of the registered voters in California will vote in the June primary. Does this statement describe: descriptive statistics? inferential statistics?

inferential statistics

Solve the problem. The chances of winning the California Lottery are one chance in twenty-two million. Does this statement describe: inferential statistics? Descriptive statistics?

inferential statistics

measures the variability of a distribution by giving us the range covered by the middle 50% of the data

inter-quartile range

Solve the problem. Identify the level of measurement for data that are the temperature of 90 refrigerators.

interval

Given H0: μ ≤ 25 and Ha: μ > 25, determine whether the hypothesis test is left-tailed, right-tailed, or two-tailed. left-tailed right-tailed two-tailed

right-tailed

Round-Off Rule

round off to one more decimal place than occurs in the values of the variable

example: r = 0.931 what's the correlation?

strong and linear

Identify the sampling technique used. Every fifth person boarding a plane is searched thoroughly.

systematic

may not be subject to any clear bias, but it wouldn't be as safe as taking a random sample

systematic sampling

"N", when calculating the median, refers to

the number of observations

Solve the problem. The data below are the number of absences and the final grades of 9 randomly selected students from a statistics class. Find the equation of the regression line for the given data. # of absences, x: 0 3 6 4 9 2 15 8 5 Final grade, y: 98 86 80 82 71 92 55 76 82 y= 96.14x - 2.75 y= -96.14x + 2.75 y= -2.75x + 96.14 y= -2.75x - 96.14

y= -2.75x + 96.14
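
The regression line above can be reproduced with the least-squares formulas b = Sxy / Sxx and a = y-bar - b * x-bar. This sketch assumes the nine final grades read 98 86 80 82 71 92 55 76 82 (one per absence count); `least_squares_line` is a hypothetical helper name.

```python
def least_squares_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

absences = [0, 3, 6, 4, 9, 2, 15, 8, 5]
grades = [98, 86, 80, 82, 71, 92, 55, 76, 82]   # assumed reading of the data
a, b = least_squares_line(absences, grades)
print(f"y = {b:.2f}x + {a:.2f}")
```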

Population

μ

Solve the problem. A coin is tossed. Find the probability that the result is heads. 0.9 1 0.5 0.1

0.5

U-shaped

The Q-Q plot for an exponential distribution is _______________________________.

In a split stemplot, the first leaf holds numbers ________ and the second leaf holds numbers _________

0-4, 5-9

Solve the problem. If one card is drawn from a standard deck of 52 playing cards, what is the probability of drawing an ace? 1/2 1/4 1/52 1/13

1/13

Mean Absolute Deviation

1/n*(sum of absolute differences from the mean)

Data measured over time

A run chart is a good way to record ____________________________.

B. Continuous, because distance is a random variable that is uncountable

Decide whether the graph represents a discrete random variable or a continuous random variable Distance a baseball travels after being hit A. Discrete, because distance is a random variable that is countable B. Continuous, because distance is a random variable that is uncountable

Interquartile Range (IQR)

IQR=Q3-Q1

Kurtosis

Indicates the frequency of extreme observations in the data

K=3

Kurtosis of a normal random variable

Left-skewed

Longer left tail

Right-skewed

Longer right tail

What is the formula for the position of the median when N is an ODD NUMBER

(N+1)/2

n, 2n

The mean and variance of a Chi-squared(n) distribution are ______________ and __________________.

Median value

The middle line on a boxplot represents the _____________________________ of the data.

Class with the highest jump to the next class

The mode can be found from an ECDF graph by choosing the _______________________________________.

Study population

The set of elements from which a sample is actually selected

Provide an appropriate response. In a random sample, 10 students were asked to compute the distance they travel one way to school to the nearest tenth of a mile. The data is listed below. a) If a constant value k is added to each value, how will the standard deviation be affected? b) If each value is multiplied by a constant k, how will the standard deviation be affected? 1.1 5.2 3.6 5.0 4.8 1.8 2.2 5.2 1.5 0.8

a) The standard deviation will not be affected. b) The standard deviation will be multiplied by |k|.

Stratified sampling

The type of sampling where researchers divide the population into subgroups, then randomly select a proportional number of individuals from those groups is called...

Max value <= Q3+1.5IQR

The upper whisker of a box and whisker plot ends at __________________________.

The MLE of theta

The value of theta that maximizes the likelihood function of theta given the data is called ______________________.

Scatter plot

This graph can be used to check for a sample correlation by looking at the linear pattern of dots.

Placebo Effect

This occurs when a subject reacts favorably to a placebo when in fact the subject has been given a fake treatment.

Estimation problem

This type of problem involves guessing the most likely value of a variable from the data.

Three Key Elements of a Well-Designed Experiment

Three key elements of a well-designed experiment are control, randomization, and replication. Because experiments can be ruined by a variety of factors, being able to control these influential factors is important

b^2*(s_x)^2

Variance of Y = a + bX with Var(X) = (s_x)^2
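
The rule Var(a + bX) = b^2 * Var(X) can be checked numerically: adding a constant leaves the variance alone, and multiplying by b scales it by b^2. The sample and the constants below are made up for illustration.

```python
from math import isclose
from statistics import variance

xs = [2, 4, 6, 7, 8]            # hypothetical sample
a, b = 3, -2                    # hypothetical shift and scale
ys = [a + b * x for x in xs]    # Y = a + bX

# The shift a drops out; the scale b enters squared.
print(variance(ys), b ** 2 * variance(xs))
```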

Stratified Sampling

When it is important for the sample to have members from each segment of the population, you should use stratified sampling. Depending on the focus of the study, members of the same population are divided into two or more subsets, called strata, that share a similar characteristic. A sample is then randomly selected from each of the strata. Using a stratified sample ensures that each segment of the population is represented. Ex: To collect a stratified sample of the number of people who live in Calcasieu Parish households, you could divide the households into socioeconomic levels then randomly select households from each level.

A. II and III

Which is true of the data shown in the histogram? (Histogram is skewed left) I. The distribution is skewed to the right II. The mean is probably smaller than the median III. We should use the median and IQR to summarize these data

D. II and IV

Which of the following is a basic experimental principle? I. Including men and women in the experiment II. Randomization III. Having at least three treatments IV. Replication

The explanatory variable goes on the ____ axis

X

The response variable goes on the _____ axis

Y

the slope and intercept of the least square regression line are found using this equation (2/2)

a = ȳ - b x̄ (ȳ and x̄ are the sample means, b is the slope)

Variable

a characteristic of interest

Statistic

a numerical description of a sample characteristic

relative frequency

the portion or percentage of the data that falls in the class. To find the relative frequency, divide f (frequency) by n (sample size).

frequency distribution

a table that shows classes or intervals of data entries with a count of the number of entries in each class. The frequency (f) of a class is the number of data entries in the class.

Single Blinding

a technique in which the subject doesn't know whether he or she is receiving a treatment or a placebo

Unbiased Samples

allow each subject in the population an equal chance of being selected

Probability Experiment

an action, or trial, through which specific results (counts, measurements, or responses) are obtained

Solve the problem. Find the range of the data set represented by the graph. [Graph not shown; vertical axis from 0 to 20, horizontal axis from 1 to 7] a) 20 b) 17 c) 6 d) 5

c) 6

Solve the problem. Based on previous clients, a marriage counselor concludes that the majority of marriages that begin with cohabitation before marriage will result in divorce. Does this statement describe inferential statistics or descriptive statistics? inferential statistics descriptive statistics

inferential statistics

descriptive statistics

is the branch of statistics that involves the organization, summarization, and display of data. Ex: Tables, charts, averages

replication

is the repetition of an experiment under the same or similar conditions.

parameter

numerical description of a population characteristic. Ex: Average age of all people in the United States

an observation is considered a suspected outlier if it is:

less than Q1 - 1.5 (IQR) more than Q3 + 1.5 (IQR)
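
The 1.5 * IQR rule can be sketched as a small filter. The data set is hypothetical, and Q1 and Q3 are passed in rather than computed, since quartile conventions differ between textbooks and software.

```python
def suspected_outliers(data, q1, q3):
    """Return the values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

data = [2, 4, 6, 7, 8, 30]   # hypothetical observations
# Median-of-halves quartiles: Q1 = 4 (from 2, 4, 6), Q3 = 8 (from 7, 8, 30).
flagged = suspected_outliers(data, q1=4, q3=8)
print(flagged)
```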

Skewed Left Distribution

longer left tail

Skewed Right

longer right tail

Ratio

same as interval, except that zero is a true (meaningful) zero; ex. distance

not all relationships can be classified as positive or negative

neither positive or negative

Double Blinding

neither subjects nor researchers know who is receiving a treatment and who is receiving a placebo

Recruit participants for a study. Give them journals to record hour by hour their activities for the following day, including when they watch TV and when they consume snacks. Determine if snack consumption is higher during TV times. This is an example of an _______ ________

observational study

the values of the variables of interest are recorded as they naturally occur

observational study

Identify the data set's level of measurement. manuscripts rated "acceptable" or "unacceptable"

ordinal

Identify the data set's level of measurement. the final grades (A, B, C, D, and F) for students in a statistics class

ordinal

Identify the data set's level of measurement. the ratings of a movie ranging from "poor" to "good" to "excellent"

ordinal

Solve the problem. Identify the level of measurement for data that are the numbers on the shirts of a girl's soccer team. ratio ordinal nominal interval

nominal

In a histogram, the SPREAD refers to the...

range

Event

set of outcomes for a statistical experiment

Intersecting Events

share at least one outcome; overlapping events

Decide which method of data collection you would use to collect data for the study. Specify either observational study, experiment, simulation, or survey. A study where you would like to determine the chance of getting three girls in a family of three children

simulation

Outcome

the result of a single trial in a probability experiment

skewed right

the tail is on the right side

Classical (theoretical) probability

used when each outcome in a sample space is equally likely to occur P(E)= # of outcomes in E/ total sample space

cluster

uses clusters (naturally occurring sub groups)

stratified

uses strata (a shared characteristic)

Data

values (observations) the variable can assume

where individuals have selected themselves to be included; guaranteed to be biased

volunteer sample

sample size

which is the number of subjects in a study, is another important part of experimental design.

sample

x bar

Use the z table to find the z score which has an area closest to 0.3 to the left under the normal curve. Find the corresponding x value for a normal distribution with a mean of 28 and a standard deviation of 3.4.

x=26.2150 (see test 3 #14)

equation of a straight line

y = a + bx

Solve the problem. Find the equation of the regression line for the given data. x: -5 -3 4 1 -1 -2 0 2 3 -4 y: 11 6 -6 -1 3 4 1 -4 -5 8 y= -1.885x + 0.758 y= -0.758x - 1.885 y= 0.758x + 1.885 y= 1.885x - 0.758

y= -1.885x + 0.758

Solve the problem. Seven guests are invited for dinner. How many ways can they be seated at a dinner table if the table is straight with seats only on one side? 4 720 40,320 5040

5040
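
Seating seven guests in a row of seven seats is a count of permutations, 7!. A one-line check using the standard library:

```python
from math import factorial

# 7 choices for the first seat, 6 for the second, and so on down to 1.
arrangements = factorial(7)
print(arrangements)
```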

Solve the problem. A single six-sided die is rolled. Find the probability of rolling a number less than 3. 0.25 0.333 0.5 0.1

0.333

For distributions with a "Normal shape" (roughly bell-shaped and unimodal), approximately ______% of the observations fall within 1 SD of the mean

68%

Solve the problem. Assume that blood pressure readings are normally distributed with a mean of 116 and a standard deviation of 4.8. If 36 people are randomly selected, find the probability that their mean blood pressure will be less than 118. 0.8615 0.9938 0.0062 0.8819

0.9938
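
For a sample mean, the standard deviation shrinks to SD / sqrt(n) before the normal probability is looked up. A minimal sketch of the blood pressure card using Python's `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 116, 4.8, 36

# Sampling distribution of the mean: same center, standard error sigma / sqrt(n).
sampling_dist = NormalDist(mu, sigma / sqrt(n))
p = sampling_dist.cdf(118)   # P(sample mean < 118)
print(round(p, 4))
```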

Solve the problem. For the following data, approximate the mean number of phone calls per day. 8-11 48 12-15 16 16-19 42 20-23 34 24-27 45 18 16 37 19 17

18

Solve the problem. For the following data, approximate the mean number of phone calls per day. Phone calls (per day) Frequency 8-11 48 12-15 16 16-19 42 20-23 34 24-27 45 16 37 19 17 18

18

S-shaped

The Q-Q plot for a uniform distribution is ___________________________.

Solve the problem. Find the critical value for a two-tailed test with α = 0.01 and n = 30. ±1.96 ±2.575 ±2.33 ±1.645

±2.575

Solve the problem. A student receives test scores of 62, 83, and 91. The student's final exam score is 88 and homework score is 76. Each test is worth 20% of the final grade, the final exam is 25% of the final grade, and the homework grade is 15% of the final grade. What is the student's mean score in the class? 90.6 76.6 80.6 85.6

80.6
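
A weighted mean multiplies each score by its weight and sums the products (the weights here already add to 1). The variable names are made up for this sketch.

```python
# (score, weight) pairs: three tests at 20% each, final exam 25%, homework 15%.
components = [(62, 0.20), (83, 0.20), (91, 0.20), (88, 0.25), (76, 0.15)]

total_weight = sum(w for _, w in components)
weighted_mean = sum(score * w for score, w in components) / total_weight
print(round(weighted_mean, 1))
```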

Solve the problem. SAT verbal scores are normally distributed with a mean of 450 and a standard deviation of 100. Use the Empirical Rule to determine what percent of the scores lie between 250 and 550. 83.9% 68% 34% 81.5%

81.5%

Example: researchers want to determine if people tend to snack more while they watch TV.

Prospective example

In regards to the "4 possibilities for role-type classifications" what would Time and driving test outcome be?

Q --> C

In regards to the "4 possibilities for role-type classifications" what would SAT score and GPA of freshman be?

Q --> Q

In interquartile range, finding the median of the lower 50% is finding...

Q1

IQR

Q3 - Q1

How do you find the IQR?

Q3-Q1

T/F: in a randomized controlled experiment, we can draw causal conclusions

True

assesses the strength of a linear relationship; denoted by "r"

correlation coefficient

Continuous Data

data that can be measured, includes fractions and decimals

Nominal Data

data that can be placed into categories but cannot be ordered or ranked ex. favorite breakfast cereal

Qualitative Data

data that can be placed into categories based on some characteristic or quality

Quantitative Data

data that is numerical in nature

Frequency Histogram

A bar graph that represents the frequency distribution of a data set. (Bars must touch) x-axis= class boundaries y-axis= frequencies

Q-Q plot

Plots of the data's quantiles against the quantiles of the N(0,1) distribution

Sample

Portion of a population used to make predictions about its properties

Cholesterol levels of 1,000 adults - we'd expect cholesterol levels of adults to consist of a few low numbers and a few very high #'s, with most in the middle

Unimodal graph

I and II

Which of the following summaries are changed by adding a constant to each data value? I. The mean II. The median III. The standard deviation

Solve the problem. The data below are the ages and systolic blood pressures (measured in millimeters of mercury) of 9 randomly selected adults. Calculate the correlation coefficient, r. Age, x: 41 44 48 51 54 56 60 64 68 Pressure, y: 111 115 118 126 137 140 143 145 147 0.908 0.890 0.960 0.998

0.960
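
The correlation coefficient can be computed as r = Sxy / sqrt(Sxx * Syy). The sketch below assumes the first two ages read 41 and 44; `correlation_coefficient` is a hypothetical helper name.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Pearson r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

ages = [41, 44, 48, 51, 54, 56, 60, 64, 68]            # assumed reading of the ages
pressures = [111, 115, 118, 126, 137, 140, 143, 145, 147]
r = correlation_coefficient(ages, pressures)
print(round(r, 3))
```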

Solve the problem. Calculate the correlation coefficient, r, for the data below. x -9 -7 0 -3 -5 -6 -4 -2 -1 -8 y -2 0 17 9 6 2 7 11 14 0 0.990 0.819 0.792 0.881

0.990

Solve the problem. The table lists the smoking habits of a group of college students. no yes Heavy man 135 41 5 woman 187 21 5 If a student is chosen at random, find the probability of getting someone who is a man or a woman. Round your answer to three decimal places. 0.918 0.197 0.803 1

1

Ex 2: Decide whether each number describes a population parameter or a sample statistic. 1. A survey of several hundred collegiate student-athletes in the United States found that, during the season of their sport, the average time spent on athletics by student-athletes is 50 hours per week. (Source: Penn Schoen Berland) 2. The freshman class at a university has an average SAT math score of 514. 3. In a random check of several hundred retail stores, the Food and Drug Administration found that 34% of the stores were not storing fish at the proper temperature.

1. The average of 50 hours is based on a subset of the population, so it is a sample statistic. 2. The average SAT math score of 514 is based on the entire freshman class at this particular university, so it is a population parameter. 3. The 34% of stores not storing fish at the proper temperature is based on a subset of the population, so it is a sample statistic.

Ex 3: For each study, identify the population and the sample. Then determine which part of the study represents the descriptive branch of statistics. What conclusions might be drawn from the study using inferential statistics?

1. A study of 2560 U.S. adults found that of adults not using the Internet, 23% are from households earning less than $30,000 annually, as shown in the figure. Population: responses of all U.S. adults. Sample: responses of the 2560 U.S. adults in the study. Descriptive: 23% are from households earning less than $30,000 annually. Inference: lower-income households typically cannot afford Internet access and are less likely to have it. 2. A study of 300 Wall Street analysts found that the percentage who incorrectly forecasted high-tech earnings in a recent year was 44%. Population: high-tech earnings forecasts of all Wall Street analysts. Sample: forecasts of the 300 Wall Street analysts in the study. Descriptive: the percentage who incorrectly forecasted high-tech earnings in a recent year was 44%. Inference: the stock market is difficult to forecast, even for professionals.

Designing a Statistical Study

1. Identify the variable(s) of interest (the focus) and the population of the study. 2. Develop a detailed plan for collecting data. If you use a sample, make sure the sample is representative of the population. 3. Collect data. 4. Describe the data, using the descriptive statistics techniques. 5. Interpret the data and make decisions about the population using inferential statistics. 6. Identify any possible errors.

Use the four conditions of a Binomial Experiment to determine if the situation represents a Binomial experiment. Specifically state why it does or does not meet each of the four conditions. Selecting 5 cards, one at a time without replacement, from a standard deck of cards. The random variable is the number of red cards obtained.

1. Set number of trials (5) 2. Success or failure (success = red cards) 3. x = number of successes (number of red cards) 4. Independent - This is not binomial because without replacement the probability changes and the chance of success depends on the cards already chosen (see test 3 #3)

Solve the problem. Identify the class width used in the frequency distribution. Miles (per day) Frequency 1 - 6 28 7 - 12 21 13 - 18 8 19 - 24 11 7 6 5 12

6

The distribution of ages for the winners of the Tour de France from 1903 to 2012 is approximately bell shaped. The mean age is 28.1 years with a standard deviation of 3.4 years. Find the z score for the age of Bradley Wiggins who won in 2012 at the age of 32.

1.15 (see quiz 3)

Solve the problem. The heights (in inches) of 10 adult males are listed below. Find the sample standard deviation. 70 72 71 70 69 73 69 68 70 71 1.49 2.38 3 70

1.49
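
The sample standard deviation divides the sum of squared deviations by n - 1. Python's `statistics.stdev` uses that definition, so the heights card can be checked directly:

```python
from statistics import stdev

heights = [70, 72, 71, 70, 69, 73, 69, 68, 70, 71]

# stdev() is the sample standard deviation (n - 1 in the denominator).
s = stdev(heights)
print(round(s, 2))
```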

Solve the problem. Find the standardized test statistic t for a sample with n = 25, x = 21, s = 3, and α = 0.005 if Ha: μ > 20. Round your answer to three decimal places. 1.997 1.239 1.667 1.452

1.667
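
The standardized test statistic for a mean is t = (x-bar - mu_0) / (s / sqrt(n)); the significance level is not needed to compute it. A minimal sketch with a hypothetical function name:

```python
from math import sqrt

def t_statistic(x_bar, mu_0, s, n):
    """Standardized test statistic for a claim about a population mean."""
    standard_error = s / sqrt(n)
    return (x_bar - mu_0) / standard_error

t = t_statistic(x_bar=21, mu_0=20, s=3, n=25)
print(round(t, 3))
```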

Solve the problem. You wish to test the claim that μ > 23 at a level of significance of α = 0.05 and are given sample statistics n = 50, x = 23.3, and s = 1.2. Compute the value of the standardized test statistic. Round your answer to two decimal places. 3.11 0.98 2.31 1.77

1.77

Solve the problem. You wish to test the claim that μ > 6 at a level of significance of α = 0.05 and are given sample statistics n = 50, x = 6.3, and s = 1.2. Compute the value of the standardized test statistic. Round your answer to two decimal places. 2.31 1.77 3.11 0.98

1.77

Solve the problem. Given a sample with r = 0.321, n = 30, and α = 0.10, determine the standardized test statistic t necessary to test the claim ρ = 0. Round answers to three decimal places. 1.793 2.561 3.198 2.354

1.793
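
For a test of ρ = 0, the statistic is t = r / sqrt((1 - r^2) / (n - 2)) with n - 2 degrees of freedom. A short sketch; `correlation_t` is a made-up name.

```python
from math import sqrt

def correlation_t(r, n):
    """Standardized test statistic for H0: rho = 0."""
    return r / sqrt((1 - r ** 2) / (n - 2))

t = correlation_t(r=0.321, n=30)
print(round(t, 3))
```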

Find the standardized test statistic t for a sample with n = 12, x = 18.2, s = 2.2, and α = 0.01 if H0: μ = 17. Round your answer to three decimal places. 1.890 2.001 1.991 2.132

1.890

The cholesterol levels (in milligrams per deciliter) of 30 adults are listed below. Find Q1. 154 156 165 165 170 171 172 180 184 185 189 189 190 192 195 198 198 200 200 200 205 205 211 215 220 220 225 238 255 265

180

Solve the problem. A sample of candies have weights that vary from 2.35 grams to 4.75 grams. Use this information to find the upper and lower limits of the first class if you wish to construct a frequency distribution with 12 classes. 2.35- 2.75 2.35- 2.65 2.35- 2.54 2.35- 2.55

2.35- 2.54

Solve the problem. The data below are the final exam scores of 10 randomly selected statistics students and the number of hours they studied for the exam. Find the standard error of estimate, se, given that y = 5.044x + 56.11. hrs x: 3 5 2 8 2 4 4 5 6 3 scores y: 65 80 60 88 66 78 85 90 90 71 9.875 7.913 8.912 6.305

6.305

Solve the problem. How many ways can five people, A, B, C, D, and E, sit in a row at a movie theater if C must sit to the right of but not necessarily next to B? 48 20 60 24

60
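
By symmetry, C sits to the right of B in exactly half of the 5! = 120 orderings. A brute-force count confirms this, treating each permutation as a left-to-right row:

```python
from itertools import permutations

# Keep only the rows where C appears somewhere to the right of B.
count = sum(
    1
    for row in permutations("ABCDE")
    if row.index("C") > row.index("B")
)
print(count)
```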

Provide an appropriate response. The test scores of 30 students are listed below. Find P30. 31 41 45 48 52 55 56 63 65 67 67 69 70 70 74 75 78 79 79 80 81 83 85 85 87 90 92 95 99 67 56 90 63

63

Event C has 5 outcomes. No, because event C has more than one outcome.

A computer is used to randomly select a number between 1 and 9, inclusive. Event C is selecting a number less than 6. Event C has __ outcome(s). Is the event a simple event?

Symmetric data

A data set where the mean and median are approximately equal.

D. All homes in the posh California neighborhood D. The sales price of a home

A real estate broker wishes to estimate the average sales price of homes in a posh California neighborhood. To do so, she samples 50 recently sold homes in the neighborhood and finds the average sales price of the 50 homes to be $370,000. The population of interest to the broker is... A. The 50 recently sold homes B. All homes in the U.S. C. All homes in California D. All homes in the posh California neighborhood E. All homes with a sales price of $350,000 or more The variable of interest is... A. The size of a home B. The average age of the home C. The number of homes in the neighborhood D. The sales price of a home

B. Has one factor (shampoo type) blocked by gender and whether hair is dyed

A researcher wants to compare the effect of a new type of shampoo on hair condition. The researcher believes that men and women may react to the shampoo differently. Additionally, the researcher believes that the shampoo will react differently on hair that is dyed. The subjects are split into four groups: men who dye their hair; men who do not dye their hair; women who dye their hair; women who do not dye their hair. Subjects in each group are randomly assigned to the new shampoo and the old shampoo. This experiment... A. Has two factors (shampoo type and whether hair is dyed) blocked by gender B. Has one factor (shampoo type) blocked by gender and whether hair is dyed C. Has three factors (shampoo type, gender, whether hair is dyed) D. Is completely randomized E. Has two factors (gender and whether hair is dyed) blocked by shampoo type

No evidence of a linear relationship

A value of rxy close to 0 indicates ____________.

Stronger evidence of a linear relationship

A value of rxy closer to 1 indicates ____________.

A. Material for insulators B. 2 - Baking temperature and cooling method C. 8 D. Likeliness to break during adverse weather

Ceramics engineers are testing a new formulation for the material used to make insulators for power lines. They will try baking the insulators at four different temperatures, followed by either slow or rapid cooling. They want to try every combination of the baking and cooling options to see which produces insulators least likely to break during adverse weather conditions A. What are the experimental units? B. How many factors are there? C. How many treatments are there? D. What is the response variable?

Causation

Ceteris paribus a change in the explanatory variable results in a change in the response variable.

Alternate hypothesis

Challenging hypothesis to H0; if we get enough evidence we may reject H0 and accept this alternate hypothesis.

Use the minimum and maximum data entries and the number of classes to determine the class width, lower class limits, and upper class limits: min = 9 max = 64 number of classes = 7

Class width: (64-9)/7 ≈ 7.86, rounded up to 8. Lower limits: 9, 17, 25, 33, 41, 49, 57. Upper limits: 16, 24, 32, 40, 48, 56, 64. (see quiz 2)
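
The class limits can be generated mechanically. This sketch assumes integer data and a range that does not divide evenly by the number of classes, so that rounding up gives the width; `class_limits` is a hypothetical helper.

```python
from math import ceil

def class_limits(minimum, maximum, num_classes):
    """Return (width, lower limits, upper limits) for integer-valued data."""
    width = ceil((maximum - minimum) / num_classes)   # round the width up
    lowers = [minimum + i * width for i in range(num_classes)]
    # With integer data, each upper limit is one less than the next lower limit.
    uppers = [lower + width - 1 for lower in lowers]
    return width, lowers, uppers

width, lowers, uppers = class_limits(9, 64, 7)
print(width, lowers, uppers)
```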

Data at the nominal (name) level of measurement

Data at the nominal level of measurement are qualitative only. Data at this level are categorized using names, labels, or qualities. No mathematical computations can be made at this level.

Data at the ordinal (order/rank) level of measurement

Data at the ordinal level of measurement are qualitative or quantitative. Data at this level can be arranged in order, or ranked, but differences between data entries are not meaningful

Lower quartile

Data value below which 25% of the data lies

A. The two events are independent because the occurrence of one does not affect the probability of the occurrence of the other

For the given pair of events, classify the two events as independent or dependent. Randomly selecting a consumer from California; randomly selecting a consumer who owns a television. A. The two events are independent because the occurrence of one does not affect the probability of the occurrence of the other B. The two events are dependent because the occurrence of one does not affect the probability of the occurrence of the other C. The two events are dependent because the occurrence of one affects the probability of the occurrence of the other D. The two events are independent because the occurrence of one affects the probability of the occurrence of the other

A. The two events are independent because the occurrence of one does not affect the probability of the occurrence of the other

For the given pair of events, classify the two events as independent or dependent. Randomly selecting a fan at the Super Bowl. Randomly selecting a football player at the Super Bowl. A. The two events are independent because the occurrence of one does not affect the probability of the occurrence of the other B. The two events are dependent because the occurrence of one does not affect the probability of the occurrence of the other C. The two events are dependent because the occurrence of one affects the probability of the occurrence of the other D. The two events are independent because the occurrence of one affects the probability of the occurrence of the other

The statement represents a claim. Write its complement and state which is H0 and which is Ha. µ = 8.3

H0: µ = 8.3 (claim) Ha: µ ≠ 8.3 (see 7.1 in class practice)

The mean age of bus drivers in Chicago is 48.6 years. Write the null and alternative hypotheses.

H0: µ=48.6 Ha: µ≠48.6 (see 7.1 in class practice)

The mean IQ of statistics teachers is greater than 160. Write the null and alternative hypotheses.

H0: µ≤160 Ha: µ>160 (see 7.1 in class practice)

Breaking the range of values into intervals and counting how many observations fall into each interval

Histogram

What are the two types of charts used for a Quantitative variable?

Histogram Stemplot

fundamental counting principle

If one event can occur in m ways and a second can occur in n ways, the number of ways the two events can occur in sequence is m × n

Relative frequency

In a density histogram the area of the rectangle for each class is the ____________________________________.

Experiment and its processes

In an experiment, a researcher deliberately applies a treatment before observing the responses. A treatment is applied to part of the population, called a treatment group, and responses are observed. Another part of the population may be used as a control group, in which no treatment is applied. The subjects in both groups are called experimental units. In many cases, subjects in the control group are given a placebo, which is a harmless, fake treatment that is made to look like the real treatment. The responses of both groups can be compared and studied.

Observational Study

In an observational study, a researcher does not influence the responses. A researcher observes and measures characteristics of interest of part of a population but does not change the existing conditions.

Explanatory variable

Independent variable in a causal problem.

Mode

Most frequently occurring data value in a data set

Binary data

Non-numerical data that has two categories

Problem, Plan, Data, Analysis, Conclusion

PPDAC stands for _________________________________.

State whether the information given is a statistic or a parameter and say why. The median height of the entire Phoenix Mercury WNBA team is 73 inches.

Parameter because it is describing the population of the Phoenix Mercury team. (see test 1 #2)

When patients improve because they believe they are receiving treatment even though they are not actually receiving treatment

Placebo effect

Solve the problem. Suppose you want to test the claim that μ ≠ 3.5. Given a sample size of n = 33 and a level of significance of α = 0.05 when should you reject H0 ? Reject H0 if the standardized test statistic is greater than 2.33 or less than -2.33 Reject H0 if the standardized test statistic is greater than 2.575 or less than -2.575. Reject H0 if the standardized test statistic is greater than 1.96 or less than -1.96. Reject H0 if the standardized test statistic is greater than 1.645 or less than -1.645

Reject H0 if the standardized test statistic is greater than 1.96 or less than -1.96

Solve the problem. Suppose you want to test the claim that μ < 65.4. Given a sample size of n = 35 and a level of significance of α = 0.10, when should you reject H0? Reject H0 if the standardized test statistic is less than -2.575. Reject H0 if the standardized test statistic is less than -1.96. Reject H0 if the standardized test statistic is less than -1.28. Reject H0 if the standardized test statistic is less than -1.645.

Reject H0 if the standardized test statistic is less than -1.28.

The P-value for a hypothesis test is P = 0.006. Do you reject or fail to reject H0 when the level of significance is alpha=0.01?

Reject because P < alpha (0.006 < 0.01) (see 7.2 in class practice)

The owner of a professional basketball team claims that the mean attendance at games is over 22,000 and therefore the team needs a new arena. Determine whether the hypothesis test for this claim is left-tailed, right-tailed, or two-tailed. Sketch the distribution of the test statistic and label the P-value region.

Right-tailed (see 7.1 in class practice)

Solve the problem. For the mathematics part of the SAT the mean is 514 with a standard deviation of 113, and for the mathematics part of the ACT the mean is 20.6 with a standard deviation of 5.1. Bob scores a 660 on the SAT and a 27 on the ACT. Use z-scores to determine on which test he performed better. ACT SAT

SAT
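Comparing scores from different scales comes down to converting each one to a z-score, z = (x - mean) / sd. A minimal sketch using the numbers from the card (the helper name `z_score` is just for illustration):

```python
def z_score(x, mean, sd):
    # Number of standard deviations that x lies above (or below) the mean.
    return (x - mean) / sd

sat_z = z_score(660, 514, 113)   # about 1.29
act_z = z_score(27, 20.6, 5.1)   # about 1.25
# The larger z-score is the better relative performance.
better = "SAT" if sat_z > act_z else "ACT"
print(round(sat_z, 2), round(act_z, 2), better)
```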

Measure of spread

SD and IQR

Population

Set of all observations of interest for a study

Interpreting a histogram means knowing the 4 features, which are

Shape Center Spread Outliers

K<3

Shows a data set has fewer extreme observations than normal

K>3

Shows a data set has more extreme observations than normal

Whenever including an omitted variable causes us to rethink the direction of an association, this is called ____________ __________

Simpson's paradox

For the given data , construct a frequency distribution and frequency histogram of the data using five classes. Describe the shape of the histogram as symmetric, uniform, skewed left, or skewed right. Data set: California Pick Three Lottery 8 6 7 6 0 9 1 7 8 4 1 5 7 5 9 7 5 3 9 9 8 8 3 9 8 8 9 0 2 7 skewed right symmetric uniform skewed left

Skewed Left

Sample standard deviation

Sqrt(sample variance)

Test the claim that µ ≤ 40, given that standard deviation = 4.3, alpha = 0.01, and the sample statistics are n = 40 and mean = 41.8.

Standardized test statistic ≈ 2.65 Critical value = 2.33 Reject H0 There is enough evidence to reject the claim. (see 7.2 in class practice)
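The rejection-region approach on this card can be checked numerically. The sketch below assumes a right-tailed z-test with known standard deviation and uses Python's standard-library `statistics.NormalDist` in place of a z-table.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    # Standardized test statistic for a mean with known sigma.
    return (sample_mean - mu0) / (sigma / sqrt(n))

alpha = 0.01
z = z_statistic(41.8, 40, 4.3, 40)
# Right-tailed test: the critical value cuts off an area of alpha in the upper tail.
critical = NormalDist().inv_cdf(1 - alpha)
reject = z > critical
print(round(z, 2), round(critical, 2), reject)
```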

Explain how the question may be biased and suggest a way to reword the question to remove bias. Why does eating whole grain foods improve your health?

The question implies that whole grain foods are good for your health. "How does eating whole grain foods impact your health?" (see quiz 1)

Quitting methods

Treatments

None

Your stats teacher tells you your test score was the 3rd Quartile for the class. Which is true? I. You got 75% on the test II. You can't really tell what this means without knowing the standard deviation III. You can't really tell what this means unless the class distribution is nearly symmetric

Observational Study

a study in which the researcher draws conclusions strictly by observing what is happening or what has happened ex. observing how children interact on the playground

Experiment

a study in which the researcher manipulates one (or more) of the variable(s) and determines how the change influences the response variable(s). The researcher controls the experiment by applying a treatment ex. different soil content used to test plant growth

Sample

a subset of members selected from the population

Sample

a subset of the population (Ex: a few in the larger classroom)

Blinding

a technique where the subject does not know whether they are receiving a treatment or a placebo

Blinding

a technique where the subjects do not know whether they are receiving a treatment or a placebo.

Random Variable

a variable whose value is determined by chance

Use the given frequency distribution to find the (a) class width. (b) class midpoints of the first class. (c) class boundaries of the first class. Phone Calls (per day) (a) 3 (b) 9.5 (c) 7.5-11.5 (a) 3 (b) 10.5 (c) 8-11 (a) 4 (b) 10.5 (c) 8-11 (a) 4 (b) 9.5 (c) 7.5-11.5

(a) 4 (b) 9.5 (c) 7.5-11.5

Solve the problem. Two high school students took equivalent language tests, one in German and one in French. The student taking the German test, for which the mean was 66 and the standard deviation was 8, scored an 82, while the student taking the French test, for which the mean was 27 and the standard deviation was 5, scored a 35. Compare the scores. a)A score of 82 with a mean of 66 and a standard deviation of 8 is better. b)The two scores are statistically the same. c)You cannot determine which score is better from the given information. d)A score of 35 with a mean of 27 and a standard deviation of 5 is better.

a)A score of 82 with a mean of 66 and a standard deviation of 8 is better.

SAT scores have a bell-shaped distribution with a mean score of 1490 with a standard deviation of 220. a. Use the Empirical Rule to determine the percent of scores that fall between 1270 and 1710. b. Use the Empirical Rule to determine the percent of scores that fall between 1050 and 1490.

a. 68% b. 47.5% (see test 1 #21)

Use the graphs of normal distributions A, B, and C (test 3 #6-7) a. Which normal distribution has the largest mean? b. Which normal distribution has the largest standard deviation?

a. B b. A

Identify the sampling technique in each situation. a. Questioning students as they leave a university library, a researcher asks 358 students about their drinking habits. b. Chosen at random, 580 customers at a car dealership are contacted and asked their opinions of the service they received. c. Soybeans are planted on a 48-acre field. The field is divided into one acre subplots. A sample is taken from each subplot to estimate the harvest.

a. Convenience sampling b. Simple random sampling c. Stratified sampling (see quiz 1)

State if the probabilities are classical, empirical, or subjective. a. The probability that a randomly selected student makes use of office hours if 8 out of 93 students made use of office hours this past week. b. The probability that Matt will be lonely during office hours next week. c. The probability of being dealt a pair of kings in a game of poker.

a. Empirical b. Subjective c. Classical (see test 2 #12)

The number of points scored by Kyrie Irving in every game of the 2017 NBA playoffs are listed below: 23, 37, 13, 28, 24, 22, 16, 27, 11, 23, 29, 42, 24, 24, 19, 38, 40, 26 a. Calculate the mean, median, and mode of the data. Round to the nearest hundredth. b. Calculate the range, standard deviation, and variance of the data. Use your calculator and round to the nearest hundredth. c. Calculate the coefficient of variation. Round to the nearest hundredth. d. What is the level of measurement for the data? Explain your answer. e. Use the data to construct a frequency distribution with 5 classes. f. Construct a Relative Frequency Histogram for the data. g. Create a stem and leaf plot for the data. h. Suppose you wanted to create a pie chart from the data. Calculate the central angle for each category. Round answers to the nearest hundredth. - Boston: 17 championships - LA: 16 championships - Chicago: 6 championships

a. Mean = 25.89, Median = 24, Mode = 24 b. Range = 31, SD = 8.56, Variance = 73.27 c. 33.06% d. Ratio because differences between values are meaningful and a value of 0 would mean none (a true zero), whereas at the interval level 0 is just a position on a scale. e. see test 1 #15 f. see test 1 #16 g. see test 1 #17 h. Boston: 156.92 deg, LA: 147.69 deg, Chicago: 55.38 deg (see test 1 #11-18)

Determine whether or not the Central Limit Theorem can be applied to the distribution of sample means and state the reason why. a. The mean height of all NBA players is 79 inches with a standard deviation of 5 inches. 10 NBA players are randomly selected and the mean of the 10 heights is calculated. b. The salaries at a Fortune 500 company are skewed right with a mean of $85,000 and a standard deviation of $11,000. 50 employees are randomly selected and the mean of the salaries is calculated.

a. No because n < 30 and the population is not known to be normal. b. Yes because n is greater than or equal to 30 (see test 3 #16-17)

A survey of 55 US law firms found that the average hourly billing rate was $425. a. Identify the population and the sample. b. Determine which part of the survey represents the descriptive branch of statistics and make an inference based on the survey.

a. Population: average billing rate of US law firms Sample: 55 US law firms b. 55 US law firms had an average billing rate of $425 Inference: The average billing rate of US law firms is about $425. (see quiz 1)

In a class of 32 students, 11 students made use of office hours and 7 students made an appointment at the math center. 4 of the students who used office hours also made an appointment at the math center. a. Create a venn diagram or two-way table to model the situation. b. What is the probability of selecting a student who used office hours but did not make an appointment? c. What is the probability of selecting a student who did not get extra help?

a. See test 2 #17 b. 0.219 c. 0.563
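The counts behind parts b and c follow from the addition rule. A minimal sketch with the numbers from the card (variable names are illustrative):

```python
total = 32
office_hours = 11
math_center = 7
both = 4

# Students who used office hours but did not make an appointment.
only_office_hours = office_hours - both
# Addition rule: |A or B| = |A| + |B| - |A and B|.
either = office_hours + math_center - both
neither = total - either

p_only_office_hours = only_office_hours / total  # 7/32 = 0.21875
p_no_extra_help = neither / total                # 18/32 = 0.5625
print(p_only_office_hours, p_no_extra_help)
```

The card reports these values rounded to three decimal places.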

You flip three coins. Let the random variable X represent the number of heads that come up. a. Use a tree diagram to find the sample space for flipping three coins. b. Create a probability distribution for the random variable X. c. What is the expected number of heads obtained from flipping three coins?

a. See test 2 #18 b. See test 2 #18 c. 1.5
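The sample space, probability distribution, and expected value for three fair coins can be enumerated directly. This is a sketch that assumes each of the eight outcomes is equally likely.

```python
from itertools import product

# Sample space: every sequence of H/T over three flips (8 outcomes).
sample_space = list(product("HT", repeat=3))

# Count how many outcomes give each number of heads.
counts = {}
for outcome in sample_space:
    heads = outcome.count("H")
    counts[heads] = counts.get(heads, 0) + 1

# Probability distribution of X = number of heads.
distribution = {x: c / len(sample_space) for x, c in counts.items()}

# Expected value: sum of x * P(x).
expected = sum(x * p for x, p in distribution.items())
print(sorted(distribution.items()), expected)
```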

State the type of sample (simple random, stratified, cluster, systematic, or convenience). a. Assigning each student a number and choosing every third number after the number 5. b. Randomly choosing 5 classrooms in Bannow and asking all the students in each room to complete a survey.

a. Systematic b. Cluster (see test 1 #3-4)

The average time spent sleeping (in hours) for a group of medical residents at a hospital can be approximated by a normal distribution with a mean of 6.1 hours and a standard deviation of 1.0 hours. a. What is the shortest time spent sleeping that would still place a resident in the top 5% of sleeping times? b. Between what two values does the middle 50% of the sleep times lie?

a. invnorm(.95, 6.1, 1) = 7.74 b. invnorm(.25, 6.1, 1) = 5.43 invnorm(.75, 6.1, 1) = 6.77 - Between 5.43 hours and 6.77 hours. (see quiz 6)
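The calculator's `invnorm` has a standard-library counterpart in Python, `statistics.NormalDist.inv_cdf`. A sketch reproducing the values above:

```python
from statistics import NormalDist

sleep = NormalDist(mu=6.1, sigma=1.0)

# Top 5% begins at the 95th percentile.
top_5_cutoff = sleep.inv_cdf(0.95)
# The middle 50% lies between the 25th and 75th percentiles.
q1 = sleep.inv_cdf(0.25)
q3 = sleep.inv_cdf(0.75)
print(round(top_5_cutoff, 2), round(q1, 2), round(q3, 2))
```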

The number of points scored by Kyrie Irving in every game of the 2017 NBA playoffs are listed below: 23, 27, 13, 28, 24, 22, 16, 37, 11, 23, 29, 42, 24, 24, 19, 38, 40, 26 a. Give the five number summary for the data and construct a box and whisker plot. b. Use the interquartile range (IQR) to identify any outliers. c. Find the value that corresponds to the 78th percentile. What percentile is 22?

a. min=11, Q1=22, Q2=24, Q3=29, max=42 (see test 2 #1) b. IQR=7 Outliers: 11, 40, 42 c. 78th percentile = 37; 22 = 27.8th percentile

Consider the data set below. 39, 36, 30, 27, 26, 24, 28, 35, 39, 60, 50, 41, 35, 32, 51 a. Find the Five Number Summary. b. Create a box and whisker plot for the data.

a. min=24, Q1=28, Q2=35, Q3=41, max=60 b. see quiz 3 #3
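A five-number summary can be computed with the "median of each half" method. The sketch below assumes the textbook convention that the overall median is left out of both halves when n is odd; other software may use different quartile rules and give slightly different values.

```python
from statistics import median

def five_number_summary(data):
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower_half = values[:half]
    # Skip the middle value when n is odd (assumed convention).
    upper_half = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])

data = [39, 36, 30, 27, 26, 24, 28, 35, 39, 60, 50, 41, 35, 32, 51]
summary = five_number_summary(data)
print(summary)
```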

Assume that the salaries of elementary school teachers in the US are normally distributed with a mean of $31,000 and a standard deviation of $2500. Suppose a teacher is selected at random. a. Sketch the distribution. b. Find the probability that he or she makes less than $28,000. c. Find the probability that he or she makes more than $35,000. d. Find the probability that he or she makes between $29,000 and $33,000.

a. see test 3 #11 b. normalcdf(-1x10^99, 28000, 31000, 2500) = 0.1151 c. normalcdf(35000, 1x10^99, 31000, 2500) = 0.0548 d. normalcdf(29000, 33000, 31000, 2500) = 0.5763
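The `normalcdf` calls above map onto `statistics.NormalDist.cdf`, which returns the area to the left of a value. A sketch using the salary distribution from the card:

```python
from statistics import NormalDist

salary = NormalDist(mu=31000, sigma=2500)

p_less_28k = salary.cdf(28000)                  # area to the left of 28,000
p_more_35k = 1 - salary.cdf(35000)              # area to the right of 35,000
p_between = salary.cdf(33000) - salary.cdf(29000)
print(round(p_less_28k, 4), round(p_more_35k, 4), round(p_between, 4))
```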

Law of Large Numbers

as an experiment is repeated over and over, the empirical probability of an event approaches the theoretical (actual) probability of the event

Law of Large Numbers

as the number of trials for an experiment increases, the empirical probability of an event approaches the theoretical probability of the event

simple random

assigns every member a number & then uses a random number table to select

systematic

assigns numbers & then systematically selects numbers

Descriptive Statistics

numerical summaries of a sample

systematic sample

a sample in which each member of the population is assigned a number. The members of the population are ordered in some way, a starting number is randomly selected, and then sample members are selected at regular intervals from the starting number. Ex: In the Calcasieu Parish example you could assign a different number to each household, randomly choose a starting number, then select every 100th household

sample space

set of all possible outcomes { }

Determine whether the data are qualitative or quantitative. the number of seats in a movie theater

quantitative

Solve the problem. Classify the number of seats in a movie theater as qualitative data or quantitative data.

quantitative data

Solve the problem. A local bank needs information concerning the checking account balances of its customers. A random sample of 15 accounts was checked. The mean balance was $686.75 with a standard deviation of $256.20. Find a 98% confidence interval for the true mean. Assume that the account balances are normally distributed. ($513.17, $860.33) ($238.23, $326.41) ($487.31, $563.80) ($326.21, $437.90)

($513.17, $860.33)
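With a small sample and an unknown population standard deviation, the interval uses a t critical value: mean ± t·s/√n. In the sketch below the value 2.624 is an assumption read from a t-table (98% confidence, 14 degrees of freedom), since the Python standard library has no t-distribution.

```python
from math import sqrt

n = 15
sample_mean = 686.75
s = 256.20
t_crit = 2.624  # assumed t-table value: 98% confidence, df = n - 1 = 14

# Margin of error for a t-interval.
margin = t_crit * s / sqrt(n)
interval = (sample_mean - margin, sample_mean + margin)
print(round(interval[0], 2), round(interval[1], 2))
```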

Solve the problem. Find the z-scores for which 98% of the distribution's area lies between -z and z. (-0.99,0.99) (-1.96, 1.96) (-1.645, 1.645) (-2.33, 2.33)

(-2.33, 2.33)

Suppose you are using α = 0.05 to test the claim that μ ≠ 34 using a P-value. You are given the sample statistics n = 35, x̄ = 33.1, and s = 2.7. Find the P-value. 0.0244 0.1003 0.0448 0.0591

0.0448

Rank the probabilities of 10%, 1/5, 0.06 from the least likely to occur to the most likely to occur. 0.06, 10%, 1/5 0.06, 1/5, 10% 10%, 1/5, 0.06 1/5, 10%, 0.06

0.06,10%, 1/5

Solve the problem. The distribution of cholesterol levels in teenage boys is approximately normal with mean= 170 and standard deviation= 30 (Source: U.S. National Center for Health Statistics). Levels above 200 warrant attention. Find the probability that a teenage boy has a cholesterol level greater than 200. 0.3419 0.1587 0.8413 0.2138

0.1587

Solve the problem. A delivery route must include stops at three cities. If the route is randomly selected, find the probability that the cities will be arranged in alphabetical order. Round your answer to three decimal places. 0.03703704 0.16666667 0.33333333 0.125

0.16666667

Solve the problem. The distribution of Master's degrees conferred by a university is listed in the table. Major Frequency Mathematics 216 English 207 Engineering 86 Business 176 Education 204 What is the probability that a randomly selected student graduating with a Master's degree has a major of Education? Round your answer to three decimal places. 0.298 0.771 0.005 0.229

0.229

Solve the problem. The lengths of pregnancies are normally distributed with a mean of 264 days and a standard deviation of 15 days. If 36 women are randomly selected, find the probability that they have a mean pregnancy between 264 days and 266 days. 0.5517 0.7881 0.2881 0.2119

0.2881

Solve the problem. A group of students were asked if they carry a credit card. The responses are listed in the table. class cc carrier no cc Freshman 19 41 Sophomore 40 0 If a student is selected at random, find the probability that he or she is a freshman given that the student owns a credit card. Round your answers to three decimal places. 0.322 0.678 0.190 0.317

0.322

Find the standard error of estimate, se, for the data below, given that y = -1.885x + 0.758. x: -5 -3 4 1 -1 -2 0 2 3 -4 y: 11 6 -6 -2 3 4 1 -4 -5 8 0.613 0.011 0.312 0.981

0.613

Solve the problem. A manager wishes to determine the relationship between the number of miles (in hundreds of miles) the manager's sales representatives travel per month and the amount of sales (in thousands of dollars) per month. Calculate the correlation coefficient, r. Miles traveled x 5 6 13 10 11 18 6 4 14 Sales y: 41 43 88 72 75 71 58 65 130 0.561 0.791 0.632 0.717

0.632
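The correlation coefficient can be computed from the deviations about each mean. A minimal sketch for the miles and sales data above (the function name `pearson_r` is illustrative):

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

miles = [5, 6, 13, 10, 11, 18, 6, 4, 14]
sales = [41, 43, 88, 72, 75, 71, 58, 65, 130]
r = pearson_r(miles, sales)
print(round(r, 3))
```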

An airline knows from experience that the distribution of the number of suitcases that get lost each week on a certain route is approximately normal with mean= 15.5 and standard deviation= 3.6. What is the probability that during a given week the airline will lose between 10 and 20 suitcases? 0.1056 0.8314 0.4040 0.3944

0.8314

Solve the problem. The data below are the final exam scores of 10 randomly selected statistics students and the number of hours they studied for the exam. Calculate the correlation coefficient r. hrs x: 5 7 4 10 4 6 6 7 8 5 Scores y: 66 81 61 89 67 79 86 91 91 5 0.654 0.761 0.847 0.991

0.847

Find the area under the standard normal curve to the left of z = 1.25. 0.7682 0.8944 0.1056 0.2318

0.8944

Solve the problem. An airline knows from experience that the distribution of the number of suitcases that get lost each week on a certain route is approximately normal with mean = 15.5 and standard deviation= 3.6. What is the probability that during a given week the airline will lose less than 20 suitcases? 0.1056 0.3944 0.8944 0.4040

0.8944

Solve the problem. Find the area of the indicated region under the standard normal curve. 0.0968 0.0823 0.9032 0.9177

0.9032

Solve the problem. IQ test scores are normally distributed with a mean of 99 and a standard deviation of 11. An individual's IQ score is found to be 109. Find the z-score corresponding to this value. -1.10 0.91 -0.91 1.10

0.91

Solve the problem. A coffee machine dispenses normally distributed amounts of coffee with a mean of 12 ounces and a standard deviation of 0.2 ounce. If a sample of 9 cups is selected, find the probability that the mean of the sample will be less than 12.1 ounces. 0.3216 0.0668 0.9332 0.2123

0.9332

Solve the problem. Find the area under the standard normal curve to the left of z = 1.5 0.1599 0.7612 0.9332 0.0668

0.9332

Solve the problem. The data below are the temperatures on randomly chosen days during a summer class and the number of absences on those days. What is the best predicted value for y given x = 95? Assume that the variables x and y have a significant correlation. Temp x: 72 85 91 90 88 98 75 100 # of absences y: 3 7 10 10 8 15 4 15 15 13 12 14

12

Solve the problem. The data below are the temperatures on randomly chosen days during a summer class and the number of absences on those days. What is the best predicted value for y given x = 95? Assume that the variables x and y have a significant correlation. Temp x: 72 85 91 90 88 98 75 100 80 # of absences y: 3 7 10 10 8 15 4 15 5 15 13 12 14

12

Provide an appropriate response. The mean score of a placement exam for entrance into a math class is 80, with a standard deviation of 10. Use the Empirical Rule to find the percentage of scores that lie between 60 and 80. (Assume the data set has a bell-shaped distribution.) 95% 34% 47.5% 68%

47.5%

Solve the problem. How many ways can five people, A, B, C, D, and E, sit in a row at a movie theater if A and B must sit together? 120 48 12 24

48
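One way to see the answer is to treat A and B as a single block: 4! arrangements of the blocks times 2 internal orders. The sketch below checks that count by brute force over all 120 seatings.

```python
from itertools import permutations
from math import factorial

people = "ABCDE"
# Count seatings in which A and B occupy adjacent seats.
adjacent = sum(
    1
    for row in permutations(people)
    if abs(row.index("A") - row.index("B")) == 1
)
# Block argument: 4! orderings of {AB, C, D, E} times 2 orders inside the block.
by_formula = 2 * factorial(4)
print(adjacent, by_formula)
```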

Solve the problem. How many ways can five people, A, B, C, D, and E, sit in a row at a movie theater if A and B must sit together? 24 48 120 12

48

Solve the problem. If a couple has nine boys and two girls, how many gender sequences are possible? 11 55 16 8

55

Provide an appropriate response. Find the range of the data set represented by the graph. 6 20 5 17

6

Solve the problem. For the following data, approximate the mean miles per day. Miles (per day) Frequency 1-2 23 3-4 16 5-6 26 7-8 30 9-10 29 6 5 25 7

6
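The mean of grouped data is approximated by weighting each class midpoint by its frequency. A sketch using the table from the card:

```python
# (lower limit, upper limit) and frequency for each class.
classes = [((1, 2), 23), ((3, 4), 16), ((5, 6), 26), ((7, 8), 30), ((9, 10), 29)]

total_freq = sum(freq for _, freq in classes)
# Midpoint of each class times its frequency, summed over all classes.
weighted_sum = sum(((lo + hi) / 2) * freq for (lo, hi), freq in classes)
grouped_mean = weighted_sum / total_freq
print(round(grouped_mean, 2))
```

The result is about 5.92, which rounds to the answer of 6.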

Solve the problem. If a couple has seven boys and eight girls, how many gender sequences are possible? 16 15 6435 8

6435
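Gender sequences with a fixed number of boys and girls are counted by choosing which birth positions hold the boys, that is, a combination. A sketch with `math.comb`, covering this card and the earlier nine-boys, two-girls version:

```python
from math import comb

# Seven boys and eight girls: choose 7 of the 15 positions for the boys.
seven_boys_eight_girls = comb(15, 7)
# Nine boys and two girls: choose 2 of the 11 positions for the girls.
nine_boys_two_girls = comb(11, 2)
print(seven_boys_eight_girls, nine_boys_two_girls)
```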

The experiment is binomial. A success in this experiment is a baby recovering fully.

About 40% of babies born with a certain ailment recover fully. A hospital is caring for five babies born with this ailment. The random variable represents the number of babies that recover fully. Decide whether the experiment is a binomial experiment

Has 2 modes around which the observations are concentrated.

Bimodal

When reaching a conclusion in a hypothesis test, what is the relationship between the P-value and the significance level?

If the P-value is less than or equal to alpha, you reject the null hypothesis. If the P-value is greater than alpha, you fail to reject the null hypothesis. (see 7.1 in class practice)

B. P(A and B) = 0 because A and B cannot occur at the same time

If two events are mutually exclusive, why is P(A and B) = 0 A. P(A and B) = 0 because A and B each have the same probability B. P(A and B) = 0 because A and B cannot occur at the same time C. P(A and B) = 0 because A and B are independent D. P(A and B) = 0 Because A and B are complements of each other

Descriptive problem

Involves finding the value of a parameter or attribute.

Causal problem

Involves seeing if one variable is "caused by" or correlated with another.

Ordinal data

Non-numerical data that has an underlying order

Failure to submit to assigned treatment

Noncompliance

Estimates

Normal letters indicate __________________________, actual numbers calculated from the sample.

Percentiles on the ECDF axis

One can find quantiles of a distribution graph using ______________________.

Fill in the missing value of the probability distribution (test 2 #19) and use your calculator to find the variance and standard deviation of the probability distribution.

P(5) = 0.3 Standard Deviation = 5.679 Variance = 32.251

______ ___________ emphasize how the different categories relate to each other

Pie chart

What are the two types of variables?

Quantitative categorical

Estimators

Random variables representing properties of the sample.

A manufacturer claims that the mean lifetime of its fluorescent bulbs is 1500 hours. A homeowner selects 40 bulbs and finds the mean lifetime to be 1480 hours with a population standard deviation of 80 hours. Test the manufacturers claim. Use alpha = 0.05.

Standardized test statistic ≈ -1.58 Critical value z0 = ±1.96 Fail to reject H0 At the 5% level of significance, there is not enough evidence to reject the manufacturer's claim µ = 1500. (see 7.2 in class practice)
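The same decision can be reached with a P-value instead of a critical value. A sketch for this two-tailed test, assuming a known population standard deviation so that the z-distribution applies:

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
z = (1480 - 1500) / (80 / sqrt(40))
# Two-tailed P-value: twice the tail area beyond |z|.
p_value = 2 * NormalDist().cdf(-abs(z))
reject = p_value <= alpha
print(round(z, 2), round(p_value, 4), reject)
```

Because the P-value (roughly 0.11) exceeds alpha, the conclusion matches the card: fail to reject H0.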

Determine whether the numerical value is a parameter or a statistic and explain your reasoning. A survey of 1004 US adults found that 52% think China's emergence as a world power is a major threat to the well-being of the United States.

Statistic because 52% is referring to a sample of 1004 US adults (see quiz 1)

D. 54 A. Less than the median A. 51 A. Skewed left (mean is smaller than the median)

Stem | Leaf
3 | 44
4 | 022
5 | 11111335555
6 | 00222233
The median is... A. 51 B. 52 C. 53 D. 54 E. 55
The mean is... A. Less than median B. Larger than median C. Equal to median
The mode is... A. 51 B. 52 C. 53 D. 54 E. 55
The distribution is... A. Skewed left B. Skewed right C. Symmetric

SAT scores are normally distributed. In a recent year, the mean score was 1498 and the standard deviation was 316. Student A received a score of 1240 and Student B received a score of 2200. Calculate the z score for each student's score.

Student A = -0.82 Student B = 2.22 (see quiz 5)

It is the sum of the squares of n independent standard normal (Z) random variables.

W follows a Chi-squared(n) distribution if ______________________________________.

Cluster Sample

When the population falls into naturally occurring subgroups, each having similar characteristics, a cluster sample may be the most appropriate. To select cluster samples, divide the population into groups, called clusters, and select all the members in one or more (but not all) of the clusters Ex: In the Calcasieu Parish example you could divide the households into clusters according to zip codes, then select all the households in one or more, but not all, zip codes

D. A and C

Which is a quantitative variable? A. Salary B. Religious affiliation C. Grams of fat in a cheeseburger D. A and C E. None of these

B. The scores of students on a very easy exam in which most score perfectly, but a few do very poorly (When the mean is smaller than the median, that typically means a data set that is skewed to the left, leaving the bulk of the scores in the upper range)

Which of the following is likely to have a mean that is smaller than the median? A. The salaries of all NFL players B. The scores of students on a very easy exam in which most score perfectly but a few do very poorly C. Test scores on a standardized test D. The scores of students on a very difficult exam in which most score poorly, but a few do very well

Find the P-value for the hypothesis test with the standardized test statistic z. Decide whether to reject H0 for the level of significance alpha. a. Right-tailed test; z = 0.52; alpha = 0.05 b. Two-tailed test; z = 1.95; alpha = 0.05

a. normalcdf(.52, 1x10^99, 0, 1) = 0.3015 Fail to reject because P > alpha (0.3015 > 0.05) b. P = 2(area of standard test statistic) = 2(0.0256) = 0.0512 Fail to reject because P > alpha (0.0512 > 0.05) (see 7.2 in class practice)

The mean number of green jelly beans in a bag of jelly beans is 26 per bag with a standard deviation of 4. A sample of 64 bags of jelly beans is taken from the population. a. What is the probability that the mean number of green jelly beans per bag for the sample is between 20 and 28 green jelly beans? b. What is the probability that the mean number of jelly beans for the sample is greater than 27 that week? c. What is the probability that a SINGLE bag of jelly beans selected has more than 27 green jelly beans?

a. normalcdf(20, 28, 26, 4/8) = 0.999968 ≈ 1 b. normalcdf(27, 1x10^99, 26, 4/8) = 0.0228 c. normalcdf(27, 1x10^99, 26, 4) = 0.4013 (see test 3 #18)
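Parts a and b use the sampling distribution of the mean, whose standard deviation is σ/√n, while part c uses the population distribution itself. A sketch with `statistics.NormalDist`, assuming the normal approximation applies:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 26, 4, 64
# Standard error of the mean: sigma / sqrt(n) = 4 / 8 = 0.5.
sample_mean_dist = NormalDist(mu, sigma / sqrt(n))
single_bag_dist = NormalDist(mu, sigma)

p_a = sample_mean_dist.cdf(28) - sample_mean_dist.cdf(20)
p_b = 1 - sample_mean_dist.cdf(27)
p_c = 1 - single_bag_dist.cdf(27)
print(round(p_a, 6), round(p_b, 4), round(p_c, 4))
```

The contrast between parts b and c shows how much less variable a sample mean is than a single observation.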

the slope and intercept of the least square regression line are found using this equation (1/2):

b = r (sY/sX)

frequency histogram

bar graph that represents the frequency distribution

empirical (statistical) probability

based on observations obtained from probability experiments P(E)= frequency of E/ total frequency =f/n

a sample that produces data that's not representative of the population

biased sample

Solve the problem. The top 14 speeds, in miles per hour, for Pro-Stock drag racing over the past two decades are listed below. Find the mode speed. 181.1 202.2 190.1 201.4 191.3 201.4 192.2 201.2 193.2 201.2 194.5 199.2 196.0 196.2 bimodal 201.2 no mode 201.4

bimodal

Ordinal Level of Measurement

both qualitative & quantitative -can be put into order

Double blind study

both the observer and the participants are unaware of certain conditions in the experiment

Identify the data set's level of measurement. the nationalities listed in a recent survey (for example, Asian, European, or Hispanic).

nominal

random

every member of a population has an equal chance of being selected

Random Sample

every member of the population has an equal chance of being selected

Identify whether the statement describes inferential statistics or descriptive statistics. There is a relationship between smoking cigarettes and getting emphysema.

inferential statistics

Solve the problem. Based on previous clients, a marriage counselor concludes that the majority of marriages that begin with cohabitation before marriage will result in divorce. Does this statement describe inferential statistics or descriptive statistics?

inferential statistics

Solve the problem. Decide if the events A and B are mutually exclusive or not mutually exclusive. A card is drawn from a standard deck of 52 playing cards. A: The result is a 7. B: The result is a jack. mutually exclusive not mutually exclusive

mutually exclusive

Solve the problem. Decide if the events A and B are mutually exclusive or not mutually exclusive. A die is rolled. A: The result is an odd number. B: The result is an even number. mutually exclusive not mutually exclusive

mutually exclusive

an increase in one variable is associated with a decrease in the other

negative relationship

Double-blind experiment

neither the experimenter nor the subjects know if the subjects are receiving a treatment or a placebo

double-blind experiment

neither the experimenter nor the subjects know whether the subjects are receiving a treatment or a placebo. The experimenter is informed after all the data have been collected. This type of experimental design is preferred by researchers.

Solve the problem. Given the size of a human's brain, x, and their score on an IQ test, y, would you expect a positive correlation, a negative correlation, or no correlation? no correlation negative correlation positive correlation

no correlation

Identify the data set's level of measurement. hair color of women on a high school tennis team

nominal

Use your calculator to find P(z<-2.58 or z>2.58). State what you type into your calculator. Sketching a diagram may be useful.

normalcdf(-1×10^99, -2.58, 0, 1) = 0.0049; normalcdf(2.58, 1×10^99, 0, 1) = 0.0049; 0.0049 + 0.0049 = 0.0098 (see quiz 5)
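The same two-tail probability can be checked without a graphing calculator. This is a sketch using `statistics.NormalDist` from the Python standard library as a stand-in for `normalcdf`; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

lower_tail = std_normal.cdf(-2.58)      # P(z < -2.58)
upper_tail = 1 - std_normal.cdf(2.58)   # P(z > 2.58)
total = lower_tail + upper_tail         # P(z < -2.58 or z > 2.58)

print(round(lower_tail, 4), round(upper_tail, 4))
print(total)
```

Each tail rounds to 0.0049. Adding the unrounded tails gives about 0.00988, so the 0.0098 on the card comes from rounding each tail before adding.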

Solve the problem. Decide if the events A and B are mutually exclusive or not mutually exclusive. A student is selected at random. A: The student is taking a math course. B: The student is a business major. not mutually exclusive mutually exclusive

not mutually exclusive

statistic

numerical description of a sample characteristic. Ex: Average age of people from a sample of three states.

Confounding variable

occurs when an experimenter cannot tell the difference between the effects of different factors on a variable

Biased Samples

omit a portion of the population

Solve the problem. Identify the level of measurement for data that are the ratings of a movie ranging from poor to good to excellent.

ordinal

1. Find the mean of the data set 2. Find the difference between each observation and the mean 3. Square each # in the new data set 4. Add up the new data set and divide the sum by n-1 (the sample size minus 1) 5. Take the square root of the result These are the steps to find the....?

standard deviation steps
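The steps can be sketched in Python as follows. The data set is hypothetical, chosen only for illustration. Dividing by n-1 gives the sample variance, and the final square root turns it into the sample standard deviation.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, not from the course

mean = sum(data) / len(data)               # find the mean
deviations = [x - mean for x in data]      # difference from the mean
squared = [d ** 2 for d in deviations]     # square each difference
variance = sum(squared) / (len(data) - 1)  # add up and divide by n - 1
std_dev = math.sqrt(variance)              # square root of the variance

# The standard library computes the same quantity directly
print(std_dev, statistics.stdev(data))
```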

Identify the sampling technique used. A market researcher randomly selects 200 drivers under 55 years of age and 200 drivers over 55 years of age.

stratified

Solve the problem. A market researcher randomly selects 200 drivers under 35 years of age and 100 drivers over 35 years of age. What sampling technique was used?

stratified

Solve the problem. A market researcher randomly selects 200 drivers under 35 years of age and 100 drivers over 35 years of age. What sampling technique was used? random stratified systematic cluster convenience

stratified

Solve the problem. Thirty-five sophomores, 35 juniors and 49 seniors are randomly selected from 230 sophomores, 280 juniors and 577 seniors at a certain high school. What sampling technique is used? random systematic stratified convenience cluster

stratified

completely randomized design

subjects are assigned to different treatment groups through random selection. In some experiments, it may be necessary for the experimenter to use blocks, which are groups of subjects with similar characteristics. A commonly used experimental design is a randomized block design.

In a histogram, SHAPE refers to

symmetry/skewness and modality

Q2

the median of the data set

Test the claim about the population mean µ at the level of significance alpha. Assume the population is normally distributed. Claim: µ ≠ 35; alpha = 0.05; standard deviation = 2.7. Sample statistics: mean = 34.1; n = 35

z = -1.97. Claim: µ ≠ 35; H0: µ = 35; Ha: µ ≠ 35 (two-tailed). P = 2(area to the left of z = -1.97) = 2(0.0244) = 0.0488. Reject H0 because 0.0488 < 0.05. At the 5% level of significance, there is sufficient evidence to support the claim that the mean is ≠ 35. (see 7.2 in class practice)
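The test above can be reproduced in a few lines. This is a sketch of a two-tailed z-test with known population standard deviation, using the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 35, 2.7   # hypothesized mean and population standard deviation
xbar, n = 34.1, 35     # sample mean and sample size
alpha = 0.05

# Standardized test statistic
z = (xbar - mu0) / (sigma / sqrt(n))

# Two-tailed P-value: twice the area beyond |z|
p_value = 2 * NormalDist().cdf(-abs(z))

reject_h0 = p_value < alpha
print(round(z, 2), p_value, reject_h0)
```

The unrounded z gives a P-value near 0.0486. The 0.0488 on the card comes from rounding z to -1.97 before using the table. Either way the P-value is below 0.05, so H0 is rejected.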

Provide an appropriate response. Find the z-score for the value 55, when the mean is 58 and the standard deviation is 3. z = 0.90 z = -0.90 z = -1.00 z = -1.33

z= -1.00

Solve the problem. Find the z-score for the value 70, when the mean is 76 and the standard deviation is 2. z= -.89 z= -3.00 z= .89 z= -3.50

z= -3.00
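Both z-score problems above use the same formula, z = (value - mean) / standard deviation. A minimal sketch, where the function name `z_score` is illustrative:

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

print(z_score(55, 58, 3))  # -1.0
print(z_score(70, 76, 2))  # -3.0
```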

Use the Standard Normal Table to find the z value that would have an area of 0.9916 to the left of it under the standard normal curve.

z=2.39 (see quiz 6)
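The table lookup can be checked with the inverse of the standard normal CDF. This is a sketch using `statistics.NormalDist.inv_cdf` from the Python standard library.

```python
from statistics import NormalDist

# z value with an area of 0.9916 to its left under the standard normal curve
z = NormalDist().inv_cdf(0.9916)
print(round(z, 2))  # 2.39
```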

Solve the problem. Suppose you are using α = 0.01 to test the claim that μ ≤ 29 using a P-value. You are given the sample statistics n = 40, x̄ = 30.8, and s = 4.3. Find the P-value. 0.1030 0.0211 0.0040 0.9960

0.0040
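The P-value for this problem can be worked out as a right-tailed test, since the claim μ ≤ 29 gives H0: μ ≤ 29 and Ha: μ > 29. The sketch below assumes a large-sample z-test (n ≥ 30), with s standing in for the population standard deviation. That assumption is not stated on the card.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 29                  # boundary value from the claim
n, xbar, s = 40, 30.8, 4.3

# Assumption: with n >= 30, treat s as the population standard deviation
z = (xbar - mu0) / (s / sqrt(n))

# Right-tailed test: area to the right of z
p_value = 1 - NormalDist().cdf(z)
print(round(z, 2), p_value)
```

Here z is about 2.65. The table area to the right of 2.65 is 0.0040, which matches one of the listed choices, and the unrounded result is close to that value.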

the _______ of the relationship is determined by how _______ the data follow the _____ of the relationship.

strength, closely, form

Replication

the repetition of an experiment under the same or similar conditions

Intersection

the set of outcomes that is contained in both event A and event B at the same time. Intersection = "and", symbol ∩

Union

the set of outcomes that is contained in event A or event B or both. Union = "or", symbol ∪
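Intersection and union map directly onto Python set operations. The sketch below uses the die-roll events from the mutually exclusive problems in this set; the variable names are illustrative.

```python
sample_space = set(range(1, 7))  # outcomes of rolling one die

odd = {1, 3, 5}
even = {2, 4, 6}
three = {3}

# Intersection ("and"): outcomes in both events
print(odd & even)   # set(), so odd and even are mutually exclusive
print(three & odd)  # {3}, so "a 3" and "odd" are not mutually exclusive

# Union ("or"): outcomes in either event or both
print((odd | even) == sample_space)  # True
```

Two events are mutually exclusive exactly when their intersection is empty.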

Exp(2) (exponential with mean 2)

A Chi-squared(2) RV has the same pdf as __________________________.
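The match can be verified from the densities. The sketch below assumes the parametrization in which Exp(θ) has mean θ, since the card does not state which one is used.

```latex
% Chi-squared(k) density for x > 0:
f(x; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}

% Setting k = 2, with \Gamma(1) = 1:
f(x; 2) = \frac{1}{2}\, e^{-x/2}

% Exponential density with mean \theta for x > 0:
g(x; \theta) = \frac{1}{\theta}\, e^{-x/\theta}
\quad\Longrightarrow\quad
g(x; 2) = \frac{1}{2}\, e^{-x/2} = f(x; 2)
```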

100p% Likelihood interval

A ______________________ is the set of all theta for which the relative likelihood function (RLF) is at least p.

Pareto Chart

A bar graph for qualitative data, with the bars arranged in descending order according to frequencies

Inherent Zero

An inherent zero is a zero that implies "none." For example, the amount of money you have in a savings account could be zero dollars. The zero represents no money; it is an inherent zero. A temperature of 0°C does not represent a condition in which no heat is present. The 0°C temperature is simply a position on the Celsius scale; it is not an inherent zero.

In 2014, the mean starting salary for an undergraduate student graduating with a degree in mathematics or statistics was $53,000 with a standard deviation of $4,000. 35 students graduated with math degrees from Fairfield in 2014. Use Chebychev's Theorem to estimate at least how many Fairfield math grads had a starting salary between $45,000 and $61,000.

At least 26 grads had a starting salary between $45,000 and $61,000. (see test 1 #20)
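The arithmetic behind this answer follows Chebychev's Theorem: at least 1 - 1/k² of the data lie within k standard deviations of the mean. A minimal sketch with illustrative variable names:

```python
mean, sd = 53_000, 4_000
low, high = 45_000, 61_000
n = 35  # number of graduates

# Both bounds are the same distance from the mean, so k is well defined
k = (high - mean) / sd          # 2.0 standard deviations
min_fraction = 1 - 1 / k ** 2   # 0.75
min_count = min_fraction * n    # 26.25

print(k, min_fraction, min_count)
```

At least 75% of 35 graduates is 26.25, which the card reports as at least 26.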

Chi-squared(1)

For large n, -2logR(theta) converges to _________________________.

N(n,2n)

For large n, Chi-squared(n) is approximately distributed as ______________________.

Normal Distribution

Bell Shaped; data is symmetric around the center

This is an example of EMPIRICAL probability, since THE STATED PROBABILITY IS CALCULATED BASED ON OBSERVATIONS FROM THE COMPANY RECORDS

Classify the following example as classical, empirical or subjective. Explain why. According to company records, the probability that a washing machine will need repairs during a six-year period is 0.09

This is an example of CLASSICAL probability, since EVERY COMBINATION OF 6 NUMBERS HAS AN EQUAL CHANCE OF BEING DRAWN

Classify the following example as classical, empirical or subjective. Explain why. The probability of choosing 6 numbers from 1 to 52 that match the 6 numbers drawn by a certain lottery is 1/20358520 = 0.00000005
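The denominator in the lottery example is the number of ways to choose 6 numbers from 52 when order does not matter. A short check using `math.comb` from the Python standard library:

```python
import math

combinations = math.comb(52, 6)  # ways to choose 6 numbers from 52
probability = 1 / combinations

print(combinations)  # 20358520
print(probability)
```

Every combination is equally likely, which is what makes this a classical probability.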

Statistical Inference

Conclusion drawn about a population from a sample

The events are DEPENDENT because the outcome of returning a rental movie after the due date AFFECTS the probability of the outcome of receiving a late fee

Determine whether the events are independent or dependent Returning a rented movie after the due date and receiving a late fee

These events ARE NOT mutually exclusive since IT IS POSSIBLE TO SELECT A FEMALE HISTORY MAJOR WHO IS 21 YEARS OLD

Determine whether the following events are mutually exclusive Event A. Randomly select a female history major Event B. Randomly select a history major who is 21 years old

C. True

Determine whether the statement is true or false. If two events are mutually exclusive, they have no outcomes in common. A. False, if two events are mutually exclusive, they have some outcomes in common B. False, if two events are mutually exclusive, they have every outcome in common C. True

Measures of center (2)

Mean and median

1

The area under a density histogram is ___________.

