statistics online
when taking a systematic random sample of size n, every group of size n from the population has the same chance of being selected
false, because once the random starting point is chosen the rest of the sample is determined, so most groups of size n can never be selected
a simple random sample is always preferred because it obtains the same information as other sampling plans but requires a smaller sample size
false, because other sampling techniques may provide more information for less cost than a simple random sample
when obtaining a stratified sample, the number of individuals included within each stratum must be equal
false. within stratified samples, the number of individuals sampled from each stratum should be proportional to the size of the strata in the population
blocking
grouping together similar experimental units and then randomly assigning the experimental units within each group to a treatment
designed experiment
a study in which a researcher assigns the individuals to groups, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group
setup   size   screen type   number of channels available
A       48     plasma        299
B       40     projection    111
C       59     projection    425
D       41     plasma        270
E       43     projection    290
individuals being studied: high-definition televisions (setups) A through E; variables being studied and their data: size (48, 40, 59, 41, 43), screen type (plasma, projection, projection, plasma, projection), and number of channels available (299, 111, 425, 270, 290)
the human resource department at a certain company wants to conduct a survey regarding worker benefits. the department has an alphabetical list of all 7358 employees at the company and wants to conduct a systematic sample of size 70.
k = 7358/70 ≈ 105.1, which is rounded down to k = 105. to determine the individuals who will be administered the survey, randomly select a number from 1 to k; suppose that we randomly select 4. starting with the 4th employee on the list and taking every 105th employee thereafter, the individuals in the survey will be 4, 109, ..., 7249
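a minimal python sketch of the systematic-sampling steps above, assuming list positions are numbered from 1 and that k is N/n rounded down. the function name `systematic_sample` and its optional `start` parameter are illustrative only, not from any library; `start` exists just so the worked answer can be reproduced.

```python
import random


def systematic_sample(population_size, sample_size, start=None):
    """Return the 1-based list positions chosen by a systematic sample.

    k is population_size / sample_size rounded down. The starting point is a
    random integer from 1 to k unless `start` is given (e.g. to reproduce a
    worked example); every k-th position after it is then included.
    """
    k = population_size // sample_size
    if start is None:
        start = random.randint(1, k)  # randint includes both endpoints
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return [start + i * k for i in range(sample_size)]


positions = systematic_sample(7358, 70, start=4)
print(positions[:3], positions[-1])  # [4, 109, 214] 7249
```

the same sketch reproduces the salesperson answers: `systematic_sample(600, 30, start=12)` ends at 592 and `systematic_sample(500, 25, start=13)` ends at 493.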
frequency distribution
lists the number of occurrences of each category of data
relative frequency distribution
lists the proportion of occurrences of each category of data
which measure of central tendency best describes the "center" of the distribution?
mean
an insurance company crashed four cars of the same model at 5 mph. the costs of repair for each of the four crashes were 411, 443, 468, and 232. compute the mean, median, and mode cost of repair.
mean: 388.5; median: 427; mode: does not exist (no cost occurs more than once)
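a short python check of the crash-test answer using the standard `statistics` module. the helper `modes` is a hypothetical name written for this example; it returns an empty list when no value repeats, which matches the textbook convention that such a data set has no mode.

```python
from collections import Counter
from statistics import mean, median

costs = [411, 443, 468, 232]  # repair costs from the four crashes


def modes(data):
    """Return every value tied for the highest count, or [] if no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    return [] if top == 1 else [v for v, c in counts.items() if c == top]


print(mean(costs))    # 388.5
print(median(costs))  # 427.0 (average of 411 and 443)
print(modes(costs))   # []
```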
observational study
measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. that is, in an observational study, the researcher observes the behavior of the individuals without trying to influence the outcome of the study
surveys tend to suffer from low response rates. based on past experience, a researcher determines that the typical response rate for an email survey is 40%. she wishes to obtain a sample of 400 respondents, so she emails the survey to 2000 randomly selected email addresses. assuming the response rate for her survey is 40%, will respondents form an unbiased sample?
no. the survey still suffers from undercoverage (sampling bias), nonresponse bias, and potentially response bias
a polling organization conducts a study to estimate the percentage of households that has two incomes. it mails a questionnaire to 1841 randomly selected households across the united states and asks the head of each household if he or she has two incomes. of the 1841 households selected, 42 responded.
nonresponse bias
a polling organization conducts a study to estimate the percentage of households that home school their children. it mails a questionnaire to 1958 randomly selected households across the United States and asks the head of each household whether he or she home schools the children. of the 1958 households selected, 18 responded.
nonresponse bias
inferences based on voluntary response samples are generally not reliable
true, because it is often the case that the individuals who volunteer do not accurately represent the population
a data set will always have exactly one mode
false; a data set may have no mode, or it may have more than one mode
stem-and-leaf plots are particularly useful for large sets of data
false; stem-and-leaf plots become cluttered with large data sets and are best suited to small ones
the weight of an organ in adult males has a bell-shaped distribution with a mean of 300 grams and a standard deviation of 35 grams. use the empirical rule to determine the following (a) about 95% of organs will be between what weights? (b) what percentage of organs weighs between 265 grams and 335 grams? (c) what percentage of organs weighs less than 265 grams or more than 335 grams? (d) what percentage of organs weighs between 195 grams and 370 grams?
(a) 230 and 370 grams (b) 68% (c) 32% (d) 97.35%
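a minimal python sketch of how the four answers follow from the 68-95-99.7 rule: split each percentage in half around the mean, then add the halves on either side. the function `empirical_percent_between` and the `HALF_AREA` table are hypothetical names for this example, and the sketch only handles bounds that are a whole number of standard deviations (at most 3) from the mean.

```python
# Percent of a bell-shaped distribution lying between the mean and k standard
# deviations on ONE side, from the empirical rule (68%, 95%, 99.7% halved).
HALF_AREA = {0: 0.0, 1: 34.0, 2: 47.5, 3: 49.85}


def empirical_percent_between(low, high, mu, sigma):
    """Approximate percent of observations between low and high.

    Only valid when low and high are a whole number of standard deviations
    (at most 3) from the mean, because the empirical rule gives no other areas.
    """
    z_low, z_high = (low - mu) / sigma, (high - mu) / sigma
    if not (z_low.is_integer() and z_high.is_integer()):
        raise ValueError("bounds must be whole numbers of standard deviations")
    if not -3 <= z_low <= z_high <= 3:
        raise ValueError("need -3 <= z_low <= z_high <= 3")

    def signed_area(z):
        # Areas to the left of the mean are negative so one subtraction works.
        area = HALF_AREA[abs(int(z))]
        return area if z >= 0 else -area

    return signed_area(z_high) - signed_area(z_low)


mu, sigma = 300, 35
print(mu - 2 * sigma, mu + 2 * sigma)                            # 230 370
print(empirical_percent_between(265, 335, mu, sigma))            # 68.0
print(100 - empirical_percent_between(265, 335, mu, sigma))      # 32.0
print(round(empirical_percent_between(195, 370, mu, sigma), 2))  # 97.35
```

part (d) works because 195 grams is 3 standard deviations below the mean (49.85%) and 370 grams is 2 above it (47.5%).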
three major categories of observational studies
1. cross-sectional studies: collect information about individuals at a specific point in time or over a very short period of time; 2. case-control studies: retrospective, meaning that they require individuals to look back in time or require the researcher to look at existing records; 3. cohort studies: first identify a group of individuals to participate in the study (the cohort), then observe them over a long period of time
a salesperson obtained a systematic sample of size 30 from a list of 600 clients. to do so, he randomly selected a number from 1 to 20, obtaining the number 12. he included in the sample the 12th client on the list and every 20th client thereafter. list the numbers that correspond to the 30 clients selected
12, 32, ..., 592
a salesperson obtained a systematic sample of size 25 from a list of 500 clients. to do so, he randomly selected a number from 1 to 20, obtaining the number 13. he included in the sample the 13th client on the list and every 20th client thereafter. list the numbers that correspond to the 25 clients selected.
13, 33, ..., 493
for a large sporting event the broadcasters sold 51 ad slots for a total revenue of $135 million. what was the mean price per ad slot?
$2.6 million ($135 million / 51 ≈ $2.65 million, which rounds to $2.6 million)
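the median for the given set of six ordered data values is 29.5: 7, 12, 21, __, 41, 51. what is the missing value?
38. with six values, the median is the mean of the 3rd and 4th values, so (21 + x)/2 = 29.5 and x = 38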
generally the goal of an experiment is to determine the effect that the treatment will have on the response variable
true
pareto chart
a bar graph whose bars are drawn in decreasing order of frequency or relative frequency
bar graph
a horizontal or vertical representation of the frequency or relative frequency of the categories. the height of each rectangle represents the category's frequency or relative frequency
census
a list of all individuals in a population along with certain characteristics of each individual
parameter
a numerical summary of a population
statistic
a numerical summary of a sample
what does it mean when a part of the population is under-represented?
a part of the population is under-represented when it is proportionally smaller in a sample than in its population
individual
a person or object that is a member of the population being studied
experimental unit
a person, object, or some other well-defined item upon which a treatment is applied
factor
a variable whose effect on the response variable is to be assessed by the experimenter
lurking variable
an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. in addition, lurking variables are typically related to explanatory variables considered in the study
placebo
an innocuous medication, such as a sugar tablet, that looks, tastes, and smells like the experimental medication
treatment
any combination of the values of the factors (explanatory variables)
the manager of a shopping mall wishes to expand the number of shops available in the food court. he has a market researcher survey the first 120 customers who come into the food court during weekend evenings to determine what types of food the shoppers would like to see added to the food court
cause of bias: sampling bias; best way to remedy this problem: ask customers throughout the day on both weekdays and weekends
the owner of a shopping mall wishes to expand the number of shops available in the food court. he has a market researcher survey the first 110 customers who come into the food court during weekday afternoons to determine what types of food the shoppers would like to see added to the food court
cause of bias: sampling bias; best way to remedy this problem: ask customers throughout the day on both weekdays and weekends
to determine customer opinion of their safety features, DaimlerChrysler randomly selects 120 service centers during a certain week and surveys all customers visiting the service center
cluster sampling
to determine customer opinion of their pricing, Greyhound Lines randomly selects 60 buses during a certain week and surveys all passengers on the buses
cluster sampling
descriptive statistics
consists of organizing and summarizing the information collected
a polling organization conducts a study to estimate the percentage of households that speak a foreign language as the primary language. it mails a questionnaire to 1,023 randomly selected households and asks the head of household if a foreign language is the primary language spoken at home. of the 1,023 households selected, 12 responded. this survey has bias.
nonresponse bias; possible remedy: conduct face-to-face or telephone interviews
cluster sample
obtained by dividing the population into groups and selecting all individuals within a random sample of the groups
stratified sample
obtained by dividing the population into nonoverlapping, homogeneous groups (strata) and then obtaining a simple random sample from each group
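a minimal python sketch contrasting the two designs, assuming proportional allocation for the stratified sample (each stratum's share of the sample matches its share of the population). the strata sizes, cluster sizes, member labels, and function names are all hypothetical and chosen only for illustration.

```python
import random


def stratified_sample(strata, n):
    """Proportional-allocation stratified sample.

    `strata` maps each stratum name to a list of its members. Each stratum gets
    round(n * stratum_size / population_size) members, chosen by simple random
    sampling; rounding can make the total differ slightly from n.
    """
    total = sum(len(members) for members in strata.values())
    return {name: random.sample(members, round(n * len(members) / total))
            for name, members in strata.items()}


def cluster_sample(clusters, num_clusters):
    """Randomly choose whole clusters and keep every individual in each one."""
    chosen = random.sample(sorted(clusters), num_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical population: three strata of 600, 300, and 100 people.
strata = {s: [f"{s}{i}" for i in range(size)]
          for s, size in {"A": 600, "B": 300, "C": 100}.items()}
sample = stratified_sample(strata, 50)
print({s: len(m) for s, m in sample.items()})  # {'A': 30, 'B': 15, 'C': 5}

# Hypothetical clusters: 20 groups of 5 people each; select 3 whole groups.
clusters = {g: [f"g{g}-{i}" for i in range(5)] for g in range(20)}
print(sum(len(m) for m in cluster_sample(clusters, 3).values()))  # 15
```

the key difference shows in the code: the stratified sample draws individuals from every group, while the cluster sample draws groups and keeps everyone in them.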
confounding
occurs when the effects of two or more explanatory variables are not separated. therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study
what are some solutions to nonresponse?
offer rewards and incentives, attempt callbacks
find the population variance and standard deviation: 8, 11, 15, 17, 19
population variance: σ² = 16; population standard deviation: σ = 4
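a short python sketch of the population formulas, dividing the sum of squared deviations by N, with a cross-check against the standard library's `statistics.pvariance` and `statistics.pstdev`.

```python
import statistics
from math import sqrt

data = [8, 11, 15, 17, 19]

mu = sum(data) / len(data)                 # population mean: 14.0
ss = sum((x - mu) ** 2 for x in data)      # sum of squared deviations: 80.0
pop_var = ss / len(data)                   # a population divides by N
pop_sd = sqrt(pop_var)

print(pop_var, pop_sd)                     # 16.0 4.0

# Cross-check with the standard library's population functions.
print(statistics.pvariance(data), statistics.pstdev(data))
```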
a polling organization contacts 2526 undergraduates who attend a university and live in the United States and asks whether or not they had spent more than $200 on food in the last month
population: undergraduates who attend a university and live in the United States; sample: the 2526 undergraduates who were contacted
a marketing research firm wishes to determine the most effective method of promoting a rock band: print, radio, television, or online. the researcher segments volunteers by their ages. of the 490 volunteers, 140 are under 20 years old, 70 are 20-39 years old, 140 are 40-59 years old, and 140 are 60 years old or older. the volunteers from each group are randomly assigned to either the print advertising group, the radio group, the television group, or the online group. each group is exposed to the advertising. after 1 hour, a recall exam is given with the proportion of correct answers recorded.
randomized block design (the blocks are the four age groups); response variable: the proportion of correct answers on the recall exam; explanatory variable manipulated: type of advertising; 4 treatments (print, radio, television, online)
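a minimal python sketch of the randomization step in a randomized block design: volunteers are shuffled and assigned to treatments separately within each age block. the block sizes come from the problem; the volunteer labels, the `seed` parameter, and the function name are hypothetical.

```python
import random

# Block sizes come from the study; the volunteer labels are hypothetical.
blocks = {"under 20": 140, "20-39": 70, "40-59": 140, "60 or older": 140}
treatments = ["print", "radio", "television", "online"]


def randomize_within_blocks(blocks, treatments, seed=None):
    """Randomized block design: randomize to treatments separately in each block.

    Units in a block are shuffled and then dealt to the treatments in turn, so
    group sizes within a block differ by at most one.
    """
    rng = random.Random(seed)
    assignment = {}
    for block, size in blocks.items():
        units = [f"{block} #{i}" for i in range(1, size + 1)]
        rng.shuffle(units)
        for i, unit in enumerate(units):
            assignment[unit] = (block, treatments[i % len(treatments)])
    return assignment


assignment = randomize_within_blocks(blocks, treatments, seed=1)
print(len(assignment))  # 490
```

each 140-person block splits into 35 per treatment; the 70-person block cannot split evenly four ways, so two treatments get 18 volunteers and two get 17.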
a study conducted by researchers was designed "to determine if the application of duct tape is as effective as cryotherapy in the treatment of common warts." the researchers randomly divided 50 patients into two groups. the 25 patients in group 1 had their warts treated by applying duct tape. the 25 patients in group 2 had their warts treated by cryotherapy. once the treatments were complete, it was determined that 66% of the patients in group 1 and 86% of the patients in group 2 had complete resolution of their warts. the researchers concluded that cryotherapy is significantly more effective in treating warts than duct tape.
research objective: to determine if duct tape is as effective as cryotherapy in treating warts; sample: the 50 patients with warts
a pro-life advocate wants to estimate the percentage of people who favor closing abortion clinics. she conducts a nationwide survey of 1980 randomly selected adults 18 years and older. the interviewer asks the respondents, "do you favor protecting unborn children by closing abortion clinics?"
response bias, because the question is worded to lead respondents toward answering "yes"
to determine the public's opinion of the police department, the police chief obtains a cluster sample of 15 census tracts within his jurisdiction and samples all households in the randomly selected tracts. uniformed police officers go door to door to conduct the survey
response bias; possible remedy: have interviewers who are not uniformed police officers conduct the survey
researchers wish to know if there is a link between hypertension (high blood pressure) and consumption of salt. past studies have indicated that the consumption of fruits and vegetables offsets the negative impact of salt consumption. it is also known that there is quite a bit of person-to-person variability as far as the ability of the body to process and eliminate salt. however, no method exists for identifying individuals who have a higher ability to process salt. it is recommended that daily intake of salt should not exceed 2300 milligrams (mg). the researchers want to keep the design simple, so they choose to conduct their study using a completely randomized design.
response variable: blood pressure; three factors that have been identified: daily consumption of fruits and vegetables, daily consumption of salt, and the body's ability to process salt; blood pressure: not a factor; daily consumption of salt: can be controlled; daily consumption of fruits and vegetables: can be controlled; body's ability to process salt: cannot be controlled; age: not a factor; gender: not a factor; if a factor cannot be controlled, what should be done to reduce variability in the response variable? the experimental units should be randomly assigned to the treatment groups
a school psychologist wants to test the effectiveness of a new method of teaching statistics. she recruits 200 second-grade students and randomly divides them into two groups. group 1 is taught by means of the new method, while group 2 is taught via traditional methods. the same teacher is assigned to both groups. at the end of the year, an achievement test is administered and the results of the two groups compared
response variable: the score on the achievement test; explanatory variable manipulated: method of teaching; 2 levels of treatment (new method, traditional method); type of experimental design: completely randomized design; subjects: the 200 second-grade students
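a minimal python sketch of the random assignment in a completely randomized design: all 200 students are shuffled together and dealt evenly into the two treatment groups, with no blocking. the student ID numbers, the `seed` parameter, and the function name are hypothetical.

```python
import random


def completely_randomized(units, treatments, seed=None):
    """Completely randomized design: shuffle all units, then deal them out evenly."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}


students = range(1, 201)  # hypothetical ID numbers for the 200 students
groups = completely_randomized(students, ["new method", "traditional"], seed=7)
print({t: len(g) for t, g in groups.items()})  # {'new method': 100, 'traditional': 100}
```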
find the sample variance and standard deviation: 23, 13, 6, 10, 9
s² = 42.7; s ≈ 6.5
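a short python sketch of the sample formulas. the only change from the population case is dividing by n - 1 instead of n. the standard library's `statistics.variance` and `statistics.stdev` are used as a cross-check.

```python
import statistics

data = [23, 13, 6, 10, 9]

n = len(data)
x_bar = sum(data) / n                      # sample mean: 12.2
ss = sum((x - x_bar) ** 2 for x in data)   # sum of squared deviations about x_bar
s2 = ss / (n - 1)                          # a sample divides by n - 1
s = s2 ** 0.5

print(round(s2, 1), round(s, 1))           # 42.7 6.5
print(statistics.variance(data), statistics.stdev(data))  # same values, unrounded
```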
define statistics
statistics is the science of collecting, organizing, summarizing, and analyzing information to draw a conclusion and answer questions. in addition, statistics is about providing a measure of confidence in any conclusions
which sampling method does not require a frame?
systematic
to estimate the percentage of defects in a recent manufacturing batch, a quality control manager at IBM selects every 14th computer that comes off the assembly line starting with the fourth until she obtains a sample of 30 computers
systematic sampling
classes
the categories by which data are grouped
variables
the characteristics of the individuals of the population being studied
confounding
the effects of two factors (explanatory variables) on the response variable cannot be distinguished
a histogram of a set of data indicates that the distribution of the data is skewed right. which measure of central tendency will likely be larger, the mean or the median? why?
the mean will likely be larger because the extreme values in the right tail tend to pull the mean in the direction of the tail
response variable
the quantitative or qualitative variable for which the experimenter wishes to determine how its value is affected by the explanatory variable
what makes the range less desirable than the standard deviation as a measure of dispersion?
the range does not use all the observations; it depends only on the largest and smallest values
compute the range and sample standard deviation for strength of the concrete (in psi): 3970, 4140, 3400, 3200, 2910, 3840, 4140, 4040
the range is 1230 psi; s ≈ 472 psi
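a short python check of the concrete-strength answer. it also illustrates the earlier point about dispersion: the range comes from just two values, while `statistics.stdev` (the sample standard deviation) uses all eight.

```python
import statistics

strength = [3970, 4140, 3400, 3200, 2910, 3840, 4140, 4040]  # psi

data_range = max(strength) - min(strength)  # depends only on the two extremes
s = statistics.stdev(strength)              # sample SD uses every observation

print(data_range, round(s))  # 1230 472
```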
what are the advantages of having a presurvey with open questions to assist in constructing a questionnaire that has closed questions?
the researcher can learn the most common answers and use them as the response choices in the closed questions
complete the paragraph
the standard deviation is used in conjunction with the mean to numerically describe distributions that are bell shaped. the mean measures the center of the distribution, while the standard deviation measures the spread of the distribution
a study is conducted to determine if there is a relationship between Parkinson's disease and childhood head trauma. doctors look at the hospital records for patients with parkinson's disease for any childhood head trauma
the study is an observational study because the study examines individuals in a sample, but does not try to influence the response variable
while shopping, 350 people are asked to perform a taste test in which they drink two randomly placed, unmarked coffees. they are then asked which coffee they prefer
the study is an observational study because the study examines individuals in a sample, but does not try to influence the variable of interest
area of a park
the variable is continuous because it is not countable
height of an office building
the variable is continuous because it is not countable
medal won in race
the variable is qualitative because it is an attribute characteristic
nation of origin
the variable is qualitative because it is an attribute characteristic
a sample of seniors is selected and it is found that 45% own a television
this is a statistic because the value is a numerical measurement describing a characteristic of a sample
the average annual salary of 50 of a company's 800 employees is $54,000
this is a statistic, because the data set of salaries of 50 employees is a sample
researchers wanted to determine if having a tv in the bedroom is associated with obesity. the researchers administered a questionnaire to 380 twelve-year-old adolescents. after analyzing the results, researchers determined that the body mass index of the adolescents who had a tv in their bedroom was significantly higher than that of the adolescents who did not have a tv in their bedroom
this is an observational study because the researchers observe the behavior of the individuals in the study without trying to influence an explanatory variable of the study; type of study: cross-sectional; the response variable is the body mass index of the adolescents; the explanatory variable is whether or not the adolescent has a tv in the bedroom; possible lurking variables might be eating habits and the amount of exercise per week; "these results remain significant after adjustment for socioeconomic status" means that the researchers made an effort to avoid confounding by accounting for potential lurking variables; a television in the bedroom and obesity are associated because the body mass index of the adolescents who had a tv in their bedroom was significantly higher than that of the adolescents who did not have a tv in their bedroom
when comparing two populations, the larger the standard deviation, the more dispersion the distribution has, provided that the variable of interest from the two populations has the same unit of measure
true, because the standard deviation describes how far, on average, each observation is from the typical value. a larger standard deviation means that observations are more distant from the typical value, and therefore, more dispersed.
when conducting a cluster sample, it is better to have fewer clusters with more individuals when the clusters are heterogeneous
true, because when the clusters are heterogeneous, they are scaled-down versions of the population
Chebyshev's inequality applies to all distributions regardless of shape, but the empirical rule holds only for distributions that are bell shaped
true; Chebyshev's inequality is less precise than the empirical rule, but it works for any distribution, while the empirical rule applies only to bell-shaped distributions
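a short python comparison of the two rules, assuming the usual statement of Chebyshev's inequality: for any k > 1, at least 1 - 1/k² of the observations lie within k standard deviations of the mean. the function name and the `EMPIRICAL` table are illustrative. the lower Chebyshev percentages are the price of working for every shape.

```python
# Empirical-rule percentages for bell-shaped data, within k standard deviations.
EMPIRICAL = {1: 68.0, 2: 95.0, 3: 99.7}


def chebyshev_min_percent(k):
    """Minimum percent of ANY distribution within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is stated for k > 1")
    return 100 * (1 - 1 / k ** 2)


for k in (2, 3):
    print(k, round(chebyshev_min_percent(k), 2), EMPIRICAL[k])
# 2 75.0 95.0
# 3 88.89 99.7
```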
researchers wanted to evaluate whether a certain herb improved memory in elderly adults as measured by objective tests. to do this, they recruited 98 men and 125 women older than 65 years and in good health. participants were randomly assigned to receive the herb, 45 mg 3 times a day, or a matching placebo. a measure of memory improvement was determined by a standardized test of learning and memory
type of experimental design: completely randomized design; population being studied: adults older than 65 years and in good health; response variable: score on the standardized test of learning and memory; factor: the herb; treatments: 45 mg of the herb 3 times a day or a matching placebo; experimental units: the 98 men and 125 women older than 65 years and in good health who participated in the study
inferential statistics
uses methods that generalize results obtained from a sample to the population and measure the reliability of the results
the sum of the deviations about the mean always equals
zero
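a one-line derivation of why the answer is always zero, using only the definition of the sample mean, \bar{x} = \frac{1}{n}\sum x_i (the same argument works for the population mean μ with N in place of n):

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = \sum_{i=1}^{n} x_i - n \cdot \frac{1}{n}\sum_{i=1}^{n} x_i
  = 0
```

for example, with the data 8, 11, 15, 17, 19 (mean 14), the deviations are -6, -3, 1, 3, 5, which sum to 0.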