Statistics Homework:1.

**What does it mean when an observational study is​ retrospective? @What does it mean when an observational study is​ prospective?

**A retrospective study requires that individuals look back in time or require the researcher to look at existing records. @A prospective study collects the data over time.

**Define simple random sampling. @What does it mean when sampling is done without​ replacement?

**A sample of size n from a population of size N is obtained through simple random sampling if every possible sample of size n has an equally likely chance of occurring. The sample is then called a simple random sample. @Once an individual is​ selected, the individual cannot be selected again.
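
A minimal sketch of simple random sampling without replacement, assuming a hypothetical population of 20 labeled individuals (the labels, the sample size of 5, and the seed are illustrative and not from the homework). Python's `random.sample` never picks the same individual twice and treats every subset of size n as equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every subset of size n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical population of N = 20 individuals, labeled 1 through 20.
population = list(range(1, 21))
sample = simple_random_sample(population, 5, seed=1)
print(sample)
```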

A​ quality-control manager randomly selects 90 bottles of bleach that were filled on October 25 to assess the calibration of the filling machine. **What is the population in the​ study? @What is the sample in the​ study?

**All bottles of bleach produced in the plant on October 25. @The 90 bottles of bleach selected in the plant on October 25.

**What is an observational​ study? @What is a designed​ experiment? ^^Which allows the researcher to claim causation between an explanatory variable and a response​ variable?

**An observational study measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. @A designed experiment is when a researcher assigns individuals to a certain​ group, intentionally changing the value of an explanatory​ variable, and then recording the value of the response variable for each group. ^^A designed experiment allows the researcher to claim causation between an explanatory variable and a response variable

**What is meant by​ confounding? @What is a lurking​ variable? ^^What is a confounding​ variable?

**Confounding in a study occurs when the effects of two or more explanatory variables are not separated.​ Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study. @A lurking variable is an explanatory variable that was not considered in a​ study, but that affects the value of the response variable in the study. In​ addition, lurking variables are typically related to explanatory variables in the study. ^^A confounding variable is an explanatory variable that was considered in a study whose effect cannot be distinguished from a second explanatory variable in the study.

**What is a​ cross-sectional study? @What is a​ case-control study? ^^Which is the superior observational​ study?

**Cross-sectional studies are observational studies that collect information about individuals at a specific point in time or over a very short period of time. @Case-control studies are observational studies that are retrospective, meaning that they require individuals to look back in time or require the researcher to look at existing records. ^^Neither study is always superior to the other. Both have advantages and disadvantages that depend on the situation.

In a boxplot: **For a distribution that is symmetric, the left whisker is about the same length as the right whisker. @For a distribution that is skewed left, the left whisker is longer than the right whisker.

**EXTRA: If the median is to the left of the center of the box and the right whisker is substantially longer than the left whisker, the distribution is skewed right.

Does the description correspond to an observational study or an​ experiment? **Example: Placebo/ drug, What type of study? @Example: 2 Groups, 1 takes breaks, others don't, stress test given ^^Example: Poll given for class preference

**The study is an experiment because the researchers control one variable to determine its effect on the response variable. @The study is an experiment because the researchers control one variable to determine its effect on the response variable. ^^The study is observational because it examines individuals in a sample without trying to influence the response variable.

Statement is true or false. **When taking a systematic random sample of size​ n, every group of size n from the population has the same chance of being selected. @When conducting a cluster​ sample, it is better to have fewer clusters with more individuals when the clusters are heterogeneous. ^^Inferences based on voluntary response samples are generally not reliable. ++When obtaining a stratified​ sample, the number of individuals included within each stratum must be equal.

**False, because certain groups would never be selected. @True, because when the clusters are​ heterogeneous, they are scaled down versions of the population. ^^True, because it is often the case that the individuals who volunteer do not accurately represent the population. ++False. Within stratified​ samples, the number of individuals sampled from each stratum should be proportional to the size of the strata in the population.
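
A rough sketch of the proportional allocation mentioned in the stratified-sample answer: each stratum contributes in proportion to its share of the population. The strata names, their sizes, and the total sample size of 50 are hypothetical, chosen so the shares come out as whole numbers.

```python
def proportional_allocation(strata_sizes, total_sample_size):
    """Split a sample across strata in proportion to each stratum's size."""
    population_size = sum(strata_sizes.values())
    return {
        name: round(total_sample_size * size / population_size)
        for name, size in strata_sizes.items()
    }

# Hypothetical strata (not from the homework): class year and enrollment.
strata_sizes = {"freshmen": 400, "sophomores": 300, "juniors": 200, "seniors": 100}
allocation = proportional_allocation(strata_sizes, 50)
print(allocation)
```

With sizes that do not divide evenly, rounding can make the parts sum to slightly more or less than the target, so a real design would need a rule for the leftover units.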

Z-score example: z = (x − μ)/σ, where μ is the mean and σ is the standard deviation. One year Larry had the lowest ERA (earned-run average, the mean number of runs yielded per nine innings pitched) of any male pitcher at his school, with an ERA of 2.98. Also, Vanessa had the lowest ERA of any female pitcher at the school, with an ERA of 3.44. For the males, the mean ERA was 4.221 and the standard deviation was 0.898. For the females, the mean ERA was 5.325 and the standard deviation was 0.799. Find their respective z-scores. Which player had the better year relative to their peers, Larry or Vanessa? (Note: In general, the lower the ERA, the better the pitcher.)

**Larry had an ERA with a z-score of −1.38. **Vanessa had an ERA with a z-score of −2.36. ***First identify each value for the formula.*** -The value of x for Larry is 2.98. -The value of μ for the males is 4.221. -The value of σ (standard deviation) for the males is 0.898. **Substitute the values into the formula and calculate the z-score, rounding to two decimal places: z = (x − μ)/σ. Larry: (2.98 − 4.221)/0.898 = −1.38. Vanessa: (3.44 − 5.325)/0.799 = −2.36. @Larry's z-score of −1.38 is higher than Vanessa's z-score of −2.36. Because a lower ERA is better, Vanessa had the better year in relation to her peers.
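
The arithmetic for this card can be checked with a short sketch. The function name `z_score` is only illustrative; the numbers are the ones given in the problem.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

larry = z_score(2.98, 4.221, 0.898)
vanessa = z_score(3.44, 5.325, 0.799)

print(round(larry, 2))    # -1.38
print(round(vanessa, 2))  # -2.36

# A lower ERA is better, so the more negative z-score is the better year.
better = "Vanessa" if vanessa < larry else "Larry"
print(better)  # Vanessa
```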

A polling organization contacts 2691 male university graduates who have a white collar job and asks whether or not they had received a raise at work during the past 4 months. **What is the population in the​ study? @What is the sample in the​ study?

**Male university graduates who have a white collar job. @The 2691 male university graduates who have a white collar job.

Determine the level of measurement of the variable. **Positions of persons in a line. @Nation of origin ^^Year of birth of college students ++Weight of a child.

**Ordinal @Nominal ^^Interval ++Ratio (for example, 50 lbs, 55 lbs, 60 lbs, 70 lbs)

What type of sample is used? **To estimate the percentage of defects in a recent manufacturing batch, a quality control manager at Daimler-Chrysler selects every 14th van that comes off the assembly line, starting with the fifth, until she obtains a sample of 150 vans. @To determine customer opinion of their musical variety, Sony randomly selects 80 concerts during a certain week and surveys all concertgoers. ^^Toyota wants to administer a satisfaction survey to its current customers. Using their customer database, the company randomly selects 70 customers and asks them about their level of satisfaction with the company. ++A magazine asks its readers to call in their opinion regarding the amount of advertising in the present issue. &&To determine her stress level, Carrie divides her day into three parts: morning, afternoon, and evening. She then measures her stress level at 3 randomly selected times during each part of the day.

**Systematic @ Cluster ^^Simple random ++Convenience &&Stratified
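
As a sketch of the systematic sample in the first scenario, the positions of the selected vans can be listed directly: start at the 5th van and take every 14th van after it until there are 150. The function name is an illustrative assumption.

```python
def systematic_positions(start, step, size):
    """Positions selected by a systematic sample: start, start+step, ..."""
    return [start + step * i for i in range(size)]

# Every 14th van, starting with the 5th, until 150 vans are sampled.
positions = systematic_positions(start=5, step=14, size=150)
print(positions[:4])   # [5, 19, 33, 47]
print(positions[-1])   # 2091
```

Only vans at these positions can ever be chosen, which is why not every group of size n has the same chance of selection under systematic sampling.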

Determine whether the underlined value is a parameter or a statistic. **In a national survey on substance abuse, 66.4% of respondents who were full-time college students aged 18 to 22 reported using alcohol within the past month. ^^In a championship football game, a quarterback completed 59% of his passes for a total of 265 yards and 2 touchdowns.

**The value is a statistic because the respondents who were full-time college students aged 18 to 22 are a sample. ^^The value is a parameter because the quarterback's passes in the game are a population.

Determine whether the variable is qualitative or quantitative. **Amount of money won in a lottery @Number of pets ^^Number on an athlete's jersey ++Time in hours that a light bulb lasts.

**The variable is quantitative because it is a numerical measure. @The variable is quantitative because it is a numerical measure. ^^The variable is qualitative because it is an attribute characteristic. ++The variable is quantitative because it is a numerical measure.

Homework 2. 1 **What is a bar​ graph? ^^What is a Pareto​ chart?

**A bar graph is a horizontal or vertical representation of the frequency or relative frequency of the categories. ^^A Pareto chart is a bar graph whose bars are drawn in decreasing order of frequency or relative frequency.

​**A(n)_____is obtained by dividing the population into groups and selecting all individuals from within a random sample of the groups. @A(n) ____is obtained by dividing the population into homogeneous groups and randomly selecting individuals from each group.

**cluster sample @stratified sample

**descriptive statistics @inferential statistics

**consist of organizing and summarizing data; describe data through numerical summaries, tables, and graphs @Use methods that take results from a sample, extend them to the population, and measure the reliability of the result

*A(n) _________ is a person or object that is a member of the population being studied. ​@A(n)______is a numerical summary of a sample. ?​A(n)______is a numerical summary of a population

*individual @statistic ?parameter

Explain the difference between a population and a sample

A population is the entire group that is being studied while a sample is a subset of the population that is being studied.

Explain the meaning of the following percentiles in parts​ ​(a) The 10th percentile of the weight of males 36 months of age in a certain city is 12.0 kg. ​(b) The 95th percentile of the length of newborn females in a certain city is 53.8 cm.

A. 10% of 36-month-old males weigh 12.0 kg or less, and 90% of 36-month-old males weigh more than 12.0 kg. B. 95% of newborn females have a length of 53.8 cm or less, and 5% of newborn females have a length that is more than 53.8 cm.

Are the following statements true or​ false? ​(a) In​ statistics, results are always reported with​ 100% certainty. Choose the correct answer below. ​(b) Statistical studies are not concerned with understanding the sources of variability in​ data, only with describing the variability in the data. Choose the correct answer below. c) Suppose three different individuals conduct the same statistical​ study, such as estimating the average commute time of students at a college. It is possible that all three studies end up with different results. Choose the correct answer below.

A. False. In​ statistics, results are not reported with​ 100% certainty. Because statistical studies draw on​ samples, and because there is variation within​ groups, results cannot be reported with​ 100% certainty. B. False. Statistical studies are concerned with both describing the variability in the data and understanding the sources of variability in data. Understanding the sources allows researchers to control it and reach better conclusions. C. True. Statistical studies typically look at samples rather than entire populations. Since each study is likely to draw different​ samples, it is quite possible that each study ends up with different​ results, due to variability in the data.

To find the quartiles, first arrange the data in ascending order. In this case, the data are already in ascending order (as read down the columns).
A. Determine the median, or second quartile, Q2. Since there are 24 observations, the median is the mean of the 12th and 13th observations, which are 38.1 and 38.7.
B. The bottom half of the data includes the observations less than the median. The median of this half is the first quartile, Q1. Since there are 12 observations, Q1 is the mean of the 6th and 7th observations in the bottom half, which are 36.6 and 37.1.
C. The top half of the data includes the observations greater than the median. The median of this half is the third quartile, Q3. Since there are 12 observations, Q3 is the mean of the 6th and 7th observations in the top half (the 18th and 19th overall), which are 40.5 and 41.2.
(c) The interquartile range is the difference between the third and first quartiles: IQR = Q3 − Q1.
(d) The fences are cutoffs for determining outliers: Lower fence = Q1 − 1.5(IQR) and Upper fence = Q3 + 1.5(IQR).

A. Q2 = (38.1 + 38.7)/2 = 38.4 mpg
B. Q1 = (36.6 + 37.1)/2 = 36.85 mpg
C. Q3 = (40.5 + 41.2)/2 = 40.85 mpg
(c) IQR = Q3 − Q1 = 40.85 − 36.85 = 4 mpg
(d) Lower fence = 36.85 − 1.5(4) = 30.85. Upper fence = 40.85 + 1.5(4) = 46.85.
A data point is considered an outlier using this method if it is less than the lower fence or greater than the upper fence. The largest data value, 47.6, is greater than the upper fence, 46.85, so it is an outlier.
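
The quartile and fence arithmetic for this card can be reproduced with a small sketch. The full data set is not shown in the card, so the code starts from the pairs of middle observations that the card quotes; the helper names are illustrative.

```python
def mean_of_pair(a, b):
    """Median of an even-sized group: the mean of its two middle values."""
    return (a + b) / 2

def fences(q1, q3):
    """Lower and upper outlier fences from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

q2 = mean_of_pair(38.1, 38.7)  # 12th and 13th observations
q1 = mean_of_pair(36.6, 37.1)  # 6th and 7th of the bottom half
q3 = mean_of_pair(40.5, 41.2)  # 6th and 7th of the top half
lower_fence, upper_fence = fences(q1, q3)

largest = 47.6  # largest data value quoted in the card
print(round(q1, 2), round(q2, 2), round(q3, 2))
print(round(lower_fence, 2), round(upper_fence, 2))
print(largest > upper_fence)  # True means it is flagged as an outlier
```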

​(a) Would it make sense to draw a pie chart for land​ area? ​(b) Would it make sense to draw a pie chart for the highest​ elevation?

A. yes B. no

A. The ___class limit is the smallest value within the class and the__class limit is the largest value within the class. B. The ___is the difference between consecutive lower class limits.

A.lower, upper B. class width

The Empirical Rule says that if a distribution is roughly bell​ shaped, the following is true.

Approximately​ 68% of the data will lie within 1 standard deviation of the mean. ​· Approximately​ 95% of the data will lie within 2 standard deviations of the mean. ​· Approximately​ 99.7% of the data will lie within 3 standard deviations of the mean.
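
A quick check of these percentages, assuming the distribution is exactly normal (the Empirical Rule itself only asks for a roughly bell-shaped distribution). The sketch uses `statistics.NormalDist` from the Python standard library; `share_within` is an illustrative name.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```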

#2 Example z-score: Suppose babies born after a gestation period of 32 to 35 weeks have a mean weight of 3000 grams and a standard deviation of 700 grams, while babies born after a gestation period of 40 weeks have a mean weight of 3300 grams and a standard deviation of 440 grams. If a 33-week gestation period baby weighs 2850 grams and a 40-week gestation period baby weighs 3150 grams, find the corresponding z-scores. Which baby weighs less relative to the gestation period?

B. The baby born in week 40 weighs relatively less since its z-score, −0.34, is smaller than the z-score of −0.21 for the baby born in week 33. **33-week z-score = (2850 − 3000)/700 = −0.21 **40-week z-score = (3150 − 3300)/440 = −0.34

The​ five-number summary is 41,43​,47​,55​,69

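First put the data in ascending order, then list the smallest value, Q1, the median (the middle value of all the data), Q3, and the largest value. For a box plot, the distribution is skewed right if more of the spread is to the right of the median and skewed left if more of the spread is to the left of the median.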

Variables

Characteristics of the individuals within the population

Uniform distribution/ Bell shaped, Symmetry

Distribution where populations are spaced evenly

The cutoff point is 649 minutes. (Round to the nearest minute.) Upper fence = Q3 + 1.5(IQR)

Found from Q3 + 1.5(104) = 649, where the IQR is 104.

Explain the difference between a​ single-blind and a​ double-blind experiment.

In a​ single-blind experiment, the subject does not know which treatment is received. In a​ double-blind experiment, neither the subject nor the researcher in contact with the subject knows which treatment is received.

Example: Can the researchers conclude that proximity with​ high-tension wires causes leukemia in​ children?

No, because this is an observational study.

A measure of central tendency

Numerically describes the average or typical data value. Three measures of central tendency are the​ mean, the​ median, and the mode. The mean and median are usually used to measure the central tendency of a numerical data set. When the data set is skewed the median is the preferred measure of central tendency.

Determine whether the underlined numerical value is a parameter or a statistic. Explain your reasoning. A certain zoo found that 8% of its 843 animals were nocturnal.

Parameter, because the data set of all 843 animals in the zoo is a population.

Explain what is meant when it is said that​ "data vary". How does the variability affect the results of statistical​ analysis?

Saying​ "data vary" means that the values of the variable change from individual to individual. In​ addition, certain variables

For car prices, state whether you would expect a histogram of the data to be​ bell-shaped, uniform, skewed​ left, or skewed right.

Skewed right

To find the minimum score: Example: A highly selective boarding school will only admit students who place at least 1.5 standard deviations above the mean on a standardized test that has a mean of 300 and a standard deviation of 24. What is the minimum score that an applicant must make on the test to be accepted?

Set the z-score formula equal to the required z-score and solve for x: (x − μ)/σ = z, so x = zσ + μ. With the values stated here, (x − 300)/24 = 1.5, so x = (1.5)(24) + 300 = 336. The same steps with another version's values (mean 150, standard deviation 18, z = 3.5): (x − 150)/18 = 3.5, so x = (3.5)(18) + 150 = 213.
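
Solving the z-score formula for x can be sketched as a one-line function. The name `raw_score` is illustrative; the two calls use the mean, standard deviation, and z-value stated in the problem and the values used in the worked steps.

```python
def raw_score(z, mean, std_dev):
    """Invert z = (x - mean) / std_dev to recover x."""
    return z * std_dev + mean

print(raw_score(1.5, 300, 24))  # 336.0, using the values stated in the problem
print(raw_score(3.5, 150, 18))  # 213.0, using the values in the worked steps
```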

Define statistics

Statistics is the science of​ collecting, organizing,​ summarizing, and analyzing information to draw a conclusion and answer questions. In​ addition, statistics is about providing a measure of confidence in any conclusions.

Discuss the advantages and disadvantages of histograms versus​ stem-and-leaf plots.

Stem-and-leaf plots are easier to make and can contain more information than histograms.​ However, they are not very useful for large data sets.

Skewed Left Distribution

Tail on the left. Mean less than the median. Examples: When data are either skewed left or skewed​ right, there are extreme values in the​ tail, which tend to pull the mean in the direction of the tail. For​ example, in​ skewed-right distributions, there are large observations in the right tail.

BOX PLOT-Median, Minimum, Maximum

The third quartile is the right end of the box. Dispersion: the more spread a set of data has, the higher the interquartile range will be. **Use this information to determine which variable has more dispersion.

Following statement is true or false. ​Generally, the goal of an experiment is to determine the effect that the treatment will have on the response variable.

True

The ___represents the number of standard deviations an observation is from the mean.

Z-score

​(a)Experimental unit ​(b)Treatment (c)Response variable ​(d)Factor ​(e)Placebo ​(f)Confounding

(a) A person, object, or some other well-defined item upon which a treatment is applied (b) Any combination of the values of the factors (explanatory variables) (c) The quantitative or qualitative variable for which the experimenter wishes to determine how its value is affected by the explanatory variable (d) A variable whose effect on the response variable is to be assessed by the experimenter (e) An innocuous medication, such as a sugar tablet, that looks, tastes, and smells like the experimental medication (f) The effects of two factors (explanatory variables) on the response variable cannot be distinguished.

Example: Researchers wanted to test the effectiveness of a new drug therapy for treating patients with diabetes. To do​ this, they identified 120 patients with a diagnosis of diabetes. (a) What type of experimental design is​ this? (b) What is the population being​ studied? ​(c) What is the response variable in this​ study? (d) What are the​ treatment(s)? (e) Identify the experimental units. Choose the correct answer below. ​(f) Which figure illustrates the design of this​ experiment?

a. Completely randomized design b. All patients with a diagnosis of diabetes c.The score on the standardized rating scale for diabetes d.The new drug​ therapy, the older drug​ therapy, and the placebo therapy e.The 120 patients with a diagnosis of diabetes f.

It is extremely important for a researcher to clearly define the variables in a study because this helps to determine the type of analysis that can be performed on the data. For​ example, if a researcher wanted to describe people based on Social Security number​, what level of measurement would the variable ​"Social Security number​" ​be? Now suppose the researcher felt that certain people with a greater birth weight received higher numbers. Does the level of measurement of the variable​ change? If​ so, how? a. What is the level of measurement of the variable ​"Social Security number​"in the original​ scenario? b. Does the level of measurement of the variable change in the second​ scenario?

a. Nominal b. Yes, it changes to ordinal

A school psychologist wants to test the effectiveness of a new method of teaching statistics. She recruits 800 fourth-grade students and randomly divides them into two groups. (a) What is the response variable in this experiment? (b) Is the response variable qualitative or quantitative? (c) Which of the following explanatory variables is manipulated? (d) What are the treatments? How many treatments are there? (e) How are the factors that are not controlled dealt with? (f) Which group serves as the control group? (g) What type of experimental design is this? (h) Identify the subjects. (i) Draw a diagram to illustrate the design.

a. The scores on the achievement tests of both group 1 and group 2 b. The response variable is quantitative because it is a measurement. c. Method of teaching d.The treatments are the new teaching method and the traditional teaching. There are 2 treatments. e.Random assignment f. Group 2 serves as the control group because this group corresponds to the standard method that will be compared to the other method. g. Completely randomized design h. The 800 students

Example: Happiness and high blood pressure: (a) What type of observational study was this? (b) What is the response variable? What is the explanatory variable? (c) Explain what this sentence means.

a. This was a cohort study, because information was collected about a group of individuals by observing them over a long period of time. b. The response variable is whether or not high blood pressure was contracted, because it is the variable of interest. The explanatory variable is level of happiness, because it affects the other variable. c. The researchers may be concerned with confounding that occurs when the effects of two or more explanatory variables

Example: Daily kale consumption and the occurrence of high cholesterol. (a) What type of observational study was​ this? ​(b) What is the response variable in the​ study? Is the response variable qualitative or​ quantitative? What is the explanatory​ variable? ​(c) In their​ report, the researchers stated that​ "After adjusting for various demographic and lifestyle​ variables, daily consumption of two or more servings was associated with a​ 30% reduced prevalence of high cholesterol​." Why was it important to adjust for these​ variables?

a. This was a​ cross-sectional study because all information about the individuals was collected at a specific point in time. b. The response variable is whether the woman has high cholesterol or not. The response variable is qualitative. -The explanatory variable is consumption of kale. c. The researchers may be concerned with confounding that occurs when the effects of two or more explanatory variables are not separated or when there are some explanatory variables that were not considered in a​ study, but that affect the value of the response variable.

A study conducted by researchers was designed​ "to determine if application of duct tape is as effective as cryotherapy in the treatment of common​ warts." The researchers randomly divided 64 patients into two groups. The 32 patients in group 1 had their warts treated by applying duct tape. The 32 patients in group 2 had their warts treated by cryotherapy. Once the treatments were​ complete, it was determined that 66​% of the patients in group 1 and 19​% of the patients in group 2 had complete resolution of their warts. The researchers concluded that duct tape is significantly more effective in treating warts than cryotherapy. a. What is the research​ objective? b.What is the population being​ studied? What is the​ sample? c. What are the descriptive​ statistics? d. What is the conclusion of the​ study?

a. To determine if duct tape is as effective as cryotherapy in the treatment of warts b. All people who have warts / The 64 patients with warts c. 66% of patients in group 1 and 19% of patients in group 2 had resolution of their warts. d. Duct tape is significantly more effective than cryotherapy in treating warts.

About birth and seasons: c. What type of variable is the season in which you were​ born? e. What conclusion was drawn from the​ study?

c. Qualitative, nominal e. Season of birth plays a role in​ one's temperament.

Histogram (relative frequency histogram)

directions

The kth percentile. Violent crimes include rape, robbery, assault, and homicide. The following is a summary of the violent-crime rate (violent crimes per 100,000 population) for all states of a country in a certain year: Q1 = 274.8, Q2 = 386.2, Q3 = 526.9. The kth percentile of a set of data is a value such that k percent of the observations are less than or equal to the value.
(a) Provide an interpretation of these results.
(b) Determine and interpret the interquartile range. IQR = Q3 − Q1 = 526.9 − 274.8 = 252.1 crimes per 100,000 population.
(c) The violent-crime rate in a certain state of the country in that year was 1,502. Would this be an outlier? First find the upper fence, using the interquartile range of 252.1 from part (b): Upper fence = Q3 + 1.5(IQR) = 526.9 + 1.5(252.1) = 905.05. Next find the lower fence: Lower fence = Q1 − 1.5(IQR) = 274.8 − 1.5(252.1) = −103.35.
(d) Do you believe that the distribution of violent-crime rates is skewed or symmetric? Find the differences Q2 − Q1 and Q3 − Q2. If the differences are not approximately equal, the distribution is skewed.

A. 25% of the states have a violent-crime rate that is 274.8 crimes per 100,000 population or less. 50% of the states have a violent-crime rate that is 386.2 crimes per 100,000 population or less. 75% of the states have a violent-crime rate that is 526.9 crimes per 100,000 population or less.
B. The middle 50% of all observations have a range of 252.1 crimes per 100,000 population.
C. Since the data value, 1,502, is greater than the upper fence, 905.05, this data value would be an outlier.
D. Q2 − Q1 = 386.2 − 274.8 = 111.4 and Q3 − Q2 = 526.9 − 386.2 = 140.7. The difference between Q1 and Q2 is quite a bit less than the difference between Q2 and Q3. In addition, the outlier in the right tail of the distribution implies that the distribution is skewed right.
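
The numbers in this card can be verified with a short sketch that starts from the three quartiles given in the problem. The variable names are illustrative, and comparing the two quartile gaps is only the rough skewness check the card describes, not a formal test.

```python
q1, q2, q3 = 274.8, 386.2, 526.9  # quartiles given in the problem

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

rate = 1502  # violent-crime rate of the state in part (c)
is_outlier = rate < lower_fence or rate > upper_fence

# Rough skewness check: compare the gaps on each side of the median.
lower_gap = q2 - q1
upper_gap = q3 - q2
skewed_right = upper_gap > lower_gap

print(round(iqr, 1), round(lower_fence, 2), round(upper_fence, 2))
print(is_outlier, skewed_right)
```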

Suppose you are interested in comparing brand A exterior latex paint to brand B exterior latex paint. Best design for this experiment.

​Matched-pairs design because experimental units are paired up and there are only two levels of treatment.

For what lengths will a bolt be destroyed? 1. Less than: x < −3(0.20) + 11, so x < 10.4. 2. Greater than: x > 3(0.20) + 11, so x > 11.6.

​Therefore, a bolt will be destroyed if the length is less than 10.4 cm or greater than 11.6 cm.
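
A sketch of the bolt rule, assuming (from the arithmetic in the card) a mean length of 11 cm, a standard deviation of 0.20 cm, and a cutoff of 3 standard deviations on either side of the mean. The full problem statement is not shown here, so these defaults and the function name are assumptions.

```python
def is_destroyed(length_cm, mean_cm=11.0, std_dev_cm=0.20, k=3):
    """True if the length is more than k standard deviations from the mean."""
    lower_cutoff = mean_cm - k * std_dev_cm  # 10.4 cm with the defaults
    upper_cutoff = mean_cm + k * std_dev_cm  # 11.6 cm with the defaults
    return length_cm < lower_cutoff or length_cm > upper_cutoff

for length in (10.3, 11.0, 11.7):
    print(length, is_destroyed(length))
```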

