BUS 215 quiz answers
Based on the information contained in the spreadsheet provided for question 1, as well as the one-factor ANOVA you performed using Excel, and the Tukey's HSD test, if you performed one, what seems to be the correct decision for Widgets International to make regarding the six training programs?
That training program D should be characterized as probably the single best of the options
A sample proportion ______.
ALL THE CHOICES ABOVE: - is the number of successes in n trials - is often expressed as a decimal or a percentage - is denoted p - follows a binomial distribution
To calculate a percentage change between values, we need ______________ data.
ratio
A zero origin is ________.
required to avoid deception in a column chart
The wider the bins used in the frequency distribution, the ________ the data will appear in the histogram.
smoother
Response error can be caused by
social desirability
Statistical bias is ______.
systematic
Selecting every kth item from an ongoing production line is
systematic sampling
A histogram may reveal that a distribution has more than one mode, which might mean that _______.
the sample contains data from more than one population
Multivariate linear regression, unlike bivariate linear regression, ______.
uses multiple predictors
When the sample mean equals 25 and the sample standard deviation equals 5, the standardized value of 15 equals ______.
-2.00
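A minimal sketch of the standardization behind this answer, z = (x - mean) / standard deviation. The helper name `z_score` is illustrative, not something from the course materials.

```python
def z_score(x, mean, sd):
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


z = z_score(15, 25, 5)
print(f"{z:.2f}")  # -2.00
```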
Using the attached data set for questions 7 - 9, what is the probability that a customer liked the product and will buy it in the next 30 days? Link: https://blackboard.stonybrook.edu/bbcswebdav/pid-6025322-dt-content-rid-52939750_1/xid-52939750_1
.1600
A histogram is a _______.
column chart of the bin frequencies
When formatting a cell as a fraction, what is the maximum number of digits that can be displayed?
Three
If the p value is < .01 and alpha = .01, should you accept the alternative hypothesis?
Yes
The cumulative frequencies are _______.
a running total of the bin frequencies as a percentage of the sample size
Using the empirical rule, the approximate percentage of observations expected to lie within the interval of z = -2.00 to z = -1.00 equals ______.
13.59%
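One way to check this figure, assuming the data follow a normal distribution: take the area under the standard normal curve between z = -2 and z = -1. The sketch uses `statistics.NormalDist` from the Python standard library rather than Excel.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area between z = -2.00 and z = -1.00
share = std_normal.cdf(-1.0) - std_normal.cdf(-2.0)
print(f"{share:.2%}")  # 13.59%
```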
Confidence intervals ______.
ALL OF THE CHOICES ABOVE: - are sometimes called interval estimates - become narrower when n increases - make use of probability to manage random sampling error - are constructed around estimators
Continuing with the previous question ... based on the data collected from their sample, how confident should the company be that their process either is or is not operating within its specification for variation in the time required for the process?
Approximately 99.95% confident
Why do I (your professor) not use the pooled proportions when testing for zero difference?
Because I think it is an unnecessary statistical power grab based on a false premise
Why do we not calculate the P(X = x) for a continuous random variable?
Because it would be zero for any value of x
What is an easy way to transpose a long list of numbers?
Copy & Paste Special
When formatting a cell as an absolute reference or address, you
Highlight the reference in the formula and Press F4
Hardcoding a variable means
Including the value of a variable in a formula
What hypothesis test should be used to test H(sub 1): x(sub 1) - x(sub 2) < 0
Left tailed, two-sample test of means (independent samples)
The standardized value of x when the population mean is 25 and the population standard deviation is 2 is ______.
NONE OF THE CHOICES ABOVE: - 14.7500 - 14.7500 - 23.00 - 27.0000 - All of the choices above
What hypothesis test should be used to test H(sub 1): sigma^2 > sigma^2(sub 0)
Right tailed, one-sample test of variances
To what does the phrase "Garbage in, garbage out" refer?
The effect the quality of data has on decisions made informed by an analysis of those data
The value of a specific variable for a specific observation is
a datum
The closing values of the S&P 500 Index for each trading day in 2020 are what type of data?
Time Series
A bakery randomly surveyed 100 of its customers and found that the average quality score given to its chocolate chip cookies was 83 with a standard deviation of 7. The bakery changed its chocolate chip cookie recipe to result in a softer cookie and then a month later randomly surveyed 113 of its customers and found the average quality score given to its new chocolate chip cookies was 88 with a standard deviation of 9. Can the bakery be 90% confident that the increase in the quality score was not due to random sampling error?
Yes
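A rough check of this answer, assuming a right-tailed two-sample test of means with unpooled variances and a normal approximation (both samples are large). The course may use a t distribution instead, which would change the p value only slightly. Confidence is taken here as one minus the p value.

```python
from math import sqrt
from statistics import NormalDist

# Old recipe (sample 1) and new recipe (sample 2)
n1, mean1, sd1 = 100, 83, 7
n2, mean2, sd2 = 113, 88, 9

# Standard error of the difference in means (variances not pooled)
se = sqrt(sd1**2 / n1 + sd2**2 / n2)
z = (mean2 - mean1) / se

# Right-tailed p value under the normal approximation
p_value = 1 - NormalDist().cdf(z)
confidence = 1 - p_value

print(f"z = {z:.4f}, p = {p_value:.6f}")
print("At least 90% confident:", confidence >= 0.90)
```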
A paired sample test of means should be used to test the hypothesis statement H1: d<0
True
What hypothesis test should be used to test H(sub 1): sigma^2(sub 1) / sigma^2(sub 2) > 1
Two-sample test of variances
Assume, for the sake of this question, that the data were collected through a well-designed, well-implemented random sampling method. The marketing department of a widget manufacturer collected potential consumer preference data regarding a proposed widget upgrade. Three hundred thirty-eight of 575 respondents reported preferring the proposed new widget. The widget manufacturing company had established a threshold of 60% preferring the proposed new widget to move forward with producing the new widgets. At a .10 level of significance, was the threshold probably met?
Yes
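One possible reading of this answer, sketched under assumptions: treat 60% as the benchmark and run a left-tailed one-sample test of proportions with a normal approximation. Under that reading, "Yes" means the sample does not give enough evidence at alpha = .10 that the true proportion is below 60%. The hypothesis framing is an assumption, not taken from the quiz.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 338, 575
benchmark = 0.60
alpha = 0.10

p_hat = successes / n

# Standard error computed from the benchmark proportion
se = sqrt(benchmark * (1 - benchmark) / n)
z = (p_hat - benchmark) / se

# Left-tailed p value: probability of a sample proportion this low or lower
p_value = NormalDist().cdf(z)

print(f"p_hat = {p_hat:.4f}, z = {z:.4f}, p = {p_value:.4f}")
print("Evidence the proportion is below 60%:", p_value < alpha)
```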
In hypothesis testing, the test statistic vs. the critical value method ______ the p value vs. alpha method.
always results in the same decision regarding the null hypothesis as
The limitations of statistics include ...
analyst bias.
To what did the following quote from the instructor's notes refer? "There are lies, damned lies, and statistics."
analyst ethics.
VIF is used to ______.
determine the extent of correlation among all the predictors within a regression model
The response scale [Lowest 1 2 3 4 5 Highest] will yield ____________ data.
interval
The classical method of assigning probability ______.
is based on theory or logic
A one-tail, one-sample hypothesis test ______.
is more powerful than a two-tail, one-sample hypothesis test
We use probability to _____.
manage random sampling error
Descriptive statistics include _______.
measures of center, variability, and shape
Use Excel and the attached data set to run a regression and make any necessary additional calculations to fill in the blank spaces. Use a 95% confidence level (.05 alpha) for all calculations requiring that information. Do not use any rounded values in any of your calculations. Round all your answers to 4 decimal places. Do not include any leading zeros in your responses for numbers that cannot be greater than 1. If the value is discrete, enter it as a whole number. Link: https://blackboard.stonybrook.edu/bbcswebdav/pid-6094071-dt-content-rid-55423686_1/xid-55423686_1 The intercept (was/was not) significant. For the rest of your responses, assume that the intercept was significant, even if it was not. Approximately _____% of the variation in widget sales is explained by the regression. There are ___ high leverage observations in the data set. There are _____ unusual residuals based on the empirical rule. The prediction interval margin of error is ____ when the value of the independent variable is 450. Y(hat) is ____ when the value of buggy whip sales is 618.
was not 69.9115 2 1 121.1047 569.4093
Random sampling is used to minimize/eliminate
bias
If the success rate for any binary random process is 43%, on average, how many trials will be needed to achieve the first success?
approximately 2.33
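This answer matches the mean of a geometric distribution, 1 / p, where p is the probability of success on each trial. A one-line check:

```python
success_rate = 0.43

# Mean of the geometric distribution: expected trials until the first success
expected_trials = 1 / success_rate
print(round(expected_trials, 2))  # 2.33
```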
If two random variables X and Y are independent and the standard deviation of X = 2.31 and the standard deviation of Y = 2.94, what is the standard deviation of the sum of X and Y?
approximately 3.74
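The listed answer matches the standard deviation of X + Y. For independent variables the variances add, so the standard deviations combine as the square root of the sum of squares rather than adding directly. A short sketch:

```python
from math import sqrt

sd_x = 2.31
sd_y = 2.94

# Variances add for independent random variables; take the root afterward
sd_sum = sqrt(sd_x**2 + sd_y**2)
print(round(sd_sum, 2))  # 3.74
```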
If the minimum is 25, the mode is 35, and the maximum is 41, what is P(X < 40)?
approximately equal to .9896
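A minimum, mode, and maximum suggest a triangular distribution, which is an assumption here since the question does not name the model. Under that assumption the CDF can be sketched as follows; `triangular_cdf` is an illustrative helper name.

```python
def triangular_cdf(x, low, mode, high):
    """P(X < x) for a triangular distribution on [low, high] with the given mode."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    if x <= mode:
        return (x - low) ** 2 / ((high - low) * (mode - low))
    return 1 - (high - x) ** 2 / ((high - low) * (high - mode))


prob = triangular_cdf(40, low=25, mode=35, high=41)
print(round(prob, 4))  # 0.9896
```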
A discrete random variable is defined ______, whereas a continuous random variable is defined ______.
at each point, over an interval of points
When evaluating a scatter chart, a dot pattern that has a positive slope indicates _____.
that the predictor and response variables have a direct correlation
If the estimate being tested is less than the benchmark, you should conduct a ______ one-sample hypothesis test.
left tail
Data analysts use one-sample hypothesis tests to ______.
manage random sampling error
We use probability to ______.
manage the uncertainty caused by random sampling methods
Probability is used for ______.
managing uncertainty
Conditional formatting is powerful because
It enables rapid visual analysis of a data set
The µ and σ for a uniform continuous distribution with a minimum value of 115 and a maximum value of 183 are _____________.
149.0000 and 19.6299 respectively
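These values follow from the standard formulas for a continuous uniform distribution on [a, b]: mean = (a + b) / 2 and standard deviation = (b - a) / sqrt(12). A quick check:

```python
from math import sqrt

a, b = 115, 183

mu = (a + b) / 2            # midpoint of the interval
sigma = (b - a) / sqrt(12)  # standard deviation of a continuous uniform

print(f"{mu:.4f} and {sigma:.4f}")  # 149.0000 and 19.6299
```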
The value of the population mean when z is 1.22, the population standard deviation is 3, and x is 22 is ______.
18.3400
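Rearranging z = (x - mu) / sigma to solve for the mean gives mu = x - z * sigma. A short sketch of that arithmetic:

```python
z = 1.22
sigma = 3
x = 22

# Solve z = (x - mu) / sigma for mu
mu = x - z * sigma
print(f"{mu:.4f}")  # 18.3400
```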
The uniform discrete probability distribution is used _____.
ALL OF THE CHOICES ABOVE: - when we know that the probability of each possible outcome is identical - when we have no reason to believe that any outcome is more likely than any other outcome - when we use Excel to generate random numbers with equal probabilities of occurring
The data are from a small moving company that moves people from urban apartment buildings to other urban apartment buildings in the same or different cities. The company needs assistance in predicting the labor hours for jobs so they can provide more accurate price quotes. Run two separate regressions using the data in the sections of the spreadsheet titled Regression Model 1 and Regression Model 2. Use a 95% confidence level. Fill in all the blanks below. Do not round any numbers except when entering them in the blanks for this quiz; round those entries to exactly 2 decimal places even if the number has only one or no decimal places. Use commas to separate thousands (e.g., 1225 should be entered as 1,225.00). Regression Model 1: R Square expressed as a percentage rounded and carried out to two decimal places is _____% and Adjusted R Square expressed as a percentage rounded and carried out to two decimal places is ____%. Refer to the spreadsheet embedded in your Module 14 Regression document I provided you to see the calculation and format for answering this next question. The gap between R Square and Adjusted R Square expressed as a percentage rounded and carried out to two decimal places is 0._____%. The overall regression model (was/was not) significant. Using square footage of 1,500 and mileage of 92, the model estimates that _____ hours of labor would be required, with a margin of error of ____ hours. Regression Model 2: R Square expressed as a percentage rounded and carried out to two decimal places is _____% and Adjusted R Square expressed as a percentage rounded and carried out to two decimal places is _____%. Refer to the spreadsheet embedded in your Module 14 Regression document I provided you to see the calculation and format for answering this next question. The gap between R Square and Adjusted R Square expressed as a percentage rounded and carried out to two decimal places is 0.____%. The overall regression model (was/was not) significant. Using square footage of 1,500, mileage of 92, and floors of 71, the model estimates that ____ hours of labor would be required, with a margin of error of _____ hours. Model Selection: Because the R Square is (higher / lower) in regression model 1 than in regression model 2 and the gap between R Square and Adjusted R Square is (smaller / larger) in regression model 1 than in regression model 2 and the margin of error of the estimated labor hours is (smaller / larger) in regression model 1 than in regression model 2, we would characterize regression model (1 or 2) as being the better model.
Answered in order: Regression Model#1 - 95.98 - 95.94 - 04 - was - 101.32 - 7.29 Regression Model#2 - 99.40 - 99.39 - 01 - was - 105.99 - 2.83 Model Selection - lower - larger - larger - 2
Why is it important to construct an interval estimate for y(hat)?
Because y(hat) is an estimate and is subject to random sampling error
VIEW LINK: https://blackboard.stonybrook.edu/bbcswebdav/pid-6084844-dt-content-rid-55027522_1/xid-55027522_1 Use Excel to perform a one-factor ANOVA at alpha = .05 and Tukey's HSD test if appropriate. Fill in all the blanks in the tables below. Round each of your entries to exactly 4 decimal places (Do NOT use any rounded numbers in any calculations). Do not include any leading zeros. If the HSD test was not appropriate, enter N/A or n/a in each of the cells for the HSD test values. Single-Factor ANOVA ANOVA T critical
Count A: 46 Sum A: 172.3644 Avg A: 3.7471 Var A: 4.7948 Count B: 46 Sum B: 166.5200 Avg B: 3.6200 Var B: 3.4129 Count C: 46 Sum C: 179.7789 Avg C: 3.9082 Var C: 3.9770 Count D: 46 Sum D: 245.8539 Avg D: 5.3446 Var D: 8.0413 Count E: 46 Sum E: 169.5528 Avg E: 3.6859 Var E: 5.5999 Count F: 46 Sum F: 198.6803 Avg F: 4.3191 Var F: 3.6631 SSB: 99.3723 df B: 5 MSB: 19.8745 F: 4.0438 P: 0.0015 Fcrit: 2.2474 SSE: 1327.0063 df E: 270 MSE: 4.9148 SST: 1426.3786 df T: 275 Tcritical: 2.99 B vs A: 0.2748 B vs A sig: No C vs A: 0.3487 C vs A sig: No D vs A: 3.4560 D vs A sig: Yes E vs A: 0.1322 E vs A sig: No F vs A: 1.2376 F vs A sig: No C vs B: 0.6235 C vs B sig: No D vs B: 3.7309 D vs B sig: Yes E vs B: 0.1426 E vs B sig: No F vs B: 1.5124 F vs B sig: No D vs C: 3.1073 D vs C sig: Yes E vs C: 0.4809 E vs C sig: No F vs C: 0.8889 F vs C sig: No E vs D: 3.5882 E vs D sig: Yes F vs D: 2.2185 F vs D sig: No F vs E: 1.3698 F vs E sig: No
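The ANOVA entries above can be cross-checked against each other using the rounded sums of squares and degrees of freedom listed in the answer. This sketch only verifies the internal arithmetic (mean squares, F, and total sum of squares). It does not recompute anything from the original data set, which is not reproduced here.

```python
# Values copied from the answer above (already rounded to 4 decimals)
ss_between, df_between = 99.3723, 5
ss_error, df_error = 1327.0063, 270

ms_between = ss_between / df_between  # MSB
ms_error = ss_error / df_error        # MSE
f_stat = ms_between / ms_error        # F = MSB / MSE
ss_total = ss_between + ss_error      # SST = SSB + SSE

print(f"MSB = {ms_between:.4f}, MSE = {ms_error:.4f}")
print(f"F = {f_stat:.4f}, SST = {ss_total:.4f}")
```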
Linear regression is used for ______.
NONE OF THE ABOVE: - Using a dependent variable to predict the value of an independent variable - Calculating the population correlation coefficient - Calculating the sample correlation coefficient - Calculating leverage statistics - All of the above
The probability of a random variable is _____.
NONE OF THESE: - .0000 - 1.0000 - .5000 - .2500 - All of the choices above
If, for a specific randomly selected sample, advertising expenditures (X data) are correlated (r = .77) with sales (Y data), we may conclude that ______.
NONE OF THESE: - investing in advertising increases sales - investing in advertising decreases sales - investing in advertising has no effect on sales
The Empirical Rule tells us that ______.
Nearly all of the observed values of X are expected to lie within 3 standard deviations of the mean in a normal distribution
What hypothesis test should be used to test H(sub 1): d > 3
Right tailed, two-sample test of means (paired samples)
Why would you use an absolute cell reference in a formula?
Using an absolute reference allows you to fill a formula down while still pointing to the absolute reference
Using the attached data set for questions 7 - 9, what is the probability that a customer will buy the product if the customer was indifferent to it? Link: https://blackboard.stonybrook.edu/bbcswebdav/pid-6025322-dt-content-rid-52939766_1/xid-52939766_1
approximately .7059
In a two-sample hypothesis test, D0 is equivalent to the ______ in a one-sample hypothesis test.
benchmark
Mutually exclusive events ______.
can be binary
Using the frequency distribution provided in the attached spreadsheet, create a probability distribution and indicate which are the correct approximate values for the mean and standard deviation. Link: https://blackboard.stonybrook.edu/bbcswebdav/pid-6033325-dt-content-rid-53318037_1/xid-53318037_1
4.42 and 1.81
A sampling distribution ______.
ALL OF THE CHOICES ABOVE: - may be assumed to be normal if n is at least 30 - may be assumed to be normal if the underlying population is normal - is a distribution of sample statistics - is the probability distribution of all the possible values an estimator may assume when a random sample of n items is taken
The subjective method of assigning probability ______.
ALL OF THE CHOICES: - is not scientific - is subject to personal bias - is subject to systematic bias - is subject to error
Sample statistics ______.
ALL THE CHOICES ABOVE: - are efficient, unbiased estimators for their respective population parameters - are random variables - tend to be nearer their respective population parameters as n increases - are used to make inferences about their respective populations
The standard error of the mean ______.
ALL THE CHOICES ABOVE: - is a measure of random sampling error - is the standard deviation of a sampling distribution - decreases as n increases - is part of the calculation for the margin of error
Assume that a process has a company-established specification for variation set at 4 minutes. Assume that a well-designed, well-implemented random sample of 150 process time measurements was collected. The mean time was 97.3 minutes, with a standard deviation of 4.77 minutes. Using the company established significance level of .07, is the process probably operating within its company-established specification?
No
Boxplots display ______.
a sense of center, variability, and shape
You can round the value of a cell by
Using the Round formula
The test for normality for the distribution of proportions ______.
is required because we are using the normal approximation to calculate a confidence interval
The response scale [1. Spruce 2. Hemlock 3. Maple 4. Oak 5. Cedar] will yield ___________ data.
nominal
The Central Limit Theorem ______.
tells us that all sampling distributions become approximately normal as n increases regardless of the shape of the underlying population
The best decision makers ...
use the appropriate analysis of data they have satisfied themselves are valid as a tool to inform their decision making.
If the sample distribution is skewed strongly to the right, the mean _____.
will tend to be larger than the median
Assume that the following data were generated using a valid and appropriate methodology. XYZ Data Analytics, Inc. received properly completed surveys from 523 members of the target population for a new product. One hundred seventy nine of those respondents indicated they would buy the new product at the proposed price. Based on these data, construct an interval estimate of the population proportion of the target market that would buy the new product at the proposed price using a confidence level of 92.5%. The lower boundary of the interval estimate is [lower] and the upper boundary is [upper]. Round your answers to four decimal places, and do not use leading zeros. The margin of error for the interval estimate is [moe]. Round your answer to four decimal places, and do not use a leading zero. Assuming the same p would result and keeping the confidence level at 92.5%, the sample size required to assure a margin of error of no more than plus or minus 2 percentage points is [n]. Assume that the population proportion is .5, round your answer to the appropriate level, and use a comma to separate 000s, e.g. 2,445 instead of 2445.
Specified Answer for [lower]: 0.3053 Specified Answer for [upper]:0.3792 Specified Answer for [moe]: 0.0369 Specified Answer for [n]: 1,982
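A sketch of how these four values can be reproduced, assuming the usual normal-approximation interval for a proportion and the sample size formula n = (z / E)^2 * pi * (1 - pi) with pi = .5. It uses the Python standard library in place of Excel.

```python
from math import ceil, sqrt
from statistics import NormalDist

successes, n = 179, 523
confidence_level = 0.925
alpha = 1 - confidence_level

p_hat = successes / n

# Two-tailed critical value for the chosen confidence level
z = NormalDist().inv_cdf(1 - alpha / 2)

moe = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - moe, p_hat + moe

# Required sample size for a margin of error of 2 percentage points,
# assuming a population proportion of .5; always round up
target_moe = 0.02
required_n = ceil((z / target_moe) ** 2 * 0.5 * 0.5)

print(f"lower = {lower:.4f}, upper = {upper:.4f}, moe = {moe:.4f}")
print(f"n = {required_n:,}")
```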
A randomly selected sample data set with n of 75 has a mean of 25.00, a median of 25.50, a range of 10, a standard deviation of 5, an IQR of 4, a skewness coefficient of -0.25, and a CV of ______.
20.00%
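The coefficient of variation is the standard deviation expressed as a percentage of the mean. The other statistics in the question are not needed for it:

```python
mean = 25.00
sd = 5

# Coefficient of variation as a percentage of the mean
cv_percent = sd / mean * 100
print(f"{cv_percent:.2f}%")  # 20.00%
```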
If an organization's webservers have an at-activation reliability of 95%, what number of independent backup servers will be needed to reach an uptime percentage of 99.999%?
3
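A sketch of the reasoning, assuming the system is up whenever at least one server works and that each server independently works with probability .95. The function name `backups_needed` is illustrative. The primary server counts as the first server, so the number of backups is one less than the total.

```python
def backups_needed(reliability, target):
    """Smallest number of backup servers so that P(at least one works) >= target."""
    servers = 1  # start with the primary server alone
    while 1 - (1 - reliability) ** servers < target:
        servers += 1
    return servers - 1  # exclude the primary


print(backups_needed(0.95, 0.99999))  # 3
```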
You can...
All of the above: - change the row height - hide a row - insert and delete a row
How are the Currency and Accounting number formats different?
Both: - The currency symbol is flush to the number in the currency format and flush left in the accounting format - A zero value is represented as a dash in the accounting format and a 0 in the currency format
Statistics ...
Both: - is the process of generating or collecting, organizing, analyzing, interpreting, and presenting data. - are calculations made from sample data.
Why is it so important that we understand the concept of the PDF in a continuous probability distribution if we never use it to calculate the P(X = x)?
Because we must keep in mind that our calculations will be wrong to some extent if the shape of our data distribution differs from the shape defined by the PDF of the model we are using
You can...
Both: - Insert a column - hide a column
Using the attached data set for questions 7 - 9, are the events Will Not Buy and Indifferent independent, and why? Link: https://blackboard.stonybrook.edu/bbcswebdav/pid-6025322-dt-content-rid-52939774_1/xid-52939774_1
No, because the conditional probability of A is not equal to the probability of A
Which data are not continuous or treated as continuous?
Number of defects in production runs
In a two-sample hypothesis test for zero difference, what is the probability that the difference in the two estimates is not due to sampling error?
One minus the p value
How are one-sample hypothesis tests different from confidence intervals?
One-sample hypothesis tests construct intervals and partial intervals around benchmarks instead of intervals around estimators.
The aspect ratio is ________.
a potential source of deception if it is not approximately 1.67
Deconstructing a process into its component parts and determining how those parts relate to each other and the overall purposes of the process is an example of ...
analyzing.
If the average defect rate (# of scratches) for a randomly selected desk is 5, what is the probability of observing more than 7 scratches on a randomly selected desk?
approximately .1334
If the average defect rate (# of scratches) for a randomly selected desk is 5, what is the probability of observing 7 or more scratches on a randomly selected desk?
approximately .2378
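The two scratch questions differ only in whether 7 itself is included. Assuming scratches follow a Poisson distribution with mean 5, both tail probabilities can be sketched with a small helper (`poisson_cdf` is an illustrative name):

```python
from math import exp, factorial


def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson random variable with the given mean."""
    return sum(exp(-mean) * mean**i / factorial(i) for i in range(k + 1))


mean_scratches = 5

more_than_7 = 1 - poisson_cdf(7, mean_scratches)  # P(X > 7)
seven_or_more = 1 - poisson_cdf(6, mean_scratches)  # P(X >= 7)

print(round(more_than_7, 4))    # 0.1334
print(round(seven_or_more, 4))  # 0.2378
```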
If there are 23 defective items within a population of 1,000 items, what is the probability of observing 3 or fewer defects in a sample of 50 items? (Note: You may not use one distribution as an approximation for another distribution.)
approximately .9758
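Because the sample is drawn without replacement from a finite population and no approximation is allowed, the hypergeometric distribution applies. A minimal sketch using exact combinations (`hypergeom_pmf` is an illustrative helper name):

```python
from math import comb


def hypergeom_pmf(k, population, defectives, sample):
    """P(X = k) defectives in a sample drawn without replacement."""
    return (
        comb(defectives, k)
        * comb(population - defectives, sample - k)
        / comb(population, sample)
    )


# P(X <= 3) with 23 defectives among 1,000 items and a sample of 50
prob = sum(hypergeom_pmf(k, 1000, 23, 50) for k in range(4))
print(round(prob, 4))
```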
The P(X ≤ z) when the population mean is 25 and the population standard deviation is 2 and x is 22 is ______.
approximately equal to .0668
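Assuming a normal distribution, this is the area to the left of x = 22, which standardizes to z = (22 - 25) / 2 = -1.50. A quick check with the standard library:

```python
from statistics import NormalDist

dist = NormalDist(mu=25, sigma=2)

# P(X <= 22), equivalent to P(Z <= -1.50)
prob = dist.cdf(22)
print(round(prob, 4))  # 0.0668
```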
If the average time between events is 4 months and the event occurred today, what is the probability that it will take more than another 2 months for the next event to occur?
approximately equal to .6065
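Assuming waiting times follow an exponential distribution with a mean of 4 months, the probability of waiting longer than t is exp(-t / mean). A short sketch:

```python
from math import exp

mean_wait = 4  # average months between events
t = 2          # months

# Exponential survival function: P(X > t) = exp(-t / mean)
prob = exp(-t / mean_wait)
print(round(prob, 4))  # 0.6065
```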
The binomial distribution _____.
can be used to predict the number of successes within a fixed number of trials
Independence between events ______.
cannot occur if the events are mutually exclusive
If the estimate is 29 and the benchmark is 28, because of the probability that the estimate suffers from some random sampling error, it would be logical to ask ______, or is the difference probably due to random sampling error?
is the estimate probably greater than the benchmark
In each of the two sample size equations introduced in this module, E ______.
is the maximum allowable margin of error
The critical value ______.
is the standardized value associated with alpha or alpha/2
Descriptive statistics are _______.
numerical descriptions of data
A random process ______.
produces events that are variable and unknown ahead of time
By using random sampling methods, the analyst demonstrates a bias toward
random sampling error
Study a sample rather than a population when
the act of testing/studying destroys the item being tested/studied
Kurtosis refers to _______.
the amplitude of the distribution peak
Our preliminary analysis of the data we just collected is intended to provide us with insight into ________ of the data set.
the center, variability, and shape
You would reject the null hypothesis if ______.
the p-value is less than alpha
If the data are skewed left, _______.
the mean is dragged to the left of the peak within the distribution
The mean should not be used when ______.
the sample distribution is very skewed
In a given randomly selected sample of 50 items, the kurtosis coefficient is -0.77, which indicates that ______.
we may be 90% confident that the distribution has normal kurtosis
A false positive in one-sample hypothesis tests occurs when ______.
you reject the null hypothesis when it is true