Business Analytics Final

What is the difference between an outlier on a normal distribution and an outlier on a regression model?

An outlier on a normal distribution is a point that lies beyond a chosen number of standard deviations from the mean. An outlier in a regression model is a point whose residual (its distance from the line of best fit) exceeds a chosen boundary.

How many different classifications of correlation are there?

Positive, negative, and no correlation (neutral). A fourth could be "does not exist," which would appear as a circular figure.

Interpret the results of a fitted values versus residuals scatterplot

If the points form a cone shape, heteroscedasticity is present: the error is larger in one region of the scatterplot than in another. This indicates the data are not well suited to linear regression as fitted, because the model assumes the residuals have constant spread.

How does increasing the confidence level affect the margin of error?

Increasing the confidence level widens the range of values on the normal curve encompassed by the confidence interval, thus increasing the margin of error.

Is regression used to help you understand data or to predict data?

Regression is used mostly to predict data. However, the predictions must fall within the interval of x values used to build the model. More specifically, regression predicts the mean of the dependent variable given specific values of the independent variable.

As N increases, how does it affect the standard error?

Standard error decreases when sample size increases - as the sample size gets closer to the true size of the population, the sample means cluster more and more around the true population mean

What is the difference between standard error and standard deviation?

Standard error is the variability across multiple samples of the data, while standard deviation is the variability of the data within a single sample. The standard error is the standard deviation of the sample means: the standard deviation divided by the square root of the sample size.

When modeling data in time series, why is knowing the units super important? What will this help you interpret?

Units are important because we need to know whether the points we are predicting take place over days, weeks, months, or potentially years. Without this information, the predictions we make may not line up with reality. Knowing the units also helps you interpret the mean absolute deviation (MAD), which is expressed in those units.

What is the sample size for the normal distribution?

infinity

56. The data set "Weather" contains information on weather conditions for five cities in both 2016 and 2017. We are interested in using the "average" columns for the variables of temperature, dew point, sea level pressure, visibility, and wind speed to predict average humidity in Beijing (both 2016 and 2017). Using the best subsets method, find the most adequate multi-regression model to predict average dew point. a. What predictor variables are included in your best subsets model? b. What statistical measures make this model the best?

** Believe there is a typo in the problem; this will use average humidity as the variable being predicted ** a. avg_temp, avg_dewpt b. large r² value, small difference between adjusted r² and predicted r², Mallows' Cp close to the number of variables, low S value

What does cleaning data mean in statistics?

Cleaning data refers to the process of removing invalid data points from a dataset.

As the sample size increases, how does it affect the standard deviation of the sampling distribution?

As the sample size increases, the standard deviation of the sample means decreases; as the sample size decreases, the standard deviation of the sample means increases.

When is a Type 1 error likely to happen?

sample size is really large

The data set "SandwichAnts" contains sample information about 48 sandwiches with different configurations (bread type, filling and whether butter was present) and the amount of ants that had collected to enjoy the sandwich. Does the data indicate that the presence of butter on the sandwich results in a different amount of ants observed on a sandwich? Use 𝛼 = 0.05. a. What is the level of significance? b. State the null and alternative hypotheses. c. What sampling distribution will you use? What assumptions are you making? d. What is the value of the sample test statistic? e. Find the P-value. f. Sketch the sampling distribution and show the area corresponding to the P-value. g. Will you reject or fail to reject the null hypothesis? Are the data statistically significant at level 𝛼? reject the null hypothesis h. State your conclusion in the context of the application. i. Find a 95% confidence interval for μ1 − μ2. Explain the meaning of the confidence interval in the context of the problem.

** NOT SURE BUT I THINK** a. 0.05 b. u1-u2=0, u1-u2 not equal 0 c. t distribution- sigma not known d. t=2.61 e. p=0.012 g. reject null hypothesis, stat. significant h. rejection of null means difference in buttered vs non buttered is stat significantly different i. (2.44, 19.06)

60. Using the data set called "question" (also used on 11/16), run multiple simulations and interpret the results from each of them.

** This answer is based on the question from Midterm 2 ** In the RISK analysis, the expected profit would be $19,900.99, looking at the indicated mean. Someone should invest $50,000 in this company under option 3 because it gives a 59.4% chance of either breaking even or making a profit. With this in mind, it would probably not make sense for an investor to invest using Option A or B, because there is a greater chance of the investor losing money or only breaking even (99.6% in Option A and 69% in Option B). As a result, it is likely that there would be no profit, with a big chance of losing money from the investment.

47. The United States Department of Transportation, National Highway Traffic Safety Administration, reported that 77% of all fatally injured automobile drivers were intoxicated. A random sample of 27 records of automobile driver fatalities in Kit Carson County, Colorado, showed that 15 involved an intoxicated driver. Do these data indicate that the population proportion of driver fatalities related to alcohol is less than 77% in Kit Carson County? Use an alpha level of 1%.

** USES NORMAL APPROXIMATION INSTEAD OF EXACT ** p = 0.004. The p value is less than alpha (0.01), so reject the null hypothesis; the data are statistically significant.
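
The normal-approximation calculation behind that p value can be sketched as follows. This is a minimal illustration, assuming a left-tailed one-proportion z test with the standard error computed under the null value of 0.77; the variable names are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.77        # hypothesized population proportion
n = 27           # number of fatality records sampled
successes = 15   # records involving an intoxicated driver

p_hat = successes / n
# Standard error under the null hypothesis (normal approximation)
std_error = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / std_error
# Left-tailed test: area to the left of z under the standard normal curve
p_value = NormalDist().cdf(z)

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, p value = {p_value:.3f}")
```

Because the p value comes out below 0.01, the sketch agrees with the decision to reject the null hypothesis.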

What are the boundaries for the Pearson r test?

-1 to 1

What are the four types of data measurements?

1. Nominal: can be counted but not ordered or measured (name, sex, eye color, etc.) 2. Ordinal: values can be ranked but not measured (house numbers) 3. Interval: numeric scales in which we know both the order and the exact differences between the values 4. Ratio: a measurement scale that gives the order of the values and the differences between them, and also has a true zero

After running a two sample hypothesis test why would it be a good idea to run a two sample confidence test? What are you looking to confirm or deny? What inference could you make from this?

After running a two sample hypothesis test, creating a two sample CI is useful for assessing whether the outcome of the two sample HT makes sense. If the two sample HT returns a failure to reject (not statistically significantly different), we could check whether the mean values of our two samples fall within the two sample CI. If not, we would then investigate whether this is really a Type 2 error.

Two simulations models are being run. One is simulating the values 100 times, the other is simulating the values 500 times. Which concept in stats are these simulation models demonstrating? Is one of these models better than the other?

Concept: Central Limit Theorem (average of an average), demonstrated using Monte Carlo simulation. The model run 500 times is better than the one run 100 times, because more samples give a more accurate result.

What do confidence intervals measure?

A confidence interval gives a range of values that, at the stated confidence level, is expected to contain the population parameter.

Assume that x has a normal distribution with the specified mean and standard deviation. Find the indicated probability. (Round your answer to four decimal places.) μ = 15.1; σ = 4.1 a. Find P(10 ≤ x ≤ 26)

Use =NORM.DIST with the stated mean and standard deviation (or convert to z-scores) to get the cumulative probability at 10 and at 26. Subtract the smaller probability from the larger: 0.8893
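
The same subtraction of cumulative probabilities can be sketched in Python. This is only an illustration using the standard library's NormalDist in place of the spreadsheet function; the variable names are assumptions for the example.

```python
from statistics import NormalDist

# x is assumed normal with the mean and standard deviation from the problem
x_dist = NormalDist(mu=15.1, sigma=4.1)

# P(10 <= x <= 26) is the cumulative probability at 26 minus the one at 10
prob = x_dist.cdf(26) - x_dist.cdf(10)

print(round(prob, 4))
```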

What is the purpose of dummy variables in regression modeling?

Dummy variables serve the purpose of handling categorical data in regression modeling

What is the meaning when a statistician says these two averages are statistically equivalent?

Even though the averages are numerically different, in a statistical context they are considered equal, or "equivalent". The difference between the numbers is not statistically significant.

Is it appropriate to use prediction that is outside of the parameter?

It is not appropriate, because that is extrapolation. Using a prediction outside of the parameter is appropriate only in a time series analysis.

After running a statistical test, you have determined a p value of 0.025. Why would this result make it difficult to report a conclusion of the data?

It would be difficult based on the alpha value. Because the p value is between the alpha values of 0.05 and 0.01, you would get different conclusions. If the p value is greater than alpha, you would fail to reject the null hypothesis. If p value is less than alpha, you would reject the null hypothesis

After running a statistical test, you did not find the result you were looking for. When would it be appropriate to retest the exact sample and add more parameters to it?

Never (with the exact same sample). "Exact" is the keyword to an example like this. If the parameters of the test did not relate to the data at all, then it would be ok to replace parameters with others and retest the exact same sample

When looking at your residual analysis and seeing all of the desirable graphs demonstrate a strong correlation between your independent and dependent variables, is this enough evidence to say your independent variable makes a good predictor for your dependent variable? If yes, state why. If no explain what other factors you need to consider.

No, residuals "only tell a part of the story" giving the error associated with the line of best fit. Other factors need to be considered such as the r squared value, r, data sets themselves, etc.

You want to run a regression analysis to determine if there is a strong correlation between student study hours and exam grades. You run this model at the beginning of the semester, when the students you are surveying have taken only one exam. Your model shows an r value of 0.5. You survey that exact sample of students after they have taken a second exam. You then create a regression model that combines both data sets, and your r value is now 0.8. You now make the claim that there is a strong positive correlation between hours studying and exam grades. Is this statement correct? Why or why not? Discuss the concept of sample size and correlation between independent variables (collinearity) in your response.

Statement is incorrect, it is unethical to retest the exact same sample and add more parameters. It is also unethical to combine data that are unrelated to each other

What is the purpose of the Central Limit Theorem

The average of an average an infinite amount of times. The central limit theorem tells us that no matter what the distribution of the population is, the shape of the sampling distribution will approach normality as the sample size (N) increases

What is the prediction difference between regression and time series analysis

The prediction difference between regression and time series lies in the x interval. Regression allows you to make predictions within the interval of the data and to discuss the error (distance from the line). Time series, on the other hand, allows you to make predictions for future events. These future events occur outside of the interval, and the error also changes based on how far in the future the event will occur.

If an observational point is far from the line of least squares, would its residual be a large or small number, relatively speaking?

The residual would be a large number, relatively speaking, because the point is far from the line of best fit.

The purpose of a simulation is to make a list of probabilistic outcomes. Is this statement true or false

True, we run the simulation to make a list of probabilistic outcomes over a period of time based on the original variables

How is it possible that two different statisticians can get two different results when performing a hypothesis test on the same data? Both of these statisticians got the exact same p value yet came to two different conclusions. How is this possible?

Two statisticians had two different thresholds of significance in the form of alpha values. Based on whether the p value was greater or less than the alpha value, the two statisticians would have different conclusions and interpretations

What are the type 1 and type 2 errors?

Type 1: False positive. We incorrectly reject a true null hypothesis. This commonly occurs when the sample is too large. Type 2: False negative. We incorrectly fail to reject a false null hypothesis. This commonly occurs when the sample size used is too small.

What values are essential to know in order to find the margin of error?

You need to know the critical t or z score, the standard deviation (population or sample), and the number of observations

Interpret the results of a residual histogram

You want the residuals in a residual histogram to resemble a normal distribution as closely as possible, which means the data are usable and the model is strong

The price to earnings ratio (P/E) is an important tool in financial work. A random sample of 14 large U.S. banks (J. P. Morgan, Bank of America, and others) gave the following P/E ratios. 24 16 22 14 12 13 17 19 23 11 18 Generally speaking, a low P/E ratio indicates a "value" or bargain stock. Suppose a recent copy of a magazine indicated that the P/E ratio of a certain stock index is μ = 18. Let x be a random variable representing the P/E ratio of all large U.S. bank stocks. We assume that x has a normal distribution and σ = 4.4. Do these data indicate that the P/E ratio of all U.S. bank stocks is less than 18? Use 𝛼 = 0.10. a. What is the sample mean? b. What is the level of significance? c. State the null and alternative hypotheses. Will you use a left-tailed, right-tailed, or two-tailed test? d. What sampling distribution will you use? Explain the rationale for your choice of sampling distribution. e. Compute the z value of the sample test statistic. f. Find (or estimate) the P-value. g. Sketch the sampling distribution and show the area corresponding to the P-value. h. Will you reject or fail to reject the null hypothesis? Are the data statistically significant at level 𝛼? i. State your conclusion in the context of the application.

a) 17.1818 b) 10% c) H(0): u=18 H(1): u<18, left-tailed d) Normal, known sigma e) z=-0.62 (not exactly sure) f) 0.269 g) (Sketch) h) fail to reject; the data are not statistically significant at this level of significance i) in context, the sample does not give enough evidence to conclude that the P/E ratio of all large U.S. bank stocks is less than 18
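
The z value and P-value above can be checked with a short sketch. Note that the problem statement mentions 14 banks but lists only 11 values; this illustration assumes the 11 listed values are the sample, which is what reproduces the sample mean of 17.1818. Variable names are made up for the example.

```python
from math import sqrt
from statistics import NormalDist, mean

# The 11 P/E ratios actually listed in the problem
pe_ratios = [24, 16, 22, 14, 12, 13, 17, 19, 23, 11, 18]
mu0 = 18      # hypothesized mean P/E ratio
sigma = 4.4   # known population standard deviation

x_bar = mean(pe_ratios)
# z statistic for a mean with known sigma
z = (x_bar - mu0) / (sigma / sqrt(len(pe_ratios)))
# Left-tailed test (H1: mu < 18)
p_value = NormalDist().cdf(z)

print(f"x_bar = {x_bar:.4f}, z = {z:.2f}, P-value = {p_value:.3f}")
```

Since the P-value is larger than 0.10, the sketch is consistent with failing to reject the null hypothesis.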

The data set "SandwichAnts" contains sample information about 48 sandwiches with different configurations (bread type, filling and whether butter was present) and the amount of ants that had collected to enjoy the sandwich. Dominic, the student who conducted this experiment, had a friend that bet him $10 that he could not get an average higher than 45 ants on the sandwiches he made. Does the data that Dominic collected suggest that the average ant count was above 45 and that he doesn't lose $10? Use 𝛼 = 0.05. a. What is the sample mean and sample standard deviation? b. What is the level of significance? c. State the null and alternate hypothesis. d. What distribution will you use? Explain the rationale for your choice of sampling distribution. e. What is the value of the sample test statistic? f. Estimate the P value. g. Sketch the sampling distribution and show the area corresponding to the P value. h. Will you reject or fail to reject the null hypothesis? Is the data statistically significant at the alpha level? i. Who wins the bet? Interpret your conclusion in the context of the problem.

a) 43.5, 15.1489 b) 5% c) H0: u=45, H1: u>45 d) t distribution, sigma is unknown e) t= -0.69 f) 0.752 (not exactly sure) g) sketch h) fail to reject the null hypothesis, not statistically significant i) Dominic's friend wins the bet. The test does not show that the average number of ants on the sandwiches is statistically significantly greater than 45, so the data do not support the claim that the average exceeds 45

58. The data set "Weather" contains information on weather conditions for five cities in both 2016 and 2017. Using the data in this set for Beijing (both 2016 and 2017), answer the following questions. a. Create a histogram of residuals for a regression model built to predict average temperature from average wind speed. What type of correlation does this graph imply? Interpret this graph in context. b. Create a graph that plots the y-hat (fitted values) vs. the residuals for a regression model built to predict average temperature from average dew point. What type of correlation does this graph imply? Interpret this graph in context.

a) very weak; the residuals are not normally distributed b) homoscedasticity

54. Do heavier cars really use more gasoline? Suppose a car is chosen at random. Let x be the weight of the car (in hundreds of pounds), and let y be the miles per gallon (mpg). X: 30 42 29 47 23 40 34 52 Y: 31 18 24 13 29 17 21 13 a. Create a scatter diagram displaying the data. What is the r value? b. Find x-bar, y-bar, and the equation of the least-squares line. c. Find the value of the coefficient of determination r². What percentage of the variation in y can be explained by the corresponding variation in x and the least squares line? What percentage is unexplained? d. Suppose a car weighs x = 31 (hundred pounds). What does the least-squares line forecast for y = miles per gallon? (Round your answer to two decimal places.)

a. r = -0.92 b. x-bar: 37.125, y-bar: 20.75, y= -0.6393x+44.485 c. r squared = 0.85, 85% explained, 15% unexplained d. y=24.67
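
The least-squares values above can be reproduced from the eight data points in the problem. This is a minimal sketch using the textbook sums-of-squares formulas; the variable names are assumptions for the example, and a spreadsheet or statistics package would give the same fit.

```python
from math import sqrt

weights = [30, 42, 29, 47, 23, 40, 34, 52]   # x: weight in hundreds of pounds
mpg = [31, 18, 24, 13, 29, 17, 21, 13]       # y: miles per gallon

n = len(weights)
x_bar = sum(weights) / n
y_bar = sum(mpg) / n

# Sums of squares and cross-products about the means
s_xx = sum((x - x_bar) ** 2 for x in weights)
s_yy = sum((y - y_bar) ** 2 for y in mpg)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, mpg))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
r = s_xy / sqrt(s_xx * s_yy)    # correlation coefficient
r_squared = r ** 2              # coefficient of determination

# Forecast for a car weighing x = 31 (hundred pounds)
y_hat = intercept + slope * 31

print(f"y = {slope:.4f}x + {intercept:.3f}, r = {r:.2f}, r^2 = {r_squared:.2f}")
print(f"forecast at x = 31: {y_hat:.2f} mpg")
```

The value x = 31 lies inside the range of observed weights (23 to 52), so the forecast is an interpolation rather than an extrapolation.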

Accrotime is a manufacturer of quartz crystal watches. Accrotime researchers have shown that the watches have an average life of 30 months before certain electronic components deteriorate, causing the watch to become unreliable. The standard deviation of watch lifetimes is 4 months, and the distribution of lifetimes is normal. a. If Accrotime guarantees a full refund on any defective watch for 2 years after purchase, what percentage of total production will the company expect to replace? (Round your answer to two decimal places.) b. If Accrotime does not want to make refunds on more than 6% of the watches it makes, how long should the guarantee period be (to the nearest month)?

a. 0.0668 (6.68%) b. 24 months
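
Both parts can be sketched with the normal distribution from Python's standard library. This is an illustration only: part a is a cumulative probability at 24 months, and part b inverts the distribution at 6%. Variable names are made up for the example.

```python
from statistics import NormalDist

# Watch lifetime in months, assumed normal as stated in the problem
lifetime = NormalDist(mu=30, sigma=4)

# a. Share of watches failing within the 2-year (24-month) guarantee
share_replaced = lifetime.cdf(24)

# b. Lifetime below which only 6% of watches fall, to the nearest month
guarantee_months = round(lifetime.inv_cdf(0.06))

print(f"{share_replaced:.4f}", guarantee_months)
```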

A person's blood glucose level and diabetes are closely related. Let x be a random variable measured in milligrams of glucose per deciliter (1/10 of a liter) of blood. Suppose that after a 12-hour fast, the random variable x will have a distribution that is approximately normal with mean μ = 89 and standard deviation σ = 21. Note: After 50 years of age, both the mean and standard deviation tend to increase. For an adult (under 50) after a 12-hour fast, find the following probabilities. (Round your answers to four decimal places.)

a. 0.9163 b. 0.8413 c. 0.7577 d.0.0432

In this problem, assume that the distribution of differences is approximately normal. Note: For degrees of freedom d.f. not in the Student's t table, use the closest d.f. that is smaller. In some situations, this choice of d.f. may increase the P-value by a small amount and therefore produce a slightly more "conservative" answer. Is fishing better from a boat or from the shore? Pyramid Lake is located on the Paiute Indian Reservation in Nevada. Presidents, movie stars, and people who just want to catch fish go to Pyramid Lake for really large cutthroat trout. Let row B represent hours per fish caught fishing from the shore, and let row A represent hours per fish caught using a boat. The following data are paired by month from October through April. Oct Nov dec jan feb march april B:shore 1.6 1.8 1.9 3.2 3.9 3.6 3.3 A:boat 1.5 1.3 1.5 2.2 3.3 3.0 3.8 Use a 1% level of significance to test if there is a difference in the population mean hours per fish caught using a boat compared with fishing from the shore. a. What is the level of significance? 1% b. State the null and alternative hypotheses. Will you use a left-tailed, right-tailed, or two-tailed test? Two tailed, difference H0: mu1-mu2=0, H1: mu1-mu2 does not equal 0 c. What sampling distribution will you use? What assumptions are you making? d. What is the value of the sample test statistic? 2.15 e. Find (or estimate) the P-value. 0.075 f. Sketch the sampling distribution and show the area corresponding to the P-value. g. Will you reject or fail to reject the null hypothesis? Are the data statistically significant at level 𝛼? h. State your conclusion in the context of the application.

a. 1% b. H0: u1-u2=0, H1: u1-u2 not equal to 0, two-tailed test c. t distribution, assuming the paired differences are approximately normal d. t=2.15 e. 0.075 f. sketch g. fail to reject, not stat. significant h. no stat significant difference between fishing from a boat and from the shore
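
The test statistic for this paired design can be sketched from the monthly data in the problem. This minimal example computes only the t value from the shore-minus-boat differences; the P-value needs a t distribution with 6 degrees of freedom, which is not in Python's standard library, so it is left to a table or statistics package. Variable names are assumptions for the example.

```python
from math import sqrt
from statistics import mean, stdev

# Hours per fish caught, paired by month (Oct through April)
shore = [1.6, 1.8, 1.9, 3.2, 3.9, 3.6, 3.3]
boat = [1.5, 1.3, 1.5, 2.2, 3.3, 3.0, 3.8]

# Paired test: work with the month-by-month differences
diffs = [b - a for b, a in zip(shore, boat)]
d_bar = mean(diffs)
s_d = stdev(diffs)   # sample standard deviation of the differences

t_stat = d_bar / (s_d / sqrt(len(diffs)))
deg_freedom = len(diffs) - 1

print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, t = {t_stat:.2f}, d.f. = {deg_freedom}")
```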

42. Let x be a random variable that represents hemoglobin count (HC) in grams per 100 milliliters of whole blood. Then x has a distribution that is approximately normal, with a population mean of about 14 for healthy adult women. Suppose that a female patient has taken 10 laboratory blood tests during the past year. The HC data sent to the patient's doctor are as follows. 14 18 16 19 11 15 15 16 15 12 a. What is the sample mean and sample standard deviation? Does this information indicate that the population average HC for this patient is higher than 14? Use 𝛼 = 0.01. b. What is the level of significance? c. State the null and alternate hypothesis. d. What distribution will you use? Explain the rationale for your choice of sampling distribution. e. What is the value of the sample test statistic? f. Estimate the P value. g. Sketch the sampling distribution and show the area corresponding to the P value. h. Will you reject or fail to reject the null hypothesis? Is the data statistically significant at the alpha level? i. Interpret your conclusion in the context of the application.

a. 15.1 and 2.4244 b. 1% c. H0: u=14, H1: u>14 d. t-distribution (sigma is unknown) e. 1.43 (not exactly sure) f. 0.093 g. (sketch) h. Fail to reject, not statistically significant i. In context, failure to reject means the population average HC for this patient is not statistically significantly greater than 14
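
The sample mean, sample standard deviation, standard error, and t statistic can be sketched from the ten HC readings in the problem. This is a minimal illustration; the P-value requires a t distribution with 9 degrees of freedom and is not computed here. Variable names are made up for the example.

```python
from math import sqrt
from statistics import mean, stdev

hc_readings = [14, 18, 16, 19, 11, 15, 15, 16, 15, 12]
mu0 = 14   # population mean for healthy adult women

x_bar = mean(hc_readings)
s = stdev(hc_readings)                   # sample standard deviation
std_error = s / sqrt(len(hc_readings))   # standard error of the mean

t_stat = (x_bar - mu0) / std_error

print(f"mean = {x_bar}, s = {s:.4f}, SE = {std_error:.4f}, t = {t_stat:.2f}")
```

The standard error here is the sample standard deviation divided by the square root of the sample size, which is why it is smaller than the standard deviation itself.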

44. A random sample of n1 = 16 communities in western Kansas gave the following information for people under 25 years of age. x1: Rate of hay fever per 1000 population for people under 25 97 94 124 129 90 123 112 93 125 95 125 117 97 122 127 88 A random sample of n2 = 14 regions in western Kansas gave the following information for people over 50 years old. x2: Rate of hay fever per 1000 population for people over 50 92 105 101 97 115 88 110 79 115 100 89 114 85 96 Assume that the hay fever rate in each age group has an approximately normal distribution. Does the data indicate that the age group over 50 has a lower rate of hay fever? Use 𝛼 = 0.05. a. What is the level of significance? b. State the null and alternative hypotheses. c. What sampling distribution will you use? What assumptions are you making? d. What is the value of the sample test statistic? e. Find the P-value. f. Sketch the sampling distribution and show the area corresponding to the P-value. g. Will you reject or fail to reject the null hypothesis? Are the data statistically significant at level 𝛼? h. State your conclusion in the context of the application. i. Find a 90% confidence interval for μ1 − μ2.Explain the meaning of the confidence interval in the context of the problem.

a. 5% b. H0: u1-u2=0, H1: u1-u2>0 c. t distribution- sigma not known d. t=2.18 e. 0.019 f. Sketch g. reject the null hypothesis, statistically significant h. rejection of the null hypothesis means the data indicate that the age group over 50 has a lower rate of hay fever than the age group under 25 i. (2.36, 19.39). We are 90% confident that the true difference in hay fever rates between the two age groups (under 25 minus over 50) lies between 2.36 and 19.39 per 1000

Total plasma volume is important in determining the required plasma component in blood replacement therapy for a person undergoing surgery. Plasma volume is influenced by the overall health and physical activity of an individual. Suppose that a random sample of 41 male firefighters are tested and that they have a plasma volume sample mean of x-bar = 37.5 ml/kg (milliliters plasma per kilogram body weight). Assume that σ = 7.10 ml/kg for the distribution of blood plasma. a. Find a 99% confidence interval for the population mean blood plasma volume in male firefighters. What is the margin of error? (Round your answers to two decimal places.) Lower limit and upper limit? b. What conditions are necessary for your calculations? c. Interpret your results in the context of this problem. d. Find the sample size necessary for a 99% confidence level with maximal margin of error E = 2.50 for the mean plasma volume in male firefighters. (Round up to the nearest whole number.)

a. Margin of error: 2.86 LL: 34.64 UL: 40.36 b. sigma is known c. we are 99% confident that the true mean blood plasma volume in male firefighters is between 34.6393 and 40.3607 ml/kg d. Find Zc using the z table = 2.58, E = 2.5, s.d. = 7.1. Zc*s.d. = 18.318; divide by E = 7.3272; then square it = 53.687, which rounds up to 54
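
The margin of error and required sample size can be sketched as below. One stated difference from the answer above: this illustration takes the critical z value from the inverse normal distribution (about 2.576) instead of the table value 2.58, so intermediate figures differ slightly while the rounded results stay the same. Variable names are assumptions for the example.

```python
from math import sqrt, ceil
from statistics import NormalDist

x_bar = 37.5       # sample mean plasma volume, ml/kg
sigma = 7.10       # known population standard deviation, ml/kg
n = 41             # sample size
confidence = 0.99

# Critical z value that leaves (1 - confidence) / 2 in each tail
z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z_c * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

# d. Smallest n whose margin of error is at most E = 2.50
max_error = 2.50
n_required = ceil((z_c * sigma / max_error) ** 2)

print(f"E = {margin:.2f}, interval = ({lower:.2f}, {upper:.2f}), n = {n_required}")
```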

The data set "SandwichAnts" contains sample information about 48 sandwiches with different configurations (bread type, filling and whether butter was present) and the amount of ants that had collected to enjoy the sandwich. We are interested in learning specifically more about sandwiches made with rye bread. a. Calculate the mean and sample standard deviation for the amount of ants observed on sandwiches made with rye bread. b. Find a 95% confidence interval for the population average number of ants that can be found on sandwiches made with rye bread. What is the margin of error, lower limit, and upper limit?

a. Mean: 42.25 S SD: 15.8638 b. margin of error: 10.0794 UL: 52.3294 LL: 32.1706

What percentage of hospitals provide at least some charity care? Based on a random sample of hospital reports from eastern states, the following information is obtained (units in percentage of hospitals providing at least some charity care) 56.5 56.5 52.8 65.5 59.0 70.1 64.7 53.5 78.2 Assume that the population of x values has an approximately normal distribution. a. Use a calculator with mean and sample standard deviation keys to find the sample mean percentage x-bar and the sample standard deviation s. (Round your answers to one decimal place.) b. Find a 90% confidence interval for the population average μ of the percentage of hospitals providing at least some charity care. (Round your answers to one decimal place.)

a. Mean= 61.9 Sample standard deviation= 8.5 b. (56.6, 67.1). Find the estimated margin of error, then add it to and subtract it from the sample mean

The data set "Weather" contains information on weather conditions for five cities in both 2016 and 2017. We are interested in using the average dew point as a predictor for average temperature in Beijing. Using the data in this set for Beijing (both 2016 and 2017), answer the following questions. a. Create a scatter plot diagram displaying average dew point and average temperature. Describe the correlation's strength and direction. b. Find x-bar, y-bar, and the equation of the least-squares line. c. Find the value of the coefficient of determination r². What percentage of the variation in y can be explained by the corresponding variation in x and the least squares line? What percentage is unexplained? d. Suppose dew point one day is measured at 60 (x = 60). Is the model you created appropriate for predicting this outcome? If yes, calculate the value of y-hat. If not, explain why.

a. r= 0.9176, positive b. x bar: 36.63, y bar: 55.48, y=0.7537x+27.876 c. r squared = 0.842, 84.2%, 15.8% unexplained d. =73.098

59. The data set "Weather" contains information on weather conditions for five cities in both 2016 and 2017. Using the data in this set for Beijing (both 2016 and 2017), answer the following questions. a. Create and interpret a time series plot (time series plot - simple) for the variable of average temperature. Is it best described as trend, seasonal, cyclic or irregular? Does look seasonal, does look cyclic (still peaks and valleys), few outliers, but no trend. b. Run a trend analysis on average temperature. Which model type (linear, quadratic, or exponential) fits the data best? Interpret the c. Using the average humidity variable, create an auto-regression model with two lags. With your model, predict the average humidity for the next day beyond the data set. Interpret your prediction and its strength using r2 in context.

a. seasonal and cyclic b. MAPE: the predictions are off by 43.8 percent on average, because the data are seasonal. MAD: on average, a point is 17.9 degrees away from the line c. predicted average humidity = 45.284%, r squared = 56.11%

48. Historically, about 80% of students should score above 70 on their first exam. Using the data set "Exam_Grades," determine if there is a difference from the above percent and the (one-sample) proportion of students who scored above a 70 using the data from exam 1. Use 𝛼 = 0.05 a. State the null and alternative hypotheses. b. What is the value of the test statistic? c. What is the P-value? d. Sketch the sampling distribution and show the area corresponding to the P-value. e. Will you reject or fail to reject the null hypothesis? Are the data statistically significant at the level 𝛼? f. Interpret your conclusion in the context of the application.

a. H0: p=0.8, H1: p is not equal to 0.8 b. 1.21 c. 0.225 d. sketch e. fail to reject, not stat sig f. the proportion of students scoring above 70 is not stat sig different from 80%

standard deviation

distance away from the mean

57. The data set "Weather" contains information on weather conditions for five cities in both 2016 and 2017. Using the data in this set for Beijing (both 2016 and 2017), answer the following question: The "event" column in this data set notes the occurrence of weather events and specifies what happened each day. You have been asked to convert this "event" column into dummy variables so it can be useful in regression analysis. Using the dependent variable temperature and the independent variable humidity, accompanied with whether an event occurred (no matter what kind) or did not occur, create a multi regression analysis equation and interpret its results.

dummy variable (0), when the event is NA: avg_temp = 36.85 + 0.3325 avg_humidity; dummy variable (1), when the event is not NA: avg_temp = 40.22 + 0.3325 avg_humidity; r² = 14.18%
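
The two equations above are one regression model in which the dummy variable shifts the intercept. A minimal sketch of that idea follows; the function name and the humidity value of 50 are hypothetical, and the coefficients are copied from the answer, not refit from the data set.

```python
def predict_avg_temp(avg_humidity: float, event_occurred: bool) -> float:
    """Evaluate the fitted model avg_temp = b0 + b1 * humidity + b2 * event."""
    intercept = 36.85               # intercept when no event occurred (dummy = 0)
    humidity_slope = 0.3325         # same slope for both groups
    event_shift = 40.22 - 36.85     # dummy coefficient: intercept shift when an event occurred
    dummy = 1 if event_occurred else 0
    return intercept + humidity_slope * avg_humidity + event_shift * dummy


# Hypothetical input: a day with average humidity of 50
no_event = predict_avg_temp(50, event_occurred=False)
with_event = predict_avg_temp(50, event_occurred=True)
print(round(no_event, 3), round(with_event, 3))
```

At any humidity level the two predictions differ by the dummy coefficient, about 3.37 degrees. With r² at 14.18%, the model explains only a small share of the variation in temperature.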

What are the three scenarios we test for hypothesis testing?

left tail, right tail, two tail

The data set "SandwichAnts" contains sample information about 48 sandwiches with different configurations (bread type, filling and whether butter was present) and the amount of ants that had collected to enjoy the sandwich. For the column "Ants," calculate the mean and standard deviation, and find the z-scores for each observation. Are there any outliers in this data set? How did you determine whether an observation was an outlier or not?

Mean = 43.5, S.D. = 15.14. Calculate the z-score for the first observation, using $ (absolute references) on the mean and standard deviation cells, and then drag the formula down the column. Count the outliers from the z-scores with the count function

49. Would you favor spending more federal tax money on the arts? This question was asked by a research group on behalf of the national Institute. In a random sample of 220 women, 59 responded yes. Another random sample of 175 men showed that 56 responded yes. Does this information indicate a difference between the population proportion of men and the population proportion of women who favor spending more federal tax dollars on the arts? Use an alpha level of 1%.

p value: 0.262. Fail to reject the null hypothesis. There is no statistically significant difference between the proportions of men and women who favor spending more federal tax money on the arts, so the data do not show an effect of gender
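
A two-proportion z test along these lines can be sketched as follows. One assumption to flag: this illustration uses the unpooled standard error, which reproduces the reported p value of about 0.262; a pooled standard error, which many textbooks use for this test, gives a slightly different value of about 0.260. Variable names are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

yes_women, n_women = 59, 220
yes_men, n_men = 56, 175

p_women = yes_women / n_women
p_men = yes_men / n_men

# Unpooled standard error of the difference in sample proportions (assumption)
std_error = sqrt(p_women * (1 - p_women) / n_women + p_men * (1 - p_men) / n_men)
z = (p_women - p_men) / std_error

# Two-tailed test: double the area in the tail beyond |z|
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.2f}, p value = {p_value:.3f}")
```

Either version of the standard error gives a p value far above 0.01, so the decision to fail to reject is the same.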

When is the t distribution equivalent to the normal distribution?

sample size is infinity

Why do we use degrees of freedom when sampling?

use degrees of freedom to account for human error

