exam 3 qmb


Use the Unemployment Data Set that is posted in the instructions. This data is the unemployment rate in the US from 2016/01/01 until 2020/02/01. Use JMP and find the autoregressive model with four lags. Are they all significant at an alpha level of 0.10? Select the correct response from the options below. Remember to remove any insignificant variables one at a time.

Only lags 1 and 2 are significant. To compute this model, use Fit Model. Enter the unemployment rate for Y and enter the first four lags for X.

What method is used in neural networks to transform the data?

Propagation - transforming the data into a "S"-shaped curve

Given the following situation, determine if the data is rank data. An NFL stadium manager wants to compare the Customer Satisfaction levels of two different stadium locations. He randomly selects 30 customers from Stadium A and Stadium B, and he compares the results. The Customer Satisfaction Survey measures responses as "Highly Satisfied", "Satisfied", "Neither", "Unsatisfied", and "Highly Unsatisfied." Is this rank data?

Rank data. Definition: Rank data is data that has a distinctive order, but the difference between each item in the ranking may not be the same. Example: If you consider the rankings of products on Amazon, a four-star product is not twice as good as a two-star product. However, the four-star product is certainly better than the two-star.

Answer the following True/False statement about Data Mining. Data Mining cannot solve un-asked business questions.

true

A farmer is going to plant some carrots in his open plot. He has a 30% chance to have a good yield, 50% chance for an average yield, and a 20% chance for a bad yield. The payoffs for a good, average, and bad yield are $6,000, $4,000, and $2,000 respectively. What is the expected value?

$4,200. Expected value = 0.30($6,000) + 0.50($4,000) + 0.20($2,000) = $1,800 + $2,000 + $400 = $4,200. See the Module 23 video.
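The expected value here is just a probability-weighted sum of the payoffs. A minimal Python sketch of that arithmetic (the function name `expected_value` is made up for illustration and is not part of the course materials):

```python
# Expected value of a payoff table: sum of probability * payoff.
def expected_value(probabilities, payoffs):
    # Guard against a probability table that does not sum to 1.
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for p, x in zip(probabilities, payoffs))

# Carrot yields: good, average, bad.
carrot_ev = expected_value([0.30, 0.50, 0.20], [6000, 4000, 2000])
print(round(carrot_ev, 2))
```

The same helper can be reused for any action in a decision tree by passing that action's payoffs.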

A farmer wants to decide if he should plant corn or lima beans. No matter what he plants he will get either a good, average, or bad crop yield. For corn, a good yield will give him a profit of $7,000, an average yield will profit $3,000, and a bad yield will give a profit of $1,000. For lima beans, the profits for good, average, and bad yields are as follows: $5,000, $3,500, and $2,000. For either crop, the probability of a good yield is 0.4, the probability of an average yield is 0.5, and the probability of a bad yield is 0.1. What are the outcomes for lima beans?

$5,000, $3,500, $2000

Determine if each data set qualifies as a time series data set: Sales per year of cars at a dealership Count of social media views per week Rank of SEC football teams in the AP Poll Average market rate or return

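Sales per year of cars at a dealership: time series data. Count of social media views per week: time series data. Rank of SEC football teams in the AP Poll: not time series. Average market rate or return: not time series.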

Suppose a sales manager wants to compare different sales promotions. He chooses 5 different promotions and samples 10 random stores for each different promotion. The F value is 3.4. Using JMP, find the correct p-value.

0.0163. The numerator df is k - 1 = 5 - 1 = 4. The denominator df is N - k. Here N = 5 * 10 = 50, since 10 stores are randomly sampled for each of the 5 promotions, so the denominator df is 50 - 5 = 45. Use the distribution calculator to find the probability of an F value greater than 3.4 with 4 and 45 degrees of freedom.

Using the Q_22_Jacksonville_Sp2020_style.xlsx data file, examine the relationship between house price and square footage. Calculate Kendall's Tau. Round your answer to two decimal places.

0.55 (computed in JMP; see photos)

MODULE 20: Suppose the family wise error rate is 0.05, if there are ten levels of one factor but only 5 comparisons are of importance, what would the individual error rate need to be for each of those 5 comparisons?

0.01. Individual error rate = family-wise error rate / number of comparisons of interest = 0.05 / 5 = 0.01. The number of levels (ten) does not matter here because only 5 comparisons are being made.
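As a quick check of the Bonferroni division, here is a small Python sketch (the helper name `bonferroni_individual_alpha` is hypothetical):

```python
# Bonferroni: split the family-wise error rate evenly across the
# comparisons that are actually being tested.
def bonferroni_individual_alpha(family_alpha, n_comparisons):
    return family_alpha / n_comparisons

individual_alpha = bonferroni_individual_alpha(0.05, 5)
print(round(individual_alpha, 4))
```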

A grocery store company wanted to know how well some of their local stores were doing. In order to find out, they hired three different reviewers to rate 10 local stores. The Test statistic was 2.3, what is the p value?

0.9858. This is Friedman's block test. The test statistic has a chi-squared distribution with k - 1 degrees of freedom, where k is the number of treatments. The p-value is the area to the right of the test statistic.
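The course uses JMP's distribution calculator for this step. As a rough cross-check outside JMP, the upper-tail chi-squared area can be sketched in plain Python with a series for the incomplete gamma function. This assumes the 10 stores are the treatments, so df = 10 - 1 = 9, which is what the 0.9858 answer implies; `chi2_sf` is a made-up helper name, and the series is only meant for small to moderate test statistics.

```python
import math

def chi2_sf(x, df):
    """Upper-tail area P(X > x) for a chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Series for the lower incomplete gamma function:
    # gamma(a, z) = z^a * e^(-z) * sum_{n>=0} z^n / (a * (a+1) * ... * (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

# Assumption: df = 9 (ten stores as treatments, minus one).
p_value = chi2_sf(2.3, df=9)
print(round(p_value, 4))
```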

Using the Q19_Sales file in the instructions, conduct a One Way ANOVA test to determine if there is a difference in mean sales at the Kitchen Hardware Stores in Atlanta, Boston or Chicago. Enter the F test statistic to two decimal places.

0.9909 Use Analyze> Fit Y by X. Enter the Y and X variables. Select Okay. Click on the red triangle, select Means/ANOVA.

A professor at the University of Florida wanted to determine if offering video tutorials for the course software would increase student engagement. The engagement ratings are below for a random sample of 5 students before and after implementing the course change. Ratings were on a scale between 0 and 50. The higher scores translated to higher student engagement score. When rating, use the "1" to signify the lowest values. Student Before After 1 25 35 2 20 40 3 31 39 4 44 43 5 47 49 What is the test statistic for the Wilcoxon Signed Rank Test?

1
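A sketch of the hand calculation in Python, assuming the differences are taken as After minus Before and that the reported statistic is the smaller of the two signed-rank sums (the function name `signed_rank_sums` is hypothetical):

```python
def signed_rank_sums(before, after):
    # Differences (after - before); zero differences are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    abs_sorted = sorted(abs(d) for d in diffs)
    # Rank the absolute differences, lowest = 1, averaging ranks for ties.
    ranks = {}
    i = 0
    while i < len(abs_sorted):
        j = i
        while j < len(abs_sorted) and abs_sorted[j] == abs_sorted[i]:
            j += 1
        ranks[abs_sorted[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    t_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    t_minus = sum(ranks[abs(d)] for d in diffs if d < 0)
    return t_plus, t_minus

before = [25, 20, 31, 44, 47]
after = [35, 40, 39, 43, 49]
t_plus, t_minus = signed_rank_sums(before, after)
print(t_plus, t_minus, min(t_plus, t_minus))
```

Only student 4 has a negative difference (-1), and it has the smallest absolute value, so the negative rank sum is 1.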

For Bonferroni's method, given 6 levels of a Factor, how many comparisons are there?

15
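The count is "6 choose 2", the number of distinct pairs of levels. A short Python check using only the standard library:

```python
import math
from itertools import combinations

levels = 6
# Number of pairwise comparisons among the levels: C(6, 2).
n_comparisons = math.comb(levels, 2)
# Listing the pairs explicitly gives the same count.
pairs = list(combinations(range(1, levels + 1), 2))
print(n_comparisons, len(pairs))
```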

Nineteen closing prices of Apple stock (in the Spring of 2020) are reflected in the table below: $284 $284.27 $289.91 $289.80 $291.52 $293.65 $300.35 $297.43 $299.80 $298.39 $303.19 $309.63 $310.33 $316.96 $312.68 $311.34 $315.24 $318.73 $316.57 With JMP, find the predicted closing price for time period 5 for simple exponential smoothing with optimized alpha.

289.8 Enter the values into a column in JMP. Select Analyze > Specialized Modeling > Time Series. Click on the column of data that you entered and select Y, Time Series. OK. Click on the red triangle next to Time Series at the top of the new window. Select Smoothing method > Simple Exponential Smoothing. On the new window, click Estimate. On the page that is showing the output, click on the red triangle next to Model: Simple Exponential Smoothing (Zero to One). Select Save Columns. Look at row five under Predicted.

A doughnut shop wants to determine if there is a difference in donut sales at different times of the day and for different types of doughnuts. They are open in the morning, afternoon, and night, and offer the following flavors: vanilla, chocolate, red velvet, and marbled. There were a total of 48 sales recorded. The shop conducted a two-way ANOVA test and found an F test statistic for Flavor of 14.87. What would be the numerator degree of freedom for the F test statistic to determine if the factor flavor was significant?

3. The numerator df for a factor is its number of levels minus 1: four flavors - 1 = 3.

Below are the unemployment rates for a few months in 2019, find the moving average where L = 5 and t = 5 (2019-11-01). Date 2019-07-01 2019-08-01 2019-09-01 2019-10-01 2019-11-01 Unemployment Rates 3.7 3.7 3.5 3.6 3.5

3.6 This question is asking for the simple moving average for the last time point - 2019-11-01. When you are finding the moving average, start with the value for time period t (3.5 in this case) and move back until you have L values. Take all of these values and add them together and divide by L (5 in this case).
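The trailing (non-centered) moving average described above can be sketched in Python as follows; `trailing_moving_average` is a made-up helper name and `t` is counted from 1 as in the question:

```python
def trailing_moving_average(values, t, L):
    # Average of the L values ending at time period t (t is 1-based).
    if t < L:
        raise ValueError("need at least L values up to time t")
    window = values[t - L:t]
    return sum(window) / L

rates = [3.7, 3.7, 3.5, 3.6, 3.5]  # 2019-07-01 through 2019-11-01
ma = trailing_moving_average(rates, t=5, L=5)
print(round(ma, 2))
```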

A large firm that owns a chain of grocery stores wanted to see if they could increase the average number of patrons who visit by implementing a rewards member program. To test this, they randomly chose half of their 60 locations and started the rewards program only at those locations, while the other half of locations did not receive the rewards program. Which of these answers represents an element of replication in this study?

30 locations Replication is the number of observational units for each treatment

Given the following table, find MSTR. Source df Sum of Squares Mean squares F P value Treatment 3 907.2 ? ? ? Error 27 ? 54 Total 30

302.4. MSTR = SSTR / df = 907.2 / 3 = 302.4. The treatment sum of squares estimates how much the group means vary. This value gets bigger just by adding another group, so we need to use the mean square for treatments instead.
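A minimal Python sketch of filling in the table, using only the numbers given in the question (SSTR = 907.2 with 3 df, MSE = 54); the helper name `mean_square` is hypothetical:

```python
def mean_square(sum_of_squares, df):
    # A mean square is a sum of squares divided by its degrees of freedom.
    return sum_of_squares / df

mstr = mean_square(907.2, 3)  # treatment mean square
mse = 54.0                    # error mean square, given in the table
f_stat = mstr / mse           # F = MSTR / MSE
print(round(mstr, 1), round(f_stat, 2))
```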

Below is a time series table of the closing stock price for Apple Inc starting with Feb. 20th. Find the smoothed value for the last value in the table. Use the simple moving average non-centered method for L = 3. Date Closing Price February 20, 2020 $320.30 February 21, 2020 $313.05 February 24, 2020 $298.18 February 25, 2020 $288.08

303.44 (marked WRONG). Similar example: Below is a time series table of the closing stock price for Apple Inc starting with Feb. 10th. Find the smoothed value for the last value in the table. Use the simple moving average non-centered method for L = 3. Round your answer to the nearest cent. Do not include the '$' sign. Date Closing Price March 15, 2021 $123.99 March 16, 2021 $125.57 March 17, 2021 $124.76 March 18, 2021 $120.53 The smoothed value for the last time point would be (125.57 + 124.76 + 120.53) / 3 = 123.62. Applying the same method to the original table gives (313.05 + 298.18 + 288.08) / 3 = 299.77.

The closing prices for 5 days in early 2020 for a share of Apple Stock are reflected in the table below: Closing Price 309.63 310.33 316.96 312.68 311.34

310.02 Enter the values into a column in JMP. Select Analyze > Specialized Modeling > Time Series. Click on the column of data that you entered and select Y, Time Series. OK. Click on the red triangle next to Time Series at the top of the new window. Select Smoothing method > Simple Exponential Smoothing. On the new window, change the box that says "Constraints" from "Zero to One" to "Custom". Change the box next to Level from "Bounded" to "Fixed". Enter 0.4 for the Fixed Weight. Estimate.

A large company wants to determine if there is a difference in the expenses between their locations in three cities and what type of delivery method they use (FedEx or UPS). Using the ANOVA table given below, find F for City. Enter the data to one decimal place. Source DF Sum of Squares Mean Squares F City 22478 Delivery method Interaction Error 56 17,842 Total

35.2. To find the F test statistic, first find MSA: MSA = SSA / (a - 1), where a is the number of cities. Then find MSE: MSE = SSE / df_error. Finally, F = MSA / MSE.

A Data Analyst at the NFL created a lag 2 auto regression model for the league average Yards per Carry (YPC) for all Running Backs. The data ranges from 1990 to 2019. The table below lists the league average Yards per Carry (YPC) from the past 5 years. year 2014 2015 2016 2017 2018 2019 YPC 3.75 4.80 4.76 4.56 5.11 5.31 Term Estimate Std Error Intercept 0.0533 0.06 Lag 1 0.668 0.25 Lag 2 0.28 0.05 Find the predicted value of Yards Per Carry for 2018. Round your answer to two decimal places

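4.43. Worked example with a different slope: suppose that the slope for lag 2 is equal to 0.35. This value varies between versions of the question, so yours may not have been 0.35. We are asked to predict for 2018, so lag 1 is the 2017 value and lag 2 is the 2016 value: lag 1 = 4.56, lag 2 = 4.76. The formula would be yhat = 0.0533 + 0.668(4.56) + 0.35(4.76) = 4.765, or 4.77. With the table's lag 2 estimate of 0.28, the same formula gives 0.0533 + 0.668(4.56) + 0.28(4.76) = 4.43.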
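The prediction is a plain linear combination of the two previous values. A short Python sketch using the estimates from the table (the function name `ar2_predict` is made up for illustration):

```python
def ar2_predict(intercept, slope_lag1, slope_lag2, y_lag1, y_lag2):
    # yhat_t = b0 + b1 * y_(t-1) + b2 * y_(t-2)
    return intercept + slope_lag1 * y_lag1 + slope_lag2 * y_lag2

ypc = {2014: 3.75, 2015: 4.80, 2016: 4.76, 2017: 4.56, 2018: 5.11, 2019: 5.31}

# Predict 2018 from 2017 (lag 1) and 2016 (lag 2), with lag 2 slope 0.28.
pred_2018 = ar2_predict(0.0533, 0.668, 0.28, ypc[2017], ypc[2016])
print(round(pred_2018, 2))
```

Swapping in a different lag 2 slope (for example 0.35) reproduces the alternate worked value.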

A Data Analyst at the NFL created a lag 2 auto regression model for the league average Yards per Carry (YPC) for all Running Backs. The data ranges from 1990 to 2019. The table below lists the league average Yards per Carry (YPC) from the past 5 years. year 2014 2015 2016 2017 2018 2019 YPC 3.75 4.80 4.76 4.56 5.11 5.31 Term Estimate Std Error Intercept 0.0533 0.06 Lag 1 0.668 0.25 Lag 2 0.33 0.05 Find the predicted value of Yards Per Carry for 2020. Round your answer to two decimal places.

5.19 Suppose that the slope for lag 2 equals 0.27. Please note that your value may be different. Lag 1 = 5.31 (from 2019 - 1 lag behind 2020) Lag 2 = 5.11 (from 2018 - 2 lags behind 2020) Yhat = 0.0533+0.668*5.31+0.27*5.11 = 4.98

Given the following scenario, rank the data for the Wilcoxon Rank Sum. A manager is looking at the ratings of two different customer service lines at the Telecom company where he works. Service Line A and Service Line B each received different ratings from 5 randomly selected customers. Answer the following question about the ranks. A+ is the highest grade, with D- being the lowest grade. Make your lowest letter score be ranked as "1." Service Line A Service Line B A+ A- B- B C+ B+ A- A+ B D- What is the numerical rank for a B+?

6
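A sketch of the ranking step in Python. It assumes the flattened table alternates Service Line A and Service Line B across each row, and that grades are ordered from D- (lowest) to A+ (highest); `GRADE_ORDER` and `average_ranks` are hypothetical names. The rank of B+ is the same whichever line each grade belongs to, because the ranking pools both samples.

```python
# Letter grades from lowest to highest.
GRADE_ORDER = ["D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]

def average_ranks(values, key):
    # Rank pooled values from lowest (rank 1), averaging ranks for ties.
    ordered = sorted(values, key=key)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and key(ordered[j]) == key(ordered[i]):
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return ranks

line_a = ["A+", "B-", "C+", "A-", "B"]   # assumed column A
line_b = ["A-", "B", "B+", "A+", "D-"]   # assumed column B
ranks = average_ranks(line_a + line_b, key=GRADE_ORDER.index)
print(ranks["B+"], ranks["B"])
```

The two B grades tie for ranks 4 and 5 and each get 4.5, which is why B+ lands on rank 6.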

Using the data set Q21_Jacksonville_Sp2020_style, perform a test to determine if there is a difference in the median square footage of homes with different styles (traditional, townhome, condo, ranch). Using JMP, what is the test statistic for the Kruskal-Wallis Test? Give your answer to two decimal places.

6.77

A grocery store company wanted to know how well some of their local stores were doing. In order to find out, they hired 4 different reviewers to rate 8 local stores. What are the degrees of freedom?

7

MODULE 22: A grocery store took a customer satisfaction survey at three different locations. Based on the ratings of the survey, find T1 for Friedman's Test for a Randomized Block Design. Rate the highest as "1". Survey Store 1 Store 2 Store 3 A 3 7 2 B 5 8 7 C 9 8 9 D 6 7 8

9.5
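For Friedman's test, ranks are assigned within each block (each survey row) and then summed per treatment (store). A Python sketch of that ranking, with the highest rating ranked 1 and ties averaged (the helper name `rank_descending` is made up):

```python
def rank_descending(row):
    # Highest value gets rank 1; tied values share the average rank.
    ordered = sorted(row, reverse=True)
    ranks = []
    for v in row:
        first = ordered.index(v) + 1   # first position of this value
        count = ordered.count(v)       # how many values tie with it
        ranks.append(first + (count - 1) / 2)
    return ranks

# Rows are surveys (blocks); columns are Store 1, Store 2, Store 3.
ratings = {"A": [3, 7, 2], "B": [5, 8, 7], "C": [9, 8, 9], "D": [6, 7, 8]}
ranked_rows = [rank_descending(row) for row in ratings.values()]
rank_sums = [sum(col) for col in zip(*ranked_rows)]
print(rank_sums)  # T1, T2, T3
```

In survey C the two 9s tie for ranks 1 and 2, so each gets 1.5, which is where the half point in T1 comes from.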

For the following scenario, what is the correct decision tree? A restaurant is trying to decide if it should hire worker A or worker B. For each worker, they can have either a high productivity or a low productivity. With a high productivity worker A can serve 15 tables an hour, but only 8 with a low productivity. For worker B, it is 14 and 9.

A see photos

Wilcoxon Signed Rank: A Friedman Block Test: B Kendall's Tau: C Spearman's Rho: D

A: To determine if the ratings of a new plagiarism software are better than Turnitin. Each software was rated by 10 teachers. B: To compare the ratings of three restaurants by 7 different food critics. C: To determine the monotonicity between GPA and socioeconomic status in 50 cities in the United States. D: To measure the association between two variables.

Given the following ANOVA table, what is the correct conclusion? Source df Sum of Squares Mean of Squares F P value Treatment 2 35.733 17.867 4.086 0.022 Error 57 250.45 4.394 BLANK BLANK Total 59 286.183 BLANK BLANK BLANK

At least one of the population means is different. One way ANOVA lets you test to see if at least one population mean is different from the rest. Small p-values -> evidence at least one mean is different. Large p-values -> there is no evidence at least one mean is different. Here the p-value is 0.022, which is below 0.05.

If we wish to better model quarterly trends, which of the following time series methods should be considered?

Autoregression Model

Which methods are best when you are dealing with seasonal data? Mark all that apply

Autoregressive models Regression based models

In One Way ANOVA Tests, which steps could be taken to check the nearly normal condition?

Check the boxplots of each group (level). Check the histogram of each group (level). NOT: a funnel shape.

In what way is a random forest similar to bagged trees?

Each tree is made by sampling with replacement

The following is a valid null and alternative hypothesis for a Kruskal-Wallis Test. A shoe company wants to know if three groups of workers have different salaries. Ho: at least one salary center is different. Ha: the centers for salary are all the same.

FALSE Ho: The centers for all groups are the same. Ha: At least one of the centers is different.

If there is significant interaction, we look at the main effects and also look at multiple comparisons for every level of each factor.

FALSE. If the p-value is small, reject Ho: interaction is significant. In this case, you can't look at the main effects of Factor A. Look at the conditional multiple comparisons, that is, the individual comparisons of the combinations of the treatments. If the p-value is large, fail to reject Ho: interaction is not significant. You can look at the effects of each of the factors. Look at the marginal effects for Tukey's.

Identify the factor and levels in the following scenario: The Hyppo (a gourmet ice pop store) is trying to test out new dip flavors in which customers can dip their popsicles (strawberry, orange, and banana). All of the popsicles are made the same, utilizing the same simple chocolate flavoring for experimental purposes, but the dips are randomly applied. Twenty regular customers are given the popsicles with the dips on them in a random order, and are asked to rate them from 1-3.

Factor = dip flavor. Levels = strawberry, orange, and banana

Use the data set monthly banana imports to the USA in the table below. By hand, find the smoothed value for time period 2 (1907-08-01) using simple exponential smoothing with alpha = 0.4. Round to the nearest whole number. Date 1907-07-01 1907-08-01 1907-09-01 Bananas 4529 4074 3322

For this question, you are finding the smoothed value for time period 2. With SES, you always have to start with the smoothed value for time period 1 and work your way to the value of interest. smoothedy1= 4529 smoothedy2= α(y2)+(1−α)smoothedy1 smoothedy2= 0.4(4074)+0.6(4529)=4347

Use the data set Unemployment rate, below. By hand, find the smoothed value for time period 3 (1948-03-01) using simple exponential smoothing method with alpha = 0.2. Date 1948-01-01 1948-02-01 1948-03-01 Unemployment Rate 3.4 3.8 4.0

For this question, you are finding the smoothed value for time period 3. With SES, you always have to start with the smoothed value for time period 1 and work your way to the value of interest. smoothedy1 = 3.4. smoothedy2 = 0.2(3.8) + 0.8(3.4) = 3.48. smoothedy3 = 0.2(4.0) + 0.8(3.48) = 3.584.
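The by-hand recursion for simple exponential smoothing can be sketched in Python. This follows the hand method used in these cards (the first smoothed value equals the first observation); JMP's optimized fit may initialize differently, so this is not a reproduction of JMP output. The function name is made up for illustration.

```python
def simple_exponential_smoothing(values, alpha):
    # smoothed_1 = y_1; smoothed_t = alpha * y_t + (1 - alpha) * smoothed_(t-1)
    smoothed = [values[0]]
    for y in values[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

# Banana imports, alpha = 0.4: value of interest is time period 2.
bananas = simple_exponential_smoothing([4529, 4074, 3322], alpha=0.4)
# Unemployment rate, alpha = 0.2: value of interest is time period 3.
unemployment = simple_exponential_smoothing([3.4, 3.8, 4.0], alpha=0.2)

print(round(bananas[1]), round(unemployment[2], 3))
```

A smaller alpha puts more weight on the previous smoothed value, which is why lower alpha values give a smoother series.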

For the following scenario, would you utilize a Wilcoxon Sign Rank or Friedman's Rank test? A researcher wanted to compare the ratings of three different video games. Each game was given 8 ratings by 8 reviewers.

Friedman's Rank Test Wilcoxon Sign Rank test is used when you have paired data - two measurements for each observational unit (case). Friedman Rank Test is used when you have blocked data - three or more measurements for each observational unit (case).

Using the data set Q21_Pe_Ta_Sp20, perform a test to determine if there is a difference in the centers of price of homes between houses in Tampa and in Pensacola. Using JMP, what is the z test statistic for the appropriate test? Give your answer to two decimal places. Make sure to include a negative sign if appropriate.

Go to Analyze > Fit Y by X. Enter Y as the response variable and X, Factor as the city. Select ok. On the new window, click on the red triangle. On the drop-down box, select non-parametric. Then select Wilcoxon Test. The results are listed under the heading for "2 sample Test, Normal Approximation".

What is the correct alternative hypothesis for the Kruskal-Wallis Test?

Ha: At least one of the centers is different

In Friedman's Test for a Randomized Block Design, what is the correct alternative hypothesis?

Ha: Not all the medians are equal

A professor at the University of Florida wanted to determine if offering video tutorials for the course software would increase student engagement. The engagement ratings are below for a random sample of 5 students before and after implementing the course change. Ratings were on a scale between 0 and 50. The higher scores translated to higher student engagement. Student Before After 1 25 35 2 20 40 3 31 39 4 43 44 5 47 47 What is the alternative hypothesis for Wilcoxon Signed Rank Test?

Ha: the students were generally more or less engaged

A professor at the University of Florida teaches three courses during the semester. Assume for this scenario that the professor's students take all three courses at the same time. After the semester, the professor sends 5 randomly selected students the course evaluations for the three courses. The course ratings are below on a scale of 1 to 10, with 1 being the lowest and 10 being the highest. What is the correct null hypothesis?

Ho: The median course ratings are the same

What is the correct null hypothesis for the Wilcoxon Rank Sum Test?

Ho: The two samples come from the same distribution.

In Friedman's Test for a Randomized Block Design, what is the correct null hypothesis?

Ho: all the medians are equal

What is the correct null hypothesis for a One way ANOVA test with three different groups?

Ho: μ 1 = μ 2 = μ 3

Select the best scenario for when you might want to know specific information about a black box model

If you need to determine if a car owner is likely to submit an insurance claim.

Select the best scenario for when you might NOT want to know the black box model.

If you need to predict the number of packages shipped through the mail each week.

If a modeling method is described as a black box, what does this indicate?

In a black box model, the equation that evolves is not interpretable.

control

In an experiment it is important to make sure that extraneous variation is controlled, so the experimenter needs to make sure that experimental conditions are the same for each participant. Also there often needs to be a group that received the standard or no treatment, this treatment is called the control. The group that receives it is the control group.

What is the correct definition of interaction?

Interaction is when the effects of one factor are NOT similar across all levels of the other factor

What type of model is best to use when there is a consistent percentage increase in the data?

Multiple regression based model

observational vs randomized experiments

Observational Studies Data is not collected in an organized or controlled experiment. An analyst might find that there is an association between variables, but they are not able to show causation. Randomized, Comparative Experiments The experimenter applies treatments or manipulates the variables and observes the results. These treatments are assigned at random

FACTORS:

Often in an experiment, we have multiple variables that have different settings. For example, we might have different temperatures (80, 85, and 90 degrees) and different humidity levels (10%, 50%, and 80%) that products are produced under at a factory. The factory may want to determine the best settings to reduce the number of defective pieces. The variables that we are changing are called the factors. In this case, TEMPERATURE and HUMIDITY are the factors. The settings of each factor are called its levels. So, 80, 85, and 90 would be levels of TEMPERATURE, and 10%, 50%, and 80% would be levels of HUMIDITY. Each product in the experiment would have a particular temperature and humidity setting. The combination of these together is called a treatment. For example, 90 degrees and 10% humidity would be a treatment. The recipients of the treatments are called subjects (if they are people) or observational units (if they are not).

When examining boxplots, which of the following characteristics would tell us that there was a violation of one of the assumptions for One Way ANOVA

One of the plots had a range that was three times the other plots. The assumptions for ANOVA are the following: Equal Variances; Random Sampling or Random Allocation; Nearly Normal; Independent Groups. (Only two of these can be checked with boxplots.)

Single Blind: Double Blind:

Single blind: The participants do not know who is receiving the treatment. Double blind: The participants and experimenters do not know who is receiving the treatment.

Suppose you ran a multiplicative regression model of log10(price in $1000s) vs time (years) and the log10(price in $1000s) per year was 0.11. Interpret the slope.

The price increased by 28.8% per year typically. Similar problem: Suppose that the slope is equal to 0.09. (Note: This is not the value in this problem.) To work out this question, first raise 10 to the power of the slope: 10^0.09 = 1.230. Then take the amount greater than 1, so 0.230. Then, change it to a percent. So, the price has increased by 23.0% per year typically. For this problem, 10^0.11 = 1.288, which gives 28.8%.
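The back-transformation of a log10 slope into a percent change can be checked with a few lines of Python (the helper name `percent_change_per_unit` is hypothetical):

```python
def percent_change_per_unit(log10_slope):
    # Back-transform: 10 ** slope is the multiplicative growth factor per unit of x.
    growth_factor = 10 ** log10_slope
    return (growth_factor - 1) * 100

print(round(percent_change_per_unit(0.11), 1))  # slope from this problem
print(round(percent_change_per_unit(0.09), 1))  # slope from the similar problem
```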

Suppose you ran an additive regression-based model of price vs time (years) and the point estimate for change in price per year was 1140. Interpret the slope.

The price tended to increase by 1140 dollars per year. The key to this question is that this is an additive model. For additive models, the slope is interpreted as "b is the increase in y for a one-unit change in x." For a multiplicative model, the slope is interpreted as "b is the percent increase in y for a one-unit change in x"

When looking at an interaction plot, how can we tell if interaction may be present?

The slopes of the two lines cross or may cross at some point

Given the following multiple comparison analysis, select the best conclusion.

There is no significant difference between any of the population means for each of the levels look at Upper CL and Lower CL If the confidence interval includes 0, you would say that the levels weren't significantly different. If the confidence interval doesn't include 0, you would be able to say that the levels were significantly different.

A large firm that owns a chain of grocery stores wanted to see if they could increase the average number of patrons who visit by implementing a rewards member program. To test this, they randomly chose half of their 60 locations and started the rewards program only at those locations, while the other half of locations did not receive the rewards program. They also collected information on the amount of money spent on these extra customers. Which of these answers represents an element of control in this study?

They had a group that opened at the standard pricing structure

The Paralyzed Veterans of America receives donations every year. They want to understand which of its donors are more likely to donate again, so they take a random sample of 1400 donors and compile them into the PVA_small.xlsx data set. They use the response variable as Time.Between.Gifts. The factors of interest are: Own.Home., Num.Children, Sex, and Total.Wealth. What is the second split? (Note: You don't get the split probability when the variable is continuous. You can continue with the analysis but skip the step to select "Show the split probabilities. )

Total.Wealth>=2, Total.Wealth<2 splits - see module 23 help forum

What are the key differences between trends and seasonal components in time series data?

Trends are a consistent pattern (either linear or curves) that approach either a negative or positive direction; Seasonal Components are fluctuation in data that occur around the same time every period.

Determine if the following are valid null and alternative hypotheses for a Wilcoxon Rank Sum Test. Is there a difference in time for Army versus Navy recruits to finish an obstacle course? Ho: Army and Navy recruits have the same distribution Ha: The distributions are different for Army and Navy recruits

True

For a time series for simple moving average, a larger period will be smoother than a shorter period.

True

In simple exponential smoothing, an alpha value of 0.2 is considered smoother than an alpha with a value of 0.5.

True

Kendall's Tau is used to measure monotonicity, which is the degree to which a relationship trends in one direction.

True

Interaction is when the effect of one factor depends on the level of the other factor.

True. If you have interaction you have a "difference of differences". The effect of one factor changes as you move along the levels of the other factor. If you don't have interaction you have a "similarity of differences". The effect of one factor doesn't change as you move along the levels of the other factor.

In terms of Big Data and Data Mining, what is the correct definition of variety?

Variety refers to the different types of data. Big Data tends to have many different types of data.

A sales manager wanted to determine if increasing sales commissions by 5% would increase employee satisfaction. Her analyst determined the p-value was 0.17. (Use α=0.05 .) Based on the above output, what is the correct conclusion for the Wilcoxon Signed Rank Test?

We do not have evidence that there is a difference in population median employee satisfaction rating before and after the commission change.

What is the correct definition of a random forest?

When we construct a multitude of decision trees at training time and output the class that is the mode of the classes or mean prediction of the individual trees. Every time a split is made only 10% of the predictors are considered.

In which one of the four scenarios would you consider a non-parametric test?

When we don't have quantitative data, but rather could have survey responses such as "Strongly agree", "Agree", etc. You would use a non-parametric test when the data features outliers or heavy skewness or when you have rank data.

What is the correct definition of bagging?

When we fit a model with training data, then fit a sample with replacement and fit another tree, and repeat that process many times.

NOTES

Wilcoxon Sign Rank test is used when you have paired data - two measurements for each observational unit (case). Friedman Rank Test is used when you have blocked data - three or more measurements for each observational unit (case). Kendall's tau is used to measure monotonicity. Spearman's rho is used to measure association by looking at the ranks instead of values.

Given the following scenario, would you use the Wilcoxon Rank Sum or the Mann Whitney Test? Comparing the ratings of new young-adult television show between high school students and college students. There were 15 high school students and 15 college students.

Wilcoxon rank sum test

Imagine you are performing a Two-Way Anova test, and you find a p-value for interaction of 0.0001. What is the correct interpretation of this p-value?

With a p-value less than all alpha levels, we reject Ho. Do not look at main effects. Look at multiple comparisons for each level of each factor. If the p-value is small, reject Ho: interaction is significant. In this case, you can't look at the main effects of Factor A. If the p-value is large, fail to reject Ho: interaction is not significant. You can look at the effects of each of the factors.

Below is the normal quantile plot and histogram of the residuals when a One WAY ANOVA test was conducted on the sales at the Spice Store at the three locations. The sales are randomly selected daily sales totals in thousands

Yes, it appears that the errors are normal because the points follow a fairly straight line in the middle of the quantile plot.

A PE coach at a high school wants to determine if there is a significant difference between the height of students and their time in a 40-yard dash (in seconds). He inputs all his students' names into a software package that randomizes the list of players. He then tells the software to blindly pick his sample size of forty. Does this scenario meet the Independence Assumption?

Yes, the coach made a good effort to randomize the sampling. (Hint: Independence is related to how the data is collected. Is the data randomly collected or the treatments randomly allocated?)

An internet research firm gathers data on billions of Google search queries across thousands of categories. Oftentimes analysts must clean the data because it is inaccurate or messy. A typical data set of queries may have a minimum of 100,000 rows of data. The research firm's main revenue source is its consulting operation, in which analysts and associates consult major firms in many different industries, including transportation, retail, manufacturing, and technology/software. The data consists of text, quantitative variables, categorical variables, video and others. These firms have high demands and standards, so the data must be analyzed quickly. Does the scenario above meet the four qualifications of big data?

Yes, the data meets the volume, velocity, variety, but not veracity requirements.

Macy's wants to see if it can increase its revenues from customers. They randomly select 4,000 customers and send either a coupon for 25% off a purchase of $50+, a coupon for $10 off any purchase, or an email with no offer. They also collect information on the customers' gender. Is there blocking in this experiment? And, if so, what is the element of blocking?

Yes: Gender

A company is trying to decide if it should hire more workers or an automated order machine. Is this an action, or a state of nature?

action

For a time series for exponential smoothing, which of the following is the smoothest?

alpha = 0.3
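Why a small alpha is the smoothest can be sketched in a few lines. This is only an illustration: it assumes the usual convention smoothed_t = alpha * y_t + (1 - alpha) * smoothed_(t-1), the series is made up, and the comparison value of 0.9 is an assumed stand-in for the other (larger) answer options, which are not listed on this card.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; the first smoothed value is the first observation."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


def roughness(values):
    """Total absolute change from one point to the next (smaller means smoother)."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


# Made-up illustrative series, not from the course data.
data = [10, 14, 9, 15, 8, 16, 9, 15]

low_alpha = exponential_smoothing(data, 0.3)   # weights the past heavily
high_alpha = exponential_smoothing(data, 0.9)  # follows each new point closely

print(roughness(low_alpha), roughness(high_alpha))
```

The smaller alpha puts more weight on the previous smoothed value, so the line reacts less to each new observation.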

Why should we use non-parametric tests? Choose the one FALSE answer.

FALSE: because they are more powerful than parametric tests. True statements: because you are only using ordered data; because there is no linearity; because you have quantitative data but the assumptions of normality are not met.

In the CRISP Model for Data Mining, what is the first step in the cycle?

business understanding

Determine if the following is an OLAP, Classification, or Regression Model. Dollar General is planning on sending out a $5 off coupon to its email subscribers. Based on past behavioral patterns, predict what region of town has the most customers that will use the coupon.

classification model

A model handles outliers well, handles many potential predictor variables easily, and sorts through predictor variables repeatedly to divide the data into groups to best determine the response variable.

decision tree

In boosting, does misclassified data get a higher weight or lower weight?

higher
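A rough sketch of how the reweighting works, assuming an AdaBoost-style update (the card only says "boosting", so the specific formula here is an assumption). The five equal starting weights and the single misclassified row are made up for illustration.

```python
import math


def adaboost_reweight(weights, misclassified):
    """One AdaBoost-style reweighting step.

    weights: current observation weights (summing to 1).
    misclassified: booleans, True where the current model got the row wrong.
    Assumes the weighted error is strictly between 0 and 0.5.
    """
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    model_weight = 0.5 * math.log((1 - error) / error)
    updated = [
        w * math.exp(model_weight if wrong else -model_weight)
        for w, wrong in zip(weights, misclassified)
    ]
    total = sum(updated)
    return [w / total for w in updated]  # renormalize so the weights sum to 1


# Made-up example: five rows with equal weight, the third one misclassified.
weights = [0.2, 0.2, 0.2, 0.2, 0.2]
misclassified = [False, False, True, False, False]
new_weights = adaboost_reweight(weights, misclassified)
print(new_weights)
```

The misclassified row ends up with a larger share of the total weight, so the next model in the sequence focuses on it.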

The Paralyzed Veterans of America receives donations every year. They want to understand what type of gift they might expect, so they take a random sample of 1400 donors and compile them into the PVA_small.xlsx data set. They use Current Gift as the response variable. The factors of interest are: Own.Home., Num.Children, Income, and Previous Gift. Here are the results for the neural network output. What is the pattern of the profiler for Previous Gift?

line going up - positive linear big increase

Use the Unemployment Data Set that is posted in the instructions. This data is the unemployment rate in the US from 2016/01/01 until 2020/02/01. Find the SLR model for predicting the unemployment rate based on month using the multiplicative model. (Use log in JMP which is actually the natural log.)

log(unemployment)_hat = 12.16 - 0.0000000030x. Pages 12-13 in the notes should help with this topic. To get the regression model: 1.) Find log(unemployment rate). 2.) Use Fit Y by X to find the least-squares regression equation, with y = log(unemployment rate) and x = date.
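To turn a prediction from this model back into an unemployment rate, exponentiate the fitted log value. The sketch below assumes x is the JMP date value, which JMP stores as seconds since January 1, 1904; the card does not state this, so treat it as an assumption. The coefficients are the rounded ones above, so the results are approximate.

```python
import math
from datetime import datetime

# Assumption: x is a JMP date value (seconds since 1904-01-01).
JMP_EPOCH = datetime(1904, 1, 1)

# Rounded coefficients from the answer above.
INTERCEPT = 12.16
SLOPE = -0.0000000030


def jmp_seconds(date):
    """Convert a datetime to seconds since the assumed JMP epoch."""
    return (date - JMP_EPOCH).total_seconds()


def predict_unemployment(date):
    """Predict log(rate) from the fitted line, then back-transform with exp()."""
    log_rate = INTERCEPT + SLOPE * jmp_seconds(date)
    return math.exp(log_rate)


start = predict_unemployment(datetime(2016, 1, 1))
end = predict_unemployment(datetime(2020, 2, 1))
print(start, end)
```

Because the slope is negative, the back-transformed prediction shrinks by a constant percentage per unit of time, which is what makes the model multiplicative.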

The Paralyzed Veterans of America receives donations every year. They want to understand what type of gift they might expect, so they take a random sample of 1400 donors. They use Current Gift as the response variable. Here are the results for the neural network output. (Use the results below. Don't run a separate analysis.) What is the pattern of the profiler for Num of Children?

mostly flat line = flat

A model is an automatic, flexible, non-linear regression model, not interpretable, and retains all predictor variables although it may limit the effects of some.

neural network

A manager at an Apple store was interested in seeing whether or not customers actually used the Apple products that are on display at his store (such as iPhones, Apple Watches, and MacBooks). To find out what percentage of customers actually used the display products, he counted each customer who came in on a day and marked whether or not they used a product. What type of study is this?

observational study

Below is a screenshot of the Bivariate Fit of the Monthly Closing Price for Disney's stock over the last 10 years (2000-2019). [Image: Q_TS_definition_StockPrice_Trend.JPG, graph going upward] Select the best answer choice that correctly describes the data from about January 1, 2012 to January 1, 2016.

positive trend

Using the gas price data and autoregression model given below, predict time period 5. yhat = 0.985 + 0.417lag1 + 0.057lag2 + 0.292lag3 - 0.022lag4

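3.86

Time period: 1, 2, 3, 4, 5
Gas price: 3.85, 3.85, 3.87, 3.87, ?

For this type of question, you have to be careful with the lags. Lag 1 is the most recent value, so here Lag 1 = 3.87 (period 4), Lag 2 = 3.87 (period 3), Lag 3 = 3.85 (period 2), and Lag 4 = 3.85 (period 1). Plug the lags into the model: yhat = 0.985 + 0.417*3.87 + 0.057*3.87 + 0.292*3.85 - 0.022*3.85 = 3.8589. The predicted value for time period 5 is about 3.86.

The same steps with a different set of numbers: Lag 1 = 3.94, Lag 2 = 3.93, Lag 3 = 3.92, Lag 4 = 3.91, and yhat = 0.496 + 0.415*lag1 + 0.161*lag2 + 0.176*lag3 + 0.125*lag4. Then yhat = 0.496 + 0.415*3.94 + 0.161*3.93 + 0.176*3.92 + 0.125*3.91 = 3.9425, so the predicted value for time period 5 is 3.9425.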
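The lag bookkeeping is the easy part to get wrong, so here is a small sketch of the calculation using the model and gas prices from the question. The function name ar_predict is just an illustrative helper, not something from JMP.

```python
def ar_predict(intercept, coefficients, history):
    """Predict the next value of an autoregressive model.

    coefficients: [lag1, lag2, ...] coefficients.
    history: observed values in time order (oldest first).
    Lag 1 is the most recent observation, lag 2 the one before it, and so on.
    """
    lags = history[::-1][:len(coefficients)]  # most recent value first
    return intercept + sum(c * lag for c, lag in zip(coefficients, lags))


# Gas prices for time periods 1 through 4, from the question.
prices = [3.85, 3.85, 3.87, 3.87]

# yhat = 0.985 + 0.417*lag1 + 0.057*lag2 + 0.292*lag3 - 0.022*lag4
yhat = ar_predict(0.985, [0.417, 0.057, 0.292, -0.022], prices)
print(round(yhat, 2))
```

Reversing the history before pairing it with the coefficients is what lines up lag 1 with period 4 and lag 4 with period 1.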

A large company that owns a chain of gyms wanted to see if they could increase the membership in their locations by hiring full-time personal trainers to work in the gyms (they do not currently employ any personal trainers). To test this, they randomly chose half of their 120 locations and hired personal trainers to work there, while the other half received no personal trainers. They also collected information on the gender of the people who had memberships to each location. Which of these answers would be the best way to randomize in this study?

Use a computer program to select the locations that hire personal trainers and those that don't. Randomize: treatments should be randomly assigned to participants, OR patients are randomly selected from the population.
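One way such a computer program could look is sketched below. The location IDs are hypothetical placeholders, and the seed is only there so the split can be reproduced; neither comes from the question.

```python
import random


def assign_treatment(locations, seed=None):
    """Randomly split locations into a treatment half and a control half."""
    rng = random.Random(seed)
    shuffled = list(locations)  # copy so the original list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical IDs for the 120 gym locations.
locations = [f"gym_{i:03d}" for i in range(1, 121)]

trainers, no_trainers = assign_treatment(locations, seed=42)
print(len(trainers), len(no_trainers))
```

Shuffling the full list and cutting it in half gives every location the same chance of getting personal trainers, which is the point of randomizing the treatment.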

Use the Unemployment Data Set that is posted in the instructions. This data is the unemployment rate in the US from 2016/01/01 until 2020/02/01. Use JMP and find the autoregressive model with two lags.

yhat = 0.0753881 + 0.6278427*lag1 + 0.139379*lag2 (WRONG ANSWER: this equation is not the correct answer.)

Use the Unemployment Data Set that is posted in the instructions. This data is the unemployment rate in the US from 2016/01/01 until 2020/02/01. Find the SLR model for predicting the unemployment rate based on month using the additive model.

yhat = 49.21 - 0.0000000125x. An additive model uses simple linear regression methods with y and x, as we did in the earlier material. Use Fit Y by X.

