C207 Module 3
Demand = 400 + 50X, where "X" is the price. If the company wants to sell 1000 units, what should the price be? $5, $8, $12, $20
$12 This is a linear regression equation (a straight line), with Demand = y and X = price. If the company wants to sell 1000 units, demand must equal 1000. Solve 1000 = 400 + 50X for X: X = 600 / 50 = 12.
Enterprise Industries produces Fresh, a brand of liquid laundry detergent. In order to study the relationship between price and demand for the large bottle of Fresh, the company develops the following relationship: Demand = 600 + 50X. What price is needed to produce a demand of 1000?
$8
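Both price questions above invert the same straight-line equation. A minimal sketch, assuming the form Demand = intercept + slope x price as written on the cards (the function name price_for_demand is illustrative, not from the course):

```python
def price_for_demand(target_demand, intercept, slope):
    """Solve target_demand = intercept + slope * price for price."""
    if slope == 0:
        raise ValueError("slope must be non-zero to solve for price")
    return (target_demand - intercept) / slope


print(price_for_demand(1000, 400, 50))  # 12.0
print(price_for_demand(1000, 600, 50))  # 8.0
```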
Which value of R would tell us that our model is complete enough to be accurate? 0.4, 0.2, 0, -0.8
-0.8 R ranges from -1 to 1. The closer to 0, the weaker the relationship and the weaker the model. The closer to 1 or -1, the stronger the relationship. The (-) sign means that as one variable goes up the other goes down. Ex. as temperature rises, coat sales drop.
At a confidence level of 95%, which p-value would cause us to reject the null hypothesis?
0.01 The p-value must be less than 0.05, not equal to.
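Z = -1. What is the probability sales will be at or below this level?
0.17 About 2/3 of events fall within 1 SD of the mean, so about 1/3 are in the two tails and half of that (about 0.17) is in the lower tail. The exact value is closer to 0.16.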
If store sales are $100,000 per month with a standard deviation of $10,000, what is the probability that sales will be less than $70,000?
0.5% With a mean of 100k and an SD of 10k, we're on a normal (bell) curve. 70k is 30k below the mean and the SD is 10k, so we are 3 SD to the left of (below) the mean. Within 3 SD the center stripe holds about 99% of all events, so the other 1% is in the two tails. Half of 1% is 0.5%. About 99% of the time sales will be between 70k and 130k. (The empirical rule puts the 3-SD band at 99.7%, which makes the exact tail closer to 0.15%; 0.5% is the rounded answer.)
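The card rounds the 3-SD band to 99%. The exact lower-tail probability can be checked with the standard normal CDF. This is a sketch using only Python's math module; the helper name normal_cdf is an illustration, not course material:

```python
import math


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, computed via the error function."""
    z = (x - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z_score = (70_000 - 100_000) / 10_000          # -3.0, i.e. 3 SD below the mean
p_below_70k = normal_cdf(70_000, 100_000, 10_000)
print(f"z = {z_score:.1f}")
print(f"P(sales < 70k) = {p_below_70k:.4%}")
```

The result is roughly 0.13%, which is in the same "very rare" territory as the rounded 0.5% on the card.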
Which of these is a possible value of R-Squared? 2.0, 1.5, 0.7, -0.4
0.7 R-squared is R x R, so a negative answer is not possible. It's anywhere between 0 and 1.
If the upper tail of the bell curve only happens 2.5% of the time, which value of Z would we use? 1.00, -1.65, 1.96, or -2.13
1.96 If the upper tail only happens 2.5% of the time, then the other tail also happens 2.5% of the time, so the middle stripe must hold 95%. If the middle holds 95%, we're about 2 SD on either side of the mean. 1.96 is the value closest to 2, and it is positive because we are in the upper tail.
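The "about 2 SD" reasoning can be confirmed by inverting the standard normal CDF. A minimal sketch, assuming Python 3.8+ for statistics.NormalDist (the wrapper name z_for_upper_tail is made up for illustration):

```python
from statistics import NormalDist


def z_for_upper_tail(tail_probability):
    """Return the Z value that leaves `tail_probability` in the upper tail."""
    return NormalDist().inv_cdf(1.0 - tail_probability)


z = z_for_upper_tail(0.025)
print(round(z, 2))  # 1.96
```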
A city hospital wishes to evaluate the labor hours it needs based upon monthly occupied bed days and average length of patients' stay. The estimated regression is: y = 2000 + 75X1 + 65X2 What scenario would be predicted if X1 = 500 and X2 = 1000?
104,500
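The prediction is just the intercept plus each coefficient times its X value. A small sketch (the function name predict is hypothetical) that reproduces the arithmetic for the hospital equation:

```python
def predict(intercept, coefficients, values):
    """Evaluate y = intercept + sum(coefficient_i * x_i)."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one value per coefficient")
    return intercept + sum(c * x for c, x in zip(coefficients, values))


# y = 2000 + 75*X1 + 65*X2 with X1 = 500 and X2 = 1000
labor_hours = predict(2000, [75, 65], [500, 1000])
print(labor_hours)  # 104500
```

The same helper covers single-variable cards, e.g. predict(20, [100], [5]) for Y = 20 + 100X1.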
Mary is determining the likelihood that she will lose money on an investment. There is an expected 10 percent gain in a normally distributed dataset, with a standard deviation of 10 percent. The likelihood she'll lose money is _______ percent.
16 This is a little tricky as 68 percent of the dataset will be between a 0 and 20 percent gain. The other 32 percent is above 20 percent and below 0 percent. We care about the half that is below 0 percent, therefore half of the 32 percent outside of one standard deviation. Therefore there is a 16 percent chance Mary will lose money on this investment.
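Written as a worked equation, using the card's rounded figure of 68 percent within one standard deviation:

```latex
\[
P(X < 0) = P\!\left(Z < \frac{0 - 10}{10}\right) = P(Z < -1) \approx \frac{1 - 0.68}{2} = 0.16
\]
```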
F-critical = 3. For which calculated F value should we NOT reject the null hypothesis? 50, 10, 4, 2
2 F-critical is the target value. Any calculated F below it (anything <3 here) means we don't have enough evidence to reject the null hypothesis.
Regression equation: Y = 20 + 100X1. If the independent variable equals 5, what is the forecast? 500, 520, 120, 100
520
Decision Tree
A form of financial modeling that considers different options and determines the option(s) with the highest profit potential based on a set of specified assumptions and outcome probabilities
Simulation Example
A hospital has a policy of ordering 10 sets of crutches whenever the inventory gets down to 5 sets or fewer. It takes anywhere from 2 days to 2 weeks for an order to arrive. Every now and then the hospital runs out of crutches when there is a high demand. A new re-order policy is needed to minimize running out of crutches, while minimizing inventory (because hospital storage space is scarce).
Decision Tree Example
A non-profit charity wants to determine which of 3 options, that have different resource requirements and possible outcomes, will generate the highest net charitable contributions. 1. an internet campaign 2. a mailing campaign 3. organizing a 5k charity run
Crossover Analysis Example
A school district wants to determine whether it makes more sense to purchase a hybrid powered bus or a diesel powered bus (that differ in the cost to purchase and cost per mile to operate)
Are there differences in average police salaries across 3 adjoining precincts?
ANOVA
North American Oil Company is attempting to develop a reasonably priced gasoline that will deliver improved gasoline mileage. As part of its development process, the company would like to compare the effects of three types of gasoline. Which analysis technique should the company use to compare the performances of each gasoline type?
ANOVA
Three teachers have a bet as to whether any one teacher's students scored higher than those in the other two classes. Which tool should they use?
ANOVA
Statistical Process Control Example
An Internet Service Provider monitors whether there are atypical changes in the number of customer complaints it receives
Break-Even Analysis Example
An entrepreneurial artist wants to sell hand-designed t-shirts online. She wants to determine how many she will need to sell to cover her costs and then start making a profit.
Decision Science
Analytics to derive an optimal solution/decision
A company uses time series analysis to develop its product forecast. The forecaster uses simple linear regression but notices that the past data trend is aligned with advertising expenditures. What issue might exist with this regression analysis?
Autocorrelation
Which describes the fact that today's weather is related to yesterday's weather?
Autocorrelation The data points are related to each other rather than just to the variables. We don't want this in linear regression problems, so we typically make some sort of adjustment and then continue with our analysis. Ex. is the chance of snow today related to the fact that it was 80 and sunny yesterday?
Which tool? Launching a new product, a CEO wants to know how much money they will spend before gross income is positive.
Breakeven Analysis
Should a staffing plan for emergency nurses be adjusted based on time of day?
Chi-Square test
Which analysis technique can be used with hypothesis testing when nominal or categorical data is gathered?
Chi-square
A researcher is evaluating voter turnout based upon age and location of voting precincts. Which analysis technique should she use?
Cluster Analysis
Housing prices are forecast by size of the home and lot and number of bedrooms and bathrooms, which are related to each other.
Collinearity The X variables are related to each other, so their lines are co-linear: they tend to go in the same direction.
Simulation
Computer-based models that simulate real-world situations for different "What if?" scenarios
Linear regression can help us discover which?
Correlation (think co-relation: is one variable related to the other, does it help predict it, do they move in a related way?)
Ski Boards, Inc. wants to enter the market quickly with a new finish on its ski boards. It has three choices: refurbish the old equipment, make major modifications or purchase new equipment. The company has estimated the fixed and variable cost for each option. Which technique should they use to select the least costly option?
Crossover Analysis
Which tool? Customer wants to choose one of three vendors based upon their forecasted demand for next year.
Crossover Analysis (Looking at different volumes which vendor will be the most economic choice at the volume we anticipate having)
The manager is concerned that sales forecast is too high because the economy is slowing down.
Cyclicality The business cycle is the description of the economy going up and down because of recession or for a boom time.
An entrepreneur is thinking about starting an independent gasoline station and considering how large the station should be. The annual return will depend upon the size of station and a number of marketing factors related to the oil industry and demand for gasoline. What analysis technique should be used to evaluate station size given the uncertainty of oil industry and demand factors?
Decision Analysis
these are analytics to derive an optimal solution or decision
Decision Science Analysis Techniques
Statistical Process Control
Derive standards and then monitor whether a process is meeting those standards
Linear Programming
Determine how best to use limited resources (e.g., employee hours, materials, space) to either maximize profits or minimize costs
Linear Programming Example
Determine number of junior and senior nurses to assign to emergency versus in-patient wards based on a series of constraints, in order to control costs while meeting patient needs
Crossover Analysis
Determine the least expensive option, of 2 or more options, at different volumes of need (considers both fixed costs and variable costs)
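For two options, the crossover point is the volume where total costs are equal: fixed_a + variable_a x volume = fixed_b + variable_b x volume. The sketch below applies that to a bus-style comparison. All dollar figures are hypothetical, chosen only to make the arithmetic easy to follow:

```python
def total_cost(fixed_cost, variable_cost, volume):
    """Total cost of one option at a given volume."""
    return fixed_cost + variable_cost * volume


def crossover_volume(fixed_a, variable_a, fixed_b, variable_b):
    """Volume at which options A and B cost the same."""
    if variable_a == variable_b:
        raise ValueError("equal variable costs: the cost lines never cross")
    return (fixed_a - fixed_b) / (variable_b - variable_a)


# HYPOTHETICAL figures: hybrid costs more to buy but less per mile.
hybrid = (150_000, 0.50)   # (purchase price, cost per mile)
diesel = (100_000, 0.75)

miles = crossover_volume(*hybrid, *diesel)
print(miles)  # 200000.0

# Below the crossover diesel is cheaper; above it hybrid is cheaper.
print(total_cost(*diesel, 100_000) < total_cost(*hybrid, 100_000))  # True
print(total_cost(*hybrid, 300_000) < total_cost(*diesel, 300_000))  # True
```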
Break-Even Analysis
Determine the volume at which Total Revenues make up for (equal) Total Costs
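Setting Total Revenue equal to Total Cost gives break-even units = fixed costs / (price - variable cost per unit). A minimal sketch with hypothetical t-shirt numbers (none of these figures come from the cards):

```python
import math


def break_even_units(fixed_costs, price_per_unit, variable_cost_per_unit):
    """Smallest whole number of units at which revenue covers total cost."""
    margin = price_per_unit - variable_cost_per_unit
    if margin <= 0:
        raise ValueError("price must exceed variable cost to ever break even")
    return math.ceil(fixed_costs / margin)


# HYPOTHETICAL: $600 of setup costs, shirts sell for $20, each costs $8 to make.
units = break_even_units(600, 20, 8)
print(units)  # 50
```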
Regression & Time-Series Analysis
Do you want to forecast/predict some value?
Statistical Analyses (i.e., Inferential Statistics)
Goal is to make an inference about a larger population based on analysis of a smaller sample
Cluster Analysis helps a marketing department do which?
Group Customers (market segmenting. Putting customers into groups based upon their characteristics)
The temperatures in our sample remains within a consistent range between high and low.
Homoscedasticity (scedasticity = scattering; homo = uniform). The high-to-low range of y (the answers) stays consistent as we move across our X's. Basically, our data is fairly consistent from low to high. This is what we would normally look for in a linear regression.
Regression & Time-Series Analysis
Is there a significant relationship and/or trend?
Statistical Analyses
Is there a statistically significant finding (trend, group difference, etc.)?
Name a disadvantage of cluster analysis
It is a long and expensive process. Advantage: it sorts individual data points into different groups, which helps in determining target markets.
A craftsman builds two kinds of birdhouses. One for bluebirds and one for cardinals. He knows the amount of labor and the units of lumber that are needed for each birdhouse. The craftsman has available 60 hours of labor and 120 units of lumber. Which technique should the craftsman use to minimize cost?
Linear Programming
Which tool? Minimize production costs of two products sharing an assembly line while meeting marketing targets.
Linear Programming (takes multiple items sharing resources and comes up with the single best production mix that gives us the best result and still meets all of our goals)
Which tool? Do ice-cream sales go up when the temperature is higher?
Linear Regression (looks at one variable to see if it is related to another).
What factors impact donations to a yoga studio?
Multiple Regression
Which tool? Housing prices are forecast by size of the home and lot and number of bedrooms and bathrooms, which are related to each other.
Multiple Regression (Using 4 different X variables to determine the Y value, which is the price of the house).
Mechanic averages 10 oil changes a day with a standard deviation of 5. How probable is it to have more than 15 oil changes in a day? 0.17, 0.05, 0.025, 0.01
0.17 Normal distribution problem. 15 is 1 SD above the mean, and the band within 1 SD of the mean holds about 2/3 of all events. We're talking about being in the upper tail. The two tails together hold the other 1/3, so the upper tail holds half of that, about 1/6, which is about 0.17.
United Motors indicates that gas mileage tests of one of their cars, the Starbird 300, under city driving conditions have a mean of 30 mpg and a median of 29.9 mpg. Which type of distribution would this testing represent?
Normal distribution
Which distribution has a mean and median that are nearly equal?
Normal distribution (bell curve). The mean, median, and mode are nearly the same.
Which reveals the probability of being wrong if we reject the null hypothesis?
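P-Value The null hypothesis is the uninteresting, boring result we start with as our assumption. The p-value is the probability of seeing a result at least as extreme as ours if the null hypothesis were true, so a small p-value means we are unlikely to be wrong in rejecting it. We typically reject when it is <0.05.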
Is there a perceived difference in the effectiveness of a new versus old employee incentive plan?
Paired sample t-test
Chi-Square helps understand which question?
Police interaction deaths. Chi-Square looks at different categories of events. Ex. minority vs. white interactions with police happen in different quantities. It tests the difference between the frequencies of event occurrences, to figure out whether events are random or something deeper is going on.
If there is a relationship between variables, but the relationship is not linear, what possible challenge with regression could it be?
Polynomial Regression When there is a relationship but it is not linear, it is a non-linear relationship. Polynomial, exponential, and logarithmic regressions are types of non-linear relationships.
We perform a regression analysis on a pair of variables and determine that there is a linear relationship. The regression line is determined to be y=12x-5. What type of linear relationship exists between the independent variable, x, and the dependent variable, y?
Positive The relationship between the two variables is positive. The dependent variable increases as the independent variable increases.
Which of these uses the normal distribution to watch for unusual occurrences?
Quality Control Looking for things that happen at least 3 SD away from the mean.
A business entrepreneur wishes to predict yearly revenue for potential Sub Restaurant sites. He performs a regression analysis based upon the area's population and business rating. What statistic indicates the strength of the relationship of population and rating to revenue?
R-squared
Describe R-squared
R-squared measures the goodness of fit. R-squared can be misleading if there are false independent variables. R-squared ranges from 0 to 1: the closer to 1, the better the fit; the closer to 0, the weaker the relationship. It is never negative. The direction of the relationship (positive or negative) comes from the sign of R, which ranges from -1 to 1.
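To see how R and R-squared relate, the sketch below computes Pearson's R by hand for a small made-up data set (temperature vs. coat sales, echoing the example on the R card) and then squares it. The data values are hypothetical:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient R, always between -1 and 1."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(spread_x * spread_y)


# HYPOTHETICAL data: as temperature rises, coat sales fall.
temperature = [1, 2, 3, 4, 5]
coat_sales = [10, 9, 6, 4, 1]

r = pearson_r(temperature, coat_sales)
r_squared = r * r
print(f"R = {r:.3f}, R-squared = {r_squared:.3f}")
```

R comes out strongly negative while R-squared is close to 1, which shows that squaring discards the direction and keeps only the strength.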
Is there a relationship between number of training hours for counselors and number of patient incidents requiring physical restraint?
Regression
A car dealership wishes to forecast car sales based upon price discounts and television ads. What forecasting technique should the dealership use?
Regression Analysis
A Boston Hallmark store is preparing a budget for the next year and needs to forecast sales. The store notices variation in sales around holidays. What pattern describes the data to be forecasted?
Seasonality
After traveling down the Mississippi River, barges randomly arrive in New Orleans and are unloaded on a first-in, first-out basis. Any barges not unloaded on the day of arrival must wait until the following day incurring additional cost and negatively affecting customer service. The dock superintendent wants to use analytics to support a request for additional unloading crews. What analysis technique should he use?
Simulation
Which tool? A manager wants to analyze a new type of factory setup without building the factory.
Simulation (a computer model of reality without having to actually build that building)
Which describes the level of variation in a set of data?
Standard Deviation How much the data varies around the mean (variance would also answer this).
Which part of a linear regression is the independent variable?
The X variable. It is the variable whose value we get to choose. The dependent variable depends on the value of the independent variable.
How would a greater number of samples and a fewer number of populations affect an ANOVA analysis?
The results would be more accurate. The greater number of data points in a data set will more greatly allow for conclusions to be made in an ANOVA output as there is more information about those populations. Also, the fewer the number of populations, the fewer the degrees of freedom.
Is there a trend in the incidence of child abuse in a particular State?
Time series analysis
Which of the following is not a technique a manager uses when forecasting?
Transitive is not an element that a manager uses when forecasting.
A large manufacturer wants to forecast demand for a piece of pollution-control equipment. A review of the past 12 months of sales indicates that sales are increasing. What time series pattern do the sales likely exhibit?
Trend
Chi-Square
are there significant patterns/differences among frequency (categorical) data
Decision Science Examples
break-even analysis, simulation, linear programming
One-sample t-test
comparing 1 mean to a standard
Two-sample t-test
comparing 2 means
ANOVA
comparing 3 or more means
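ANOVA compares three or more means by forming an F value: the variation between group means divided by the variation within groups. The sketch below is a hand-rolled one-way ANOVA on made-up data, meant only to show where the calculated F comes from; the function name and sample values are assumptions:

```python
def one_way_anova_f(groups):
    """Calculated F for a one-way ANOVA over a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation: how far each group mean is from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group variation: how far each point is from its own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)        # df between = k - 1
    ms_within = ss_within / (n_total - k)    # df within = N - k
    return ms_between / ms_within


# HYPOTHETICAL samples from three groups.
f_value = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_value, 2))  # 13.0
```

The calculated F is then compared with F-critical from a table: if it is larger, reject the null hypothesis that all means are equal.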
Jane works for a public health company. She is working on an anti-tobacco campaign and is interested in how smoking cigarettes affects a smoker's cholesterol. She could use the number of cigarettes smoked per day as her _____ variable and place it on the __-axis.
independent, x The number of cigarettes smoked is the independent, explanatory variable, while cholesterol can be used as the dependent, response variable. We place the independent, explanatory variable on the x-axis.
A Monte Carlo simulation runs many __________ after _________ have been made about the probability of different outcomes.
iterations, assumptions A Monte Carlo simulation runs many iterations after assumptions have been made about the probability of different outcomes. Although calculations likely have to be taken when helping to determine the probabilities used for different outcomes, assumptions are necessary to run a simulation.
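A minimal Monte Carlo sketch in the spirit of the crutches re-order example: the lead time of 2 to 14 days comes from that card, while the daily demand distribution, the uniform lead-time draw, and the function name are assumptions made here purely for illustration:

```python
import random


def stockout_probability(reorder_point, iterations=10_000, seed=42):
    """Estimate how often demand during the lead time exceeds the stock on hand."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    stockouts = 0
    for _ in range(iterations):
        # ASSUMPTION: every lead time from 2 to 14 days is equally likely.
        lead_time_days = rng.randint(2, 14)
        # ASSUMPTION: each day 0, 1, or 2 sets are needed, equally likely.
        demand = sum(rng.choice([0, 1, 2]) for _ in range(lead_time_days))
        if demand > reorder_point:
            stockouts += 1
    return stockouts / iterations


p_reorder_at_5 = stockout_probability(5)
p_reorder_at_10 = stockout_probability(10)
print(f"reorder at 5 sets:  {p_reorder_at_5:.1%} chance of running out")
print(f"reorder at 10 sets: {p_reorder_at_10:.1%} chance of running out")
```

Changing the assumptions (the demand pattern, the re-order point) and re-running is the "what if?" part of simulation.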
Logistic Regression
regression where dependent variable is binary (e.g., yes/no, treated/not treated, etc.)
Statistical Analyses Examples
regression, t-test, Chi-Square, ANOVA
Regression analysis:
takes information from one data set and can predict information for another data set. Regression analysis determines the relationship between two data sets and can be useful in predicting or forecasting results for a data set based on the other data set.