Eco 230 Final Review
In business analytics, what is the traditional level of significance that we compare our p-values to?
0.05 or 5%
Suppose you have data on the dollar amount of monetary donations given by individual UWL alums during the annual "telethon" in which alums are called and asked to contribute to scholarship funds for UWL students. What would be the scale of measurement of this data?
Ratio
What level of measurement is the variable Distance in Miles?
Ratio Right. This is a continuous variable and 0 would make sense - it would tell us that two cities are 0 miles apart. This would be sensible for Minneapolis and St Paul if instead of measuring from their city centers we were measuring from their closest points.
Suppose you have data on the results of the annual "telethon" in which alums are called and asked to contribute to scholarship funds for UWL students. The gender, graduation year, and state of residence of donors are all:
variables
True or False: When hypothesis testing, we provide statistical evidence that the null hypothesis is true.
False
True or False: Algorithms are less biased than humans because they use data in place of human judgements.
False
True or False: Publicly available data without any personal identifiers presents no risk to human subjects.
False
True or False: Removing names and identifiers like Social Security numbers from a data set will make the data anonymous.
False
True or False: Research that involves risk to human subjects should never be conducted.
False
True or False: Under the Common Rule, activities that are categorized as practice (not research) are required to be reviewed by an IRB.
False
True or False: An observational study allows us to find a causal relationship between the treatment and our outcome.
False
Identify the "jargon" that best describes the critique your colleague is making in the scenario below. Your company has just test marketed a new product here in La Crosse. The results of the test are very positive; the product sold well and customer feedback surveys indicated the new product was well liked. You are now considering whether to launch the product throughout all of your Minneapolis, Milwaukee, and Chicago stores at once or complete another test market. A colleague argues she is concerned the tastes and preferences of La Crosse consumers may differ from consumers in larger metros, so she would prefer to complete another test market in one of those larger metros first.
Generalizable
You work for a large healthcare system and you are using data on patients to try to improve their use of preventive healthcare services. In your data, you find patients who have insurance coverage provided by their employer are more likely to have received recommended preventive healthcare services than patients who buy their health insurance themselves. Which of the following is a confounding factor you should be concerned about when interpreting this relationship?
Income
You are doing an analysis of Facebook posts and want to test if photo posts have on average the same number of impressions as statuses. What is the appropriate statistical test?
Independent Samples t
CCN and ActMedia provide a television channel targeted to individuals waiting in supermarket checkout lines. They are planning content based on the assumption that the mean time a shopper stands in a supermarket checkout line is 8 minutes. Their data analysts have suggested they study wait time to see whether this target of 8 minutes is correct. Sample data on waiting time was obtained from grocery stores and they estimated an average wait time of 10.45 minutes. Suppose you run the appropriate test and get a p-value of 0.2. Can we say that on average, people do not wait in line 8 minutes?
No, we fail to reject the null hypothesis of an average 8 minute waiting time
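The decision rule behind this answer can be sketched in a few lines. This is a minimal illustration, assuming the traditional 0.05 significance level; the function name `decide` is hypothetical, not part of any library.

```python
ALPHA = 0.05  # traditional significance level


def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Return the hypothesis-test decision for a given p-value."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


# Checkout-line example: H0 is "mean wait = 8 minutes" and the p-value is 0.2
decision = decide(0.2)
print(decision)
```

Failing to reject is not the same as showing that the mean wait is 8 minutes; it only means the sample does not give strong enough evidence against it.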
A Madison area employer is preparing to hire for a new business analyst position. Suppose in a different report, they find a confidence interval for salary of $48,745 - $66,985, a smaller interval than the one before. Which of the following could be a reason why this confidence interval is smaller than the previous one?
(Select all that apply) New data with less variance; lower confidence level; larger sample size
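All three reasons follow from how a margin of error is built. The sketch below uses the normal approximation (z × standard deviation / √n) rather than whatever method the report actually used, and every input number is a made-up assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(std_dev: float, n: int, confidence: float) -> float:
    """Normal-approximation margin of error for a sample mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * std_dev / sqrt(n)


# Hypothetical baseline: salary standard deviation, sample size, confidence level
baseline = margin_of_error(std_dev=20_000, n=30, confidence=0.95)

# Change one input at a time and compare with the baseline
less_variance = margin_of_error(std_dev=15_000, n=30, confidence=0.95)
lower_confidence = margin_of_error(std_dev=20_000, n=30, confidence=0.90)
larger_sample = margin_of_error(std_dev=20_000, n=60, confidence=0.95)

for label, moe in [("less variance", less_variance),
                   ("lower confidence", lower_confidence),
                   ("larger sample", larger_sample)]:
    print(f"{label}: narrower than baseline? {moe < baseline}")
```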
On October 23rd, data journalism website FiveThirtyEight had Democrats ahead on the generic ballot for the 2020 elections by 6.3 points. Suppose they have constructed a 90% confidence interval which is: [D +6.3 ± 5] In this case _______ is the point estimate and _________ is the margin of error
+6.3; 5
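As a quick worked example, the interval endpoints come from subtracting and adding the margin of error to the point estimate. The variable names are illustrative only.

```python
point_estimate = 6.3   # Democrats' lead on the generic ballot, in points
margin_of_error = 5.0  # half-width of the 90% confidence interval

lower = point_estimate - margin_of_error
upper = point_estimate + margin_of_error

print(f"90% CI: D +{lower:.1f} to D +{upper:.1f}")
```

The interval therefore runs from about D +1.3 to D +11.3.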
Please select the appropriate test for this scenario: You are constructing a sample of consumers from across the 4 main regions of the US: Northeast, Midwest, West, & South. You want to test how your sample's regional distribution compares to the country as a whole. If you know what proportion of the US lives in each region, what test would you use?
Chi Square
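A chi-square goodness-of-fit statistic compares observed counts with the counts expected from the known population shares. The sketch below is a hand calculation under invented numbers: both the sample counts and the population shares are hypothetical, and the critical value is the standard chi-square table entry for 3 degrees of freedom at the 0.05 level.

```python
# Hypothetical sample counts by region (n = 100) and assumed population shares
observed = {"Northeast": 25, "Midwest": 15, "South": 40, "West": 20}
population_share = {"Northeast": 0.2, "Midwest": 0.2, "South": 0.4, "West": 0.2}

n = sum(observed.values())

# Sum of (observed - expected)^2 / expected over the four regions
chi_square = 0.0
for region, count in observed.items():
    expected = population_share[region] * n
    chi_square += (count - expected) ** 2 / expected

df = len(observed) - 1        # categories minus one
CRITICAL_5PCT_DF3 = 7.815     # chi-square table value for df = 3, alpha = 0.05

print(f"chi-square = {chi_square:.2f}, df = {df}")
print("reject H0" if chi_square > CRITICAL_5PCT_DF3 else "fail to reject H0")
```

With these made-up counts the statistic is 2.5, well below the critical value, so the sample's regional mix would not differ significantly from the assumed national mix.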
Please select the appropriate test for this scenario: You have constructed a sample of consumers from 3 generations: Boomers: people aged 55-73; Gen X: people aged 39-54; Millennials: people aged 23-38. For each age group you have constructed the average amount they spend on shelter a month, where shelter = rent + mortgage + utilities + insurance. You want to test if the average spending on shelter is the same across the three groups. Which test do you use?
ANOVA
Please select the appropriate test for this scenario: You have been asked to help design a bonus plan intended to motivate sales persons throughout your company. Your company operates nationwide. You'd like to design something simple where any salesperson who meets or exceeds annual sales of $450,000 will earn a bonus. However, your colleagues are concerned that setting the same goal nationwide may not be appropriate because some regions are tougher than others. So, you decide to look at last year's sales data for your 568 salespersons by region (Northeast, Midwest, South, West). You want to know whether average sales appear to depend on region.
ANOVA Step 1: I am estimating means for each region (4 in total). Step 2: I want to know if there is a difference in average sales across regions - so I'm looking to compare means across 4 groups. Step 3: ANOVA H0: µ1 = µ2 = µ3 = µ4 HA: µ's are not all equal.
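The F statistic behind a one-way ANOVA can be computed by hand as the ratio of between-group to within-group variability. The sketch below is only an illustration: the sales figures are invented (three salespeople per region instead of 568 in total), and the critical value is the standard F-table entry for (3, 8) degrees of freedom at the 0.05 level.

```python
from statistics import mean

# Hypothetical annual sales (thousands of dollars), three salespeople per region
sales = {
    "Northeast": [440, 450, 460],
    "Midwest": [450, 460, 470],
    "South": [460, 470, 480],
    "West": [470, 480, 490],
}

all_values = [x for group in sales.values() for x in group]
grand_mean = mean(all_values)

# Between-group sum of squares: how far each region's mean is from the grand mean
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in sales.values())
# Within-group sum of squares: spread of salespeople around their own region's mean
ss_within = sum((x - mean(g)) ** 2 for g in sales.values() for x in g)

df_between = len(sales) - 1
df_within = len(all_values) - len(sales)
f_stat = (ss_between / df_between) / (ss_within / df_within)

CRITICAL_F_3_8 = 4.07  # F table value for (3, 8) df, alpha = 0.05

print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) df")
print("reject H0" if f_stat > CRITICAL_F_3_8 else "fail to reject H0")
```

With these made-up numbers F is 5.0, which exceeds the critical value, so H0 (all four regional means equal) would be rejected.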
Justice can be established by ensuring the benefits of the research are distributed: (Check all that apply)
According to need. Equally.
Recognition of individual autonomy can be addressed in research by:
All of these: gaining informed consent from individuals participating in research; protecting persons with limited autonomy; ensuring participation is voluntary and individuals can opt out at any time.
You are watching a business presentation and the presenters have just displayed a bar graph depicting average total sales by quarter. The most recent quarter is $1.3 million below the prior quarter, and $1.1 million below the average of the past three quarters. The presenters are arguing this trend is troubling and requires action. Your colleague asks them whether these differences in sales are statistically significant. Why is this an important question to ask?
Because these differences could be due to random chance.
In the Belmont Report, beneficence means:
Benefits of research must be measured against risks.
Which of the following hypotheses is NOT testable?
Consumers prefer to shop online. Right. This is a hypothesis about a single variable and does not include a constant to evaluate it against. Revising it to say "The majority of consumers prefer to shop online" would work because "The majority" implies a constant of 50% that the estimated proportion would be compared to.
A business research question is actionable if it:
Can be answered in a satisfactory way using available resources. Yes. Actionable research questions are those that you have the time, resources, and ability to address, though you may need to make some assumptions or the results may be subject to some limitations.
Sometimes factories measure whether or not there was a work stoppage during a shift. Stoppages occur when something goes wrong and the line must be stopped. Picture the data that would go in the spreadsheet. Is the work stoppage variable categorical or numerical?
Categorical
Will the data that is collected using this survey question be categorical or numerical? What is your major? ______________
Categorical Right. Your major is a category - like the car brand example from the video.
Will the data that is collected using this survey question be discrete or continuous? What is your hourly wage? _________________
Continuous
During a time and motion study in a factory, precise measurements of the time it takes to complete a particular step in the production process are collected. Are the resulting data discrete or continuous?
Continuous Right. The time could be measured down to fractions of a second. It could take on an infinite number of values. Just like the example about body weight from the video.
Identify the "jargon" that best describes the critique your colleague is making in the scenario below. In your afternoon meeting, business analysts have just displayed a graph showing sales are 14.7% higher on average in stores with smaller square footage. They are recommending your company consider smaller footprints for new stores as a way to improve sales and reduce costs. A colleague asks: "I'm concerned we're missing the influence of population density in the city where these stores are located. Couldn't we be seeing higher sales because there are simply more people, and smaller stores because our properties are generally smaller in bigger cities?"
Correlation and Causation
Which statistical test would you perform if you were interested in testing the relationship between two different variables from the same sample?
Correlation test
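A correlation test starts from Pearson's r and converts it to a t statistic with n − 2 degrees of freedom. The sketch below computes both by hand on invented data; the helper name `pearson_r` and the five data points are assumptions for illustration, and the critical value comes from a standard t table.

```python
from math import sqrt


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired measurements taken on the same five observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
n = len(x)
t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)  # test statistic for H0: no correlation

CRITICAL_T_DF3 = 3.182  # two-sided t table value for df = 3, alpha = 0.05

print(f"r = {r:.3f}, t = {t_stat:.3f}")
print("reject H0" if abs(t_stat) > CRITICAL_T_DF3 else "fail to reject H0")
```

Here r is fairly large, but with only five observations the t statistic stays below the critical value, which shows why sample size matters for significance.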
Please select all of the Research Objectives listed below that align with the following research question aimed to provide information to a large hotel chain: "What resources do our customers use to make hotel booking decisions?"
Estimate the proportion of consumers who consult online customer reviews when booking a hotel. Compare the share of bookings made through third party sites (e.g. Travelocity or Hotels.com) to the share who book directly through the hotel's website.
Gather, organize, tabulate, and depict data to then describe the characteristics about what is being studied.
Descriptive
Will the data that is collected using this survey question be discrete or continuous? How many credits are you taking this semester? _______________
Discrete Although this question is asked open ended there are only a few valid response options (1 through 18 credits). The resulting data will be discrete.
Of the following research questions, which are all aimed to help a small business set their hours of operation, which has the narrowest scope?
During what hours are our highest numbers of consumer transactions occurring? Yes. This research question also appears as part of another one but that research question expands the scope to an analysis of revenue, too. This may be appropriate and even more interesting to the business, but would require more time and analysis. So this question is narrower in scope.
Please select the appropriate test for this scenario: Your firm currently focuses its recruitment on two area schools: University of Wisconsin -La Crosse and University of Wisconsin - Eau Claire. You're hiring students to work in your health analytics department and basic data analysis skills are really important. Anecdotally, it seems your La Crosse hires have had stronger skills than your Eau Claire. You decide to test this more formally because if this is true you may decide to just focus all of your recruitment in La Crosse. You take all employees hired in the last 6 months and have them complete an assessment test. The hires from Eau Claire had an average score of 82. The hires from La Crosse had an average score of 93. How should you use inferential statistics to help make your decision?
Independent Samples t 1) I have estimated two average assessment scores for two groups (Eau Claire and La Crosse) 2) I want to compare these averages to each other to determine whether La Crosse scores are higher. 3) Independent samples t H0: µ1 = µ2. HA: µ1 ≠ µ2
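The test statistic for two independent groups can be sketched as follows. This uses the Welch (unequal-variance) form, which is one common choice and not necessarily the version the course uses; the individual scores are invented so that the group means match the 82 and 93 in the scenario.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical assessment scores; only the group means (82 and 93) come from the scenario
eau_claire = [78, 80, 82, 84, 86]
la_crosse = [89, 91, 93, 95, 97]


def welch_t(a, b):
    """Welch t statistic and approximate degrees of freedom for two samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t_stat, df = welch_t(la_crosse, eau_claire)

CRITICAL_T_DF8 = 2.306  # two-sided t table value for df = 8, alpha = 0.05

print(f"t = {t_stat:.2f}, df = {df:.1f}")
print("reject H0" if abs(t_stat) > CRITICAL_T_DF8 else "fail to reject H0")
```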
A business research question is interesting if it:
Informs a business decision of interest to key stakeholders. Correct. Business research should produce insights that have a clear application to decisions key stakeholders need to make.
IRB stands for:
Institutional Review Board
What level of measurement is the variable Year?
Interval This is a quantitative variable that we can compute meaningful differences from, but it does not have an absolute 0 and ratios of the values would not be meaningful.
Why is proper annotation of code used in quantitative research a good ethical practice and business practice?
It facilitates replication of work which can both save money and limit faking of results.
You are writing a biography on a local politician to be published in a national magazine. In the process you are interviewing her friends and family for information on the politician's childhood. Does this work count as research that would require an IRB review before collecting potentially sensitive personal information on humans?
No this is not research
Your firm has had big problems with employee retention. You believe it was because employees were highly dissatisfied with management. You have employees take an annual satisfaction survey and ask the question: "On a scale from 0 to 100 with 100 being exceptionally good management and 0 being very poor, please rate your direct supervisor". Historically, responses have averaged about 45. Not encouraging. You decide that a managerial training program might help and you pilot this program among a sample of 25 managers. You then survey their employees and obtain an average response to this question of 65. Suppose you run the appropriate statistical test and get a p-value of 0.3. Did the program work?
No, we fail to reject the null hypothesis of no improvement
Suppose employee records capture each employee's gender. What type of data is this?
Nominal
What level of measurement is the variable RouteID?
Nominal Right! Although this variable has numbers it is an ID number - or the use of a number in place of a name. For example, Route ID = 1 may indicate observations for a flight from Minneapolis to Rapid City.
CCN and ActMedia provide a television channel targeted to individuals waiting in supermarket checkout lines. They are planning content based on the assumption that the mean time a shopper stands in a supermarket checkout line is 8 minutes. Their data analysts have suggested they study wait time to see whether this target of 8 minutes is correct. Sample data on waiting time was obtained from grocery stores and they estimated an average wait time of 10.45 minutes. What test should they run in this situation?
One Sample t
Please select the appropriate test for this scenario: Your firm has had big problems with employee retention. You believe it was because employees were highly dissatisfied with management. You have employees take an annual satisfaction survey and ask the question: "On a scale from 0 to 100 with 100 being exceptionally good management and 0 being very poor, please rate your direct supervisor". Historically, responses have averaged about 45. Not encouraging. You decide that a managerial training program might help and you pilot this program among a sample of 25 managers. You then survey their employees and obtain an average response to this question of 55. You want to know if the program worked. How would you use inferential statistics to answer this question?
One Sample t 1) I have estimated 1 average employee satisfaction score from 1 sample of managers 2) I want to compare it to our historical average of 45. 3) One sample t test of a mean. H0: µ = 45. HA: µ ≠ 45
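The one-sample t statistic compares a sample mean with a fixed benchmark. The sketch below is a hand calculation under loud assumptions: the scenario gives the sample mean (55) and the benchmark (45), but the standard deviation of 20 is invented, and n = 25 treats each manager as one observation.

```python
from math import sqrt


def one_sample_t(sample_mean: float, mu0: float, sample_sd: float, n: int) -> float:
    """t statistic for H0: population mean equals mu0."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - mu0) / standard_error


# sample_sd = 20 is an assumption; the scenario does not report it
t_stat = one_sample_t(sample_mean=55, mu0=45, sample_sd=20, n=25)

CRITICAL_T_DF24 = 2.064  # two-sided t table value for df = 24, alpha = 0.05

print(f"t = {t_stat:.2f}")
print("reject H0" if abs(t_stat) > CRITICAL_T_DF24 else "fail to reject H0")
```

Under these assumed inputs t is 2.5; a larger standard deviation would shrink the statistic, so the conclusion depends heavily on the spread of the responses.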
On October 23rd, data journalism website FiveThirtyEight had Democrats ahead on the generic ballot for the 2020 elections by 6.3 points. Suppose they have constructed a 90% confidence interval which is: [D +6.3 ± 5] Suppose instead of a 90% confidence interval, they want to use a 95% confidence interval. Compared to the original, how would the new confidence interval be different?
Only the margin of error would be larger
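To see roughly how much larger the margin becomes, the sketch below backs out the standard error implied by the 90% margin of 5 points and rescales it to 95%. It assumes a normal-based interval, which may not be how the interval was actually built; the point estimate of 6.3 is unaffected by the confidence level.

```python
from statistics import NormalDist


def z_for(confidence: float) -> float:
    """Two-sided standard normal critical value for a confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


moe_90 = 5.0                              # margin of error given in the question
standard_error = moe_90 / z_for(0.90)     # implied standard error (assumption: normal-based CI)
moe_95 = z_for(0.95) * standard_error     # same standard error, higher confidence

print(f"90% margin: {moe_90:.2f}, 95% margin: {moe_95:.2f}")
```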
If a survey includes a question with a scale of agreement [strongly agree / agree / disagree / strongly disagree], what is the resulting data type?
Ordinal
Imagine a fixed alternative question meant to measure age of respondents to a customer satisfaction survey. The question and response options are as follows: What is your age? 20 or younger 21 - 59 60 or older If you had the data collected from this question as a part of the customer satisfaction survey, what would be the scale of measurement of the data?
Ordinal
Let's say we find a new source of information on these routes that provides us with a measure of how often flights are on time. However, instead of getting something like the percentage of flights that are on time, we get a "traffic light" metric with values Green (meaning most flights are on time), Yellow (indicating moderate delays), and Red (indicating most flights on this route are delayed). What would the level of measurement for this new variable be?
Ordinal Yes. This variable has a natural order but since we only get the colors we can't compute any difference measure to learn about how many more on-time flights a particular route tends to have.
Identify the association among the variables and then predict the likelihood of a phenomenon on the basis of identified relationships.
Predictive
Suggest a course of action.
Prescriptive
A manufacturing firm uses performance targets to award bonuses to their factory employees. These targets set a number of products per hour employees should produce. Employees who meet or exceed the target get a bonus. Employees are unionized and the union has complained that this practice is inappropriate because employees feel they work equally hard one hour as the next but their performance data for those hours will be very different. Which aspect of measurement is this complaint based on?
Reliability
The extent to which a measurement procedure yields consistent results
Reliability
Keeping the algorithm used to evaluate New York teachers' performance secret limited:
Replication
Why are replicability and confidentiality sometimes at odds?
Replication is easiest if the data are made public, but making data public may risk exposing private information.
Target's use of consumer shopping patterns to identify pregnant women potentially violates the ethical principle of:
Respect for Persons
Imagine a fixed alternative question meant to measure age of respondents to a customer satisfaction survey. The question and response options are as follows: What is your age? 0-21 21 - 59 60 or older What is a potential weakness of this question?
Response options are not mutually exclusive
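Overlapping response options can be caught mechanically before a survey goes out. The sketch below is a hypothetical checker (the function `find_overlaps` and the revised bins are illustrative assumptions) that treats each option as an inclusive age range.

```python
from math import inf


def find_overlaps(bins):
    """Return (low, high) ranges of values covered by two adjacent bins.

    Each bin is an inclusive (low, high) pair of ages.
    """
    overlaps = []
    ordered = sorted(bins)
    for (low_a, high_a), (low_b, high_b) in zip(ordered, ordered[1:]):
        if low_b <= high_a:
            overlaps.append((low_b, min(high_a, high_b)))
    return overlaps


original = [(0, 21), (21, 59), (60, inf)]   # options as written in the question
revised = [(0, 20), (21, 59), (60, inf)]    # one possible fix (hypothetical)

print(find_overlaps(original))  # a 21-year-old fits two options
print(find_overlaps(revised))   # no shared ages
```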
The Belmont Report was commissioned in response to which of these?
The Tuskegee Study
You have constructed a sample of consumers from 3 generations: Boomers: people aged 55-73; Gen X: people aged 39-54; Millennials: people aged 23-38. For each age group you have constructed the average amount they spend on shelter a month, where shelter = rent + mortgage + utilities + insurance. You want to test if the average spending on shelter is the same across the three groups. Suppose you conduct the appropriate test and get a p-value of 0.03. What is the practical interpretation of your test?
The average spending on shelter is not the same across generations
The New York Taxicab data release presented an ethical problem because:
The data was not properly anonymized and drivers could be re-identified by linking to other data sets.
On September 18th, data journalism website FiveThirtyEight had Democrats ahead on the generic ballot for the 2020 elections by 6.8 points. Suppose they have constructed a 90% confidence interval which is: [D +6.8 ± 5] Suppose instead of a 90% confidence interval, they want to use a 99% confidence interval. Compared to the original, how would the new confidence interval be different?
The margin of error would be larger
When conducting hypothesis tests for differences in means, what are the null and alternative hypotheses? H 0:_____________ H A:______________
The means are equal; the means are not equal
Which of the following statements cannot be an accurate description of data?
The median relationship status of the survey respondents is "Married"
Informed consent is impractical in Big Data Research because of:
The number of participants and the fact that gathering informed consent may reveal the identities of the persons to the researcher whereas the data itself would not.
A Madison area employer is preparing to hire for a new business analyst position. They found a report for their local labor market that says the average salary is $62,365 and the 95% confidence interval around this estimate is $40,745 - $74,985. Let's say the business offered $58,000 for their new business analyst position. Based on the information provided above, it would be correct to say:
This salary is not statistically significantly different from the average in their region (p > 0.05).
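The reasoning here is a containment check: a value inside the 95% confidence interval is not significantly different from the estimate at the 5% level. A minimal sketch, using the figures from the question and illustrative variable names:

```python
ci_lower, ci_upper = 40_745, 74_985  # 95% confidence interval from the report
offer = 58_000                       # salary the business plans to offer

inside_interval = ci_lower <= offer <= ci_upper

if inside_interval:
    print("Not significantly different from the regional average (p > 0.05)")
else:
    print("Significantly different from the regional average (p < 0.05)")
```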
A dataset contains information on one country over a time period of multiple years. We would classify this dataset as
Time Series
In education, students' standardized test scores are sometimes used as a measure of teachers' performance. There are many critiques of this practice. One is that test score data do not actually measure student learning, which is what we are trying to measure. Instead, test scores mostly reflect students' test-taking abilities. Which aspect of measurement is this critique based on?
Validity
Whether the measurement accurately reflects the object or phenomenon you are trying to measure
Validity
Let's say Kwik Trip is interested in researching the success of their customer loyalty program "Kwik Rewards". Which of the following research questions is likely out of scope for this project?
Which employees have the most new loyalty program sign ups per shift? Yes. While answering this question may tell us something about employee performance, it does not focus on the success of the customer loyalty program itself and analysis by employee may mask overall trends.
You are constructing a sample of consumers from across the 4 main regions of the US: Northeast, Midwest, West, & South. You want to test how your sample's regional distribution compares to the country as a whole. You know what proportion of the US lives in each region. Suppose you conduct the appropriate test and get a p-value of 0.12. What is the practical interpretation of your test?
Your sample is representative of the country
The differences between our three main study types come down to the way treatment is determined. In a natural experiment, how is treatment determined?
by some outside party/force
Study design determines ________ while sampling design determines _______
causality, generality
Suppose you have data on the dollar amount of monetary donations given by individual UWL alums during the annual "telethon" as well as data on the characteristics of the donors, such as graduation year. What would each row of the data represent when viewing the data in spreadsheet form?
data for a single donor
Suppose a survey has collected data on typical one-way driving commute times of working adults living in a metropolitan area with limited access to public transportation. If the data is ratio scale, what would be an appropriate visualization to create?
histogram to show distribution
An experiment with random sampling but no random assignment gives us results that are
not causal, but generalizable
Suppose you want to test the relationship between college GPA and income at age 30. If your sample only includes people with engineering degrees, it likely suffers from what problem?
sample bias
Generally speaking, ________ sampling is the most effective but expensive sampling technique and __________ is the least precise but cheapest sampling technique
simple random; cluster
Nonresponse error occurs because
some individuals in the population of interest do not participate, either because they are not able to be contacted or they choose not to participate.
If response choices in a fixed alternative question are exhaustive, this means...
they allow for every possible answer that someone could wish to provide
