Exam 1 Study Guide
A soft drinks company developed a new fizzy drink (Delicious Fizz). A researcher conducted a series of blind tasting trials to measure consumer response to the new drink: consumers drank Delicious Fizz and a rival fizzy drink (Rival Fizz) and then rated each product on a ten-point taste scale. The result for Rival Fizz and taste was p = 0.02, while the result for Delicious Fizz and taste was p = 0.015. How should the researcher interpret her findings?
'Delicious Fizz' had greater significance
Why do business analysts use SPSS rather than performing calculations by hand?
All of the above:
- quantitative data analysis is so complex today that it is essential to use a stats package
- it reduces the chance of making errors in your calculations
- it equips you with a useful transferable skill
Which of the following is not a file extension for files saved in SPSS?
.doc
Which of the numbers below might IBM SPSS report as 10.574 E−05? Answer choices: 0.00010574; 10.569; 1057400.0; 0000.10574
0.00010574
Given a test score that is normally distributed with a mean of 30 and a standard deviation of 6, what is the probability that a single score drawn at random will be greater than 34? Answer choices: 0.0228; 0.2524; 0.1826
0.2524
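The upper-tail probability in this question can be checked with Python's standard-library `statistics.NormalDist` (Python 3.8+). This is a minimal sketch assuming the scores are exactly normal with the stated mean and SD; it gives roughly 0.2525, which agrees with the listed 0.2524 up to rounding.

```python
from statistics import NormalDist

# Test scores assumed normal: mean 30, standard deviation 6 (from the question)
scores = NormalDist(mu=30, sigma=6)

# P(X > 34) is the upper tail: 1 minus the cumulative probability up to 34
p_above_34 = 1 - scores.cdf(34)

# Equivalent route via the z-score: z = (34 - 30) / 6, about 0.667
z = (34 - 30) / 6
p_via_z = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}, P(X > 34) = {p_above_34:.4f}")
```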
A researcher working in a Human Resources department was interested in gender and sales figures, so he conducted a t-test. The mean for males was 66.25 and the mean for females was 78.24, with both groups having a standard deviation of 7. What is the effect size using Cohen's d?
1.713 (d = (78.24 − 66.25) / 7 = 11.99 / 7)
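Cohen's d divides the difference between the two means by a standard deviation. Because both groups share an SD of 7 in the question, this sketch simply uses 7 as the pooled SD (an assumption; with unequal SDs you would pool them first). The function name is illustrative.

```python
def cohens_d(mean_a: float, mean_b: float, pooled_sd: float) -> float:
    """Difference between two group means expressed in standard-deviation units."""
    return (mean_b - mean_a) / pooled_sd

# Values from the question: males 66.25, females 78.24, both groups SD = 7
d = cohens_d(66.25, 78.24, 7)
print(f"d = {d:.3f}")  # 11.99 / 7
```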
The owner of the large chain of coffee shops called 'MoonBucks' decided to calculate how much revenue was gained from lattes each month in a nationwide sample of 2445 cafés. To measure the variance of revenue gained from lattes, he computes SS = 351,936 for this sample. What are the degrees of freedom for variance?
2444
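The degrees of freedom for a sample variance are N − 1, and the same numbers then give the variance itself via s² = SS/(N − 1). A short arithmetic check:

```python
ss = 351_936   # sum of squared errors (SS) from the question
n = 2445       # number of cafés in the sample

df = n - 1             # degrees of freedom for the sample variance
variance = ss / df     # s^2 = SS / (N - 1)
sd = variance ** 0.5   # standard deviation is the square root of the variance

print(df, variance, sd)  # 2444 144.0 12.0
```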
Twenty-one cats were given 300g of tuna each. The time in seconds was measured until they had eaten all of the tuna: 16, 18, 18, 22, 22, 23, 23, 24, 26, 29, 32, 34, 34, 36, 36, 42, 43, 46, 46, 49, 57 Compute the median.
32
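With an odd number of scores (21), the median is the (N + 1)/2 = 11th score once the data are sorted. Python's `statistics.median` confirms this:

```python
from statistics import median

# Times (seconds) for the 21 cats, already in ascending order
times = [16, 18, 18, 22, 22, 23, 23, 24, 26, 29, 32,
         34, 34, 36, 36, 42, 43, 46, 46, 49, 57]

middle_position = (len(times) + 1) // 2   # 11th score for N = 21
print(len(times), times[middle_position - 1], median(times))  # 21 32 32
```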
If you see in SPSS the number 8.51 E-02 reported, what is the actual value of this number?
8.51 × 10^-2 = 0.0851
Which of the numbers below might IBM SPSS report as 8.96 E+03? Answer choices: 89.60; 8960.0; 0.008960; 8.960
8960.0
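SPSS's E-notation in the questions above means "multiply by 10 to the power shown", so E−05 moves the decimal point five places left and E+03 moves it three places right. Python's `float` parses the same notation, provided it is written without the space and with an ASCII minus sign, as below:

```python
# SPSS-style E-notation from the questions, written without spaces
reported = ["10.574E-05", "8.51E-02", "8.96E+03"]

# "E-05" means "times 10 to the power -5"
values = [float(text) for text in reported]

for text, value in zip(reported, values):
    print(text, "=", value)
```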
An HR manager conducted a review of overtime worked by employees. Her sample size was 60 employees, with a mean of 90 hours of overtime worked per month. She used a 95% confidence interval. What would be the upper boundary of the confidence interval for this study?
91.86 hours
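The upper bound of a 95% confidence interval for a mean is the mean plus 1.96 standard errors (using the normal approximation, an assumption that is reasonable for n = 60). The question as written does not state the standard deviation, so this sketch works backwards: the listed answer of 91.86 hours implies a margin of 1.86 and hence an SD of roughly 7.35 hours. Treat that SD as inferred, not given.

```python
import math

def ci_upper(mean: float, sd: float, n: int, z: float = 1.96) -> float:
    """Upper bound of a normal-approximation confidence interval for a mean."""
    standard_error = sd / math.sqrt(n)
    return mean + z * standard_error

# SD is not given in the question; infer it from the listed answer (margin = 1.86)
implied_sd = 1.86 * math.sqrt(60) / 1.96

print(f"implied SD = {implied_sd:.2f}, upper bound = {ci_upper(90, implied_sd, 60):.2f}")
```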
Which of the following is true about a 95% confidence interval of the mean:
In the long run, 95 out of 100 confidence intervals (from repeated samples) will contain the population mean.
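The long-run reading of a 95% confidence interval can be illustrated by simulation: draw many samples from a population whose mean is known, build an interval from each, and count how often the intervals capture that mean. This sketch assumes a hypothetical normal population (mean 50, SD 10, samples of 30) with a known SD, so the z-based interval should capture the mean about 95% of the time.

```python
import math
import random

def ci_coverage(n_intervals: int = 2000, n: int = 30,
                mu: float = 50.0, sigma: float = 10.0, seed: int = 1) -> float:
    """Share of 95% confidence intervals that contain the true population mean."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    margin = 1.96 * sigma / math.sqrt(n)   # known sigma, so a z-interval
    hits = 0
    for _ in range(n_intervals):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / n_intervals

coverage = ci_coverage()
print(f"{coverage:.1%} of intervals contained the population mean")
```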
Approximately what percentage of people would have scores lower than an individual with a z-score of 1.65 in a normally distributed sample? Answer choices: 95%; 98%; It is not possible to calculate this unless the mean and standard deviation are given; 1%
95%
Which of the following terms best describes the sentence: 'organizations with employee training programmes will not employ fewer men or women'. A) A directional hypothesis; B) An operational definition; C) A null hypothesis; D) A non-directional hypothesis
A non-directional hypothesis
A researcher in a Human Resources Unit presented a recent study showing a statistically significant relationship between the length of staff lunch breaks and low productivity. How can she explain to her manager that this does not mean that the length of staff lunch breaks should be reduced?
A significant result does not mean that the effect is important
In SPSS, what is the data view window?
A spreadsheet into which data can be entered
Which of these statements is correct about one- and two-tailed tests?
A statistical model that tests a directional hypothesis is called a one-tailed test, whereas one testing a non-directional hypothesis is called a two-tailed test
'Children can learn a second language differently before the age of 7 than after.' Is this statement:
A two-tailed hypothesis
What is the standard error?
All of the options describe the standard error.
Which of the following best describes the variable 'Gender'? Answer choices: A between-group variable. A coding variable. All of the possible answers are correct. A grouping variable.
All of the possible answers are correct
If my null hypothesis is 'Dutch people do not differ from English people in height', what is my alternative hypothesis?
All of the statements are plausible alternative hypotheses.
What is the SPINE of statistics?
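An acronym for the five core concepts needed to understand statistical models: Standard error, Parameters, Interval estimates (confidence intervals), Null hypothesis significance testing and Estimation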
You have just joined the sales modeling team for a start-up software company. Your boss has decided that from now on the team will adopt a Bayesian approach. However, not all staff understand what this is, so your boss asks you to present a training session. How would you explain a Bayesian approach in your session introduction?
An approach that shows you how to update the probability of your statistical model as more data are collected
To generate a correlation coefficient between two variables with ordinal data, which set of instructions should you give SPSS?
Analyze->Descriptive Statistics->Crosstabs->Statistics->Correlations->OK (the output reports Spearman's correlation)
A human resources manager in the IT sector was concerned about unconscious bias in recruitment panels. There were two posts and seven candidates: four men and three women. Theoretically, all the candidates have an equal probability of being hired, as they all match the selection criteria. However, the manager has data, from across the IT sector and within her own company, suggesting that men are more likely to be hired. Because she has implemented many equality initiatives within her company, she wants to determine the probability that no women will still be hired. What formula could she use to determine this probability and assess the impact of unconscious bias in her company's recruitment?
Bayes' theorem
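Two pieces of arithmetic sit behind this example. Under the equal-probability assumption, the chance that neither post goes to a woman is the number of all-male pairs divided by all possible pairs: C(4, 2)/C(7, 2) = 6/21 ≈ 0.29. Bayes' theorem then updates a prior probability in light of evidence. The function below is a minimal sketch of the theorem for a single hypothesis; the probabilities in the usage line are hypothetical, since the question does not give the manager's data.

```python
from math import comb

# Equal-chance baseline: ways to pick 2 of the 4 men, over ways to pick any 2 of 7
p_no_women_equal_chance = comb(4, 2) / comb(7, 2)   # 6 / 21

def bayes_posterior(prior: float, p_evidence_if_true: float,
                    p_evidence_if_false: float) -> float:
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    # P(E) via the law of total probability over H and not-H
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / p_evidence

# Hypothetical numbers for illustration only
print(round(p_no_women_equal_chance, 3), bayes_posterior(0.5, 0.8, 0.2))
```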
Confidence intervals:
Can be used instead of conventional statistics based on point estimates.
Your manager has asked you to identify the number of men responding to your annual staff survey. How would you generate this output?
Click on Analyze->Descriptive Statistics->Frequencies
How would you use the drop-down menus in SPSS to generate a frequency table?
Click on: Analyze; Descriptive Statistics; Frequencies
An HR manager was interested in employee use of company on-site gyms across twenty sites. Different researchers collected and analyzed data at each of the sites, but the resulting twenty reports showed differing p-values: some sites found a statistically significant relationship between the opening hours of on-site gyms and employee usage, and others did not. Which of the following would be useful for her to review?
Confidence Intervals
When items on a questionnaire appear to correspond to the construct that the questionnaire claims to measure, it is said to have: Answer choices: Factorial validity; Ecological validity; Content validity; Criterion validity
Content validity
A colleague in your research agency has phoned and asked you in which sub-dialog box the chi-square test can be found. Which do you recommend?
Crosstabs-Statistics
Ordinal level data are characterized by: Answer choices: Equal intervals between each adjacent score. A fixed zero. Data that can be meaningfully arranged by order of magnitude. None of the above.
Data that can be meaningfully arranged by order of magnitude
An analyst at your firm and you are discussing missing data. What might you suggest as an appropriate strategy for dealing with larger quantities of missing data?
Define missing values using the 'recode' function
In your experiment, you also ask some qualitative questions to enrich the statistical data. What is the correct way to record non-numerical values in SPSS?
Define the variable as 'string'.
What is the 'Variable View' in the IBM SPSS Data Editor used for? Answer choices: Entering data. Writing syntax. Viewing output from data analysis. Defining characteristics of variables.
Defining characteristics of variables
There are basically two types of statistics: descriptive and inferential. Which of the following sentences is true about descriptive statistics?
Descriptive statistics describe the data.
'Reducing the advertising budget will reduce short-term sales performance'. State the direction of this hypothesis.
Directional
The degree to which a statistical model represents the data collected is known as the: Answer choices: Fit; Homogeneity; Reliability; Validity
Fit
You have been asked to assess various atmospheric environments for a brand-new fashion retail store. If you are constructing a data file for a repeated-measures design with 190 subjects and three conditions (light and airy, warm and cosy, dark and intense), how many columns and rows will the file have?
Four columns (a participant ID plus one column per condition) and 190 rows (one per subject)
Which of the following statements is true?
If the confidence interval for the difference between two means does not include zero, then the difference between the means is statistically significant.
Why is the standard error important?
It gives you a measure of how well your sample statistic represents the population value.
How is a variable name different from a variable label?
It refers to codes rather than variables
Which of the following could not be represented by columns in the SPSS data editor?
Levels of between-group variables
If we calculated an effect size and found it was r = .42, which expression would best describe the size of effect? Answer choices: Small; Small to medium; Medium to large; Large
Medium to large
Why are large samples desirable in statistical models?
More likely to reflect the population under study
Your CEO has just read a book on criticisms of the NHST and worries that all company data analysis is now flawed and will lead to huge financial losses. How might you reassure her?
NHST does have its flaws but if we incorporate an examination of effect sizes into our analysis, we should be able to trust our research outputs
Assume a researcher found that the correlation between a test she had developed and exam performance was .5 in a study of 25 students. She had previously been informed that correlations under .30 are considered unacceptable. The 95% confidence interval was [0.131, 0.747]. Can you be confident that the true correlation is at least 0.30?
No you cannot, because the lower boundary of the confidence interval is .131, which is less than .30, and so the true correlation could be less than .30.
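The interval in this question is consistent with the common Fisher r-to-z method (an assumption; the question does not say how it was computed): transform r with atanh, add and subtract 1.96 standard errors of 1/√(n − 3), then transform back with tanh. The sketch reproduces [0.131, 0.747] to three decimal places and shows why the lower bound settles the question.

```python
import math

def correlation_ci(r: float, n: int, z_crit: float = 1.96):
    """95% confidence interval for Pearson's r via Fisher's z transformation."""
    z = math.atanh(r)              # r-to-z transformation
    se = 1 / math.sqrt(n - 3)      # standard error of z
    # Back-transform both limits from z to r
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lower, upper = correlation_ci(0.5, 25)
print(f"95% CI [{lower:.3f}, {upper:.3f}]; lower bound >= .30? {lower >= 0.30}")
```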
A stock market trader conducted a Bayesian analysis of variations in skirt length and stock market growth. He calculated a Bayes factor of 1. Should he use skirt length as a predictor of stock market growth?
No. A Bayes factor of 1 means the data are equally likely under both hypotheses, so they provide no evidence that skirt length predicts stock market growth; it is not worth investing in the stock market based on skirt length variations.
You work in a data analyst unit for a large fast food restaurant chain, planning a customer survey and a colleague informs you that a 95% confidence interval has a 95% probability of containing a population parameter. Because of this, she insists that a survey distributed at one restaurant will provide significant results. Do you agree with her?
No, because 95% probability is a long-run probability, requiring that multiple tests be done.
A recruitment analyst wanted to examine the likelihood that advertising on social media is more effective than in print media for recruiting the best candidates. She conducted one study where the probability of making a Type I error was 0.05 and a Type II error was 0.2. Does her research have empirical probability?
No, to have empirical probability the likelihood of an effect being detected requires a series of repeated identical experiments, where the probability of making a Type I error is 0.05 and a Type II error is 0.2.
An experimenter measured 30 children's IQ. He then rank-ordered the children and assigned them a score from 30 (most intelligent) to 1 (least intelligent) to create a new variable. Does this new variable consist of: Nominal data; Interval data; Ratio data; Ordinal data
Ordinal data
In the earlier recruitment example, the human resources manager had already calculated the probability of women being hired based on sector-wide data. In the Bayesian approach, what sort of probability is this?
Prior probability
What operation does the 'Recode into Different Variables' initiate?
Redistributes a range of values into a new set of categories and creates a new variable.
When cross-tabulating two variables, it is conventional to:
Represent the independent variable in rows and the dependent variable in columns.
Why might your experimental data file have 'missing data'?
Some of a participant's responses might be missing.
Which of the following is not a transformation that can be used to correct skewed data? Answer choices: Log transformation; Square root transformation; Reciprocal transformation; Tangent transformation
Tangent transformation
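Log, square-root and reciprocal transformations all compress large scores more than small ones, which pulls in a long right tail; the tangent function has no such monotonic shrinking effect (it is periodic), so it is not used for this purpose. The sketch below, on a small hypothetical right-skewed data set, computes a simple moment-based skewness (not the exact adjusted formula SPSS reports) before and after two of the transformations. Note that the reciprocal also reverses the order of scores.

```python
import math

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (positive = long right tail)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical positively skewed scores (all > 0, as log and reciprocal require)
data = [1, 2, 2, 3, 3, 3, 4, 5, 8, 15, 40]

raw_skew = skewness(data)
sqrt_skew = skewness([math.sqrt(v) for v in data])
log_skew = skewness([math.log(v) for v in data])

print(f"raw {raw_skew:.2f}, square root {sqrt_skew:.2f}, log {log_skew:.2f}")
```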
If we use the mean as a model, what does the variance represent? Answer choices: The average error between the model and the observed data. The total error between the model and the observed data. The squared total error between the model and the observed data. The square-rooted average error between the model and the observed data.
The average error between the model and the observed data
In general, as the sample size (N) increases:
The confidence interval gets narrower.
Which of the following is the least affected by outliers? Answer choices: The range; The mean; The median; The standard deviation
The median
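A quick way to see why: add one extreme, hypothetical score to the cat-feeding times from the median question and compare how far the mean and the median move.

```python
from statistics import mean, median

times = [16, 18, 18, 22, 22, 23, 23, 24, 26, 29, 32,
         34, 34, 36, 36, 42, 43, 46, 46, 49, 57]
with_outlier = times + [600]   # hypothetical extreme score

print(f"mean:   {mean(times):.2f} -> {mean(with_outlier):.2f}")
print(f"median: {median(times)} -> {median(with_outlier)}")
```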
If my experimental hypothesis were 'Eating cheese before bed affects the number of nightmares you have', what would the null hypothesis be?
The number of nightmares you have is not affected by eating cheese before bed.
What is the power of a statistical test?
The probability that it will find an effect when one exists
A 95% confidence interval is:
The range of values of the statistic which probably contains the true value of the parameter in the population.
Your business studies lecturer has devoted the past ten weeks to teaching you the Bayesian approach and is now asking that you offer a critique of it. What key criticism could you raise?
The reliance on a prior probability is overly subjective and therefore open to researcher degrees of freedom.
You are the newly appointed business analyst for a large national bank (250,000 customers). At a team meeting, your boss presents the results of a survey of customers regarding their opinion (measured on a ten-point scale) of a new financial product. The survey sample (of 500 customers) showed a significant (p = 0.23) level of satisfaction with the new financial product. How should you interpret these results for your boss?
The result is not significant, but the small sample size may be missing large differences in customer satisfaction.
If we were to pull all possible samples from a population, calculate the mean for every sample, and construct a graph of the shape of the distribution based on all of the means, what would we have? Answer choices: The population distribution of the mean; The sampling distribution of the mean; The bootstrap distribution of the mean; The standard error of the mean
The sampling distribution of the mean
What is the relationship between sample size and the standard error of the mean?
The standard error decreases as the sample size increases.
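The standard error of the mean is the standard deviation divided by √N, so quadrupling the sample size halves the standard error (and, with it, the width of a confidence interval). A sketch with a hypothetical SD of 10:

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD / sqrt(N)."""
    return sd / math.sqrt(n)

for n in (25, 100, 400):
    print(n, standard_error(10, n))   # SD of 10 is hypothetical
```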
Which of the following statements is true?
The standard error is calculated solely from sample attributes.
Of what is the standard error a measure?
The variability of sample estimates of a parameter.
Which of the following is not strictly a legitimate business hypothesis?
There will be no difference in productivity between younger and older employees
What is the null hypothesis for the following question: Is there a relationship between heart rate and the number of cups of coffee drunk within the last 4 hours?
There will be no relationship between heart rate and the number of cups of coffee drunk within the last 4 hours.
Under a null hypothesis, a sample value yields a p-value of .015. Which of the following statements is true?
This finding is statistically significant at the .05 level of significance.
A member of your market research team tested a new television advert with twenty different groups of consumers, who rated their satisfaction with the advert (on a ten-point scale) and their likelihood of purchasing the advertised product (on a five-point scale). He is worried about the familywise error rate across the tests. What advice would you give him?
Use a Bonferroni correction.
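The Bonferroni correction divides the overall alpha by the number of tests, k. This sketch assumes k = 20 (one test per consumer group; the real count depends on how many tests he ran) and independent tests, and shows both the uncorrected familywise error rate, 1 − (1 − α)^k, and the corrected per-test threshold.

```python
alpha = 0.05
k = 20   # assumed number of tests: one per consumer group

# Chance of at least one Type I error across k independent tests at alpha = .05
familywise_error = 1 - (1 - alpha) ** k

# Bonferroni: each test must meet alpha / k, keeping the familywise rate at or below .05
bonferroni_alpha = alpha / k

print(f"uncorrected familywise error: {familywise_error:.3f}")
print(f"Bonferroni per-test alpha:    {bonferroni_alpha:.4f}")
```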
What are variables?
Variables are measured constructs that vary across entities in the sample.
A 95% confidence interval for the difference between two population means is found to be (−0.08, 0.15). Which of the following statements is true?
We can be 95% confident that the true difference between the population means falls between −0.08 and 0.15.
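This interval also contains zero, which is the quick check for significance: a 95% confidence interval for a difference that includes zero corresponds to a difference that is not statistically significant at the .05 level. As a tiny sketch (function name illustrative):

```python
def contains_zero(lower: float, upper: float) -> bool:
    """True if a confidence interval for a difference includes zero."""
    return lower <= 0 <= upper

# Interval from the question
print(contains_zero(-0.08, 0.15))  # True: not significant at the .05 level
```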
A Type I error occurs when:
We conclude that there is an effect in the population when in fact there is not.
A Type II error occurs when:
We conclude that there is not an effect in the population when in fact there is.
You lead a product-testing unit for a large pharmaceutical company. Your team has conducted forty trials of a new antibiotic, but you are not sure whether the results are conclusive enough to urge the company to start producing the new drug. A new data analyst who has joined your team suggests that a meta-analysis might be a good idea. Do you agree?
Yes, because the forty trials were identical and tested the same research question and therefore we can calculate an average effect size for the new drug.
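One common way to compute an average effect size in a meta-analysis of correlations is a fixed-effect model: convert each r to Fisher's z, weight it by n − 3, average, and convert back. The sketch below assumes each trial reports a correlation and a sample size; the three trials listed are hypothetical, not the forty in the question.

```python
import math

def average_correlation(studies):
    """Fixed-effect mean correlation: Fisher z values weighted by (n - 3)."""
    total_weight = sum(n - 3 for _, n in studies)
    mean_z = sum((n - 3) * math.atanh(r) for r, n in studies) / total_weight
    return math.tanh(mean_z)

# Hypothetical (r, n) pairs for three trials
trials = [(0.30, 50), (0.45, 120), (0.25, 80)]
print(f"average r = {average_correlation(trials):.3f}")
```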
You are the CEO of a small financial forecasting company. You have decided to adopt a Bayesian approach to data analysis and modeling. When you announce the new policy, your staff are unhappy and unconvinced, as they are used to an NHST approach. You stress that the Bayesian approach has several key advantages. Which of the following is one of them?
You can evaluate the likelihood of the null hypothesis being true
Your CEO has followed your advice and now wants you to measure effect sizes. You report a Pearson's r of 0.50 for the impact of Unblock Me Now drain cleaner on reducing drain blockage time. Your CEO wants to know if this is bad, as she remembers that a p-value of 0.30 is not good. What do you tell her?
You tell her that effect sizes and p-values are not the same thing, and that a Pearson's r of 0.50 is a large effect, suggesting she should go ahead with the launch of Unblock Me Now.
Hypothesis
A proposed explanation for a fairly narrow phenomenon or set of observations. It is not a guess, but an informed, theory-driven attempt to explain what has been observed.
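When a null hypothesis is rejected, the probability of committing a Type II error is _____.
Zero: a Type II error (failing to reject a false null hypothesis) can only occur when the null hypothesis is retained.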
Which of the following are assumptions underlying the use of parametric tests (based on the normal distribution)? Answer choices All of the options are true. Some feature of the data should be normally distributed. The samples being tested should have approximately equal variances. The data should be at least interval level.
All of the options are true
Theory
An explanation or set of principles that is well substantiated by repeated testing and explains a broad phenomenon.
A trainee data analyst for a large social media company, which has falling site usage, has just completed a study into factors that affect site users' satisfaction levels. However, he finds only one statistically significant factor, which he includes in his report but he deliberately omits the other six non-significant findings. What is the term for what the data analyst has done?
p-hacking
What is the relationship between the sum of squared errors (SS), the sample size (n) and the variance (s²)? Answer choices: SS = s²/(n − 1). s² = SS(n − 1). n = (s²/SS) − 1. s² = SS/(n − 1).
s² = SS/(n − 1)
If we calculated an effect size and found it was r = .21 which expression would best describe the size of effect?
Small to medium
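The labels for r = .21 here and r = .42 earlier follow Cohen's benchmarks for r (.10 small, .30 medium, .50 large). A minimal sketch of that mapping, assuming values between benchmarks are described as "small to medium" or "medium to large":

```python
import math

BENCHMARKS = [(0.10, "small"), (0.30, "medium"), (0.50, "large")]

def describe_r(r: float) -> str:
    """Describe |r| against Cohen's benchmarks for correlation effect sizes."""
    size = abs(r)
    for value, label in BENCHMARKS:
        if math.isclose(size, value):
            return label            # exactly at a benchmark
    if size < 0.10:
        return "below small"
    if size < 0.30:
        return "small to medium"
    if size < 0.50:
        return "medium to large"
    return "large"

for r in (0.21, 0.42, 0.50):
    print(r, describe_r(r))
```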
What is the standard error?
The standard deviation of sample means
Your business studies lecturer has set you the following hypothesis to test, 'there will be no association between consumer socioeconomic status and level of private health insurance'. What type of hypothesis is this?
Two-tailed
What symbol is used to represent the standard error of the mean?
σx̅
Theory and Hypothesis similarities and differences
Both theories and hypotheses seek to explain the world, but a theory explains a wide set of phenomena with a small set of well-established principles, whereas a hypothesis typically seeks to explain a narrower phenomenon and is, as yet, untested. Both theories and hypotheses exist in the conceptual domain, and you cannot observe them directly.
Which of the following best describes the relationship between sample size and significance testing?
In large samples even small effects can be deemed 'significant'.
The 99% confidence interval usually is:
Wider than the 95% confidence interval.
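A 99% interval has to capture the parameter in more samples, so it uses a larger critical value and is wider for the same data. The critical z values can be obtained from `statistics.NormalDist().inv_cdf` (a sketch using the normal approximation):

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Two-sided critical z value for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

z95 = critical_z(0.95)
z99 = critical_z(0.99)
print(f"95%: ±{z95:.3f} SE, 99%: ±{z99:.3f} SE")
```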
A researcher was assessing customer satisfaction with MakeMebeautiful, a new beauty product. He had a sample size of 75 and a p-value of 0.10. Should the researcher recommend that the company stop promoting the product?
No, because the sample size is small and p-values are easily affected by sample size.
In hypothesis testing, which hypothesis do we test?
Null
What does NHST stand for?
Null Hypothesis Significance Testing
What are parameters?
Parameters are estimated from the data and are (usually) constructs believed to represent some fundamental truth about the relations between variables in the model.
What is the alternative hypothesis for the following question: Does eating salmon make your skin glow?
People who eat salmon will have a more glowing complexion compared to those who don't.
Which of these statements about statistical power is not true?
None of them; all of the statements are true: power is the ability to detect an effect; we can use power to determine how big a sample is required to detect an effect of a certain size; and power is linked to the probability of making a Type I error.
A null hypothesis
Predicts that an experimental treatment will have no effect on a dependent variable of interest
Variation due to some genuine effect is known as: Answer choices: Unsystematic variation; Systematic variation; Homogeneous variance; Residual variance
Systematic variation
A business analyst in a software design company reviewed a series of national surveys of user satisfaction (rated on a ten-point satisfaction scale) with a new gaming interface the company had recently launched. She found that the survey mean was 8. However, her standard error was high (89.5). How should she interpret her results?
The sample mean might not be representative of the population mean
What does a significant test statistic tell us?
That the test statistic is larger than we would expect if there were no effect in the population (it does not by itself show that the effect is large enough to be important)
A Type 1 error is when?
We conclude that there is a meaningful effect in the population when in fact there is not.
If research suggests that the mean number of insurance quotations a person makes in a year with a standard deviation of 4, what is the z-score for a score of 18?
-2
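A z-score is (score − mean) / SD. The question as written omits the mean; working backwards from the listed answer, z = −2 with SD = 4 and a score of 18 implies a mean of 18 + 2 × 4 = 26. The sketch uses that inferred mean.

```python
def z_score(score: float, mean: float, sd: float) -> float:
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

inferred_mean = 18 + 2 * 4   # inferred from the listed answer, not stated in the question
print(z_score(18, inferred_mean, 4))  # -2.0
```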
What is the conventional level of probability that is often accepted when conducting statistical tests in social science and business?
0.05
'Children can learn a second language faster before the age of 7.' Is this statement:
A one-tailed hypothesis