Marketing Research - Exam #3
degrees of freedom for chi-square test
(R − 1) × (C − 1), i.e., (# rows − 1) × (# columns − 1)
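A minimal sketch of the degrees-of-freedom rule above. The function name `chi_square_df` and the two example table sizes are illustrative assumptions, not part of the course material.

```python
def chi_square_df(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a chi-square test on an R x C cross tabulation."""
    return (n_rows - 1) * (n_cols - 1)

# Hypothetical table sizes, chosen only to illustrate the rule.
df_2x2 = chi_square_df(2, 2)  # e.g., VCR ownership (yes/no) by family size (<4 / 4+)
df_3x4 = chi_square_df(3, 4)
```

A 2 x 2 table therefore has a single degree of freedom, however many respondents it contains.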
Rxy (correlation coefficient) is bounded between ___ and ___
-1 and 1
data matrix
The most prevalent visualized form of a data set; each row represents a record and each column represents a variable
For a variable coded "0" or "1," there are 102 "0s" and 101 "1s." What is the value of the measure of the central tendency for this type of data? 101.5 102 0 (mode) 1 None of the above
0 (mode)
A large clothing manufacturer plans to introduce a new line of sports clothes for women if preliminary market research shows that more than 88% of the population is favorably impressed by the new line. Four hundred women were surveyed; 360 of the women were favorably impressed. The research manager wants to test the hypothesis at the 0.05 significance level. What is the approximate standard deviation of the sample? Hint: When the probability is p, the standard deviation is given by sqrt(p*(1-p)), where sqrt represents square root. 0.5 0.3 0.8 0.9 More information is needed.
0.3
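The arithmetic behind this answer can be checked with a short sketch that applies the hint's formula sqrt(p*(1-p)) to the sample proportion 360/400. The variable names are illustrative.

```python
import math

n = 400            # women surveyed
favorable = 360    # women favorably impressed

p_hat = favorable / n                        # sample proportion
sample_sd = math.sqrt(p_hat * (1 - p_hat))   # hint's formula: sqrt(p*(1-p))
```

With p = 0.9 the product p*(1-p) is 0.09, whose square root is 0.3.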
In a data set, there is a variable that measures each respondent's purchase quantity of yogurt per week. The value of purchase quantity is "12, 0, 0, 1, 1, 1, 6, 10, 11." What is the median? 4.66 1 42 6.421 None of the above
1
In a data set, there is a variable that measures each respondent's purchase quantity of yogurt per week. The value of purchase quantity is "12, 0, 0, 1, 1, 1, 6, 10, 11." What is the mode? 4.66 1 42 6.421 None of the above
1
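The median and mode for the yogurt data can be verified with Python's standard `statistics` module. This is only a checking sketch; the variable names are illustrative.

```python
import statistics

purchases = [12, 0, 0, 1, 1, 1, 6, 10, 11]  # weekly yogurt purchases from the card

median_qty = statistics.median(purchases)   # middle value after sorting
mode_qty = statistics.mode(purchases)       # most frequent value
mean_qty = statistics.mean(purchases)       # arithmetic average
total_qty = sum(purchases)
```

Two of the distractors in the question are the mean (about 4.66) and the sum (42), which is why they appear among the options.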
A large clothing manufacturer plans to introduce a new line of sports clothes for women if preliminary market research shows that more than 88% of the population is favorably impressed by the new line. Four hundred women were surveyed; 360 of the women were favorably impressed. The research manager wants to test the hypothesis at the 0.05 significance level. The computed t-statistic for testing the hypothesis is 1.33 5 4.5 0.5 not shown above.
1.33
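A sketch of how the 1.33 can be reproduced, assuming the statistic is (sample proportion - hypothesized proportion) divided by the standard error, with the standard error taken as sqrt(p*(1-p))/sqrt(n) using the sample proportion, as the earlier hint suggests.

```python
import math

n = 400
p_hat = 360 / n      # sample proportion
p_null = 0.88        # value stated in the null hypothesis

sample_sd = math.sqrt(p_hat * (1 - p_hat))   # sqrt(p*(1-p)) from the hint
standard_error = sample_sd / math.sqrt(n)    # assumed form of the standard error
t_stat = (p_hat - p_null) / standard_error
```

Here the standard error is 0.3 / 20 = 0.015, so the statistic is 0.02 / 0.015, roughly 1.33.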
Given the above table, what percentage of households have a VCR and have a family size less than 4? 10% 38% 34% 24% 28%
10%
The FactFinder Research firm conducted a survey for a national food manufacturer, and one of the issues addressed by the research was to determine how many pounds of fish were consumed per capita annually. In the survey they found one person who consumed only one pound of fish per year while 10 people reported 200 pounds per year. The range was: 200 1 to 2,000 201 199 190
199
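The range depends only on the smallest and largest reported values. In the sketch below the list contains just the respondents mentioned on the card (one person at 1 pound, ten at 200 pounds); the other survey responses are unknown and are assumed to fall in between.

```python
# Only the respondents named on the card; the rest are assumed to lie between 1 and 200.
reported_pounds = [1] + [200] * 10

value_range = max(reported_pounds) - min(reported_pounds)  # range = maximum - minimum
```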
What is the value of the test statistic that would be used in the comparison of the two means? What is the degree of freedom? Options presented as (t-statistics, Degrees of Freedom) 2; 14 1; 16 0.5; 14 4; 16 none of the above.
2; 14
Given the above table, in absolute numbers how many millions of households own a VCR? 10 100 48 34 52
34
Given the above table, what percentage of households have a family size less than 4? 34% 66% 48% 52% 100%
48%
What is the value of the test statistic (based on the answer to the previous question) useful for determining whether there exists any relationship between VCR ownership and family size? 136.2 7.12 0.973 422.1
7.12
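The table itself is not reproduced in these notes, so the sketch below rebuilds it from the answers on the surrounding cards: 100 million households in total, 34 million owning a VCR, 48% with family size under 4, and 10% both owning a VCR and having family size under 4. These cell counts are an inference, not the original table. The statistic sums (observed - expected)^2 / expected over all cells.

```python
# Cell counts in millions of households, inferred from the card answers (assumption).
# Rows: owns VCR, does not own VCR. Columns: family size < 4, family size >= 4.
observed = [[10, 24],
            [38, 28]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected frequency if the two variables were unrelated.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected
```

With these inferred cells the result is about 7.13, which agrees with the card's 7.12 up to rounding.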
To interpret the coefficients of other continuous variables similarly: if the p-value is ___ .05, then there is no effect
> (greater than)
A researcher had calculated the sample chi-square statistic to be equal to χ2 = 7.71. For a significance level of 0.05, the critical value of the chi-square statistic is 7.78. The appropriate conclusion is that: A - the null hypothesis should not be rejected. B - the null hypothesis should be rejected. C - the null hypothesis should not be accepted. D - A and C are correct. E - A and B are correct.
A and C are correct.
nominal
A group/category (e.g., gender: male vs. female)
The principle that guides the framing of a null hypothesis is: One either rejects or accepts the null hypothesis on the basis of the evidence at hand. A null hypothesis may be rejected but can never be accepted. One must always reject the null hypothesis unless the evidence is convincingly (determined by the level of significance) to the contrary. One must always accept the null hypothesis unless the evidence is convincingly (determined by the level of significance) to the contrary. There is no guiding scientific principle and the analyst should frame them as he or she sees fit.
A null hypothesis may be rejected but can never be accepted.
A researcher is interested in comparing the usage of bank debit cards by consumers in rural (r) and urban (u) areas. Specifically, she wants to know if consumers in rural areas use bank debit cards less than consumers in urban areas. Each year for the past five years, she has surveyed 16 individuals (one-half urban, one-half rural) randomly selected from across the United States. The results of the current study indicate that ~ people in urban areas use bank debit cards 12 times per month on average (x̅u =12) ~ people in rural areas use bank debit cards 10 times per month on average (x̅r =10) ~ the standard deviation in means of bank debit card usage in both rural and urban areas is 2 ( su = sr = 2) What test is appropriate for answering the above research question? A one-tailed z test of population mean A one-tailed t test of difference A two-tailed t test of difference A one-tailed z test of difference A two-tailed z test of difference.
A one-tailed t test of difference
Mike Shula is a head football coach. His athletic department spends $30,000 a season on Lizard-Aide, a flavored drink that supposedly contributes to the performance of his players. This year, an independent sports testing association has decided to test the merits of Lizard-Aide and Shula's university has been selected as a member of the national sample. The study is an experiment in which the players, unknown to them, are divided into two segments. Segment 1 receives the real Lizard-Aide prior to and during the games. Segment 2 receives a placebo, which is nothing more than sugar-flavored colored water in containers made to make the sugar-flavored water appear to be Lizard-Aide. It is common practice that, following each game, the coaches evaluate films and give each player a grade ranging from 0 to 100. After the season the sports testing association collects the data. They now have a mean score of performance for each of the two segments for all of the athletic departments participating in the study. If you are the researcher, what statistical test would you conduct? A one-tailed z test of population mean. A one-tailed t test of difference. A two-tailed t test of difference. A one-tailed z test of difference. A two-tailed z test of difference.
A one-tailed t test of difference.
confidence interval
A range that would be expected to contain the population parameter of interest.
Ordinal
A ranking (e.g., Usage type: heavy/light user)
Interval
A rating with equal distance (e.g., IQ, product rating)
Ratio
A real zero (e.g., age, height, weight, income, speed)
Chi-Square test
A statistical analysis for identifying if there exists association between two nominal/ordinal variables in the population
Identify in which of the following it would be useful for a marketing manager to test for differences between segments: A New Zealand winery wants to investigate differences between light, medium, and heavy wine drinkers. A retailer wishes to know if customer satisfaction is different between in-store versus online shoppers. A beverage company wants to know if a new beverage concept differs between users versus nonusers of the current brand. A department store wishes to know the differences between online catalog versus mail order catalog shoppers. All of the above situations would benefit from tests for differences between segments.
All of the above situations would benefit from tests for differences between segments.
hierarchical
As we go down the hierarchy, more refined information can be obtained, and it is more useful for marketing decision making.
Let µu and µr be the population mean of usage rates for people in the urban and rural areas, respectively. Which of the following is the null hypothesis that the researcher should use in comparing the usage rates. A) H0 : µu= µr ; H1 : µu ≠ µr B) H0 : x̅u= x̅r ; H1 : x̅u ≠ x̅r C) H0 : µu= µr ; H1 : µu > µr D) H0 : µu= µr ; H1 : µu < µr E) H0 : x̅u= x̅r ; H1 : x̅u > x̅r
C) H0 : µu= µr ; H1 : µu > µr
Pontiac wants to know what types of persons respond favorably to proposed style changes in the Firebird. Frito-Lay wants to know what kinds of people buy from the Frito-Lay line. These are questions that may be answered through: Chi-square analysis correlation analysis analysis of variance regression analysis none of the above
Chi-square analysis
A researcher is interested in analyzing two nominal variables to determine if the observed pattern of frequency corresponds to the expected pattern. The appropriate statistical technique is: Chi-square test t-test for one mean t-test for two means z-test None of the above
Chi-square test
Nominal/Ordinal + Nominal/Ordinal
Chi-square test
Estimates of Betas
Constant term (intercept) Coefficient term (slope)
Interval/Ratio + Interval/Ratio
Correlation Analysis
Which of the following statements about cross tabulation is FALSE? The cross tabulation provides information on the joint occurrence of two variables. Cross tabulation is very useful for studying associations between categorical variables. Cross tabulation is a necessary step for test of differences. All of the above are true.
Cross tabulation is a necessary step for test of differences
Which of the following is predicted in the regression formula? Null variable Independent variable Controlled variable Dependent variable
Dependent variable
Descriptive Data Analysis
Entails rearranging, ordering, and manipulating the data to generate descriptive information that is easy to understand and interpret
Goal of Explanatory Modeling
Explain the relationship between predictors (explanatory variables) and the target; this is the familiar use of regression in data analysis
Chi-square test finding/example
Finding - Whether there exists a significant association between two nominal/ordinal variables in the population Example - (Gender vs. coupon use) Are females more likely than males to be heavy (or light) users of coupons?
Test of Differences finding/example
Finding - Whether there exists a significant difference in the interval/ratio variable between two consumer segments in the population (classified by the nominal/ordinal variable) Example - (Gender vs. Internet usage) Are males more likely to use the Internet than females?
Correlation Analysis finding/example
Finding - Whether there exists significant positive/negative linear association between two interval/ratio variables in the population Example - (Height vs. weight) Are height and weight correlated with each other?
Goal of Explanatory Modeling Model
Fit the data well and understand the contribution of explanatory variables to the model. Tools: "goodness-of-fit" (R^2), residual analysis, p-values
Suppose we wanted to test the hypothesis that the mean familiarity rating exceeds 4.0, the neutral value on a seven-point scale. The hypotheses may be formulated as ________. H0 : μ1 = μ2 ; H1 : μ1 ≠ μ2 H0 : (σ1)^2 = (σ2)^2 ; H1 : (σ1)^2 ≠ (σ2)^2 H0 : μ = 4.0 ; H1 : μ > 4.0 H0 : π1 = π2 ; H1 : π1 ≠ π2 H0 : π1 ≠ π2 ; H1 : π1 = π2
H0 : μ = 4.0 ; H1 : μ > 4.0
null hypothesis
H0: a statement about the population parameter that always comes with an "=" sign, which is presumed to be true until presented with enough evidence to reject it.
What is the correct expression of the null hypothesis for the alternative hypothesis: the percentage of users who use the Internet for shopping is greater than .40? H0: π < .40 H1: π > .40 H0: π = .40 H1: π ≠ .40 H0: π ≥ .40
H0: π = .40
A large clothing manufacturer plans to introduce a new line of sports clothes for women if preliminary market research shows that more than 88% of the population is favorably impressed by the new line. Four hundred women were surveyed; 360 of the women were favorably impressed. The research manager wants to test the hypothesis at the 0.05 significance level. The correct hypotheses for this situation are: H0: π = 0.88; H1: π ≠ 0.88 H0: π ≤ 0.88; H1: π > 0.88 H0: π = 0.88; H1: π < 0.88 H0: π ≠ 0.88; H1: π = 0.88 H0: π = 0.88; H1: π > 0.88
H0: π = 0.88; H1: π > 0.88
alternative hypothesis
H1: contradicts the null hypothesis about a population parameter. • Here, "contradict" means "totally opposite, ≠" (two-tailed test) or "one-way opposite, > or <" (one-tailed test). • We are more interested in the alternative hypothesis because this statement is helpful in answering research questions and devising marketing strategies. • The alternative hypothesis can be either rejected or accepted.
A ________ is a statement about the value of a population parameter based on prior knowledge, assumptions, or intuition. theory test specification marketing guess hypothesis
Hypothesis
Type II Error (False Negative)
If the null hypothesis is false but you fail to reject it, you commit a Type II error
Type I Error (False Positive)
If the null hypothesis is true but you reject it, you commit a Type I error
Causal Relationship
Meaning - Identifying the causal relationship between 2 variables using hypothesis test of experimental data Analysis - experiment EX - How much additional sales can result from a 5% price reduction and a 10% increase in advertising?
Inference
Meaning - Inferring the value of the population mean of a variable using hypothesis test Analysis - test of population mean EX - Is customer satisfaction of our store this year greater than 4 on a 5-pt scale?
Association
Meaning - Investigating the association between 2 variables in the sample using hypothesis test Analysis - test of differences, Chi-square test, correlation analysis EX - Is customer satisfaction of our store this year greater than 4 on a 5-pt scale?
description
Meaning - summarizing the sample data Analysis - descriptive statistical analysis EX - The average customer satisfaction of DELL in the sample
Which of the following is similar to a lead indicator of the multiple regression analysis findings since it is one of the first pieces of information provided in a multiple regression output? Multiple C Multiple L Multiple R Multiple T
Multiple R
Test of Differences: nominal/ordinal variable + interval/ratio variable possibilities
Nominal/Ordinal variables (demographics, geographic, behavior): o Gender (male vs. female) o Income (high income vs. low income) o Usage rate (heavy user vs. light user) o Usage status (loyal customers vs. switchers) o Region (east vs. west vs. central) o Developed environments (urban & suburban vs. town & rural) Interval/Ratio variables (what we're interested in measuring): o Willingness to pay (e.g., how much to pay for a new pair of athletic shoes) o Usage rate (e.g., the number of hours spent on the Internet) o Purchase intention o Shopping basket size o Propensity to click a banner ad o Intention to download and use a banking app
________ are the totals observed by counting the number of respondents who are in each cross-tabulation cell. Observed frequency Expected frequency Cell column totals Cell row totals Grand totals
Observed frequency
Goal of Predictive Modeling Model
Optimize predictive accuracy; explaining the role of predictors is not the primary purpose!
What sample relationship can be inferred from the above table between family size and owning a VCR? Nothing can be inferred. The smaller the family the more likely they are to own a VCR. A lower proportion of large families (4 or more) own VCRs than small families (less than 4). Owning a VCR causes the family size to increase. Ownership of a VCR tends to increase as family size increases.
Ownership of a VCR tends to increase as family size increases.
Which of the following represents a very powerful tool because it tells us what factors are related to the dependent variable, how each factor influences the dependent variable (the sign), and how much each factor influences it? Chi-square analysis Analysis of variables Correlation analysis Regression Analysis
Regression Analysis
________ is the predictive analysis technique in which one or more variables are used to predict the level of another by use of the straight-line formula. Regression analysis Correlation Analysis of variables Predictive analytics
Regression analysis
Hypothesis Test of Correlation Steps
Step 0 - Establish the marketing research question; Step 1 - State the hypotheses; Step 2 - Establish the criterion; Step 3 - Compute the test statistic; Step 4 - Make a decision
Which of the following forms the basis of regression analysis? Dual-line equation Straight-line equation Variable equation None of the above
Straight-line equation
Test Statistic for Population Mean
T-Test
A large clothing manufacturer plans to introduce a new line of sports clothes for women if preliminary market research shows that more than 88% of the population is favorably impressed by the new line. Four hundred women were surveyed; 360 of the women were favorably impressed. The research manager wants to test the hypothesis at the 0.05 significance level. Based on the t-statistic, what finding do we have? Hint: The threshold value associated with 0.05 significance level is t0.975(399) =1.96. The clothing manufacturer should NOT introduce a new line of sports clothes because there is NO statistical evidence that more than 88% of the population is favorably impressed by the new line. The clothing manufacturer should introduce a new line of sports clothes because there is statistical evidence that more than 88% of the population is favorably impressed by the new line. The clothing manufacturer should NOT introduce a new line of sports clothes because there is statistical evidence that more than 88% of the population is favorably impressed by the new line. The clothing manufacturer should introduce a new line of sports clothes because there is NO statistical evidence that more than 88% of the population is favorably impressed by the new line. None of the above.
The clothing manufacturer should NOT introduce a new line of sports clothes because there is NO statistical evidence that more than 88% of the population is favorably impressed by the new line.
confidence level
The probability that, if a poll/test/survey were repeated over and over again, the results obtained would be the same (1 - α; α is usually set at 5%)
If the threshold t value is 1.746, which of the following statements about the research findings is true? The researcher CANNOT reject the null hypothesis and find statistical evidence that people in urban areas use bank debit cards more than people in rural area. The researcher can reject the null hypothesis but do NOT find statistical evidence that people in urban areas use bank debit cards more than people in rural area. The researcher CANNOT reject the null hypothesis and did NOT find statistical evidence that people in urban areas use bank debit cards more than people in rural area. The researcher can reject the null hypothesis and find statistical evidence that people in urban areas use bank debit cards more than people in rural area. More information is needed before a decision about the null hypothesis can be made.
The researcher can reject the null hypothesis and find statistical evidence that people in urban areas use bank debit cards more than people in rural area.
Tommy Prothro, a marketing manager for Golden Snack Bars, has commissioned marketing research to determine if one recipe of snack bar is superior to another recipe. More than 400 persons who were "snack bar eaters" were involved in taste tests and, after tasting both recipes, they were asked which recipe they would purchase the next time they purchased snack bars. Tommy is now looking at the data and he sees that recipe A had 53 percent stating a preference whereas recipe B had 47 percent. Tommy's brand manager felt this was "significant" evidence that the firm should produce recipe A. But Tommy wanted more evidence so he asked the research firm to run a test to determine if there was a significant difference between the two recipes. When the firm ran the test, they reported a t value of 5.64. Furthermore, the threshold t value for one-tailed test is 1.650 and the threshold t value for two-tailed test is 1.960. This means: Tommy should follow the advice of his brand manager. There is statistical significance and one recipe of snack bar is not superior to another recipe. There is no statistically significant difference and one recipe of snack bar is not superior to another recipe. There is no statistically significant difference and one recipe of snack bar is superior to another recipe. There are statistically significant differences between the two recipe preferences in the population and one recipe of snack bar is superior to another recipe.
There are statistically significant differences between the two recipe preferences in the population and one recipe of snack bar is superior to another recipe.
***VERY IMPORTANT*** T/F if the test statistic is GREATER than the threshold value (calculated using α =0.05) >> if the p-value is SMALLER than α =0.05 >> we REJECT the NULL hypothesis and ACCEPT the ALTERNATIVE hypothesis
True
***VERY IMPORTANT*** T/F the test statistic is LESS than the threshold value (calculated using α =0.05) >> the p-value is GREATER than α =0.05 >> then we FAIL to REJECT the NULL hypothesis and REJECT the alternative hypothesis
True
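The two rules above amount to one comparison between the test statistic and the threshold. The sketch below applies it to the statistics and thresholds quoted on other cards in this set; the function name `decide` and the `two_tailed` flag are illustrative assumptions, and the one-tailed case is assumed to be an upper-tail test.

```python
def decide(test_statistic: float, threshold: float, two_tailed: bool = False) -> str:
    """Compare a test statistic with its threshold (critical) value."""
    # For a two-tailed test the sign of the statistic does not matter.
    value = abs(test_statistic) if two_tailed else test_statistic
    return "reject H0" if value > threshold else "fail to reject H0"

# Statistics and thresholds as quoted on the cards in this set.
clothing_line = decide(1.33, 1.96)                 # new sports clothes line
debit_cards = decide(2.0, 1.746)                   # urban vs. rural debit card usage
snack_bars = decide(5.64, 1.960, two_tailed=True)  # recipe A vs. recipe B
toy_store = decide(1.7, 1.96)                      # spending above $100 per visit
```

Only the debit-card and snack-bar statistics exceed their thresholds, which matches the answers keyed on those cards.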
T/F If we decrease the value of alpha from 5% to 1%, then the probability of a type II error will nearly always increase.
True
T/F The greater the explanatory power of the multiple regression finding, the better and more useful it is for the researcher.
True
T/F The larger the test statistic, the smaller the p-value, the stronger the evidence to reject the null hypothesis
True
T/F only the alternative can be accepted for marketing decision making
True
T/F when we try to decrease the probability of one type of error, the probability for the other type increases
True
Types of Errors
Type I and Type II
How to calculate the T Statistic in test of Differences
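We calculate the t-statistic given by $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$, where • there are two segments indexed by j = 1, 2; • $\bar{x}_j$ is the sample mean of segment j; • $\mu_j$ is the population mean of segment j if the null hypothesis is true; • $s_{\bar{x}_1 - \bar{x}_2}$ is the sample standard deviation of $\bar{x}_1 - \bar{x}_2$; $s_1$ and $s_2$ are the sample standard deviations of the respective segments; and • n is the sample size.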
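A sketch of the calculation for the debit-card example from this set (means 12 and 10, standard deviations 2 and 2, 8 respondents per segment). The card does not show how the standard deviation of the difference in means is obtained, so the unpooled form sqrt(s1^2/n1 + s2^2/n2) used here is an assumption; with equal segment sizes and equal standard deviations it gives the same value as a pooled estimate. The function name is hypothetical.

```python
import math

def t_statistic_two_segments(mean_1, mean_2, sd_1, sd_2, n_1, n_2, null_difference=0.0):
    """t = ((mean_1 - mean_2) - null_difference) / standard error of the difference."""
    # Assumed (unpooled) standard error of the difference between two sample means.
    standard_error = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return ((mean_1 - mean_2) - null_difference) / standard_error

# Debit-card study: 16 respondents, half urban and half rural.
t_debit = t_statistic_two_segments(12, 10, 2, 2, 8, 8)
df_debit = 8 + 8 - 2  # degrees of freedom: n_1 + n_2 - 2
```

These values reproduce the "2; 14" answer given earlier for the comparison of the two means.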
central tendency
a central or typical value for a probability distribution; it tells where the bulk of the data is. Measures: mean, median, mode
data set
a collection of data used in the analysis
variable
a construct that represents a quantity of the sample
The intersection of a row and column in a cross-tabulation table is called: a cross-tabulation cell a dangerous intersection a chi-square a cross-cell interaction a row box
a cross-tabulation cell
test statistic
a mathematical formula derived from the sample data and null hypothesis ~ used to determine the p-value Ex: ~ t statistic ~ Chi-square statistic
scatter plot
a plot on which data is displayed as a collection of points, each having the values of two interval/ratio variables determining the position on the x-axis and y-axis; it helps to visualize the correlation between two variables
Correlation Analysis
a statistical analysis for identifying the direction and the strength of the linear association between two interval/ratio variables
Test of Differences
a statistical method for comparing the population mean (of an interval/ratio variable) between two segments (segmented based on a nominal/ordinal variable). • The underlying assumption is that consumers are heterogeneous and can be segmented based on some characteristics • This analysis is popular in marketing because we are interested in how different consumer segments behave differently or respond differently to the marketing mix, which has important implications for designing marketing mix to target some consumer segments. • It could be more than two segments. But in this course, we restrict our attention to the two-segment scenario.
Test of population mean
a statistical method for testing the value of the population mean based on the sample ~ conveys information about the central tendency of the population (average Jane/Joe) and is useful for devising marketing strategies
Cross tabulation
a statistical process of organizing 2 nominal/ordinal variables by groups or categories to determine their association in the sample
In the formula for calculating the standard deviation, the differences between each observation and the mean is squared. If we did not square these differences, the standard deviation would: be too small to be of any usefulness always be near zero not be normally distributed not be interpreted by z scores none of the above; the formula does not require that the differences be squared
always be near zero
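A quick numerical check of this answer, reusing the yogurt purchase data from earlier in the set: positive and negative deviations from the mean cancel, so only the squared deviations carry information about spread. The sample standard deviation here divides by n - 1, which is an assumption about the course's formula.

```python
import statistics

values = [12, 0, 0, 1, 1, 1, 6, 10, 11]  # yogurt purchase data from an earlier card
mean = sum(values) / len(values)

# Without squaring, deviations above and below the mean cancel out.
raw_deviation_sum = sum(v - mean for v in values)

# With squaring, every deviation contributes a non-negative amount.
squared_deviation_sum = sum((v - mean) ** 2 for v in values)
sample_sd = (squared_deviation_sum / (len(values) - 1)) ** 0.5
```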
2 minor issues researchers should pay attention to in the data cleaning stage
ambiguities; inconsistencies
If Rxy<0, X and Y have negative correlation:
an increase in X is associated with a decrease in Y, and a decrease in X is associated with an increase in Y.
If Rxy>0, X and Y have positive correlation:
an increase in X is associated with an increase in Y, and a decrease in X is associated with a decrease in Y
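A sketch of the Pearson correlation coefficient illustrating both signs. The height and weight figures are made up for illustration only, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)

# Made-up data, for illustration only.
heights = [150, 160, 170, 180, 190]
weights = [50, 58, 66, 80, 90]

r_positive = pearson_r(heights, weights)                # Y rises with X
r_negative = pearson_r(heights, [-w for w in weights])  # Y falls as X rises
```

Flipping the sign of one variable flips the sign of the coefficient but leaves its magnitude unchanged, and both values stay between -1 and 1.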
The ________ is used to test the statistical significance of the observed association in cross-tabulation. contingency coefficient Cramer's V phi coefficient chi-square statistic t statistic
chi-square statistic
The chi-square test is performed by: comparing a metric variable with a categorical variable comparing one frequency table with one cross-tabulation table comparing the difference among categories of more than three variables comparing observed frequency with expected frequency comparing the pie chart with the stacked bar chart
comparing observed frequency with expected frequency
3rd step of hypothesis testing
compute the test statistic
corresponding joint frequency table =
contingency table
parameters
corresponding population values
Use the following table for the next 5 questions. Family Size and Ownership of a VCR by Household (Figures in millions of households) The preceding table is an example of one-way classification. cross tabulation. one-way tabulation. cross classification. none of the above.
cross tabulation
Which of the following emphasizes the division of the sample into subgroups so as to learn the association between two nominal or ordinal variables? longitudinal analysis coding cross-sectional analysis cross tabulation one-way tabulation
cross tabulation
Prior to analysis, the data from a survey is arranged into a(n) ________. information graph data graph information matrix data matrix data table
data matrix
The ________ is the variable customarily termed "Y" in the regression straight-line equation. null variable dependent variable controlled variable independent variable
dependent variable
Measures of variability are concerned with: central tendency depicting "typical" difference between the values in a set of values depicting the similarities between one data matrix and another all of the above none of the above
depicting "typical" difference between the values in a set of values
The KEY of Cross Tabulation is to ...
derive the joint frequency distribution of 2 variables
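A minimal sketch of deriving a joint frequency distribution by counting how often each pair of categories occurs together. The six respondent records are hypothetical, loosely following the gender vs. coupon use example elsewhere in this set.

```python
from collections import Counter

# Hypothetical respondents: (gender, coupon usage type).
records = [
    ("male", "heavy"), ("male", "light"), ("male", "light"),
    ("female", "heavy"), ("female", "heavy"), ("female", "light"),
]

# Each key is one cross-tabulation cell; each value is its observed frequency.
joint_frequency = Counter(records)

# Marginal totals for the row variable (gender).
gender_totals = Counter(gender for gender, _ in records)
```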
Computing the average number of dollars college students have on their credit card balances exemplifies: description inference comparing differences finding relationships type III error
description
gives us sample information, but not population information
descriptive data analysis
65% of males bought a snack when they rented a DVD and 40% of females bought a snack when they rented a DVD. This is an example where a researcher would: determine if there is a difference between the average of two samples determine if there is a difference between the average of two populations determine if there is a significant association between the two populations determine if there is a difference between males and females in terms of the types of snacks purchased determine if there is a difference between DVD renters and non-renters in terms of snacks
determine if there is a difference between the average of two populations
When making a comparison between two segments of respondents to determine whether or not they are statistically different, in concept, the researcher is considering the two segments as two: different populations different answers different tests common segments related segments
different populations
The last step involved in hypothesis testing is ________. reject or do not reject the null hypothesis draw a marketing research conclusion compare the probability with level of significance alpha (α) determine the probability associated with the test statistic under the null hypothesis none of the above is true
draw a marketing research conclusion
A toy store owner is interested in knowing, at the 5% level of significance, whether or not parents spend more than $100 on toys per visit to her store. In this case, the owner has a null hypothesis that "parents spend exactly $100 on toys per visit to her store" and an alternative hypothesis that "parents spend greater than $100 on toys per visit to her store." A sample is taken and the t statistic is calculated as 1.7, which is smaller than the threshold value 1.96. Based on the statistical result, we should: fail to reject the null hypothesis. do not support the hypothesis. accept the alternative hypothesis. immediately take another sample. try to take another sample in another toy store.
fail to reject the null hypothesis.
American Express executives wish to know if there is an association between credit card balance carried and the number of credit cards owned. This is an example of: description inference finding associations finding causal relationships type II error
finding associations
Assume a college professor wanted to know if the number of hours studied by her students was related to students' test scores. She would use: description inference comparing differences finding associations type III error
finding associations
American Express executives wish to know if there is a difference between the average dollar balance carried on credit cards between males and females. This is an example of: description inference finding associations finding causal relationships transforming
finding associations
Thinking of a standard deviation and the shape of the distribution, the distribution is "stretched out at both ends" (dispersed) when the standard deviation is: high moderate low very high none of the above; the two concepts are not related
high
Counting noses
how many? ~ frequency and percentage
contingency table is useful for ...
identifying associations in the sample
The chi-square test is useful for determining: if an association exists between three nominal or ordinal variables if an association exists between two interval or ratio variables if an association exists between two nominal or ordinal variables if a non-linear relationship exists between two variables if there is a linear relationship between two nominal or ordinal variables.
if an association exists between two nominal or ordinal variables
Tommy Prothro, a marketing manager for Golden Snack Bars, has commissioned marketing research to determine if one recipe of snack bar is superior to another recipe. More than 400 persons who were "snack bar eaters" were involved in taste tests and, after tasting both recipes, they were asked which recipe they would purchase the next time they purchased snack bars. Tommy is now looking at the data and he sees that recipe A had 53 percent stating a preference, whereas recipe B had 47 percent. Tommy's brand manager felt this was "significant" evidence that the firm should produce recipe A. But Tommy wanted more evidence so he asked the research firm to run a test to determine if there was a significant difference between the two recipes. By doing this, Tommy would get information that would allow him to determine: if there are real differences between the two recipe preferences in the population. if the differences between the recipes are really 6 percent or more. the number of consumers in each target market preferring recipe A versus B. whether or not the statisticians in the research firm agree with his brand manager. None of the above; there is no statistical test to determine significant differences between two percentages.
if there are real differences between the two recipe preferences in the population.
data inconsistency
incompatible answers to different questions
The ________ variable is customarily termed x in the regression formula. null dependent independent controlled
independent
Suppose that you are interested in knowing whether the customer satisfaction of a restaurant is greater than 4 on a 5-pt scale. You should use: description inference finding causal relationship finding associations type III error
inference
to perform correlation analysis, both variables must be ___/___ data
interval/ratio data EX: o Height and weight. o Income and willingness to pay (7-pts scale) o Sales and customer satisfaction (5-pts scale)
rationale
is to determine the likelihood of selecting a sample in hand, if the null hypothesis is true
higher variation =
larger range
If we adopt a 95 percent level of confidence, we need a P value to be significant if it is: less than 0.01 less than or equal to 0.05 greater than 0.05 greater than or equal to 0.05 0.90 or greater.
less than or equal to 0.05
If we adopt a 95 percent level of confidence, we need a p-value, to be significant (i.e., flag is waving) if it is: less than or equal to 0.05 less than 0.01 greater than 0.05 greater than or equal to 0.05 0.90 or greater
less than or equal to 0.05
correlation analysis can only capture ___ ___
linear relationship
last step of hypothesis testing
make decisions/conclusion
The value obtained by summing all elements in a set and dividing by the number of elements is the ________. mean median mode range standard deviation
mean
curvilinear relationship
means that some smooth pattern describes the relationship
linear relationship
means the two variables have a "straight-line" relationship
variation
measures how dispersed or spread out the sample data is
Sample Covariance (Sxy)
measures the direction of the linear relationship between two variables (X and Y)
Sample Standard Deviation (Sx x Sy)
measures the variation of the variable (X or Y) ***always POSITIVE***
2 major issues in the Data Cleaning stage
missing values unreliable respondents
The ________ is the value that occurs most frequently. mean median mode range standard deviation
mode
Degrees of Freedom =
n-1 (n is the sample size)
what is the degree of freedom for the hypothesis test of correlation when "n" is the sample size?
n-2
When a computed t value (for a test of population mean with the alternative hypothesis as "the population mean is greater than the hypothesized value"), say 4.21, is larger than t0.95(n-1) = 1.65, then this amounts to: support for the null hypothesis; the population mean and the hypothesized mean is no support for the null hypothesis; the two percentages are NOT different support for the null hypothesis, the two percentages are NOT different no support for the null hypothesis; the two percentages are different None of the above; a z value is inappropriate for testing the differences between two averages.
no support for the null hypothesis; the two percentages are different
When a computed t value (for a test for differences between two percentages), 4.21, is larger than the standard t value, 1.96, then this amounts to: support for the null hypothesis; the two percentages are different. no support for the null hypothesis; the two percentages are not different. support for the null hypothesis, the two percentages are not different. no support for the null hypothesis; the two percentages are different. None of the above; a z value is inappropriate for testing the differences between two percentages.
no support for the null hypothesis; the two percentages are different.
What is the degree of freedom associated with the above test of Chi-square analysis? Hint: degree of freedom = (m-1) * (n-1) = (2-1) * (2-1) = 1 where m is the number of rows and n is the number of columns. 9 6 199 198 none of the above.
none of the above.
Which of the following states that the difference between the population parameters between two groups is zero? null parameter null hypothesis alternative hypothesis null alternative hypothesis zero hypothesis
null hypothesis
3 types of descriptive data analysis
o Counting noses (frequency, percentage) o Central tendency (mean, mode, median) o Variation (range, standard deviation)
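The central tendency and variation measures listed above can be sketched in a few lines of Python. This is only an illustration using the standard `statistics` module; the data are the weekly yogurt purchase quantities from an earlier card in this set, and the variable names are made up for the example.

```python
import statistics

# Weekly yogurt purchase quantities (sample from an earlier card in this set).
purchases = [12, 0, 0, 1, 1, 1, 6, 10, 11]

# Central tendency
mean = statistics.mean(purchases)      # sum of values / number of values
median = statistics.median(purchases)  # middle value after sorting
mode = statistics.mode(purchases)      # most frequent value

# Variation
value_range = max(purchases) - min(purchases)  # maximum value - minimum value
std_dev = statistics.stdev(purchases)          # sample standard deviation (divides by n - 1)

print(mean, median, mode, value_range, std_dev)
```

For this sample the median and the mode are both 1, matching the answers on the earlier cards, while the mean (about 4.67) is pulled upward by the few heavy buyers.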
cross tabulation is the most common data analysis in market research because ...
o easy to conduct/understand o gain better insights concerning complex market phenomenon o very useful for marketing segmentation and targeting
The chi-square test is always _____-tailed in nature (no subscript in the threshold value)
one
The alternative hypothesis: the percentage of users who use the Internet for shopping is greater than .40, is a ________. correlation analysis two-tailed test test of difference one-tailed test none of the above is true
one-tailed test
When a researcher is determining if the difference between two segments' parameters are statistically significant, he or she is considering the two segments as two separate populations and the question is whether or not the two different populations' ________________. z scores are the same t scores are the same parameters are different associations are different summarization values are the same
parameters are different
For purposes of comparison, we can convert the frequency by dividing the frequency of each value by the total number of observations, which results in the: mean variable count percentage standard deviation
percentage
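As a rough sketch of that conversion, the snippet below counts each value and divides by the total number of observations. The function name `percentage_table` and the yes/no responses are hypothetical, chosen only for illustration.

```python
from collections import Counter

def percentage_table(values):
    """Return {value: percentage of observations} for a nominal/ordinal variable."""
    counts = Counter(values)   # frequency of each value
    total = len(values)        # total number of observations
    return {value: 100 * count / total for value, count in counts.items()}

# Hypothetical survey responses, not taken from the course material.
responses = ["yes", "no", "yes", "yes", "no"]
table = percentage_table(responses)
print(table)
```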
Goal of Predictive modeling
predict target values in other data where we have predictor values, but not target values
correlation coefficient (Pearson's R)
rxy the statistical measure of the correlation between two variables (X and Y)
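A minimal sketch of how rxy can be computed, assuming the usual definition rxy = Sxy / (Sx × Sy), that is, the sample covariance divided by the product of the two sample standard deviations. The function name and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: sample covariance divided by (Sx * Sy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance Sxy: carries the direction of the linear relationship.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Sample standard deviations Sx and Sy: always positive.
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical data: y rises (or falls) in a perfect straight line with x.
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # close to +1
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # close to -1
print(r_up, r_down)
```

Because the denominator is always positive, the sign of rxy comes entirely from the covariance, which is why rxy carries the direction and its size carries the strength.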
Statistics
sample values
2nd step of hypothesis testing
set the criterion for a decision (then collect data)
Cross tabulation works best for ...
small number of categories
If you have a question that has an interval or ratio data, which of the following should be used to report the variability? frequency distribution cumulative percentage distribution percentage distribution and range standard deviation and range accumulative percentage standard deviation
standard deviation and range
In the step-by-step approach to the presentation of hypothesis tests, the first step is: state the hypothesis by performing appropriate hypothesis test computations, state if the hypothesis is supported or not supported test the hypothesis only on variables that are metric if the hypothesis is not supported, compute confidence intervals to provide the client with the appropriate confidence intervals none of the above; there is no step-by-step approach to the presentation of the hypothesis test.
state the hypothesis
1st step of hypothesis testing
state the hypothesis (one null, one alternative)
Researchers keep in mind that the independence assumption stipulates that the independent variables must be ________ before running a multiple regression. statistically independent and uncorrelated with one another statistically dependent and uncorrelated with one another statistically independent and correlated with one another statistically dependent and correlated with one another
statistically independent and uncorrelated with one another
Low multiple R-squared and adjusted R -squared values signal the ________. regression plane contains multiple errors regression analysis should be rerun straight-line model does not apply well straight-line model applies well
straight-line model does not apply well
-1 Rxy shows
strong negative strength of correlation between X and Y
+1 Rxy shows
strong positive strength of correlation between X and Y
An analyst is interested in testing the hypothesis H0 : μ = 15,000 ; H1 : μ < 15,000. The data consist of 20 observations of an interval variable. The correct statistical procedure is the: z-test t-test. Chi-square test. analysis of variance. goodness-of-fit.
t-test
If Level of significance (α) is mentioned to be 5% and suppose n=20, the appropriate tcrit value to compare against is:
t1-.05(20-1) = t.95(19) [For One Tailed Test] t1-.05/2(20-1) = t1-.025(19) = t.975(19) [For Two Tailed Test]
one-tailed test (degrees of freedom)
tcrit value = t1-α (Degrees of Freedom)
two-tailed test (degrees of freedom)
tcrit value = t1-α/2 (Degrees of Freedom)
Nominal/Ordinal + Interval/Ratio
test of differences
Chi-square statistic measures ...
the "distance" between the sample and the null hypothesis
standard deviation
the average deviation from the sample mean
If rxy=0, X and Y have zero correlation:
the change in X is independent of the change in Y.
range
the difference between the max. value and the min. value (maximum value) - (minimum value)
A statistically significant test of population mean means: there is practical significance. the differences between the population mean and the hypothesized value would remain in a large number of trials if we repeated the survey over many times. the p values are very large. the z values are very small. the x values are average.
the differences between the population mean and the hypothesized value would remain in a large number of trials if we repeated the survey over many times.
Rxy (correlation coefficient) measures ...
the direction of the correlation between X and Y
monotonic relationship
the general direction of a relationship between two variables is known EX: - increasing relationship - decreasing relationship
p-value
the likelihood of obtaining the sample statistic if the null hypothesis is true; comparable to "the number of jury members who believe the defendant is innocent."
40% women video renters buy snacks; 65% male video renters buy snacks. A computed t statistic is 4.5. Given that the threshold t value is 1.96, this means: nothing; z does not determine anything the null hypothesis is not supported; there is a true difference between the two percentages the null hypothesis is supported; there is a true difference between the two percentages the alternative hypothesis is not supported; there is a true difference between the two percentages the null alternative is supported; there is a true difference between the two percentages
the null hypothesis is not supported; there is a true difference between the two percentages
frequency
the number of observations in each category of the variable ~ mainly used for nominal and ordinal data ~ preferred graphic illustration = bar graph
alpha (significance level)
the probability of making the wrong decision when the null hypothesis is true (α).
percentage
the proportion of observations in each category of the variable ~ mainly used for nominal and ordinal data ~ frequency / sample size ~ preferred graphic illustration = pie graph
statistical inference
the set of procedures in which sample statistics are used to estimate population parameters
The size of Rxy measures ...
the strength of the correlation between X and Y
The functions of data analysis "match up" with: the types of problems the types of research objectives the types of type I errors the types of type II errors the types of type III errors
the types of research objectives
The BEST way to handle missing values when analyzing the data is to: leave the response blank discard the record or questionnaire with missing values substitute the sample mean for the missing value both A and B are correct there is no single best way for handling missing values; rather, their treatment depends on the purpose of the study, the incidence of missing values, and the methods that will be used to analyze the data
there is no single best way for handling missing values; rather, their treatment depends on the purpose of the study, the incidence of missing values, and the methods that will be used to analyze the data
The logic of the chi-square test would argue that, for a significant relationship to exist: there should be large differences between the observed and expected frequency there should be few differences between the observed and expected frequency there should be no differences between the observed and expected frequency there should be negative differences between the observed and expected frequency there should be only one difference between the observed and expected frequency
there should be large differences between the observed and expected frequency
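The observed-versus-expected logic can be sketched as follows. This is an illustrative implementation, not the course's own procedure; the 2 x 2 table is hypothetical, and the expected frequency of each cell is assumed to be (row total × column total) / grand total.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency expected if the two variables were not associated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, degrees_of_freedom

# Hypothetical 2 x 2 cross tabulation (rows and columns are two nominal variables).
stat, df = chi_square([[30, 20], [20, 30]])
print(stat, df)
```

Here every expected frequency is 25, so the statistic is 4.0 with 1 degree of freedom. That exceeds 3.841, the standard chi-square table value for 1 degree of freedom at the 5% level, so the association would be judged significant. A table whose observed counts equal the expected counts gives a statistic of 0.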
What's the last step in hypothesis testing?
to compare the probability with the level of significance alpha (α)
objective of hypothesis testing
to infer population info from the sample. In this course, the population information of interest could be • the population mean of a variable (L11-2) • the association between two variables (L12-1, L12-2, L12-3) • The causal relationship between two variables (L13)
Nonmonotonic relationship
two variables are associated, but only in a very general sense. The presence (or absence) of one variable is associated with the presence (or absence) of another.
Hypothesis Test of Correlation
used to test whether there exists correlation between two variables in the population
When is a two-tailed test used in the hypothesis test of correlation?
used when we only want to know whether the correlation between two variables is different from zero or not. H0: Rxy = 0; H1: Rxy ≠ 0. We compare the absolute value of the t statistic with the threshold value t0.975(n-2), where n-2 is the degree of freedom and "n" is the sample size. ~ If |t| > t0.975(n-2) then we reject the null and accept the alternative. ~ If |t| < t0.975(n-2) then we fail to reject the null and reject the alternative.
When is a one-tailed test used in the hypothesis test of correlation?
used when we only want to know whether the correlation between two variables is greater/less than zero or not. H0: Rxy = 0; H1: Rxy > 0 or Rxy < 0. • When H1: Rxy > 0, we compare the t statistic with t0.95(n-2). ~ If t > t0.95(n-2) then we reject the null and accept the alternative. ~ If t < t0.95(n-2) then we fail to reject the null and reject the alternative. • When H1: Rxy < 0, the t statistic must be smaller than 0, and we compare the negative t statistic with t0.95(n-2). ~ If -t > t0.95(n-2) then we reject the null and accept the alternative. ~ If -t < t0.95(n-2) then we fail to reject the null and reject the alternative.
two-tailed test
used when we only want to know whether the population mean is unequal to a certain value (H0: μ = c; H1: μ ≠ c) EX: H0: Americans spend on average 5 hours on the Internet (μ = 5) H1: The amount of time spent on the Internet is unequal to 5 (μ ≠ 5) We compare the absolute value of the t statistic with t0.975(n-1) (t-critical value) as the threshold value, where n is the sample size. o If |t| > t0.975(n-1) then we reject the null and accept the alternative. o If |t| < t0.975(n-1) then we fail to reject the null and reject the alternative.
Test of Differences: two-tailed test
used when we only want to know whether two segments have different population means or not (H0: μ1 = μ2; H1: μ1 ≠ μ2) • We compare the absolute value of the t statistic with t0.975(n1+n2-2) as the threshold/critical value, where n1+n2-2 is the degree of freedom, n1 is the sample size of segment 1 and n2 is the sample size of segment 2. o If |t| > t0.975(n1+n2-2) then we reject the null and accept the alternative. o If |t| < t0.975(n1+n2-2) then we fail to reject the null and reject the alternative.
one-tailed test
used when we want to know if the population mean is either greater or smaller than a certain value (H0: μ = c; H1: μ > c) or (H0: μ = c; H1: μ < c) • When H1: μ > c, we compare the t statistic with t0.95(n-1) (t-critical value). o If t > t0.95(n-1) then we reject the null and accept the alternative. o If t < t0.95(n-1) then we fail to reject the null and reject the alternative. • When H1: μ < c, the t statistic must be smaller than 0, and we compare the negative t statistic with t0.95(n-1) (t-critical value). o If -t > t0.95(n-1) then we reject the null and accept the alternative. o If -t < t0.95(n-1) then we fail to reject the null and reject the alternative.
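The one-tailed and two-tailed decision rules for a test of population mean can be sketched in Python. This is a hedged illustration: it assumes the usual one-sample statistic t = (sample mean - c) / (s / sqrt(n)), the function names are hypothetical, and the critical value is expected to be looked up in a t table and passed in by the caller.

```python
import math
import statistics

def one_sample_t(sample, c):
    """t statistic for H0: population mean = c (assumed standard formula)."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - c) / standard_error

def decide(t, t_crit, tail):
    """Apply the decision rule; tail is 'two', 'greater' or 'less'."""
    if tail == "two":        # H1: mean != c, compare |t| with t0.975(n-1)
        reject = abs(t) > t_crit
    elif tail == "greater":  # H1: mean > c, compare t with t0.95(n-1)
        reject = t > t_crit
    elif tail == "less":     # H1: mean < c, compare -t with t0.95(n-1)
        reject = -t > t_crit
    else:
        raise ValueError("tail must be 'two', 'greater' or 'less'")
    return "reject the null" if reject else "fail to reject the null"

# Values quoted on other cards in this set.
toy_store = decide(1.7, 1.96, "greater")   # t = 1.7 against threshold 1.96
large_t = decide(4.21, 1.65, "greater")    # t = 4.21 against threshold 1.65

# Hypothetical sample of hours spent online, tested against c = 5.
t_hours = one_sample_t([6, 7, 8, 7, 7], 5)
print(toy_store, large_t, t_hours)
```

The first call reproduces the toy store card (fail to reject the null), and the second reproduces the card where t = 4.21 exceeds 1.65 (reject the null).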
Test of Differences: one-tailed test
used when we want to know if the population mean of one segment is greater/smaller than that of the other segment (H0: μ1 = μ2; H1: μ1 > μ2) or (H1: μ1 < μ2). • When H1: μ1 > μ2, we compare the t statistic with t0.95(n1+n2-2). o If t > t0.95(n1+n2-2) then we reject the null and accept the alternative. o If t < t0.95(n1+n2-2) then we fail to reject the null and reject the alternative. • When H1: μ1 < μ2, the t statistic must be smaller than 0, and we compare the negative t statistic with t0.95(n1+n2-2). o If -t > t0.95(n1+n2-2) then we reject the null and accept the alternative. o If -t < t0.95(n1+n2-2) then we fail to reject the null and reject the alternative.
Data ambiguity
vague responses to open-ended questions
What can be inferred from test of population mean
we infer where the population mean is and answer questions about what value the population mean is likely to be given the sample data
test statistic for hypothesis test of correlation
t = Rxy × sqrt(n-2) / sqrt(1 - Rxy^2), where Rxy is the correlation coefficient in the sample; "n" is the sample size
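A small sketch of this test statistic, assuming the standard form t = Rxy × sqrt(n-2) / sqrt(1 - Rxy^2) with n-2 degrees of freedom. The function name and the example values (Rxy = 0.6, n = 18) are hypothetical.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: no correlation in the population (n - 2 degrees of freedom)."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("the statistic is undefined when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample: correlation of 0.6 observed across 18 respondents.
t_stat = correlation_t(0.6, 18)
print(t_stat)  # roughly 3.0, to be compared with the t threshold for 16 degrees of freedom
```

The same Rxy gives a larger t statistic as n grows, so a modest correlation can still be significant in a large sample.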
chi-square needs to be used when ...
you want to draw inference from the sample to the population
Three Association Analyses
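~ Nominal/Ordinal + Nominal/Ordinal: cross tabulation with chi-square test ~ Nominal/Ordinal + Interval/Ratio: test of differences ~ Interval/Ratio + Interval/Ratio: correlation analysis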
level of significance (alpha)
~ refers to a criterion of judgment upon which a decision (of "rejecting the null and accepting the alternative" or "failing to reject the null and rejecting the alternative") is made regarding the value stated in a null hypothesis in Step 1 • α is usually set at 5% ~ comparable to "the defendant is convicted if all jurors believe that the defendant is guilty, after being presented with evidence."
standard error
~ the measure of variability in the sampling distribution ~ while standard deviation describes the variability of individual data points around the mean, SE measures the variability of a sample statistic (such as the sample mean) across repeated samples
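To make the distinction concrete, the sketch below reuses the clothing manufacturer card from earlier in this set (360 of 400 women favorably impressed, hypothesized value 0.88). It assumes the standard error of a proportion is the standard deviation sqrt(p × (1 - p)) divided by sqrt(n); the function name is hypothetical.

```python
import math

def proportion_standard_error(p, n):
    """Standard error of a sample proportion: standard deviation / sqrt(n)."""
    standard_deviation = math.sqrt(p * (1 - p))  # spread of individual responses
    return standard_deviation / math.sqrt(n)     # spread of the sample proportion

p_hat = 360 / 400                            # sample proportion = 0.9
se = proportion_standard_error(p_hat, 400)   # about 0.3 / 20 = 0.015
t_stat = (p_hat - 0.88) / se                 # about 1.33, as on the earlier card
print(se, t_stat)
```

The standard deviation (0.3) describes how much individual answers vary, while the much smaller standard error (0.015) describes how much the sample proportion itself would vary from one sample of 400 to the next.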
(Predictive Model) predictive value of dependent variable is
Ŷ = α + b1X1 + b2X2 + ... + bkXk, where α, b1, ..., bk are the estimated coefficients and X1, X2, ..., Xk are the values of the predictor variables for the observation being predicted.
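As an illustration of how the fitted equation is used for prediction, the sketch below plugs predictor values into Ŷ = α + b1X1 + ... + bkXk. The function name, coefficients, and predictor values are all hypothetical, and the coefficients are assumed to have been estimated already.

```python
def predict(intercept, coefficients, predictors):
    """Predicted value: intercept plus the sum of coefficient * predictor value."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))

# Hypothetical fitted model with two predictors: intercept 2.0, b1 = 0.5, b2 = 1.5.
y_hat = predict(2.0, [0.5, 1.5], [4, 2])
print(y_hat)  # 2.0 + 0.5 * 4 + 1.5 * 2
```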
Linear Regression Assumptions
• The choice of predictors and their form is correct (linearity) • The observations are independent of one another • The variability of the outcome variable (given a set of predictors) is the same regardless of the values of the predictors (homoskedasticity) • The noise ε follows a normal distribution