Marketing Research - Exam #3

degrees of freedom for chi-square test

(R − 1) × (C − 1), i.e., (# rows − 1) × (# columns − 1)

Rxy (correlation coefficient) is bounded between ___ and ___

-1 and 1

data matrix

- The most prevalent visualized form of a data set
- Each row represents a record and each column represents a variable

For a variable coded "0" or "1," there are 102 "0s" and 101 "1s." What is the value of the measure of the central tendency for this type of data? 101.5 102 0 (mode) 1 None of the above

0 (mode)

A large clothing manufacturer plans to introduce a new line of sports clothes for women if preliminary market research shows that more than 88% of the population is favorably impressed by the new line. Four hundred women were surveyed; 360 of the women were favorably impressed. The research manager wants to test the hypothesis at the 0.05 significance level. What is the approximate standard deviation of the sample? Hint: When the probability is p, the standard deviation is given by sqrt(p*(1-p)), where sqrt represents square root. 0.5 0.3 0.8 0.9 More information is needed.

0.3

In a data set, there is a variable that measures each respondent's purchase quantity of yogurt per week. The values of purchase quantity are "12, 0, 0, 1, 1, 1, 6, 10, 11." What is the median? 4.66 1 42 6.421 None of the above

1

In a data set, there is a variable that measures each respondent's purchase quantity of yogurt per week. The values of purchase quantity are "12, 0, 0, 1, 1, 1, 6, 10, 11." What is the mode? 4.66 1 42 6.421 None of the above

1
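The two yogurt questions above can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names are illustrative and not part of the course material.

```python
import statistics

# Weekly yogurt purchase quantities quoted in the two cards above.
purchases = [12, 0, 0, 1, 1, 1, 6, 10, 11]

mean = statistics.mean(purchases)      # total of 42 spread over 9 respondents
median = statistics.median(purchases)  # middle value of the sorted list
mode = statistics.mode(purchases)      # most frequent value

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```

The distractor 42 is the sum of the values and 4.66 is the (truncated) mean, which is why neither is the median or the mode.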

A large clothing manufacturer plans to introduce a new line of sports clothes for women if preliminary market research shows that more than 88% of the population is favorably impressed by the new line. Four hundred women were surveyed; 360 of the women were favorably impressed. The research manager wants to test the hypothesis at the 0.05 significance level. The computed t-statistic for testing the hypothesis is 1.33 5 4.5 0.5 not shown above.

1.33
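A minimal sketch of the arithmetic behind the 0.3 and 1.33 answers for the clothing manufacturer question. It assumes, following the hint in the question, that the standard deviation is computed from the sample proportion (not from the hypothesized 0.88) and that the standard error is that standard deviation divided by the square root of n. Variable names are illustrative.

```python
import math

n = 400          # women surveyed
favorable = 360  # women favorably impressed
pi_0 = 0.88      # proportion stated in the null hypothesis

p_hat = favorable / n                # sample proportion
sd = math.sqrt(p_hat * (1 - p_hat))  # standard deviation per the hint: sqrt(p*(1-p))
se = sd / math.sqrt(n)               # assumed standard error of the sample proportion
t_stat = (p_hat - pi_0) / se         # test statistic

print(round(sd, 2), round(t_stat, 2))
```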

Given the above table, what percentage of households have a VCR and have a family size less than 4? 10% 38% 34% 24% 28%

10%

The FactFinder Research firm conducted a survey for a national food manufacturer, and one of the issues addressed by the research was to determine how many pounds of fish were consumed per capita annually. In the survey they found one person who consumed only one pound of fish per year while 10 people reported 200 pounds per year. The range was: 200 1 to 2,000 201 199 190

199

What is the value of the test statistic that would be used in the comparison of the two means? What is the degree of freedom? Options presented as (t-statistics, Degrees of Freedom) 2; 14 1; 16 0.5; 14 4; 16 none of the above.

2; 14

Given the above table, in absolute numbers how many millions of households own a VCR? 10 100 48 34 52

34

Given the above table, what percentage of households have a family size less than 4? 34% 66% 48% 52% 100%

48%

What is the value of the test statistic (based on the answer to the previous question) useful for determining whether there exists any relationship between VCR ownership and family size? 136.2 7.12 0.973 422.1

7.12

To interpret the coefficients of other continuous variables similarly: if p-value ___ .05, then there is no effect

> (greater than)

A researcher had calculated the sample chi-square statistic to be equal to χ2 = 7.71. For a significance level of 0.05, the critical value of the chi-square statistic is 7.78. The appropriate conclusion is that: A - the null hypothesis should not be rejected. B - the null hypothesis should be rejected. C - the null hypothesis should not be accepted. D - A and C are correct. E - A and B are correct.

A and C are correct.

nominal

A group/category (e.g., gender: male vs. female)

The principle that guides the framing of a null hypothesis is: One either rejects or accepts the null hypothesis on the basis of the evidence at hand. A null hypothesis may be rejected but can never be accepted. One must always reject the null hypothesis unless the evidence is convincingly (determined by the level of significance) to the contrary. One must always accept the null hypothesis unless the evidence is convincingly (determined by the level of significance) to the contrary. There is no guiding scientific principle and the analyst should frame them as he or she sees fit.

A null hypothesis may be rejected but can never be accepted.

A researcher is interested in comparing the usage of bank debit cards by consumers in rural (r) and urban (u) areas. Specifically, she wants to know if consumers in rural areas use bank debit cards less than consumers in urban areas. Each year for the past five years, she has surveyed 16 individuals (one-half urban, one-half rural) randomly selected from across the United States. The results of the current study indicate that ~ people in urban areas use bank debit cards 12 times per month on average (x̅u =12) ~ people in rural areas use bank debit cards 10 times per month on average (x̅r =10) ~ the standard deviation in means of bank debit card usage in both rural and urban areas is 2 ( su = sr = 2) What test is appropriate for answering the above research question? A one-tailed z test of population mean A one-tailed t test of difference A two-tailed t test of difference A one-tailed z test of difference A two-tailed z test of difference.

A one-tailed t test of difference

Mike Shula is a head football coach. His athletic department spends $30,000 a season on Lizard-Aide, a flavored drink that supposedly contributes to the performance of his players. This year, an independent sports testing association has decided to test the merits of Lizard-Aide and Shula's university has been selected as a member of the national sample. The study is an experiment in which the players, unknown to them, are divided into two segments. Segment 1 receives the real Lizard-Aide prior to and during the games. Segment 2 receives a placebo, which is nothing more than sugar-flavored colored water in containers made to make the sugar-flavored water appear to be Lizard-Aide. It is common practice that, following each game, the coaches evaluate films and give each player a grade ranging from 0 to 100. After the season the sports testing association collects the data. They now have a mean score of performance for each of the two segments for all of the athletic departments participating in the study. If you are the researcher, what statistical test would you conduct? A one-tailed z test of population mean. A one-tailed t test of difference. A two-tailed t test of difference. A one-tailed z test of difference. A two-tailed z test of difference.

A one-tailed t test of difference.

confidence interval

A range that would be expected to contain the population parameter of interest.

Ordinal

A ranking (e.g., Usage type: heavy/light user)

Interval

A rating with equal distance (e.g., IQ, product rating)

Ratio

A real zero (e.g., age, height, weight, income, speed)

Chi-Square test

A statistical analysis for identifying whether an association exists between two nominal/ordinal variables in the population

Identify in which of the following it would be useful for a marketing manager to test for differences between segments: A New Zealand winery wants to investigate differences between light, medium, and heavy wine drinkers. A retailer wishes to know if customer satisfaction is different between in-store versus online shoppers. A beverage company wants to know if a new beverage concept differs between users versus nonusers of the current brand. A department store wishes to know the differences between online catalog versus mail order catalog shoppers. All of the above situations would benefit from tests for differences between segments.

All of the above situations would benefit from tests for differences between segments.

hierarchical

As we go down the hierarchy, the information obtained becomes more refined and more useful for marketing decision making.

Let µu and µr be the population mean of usage rates for people in the urban and rural areas, respectively. Which of the following is the null hypothesis that the researcher should use in comparing the usage rates. A) H0 : µu= µr ; H1 : µu ≠ µr B) H0 : x̅u= x̅r ; H1 : x̅u ≠ x̅r C) H0 : µu= µr ; H1 : µu > µr D) H0 : µu= µr ; H1 : µu < µr E) H0 : x̅u= x̅r ; H1 : x̅u > x̅r

C) H0 : µu= µr ; H1 : µu > µr

Pontiac wants to know what types of persons respond favorably to proposed style changes in the Firebird. Frito-Lay wants to know what kinds of people buy from the Frito-Lay line. These are questions that may be answered through: Chi-square analysis correlation analysis analysis of variance regression analysis none of the above

Chi-square analysis

A researcher is interested in analyzing two nominal variables to determine if the observed pattern of frequency corresponds to the expected pattern. The appropriate statistical technique is: Chi-square test t-test for one mean t-test for two means z-test None of the above

Chi-square test

Nominal/Ordinal + Nominal/Ordinal

Chi-square test

Estimates of Betas

Constant term; coefficient term

Interval/Ratio + Interval/Ratio

Correlation Analysis

Which of the following statements about cross tabulation is FALSE? The cross tabulation provides information on the joint occurrence of two variables. Cross tabulation is very useful for studying associations between categorical variables. Cross tabulation is a necessary step for test of differences. All of the above are true.

Cross tabulation is a necessary step for test of differences

Which of the following is predicted in the regression formula? Null variable Independent variable Controlled variable Dependent variable

Dependent variable

Descriptive Data Analysis

Entails rearranging, ordering, and manipulating the data to generate descriptive information that is easy to understand and interpret

Goal of Explanatory Modeling

Explain the relationship between predictors (explanatory variables) and the target; this is the familiar use of regression in data analysis

Chi-square test finding/example

Finding - Whether there exists a significant association between two nominal/ordinal variables in the population
Example - (Gender vs. coupon use) Are females more likely than males to be heavy or light users of coupons?

Test of Differences finding/example

Finding - Whether there exists a significant difference in the interval/ratio variable between two consumer segments in the population (classified by the nominal/ordinal variable)
Example - (Gender vs. Internet usage) Are males more likely to use the Internet than females?

Correlation Analysis finding/example

Finding - Whether there exists significant positive/negative linear association between two interval/ratio variables in the population Example - (Height vs. weight) Are height and weight correlated with each other?

Goal of Explanatory Modeling Model

Fit the data well and understand the contribution of explanatory variables to the model: "goodness-of-fit," R^2, residual analysis, p-values

Suppose we wanted to test the hypothesis that the mean familiarity rating exceeds 4.0, the neutral value on a seven-point scale. The hypotheses may be formulated as ________. H0 : μ1 = μ2 ; H1 : μ1 ≠ μ2 H0 : (σ1)^2 = (σ2)^2 ; H1 : (σ1)^2 ≠ (σ2)^2 H0 : μ = 4.0 ; H1 : μ > 4.0 H0 : π1 = π2 ; H1 : π1 ≠ π2 H0 : π1 ≠ π2 ; H1 : π1 = π2

H0 : μ = 4.0 ; H1 : μ > 4.0

null hypothesis

H0: a statement about the population parameter that always comes with an "=" sign, which is presumed to be true until presented with enough evidence to reject it.

What is the correct expression of the null hypothesis for the alternative hypothesis: the percentage of users who use the Internet for shopping is greater than .40? H0: π < .40 H1: π > .40 H0: π = .40 H1: π ≠ .40 H0: π ≥ .40

H0: π = .40

A large clothing manufacturer plans to introduce a new line of sports clothes for women if preliminary market research shows that more than 88% of the population is favorably impressed by the new line. Four hundred women were surveyed; 360 of the women were favorably impressed. The research manager wants to test the hypothesis at the 0.05 significance level. The correct hypotheses for this situation are: H0: π = 0.88; H1: π ≠ 0.88 H0: π ≤ 0.88; H1: π > 0.88 H0: π = 0.88; H1: π < 0.88 H0: π ≠ 0.88; H1: π = 0.88 H0: π = 0.88; H1: π > 0.88

H0: π = 0.88; H1: π > 0.88

alternative hypothesis

H1: contradicts the null hypothesis about a population parameter
• Here, "contradict" means "totally opposite, ≠" (two-tailed test) or "one-way opposite, > or <" (one-tailed test)
• We are more interested in the alternative hypothesis because this statement is helpful in answering research questions and devising marketing strategies
• The alternative hypothesis can be either rejected or accepted.

A ________ is a statement about the value of a population parameter based on prior knowledge, assumptions, or intuition. theory test specification marketing guess hypothesis

Hypothesis

Type II Error (False Negative)

The null hypothesis is false but you fail to reject it

Type I Error (False Positive)

The null hypothesis is true but you reject it

Causal Relationship

Meaning - identifying the causal relationship between 2 variables using a hypothesis test of experimental data
Analysis - experiment
EX - How much additional sales can result from a 5% price reduction and a 10% increase in advertising?

Inference

Meaning - inferring the value of the population mean of a variable using a hypothesis test
Analysis - test of population mean
EX - Is customer satisfaction of our store this year greater than 4 on a 5-point scale?

Association

Meaning - investigating the association between 2 variables in the sample using a hypothesis test
Analysis - test of differences, Chi-square test, correlation analysis
EX - Is customer satisfaction of our store this year greater than 4 on a 5-point scale?

description

Meaning - summarizing the sample data
Analysis - descriptive statistical analysis
EX - The average customer satisfaction of DELL in the sample

Which of the following is similar to a lead indicator of the multiple regression analysis findings since it is one of the first pieces of information provided in a multiple regression output? Multiple C Multiple L Multiple R Multiple T

Multiple R

Test of Differences: nominal/ordinal variable + interval/ratio variable possibilities

Nominal/Ordinal variables (demographics, geographic, behavior):
o Gender (male vs. female)
o Income (high income vs. low income)
o Usage rate (heavy user vs. light user)
o Usage status (loyal customers vs. switchers)
o Region (east vs. west vs. central)
o Developed environments (urban & suburban vs. town & rural)
Interval/Ratio variables (what we're interested in measuring):
o Willingness to pay (e.g., how much to pay for a new pair of athletic shoes)
o Usage rate (e.g., the number of hours spent on the Internet)
o Purchase intention
o Shopping basket size
o Propensity to click a banner ad
o Intention to download and use a banking app

________ are the totals observed by counting the number of respondents who are in each cross-tabulation cell. Observed frequency Expected frequency Cell column totals Cell row totals Grand totals

Observed frequency

Goal of Predictive Modeling Model

Optimize predictive accuracy; explaining the role of predictors is not the primary purpose

What sample relationship can be inferred from the above table between family size and owning a VCR? Nothing can be inferred. The smaller the family the more likely they are to own a VCR. A lower proportion of large families (4 or more) own VCRs than small families (less than 4). Owning a VCR causes the family size to increase. Ownership of a VCR tends to increase as family size increases.

Ownership of a VCR tends to increase as family size increases.

Which of the following represents a very powerful tool because it tells us what factors are related to the dependent variable, how each factor influences the dependent variable (the sign), and how much each factor influences it? Chi-square analysis Analysis of variables Correlation analysis Regression Analysis

Regression Analysis

________ is the predictive analysis technique in which one or more variables are used to predict the level of another by use of the straight-line formula. Regression analysis Correlation Analysis of variables Predictive analytics

Regression analysis

Hypothesis Test of Correlation Steps

Step 0 - Establish the marketing research question
Step 1 - State the hypotheses
Step 2 - Establish the criterion
Step 3 - Compute the test statistic
Step 4 - Make a decision

Which of the following forms the basis of regression analysis? Dual-line equation Straight-line equation Variable equation None of the above

Straight-line equation
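To make the straight-line idea concrete, here is a rough sketch of an ordinary least-squares fit of y = intercept + slope * x. The function name `fit_line` and the advertising/sales numbers are hypothetical, chosen so the points fall exactly on a line; they do not come from the course material.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares straight line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of x around their means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising spend (independent variable x) and sales (dependent variable y).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]  # constructed to lie exactly on y = 1 + 2x

intercept, slope = fit_line(ad_spend, sales)
print(intercept, slope)
```

The sign of the slope shows the direction of the influence of x on the dependent variable, and its size shows how much y changes per unit of x.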

Test Statistic for Population Mean

T-Test

A large clothing manufacturer plans to introduce a new line of sports clothes for women if preliminary market research shows that more than 88% of the population is favorably impressed by the new line. Four hundred women were surveyed; 360 of the women were favorably impressed. The research manager wants to test the hypothesis at the 0.05 significance level. Based on the t-statistic, what finding do we have? Hint: The threshold value associated with 0.05 significance level is t0.975(399) =1.96. The clothing manufacturer should NOT introduce a new line of sports clothes because there is NO statistical evidence that more than 88% of the population is favorably impressed by the new line. The clothing manufacturer should introduce a new line of sports clothes because there is statistical evidence that more than 88% of the population is favorably impressed by the new line. The clothing manufacturer should NOT introduce a new line of sports clothes because there is statistical evidence that more than 88% of the population is favorably impressed by the new line. The clothing manufacturer should introduce a new line of sports clothes because there is NO statistical evidence that more than 88% of the population is favorably impressed by the new line. None of the above.

The clothing manufacturer should NOT introduce a new line of sports clothes because there is NO statistical evidence that more than 88% of the population is favorably impressed by the new line.

confidence level

The probability (1 − α) that, if a poll/test/survey were repeated over and over again, the results obtained would be the same; α is usually set at 5%

If the threshold t value is 1.746, which of the following statements about the research findings is true? The researcher CANNOT reject the null hypothesis and finds statistical evidence that people in urban areas use bank debit cards more than people in rural areas. The researcher can reject the null hypothesis but does NOT find statistical evidence that people in urban areas use bank debit cards more than people in rural areas. The researcher CANNOT reject the null hypothesis and does NOT find statistical evidence that people in urban areas use bank debit cards more than people in rural areas. The researcher can reject the null hypothesis and finds statistical evidence that people in urban areas use bank debit cards more than people in rural areas. More information is needed before a decision about the null hypothesis can be made.

The researcher can reject the null hypothesis and finds statistical evidence that people in urban areas use bank debit cards more than people in rural areas.

Tommy Prothro, a marketing manager for Golden Snack Bars, has commissioned marketing research to determine if one recipe of snack bar is superior to another recipe. More than 400 persons who were "snack bar eaters" were involved in taste tests and, after tasting both recipes, they were asked which recipe they would purchase the next time they purchased snack bars. Tommy is now looking at the data and he sees that recipe A had 53 percent stating a preference whereas recipe B had 47 percent. Tommy's brand manager felt this was "significant" evidence that the firm should produce recipe A. But Tommy wanted more evidence so he asked the research firm to run a test to determine if there was a significant difference between the two recipes. When the firm ran the test, they reported a t value of 5.64. Furthermore, the threshold t value for one-tailed test is 1.650 and the threshold t value for two-tailed test is 1.960. This means: Tommy should follow the advice of his brand manager. There is statistical significance and one recipe of snack bar is not superior to another recipe. There is no statistically significant difference and one recipe of snack bar is not superior to another recipe. There is no statistically significant difference and one recipe of snack bar is superior to another recipe. There are statistically significant differences between the two recipe preferences in the population and one recipe of snack bar is superior to another recipe.

There are statistically significant differences between the two recipe preferences in the population and one recipe of snack bar is superior to another recipe.

***VERY IMPORTANT*** T/F if the test statistic is GREATER than the threshold value (calculated using α =0.05) >> if the p-value is SMALLER than α =0.05 >> we REJECT the NULL hypothesis and ACCEPT the ALTERNATIVE hypothesis

True

***VERY IMPORTANT*** T/F the test statistic is LESS than the threshold value (calculated using α =0.05) >> the p-value is GREATER than α =0.05 >> then we FAIL to REJECT the NULL hypothesis and REJECT the alternative hypothesis

True

T/F If we decrease the value of alpha from 5% to 1%, then the probability of a type II error will nearly always increase.

True
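The trade-off between alpha and the Type II error can be illustrated numerically. The sketch below simplifies to a one-tailed z test (the cards use t tests) and assumes a hypothetical true effect that sits 2 standard errors above the null value; `type_ii_error` and `shift` are illustrative names.

```python
from statistics import NormalDist

def type_ii_error(alpha, shift):
    """Probability of failing to reject H0 in a one-tailed z test
    when the true mean lies `shift` standard errors above the null value."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # rejection threshold for this alpha
    return NormalDist().cdf(z_crit - shift)   # chance the statistic stays below the threshold

shift = 2.0  # hypothetical true effect, measured in standard errors
beta_05 = type_ii_error(0.05, shift)
beta_01 = type_ii_error(0.01, shift)

print(f"alpha=0.05 -> beta={beta_05:.3f}")
print(f"alpha=0.01 -> beta={beta_01:.3f}")
```

Lowering alpha raises the rejection threshold, so a real effect is missed more often and the Type II error probability goes up.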

T/F The greater the explanatory power of the multiple regression finding, the better and more useful it is for the researcher.

True

T/F The larger the test statistic, the smaller the p-value, the stronger the evidence to reject the null hypothesis

True

T/F only the alternative can be accepted for marketing decision making

True

T/F when we try to decrease the probability of one type of error, the probability for the other type increases

True

Types of Errors

Type I and Type II

How to calculate the T Statistic in test of Differences

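We calculate the t-statistic given by $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$, where
• There are two segments indexed by j = 1, 2.
• $\bar{x}_j$ is the sample mean of segment j;
• $\mu_j$ is the population mean of segment j if the null hypothesis is true;
• $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_1^2/n_1 + s_2^2/n_2}$ is the sample standard deviation of $\bar{x}_1 - \bar{x}_2$; $s_1$ and $s_2$ are the sample standard deviations of the respective segments; and
• $n_j$ is the sample size of segment j.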
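As a worked example, the difference-of-means t statistic can be applied to the bank debit card numbers from the earlier cards (urban mean 12, rural mean 10, both standard deviations 2, 8 respondents per segment). This sketch assumes the standard error is sqrt(s1^2/n1 + s2^2/n2) and the degrees of freedom are n1 + n2 − 2; both are assumptions, chosen because they reproduce the "2; 14" answer given in the cards. The function name is illustrative.

```python
import math

def t_difference(mean_1, mean_2, sd_1, sd_2, n_1, n_2, null_diff=0.0):
    """t statistic and degrees of freedom for a test of differences between two segments."""
    # Assumed standard error of the difference between the two sample means.
    se = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    t = ((mean_1 - mean_2) - null_diff) / se
    df = n_1 + n_2 - 2  # assumed degrees of freedom
    return t, df

# Urban vs. rural debit card usage: 16 respondents, one-half urban and one-half rural.
t_stat, df = t_difference(mean_1=12, mean_2=10, sd_1=2, sd_2=2, n_1=8, n_2=8)
print(t_stat, df)
```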

central tendency

a central or typical value for a probability distribution. It tells where the bulk of the data is ~ mean, median, mode

data set

a collection of data used in the analysis

variable

a construct that represents a quantity of the sample

The intersection of a row and column in a cross-tabulation table is called: a cross-tabulation cell a dangerous intersection a chi-square a cross-cell interaction a row box

a cross-tabulation cell

test statistic

a mathematical formula derived from the sample data and the null hypothesis; used to determine the p-value. Ex: t statistic, Chi-square statistic

scatter plot

a plot on which data are displayed as a collection of points, each having the values of two interval/ratio variables determining its position on the x-axis and y-axis; helps to visualize the correlation between two variables

Correlation Analysis

a statistical analysis for identifying the direction and the strength of the linear association between two interval/ratio variables
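A minimal sketch of how the correlation coefficient Rxy could be computed by hand, using the height vs. weight example mentioned in these cards. The numbers are hypothetical and `pearson_r` is an illustrative helper, not a course-provided function.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two interval/ratio variables."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sample: heights in cm and weights in kg.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 88]

r = pearson_r(heights, weights)
print(round(r, 3))
```

A value close to +1 indicates a strong positive linear association, a value close to -1 a strong negative one, and the result always stays between -1 and 1.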

Test of Differences

a statistical method for comparing the population mean (of an interval/ratio variable) between two segments (segmented based on a nominal/ordinal variable).
• The underlying assumption is that consumers are heterogeneous and can be segmented based on some characteristics.
• This analysis is popular in marketing because we are interested in how different consumer segments behave differently or respond differently to the marketing mix, which has important implications for designing a marketing mix to target some consumer segments.
• There could be more than two segments, but in this course we restrict our attention to the two-segment scenario.

Test of population mean

a statistical method for testing the value of the population mean based on the sample ~ conveys information about the central tendency of the population (average Jane/Joe) and is useful for devising marketing strategies

Cross tabulation

a statistical process of organizing 2 nominal/ordinal variables by groups or categories to determine their association in the sample

In the formula for calculating the standard deviation, the differences between each observation and the mean is squared. If we did not square these differences, the standard deviation would: be too small to be of any usefulness always be near zero not be normally distributed not be interpreted by z scores none of the above; the formula does not require that the differences be squared

always be near zero

2 minor issues researchers should pay attention to in the data cleaning stage

ambiguities and inconsistencies

If Rxy<0, X and Y have negative correlation:

an increase in X is associated with a decrease in Y, and a decrease in X is associated with an increase in Y.

If Rxy>0, X and Y have positive correlation:

an increase in X is associated with an increase in Y, and a decrease in X is associated with a decrease in Y

The ________ is used to test the statistical significance of the observed association in cross-tabulation. contingency coefficient Cramer's V phi coefficient chi-square statistic t statistic

chi-square statistic

The chi-square test is performed by: comparing a metric variable with a categorical variable comparing one frequency table with one cross-tabulation table comparing the difference among categories of more than three variables comparing observed frequency with expected frequency comparing the pie chart with the stacked bar chart

comparing observed frequency with expected frequency
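The comparison of observed with expected frequencies can be sketched as follows. The 2 × 2 table (gender by heavy/light coupon use) is hypothetical, and the sketch assumes the usual expected frequency of row total × column total / grand total; `chi_square` is an illustrative name.

```python
def chi_square(observed):
    """Chi-square statistic and degrees of freedom for a cross-tabulation table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected frequency of the cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (R - 1) x (C - 1)
    return stat, df

# Hypothetical counts: rows = female, male; columns = heavy, light coupon users.
observed = [[30, 20],
            [20, 30]]

stat, df = chi_square(observed)
print(stat, df)
```

The statistic is then compared with the threshold value for the chosen significance level and degrees of freedom.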

3rd step of hypothesis testing

compute the test statistic

corresponding joint frequency table =

contingency table

parameters

corresponding population values

Use the following table for the next 5 questions. Family Size and Ownership of a VCR by Household (Figures in millions of households) The preceding table is an example of one-way classification. cross tabulation. one-way tabulation. cross classification. none of the above.

cross tabulation

Which of the following emphasizes the division of the sample into subgroups so as to learn the association between two nominal or ordinal variables? longitudinal analysis coding cross-sectional analysis cross tabulation one-way tabulation

cross tabulation

Prior to analysis, the data from a survey is arranged into a(n) ________. information graph data graph information matrix data matrix data table

data matrix

The ________ is the variable customarily termed "Y" in the regression straight-line equation. null variable dependent variable controlled variable independent variable

dependent variable

Measures of variability are concerned with: central tendency depicting "typical" difference between the values in a set of values depicting the similarities between one data matrix and another all of the above none of the above

depicting "typical" difference between the values in a set of values

The KEY of Cross Tabulation is to ...

derive the joint frequency distribution of 2 variables
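Deriving a joint frequency distribution amounts to counting how often each pair of categories occurs together. A small sketch with hypothetical survey records (gender, coupon usage):

```python
from collections import Counter

# Hypothetical records: one (gender, coupon usage) pair per respondent.
records = [
    ("female", "heavy"), ("female", "light"), ("male", "light"),
    ("female", "heavy"), ("male", "heavy"), ("male", "light"),
]

joint = Counter(records)                   # observed frequency of each cross-tabulation cell
rows = sorted({g for g, _ in records})     # row categories: female, male
cols = sorted({u for _, u in records})     # column categories: heavy, light
table = [[joint[(g, u)] for u in cols] for g in rows]

for g, row in zip(rows, table):
    print(g, dict(zip(cols, row)))
```

Each entry of `table` is the count for one cross-tabulation cell, and the counts sum to the number of respondents.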

Computing the average number of dollars college students have on their credit card balances exemplifies: description inference comparing differences finding relationships type III error

description

gives us sample information, but not population information

descriptive data analysis

65% of males bought a snack when they rented a DVD and 40% of females bought a snack when they rented a DVD. This is an example where a researcher would: determine if there is a difference between the average of two samples determine if there is a difference between the average of two populations determine if there is a significant association between the two populations determine if there is a difference between males and females in terms of the types of snacks purchased determine if there is a difference between DVD renters and non-renters in terms of snacks

determine if there is a difference between the average of two populations

When making a comparison between two segments of respondents to determine whether or not they are statistically different, in concept, the researcher is considering the two segments as two: different populations different answers different tests common segments related segments

different populations

The last step involved in hypothesis testing is ________. reject or do not reject the null hypothesis draw a marketing research conclusion compare the probability with level of significance alpha (α) determine the probability associated with the test statistic under the null hypothesis none of the above is true

draw a marketing research conclusion

A toy store owner is interested in knowing, at the 5% level of significance, whether parents spend greater than $100 on toys per visit to her store or not. In this case, the owner has a null hypothesis that "parents spend exactly $100 on toys per visit to her store" and an alternative hypothesis that "parents spend greater than $100 on toys per visit to her store." A sample is taken and the t statistic is calculated as 1.7, which is smaller than the threshold value 1.96. Based on the statistical result, we should: fail to reject the null hypothesis. do not support the hypothesis. accept the alternative hypothesis. immediately take another sample. try to take another sample in another toy store.

fail to reject the null hypothesis.

American Express executives wish to know if there is an association between credit card balance carried and the number of credit cards owned. This is an example of: description inference finding associations finding causal relationships type II error

finding associations

Assume a college professor wanted to know if the number of hours studied by her students was related to students' test scores. She would use: description inference comparing differences finding associations type III error

finding associations

American Express executives wish to know if there is a difference between the average dollar balance carried on credit cards between males and females. This is an example of: description inference finding associations finding causal relationships transforming

finding associations

Thinking of a standard deviation and the shape of the distribution, the distribution is "stretched out at both ends" (dispersed) when the standard deviation is: high moderate low very high none of the above; the two concepts are not related

high

Counting noses

how many? ~ frequency and percentage

A contingency table is useful for ...

identifying associations in the sample

The chi-square test is useful for determining: if an association exists between three nominal or ordinal variables if an association exists between two interval or ratio variables if an association exists between two nominal or ordinal variables if a non-linear relationship exists between two variables if there is a linear relationship between two nominal or ordinal variables.

if an association exists between two nominal or ordinal variables

Tommy Prothro, a marketing manager for Golden Snack Bars, has commissioned marketing research to determine if one recipe of snack bar is superior to another recipe. More than 400 persons who were "snack bar eaters" were involved in taste tests and, after tasting both recipes, they were asked which recipe they would purchase the next time they purchased snack bars. Tommy is now looking at the data and he sees that recipe A had 53 percent stating a preference, whereas recipe B had 47 percent. Tommy's brand manager felt this was "significant" evidence that the firm should produce recipe A. But Tommy wanted more evidence so he asked the research firm to run a test to determine if there was a significant difference between the two recipes. By doing this, Tommy would get information that would allow him to determine: if there are real differences between the two recipe preferences in the population. if the differences between the recipes are really 6 percent or more. the number of consumers in each target market preferring recipe A versus B. whether or not the statisticians in the research firm agree with his brand manager. None of the above; there is no statistical test to determine significant differences between two percentages.

if there are real differences between the two recipe preferences in the population.

data inconsistency

incompatible answers to different questions

The ________ variable is customarily termed x in the regression formula. null dependent independent controlled

independent

Suppose that you are interested in knowing whether the customer satisfaction of a restaurant is greater than 4 on a 5-pt scale. You should use: description inference finding causal relationship finding associations type III error

inference

to perform correlation analysis, both variables must be ___/___ data

interval/ratio data EX: o Height and weight. o Income and willingness to pay (7-pts scale) o Sales and customer satisfaction (5-pts scale)

rationale

is to determine the likelihood of selecting a sample in hand, if the null hypothesis is true

higher variation =

larger range

If we adopt a 95 percent level of confidence, we need a P value to be significant if it is: less than 0.01 less than or equal to 0.05 greater than 0.05 greater than or equal to 0.05 0.90 or greater.

less than or equal to 0.05

If we adopt a 95 percent level of confidence, we need a p-value, to be significant (i.e., flag is waving) if it is: less than or equal to 0.05 less than 0.01 greater than 0.05 greater than or equal to 0.05 0.90 or greater

less than or equal to 0.05

correlation analysis can only capture ___ ___

linear relationship

last step of hypothesis testing

make decisions/conclusion

The value obtained by summing all elements in a set and dividing by the number of elements is the ________. mean median mode range standard deviation

mean

curvilinear relationship

means that some smooth pattern describes the relationship

linear relationship

means the two variables have a "straight-line" relationship

variation

measures how dispersed or spread out the sample data is

Sample Covariance (Sxy)

measures the direction of the linear relationship between two variables (X and Y)

Sample Standard Deviation (Sx x Sy)

measures the variation of the variable (X or Y) ***always POSITIVE***
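The covariance and standard deviation cards combine into the correlation coefficient through the usual relation rxy = Sxy / (Sx × Sy). The sketch below is only an illustration of that relation: the function names and the hours/scores data are made up for the example, and the sample formulas divide by n - 1.

```python
import math


def sample_covariance(x, y):
    """Sample covariance Sxy (divides by n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def sample_sd(x):
    """Sample standard deviation Sx; always non-negative."""
    return math.sqrt(sample_covariance(x, x))


def correlation(x, y):
    """Correlation coefficient rxy = Sxy / (Sx * Sy), bounded by -1 and 1."""
    return sample_covariance(x, y) / (sample_sd(x) * sample_sd(y))


# Hypothetical data: hours studied and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

s_xy = sample_covariance(hours, scores)  # positive, so the direction is upward
r_xy = correlation(hours, scores)        # close to +1, so the link is strong
print(s_xy, round(r_xy, 3))
```

The sign of Sxy gives the direction of the relationship, while dividing by Sx × Sy rescales it so the size of rxy can be read as strength.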

2 major issues in the Data Cleaning stage

missing values; unreliable respondents

The ________ is the value that occurs most frequently. mean median mode range standard deviation

mode

Degrees of Freedom =

n-1 (n is the sample size)

what is the degree of freedom for the hypothesis test of correlation when "n" is the sample size?

n-2

When a computed t value (for a test of population mean with the alternative hypothesis as "the population mean is greater than the hypothesized value"), say 4.21, is larger than t0.95(n-1) = 1.65, then this amounts to: support for the null hypothesis; the population mean and the hypothesized mean is no support for the null hypothesis; the two percentages are NOT different support for the null hypothesis, the two percentages are NOT different no support for the null hypothesis; the two percentages are different None of the above; a z value is inappropriate for testing the differences between two averages.

no support for the null hypothesis; the two percentages are different

When a computed t value (for a test for differences between two percentages), 4.21, is larger than the standard t value, 1.96, then this amounts to: support for the null hypothesis; the two percentages are different. no support for the null hypothesis; the two percentages are not different. support for the null hypothesis, the two percentages are not different. no support for the null hypothesis; the two percentages are different. None of the above; a z value is inappropriate for testing the differences between two percentages.

no support for the null hypothesis; the two percentages are different.

What is the degree of freedom associated with the above test of Chi-square analysis? Hint: degree of freedom = (m-1) * (n-1) = (2-1) * (2-1) = 1 where m is the number of rows and n is the number of columns. 9 6 199 198 none of the above.

none of the above.

Which of the following states that the difference between the population parameters between two groups is zero? null parameter null hypothesis alternative hypothesis null alternative hypothesis zero hypothesis

null hypothesis

3 types of descriptive data analysis

o Counting noses (frequency, percentile) o Central tendency (mean, mode, median) o Variation (range, standard deviation)
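As a rough illustration of the central tendency and variation measures listed above, the following sketch uses Python's standard `statistics` module. The purchase quantities are illustrative values, and the variable names are chosen for this example only.

```python
import statistics

# Illustrative weekly purchase quantities for nine respondents.
purchases = [12, 0, 0, 1, 1, 1, 6, 10, 11]

# Central tendency
mean_value = statistics.mean(purchases)      # sum of values / number of values
median_value = statistics.median(purchases)  # middle value after sorting
mode_value = statistics.mode(purchases)      # most frequent value

# Variation
value_range = max(purchases) - min(purchases)  # maximum value - minimum value
sample_sd = statistics.stdev(purchases)        # sample SD (divides by n - 1)

print(mean_value, median_value, mode_value, value_range, sample_sd)
```

Counting noses (frequency and percentage) is usually the choice for nominal and ordinal variables, while the mean, range, and standard deviation assume interval or ratio data.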

cross tabulation is the most common data analysis in market research because ...

o easy to conduct/understand o gain better insights concerning complex market phenomenon o very useful for marketing segmentation and targeting

The chi-square test is always _____-tailed in nature (no subscript in the threshold value)

one

The alternative hypothesis: the percentage of users who use the Internet for shopping is greater than .40, is a ________. correlation analysis two-tailed test test of difference one-tailed test none of the above is true

one-tailed test

When a researcher is determining if the difference between two segments' parameters are statistically significant, he or she is considering the two segments as two separate populations and the question is whether or not the two different populations' ________________. z scores are the same t scores are the same parameters are different associations are different summarization values are the same

parameters are different

For purposes of comparison, we can convert the frequency by dividing the frequency of each value by the total number of observations, which results in the: mean variable count percentage standard deviation

percentage

Goal of Predictive modeling

predict target values in other data where we have predictor values, but not target values

correlation coefficient (Pearson's R)

rxy the statistical measure of the correlation between two variables (X and Y)

Statistics

sample values

2nd step of hypothesis testing

set the criterion for a decision (then collect data)

Cross tabulation works best for ...

small number of categories

If you have a question that has an interval or ratio data, which of the following should be used to report the variability? frequency distribution cumulative percentage distribution percentage distribution and range standard deviation and range accumulative percentage standard deviation

standard deviation and range

In the step-by-step approach to the presentation of hypothesis tests, the first step is: state the hypothesis by performing appropriate hypothesis test computations, state if the hypothesis is supported or not supported test the hypothesis only on variables that are metric if the hypothesis is not supported, compute confidence intervals to provide the client with the appropriate confidence intervals none of the above; there is no step-by-step approach to the presentation of the hypothesis test.

state the hypothesis

1st step of hypothesis testing

state the hypothesis (one null, one alternative)

Researchers keep in mind that the independence assumption stipulates that the independent variables must be ________ before running a multiple regression. statistically independent and uncorrelated with one another statistically dependent and uncorrelated with one another statistically independent and correlated with one another statistically dependent and correlated with one another

statistically independent and uncorrelated with one another

Low multiple R-squared and adjusted R -squared values signal the ________. regression plane contains multiple errors regression analysis should be rerun straight-line model does not apply well straight-line model applies well

straight-line model does not apply well

-1 Rxy shows

strong negative strength of correlation between X and Y

+1 Rxy shows

strong positive strength of correlation between X and Y

An analyst is interested in testing the hypothesis H0 : μ = 15,000 ; H1 : μ < 15,000. The data consist of 20 observations of an interval variable. The correct statistical procedure is the: z-test t-test. Chi-square test. analysis of variance. goodness-of-fit.

t-test

If Level of significance (α) is mentioned to be 5% and suppose n=20, the appropriate tcrit value to compare against is:

t1-.05(20-1) = t.95(19) [for a one-tailed test]; t1-.05/2(20-1) = t1-.025(19) = t.975(19) [for a two-tailed test]

one-tailed test (degrees of freedom)

tcrit value = t1-α (Degrees of Freedom)

two-tailed test (degrees of freedom)

tcrit value = t1-α/2 (Degrees of Freedom)
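The sketch below shows how the 1 - α and 1 - α/2 probabilities translate into threshold values. It rests on a stated assumption: Python's standard library has no Student's t quantile function, so the standard normal distribution is used as a large-sample approximation. The helper name `approx_critical_value` is hypothetical. For small samples, a t table with the correct degrees of freedom gives somewhat larger thresholds.

```python
from statistics import NormalDist


def approx_critical_value(alpha, two_tailed):
    """Large-sample (normal) approximation to the t critical value.

    One-tailed test: quantile at 1 - alpha.
    Two-tailed test: quantile at 1 - alpha / 2.
    """
    probability = 1 - alpha / 2 if two_tailed else 1 - alpha
    return NormalDist().inv_cdf(probability)


one_tailed = approx_critical_value(0.05, two_tailed=False)  # about 1.645
two_tailed = approx_critical_value(0.05, two_tailed=True)   # about 1.96
print(round(one_tailed, 3), round(two_tailed, 3))
```

These two values match the thresholds of roughly 1.65 and 1.96 that appear in several cards in this set.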

Nominal/Ordinal + Interval/Ratio

test of differences

Chi-square statistic measures ...

the "distance" between the sample and the null hypothesis

standard deviation

the average deviation from the sample mean

If rxy=0, X and Y have zero correlation:

the change in X is independent of the change in Y.

range

the difference between the max. value and the min. value (maximum value) - (minimum value)

A statistically significant test of population mean means: there is practical significance. the differences between the population mean and the hypothesized value would remain in a large number of trials if we repeated the survey over many times. the p values are very large. the z values are very small. the x values are average.

the differences between the population mean and the hypothesized value would remain in a large number of trials if we repeated the survey over many times.

Rxy (correlation coefficient) measures ...

the direction of the correlation between X and Y

monotonic relationship

the general direction of a relationship between two variables is known EX: - increasing relationship - decreasing relationship

p-value

the likelihood of obtaining the observed sample statistic if the null hypothesis is true; comparable to "the number of jury members who believe the defendant is innocent."

40% women video renters buy snacks; 65% male video renters buy snacks. A computed t statistic is 4.5. Given that the threshold t value is 1.96, this means: nothing; z does not determine anything the null hypothesis is not supported; there is a true difference between the two percentages the null hypothesis is supported; there is a true difference between the two percentages the alternative hypothesis is not supported; there is a true difference between the two percentages the null alternative is supported; there is a true difference between the two percentages

the null hypothesis is not supported; there is a true difference between the two percentages

frequency

the number of observations in each category of the variable ~ mainly used for nominal and ordinal data ~ preferred graphic illustration = bar graph

alpha (significance level)

the probability of making the wrong decision when the null hypothesis is true (α).

percentage

the proportion of observations in each category of the variable ~ mainly used for nominal and ordinal data ~ frequency / sample size ~ preferred graphic illustration = pie graph
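A minimal sketch of the frequency and percentage definitions, using made-up survey responses. `collections.Counter` is a standard-library class; the variable names are illustrative.

```python
from collections import Counter

# Hypothetical answers to a nominal survey question.
responses = ["yes", "no", "yes", "yes", "no", "maybe", "yes", "no"]

# Frequency: number of observations in each category.
frequency = Counter(responses)

# Percentage: frequency / sample size, expressed out of 100.
sample_size = len(responses)
percentage = {answer: 100 * count / sample_size
              for answer, count in frequency.items()}

for answer, count in frequency.items():
    print(answer, count, percentage[answer])
```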

statistical inference

the set of procedures used in which sample statistics are used to estimate population parameters

The size of Rxy measures ...

the strength of the correlation between X and Y

The functions of data analysis "match up" with: the types of problems the types of research objectives the types of type I errors the types of type II errors the types of type III errors

the types of research objectives

The BEST way to handle missing values when analyzing the data is to: leave the response blank discard the record or questionnaire with missing values substitute the sample mean for the missing value both A and B are correct there is no single best way for handling missing values; rather, their treatment depends on the purpose of the study, the incidence of missing values, and the methods that will be used to analyze the data

there is no single best way for handling missing values; rather, their treatment depends on the purpose of the study, the incidence of missing values, and the methods that will be used to analyze the data

The logic of the chi-square test would argue that, for a significant relationship to exist: there should be large differences between the observed and expected frequency there should be few differences between the observed and expected frequency there should be no differences between the observed and expected frequency there should be negative differences between the observed and expected frequency there should be only one difference between the observed and expected frequency

there should be large differences between the observed and expected frequency
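To make the observed-versus-expected logic concrete, here is a small sketch that assumes the usual Pearson form of the statistic, the sum of (observed - expected)² / expected over all cells, with expected counts computed as row total × column total / grand total. The function name and the 2 × 2 table are hypothetical.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table.

    `observed` is a list of rows, each row a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were not associated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected

    # Degrees of freedom: (# rows - 1) x (# columns - 1)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Hypothetical cross tabulation: rows = gender, columns = buys snacks (yes / no).
table = [[30, 20],
         [20, 30]]
stat, df = chi_square(table)
print(stat, df)
```

The larger the gaps between observed and expected counts, the larger the statistic. A table whose observed counts equal the expected counts gives a statistic of zero.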

What's the last step in hypothesis testing?

to compare the probability with the level of significance alpha (α)

objective of hypothesis testing

to infer population info from the sample. In this course, the population information of interest could be • the population mean of a variable (L11-2) • the association between two variables (L12-1, L12-2, L12-3) • The causal relationship between two variables (L13)

Nonmonotonic relationship

two variables are associated, but only in a very general sense. The presence (or absence) of one variable is associated with the presence (or absence) of another.

Hypothesis Test of Correlation

used to test whether there exists correlation between two variables in the population

When is a two-tailed test used in the hypothesis test of correlation?

used when we only want to know whether the correlation between two variables is different from zero or not. H0: Rxy = 0; H1: Rxy ≠ 0. We compare the absolute value of the t statistic with the threshold value t0.975(n-2), where n-2 is the degree of freedom and "n" is the sample size. If |t| > t0.975(n-2), then we reject the null and accept the alternative. If |t| < t0.975(n-2), then we fail to reject the null and reject the alternative.

When is a one-tailed test used in the hypothesis test of correlation?

used when we only want to know whether the correlation between two variables is greater/less than zero or not. H0: Rxy = 0; H1: Rxy > 0 or Rxy < 0. When H1: Rxy > 0, we compare the t statistic with t0.95(n-2) ~ If t > t0.95(n-2), then we reject the null and accept the alternative ~ If t < t0.95(n-2), then we fail to reject the null and reject the alternative. When H1: Rxy < 0, the t statistic must be smaller than 0, and we compare the negative t statistic with t0.95(n-2) ~ If -t > t0.95(n-2), then we reject the null and accept the alternative ~ If -t < t0.95(n-2), then we fail to reject the null and reject the alternative

two-tailed test

used when we only want to know whether the population mean is unequal to a certain value (H0: μ = c; H1: μ ≠ c). EX: H0: Americans spend on average 5 hours on Internet (μ = 5); H1: The amount of time spent on Internet is unequal to 5 (μ ≠ 5). We compare the absolute value of the t statistic with t0.975(n-1) (t-critical value) as the threshold value, where n is the sample size. o If |t| > t0.975(n-1) then we reject the null and accept the alternative. o If |t| < t0.975(n-1) then we fail to reject the null and reject the alternative.

Test of Differences: two-tailed test

used when we only want to know whether two segments have different population means or not (H0: μ1 = μ2; H1: μ1 ≠ μ2) • We compare the absolute value of the t statistic with t0.975(n1+n2-2) as the threshold/critical value, where n1+n2-2 is the degree of freedom, n1 is the sample size of segment 1, and n2 is the sample size of segment 2. o If |t| > t0.975(n1+n2-2) then we reject the null and accept the alternative. o If |t| < t0.975(n1+n2-2) then we fail to reject the null and reject the alternative.

one-tailed test

used when we want to know if the population mean is either greater or smaller than a certain value (H0: μ = c; H1: μ > c) or (H0: μ = c; H1: μ < c) • When H1: μ > c, we compare the t statistic with t0.95(n-1) (t-critical value). o If t > t0.95(n-1) then we reject the null and accept the alternative. o If t < t0.95(n-1) then we fail to reject the null and reject the alternative. • When H1: μ < c, the t statistic must be smaller than 0, and we compare the negative t-statistic with t0.95(n-1) (t-critical value). o If -t > t0.95(n-1) then we reject the null and accept the alternative. o If -t < t0.95(n-1) then we fail to reject the null and reject the alternative.

Test of Differences: one-tailed test

used when we want to know if the population mean of one segment is greater/smaller than that of the other segment (H0: μ1 = μ2; H1: μ1 > μ2) or (H0: μ1 = μ2; H1: μ1 < μ2) • When H1: μ1 > μ2, we compare the t statistic with t0.95(n1+n2-2). o If t > t0.95(n1+n2-2) then we reject the null and accept the alternative. o If t < t0.95(n1+n2-2) then we fail to reject the null and reject the alternative. • When H1: μ1 < μ2, the t statistic must be smaller than 0, and we compare the negative t-statistic with t0.95(n1+n2-2). o If -t > t0.95(n1+n2-2) then we reject the null and accept the alternative. o If -t < t0.95(n1+n2-2) then we fail to reject the null and reject the alternative.
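The comparison rules on the one-tailed and two-tailed cards (for both the test of population mean and the test of differences) follow the same pattern, which can be sketched as a single function. This is an illustrative helper, not part of the course material: the name `t_test_decision` and the `alternative` labels are assumptions, and the critical value must be looked up separately with the appropriate degrees of freedom.

```python
def t_test_decision(t_stat, critical_value, alternative):
    """Apply the decision rule for a t test.

    alternative: "two-sided" (H1: not equal), "greater" (H1: >), or "less" (H1: <).
    critical_value: the positive threshold, e.g. t0.975(df) or t0.95(df).
    """
    if alternative == "two-sided":
        reject = abs(t_stat) > critical_value
    elif alternative == "greater":
        reject = t_stat > critical_value
    elif alternative == "less":
        # Compare the negative of the t statistic with the positive threshold.
        reject = -t_stat > critical_value
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return "reject the null" if reject else "fail to reject the null"


# Values taken from cards in this set.
print(t_test_decision(1.7, 1.96, "greater"))   # toy store example
print(t_test_decision(4.21, 1.65, "greater"))  # t = 4.21 vs. t0.95(n-1) = 1.65
print(t_test_decision(4.5, 1.96, "two-sided")) # video renters example
```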

Data ambiguity

vague responses to open-ended questions

What can be inferred from test of population mean

we infer where the population mean is and answer questions about what value the population mean is likely to be given the sample data

test statistic for hypothesis test of correlation

t = Rxy × sqrt(n - 2) / sqrt(1 - Rxy²), where Rxy is the correlation coefficient in the sample and "n" is the sample size
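The standard form of this test statistic is t = Rxy × sqrt(n - 2) / sqrt(1 - Rxy²), with n - 2 degrees of freedom. The sketch below computes it for a hypothetical sample correlation. The function name and input values are illustrative only.

```python
import math


def correlation_t_statistic(r_xy, n):
    """t statistic for testing H0: no correlation in the population.

    r_xy: sample correlation coefficient (strictly between -1 and 1).
    n: sample size (must exceed 2, since the degrees of freedom are n - 2).
    """
    if n <= 2:
        raise ValueError("sample size must be greater than 2")
    if abs(r_xy) >= 1:
        raise ValueError("r_xy must be strictly between -1 and 1")
    return r_xy * math.sqrt(n - 2) / math.sqrt(1 - r_xy ** 2)


# Hypothetical sample: r = 0.6 from 18 respondents, so df = 16.
t_stat = correlation_t_statistic(0.6, 18)
print(round(t_stat, 2))
```

The resulting t statistic is then compared with t0.975(n-2) for a two-tailed test or t0.95(n-2) for a one-tailed test.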

chi-square needs to be used when ...

you want to draw inference from the sample to the population

Three Association Analyses

~ Variable A / Variable B ~ Nominal / Ordinal ~Interval/Ratio

level of significance (alpha)

~ refers to a criterion of judgment upon which a decision (of "rejecting the null and accepting the alternative" or "failing to reject the null and rejecting the alternative") is made regarding the value stated in a null hypothesis in Step 1 • α is usually set at 5% ~ comparable to "the defendant is convicted if all jurors believe that the defendant is guilty, after presented with evidence."

standard error

~ the measure of variability in the sampling distribution ~ while standard deviation describes the variability of individual data points from the mean, SE measures how much a sample statistic (such as the sample mean) varies from sample to sample
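For the sample mean, the standard error is commonly estimated as the sample standard deviation divided by the square root of the sample size. The following sketch assumes that formula. The function name and the data are illustrative.

```python
import math
import statistics


def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical satisfaction ratings from eight respondents.
ratings = [2, 4, 4, 4, 5, 5, 7, 9]

sd = statistics.stdev(ratings)  # spread of individual data points
se = standard_error(ratings)    # spread of the sample mean across samples
print(round(sd, 3), round(se, 3))
```

Because of the division by sqrt(n), the standard error shrinks as the sample grows, even when the standard deviation of the individual responses stays about the same.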

(Predictive Model) predictive value of dependent variable is

Ŷ = α + b1X1 + b2X2 + ... + bkXk, where α, b1, ..., bk are the estimated coefficients and X1, X2, ..., Xk are the values of the predictor variables for our observation.
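As a minimal sketch of how the prediction formula is applied once the coefficients have been estimated, the function below multiplies each predictor value by its coefficient and adds the intercept. The function name, coefficient values, and predictor values are all hypothetical.

```python
def predict(intercept, coefficients, predictors):
    """Predicted value: intercept + b1*X1 + b2*X2 + ... + bk*Xk."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


# Hypothetical fitted model with two predictors.
intercept = 10          # estimated intercept
coefficients = [2, -1]  # estimated b1 and b2
new_observation = [3, 4]  # X1 and X2 for a record with no known target value

y_hat = predict(intercept, coefficients, new_observation)
print(y_hat)
```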

Linear Regression Assumptions

• The choice of predictors and their form is correct (linearity) • The observations are independent of one another • The variability of the outcome variable (given a set of predictors) is the same regardless of the values of the predictors (homoskedasticity) • The noise ε follows a normal distribution

