BSAD 3500 Final Exam

Advantages and disadvantages of secondary data.

Advantages: time saving, inexpensive. Disadvantages: may be out of date, definitions/categories may not fit the study, not specific enough.

Be able to calculate sampling error for proportions and means and construct confidence intervals.

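1. z-values: 90% = 1.645, 95% = 1.96, 99% = 2.576 2. Sampling error for proportions = z*sqrt(p(1-p)/n) 3. Sampling error for means = z*s/sqrt(n) 4. Confidence interval = sample proportion or mean ± sampling error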
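A minimal Python sketch of the formulas above, assuming a simple random sample. The function names and the survey numbers (40% of 400 respondents; a mean of 50 with s = 12 from 144 respondents) are hypothetical.

```python
import math

# Two-sided z-values for common confidence levels
Z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}


def sampling_error_proportion(p, n, confidence=0.95):
    """Sampling error for a sample proportion p from n respondents."""
    return Z[confidence] * math.sqrt(p * (1 - p) / n)


def sampling_error_mean(s, n, confidence=0.95):
    """Sampling error for a sample mean, given the sample standard deviation s."""
    return Z[confidence] * s / math.sqrt(n)


def confidence_interval(estimate, error):
    """Confidence interval = sample estimate +/- sampling error."""
    return (estimate - error, estimate + error)


# Hypothetical: 40% of 400 respondents are aware of a brand
e_p = sampling_error_proportion(0.40, 400)
print(confidence_interval(0.40, e_p))  # roughly (0.352, 0.448)

# Hypothetical: mean rating of 50 with s = 12 from 144 respondents
e_m = sampling_error_mean(12, 144)
print(confidence_interval(50, e_m))  # roughly (48.04, 51.96)
```

Raising the confidence level raises z, so the same data give a wider interval.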

What is cross-tabulation? What is Cramer's V?

1. A multivariate technique used for studying the relationship between two or more categorical variables 2. Cross-tabs consider the joint distribution of sample elements across variables 3. It is the most used multivariate data analysis technique in applied marketing research 4. Cramer's V is a statistic used to measure the strength of the relationship between categorical variables; it is analogous to R in regression
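A minimal Python sketch of building a cross-tab and computing Cramer's V, assuming two lists of category labels for the same respondents; the helper names and data are hypothetical. Cramer's V rescales the Pearson chi-square statistic so it runs from 0 (no association) to 1 (perfect association): V = sqrt(chi-square / (n * (min(rows, columns) - 1))).

```python
import math
from collections import Counter


def cross_tab(rows, cols):
    """Joint counts of two categorical variables as {row_level: {col_level: count}}."""
    counts = Counter(zip(rows, cols))
    return {r: {c: counts[(r, c)] for c in sorted(set(cols))} for r in sorted(set(rows))}


def cramers_v(table):
    """Cramer's V from a contingency table built by cross_tab."""
    row_levels = list(table)
    col_levels = list(next(iter(table.values())))
    n = sum(sum(row.values()) for row in table.values())
    row_tot = {r: sum(table[r].values()) for r in row_levels}
    col_tot = {c: sum(table[r][c] for r in row_levels) for c in col_levels}
    chi2 = 0.0
    for r in row_levels:
        for c in col_levels:
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (table[r][c] - expected) ** 2 / expected
    k = min(len(row_levels), len(col_levels))
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical respondents: gender and preferred dessert
gender = ["F", "F", "F", "M", "M", "M"]
choice = ["ice cream", "ice cream", "yogurt", "yogurt", "yogurt", "ice cream"]
table = cross_tab(gender, choice)
print(table)
print(round(cramers_v(table), 3))  # 0.333
```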

How could one check for response set bias across a range of questions (not in the book)?

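Report additional details about whether questionnaires were removed, for example because a respondent gave the same answer to every item in a battery of questions (straight-lining).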
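One common check, sketched below with a hypothetical function name and hypothetical data, is to flag straight-lining: respondents who chose the same scale point for every item in a battery, which shows up as a standard deviation of zero across their answers. Including a few reverse-worded items makes this kind of response set easier to spot, because an attentive respondent should not agree with both a statement and its opposite.

```python
from statistics import pstdev


def flag_straightliners(responses, max_sd=0.0):
    """Return IDs of respondents whose ratings across a battery vary by no more than max_sd.

    responses: dict of respondent ID -> list of ratings on the same scale.
    With the default max_sd of 0, only respondents who gave one identical
    answer to every item are flagged.
    """
    return [rid for rid, answers in responses.items() if pstdev(answers) <= max_sd]


# Hypothetical 1-5 agreement ratings on a five-item battery
battery = {
    "r01": [5, 5, 5, 5, 5],
    "r02": [4, 2, 5, 3, 4],
    "r03": [1, 1, 1, 1, 1],
    "r04": [3, 4, 3, 2, 4],
}
print(flag_straightliners(battery))  # ['r01', 'r03']
```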

Multivariate Chi-Square Test

Associated with cross-tabulation
• Independent variable = Categorical (nominal or ordinal)
• Dependent variable = Categorical (nominal or ordinal)
• Measure of the strength of association = Cramer's V
• Example: For a food study, did women/men (coded as 1 and 2) see approximately equal frequencies of the ice cream and yogurt stimuli (coded as 1 and 2)?
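A minimal Python sketch of the Pearson chi-square test for a 2x2 layout like the food-study example, assuming no continuity correction; the counts are hypothetical. With one degree of freedom, the p-value can be computed from the standard library because the upper-tail chi-square probability equals erfc(sqrt(chi-square / 2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction).

    table: [[a, b], [c, d]] observed counts. Returns (chi2, p_value) with df = 1.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For df = 1, P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: women (1), men (2); columns: saw ice cream (1), saw yogurt (2). Hypothetical counts.
observed = [[30, 20], [25, 25]]
chi2, p = chi_square_2x2(observed)
print(round(chi2, 3), round(p, 3))
```

Here p is well above .05, so these hypothetical counts give no evidence that women and men saw the two stimuli at different rates.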

Be able to identify a sampling frame.

the list of population elements from which a sample will be drawn; the list could consist of geographic areas, institutions, individuals, or other units

Be able to interpret p-value relative to alpha (significance level).

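When the p-value is less than alpha (the significance level, commonly .05), the result is statistically significant and the null hypothesis is rejected; when the p-value is greater than or equal to alpha, we fail to reject the null hypothesis.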

6: What is logistic regression?

• Single or multiple continuous or dummy-coded categorical (0 and 1) independent variables and a single dummy-coded dichotomous categorical dependent variable • Example: Examine the relationship between the amount of time spent browsing on Amazon, gender (dummy coded as 0 = male and 1 = female), and whether one purchased or did not purchase during the session (dummy coded as 0 = did not purchase and 1 = purchased)
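A minimal sketch of how such a model can be fit, assuming plain batch gradient ascent on the log-likelihood (statistical packages typically use faster Newton-type methods). The browsing times, genders, and purchase outcomes are hypothetical, and time is entered in tens of minutes to keep the predictors on a similar scale.

```python
import math


def sigmoid(z):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, steps=5000):
    """Fit logistic regression by batch gradient ascent on the mean log-likelihood.

    X: list of feature rows (an intercept term is added here); y: list of 0/1 outcomes.
    Returns coefficients [b0, b1, ..., bk] on the log-odds scale.
    """
    rows = [[1.0] + list(x) for x in X]
    b = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(b)
        for xi, yi in zip(rows, y):
            p = sigmoid(sum(bj * xj for bj, xj in zip(b, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        b = [bj + lr * g / len(rows) for bj, g in zip(b, grad)]
    return b


def log_likelihood(b, X, y):
    """Log-likelihood of 0/1 outcomes y under coefficients b."""
    total = 0.0
    for x, yi in zip(X, y):
        p = sigmoid(b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)))
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


# Hypothetical sessions: minutes browsing, gender (0 = male, 1 = female), purchased (0/1)
minutes = [5, 10, 15, 20, 25, 30, 35, 40, 12, 28, 22, 8]
gender = [0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0]
purchased = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1]
X = [[m / 10, g] for m, g in zip(minutes, gender)]

b = fit_logistic(X, purchased)
print("intercept, time (per 10 min), female:", [round(v, 3) for v in b])
```

Each coefficient is a change in the log-odds of purchasing; exponentiating it gives an odds ratio.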

Differentiate type I and type II errors.

Type I error means rejecting the null hypothesis when it's actually true, while a Type II error means failing to reject the null hypothesis when it's actually false.

What is frequency analysis?

a count of the number of cases that fall into each of the possible response categories

What is a standard deviation?

a measure of the variation of responses on a variable. The standard deviation is the square root of the calculated variance on a variable
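A short Python sketch of the definition, assuming the usual sample formula with n - 1 in the denominator; the ratings are hypothetical. The standard library's statistics.stdev uses the same formula, so the two results should match.

```python
import math
from statistics import stdev


def sample_sd(values):
    """Square root of the sample variance (sum of squared deviations / (n - 1))."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return math.sqrt(variance)


ratings = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical ratings
print(sample_sd(ratings), stdev(ratings))
```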

Differentiate the different mutating join types. https://dplyr.tidyverse.org/reference/join.html

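1. inner join: keeps only the rows in x that have a matching key in y 2. left join: keeps all rows in x 3. right join: keeps all rows in y 4. full join: keeps all rows in x or y 5. semi join (a filtering join rather than a mutating join): returns all rows from x with a match in y, without adding columns from y 6. anti join (also a filtering join): returns all rows from x without a match in y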
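The dplyr functions are R; the sketch below mimics the same six joins in plain Python on lists of dicts so the row-keeping rules are visible. The helper names and the customer/order data are hypothetical, and it assumes non-key column names differ between the two tables. In pandas, the mutating joins correspond to `DataFrame.merge` with `how="inner"`, `"left"`, `"right"`, or `"outer"`.

```python
# Sketch assumption: non-key column names differ between x and y; if not, y's values overwrite x's.


def index_by(rows, key):
    """Group rows (dicts) by the value of `key`."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


def inner_join(x, y, key):
    """Mutating join: rows of x with a matching key in y, with y's columns added."""
    y_index = index_by(y, key)
    return [{**rx, **ry} for rx in x for ry in y_index.get(rx[key], [])]


def left_join(x, y, key):
    """Mutating join: every row of x; y's columns are None where x has no match."""
    y_index = index_by(y, key)
    y_cols = {c for r in y for c in r if c != key}
    out = []
    for rx in x:
        matches = y_index.get(rx[key], [])
        if matches:
            out.extend({**rx, **ry} for ry in matches)
        else:
            out.append({**rx, **{c: None for c in y_cols}})
    return out


def right_join(x, y, key):
    """Mutating join: every row of y (a left join with the tables swapped)."""
    return left_join(y, x, key)


def full_join(x, y, key):
    """Mutating join: every row of x plus the rows of y that have no match in x."""
    x_cols = {c for r in x for c in r if c != key}
    extra = [{**{c: None for c in x_cols}, **ry} for ry in anti_join(y, x, key)]
    return left_join(x, y, key) + extra


def semi_join(x, y, key):
    """Filtering join: rows of x with a match in y; no columns from y are added."""
    keys = {r[key] for r in y}
    return [r for r in x if r[key] in keys]


def anti_join(x, y, key):
    """Filtering join: rows of x without a match in y."""
    keys = {r[key] for r in y}
    return [r for r in x if r[key] not in keys]


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}, {"id": 3, "name": "Cy"}]
orders = [{"id": 1, "total": 20}, {"id": 1, "total": 35}, {"id": 4, "total": 10}]
for join in (inner_join, left_join, right_join, full_join, semi_join, anti_join):
    print(join.__name__, join(customers, orders, "id"))
```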

What do the terms continuous and categorical measures refer to? How do they relate to NOIR?

1. nominal and ordinal measures are referred to as categorical measures 2. interval and ratio measures are referred to as continuous measures

3: Be familiar with the six true/false questions in this blog. https://casetext.com/analysis/robust-misinterpretation-of-confidence-intervals-by-courts

(1) The probability that the true mean is greater than 0 is at least 95%. Correct answer: False
(2) The probability that the true mean equals 0 is smaller than 5%. Correct answer: False
(3) The "null hypothesis" that the true mean equals zero is likely to be incorrect. Correct answer: False
(4) There is a 95% probability that the true mean lies between 0.1 and 0.4. Correct answer: False
(5) We can be 95% confident that the true mean lies between 0.1 and 0.4. Correct answer: False
(6) If we were to repeat the experiment over and over, then 95% of the time the true mean would fall between 0.1 and 0.4. Correct answer: False

Pew Research: What are some key points regarding the margin of error from the following report (Focus on sections 1, 2, 4, 5). What is the design effect?

1. Because surveys only talk to a sample of the population, we know that the result probably won't exactly match the "true" result that we would get if we interviewed everyone in the population. The margin of sampling error describes how close we can reasonably expect a survey result to fall relative to the true population value.

2. News reports about polling will often say that a candidate's lead is "outside the margin of error" to indicate that a candidate's lead is greater than what we would expect from sampling error, or that a race is "a statistical tie" if it's too close to call. It is not enough for one candidate to be ahead by more than the margin of error that is reported for individual candidates (i.e., ahead by more than 3 points, in our example). To determine whether or not the race is too close to call, we need to calculate a new margin of error for the difference between the two candidates' levels of support.

4. The reported margin of error for a poll applies to estimates that use the whole sample (e.g., all adults, all registered voters or all likely voters who were surveyed). But polls often report on subgroups, such as young people, white men or Hispanics. Because survey estimates on subgroups of the population have fewer cases, their margins of error are larger, in some cases much larger.

5. Without adjustment, polls tend to overrepresent people who are easier to reach and underrepresent those types of people who are harder to interview. In order to make their results more representative, pollsters weight their data so that it matches the population, usually based on a number of demographic measures. Weighting is a crucial step for avoiding biased results, but it also has the effect of making the margin of error larger. Statisticians call this increase in variability the design effect.

Criteria for establishing causality (consistent variation, etc.).

1. Consistent Variation: evidence of the extent to which X and Y occur together or vary together in the way predicted by the hypothesis. 2. Time Order: evidence that shows X occurs before Y 3. Elimination of Other Explanations: evidence that allows the elimination of factors other than X as the cause of Y

Differentiate exploratory, descriptive, and causal research.

1. Exploratory: discover ideas and insights 2. Descriptive: determine the frequency with which something occurs or the extent to which two variables covary 3. Causal: used to establish cause-and-effect relationships between variables

2: Differentiate the frequentist vs. Bayesian perspectives (See chapter 12 slide). Know the gist of how evidence is evaluated from a Bayesian perspective (Hint: See table 4 below). https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1167&context=jps

1. Frequentist perspective: • P-value < 0.05 • Some suggest moving the p-value threshold from .05 to .003 • Results are either supported or not supported 2. Bayesian perspective: • Compute a Bayes factor • Levels of support (from weak to very strong) • Construct credible intervals • Posterior probability

What is a common misunderstanding regarding the normal distribution? The link below is provided to supplement the class discussion.

1. Myth #1: Most data are normally distributed 2. Myth #2: The normal distribution is central to statistical theory 3. Myth #3: Normalizing data renders it normally distributed

Be able to identify nominal, ordinal, interval, and ratio data.

1. Nominal: measurement in which numbers are assigned to objects or classes of objects solely for the purpose of identification 2. Ordinal: measurement in which numbers are assigned to data on the basis of some order. Examples include brand preference, social class, hardness of minerals, and graded quality of lumber 3. Interval: measurement in which the assigned numbers legitimately allow the comparison of the size of the differences among and between members. Examples typically involve temperature scales, grade point average, attitude toward brands, and movie ratings. Example item: rate your overall opinion of football on a scale from unfavorable to favorable 4. Ratio: measurement that has a natural, or absolute, zero and therefore allows the comparison of absolute magnitudes of the numbers. Typical examples include units sold, number of purchases, age, and income.

When should you use Pearson product-moment correlation analysis?

1. Pearson product-moment correlation coefficient: a statistic that indicates the degree of linear association between two continuous (interval or ratio) variables, so use it when both measures are continuous and the relationship of interest is linear. The correlation coefficient can range from -1 to 1.
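A minimal Python sketch of the coefficient, with hypothetical data. The last example is a U-shaped pattern: y is completely determined by x, yet r is 0, which is why the coefficient should be read as a measure of linear association only.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length continuous lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))         # perfect positive linear relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))         # perfect negative linear relationship
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # U-shape: strong but not linear
```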

Differentiate sampling and non-sampling error.

1. Sampling error: the difference between results obtained from a sample and results that would have been obtained had information been gathered from or about every member of the population 2. Non-sampling error: an error that occurs during data collection, causing the data to differ from the true values

When should a company conduct a simulated, controlled, or standard test market?

1. Standard Test Market: a test market in which the company sells the product through its normal distribution channels. 2. Controlled Test Market: an entire test program conducted by an outside service in a market in which it can guarantee distribution. 3. Simulated Test Market: A study in which consumer ratings and other information are fed into a computer model that then makes projections about the likely level of sales for the product in the market.

Differentiate probability and nonprobability samples and know the subtypes.

1. Nonprobability samples involve personal judgment somewhere in the selection process. Subtypes: convenience, judgment, and quota samples 2. A probability sample is one in which each target population element has a known, nonzero chance of being included in the sample. Subtypes: simple random, systematic, stratified, and cluster samples

Differentiate Likert, semantic differential, summated, itemized, and constant sum scales

1. Summated Rating: a self-report technique for attitude measurement in which respondents indicate their degree of agreement or disagreement with each of a number of statements 2. Semantic Differential: a self-report technique for attitude measurement in which respondents are asked to check which cell between a set of bipolar adjectives or phrases best describes their feelings toward the object 3. Graphic Rating: a scale in which individuals indicate their ratings of an attribute, typically by placing a check at the appropriate point on a line that runs from one extreme of the attribute to the other 4. Itemized: a scale on which individuals must indicate their ratings of an attribute or object by selecting the response category that best describes their position on the attribute or object 5. Likert scale: a summated rating scale whose response categories express agreement (e.g., strongly disagree, disagree, neither agree nor disagree, agree, strongly agree) 6. Constant Sum: a scale on which respondents divide a fixed number of points (e.g., 100) among attributes to show their relative importance

What is the difference between unstandardized and standardized betas in regression?

1. Unstandardized: used for constructing a regression formula 2. Standardized: used for comparing the effect size of competing independent variables

What is a median split? How does one perform a median split?

1. a median split is a technique for converting a continuous measure into a categorical measure with two approximately equal sized groups. The groups are formed by splitting the continuous measure at its median value
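A minimal Python sketch with hypothetical scores. The handling of cases that fall exactly on the median is an assumption here (they are put in the "low" group); with many ties at the median, the two groups can end up noticeably unequal.

```python
from statistics import median


def median_split(values):
    """Convert a continuous measure into 'low'/'high' groups split at its median."""
    m = median(values)
    # Assumption: values equal to the median are assigned to 'low'
    return ["high" if v > m else "low" for v in values]


scores = [3, 8, 5, 1, 9, 6]  # hypothetical; median = 5.5
print(median_split(scores))
```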

What is the two-box technique?

1. a technique for converting an interval-level rating scale into a categorical measure for presentation purposes 2. the percentage of respondents choosing one of the top two positions on a rating scale is usually reported
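A minimal Python sketch with hypothetical ratings, assuming an integer scale running from 1 to `scale_max` (so the top two positions are `scale_max` and `scale_max - 1`).

```python
def top_two_box(ratings, scale_max):
    """Percentage of respondents choosing one of the top two scale positions."""
    top = sum(1 for r in ratings if r >= scale_max - 1)
    return 100 * top / len(ratings)


ratings = [5, 4, 3, 5, 2, 4, 1, 5, 3, 4]  # hypothetical 1-5 satisfaction ratings
print(top_two_box(ratings, scale_max=5))  # 60.0
```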

What is an outlier? What is an easy-to-generate visual for detecting outliers?

1. an observation so different in magnitude from the rest of the observations that the analyst chooses to treat it as a special case 2. histogram
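Plotting libraries draw a histogram in one call; the sketch below, with a hypothetical helper and hypothetical purchase counts, builds a plain-text version so it runs with the standard library alone. The single value of 95 sits in its own bar far from the rest, which is the pattern that flags a potential outlier.

```python
def text_histogram(values, bin_width):
    """Return one text line per bin, from the minimum value upward, with '#' per case."""
    lo = min(values)
    counts = {}
    for v in values:
        b = int((v - lo) // bin_width)
        counts[b] = counts.get(b, 0) + 1
    lines = []
    for b in range(max(counts) + 1):
        start = lo + b * bin_width
        lines.append(f"{start:>6} - {start + bin_width:<6} | " + "#" * counts.get(b, 0))
    return lines


purchases = [12, 15, 14, 18, 20, 16, 13, 17, 95]  # hypothetical purchases per customer
for line in text_histogram(purchases, bin_width=10):
    print(line)
```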

Differentiate univariate and multivariate analysis.

1. Univariate analysis: analysis of one variable at a time 2. Multivariate analysis: analysis of two or more variables simultaneously

What descriptive statistics are appropriate for categorical measures? Continuous measures?

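1. Categorical (nominal and ordinal) measures: frequency analysis (counts and percentages) and the mode; the median can also be used for ordinal measures 2. Continuous (interval and ratio) measures: the mean and standard deviation, along with the median and range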

When should a chi-square goodness-of-fit test be used? How is it different from the Pearson chi-square test of independence?

1. Chi-square goodness-of-fit test: a statistical test to determine whether some observed pattern of frequencies on a single categorical variable corresponds to an expected pattern 2. Pearson chi-square (x^2) test of independence: a commonly used statistic for testing the null hypothesis that two categorical variables are independent of one another. The goodness-of-fit test involves one variable, while the test of independence examines the joint distribution of two.
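A minimal Python sketch of the goodness-of-fit statistic, assuming hypothetical data: 150 shoppers choosing among three package designs, with an expected pattern of equal shares. The statistic is compared with the chi-square critical value for k - 1 = 2 degrees of freedom at alpha = .05, which is 5.991.

```python
def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic for observed counts vs. expected proportions."""
    n = sum(observed)
    expected = [p * n for p in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [62, 48, 40]  # hypothetical choices of designs A, B, C
chi2 = chi_square_gof(observed, [1 / 3, 1 / 3, 1 / 3])
critical_df2 = 5.991     # chi-square critical value, df = 2, alpha = .05
print(round(chi2, 2), "significant" if chi2 > critical_df2 else "not significant")
```

With these hypothetical counts the statistic is 4.96, below 5.991, so the observed pattern does not differ significantly from equal shares.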

Differentiate the coefficient of multiple determination and the coefficient of determination.

1. Coefficient of multiple determination (R^2): a measure of the proportion of the total variation in the dependent variable that can be explained or accounted for by the fitted regression equation. When there is only one predictor variable, this value is referred to as the coefficient of determination 2. Coefficient of determination (R-squared, or r^2): the proportion of the variation in the dependent variable that is explained by the single predictor in a simple regression. It indicates how strong the linear relationship between the two variables is and equals the square of their Pearson correlation.
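A minimal Python sketch of R-squared for a simple (one-predictor) least-squares regression, using hypothetical data: R-squared = 1 - (residual sum of squares / total sum of squares).

```python
def simple_regression(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, 1 - ss_resid / ss_total


# Hypothetical: minutes browsing (x) and dollars spent (y)
a, b, r2 = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 2), round(b, 2), round(r2, 2))  # 2.2 0.6 0.6
```

For these numbers the Pearson r is about 0.775, and its square is the same 0.6.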

When is it appropriate to calculate sampling error?

1. confidence intervals for the proportion 2. confidence intervals for means

What are descriptive statistics? In contrast, what are test statistics?

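1. Descriptive statistics: statistics that describe the distribution of responses on a variable (e.g., frequencies, mean, standard deviation) 2. Test statistics: statistics calculated from sample data to test a hypothesis, such as t, F, and chi-square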

Differentiate one sample, paired sample, and independent sample t-tests.

1. One-sample t-test: compares the mean of a single continuous measure (interval or ratio) with a specified value, such as a target or the scale midpoint 2. Independent sample t-test: Independent variable = nominal or ordinal variable with two levels (typically groups); Dependent variable = single continuous measure (interval or ratio) 3. Paired sample t-test: examines the difference between exactly two continuous measures (interval or ratio) that are measured according to the same number of scale points (e.g., 1-10) for the same respondents (e.g., pretest and posttest, or ratings of two different variables)
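A minimal Python sketch of the three t statistics, with hypothetical data. It computes the t values only; turning them into p-values needs the t distribution, for example SciPy's `scipy.stats.ttest_1samp`, `ttest_ind`, and `ttest_rel`. The independent-samples version assumes equal group variances (pooled variance).

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu0):
    """t = (mean - mu0) / (s / sqrt(n)), with df = n - 1."""
    return (mean(values) - mu0) / (stdev(values) / math.sqrt(len(values)))


def independent_t(g1, g2):
    """Pooled-variance independent-samples t, with df = n1 + n2 - 2."""
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * stdev(g1) ** 2 + (n2 - 1) * stdev(g2) ** 2) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def paired_t(before, after):
    """Paired t: a one-sample t test of the within-respondent differences against 0."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0)


# Hypothetical 1-10 ratings
print(one_sample_t([6, 7, 8, 7, 9], 5.5))               # vs. scale midpoint
print(independent_t([7, 8, 6, 9], [5, 6, 4, 5]))        # two separate groups
print(paired_t([4, 5, 6, 5], [6, 7, 7, 8]))             # pretest vs. posttest
```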

Know about hypothesis testing. In industry research reports does one write out the null hypothesis? *Know Dr. Amos' suggested revisions to the null and alternate hypothesis definitions. What word does Dr. Amos indicate one should NEVER use when discussing hypothesis testing?

1. P-hacking and other data-dredging tactics have become concerns because of the over-reliance on the p-value as the dividing line between "good" and "bad" results. This over-reliance on the p-value is a major contributing factor to the previously discussed "publication bias" phenomenon. P-values are also sensitive to sample size, making it easier for researchers to achieve statistical significance for something that is not practically meaningful. As several student posts implied, the null hypothesis that there is no relationship/difference is rarely actually the case in most research. What matters more is effect size.

2. "The null hypothesis is almost always false" is a nonsense argument. It is irrational to assume that statistically there is exactly zero relationship between variables. In research, we test the likelihood that a significant relationship is due to chance. Statistically speaking, even variables with no relationship will still show some level of correlation, if only by chance. This argument is often coupled with the sample-size sensitivity issue, which is a much more serious issue.

Differentiate stratified, cluster, and quota sampling.

1. Quota: a nonprobability sample chosen so that the proportion of sample elements with certain characteristics is about the same as the proportion of elements with those characteristics in the target population 2. Stratified: a probability sample in which (1) the population is divided into mutually exclusive and exhaustive subsets and (2) a simple random sample of elements is chosen independently from each group/subset 3. Cluster: a probability sample in which (1) the population is divided into mutually exclusive and exhaustive subsets (clusters) and (2) a random sample of the clusters is selected
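A minimal Python sketch contrasting the two probability designs, with hypothetical function names and data: stratified sampling draws randomly within every subset, while one-stage cluster sampling randomly picks whole subsets and keeps all of their elements. Quota sampling would also fill a target count per group, but by the interviewer's judgment rather than random selection, so it is not coded here.

```python
import random


def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Split the population into strata, then draw a simple random sample within each one."""
    rng = random.Random(seed)
    strata = {}
    for element in population:
        strata.setdefault(stratum_of(element), []).append(element)
    return {s: rng.sample(members, per_stratum) for s, members in strata.items()}


def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sample: randomly choose whole clusters and keep all their elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: clusters[c] for c in chosen}


# Hypothetical customer list stratified by region
customers = [("north", i) for i in range(50)] + [("south", i) for i in range(30)]
strat = stratified_sample(customers, stratum_of=lambda c: c[0], per_stratum=5, seed=1)

# Hypothetical store clusters, each with its list of shoppers
stores = {f"store_{k}": [f"{k}-{j}" for j in range(4)] for k in "ABCDEF"}
clus = cluster_sample(stores, n_clusters=2, seed=1)
print(strat)
print(clus)
```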

What three factors are needed to calculate sample size?

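1. z-score (set by the desired level of confidence) 2. standard deviation (the estimated variability in the population; p(1-p) for proportions) 3. the desired precision, i.e., the acceptable sampling error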
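A minimal Python sketch of the standard formulas n = z^2 * s^2 / E^2 for a mean and n = z^2 * p(1 - p) / E^2 for a proportion, where E is the acceptable sampling error. The inputs are hypothetical, and results are rounded up because a fraction of a respondent cannot be interviewed.

```python
import math


def sample_size_mean(z, sd, error):
    """n = z^2 * sd^2 / E^2, rounded up."""
    return math.ceil((z * sd / error) ** 2)


def sample_size_proportion(z, p, error):
    """n = z^2 * p(1 - p) / E^2, rounded up; p = 0.5 gives the largest (most conservative) n."""
    return math.ceil(z ** 2 * p * (1 - p) / error ** 2)


# Hypothetical: 95% confidence, estimated s = 15, want the mean within +/- 2 points
print(sample_size_mean(1.96, 15, 2))             # 217
# Hypothetical: 95% confidence, unknown p, want the proportion within +/- 3 points
print(sample_size_proportion(1.96, 0.5, 0.03))   # 1068
```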

Differentiate simple regression and multiple regression.

1. •Simple Regression •Independent Variable = Single continuous measure (interval or ratio) •Dependent Variable = Single continuous measure (interval or ratio) •Measure of Strength of Association (R^2 = Coefficient of Determination) •Example: Examine the relationship between the amount of time spent browsing on Amazon per session (ratio independent) and the amount of money spent per browsing session (ratio dependent variable) 2. Multiple Regression •Independent Variable = Two or more continuous measures (interval or ratio) •Dependent Variable = Single continuous measure (interval or ratio) •Measure of Association (R^2 = Coefficient of Multiple Determination) •Example: Examine the relationship between the amount of time spent browsing on Amazon (ratio) per session, number of visits to the site per week (ratio), and the amount of money spent per browsing session (dependent variable)

Differentiate within-subjects and between-subjects ANOVAs.

1: Between-Subjects ANOVA: •Independent Variable = Nominal or ordinal variable with two or more levels (typically groups) •Dependent variable = Single continuous measure (interval or ratio) •Example: Compare the 1-7 healthiness ratings (dependent) for three brand names: The Clean Truth, Happy Karma, Yum (independent coded as 1, 2, & 3) 2. Within-Subjects (a.k.a. Repeated Measures) ANOVA: •Examine the difference between two or more continuous measures (interval or ratio) that are measured according to the same number of scale points (e.g., 1-10) •Test the heart rate of marathon runners pre-race, mid-race, and post-race •Example: For a food study, we could compare the meaning rating between the organic, natural, and healthy ratings to test whether they are significantly different
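A minimal Python sketch of the between-subjects (one-way) F statistic, F = MS between / MS within, using hypothetical 1-7 healthiness ratings for the three brand names. A within-subjects ANOVA would additionally remove the variation due to respondents, which this sketch does not do.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Between-subjects one-way ANOVA F statistic: MS_between / MS_within."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


clean_truth = [6, 7, 6, 5]   # hypothetical ratings
happy_karma = [5, 5, 4, 6]
yum = [3, 4, 3, 2]
print(one_way_anova_f(clean_truth, happy_karma, yum))  # about 14.0, with df = (2, 9)
```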

regression analysis

Regression: a statistical technique used to derive an equation representing the influence of a single (simple regression) or multiple (multiple regression) independent variables on a continuous dependent, or outcome, variable

4: What is cluster analysis? How is it useful for marketing?

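Cluster analysis is a set of data reduction techniques designed to group similar observations in a dataset, such that observations in the same group are as similar to each other as possible and observations in different groups are as different from each other as possible. In marketing, it is commonly used for market segmentation: grouping customers with similar needs, attitudes, or behaviors so that each segment can be targeted separately.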
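k-means is one widely used clustering algorithm; the plain-Python sketch below groups hypothetical customers by annual store visits and average spend. The data and the choice of k = 2 are assumptions for illustration; in practice the variables are usually standardized first and several values of k are compared.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means: assign points to the nearest centroid, then move centroids to cluster means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty
        new = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters


# Hypothetical customers: (visits per year, average spend in dollars)
customers = [(1, 10), (2, 12), (1, 11), (2, 9), (20, 100), (22, 98), (21, 102), (19, 101)]
centroids, clusters = kmeans(customers, 2)
for c, members in zip(centroids, clusters):
    print(c, members)
```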

5: What is conjoint analysis?

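Conjoint analysis is a market research approach for measuring the value that consumers place on the features of a product or service. Respondents evaluate or choose among product profiles that combine different levels of several attributes, and the analyst estimates how much each attribute level contributes to overall preference.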
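A minimal sketch of estimating part-worths in the simplest case, assuming a ratings-based, balanced full-factorial design. There, each level's main-effect part-worth equals its mean rating minus the overall mean (the same estimates an effects-coded regression gives). Choice-based conjoint, common in practice, uses different estimation methods. The attributes, levels, and ratings are hypothetical.

```python
from itertools import product
from statistics import mean


def part_worths(profiles, ratings):
    """Part-worth of each attribute level: mean rating of profiles with that level minus the grand mean."""
    grand = mean(ratings)
    worths = {}
    for attr in profiles[0]:
        levels = sorted({p[attr] for p in profiles})
        worths[attr] = {
            lvl: mean(r for p, r in zip(profiles, ratings) if p[attr] == lvl) - grand
            for lvl in levels
        }
    return worths


def importance(worths):
    """Relative importance (%) of each attribute: its part-worth range over the sum of ranges."""
    ranges = {a: max(w.values()) - min(w.values()) for a, w in worths.items()}
    total = sum(ranges.values())
    return {a: 100 * r / total for a, r in ranges.items()}


# Hypothetical 2 x 2 full factorial: price x size, each profile rated 1-10 by one respondent
profiles = [{"price": p, "size": s} for p, s in product(["$3", "$4"], ["small", "large"])]
ratings = [7, 9, 2, 4]  # ($3, small), ($3, large), ($4, small), ($4, large)
w = part_worths(profiles, ratings)
print(w)
print(importance(w))
```

For these ratings price carries most of the preference: its part-worths span 5 points versus 2 for size.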

advanced topic 1: what is effect size?

Effect size is a quantitative measure of the magnitude of the experimental effect. The larger the effect size, the stronger the relationship between two variables.
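Cohen's d is one common effect-size measure for a difference between two group means (r and Cramer's V play the same role for correlations and cross-tabs). The sketch below uses hypothetical data. Unlike a p-value, d does not grow simply because the sample gets larger. Cohen's conventional benchmarks are roughly 0.2 small, 0.5 medium, and 0.8 large.

```python
import math
from statistics import mean, stdev


def cohens_d(g1, g2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(g1), len(g2)
    pooled = math.sqrt(((n1 - 1) * stdev(g1) ** 2 + (n2 - 1) * stdev(g2) ** 2) / (n1 + n2 - 2))
    return (mean(g1) - mean(g2)) / pooled


ad_group = [7, 8, 6, 9, 7]       # hypothetical purchase-intent ratings after seeing an ad
control_group = [6, 7, 5, 7, 6]
print(round(cohens_d(ad_group, control_group), 2))
```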

