MAR5626 Test 2
Varimax rotation
A Varimax rotation maintains a 90-degree angle between all factors, which means the factors are unrelated to each other (r = 0). This rotation in exploratory factor analysis maximizes the variance explained by each factor.
Significance Level
A critical probability associated with a statistical hypothesis test that indicates how likely an inference supporting a difference between an observed value and some statistical expectation is true. It is the acceptable level of Type I error (e.g., α = .05).
Contingency Table
A data matrix that displays the frequency of some combination of responses to multiple variables.
factor analysis
A factor loading is the correlation between a variable and any factor. Factor analysis seeks parsimony. Factors are usually latent constructs like personality, intelligence, attitude, or perceived value.
Goodness-of-Fit (GOF)
A general term representing how well some computed table or matrix of values matches some population or predetermined table or matrix of the same size.
Histogram
A graphical way of showing a frequency distribution in which the height of a bar corresponds to the observed frequency of the category.
t-test
A hypothesis test that uses the t-distribution. A univariate t-test is appropriate when the variable being analyzed is interval or ratio.
Factor Rotation
A mathematical way of further simplifying factor results that involves new reference axes for a set of variables. The most common type of factor rotation is a process called Varimax.
Statistical Power
A measure of how much ability exists to find a significant effect using a specific statistical tool. Increases as sample size increases. Mathematically, it is a direct function of the Type II error rate: Power = 1 - β. Statistical power refers to our ability to identify a significant deviation from what the null hypothesis states when there actually is one.
Basics of Mediation
A mediator variable fits between an independent variable and a dependent variable and serves to facilitate the relationship between them. Marketing researchers often think of price effects as involving mediation in some way. For instance, price promotions can increase sales. However, price promotions may increase sales only when the promotion creates a perception of increased value. Thus, perceived value mediates the effects of price promotions on sales.
What is cluster analysis
A multivariate approach for identifying objects or individuals that are similar to one another in some respect. Classifies individuals or objects into a small number of mutually exclusive and exhaustive groups. Objects or individuals are assigned to groups so that there is great similarity within groups and much less similarity between groups. Each cluster should have high internal (within-cluster) homogeneity and high external (between-cluster) heterogeneity.
uses of factor analysis
Data reduction, exploratory factor analysis, confirmatory factor analysis
In SPSS, Factor Analysis menu can be found in
Dimension Reduction
Median Split
Dividing a data set into two categories by placing respondents below the median in one category and respondents above the median in another. The approach is best applied only when the data do indeed exhibit bimodal characteristics.
Exploratory factor analysis (EFA)
EFA places emphasis on the creation of composite factor scores, which are numbers that represent an individual respondent's score on each individual latent factor; they are determined as a function of the variables in a factor analysis
Hypotheses about differences between groups
Examine how some variable varies from one group to another.
A researcher who is uncertain about the number of factors and which variables belong to each factor might pursue
Exploratory Factor Analysis
two interdependence techniques
Factor Analysis, Cluster Analysis
Which measure indicates how strongly a measured variable is correlated with a factor?
Factor Loading
Factor Loadings
Factor analysis techniques produce loading estimates much as regression produces regression coefficient estimates. A factor loading indicates how strongly correlated a factor is with a measured variable. EFA depends on the loadings for proper interpretation.
A Type I error occurs when the researcher fails to reject the null hypothesis when the alternative hypothesis is true.
False
Direct Effect example
From slides (memorize)
Box and Whisker Plots
Graphic representations of central tendencies, percentiles, variabilities, and the shapes of frequency distributions.
higher p value means:
A higher p-value means weaker evidence against the null hypothesis; when p exceeds α, the null hypothesis is not rejected.
Univariate Statistical Analysis
Tests of hypotheses involving only one variable. Testing of statistical significance
Bivariate Statistical Analysis
Tests of hypotheses involving two variables.
Which condition is created when the direct relationship between X and Y remains significant but is reduced in the presence of a mediator?
partial mediation
One-tailed Test
Appropriate when a research hypothesis implies that an observed mean can only be greater than or less than a hypothesized value. Only one of the "tails" of the bell-shaped normal curve is relevant. A one-tailed test can be determined from a two-tailed test result by taking half of the observed p-value. When there is any doubt about whether a one- or two-tailed test is appropriate, opt for the more conservative two-tailed test.
Nonparametric Statistics
Appropriate when the variables being analyzed do not conform to any known or continuous distribution.
Hypotheses about differences from some standard
Examine how some variable differs from some preconceived standard. Typify univariate statistical tests.
Are alternative approaches available to assess the statistical significance of a mediated effect?
One alternative for dealing with the potential for correlated residuals involves more complicated linear modeling approaches. Another is bootstrapping, which involves taking the available data and using sampling with replacement, generating many samples (typically 500, 1,000, or 2,000) and estimating the parameters in each of those samples.
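The bootstrapping idea can be sketched in a few lines. This is only an illustration under stated assumptions: the data are simulated, the function names (`cov`, `indirect_effect`, `bootstrap_ci`) are hypothetical, and the mediated effect is taken as the product a × b, where a is the slope of the mediator on X and b is the slope of Y on the mediator controlling for X.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length lists (cov(u, u) is the variance)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def indirect_effect(x, m, y):
    """Mediated effect a*b: a from M ~ X, b from Y ~ X + M."""
    a = cov(x, m) / cov(x, x)
    denom = cov(x, x) * cov(m, m) - cov(x, m) ** 2
    b = (cov(m, y) * cov(x, x) - cov(x, m) * cov(x, y)) / denom
    return a * b

def bootstrap_ci(x, m, y, n_samples=1000, seed=0):
    """Percentile interval (about 95%) from resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap sample
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    return estimates[int(0.025 * n_samples)], estimates[int(0.975 * n_samples) - 1]

# Simulated (hypothetical) data: X -> M -> Y with a small direct effect of X on Y.
data_rng = random.Random(42)
x = [data_rng.gauss(0, 1) for _ in range(100)]
m = [0.8 * xi + data_rng.gauss(0, 1) for xi in x]
y = [0.7 * mi + 0.1 * xi + data_rng.gauss(0, 1) for xi, mi in zip(x, m)]

low, high = bootstrap_ci(x, m, y)
print(low, high)
```

If the resulting interval excludes zero, the mediated effect is usually treated as statistically significant.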
If your sample proportion is 75% out of 150 samples, what is your calculated standard error of proportion (i.e. Sp)?
.035
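The answer above can be checked with the usual formula Sp = sqrt(p(1 - p) / n). A minimal sketch, with a hypothetical function name:

```python
import math

def standard_error_of_proportion(p, n):
    """Sp = sqrt(p * (1 - p) / n) for sample proportion p and sample size n."""
    return math.sqrt(p * (1 - p) / n)

sp = standard_error_of_proportion(0.75, 150)
print(round(sp, 3))  # 0.035
```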
Frequency Table
A table showing the different ways respondents answered a question.
Outlier
A value that "lies outside" (is much smaller or larger than) most of the other values in a set of data.
Cross-Tabulation
Addresses research questions involving relationships among multiple less-than interval variables. Results in a combined frequency table displaying one variable in rows and another variable in columns.
Which of the following information is sufficient for you to run a hypothesis testing?
Alpha and p-value
Type II Error
An error caused by failing to reject the null hypothesis when the alternative hypothesis is true. Has a probability of beta (β). Practically, a Type II error occurs when a researcher concludes that no relationship or difference exists when in fact one does exist.
Type I Error
An error caused by rejecting the null hypothesis when it is true. Has a probability of alpha (α). Practically, a Type I error occurs when the researcher concludes that a relationship or difference exists in the population when in reality it does not exist.
Confirmatory factor analysis (CFA)
CFA is the best single tool for assessing construct validity
_____________________ is a measure of the percentage of a variable's variation that is explained by the factors.
Communality
Communality
Communality is a measure of the percentage of a variable's variation that is explained by the factors. A relatively high communality indicates that a variable has much in common with the other variables taken as a group. Communality for any variable is equal to the sum of the squared loadings for that variable across all factors extracted. These values are shown on factor analysis printouts.
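As a quick illustration of the sum-of-squared-loadings rule, here is a small sketch. The variable names and loading values are made up for the example and do not come from any real printout.

```python
# Hypothetical loadings: each variable has one loading per extracted factor.
loadings = {
    "price_fair": [0.80, 0.10],
    "good_value": [0.70, 0.20],
    "friendly_staff": [0.10, 0.90],
}

def communality(row):
    """Sum of squared loadings for one variable across all factors."""
    return sum(loading ** 2 for loading in row)

communalities = {name: round(communality(row), 2) for name, row in loadings.items()}
print(communalities)
```

Here `friendly_staff` has the highest communality (0.82), so the two factors explain most of its variation.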
Average Variance Explained (Extracted)
If each loading is squared, the result represents how much variance that particular factor and a particular variable have in common. If these squared factor loadings are averaged, we can estimate how much variance a factor has in common with the entire set of variables.
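The averaging step can be sketched as follows. The three loadings are hypothetical values for the variables that load on a single factor.

```python
def average_variance_extracted(factor_loadings):
    """Mean of the squared loadings of the variables on one factor."""
    squared = [loading ** 2 for loading in factor_loadings]
    return sum(squared) / len(squared)

# Hypothetical loadings of three variables on the same factor.
ave = average_variance_extracted([0.8, 0.7, 0.6])
print(round(ave, 3))  # 0.497
```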
Interdependence Techniques
Interdependence techniques make no distinction between independent and dependent variables and seek to identify the underlying structure of a set of data
A z-test or t-test requires a ____________ variable.
Interval or ratio
parametric statistics
Involve numbers with known, continuous distributions. Appropriate when data are interval or ratio scaled and the sample size is large.
What are the implications of the potential bias for the regression results?
Like with multicollinearity and heteroscedasticity, the t-tests of the parameter coefficients become unreliable as the bias becomes substantial
If your X-bar is ___________ the confidence interval you calculated, you __________ the null hypothesis.
Outside, reject
Statistical analyses always have a:
P value
Level of Scale Measurement
Parametric nonparametric
Which statistical technique identifies components based on all three sources of variance?
Principal components analysis
p-value
Probability value, or the observed or computed significance level. p-values are compared to significance levels to test hypotheses.
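The comparison of a p-value with the significance level can be written as a one-line decision rule. A minimal sketch, with a hypothetical function name and α = .05 assumed as the default:

```python
def decide(p_value, alpha=0.05):
    """Compare the observed p-value with the chosen significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.03))  # reject H0
print(decide(0.20))  # fail to reject H0
```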
Price Index
Represent simple data transformations that allow researchers to track a variable's value over time and compare a variable(s) with other variables. Recalibration allows scores or observations to be related to a certain base period or base number.
Two-tailed Test
Tests for differences from the population mean that are either greater or less. Extreme values of the normal curve (or tails) on both the right and the left are considered. When a research question does not specify whether a difference should be greater than or less than, a two-tailed test is most appropriate.
Chi-square (χ2) test
Tests for statistical significance. Is particularly appropriate for testing hypotheses about frequencies arranged in a frequency or contingency table.
Marginals
Row and column totals in a contingency table, which are shown in its margins.
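To show how the marginals feed the chi-square test, here is a sketch that computes expected counts from the row and column totals and then sums (observed - expected)² / expected over all cells. The 2 x 2 table is invented for illustration.

```python
def chi_square(observed):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]        # row marginals
    col_totals = [sum(col) for col in zip(*observed)]  # column marginals
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Hypothetical counts: rows are two groups, columns are two responses.
observed = [[30, 20],
            [20, 30]]
statistic, df = chi_square(observed)
print(statistic, df)  # 4.0 1
```

With 1 degree of freedom the critical value at α = .05 is 3.841, so a statistic of 4.0 would lead to rejecting the null hypothesis of no relationship.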
Index Numbers
Scores or observations recalibrated to indicate how they relate to a base number.
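A small sketch of the recalibration, assuming the common convention of setting the base period to 100. The prices are hypothetical.

```python
def index_numbers(values, base):
    """Express each value relative to a base number, with the base set to 100."""
    return [round(value / base * 100, 1) for value in values]

# Hypothetical prices over three periods; the first period is the base.
prices = [2.00, 2.10, 2.50]
print(index_numbers(prices, base=prices[0]))  # [100.0, 105.0, 125.0]
```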
Conducting a Factor Analysis
Select variables for the analysis. Identify an initial solution. Decide on the number of factors to be included in the analysis. Run the analysis again and include rotation. Interpret the factors from the rotated solution.
Characteristics of a good hypothesis
Shorter; specific; meaningful; questionable, not certain; state something, not thing; should invite a comparison with data
alternative hypothesis
Statement that indicates the opposite of the null hypothesis
Multivariate Statistical Analysis
Statistical analysis involving three or more variables or sets of variables.
Percentage Cross-Tabulations
Statistical base - the number of respondents or observations (in a row or column) used as a basis for computing percentages.
Descriptive Analysis
The elementary transformation of raw data in a way that describes the basic characteristics such as central tendency, distribution, and variability.
Tabulation
The orderly arrangement of data in a table or other summary format showing the number of responses to each response category. Tallying is the term used when the process is done by hand.
Rank Order
Ranking data can be summarized by performing a data transformation. The transformation involves multiplying the frequency by the ranking score for each choice, resulting in a new scale.
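A sketch of the transformation, assuming rank 1 carries a score of 1, rank 2 a score of 2, and so on, so that a lower total means a more preferred choice. The brands and counts are hypothetical.

```python
# Hypothetical counts: how many of 20 respondents ranked each brand 1st, 2nd, 3rd.
rank_counts = {
    "Brand A": [10, 5, 5],
    "Brand B": [6, 10, 4],
    "Brand C": [4, 5, 11],
}

def rank_score(counts):
    """Sum of frequency x ranking score, with rank 1 scored as 1."""
    return sum(freq * rank for rank, freq in enumerate(counts, start=1))

scores = {brand: rank_score(counts) for brand, counts in rank_counts.items()}
print(scores)  # {'Brand A': 35, 'Brand B': 38, 'Brand C': 47}
```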
null hypothesis and alternative hypothesis are mutually exclusive? T/F
True
relational hypotheses have a linear relationship T/F
True
Eigenvalues
A measure of how much variance is explained by each factor. The most common rule is to retain the factors with eigenvalues greater than 1.0.
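The eigenvalue rule can be sketched as below. The eigenvalues are hypothetical, and the sketch assumes standardized variables, so each variable contributes one unit of variance and an eigenvalue divided by the number of variables gives that factor's share of total variance.

```python
# Hypothetical eigenvalues from an analysis of five standardized variables.
eigenvalues = [2.8, 1.4, 0.5, 0.2, 0.1]

# Keep only the factors whose eigenvalue exceeds 1.0.
retained = [value for value in eigenvalues if value > 1.0]

# Share of total variance explained by each retained factor.
shares = [round(value / len(eigenvalues), 2) for value in retained]

print(len(retained))  # 2
print(shares)         # [0.56, 0.28]
```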
Interquartile Range
a measure of variability
Hypothesis
a specific prediction that can be tested, about 2 or more variables
A ________ error occurs when a researcher concludes that no relationship or difference exists when in fact one does exist.
Type II error
How to choose the appropriate statistical technique
Type of question to be answered Number of variables involved Level of scale measurement
Hypothesis
Unproven proposition: a supposition that tentatively explains certain facts or phenomena. An assumption about nature of the world.
The most common type of factor rotation method is
Varimax
Moderator variable
a third variable that changes the nature of a relationship between the original independent and dependent variables.
K-means
A widely used tool for assigning observations to groups. K-means requires that the user input the number of groups as part of the analysis. Then, K-means iterates by shifting observations between clusters until some minimal distance criterion is reached (usually determined by software).
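The iteration can be illustrated with a minimal one-dimensional sketch. It is not how SPSS implements K-means; the function name is hypothetical, the user supplies the starting centers (one per group), and the loop stops when the centers no longer move.

```python
def k_means_1d(values, centers, max_iter=100):
    """Assign each value to its nearest center, then recompute centers as means."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # An empty cluster keeps its previous center.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # no observation changed groups
            break
        centers = new_centers
    return centers, clusters

# Hypothetical scores with two obvious groups; starting centers chosen by the user.
centers, clusters = k_means_1d([1, 2, 3, 10, 11, 12], centers=[1, 12])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
```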
categorical variables
Nominal or ordinal. Use a frequency table such as a cross-tabulation.
Elaboration analysis
an analysis of the basic cross-tabulation for each level of a variable not previously considered, such as subgroups of the sample
direct effects hypothesis
Assesses the relationship between an independent variable and a dependent variable where the hypothesis suggests a relationship that depends on no other variable. It is the most basic type of hypothesis in descriptive or causal research designs.
Which of the following is the appropriate technique for addressing research questions involving relationships among multiple variables that are measured with a less-than interval scale
chi square or cross tab
The arrangement of respondents into groups calls for
cluster analysis
The transformation of raw data into a form that makes the data easier to understand and to interpret is called
descriptive analysis
Factor analysis identifies the number of factors using
eigenvalues
Types of Hypotheses
null hypothesis and alternative hypothesis
P < alpha
reject Ho
relational hypothesis
examine how changes in one variable vary with changes in another
The analysis of variables into dimensions calls for
factor analysis
An arrangement of data that shows the number of times each category occurs for a single categorical variable is called a(n)
frequency
If you want to identify the membership information of the observations in your sample for multiple cluster solutions (e.g., three clusters, four clusters, and five clusters), you are likely to use ______________ cluster analysis.
hierarchical
In cluster analysis, the researcher wants clusters to have high within-cluster ____________ and high between-cluster _________________.
homogeneity, heterogeneity
Scores or observations recalibrated to indicate how they relate to a base number are best referred to as
index numbers
continuous variables
Interval or ratio. Use the mean and standard deviation. Continuous is also known as scale in SPSS.
P > alpha
fail to reject Ho
null hypothesis
Statement about the status quo: no difference between sample and population.
A well-stated hypothesis is NOT ____.
Options: succinct, tautological, meaningful, questionable. Answer: tautological
Degrees of freedom
The number of observations minus the number of constraints or assumptions needed to calculate a statistical term. For a t-test, d.f. = n - 1.
As data reduction techniques, factor analysis seeks parsimony in the _______________ while cluster analysis does in the _______________.
variables, observations