PSYC307 Test 2

What are the assumptions of Regression?

(1) Predictor variables must be at interval level (at least) or categorical (with two categories, i.e., male/ female, yes/no). (2) Outcome/ dependent variable must be at least interval level of measurement, continuous, and unbounded. (3) Predictor variables should have some variation in value (i.e., they do not have variances of 0). (4) There should be no multicollinearity. (5) Shouldn't be any external variables that influence the outcome variable. (6) Homoscedasticity. (7) Normally-distributed errors. (8) Linear relationships between variables. (9) Independent errors. (10) Need to have a reasonable sample size.

What is attitude?

"A general and enduring positive or negative feeling about some person, object or issue" (Worchel, Cooper, and Goethals, 1991).

What curve does the Absolute Threshold generate?

"S" shaped curve. The observer sometimes detects the stimulus at lower or higher intensities than at other times. The ogive (s-shaped) function is like a cumulative normal distribution - very low intensities are rarely detected and the probability of detection increases with increases in stimulus intensity. The absolute threshold is defined as the 50th percentile - point at which stimulus is detected half of the time.

What does the term 'unbounded' mean?

"Unconstrained" in terms of variability. If the measure could vary between 1 and 10, but the data you collected only vary between 3 and 7, they are constrained.

What are the three classical methods of measuring absolute threshold?

(1) Method of Limits; (2) Method of adjustment, and (3) Method of constant stimuli.

What is Multidimensional scaling?

Another alternative to Factor Analysis and another method of exploring structure. Representing data spatially by plotting variables in n-dimensional space. Distance between the points represents the similarity of the variables (highly correlated variables will be positioned close together). Looking for patterns in the associations (e.g., do the variables occupy particular areas of the space? or are they arranged in a line? etc). Very few assumptions. But interpretation pretty arbitrary.

What are alternatives to Factor Analysis?

Another approach to locating underlying dimensions of a data set = principal component analysis. This analysis is more similar to discriminant analysis. Another alternative is cluster analysis. Basic premise - variables can be grouped into discrete clusters. We do not expect the clusters to range from high to low, we just understand them as descriptive categories. Can use with data that don't conform to parametric test assumptions (e.g., ordinal data).

What is Temporal Stability?

Another aspect of reliability is how stable the scores remain over time (from one occasion to another). Called test-retest reliability: the questionnaire is given to a group of participants on two separate occasions and the scores from the two occasions are correlated. If test-retest scores are different, this may be due to changes in the construct, extraneous factors to do with the participants, the measurement method, or non-reliability of the questionnaire.

What are the advantages of Questionnaires?

Apparent simplicity, versatility, and low cost way to collect data. Can sample more people for the same budget than with interviewing. Relatively well understood and numerous guides on how to design good questionnaires.

What are the functions of attitudes?

Basic idea is that attitudes mediate between a person's inner needs and the outside world. Value-Expressive function - enable us to express who we are and what we believe in. Ego-defensive function - enable us to protect our self-esteem or justify actions we feel guilty about. Knowledge function - enable us to know the world and make predictions. Utilitarian / adaptive function - enable us to gain rewards and avoid punishment.

What are arguments by Norman (2010) regarding performing parametric statistics on Likert scale data?

Tests that use central tendency (e.g., t-tests, ANOVA) are extremely robust to violations of assumptions (with sample sizes above 5). Even tests that use variation instead of central tendency (like correlation and regression) are robust to violations of assumptions. This means that these tests can usually produce the "right" answer even if the data you are using violate some of the assumptions of the tests. So we can justify using parametric statistics on Likert scales.

What is a construct?

A trait not directly observable, sometimes called latent variable, e.g., "aggression", "disability", "burnout".

What is the Staircase Method?

An extension of the classical methods to measure absolute threshold. Derived from the Methods of Limits. If, during a descending series of stimulus intensities, the observer says "Yes" (I detect it), the stimulus intensity is decreased on the next trial. If observer says "No", the intensity is increased.
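
The up/down rule above can be sketched as a short simulation. This is only an illustration: the function name, the fixed step size, and the idealised observer (who always says "Yes" at or above a fixed threshold) are assumptions, not part of the notes.

```python
def staircase(start, step, true_threshold, n_trials):
    """Simulate a simple staircase run with an idealised observer."""
    intensity = start
    history = []
    for _ in range(n_trials):
        # Assumption: the observer detects the stimulus whenever it is
        # at or above their (fixed) threshold.
        detected = intensity >= true_threshold
        history.append((intensity, detected))
        # "Yes" -> decrease intensity on the next trial; "No" -> increase it.
        intensity = intensity - step if detected else intensity + step
    return history

run = staircase(start=10, step=1, true_threshold=5.5, n_trials=12)
for intensity, detected in run:
    print(intensity, "Yes" if detected else "No")
```

After the initial descent the presented intensities alternate between 6 and 5, which is why most trials in a staircase fall close to the threshold.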

How do you change sensitivity?

An observer's sensitivity to the signal can be represented by the distance between the two distributions. If the two are close together, the observer will be less accurate at detecting the signal. If they are far apart, the observer will be more accurate (their threshold will be lower) because there is little overlap in the functions. This distance is called d' (d-prime).
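
As a rough illustration, d' can be computed from a hit rate and a false alarm rate. The sketch below uses the standard equal-variance Gaussian formula, d' = z(hit rate) - z(false alarm rate), which is an assumption added here rather than something stated in the notes; the function name and the example rates are hypothetical.

```python
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Distance between the noise and signal+noise distributions, in z units.

    Assumes both distributions are normal with equal variance.
    Rates of exactly 0 or 1 are not handled (the z-score is undefined).
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

sensitive = d_prime(hit_rate=0.84, fa_rate=0.16)    # distributions far apart
insensitive = d_prime(hit_rate=0.60, fa_rate=0.40)  # distributions overlap heavily
print(round(sensitive, 2), round(insensitive, 2))
```

A larger d' corresponds to less overlap between the two distributions, and an observer who cannot tell signal from noise (hit rate equal to false alarm rate) has d' = 0.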

What is Thurstone Scale (1920s)?

First formal technique used to measure attitudes. (1) First collect (from a variety of people) a large number of statements expressing wide range of attitudes towards a given topic. Aiming for between 80 and 100 statements. (2) Remove duplicate and irrelevant statements. The remainder are printed on cards. (3) Judges are asked to sort the cards into 11 piles which range from extremely unfavourable to extremely favourable. The stack in the middle is for neutral statements. (4) A final rating is applied to each statement. This rating value is the average rating given by all judges.

Why is Standard or Stepwise not recommended?

Because the computer decides which order to enter the predictors. The choice it makes in each case is based on what is already in the model - a predictor might be discarded as a "poor fit" because of what has already been added - so important predictors might be left out. Alternatively, the stepwise method might result in too many predictors being added that don't make much contribution to predicting the outcome. You might use this method if you have no preconceptions about which predictors are the most important. If you do use it, you should use the backward method rather than the forward method.

When would you use a Mixed-design ANOVA?

For designs where some independent variables are between-subjects and others are within-subjects. E.g., all the participants do the test under all sleep conditions (sleep: within-subjects x gender: between-subjects).

What are the assumptions of ANOVA?

Dependent variable is measured on at least an interval scale; dependent variable scores should be independent of each other (the score of one person should not affect the scores of other people); the sampling distributions of the dependent variable should be normally distributed (if your sample size is greater than 30, it probably is...); the variance of the scores on the outcome variable should be equal at all levels of the independent variable (homogeneity of variance).

What is the assumption of reasonable sample size?

Depends on the size of the expected effect and the number of predictors you have. If you think the effect will be large, or you don't have many predictors, then a smaller sample size is okay (around 77 with a large effect size and up to 20 predictors). If you think the effect size will be small, you need hundreds in your sample.

What things do you need to control for in the Method of Limits?

Error of habituation: observer becomes accustomed to making the same response and goes beyond threshold. Error of anticipation: observer is aware that the intensity is about to change and anticipates that change. These effects are cancelled out by using ascending and descending series.

How is content validity established?

Established by a panel of experts in that area who assess the content of the questionnaire/ instrument. They check the readability, clarity, and comprehensiveness.

What is validity?

Estimate of how well a test/ questionnaire measures what it's supposed to measure (does it measure the construct it is supposed to measure, for the population for whom it is intended, and for the purpose (application) for which it is intended?).

What is Magnitude Estimation?

Extension of difference threshold. Trying to ascertain the magnitude of the perceptual difference between two stimuli (does this light look twice as bright as this light?; does this tone sound twice as loud as this tone?). Observers asked to rate the intensity of a comparison stimulus compared to a standard stimulus intensity.

Item means and variance.

For each item, it is good to have a mean score close to the centre of the range of possible scores (items with a very high or very low mean may be skewed, and will generally have a low variance). Items that have no variance are not very useful (if all people answer the item identically, the item will not help to discriminate among individuals with different levels of the construct). So, want each item to produce a range of different scores.

What is Oblique rotation?

Factors are allowed to correlate, so axes are not held at right angles.

What is the R-value and R2 value in the model summary?

In a regression, the model summary table produced by SPSS will have Model - R - R square - Adjusted R Square - Std. Error of the Estimate. The R value is the correlation between all the predictors and outcome variable. The R2(squared) value tells us the amount of variation in the outcome variable that can be explained by the predictor variable(s). For example, the R2 value here is .083, so we could conclude that the predictor variable accounts for 8.3% of the variation in the outcome variable.
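
To make the "proportion of variation explained" reading concrete, here is a minimal sketch that computes R squared for a one-predictor regression as 1 - (residual sum of squares / total sum of squares). The function name and the small data set (hours studied, test score) are made up for illustration; this is not how SPSS is implemented.

```python
from statistics import mean

def r_squared(x, y):
    """R squared for a simple (one-predictor) least-squares regression."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    # Variation left over after fitting the line (residuals).
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Total variation of the outcome around its mean.
    ss_total = sum((b - my) ** 2 for b in y)
    return 1 - ss_residual / ss_total

hours = [1, 2, 3, 4, 5]   # hypothetical predictor
scores = [2, 4, 5, 4, 5]  # hypothetical outcome
r2 = r_squared(hours, scores)
print(f"R squared = {r2:.2f}, i.e. {r2:.0%} of the variation explained")
```

With one predictor, R is simply the absolute value of the Pearson correlation between predictor and outcome, so R squared is that correlation squared.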

What is the relationship between error in questions and the "true" score?

In questionnaires, "error" (the part of the score that can be explained by error) should not be closely associated with the "true" score (error will sometimes increase the score, sometimes decrease it in a random way). But, there should be a strong relationship between the "true" part of the score and the observed score (the score we measure). How can we measure the relationship between observed score and true score?

How can the problem of Weighing the Odds be solved?

In this situation we could use the Bonferroni correction. Divide the significance level (alpha) by the number of comparisons, e.g., if we do 5 tests, we divide 0.05 by 5 = 0.01. This means the p-value for each individual test now needs to be less than .01 to be significant. One problem with this method is that we have now increased the chance of a Type II error (accepting the null hypothesis when there is in fact a real effect) because we have made the criterion for finding a significant effect stricter. So our tests will have less power to find effects where effects really exist.
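
The correction can be sketched in a few lines. The p-values below are invented purely for illustration, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests

p_values = [0.004, 0.030, 0.012, 0.200, 0.049]  # made-up results of 5 tests
threshold = bonferroni_threshold(0.05, len(p_values))
significant = [p < threshold for p in p_values]

print(threshold)
print(significant)
```

Four of these five p-values are below .05, but only one survives the corrected threshold of .01, which shows the loss of power described above.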

How do SDT experiments differ from classical psychophysical procedures?

Instead of presenting differing intensities of a stimulus, as we do in classical psychophysical procedures, in SDT experiments we present a single, low-intensity stimulus (usually called the signal) and include trials where no stimulus is presented. These two types of trials (signal-present and signal-absent) are presented randomly.

How do you measure attitudes?

Measured by a series of statements to which respondents agree / disagree. Scale should be: Unidimensional (measure only one trait at a time), Reliable (responses from the same person about the same thing should be the same), Valid (should measure what it claims to measure), Linear (people who score twice as highly should have twice as strong an attitude), Reproducible (respondents should make as few errors as possible).

What are Univariate Data?

Measures/ scores of just one variable, e.g., test scores of PSYC307 students. Single variables can be analysed using (for example): Mean, Range, Standard deviation, or plotting a histogram.

What is Cronbach's alpha?

One problem with split-half reliability is that there are many ways that the items could be split (half/half; odd/even). So, the result could be biased because of the way the data were split. Instead, you could use Cronbach's alpha (sometimes called coefficient alpha, or α), the most common measure of scale reliability. It is roughly the same as splitting the data in two in every possible way and calculating the average of all possible split-half estimates.
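
For reference, alpha is usually computed with the formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. That formula is standard but is not given in the notes, so treat the sketch below as an assumption-laden illustration; the function name and the response data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores.

    `responses` is a list of rows, one row per respondent,
    each row holding that person's score on every item.
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # one tuple of scores per item
    sum_item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_vars / total_var)

# Five hypothetical respondents answering three 5-point items.
pilot = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(pilot)
print(round(alpha, 2))
```

When every item ranks the respondents identically, the item variances account for as little of the total variance as possible and alpha reaches its maximum of 1.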

What are alternate forms?

One way to assess reliability is to compare your questionnaire to another questionnaire that measures the same trait/ construct. This requires that the alternative questionnaire also be administered to your pilot sample. If the two tests produce correlated scores, they are likely measuring the same underlying trait/ construct - they are consistent. Problems - usually no suitable questionnaire exists; creating two questionnaires is even harder than creating one!

What are the advantages of Likert Scales?

Subtle items can be used for sensitive issues. Unidimensionality can be checked (the degree to which the items are measuring the same underlying trait or dimension). Sub-scales can measure a multidimensional attitude. Simple to construct. Easy to use.

What does Psychophysics provide?

Techniques for determining thresholds of detection.

What are problems with the Guttman Scales?

Tend to be most successful in areas that have an orderly progression. Attitudes tend not to be unidimensional (often don't know the dimensions until we try to assess them). Conditions of a Guttman scale are generally hard to meet.

Cronbach's alpha if item deleted.

The other column of interest is the Cronbach's alpha if item deleted. If any of these values are HIGHER than alpha then we should remove that item from our questionnaire.

What are the ideas around "Don't know" and "No opinion" responses?

They are used to overcome the non-attitude problem identified by Philip Converse (1964). If there is no "don't know" category, people might select an option that is offered, even if they don't agree with it. To increase the likelihood of a true response you need to give a non-attitude/ don't know option.

What is the structure of attitude?

Three component model views attitudes as having three components (ABC): Affective = feelings about the attitude object, Behavioural = predispositions to act towards the attitude object in a certain way, Cognitive = beliefs about the attitude object. Any given attitude may be based on lesser or greater amounts of any of these components.

What are the three major types of attitude measurement scales?

Thurstone, Guttman, and Likert.

What are other types of regression?

We can also predict categorical data from continuous or categorical predictor variables using logistic regression. There are also ways to deal with multiple dependent measures called canonical variate analysis.

What is the Weber Fraction?

Weber discovered that the difference between the standard and the comparison that is just detectable is a constant ratio (called Weber's Law or the Weber fraction). JND/S = K. Just Noticeable Difference / value of the standard stimulus = Weber's fraction (K). E.g., if standard weight = 100g: 5/100 = 0.05; if standard weight = 200g: 10/200 = 0.05. Thus, Weber's fraction remains constant. If your standard stimulus is large, you will need a larger difference in the comparison before you can detect a difference between them.
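
The arithmetic above can be checked with a small sketch. The function names are hypothetical, and the 400g case is an extrapolation from the same constant K = 0.05 rather than a value from the notes.

```python
def weber_fraction(jnd: float, standard: float) -> float:
    """K = JND / S."""
    return jnd / standard

def predicted_jnd(standard: float, k: float) -> float:
    """Rearranged Weber's Law: JND = K * S."""
    return k * standard

print(weber_fraction(5, 100))    # 100g standard, 5g JND
print(weber_fraction(10, 200))   # 200g standard, 10g JND
print(predicted_jnd(400, 0.05))  # JND expected for a 400g standard if K holds
```

Both worked examples give the same K, and the rearranged form shows why a heavier standard needs a proportionally larger difference before it is noticed.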

When do attitudes predict behaviour?

When behaviour is quite specific. When people tend to be more influenced by their own feelings than the environment. When people have very strong attitudes. When people have personal experience relating to the attitude.

What are Multivariate Data?

When we collect information based on a large number of variables, we call these data multivariate, i.e., multivariate = multiple variables. Measures/ scores of more than two variables (two or more predictors/ independent variables: e.g., test score and IQ scores and number of hours spent studying for male and female PSYC307 students). We can analyse multiple variables using (for example): multiple regression, ANOVA, MANOVA, factor analysis. Makes the study of interaction possible. More efficient data collection. Address more realistic questions.

Revision: Multivariate.

When we collect information based on a large number of variables, we call these data multivariate. If we have multiple variables and want to make a prediction we use multiple regression; if we want to analyse differences between groups we use factorial ANOVA, factorial repeated-measures ANOVA, mixed-design ANOVA, MANOVA, or discriminant analysis.

What does homoscedasticity refer to?

For each predictor, the residuals should have the same variance at every level of that predictor.

What are the uses of questionnaires?

(1) To generate hypotheses - Questionnaires can be useful for asking exploratory questions. (2) To develop or validate a test - You may be developing a test, or exploring whether an existing test is reliable or valid. (3) To estimate population scores - When a range of measures exist (either published or those you've developed), you can estimate population scores on those tests. Compare with norms (responses of other groups who have taken it in the past). (4) To test a model or hypothesis - If measures of key constructs already exist, questionnaires can be useful for testing models/ hypotheses, confirming the factor structure underlying responses to items, examining differences between groups and evaluation of an intervention.

What can Factor Analysis be used for?

(1) Understand the structure of a set of variables, (2) Assess factorial validity - the extent to which items measure the same concept or variable, (3) Construct a questionnaire, or psychological test, to measure an underlying variable, (4) Provide evidence of construct validity, (5) Reduce a data set to a more manageable size while retaining as much information as possible, e.g., in multiple regression analysis, multicollinearity can be a problem. Factor analysis can deal with this by combining variables that are collinear.

What are common wording problems?

(1) Vague/ ambiguous terminology - if items are phrased vaguely, you can't be sure what the responses mean. Referred to as vague quantifiers. Ill-defined items are a similar problem. (2) Technical terminology - e.g., arachnophobia. (3) Hypothetical questions - hypothetical future situations must appear reasonable to participants. (4) Leading questions - suggest you want people to agree with you. (5) Value judgements - item wording should not contain implicit judgements (your own or your sponsor's views). (6) Context Effects - subtle effects on responses that depend on the nature of the rest of the questions. (7) Double-barrelled questions - avoid asking 2 questions in one. (8) Hidden assumptions. (9) Sensitive Issues - be aware that some of your respondents may be shocked, embarrassed, or annoyed by your questions. (10) Social Desirability - people tend to present themselves in a positive light, which leads to potential biases in response patterns.

When you have multivariate data, what is the problem with conducting multiple tests?

(1) Weighing the Odds and (2) Correlation of Measures.

What are decisions in questionnaire design?

(1) What information do you need? (2) Who are your target respondents? (3) How will you reach your respondents (phone, email, text)? (4) Decide question content. (5) Develop questions. (6) Put questions into order and format. (8) Pilot. (9) Develop final form through analyses and validation.

What is the Classical Test Theory?

A person's true score on any measurement scale can never be known. We can only infer a person's true score, which is only possible when that true score is consistent. Observed score = true score + error. The true score for a person is the mean score from the test when it has been taken an infinite number of times, e.g., if we measure a person's height with a wooden ruler ONCE, chances are it will not reflect their TRUE height. But if we measure their height 100 times, and take the mean of those measurements, we will be closer to their TRUE height.

What is the method of least squares?

A statistical method used to find the line that minimises the vertical deviations (called residuals) between the line and the data points.
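
A minimal sketch of the idea for a single predictor: the fitted line uses the usual closed-form slope and intercept, and any other line gives a larger sum of squared residuals. The function names and the data are made up for illustration.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared vertical deviations between the line and the points."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical outcome values

intercept, slope = fit_line(x, y)
best = sum_squared_residuals(x, y, intercept, slope)
shifted = sum_squared_residuals(x, y, intercept + 0.5, slope)  # line moved up
steeper = sum_squared_residuals(x, y, intercept, slope + 0.1)  # line tilted

print(intercept, slope)
print(best, shifted, steeper)
```

Note that it is the squared residuals that are minimised; squaring stops positive and negative deviations from cancelling each other out.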

What are the advantages and disadvantages of the Stair-case Method?

The staircase method is extremely efficient: it requires far fewer stimulus presentations, and most are near the threshold level. A disadvantage is that observers can become aware of the stimulus order. However, this can be overcome by using variations of the staircase method, i.e., the double staircase (uses ascending and descending series in alternation) or the random double staircase (descending and ascending series running in random sequence).

What is the Forced Entry method?

All predictors forced into the model simultaneously. If you don't have good theoretical reasons for using the hierarchical method this is the best method.

What is Signal Detection Theory (SDT)?

Also known as Sensory Decision Theory. Some researchers questioned the idea of absolute thresholds. They said the threshold is determined not only by characteristics of an observer's sensory system but ALSO by characteristics of the observer. Argued that the threshold is the result of an observer's response criterion, which changes according to consequences and the observer's own "values".

What are the ideas around the items?

As varied as possible; half the statements about the attitude object should be positive; half negative; ideal length is 30-40 items (pilot will require 60+, as 50% may need to be dropped); randomise order; alter order if necessary to ensure that: opening questions are mildly worded; there is a balance of positive and negative statements among the first few. Rating scale often has an uneven number of alternatives (5 or 7); allows respondents to 'sit on the fence', neutral option for 'don't know'/ 'not sure' (include for a less common attitude which may genuinely not be held by some respondents; omit for common, but sensitive, attitude, to preclude feigned non-response); can reverse direction for some items to ensure respondents don't just get into a pattern, or try to "break" the pattern. Devise a scoring key to turn responses into numbers (decide what a high score should mean and determine score direction accordingly).

What are the types of information gleaned from questionnaires?

Background and demographic characteristics; behaviour reports; attitudes, beliefs, opinions; knowledge; intentions, aspirations.

How can you assess each predictor to see how well it predicts the outcome variable?

By using the coefficients table, which gives us the parameters of the fitted regression line. The B(constant) is the intercept of the fitted line (here = 543.85). The B values for each predictor (age and days here) give us the slope associated with each predictor, if we hold the effects of the other predictors constant.

What is the SDT Theory?

Can think of signal-present trials as containing: Signal + Noise (S + N), because on every trial there is noise emanating from the background. Signal-absent trials (N) still have the noise, but not the signal.

How do you calculate Hit rate and False Alarm rate?

Can use the matrix to calculate: Hit rate: proportion of signal-present trials on which the observer correctly detects the signal (Hit rate = N hits / N signal-present trials). False Alarm (FA) rate: proportion of signal absent trials on which the observer incorrectly detects the signal (FA rate = N FAs / N signal-absent trials). A liberal observer (says "yes" a lot) will have a high hit rate, but will ALSO have a high FA rate.
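
The two rates can be computed directly from the four cell counts of the outcome matrix. The counts below are invented to contrast a conservative and a liberal observer; the function name is hypothetical.

```python
def sdt_rates(hits, misses, false_alarms, correct_rejections):
    """Return (hit rate, false alarm rate) from the four outcome counts."""
    hit_rate = hits / (hits + misses)                          # signal-present trials
    fa_rate = false_alarms / (false_alarms + correct_rejections)  # signal-absent trials
    return hit_rate, fa_rate

# 50 signal-present and 50 signal-absent trials for each made-up observer.
conservative = sdt_rates(hits=20, misses=30, false_alarms=2, correct_rejections=48)
liberal = sdt_rates(hits=45, misses=5, false_alarms=25, correct_rejections=25)

print("conservative:", conservative)
print("liberal:", liberal)
```

The liberal observer gains hits only at the cost of many more false alarms, which is why the hit rate alone says little about sensitivity.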

With regards to layout of questionnaires what are ideas around case identifiers and length?

Case identifiers: good to be able to identify individual questionnaires (not necessarily respondents), so if there's a problem with the data you can identify the questionnaire. Length: No rules, depends on topic, method of distribution, and enthusiasm of respondents; how can you ask what you want to know without tiring or boring your respondents? How long your questionnaire takes can be assessed in a pilot study. If it takes more than 45 minutes, the respondent would have to be very highly motivated; very short questionnaires (1 or 2 pages) aren't taxing but may not be taken seriously.

What are advantages of closed-ended formats?

Clarifies response alternatives for the participant; reduces ambiguous answers; avoids multiple, repetitious, answers; reduces coding errors in the data set (misinterpretations of an open-ended answer); quicker to answer; and easier to analyse.

What are the three main kinds of criterion-related validity?

Concurrent, Predictive, and Post-dictive

What are Forced-choice Techniques?

Consider now a procedural difference between classical psychophysical tasks and SDT tasks. On one, the stimulus is always presented (at differing levels of intensity). Observer says "yes" (or presses lever, button, etc) when stimulus is detected. On the other, the stimulus is not always presented. Observer says "yes" when they detect the stimulus, and "no" when they do not (or presses left lever for "yes" and right lever for "no", etc). I.e., the SDT procedure forces the observer to make an active response on ALL trials (remembering that in actual fact the observer is never forced to do anything).

What are disadvantages of closed-ended formats?

Creates artificial forced choice; rules out unexpected responses; making up response categories can be difficult; difference in shared meanings of words, e.g., tea.

What is Standard or Stepwise?

Decision about which order predictors are entered is made mathematically. Each predictor is evaluated in terms of what it adds to the prediction of the outcome variable. Here the computer "hunts" for the best outcome - i.e., it's not based on theory. This is the worst method. (STEP-UP) Start with just the intercept; the computer searches for the best predictor and adds it, then searches for the next best, etc. Predictors that make a statistically significant contribution are added one at a time. (STEP-DOWN) All predictors are added to start with. Predictors that are not making a statistically significant contribution are removed one at a time.

What is construct validity?

Degree to which an instrument measures the trait or theoretical construct. Often confused with criterion-related validity as often the same correlation used for both. Construct validity more concerned about how the scale fits with the theory about the construct/ trait. Criterion-related validity more concerned about whether different measurements are related to each other. Used more to look for risk factors - ability to predict.

What are the ways of assessing construct validity?

Degree to which assessment measures a single trait (unidimensionality). This is assessed using statistical methods. Whether scores from different populations vary as predicted by the theory (e.g., sample of depressed people score higher on a measure of depression than a random sample of supermarket shoppers). Whether scores change as a function of age and intervention as predicted by theory.

How do you develop a Likert Scale?

Develop theoretical definition of attitude; assemble large number of relevant items; administer to pilot sample; compute pilot scores; item analysis; three parts to all Likert Scales: The cover ("face") sheet; the items; background (demographic) data on the respondent.

What is Dynamic vs. Static traits?

Dynamic traits: change depending on context and experience (e.g., motivation, attitude, opinion). Static traits: relatively stable from time to time (intelligence, personality). So, could be that your questionnaire is measuring a dynamic trait, and a person's response on one occasion may differ from their response on another day.

What is criterion-related validity?

Each item, or the scale as a whole, is associated with a criterion, or standard, against which it can be evaluated. In each case the score on the current instrument / questionnaire / test is correlated with another measure.

What is the Beta value?

Each variable has a beta value. Beta is a measure of the importance of a predictor. Looking at the coefficients table, you are looking for the variable which is the most important predictor. Therefore, you look for the largest value (ignoring sign) in the Beta column.

How well do attitudes predict behaviour?

Early research evidence suggested a weak to moderate link between attitudes and behaviour. But under certain conditions "attitudes significantly and substantially predict future behaviour" (Krauss, 1995, p. 58).

What are item-total correlations?

High corrected item-total correlations indicate that the item correlates well with the scale as a whole.

What are the four possible outcomes in SDT experiments?

Hit ("Yes" & Signal Present); Miss ("No" & Signal Present); False Alarm ("Yes" & Signal Absent); Correct Rejection ("No" & Signal Absent).

What happens if your items are not measuring the true score?

If your items are not measuring the true score at all, but just random error, the amount of covariance between the items will be small, and the remaining error will be large. The resulting Cronbach's alpha will be small. If your items are all measuring the same thing (presumably the true score) then the covariance will be high and the remaining error low. Cronbach's alpha will be large.

How reliable is a person's score?

If a person responds to the same questionnaire on multiple occasions (let's call them samples), you will have a total (scale) score for each sample. You can calculate the mean score across all those samples. The mean of all these samples should be close to the person's "true" score. The standard deviation of the individual sample scores around the overall mean can be used to calculate the standard error of the mean (SE). This is an estimate of the amount of "error" in the observed score. Low SEs show that the score is reliable (it doesn't vary much from time to time).

How is reliability identified using Cronbach's alpha?

If the mean alpha is high (close to the maximum of 1), we can be more certain that the individual items are measuring the same thing... in other words, that the test is reliable - (Note, that it still may not be valid... it might not really be measuring what you think it is measuring. All you know is that all the items are consistently measuring the same thing).

When examining the differences between groups when would you use a factorial ANOVA?

If we have one dependent variable and more than one independent variable, e.g., have two independent variables (amount of sleep: 2, 4, 8 hours) AND gender (Male, Female).

When examining the differences between groups when would you use a one-way ANOVA?

If we have one dependent variable and one independent variable and there are three or more levels of the independent variable. (Note: ANOVA is sometimes referred to as the General Linear Model (GLM). If your design is within-subjects, use repeated-measures ANOVA.)

When examining the differences between groups when would you use a t-test?

If we have one dependent variable and one independent variable, and there are only two levels of the independent variable.

What is the problem, Weighing the Odds?

If we use a .05 significance level, the chance of making a Type I error when conducting a statistical test is 5%. This means that for any one test, there is a 5% chance that we will conclude that there is a significant effect when there really isn't. This means that the probability of not making a Type I error is 95% for a single test (1-.05 = .95, or 95%). For any one test where there is really no effect, there is a 95% chance that we will correctly conclude that there is no significant effect. If we carry out two tests, the overall probability of not making a Type I error is (.95)squared = .95*.95 = .9025 or 90.25%. So the probability of at least one Type I error is 1-.9025 = .0975 or 9.75%. The probability of making at least one Type I error across a set of tests is called the familywise or experimentwise error rate. It is calculated as 1-(.95)^n, where n is the number of tests carried out.
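
A small sketch shows how quickly this error rate grows with the number of tests. It assumes the tests are independent, which is also what the .95*.95 calculation assumes; the function name is hypothetical.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 2, 5, 10):
    print(n, round(familywise_error(0.05, n), 4))
```

With two tests the rate is .0975, matching the worked example, and by ten tests it is roughly 40%, which is the motivation for corrections that tighten the per-test criterion.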

How many items should you include?

If your Cronbach's alpha/ coefficient alpha is consistently higher than 0.9 (your questionnaire is very reliable), you could consider shortening the questionnaire.

Why are high levels of multicollinearity problematic?

It causes the beta values to be less trustworthy. Two variables that account for the same variance in a sample do not improve the R-squared value. If two predictors are highly correlated and account for similar variance in the outcome, how can we know which variable is more important?

What things affect Cronbach's alpha?

It is important to have all your items scored in the same direction, i.e., a score of "5" should always reflect high aggression and "1" low. If your items have not been phrased in this way, you will need to ensure that you adjust for this BEFORE calculating reliability.

What is the Just Noticeable Difference?

The JND is the smallest difference between two stimuli that an organism can detect. Weber measured this by placing a weight in both hands and having people judge which was heavier. The greater the difference between the two weights, the easier it is to detect the difference. Weber also found that as the standard weight was increased, a larger difference in the comparison was required to detect the difference.

What is the satisficing response?

Krosnick et al. (2000) challenged the idea that "don't know"/ "no opinion" options improve the quality of data. Respondents show a satisficing response, indicating no opinion when they actually have one, due to lack of motivation (can't be bothered thinking about it) and time pressures (haven't got the time).

What are the assumptions of Multiple Discriminant Analysis?

The discriminating variables must have normal distributions. Sample size 30+ per group (reliability increases with sample size).

What are problems with Thurstone's Scales?

Lots of work needed to develop the scales (but easier now with computers); Lack of uniqueness of individual's scores (the same score, which is the median value of the statements checked, can be obtained by checking different statements); The effects of the judges' own attitudes on the scale value of statements; Ordinal, not interval, level of measurement.

What are Bivariate Data?

Measures/ scores of two variables (e.g., test score and IQ score of PSYC307 students; test score for men and women). We can analyse two variables using (for example): with continuous data: Correlation, Regression, or plot a scattergram. When one variable is categorical and the other continuous: t-test, e.g., to assess whether there was a significant difference between the test scores of men and women. One-way ANOVA (when you have 1 IV with more than 2 levels). When both variables are categorical: Chi-square analysis, e.g., to assess whether the observed number of older and younger male and female students are the same as the expected number.

What method should be used then to overcome Weighing of Odds and Correlation of Measures?

We need to use multivariate analysis methods, which take into account the relationships between variables and allow us to conduct tests on multiple variables at the same time.

What does no multicollinearity refer to?

No perfect linear relationship between two or more of the predictors. So, the predictor variables should not correlate too highly with each other.

The probability that the threshold will have a particular value is what?

Normally distributed.

What is Postdictive criterion-related validity?

Not very common; a test taken now is used to assess something that has already happened, e.g., we might assess your maths ability and correlate it with the maths grades you received at school (in the past).

What is the Method of Adjustment?

Observer (or experimenter) adjusts the intensity of the stimulus in a continuous manner until the observer can just barely detect it. Can repeat several times and take the average as the threshold. Sometimes there are two stimuli presented and the observer adjusts one until it appears the same as the other, e.g., Müller-Lyer Illusion.

What are the ideas around existing scales and measures?

Often tempting to alter the wording of existing scales to make them sound better or to clarify them. Pre-existing scales may contain culturally specific phrases, or assume familiarity with cultural norms that are not appropriate for your sample, e.g., Wilson-Patterson Conservatism Scale (1968). Should you change them? Tampering with the wording will change the nature of the scale so it's no longer equivalent to the original. Can't then compare scores, without further validating the reworded scale. Always consider whether the scale is valid when you are using a questionnaire that was validated on a different population.

What is a Simple Regression Analysis?

One predictor/ outcome. Predicting one variable from another, e.g., using IQ to predict test score. Create scatterplot: predictor variable (IQ) on x-axis, outcome/ predicted variable (test scores) on y-axis. Calculate the regression line (straight line that best fits the data points). Some of the residuals are above the fitted line (positive) and some are below the line (negative). We cannot add them together, as they would cancel each other out. So we square all the residuals first before adding them up.
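
As a minimal sketch of the steps above, the Python below fits a least-squares line and shows that the raw residuals cancel out while the squared residuals do not. The IQ and test-score values are made up for illustration, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


def residuals(x, y, a, b):
    """Observed score minus the score predicted by the fitted line."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    iq = [90, 100, 110, 120, 130]    # predictor (made-up data)
    score = [52, 60, 63, 71, 79]     # outcome (made-up data)
    a, b = fit_line(iq, score)
    res = residuals(iq, score, a, b)
    print("intercept:", a, "slope:", b)
    print("sum of residuals:", sum(res))                     # cancels to about 0
    print("sum of squared residuals:", sum(r ** 2 for r in res))
```

The sum of squared residuals is the quantity the best-fitting line makes as small as possible.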

What is split-half reliability?

One way to measure reliability, is to split the test in half (odd- and even- numbered items for example) and see if scores on one half are correlated with scores on the other half (across several respondents). We estimate the reliability of the entire questionnaire using the Spearman-Brown split half coefficient (just a slightly re-calculated correlation coefficient). If your test is measuring aggression (for example), people who score high on aggression on odd-numbered questions should score highly on even-numbered questions. Split-half reliability close to one shows that the scores on one half are positively correlated with the scores on the other half (high scores on one half predict high scores on the other).
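
A rough sketch of the odd/even split follows. It assumes the usual Spearman-Brown correction for a test split into two halves, 2r / (1 + r), which these notes do not spell out; the function names and the example data are illustrative only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step the half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """responses: one list of item scores per respondent."""
    odd = [sum(row[0::2]) for row in responses]    # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]   # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


if __name__ == "__main__":
    data = [[1, 2, 2, 1], [3, 3, 4, 4], [5, 4, 5, 5], [2, 3, 2, 2]]  # made-up
    print(split_half_reliability(data))
```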

What are disadvantages of Likert Scales?

One person's "strongly agree" is different to another's. Subtle items for sensitive issues can be hard to write and need large-scale empirical validation. Different responses may give same scores. People tend to avoid the extremes (central tendency bias). Likert Scales, along with other scales, can also have these problems: some people have a tendency to agree (acquiescence bias); some people may give socially desirable responses (social desirability bias).

What are computer-assisted questionnaires?

Online services can help with designing questionnaires. Questionnaires can be completed by respondents online. These systems also produce basic analyses (graphs, etc). Beware, they are not free!

How can you measure whether people are answering dishonestly?

Paulhus Deception Scale - a 40-item self-report instrument that measures the tendency to distort or give socially desirable responses. Ask the same questions two different ways and see if you get the same answer. May be possible to convince respondents you have an alternative way of finding out about their behaviour (bogus pipeline).

What are the ideas around background and demographic data?

People are often reluctant to give this information. Needs careful wording. Age - do you need to know their exact age? Use age bands if not. How accurate does it need to be? Biological sex - make this forced choice, male / female or you will get inappropriate responses. Ethnicity and nationality - don't confuse nationality (e.g., British, French) with ethnicity (e.g., Maori, Asian). Ensure the information you gather can't be used to systematically disadvantage any group. Social class or socio-economic status - most define "class" based on the person's job but need sufficient information to classify appropriately. Income - one of the most sensitive issues to ask about.

What is the Hierarchical method?

Predictors enter the equation as specified by the researcher. Known predictors (from other research) added first in order of importance. Additions of predictors based on logical or theoretical considerations.

What is the two-alternative-forced choice (2AFC) procedure?

Present N or S+N then give observer choice of "yes" or "no" (used for non-visual stimuli like sound, odour...). Present N and S+N at the same time and ask observer to choose S+N (you see this method used a lot by optometrists). Benefit of this procedure (especially for non-human animals) is that you reinforce correct responses to both N and S+N trials (you can reinforce "hits" and "correct rejections"). This helps to reduce bias - tendency to say "yes" more than "no". Important to have equal numbers of N and S+N trials too. Higher numbers of one type of trial might bias responses towards that trial type.

What is the Method of Constant Stimuli?

Present different intensities of the stimulus in random order. Intensity which is detected 50% of the time taken as the threshold. This method is quite time-consuming (and pretty boring for humans!) but considered to be the most accurate.
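
One way to read a 50% threshold off constant-stimuli data is sketched below. Linear interpolation between the two neighbouring intensities is an assumption made here for illustration (fitting an ogive would be another option), and the detection proportions are made up.

```python
def threshold_50(intensities, p_detect):
    """Estimate the intensity detected 50% of the time by linear interpolation.

    intensities: stimulus intensities that were presented
    p_detect: proportion of trials on which each intensity was detected
    """
    pairs = sorted(zip(intensities, p_detect))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        # Find the first pair of neighbouring intensities that straddles 50%.
        if p0 <= 0.5 <= p1 and p1 > p0:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("detection proportions never cross 50%")


if __name__ == "__main__":
    tone_levels = [1, 2, 3, 4, 5]               # arbitrary intensity units
    detected = [0.0, 0.2, 0.4, 0.8, 1.0]        # made-up proportions of "yes"
    print(threshold_50(tone_levels, detected))  # lies between levels 3 and 4
```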

What is the trade-off in presentation of a questionnaire?

There is a trade-off between better presentation (and possibly a higher response rate) and increased cost.

With regards to layout of questionnaires what are ideas around question order and question density?

Question order: growing convention to have demographics at the end - less taxing questions. Don't place sensitive questions right at the beginning (let them warm up to your topic). Question density: don't be tempted to cram lots of questions into a small number of pages (makes the form look complex and may be confusing). Clear layout gives the best results.

How do constructs relate to questionnaires?

Questionnaire items are one way of assessing the magnitude of a construct. There is an assumption that the construct and the items are related. The construct is regarded as the cause of the score on each item, i.e., the magnitude of the construct is assumed to cause the questionnaire item to take on a certain value. The unobservable "actual magnitude" is the true score.

What is the difference between simple regression and multiple regression?

Regression analyses are used to make predictions. Simple regression: predicting one variable from one other variable. Multiple regression: predicting one variable from two or more variables, e.g., using IQ scores AND age to predict test scores.

What are Likert Scales?

Rensis Likert (1932) developed the method of summed ratings as Thurstone's scales were too cumbersome. Set out to develop a simpler and easier method that was just as reliable and valid. Start with pool of items chosen for their relevance to the attitude object. Written so that agreement represents either a favourable or unfavourable attitude towards the object.

What is the assumption of independent errors?

Residuals for any two observations should be uncorrelated (independent). The Durbin-Watson test checks for serial correlations between residuals.
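
For illustration, the Durbin-Watson statistic can be computed directly from the residuals using its standard definition: the sum of squared differences between successive residuals divided by the sum of squared residuals. The statistic ranges from 0 to 4, and values near 2 suggest uncorrelated residuals. The function name and example residuals below are made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    # Numerator: squared change from each residual to the next one.
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals.
    den = sum(e ** 2 for e in residuals)
    return num / den


if __name__ == "__main__":
    alternating = [1, -1, 1, -1]   # negative serial correlation, value above 2
    drifting = [1, 1, 1, 1]        # positive serial correlation, value below 2
    print(durbin_watson(alternating), durbin_watson(drifting))
```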

What are ROC Curves?

Responses of observers in an SDT task can also be plotted on a receiver operating characteristic (ROC) curve (plots %FA against %Hits; hit and FA rates * 100). Sensitivity to the signal is shown by the distance of a fitted line to the edges of the boxes - less sensitive observers will produce data points that sit further away. Observers with the same sensitivity, but with different response criteria, will produce points that fall on the same line.
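
A small sketch of how hit and FA rates are obtained from trial counts is given below. It also computes d' (z of the hit rate minus z of the FA rate), a common SDT sensitivity index that is not covered in these notes and is included here only as an assumption to illustrate "same sensitivity, different criterion". The rates for the two observers are made up.

```python
from statistics import NormalDist


def rates(hits, misses, false_alarms, correct_rejections):
    """Return (hit rate, false-alarm rate) from trial counts."""
    hit_rate = hits / (hits + misses)                            # S+N trials
    fa_rate = false_alarms / (false_alarms + correct_rejections) # N trials
    return hit_rate, fa_rate


def d_prime(hit_rate, fa_rate):
    """Sensitivity index: z(hit rate) - z(FA rate).

    Rates of exactly 0 or 1 are not handled (the z-score is undefined).
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    print(rates(40, 10, 10, 40))
    # Two hypothetical observers with different response criteria.
    liberal = d_prime(0.9332, 0.3085)        # says "yes" a lot
    conservative = d_prime(0.6915, 0.0668)   # says "no" a lot
    print(liberal, conservative)             # roughly equal sensitivity
```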

What is Orthogonal (unrelated) rotation?

Retain the right angles of the axes. Use this method when all factors are independent (do not correlate).

What is Concurrent criterion-related validity?

Score on test/ questionnaire is compared with criterion measure obtained at the same time. The purpose of this type of validity measurement is often to replace a long assessment with a shorter/ cheaper one, e.g., it would be very expensive for each new driver to take an hour-long driving test to ensure they know the road rules, so a written test is used instead.

What is Predictive criterion-related validity?

Score on test/ questionnaire predicts criterion assessed later.

What are the different ways to assess validity?

Scrutinise the content validity. Compare the scores to some other measure (criterion-related validity). Analyse for construct validity.

How do Likert Scales work?

Series of statements (items); respondents indicate level of agreement or disagreement; responses are scored from 1-5; conventionally, strong agreement with a favourable item receives the highest score (5); scores on each item summed to obtain total score on attitude scale; the SUM of all items - the Likert Scale.
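
The summing step can be sketched as follows. Reverse-scoring unfavourable items (so that a high score always means a more favourable attitude) is handled with the usual "low + high - response" rule, which is an assumption added here; the function name and example responses are hypothetical.

```python
def score_likert(responses, reverse_keyed=(), low=1, high=5):
    """Sum item responses into a total Likert Scale score.

    responses: one respondent's answers, each between low and high
    reverse_keyed: 0-based positions of items worded in the opposite direction
    """
    total = 0
    for i, r in enumerate(responses):
        if not low <= r <= high:
            raise ValueError(f"response {r} is outside {low}-{high}")
        # Flip reverse-keyed items so that 5 becomes 1, 4 becomes 2, and so on.
        total += (low + high - r) if i in reverse_keyed else r
    return total


if __name__ == "__main__":
    answers = [5, 1, 4, 2]   # made-up responses to a four-item scale
    print(score_likert(answers, reverse_keyed={1, 3}))
```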

What is item analysis?

So, now we are able to assess the reliability of the scale as a whole using Cronbach's alpha. We will also want to look at each individual item in the scale and see how useful each item is in the context of the scale. In a reliable scale, all items should correlate with the total.

Can we perform parametric statistics on Likert Scales?

There is some debate about whether Likert scales represent an ordinal or interval level of measurement. Each ITEM on a Likert scale represents an ordinal level of measurement. So, we probably shouldn't conduct parametric statistics on individual items on the Likert scale. BUT, remember that we are meant to add the scores of all items together to get the Likert Scale score, which is analogous to a total number correct. What level of measurement is total number correct? Ratio. However, with this total, it does not make sense to say that the test-taker has no attitude (there is no true zero). Therefore, the Likert Scale score is an interval level of measurement.

What is the Guttman Scale (1944)?

Statements arranged in order, so that an individual who agrees with a particular item also agrees with items of lower rank-order. When items constitute a true, unidimensional, Guttman scale, a respondent who endorses a particular item also endorses every item having a lower scale value. Determines whether responses to the items chosen to measure a single attitude fall on a single dimension. Analysis aims to produce a cumulative ordinal scale. Guttman realised that it was hard to create a true scale for attitudes but thought it could be approximated. Extent to which a true scale is obtained is shown by the reproducibility coefficient (the proportion of respondents who endorse a particular item and also endorse all items below it on the scale; 0.9 is acceptable).

What is Psychophysics primarily concerned with?

Stimulus detection. The stimuli are typically sensory stimuli (visual, auditory, olfactory, kinaesthetic, gustatory).

What is the Method of Limits?

Stimulus intensities are presented in ascending or descending order. Observer indicates either when they first detect the stimulus, or when they are first aware of not being able to detect it.

What is Absolute Threshold?

The absolute threshold is the smallest amount of a stimulus necessary for an observer to detect the stimulus, e.g., if you add tiny pieces of TNT to soil, the point at which the dog can detect it is the absolute threshold for that dog.

What does the slope value represent?

The change in the outcome variable associated with a unit change in the predictor. So, in this case, the slope value shows us what happens to the number of friends when we increase the age value by 1 year. The t-statistic, and its associated significance (Sig.), or p-value, for the slope tells us whether this change is significantly different from zero (no change), e.g., here, as age increases by one year, the number of friends decreases by 6.84 and this change is significantly different from zero.

What is Factor Loading?

The coordinate of a variable along a classification axis is known as a factor loading. Can think of factor loading as the Pearson correlation between a factor and a variable. If we square the factor loading, we obtain a measure of the importance of a particular variable to a factor. The axes in a factor plot are straight lines, and so can be described mathematically. Therefore, factors can also be described in terms of the equation of a straight line.

Why use the "corrected" correlations?

The corrected correlations exclude the item itself from the total (each item will correlate perfectly with itself, so if the item is included in the total, it will inflate/ increase the relationship). The uncorrected version includes the item in the total when calculating the individual correlations. This matters the most when there aren't very many items in the questionnaire (it will have the biggest effect on the correlation when there are few items). So, generally a good idea to use the corrected version.
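
The difference between the two versions can be seen in a short sketch. The function names and the two-item example data are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(responses, corrected=True):
    """Correlate each item with the total score across respondents.

    corrected=True leaves the item itself out of the total it is compared with.
    """
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        total = [sum(row) - (row[j] if corrected else 0) for row in responses]
        result.append(pearson_r(item, total))
    return result


if __name__ == "__main__":
    data = [[1, 2], [2, 1], [3, 4], [4, 3]]   # 4 respondents, 2 items (made up)
    print("corrected:  ", item_total_correlations(data, corrected=True))
    print("uncorrected:", item_total_correlations(data, corrected=False))
```

With only two items the inflation is large, which matches the point that the correction matters most for short questionnaires.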

What does the covariance measure?

The covariance measures the relationship between two variables (in a similar way as correlation).

What does the Cronbach's alpha/ coefficient alpha measure?

The proportion of variance in the scale scores that is attributable to the "true" score. So, remember that the variance of a single variable (in this case a question item) is the average amount that each score varies from the mean. So the variance of each item tells us about the amount that scores vary around the mean for single items. If we had 10 respondents answer a four-item questionnaire, we can calculate the mean score for each item across those 10 people, and the variance within each item (the average amount that scores for each item vary around that item mean). If alpha = 1, then the scale is perfectly reliable.
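
As a hedged sketch, alpha can be computed from the item variances and the variance of the total scores using the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals), where k is the number of items. The formula is not given in these notes, and the function name and data are illustrative.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores."""
    k = len(responses[0])                                   # number of items
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])   # variance of totals
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    identical_items = [[1, 1], [2, 2], [3, 3]]       # items agree perfectly
    noisier_items = [[1, 2], [2, 1], [3, 4], [4, 3]]  # items agree less well
    print(cronbach_alpha(identical_items))   # perfectly consistent items
    print(cronbach_alpha(noisier_items))
```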

What is Psychophysics?

The relationship between the physical properties of stimuli and the perceptual response to these stimuli.

How do you change an observer's response criterion?

The response criterion of an observer can be changed, without changing the sensitivity of the observer to the stimulus. We could reward the observer for "hits" more than for other types of responses. Observer will start saying "yes" more. Result is a higher hit rate but also a higher FA rate. We could reward the observer for CRs more than for other types of responses. Observer will start saying "no" more. Result is a lower hit rate but also a lower FA rate.

Why is the psychophysical function of absolute threshold S-shaped?

The stimulus itself might be more difficult to detect at some times than at others. The background "noise" might be interfering with detection. The observer's sensitivity changes from trial to trial due to "noise" in the nervous system (neurons sometimes generate a signal even when there is no signal, and sensitivity of receptors fluctuates slightly from trial to trial).

What is the problem of the Correlation of Measures?

There may be relationships between the variables. E.g., a t-test may show that students with high IQ score higher on the test than those with low IQ. Another t-test might show that students with high hours of sleep score higher on the test than those with low hours of sleep. Can we conclude separately both that lots of sleep predicts high test scores AND that high IQ predicts high test scores? It might be that students with high IQs sleep longer, so the variable "sleep" is not adding any information because it is correlated with IQ.

What is reliability and item analysis?

The usual way to look at reliability is based on the idea that each item (or sets of items) should produce results consistent with the overall questionnaire. So, someone with a high overall score on the whole questionnaire should also score highly on individual items in that questionnaire. It is sort of like correlation - each item should correlate highly with the questionnaire as a whole, or groups of items should correlate highly with other groups of items.

How do we decide how to enter predictors into the regression model?

There are three ways to enter predictors, (1) Standard (or stepwise), (2) Hierarchical, (3) Forced.

How do you check whether a regression model is significant?

To check whether our regression model is significantly better than a "best guess" at predicting our outcome variable(s), we look at the F-statistic and its associated significance value (p-value). F and Sig. in the SPSS output ANOVA. If our model is significant, we can conclude that at least one of our predictors does a good job predicting scores on the outcome variable.

How do you infer a person's true score on a construct from a questionnaire?

To infer a person's true score on a construct from a questionnaire, then, we need multiple items (questions) = multiple measurements of the same thing (just as with the height example). The more items (questions) we have in a questionnaire that measure the same construct, the closer to the person's "true" score we will get, because each new item helps to reduce (eliminate) some of the error.

With regards to layout of questionnaires what are ideas around typeface and size and numbered response options?

Typeface and size: small densely packed text may be hard to read, choose a clear font, size 12 or bigger. Use different font or colour for instructions and bold or italic lettering for filter questions. Numbered response options: sometimes best to avoid numbered options. Respondents may: think bigger = better, not understand what numbers mean, be offended that their responses will be reduced to numbers; can use tick boxes, linear scales instead.

What are ideas around respondent motivation?

Unless present, provide explanatory notes (aims of study, why compliance is important); encourage individual to feel that their responses are valued by you and will be treated with respect. Ensure confidentiality, particularly if you need to be able to identify particular respondents. Explain how respondents can access feedback / results. Thank them for their help.

How can you reduce the problems of closed-ended formats?

Use focus groups or interviews to suggest range of responses. Pilot the questionnaire.

What is the Factorial repeated measures ANOVA?

Used to analyse data from within-subjects designs with more than one independent variable and one dependent variable, e.g., all participants sit test after different amounts of sleep (sleep: 2, 4, 8 hours).

What is an example of how response criterion affects threshold?

Using Method of Constant Stimuli, five intensities of tone presented randomly. Observer answers "yes" when they hear it, "no" when they don't. One observer (Laurie) may answer "yes" much more than another observer (Chris). (1) Call Laurie a liberal responder; (2) Call Chris a conservative responder. Person (1) ends up with a lower threshold than Person (2).

What is Multiple Discriminant Analysis?

Using dependent variables to discriminate between groups. Determine which variables discriminate between two or more naturally occurring groups. Classifying cases into different groups with a better than chance accuracy. Computationally very similar to ANOVA.

What are the ideas around Attitudes: Measurement?

Usually questions are about affect (feeling). Cognition (thoughts) or intention to act may also be included. Ideally includes all three (Affect, Behaviour, Cognition) - but note can only measure intention to act, not behaviour itself in a questionnaire. Attitude scales usually give a total score indicating the direction and intensity of the person's attitude.

What are errors in questionnaires?

When we measure height, our measurement includes some error. When we measure an unobservable construct, some of our measurement also includes error, e.g., if we want to measure how much the average NZ man likes rugby, we might ask respondents to rate their agreement with the statement: "I regularly follow the performance of All Black players in rugby games". Part of the respondent's answer might be influenced by how much they like rugby, but part might be error (e.g., influenced because their son is an All Black, and therefore, they follow rugby for a different reason). So, the "true" score is a subset (a part) of their answer.

When examining the differences between groups when would you use multivariate analysis of variance (MANOVA)?

When you have more than one dependent variable (with one or more independent variables).

How do we select and assess items?

Will firstly develop a pool of items, and then pilot them (administer to a large, representative sample); sample size should be at least five times the number of items. Then evaluate the items in terms of how "good" they are at measuring the underlying construct (how strong the relationship is between our observed scores and the "true" score). In this situation, a measure is considered reliable if it reflects mostly true score (and, thus, the measurement of that true score is consistent).

What is the equation for a straight line, and what does it tell us?

Y = a + bX. The straight line equation tells us a (where the fitted line crosses the y-axis; the intercept) and b (how quickly the data change; the slope or steepness of the line). Whether the relationship is positive or negative is given by the sign of the slope.

What is Cronbach's alpha also referred to?

Internal-consistency reliability - because, essentially, it measures the degree to which each item measures the same thing - the degree to which each item is consistent with the whole questionnaire.

What is the assumption of normally-distributed errors?

Residuals in the model should be random, normally distributed variables with a mean of 0. So, the observed values should sit close to the scores predicted by the model, and values very different from the predicted scores should happen only occasionally.

