PSYC235 Lectures 4 & 5: Reliability and Validity / Psychometric Tests
What is validity?
The extent to which the scores from a measure represent the variable they are intended to measure. Validity is concerned with ascertaining whether something does what it is supposed to do, i.e. the truthfulness of a measure.
What are the 3 basic kinds of validity?
1) Face validity 2) Content validity 3) Criterion validity
What are 3 different types of reliability?
1) Test-retest reliability (reliability over time) 2) Internal consistency (reliability across items) 3) Inter-rater reliability (reliability across different researchers)
How is content validity assessed?
By carefully checking the measurement method against the conceptual definition of the construct
How can internal consistency be assessed?
By using the split-half method or by calculating Cronbach's alpha
What is content validity?
The extent to which a measure covers the construct of interest i.e. whether the measure includes all necessary items to measure the concept in question
What is face validity?
The extent to which a measurement method appears to measure the construct of interest 'on the face of it'. It is the simplest test of validity and is largely subjective.
What is Inter-Rater reliability?
The extent to which different observers are consistent in their judgements.
Give an example of how inter-rater reliability would have been measured in Bandura's Bobo doll study?
The observers' ratings of how many acts of aggression a particular child committed while playing with the Bobo doll should have been highly positively correlated with each other.
What is concurrent validity?
This is a type of criterion validity that looks at how well the test correlates with other established tests administered at around the same time.
What is predictive validity?
This is a type of criterion validity that looks at how well the test predicts something in the future such as job performance or degree grade etc.
What is convergent validity?
This is a type of criterion validity where the criteria include other measures of the same construct.
What is criterion validity?
This is the extent to which people's scores on a measure are correlated with other variables (known as the criteria) that one would expect them to be correlated with.
What is discriminant validity?
This is the extent to which scores on a measure are not correlated with measures of variables that are conceptually distinct, i.e. the degree to which items designed to measure different constructs discriminate between each other. A new scale should not correlate with scales designed to measure a different construct.
What Pearson's r value indicates good reliability?
+.80
What is a criterion in 'criterion validity'? Give an example
A criterion can be any variable that one has reason to think should be correlated with the construct being measured. E.g. you would expect test anxiety scores to be negatively correlated with exam performance. You would also expect test anxiety scores to be positively correlated with general anxiety and with blood pressure during exams.
On SPSS what do you do to check internal consistency using split-half?
Analyze > Scale > Reliability Analysis. Load the items into the box in the order they appear in the questionnaire (although this depends on the way you want to split the items). Click the 'Model' drop-down menu and select 'Split-half'. The output will appear; here we are most interested in the Spearman-Brown coefficient. Look at the 'Equal Length' row if you have an equal number of items; if you have an odd number of items, report 'Unequal Length'. You would then report whether the scale has acceptable internal consistency (r = .xx). r should always be reported to 2 decimal places with no 0 before the decimal point.
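For context, a minimal Python sketch of what the split-half calculation is doing behind the SPSS menus: split the items into two halves, correlate people's scores on the two halves, then apply the Spearman-Brown correction. The odd/even split and the response data below are illustrative assumptions, not part of the lecture.

```python
import numpy as np
from scipy.stats import pearsonr

def split_half_reliability(items):
    """Odd/even split of items, correlate half scores across people,
    then apply the Spearman-Brown correction for full test length."""
    items = np.asarray(items, dtype=float)
    half1 = items[:, 0::2].sum(axis=1)   # 1st, 3rd, 5th ... items
    half2 = items[:, 1::2].sum(axis=1)   # 2nd, 4th, 6th ... items
    r, _ = pearsonr(half1, half2)        # correlation between the two halves
    return (2 * r) / (1 + r)             # Spearman-Brown coefficient

# Invented responses: 5 people x 4 items
responses = [[4, 5, 4, 5],
             [2, 3, 2, 2],
             [5, 5, 4, 5],
             [3, 3, 3, 4],
             [1, 2, 2, 1]]
print(round(split_half_reliability(responses), 2))
```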
On SPSS what do you do to check test-retest reliability?
Analyze > Correlate > Bivariate > transfer both variables across to the Variables box > tick 'Pearson' > tick 'Flag significant correlations' > click OK. The output should then appear on screen. You then report whether the scale has acceptable test-retest reliability (r = .xx, p = .xxx). r should always be reported to 2 decimal places, with no 0 before the decimal point.
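As a rough illustration of the same calculation outside SPSS, a minimal Python sketch correlating the same people's scores at two time points (the scores below are made-up example data):

```python
from scipy.stats import pearsonr

# Total scale scores for the same six people at two testing sessions
time1 = [22, 30, 18, 25, 27, 21]
time2 = [24, 29, 17, 26, 28, 20]

r, p = pearsonr(time1, time2)
print(f"r = {r:.2f}, p = {p:.3f}")   # report as r = .xx, p = .xxx
```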
On SPSS, what do you do to check internal consistency using Cronbach's alpha?
Analyze > Scale > Reliability Analysis > transfer the items we are interested in across. Click on the 'Model' drop-down menu and select 'Alpha'. Click on 'Statistics' and, under 'Descriptives for', select 'Item', 'Scale' and 'Scale if item deleted'. The output will then be shown. Report whether the scale has acceptable internal consistency (α = .xx).
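For reference, a minimal Python sketch of the Cronbach's alpha calculation itself, using the standard formula based on item variances and the variance of total scores (the item scores below are invented example data):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_people x n_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented responses: 5 people answering a 3-item scale
scores = [[4, 5, 4],
          [2, 3, 2],
          [5, 5, 4],
          [3, 3, 3],
          [1, 2, 2]]
print(round(cronbach_alpha(scores), 2))
```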
When testing Cronbach's alpha on a scale, what correlation coefficient is seen as showing good internal consistency?
Around +.70 (Kline, 1999)
What should the correlation coefficient of a split half method be if there is good internal consistency?
Around +.80
How is inter-rater reliability assessed?
Assessed using Cronbach's alpha when the judgements are quantitative, or Cohen's kappa (κ) when the judgements are categorical
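As a quick illustration of the categorical case, a minimal Python sketch of Cohen's kappa using scikit-learn; the two raters' codes below are invented example data (e.g. classing each observed behaviour as aggressive or not):

```python
from sklearn.metrics import cohen_kappa_score

# Two observers categorising the same 8 behaviours
rater_a = ["agg", "agg", "not", "agg", "not", "not", "agg", "not"]
rater_b = ["agg", "agg", "not", "not", "not", "not", "agg", "not"]

# Kappa corrects raw agreement for agreement expected by chance
print(round(cohen_kappa_score(rater_a, rater_b), 2))
```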
Give an example of convergent validity?
Correlating a new measure of test anxiety with an existing measure of the same construct
What sort of statistics are usually used to look at reliability measures?
Correlation coefficients
What is the most common measure of internal consistency?
Cronbach's α (Cronbach's alpha)
What is an issue with using test-retest reliability?
For many tools, the second time the measure/task is administered it is not taken under effectively the same conditions
Give an example of a psychometric test that shows good test-retest reliability?
IQ tests are thought to be consistent across time, e.g. a person who is intelligent would score highly on an IQ test and would score highly again if tested in a couple of months
If a scale has good internal consistency, what should be the result of the split half method?
If a scale is very reliable, a person's score on one half of the scale should be the same or similar to their score on the other half of the scale. Therefore, the two halves of the questionnaire should correlate highly across participants
What is internal consistency?
Internal consistency is the consistency in people's responses across items on a multiple-item measure.
How is test-retest reliability assessed?
It is assessed by using the measure on a group of people at one time and using it again on the same group at a later time. You then look at the test-retest correlation between the two sets of scores (usually Pearson's r)
Generally, do long or shorter scales have better psychometric properties?
Longer scales
Is Cronbach's alpha only calculated for unidimensional scales?
No, it is also calculated for subscales
What is an issue with long scales?
People tend to drop out more
Give an example of how a scale measuring loneliness could show internal consistency?
People's responses to the different items would need to correlate with each other; if not, it would not make sense to claim that all items are measuring the same construct
What does the Cronbach's alpha correlation coefficient range between?
Ranges between 0 and 1. The higher the value, the more reliable the scale. 1 = perfect reliability; 0 = no reliability
What does Cronbach's alpha refer to?
Refers to how closely related a set of items are as a group. It measures the extent to which different items on the same test correlate with each other.
What is reliability?
Reliability refers to the consistency of a measure. If a measure is reliable, this means it's consistent.
Give an example of items a self-esteem questionnaire should include to have good face validity
Should have items asking about the person's sense of worth and whether they think they have good qualities.
What does internal consistency tell you?
Tells you whether all the items on the measure reflect the same underlying construct; if they do, people's scores across the items should correlate with each other.
What is test-retest reliability?
Test-retest reliability is the extent to which scores from tasks are consistent over time. It is concerned with whether the person has a similar score on tests over time, e.g. they should get approximately the same answers on a questionnaire at both points they are tested
What is the split half method?
This method involves splitting the items on a questionnaire into two halves, with each half measuring the same elements but in slightly different ways, e.g. items could be split into 1st and 2nd halves, or into odd and even numbers. A score is then computed for each set of items and the relationship between the two sets of scores is examined.
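For reference, the Spearman-Brown coefficient reported in the SPSS output above corrects the correlation between the two halves for the fact that each half contains only half the items; one standard form of the correction is:

```latex
r_{SB} = \frac{2\,r_{\text{half}}}{1 + r_{\text{half}}}
```

where r_half is the Pearson correlation between people's scores on the two halves.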
What is an issue with the split half method for testing internal consistency?
There are several ways in which a set of items can be split into two, and so the result could be a product of the way in which the data were split
You should always show that your scale/measure has been validated for use within your population. True or False?
True
How is face validity tested?
Usually informally. It could be assessed as part of the pilot stage, e.g. the researcher could ask pilot participants whether the measure appears to measure what it intends to.
High test-retest correlations make sense when the construct being measured is assumed to be consistent over time (e.g. intelligence) but when do they not make sense?
When the constructs being measured are not assumed to be stable over time, e.g. changes in mood. A measure of mood that produced low test-retest reliability over a period of a month would not be an issue
If you have subscales, do you have to check internal consistency for each subscale?
Yes. For example, if you have a scale with 3 subscales (e.g. significant other support, family support, peer support) you would report 3 Cronbach's alphas.
How do you calculate Cronbach's alpha for a subscale?
You calculate a Cronbach's alpha for each individual subscale and report the alpha for each