Types of Reliability and Validity
Internal Reliability
Internal reliability assesses the consistency of results across items within a test.
Improving Inter-rater Reliability
Clearly defining behavioural categories; running a pilot study; test-retest.
Construct Validity
Construct validity is the appropriateness of inferences made on the basis of observations or measurements (often test scores), specifically whether a test measures the intended construct (theory).
External Reliability
External reliability refers to the extent to which a measure varies from one use to another; the less it varies, the higher its external reliability.
External Validity
External validity is the validity of generalised (causal) inferences in scientific research, usually drawn from experiments. In other words, it is the extent to which the results of a study can be generalised to other situations and to other people.
Criterion Validity
In psychometrics, criterion validity is a measure of how well one variable or set of variables predicts an outcome based on information from other variables, and will be achieved if a set of measures from a personality test relate to a behavioural criterion on which psychologists agree.
Predictive Validity
In psychometrics, predictive validity is the extent to which a score on a scale or test predicts scores on some criterion measure. For example, the validity of a cognitive test for job performance is the correlation between test scores and a criterion such as supervisor performance ratings.
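The correlation described above can be sketched in a few lines. This is only an illustration: the scores and ratings are made-up data, and `pearson_r` is a helper written for this example, not part of any particular statistics package.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: one entry per employee.
test_scores = [55, 62, 70, 48, 81, 66]               # cognitive test (predictor)
supervisor_ratings = [3.1, 3.4, 4.0, 2.8, 4.5, 3.6]  # later ratings (criterion)

# The validity coefficient is the correlation between predictor and criterion.
r = pearson_r(test_scores, supervisor_ratings)
print(f"validity coefficient r = {r:.2f}")
```

The closer r is to 1, the better the test scores predict the criterion in this sample.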
Internal Validity
Internal validity refers to how well an experiment is done, especially whether it avoids confounding (more than one possible independent variable [cause] acting at the same time). The less chance for confounding in a study, the higher its internal validity is.
Temporal Validity
Refers to how far the time period in which a study was conducted affects its findings. For example, a study on attitudes conducted decades ago cannot be expected to have temporal validity, because attitudes in society shift quickly.
Face Validity
The degree to which a test looks as though it measures what it is supposed to measure.
Validity
The extent to which a research technique actually measures the behaviour it claims to measure.
Reliability
The extent to which the measurement of a particular behaviour is consistent.
Split-half Reliability
The split-half method assesses the internal consistency of a test, such as a psychometric test or questionnaire. It measures the extent to which all parts of the test contribute equally to what is being measured. This is done by comparing the results of one half of a test with the results from the other half. A test can be split in half in several ways, e.g. first half and second half, or odd-numbered and even-numbered items. If the two halves of the test provide similar results, this suggests that the test has internal reliability. The reliability of a test can be improved using this method. For example, any items on separate halves of a test which have a low correlation (e.g. r = .25) should either be removed or rewritten. The split-half method is a quick and easy way to establish reliability. However, it is only effective with large questionnaires in which all questions measure the same construct. This means it is not appropriate for tests which measure different constructs. For example, the Minnesota Multiphasic Personality Inventory has subscales measuring different behaviours such as depression, schizophrenia and social introversion. Therefore the split-half method would not be an appropriate way to assess the reliability of this personality test.
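A minimal sketch of the odd/even version of this comparison, assuming each participant's answers are stored as a list of item scores. The data and the function names `pearson_r` and `split_half_r` are hypothetical and exist only for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_r(responses):
    """Correlate each participant's odd-item total with their even-item total."""
    odd_totals = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)

# Hypothetical 1-5 ratings: five participants, six items on one construct.
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 5, 5, 4, 5, 5],
    [1, 2, 1, 1, 2, 2],
]

r = split_half_r(responses)
print(f"split-half correlation r = {r:.2f}")
```

A high correlation between the two halves suggests internal reliability. Real questionnaires would need far more participants and items than this toy data set.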
Test-Retest Reliability
The test-retest method assesses the external consistency of a test. Examples of appropriate tests include questionnaires and psychometric tests. It measures the stability of a test over time. A typical assessment would involve giving participants the same test on two separate occasions. If the same or similar results are obtained, external reliability is established. A disadvantage of the test-retest method is that it takes a long time for results to be obtained. The timing of the retest is also important: if the interval between the two tests is too brief, participants may recall information from the first test, which could bias the results. Alternatively, if the interval is too long, the participants could have changed in some important way, which could also bias the results.
Ecological Validity
The extent to which findings can be generalised to everyday life, i.e. the extent to which the task represents a real-world task. For example, memory studies that ask participants to remember lists of words usually have low ecological validity.
Inter-rater Reliability
The extent to which observers agree on the behaviour they observe. Agreement of 80% or more counts as high.
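As a rough illustration of the 80% figure, percentage agreement between two observers can be computed as below. The behaviour categories and observations are invented, and `percent_agreement` is a hypothetical helper name.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of observations on which two raters chose the same category."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must code the same number of observations")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)

# Hypothetical codings of the same ten observation intervals.
observer_1 = ["play", "aggression", "play", "rest", "play",
              "rest", "aggression", "play", "rest", "play"]
observer_2 = ["play", "aggression", "play", "rest", "rest",
              "rest", "aggression", "play", "rest", "play"]

agreement = percent_agreement(observer_1, observer_2)
is_high = agreement >= 80  # threshold taken from the 80% figure above
print(agreement, is_high)  # → 90.0 True
```

Here the observers disagree on one interval out of ten, so agreement is 90%, above the 80% threshold.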