Psychological Assessment (Reviewer)
Sources of error variance
1. Assessee 2. Assessor 3. Measuring instruments
The computation of a coefficient of split-half reliability generally entails three steps:
1. Divide the test into equivalent halves. 2. Calculate a Pearson r between scores on the two halves of the test. 3. Adjust the half-test reliability using the Spearman-Brown formula.
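The three steps can be sketched in Python. The scores and the odd-even split below are hypothetical, and the `pearson_r` and `spearman_brown` helpers are written out only for illustration:

```python
# Illustrative sketch of a split-half reliability computation.

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_brown(half_r):
    """Step 3: adjust the half-test correlation to full-test length."""
    return 2 * half_r / (1 + half_r)

# Step 1: scores on two equivalent halves (e.g., odd vs. even items)
odd_half  = [10, 12, 9, 14, 11, 13]
even_half = [11, 13, 9, 13, 10, 14]

r_half = pearson_r(odd_half, even_half)   # Step 2
r_full = spearman_brown(r_half)           # Step 3
```

Note that the adjusted full-test estimate is always higher than the half-test correlation (for positive correlations), reflecting the fact that longer tests tend to be more reliable.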
2 types of sampling procedure
1. Purposive sampling 2. Incidental sampling
Sources of error variance during test administration
1. Test environment, such as room temperature, level of lighting, and amount of ventilation and noise. 2. Test taker variables, such as pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication. 3. Formal learning experiences, casual life experiences, therapy, illness, and changes in mood or mental state. 4. Body weight can also be a source of error variance. 5. Examiner-related variables are potential sources of error variance.
Basic 3 approaches to the estimation of reliability
1. Test-retest 2. Alternate or parallel forms 3. Internal or inter-item consistency
Types of Norms
1. age norms 2. grade norms 3. national norms 4. national anchor norms 5. local norms 6. norms from a fixed reference group 7. subgroup norms 8. percentile norms
Validity
A test is considered valid for a particular purpose if it does what it purports to do: it measures what it purports to measure. A test of reaction time is a valid test if it accurately measures reaction time. A test of intelligence is a valid test if it truly measures intelligence.
Construct
An informed, scientific concept developed or constructed to describe or explain behavior. We can't see, hear, or touch constructs, but we can infer their existence from overt behavior.
Traits
Any distinguishable, relatively enduring way in which one individual varies from another.
Test Scores
Are always subject to questions about the degree to which the measurement process includes error.
Grade norms
Are developed by administering the test to representative samples of children over a range of consecutive grade levels.
Norm-referenced testing and assessment
A method of evaluation and a way of deriving meaning from test scores by evaluating an individual test taker's score and comparing it to the scores of a group of test takers.
Which of the following does not describe the definition of error? a) Refers to mistakes, miscalculations, and the like. b) Traditionally refers to something that is more than expected; it is a component of the measurement process. c) Refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test. d) Is always subject to questions about the degree to which the measurement process includes miscalculations.
D
Grade norms
Designed to indicate the average test performance of test takers in a given school grade.
States
Distinguish one person from another but are relatively less enduring.
Refer to mistakes, miscalculations and the like.
Error
Refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test.
Error
Traditionally refers to something that is more than expected; it is a component of the measurement process.
Error
decision study
In a _______________________, developers examine the usefulness of test scores in helping the test user make decisions.
average proportional distance (APD)
It is a measure that focuses on the degree of difference that exists between item scores.
average proportional distance (APD)
It is a measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores.
Domain sampling theory
A theory in which test reliability is conceived of as an objective measure of how precisely the test score assesses the domain from which the test draws a sample.
Test-retest reliability
It is appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time, such as a personality trait.
Variance
The standard deviation squared. A statistic useful in describing sources of test score variability is the ______________.
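As a quick numeric check of the definition (hypothetical scores; population formula shown):

```python
# Variance is the standard deviation squared (population formula).
scores = [8, 10, 12, 14, 16]          # hypothetical test scores
mean = sum(scores) / len(scores)
variance = sum((s - mean) ** 2 for s in scores) / len(scores)
sd = variance ** 0.5
# Here variance is 8.0, and squaring sd recovers it.
```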
Split-half estimate
One of the methods of obtaining internal consistency estimates of reliability.
Norms
Provide a standard with which the results of measurement can be compared.
Assumption 1
Psychological Traits and States Exist
Assumption 2
Psychological Traits and States can be Quantified and Measured
Categories of Measurement Error
Random error Systematic error
Systematic error
Refers to a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
language of generalizability theory
Stated in the _________________________, a generalizability study examines how much impact different facets of the universe have on the test score.
Sources of Error Variance
Test construction, administration, scoring, and/or interpretation
Assumption 3
Test-Related Behavior Predicts Non-Test-Related Behavior
Assumption 6
Testing and Assessment can be conducted in a fair and unbiased manner
Assumption 7
Testing and assessment benefit society
Assumption 4
Tests and other Measurement Techniques Have Strengths and Weaknesses
test score, cumulative scoring
The ______________ is presumed to represent the strength of the targeted ability or trait or state and is frequently based on _______________.
Classical test theory or true score theory
The assumption is made that each test taker has a true score on a test that would be obtained but for the action of measurement error.
Error variance
The component of a test score attributable to sources other than the trait or ability measured.
administering, scoring, interpretation
The criteria for a good test would include clear instructions for ___________________, ____________, and ________________.
Reliability
The criterion of __________________ involves the consistency of the measuring tool: the precision with which the test measures and the extent to which error is present in measurements. In theory, the perfectly reliable measuring tool consistently measures in the same way.
coefficient of equivalence.
The degree of relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability, which is often termed the ____________________.
Sampling
The process of selecting the portion of the universe deemed to be representative of the whole population.
A common goal of norm-referenced tests is to yield information on a test taker's standing or ranking relative to some comparison group of test takers.
True
A good test is a useful test, one that yields actionable results that will ultimately benefit individual test takers or society at large.
True
A good test is one that trained examiners can administer, score, and interpret with a minimum of difficulty.
True
A good test or more generally, a good measuring tool or procedure is reliable.
True
An estimate of the reliability of a test can be obtained without developing an alternate form of the test and without having to administer the test twice to some people.
True
Assumption 2. Psychological Traits and States can be Quantified and Measured. Measuring traits and states by means of a test entails developing not only appropriate test items but also appropriate ways to score the test and interpret the results.
True
Competent test users understand a great deal about the tests they use. They understand among other things, how a test was developed, the circumstances under which it is appropriate to administer the test, how the test should be administered and to whom, and how the test result should be interpreted.
True
Competent test users understand and appreciate the limitations of the tests they use as well as how those limitations might be compensated for by data from other sources.
True
In a world without tests, teachers and school administrators could arbitrarily place children in different types of special classes simply because that is where they believed the children belonged.
True
In a world without tests, there would be a great need for instruments to diagnose educational difficulties in reading and math and point the way to remediation.
True
In the context of IRT, discrimination signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured.
True
One source of fairness-related problems is the test user who attempts to use a particular test with people whose background and experience are different from the background and experience of people for whom the test was intended.
True
Patterns of answers to true-false questions on one widely used test of personality are used in decision making regarding mental disorders.
True
Scorers and scoring systems are potential sources of error variance.
True
Split-half reliability is also referred to as odd-even reliability.
True
Tests are said to be homogeneous if they contain items that measure a single trait.
True
The criteria for a good test would include clear instructions for administering, scoring, and interpretation. It would also seem to be a plus if a test offered economy in time and money it took to administer, score, and interpret it.
True
The extent to which a test taker's score is affected by the content sampled on a test and by the way the content is sampled is a source of error variance.
True
The greater the proportion of the total variance attributed to true variance, the more reliable the test.
True
The influence of particular facets on the test score is represented by coefficients of generalizability.
True
The simplest way of determining the degree of consistency among scorers in the scoring of a test is to calculate a coefficient of correlation: the coefficient of inter-scorer reliability.
True
The tasks in some tests mimic the actual behaviors that the test user is attempting to understand.
True
There are IRT models designed to handle data resulting from the administration of tests with dichotomous test items.
True
Today all major test publishers strive to develop instruments that are fair when used in strict accordance with guidelines in the test manual.
True
Assumption 5
Various Sources of Error Are Part of the Assessment Process
average proportional distance (APD)
a relatively new measure for evaluating the internal consistency of a test.
Criterion
a standard on which a judgment or decision may be based.
Fixed reference group scoring system
a system of scoring wherein the distribution of scores obtained on the test from one group of test takers is used as the basis for the calculation of test scores for future administrations.
Dichotomous test item
a test item or question that can be answered with only one of two response options such as true or false or yes-no.
Polytomous test items
a test item or question with three or more alternative responses where only one alternative is scored correct or scored as being consistent with a targeted trait or other construct.
Static characteristics
a trait, state, or ability presumed to be relatively unchanging over time; contrast with dynamic characteristics.
Norm-referenced
a way to derive meaning from a test score; an approach that evaluates the test score in relation to other scores on the same test.
The Spearman-Brown formula
allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.
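In its general form, the formula estimates the reliability of a test whose length is changed by a factor of n; a minimal sketch (the half-test correlation of .70 is hypothetical):

```python
def spearman_brown(r_original, n):
    """Estimated reliability of a test whose length is changed by a
    factor of n, given the reliability (or half-test correlation)
    of the original test."""
    return (n * r_original) / (1 + (n - 1) * r_original)

# The split-half case doubles the test length (n = 2):
estimate = spearman_brown(0.70, 2)   # roughly 0.82
```

Setting n = 2 recovers the familiar split-half adjustment; values of n below 1 estimate the effect of shortening a test.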
Age norms
also known as age-equivalent scores, indicate the average performance of different samples of test takers who were at various ages at the time the test was administered.
Criterion-referenced testing and assessment
also referred to as criterion-referenced or domain-referenced testing and assessment. A method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard; contrast with norm-referenced testing and assessment.
Restriction or inflation of range
also referred to as inflation of variance; a phenomenon associated with reliability estimates wherein the variance of either variable in a correlational analysis is inflated by the sampling procedure used, so the resulting correlation coefficient tends to be higher; contrast with restriction of range.
National anchor norms
an equivalency table for scores on two nationally standardized tests designed to measure the same thing.
Internal consistency estimates of reliability
an estimate of the reliability of a test obtained from a measure of inter-item consistency.
Item response theory (IRT)
another alternative to the true score model. It is also referred to as latent-trait theory or the latent-trait model: a system of assumptions about measurement and the extent to which each test item measures the trait.
National norms
are derived from a normative sample that was nationally representative of the population at the time the norming study was conducted.
Alternate forms
are simply different versions of a test that have been constructed so as to be parallel.
Percentile norms
are the raw data from a test's standardization sample converted to percentile form.
Norm in the psychometric context
are the test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individual test scores.
True score
a value that, according to classical test theory, genuinely reflects an individual's ability level as measured by a particular test.
Heterogeneity
describes the degree to which a test measures different factors.
coefficient alpha
developed by Cronbach and elaborated on by others
Generalizability study
examines how generalizable scores from a particular test are if the test is administered in different situations.
Parallel forms of a test
exist when for each form of the test the means and the variances of observed test scores are equal.
Speed tests
generally contain items of uniform level of difficulty so that when given generous time limits all test takers should be able to complete all the test items correctly.
Random error
is a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.
dynamic characteristic
is a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences.
Classical test theory
is also referred to as the true score model of measurement. It is the most widely used and accepted model in the psychometric literature today.
Test-retest reliability
is an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
Percentile
is an expression of the percentage of people whose score on a test or measure falls below a particular raw score. It is a converted score that refers to a percentage of test takers.
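The definition translates directly into code; the score distribution below is hypothetical:

```python
def percentile_rank(raw_score, distribution):
    """Percentage of scores in the distribution falling below raw_score."""
    below = sum(1 for s in distribution if s < raw_score)
    return 100 * below / len(distribution)

sample_scores = [55, 60, 62, 70, 75, 80, 85, 90, 92, 98]
percentile_rank(85, sample_scores)   # 6 of 10 scores fall below 85 -> 60.0
```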
Heterogeneous test
is composed of items that measure more than one trait.
Criterion-referenced tests
are designed to provide an indication of where a test taker stands with respect to some variable or criterion, such as an educational or a vocational objective.
Split-half reliability
is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
Item sampling or content sampling
is one source of variance during test construction. It refers to the variation among items within a test as well as to the variation among items between tests.
Stratified-random sampling
is the process of developing a sample based on specific subgroups of a population in which every member has the same chance of being included in the sample.
Stratified sampling
is the process of developing a sample based on specific subgroups of a population.
Fixed reference group
is used as the basis for the calculation of test scores for future administration of the test.
Power tests
are tests in which the time limit is long enough to allow test takers to attempt all items, and some items are so difficult that no test taker is able to obtain a perfect score.
Coefficient of stability
When the interval between testings is greater than six months, the estimate of test-retest reliability is often referred to as the _________.
Criterion-referenced testing and assessment
may be defined as a method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard.
Subgroup norms
norms for any defined group within a larger group.
Local norms
provide normative information with respect to the local population's performance on some test.
Norm
refers to behavior that is usual, average, normal, standard, expected, or typical.
Incidental sampling
also referred to as convenience sampling; the process of arbitrarily selecting some people to be part of a sample because they are readily available, not because they are most representative of the population being studied.
Parallel forms reliability
refers to an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.
Alternate forms reliability
refers to an estimate of the extent to which these different forms of the same test have been affected by item sampling error or other error.
Overt behavior
refers to an observable action or product of an observable action.
Measurement error
refers to collectively, all of the factors associated with the process of measuring some variable, other than the variable being measured.
Error
refers to the component of the observed test score that does not have to do with the test taker's ability.
Inter-item consistency
refers to the degree of correlation among all the items on a scale. A measure of inter-item consistency is calculated from a single administration of a single form of a test.
Percentage correct
refers to the distribution of raw scores; more specifically, to the number of items that were answered correctly multiplied by 100 and divided by the total number of items.
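The arithmetic in the definition, as a one-line helper (the example numbers are hypothetical):

```python
def percentage_correct(num_correct, total_items):
    """Items answered correctly, multiplied by 100, divided by total items."""
    return num_correct * 100 / total_items

percentage_correct(42, 60)   # -> 70.0
```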
Reliability
refers to the proportion of the total variance attributed to true variance
Domain sampling theory
seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score.
Sampling
Test developers have targeted some defined group as the population for which the test is designed.
Classical test theory
states that a score on an ability test is presumed to reflect not only the test taker's true score on the ability being measured but also error.
generalizability
A ______________ study examines how much impact different facets of the universe have on the test score.
Purposive sampling
the arbitrary selection of people to be part of a sample because they are thought to be representative of the population being studied.
Equipercentile method
the equivalency of scores on different tests is calculated with reference to corresponding percentile score.
Coefficient alpha
the mean of all possible split-half correlations.
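In practice, coefficient alpha is usually computed not by averaging every split-half correlation but with the equivalent standard formula, alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance). A minimal sketch with a hypothetical item-by-person score matrix:

```python
# Minimal sketch of Cronbach's coefficient alpha; the item scores
# below are hypothetical (3 items answered by 5 test takers).

def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one inner list of scores per item, aligned by test taker."""
    k = len(items)
    totals = [sum(col) for col in zip(*items)]   # each person's total score
    item_var_sum = sum(variance(it) for it in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

items = [
    [1, 2, 2, 3, 3],
    [1, 1, 2, 3, 3],
    [2, 2, 2, 3, 3],
]
alpha = cronbach_alpha(items)
```

When all items produce identical score patterns, the formula returns 1.0, the theoretical ceiling for a perfectly internally consistent test.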
Standardization or test standardization
the process of administering a test to a representative sample of test takers for the purpose of establishing norms.
Error variance
variance from irrelevant, random sources
true variance
variance from true differences