PSYC 442: Exam 2
The Wechsler Tests
A series of individually administered intelligence tests used to assess the intellectual abilities of people from preschool through adulthood
Test blueprint
a plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, and the organization of the items in the test. This outlines the construct.
Confidence interval
a range or band of test scores that is likely to contain the true score
Unidimensional
some rating scales are ________________________, meaning that only one dimension is presumed to underlie the rating; the scale measures a single characteristic or construct (one thing)
Test Administration
sources of error may stem from the testing environment
Kuder-Richardson formula 20
statistic of choice for determining the inter-item consistency of dichotomous items, or when tests have considerable heterogeneity due to multiple factors. Used when the data are scored in a dichotomous way instead of a continuous way.
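The KR-20 statistic described above can be sketched in Python. The response matrix below is made-up illustration data (rows are testtakers, columns are dichotomous items); the formula is KR-20 = (k/(k-1))(1 - Σpq/σ²), with σ² computed here as a population variance.

```python
# Hedged sketch: KR-20 for dichotomous (0/1) item responses.
# The data matrix is invented purely for illustration.
scores = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
]

k = len(scores[0])   # number of items
n = len(scores)      # number of testtakers

# p = proportion passing each item; q = 1 - p
p = [sum(row[j] for row in scores) / n for j in range(k)]
pq_sum = sum(pj * (1 - pj) for pj in p)

# variance of the total scores (population formula)
totals = [sum(row) for row in scores]
mean_total = sum(totals) / n
var_total = sum((t - mean_total) ** 2 for t in totals) / n

kr20 = (k / (k - 1)) * (1 - pq_sum / var_total)
print(round(kr20, 3))
```

Like Cronbach's alpha, values closer to 1 indicate greater inter-item consistency.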
Other sources of error variance
surveys and polls usually contain some disclaimer as to the margin of error associated with their findings
Spearman
Postulated the existence of general intellectual ability factor (g) and specific factors of intelligence (s)
physical appearance and demeanor may play a role.
What are examiner variables for test administration?
a test question that has no right answer, or a wrong answer entered in an answer key
What are examples of systematic error?
Factor analysis
a new test should load on a common factor with other tests of the same construct
rating error
judgment resulting from the intentional or unintentional misuse of a rating scale
Sir Francis Galton
The first person to publish on the heritability of intelligence.
A likert type scale
will typically have 5 (sometimes 7) options
face validity, confidence
A perceived lack of ___________________________ may lead to a lack of ___________________________ in the test measuring what it purports to measure.
Item bank
A relatively large and easily accessible collection of test questions
test scoring and interpretation
- Computer testing reduces error in test scoring, but many tests still require expert interpretation (e.g., projective tests)
- Subjectivity in scoring can enter into behavioral assessment
The WAIS-IV
-Contains 10 core subtests:
-->Block Design, Similarities, Digit Span, Matrix Reasoning, Vocabulary, Arithmetic, Symbol Search, Visual Puzzles, Information, and Coding
-Five supplemental subtests:
-->Letter-Number Sequencing, Figure Weights, Comprehension, Cancellation, and Picture Completion
-4 index scores derived from groups of subtests:
-->Verbal Comprehension: Similarities, Vocabulary, Information, (Comprehension)
-->Perceptual Reasoning: Block Design, Matrix Reasoning, Visual Puzzles, (Figure Weights)
-->Working Memory: Digit Span, Arithmetic, (Letter-Number Sequencing)
-->Processing Speed: Symbol Search, Coding, (Cancellation)
.5
50%, so half of testtakers get the item right and half get it wrong (or half agree with the statement and half do not)
Validity coefficient
A correlation that provides a measure of the relationship between test scores and scores on the criterion measure. Validity coefficients are affected by restriction or inflation of range.
Bias
A factor inherent in a test that systematically prevents accurate, impartial measurement.
Item characteristic curves
A graphic representation of item difficulty and discrimination
Lower
A greater average proportional distance means that the internal consistency is what?
Content validity
A judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample. How well do the items or inner workings of the test measure what they are supposed to be measuring? Do the test items adequately represent the content that should be included in the test?
Intelligence
A multifaceted capacity that includes the abilities to:
- Acquire and apply knowledge
- Reason logically, plan effectively, and infer perceptively
- Grasp and visualize concepts
- Find the right words and thoughts with facility
- Cope with and adjust to novel situations
equivalency in items
A negative to an item pool is that it can be low in
Random error (noise)
A source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process. Stuff that comes in randomly that can affect the true score and cause error.
Supplemental subtest
A subtest administered to provide additional clinical information or to extend the number of abilities/processes sampled
True score
A value that according to classical test theory genuinely reflects an individual's ability (or trait) level as measured by a particular test.
ordinal level data
All rating scales are what?
Item-validity index
Allows test developers to evaluate the validity of items in relation to a criterion measure. Does the item measure what it purports to measure? Remember, though, this is at the item level.
An index of the item's difficulty
An index of the item's reliability
An index of the item's validity
An index of the item's discrimination
Among the tools test developers might employ to analyze and select items are:
Computerized adaptive testing
An interactive, computer-administered test taking process wherein items presented to the testtaker are based in part on the testtaker's performance on previous items
Norm referenced
Answers are put in a distribution.
Group administration
Army Alpha test
Army Beta test
School ability test
California Test of Mental Maturity
Kuhlmann-Anderson Intelligence Tests
Henmon-Nelson Tests of Mental Ability
Cognitive Abilities Tests
increases
As error goes down what happens to the amount of true score and reliability?
decreases
As error goes up what happens to the amount of true score and reliability?
decreases
As error variance goes up what happens to the true variance and reliability?
systematic variation
Bias implies what in test scores
Carroll
Came up with a three stratum theory of cognitive ability
Method of equal-appearing intervals
Can be used to obtain data that are interval in nature. You are attempting to show that your scale has a set interval between scale points.
split-half correlations
Cronbach's alpha is the average of every possible what?
reduce
Computer-administered testing tends to ____________ floor effects and ceiling effects
David Wechsler
Conceptualized intelligence as "the aggregate...capacity of the individual to act purposefully, to think rationally, and to deal effectively with his environment. It [is] composed of elements or abilities which...are qualitatively differentiable"
dimensional constructs
Cumulatively scored tests are helpful when measuring what?
Heterogeneity
Describes the degree to which a test measures different factors.
Horn and Cattell
Developed a theory of intelligence postulating the existence of two major cognitive abilities: crystallized intelligence and fluid intelligence
mastered not mastered
Development of a criterion-referenced test may entail exploratory work with at least two groups of testtakers: one group known to have __________________________ the knowledge or skill being measured and another group that has ________________________ it.
Alfred Binet
Did not define intelligence explicitly but instead described various components of intelligence, including reasoning, judgment, memory, and abstraction. Criticized Galton's approach to intellectual assessment and instead called for more complex measurements of intellectual ability.
Alternative forms
Different versions of a test that have been constructed so as to be parallel. They do not meet the strict requirements of parallel forms, but typically item content and difficulty are similar between tests.
Likert scale
Each item presents the testtaker with five alternative responses (sometimes seven), usually on an agree-disagree or approve-disapprove continuum. This is a type of rating scale; they are typically reliable.
Second stratum
Eight abilities and processes including fluid intelligence, crystallized intelligence, general memory and learning, broad visual perception, broad auditory perception, broad retrieval capacity, broad cognitive speediness, and processing speed
Factor-analytic theories of intelligence
Focus squarely on identifying the ability or groups of abilities deemed to constitute intelligence
Jean Piaget
Focused his research on the development of cognitive abilities in children. Defined intelligence as an evolving biological adaptation to the outside world; as a consequence of interaction with the environment, psychological structures become reorganized.
Average proportion distance
For internal consistency reliability. Focuses on the degree of difference between scores on test items. Involves averaging the absolute differences between scores on all of the items, then dividing by the number of response options on the test minus one (7 response options means divide by 6).
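The average proportional distance procedure described above can be sketched in Python. The ratings matrix and the 7-point scale are assumptions for illustration; each testtaker's pairwise absolute item differences are averaged and divided by (options - 1), then averaged across testtakers.

```python
from itertools import combinations

# Hedged sketch of average proportional distance (APD); data is invented.
# Rows = testtakers; items scored on a 7-point scale, so divide by 6.
ratings = [
    [5, 6, 5, 7, 6],
    [2, 3, 2, 2, 3],
    [6, 6, 7, 6, 5],
]
options = 7

def apd_for_person(items, options):
    # average absolute difference across every pair of item scores
    diffs = [abs(a - b) for a, b in combinations(items, 2)]
    return (sum(diffs) / len(diffs)) / (options - 1)

apd = sum(apd_for_person(r, options) for r in ratings) / len(ratings)
print(round(apd, 3))
```

A value closer to 0 means the items are scored more alike, which corresponds to higher internal consistency (a greater APD means lower internal consistency, as noted above).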
Low moderate high high, cut scores
For item characteristic curves:
On Item A, testtakers of _____ ability tend to do better.
On Item B, testtakers of ______________ ability tend to do better.
On Item C, testtakers of ___________ ability tend to do better; it is a good item.
Item D shows a _________ level of discrimination. It might be good if ______ __________ are being used.
.5 .3-.8
For maximum discrimination among the abilities of the testtakers, the optimal average item difficulty is approximately ____, with individual items on the test ranging in difficulty from about ___ to _____.
Multiple-choice
Format that has three elements
Spearman-brown formula
Formula allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.
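The Spearman-Brown formula can be written as a one-line Python function. The function name and the example correlation of .70 are illustrative; n is the factor by which test length changes (n = 2 estimates full-test reliability from a half-test correlation).

```python
# Hedged sketch: Spearman-Brown prophecy formula.
# r is the reliability (e.g., the correlation between two test halves);
# n is the factor by which the test's length is changed.
def spearman_brown(r, n=2):
    return (n * r) / (1 + (n - 1) * r)

# e.g., halves correlating .70 imply full-test reliability of about .82
print(round(spearman_brown(0.70), 2))
```

The same function also estimates what reliability would become if a test were shortened (n < 1) or lengthened (n > 2).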
sensory abilities
Galton believed that the most intelligent persons were equipped with the best what? By such logic, tests of visual acuity or hearing ability are measures of intelligence.
Sensorimotor and perception
Galton developed many ______________________ and _______________________ related tests by which he attempted to measure his definition of intelligence.
high, correctly; low, incorrectly
Generally, a good item on a norm-referenced achievement test is an item for which ____________________ scorers on the test respond ____________________ and ____________________ scorers on the test respond ____________________.
higher, lower
Generally, the _________________ the reliability of the test, the _________________ the standard error.
Higher internal consistency
A greater Cronbach's alpha means what?
reducing error
How can you improve reliability in a test?
administering two forms of a test to the same group
How is reliability checked for parallel-forms and alternate-forms tests?
5-10
How many respondents should there be per item?
met certain criteria
Ideally, each item on a criterion-oriented test addresses the issue of whether the respondent has what?
theorized
If a test is a valid measure of a construct, higher and lower scorers should behave as ____________________?
strengths and weaknesses; eliminated
Items are evaluated as to their ____________________. Some items may be ____________________.
Item format
Includes variables such as the form, plan, structure, arrangement, and layout of individual test items
Item Discrimination index
Indicates how adequately an item separates or discriminates between high scorers and low scorers on an entire test. At the item level, does it differentiate between people who meet a certain standard and people who do not? A function of the item-reliability index and the item-validity index. A measure of the difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly.
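The difference-of-proportions version of the discrimination index above reduces to simple arithmetic. The group counts below are invented for illustration (e.g., extreme groups of 25 testtakers each).

```python
# Hedged sketch of the item-discrimination index d = p_upper - p_lower:
# the proportion of high scorers passing an item minus the proportion of
# low scorers passing it. All counts here are illustrative.
upper_correct = 23   # high scorers who answered the item correctly
lower_correct = 9    # low scorers who answered the item correctly
group_size = 25      # testtakers in each extreme group

d = (upper_correct - lower_correct) / group_size
print(d)  # d near +1 = strong discrimination; near 0 or negative = poor item
```

A negative d would mean low scorers pass the item more often than high scorers, a sign the item (or its key) needs revision.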
Item reliability index
Indication of the internal consistency of the scale. Function of the item-score SDs and the correlation between the item score and the total score. How related each item is to each other item on the test
First stratum
Individualized factors linked to each of the second stratum abilities E.g., general reasoning, quantitative reasoning, and Piagetian reasoning are linked to fluid intelligence (Gf)
In infancy
Intellectual assessment consists of measuring sensorimotor development
In older children
Intellectual assessment focuses on verbal and performance abilities
clinically relevant information or learning potential
Intelligence tests are rarely administered to adults for purposes of education placement, but rather to ascertain what?
Speed tests
Item analyses of tests taken under speed conditions yield misleading or uninterpretable results. The closer an item is to the end of the test, the more difficult it may appear to be.
culture, culture-free
Items on an intelligence test tend to reflect the ______________ of the society where the test is employed, and thus many theorists have expressed a desire to develop ____________________ intelligence tests
Guttman Scale
Items range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured. The lowest item is interpreted as the baseline; if you endorse something higher up, you also subscribe to everything underneath it. All respondents who agree with the stronger statement of the attitude will also agree with milder statements.
Constructed-response format
Items require testtakers to supply or to create the correct answer, not merely select it. ex: essay question
individual administration
Kaufman Adolescent and Adult Intelligence Test (KAIT) Kaufman Brief Intelligence Test (K-BIT) Kaufman Assessment Battery for Children (K-ABC)
visual spatialization; language skill-related tasks
Males tend to outperform females on tasks requiring ______________ __________________, while females tend to excel at _________________ ____________ ________________ ___________
1. tests are homogeneous or heterogeneous by nature
2. the characteristic, trait, or ability being measured is presumed to be dynamic or static
3. the range of test scores is or is not restricted
The nature of the test will determine the reliability metric by what three things?
norms, standardized
Once a test has been finalized, _____________ may be developed from the data and it is said to be __________________________
Multidimensional
Other rating scales are ________________________, meaning that more than one dimension is thought to underlie the rating; more than one thing underlies the construct and its characteristics
age
Previous versions of the Stanford-Binet Intelligence Scale organized the items by what, at which most testtakers should be able to respond correctly? The later change in organization was theory driven, based in part on the Cattell-Horn model of intelligence.
low
Projective tests, such as the Rorschach, tend to be ____________ in face validity.
The Wechsler-Bellevue Scale
Provided the calculation of a verbal IQ and a Performance IQ
Item-response theory
Provides a way to model the probability that a person with X ability will be able to perform at a level of Y. Refers to a family of methods and techniques. Incorporates considerations of item difficulty and discrimination; difficulty relates to an item not being easily accomplished, solved, or comprehended.
total variance attributed to true variance
Reliability is the proportion of the total ___________________ attributed to true ___________________.
males and females
Research has examined the differences between ____________ and __________ with regard to cognitive, motor, and other abilities related to intelligence
Class scoring
Responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is presumably similar in some way. Fits best with criterion-based constructs.
standardized conditions
Revised tests will then be administered under _________________ ____________________ to a second sample
New test development
Revision is a step in what process?
Types of scales
Scales are instruments to measure some trait, state or ability. May be categorized in many ways. Numbers can be assigned to responses to calculate test scores using a number of methods
Evidence of distinct groups
Scores on a test vary in a predictable way as a function of membership in some group.
high
Self-report personality tests are _____________ in face validity
final product
Should be administered in the same manner, and have the same instructions, as the what?
Validity
a judgment or estimate of how well a test measures what it purports to measure in a particular context. To see how well the test measures the construct it is supposed to measure.
Expectancy table
Shows the percentage of people within specified test-score intervals who subsequently were placed in various categories of the criterion. ex: in a corporate setting, test scores may be divided into intervals and examined in relation to job performance. Shows us that the higher the initial rating, the greater the probability of job success.
Evidence of changes with age
Some constructs are expected to change over time
replaced
Some items may be _________________ by others from the item pool
What is the test designed to measure?
What is the objective of the test?
Is there a need for this test?
Who will use this test?
Who will take this test?
What content will the test cover?
How will the test be administered?
What is the ideal format of the test?
Should more than one form of the test be developed?
What special training will be required of test users for administering or interpreting the test?
What types of responses will be required of testtakers?
Who benefits from an administration of this test?
Is there any potential for harm as the result of an administration of this test?
How will meaning be attributed to scores on this test?
Some preliminary questions for test construction:
deviates from a true score
Standard error can be used to estimate the extent to which an observed score does what?
10 subtests; each has a mean of 10 and SD of 3
Subtests have a mean and SD of what, and how many are there?
Cattell-horn and Carroll model of cognitive ability
Synthesis of both theories
pilot studied
Test items may be ___________ _________________ to evaluate whether they should be included in the final form of the instrument
Same
Test should be tried out on the _______ population that it was designed for.
Method of paired comparisons
Test-takers must choose between two alternatives according to some rule; you have to choose one or the other. We attach our own theories to these questions, saying that if a person selects a certain behavior then they may have certain characteristics.
ex: select the behavior that you think would be more justified:
a. cheating on taxes if one has a chance
b. accepting a bribe in the course of one's duties
For each pair of options, testtakers receive a higher score for selecting the option deemed more justifiable by the majority of a group of judges. The test score reflects the number of times the choices of a testtaker agreed with those of the judges.
core supplemental
The Wechsler Adult Intelligence Scale, 4th ed., consists of subtests that are designated as either ___________ or _______________
Ceiling
The ________________ is the hardest item on the test and the highest score you can obtain
Floor
The ___________________ on a test is the easiest item or the lowest amount you can score
cultures and time
The content validity of a test varies across
item difficulty index
The proportion of respondents answering an item correctly
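The item difficulty index is just a proportion, which can be sketched in two lines of Python. The 0/1 responses below are invented for illustration.

```python
# Hedged sketch: item difficulty index p = proportion answering correctly.
# responses is an illustrative list of 0/1 scores for a single item.
responses = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

p = sum(responses) / len(responses)
print(p)  # higher p = easier item; p near .5 maximizes discrimination
```

Note that despite the name, a higher index means an easier item, since more testtakers pass it.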
broader
The fifth edition of the Stanford-Binet Intelligence Scale was designed for administration to ages 2-85, a much _______________ range than many/most intelligence tests. The test yields composite scores including a full scale IQ, abbreviated battery score, verbal IQ score, and nonverbal IQ score.
Mean of 100, SD of 15; it is standardized
The full scale IQ and five factor index scores have a mean and SD of what, and it is _____________________?
intelligence
The greater the magnitude of g in a test of intelligence, the better the overall prediction of what?
Test conceptualization
The impetus for developing a new test is some thought that "there ought to be a test for...." The stimulus could be knowledge of psychometric problems with other tests, a new social phenomenon, or any number of things.
Interactionism
The mechanism by which heredity and environment are presumed to interact and influence the development of intelligence
Test developer
The nature of the item analysis will vary depending on the goals of who?
Validation
The process of gathering and evaluating evidence about validity. Examining and providing evidence for or against your test measuring what it is supposed to measure.
Flynn effect
The progressive rise in intelligence test scores that is expected to occur on a normed intelligence test from the date when the test was first normed
Ratio IQ
The Stanford-Binet Intelligence Scale has this; it is the ratio of the testtaker's mental age to his or her chronological age, multiplied by 100 to eliminate decimals.
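The ratio IQ formula above can be sketched directly. The function name and the sample ages are illustrative.

```python
# Hedged sketch: ratio IQ = (mental age / chronological age) * 100.
def ratio_iq(mental_age, chronological_age):
    return (mental_age / chronological_age) * 100

# a 10-year-old performing at the level of a typical 12-year-old
print(ratio_iq(12, 10))
```

When mental age equals chronological age the ratio IQ is exactly 100, which is why 100 anchors the scale.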
alternate items
The Stanford-Binet Intelligence Scale was the first to introduce what?
Deviation IQ
The Stanford-Binet intelligence scale has this; it is a comparison of the performance of the individual with the performance of others of the same age in the standardization sample. Deviation from the norm; this is a norm-referenced measure of intelligence.
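A deviation IQ can be sketched as a rescaled z-score within the testtaker's age group. The function name and the raw-score values are assumptions for illustration; the mean of 100 and SD of 15 match the IQ metric described elsewhere in this guide.

```python
# Hedged sketch: a deviation IQ rescales a within-age-group z-score
# onto a distribution with mean 100 and SD 15.
def deviation_iq(raw, age_group_mean, age_group_sd, mean=100, sd=15):
    z = (raw - age_group_mean) / age_group_sd
    return mean + z * sd

# a raw score 1.5 SDs above same-age peers
print(deviation_iq(65, 50, 10))
```

Unlike the ratio IQ, this comparison stays meaningful in adulthood, where "mental age" stops growing with chronological age.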
nominal
The Stanford-Binet full scale score can be converted into __________________ categories designated by cutoff boundaries for quick reference
raw score standard score
On the Stanford-Binet Intelligence Scale, 5th ed., scores on individual items for each subtest are tallied to yield a _____________ ___________, and these are then converted using the test manual to a ______________ ______________________
full scale IQ, five factor index scores, subtest scores
The Stanford-Binet intelligence scales include what scores?
Qualitatively differentiable
Wechsler said that the best way to measure intelligence was by measuring several _______________________ _______________________ abilities, which were verbal or performance-based in nature.
criterion (some existing standard) related validity
This is a measure of validity obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures. Does the test map onto some existing criterion/standard?
Construct validity
This is a measure of validity arrived at by executing a comprehensive analysis of how scores on the test relate to other test scores and measures, and how scores on the test can be understood within some theoretical framework for understanding the construct that the test was designed to measure. Does the test measure a construct that does not really exist tangibly? Subsumes content validity and criterion-related validity.
Coefficient alpha (Cronbach's alpha)
Used with internal consistency reliability. The mean of all possible split-half correlations, corrected by the Spearman-Brown formula. The most popular approach for internal consistency.
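Coefficient alpha is usually computed from item variances rather than by literally averaging all split-half correlations. A sketch using the standard variance formula, α = (k/(k-1))(1 - Σσ²ᵢ/σ²ₜ), with an invented item-score matrix:

```python
# Hedged sketch: coefficient alpha from an item-score matrix
# (rows = testtakers, columns = items); the data is illustrative.
scores = [
    [4, 5, 4, 5],
    [2, 3, 3, 2],
    [5, 5, 4, 4],
    [3, 3, 2, 3],
]

def variance(xs):
    # population variance
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

k = len(scores[0])
item_vars = [variance([row[j] for row in scores]) for j in range(k)]
total_var = variance([sum(row) for row in scores])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 3))
```

With dichotomous (0/1) items this formula reduces to KR-20.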
Variance
Variation across scores; the distribution
1. evidence of homogeneity
2. evidence of changes with age
3. evidence of pretest/posttest changes
4. evidence of distinct groups
What are 4 evidences of construct validity?
- Equivalence of items on a test (whether or not interval scaling holds)
- Greatly affected by the length of tests (assumptions in classical test theory favor longer tests)
What are some problems with classical test theory?
bias, rater error
What are some things that affect the constructed-response format?
pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication
What are testtaker variables for test administration?
test conceptualization, test construction, test tryout, item analysis, revision
What are the 5 steps of test development?
1. stem 2. correct answer 3. foil or distractor
What are the three elements of multiple choice?
1. Divide the test into equivalent halves 2. Calculate a Pearson r between scores on the two halves of the test 3. Adjust the half-test reliability using the Spearman- Brown Formula
What are the three steps for split half reliability
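The three split-half steps above can be sketched end-to-end in Python. The item matrix is invented; the split here is odd-even (one common way to form equivalent halves), and the Pearson r is hand-rolled to keep the sketch self-contained.

```python
# Hedged sketch of the three split-half steps: (1) divide the test into
# halves, (2) Pearson r between half scores, (3) Spearman-Brown adjust.
# The 0/1 response matrix is illustrative only.
scores = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 1],
]

# Step 1: odd-even split of each testtaker's items
half_a = [sum(row[0::2]) for row in scores]
half_b = [sum(row[1::2]) for row in scores]

# Step 2: Pearson r between the two half-test scores
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r_half = pearson(half_a, half_b)

# Step 3: Spearman-Brown correction back to full test length
r_full = (2 * r_half) / (1 + r_half)
print(round(r_full, 3))
```

An odd-even split is preferred over a first-half/second-half split because fatigue and item ordering by difficulty would otherwise make the halves non-equivalent.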
scaling method, writing items, scoring items
What are three aspects of test construction?
random and systematic error
What are two types of measurement error
Cronbach's alpha
What coefficient do you use for internal consistency reliability?
Coefficient of equivalence
What coefficient do you use for parallel or alternative forms?
because splitting the test reduces the number of items, so the correlation between the halves underestimates full-test reliability; the formula adjusts the estimate back up to full length
Why do you adjust the half-test reliability using the Spearman-Brown formula?
the reliability and relatedness of items to each other on the test
What does factor analysis look at?
True variance plus error variance
What does variance equal?
a reliable and valid item that discriminates among testtakers; this matters during the revision process for deciding what should be revised or taken out
What is a good item?
+1
What is a perfect correlation?
noise outside, knocking on the door, people getting up, leaving during a test
What is an example of random error?
"i am not trying to trick you with this question"
What is an example of an examiner variable in test administration?
0 to 1; 1 means the items are perfectly related, 0 means no relation at all
What is the value range for Cronbach's alpha?
Classical test theory
What is true-score model often referred to as?
Classical test theory
What theory states that the observed score = true score + error
as time passes
When do the estimates of test-retest reliability tend to decrease?
Variables are stable over time
When is it most appropriate to use test-retest reliability?
distinct processes; inseparable abilities
While Galton argued that intelligence consisted of ________________ __________________ that could be assessed only by individual tests, Binet viewed intelligence as _________________ _________________ that required complex measurements to determine
Raters
Who may be either too lenient, too severe, or reluctant to give ratings at the extremes (central tendency error)?
test developers, test users
Who plays a role in the validation of a test?
Because the data are categorical rankings; you do not know by how much respondents differ. You cannot determine the distance between strongly disagree and strongly agree. We assume the intervals are equal, but we cannot truly determine whether they are.
Why are rating scales ordinal level data?
To make sure that test A and test B both show improvement and that they consistently show the same (or improved) results from an intervention. This way we do not have to give the same test twice, but both forms test the same information, like giving an essay test and a multiple-choice test.
Why would we want to develop alternate forms of a test?
unrelated
With average proportion distance you are looking at how _______________ each item is to other items.
coefficient of stability; a correlation
With intervals over 6 months, the estimate of test-retest reliability is called what, and what coefficient would you use?
Factor analysis
a group of statistical techniques designed to determine the existence of underlying relationships between sets of variables
Rating scales
a grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the testtaker
Face validity
a judgment concerning how relevant the test items appear to be
Systematic error
a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured. This type of error is consistent across all tests and administrations.
Core subtest
a subtest administered to obtain a composite score
halo effect
a tendency to give a particular person a higher rating than he or she objectively deserves because of a favorable overall impression
Short forms
a test that has been abbreviated in length, typically to reduce the time needed for administration, scoring, and interpretation. It is suggested that these be used for screening purposes rather than to make placement or educational decisions.
Measurement error
all of the factors associated with the process of measuring some variable, other than the variable being measured; error that arises when the information being captured by the test is measured incorrectly
test-retest reliability
an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
Process score
an index designed to help understand the way the testtaker processes various kinds of information
reliability coefficient
an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance
Concurrent validity
an index of the degree to which a test score is related to some criterion measure obtained at the same time (concurrently); this has to do with timing
Predictive validity
an index of the degree to which a test score predicts some criterion, or outcome, measure in the future. Tests are evaluated as to their predictive validity. It is associated with future criterion.
item sampling
the particular items or questions that end up on a test (e.g., which questions from the first chapter of the course are included)
Cumulatively scored test
assumption that the higher the score on the test, the higher the testtaker is on ability, trait, or other characteristic that the test purports to measure.
Factor analysis
can also provide an indication of whether items that are supposed to be measuring the same thing load on a common factor
Reliability
consistency of measurement
Culture-fair
culture free intelligence tests are difficult if not impossible to create, and thus __________________ intelligence tests began to be developed
Criterion referenced test
Do people meet or exceed the standard, or do they not meet it?
Measuring intelligence
entails sampling an examinee's performance on different types of tests and tasks as a function of developmental level
alternate forms reliability
estimate of the extent to which different forms of the same test have been affected by item sampling error or other error. Scores may be affected by error related to the state of testtaker or item sampling.
Parallel-forms
the mean, median, mode, and standard deviation are the same, and the distribution is the same; the same format and administration
Parallel-forms
for each form of the test, the means and the variances of observed test scores are equal.
Top stratum
general intelligence
Evidence of homogeneity
how uniform a test is in measuring a single concept
Crystallized intelligence
includes acquired skills and knowledge that are dependent on exposure to a particular culture as well as on formal and informal education. Learned, culturally derived intelligence: language, knowledge, vocabulary.
WPPSI-III
includes several subtests, including matrix reasoning, symbol search, word reasoning, and picture concepts
Adult
intelligence scales should tap abilities such as general information retention, quantitative reasoning, expressive language, and social judgment
Methodological error
interviewers may not be trained properly, the wording in the questionnaire may be ambiguous, or the items may be biased
split-half reliability
is obtained by correlating pairs of scores from equivalent halves of a single test administered once
Criterion
is the standard against which a test or a test score is evaluated. An adequate criterion is relevant for the matter at hand, valid for the purpose for which it is being used, and uncontaminated, meaning it is not part of the predictor. Almost anything can constitute a criterion.
selected response format
items require testtakers to select a response from a set of alternative responses; you select among given options. ex: multiple choice
greater reliability
less error variance is associated with what?
less reliability
more error variance is associated with what?
Fluid intelligence
nonverbal, relatively culture-free, and independent of specific instruction. Is genetically based and not influenced by learning, culture, and context; inherited abilities, processing speed.
the true score plus error
observed score
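Classical test theory treats each observed score as X = T + E, a true score plus random error. The toy simulation below (all values made up) shows the implication: over many measurements, error averages toward zero, so the mean observed score approaches the true score.

```python
# Toy illustration of classical test theory's X = T + E: observed
# scores simulated as a fixed true score plus random, zero-mean error.
# The true score and error spread are invented for the example.
import random

random.seed(0)                       # reproducible illustration
true_score = 100
observed = [true_score + random.gauss(0, 4.5) for _ in range(10000)]
avg = sum(observed) / len(observed)  # error averages toward zero
```

Any single observed score may miss the true score, but the average of repeated measurements converges on it.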
Stanford-Binet intelligence scale
originated in France for schoolchildren; Alfred Binet and Theodore Simon developed it as the first published intelligence test with clear instructions on use. Lewis Terman at Stanford University converted it to English, added to the original version, and published the revision in 1916.
Classical test theory
perhaps the most widely used model due to its simplicity.
comprehensive sampling
provides a basis for content validity of the final version of the test
Standard error of measurement
provides a measure of the precision of an observed test score; an estimate of the amount of error inherent in an observed score or measurement.
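The standard error of measurement is commonly computed as SEM = SD × √(1 − r), where SD is the test's standard deviation and r its reliability, and it can be used to build the confidence interval defined earlier in this guide. A hypothetical sketch (the Wechsler-style values of mean 100, SD 15, and reliability .91 are illustrative, not quoted from any manual):

```python
# Hypothetical sketch: standard error of measurement and a 95%
# confidence interval around an observed score. Values are invented.
import math

def sem(sd, reliability):
    """SEM = SD * sqrt(1 - r), from classical test theory."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Band of scores likely to contain the true score (95% for z=1.96)."""
    e = sem(sd, reliability)      # e.g. SD=15, r=.91 -> SEM = 4.5
    return observed - z * e, observed + z * e

# An IQ-style scale (SD 15) with reliability .91:
lo, hi = confidence_interval(100, 15, 0.91)
```

Note how higher reliability shrinks the SEM and therefore narrows the confidence band around the observed score.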
Error
refers to the component of the observed score that does not have to do with the testtaker's true ability or trait being measured; anything that contributes to the score that is not the test itself.
Discrimination
refers to the degree to which an item differentiates among people with higher or lower levels of the trait, ability or other variables being measured.
Convergent validity
scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, tests designed to measure the same construct.
Evidence of pretest/posttest changes
test scores change as a result of some experience between a pretest and a posttest
Construct validity
the ability of a test to measure a theorized construct that it purports to measure. All types of validity evidence, including evidence from the content- and criterion-related varieties of validity, come under the umbrella of this.
Internal consistency reliability
the degree of relatedness of items on a scale or test; gauges the homogeneity of a test, i.e., how related items are to the other items on the test.
Coefficient of equivalence
the degree of the relationship between various forms of a test.
Incremental validity
the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.
Item fairness
the degree, if any, a test item is biased
Culture loading
the extent to which a test incorporates the vocabulary, concepts, traditions, knowledge, and feelings associated with a particular culture.
Sampling error
the extent to which the sample differs from the population; for example, the extent to which the voters sampled in a poll were actually representative of the voters in the election.
Scaling
the process of setting rules for assigning numbers in measurement; quantifying different outcomes.
Item pool
the reservoir or well from which items will or will not be drawn for the final version of the test. In adaptive testing, items may be drawn from the pool randomly or based on how the person responds, which individualizes the test.
coefficient of inter-scorer reliability
the scores from different raters are correlated with one another.
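Correlating raters' scores can be shown concretely. A hypothetical sketch, treating the coefficient of inter-scorer reliability as the Pearson correlation between two raters' scores on the same measure (the function name and the essay ratings are invented):

```python
# Hypothetical sketch: coefficient of inter-scorer reliability as the
# Pearson correlation between two raters' scores. Data are made up.
from statistics import mean

def interscorer_reliability(rater_a, rater_b):
    """Pearson r between two raters scoring the same testtakers."""
    ma, mb = mean(rater_a), mean(rater_b)
    num = sum((a - ma) * (b - mb) for a, b in zip(rater_a, rater_b))
    den = (sum((a - ma) ** 2 for a in rater_a)
           * sum((b - mb) ** 2 for b in rater_b)) ** 0.5
    return num / den

# Two raters scoring the same five essays on a 1-5 scale:
r = interscorer_reliability([4, 3, 5, 2, 4], [5, 3, 5, 2, 4])
```

A coefficient near 1.0 indicates the raters rank and space the testtakers similarly, guarding against individual scorer idiosyncrasies.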
Content validity
this is a measure of validity based on an evaluation of the subjects, topics, or content covered by the items in the test. Do the items on the test measure the construct it is designed to measure?
Method of equal appearing intervals
this is a statistical methodological approach to addressing ordinal data
Inter-scorer reliability
the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure. Often used with behavioral measures; guards against biases or idiosyncrasies in scoring.
Discriminant validity
validity coefficient showing little relationship between test scores and other variables with which scores on the test should not theoretically be correlated.
Test Construction
variation may exist within items on a test or between tests (i.e., item sampling or content sampling); error tied to the way the test is made or the type of questions on the test.
g
was assumed to afford the best prediction of overall intelligence, best measured through abstract-reasoning problems. Represents the overlap among different tests and abilities: the portion of variance that all intelligence tests have in common, with the remaining variance accounted for either by specific components (s) or by error components (e) of this general factor.
WPPSI
was developed to assess children and racial minorities
Prevention during test development
what is the best cure for test bias?
Lawshe
who developed a method whereby raters judge each item as to whether it is essential, useful but not essential, or not necessary for job performance. If more than half the raters indicate that an item is essential, the item has at least some content validity.
point scale
with the fourth edition of the Stanford-Binet intelligence scale, a point scale was implemented, which organized subtests by category of item rather than just by age; now there are different subtests instead of one test.
WISC-IV
yields a measure of general intellectual functioning (a full scale IQ) as well as four index scores: a Verbal Comprehension Index, a Perceptual Reasoning Index, a Working Memory Index, and a Processing Speed Index. It is also possible to derive up to seven process scores.