PSYC 442: Exam 2


The Wechsler Tests

A series of individually administered intelligence tests that assess the intellectual abilities of people from preschool through adulthood

Test blueprint

a plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test. This outlines the construct.

Confidence interval

a range or band of test scores that is likely to contain the true score

Unidimensional

Some rating scales are ________________________, meaning that only one dimension is presumed to underlie the rating: a single characteristic or construct. Measuring one thing.

Test Administration

sources of error may stem from the testing environment

Kuder-Richardson formula 20

statistic of choice for determining the inter-item consistency of dichotomous items, primarily items that can be scored right or wrong (such as multiple-choice items). Used when the data are scored dichotomously rather than continuously. If the items are heterogeneous, KR-20 yields lower reliability estimates than the split-half method.
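KR-20 is usually written as r = (k / (k - 1)) * (1 - Σpq / σ²), where k is the number of items, p is the proportion passing each item, q = 1 - p, and σ² is the variance of total scores. A minimal Python sketch, assuming a small hypothetical matrix of 0/1 scores (rows are testtakers, columns are items) and the population variance of the totals:

```python
from statistics import pvariance


def kr20(scores: list[list[int]]) -> float:
    """KR-20 for dichotomous (0/1) items; rows are testtakers, columns are items."""
    n_people = len(scores)
    k = len(scores[0])                         # number of items
    totals = [sum(row) for row in scores]      # each testtaker's total score
    total_var = pvariance(totals)              # population variance of total scores
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n_people  # proportion passing item j
        sum_pq += p * (1 - p)                         # p * q for item j
    return (k / (k - 1)) * (1 - sum_pq / total_var)


# Hypothetical responses: 5 testtakers x 4 right/wrong items
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 3))  # 0.8
```

Textbooks differ on whether the sample or population variance is used; the sketch uses the population form throughout so that Σpq and σ² are on the same footing.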

Other sources of error variance

surveys and polls usually contain some disclaimer as to the margin of error associated with their findings

Spearman

Postulated the existence of a general intellectual ability factor (g) and specific factors of intelligence (s)

physical appearance and demeanor may play a role.

What are examiner variables for test administration

a question that did not have a right answer on the test, or a wrong answer entered in the answer key.

What are examples of systematic error?

Factor analysis

a new test should load on a common factor with other tests of the same construct

rating error

judgment resulting from the intentional or unintentional misuse of a rating scale

Sir Francis Galton

who was the first person to publish on the heritability of intelligence.

A Likert-type scale

will have 4 or 10 options

face validity, confidence

A perceived lack of ___________________________ may lead to a lack of ___________________________ in the test measuring what it purports to measure.

Item bank

A relatively large and easily accessible collection of test questions

test scoring and interpretation

- Computer testing reduces error in test scoring, but many tests still require expert interpretation (e.g., projective tests)
- Subjectivity in scoring can enter into behavioral assessment

The WAIS-IV

-Contains 10 core subtests:
-->Block Design, Similarities, Digit Span, Matrix Reasoning, Vocabulary, Arithmetic, Symbol Search, Visual Puzzles, Information, and Coding
-Five supplemental subtests:
-->Letter-Number Sequencing, Figure Weights, Comprehension, Cancellation, and Picture Completion
-Four index scores derived from groups of subtests (supplemental subtests in parentheses):
-->Verbal Comprehension: Similarities, Vocabulary, Information, (Comprehension)
-->Perceptual Reasoning: Block Design, Matrix Reasoning, Visual Puzzles, (Figure Weights)
-->Working Memory: Digit Span, Arithmetic, (Letter-Number Sequencing)
-->Processing Speed: Symbol Search, Coding, (Cancellation)

.5

50%: half of testtakers get the item right and half get it wrong, or half agree with the construct and half do not

Validity coefficient

A correlation that provides a measure of the relationship between test scores and scores on the criterion measure. Validity coefficients are affected by restriction or inflation of range.

Bias

A factor inherent in a test that systematically prevents accurate impartial measurement.

Item characteristic curves

A graphic representation of item difficulty and discrimination

Lower

A greater average proportional distance means that the internal consistency is what?

Content validity

A judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample. How well do the items or inner workings of the test measure what they are supposed to be measuring? Do the test items adequately represent the content that should be included in the test?

Intelligence

A multifaceted capacity that includes the abilities to:
- Acquire and apply knowledge
- Reason logically, plan effectively, and infer perceptively
- Grasp and visualize concepts
- Find the right words and thoughts with facility
- Cope with and adjust to novel situations

equivalency in items

A drawback of an item pool is that it can be low in what?

Random error (noise)

A source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process. Anything that enters randomly and pushes the observed score away from the true score, causing error.

Supplemental subtest

A subtest administered to provide additional clinical information or to extend the number of abilities/processes sampled

True score

A value that according to classical test theory genuinely reflects an individual's ability (or trait) level as measured by a particular test.

ordinal level data

All rating scales are what?

Item-validity index

Allows test developers to evaluate the validity of items in relation to a criterion measure. Does the item measure what it purports to measure? Remember, though, that this is at the item level.

An index of the item's difficulty; an index of the item's reliability; an index of the item's validity; an index of the item's discrimination

Among the tools test developers might employ to analyze and select items are:

Computerized adaptive testing

An interactive, computer-administered test taking process wherein items presented to the testtaker are based in part on the testtaker's performance on previous items

Norm referenced

Scores are interpreted by placing them in the distribution of scores of a normative group.

Group administration

Army Alpha test, Army Beta test, School Ability Test, California Test of Mental Maturity, Kuhlmann-Anderson Intelligence Tests, Henmon-Nelson Tests of Mental Ability, Cognitive Abilities Test

increases

As error goes down, what happens to the proportion of true score variance and to reliability?

decreases

As error goes up, what happens to the proportion of true score variance and to reliability?

decreases

As error variance goes up, what happens to the proportion of true variance and to reliability?

systematic variation

Bias implies what in test scores

Carroll

Came up with a three stratum theory of cognitive ability

Method of equal-appearing intervals

Can be used to obtain data that are interval in nature. You are attempting to show that your scale has a set interval between items.

split-half correlations

Cronbach's alpha is the average of all possible what?

reduce

Computer-administered tests tend to ____________ floor effects and ceiling effects

David Wechsler

Conceptualized intelligence as "the aggregate...capacity of the individual to act purposefully, to think rationally, and to deal effectively with his environment. It [is] composed of elements or abilities which...are qualitatively differentiable"

dimension constructs

Cumulatively scored tests are helpful when measuring what?

Heterogeneity

Describes the degree to which a test measures different factors.

Horn and Cattell

Developed a theory of intelligence postulating the existence of two major cognitive abilities: crystallized intelligence and fluid intelligence

mastered, not mastered

Development of a criterion-referenced test may entail exploratory work with at least two groups of testtakers: one group known to have __________________________ the knowledge or skill being measured and another group known to have ________________________ it.

Alfred Binet

Did not define intelligence explicitly but instead described various components of intelligence, including reasoning, judgment, memory, and abstraction. Criticized Galton's approach to intellectual assessment and instead called for more complex measurements of intellectual ability.

Alternative forms

Different versions of a test that have been constructed so as to be parallel. They do not meet the strict requirements of parallel forms, but item content and difficulty are typically similar between tests.

Likert scale

Each item presents the testtaker with five alternative responses (sometimes seven), usually on an agree-disagree or approve-disapprove continuum. This is a type of rating scale. Likert scales are typically reliable.

Second stratum

Eight abilities and processes including fluid intelligence, crystallized intelligence, general memory and learning, broad visual perception, broad auditory perception, broad retrieval capacity, broad cognitive speediness, and processing speed

Factor-analytic theories of intelligence

Focus squarely on identifying the ability or groups of abilities deemed to constitute intelligence

Jean Piaget

Focused his research on the development of cognitive abilities in children. Defined intelligence as an evolving biological adaptation to the outside world: as a consequence of interaction with the environment, psychological structures become reorganized

Average proportion distance

For internal consistency reliability. Focuses on the degree of difference between scores on test items. Involves averaging the absolute differences between scores on all of the items, then dividing by the number of response options on the test minus one (7 response options means divide by 6).
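The average proportional distance (APD) steps above can be sketched in Python for a single testtaker. This is a minimal sketch assuming every item shares the same n-point response scale; the scores are hypothetical, and averaging APD across many testtakers is not shown.

```python
from itertools import combinations


def average_proportional_distance(item_scores: list[int], n_options: int) -> float:
    """APD for one testtaker's scores on items that share an n_options-point response scale."""
    pairs = list(combinations(item_scores, 2))                  # every pair of items
    mean_abs_diff = sum(abs(a - b) for a, b in pairs) / len(pairs)
    return mean_abs_diff / (n_options - 1)                      # 7 options -> divide by 6


# Hypothetical testtaker: three items answered 1, 2, 3 on a 7-point scale
apd = average_proportional_distance([1, 2, 3], n_options=7)
print(round(apd, 3))  # 0.222
```

The pairwise differences here are 1, 2, and 1, so the mean difference is 4/3 and the APD is 4/3 divided by 6. Identical answers on every item give an APD of 0, which reflects the "lower APD, higher internal consistency" relationship.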

Low, moderate, high; high; cut scores

For item characteristic curves: On Item A, testtakers of _____ ability tend to do better. On Item B, testtakers of ______________ ability tend to do better. On Item C, testtakers of ___________ ability tend to do better; it is a good item. Item D shows a _________ level of discrimination. It might be good if ______ __________ are being used.

.5; .3, .8

For maximum discrimination among the abilities of the testtakers, the optimal average item difficulty is approximately ____, with individual items on the test ranging in difficulty from about ___ to _____.

Multiple-choice

Format that has three elements

Spearman-Brown formula

Formula allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.

sensory abilities

Galton believed that the most intelligent persons were equipped with the best what? By such logic, tests of visual acuity or hearing ability are measures of intelligence

Sensorimotor and perception

Galton developed many ______________________ and _______________________ related tests by which he attempted to measure his definition of intelligence.

high, correctly; low, incorrectly

Generally a good item on a norm-referenced achievement test is an item for which ________________ scorers on the test respond ____________________. ____________________ scorers on the test respond ____________________.

higher, lower

Generally, the _________________ the reliability of the test, the _________________ the standard error.

Higher internal consistency

A greater Cronbach's alpha means what?

reducing error

How can you improve reliability in a test?

administering two forms of a test to the same group

How is reliability checked for parallel-form and alternate-form tests?

5-10

How many respondents should there be per item?

met certain criteria

Ideally, each item on a criterion-oriented test addresses the issue of whether the respondent has what?

theorized

If a test is a valid measure of a construct, higher and lower scorers should behave as ____________________?

Strengths, weaknesses; eliminated

Items are evaluated as to their ______ and ______. Some items may be ______.

Item format

Includes variables such as the form, plan, structure, arrangement, and layout of individual test items

Item Discrimination index

Indicates how adequately an item separates or discriminates between high scorers and low scorers on an entire test. At the item level: does it differentiate between people who meet a certain standard and people who do not? Function of the item reliability index and item validity index. A measure of the difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly.
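The last sentence of that definition is the usual formula d = (U - L) / n, where U and L are the numbers of upper- and lower-group scorers answering correctly and n is the size of each group. A minimal sketch, assuming equal-sized groups (a common convention takes the top and bottom 27% of total scorers) and hypothetical counts:

```python
def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """d = (U - L) / n for equal-sized upper and lower scoring groups."""
    return (upper_correct - lower_correct) / group_size


# Hypothetical item: 32 testtakers in each group; 20 upper and 16 lower answer correctly
print(discrimination_index(20, 16, 32))  # 0.125
# A negative d flags an item that low scorers answer correctly more often than high scorers
print(discrimination_index(5, 10, 10))   # -0.5
```

d ranges from -1 to +1; values near 0 mean the item does not separate high and low scorers, which is a signal to revise or drop it.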

Item reliability index

Indication of the internal consistency of the test. Equal to the product of the item-score SD and the correlation between the item score and the total score. Shows how related each item is to the rest of the test.
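Both the item-reliability index (this card) and the item-validity index (defined earlier in this set) follow the pattern "item-score SD times a correlation": with the total test score for reliability, and with a criterion measure for validity. A minimal sketch with hypothetical data; it uses the population SD of the 0/1 item scores (equal to the square root of p(1 - p)) and an uncorrected item-total correlation, so the item itself is included in the total.

```python
from math import sqrt
from statistics import pstdev


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def item_reliability_index(item: list[int], total_scores: list[float]) -> float:
    """Item-score SD times the item-total correlation."""
    return pstdev(item) * pearson_r(item, total_scores)


def item_validity_index(item: list[int], criterion: list[float]) -> float:
    """Item-score SD times the item-criterion correlation."""
    return pstdev(item) * pearson_r(item, criterion)


# Hypothetical data for four testtakers
item = [1, 1, 0, 0]       # right/wrong on one item
totals = [10, 8, 5, 3]    # total test scores (this item included)
print(round(item_reliability_index(item, totals), 3))  # 0.464
```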

Third stratum

Individualized factors linked to each of the second stratum abilities E.g., general reasoning, quantitative reasoning, and Piagetian reasoning are linked to fluid intelligence (Gf)

In infancy

Intellectual assessment consists of measuring sensorimotor development

In older children

Intellectual assessment focuses on verbal and performance abilities

clinically relevant information or learning potential

Intelligence tests are rarely administered to adults for purposes of education placement, but rather to ascertain what?

Speed tests

Item analyses of tests taken under speed conditions yield misleading or uninterpretable results. The closer an item is to the end of the test, the more difficult it may appear to be, because fewer testtakers reach it.

culture, culture-free

Items on an intelligence test tend to reflect the ______________ of the society where the test is employed and thus many theorists have expressed a desire to develop ____________________ intelligence test

Guttman Scale

Items range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured. The lowest item is interpreted as the baseline level, and if you endorse an item higher up, you also subscribe to every item beneath it. All respondents who agree with the stronger statements of the attitude will also agree with milder statements.
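The cumulative property of a Guttman scale can be checked mechanically. A minimal sketch, assuming hypothetical agree (1) / disagree (0) responses already ordered from the mildest to the strongest statement: a consistent pattern is a run of agreements followed only by disagreements, and in that case the number of agreements identifies the strongest statement endorsed.

```python
def is_guttman_consistent(responses: list[int]) -> bool:
    """responses: 1 = agree, 0 = disagree, ordered from mildest to strongest statement.
    A consistent pattern is a run of agreements followed only by disagreements."""
    rejected_milder = False
    for r in responses:
        if r == 0:
            rejected_milder = True
        elif rejected_milder:  # agreed with a stronger statement after rejecting a milder one
            return False
    return True


def guttman_score(responses: list[int]) -> int:
    """For a consistent pattern, the number of agreements marks the strongest statement endorsed."""
    return sum(responses)


print(is_guttman_consistent([1, 1, 1, 0, 0]), guttman_score([1, 1, 1, 0, 0]))  # True 3
print(is_guttman_consistent([1, 0, 1, 0, 0]))                                 # False
```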

Constructed-response format

Items require testtakers to supply or to create the correct answer, not merely select it. ex: essay question

individual administration

Kaufman Adolescent and Adult Intelligence Test (KAIT), Kaufman Brief Intelligence Test (K-BIT), Kaufman Assessment Battery for Children (K-ABC)

Visual specialization; language skill-related tasks

Males tend to outperform females on tasks requiring ______________ __________________, while females tend to excel at _________________ ____________ ________________ ___________

1. whether the test items are homogeneous or heterogeneous by nature 2. whether the characteristic, trait, or ability being measured is presumed to be dynamic or static 3. whether the range of test scores is or is not restricted

The nature of the test will determine the choice of reliability metric based on what three things?

norms, standardized

Once a test has been finalized, _____________ may be developed from the data and it is said to be __________________________

Multidimensional

Other rating scales are ________________________, meaning that more than one dimension is thought to underlie the rating: several things underlie the construct and its characteristics.

age

Previous versions of the Stanford-Binet Intelligence Scale organized the items by what level at which most testtakers should be able to respond correctly? The move away from this organization was theory-driven, based in part on the Cattell-Horn model of intelligence.

low

Projective tests, such as the Rorschach, tend to be ____________ in face validity.

The Wechsler-Bellevue Scale

Provided the calculation of a verbal IQ and a Performance IQ

Item-response theory

Provides a way to model the probability that a person with X ability will be able to perform at a level of Y. Refers to a family of methods and techniques. Incorporates considerations of item difficulty and discrimination; difficulty relates to an item not being easily accomplished, solved, or comprehended.
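One common member of the IRT family is the two-parameter logistic (2PL) model, P(correct | θ) = 1 / (1 + exp(-a(θ - b))), where θ is ability, b is item difficulty, and a is item discrimination; plotting it gives an item characteristic curve. This is a sketch of that one model with hypothetical parameter values, not the only way IRT is specified.

```python
from math import exp


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve: P(correct | theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


# Hypothetical item with difficulty b = 0 and discrimination a = 1.5
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(p_correct_2pl(theta, a=1.5, b=0.0), 3))
```

At θ = b the probability is exactly .5, which is why b is read as difficulty; a larger a makes the curve steeper, so the item separates people just below and just above b more sharply.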

total variance attributed to true variance

Reliability is the proportion of the total ___________________ attributed to true ___________________

males and females

Research has examined the differences between ____________ and __________ with regard to cognitive, motor, and other abilities related to intelligence

Class scoring

Responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is presumably similar in some way. Fits best with criterion based construct

standardized conditions

Revised tests will then be administered under _________________ ____________________ to a second sample

New test development

Revision in

Types of scales

Scales are instruments to measure some trait, state or ability. May be categorized in many ways. Numbers can be assigned to responses to calculate test scores using a number of methods

Evidence of distinct groups

Scores on a test vary in a predictable way as a function of membership in some group.

high

Self report personality tests are _____________ in face validity

final product

Should be administered in the same manner, and have the same instructions, as the what?

Validity

a judgment or estimate of how well a test measures what it purports to measure in a particular context. To see how well the test measures the construct it is supposed to measure.

Expectancy table

Shows the percentage of people within specified test-score intervals who subsequently were placed in various categories of the criterion. Example: in a corporate setting, test scores may be divided into intervals and examined in relation to job performance; such a table may show that the higher the initial rating, the greater the probability of job success.
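Building an expectancy table amounts to grouping people by score interval and computing the percentage in each criterion category. A minimal sketch; the score bands, category names, and records below are all hypothetical.

```python
from collections import Counter


def expectancy_table(records, intervals, categories):
    """records: (test_score, criterion_category) pairs.
    intervals: (low, high) score bands, inclusive at both ends.
    Returns {interval: {category: percent of people in that band}}."""
    table = {}
    for low, high in intervals:
        in_band = [cat for score, cat in records if low <= score <= high]
        counts = Counter(in_band)
        n = len(in_band)
        table[(low, high)] = {c: (100 * counts[c] / n if n else 0.0) for c in categories}
    return table


# Hypothetical test scores paired with later job-performance ratings
records = [(12, "poor"), (15, "adequate"), (18, "poor"), (25, "adequate"),
           (28, "excellent"), (33, "excellent"), (36, "adequate"), (38, "excellent")]
table = expectancy_table(records, [(0, 19), (20, 29), (30, 39)], ["poor", "adequate", "excellent"])
for band, row in table.items():
    print(band, {c: round(p, 1) for c, p in row.items()})
```

In this made-up data the share rated "excellent" rises from 0% to 50% to about 66.7% across the bands, the kind of pattern the card describes.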

Evidence of changes with age

Some constructs are expected to change over time

replaced

Some items may be _________________ by others from the item pool

- What is the test designed to measure?
- What is the objective of the test?
- Is there a need for this test?
- Who will use this test?
- Who will take this test?
- What content will the test cover?
- How will the test be administered?
- What is the ideal format of the test?
- Should more than one form of the test be developed?
- What special training will be required of test users for administering or interpreting the test?
- What types of responses will be required of testtakers?
- Who benefits from an administration of this test?
- Is there any potential for harm as the result of an administration of this test?
- How will meaning be attributed to scores on this test?

Some preliminary questions for test construction:

deviates from a true score

Standard error can be used to estimate the extent to which an observed score does what?

10 subtests; mean of 10, SD of 3

Subtests have a mean and SD of what, and how many are there?

Cattell-horn and Carroll model of cognitive ability

Synthesis of both theories

pilot studied

Test items may be ___________ _________________ to evaluate whether they should be included in the final form of the instrument

Same

Test should be tried out on the _______ population that it was designed for.

Method of paired comparisons

Testtakers must choose between two alternatives according to some rule; you have to choose one or the other. We attach our own theories to these choices: if a person selects a certain behavior, they may have certain characteristics. Example: select the behavior that you think would be more justified: a. cheating on taxes if one has a chance; b. accepting a bribe in the course of one's duties. For each pair of options, testtakers receive a higher score for selecting the option deemed more justifiable by the majority of a group of judges. The test score reflects the number of times the testtaker's choices agreed with those of the judges.
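The scoring rule in that card (one point per pair where the testtaker matches the judges' majority) can be sketched as below. The judges' votes and the testtaker's choices are hypothetical, and breaking ties by whichever option appears first in the votes is an arbitrary assumption of this sketch.

```python
from collections import Counter


def judges_key(judge_votes: list[list[str]]) -> list[str]:
    """Majority choice of the judges for each pair; ties go to the option that appears first."""
    return [Counter(votes).most_common(1)[0][0] for votes in judge_votes]


def paired_comparison_score(choices: list[str], key: list[str]) -> int:
    """Number of pairs on which the testtaker chose the option the judges' majority preferred."""
    return sum(1 for mine, keyed in zip(choices, key) if mine == keyed)


# Hypothetical: three judges rate three pairs of options ("a" or "b")
votes = [["a", "a", "b"], ["b", "b", "a"], ["a", "b", "b"]]
key = judges_key(votes)                               # ['a', 'b', 'b']
print(paired_comparison_score(["a", "a", "b"], key))  # 2
```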

core supplemental

The Wechsler Adult Intelligence Scale, 4th ed., consists of subtests that are designated as either ___________ or _______________

Ceiling

The ________________ is the hardest item on the test and the highest score you can obtain

Floor

The ___________________ on a test is the easiest item or the lowest score you can obtain

cultures and time

The content validity of a test varies across

item difficulty index

The proportion of respondents answering an item correctly
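The item-difficulty index p is just the mean of 0/1 item scores. A related textbook convention adjusts the optimal difficulty for guessing: take the midpoint between chance success and 1.0. With no guessing this gives the .5 from the earlier card; for a four-option multiple-choice item it gives .625. A minimal sketch with hypothetical scores:

```python
def item_difficulty(item_scores: list[int]) -> float:
    """p = proportion of respondents answering the item correctly (scores are 0/1)."""
    return sum(item_scores) / len(item_scores)


def optimal_difficulty(chance_success: float) -> float:
    """Midpoint between the probability of success by guessing and 1.0."""
    return (1 + chance_success) / 2


print(item_difficulty([1, 0, 1, 1]))  # 0.75
print(optimal_difficulty(0.0))        # 0.5   (no guessing, e.g. constructed response)
print(optimal_difficulty(1 / 4))      # 0.625 (four-option multiple choice)
```

Note that a higher p means an easier item, so "item difficulty" is really an index of easiness.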

broader

The fifth edition of the Stanford-Binet Intelligence Scales was designed for administration to ages 2-85. This is a much _______________ age range than many/most intelligence tests cover. The test yields a number of composite scores, including a Full Scale IQ, an Abbreviated Battery score, a Verbal IQ score, and a Nonverbal IQ score.

Mean of 100, SD of 15; it is standardized

The full scale IQ and five factor index score has a mean and SD of what and it is _____________________

intelligence

The greater the magnitude of g in a test of intelligence, the better the test's overall prediction of intelligence.

Test conceptualization

The impetus for developing a new test is some thought that "there ought to be a test for...." The stimulus could be knowledge of psychometric problems with other tests, a new social phenomenon, or any number of things.

Interactionism

The mechanism by which heredity and environment are presumed to interact and influence the development of intelligence

Test developer

The nature of the item analysis will vary depending on the goals of who?

Validation

The process of gathering and evaluating evidence about validity. Examining and providing evidence for or against your test testing what it is supposed to measure.

Flynn effect

The progressive rise in intelligence test scores that is expected to occur on a normed intelligence test from the date when the test was first normed

Ratio IQ

The Stanford-Binet Intelligence Scale used this in its early editions: the testtaker's mental age divided by his or her chronological age, multiplied by 100 to eliminate decimals
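The ratio IQ formula is a one-liner. A minimal sketch with a hypothetical child; both ages must be in the same units (years or months).

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ = (mental age / chronological age) x 100; both ages in the same units."""
    return 100 * mental_age / chronological_age


# Hypothetical 8-year-old performing at the level of a typical 10-year-old
print(ratio_iq(10, 8))  # 125.0
```

A known weakness of this ratio is that mental age stops growing in adulthood while chronological age keeps rising, which is one reason the deviation IQ replaced it.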

alternate items

The Stanford-Binet Intelligence Scale was the first to introduce what?

Deviation IQ

The Stanford-Binet Intelligence Scale has this: a comparison of the performance of the individual with the performance of others of the same age in the standardization sample. It expresses deviation from the norm, and this is a norm-referenced test of intelligence.
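A deviation IQ places a raw score relative to same-age peers and rescales it to a mean of 100 and an SD of 15. A minimal sketch, assuming a simple linear z-score transformation against a tiny hypothetical norm group; real tests convert scores through published norm tables built from large standardization samples.

```python
from statistics import mean, pstdev


def deviation_iq(raw: float, same_age_norms: list[float], mean_iq: float = 100, sd_iq: float = 15) -> float:
    """Convert a raw score to a z score within the age group, then rescale to mean 100, SD 15."""
    z = (raw - mean(same_age_norms)) / pstdev(same_age_norms)
    return mean_iq + sd_iq * z


# Hypothetical norm group for one age level: mean 50, SD 10
norms = [40, 40, 60, 60]
print(deviation_iq(60, norms))  # 115.0 (one SD above the age-group mean)
print(deviation_iq(35, norms))  # 77.5 (1.5 SD below)
```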

nominal

The Stanford-Binet Full Scale score can be converted into __________________ categories designated by cutoff boundaries for quick reference

raw score standard score

On the Stanford-Binet Intelligence Scales, 5th ed., scores on individual items for each subtest are tallied to yield a _____________ ___________, and these are then converted, using the test manual, to a ______________ ______________________

Full Scale IQ, five factor index scores, subtest scores

The Stanford-Binet Intelligence Scales include

Qualitatively differentiable

Wechsler said that the best way to measure intelligence was by measuring several _______________________ _______________________ abilities, which were verbal or performance-based in nature.

criterion (some existing standard) related validity

This is a measure of validity obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures. Does the test map onto some existing criteria/standard.

Construct validity

This is a measure of validity that is arrived at by executing a comprehensive analysis of (a) how scores on the test relate to other test scores and measures and (b) how scores on the test can be understood within some theoretical framework for understanding the construct that the test was designed to measure. Does the test measure a construct that does not exist tangibly? Construct validity encompasses content validity and criterion-related validity.

Coefficient alpha (cronbach alpha)

Used with internal consistency reliability. The mean of all possible split-half correlations, corrected by the Spearman-Brown formula. It is a popular approach to internal consistency.
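In practice coefficient alpha is computed from item and total-score variances rather than by enumerating split halves: α = (k / (k - 1)) * (1 - Σσ²ᵢ / σ²ₜ). A minimal sketch with hypothetical ratings, using population variances throughout:

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Coefficient alpha; rows are testtakers, columns are items."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]  # variance of each item
    total_var = pvariance([sum(row) for row in scores])                   # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 4 testtakers x 3 items on a 5-point scale
ratings = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [1, 2, 2],
]
print(round(cronbach_alpha(ratings), 3))  # 0.958
```

For 0/1 items each item variance equals pq, so this formula reduces to KR-20.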

Variance

Variation across scores; the spread of the distribution

1. evidence of homogeneity 2. evidence of changes with age 3. evidence of pretest/posttest changes 4. evidence from distinct groups

What are 4 evidences of construct validity?

Equivalence of items on a test: whether or not interval scaling is true; greatly affected by test length: assumptions in classical test theory favor longer tests

What are some problems with Classical Test theory

bias, rater error

What are some things that affect the constructed-response format?

pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication

What are testtaker variables for test administration?

Test conceptualization, test construction, test tryout, item analysis, test revision

What are the 5 steps of test development?

1. stem 2. correct answer 3. foil or distractor

What are the three elements of multiple choice

1. Divide the test into equivalent halves 2. Calculate a Pearson r between scores on the two halves of the test 3. Adjust the half-test reliability using the Spearman- Brown Formula

What are the three steps for split half reliability
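The three split-half steps can be sketched in Python. This is a minimal sketch assuming an odd-even split (one common way to form equivalent halves) and hypothetical item scores; `spearman_brown` uses the general form n * r / (1 + (n - 1) * r), where n is the factor by which test length changes (n = 2 steps a half-test correlation up to full length).

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y)))


def spearman_brown(r: float, n: float = 2.0) -> float:
    """Estimated reliability when test length is multiplied by n (n = 2: half test -> full test)."""
    return n * r / (1 + (n - 1) * r)


def split_half_reliability(scores: list[list[float]]) -> float:
    # Step 1: divide the test into halves (odd-numbered vs even-numbered items)
    odd_half = [sum(row[0::2]) for row in scores]
    even_half = [sum(row[1::2]) for row in scores]
    # Step 2: Pearson r between scores on the two halves
    r_half = pearson_r(odd_half, even_half)
    # Step 3: adjust the half-test correlation with Spearman-Brown
    return spearman_brown(r_half)


# Hypothetical right/wrong scores: 4 testtakers x 4 items
items = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(round(split_half_reliability(items), 3))  # 0.9 (half-test r was 9/11)
print(round(spearman_brown(0.5), 3))            # 0.667
```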

Scaling method, writing items, scoring items

What are three aspects of test construction?

random and systematic error

What are two types of measurement error

Cronbach's alpha

What coefficient do you use for internal consistency reliability?

Coefficient of equivalence

What coefficient do you use for parallel or alternative forms?

Splitting the test cuts the number of items in half, so the correlation between the halves underestimates the reliability of the full-length test

Why do you adjust the half-test reliability using the Spearman-Brown formula?

Reliability of items to each other; relatedness of items to each other on the test

What does factor analysis look at

True variance plus error variance

What does total variance equal?

A reliable and valid item that discriminates among testtakers. This matters in the revision process, when deciding what should be revised or taken out.

What is a good item?

+1

What is a perfect correlation?

noise outside, knocking on the door, people getting up, leaving during a test

What is an example of random error?

"i am not trying to trick you with this question"

What is an example of test administration?

0 to 1: 1 means the items are perfectly related, 0 means no relation at all

What is the value range for Cronbach's alpha?

Classical test theory

What is true-score model often referred to as?

Classical test theory

What theory states that the observed score = true score + error
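The true-score model X = T + E, and the earlier cards on error variance and reliability, can be illustrated with a small simulation. A minimal sketch under stated assumptions: normally distributed true scores and random errors (mean 0, independent of true scores), with hypothetical SDs. Reliability is estimated as true variance divided by observed (total) variance.

```python
import random


def pvar(xs: list[float]) -> float:
    """Population variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def simulated_reliability(true_sd: float, error_sd: float, n: int = 50_000, seed: int = 1) -> float:
    rng = random.Random(seed)
    true_scores = [rng.gauss(100, true_sd) for _ in range(n)]   # T
    errors = [rng.gauss(0, error_sd) for _ in range(n)]         # E: random error, mean 0
    observed = [t + e for t, e in zip(true_scores, errors)]     # X = T + E
    return pvar(true_scores) / pvar(observed)                   # true variance / total variance


print(round(simulated_reliability(15, 5), 2))   # near 225 / (225 + 25) = 0.90
print(round(simulated_reliability(15, 15), 2))  # near 225 / (225 + 225) = 0.50
```

Holding true variance fixed and raising error variance lowers the estimated reliability, matching "more error variance is associated with less reliability".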

as time passes

When do estimates of test-retest reliability tend to decrease?

When the variable being measured is stable over time

When is it most appropriate to use test-retest reliability?

Distinct processes; inseparable abilities

While Galton argued that intelligence consisted of ________________ __________________ that could be assessed only by individual tests, Binet viewed intelligence as _________________ _________________ that required complex measurements to determine

Raters

Who may be too lenient, too severe, or reluctant to give ratings at the extremes (central tendency error)

test developers, test users

Who plays a role in the validation of a test?

This is because the data are categorical rankings, and you do not know by how much responses differ. You cannot determine the distance between strongly disagree and strongly agree; we assume the intervals are equal, but we cannot truly determine whether they are.

Why are rating scales ordinal level data

To make sure that Test A and Test B both show improvement and consistently show the same (or improved) results from an intervention. This way we do not have to give the same test twice, but we still test the same information, like giving an essay test and a multiple-choice test.

Why would we want to develop alternate forms of a test?

unrelated

With average proportion distance you are looking at how _______________ each item is to other items.

Coefficient of stability; correlation

With intervals over 6 months the estimate of test retest reliability is called what? and what is the coefficient that you would use?

Factor analysis

a group of statistical techniques designed to determine the existence of underlying relationships between sets of variables

Rating scales

a grouping of words, statements, or symbols on which judgment of the strength of a particular trait, attitude, or emotion are indicated by the test taker.

Face validity

a judgement concerning how relevant the test items appear to be.

Systematic error

a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured. This type of error is consistent across all tests and administrations.

Core subtest

a subtest administered to obtain a composite score

halo effect

a tendency to give a particular person a higher rating than he or she objectively deserves because of favorable overall impression

Short forms

a test that has been abbreviated in length, typically to reduce the time needed for administration, scoring, and interpretation. It is suggested that these be used for screening purposes rather than to make placement or educational decisions.

Measurement error

all of the factors associated with the process of measuring some variable, other than the variable being measured. The information captured by the test is measured incorrectly.

test-retest reliability

an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test

Process score

an index designed to help understand the way the testtaker processes various kinds of information

reliability coefficient

an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance

Concurrent validity (a form of criterion-related validity)

an index of the degree to which a test score is related to some criterion measure obtained at the same time (concurrently). This has to do with timing.

Predictive validity

an index of the degree to which a test score predicts some criterion, or outcome, measure in the future. Tests are evaluated as to their predictive validity. It is associated with future criterion.

item sampling

an item or question that is on a test. (all questions from the first chapter of the course)

Cumulatively scored test

assumption that the higher the score on the test, the higher the testtaker is on ability, trait, or other characteristic that the test purports to measure.

Factor analysis

can also provide an indication of whether items that are supposed to be measuring the same thing load on a common factor

Reliability

consistency of measurement

Culture-fair

culture free intelligence tests are difficult if not impossible to create, and thus __________________ intelligence tests began to be developed

Criterion referenced test

Do testtakers meet or exceed the standard, or do they not meet it?

Measuring intelligence

entails sampling an examinee's performance on different types of tests and tasks as a function of development level

alternate forms reliability

estimate of the extent to which different forms of the same test have been affected by item sampling error or other error. Scores may be affected by error related to the state of testtaker or item sampling.

Parallel-forms

The mean, median, mode, and standard deviation are the same, and the distributions are the same. Same format and administration.

Parallel-forms

for each form of the test, the means and the variances of observed test scores are equal.

Top stratum

general intelligence

Evidence of homogeneity

how uniform a test is in measuring a single concept

Crystallized intelligence

includes acquired skills and knowledge that are dependent on exposure to a particular culture as well as on formal and informal education. Learned, culturally derived intelligence: language, knowledge, vocabulary.

WPPSI-III

includes several subtests, including matrix reasoning, symbol search, word reasoning, and picture concepts

Adult

intelligence scales should tap abilities such as general information retention, quantitative reasoning, expressive language, and social judgment

Methodological error

interviewers may not be trained properly, the wording in the questionnaire may be ambiguous, or the items may be biased

split-half reliability

is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.

Criterion

is the standard against which a test or a test score is evaluated. An adequate criterion is relevant for the matter at hand, valid for the purpose for which it is being used, and uncontaminated, meaning it is not part of the predictor. Almost anything can constitute a criterion.

selected response format

items require testtakers to select a response from a set of alternative responses, i.e., to choose among the options given. ex: multiple choice

greater reliability

less error variance is associated with what?

less reliability

more error variance is associated with what?

Fluid intelligence

nonverbal, relatively culture-free, and independent of specific instruction. Thought to be largely inherited and relatively uninfluenced by learning, culture, and context; includes abilities such as processing speed.

the true score plus error

observed score
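
The classical test theory model (observed score = true score + error) can be illustrated by simulation. This is a sketch under assumed values: true scores drawn with mean 100 and SD 15, random error with mean 0 and SD 5. Reliability is then the share of observed-score variance that is true-score variance.

```python
import random
from statistics import pvariance

random.seed(0)  # fixed seed so the sketch is reproducible
true_scores = [random.gauss(100, 15) for _ in range(10_000)]  # hypothetical trait levels
errors = [random.gauss(0, 5) for _ in range(10_000)]          # random error, mean 0
observed = [t + e for t, e in zip(true_scores, errors)]       # X = T + E

# reliability = proportion of observed-score variance that is true-score variance
reliability = pvariance(true_scores) / pvariance(observed)
print(round(reliability, 2))  # close to 15**2 / (15**2 + 5**2) = 0.9
```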

Stanford-Binet Intelligence Scale

originated in France for schoolchildren. Alfred Binet and Theodore Simon developed it as the first published intelligence test with clear instructions for its use. In 1916 Lewis Terman at Stanford University translated it into English and expanded the original version.

Classical test theory

perhaps the most widely used model due to its simplicity.

comprehensive sampling

provides a basis for content validity of the final version of the test

Standard error of measurement

provides a measure of the precision of an observed test score. An estimate of the amount of error inherent in an observed score or measurement.
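
The usual formula is SEM = SD × √(1 − r_xx), where r_xx is the test's reliability coefficient, and the SEM is what sets the width of a confidence interval around an observed score. The SD of 15, reliability of .91, and observed score of 100 below are assumed values for illustration.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r_xx), where r_xx is the test's reliability coefficient."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sem, z=1.96):
    """Band likely to contain the true score; z = 1.96 gives roughly 95% confidence."""
    return observed - z * sem, observed + z * sem

# Hypothetical IQ-style scale: SD = 15, reliability = .91
sem = standard_error_of_measurement(15, 0.91)
low, high = confidence_interval(100, sem)
print(round(sem, 2), round(low, 2), round(high, 2))  # 4.5 91.18 108.82
```

Higher reliability shrinks the SEM and therefore narrows the band.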

Error

refers to the component of the observed score that does not have to do with the testtaker's true ability or the trait being measured. Anything that contributes to the score other than what the test is measuring.

Discrimination

refers to the degree to which an item differentiates among people with higher or lower levels of the trait, ability or other variables being measured.
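
One common way to quantify discrimination is an item-discrimination index d = (U − L) / n, comparing how many high scorers (U) and low scorers (L) answered the item correctly, with n people in each group. This sketch assumes upper and lower groups of 27% of testtakers (a widely used convention); the function name and data are hypothetical.

```python
def discrimination_index(results, fraction=0.27):
    """results: list of (total_test_score, answered_item_correctly) tuples.

    Returns d = (U - L) / n for the top and bottom groups of size n.
    """
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper_correct = sum(1 for _, correct in ranked[:n] if correct)
    lower_correct = sum(1 for _, correct in ranked[-n:] if correct)
    return (upper_correct - lower_correct) / n

# Hypothetical total scores and whether each person got this item right
results = [(95, True), (90, True), (88, True), (80, False), (75, True),
           (70, False), (60, False), (55, True), (50, False), (40, False)]
print(round(discrimination_index(results), 3))  # 0.667
```

Values near +1 mean the item separates high and low scorers well; negative values flag an item that low scorers get right more often.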

Convergent validity

scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, tests designed to measure the same construct.

Evidence of pretest/posttest changes

test scores change as a result of some experience between a pretest and a posttest

Construct validity

the ability of a test to measure a theorized construct that it purports to measure. All types of validity evidence, including evidence from the content- and criterion-related varieties of validity, come under the umbrella of this.

Internal consistency reliability

the degree of relatedness of items on a scale or test. Able to gauge the homogeneity of a test. This is how related items are to other items on the test.
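
Internal consistency is commonly estimated with coefficient alpha: α = k/(k − 1) × (1 − Σ item variances / variance of total scores). For dichotomous (0/1) items, computed with population variances as here, this gives the same value as KR-20. The function name and four-person data set are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one list per testtaker, one entry per item."""
    k = len(item_scores[0])
    items = list(zip(*item_scores))  # transpose to one tuple per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical 0/1 responses: four testtakers, three items
data = [[1, 1, 1],
        [1, 1, 0],
        [0, 0, 0],
        [1, 0, 0]]
print(cronbach_alpha(data))  # 0.75
```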

Coefficient of equivalence

the degree of the relationship between various forms of a test.

Incremental validity

the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.
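
Incremental validity is often expressed as the gain in R² when the new predictor is added. With two predictors, R² can be computed directly from the three correlations, so this sketch needs no regression library. The correlation values are hypothetical; predictor 1 is "already in use" and predictor 2 is the candidate.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of criterion y on predictors 1 and 2, from correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

def incremental_validity(r_y1, r_y2, r_12):
    """Extra criterion variance explained by adding predictor 2 to predictor 1."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1**2

# Hypothetical correlations: existing predictor (1), new predictor (2), criterion (y)
delta = incremental_validity(r_y1=0.50, r_y2=0.40, r_12=0.30)
print(round(delta, 3))  # 0.069
```

A new predictor that overlaps heavily with the existing one (high r_12) adds little, even if it correlates well with the criterion on its own.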

Item fairness

the degree, if any, to which a test item is biased

Culture loading

the extent to which a test incorporates the vocabulary, concepts, traditions, knowledge, and feelings associated with a particular culture.

Sampling error

the extent to which the sample differs from the population. For example, the extent to which the voters in a poll were actually representative of the voters in the election.

Scaling

the process of setting rules for assigning numbers in measurement; quantifying different outcomes.

Item pool

the reservoir or well from which items will or will not be drawn for the final version of the test. A test can draw from the pool so that the items differ between administrations or change as the person answers. Items can be drawn randomly or based on how the person responds, which individualizes the test.

coefficient of inter-scorer reliability

the scores from different raters are correlated with one another.

Content validity

this is a measure of validity based on an evaluation of the subjects, topics, or content covered by the items in the test. Do the items on the test measure the construct the test is designed to measure?

Method of equal appearing intervals

a scaling method, first described by Thurstone, used to obtain data that are presumed to be interval rather than merely ordinal in nature
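
In outline, judges rate each attitude statement on a numbered continuum (often 1 to 11), each statement gets a scale value from those ratings, and a respondent's score comes from the scale values of the statements they endorse. This sketch assumes the median rating as the scale value (Thurstone's choice; some texts use the mean) and the mean of endorsed scale values as the score. The statements and ratings are hypothetical.

```python
from statistics import mean, median

# Hypothetical judge ratings on an 11-point favorability continuum (1 = most unfavorable)
judge_ratings = {
    "statement_a": [2, 3, 2, 1, 2],
    "statement_b": [6, 5, 6, 7, 6],
    "statement_c": [10, 9, 10, 11, 10],
}

def scale_values(ratings):
    """Scale value of each statement = median of the judges' ratings (assumed choice)."""
    return {item: median(r) for item, r in ratings.items()}

def attitude_score(endorsed, values):
    """Respondent's score = mean scale value of the statements they agree with."""
    return mean(values[item] for item in endorsed)

values = scale_values(judge_ratings)
print(values)                                             # {'statement_a': 2, 'statement_b': 6, 'statement_c': 10}
print(attitude_score(["statement_b", "statement_c"], values))  # 8
```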

Inter-scorer reliability

the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure. Often used with behavioral measures; guards against biases or idiosyncrasies in scoring.

Discriminant validity

validity coefficient showing little relationship between test scores and other variables with which scores on the test should not theoretically be correlated.

Test Construction

variation may exist within items on a test or between tests (i.e., item sampling or content sampling). The way the test is made or the type of questions on it.

g

was assumed to afford the best prediction of overall intelligence and to be best measured through abstract-reasoning problems. It reflects the overlap between different tests and different abilities (general intellectual ability). It represents the portion of variance that all intelligence tests have in common, with the remaining variance accounted for either by specific components (s) or by error components (e).

WPPSI

was developed to assess young children, and its standardization sample included racial minorities

Prevention during test development

what is the best cure for test bias?

Lawshe

who developed a method whereby raters judge each item as to whether it is essential, useful but not essential, or not necessary for job performance. If more than half the raters indicate that an item is essential, the item has at least some content validity.
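
Lawshe summarized these judgments as a content validity ratio, CVR = (n_e − N/2) / (N/2), where n_e is the number of raters calling the item essential and N is the total number of raters. CVR is positive exactly when more than half the raters say "essential", matching the rule above. The panel sizes below are hypothetical.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR = (n_e - N/2) / (N/2); ranges from -1 to +1."""
    half = n_panelists / 2
    return (n_essential - half) / half

print(content_validity_ratio(8, 10))  # 0.6
print(content_validity_ratio(5, 10))  # 0.0
```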

point scale

with the fourth edition of the Stanford-Binet Intelligence Scale, what was implemented that organized subtests by category of item rather than by age alone? Now there are different subtests instead of one test.

WISC-IV

yields a measure of general intellectual functioning (a Full Scale IQ) as well as four index scores: a Verbal Comprehension Index, a Perceptual Reasoning Index, a Working Memory Index, and a Processing Speed Index. It is also possible to derive up to seven process scores.