Psychological Assessment (Reviewer)


Sources of error variance

1. Assessee 2. Assessor 3. Measuring instruments

The computation of a coefficient of split-half reliability generally entails three steps:

1. Divide the test into equivalent halves. 2. Calculate a Pearson r between scores on the two halves of the test. 3. Adjust the half-test reliability using the Spearman-Brown formula.

2 types of sampling procedure

1. Purposive sampling 2. Incidental sampling

Sources of error variance during test administration

1. Test environment, such as room temperature, level of lighting, and amount of ventilation and noise. 2. Test taker variables, such as pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication. 3. Formal learning experiences, casual life experiences, therapy, illness, and changes in mood or mental state. 4. Even body weight can be a source of error variance. 5. Examiner-related variables are potential sources of error variance.

Three basic approaches to the estimation of reliability

1. Test-retest 2. Alternate or parallel forms 3. Internal or inter-item consistency

Types of Norms

1. age norms 2. grade norms 3. national norms 4. national anchor norms 5. local norms 6. norms from a fixed reference group 7. subgroup norms 8. percentile norms

Validity

A test is considered valid for a particular purpose if it does, in fact, measure what it purports to measure. A test of reaction time is a valid test if it accurately measures reaction time. A test of intelligence is a valid test if it truly measures intelligence.

Construct

An informed, scientific concept developed or constructed to describe or explain behavior. We can't see, hear, or touch constructs, but we can infer their existence from overt behavior.

Traits

Any distinguishable, relatively enduring way in which one individual varies from another.

Test Scores

Are always subject to questions about the degree to which the measurement process includes error.

Grade norms

Are developed by administering the test to representative samples of children over a range of consecutive grade levels.

Norm-referenced testing and assessment

A method of evaluation and a way of deriving meaning from test scores by evaluating an individual test taker's score and comparing it to scores of a group of test takers.

Which of the following does not describe the definition of error? a) Refers to mistakes, miscalculations, and the like. b) Traditionally refers to something that is more than expected; it is a component of the measurement process. c) Refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test. d) Is always subject to questions about the degree to which the measurement process includes miscalculations.

D

Grade norms

Designed to indicate the average test performance of test takers in a given school grade.

States

Distinguish one person from another but are relatively less enduring.

Refer to mistakes, miscalculations and the like.

Error

Refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test.

Error

Traditionally refers to something that is more than expected; it is a component of the measurement process.

Error

decision study

In the _______________________, developers examine the usefulness of test scores in helping the test user make decisions.

average proportional distance (APD)

It is a measure that focuses on the degree of difference that exists between item scores.

average proportional distance (APD)

It is a measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores.
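A sketch of one commonly described way to compute APD, assuming Likert-type items rated on a scale with a known number of response options: (1) take the absolute difference between scores for every pair of items, (2) average those differences, and (3) divide the average by the number of response options minus one. Lower values mean smaller differences between item scores, that is, greater internal consistency. The function name and the example ratings are hypothetical.

```python
from itertools import combinations


def average_proportional_distance(responses, n_options):
    """Hypothetical helper computing APD.

    responses: one row per respondent, one column per item (ratings on a
    scale with n_options response options, e.g. 7 for a 7-point scale).
    """
    per_person = []
    for row in responses:
        # Step 1: absolute difference between scores for every pair of items
        diffs = [abs(a - b) for a, b in combinations(row, 2)]
        # Step 2: average those differences (for this respondent)
        per_person.append(sum(diffs) / len(diffs))
    mean_diff = sum(per_person) / len(per_person)
    # Step 3: divide by the number of response options minus one
    return mean_diff / (n_options - 1)


# One respondent's ratings on a hypothetical three-item, 7-point scale
apd = average_proportional_distance([[4, 5, 6]], n_options=7)
print(round(apd, 2))  # 0.22
```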

Domain sampling theory

In this theory, a test's reliability is conceived of as an objective measure of how precisely the test score assesses the domain from which the test draws a sample.

Test-retest reliability

It is appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time, such as a personality trait.

Variance

It is the standard deviation squared. A statistic useful in describing sources of test score variability is the ______________.
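A quick check of this relationship with Python's `statistics` module on a hypothetical set of scores. `pvariance` and `pstdev` treat the scores as a whole population (dividing by N); `variance` and `stdev` would give the sample (N - 1) versions instead, but in either case the variance equals the standard deviation squared.

```python
from statistics import pstdev, pvariance

# Hypothetical scores for five test takers
scores = [10, 12, 14, 16, 18]

sd = pstdev(scores)           # population standard deviation
variance = pvariance(scores)  # population variance

# The variance and the squared standard deviation agree (up to float rounding)
print(variance, sd ** 2)
```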

Split-half estimate

One of the methods of obtaining internal consistency estimates of reliability.

Norms

Provide a standard with which the results of measurement can be compared.

Assumption 1

Psychological Traits and States Exist

Assumption 2

Psychological Traits and States can be Quantified and Measured

Categories of Measurement Error

1. Random error 2. Systematic error

Systematic error

Refers to a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.

language of generalizability theory

Stated in the _________________________, a generalizability study examines how much of an impact different facets of the universe have on the test score.

Sources of Error Variance

Test construction, administration, scoring, and/or interpretation

Assumption 3

Test-Related Behavior Predicts Non-Test-Related Behavior

Assumption 6

Testing and Assessment can be conducted in a fair and unbiased manner

Assumption 7

Testing and assessment benefit society

Assumption 4

Tests and other Measurement Techniques Have Strengths and Weaknesses

test score, cumulative scoring

The ______________ is presumed to represent the strength of the targeted ability or trait or state and is frequently based on _______________.

Classical test theory or true score theory

The assumption is made that each test taker has a true score on a test that would be obtained but for the action of measurement error.
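In symbols, the classical test theory model is conventionally written as follows (standard notation: X is the observed score, T the true score, E the error component). Under the usual assumption that error is random and uncorrelated with true scores, the variances add, and reliability is the ratio of true variance to total variance:

```latex
\begin{aligned}
X &= T + E \\
\sigma^2_X &= \sigma^2_T + \sigma^2_E \\
r_{xx} &= \frac{\sigma^2_T}{\sigma^2_X}
\end{aligned}
```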

Error variance

The component of a test score attributable to sources other than the trait or ability measured.

administering, scoring, interpretation

The criteria for a good test would include clear instructions for ___________________, ____________, and ________________.

Reliability

The criterion of __________________ involves the consistency of the measuring tool. The precision with which the test measures and the extent to which error is present in measurements. In theory, the perfectly reliable measuring tool consistently measures in the same way.

coefficient of equivalence.

The degree of relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability, which is often termed the ____________________.

Sampling

The process of selecting the portion of the universe deemed to be representative of the whole population.

A common goal of norm-referenced tests is to yield information on a test taker's standing or ranking relative to some comparison group of test takers.

True

A good test is a useful test, one that yields actionable results that will ultimately benefit individual test takers or society at large.

True

A good test is one that trained examiners can administer, score, and interpret with a minimum of difficulty.

True

A good test or more generally, a good measuring tool or procedure is reliable.

True

An estimate of the reliability of a test can be obtained without developing an alternate form of the test and without having to administer the test twice to some people

True

Assumption 2: Psychological Traits and States can be Quantified and Measured. Measuring traits and states by means of a test entails developing not only appropriate test items but also appropriate ways to score the test and interpret the results.

True

Competent test users understand a great deal about the tests they use. They understand, among other things, how a test was developed, the circumstances under which it is appropriate to administer the test, how the test should be administered and to whom, and how the test results should be interpreted.

True

Competent test users understand and appreciate the limitations of the tests they use as well as how those limitations might be compensated for by data from other sources.

True

In a world without tests, teachers and school administrators could arbitrarily place children in different types of special classes simply because that is where they believed the children belonged.

True

In a world without tests, there would be a great need for instruments to diagnose educational difficulties in reading and math and point the way to remediation.

True

In the context of IRT, discrimination signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured.

True

One source of fairness-related problems is the test user who attempts to use a particular test with people whose background and experience are different from the background and experience of people for whom the test was intended.

True

Patterns of answers to true-false questions on one widely used test of personality are used in decision making regarding mental disorders.

True

Scorers and scoring systems are potential sources of error variance.

True

Split-half reliability is also referred to as odd-even reliability.

True

Tests are said to be homogeneous if they contain items that measure a single trait.

True

The criteria for a good test would include clear instructions for administering, scoring, and interpretation. It would also seem to be a plus if a test offered economy in the time and money it took to administer, score, and interpret it.

True

The extent to which a test taker's score is affected by the content sampled on a test and by the way the content is sampled is a source of error variance.

True

The greater the proportion of the total variance attributed to true variance, the more reliable the test.

True

The influence of particular facets on the test score is represented by coefficients of generalizability

True

The simplest way of determining the degree of consistency among scorers in the scoring of a test is to calculate a coefficient of correlation, the coefficient of inter-scorer reliability.

True

The tasks in some tests mimic the actual behaviors that the test user is attempting to understand.

True

There are IRT models designed to handle data resulting from the administration of tests with dichotomous test items.

True

Today all major test publishers strive to develop instruments that are fair when used in strict accordance with guidelines in the test manual.

True

Assumption 5

Various Sources of Error Are Part of the Assessment Process

average proportional distance (APD)

a relatively new measure for evaluating the internal consistency of a test.

Criterion

a standard on which a judgment or decision may be based.

Fixed reference group scoring system

a system of scoring wherein the distribution of scores obtained on the test from one group of test takers is used as the basis for the calculation of test scores for future administrations.

Dichotomous test item

a test item or question that can be answered with only one of two response options such as true or false or yes-no.

Polytomous test items

a test item or question with three or more alternative responses where only one alternative is scored correct or scored as being consistent with a targeted trait or other construct.

Static characteristics

a trait, state, or ability presumed to be relatively unchanging over time; contrast with dynamic characteristic.

Norm-referenced

a way to derive meaning from test scores; an approach that evaluates a test score in relation to other scores on the same test.

The Spearman-Brown formula

allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.
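The general form of the Spearman-Brown formula, in notation commonly used in psychometrics texts (symbols may differ from the course materials): n is the factor by which test length is changed and r_xy is the reliability of the original test. For split-half reliability, the half-test correlation r_hh is stepped up to full length with n = 2:

```latex
\begin{aligned}
r_{SB} &= \frac{n \, r_{xy}}{1 + (n - 1)\, r_{xy}} \\
r_{SB} &= \frac{2 \, r_{hh}}{1 + r_{hh}} \qquad (n = 2)
\end{aligned}
```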

Age norms

also known as age-equivalent scores, indicate the average performance of different samples of test takers who were at various ages at the time the test was administered.

Criterion-referenced testing and assessment

also referred to as content-referenced or domain-referenced testing and assessment. A method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard; contrast with norm-referenced testing and assessment.

Inflation of range

also referred to as inflation of variance, a reference to a phenomenon associated with reliability estimates wherein the variance of either variable in a correlational analysis is inflated by the sampling procedures used, so the resulting correlation coefficient tends to be higher; contrast with restriction of range.

National anchor norms

an equivalency table for scores on two nationally standardized tests designed to measure the same thing.

Internal consistency estimates of reliability

an estimate of the reliability of a test obtained from a measure of inter-item consistency.

Item response theory (IRT)

another alternative to the true score model, also referred to as latent-trait theory or the latent-trait model: a system of assumptions about measurement and the extent to which each test item measures the trait.

National norms

are derived from a normative sample that was nationally representative of the population at the time the norming study was conducted.

Alternate forms

are simply different versions of a test that have been constructed so as to be parallel.

Percentile norms

are the raw data from a test's standardization sample converted to percentile form.

Norm in the psychometric context

are the test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individual test scores.

True score

a value that, according to classical test theory, genuinely reflects an individual's ability level as measured by a particular test.

Heterogeneity

describes the degree to which a test measures different factors.

coefficient alpha

developed by Cronbach and elaborated on by others

Generalizability study

examines how generalizable scores from a particular test are if the test is administered in different situations.

Parallel forms of a test

exist when for each form of the test the means and the variances of observed test scores are equal.

Speed tests

generally contain items of uniform level of difficulty so that when given generous time limits all test takers should be able to complete all the test items correctly.

Random error

is a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.

dynamic characteristic

is a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences.

Classical test theory

is also referred to as the true score model of measurement. It is the most widely used and accepted model in the psychometric literature today.

Test-retest reliability

is an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.

Percentile

is an expression of the percentage of people whose score on a test or measure falls below a particular raw score. It is a converted score that refers to a percentage of test takers.
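A sketch of the "percentage falling below" definition given above, using a hypothetical standardization sample. Some texts instead count half of any tied scores; this version counts only scores strictly below the raw score.

```python
def percentile_rank(scores, raw_score):
    """Percentage of scores in the distribution that fall below raw_score."""
    below = sum(1 for s in scores if s < raw_score)
    return 100 * below / len(scores)


# Hypothetical standardization-sample scores
sample = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(sample, 80))  # 50.0
```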

Heterogeneous test

is composed of items that measure more than one trait.

Criterion-referenced tests

is designed to provide an indication of where a test taker stands with respect to some variable or criterion such as an educational or a vocational objective.

Split-half reliability

is obtained by correlating pairs of scores obtained from equivalent halves of a single test administered once.

Item sampling or content sampling

is one source of variance during test construction. It refers to the variation among items within a test as well as to variation among items between tests.

Stratified-random sampling

is the process of developing a sample based on specific subgroups of a population in which every member has the same chance of being included in the sample.
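A minimal sketch, assuming proportional allocation: the same fraction is drawn at random from every stratum, so each member's chance of selection is the same (exactly so when the fraction divides each stratum evenly, as in this hypothetical urban/rural example). The function name and population are illustrative only.

```python
import random


def stratified_random_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum (hypothetical helper)."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)     # this stratum's share of the sample
        sample.extend(rng.sample(members, k))  # simple random draw within the stratum
    return sample


# Hypothetical population: 60 urban and 40 rural residents
population = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_random_sample(population, lambda p: p[0], fraction=0.1, seed=1)
urban = sum(1 for p in sample if p[0] == "urban")
rural = sum(1 for p in sample if p[0] == "rural")
print(urban, rural)  # 6 4
```

Dropping the random draw within each stratum (for example, taking whoever is convenient) would turn this into plain stratified sampling.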

Stratified sampling

is the process of developing a sample based on specific subgroups of a population.

Fixed reference group

is used as the basis for the calculation of test scores for future administration of the test.

Power tests

are tests whose time limits are long enough to allow test takers to attempt all items, and in which some items are so difficult that no test taker is able to obtain a perfect score.

Coefficient of stability

when the interval between testing is greater than six months, the estimate of test-retest reliability is often referred to as the _________.

Criterion-referenced testing and assessment

may be defined as a method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard.

Subgroup norms

norms for any defined group within a larger group.

Local norms

provide normative information with respect to the local population's performance on some test.

Norm

refer to behavior that is usual, average, normal, standard, expected, or typical.

Incidental sampling

also referred to as convenience sampling. The process of arbitrarily selecting some people to be part of a sample because they are readily available, not because they are most representative of the population being studied.

Parallel forms reliability

refers to an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.

Alternate forms reliability

refers to an estimate of the extent to which different forms of the same test have been affected by item sampling error or other error.

Overt behavior

refers to an observable action or product of an observable action.

Measurement error

refers, collectively, to all of the factors associated with the process of measuring some variable, other than the variable being measured.

Error

refers to the component of the observed test score that does not have to do with the test taker's ability.

Inter-item consistency

refers to the degree of correlation among all the items on a scale. A measure of inter-item consistency is calculated from a single administration of a single form of a test.

Percentage correct

refers to the distribution of raw scores, more specifically to the number of items that were answered correctly, multiplied by 100 and divided by the total number of items.
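The arithmetic as a one-line sketch (the function name is hypothetical). Note the contrast with a percentile: percentage correct describes the proportion of items answered correctly, while a percentile describes the proportion of people scoring below a given score.

```python
def percentage_correct(n_correct, n_items):
    """Number of items answered correctly, multiplied by 100, divided by total items."""
    return n_correct * 100 / n_items


print(percentage_correct(45, 60))  # 75.0
```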

Reliability

refers to the proportion of the total variance attributed to true variance
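True scores are never observable in practice, but a simulation can illustrate the definition. This sketch assumes hypothetical normally distributed true scores (SD 15) and independent random error (SD 5), builds observed scores as true score plus error, and computes the share of total variance that is true variance. The population value in this setup is 225 / 250 = 0.9.

```python
import random
from statistics import pvariance

rng = random.Random(0)
n = 10_000

# Simulated (hypothetical) data: true scores with SD 15, random error with SD 5
true_scores = [rng.gauss(100, 15) for _ in range(n)]
errors = [rng.gauss(0, 5) for _ in range(n)]
observed = [t + e for t, e in zip(true_scores, errors)]  # observed = true + error

true_variance = pvariance(true_scores)
total_variance = pvariance(observed)
reliability = true_variance / total_variance

# Should land close to the population value of 0.9
print(round(reliability, 2))
```

Increasing the error SD lowers the ratio, which matches the idea that the greater the proportion of total variance attributed to true variance, the more reliable the test.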

Domain sampling theory

seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score.

Sampling

begins with the test developer targeting some defined group as the population for which the test is designed.

Classical test theory

states that a score on an ability test is presumed to reflect not only the test taker's true score on the ability being measured but also error.

generalizability

study examines how much of an impact different facets of the universe have on the test score.

Purposive sampling

the arbitrary selection of people to be part of a sample because they are thought to be representative of the population being studied.

Equipercentile method

the equivalency of scores on different tests is calculated with reference to corresponding percentile scores.
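A deliberately crude sketch of the idea: find the percentile rank of a score on test A, then pick the test-B score whose percentile rank is closest. Operational equipercentile equating typically adds interpolation and smoothing, which this version omits; the helper names and score distributions are hypothetical.

```python
def percentile_rank(scores, x):
    """Percentage of scores falling below x."""
    return 100 * sum(1 for s in scores if s < x) / len(scores)


def equipercentile_equivalent(x, scores_a, scores_b):
    """Hypothetical helper: the test-B score whose percentile rank is closest
    to the percentile rank of score x on test A."""
    target = percentile_rank(scores_a, x)
    candidates = sorted(set(scores_b))
    return min(candidates, key=lambda y: abs(percentile_rank(scores_b, y) - target))


# Hypothetical norm-group score distributions for two tests of the same thing
test_a = [10, 20, 30, 40, 50]
test_b = [100, 200, 300, 400, 500]
print(equipercentile_equivalent(30, test_a, test_b))  # 300
```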

Coefficient alpha

the mean of all possible split-half correlations
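The "mean of all split-halves" description is conceptual; in practice alpha is usually computed from item and total-score variances with the standard formula alpha = k / (k - 1) × (1 - sum of item variances / total-score variance), where k is the number of items. A minimal sketch with hypothetical ratings (any consistent variance definition works, since only the ratio matters):

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha from a score matrix (rows = test takers, columns = items)."""
    k = len(item_scores[0])                   # number of items
    item_columns = list(zip(*item_scores))    # transpose to one tuple per item
    sum_item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical 5-point ratings from four people on a three-item scale
ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
]
print(round(cronbach_alpha(ratings), 2))  # 0.95
```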

Standardization or test standardization

the process of administering a test to a representative sample of test takers for the purpose of establishing norms.

Error variance

variance from irrelevant, random sources

true variance

variance from true differences

