NCE-assessments and testing
Best practices
Counselor thoroughly understands the results. Counselor should explain results in easily understood terms and be able to provide supporting details and norms as needed. Counselor should understand and explain average scores and the meanings of results.* Counselor should allow the client to ask questions and review aspects of the test to ensure understanding. Counselor must explain the ramifications and limitations of any data obtained through testing.
Standardized Scores z-scores T scores
"Common language" that we use to compare several different test scores for the same individual. They are obtained by converting raw score distributions. These derived scores provide constant normative or relative meaning and allow for comparisons between individuals. Specifically, they EXPRESS THE PERSON'S DISTANCE FROM THE MEAN IN TERMS OF THE STANDARD DEVIATION of that standard score distribution. They are continuous and have equality of units. The two most commonly used are: - -
Construct Validity Convergent Validity Discriminant Validity
Measures how well a test assesses an abstract trait or theoretical construct without inadvertently testing another variable. For example, a math test with complex word problems may also be assessing reading skills. Two subtypes are used to assess it:
Equivalence
ALTERNATE FORMS OF THE SAME TEST are administered to the same group and the correlation between them is calculated How comparable the forms of the test are will influence this reliability
Major Types of Tests and Inventories
Achievement test Aptitude test Intelligence Test Occupational Test Personality Test
Test-Retest
Administering the same test twice to a group of individuals, then correlating the scores to evaluate stability
Occupational Test
Assess skills, values, or interests as they relate to vocational and occupational choices EX.- O*NET Interest Profiler, Career Assessment Inventory, Self-Directed Search
Norm referenced Criterion referenced Ipsatively interpreted
Assessments may be: - - -
Personality Test
Can be objective (structured self-report formats such as rating scales or true/false items) or projective (responses to ambiguous stimuli), and help the counselor and client understand personality traits and underlying beliefs and behaviors EX.- Myers-Briggs Type Indicator (MBTI), Minnesota Multiphasic Personality Inventory (MMPI-2), Beck Depression Inventory. Projective test- Rorschach (inkblot) reveals unconscious thoughts, motives, and views
Inter-Rater Reliability
Checks that raters (those administering, grading, or judging a measure) agree with one another. Each rater should score the same behaviors or responses to the same degree to ensure consistency. Prevents overly subjective ratings, since all raters measure on the same terms
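The simplest way to quantify rater agreement is the percentage of observations on which two raters give the same code. The sketch below assumes two raters coding the same observations in the same order; `percent_agreement` is a hypothetical helper, and this index does not correct for agreement expected by chance.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of observations on which two raters assigned the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of observations")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)

# Hypothetical codes from two observers watching the same 8 intervals
rater_a = ["on", "on", "off", "on", "off", "on", "on", "off"]
rater_b = ["on", "off", "off", "on", "off", "on", "off", "off"]
print(percent_agreement(rater_a, rater_b))  # 75.0
```

The raters agree on 6 of 8 intervals, so agreement is 75%.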
Face Validity
Commonsense judgment that a test appears to measure what it should; it looks accurate from a nonprofessional viewpoint
34% and 34% = 68%; adding 13.5% and 13.5% = 95%; adding 2% and 2% = 99% (more precisely, about 99.7%)
Counselors should be familiar with the distribution of scores within the normal curve: _____________ of scores lie within one standard deviation of the mean; _____________ lie within two standard deviations; _____________ lie within three standard deviations
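The normal-curve percentages above are rounded. As a check, the exact area within k standard deviations of the mean of a normal distribution equals erf(k / sqrt(2)). This minimal sketch uses Python's `math.erf`; `proportion_within` is a hypothetical helper name.

```python
import math

def proportion_within(k):
    """Share of a normal distribution lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {proportion_within(k):.1%}")
```

This prints about 68.3%, 95.4%, and 99.7%. The 34%, 13.5%, and 2% bands are half of the successive differences between these figures, rounded.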
Average Inter-Item Correlation
Determines whether scores on one item relate to the scores on all of the other items in that scale. The correlation is computed for every pair of items and then averaged; high correlations indicate that the items assess the same content
intelligence achievement aptitude personality interests
Different types of tests: - - - - -
Curricular Validity
Evaluated by experts; measures how well a test aligns with the curriculum being tested. For example, a high school exit exam measures the information taught in the high school curriculum
Criterion Validity Predictive validity Concurrent Validity
Measures the relationship between a test score and an outcome (criterion), such as scores on the SAT and success in college. Two subtypes:
Parallel-Forms Reliability (aka equivalence)
Involves administering two different versions of an assessment that measure the same set of skills, knowledge, etc., and then correlating the results. A large set of items can be written and split into two parts, creating parallel versions
Intelligence test
Measure mental capacity and potential EX. WISC, WAIS, WPPSI, Woodcock-Johnson, Kaufman Assessment Battery for Children
Aptitude Test
Measure the capacity for learning and can be used as part of a job application Can measure abstract/conceptual reasoning, verbal reasoning, and/or numerical reasoning EX- Differential Aptitude Test (DAT), Wonderlic Cognitive Ability Test, Career Ability Placement Survey (CAPS)
Achievement Test
Measure knowledge of a specific subject and are primarily used in education EX- exit exams for high school, GED
Range Standard Deviation Variance
Measures of variability: - - -
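The three measures of variability can be computed side by side. This is a minimal sketch with a hypothetical helper `spread`. It uses population formulas (it divides by N). A sample estimate would divide the squared deviations by N - 1 instead.

```python
import math

def spread(scores):
    """Range, inclusive range, variance, and SD of a set of scores (population formulas)."""
    n = len(scores)
    m = sum(scores) / n
    rng = max(scores) - min(scores)
    variance = sum((x - m) ** 2 for x in scores) / n   # mean of squared deviations
    return {
        "range": rng,
        "inclusive_range": rng + 1,      # high minus low, plus one
        "variance": variance,
        "sd": math.sqrt(variance),       # SD is the square root of the variance
    }

print(spread([2, 4, 4, 4, 5, 5, 7, 9]))
```

For these hypothetical scores, the range is 7, the inclusive range is 8, the variance is 4.0, and the standard deviation is 2.0.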
Validity Face Validity Curricular Validity Criterion Validity Construct Validity
Refers to how well a test or assessment measures what it's intended to measure. For example, an assessment of depression should measure only the degree to which an individual meets the diagnostic criteria for depression. A valid test is also reliable, but a test can be reliable and still not be _____ • Four types:
Internal Consistency Average inter-item correlation Split-half Reliability
Refers to how consistently the items within a test or assessment measure the same construct, so that the test produces similar results across its items. Questions on an assessment should be similar and in agreement, but not repetitive. High _________ indicates that a measure is reliable. Involves: - -
Consequential Validity
Social consequences of testing. Though not all researchers consider it a true measure of validity, some believe a test must benefit society in order to be considered valid
Objective Test Items
Standardized questions with clear correct or incorrect answers; not open to any interpretation
Correlation Coefficient
Statistic that describes the strength and direction of the relationship between two variables; it does not show that one variable causes changes in the other. In a positive correlation, both variables move in the same direction. In a negative correlation, the variables move in opposite directions
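The Pearson correlation coefficient can be computed directly from paired scores. Squaring it gives the coefficient of determination (r²). This is a minimal sketch with a hypothetical helper `pearson_r` and made-up scores.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired scores for 5 clients on two measures
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]
r = pearson_r(x, y)
print(round(r, 2), round(r ** 2, 2))          # 0.8 0.64
print(pearson_r(x, list(reversed(x))))        # -1.0 (perfect negative)
```

An r of .80 means about 64% of the variance in one set of scores is predictable from the other. This still says nothing about cause and effect.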
reliable / valid Valid / reliable
Test may be ______ BUT NOT ________. _______ tests are ________ unless, of course, there is a change in the underlying trait or characteristic, which might occur through maturation, training, or development
Power Based Speed Based
Test may be: ____________ : no time limits or very generous ones (such as NCE) ______________: timed, the emphasis is placed on speed and accuracy. (EX. measure of intelligence, ability, and aptitude)
T score T / ten (T)ransforming
The mean of this standardized score scale is 50 and the standard deviation is 10. By transforming z-scores onto this scale, negative scores are eliminated, unlike with the z-score. The ___ should remind you of ____, which is the standard deviation of this distribution
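Here is a minimal sketch of both standard-score conversions. A z-score is (raw - mean) / SD, and a T score is 50 + 10z. It assumes that the norm group's population standard deviation (dividing by N) is used. The helper names and the norm group are hypothetical.

```python
from statistics import mean, pstdev

def z_score(raw, norm_group):
    """Number of standard deviations a raw score lies above (+) or below (-) the mean."""
    # pstdev treats the norm group as the whole population (divides by N)
    return (raw - mean(norm_group)) / pstdev(norm_group)

def t_score(z):
    """Rescale a z-score to the T scale, which has mean 50 and SD 10."""
    return 50 + 10 * z

# Hypothetical norm group: mean 5, standard deviation 2
norm_group = [2, 4, 4, 4, 5, 5, 7, 9]
for raw in (9, 3):
    z = z_score(raw, norm_group)
    print(raw, z, t_score(z))
```

A raw score of 9 becomes z = 2.0 and T = 70.0. A raw score of 3 becomes z = -1.0, but on the T scale it is a positive 40.0.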
Split-Half Reliability
The random division of questions into two sets Results of both halves are compared to ensure correlation
Internal Consistency split-half method Spearman-Brown formula inter-item consistency Kuder-Richardson formulas (there are two) Cronbach's alpha coefficient
Two methods: ____________ - the test is divided into two halves, and the correlation between the two halves is calculated. *Note: splitting the test cuts its length in half, which necessarily lowers its measured reliability. To correct for this, you may apply the ________ to estimate how reliable the test would be had you not split it in two. _________ - the more homogeneous the items, the more reliable the test. ______ are used if the test contains dichotomous items (yes or no; true or false; right or wrong). _______ is applied if the instrument contains nondichotomous items (e.g., essay or rating-scale items with more than two possible scores)
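The formulas named above can be sketched as follows. The Spearman-Brown correction is r_full = 2r / (1 + r) for a split-half correlation r. Cronbach's alpha is k/(k-1) × (1 - sum of item variances / total-score variance). KR-20 is the same formula for 0/1 items, where each item's variance is p × q. This is a minimal sketch under these assumptions: population variances are used throughout, and the function names and data are hypothetical.

```python
from statistics import pvariance

def spearman_brown(r_half, factor=2):
    """Projected reliability when test length is multiplied by `factor`.
    factor=2 estimates full-length reliability from a split-half correlation."""
    return factor * r_half / (1 + (factor - 1) * r_half)

def cronbach_alpha(responses):
    """responses: one row per examinee, one column per item (any numeric scoring)."""
    k = len(responses[0])
    item_var = sum(pvariance(col) for col in zip(*responses))
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var / total_var)

def kr20(responses):
    """KR-20 for items scored 0/1: same form as alpha, with item variance p * q."""
    n, k = len(responses), len(responses[0])
    pq = 0.0
    for col in zip(*responses):
        p = sum(col) / n          # proportion answering the item correctly
        pq += p * (1 - p)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - pq / total_var)

print(round(spearman_brown(0.6), 2))   # 0.75

# Hypothetical right/wrong (1/0) answers: 4 examinees x 3 items
answers = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(answers), 2), round(cronbach_alpha(answers), 2))   # 0.75 0.75
```

A split-half correlation of .60 projects to about .75 for the full-length test. For dichotomous items, KR-20 and alpha give the same value.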
Stability Equivalence Internal consistency (interrater)
Types of reliability - - -
Face Content Predictive Concurrent Construct
Types of validity - - - - -
Convergent Validity
Uses two different tests of the same attributes to confirm that the same attributes are being measured; scores on the two should correlate highly. For example, two separate tests of the same skill should rank students similarly
Discriminant Validity
Shows that a test does not correlate with tests that measure different, unrelated constructs
standardized nonstandardized
_________ - the instruments are administered in a formal, structured procedure and the scoring is specified _________ - there are no formal or routine instructions for administration or for scoring. Some examples may be checklists or rating scales
Intrusive (or reactive) measurement Unobtrusive (or nonreactive) measurement
_____________ - means the participant knows he or she is being watched or questioned and this knowledge may affect his or her performance. Examples- questionnaires, interviews, or observation ___________ - means data is collected without the awareness of the individual, or without changing the natural course of events. Examples are reviewing existing records or unobtrusive observation
Case or historical study rating scales
_______________ - this may be an analytical and/or diagnostic investigation of a person or group _______________ - these may be used to report the degree to which an attribute or characteristic is present
Regression toward the mean statistical regression
_______________ means that if one earns a very low score (15% or lower) or very high score (85% or higher) on a pretest, the individual will probably earn a score closer to the mean on the posttest. This is because of error occurring due to chance, personal, and environmental factors. These factors can reliably be expected to be different on the posttest
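Regression toward the mean can be shown with a small simulation. It assumes that each observed score is a stable true score plus random error, so nobody actually changes between the pretest and the posttest. All numbers are hypothetical (true scores ~ N(50, 10), error ~ N(0, 5)), and `regression_demo` is an illustrative name.

```python
import random
from statistics import mean

def regression_demo(n=10_000, seed=42):
    """Observed score = stable true score + random error; nobody actually changes between tests."""
    rng = random.Random(seed)
    true = [rng.gauss(50, 10) for _ in range(n)]
    pre = [t + rng.gauss(0, 5) for t in true]
    post = [t + rng.gauss(0, 5) for t in true]
    ranked = sorted(range(n), key=lambda i: pre[i])
    cut = n * 15 // 100
    low, high = ranked[:cut], ranked[-cut:]   # bottom and top 15% on the pretest
    return {
        "low": (mean(pre[i] for i in low), mean(post[i] for i in low)),
        "high": (mean(pre[i] for i in high), mean(post[i] for i in high)),
    }

for group, (pre_mean, post_mean) in regression_demo().items():
    print(group, round(pre_mean, 1), round(post_mean, 1))
```

Both extreme groups score closer to 50 on the posttest, even though their true scores never changed.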
Grade equivalent scores Age equivalent scores
________________ - scores on an achievement test are often reported as this. The individual's score is compared to the average score of others in the same grade. Usually done in school settings _____________ - an individual's score is compared to the average score of others of the same age
maximal performance test typical performance
a __________________ is designed to bring out a person's best performance, as on an aptitude or achievement test, while ______ is what an interest or personality test measures
Measures of Central tendency: - Mean - Median -Mode
a distribution of scores (measurements on a number of individuals) can be examined using the following measures: -________: the arithmetic average, symbolized by X̄ (X-bar) or M -________: the middle score in a distribution of scores -________: the most frequent score in a distribution of scores. All three of these fall in the same place when the distribution of scores is symmetrical, i.e., normally distributed (not skewed).
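Python's standard `statistics` module can compare the three measures of central tendency. The two score sets below are hypothetical. The first is symmetrical. The second adds one unusually high score.

```python
from statistics import mean, median, multimode

symmetric = [70, 75, 80, 80, 85, 90]
skewed = [70, 75, 80, 80, 85, 150]   # one unusually high score

for scores in (symmetric, skewed):
    print(mean(scores), median(scores), multimode(scores))
```

In the symmetrical set, the mean, median, and mode are all 80. In the skewed set, the mean rises to 90, but the median and mode stay at 80. With an even number of scores, the median averages the two middle values.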
test battery
a group or set of tests administered to the same group and scored against a standard
Test
a measuring device or procedure
Stanine (STAndard NINE)
a nine-point scale used to convert a test score to a single digit. Stanines are always positive whole numbers from 1 to 9
Horizontal Test
a test covering material across various subjects
Construct multiple traits Convergent validation Discriminant validation
a test has construct validity to the extent it measures some hypothetical construct such as anxiety or creativity. Usually several tests or instruments are used to measure different components of the construct or of the hypothesized relationships between that construct and other constructs. This is best when ________ are being measured using a variety of methods. _______________ - occurs when there is high correlation between the construct under investigation and other measures of the same or related constructs. ______________ - occurs when there is no significant correlation between the construct under investigation and measures of unrelated constructs
Percentile
a value below which a specified percentage of cases fall. Ex. - at the 75th percentile, 75% of the scores fall below this score and 25% are at or above it
Norm referenced
comparing individuals to others who have taken the test before (the norm group), which may be national, state, or local. In this testing, how you compare with others is more important than what you know
Fluid:
ability to think and act quickly and to solve new problems these are skills that are independent of education and enculturation
Free Choice test
aka Liberal Choice; questions that allow for a subjective/open-ended response
Z-score
aka standard score; measures the number of standard deviations a raw score is from the mean; uses zero as the mean
Aptitude
also called ability test, these measure the effects of general learning and are used to predict future performance
Percentile ranks
an individual's score can be compared to a group (norm group) already examined; this indicates what percentage of individuals in that group have scores below (or above) this individual's score
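Percentile rank definitions differ in how they handle tied scores. This minimal sketch uses one common convention: the percentage of the norm group scoring below the score, plus half of those scoring exactly the same. Other sources count everyone "at or below". The norm group and the helper name are hypothetical.

```python
def percentile_rank(score, norm_group):
    """Percent of the norm group scoring below `score`, counting half of any tied scores."""
    below = sum(1 for s in norm_group if s < score)
    tied = sum(1 for s in norm_group if s == score)
    return 100 * (below + 0.5 * tied) / len(norm_group)

norm_group = [55, 60, 62, 65, 70, 70, 72, 75, 80, 90]
print(percentile_rank(70, norm_group))  # 50.0
print(percentile_rank(90, norm_group))  # 95.0
```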
Psychological Assessment
an informal process of testing, interviews or observations used to determine the psychological needs of an individual. Assessments can expose the need for more formal testing
Halo Effect
an overgeneralized positive view of a person from limited data
Standard Error of Measurement (SEM) confidence band or confidence limits
another measure of reliability, useful in interpreting the test score of an individual; may also be referred to as ____ or _______; helps determine the range within which an individual's true score probably falls. Example pg 215
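The SEM is commonly computed as SD × sqrt(1 - reliability coefficient). A confidence band is the observed score ± z × SEM, where z is about 1 for a 68% band and 1.96 for a 95% band. This minimal sketch uses made-up numbers (SD 15, reliability .91), not the textbook example, and the helper names are hypothetical.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

def confidence_band(observed, sd, reliability, z=1.0):
    """Range likely to contain the true score (z=1.0 ~ 68%, z=1.96 ~ 95%)."""
    e = z * sem(sd, reliability)
    return observed - e, observed + e

# Hypothetical test: SD 15, reliability .91, observed score 100
print(round(sem(15, 0.91), 2))                          # 4.5
print([round(v, 1) for v in confidence_band(100, 15, 0.91)])
print([round(v, 1) for v in confidence_band(100, 15, 0.91, z=1.96)])
```

Here the 68% band runs from about 95.5 to 104.5, and the 95% band runs from about 91.2 to 108.8. The more reliable the test, the smaller the SEM and the narrower the band.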
Unobtrusive measurement
assessment tools (such as observation) conducted without knowledge of the individual
Sociometry sociogram
can be used to identify isolates, rejectees or stars (popular individuals) You can measure the structure and organization of social groups which could be a classroom of fourth graders who have been together for a few months, or a work unit It requires revealing personal feelings about others _____________ - a figure or map showing the interrelationships or structure of the group
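The data behind a sociogram can be tallied from each member's nominations. This is a minimal sketch that assumes positive nominations only ("Whom would you like to work with?"). Identifying rejectees would require a separate set of negative nominations, which this sketch does not model. The names and the helper are hypothetical.

```python
from collections import Counter

def sociometric_summary(choices):
    """choices: dict mapping each group member to the list of members they chose."""
    received = Counter(chosen for picks in choices.values() for chosen in picks)
    counts = {member: received[member] for member in choices}   # 0 if never chosen
    most = max(counts.values())
    stars = sorted(m for m, c in counts.items() if c == most and most > 0)
    isolates = sorted(m for m, c in counts.items() if c == 0)
    return counts, stars, isolates

# Hypothetical positive nominations in a small work group
choices = {
    "Ana": ["Ben", "Cal"],
    "Ben": ["Ana", "Cal"],
    "Cal": ["Ben"],
    "Dee": ["Cal"],
    "Eli": ["Cal", "Ana"],
}
counts, stars, isolates = sociometric_summary(choices)
print(counts)            # {'Ana': 2, 'Ben': 2, 'Cal': 4, 'Dee': 0, 'Eli': 0}
print(stars, isolates)   # ['Cal'] ['Dee', 'Eli']
```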
Criterion referenced
comparing an individual's performance to some predetermined criterion which has been established as important Ex. - NCE cut off score
Ipsatively Interpreted
comparing the results on the test within the individual Ex. - looking at an individual's highs and lows on an aptitude battery which measures several aptitudes. There is no comparison with others Ex.- When an individual's score on a second test is compared to the score on the first test
Coefficient of determination
denoted by R^2, the proportion of the variance in the dependent variable that is predictable from the independent variable; it equals the square of the correlation coefficient
external validity
describes how well results from a study can be generalized to the larger population
Concurrent Validity
determines whether measures can be substituted for one another, such as taking an exam in place of a class. The measures must be taken concurrently (at about the same time) to accurately test for this validity
Percentile
determines how test scores rank on a scale of 100; indicates the percentage of individuals who scored at or below a given score. For example, a test taker who scores at the 65th percentile performed as well as or better than 65 percent of the other test takers
rapport
development of trust, understanding, respect, and liking between two people; essential for an effective therapeutic relationship
Crystallized:
encompasses acquired and learned skills and is influenced by personality, motivation, education, and culture
Normal Bell Curve
essentially divides the scores (individuals) into SIX segments, each one standard deviation wide: three above the mean and three below the mean** **See page 210
Stanine
from STAndard NINE; converts a distribution of scores into 9 parts (1 to 9) with a mean of 5 (the middle stanine) and a standard deviation of about 2.
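A stanine has a mean of 5 and an SD of 2, so a z-score can be mapped to a stanine by computing 2z + 5, rounding, and clipping to 1..9. Each stanine then covers half a standard deviation. In a normal distribution, this places about 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of scores in stanines 1 through 9. This is a minimal sketch, and `stanine_from_z` is a hypothetical helper.

```python
import math

def stanine_from_z(z):
    """Map a z-score to a stanine (mean 5, SD 2), clipped to the 1..9 range."""
    # floor(x + 0.5) rounds halves upward, so band edges fall at z = +/-0.25, 0.75, 1.25, 1.75
    return min(9, max(1, math.floor(2 * z + 5.5)))

for z in (-3, -1, 0, 0.3, 1, 2.5):
    print(z, stanine_from_z(z))
```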
measurement
general process of determining the dimensions of an attribute or trait
Predictive Validity
how useful test scores are at predicting future performance
Variance
how widely individuals in a group vary; describes how data are spread around the mean and equals the square of the standard deviation
Bell Curve
illustration of a data distribution that resembles the shape of a bell
Appraisal
implies going beyond measurement to making judgments about human attributes and behaviors and is used interchangeably with evaluation
Validity content construct criterion consequential
indicates how well any given test or assessment measures what it's intended to measure. There are four types: * a valid test is also reliable
Subjective
individual perceptions/ interpretations based on feelings and opinions, but not necessarily based on fact
Personality projectives inventories specialized
is the dynamic product of genetic factors, environmental experiences, and learning, including traits and characteristics. Three different types: ______________________: these tests present a relatively unstructured task or stimulus (the person projects thought processes, needs, anxieties, etc.) _____________________ _____________________
Interpretation
making a statement about the meaning or usefulness of measurement data according to the professional counselor's knowledge and judgement
ipsative format
means of testing that measures how individuals prefer to respond to problems, people, and procedures and doesn't compare results to others
Normative format
means of testing to compare individuals to others
Standard deviation
measure of the dispersion of numbers, calculated as the square root of the variance
Difficulty Index
measure of the proportion of examinees who answer test items correctly
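The difficulty index (p) is the proportion of examinees who answer an item correctly. A higher p means an easier item. This is a minimal sketch with hypothetical 0/1 responses and a hypothetical helper name.

```python
def difficulty_index(item_responses):
    """p = proportion of examinees who answered the item correctly (1 = correct, 0 = incorrect)."""
    return sum(item_responses) / len(item_responses)

# Hypothetical responses of 10 examinees to two items
easy_item = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
hard_item = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(difficulty_index(easy_item), difficulty_index(hard_item))  # 0.9 0.2
```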
Reliability test-retest parallel forms inter-rater internal consistency
measures whether a tool is producing consistent and stable results, which must be quantified. Reliability doesn't indicate validity. Four types:
Achievement
measures the effects of learning or a set of experiences. These tests may be used diagnostically
Trait
method of describing individuals through observable characteristics that are unique and distinguishable
Score
numerical value associated with a test or measure
J.P. Guilford
o Conducted psychometric studies of human intelligence and creativity in the mid-1900s o Believed intelligence tests were limited and overly one-dimensional, and didn't factor in the diversity of human abilities, thinking, and creativity
Binet and Simon
o Developed the first test to determine which children would succeed in school, in the early 1900s o Focused on the concept of mental age and included memory, attention, and problem solving o Brought to Stanford University (the Stanford-Binet); has since been revised many times and is still widely used
David Wechsler
o Developed intelligence tests for adults and children in the 1950s o Tests were good at identifying learning disabilities in children o Believed intelligence has both verbal and performance components and that factors other than pure intellect influence intellectual behavior o WISC, WAIS, WPPSI
Robert Williams
o Developed the Black Intelligence Test of Cultural Homogeneity (BITCH test) to address the racial inequalities of traditional intelligence tests in 1970s o Used vernacular and experiences common to African American culture
Raymond Cattell and John Horn
o Developed theories of fluid and crystallized intelligence in 1940s
John Ertl
o Invented a neural efficiency analyzer to more effectively measure intelligence o Believed traditional intelligence tests were limited to understanding an abstract degree of intelligence o His system measured the speed and efficiency of electrical activity in the brain using an electroencephalogram (EEG)
Francis Galton
o One of the first to study intelligence in the late 1800s o Cousin to Darwin o Coined term Eugenics o Believed intelligence was genetically determined and could be promoted through selective parenting
Charles Spearman
o Responsible for bringing statistical analysis to intelligence testing in early 1900s o Proposed g Factor Theory for general intelligence, which laid the foundation for analyzing intelligence tests o Prior to him, tests weren't highly correlated with the factors the test attempted to measure
Arthur Jensen accounted for simple associative learning and memory involved more abstract and conceptual reasoning
o Supported g Factor Theory and believed intelligence consisted of two distinct sets of abilities: Level I - _________________ Level II - ________________ o Believed genetic factors were the most influential determinant of intelligence
Dichotomous Items
opposing choices on a test, such as yes/no or true/false options
Interests
preferences, likes and dislikes of an individual and more broadly includes values. These are often not stable in the teen years
Rating Scale
process of measuring degrees of experience and attitudes through questions
assessment
processes and procedures for collecting information about human behavior _______ tools include tests, inventories, rating scales, observation, interview data and other techniques
appraisal
professionally administered assessment tools and tests used to evaluate, measure, and understand clients
mean
provides the average of all scores; calculated by adding all the scores and dividing by the number of scores
Correlation coefficient (r) cause and effect the degree of relationship
ranges from -1.00 to 1.00 and shows the relationship between two sets of numbers. When a very strong correlation exists, if you know one score of an individual, you can predict the other score of that person. A correlation between two variables is called ______. A correlation between three or more variables is called ___________. It can tell you NOTHING about _____, only _____
Likert Scale
rating scale measuring attitudes by degree of like or dislike (or agreement and disagreement)
Psychological Test
refers to any number of specific tests or measurements conducted to evaluate, diagnose, or develop treatment plans. It can include personality assessments, projective or subjective tests, intelligence tests, or diagnostic batteries.
standard error of measurement (SEM)
refers to test reliability and the difference between the true score and the observed score. Since no test is without error, the SEM describes the dispersion of scores a person would obtain on repeated administrations of the same test; also referred to as the "standard error" of a score
median
refers to the middle or center number in an ordered list of scores or data; also referred to as the midpoint. In an even data set, the two middle numbers are typically averaged to determine the median
Projective Test
responses to ambiguous images that are intended to uncover unconscious desires, thoughts, or beliefs
Vertical Test
same-subject tests given to different levels or ages
Measure
score assigned to traits, behaviors, or actions
Q-Sort
self-assessment procedure requiring subjects to sort items relative to one another along a dimension, such as agree/disagree
T-Score
specific to psychometrics; used to standardize test scores and convert them to positive numbers. Every 10 points represents one standard deviation from the mean (which is always 50)
Regression to the Mean
statistical tendency of a data series to gravitate towards the center of a distribution
range
subtraction of the lowest score from the highest score
Intelligence
the ability to think in abstract terms and to learn. Some also believe it is the ability to adapt to the environment and adjust to it; aka general ability or cognitive ability
Reliability reliable reliable
the consistency of a test or measure; the degree to which the test can be expected to provide similar results for the same subjects on repeated administration. It can be viewed as the extent to which a measure is FREE FROM ERROR. If the instrument has little error, it is ________. A correlation coefficient is used to determine this: if the reliability coefficient is high, about .70 or higher, test scores have little error and the instrument is said to be ______
Skew positive skew (-->) negative skew (<--)
the degree to which a distribution of scores is not normally distributed - - *** see page 208
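Skew can be quantified with the moment coefficient of skewness, which is the mean cubed deviation divided by the SD cubed. A positive value indicates a tail to the right (positive skew), and a negative value indicates a tail to the left. This is a minimal sketch with hypothetical scores and a hypothetical helper name.

```python
def skewness(scores):
    """Moment coefficient of skewness: > 0 tail to the right, < 0 tail to the left."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5

right_tail = [1, 1, 2, 2, 3, 10]                  # hypothetical scores with one very high value
print(skewness([1, 2, 3, 4, 5]))                  # 0.0 (symmetrical)
print(skewness(right_tail) > 0)                   # True: positive skew
print(skewness([-x for x in right_tail]) < 0)     # True: negative skew
```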
Validity
the degree to which a test measures what it purports to measure for the specific purpose for which it is used. It is SITUATION SPECIFIC: depending on the purpose and population, an instrument could be this for some purposes and not others
Reliability test-retest Parallel-Forms Reliability (aka equivalence) Inter-Rater Reliability Internal Consistency
the degree to which the assessment tool produces consistent and stable results Four types:
content
the instrument contains items drawn from the domain of items which could be included Ex. Two professors of Psych 101 create a final exam which covers the important content they both teach
face
the instrument looks valid Ex. A math test has math items
z-score z / z score
the mean is 0; the standard deviation is 1.0. z-scores typically range from about -3.0 to +3.0. The __ in ____ should remind you of ZERO, which is the mean of this distribution
Skew
the degree to which a distribution of scores departs from symmetry (the normal curve)
Mode
the most common or frequent score that occurred in a group of tests. If no score occurs more than once, the set of scores has no mode
Predictive
the predictions made by the test are confirmed by later behavior (criterion) Ex. The scores on the Graduate Record Exam predict later grade point average
Psychometrics
the process or study of psychological measurement
Concurrent
the results of the test are compared with other tests' results or behaviors (criteria) at or about the same time Ex. Scores of an art aptitude test may be compared to grades already assigned to students in an art class
Forced Choice Items
the use of two or more specific response options on a survey
Variance
this is simply the square of the standard deviation The variance does not describe the dispersion of scores as well as the standard deviation.
Stability Two weeks
this is test-retest reliability obtained using the same instrument on both occasions. The same group is tested twice, and the results of the two administrations are correlated. The length of time and intervening experiences may influence reliability. ________ is a good time between administrations
Range Inclusive range
this is the highest score minus the lowest score. Some researchers talk of ________, which is the high score minus the low score plus one (1)
Social desirability
this is the tendency for test takers to respond in ways they perceive to be socially desirable
semantic differential
this scale asks respondents to report where they are on a bipolar range between two affective polar opposites Very Good _________ _________ _________ Very bad
Standard Deviation mean of all the deviations
this value describes the variability within a distribution of scores We use the symbol SD to signify this of a sample this is essentially the _______________
behavioral observation
type of assessment used to document the behavior of clients or research subjects
scale nominal ordinal interval ratio
used to categorize and/or quantify variables four ___ of measurement:
observation as appraisal technique
with this technique, you observe samples from a stream of behavior. In observation, you may use schedules, coding systems, and record forms
Ethical Issues in Testing
• Counselor must be adequately trained and earn any certifications and supervision necessary to administer and interpret the test • Test must be appropriate for the needs of the client • Client must provide informed consent and must understand the purpose and scope of any test**** • Test results must remain confidential • Test must be validated for the specific client and be unbiased toward the race, ethnicity, and gender of the client