Assessment and Testing
raw score
(original data that have not been converted into a derived score) lacks sufficient meaning and interpretive value.
The use of measurement rests on 3 fundamental assumptions:
1. All human attributes and behavioral expressions are distinct enough to be objectively defined and quantified. 2. All human attributes and behavioral expressions are present in all people. 3. The presence or absence of specific attributes and behavioral expressions in certain situations indicates normalcy or deficiency.
Whiston (2016) provides counselors with a 5-step process for evaluating counseling outcomes:
1. Defining the evaluation study focus. Professional counselors must first determine exactly what they want to evaluate. The focus of the study evaluation may be a specific service(s), a particular treatment or counseling intervention, or a program. 2. Determination of the evaluation design. Once the focus of the evaluation study is solidified, professional counselors must decide how they are going to evaluate outcomes by selecting an evaluation design. One of the most common evaluation designs is to administer a pretest, provide a counseling intervention (e.g., program and specific treatment), and then administer a posttest. 3. Selection of participants. Next, professional counselors need to select which clients will participate in their evaluation study. Professional counselors may invite all clients to participate, use a random sample of clients, or use a subsection of the client population (e.g., women, adolescents, Latinos). 4. Selection of assessments. In addition to selecting participants, professional counselors should decide which assessments they will use to measure a particular counseling outcome. 5. Analysis of data. Once information regarding counseling outcomes has been gathered from participants, professional counselors must analyze the data to determine the effectiveness of counseling.
The use of client assessment in counseling has several primary functions:
1. Diagnosis and treatment planning: Counselors must make clinical and treatment decisions about the client. 2. Placement services: Once a client is diagnosed, counselors may use additional assessment procedures to determine the program or service in which the client should be placed. 3. Evaluating counseling outcomes: Counselors need to evaluate whether counseling treatment is effective overall.
5 influences on reliability of test scores:
1. Test length. Longer tests are generally more reliable than shorter tests. 2. Homogeneity of test items. Lower reliability estimates are reported when test items vary greatly in content. 3. Range restriction. The reliability of test scores will be lowered by a restriction in the range of test scores. 4. Heterogeneity of test group. Test-takers who are heterogeneous on the characteristic being measured yield higher reliability estimates. 5. Speed tests. These yield spuriously high reliability coefficients because nearly every test-taker gets nearly every item correct.
Fair Access Coalition on Testing (FACT)
A nonprofit organization that works to ensure qualified professionals are given fair access to testing instruments they were trained to use
The Vocational and Technical Education Act of 1984.
Also known as the Carl D. Perkins Act, this law provides access to vocational assessment, counseling, and placement services for the economically disadvantaged, those with disabilities, individuals entering nontraditional occupations, adults in need of vocational training, single parents, those with limited English proficiency, and incarcerated individuals.
ipsative assessment
An individual's test score also can be compared against a previous test score.
indirect observation
Assesses an individual's behavior through self-report or the use of informants such as family, friends, or teachers.
Civil Rights Act of 1964 and the 1972, 1978, and 1991 amendments.
Assessments used to determine employability must relate strictly to the duties outlined in the job description and cannot discriminate based on race, color, religion, pregnancy, gender, or national origin.
Bakke v. Regents of the University of California (1978).
Barred the use of quota systems for minority admissions procedures in U.S. colleges and universities.
Individuals with Disabilities Education Improvement Act of 2004 (IDEA).
Confirms the right of students believed to have a disability to receive testing at the expense of the public school system.
Americans with Disabilities Act of 1990 (ADA)
Employment testing must accurately measure a person's ability to perform pertinent job tasks without confounding the assessment results with a disability.
Family Educational Rights and Privacy Act of 1974 (FERPA).
Ensures the confidentiality of student test records by restricting access to scores. At the same time, this law affirms the rights of both student and parent to view student records.
normal curve equivalent (NCE)
NCEs are similar to percentile ranks in that the range is from 1 to 99, and they indicate how an individual ranked in relationship to peers. Unlike percentile ranks, NCEs divide the normal curve into 100 equal parts (see Figure 7.1). NCEs have a mean of 50 and a standard deviation of 21.06. They can be converted from a z-score by multiplying the NCE standard deviation (SD = 21.06) by an individual's z-score and adding the NCE mean (M = 50): NCE = 21.06(z) + 50.
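The conversion on this card can be sketched in a few lines of Python. This is only an illustration of NCE = 21.06(z) + 50; the function name `z_to_nce` is hypothetical and not from the source text.

```python
NCE_MEAN = 50.0   # NCE mean given on the card
NCE_SD = 21.06    # NCE standard deviation given on the card


def z_to_nce(z: float) -> float:
    """Convert a z-score to a normal curve equivalent: NCE = 21.06 * z + 50."""
    return NCE_SD * z + NCE_MEAN


if __name__ == "__main__":
    # A z-score of 0 sits at the mean, so its NCE is 50.
    for z in (-1.0, 0.0, 1.0):
        print(z, round(z_to_nce(z), 2))
```

A z-score of +1.00 lands one NCE standard deviation above the mean, at 71.06.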
performance assessments
Nonverbal form of assessment that entails minimal verbal communication to measure broad attributes
Sharif v. New York State Educational Department (1989).
Ruled that SAT scores alone could not be used to determine scholarship awards.
Griggs v. Duke Power Company (1971).
Ruled that assessments used in the job hiring and promotion process must be job related.
Larry P. v. Riles (1974, 1979, 1984).
Ruled that schools had used biased intelligence tests, which led to an overrepresentation of African American children in programs for students with mild intellectual disability.
Health Insurance Portability and Accountability Act (HIPAA) of 1996.
Secures the privacy of client records by requiring agencies to obtain client consent before releasing records to others. HIPAA also grants clients access to their records.
Computer-adaptive testing
Some computer-based tests have the ability to adapt the test structure and items to the examinee's ability level.
Test Critiques
Test Critiques, also published by PRO-ED, is designed to be a companion text to Tests. Each entry in Test Critiques contains an overview of the assessment, practical applications (e.g., intended population, administration, scoring, and interpretation procedures), and information regarding the instrument's validity and reliability.
Tests in Print (TIP)
Tests in Print (TIP) is published by the Buros Institute of Mental Measurements every 3 to 13 years as a companion to the MMY. It offers a comprehensive listing of all published and commercially available tests in psychology and education.
No Child Left Behind (NCLB) Act of 2001
The act aims to improve the quality of U.S. primary and secondary schools by increasing the accountability standards of states, school districts, and schools. No Child Left Behind requires states to develop and administer assessments in basic skills to all students.
Soroka et al. v. Dayton-Hudson Company (1991).
The case was settled out of court, but an appeals court ruled that the use of preemployment psychological screening assessments is an invasion of candidate privacy.
Item difficulty
The percentage of test takers who answer a question correctly
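As a minimal sketch of this definition, item difficulty can be computed from a list of right/wrong flags for one item. The function name and the sample responses are hypothetical.

```python
def item_difficulty(responses):
    """Return the percentage of test takers who answered the item correctly.

    `responses` is a list of booleans, one per test taker (True = correct).
    """
    if not responses:
        raise ValueError("at least one response is required")
    return 100.0 * sum(responses) / len(responses)


if __name__ == "__main__":
    # Hypothetical item answered correctly by 3 of 4 test takers.
    print(item_difficulty([True, True, False, True]))
```

Note that a higher value means an easier item, since more test takers answered it correctly.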
Diana v. California State Board of Education (1973, 1979).
This case was settled out of court and requires that schools provide tests to students in their first language as well as in English to limit linguistic bias.
mental status exam (MSE)
Used by professional counselors to obtain a snapshot of a client's mental symptoms and psychological state. The MSE addresses the following areas: appearance, attitude, movement and behavior, mood and affect, thought content, perceptions, thought processes, judgment and insight, and intellectual functioning and memory.
Maximal performance test
Used when a professional counselor would like information regarding the client's best attainable score or performance. Achievement and aptitude tests are measures used to test this.
A person's observed score (X) is equal to his or her true score (T) plus the amount of error (e) present during test administration:
X = T + e
Bias
a broad term that refers to an individual or group being deprived of the opportunity to demonstrate their true skills, knowledge, abilities, and personalities on a given assessment.
Percentile rank
a commonly used calculation that allows a comparison to be made between a person's raw score and a norm group. A percentile rank indicates the percentage of scores falling at or below a given score.
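The "at or below" definition above can be sketched directly. The norm-group scores here are made up for illustration, and `percentile_rank` is a hypothetical helper name.

```python
def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores falling at or below `score`."""
    if not norm_scores:
        raise ValueError("the norm group must contain at least one score")
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)


if __name__ == "__main__":
    norm_group = [50, 60, 70, 80, 90]  # hypothetical norm-group raw scores
    print(percentile_rank(70, norm_group))
```

In this made-up norm group, three of the five scores are at or below 70, so a raw score of 70 has a percentile rank of 60.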
derived score
a converted raw score that gives meaning to a test score by comparing an individual's score with those of a norm group.
normal curve (bell curve)
a normal distribution forms a bell-shaped curve when graphed. The normal curve is symmetrical, with the highest point occurring at the graph's center. The lowest points lie on either side of the graph. The curve is also asymptotic, meaning that the tails approach the horizontal axis without ever touching it.
Responsibilities of Users of Standardized Tests (RUST)
a policy statement published by the Association for Assessment and Research in Counseling, a division of the ACA, which is intended to ensure that ACA members use standardized tests with clients in an accurate, fair, and responsible manner.
Item analysis
a procedure that involves statistically examining test-takers' responses to individual test items with the intent to assess the quality of test items and the test as a whole. This is frequently used to eliminate confusing, easy, and difficult items from a test that will be used again.
Face validity
a superficial measure that is concerned with whether an instrument looks valid or credible.
grade-equivalent scores
a type of developmental score that compares an individual's score with the average score of those at the same grade level.
Age-equivalent scores
a type of developmental score that compares an individual's score with the average score of those of the same age. Age equivalents are reported in chronological years and months. Thus, we could say that a child aged 7 years 5 months with an age-equivalent score of 8.2 in height has the average height of a child aged 8 years 2 months.
Split-half reliability
a type of internal consistency that correlates one half of a test against the other.
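One way to picture split-half reliability is to total each examinee's odd-numbered and even-numbered items and correlate the two half-test totals. The odd/even split, the function names, and the item data are assumptions made for this sketch; the card does not say how the halves are formed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(item_scores):
    """Correlate odd-item totals with even-item totals.

    `item_scores` holds one row per examinee; each row lists that
    examinee's item scores in test order (1 = correct, 0 = incorrect).
    """
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson(odd_half, even_half)


if __name__ == "__main__":
    # Hypothetical 4-item test taken by four examinees.
    data = [[1, 1, 1, 1], [1, 0, 1, 0], [0, 0, 0, 0], [1, 1, 0, 1]]
    print(round(split_half_reliability(data), 3))
```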
5. An individual with a z-score of −1.3 has a stanine score of a. 2.4 b. 37 c. 2 d. 22.62
a. 2.4 ; Stanine = 2(z) + 5, so 2(−1.3) + 5 = 2.4
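The stanine formula used in this answer can be sketched as below. The second helper rests on an assumption that goes beyond the card: stanines are conventionally reported as whole numbers from 1 to 9, so the raw formula value is rounded and clamped. Both function names are hypothetical.

```python
import math


def stanine_from_z(z: float) -> float:
    """Apply the card's formula: stanine = 2 * z + 5 (unrounded)."""
    return 2.0 * z + 5.0


def reported_stanine(z: float) -> int:
    """Assumption beyond the card: round to a whole number and clamp to 1..9."""
    rounded = math.floor(stanine_from_z(z) + 0.5)  # round half up
    return max(1, min(9, rounded))


if __name__ == "__main__":
    z = -1.3
    print(round(stanine_from_z(z), 2), reported_stanine(z))
```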
5. ____________ has been translated to mean "fear of the marketplace." a. Agoraphobia b. Factitious disorder c. Paraphilia d. Hypochondriasis
a. Agoraphobia
2. The Myers-Briggs Type Indicator uses four dichotomous scales to measure personality. What specific aspect of personality does the sensing vs. intuition scale measure? a. How you perceive the world around you. b. How you make decisions. c. Where your energy is directed. d. How you deal with the external world.
a. How you perceive the world around you.
5. A professional counselor would like information related to the Beck Depression Inventory (BDI). Specifically, he would like information related to the instrument's score reliability and validity, as well as a critique of using the assessment in clinical settings. Which source is designed to provide this information? a. Mental Measurements Yearbook b. Tests in Print c. Tests d. DSM-5
a. Mental Measurements Yearbook
3. In an initial session a professional counselor notices the client appears disheveled, has abnormal movements, and appears paranoid. The professional counselor is concerned about the client and would like to administer an assessment that will capture information on client appearance, movement, and thought content. Which one of the following assessments is most appropriate? a. Mental status exam b. Bayley scales c. House-Tree-Person d. Minnesota Multiphasic Personality Inventory-2
a. Mental status exam
3. Individuals with conduct disorder cannot simultaneously be diagnosed with a. Oppositional defiant disorder. b. Attention-deficit/hyperactivity disorder. c. Separation anxiety disorder. d. Learning disorder.
a. Oppositional defiant disorder.
2. If a professional school counselor wants to know if a student is ready to move to the next grade level, she should administer a(n) a. maximal performance test. b. speed test. c. objective test. d. typical performance test.
a. maximal performance test.
5. William Stern's ratio intelligence quotient, popularized on early versions of the Stanford-Binet Intelligence Scales, was calculated by dividing one's a. mental age by chronological age. b. broad cognitive abilities by narrow cognitive abilities. c. chronological age by one's mental age. d. narrow cognitive abilities by broad cognitive abilities.
a. mental age by chronological age.
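The ratio IQ in this answer can be sketched numerically. The card states only the division; multiplying the quotient by 100 is the conventional final step and is an assumption added here, as are the function name and the example ages.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100.

    The multiplication by 100 is the conventional scaling, not stated on the card.
    """
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


if __name__ == "__main__":
    # Hypothetical child: mental age 10 years (120 months), actual age 8 years (96 months).
    print(ratio_iq(120, 96))
```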
3. If a math test item has positive item discrimination, it can be said that a. more students who knew the material answered the question correctly than students who did not know the material well. b. more students who did not know the material well answered the question correctly than students who knew the material. c. all students answered the question correctly. d. 50% of all students answered the question correctly.
a. more students who knew the material answered the question correctly than students who did not know the material well.
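One common way to put a number on item discrimination is to subtract the proportion of low scorers who answered the item correctly from the proportion of high scorers who did. The card does not name this index, so treat the formula, the function name, and the sample data as assumptions for illustration.

```python
def item_discrimination(upper_group, lower_group):
    """Proportion correct in the upper group minus proportion correct in the lower group.

    Each argument is a list of booleans (True = answered the item correctly).
    A positive result means students who knew the material did better on the item.
    """
    if not upper_group or not lower_group:
        raise ValueError("both groups need at least one response")
    p_upper = sum(upper_group) / len(upper_group)
    p_lower = sum(lower_group) / len(lower_group)
    return p_upper - p_lower


if __name__ == "__main__":
    high_scorers = [True, True, True, False]   # hypothetical top scorers on the whole test
    low_scorers = [True, False, False, False]  # hypothetical bottom scorers
    print(item_discrimination(high_scorers, low_scorers))
```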
Group tests
administered to two or more test-takers at a time. These tests use objective scoring methods and have established norms.
Nonstandardized tests
allow for variability and adaptation in test administration, scoring, and interpretation. These tests do not permit an individual's score to be compared to a norm group so the counselor must rely on judgment to interpret the data.
norm-referenced assessment
an individual's score is compared to the average score of the test-taking group
Diagnostic tests
are designed to identify learning disabilities or specific learning difficulties in a given academic area.
objective personality tests
are standardized, self-report instruments that often use multiple-choice or true/false formats to assess various aspects of personality
Personality tests
assess a person's affective realm. Specifically, personality tests describe the facets of a person's character that remain stable through adulthood (e.g., temperament and patterns in behavior).
projective personality tests
assess personality factors by interpreting a client's response to ambiguous stimuli
Direct observation
assesses an individual's behavior in real time and usually occurs in a naturalistic setting.
Decision accuracy
assesses the accuracy of instruments in supporting counselor decisions.
Admission
assessment procedures are often used to determine admission into an educational institution. Aptitude tests such as the GRE are considered for entrance into graduate programs.
Selection
assessments are also used to select candidates for a special program or job position.
4. A high school counselor would like to administer a vocational aptitude assessment to students in the ninth grade that will assess multiple aptitudes. Her hope is that the assessment will highlight student vocational strengths and offer potential careers that students might be interested in. Which of the following assessments would be appropriate for the school counselor to administer? a. Skills Profiler Series b. Differential Aptitude Test (DAT) c. ACT Assessment d. Clerical Test Battery
b. Differential Aptitude Test (DAT)
1. A professional counselor would like to administer an objective personality assessment to her adult client. She would like the assessment to identify DSM-5 related personality disorders. Which of the following assessments should the counselor use? a. California Psychological Inventory (CPI) b. Millon Clinical Multiaxial Inventory (MCMI-IV) c. Myers-Briggs Type Indicator (MBTI) d. Minnesota Multiphasic Personality Inventory-2 (MMPI-2)
b. Millon Clinical Multiaxial Inventory (MCMI-IV)
4. A professional counselor releases a client's test results to a bachelor's-level case manager who has no training in testing and assessment. What ethical guideline was violated? a. Informed consent b. Release of results to qualified professionals c. Communicating test results d. Competence to use and interpret assessment instruments
b. Release of results to qualified professionals
1. Which of the following is NOT true about test validity? a. Validity should be reported in terms of test purpose and intended population. b. Test scores do not have to be reliable to be valid. c. A validity coefficient of .55 is high. d. False positive errors contribute to a lack of test score validity.
b. Test scores do not have to be reliable to be valid.
5. The GRE has the ability to modify the test structure and items to the examinee's ability level. This characteristic makes the GRE a(n) a. aptitude test. b. computer-adaptive test. c. projective test. d. computer-based test.
b. computer-adaptive test.
5. Item response theory can be used to a. detect equivalence in an item that is written in different languages. b. detect item bias in the same test given to African Americans and Latino Americans. c. determine if an SAT score of 1,000 is equivalent to an IQ score of 114. d. give a bright student an exam with more difficult items.
b. detect item bias in the same test given to African Americans and Latino Americans.
4. Performance assessments a. require examinees to complete a paper-and-pencil test. b. require examinees to perform a task. c. are advantageous when working with highly verbal clients. d. All of the above.
b. require examinees to perform a task.
Intelligence tests
broadly assess an individual's cognitive abilities
trauma
broadly defined as an emotional response to an event(s) where an individual experiences physical and/or emotional harm
2. If a set of high school standardized test scores with a mean of 74 and a standard deviation of 10 is normally distributed, what is the median? a. 64 b. 84 c. 74 d. 104
c. 74 ; if it is a normal distribution, the mean, median, and mode are always the same. If there is a positive or negative skew in the curve, then the mean, median, and mode could differ.
3. In a normal distribution, _______________ of scores falls between −1 and +2 standard deviations? a. 68% b. 2% c. 82% d. 98%
c. 82% ; Refer to the bell curve in Figure 7.1: 34% + 34% + 14% = 82%
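The 82% figure comes from adding the rounded chart segments. As a rough check, the same area can be computed from the standard normal cumulative distribution using `math.erf`. The helper names below are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Proportion of a standard normal distribution falling at or below z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def proportion_between(z_low: float, z_high: float) -> float:
    """Proportion of scores between two z-scores."""
    return normal_cdf(z_high) - normal_cdf(z_low)


if __name__ == "__main__":
    share = proportion_between(-1.0, 2.0)
    print(f"{share:.1%}")  # close to the 82% obtained from the rounded chart values
```

The unrounded result is slightly under 82% because the chart's 34% and 14% segments are themselves rounded.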
2. Which of the following is a DSM-5 neurodevelopmental disorder? a. Posttraumatic stress disorder. b. Antisocial personality disorder. c. Attention-deficit/hyperactivity disorder. d. Oppositional defiant disorder.
c. Attention-deficit/hyperactivity disorder.
4. A professional counselor would like to track her clients' therapeutic outcomes. She decides to administer each client a shortened version of the Outcome Questionnaire (OQ-45). The shortened version has 30 items, which is 15 items fewer than the full-length version. Which of the following is a concern when administering a shorter test? a. Item discrimination b. Face validity c. Reliability d. Decision accuracy
c. Reliability ; Go to Pearson book and look at 7.2.2.4 Factors That Influence Reliability
2. A counseling researcher wants to establish the reliability of a new eating disorder scale. She administers the scale to the same participants twice to evaluate the consistency of scores over time. Which type of reliability is the counseling researcher using? a. Alternate form reliability b. Split-half reliability c. Test-retest reliability d. Factor analysis
c. Test-retest reliability
3. Holly is a third grader who is academically struggling. Her mother discloses to the school counselor that she believes Holly might have a learning disability. Under IDEA legislation, Holly is entitled to which of the following? a. Confidentiality of her student records, which could contain results from any disability testing services she receives. b. The right to receive appropriate accommodations during the administration of class tests since she may have a learning disability. c. The right to receive disability testing services at the expense of the public school system in order to determine if she does have a learning disability. d. The right to receive vocational assessment and counseling services.
c. The right to receive disability testing services at the expense of the public school system in order to determine if she does have a learning disability.
1. Requiring minority students to take standardized college admission exams that were designed for White, middle-class students constitutes ____________ bias. a. interpretive b. situational c. ecological d. examiner
c. ecological
1. Achievement testing includes each of the following EXCEPT a. teacher-constructed criterion-referenced tests. b. standardized norm-referenced tests. c. intelligence tests. d. All of the above are examples of achievement testing.
c. intelligence tests.
1. The DSM-5 a. is not often used by professional counselors. b. is difficult to interpret into laymen's terms for the client. c. provides a common language for mental health professionals to communicate with one another. d. All of the above.
c. provides a common language for mental health professionals to communicate with one another.
2. According to the ACA and NBCC codes of ethics, professional counselors a. must rely on cultural stereotypes when assessing multicultural populations. b. do not need to be knowledgeable about the client's culture if using a multiculturally appropriate assessment. c. use instruments that provide norms for the specific client population that is being assessed. d. All of the above.
c. use instruments that provide norms for the specific client population that is being assessed.
Clinical assessment
can be thought of as the "whole person assessment." It refers to the process of assessing clients through multiple methods such as personality testing, observation, interviewing, and performance.
ordinal scale
classifies and assigns rank-order to data. Likert-type scales, which often rank degrees of satisfaction toward a particular issue, are an example. Ordinal scales designate order, but the intervals between the numbers are not necessarily equal.
Likert scale
commonly used when developing instruments that assess attitudes or opinions.
standardized scores
compare individual scores to a norm group through the use of formulas that convert the raw score to a new score. By converting a raw score into a standard score, we can compare an individual's scores on different types of tests. Common types of standardized scores include z-scores, T scores, deviation IQ, stanine scores, and normal curve equivalent scores.
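Most of the standardized scores listed here are linear transformations of the z-score: multiply z by the new standard deviation and add the new mean. The sketch below assumes the conventional parameters for T scores (M = 50, SD = 10) and deviation IQ (M = 100, SD = 15), which are not stated on this card; the function names are hypothetical.

```python
def z_to_standard(z: float, mean: float, sd: float) -> float:
    """Generic linear conversion: new score = sd * z + mean."""
    return sd * z + mean


def z_to_t(z: float) -> float:
    """T score, assuming the conventional mean of 50 and SD of 10."""
    return z_to_standard(z, 50.0, 10.0)


def z_to_deviation_iq(z: float) -> float:
    """Deviation IQ, assuming the conventional mean of 100 and SD of 15."""
    return z_to_standard(z, 100.0, 15.0)


if __name__ == "__main__":
    for z in (-1.0, 0.0, 1.0):
        print(z, z_to_t(z), z_to_deviation_iq(z))
```

Because every conversion starts from the same z-score, a client's standing on two different tests can be compared once both raw scores are expressed on one of these scales.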
Alternative form reliability, also referred to as parallel form reliability or equivalent form reliability
compares the consistency of scores from two alternative, but equivalent, forms of the same test.
typical performance test
concerned with one's characteristic or normal performance. Personality measurements would be an example of this.
Concurrent validity
concerned with the relationship between an instrument's results and another currently obtainable criterion.
validity coefficient
correlation between a test score and the criterion measure.
Monitoring client progress
counselors have the responsibility to monitor client progress throughout the course of counseling to determine whether the client is moving forward toward his/her goals.
4. If the mean on an intelligence test is 100 and the standard deviation is 20, what is the percentile rank of a client who scored an 80 on that test? a. 34 b. 50 c. 84 d. 16
d. 16 ; A score of 80 is 1 standard deviation below the mean (100 − 20 = 80). In a normal distribution, about 34% of scores fall between the mean (the 50th percentile) and 1 SD below it, so the percentile rank is 50 − 34 = 16. On the chart, this is the 16 in the percentile row below the 85, which marks 1 SD below the mean; any score that is 1 SD below the mean has a percentile rank of about 16.
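The chart lookup in this answer can be checked with Python's standard library, assuming the scores are normally distributed as the question implies. `percentile_rank_from_score` is a hypothetical helper name.

```python
from statistics import NormalDist


def percentile_rank_from_score(score: float, mean: float, sd: float) -> float:
    """Percentile rank of a score in a normal distribution with the given mean and SD."""
    return 100.0 * NormalDist(mu=mean, sigma=sd).cdf(score)


if __name__ == "__main__":
    # Mean 100, SD 20, client score 80 (1 SD below the mean).
    print(round(percentile_rank_from_score(80, 100, 20)))
```

The unrounded value is a little under 16, which matches the rounded figure in the chart's percentile row.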
5. Based on the risk factors associated with committing suicide, which client is MOST likely to commit suicide? a. An 18-year-old, African American male with a family history of suicide. b. A 78-year-old, widowed Caucasian female who is depressed. c. A 40-year-old, Hispanic woman with a diagnosed anxiety disorder and a history of alcohol abuse. d. A 67-year-old, divorced Caucasian male who recently lost his job and reports feelings of hopelessness.
d. A 67-year-old, divorced Caucasian male who recently lost his job and reports feelings of hopelessness.
1. Developed by Robert Yerkes, the ____________ is a language-free test that was designed to assess the cognitive ability of military recruits who could not read or were foreign born. a. Army Alpha b. Stanford-Binet Intelligence Test c. Minnesota Multiphasic Personality Inventory d. Army Beta
d. Army Beta
2. John, a third-grade student, is having trouble in math. The school counselor suspects that John has a learning disability in math. Which of the following tests could be used to determine whether John has a learning disability in math? a. Iowa Test of Basic Skills b. Stanford Achievement Test c. Graduate Record Exam d. Key Math Diagnostic Test
d. Key Math Diagnostic Test
4. A career counselor would like to begin administering the Strong Interest Inventory via the computer vs. paper-and-pencil. Which of the following would be a disadvantage of implementing computer-based testing? a. Greater scoring accuracy b. Can provide immediate feedback concerning client performance c. Clients prefer test administration via the computer when responding to sensitive topics d. Minimizes human contact and involvement in the testing process
d. Minimizes human contact and involvement in the testing process
3. When administering a test to a client from a different cultural background, professional counselors should consider using a test that has been adapted over one that has been translated for all but ONE of the following reasons. a. Test adaptation includes translating language, as well as empirically evaluating the cultural equivalence. b. Test adaptation is preferred to test translation. c. Test translation has been heavily criticized for assuming equivalence in content and values across cultures. d. Test translation sufficiently reduces cultural bias in testing.
d. Test translation sufficiently reduces cultural bias in testing.
1. If a client scored 45 on an anxiety screening, what can the professional counselor conclude about his level of anxiety? a. The client has a high level of anxiety and should be medicated. b. The client has an average level of anxiety and his symptoms should improve with CBT. c. The client has a low level of anxiety and does not need to seek treatment. d. There is not enough information to make a clinical decision about the client's anxiety level and need for treatment.
d. There is not enough information to make a clinical decision about the client's anxiety level and need for treatment.
3. Which of the following is NOT an example of an aptitude test? a. GRE general test b. The Armed Services Vocational Aptitude Battery (ASVAB) c. Otis-Lennon School Ability Test d. WRAT4
d. WRAT4
4. Alam has been feeling down since he left his country of origin three years ago. He misses his family, his children, his culture, and his language. There are not many people from Bangladesh living in his neighborhood. He's experienced a depressed mood more often than not during this time period. Alam would likely be diagnosed with a. major depressive disorder. b. cyclothymic disorder. c. bipolar I disorder. d. persistent depressive disorder.
d. persistent depressive disorder.
Suicide lethality
defined as the likelihood that a client will die as a result of suicidal thoughts and behaviors
Achievement tests
designed to assess what one has learned at the time of testing
Standardized tests
designed to ensure the conditions for administration, test content, scoring procedures, and interpretations are consistent.
Test-retest reliability (temporal stability)
determines the relationship between the scores obtained from the two different administrations of the same test.
unstructured interviews
do not use pre-established questions and tend to rely on the client's lead to determine a focus for the interview
predictive validity
examines the relationship between an instrument's results collected now and a criterion collected in the future.
Test theory
expects that test constructs must have the ability to be measured for quality and quantity to be considered empirical. Strives to reduce test error and enhance construct reliability and validity
z-score
has a mean of 0 and a standard deviation of 1. It simply represents the number of standard deviation units above or below the mean at which a given score falls. Z-scores are derived by subtracting the sample mean from the individual's raw score and then dividing the difference by the sample standard deviation: z = (X − M) / SD, where X is the raw score, M is the sample mean, and SD is the sample standard deviation. You must know the sample's mean and standard deviation to convert an individual's raw score into a z-score. EX. Consider the data set found in Table 7.6. Ivan's raw score on the math exam was 67, the sample mean was 63, and the standard deviation was 4; therefore his z-score is z = (67 − 63) / 4 = 4 / 4 = +1.00.
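The z-score calculation on this card can be sketched as a short function that reproduces Ivan's example. The function name is hypothetical.

```python
def z_score(raw_score: float, sample_mean: float, sample_sd: float) -> float:
    """z = (X - M) / SD: distance from the mean in standard deviation units."""
    if sample_sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (raw_score - sample_mean) / sample_sd


if __name__ == "__main__":
    # Ivan's math exam: raw score 67, sample mean 63, standard deviation 4.
    print(z_score(67, 63, 4))
```

Note that the subtraction happens before the division, which is why the formula needs parentheses around X − M.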
Validity
how accurately an instrument measures a given construct
reliability coefficient
in test reports and manuals, reliability is expressed as a correlation
interval scale
includes all ordinal scale qualities and has equivalent intervals; that is, interval scale measures have an equal distance between each point on the scale. EX: The difference between 32 degrees and 31 degrees Fahrenheit is the same as the difference between 67 degrees and 66 degrees Fahrenheit. Interval scales do not have an absolute zero point. For example, 0 degrees Fahrenheit does not mean there is no temperature. As a result, it cannot be said that 60 degrees Fahrenheit is twice as warm as 30 degrees Fahrenheit.
Criterion validity
indicates the effectiveness of an instrument in predicting an individual's performance on a specific criterion.
Paraphilic disorders
involve sexually arousing fantasies, urges, or behaviors usually involving nonhuman objects or humiliation of one's partner or children
Test adaptation
involves the process of altering a test for a population that differs significantly from the original test population in terms of cultural background and language. It includes translating language as well as empirically evaluating the cultural equivalence of the adapted data.
Discriminant validity
is established when measures of constructs that are not theoretically related are observed to have no relationship.
convergent validity
is established when measures of constructs that theoretically should be related are actually observed to be related to each other.
construct validity
is the extent to which an instrument measures a theoretical construct (i.e., idea or concept).
nominal scale
is the simplest measurement scale as it is only concerned with naming data, not classifying data for order or equal interval units. An example of a measure occurring on a nominal scale is gender. By assigning the label male or female, only a classification is being made.
Power tests
limit perfect scores by including difficult test items that few individuals can answer properly. These tests measure how well the test taker can perform given items of varying difficulty regardless of time or speed of response.
Cognitive ability tests
make predictions about an individual's ability to perform in future grade levels, colleges, and graduate schools
Inter-item consistency
measure of internal consistency that compares individual test item responses with one another and the total test score.
Thurstone scale
measures multiple dimensions of an attitude by asking respondents to express their beliefs through agreeing or disagreeing with item statements.
Internal consistency
measures the consistency of responses from one test item to the next during a single administration of the instrument.
Guttman scale
measures the intensity of a variable being measured. Items are presented in a progressive order so that a respondent, who agrees with an extreme test item, will also agree with all previous, less extreme items. EX. Please place a check next to each statement that you agree with. ____________ Are you willing to permit gay students to attend your university? ____________ Are you willing to permit gay students to live in the university dorms? ____________ Are you willing to permit gay students to live in your dorm? ____________ Are you willing to permit gay students to live next door to you? ____________ Are you willing to have a gay student as a roommate?
normal distribution
nearly all scores fall close to average and very few scores fall toward either extreme of the distribution
Response bias
occurs when clients use a response set (e.g., all yes or no) to answer test questions.
Ecological bias
occurs when global systems prevent members of a particular group of individuals from demonstrating their true skills, knowledge, abilities, and personalities on a given assessment.
Situational bias
occurs when testing conditions or situations differentially affect the performance of individuals from a particular group
Interpretive bias
occurs when the examiner's interpretation of the test results provides unfair advantage or disadvantage to the client
Test bias
occurs when the properties of a test cause an individual or particular group of individuals to score lower (negative bias) or higher (positive bias) on the test than the average score for the total population.
Interpretation
part of the assessment process wherein the professional counselor aligns meaning to the data yielded by evaluative procedures.
Developmental scores
place an individual's raw score along a developmental continuum to derive meaning from the score. Developmental scores describe an individual's location on a developmental continuum; they can be used to evaluate an individual's score against the scores of others of the same age or grade level
Classical test theory
postulates that an individual's observed score is the sum of the true score and the amount of error present during test administration; its central aim is to increase the reliability of test scores
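The relationship described above is conventionally written as a simple sum. The symbols are a common textbook choice rather than something stated on this card: X is the observed score, T the true score, and E the error.

```latex
X = T + E
```

Under this model, the smaller the error term E, the closer the observed score is to the true score, which is why reducing error is the same goal as increasing reliability.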
construct-based validity model
proposed that validity is a holistic construct, not explainable as separate components. Specifically, Messick proposed the exploration of internal structural aspects and external aspects of validity to study score validity, all of which in combination describe score validity in a holistic manner.
Scales of measurement
provide a method for classifying or measuring specific qualities or characteristics. Four different scales: nominal, ordinal, interval, and ratio.
Objective tests
provide consistency in administration and scoring to ensure freedom from the examiner's own beliefs or biases (multiple-choice, true and false, and matching).
criterion-referenced assessment
provide information about an individual's score by comparing it to a predetermined standard or set criterion. EX. If Ivan's instructor decided that 90 to 100 was an A, 80 to 89 was a B, 70 to 79 was a C, and 60 to 69 was a D, then Ivan would receive a D on the math exam (Ivan scored a 67).
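A minimal Python sketch of the grading example above. The function name `letter_grade` is hypothetical, and the "F" for scores below 60 is an assumption, since the card does not name a grade for that range.

```python
def letter_grade(score):
    """Map a 0-100 score to a letter using the fixed cutoffs from the example."""
    # Each pair is (minimum score for the grade, grade), checked from highest down.
    cutoffs = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]
    for minimum, letter in cutoffs:
        if score >= minimum:
            return letter
    return "F"  # assumption: the card does not say what happens below 60


print(letter_grade(67))  # Ivan's score falls in the 60 to 69 band
```

The score is compared only to the preset cutoffs, never to other test-takers' scores, which is what makes the interpretation criterion-referenced.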
Diagnostic systems
provide standardized terminology, or a common language that allows MHCs to communicate with one another regarding client diagnosis and treatment planning.
Survey batteries
refer to a collection of tests that measure an individual's knowledge across broad content areas. These tests must cover material from multiple subject areas and as a result do not go in depth on any one area.
Readiness tests
refer to a group of criterion-referenced achievement assessments that indicate the minimum level of skills needed to move from one grade level to the next.
informal assessments
refer to subjective assessment techniques that are developed for specific needs; it seeks to identify the strengths and needs of the client. Ex: observation, clinical interviewing, rating scales, and classification systems
norms
refer to the typical score or performance against which all other test scores are evaluated
Ability assessment
refers to a broad category of assessment instruments that measure the cognitive domain. The cognitive domain often includes knowledge, comprehension, application, analysis, synthesis, and evaluation of information.
scale
refers to a collection of items or questions that combine to form a composite score on a single variable
computer-based testing (CBT), also known as computer-based assessment (CBA)
refers to a method for administering, analyzing, and interpreting tests through the use of computer technology, software programs, or Internet sites.
Test translation
refers to a process by which test items are translated into the language spoken by examinees.
Semantic differential, also known as self-anchored scales
refers to a scaling technique rooted in the belief that most people think dichotomously (2). It asks test-takers to place a mark between 2 dichotomous adjectives. EX. How do you feel about your NCE scores? Bad_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _Good
Vocational aptitude testing
refers to a set of predictive tests designed to measure one's potential for occupational success
Item response theory/ modern test theory
refers to applying mathematical models to the data collected from assessments.
suicide assessment
refers to determining a client's potential for committing suicide
evaluation
refers to making a determination of worth or significance based on the result of a measurement.
reliability
refers to the consistency of scores attained by the same person on different administrations of the same test.
experimental design validity
refers to the implementation of an experimental design to show that an instrument measures a specific construct.
High stakes testing
refers to the use of standardized test outcomes to make major educational decisions concerning promotion, retention, educational placement, and entrance into college. The results of high stakes testing can have serious consequences for the students being tested.
standardization
relates to the conversion of raw scores to standard scores. Specifically, standardization refers to the process of finding the typical score attained by a group of test-takers. The typical score then acts as a standard reference point for future test results. Therefore, once a test is standardized, a score can be compared to the scores of the standard group.
individual tests
require that a test be administered to one examinee at a time. These allow counselors to establish rapport with the examinee and monitor the factors that influence performance.
Deviation IQ
scores used in intelligence testing. Deviation IQs have a mean of 100 and standard deviation of 15 and are derived by multiplying an individual's z-score by the deviation IQ standard deviation (15) and adding it to the deviation IQ mean (100): SS = 15(z) + 100
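A short Python sketch of the conversion above, SS = 15(z) + 100. The function name `deviation_iq` and the sample z-scores are illustrative assumptions, not part of the card.

```python
def deviation_iq(z, mean=100, sd=15):
    """Convert a z-score to a deviation IQ: multiply by the SD, then add the mean."""
    return sd * z + mean


# Illustrative z-scores only: one SD above, at, and two SDs below the mean.
for z in (1, 0, -2):
    print(z, deviation_iq(z))
```

A z-score of 0 maps to the mean of 100, and each whole z-score unit moves the result by 15 points.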
Subjective tests
sensitive to rater and examinee beliefs. They employ open-ended questions, which have more than one correct answer or way of expressing the correct answer (ex. essay question).
Standard error of estimate
statistic that indicates the expected margin of error in a predicted criterion score due to the imperfect validity of the test
factor analysis
statistical technique that analyzes the interrelationships of an instrument's items, thus revealing predicted latent (hidden) traits or dimensions called factors.
Test
subset of assessment and is used to yield data regarding an examinee's responses to test items.
Aptitude tests
such as the GRE and SAT, assess what a person is capable of learning
Item discrimination
the degree to which a test item is able to correctly differentiate test-takers who vary according to the construct measured by the test.
Content validity
the extent to which an instrument's content is appropriate to its intended purpose
ratio scale
the most advanced scale of measurement as it preserves the quality of nominal, ordinal, and interval scale and has an absolute zero point. EX. Height is an excellent example of a measure occurring on a ratio scale, and it can be said that a 6-foot-tall person is twice as tall as a 3-foot-tall person.
Clinical interviewing
the most commonly used assessment technique in counseling. The counselor uses clinical skills to obtain client information that will facilitate the course of counseling. There are three types: structured, semi-structured, and unstructured
Measurement (in counseling)
the process of defining and estimating the magnitude of human attributes and behavioral expressions.
percentage score
the raw score (the number of correct items) divided by the total number of test items EX. This is what I do to grade the student's tests. Consider Ivan's percentage score; he got 67 items correct on his math test, and the total number of items on the test was 100. Therefore, Ivan's percentage score is 67% (67/100).
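The calculation in Ivan's example can be sketched in Python as follows. The function name `percentage_score` is hypothetical, and the guard against a non-positive item count is an added assumption.

```python
def percentage_score(raw_correct, total_items):
    """Return the percentage of items answered correctly."""
    if total_items <= 0:
        # Assumption: a test must have at least one item for the score to be defined.
        raise ValueError("total_items must be positive")
    return 100 * raw_correct / total_items


print(percentage_score(67, 100))  # Ivan: 67 correct out of 100 items
```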
Assessment
the systematic process of gathering information about an individual's background, history, skills, knowledge, perceptions, and feelings.
T score
type of standard score that has an adjusted mean of 50 and a SD of 10. These scores are commonly used when reporting the results of personality, interest, and aptitude measures. T scores are easily derived from z-scores by multiplying an individual's z-score by the T score standard deviation (i.e., 10) and adding it to the T score mean (i.e., 50): T = 10(z) + 50. Ivan's T score is T = 10(+1) + 50 = 60
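A sketch of the full path from a raw score to a T score. The z-score step uses the standard formula z = (raw score - group mean) / group SD. The group mean of 57 and SD of 10 are hypothetical values chosen only so that Ivan's raw score of 67 gives the z-score of +1 used on the card; they do not come from the source.

```python
def z_score(raw, group_mean, group_sd):
    """Standard z-score: distance from the group mean in SD units."""
    return (raw - group_mean) / group_sd


def t_score(z):
    """T = 10(z) + 50, as given on the card."""
    return 10 * z + 50


# Hypothetical group statistics (mean 57, SD 10), assumed for illustration only.
ivan_z = z_score(67, group_mean=57, group_sd=10)
print(ivan_z, t_score(ivan_z))
```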
rating scales
typically evaluate the quantity of an attribute
Structured interviews
use a series of pre-established questions that the professional counselor presents in the same order during each interview.
Speed tests
use limited testing time to prevent perfect scores. Typically, these tests have easy questions but include too many items to answer in the allotted time.
Semi-structured interviews
use pre-established questions and topic areas to be addressed; however, the professional counselor can customize the interview by modifying questions. They are more prone to interviewer error and bias and are considered less reliable than structured interviews
classification systems
used to assess the presence or absence of an attribute. 3 of the commonly used systems are: 1. Behavior and feeling word checklists: Allow the professional counselor or client to identify the words that best describe the client's feelings or behaviors. 2. Sociometric instruments: Assess the social dynamics within a group, organization, or institution. 3. Situational tests: Involve asking the client to role-play a situation to determine how he or she may respond in real life.
inter-scorer reliability/inter-rater reliability
used to calculate the degree of consistency of ratings between 2 or more persons observing the same behavior or assessing an individual through observational or interview methods
standard error of measurement (SEM)
used to estimate how scores from repeated administrations of the same instrument to the same individual are distributed around the true score
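The card does not give a formula, but the SEM is conventionally computed as SEM = SD x sqrt(1 - r), where SD is the standard deviation of the test scores and r is the test's reliability coefficient. A minimal Python sketch under that assumption follows; the function name and the sample values (SD of 15, reliability of .91) are illustrative only.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Conventional SEM formula: SD times the square root of (1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)


# Illustrative values only: SD = 15 and reliability = .91 give an SEM near 4.5.
print(round(standard_error_of_measurement(15, 0.91), 2))
```

A perfectly reliable test (r = 1) has an SEM of 0, and the SEM grows toward the full SD as reliability falls toward 0.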
stanine
which stands for "standard nine", is a type of standard score used on achievement tests. This standard score divides the normal distribution into nine intervals.
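The card does not say how the nine intervals are assigned. A common convention, assumed here, treats stanines as having a mean of 5 and a standard deviation of 2, so a z-score converts as 2(z) + 5, rounded to the nearest whole number and limited to the range 1 to 9. The function name and the half-up rounding at interval boundaries are choices made for this sketch.

```python
import math


def stanine(z):
    """Approximate stanine from a z-score: round 2z + 5, then clamp to 1..9."""
    # floor(x + 0.5) rounds halves upward; this boundary handling is an assumption.
    rounded = math.floor(2 * z + 5 + 0.5)
    return max(1, min(9, rounded))


for z in (-3, -1, 0, 1, 3):
    print(z, stanine(z))
```

Scores far into either tail are folded into stanine 1 or stanine 9, which is why the scale has exactly nine intervals.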