Assessment and Testing
first standardized IQ test
primarily the work of Alfred Binet and Theodore Simon.
The early ratio formula for the Binet IQ score was
MA/CA × 100 (i.e., mental age divided by chronological age, multiplied by 100). The score indicated how you compared to those in your age group. Memory device: An MA is a high degree so put it on top of the equation as the numerator.
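As a quick illustration of the ratio formula, here is a minimal sketch. The function name `ratio_iq` and the sample ages are made up for the example and do not come from any test manual.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age over chronological age, multiplied by 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    # Multiply first so whole-number ages give an exact result.
    return mental_age * 100 / chronological_age

# Hypothetical 10-year-old who performs like a typical 12-year-old.
print(ratio_iq(12, 10))  # 120.0
# Mental age equal to chronological age gives the average score.
print(ratio_iq(10, 10))  # 100.0
```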
Reliability and Validity
Validity refers to whether the test measures what it says it measures, while reliability tells how consistently a test measures an attribute.
Today the Stanford-Binet is used from age 2 to adulthood. The IQ formula has been replaced by the a. SAS. b. SUDS. c. entropy. d. KR-20 formula.
a. SAS. SAS stands for "standard age score."
The best intelligence test for a kindergartner would be the a. WPPSI-IV. b. WAIS-IV. c. WISC-IV. d. Myers-Briggs Type Indicator.
a. WPPSI-IV.
Urie Bronfenbrenner
(ecological systems model) claimed that Jensen relied on twin studies with poor internal validity
quotient
A quotient is the result when you perform division.
CPT
Current Procedural Terminology
The NCE is a. an intelligence test. b. an aptitude test. c. a personality test. d. an achievement test.
d. an achievement test. The NCE is testing your knowledge and application of material in the counseling profession.
acquiescence
manifests itself when a client always agrees with something.
Summated or linear rating scale
used to describe answer scales in which various values are given to different responses. For example, on a Likert Scale a "strongly agree" might be given a 5, yet an "agree" response might be rated a 4. The client's score is the "sum" of all the items.
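The scoring idea can be sketched in a few lines. The five-point coding below is an assumption for illustration (the card only gives 5 for "strongly agree" and 4 for "agree"), and `summated_score` is a made-up helper name.

```python
# Hypothetical five-point Likert coding; only the 5 and 4 come from the card.
LIKERT_VALUES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def summated_score(responses):
    """Add up the numeric value of each response across all items."""
    return sum(LIKERT_VALUES[r] for r in responses)

answers = ["strongly agree", "agree", "neutral", "agree"]
print(summated_score(answers))  # 16
```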
divergent thinking
Divergent thinking is the ability to generate a novel idea.
1979 Larry P. v. Wilson Riles, Superintendent of Public Instruction, State of California: The Wechsler and Binet on Trial
In this now oft-quoted court battle, it was initially ruled that IQ tests were racially biased against African American children who were overly represented in EMR (educable mentally retarded) classes (proper terminology at the time) based on IQ scores.
convergent validity and discriminant validity
These terms relate to both criterion validity and construct validity.
IQ stands for intelligence quotient, which is expressed by a. CA/MA × 100. b. CA/MA × 100. c. MA/CA × 50. d. MA/CA × 100.
d. MA/CA × 100. The test is Binet's, but the famous formula was created by the German psychologist William Stern. The formula produced what is known as a "ratio IQ." Today, a "deviation IQ" is utilized which compares the individual to a norm (i.e., the person is compared to others in his or her age group). Thus, the present score indicates "deviation" from the norm. Okay, now just to be sure that you are really picking this up let me say it in a slightly different way: Although we still use the term IQ, the Binet today actually relies on a standard age score (SAS) with a mean of 100 and a standard deviation of 16. So then you see, the IQ isn't really an IQ after all, right?
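To make the "deviation" idea concrete, here is a rough sketch of how a raw score can be placed on a scale with a chosen mean and standard deviation. The function name, the raw score, and the norm-group figures are all hypothetical; real instruments use published norm tables rather than a one-line formula.

```python
def deviation_score(raw, norm_mean, norm_sd, scale_mean=100, scale_sd=15):
    """Convert a raw score to a standard score via its z-score in the norm group."""
    z = (raw - norm_mean) / norm_sd          # distance from the age-group mean, in SDs
    return scale_mean + scale_sd * z

# Hypothetical examinee one standard deviation above the age-group mean.
print(deviation_score(60, norm_mean=50, norm_sd=10))               # 115.0
# Same examinee on a scale with a standard deviation of 16, as the card describes.
print(deviation_score(60, norm_mean=50, norm_sd=10, scale_sd=16))  # 116.0
```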
info/hints about WAIS-IV
- The test is based on neurocognitive research and the Cattell-Horn-Carroll leading theory of human intelligence. - It can be administered and scored online. - The exam takes 60 to 90 minutes to complete. - When compared to the previous version of the exam, object assembly and picture arrangement have been dropped. - Ten subject areas, also called subtests on some exams (with a mean of 10 and a standard deviation of 3) make up four index scores: verbal comprehension index (VCI), perceptual reasoning index (PRI), working memory index (WMI), and processing speed index (PSI). - FSIQ merely stands for full scale IQ. FSIQ and indexes sport a mean of 100 with a standard deviation of 15. - Less emphasis than the previous version on crystallized intelligence. - Can measure IQ from 40 to 160. Since the Stanford-Binet 5 has a wider range (e.g., it can measure an IQ up to 180) it would be a better instrument than the Wechsler for measuring extremely low IQs or giftedness.
synthetic validity
A method of testing the validity of a selection procedure by combining jobs that require similar abilities and by separately validating the specific predictors intended to measure those abilities. Synthetic validity is derived from the word synthesized. Synthetic validity was popularized by industrial-organizational (I/O) psychologists who felt the procedure had merit, especially when utilized for smaller firms that did not hire a large number of workers. In synthetic validity, the helper or researcher looks for tests that have been shown to predict each job element or component (e.g., typing, filing, etc.). Tests that predict each component (criterion) can then be combined to improve the selection process.
convergent thinking
Convergent thinking occurs when divergent thoughts and ideas are combined into a singular concept.
cross-validation
Cross-validation takes place when a researcher further examines the criterion validity (and in rare instances, the construct validity) of a test by administering the test to a new sample. This procedure is necessary to ensure that the original validity coefficient is applicable to others who will take the exam. This method helps guard against error factors, which are likely to be present if the original sample size is small. In most cases a cross-validation coefficient is indeed smaller than the initial validity coefficient. This phenomenon is called "shrinkage."
fluid and crystallized intelligence
Fluid intelligence is flexible (terrific they both begin with an F), culture-free, and adjusts to the situation, while crystallized is rigid and does not change or adapt.
Spearman-Brown formula
In psychometrics, a mathematical formula that predicts the degree to which the reliability of a test can be improved by adding more items.
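The standard form of the Spearman-Brown prediction is: new reliability = n × r / (1 + (n − 1) × r), where r is the current reliability and n is the factor by which the test is lengthened. A minimal sketch, with an illustrative function name and sample values:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after lengthening a test by `length_factor`."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# Hypothetical test with reliability .70 that is doubled in length.
print(round(spearman_brown(0.70, 2), 3))  # 0.824
```

Note that the prediction assumes the added items are comparable in quality to the existing ones.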
Robert Williams
In the final choice, the African American psychologist Robert Williams created the Black Intelligence Test of Cultural Homogeneity (BITCH) to demonstrate that African Americans often excelled when given a test laden with questions whose answers would be familiar to members of the African American community. Williams charged that tests like the Binet and the Wechsler were part of "scientific racism." Williams—a victim of the system himself—scored an 82 on an IQ test at age 15 and his counselor suggested bricklaying since he was good with his hands! Williams rejected the advice and went on to put PhD after his name! IQ tests, though controversial to say the least, are, however, excellent predictors of school success in most cases since schools emphasize values that have been heavily influenced by European cultures.
internal consistency
Internal consistency or homogeneity of items also is known as "inter-item consistency." - measured with Kuder-Richardson or Cronbach's alpha estimates
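For readers who want to see what such an estimate looks like, below is a sketch of Cronbach's alpha using the usual formula: alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The data are invented, and the function name is illustrative. KR-20 is the analogous estimate for items scored right/wrong.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: one list per item, holding every examinee's score on that item."""
    k = len(item_scores)
    # Total score per examinee (sum across items).
    totals = [sum(person) for person in zip(*item_scores)]
    item_var_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Hypothetical data: 3 items answered by 5 examinees.
items = [
    [3, 4, 5, 2, 4],
    [2, 4, 5, 3, 4],
    [3, 5, 4, 2, 5],
]
print(round(cronbach_alpha(items), 2))
```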
MBTI
Myers-Briggs Type Indicator personality inventory based on Carl Jung's analytic psychology. The MBTI uses dichotomous types: extraversion versus introversion, sensing versus intuition, thinking versus feeling, and judging versus perceiving. The test results in a four-letter type score such as ISFJ (i.e., introversion, sensing, feeling, judging). (Note: Intuition, though it begins with an "I," is coded using an "N" since Introversion begins with an "I.") Important exam hint: When a test is guided via a theory it is known as a theory-based test or inventory.
Merrill-Palmer
The Merrill-Palmer Scale of Mental Tests is an intelligence test for infants and children below age 7 years.
convergent validity
The relationship or correlation of a test to an independent measure or trait - actually a method used to assess a test's construct/criterion validity by correlating test scores with an outside source. Say, for example, that a measure purports to measure phobic responses. A client, who has a snake phobia, is then exposed to a snake and experiences extreme panic. If the client scores higher on the test than he would in a relaxed state, then this would display convergent validity. The test also should show discriminant validity.
discriminant validity
This means the test will not reflect unrelated variables. Hence, if phobias are unrelated to IQ, then when one correlates clients' IQ scores to their scores on the test for phobias, this should produce a near zero correlation. Similarly, if discriminant validity is evident, a counselor who is genuinely qualified to sit for a state licensing exam should score higher on the exam than a student who flunked an introductory counseling course. *When a researcher is engaged in test validation, both convergent and discriminant validity should be thoroughly examined.
WISC-IV
Wechsler Intelligence Scale for Children appropriate for kids ages 6-16 years and 11 months
WPPSI
Wechsler Preschool and Primary Scale of Intelligence suitable for children ages 2 years and 6 months to 7 years and 7 months
entropy
a popular family therapy/systems theory term that means that dysfunctional families are either too open or too closed (i.e., letting too much information in or not enough information in). The healthy family is said to be in a balanced state known as negative entropy.
construct validity
a test's ability to measure a theoretical construct like intelligence, self-esteem, artistic talent, mechanical ability, or managerial potential.
A researcher working with a personality test discovers that the test has a reliability coefficient of .70 which is somewhat typical. This indicates that a. 70% of the score is accurate while 30% is inaccurate. b. 30% of the people who are tested will receive accurate scores. c. 70% of the people who are tested will receive accurate scores. d. 30% of the score is accurate while 70% is inaccurate.
a. 70% of the score is accurate while 30% is inaccurate. Seventy percent of the obtained score on the test represented the true score on the personality attribute, while 30% of the obtained score could be accounted for by error. Seventy percent is true variance while 30% constitutes error variance.
A counselor who fears the client has an organic, neurological, or motoric difficulty would most likely use the a. Bender Gestalt II. b. Rorschach. c. Minnesota Multiphasic Personality Inventory-2. d. Thematic Apperception Test.
a. Bender Gestalt II. The Bender Visual Motor Gestalt Test (named after psychiatrist Lauretta Bender) is actually an expressive projective measure, though first and foremost it is known for its ability to discern whether brain damage is evident. Suitable for age 4 years and beyond, the client is instructed to copy 16 geometric figures which the client can look at while constructing his or her drawing.
The ________ are examples of aptitude tests. a. O*NET Ability Profiler and the MCAT b. GZTS and the MMPI-2 c. CPI and the MMPI-2 d. Strong and the LSAT
a. O*NET Ability Profiler and the MCAT. Plenty of alphabet soup here! Here I've teamed up the O*NET Ability Profiler with the new Medical College Admission Test (MCAT). Choice "b," the Guilford-Zimmerman Temperament Survey (GZTS), is a personality measure for persons who do not have severe psychiatric disabilities. Ditto for the California Psychological Inventory (CPI), which shares questions with the MMPI. Last, the final alternative introduces you to the new Law School Admission Test (LSAT), which of course qualifies as a bona fide aptitude test. So why is choice "d" incorrect? Well, if any portion of a response is incorrect, then the entire choice is erroneous. If you marked choice "d" you can blame it on the Strong! Exam Hint: School selection tests assess aptitude.
The 16 PF reflects the work of a. Raymond B. Cattell. b. Carl Jung. c. James McKeen Cattell. d. Oscar K. Buros.
a. Raymond B. Cattell. The 16 PF (16 Personality Factor Questionnaire), developed by Raymond B. Cattell, is suitable for persons age 16 and above and has been the subject of over 2,000 papers or other communications! The test measures key personality factors such as assertiveness, emotional maturity, and shrewdness. A couple can even decide that each party will take the 16 PF, and both an individual and joint profile will be compiled, which can be utilized for marital counseling. Tests and inventories like the 16 PF, which analyze data outside of a given theory, are called factor-analytic tests or inventories rather than theory-based tests.
One future trend which seems contradictory is that some experts are pushing for a. a greater reliance on tests while others want to rely on them less. b. social workers to do most of the testing. c. psychiatrists to do most of the testing. d. counselors to ban all computer-assisted tests.
a. a greater reliance on tests while others want to rely on them less. It seems we counselors just can't agree on anything. Many counselors would like to see a greater emphasis in the future on tests which assess creative and motivational factors.
A test battery is considered a. a horizontal test. b. a vertical test. c. a valid test. d. a reliable test.
a. a horizontal test. In a test battery, several measures are used to produce results that could be more accurate than those derived from merely using a single source. Say, this can get confusing. Remember, that in the section on group processes I talked about vertical and horizontal interventions. In testing, a vertical test would have versions for various age brackets or levels of education (e.g., a math achievement test for preschoolers and a version for middle school children). A horizontal test measures various factors (e.g., math and science) during the same testing procedure.
Test bias primarily results from a. a test being normed solely on white middle-class clients. b. the use of projective measures. c. using whites to score the test. d. using IQ rather than personality tests.
a. a test being normed solely on white middle-class clients. This bias should be communicated to the client when the results are explained.
Francis Galton felt intelligence was a. a unitary faculty. b. best explained via a two factor theory. c. best explained via the person's environment. d. fluid and crystallized in nature.
a. a unitary faculty. Sir Francis Galton of England has been recognized as one of the major pioneers in the study of individual differences. A half-cousin of Charles "Origin of Species" Darwin, he believed that exceptional mental abilities were genetic and ran in families, and said just that in his 1869 work Hereditary Genius.
The WAIS-IV is given to 100,000 individuals in the United States who are picked at random. A counselor would expect that a. approximately 68% would score between 85 and 115. b. approximately 68% would score between 70 and 130. c. the mean IQ would be 112. d. 50% of those tested would score 112 or above.
a. approximately 68% would score between 85 and 115. First, the Wechsler IQ test has been administered to a very large group of people so chances are the distribution of scores will be normal. This tells you that the mean score will be 100 (i.e., the average IQ) and the standard deviation will be 15 (if the question were asked about the Binet you'd use 16 as the standard deviation). In a normal distribution approximately 68% of the population will fall between +/-1 standard deviation of the mean. With a standard deviation of 15 you simply subtract 15 from 100 to get the low score (i.e., 85) and add 15 to 100 to get 115. - Choice "b" would be correct if the 68% was changed to 95%, since about 95% of the people in a normal distribution fall between +/-2 standard deviations of the mean. (You simply subtract 30 from 100 to get 70 and add 30 to 100 to yield the upper IQ score of 130.) - Keep in mind that choice "c" should read 100 while choice "d" ought to indicate that 50% would score above 100.
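The 68% and 95% figures can be checked directly with Python's standard library, assuming scores follow a normal distribution with a mean of 100 and a standard deviation of 15, as the explanation states. The variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model: normally distributed scores, mean 100, SD 15.
wechsler = NormalDist(mu=100, sigma=15)

within_1sd = wechsler.cdf(115) - wechsler.cdf(85)   # between -1 and +1 SD
within_2sd = wechsler.cdf(130) - wechsler.cdf(70)   # between -2 and +2 SD

print(round(within_1sd, 3))  # 0.683
print(round(within_2sd, 3))  # 0.954
```

With 100,000 examinees, that works out to roughly 68,000 people scoring between 85 and 115.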
Counselors often shy away from self-reports since a. clients often give inaccurate answers. b. ACA ethics do not allow them. c. clients need a very high IQ to understand them. d. they are generally very lengthy.
a. clients often give inaccurate answers. Say a client is monitoring her behavior and does not wish to disappoint her therapist. The report could be biased. This is a "reactive effect" of the self-monitoring.
One major testing trend is a. computer-assisted testing and computer interpretations. b. more paper and pencil measures. c. to give school children more standardized tests. d. to train pastoral counselors to do projective testing.
a. computer-assisted testing and computer interpretations. But don't take my word for it. Pick up any modern testing catalog and you might erroneously think you've picked up a computer software directory! And speaking of choice "c," as I write this there seems to be a nationwide push to reduce the number of standardized tests given.
The ________ index indicates the percentage of individuals who answered each item correctly. a. difficulty b. critical c. intelligence d. personal
a. difficulty The higher the number of people who answer a question correctly, the easier the item is—and vice versa. A 0.5 difficulty index (also called a difficulty value) would suggest that 50% of those tested answered the question correctly, while 50% did not. Most theorists agree that a "good measure" provides a wide range of items that even a poor performer will answer correctly.
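A small sketch of the calculation, using made-up responses (True means the examinee answered the item correctly); `difficulty_index` is an illustrative name.

```python
def difficulty_index(item_responses):
    """Proportion of examinees who answered the item correctly."""
    return sum(item_responses) / len(item_responses)

# Hypothetical item: two of four examinees got it right.
print(difficulty_index([True, True, False, False]))  # 0.5
# Easier item: three of four examinees got it right.
print(difficulty_index([True, True, True, False]))   # 0.75
```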
A new IQ test which yielded results nearly identical to other standardized measures would be said to have a. good concurrent validity. b. good face validity. c. superb internal consistency. d. all of the above.
a. good concurrent validity. Concurrent validity answers the question of how well your test stacks up against a well-established instrument that measures the same behavior, construct, or trait. Evidence for reliability and validity is expressed via correlation coefficients. Suffice it to say that the closer they are to 1.00 the better.
Group IQ tests like the Otis-Lennon, the Lorge-Thorndike, and the California Test of Mental Abilities are popular in school settings. The advantage is that a. group tests are quicker to administer. b. group tests are superior in terms of predicting school performance. c. group tests always have a higher degree of reliability. d. individual IQ tests are not appropriate for school children.
a. group tests are quicker to administer. World War I provided the impetus for the group testing movement. Approximately two million men were tested using the Army Alpha for literates and the Army Beta for illiterates and those from other countries. School districts, government, and industry prefer tests which can be administered to many individuals simultaneously. The catch is that group tests are less accurate and have lower reliability.
A job test which predicted future performance on a job very well would a. have high criterion/predictive validity. b. have excellent face validity. c. have excellent construct validity. d. not have incremental validity or synthetic validity.
a. have high criterion/predictive validity. Here you are concerned that the test will measure an independent or external outside "criterion," in this case the "future prediction" of the job performance. (Note: Choice "a" would be incorrect on a question such as this if the question specified current job performance. If this were the case then technically only the term criterion would apply.)
The standard error of measurement tells you a. how accurate or inaccurate a test score is. b. what population responds best to the test. c. something about social loafing. d. the number of people used in norming the test.
a. how accurate or inaccurate a test score is. If a client decided to take the same test over and over and over again you could plot a distribution of scores. The standard deviation of that distribution would be the standard error of measurement for the instrument in question. Suffice it to say, the lower the better. A low standard error means high reliability.
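The link between a low standard error and high reliability shows up in the commonly used formula SEM = SD × √(1 − reliability). A minimal sketch, with illustrative values (an SD of 15 and reliabilities of .90 and .70):

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * sqrt(1 - reliability)

# Hypothetical test with SD 15: higher reliability gives a smaller SEM.
print(round(standard_error_of_measurement(15, 0.90), 2))  # 4.74
print(round(standard_error_of_measurement(15, 0.70), 2))  # 8.22
```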
A counselor peruses a testing catalog in search of a test which will repeatedly give consistent results. The counselor a. is interested in reliability. b. is interested in validity. c. is looking for information which is not available. d. is magnifying an unimportant issue.
a. is interested in reliability. Beware: A test can indeed be reliable yet not valid. A highly reliable test could conceivably prove invalid. A scale that invariably reads 109 lb when you weigh 143 lb would hardly be providing you with a valid assessment of your true weight! The score, nevertheless, is consistent (reliable). Thus, a test can have a high reliability coefficient but still have a low validity coefficient. Reliability places a ceiling on validity, but validity does not set the limits on reliability.
In a culture-fair test a. items are known to the subject regardless of his or her culture. b. the test is not standardized. c. culture-free items cannot be utilized. d. African Americans generally score higher than whites.
a. items are known to the subject regardless of his or her culture. The culture-fair test attempts to expunge items which would be known only to an individual due to his or her background. Key exam hint: Ethics now consider it unethical to administer a test to a client from a given population unless that particular test or inventory has been normed on that specific population! As an example, if you gave an African American client a test that had not been normed on African Americans this would be considered a violation of ethics.
Face validity refers to the extent that a test a. looks or appears to measure the intended attribute. b. measures a theoretical construct. c. appears to be constructed in an artistic fashion. d. can be compared to job performance.
a. looks or appears to measure the intended attribute. Face validity—like a person's face—merely tells you whether the test looks like it measures the intended trait. Does your therapist look like a therapist? Does the Wechsler appear to be an IQ test? The obvious answer is "In most cases who cares, it's not that important"! And if a therapist looks like a good therapist, does that necessarily mean he is an adept therapist? Of course not. And the same is true of testing.
A test can be defined as a systematic method of measuring a sample of behavior. Test format refers to the manner in which test items are presented. The format of an essay test is considered a(n) ________ format. a. subjective b. objective c. very precise d. concise
a. subjective A "subjective" paradigm relies mainly on the scorer's opinion. If the rater knows the test taker's attributes, the rater's "personal bias" can significantly impact upon the rating. For example, an attractive examinee might be given a higher rating. (This is the so-called halo effect.) In job settings, peers generally rate their colleagues higher than do their supervisors. In an "objective" test (choice "b") the rater's judgment plays little or no part in the scoring process.
One method of testing reliability is to give the same test to the same group of people two times and then correlate the scores. This is called a. test-retest reliability. b. equivalent forms reliability. c. alternate forms reliability. d. the split-half method.
a. test-retest reliability. All right, I've got to hand it to you—you're very perceptive. You've figured out that I'm banking on the fact that your exam will spring a few reliability or validity questions on you. The well-known test-retest method discussed here tests for "stability," which is the ability of a test score to remain stable or fluctuate over time when the client takes the test again. When using the test-retest paradigm the client generally takes the same test after waiting at least seven days. The test-retest procedure is only valid for traits such as IQ which remain stable over time and are not altered by mood, memory, or practice effects.
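"Correlate the scores" here typically means computing a Pearson correlation between the first and second administrations. The sketch below does that by hand; the scores are invented and `pearson_r` is an illustrative helper, not part of any test publisher's procedure.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)

# Hypothetical scores for five clients, tested twice a week apart.
first_testing = [98, 105, 110, 121, 93]
second_testing = [101, 103, 112, 118, 95]
print(round(pearson_r(first_testing, second_testing), 2))
```

A coefficient near 1.00 would suggest the scores were stable across the two administrations.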
Appraisal can be defined as: a. the process of assessing or estimating attributes b. testing which is always performed in a group setting c. testing which is always performed on a single individual d. a pencil and paper measurement of assessing attributes
a. the process of assessing or estimating attributes Appraisal is a broad term which includes more than merely "testing clients." Appraisal could include a survey, observations, or even clinical interviews. Choices "b," "c," and "d" are thus too limited. A test is simply an instrument which measures a given sample of behavior. When we use the term measure it merely connotes that a number or score has been assigned to the person's attribute or performance. On your exam an appraisal could be billed as an assessment or an evaluation.
Interest inventories are positive in the sense that a. they are reliable and not threatening to the test taker. b. they are always graded by the test taker. c. they require little or no reading skills. d. they have high validity in nearly all age brackets.
a. they are reliable and not threatening to the test taker. Generally, an interest inventory would be the least threatening variety of test.
J. P. Guilford isolated 120 factors which added up to intelligence. He also is remembered for his a. thoughts on convergent and divergent thinking. b. work on cognitive therapy. c. work on behavior therapy. d. work to create the first standardized IQ test.
a. thoughts on convergent and divergent thinking. Using factor analysis, Guilford determined that there were 120 elements/abilities which added up to intelligence. Two of the dimensions—convergent and divergent thinking—are still popular terms today.
An achievement test measures maximum performance or present level of skill. Tests of this nature are also called attainment tests, while a personality test or interest inventory measures a. typical performance. b. minimum performance. c. unconscious traits. d. self-esteem by always relying on a Q-Sort design.
a. typical performance. Interest inventories are popular with career counselors because such measures focus on what the client likes or dislikes. The Strong Interest Inventory (SII) is an excellent example.
In the field of testing, validity refers to a. whether the test really measures what it purports to measure. b. whether the same test gives consistent measurement. c. the degree of cultural bias in a test. d. the fact that numerous tests measure the same traits.
a. whether the test really measures what it purports to measure. To be valid the test must measure what you want it to measure! Incidentally, a test which is valid for one population is not necessarily valid for another group.
predictive validity
also known as empirical validity reflects the test's ability to predict future behavior according to established criteria. On some exams, concurrent validity and predictive validity are often lumped under the umbrella of "criterion validity," since concurrent validity and predictive validity are actually different types of criterion-related validity.
James McKeen Cattell
another Cattell, who coined the term mental test and spent time researching mental assessment and its relation to reaction time at the University of Pennsylvania. James McKeen Cattell had originally worked with Wilhelm Wundt and later Francis Galton.
An excellent psychological or counseling test would have a reliability coefficient of a. 50. b. .90. c. 1.00. d. -.90.
b. .90. Ninety percent of the score measures the attribute in question, while 10% of the score is indicative of error.
The mean on the Wechsler and the Stanford-Binet Intelligence scales (SB5) is ________ and the standard deviation is ________. a. 100; 100 b. 100; 15 Wechsler, 16 Stanford-Binet c. 100; 20 d. 100; 1
b. 100; 15 Wechsler, 16 Stanford-Binet
In most instances, who would be the best qualified to give the Rorschach Inkblot Test? a. A counselor with NCC after his or her name. b. A clinical psychologist. c. A D.O. psychiatrist. d. A social worker with LCSW after his or her name.
b. A clinical psychologist. Generally, a clinical psychologist would have the most training in projective measures while the social worker would have the least education regarding tests and measurements.
A counselor who had an interest primarily in testing would most likely be a member of a. HS-BCP. b. AARC. c. NASW. d. ACES.
b. AARC. The AARC (Association for Assessment and Research in Counseling) is one of 20 ACA divisions. Can you name the other choices? - HS-BCP: Human Services-Board Certified Practitioner - NASW: National Association of Social Workers - ACES: Association for Counselor Education and Supervision
The Myers-Briggs Type Indicator reflects the work of a. Raymond B. Cattell. b. Carl Jung. c. William Glasser. d. Oscar K. Buros.
b. Carl Jung.
A counselor can utilize psychological tests to help secure a ________ diagnosis if third-party payments are necessary. a. CPT b. DSM or ICD c. percentile d. standard error
b. DSM or ICD. Diagnosis is a medical term which asserts that you classify a disease based on symptomatology. CPT (Current Procedural Terminology) codes are used to let insurance companies, managed care firms, etc. know which service you provided, such as individual therapy or family therapy.
The best IQ test for a 22-year-old single male would be the a. WPPSI-III. b. WAIS-IV. c. WISC-IV. d. any computer-based IQ test.
b. WAIS-IV. The WAIS-IV (Wechsler Adult Intelligence Scale) is intended for ages 16-90 years.
Today, the Stanford-Binet IQ test is a. a nonstandardized measure. b. a standardized measure. c. a projective measure. d. b and c.
b. a standardized measure. The Stanford-Binet is standardized because the scoring and administration procedures are formal and well delineated. Measures which are not standardized (choice "a") lack procedural guidelines for scoring or administration and do not include quantitative information related to "standards" of performance.
A valid test is ________ reliable. a. not always b. always c. never d. 80%
b. always A valid test is always reliable. Choice "b" is correct because a test that measures a given trait well does so repeatedly. Remember that a reliable test, however, is not necessarily always valid. After all, a depression scale that was invalid and really measured anxiety could produce consistent reliable anxiety data.
The word psychometric means a. a form of measurement used by a neurologist. b. any form of mental testing. c. a mental trait which cannot be measured. d. the test relies on a summated or linear rating scale.
b. any form of mental testing.
A client who takes a normative test a. cannot legitimately be compared to others who have taken the test. b. can legitimately be compared to others who have taken the test. c. could not have taken an IQ test. d. could not have taken a personality test.
b. can legitimately be compared to others who have taken the test. First, forget about choice "a," it's ipsative. Technically, a normative interpretation is one in which the individual's score is evaluated by comparing it to others who took the same test. A percentile rank is an excellent example. Say your client scores 82 on a nationally normed test and this score corresponds to the percentile rank of 60. This tells you that 60% of the individuals who took the test scored 82 or less. If it's still a bit fuzzy don't sweat it! There's more where this one came from in the next section!
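The percentile-rank example can be reproduced with a short sketch. The norm-group scores below are invented so that a score of 82 lands at the 60th percentile, matching the definition used above (the percentage who scored at or below the client's score); `percentile_rank` is an illustrative name.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring at or below `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)

# Hypothetical norm group of ten examinees.
norm_group = [70, 75, 78, 80, 81, 82, 85, 88, 90, 95]
print(percentile_rank(82, norm_group))  # 60.0
```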
One method of testing reliability is to give the same population alternate forms of the identical test. Each form will have the same psychometric/statistical properties as the original instrument. This is known as a. test-retest reliability. b. equivalent or alternate forms reliability. c. the split-half method. d. internal consistency.
b. equivalent or alternate forms reliability. Here a single group of examinees takes parallel forms of a test and a reliability correlation coefficient is figured on the two sets of scores. Counterbalancing is necessary when testing reliability in this fashion. That is to say, half of the individuals get parallel form A first and half get form B initially. This controls for variables such as fatigue, practice, and motivation.
The NCE and the CPCE would be examples of a(n) ________ test. a. free choice b. forced choice c. projective d. intelligence
b. forced choice This book is composed of forced choice/recognition items. On some tests this format is used to control for the "social desirability phenomenon" which asserts that the person puts the answer he or she feels is socially acceptable (i.e., the test provides alternatives that are all equal in terms of social desirability). The MMPI-2 (Minnesota Multiphasic Personality Inventory), for example, uses forced choices to create a "lie scale" composed of human frailties we all possess. This scale, therefore, ferrets out those individuals who tried to make themselves look good (i.e., the way they believe they "should" be).
A colleague of yours invents a new projective test. Seventeen counselors rated the same client using the measure and came up with nearly identical assessments. This would indicate a. high validity. b. high reliability. c. excellent norming studies. d. culture fairness.
b. high reliability. This is known as "inter-rater" reliability.
Infant IQ tests are a. more reliable than those given later in life. b. more unreliable than those given later in life. c. not related to learning experiences. d. never used.
b. more unreliable than those given later in life. These "toddler tests" are sometimes capable of picking up gross abnormalities such as severe intellectual disabilities.
A good practice for counselors is to a. always test the client yourself rather than referring the client for testing. b. never generalize on the basis of a single test score. c. stay away from culture-free tests. d. stay away from scoring the test yourself.
b. never generalize on the basis of a single test score. Also, although choice "c" represents an ideal measure, most experts believe that as of this date no such animal exists.
The National Counselor Exam (NCE) is a(n) ________ test because the scoring procedure is specific. a. subjective b. objective c. projective d. subtest
b. objective Since the NCE uses an a, b, c, d alternative format the rater's "subjective" feelings and thoughts would not be an issue. Ditto for the CPCE.
An aptitude test is to ________ as an achievement test is to ________. a. what has been learned; potential b. potential; what has been learned c. profit from learning; potential d. a measurement of current skills; potential
b. potential; what has been learned An aptitude test assesses "potential" and "predicts." A person, for example, who scores high on a music aptitude test is not necessarily a skilled musician at the time he or she takes the test. The test, however, is predicting that this individual could excel in music if he or she received the proper training and practice. An achievement test examines what you know or how well you currently perform (e.g., the NCE or how fast you can run the 100-yard dash). Predictive validity is particularly important when choosing an aptitude test.
One problem with interest inventories is that the person often tries to answer the questions in a socially acceptable manner. Psychometricians call this response style phenomenon a. standard error. b. social desirability (the right way to feel in society). c. cultural bias. d. acquiescence.
b. social desirability (the right way to feel in society). The converse of choice "b" occurs when an individual purposely, or when in doubt, gives unusual responses. This phenomenon is known as "deviation."
A counselor doing research decided to split a standardized test in half by using the even items as one test and the odd items as a second test and then correlating them. The counselor a. used an invalid procedure to test reliability. b. was testing reliability via the split-half correlation method. c. was testing reliability via the equivalent forms method. d. was testing reliability via the inter-rater method.
b. was testing reliability via the split-half correlation method. In this situation the individual takes the entire test as a whole and then the test is divided into halves. The correlation between the half scores yields a reliability coefficient. When a researcher does not use even versus odd questions to split the test, he or she may do so using random numbers (merely dividing a test according to first half versus second half could confound the data due to practice and fatigue effects).
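The odd/even procedure described above can be sketched in a few lines. This is only an illustration: the function names and the tiny 0/1 response matrix are hypothetical, and the Pearson correlation is written out by hand so the block needs no outside libraries.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_r(item_matrix):
    """Correlate odd-item totals with even-item totals.

    item_matrix: one row per examinee, 1 = correct, 0 = incorrect.
    """
    odd_totals = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)


# Hypothetical responses from three examinees on a four-item test.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
print(split_half_r(responses))  # 1.0
```

In this toy data set each person's odd-item total equals his or her even-item total, so the two halves correlate perfectly. Real data will rarely be this tidy.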
In an ipsative measure the person taking the test must compare items to one another. The result is that a. an ipsative measure cannot be utilized for career guidance. b. you cannot legitimately compare two or more people who have taken an ipsative test. c. an ipsative measure is never a forced choice format. d. an ipsative measure is never reliable.
b. you cannot legitimately compare two or more people who have taken an ipsative test. Since the ipsative measure does not reveal absolute strengths, comparing one person's score to another is relatively meaningless. The person is measured in response to his or her own standard of behavior. The ipsative measure points out the highs and lows that exist within a single individual. Hence, when a colleague tells you that Mr. Johnson's anxiety is improving, she has given you an ipsative description. This description, however, would not lend itself to comparing say Mr. Johnson's anxiety to Mrs. McBee's. Choice "c" is a no go since ipsative assessments are generally composed of forced choice items. The ipsative approach yields a within-person analysis.
A career counselor is using a test for job selection purposes. An acceptable reliability coefficient would be ________ or higher. a. .20 b. .55 c. .80 d. .70
c. .80 This is a tricky question. Although .70 is generally acceptable for most psychological attributes, for admissions for jobs, schools, and so on, it should be at least .80 and some experts will not settle for less than .90.
In constructing a test you notice that all 75 people correctly answered item number 12. This gives you an item difficulty of a. 1.2. b. .75. c. 1.0. d. 0.0.
c. 1.0. The item difficulty index is calculated by taking the number of persons tested who answered the item correctly/total number of persons tested. Hence, in this case 75/75 = 1.0. This maximum score for item 12 tells you it is probably much too easy for your examinees.
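The item difficulty calculation above is simple enough to express as a one-line function. The name `item_difficulty` is just an illustrative choice.

```python
def item_difficulty(num_correct, num_tested):
    """Item difficulty index: proportion of examinees who answered correctly."""
    if num_tested <= 0:
        raise ValueError("num_tested must be positive")
    if not 0 <= num_correct <= num_tested:
        raise ValueError("num_correct must be between 0 and num_tested")
    return num_correct / num_tested


print(item_difficulty(75, 75))  # 1.0 (everyone answered item 12 correctly)
```

An item nobody answers correctly would come out at 0.0, the other end of the range.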
The Binet stressed age-related tasks. Utilizing this method, a 9-year-old task would be one which a. only a 10-year-old child could answer. b. only an 8-year-old child could answer. c. 50% of the 9-year-olds could answer correctly. d. 75% of the 9-year-olds could answer correctly.
c. 50% of the 9-year-olds could answer correctly. A 9-year-old task was defined as one in which one half of the 9-year-olds tested could answer successfully.
The black versus white IQ controversy was sparked mainly by a 1969 article written by ________. a. John Ertl b. Raymond B. Cattell c. Arthur Jensen d. Robert Williams
c. Arthur Jensen Jensen, choice "c" mentioned earlier, sparked tremendous controversy (actually that's putting it mildly) when he suggested in a 1969 Harvard Educational Review article ("How Much Can We Boost IQ and Scholastic Achievement?") that the closer people are genetically, the more alike their IQ scores. Adopted children, for example, will sport IQs closer to their biological parents than to their adoptive ones. Jensen then leveled the charge that whites score 11 to 15 IQ points higher than African Americans (regardless of social class). His theory stated that due to slavery it was possible that African Americans were bred for strength rather than intelligence. He estimated that heredity contributed 80%, while environment influenced 20% of the IQ. Other researchers (e.g., Newman, Freeman, and Holzinger; Fehr) felt that genetic influences contributed less than 50% to IQ.
An aptitude test predicts future behavior while an achievement test measures what you have mastered or learned. In the case of a test like the ________ the distinction is unclear. a. Binet b. Wechsler c. GRE d. Bender
c. GRE Sure, the GRE attempts to predict graduate school performance, but it also tests your level of knowledge. Some exams will refer to tests like the GRE, MAT, MCAT, SAT, etc., as "aptitude-achievement tests." Now here's where a counselor's life gets really complicated. Say your exam presents you with one of the aforementioned tests and gives you "aptitude" as one choice, and "achievement" as another, but does not give you "aptitude-achievement" as an alternative (yipes!). Well, I certainly won't condone the practice, but based on my investigation of the textbook taxonomy of tests I'd opt for the "aptitude" option and latch onto the first good four-leaf clover I could get my hands on.
A counseling test consists of 300 forced response items. The person taking the test can take as long as he or she wants to answer the questions. a. This is most likely a projective measure. b. This is most likely a speed test. c. This is most likely a power test. d. This is most likely an invalid measure.
c. This is most likely a power test. Like the speed test, it will ideally be designed so that nobody receives a perfect score. Choice "a," projective measure, stands incorrect since the projective tests rely on a "free response" format. In a power test time is not an issue.
A new IQ test has a standard error of measurement (SEM) of 3. Tom scores 106 on the test. If he takes the test a lot, we can predict that about 68% of the time a. Tom will score between 100 and 103. b. Tom will score between 100 and 106. c. Tom will score between 103 and 109. d. Tom will score higher than Betty who scored 139.
c. Tom will score between 103 and 109. Calculated simply by taking: 106 - 3 = 103 and 106 + 3 = 109. Hint: Your exam could refer to this as the "68% confidence interval" (i.e., 103 to 109). Classical test theory suggests the formula, X = T + E, where X is the obtained score, T is the true score, and E is the error. Hence, psychometricians know that if a client takes the same test over and over, random error (i.e., E in the formula) will cause the score to fluctuate.
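The 68% band in Tom's case is just the obtained score plus and minus one SEM. A small sketch, with a made-up function name and an optional `n_sem` parameter (an assumption added here so wider bands can be tried):

```python
def sem_band(observed, sem, n_sem=1):
    """Return (low, high): the observed score plus/minus n_sem standard errors."""
    return observed - n_sem * sem, observed + n_sem * sem


print(sem_band(106, 3))  # (103, 109)
```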
Which is more important, validity or reliability? a. Reliability. b. They are equally important. c. Validity. d. It depends on the test in question.
c. Validity. Experts nearly always consider validity the number one factor in the construction of a test. A test must measure what it purports to measure. Reliability, choice "a," is the second most important concern. A scale, for example, must measure body weight accurately if it is a valid instrument. In order to be reliable, it will need to give repeated readings which are nearly identical for the same person if the person keeps stepping on and off the scale.
The best intelligence test for a sixth-grade girl would be the a. WPPSI-IV. b. WAIS-IV. c. WISC-IV. d. Merrill-Palmer.
c. WISC-IV. The WISC-IV is recommended for children from ages 6-16 years and 11 months. Counselors who have been in the field for an extended period of time might be surprised that the WAIS-IV and the WISC-IV no longer provide verbal and performance IQ scores. On any test the lowest possible score is known as the "floor" while the "highest possible score" is referred to as the "ceiling."
A reliability coefficient of 1.00 indicates a. a lot of variance in the test. b. a score with a high level of error. c. a perfect score which has no error. d. a typical correlation on most psychological and counseling tests.
c. a perfect score which has no error. As stated earlier, this generally occurs only in physical measurement.
Tests are often classified as speed tests versus power tests. A timed typing test used to hire secretaries would be a. a power test. b. neither a speed test nor a power test. c. a speed test. d. a fine example of an ipsative measure.
c. a speed test. In terms of difficulty, a speed test is really intended to be fairly easy. The difficulty is induced by time limitations, not the difficulty of the tasks or the questions themselves. (Try giving your secretary a timed keyboarding test and give him or her three hours to complete it and you'll see what I mean.) A good timed speed test is purposely set up so that nobody finishes it. A timed test is really a type of speed test, but a high percentage of the test takers complete it and it is usually more difficult and has a time limit (think CPCE or NCE).
An interest inventory would be least valid when used with a. a first-year college student majoring in philosophy. b. a third-year college student majoring in physics. c. an eighth-grade male with an IQ of 136. d. a 46-year-old white male construction worker.
c. an eighth-grade male with an IQ of 136. Interest inventories work best with individuals who are of high school age or above inasmuch as interests are not extremely stable prior to that time. Interests become quite stable around age 25.
Your client, who is in an outpatient hospital program, is keeping a journal of irrational thoughts. This would be a. an unethical practice based on NBCC ethical guidelines. b. considered a standardized test. c. an informal assessment technique. d. an aptitude measure.
c. an informal assessment technique. Self-reports, case notes, checklists, sociograms of groups, interviews, and professional staffings would also fall into the informal assessment category.
A true/false test has ________ recognition items. a. similar b. free choice c. dichotomous d. no
c. dichotomous "Dichotomy" simply means that you are presented with two opposing choices. This explains why choice "a" is definitely incorrect. When a test gives the person taking the exam three or more forced choices (e.g., the NCE, the CPCE, or this book) then psychometricians call it a "multipoint item." Choice "b" describes a situation in which the examinee can respond in any way he or she chooses.
Most experts would agree that the Wechsler IQ tests gained popularity, as the Binet a. must be administered in a group. b. favored the geriatric population. c. didn't seem to be the best test for adults. d. was biased toward women.
c. didn't seem to be the best test for adults. Choice "a" is incorrect—both the Binet and the Wechsler are individual tests which require specific training beyond that required for a group IQ test. David Wechsler felt the Binet was slanted toward verbal skills and thus he added "performance" skills to ascertain attributes which might have been cultivated in a background which did not stress verbal proficiency. Choice "c" is correct since the Binet was initially created for children.
Simon and Binet pioneered the first IQ test around 1905. The test was created to a. assess high school seniors in America. b. assess U.S. military recruits. c. discriminate children without an intellectual disability from children with an intellectual disability. d. measure genius in the college population.
c. discriminate children without an intellectual disability from children with an intellectual disability. The Minister of Public Instruction for the Paris schools wanted a test to identify children with an intellectual disability so that they could be taught separately. The assumption was made that intelligence was basically the ability to understand school-related material. In regard to choice "d," some experts believe that the Wechsler is a better test for those who fall in the average range, while the Stanford-Binet is more accurate for assessing extremes of intellect. Today the terms intellectual disability (ID) and Intellectual Developmental Disorder (IDD) have replaced the terminology used at the time, which carried negative connotations.
Construct validity refers to the extent that a test measures an abstract trait or psychological notion. An example would be a. height. b. weight. c. ego strength. d. the ability to name all men who have served as U.S. presidents.
c. ego strength. Any trait you cannot "directly" measure or observe can be considered a construct.
IQ means a. a query of intelligence. b. indication of intelligence. c. intelligence quotient. d. intelligence questions for test construction.
c. intelligence quotient. A quotient is the result when you perform division. The early ratio formula for the Binet IQ score was MA/CA (i.e., mental age divided by your chronological age) × 100. The score indicated how you compared to those in your age group. Memory device: An MA is a high degree so put it on top of the equation as the numerator. IQ testing has been the center of more heated debates among experts than any other type of testing.
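The early ratio formula translates directly into code. The function name and the sample ages below are hypothetical and only meant to show the arithmetic.

```python
def ratio_iq(mental_age, chronological_age):
    """Early Binet ratio IQ: (MA / CA) x 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return mental_age / chronological_age * 100


# Hypothetical child: mental age 10, chronological age 8.
print(ratio_iq(10, 8))  # 125.0
# When MA equals CA the ratio IQ is exactly 100.
print(ratio_iq(9, 9))   # 100.0
```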
Most counselors would agree that a. more preschool IQ testing is necessary. b. teachers need to give more personality tests. c. more public education is needed in the area of testing. d. the testing mystique has been beneficial to the general public.
c. more public education is needed in the area of testing. Again, the public needs to know the limitations of testing (i.e., that tests are fallible). If you've been doing counseling for any length of time then you've surely come in contact with clients who have been harmed by hearing a score (e.g., their IQ) and then reacting to it such that it becomes a negative, self-fulfilling prophecy.
A reliable test is ________ valid. a. always b. 90% c. not always d. 80%
c. not always Again shout this one out loud: A reliable test is not always valid. Reliability, nonetheless, determines the upper level of validity.
Both the Rorschach and the Thematic Apperception Test (TAT) are projective tests. The Rorschach uses 10 inkblot cards while the TAT uses a. a dozen inkblot cards. b. verbal and performance IQ scales. c. pictures. d. incomplete sentences.
c. pictures. The TAT consists of 31 cards. The test, which is intended for ages 4 and beyond, uses up to 20 cards when administered to any given individual (i.e., 19 selected to fit the age and sex of the client, plus one blank card). The pictures on each card are intentionally ambiguous, and the client is asked to make up a story for each of them. Choice "d" would describe a projective test such as the Rotter Incomplete Sentence Blank (RISB) in which the subject completes an incomplete sentence with a real feeling.
You are uncertain whether a test is intended for the population served by your not-for-profit agency. The best method of researching this dilemma would be to a. contact a local APA clinical psychology graduate program. b. e-mail the person who created the test. c. read the test manual included with the test. d. give the test to six or more clients at random.
c. read the test manual included with the test. The manual should specify the target population for the test in question.
Short answer tests and projective measures utilize free response items. The NCE and the CPCE use forced choice or so-called ________ items. a. vague b. subjective c. recognition d. numerical
c. recognition
A counselor is told by his supervisor to measure the internal consistency reliability (i.e., homogeneity) of a test but not to divide the test in halves. The counselor would need to utilize a. the split-half method. b. the test-retest method. c. the Kuder-Richardson coefficients of equivalence. d. cross-validation.
c. the Kuder-Richardson coefficients of equivalence. In plain everyday verbiage, the supervisor wants the counselor to find out if each item on the test is measuring the same thing as every other item. Is performance on one item truly related to performance on another? This can be done by using the Kuder-Richardson reliability/item consistency estimates, which are often denoted on exams as the KR-20 or KR-21 formulas. Another statistic, Lee J. Cronbach's alpha coefficient, also has been used in this respect.
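For the curious, here is a rough sketch of the KR-20 calculation for right/wrong items: k/(k - 1) times (1 minus the sum of p × q over the items, divided by the variance of the total scores), where p is the proportion passing an item and q = 1 - p. This version uses the population variance of the total scores; some textbooks use the sample variance instead, so treat that choice, the function name, and the toy data as assumptions.

```python
def kr20(item_matrix):
    """KR-20 internal consistency estimate for dichotomous (0/1) items.

    item_matrix: one row per examinee, 1 = correct, 0 = incorrect.
    Assumption: population variance (divide by n) of the total scores.
    """
    n = len(item_matrix)      # number of examinees
    k = len(item_matrix[0])   # number of items
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have no variance")
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n  # proportion passing item j
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)


# Hypothetical data: two examinees, four items, perfectly consistent answers.
print(round(kr20([[1, 1, 1, 1], [0, 0, 0, 0]]), 2))  # 1.0
```

Notice that no splitting into halves is needed; every item contributes to the estimate.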
One major criticism of interest inventories is that a. they have far too many questions. b. they are most appropriate for very young children. c. they emphasize professional positions and minimize blue-collar jobs. d. they favor jobs that will require a bachelor's degree or higher.
c. they emphasize professional positions and minimize blue-collar jobs. Also take note of the fact that, contrary to popular opinion, interests and abilities are not (that's right, not) highly correlated. A client, for example, could have tremendous musical ability yet could thoroughly dislike being a musician.
The most critical factors in test selection are a. the length of the test and the number of people who took the test in the norming process. b. horizontal versus vertical. c. validity and reliability. d. spiral versus cyclical format.
c. validity and reliability.
The group IQ test movement began a. in 1905. b. with the work of Binet. c. with the Army Alpha and Army Beta in World War I. d. with Freudian psychoanalysis and the psychodynamic movement.
c. with the Army Alpha and Army Beta in World War I.
In a cyclical test a. the items get progressively easier b. the difficulty of the items remains constant c. you have several sections which are spiral in nature d. the client must answer each question in a specified period of time
c. you have several sections which are spiral in nature
John Ertl
claimed he invented an electronic machine to analyze neural efficiency and take the place of the paper and pencil IQ test. The device relies on a computer, an EEG, a strobe light, and an electrode helmet. The theory is that the faster one processes the perception, the more intelligence he or she has. I might add that thus far, counselors don't seem to be buying the idea!
ipsative measures
compare traits within the same individual; they do not compare a person to other persons who took the instrument and do not reveal absolute strengths. The Kuder Career Planning instruments are often cited as falling into this category. The ipsative measure allows the person being tested to compare items.
criterion validity
could be concurrent or predictive
William Louis Stern
created the IQ formula
You want to admit only 25% of all counselors to an advanced training program in psychodynamic group therapy. The item difficulty on the entrance exam for applicants would be best set at a. 0.0. b. .5 regardless of the admission requirement. c. 1.0. d. .25.
d. .25. In most tests the level is set at .5 (i.e., 50% of the examinees will answer correctly while 50% will not). However, in this case the .25 level would allow you to ferret out the lower 75% you do not wish to admit. Item difficulty ranges from 0.0 (choice "a") to 1.0 (choice "c"). The higher the index number, the greater the number of examinees who will answer the question correctly. Or simply: The higher the number, the easier the question is to answer.
The same test is given to the same group of people using the test-retest reliability method. The correlation between the first and second administration is .70. The true variance (i.e., the percentage of shared variance or the level of the same thing measured in both) is a. 70%. b. 100%. c. 50%. d. 49%.
d. 49%. Here's the key to simplifying a question such as this. To demonstrate the variance of one factor accounted for by another you merely square the correlation (i.e., reliability coefficient). So .70 × .70 = .49 and .49 × 100 = 49%. Your exam could refer to this principle as the coefficient of determination.
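The squaring trick can be checked in a couple of lines. The function name is made up, and the result is rounded because floating-point arithmetic leaves a tiny remainder when .70 is squared.

```python
def shared_variance_percent(r):
    """Coefficient of determination as a percentage: r squared times 100."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must fall between -1 and 1")
    return r ** 2 * 100


print(round(shared_variance_percent(0.70), 2))  # 49.0
```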
________ would be an informal method of appraisal. a. IQ testing b. Standardized personality testing c. GRE scores d. A checklist
d. A checklist Unlike choices "a," "b," and "c," the informal method does not use standard administration or scoring procedures. I might tell a client to do her checklist or diary one way and you would go about it in a totally different manner.
Which measure would yield the highest level of reliability? a. A TAT, projective test popular with psychodynamic helpers. b. The WAIS-IV, a popular IQ test. c. The MMPI-2, a popular personality test. d. A very accurate postage scale.
d. A very accurate postage scale. In the real world physical measurements are more reliable than psychological ones.
The first intelligence test was created by a. David Wechsler. b. J. P. Guilford. c. Francis Galton. d. Alfred Binet and Theodore Simon.
d. Alfred Binet and Theodore Simon. The year was 1904 and the French government appointed a commission to ferret out feeble-minded Parisian children from those who were normal (what kind of phrasing is this!?!). Alfred Binet led the committee and the rest is history. By 1905, Binet, along with his coworker Theodore Simon, created a 30-question test with school-related items of increased difficulty. Binet used his own daughters as test subjects in order to investigate mental processes and also is cited as one of the pioneers in projective testing based on his work with inkblots. After testing nearly 3,000 children in the United States in 1916, Lewis M. Terman of Stanford University published an American version of the Binet that was translated into English and adapted to American children. And in case you haven't already guessed, the word "Stanford" was added to the name.
Lewis Terman a. constructed the Wechsler tests. b. constructed the initial Binet prior to 1910. c. constructed the Rorschach. d. Americanized the Binet.
d. Americanized the Binet. Since Terman was associated with Stanford University the test became the Stanford-Binet.
________ did research and concluded that intelligence was normally distributed like height or weight and that it was primarily genetic. a. Spearman b. Guilford c. Williamson d. Galton
d. Galton Francis Galton felt intelligence was a single or so-called unitary factor. He also coined the term "eugenics."
Which method of reliability testing would be useful with an essay test but not with a test of algebra problems? a. Test-retest. b. Alternate forms. c. Split-half. d. Inter-rater/inter-observer.
d. Inter-rater/inter-observer. Using choice "d," several raters assess the same performance. This method is also called "scorer reliability" and is utilized with subjective tests such as projectives to ascertain whether the scoring criteria are such that two persons who grade or assess the responses will produce roughly the same score.
In a counseling research study, two groups of subjects took a test with the same name. However, when they talked with each other they discovered that the questions were different. The researcher assured both groups that they were given the same test. How is this possible? a. The researcher is not telling the truth. The groups could not possibly have taken the same test. b. The test was horizontal. c. The test was not a power test. d. The researcher gave parallel forms of the same test.
d. The researcher gave parallel forms of the same test. When a test has two versions or forms that are interchangeable they are termed parallel forms or equivalent forms of the same test. From a statistical/psychometric standpoint each form must have the same mean, standard error, and other statistical components.
A word association test would be an example of a. a neuropsychological test. b. a motoric test. c. an achievement test. d. a projective test.
d. a projective test. Although it is rare, some texts and exams take issue with the archaic word projective and refer to such tests as "self-expressive."
The MMPI-2 is a. an IQ test. b. a neurological test. c. a projective personality test. d. a standardized personality test.
d. a standardized personality test. The original version of this instrument was created in 1940. The Minnesota Multiphasic Personality Inventory-2, the current version used since 1989, is known as a "self-report" personality inventory. The client can respond with "true," "false," or "cannot say" to 567 questions (10 more than the traditional MMPI, which was the most researched test in history as well as the most useful for assessing emotional disturbance). The "new" MMPI, designated via the 2, is intended to help clinicians diagnose and treat patients. The test is said to have retained the best factors of the MMPI, while updating the test and eliminating sexist wording. The MMPI-2 is suitable for those over age 18. A sixth-grade reading level is required and testing time varies from 60 to 90 minutes. The test restandardization committee reported that the norming sample for the MMPI-2 is larger and more representative than the old measure. The MMPI offers computer report packages for specialized settings such as college counseling, chronic pain programs, or outpatient mental health centers. Wikipedia calls the MMPI, "the gold standard in personality testing." The MMPI-A is a 478-question version suitable for 14-to-18-year-old adolescents. My advice: Thumb through a few major testing catalogs before taking the exam. The major testing catalogs are available online.
Clients should know that a. validity is more important than reliability. b. projective tests favor psychodynamic theory. c. face validity is not that important. d. a test is merely a single source of data and not infallible.
d. a test is merely a single source of data and not infallible. Although the first three choices are important to the counselor, the final statement should be explained to the client. An extremely high score—say on a mechanical aptitude test—does not automatically imply that the client will prosper as a mechanic.
Your supervisor wants you to find a new personality test for your counseling agency. You should read a. professional journals. b. the Buros Mental Measurements Yearbook. c. classic textbooks in the field as well as test materials produced by the testing company. d. all of the above.
d. all of the above. Moreover, it has been discovered that if the counselor involves the client in the process of test selection it will improve his or her cooperation in the counseling process.
According to Public Law 93-380, also known as the Buckley Amendment, a 19-year-old college student attending college a. could view her record, which included test data. b. could view her daughter's infant IQ test given at preschool. c. could demand a correction she discovered while reading a file. d. all of the above.
d. all of the above. Persons over age 18 can inspect their own records and those of their children. The Family Educational Rights and Privacy Act (FERPA) also stipulates that information cannot be released without adult consent.
A counselor created an achievement test with a reliability coefficient of .82. The test is shortened since many clients felt it was too long. The counselor shortened the test but logically assumed that the reliability coefficient would now a. be approximately .88. b. remain at .82. c. be at least 10 points higher or lower. d. be lower than .82.
d. be lower than .82. Increasing a test's length raises reliability. Shorten it and the antithesis occurs. Note: The Spearman-Brown formula is used to estimate the impact that lengthening or shortening a test will have on a test's reliability coefficient.
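Here is a minimal sketch of the standard Spearman-Brown prophecy formula, new r = (n × r) / (1 + (n - 1) × r), where n is the factor by which the test length changes. The question does not say how much the test was shortened, so cutting it in half (n = 0.5) is purely an assumption for illustration, as is the function name.

```python
def spearman_brown(r, length_factor):
    """Predicted reliability after changing test length by `length_factor`.

    length_factor = 2 doubles the test; 0.5 cuts it in half.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * r) / (1 + (length_factor - 1) * r)


original = 0.82
shortened = spearman_brown(original, 0.5)   # assumed: test cut in half
lengthened = spearman_brown(original, 2)    # assumed: test doubled
print(shortened < original < lengthened)    # True
```

With these assumed numbers the halved test drops to roughly .69, which fits the answer: shorter test, lower reliability.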
A test format could be normative or ipsative. In the normative format a. each item depends on the item before it. b. each item depends on the item after it. c. the client must possess an IQ within the normal range. d. each item is independent of all other items.
d. each item is independent of all other items.
A short answer test is a(n) ________ test. a. objective b. culture-free c. forced choice d. free choice
d. free choice Some exams will call this a "free response" format. In any case, the salient point is that the person taking the test can respond in any manner he or she chooses. Although free choice response patterns can yield more information, they often take more time to score and increase subjectivity (i.e., there is more than one correct answer). I should mention that although testing is often controversial, schools now employ psychoeducational tests more than at any time in history. Recently there has been a strong push against this practice so keep your eyes peeled to see what transpires.
In a projective test the client is shown a. something which is highly reinforcing. b. something which is highly charged from an emotional standpoint. c. a and b. d. neutral stimuli.
d. neutral stimuli. The idea here is that the client will "project" his or her personality if given an unstructured task. More specifically, there are several acceptable formats for projective tests: First, Association, such as "What comes to mind when you look at this inkblot?" Second, Completion, such as "Complete these sentences with real feelings." Third, Construction, such as drawing a person. The theory is that self-report inventories like the MMPI do not reveal hidden unconscious impulses. In order to accomplish this the client is shown vague, ambiguous stimuli such as a picture or an inkblot. Some counselors believe that by using projective measures a client will have more difficulty faking his or her responses and that he or she will be able to expand on answers. It should be noted that examiner bias is common when using projectives and a therapist using projective measures needs more training than one who merely works with self-report tests.
When a counselor tells a client that the Graduate Record Examination (GRE) will predict her ability to handle graduate work, the counselor is referring to a. good concurrent validity. b. construct validity. c. face validity. d. predictive validity.
d. predictive validity. The Graduate Record Examination (GRE), the Scholastic Aptitude Test (SAT), the American College Test (ACT), and public opinion polls are effective only if they have high predictive validity, which is the power to accurately describe future behavior or events. Again, the subtypes of criterion validity are concurrent and predictive.
The counselor who favors projective measures would most likely be a a. Rogerian. b. strict behaviorist. c. TA therapist. d. psychodynamic clinician.
d. psychodynamic clinician. Choices "a," "b," and "c" all reflect positions that do not rely heavily on the unconscious mind (especially the behaviorists, who believe that if you can't directly measure the behavior, it is not meaningful). However, some theorists (e.g., Allport) would contend that even if it is true that unconscious impulses exist, they are not very important.
In a spiral test a. the items get progressively easier. b. the difficulty of the items remains constant. c. the client must answer each question in a specified period of time. d. the items get progressively more difficult.
d. the items get progressively more difficult.
concurrent validity
deals with how well the test compares to other instruments that are intended for the same purpose.
power test
designed to evaluate the level of mastery without a time limit.
incremental validity
extent to which a test contributes information beyond other more easily collected measures. First and foremost, incremental validity has been used to describe the process by which a test is refined and becomes more valid as contradictory items are dropped. Incremental validity also refers to a test's ability to improve predictions when compared to existing measures that purport to facilitate selection in business or educational settings. When a test has incremental validity, it provides you with additional valid information that was not attainable via other procedures.
two-factor theory
illuminates the position of Charles Spearman, who in 1904 postulated two factors—a general ability G and a specific ability S which were thought to be applicable to any mental task. (Wasn't psychological theory simple in those days?)
Oscar K. Buros
noted for his Mental Measurements Yearbook (MMY), which was the first major publication to review available tests. After his death, the University of Nebraska set up the Oscar K. Buros Center, which continued his valuable contribution to the field by producing MMY series books packed with professional reviews to help counselors pick appropriate tests.
the Q-Sort
often used to investigate personality traits, involves a procedure in which an individual is given cards with statements and asked to place them in piles of "most like me" to "least like me." Then the subject compiles them to create the "ideal self." The ideal self can then be compared to his or her current self-perception in order to assess self-esteem.
self-expressive test
projective test
forced choice items are sometimes known as ___
recognition items
Raymond B. Cattell
responsible for the concepts of fluid intelligence (inherited neurological ability that decreases with age and is not very dependent on culture) and crystallized intelligence (intelligence from experiential, cultural, and educational interaction). Crystallized intelligence is measured by tests that focus on content. Fluid intelligence is tested by what has been called "content-free reasoning" such as a block design or a pictorial analogy problem.
consequential validity
simply tries to ascertain the social implications of using tests.
content validity
sometimes called rational or logical validity. Does the test examine or sample the behavior under scrutiny? An IQ test, for example, that did not sample the entire range of intelligence (say the test just sampled memory and not vocabulary, math, etc.) would have poor content validity. In this case a savant might truly score higher than a well-rounded individual with a gifted level of intelligence.
halo effect
tendency of an interviewer to allow positive characteristics of a client to influence the assessments of the client's behavior and statements
Psychometrics
the study of psychological measurement; thus a helper who primarily administers and interprets tests often has the job title of psychometrician. An effective counselor will always inform clients about the limitations of any test that he or she administers. Some evidence indicates that neophyte counselors are sometimes tempted to administer tests merely to boost their credibility. I think it is safe to say this is not a desirable practice.
