Psych Assessment I
What are ordinal scales? (2)
- A form of order or ranking - Fails to provide information about the relative strength of rankings (ex: race winners)
Test (3)
- A test is a standardized procedure for sampling and describing behavior with categories or scores. - Most tests have "norms" or "standards" by which the results can be used to predict other more important behaviors - A test is considered to be standardized if the procedures used for administering it are uniform from one examiner to another
What is face validity?
- Often grouped with content validity, but distinct from it - The extent to which the test looks valid to test users, examiners, and examinees - This is not actually a form of validity; it is more a matter of social acceptability than a technical form of validity.
What is rapport? (3)
- A warm, comfortable atmosphere that serves to motivate examinees and elicit cooperation. - Important for examiners to establish rapport with their patients - Is particularly important in individual testing and when evaluating children.
What is a power test?
- Allows enough time for test-takers to attempt all items, but is constructed so that no test taker is able to obtain a perfect score
Aptitude Tests vs. Achievement Tests (4)
- Aptitude tests predict what an individual will accomplish in the future - Achievement tests report what the individual has learned in the past - The main difference lies more in the usage than in the content - The same instrument can be used as either an achievement test or an aptitude test
Creativity Tests (3)
- Assess a subject's ability to produce new ideas, insights, or artistic creations that are considered to have social, aesthetic, or scientific value - Emphasize novel ideas and originality in the solution of "fuzzy" problems - Require divergent thinking (putting forth a variety of answers to a complex problem as opposed to convergent thinking, which asks for a single solution)
What is a speed test? (3)
- Contains items of uniform and generally simple levels of difficulty. - If time permitted, the subject would complete most or all of the items - Because of time constraints, most subjects do not complete the entire test
What is an age norm? (5)
- Depicts the level of test performance for each separate age group in the normative sample. - The purpose is to facilitate same-aged comparisons - The performance of each examinee is interpreted in relation to standardization subjects of the same age - Typically used for school-aged children - Common in IQ testing
What is a grade norm? (3)
- Depicts the level of test performance for each separate grade in the normative sample - Rarely used with ability tests - Useful in school settings when reporting the achievement levels of schoolchildren
How can a well-trained psychometrician detect conscious faking?
- Does the client have a motivation to perform deceitfully? - Is the overall pattern of test results suspicious in light of the other information known about the client?
What is the ETS? (2)
- Educational Testing Service, a non-profit that directs the development, standardization, and validation of the GRE, LSAT, and Peace Corps Entrance Exam - The College Entrance Examination Board (CEEB) was subsumed under ETS
Who is considered the father of mental testing? What were some of his accomplishments? (2)
- Francis Galton - Set up a psychometric laboratory in London at the International Health Exhibition in 1884. - Tested at least 17,000 individuals during the 1880's and 1890's.
How do examiners interpret scores of tests? (2)
- Generally the examiner will compare the individual's scores to those obtained by others on the same test. - Test developers usually provide norms, or a summary of test results from a large and representative group of subjects, to make these comparisons
What are ratio scales? (2)
- Has all the characteristics of an interval scale, but also possesses a conceptually meaningful zero point in which there is a total absence of the characteristic being measured - Rare in psychology
What is the new age-relative formula for intelligence? Why is it used?
- IQ = (actual score)/(expected score for age) - Used in the Wechsler scales because IQ should remain constant with normal aging even if the raw score changes
What did Terman contribute to the measurement of intelligence? (2)
- In 1916, Terman suggested multiplying Stern's Intelligence Quotient by 100 to eliminate fractions. - Was the first individual to use the abbreviation "IQ"
What is a raw score? (2)
- In isolation, a raw score is meaningless - It can be interpreted only through norm-referenced or criterion-referenced comparison
What are the descriptive statistics for the Wechsler Intelligence Scales?
- Mean = 100, SD = 15 - Scaled scores have a Mean = 10 and SD = 3, which allows the examiner to analyze subtest scores for relative strengths and weaknesses
Achievement Tests (2)
- Measure a person's degree of learning, success, or accomplishment in a subject matter - The purpose is to determine how much of the material the subject has absorbed or mastered.
Interest Inventories (2)
- Measure an individual's preference for certain activities or topics and thereby help determine occupational choice - Are based on the explicit assumption that interest patterns determine and, therefore, predict job satisfaction
Personality Tests (2)
- Measure the traits, qualities, or behaviors that determine a person's individuality - Help us to predict future behavior
Why are levels of measurement relevant to test construction?
- More powerful parametric statistical procedures can be used only for scores derived from interval or ratio scales - Nominal and ordinal data can only use nonparametric statistical procedures
What is an expectancy table? (2)
- Portrays the established relationship between test scores and expected outcomes on a relevant task. - Are practical because new examinees receive a probabilistic preview of how well they are likely to do on the criterion (ACT score --> college GPA)
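A minimal sketch of how an expectancy table could be tabulated. The records, the band width, and the function name are hypothetical and chosen only for illustration; each record pairs a test score with whether the examinee later succeeded on the criterion.

```python
from collections import defaultdict

def expectancy_table(records, band_width):
    """Map each score band to the proportion of examinees who succeeded.

    records: iterable of (test_score, succeeded) pairs (hypothetical data).
    band_width: width of each score band, e.g. 5 points.
    """
    counts = defaultdict(lambda: [0, 0])  # band start -> [successes, total]
    for score, succeeded in records:
        band = (score // band_width) * band_width
        counts[band][0] += int(succeeded)
        counts[band][1] += 1
    return {band: wins / total for band, (wins, total) in sorted(counts.items())}

# Made-up (score, succeeded on the criterion) pairs
records = [(18, False), (19, False), (21, True), (22, False),
           (24, True), (26, True), (27, True), (29, True)]
table = expectancy_table(records, band_width=5)
for band, proportion in table.items():
    print(f"{band}-{band + 4}: {proportion:.2f}")
```

A new examinee scoring in a given band can then be given that band's proportion as a probabilistic preview of the outcome.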
How did Charles Spearman define intelligence? (2)
- Proposed that intelligence consists of two kinds of factors: a single general factor (g) and numerous specific factors (s1, s2, s3,...) - Spearman invented factor analysis in order to aid in his investigation of intelligence
What are interval scales? (2)
- Provides information about ranking, but also supplies a metric for gauging the differences between rankings - 1-100 scales, for example
Why did psychologists realize that the Binet-Simon tests may not be appropriate for all populations? Which populations in particular?
- The Binet-Simon was heavily focused on verbal skills - The test was not appropriate for individuals with speech, language, or hearing impairments or non-English speakers.
In 1921, the _______________ was founded by _______________.
- The Psychological Corporation, the first major test publisher - Cattell, Thorndike, and Woodworth
What is a norm-referenced test? (2)
- The majority of tests are norm-referenced - The performance of each examinee is interpreted in reference to a relevant standardization sample.
What is the mean? (2)
- The most commonly reported measure of central tendency in psychology - Is sensitive to extreme values, so it can be misleading if a distribution has a few scores that are unusually high or unusually low.
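A small illustration of that sensitivity, using made-up scores: one extreme value pulls the mean well away from the bulk of the data while the median barely moves.

```python
from statistics import mean, median

scores = [48, 50, 51, 52, 49]     # hypothetical scores
with_outlier = scores + [150]     # same scores plus one extreme value

print(mean(scores), median(scores))
print(mean(with_outlier), median(with_outlier))
```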
What is a drawback of using percentile scores? (2)
- The underlying measurement is distorted - The raw score difference between the 90th and 99th percentiles is far greater than the raw score difference between the 50th and 59th percentiles, even though the percentile differences are the same.
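A rough sketch of this distortion, assuming scores are normally distributed (an assumption for illustration, not something the card states). It measures each nine-point percentile gap in standard deviation units; `z_gap` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_gap(lower_pct, upper_pct):
    """Distance in SD units between two percentiles of a normal distribution."""
    std_normal = NormalDist()  # mean 0, SD 1
    return std_normal.inv_cdf(upper_pct / 100) - std_normal.inv_cdf(lower_pct / 100)

middle_gap = z_gap(50, 59)  # nine percentile points near the center
tail_gap = z_gap(90, 99)    # nine percentile points in the upper tail

print(round(middle_gap, 2), round(tail_gap, 2))
```

Under this assumption the gap in the tail is several times larger than the gap near the center, even though both span nine percentile points.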
What is the benefit of a normal distribution? (4)
- They are mathematically precise - The percentage of cases falling within a certain range or beyond a certain value can be precisely known -Arise spontaneously in nature -No skew; symmetrical
What is convenient about standard scores, t-scores, stanines, and percentiles?
- They can all be transformed into one another, especially if the underlying distribution is normal.
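A sketch of these transformations starting from a z-score, assuming a normal distribution. The conventions used here (T-scores with mean 50 and SD 10, stanines with mean 5 and SD of about 2, limited to 1 through 9) are the commonly cited ones and are not taken from this card; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def z_to_t(z):
    """T-score: mean 50, SD 10."""
    return 50 + 10 * z

def z_to_percentile(z):
    """Percentage of a normal distribution falling below z."""
    return 100 * NormalDist().cdf(z)

def z_to_stanine(z):
    """Stanine: half-SD-wide bands centered on 5, clamped to the range 1-9."""
    return max(1, min(9, math.floor(2 * z + 5.5)))

for z in (-1.0, 0.0, 1.0):
    print(z, z_to_t(z), round(z_to_percentile(z), 1), z_to_stanine(z))
```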
Neuropsychological Tests (2)
- Used in the assessment of persons with known or suspected brain dysfunction - Examiners must undergo comprehensive advanced training in order to make sense of the abundance of data
When did Personality Tests emerge? What was their purpose?
- WWI era - Woodworth attempted to develop an instrument for detecting Army recruits susceptible to psychoneurosis, or shell-shock - Questions were "face obvious" and answered with either "yes" or "no"
Intelligence Tests (2)
- Were designed to sample a broad assortment of skills in order to estimate the individual's general intellectual level. - Incorporate heterogeneous tasks such as word definitions, memory for designs, comprehension questions, and spatial-visual tasks
What is the Standard Error of Measurement (SEM)?
- an index of measurement error that pertains to the test in question - in the hypothetical case that SEM = 0, there would be no measurement error - the SEM for any given test never changes - more reliable test = lower SEM
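The link between reliability and SEM can be sketched with the standard classical test theory formula SEM = SD * sqrt(1 - reliability). The formula is not stated on the card, and the SD and reliability values below are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), where r is the test's reliability coefficient."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)

# Hypothetical tests with SD = 15 and different reliabilities
for r in (0.84, 0.96, 1.0):
    print(r, round(standard_error_of_measurement(15, r), 2))
```

As reliability rises toward 1.0 the SEM shrinks toward 0, which matches the last two points on the card.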
How did Binet translate mental "levels"? (3)
- Binet translated mental levels into *mental ages*. - When determining intelligence, testers compared the child's mental age to his/her chronological age. - Example: a 9-year-old child functioning at the mental age of a 6-year-old would be considered to be retarded by 3 years.
How did Gardner define intelligence? (2)
-Gardner proposed a theory of multiple intelligences based loosely on the study of brain-behavior relationships -Proposed that there are several intelligences which are independent of each other
What is fluid intelligence? (2)
-High-level reasoning used for novel tasks that cannot be performed automatically. -These abilities are mostly nonverbal and not heavily dependent on exposure to a specific culture
Aptitude Tests (2)
- Measure one or more clearly defined and relatively homogeneous segments of ability. - These tests are often used to predict success in an occupation, training course, or educational endeavor (GRE)
What are nominal scales? (2)
-Numbers serve only as category names -The numbers are arbitrary
What is construct validity? (2)
-Pertains to psychological tests that claim to measure complex, multifaceted, and theory-bound psychological attributes such as intelligence, creativity, etc. -Necessary because no criterion or universe of content is accepted as entirely adequate to define the quality to be measured
What is equilibration? (4)
-Proposed by Piaget -A mechanism by which schemas become more mature -assimilation: application of a schema to an object, person, or event -accommodation: the adjustment of an unsuccessful schema so that it works
Early on, Binet used some "Cattellian" approaches to intelligence such as ____. (4)
- Reaction time - Sensory acuity - Found the results to be inconsistent and hard to interpret - On average, children's RTs were slower than those of adults, but in some instances they were equal to or faster than those of adults
What are extravalidity concerns? (2)
-Side-effects and unintended consequences of testing -Even if a test is valid, unbiased, and fair, the decision to use it may be governed by additional considerations
What was Stern's criticism of Binet's mental ages? (3)
- Stern pointed out that being retarded by 3 years had different meanings at different ages - Suggested an *intelligence quotient* to be computed instead in order to better measure the relative functioning of a subject compared to his or her same-aged peers. - Intelligence Quotient = (mental age)/(chron. age)
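A short sketch of Stern's point, using his quotient multiplied by 100 as Terman later suggested. The ages are hypothetical and `ratio_iq` is an illustrative name.

```python
def ratio_iq(mental_age, chronological_age):
    """Stern's quotient (mental age / chronological age), scaled by 100."""
    return 100 * mental_age / chronological_age

# The same 3-year lag gives different quotients at different ages
print(ratio_iq(3, 6))    # 6-year-old functioning at a mental age of 3
print(ratio_iq(9, 12))   # 12-year-old functioning at a mental age of 9
```

A 3-year lag halves the quotient at age 6 but lowers it by only a quarter at age 12, which is why the quotient describes relative standing better than a raw difference in years.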
How did Guilford define intelligence? (4)
- Structure of Intellect Model (SOI) - Identified 5 types of operations, 5 types of content, and 6 types of products, or *150 different factors of intellect* - Good because it captures the complexities of intelligence - Bad because it's too complex
What are two principal causes of error that may occur when testing children?
-Suggestibility -Lack of attention
What is the method of empirical keying? (3)
- Test items are selected for a scale based on how well they contrast a criterion group from a normative sample - Emphasizes the selection of items that discriminate between normal individuals and members of different diagnostic groups, regardless of whether the items appear theoretically relevant to the diagnoses of interest. - Makes it possible to construct measurement scales based entirely on empirical considerations devoid of theory or expert judgement.
What is predictive validity? (3)
-Test scores are used to estimate outcome measures obtained at a later date. -Entrance exams, employment tests, etc. -Purpose is to determine who is likely to succeed at a future endeavor
Apgar Test (3)
-This test is given to infants almost immediately after birth, making it the first test individuals take. - The Apgar is a quick, multivariate assessment of heart rate, respiration, muscle tone, reflex irritability, and color. - Apgar tests are scored on a range of 0-10 to help determine the need for immediate medical attention after birth
What is discriminant validity?
- When a test does not correlate with variables or tests from which it should differ.
Psychological tests have 5 main uses
1. Classification: assigning a person to one category rather than another a. Screening b. Certification 2. Diagnosis and treatment planning 3. Self-Knowledge 4. Program Evaluations 5. Research
What are the types of criterion-related validity?
1. Concurrent validity 2. Predictive validity
What are the 3 types of validity?
1. Content 2. Criterion 3. Construct A good test should satisfy all three.
What are the types of construct validity?
1. Convergent validity 2. Discriminant validity 3. Classification accuracy
What are the four ways in which 1905 intelligence scales differed from earlier scales?
1. Did not precisely measure any single skill. Instead, it was aimed at assessing a child's general mental development with a heterogeneous group of tasks. Aim was classification, not measurement. 2. Was brief and practical; required little equipment 3. Measured practical judgement, which Binet and Simon regarded as the essential factor of intelligence *4. Items were arranged by level of difficulty, not by content*
What are the four levels of measurement?
1. Nominal scales 2. Ordinal scales 3. Interval scales 4. Ratio scales
What were the tests and measures used by Galton?
1. Physical domains: head length, height, weight, arm span, length of middle finger, length of lower arm, etc. 2. Behavioral domains: strength of hand-squeeze, vital capacity of lungs, visual acuity, highest audible tone, speed of blow, reaction time, etc.
What features do all tests typically possess? (5)
1. Standardized procedures 2. Behavior sample 3. Scores or categories 4. Norms or standards 5. Prediction of nontest behaviors
Why did the development of aptitude tests lag behind the development of intelligence tests? (2)
1. Statistical problems: factor analysis was needed to discern which aptitudes were primary. Without computer power, this is very tedious and time-consuming 2. Social problems: there was no practical use for such an instrument. It wasn't until WWII that it would be necessary to select candidates who were highly qualified for difficult or specialized tasks
Why is the history of psychological testing pertinent to contemporary testing? (3)
1. The origins of testing can help explain current practices that may otherwise seem arbitrary or peculiar. 2. The strengths and limitations of testing are better recognized when viewed in a historical context. 3. The history of psychological testing contains some sad and regrettable episodes that remind us not to be overly-zealous in our modern applications of testing.
What are the two parts of diagnosis?
1. To determine the nature and source of a person's abnormal behavior 2. To classify the behavior within an accepted diagnostic system
What are the three ways to revise a test?
1. item analysis: identify unproductive items in the preliminary test so they can be revised, eliminated, or replaced 2. cross-validation: using the original regression equation in a new sample to determine whether the test predicts the criterion as well as it did in the original sample 3. feedback from examinees
What are the two parts of classification accuracy?
1. sensitivity: accurate identification of patients who have a syndrome 2. specificity: accurate identification of "normal" patients
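A minimal sketch of both quantities, computed from hypothetical screening counts. Sensitivity is the proportion of people with the syndrome whom the test flags, and specificity is the proportion of people without it whom the test clears.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of people who have the syndrome that the test identifies."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Proportion of people without the syndrome that the test identifies."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical results: 50 patients with the syndrome, 200 without it
print(sensitivity(true_positives=40, false_negatives=10))
print(specificity(true_negatives=180, false_positives=20))
```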
What are the four types of reliability?
1. test-retest: administering the same test twice to the same group of heterogeneous and representative subjects 2. alternate-forms reliability: derived by administering both forms of a test to the same group and then correlating the two sets of scores 3. split-half: correlating the pairs of scores obtained from equivalent halves of a test administered only once to a representative sample of examinees 4. interscorer: a sample of tests is independently scored by two or more examiners and scores for pairs of examiners are then correlated.
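All four approaches come down to correlating two sets of scores. Below is a sketch of a test-retest coefficient computed as a Pearson correlation; the score lists are made up and `pearson_r` is an illustrative helper, not a library function.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)

# Hypothetical scores for the same five examinees on two occasions
first_testing = [98, 105, 110, 92, 120]
second_testing = [100, 103, 112, 95, 118]
print(round(pearson_r(first_testing, second_testing), 2))
```

The same function would apply to two alternate forms, two test halves, or two scorers; only the source of the paired scores changes.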
How do most experts define intelligence? (2)
1. the capacity to learn from experience 2. the capacity to adapt to one's environment
What are routing procedures?
A method to estimate cognitive abilities of the examinee before proceeding with the rest of the test. This helps to establish an appropriate starting point for subsequent subtests.
What is a Likert scale?
A simple and straightforward method for scaling attitudes that is widely used today. These present examinees with a continuum of disagree to agree statements.
How was the Binet-Simon test improved after Goddard's 1908 translation?
A spate of performance scales was created, particularly for individuals who would not be able to use the typical Binet-Simon due to language deficits
What is the digit span test?
A verbal test where the examiner reads a series of digits and asks the examinee to repeat them back either forward or backward. -After 2 consecutive successes on a trial of the same length, the examiner adds one more digit until they reach 9 digits (not everyone does)
What is the method of rational scaling?
All scale items correlate positively with each other and with the total score for the scale; ensures internal consistency
What is crystallized intelligence?
An individual's breadth and depth of acquired cultural knowledge such as language, information, and concepts influenced by their culture
What do procedures used to evaluate construct validity seek to answer?
Based on current theoretical understanding of the construct that the test claims to measure, do we find the kinds of relationships with non-test criteria that the theory predicts? In other words, does the operational definition actually reflect the true theoretical meaning of the concept?
Who developed the first intelligence tests? What was the purpose?
Binet created the first intelligence tests in the early 1900s to help identify children in Paris schools who were unlikely to benefit from ordinary instruction.
What is concurrent validity?
Criterion measures are obtained at approximately the same time as the test scores.
What is content validity?
Determined by the degree to which the questions, tasks, or items on a test are representative of the universe the test was designed to sample.
Who placed heavy emphasis on language skills in the diagnosis of mental retardation?
Esquirol
What is one possible reason why current intelligence tests place so much emphasis on language skills?
Esquirol placed heavy emphasis on linguistic abilities in the diagnosis of mental retardation
What is a norm group?
Examinees who are representative of the population for whom the test is intended.
What is a percentile?
Expresses the percentage of individuals in the standardization sample who scored below a specific raw score.
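A minimal sketch of that definition. The standardization sample below is made up, and `percentile_rank` is a hypothetical helper name.

```python
def percentile_rank(raw_score, sample_scores):
    """Percentage of the standardization sample scoring below raw_score."""
    below = sum(1 for score in sample_scores if score < raw_score)
    return 100 * below / len(sample_scores)

# Hypothetical standardization sample of ten raw scores
sample = [10, 12, 15, 15, 18, 20, 22, 25, 27, 30]
print(percentile_rank(20, sample))
```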
How did Galton positively influence the testing movement?
He demonstrated that objective tests could be devised and that meaningful scores could be obtained through standardized procedures.
Certification
Implies that a person has at least a minimum proficiency in some discipline or activity. These tests have a pass/fail quality and confer privileges. *used in classification*
What is the earliest documented form of testing? When did it occur?
In 2200 BC, the Chinese emperor had his officials examined every third year to determine their fitness for office.
According to Binet, what is true about a child's intelligence?
In young children, a child's exact mental level *should not* be considered to be an absolute measure of intelligence.
What is the single greatest source of error in group test administration?
Incorrect timing for tests that require a time limit. Examiners must be careful to allot sufficient time for the entire testing process, including setup, reading of instructions, and the actual test-taking period.
What did the Stanford Binet correct in its 4th edition revision?
It added subcomponent scores in addition to composite scores so that examiners could easily examine the subtests.
What is significant about the WISC-IV?
It has excellent standardization because the sample was stratified according to census data.
What is significant about the Wechsler vocabulary subtest?
It is the best measure of overall intelligence because acquiring word meanings necessitates contextual inferences
What is an experimental design used to confirm convergent or discriminant validity?
Multitrait-multimethod matrices. These call for the assessment of two or more traits by two or more methods.
How is construct validity of a test evaluated?
Must amass a variety of evidence through numerous sources
Screening
Refers to the quick and simple tests or procedures to identify persons who might have special characteristics or needs *used in classification*
What was a major flaw in the Brass Instruments Era and early experimental psychology?
Simple, sensory procedures were mistaken for measures of intelligence.
How did Sternberg define intelligence?
Sternberg proposed a triarchic theory of intelligence with three aspects: componential, experiential, and contextual
What is a strength of modern personality tests?
The MMPI-2 has incorporated various validity scales to detect false response tendencies
What is the difference between the designs of the Rorschach test and the Thematic Apperception Test?
The Rorschach test was designed to reveal the thoughts of an "abnormal" person, but the TAT was designed to study "normal" personality.
How does the SB5 appeal to a larger clientele?
The SB5 has extensive high-end and low-end items to better suit more gifted or mentally retarded individuals, making it an ideal test for individuals on either extreme of the cognitive spectrum.
What was one of the earliest performance measures in the Binet-Simon?
The Seguin Form Board, an upright stand with depressions for differently shaped blocks.
How did Wechsler define intelligence?
The aggregate or global capacity of the individual to act purposefully, think rationally, and effectively deal with one's environment. We can only know intelligence by what it enables a person to do.
What is validity?
The extent to which a test measures what it claims to measure.
What is the Standard Error of Estimate?
The margin of error to be expected in the predicted criterion score due to imperfect validity of the test.
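This margin can be sketched with the standard formula SE_est = SD_y * sqrt(1 - r^2), where SD_y is the standard deviation of the criterion and r is the validity coefficient. The formula is not given on the card, and the GPA figures below are hypothetical.

```python
import math

def standard_error_of_estimate(criterion_sd, validity):
    """SE_est = SD_y * sqrt(1 - r_xy ** 2) for a validity coefficient r_xy."""
    if not -1 <= validity <= 1:
        raise ValueError("validity coefficient must be between -1 and 1")
    return criterion_sd * math.sqrt(1 - validity ** 2)

# Hypothetical criterion: college GPA with SD = 0.6, predicted by a test with r = .50
print(round(standard_error_of_estimate(0.6, 0.5), 2))
```

With perfect validity (r = 1) the error of prediction is 0; with no validity (r = 0) it equals the SD of the criterion, so the test adds nothing to the prediction.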
What is unique about the SB5?
The use of routing procedures.
What is the purpose of exploratory factor analysis?
To summarize the interrelationships among a large number of variables in a concise and accurate manner as an aid to conceptualization.
What is the KBIT-2?
Used as a preliminary test to signal any need for additional testing. Common in intelligence research because it is so quick and easy to do. The KBIT-2 should NOT substitute other intelligence tests.
What is a criterion-referenced test?
Used to compare examinees' accomplishments to a predefined performance standard
What is convergent validity?
When a test correlates highly with other variables or tests with which it shares an overlap of constructs
What is criterion-related validity?
When a test is shown to be effective in estimating an examinee's performance on some outcome measure. The variable of primary interest is the outcome measure, called the criterion.
When is the interpretation of a psychological test most reliable?
When the measurements are obtained under standardized conditions outlined in the publisher's test manual
Most investigators look to _________ measures to study intelligence, but some have sought to discern the nature of intelligence by looking at ___________.
behavior; the brain itself
Good ______________ has been demonstrated in several studies correlating the WAIS with mainstream IQ tests and measures of academic success.
criterion-related validity
The objective of criterion-referenced tests is to ________.
determine where the examinee stands with respect to very tightly-defined educational objectives. Norm-referencing is not necessary. Ex: A=90, B= 80, C= 70...
In a normal distribution, what is the relationship among measures of central tendency?
mean = median = mode
The majority of tests are ___________.
norm-referenced tests
The purpose of the tests is to ______.
predict additional behaviors other than those directly sampled by the test.
Diagnosis is generally a precursor to ____.
remediation or treatment of personal distress or impaired knowledge
X = T + e
reported score = true score + error
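A small simulation of this equation for one examinee tested many times. It assumes the error term is drawn from a normal distribution with mean zero; the true score, the error spread, and the number of administrations are all hypothetical.

```python
import random
from statistics import mean, stdev

TRUE_SCORE = 100   # hypothetical true score T
ERROR_SD = 5       # hypothetical spread of the error term e

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def observed_score(true_score, error_sd, rng):
    """Return X = T + e, with e drawn from a zero-mean normal distribution."""
    return true_score + rng.gauss(0, error_sd)

observations = [observed_score(TRUE_SCORE, ERROR_SD, rng) for _ in range(10_000)]
print(round(mean(observations), 1), round(stdev(observations), 1))
```

Under these assumptions the errors average out, so the mean of the reported scores lands close to the true score while their spread reflects the size of the error term.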
What is a standard score?
z= (X-M)/SD
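A short worked example of the formula, followed by re-expressing the same z on a scale with a different mean and SD (such as mean 100 and SD 15, or mean 10 and SD 3). The raw score and norm values are hypothetical, and `rescale` is an illustrative helper name.

```python
def z_score(x, mean, sd):
    """z = (X - M) / SD: distance of a score from the mean in SD units."""
    return (x - mean) / sd

def rescale(z, new_mean, new_sd):
    """Express a z-score on a scale with the given mean and SD."""
    return new_mean + new_sd * z

z = z_score(115, mean=100, sd=15)   # hypothetical score one SD above the mean
print(z)
print(rescale(z, new_mean=10, new_sd=3))
```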
