GRE PSY - CH 10.2: Test and Measurement
Discriminant validity
Discriminant validity is shown when test performance is NOT correlated with performance on a test measuring a theoretically unrelated variable.
Wechsler Intelligence Scale for Children (WISC-R)
Is for children aged six to sixteen.
Blacky pictures
The Blacky pictures test was a projective test, employing a series of twelve picture cards, used by psychoanalysts in mid-20th century America and elsewhere, to investigate the extent to which children's personalities were shaped by Freudian psychosexual development.
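Deviation IQ
An expression of a person's IQ score relative to his or her same-aged peers. The person's test score is compared with the scores of others of the same age (as a standard score) and rescaled so that the age-group mean is 100 and the standard deviation is typically 15 (16 on some tests).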
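A minimal Python sketch of the deviation-IQ idea. It assumes the mean and standard deviation of the norm group for the test taker's age are already known; the function name and the numbers are hypothetical, chosen only for illustration.

```python
def deviation_iq(raw_score: float, age_mean: float, age_sd: float, iq_sd: float = 15.0) -> float:
    """Convert a raw score to a deviation IQ using same-age norms.

    age_mean and age_sd describe the (hypothetical) norm group for the
    test taker's age; iq_sd=15 follows the common convention (16 on some tests).
    """
    z = (raw_score - age_mean) / age_sd  # standing relative to same-aged peers
    return 100 + iq_sd * z               # rescale so the age-group mean is 100


# A raw score one standard deviation above the age-group mean:
print(deviation_iq(62, age_mean=50, age_sd=12))  # 115.0
```

Unlike the ratio IQ, this never divides by age, so it stays meaningful for adults.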
Types of Measurement Scales
1. Nominal/categorical 2. Ordinal 3. Interval 4. Ratio (types of measurement scale)
Anne Anastasi
Researched intelligence in relation to performance.
aptitude test
A test designed to assess a person's capacity to benefit from education or training, i.e., to predict a person's future performance. Includes intelligence tests. (aptitude: natural talent; capacity, ability)
Empirical-Keying or Criterion-Keying Approach
Refers to an approach to test development that emphasizes the selection of items that discriminate between normal individuals and members of different diagnostic groups, regardless of whether the items appear theoretically relevant to the diagnoses of interest. - A method of selecting questions for personality inventories in which the items are chosen and weighted according to an external criterion. - An individual's responses to the items determine whether he is like a particular group or not. The STRONG-CAMPBELL INTEREST INVENTORY is an example.
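The item-selection step can be sketched in Python. This is a simplified illustration, not the procedure of any particular inventory: it assumes we already know, for each item, the proportion of a diagnostic (criterion) group and of a normal group answering "true", and it keeps items whose rates differ by a hypothetical cut-off, without ever looking at what the item says.

```python
def select_keyed_items(criterion_rates, normal_rates, min_diff=0.25):
    """Keep items whose endorsement rates differ between groups by at least min_diff.

    Item content is never inspected: selection is purely empirical.
    min_diff is a hypothetical cut-off chosen for illustration.
    """
    return [
        i
        for i, (c, n) in enumerate(zip(criterion_rates, normal_rates))
        if abs(c - n) >= min_diff
    ]


# Hypothetical proportion of each group answering "true" to four items
criterion_group = [0.80, 0.52, 0.30, 0.65]
normal_group = [0.35, 0.50, 0.70, 0.60]
print(select_keyed_items(criterion_group, normal_group))  # [0, 2]
```

Item 2 is kept even though the criterion group endorses it less often; it would simply be keyed in the "false" direction.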
achievement test
A test designed to assess what one knows or can do now; it can test how adequately content and skills have been learned.
Alfred Binet
Developed the first intelligence test (the Binet scale) and the concept of mental age on which the IQ is based. - The classic ratio IQ is computed as (mental age/chronological age) x 100, a formula built on Binet's mental age and proposed by William Stern. Mental age is the age level of a person's functioning according to the IQ test. The highest chronological age used in the computation is 16. After that, intelligence seems to stop developing; therefore, using adult ages would unnecessarily decrease the IQ ratio.
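A short Python sketch of the ratio IQ with the age-16 cap applied. The function name and example ages are hypothetical.

```python
def ratio_iq(mental_age: float, chronological_age: float, age_cap: float = 16.0) -> float:
    """Classic ratio IQ: (mental age / chronological age) x 100.

    Chronological age is capped at 16, following the rule on this card,
    so adults are not penalized for ages past the plateau.
    """
    ca = min(chronological_age, age_cap)
    return mental_age / ca * 100


print(ratio_iq(10, 8))   # 125.0  (an 8-year-old functioning like a 10-year-old)
print(ratio_iq(16, 30))  # 100.0  (age 30 is treated as 16)
```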
Norm-referenced test
Norm-referenced refers to standardized tests that are designed to compare and rank test takers in relation to one another. Norm-referenced tests report whether test takers performed better or worse than a hypothetical average student, which is determined by comparing scores against the performance results of a statistically selected group of test takers, typically of the same age or grade level, who have already taken the exam. Ex: Erika did better in spelling than 99% of second graders tested.
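A statement like "better than 99% of second graders" is a percentile rank. Here is a minimal Python sketch, assuming the simple definition "percentage of the norm group scoring strictly below the score" (some tests also count half of the tied scores). The norm-group scores are made up.

```python
def percentile_rank(score: float, norm_scores: list[float]) -> float:
    """Percentage of the norm group scoring strictly below `score`."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)


# Hypothetical norm group of ten second graders' spelling scores
norm_group = [12, 15, 15, 18, 20, 22, 25, 27, 30, 33]
print(percentile_rank(26, norm_group))  # 70.0
```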
Ratio scale
Unlike an interval scale, a ratio scale has a true zero point that indicates the total absence of the quantity being measured.
IQ
A measure of intelligence that takes into account a student's mental and chronological age: the test score is based on mental age divided by chronological age, multiplied by 100.
RELIABILITY TEST
- Reliability is the degree to which an assessment tool produces stable and consistent results. - Reliability in statistics and psychometrics is the overall consistency of a measure.[1] A measure is said to have high reliability if it produces similar results under consistent conditions. "It is the characteristic of a set of test scores that relates to the amount of random error from the measurement process that might be embedded in the scores." - High reliability means the test's measurements are dependable and consistent. - The smaller the standard error of measurement (SEM), the better (more reliable) the test. (reliability: dependability, trustworthiness; assessment: evaluation)
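Why a smaller SEM goes with a more reliable test can be seen from the classical test theory formula SEM = SD x sqrt(1 - reliability). This formula is standard psychometrics rather than something stated on this card. The Python sketch below assumes an IQ-style scale with SD = 15; the reliability values are illustrative.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


for r in (0.80, 0.91, 0.96):
    sem = standard_error_of_measurement(15, r)
    print(f"reliability {r:.2f}: SEM = {sem:.2f} IQ points")
# prints SEM = 6.71, 4.50 and 3.00 for the three reliabilities
```

As reliability approaches 1, the SEM shrinks toward 0, meaning less random error is embedded in each score.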
Draw-A-Person Test
Asks the subject to draw a person of each sex and to tell a story about them.
Face Validity
Face Validity ascertains that the measure appears to be assessing the intended construct under study. The stakeholders can easily assess face validity. Although this is not a very "scientific" type of validity, it may be an essential component in enlisting the motivation of stakeholders. If the stakeholders do not believe the measure is an accurate assessment of the ability, they may become disengaged from the task. Example: If a measure of art appreciation is created, all of the items should be related to the different components and types of art. If the questions are about historical time periods, with no reference to any artistic movement, stakeholders may not be motivated to give their best effort or invest in this measure because they do not believe it is a true assessment of art appreciation. Ex: A test meant to measure knowledge of 20th-century American history whose items are about 20th-century European history lacks face validity.
Interval scale
An interval scale uses actual numbers with equal intervals between them, NOT ranks, such as the number correct on a spelling test.
California Psychological Inventory (CPI)
Is a personality measure generally used with more "normal" and less clinical groups than the MMPI. - For ages 13 and up. - It was developed by Harrison Gough at the University of California, Berkeley.
Rorschach Inkblot Test
Requires that the subject describe what he sees in each of ten inkblot cards. - Scoring is complex. - The validity of the test is questionable, but its fame is not. Created by Hermann Rorschach.
Hathaway and McKinley
Developed the MMPI using the empirical criterion-keying approach.
PROJECTIVE PERSON TEST
Personality Inventories
A personality inventory is a self-rating questionnaire consisting of about 100-500 statements, designed to reveal the respondent's personality traits. (inventory: a summary or list; here, a personal one)
adaptive test
A computerized achievement or ability test that selects questions based on the test taker's answers to previous questions.
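How "selecting questions based on previous answers" works can be sketched with a simple up/down (staircase) rule: give a harder item after a correct answer and an easier one after a wrong answer. This is a deliberate simplification. Operational computerized adaptive tests typically choose items with item response theory, and the difficulty levels and simulated test taker below are hypothetical.

```python
from typing import Callable


def adaptive_session(
    answers_correctly: Callable[[int], bool],
    n_items: int = 5,
    start_level: int = 5,
    min_level: int = 1,
    max_level: int = 10,
) -> list[tuple[int, bool]]:
    """Administer n_items, choosing each item's difficulty from the previous answer.

    Staircase rule (a simplification): one level harder after a correct
    answer, one level easier after a wrong one, clamped to the level range.
    """
    level = start_level
    record = []
    for _ in range(n_items):
        correct = answers_correctly(level)
        record.append((level, correct))
        level = min(level + 1, max_level) if correct else max(level - 1, min_level)
    return record


# Hypothetical test taker who can solve items up to difficulty 7
print(adaptive_session(lambda level: level <= 7))
# [(5, True), (6, True), (7, True), (8, False), (7, True)]
```

The administered items quickly settle around the test taker's ability level, which is why adaptive tests can be shorter than fixed-form tests.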
Types of Validity
1. Content validity (sampling validity) 2. Face validity 3. Criterion-related validity 4. Predictive validity 5. Construct validity 6. Formative validity
Rosenzweig Picture-Frustration (P-F) Study
Consists of cartoons in which one person is frustrating another person. - The subject is asked to describe how the frustrated person responds. (frustrating: discouraging, causing irritation)
John Horn and Raymond Cattell
Found that FLUID INTELLIGENCE (the ability to reason and solve novel problems) declines with old age, while CRYSTALLIZED INTELLIGENCE (accumulated knowledge, such as facts and vocabulary) does not.
mental age
Mental age is a concept related to intelligence. It looks at how a specific child, at a specific age—usually today, now—performs intellectually, compared to average intellectual performance for that physical age, measured in years. - The physical age of the child is compared to the intellectual performance of the child, based on performance in tests and live assessments by a psychologist. Scores achieved by the child in question are compared to scores in the middle of a bell curve for children of the same age [1]
PERSONALITY TEST (T)
Two types: 1. Objective personality inventories 2. Projective tests
TEST AND MEASUREMENT
Barnum effect
The Barnum effect, also called the Forer effect, is the observation that individuals will give high accuracy ratings to descriptions of their personality that supposedly are tailored specifically for them, but are in fact vague and general enough to apply to a wide range of people.
ABILITY TEST
Two types: 1. Aptitude tests 2. Achievement tests
content validity
Refers to how well the test covers the particular skill or content area it is intended to measure. - Ex: Does the test measure the various facets of American history?
Thematic Apperception Test (TAT)
Was devised by Christiana Morgan and Henry Murray. It is made up of 31 cards (1 blank and 30 with pictures), of which about 20 are typically administered. The pictures show various interpersonal scenes (e.g., two people facing each other). - The subject tells a story about each of the cards, which reveals aspects of her personality. - The TAT is often used to measure need for achievement. NEEDS, PRESS, and PERSONOLOGY are terms that go along with interpreting the test.
Types of Reliability:
1. Test-retest reliability 2. Parallel forms reliability 3. Inter-rater reliability 4. Internal consistency reliability a. Average inter-item correlation b. Split-half reliability
Alternate form reliability
Alternate form reliability (also called parallel forms reliability) is assessed when an individual participating in a research or testing scenario is given two different versions of the same test at different times.
Structured Tests
Are often seen as more objectively scored than PROJECTIVE TESTS. Most objective tests are self-reported - in other words, the subject records her own responses. However, these tests are not completely objective, because any self-report measure allows for the subject to bias her answers.
Vocational Tests
Assess to what extent an individual's interests and strengths match those already found by professionals in a particular job field.
IQ Correlates...
Correlates most positively with IQ of biological parents (not adoptive parents) and socioeconomic status of parents (measured by either income or job-type).
Q-sort or Q-measure Technique
Is the process of sorting cards into a normal distribution. Each card has a different statement on it pertaining to personality. The subject places the cards that he is neutral about at the hump of the curve. Toward one end, he places cards that he deems "very characteristic" of himself, and toward the other end, he places the "not characteristic" cards.
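The forced-distribution constraint of a Q-sort can be checked in a few lines of Python. The nine piles and the number of cards allowed per pile below are hypothetical (real Q-sorts vary in both). Pile 1 holds "not characteristic" cards, pile 5 is the neutral hump, and pile 9 holds "very characteristic" cards.

```python
from collections import Counter

# Hypothetical forced distribution: pile -> number of cards allowed.
# Pile 1 = "not characteristic", pile 5 = neutral (the hump), pile 9 = "very characteristic".
FORCED_DISTRIBUTION = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 4, 7: 3, 8: 2, 9: 1}


def is_valid_qsort(placements: dict[str, int]) -> bool:
    """placements maps each statement card to the pile where the subject put it."""
    counts = Counter(placements.values())
    uses_only_known_piles = set(counts) <= set(FORCED_DISTRIBUTION)
    matches_shape = all(counts[pile] == n for pile, n in FORCED_DISTRIBUTION.items())
    return uses_only_known_piles and matches_shape


# Build a sort that fills every pile exactly (25 hypothetical statements).
piles = [pile for pile, n in FORCED_DISTRIBUTION.items() for _ in range(n)]
sort = {f"statement {i}": pile for i, pile in enumerate(piles)}
print(is_valid_qsort(sort))  # True
sort["statement 0"] = 5      # move the only "least characteristic" card to neutral
print(is_valid_qsort(sort))  # False
```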
Nominal scale
Labels or names observations that can be categorized, e.g., girl, blue eyes.
VALIDITY TEST
Validity refers to how well a test measures what it is purported to measure. Why is it necessary? Reliability is necessary, but it alone is not sufficient: for a test to be useful, it must also be valid. - For example, if your scale is off by 5 lbs, it reads your weight every day with an excess of 5 lbs. The scale is reliable because it consistently reports the same weight every day, but it is not valid because it adds 5 lbs to your true weight. It is not a valid measure of your weight.
Predictive validity
When a test is used to predict future performance, e.g., GPA. - Does the test predict future success as a history major?
concurrent validity
When the test is given at the same time as the criterion measure, such as a written driving test and a road test. (concurrent: in agreement, of one mind; compatible)
William Stern
William Stern, born Louis William Stern, was a German psychologist and philosopher noted as a pioneer in the field of the psychology of personality and intelligence. - He was the inventor of the concept of the intelligence quotient, or IQ, later used by Lewis Terman and other researchers in the development of the first IQ tests, based on the work of Alfred Binet. He was the father of the German writer and philosopher Günther Anders. In 1897, Stern invented the tone variator, allowing him to research human perception of sound in an unprecedented way.
Average inter-item correlation
Average inter-item correlation is a subtype of internal consistency reliability. It is obtained by taking all of the items on a test that probe the same construct (e.g., reading comprehension), determining the correlation coefficient for each pair of items, and finally taking the average of all of these correlation coefficients. This final step yields the average inter-item correlation.
Construct Validity and Convergent Validity
Construct validity refers to how well performance on the test fits into the theoretical framework related to what you want the test to measure. Convergent validity, one kind of evidence for construct validity, asks: is test performance correlated with performance on a test measuring a theoretically related variable? Ex: A test of interest in history should correlate with other measures related to history. Example: A women's studies program may design a cumulative assessment of learning throughout the major. If the questions are written with complicated wording and phrasing, the test can inadvertently become a test of reading comprehension rather than a test of women's studies. It is important that the measure actually assesses the intended construct, rather than an extraneous factor.
Julian Rotter
Created the Internal-External Locus of Control Scale to determine whether a person feels responsible for the things that happen (internal) or that he has no control over the events in life (external).
Criterion-Related Validity
Criterion-Related Validity is used to predict future or current performance: it correlates test results with another criterion of interest, such as an established test of the same skill or knowledge area. - Predictive validity - Concurrent validity (criterion: standard; correlates: relates to, is linked with) Example: Suppose a physics program designed a measure to assess cumulative student learning throughout the major. The new measure could be correlated with a standardized measure of ability in this discipline, such as an ETS field test or the GRE subject test. The higher the correlation between the established measure and the new measure, the more faith stakeholders can have in the new assessment tool.
Objective Tests
Do not allow subjects to make up their own answers, so these tests are relatively structured. An objective test is a psychological test that measures an individual's characteristics independent of rater bias or the examiner's own beliefs, usually by the administration of a bank of questions marked and compared against exacting scoring mechanisms that are completely standardized, much in the same way that examinations are administered.
Goodenough Draw-A-Man Test
A test for children, notable for its (relatively) cross-cultural application and simple directions: "Make a picture of a man. Make the very best picture that you can." Children are scored based on detail and accuracy, not artistic talent.
Mean IQ
Is 100, with a standard deviation of 15 or 16, depending on the test
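With a mean of 100 and a standard deviation of 15, the normal curve tells us how common a given IQ is. A small Python sketch using the standard library's `NormalDist`, assuming IQ scores are normally distributed:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # use sigma=16 for tests that scale IQ that way

print(f"{iq.cdf(130):.1%} score below 130")             # 97.7% score below 130
print(f"{iq.cdf(115) - iq.cdf(85):.1%} score 85-115")  # 68.3% score 85-115
```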
F-Scale (California F-Scale)
Is a measurement of fascism or authoritarian personality
Wechsler Preschool and Primary Scale of Intelligence
Is for children aged four to six.
Split half reliability
Split-half reliability is another subtype of internal consistency reliability. The process of obtaining split-half reliability is begun by "splitting in half" all items of a test that are intended to probe the same area of knowledge (e.g., World War II) in order to form two "sets" of items. The entire test is administered to a group of individuals, the total score for each "set" is computed, and finally the split-half reliability is obtained by determining the correlation between the two total "set" scores.
Robert Zajonc
Studied the relationship between birth order and intelligence. - He found that firstborns were slightly more intelligent than secondborns, and so on. He also found that the more children present in a family, the less intelligent they were likely to be. This relationship also seems to be affected by the spacing of the children, with greater spacing between children leading to higher intelligence.
test-retest method
Test-retest reliability is a measure of reliability obtained by giving the same test twice over a period of time to a group of individuals. The scores from Time 1 and Time 2 can then be correlated in order to evaluate the test for stability over time. Example: A test designed to assess student learning in psychology could be given to a group of students twice, with the second administration perhaps coming a week after the first. The obtained correlation coefficient would indicate the stability of the scores. (test-retest method: testing and re-examining; correlated: related to)
Walter Mischel
Was extremely critical of personality trait-theory and of personality tests in general. He felt that situations (not traits) decide actions.
Minnesota Multiphasic Personality Inventory (MMPI)
Was originally created to determine mental illness, but is now used as a personality measure. - The MMPI consists of 550 "true/false/not sure" questions. Most notably, the MMPI contains items (such as "I would like to ride a horse") that have been found to discriminate between different disorders and that subjects could not "second guess." The test has high validity primarily because it was constructed with highly discriminatory items and because it has three VALIDITY SCALES (questions that assess lying, carelessness, and faking).
Word Association Test
Was originally used in conjunction with free-association techniques. - A word is called out by a psychologist, and the subject says the next word that comes to mind.
Ratio IQ
(mental age / chronological age) x 100, where chronological age is the person's physical age. Developed by William Stern.
Empirical criterion-keying approach
Test developers tried out thousands of questions and kept those that differentiated between patients and nonpatients. Empirical criterion keying refers to an approach to test development that emphasizes the selection of items that discriminate between normal individuals and members of different diagnostic groups, regardless of whether the items appear theoretically relevant to the diagnoses of interest.
Beck Depression Inventory (BDI)
Is not used to diagnose depression. Rather, it is widely used to assess the severity of depression that has already been diagnosed. The BDI allows a therapist or researcher to follow the course of a person's depression.
Stanford-Binet Intelligence Scale
Is the revised version of Alfred Binet's original intelligence test. - LEWIS TERMAN of Stanford University was the first to revise it, hence the name. - The Stanford-Binet is used with children and is organized by age level. Of all the intelligence tests, the Stanford-Binet is the best known predictor of future academic achievement. - Terman is also famous for his studies with gifted children and for the finding that children with higher IQs are better adjusted.
Domain-referenced or criterion-referenced
Domain-referenced, or criterion-referenced, test interpretation is the concept that an examinee's scores on a test are interpreted with reference to the particular cognitive ability being assessed rather than in comparison with the performance of a population of individuals (norm-referenced testing). - For example, when people take the test for a driver's license, what matters is not how they compare with other test takers but whether they answer enough questions correctly to pass.
Projective Tests
In psychology, a projective test is a personality test designed to let a person respond to ambiguous stimuli, presumably revealing hidden emotions and internal conflicts projected by the person into the test. Projective tests allow the subject to create his own answer, thus facilitating the expression of conflicts, needs, and impulses. - The content of the response is interpreted by the test administrator. - Some projective tests are scored more objectively than others. For example, shown an inkblot, a person who had recently witnessed a murder might see pools of blood, while a little girl might see a butterfly. Proponents of projective tests believe that the way you interpret the image is a reflection of who you are.
Inter-rater reliability
Inter-rater reliability is a measure of reliability used to assess the degree to which different judges or raters agree in their assessment decisions. Inter-rater reliability is useful because human observers will not necessarily interpret answers the same way; raters may disagree as to how well certain responses or material demonstrate knowledge of the construct or skill being assessed. Example: Inter-rater reliability might be employed when different judges are evaluating the degree to which art portfolios meet certain standards. Inter-rater reliability is especially useful when judgments can be considered relatively subjective. Thus, the use of this type of reliability would probably be more likely when evaluating artwork as opposed to math problems.
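The art-portfolio example can be quantified in Python. Simple percent agreement is computed first, then Cohen's kappa, which corrects agreement for chance. Cohen's kappa is a widely used statistic that the card does not name; it is included here as one common way to summarize agreement between two raters. The judgments are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Proportion of cases on which both raters gave the same rating."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()) / n**2
    return (observed - expected) / (1 - expected)  # undefined if expected == 1


# Hypothetical judgments of ten art portfolios
judge_1 = ["meets", "meets", "not", "meets", "not", "meets", "meets", "not", "meets", "meets"]
judge_2 = ["meets", "not", "not", "meets", "not", "meets", "meets", "meets", "meets", "meets"]
print(percent_agreement(judge_1, judge_2))        # 0.8
print(round(cohens_kappa(judge_1, judge_2), 2))  # 0.52
```

The judges agree on 80% of portfolios, but because both rate most portfolios "meets", much of that agreement could occur by chance, so kappa is noticeably lower.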
Rotter Incomplete Sentence Blank
Is similar to word association: subjects finish incomplete sentences. The Rotter Incomplete Sentence Blank (RISB) is a projective test in which you are given a series of incomplete sentences to complete. By grouping and evaluating the responses, an evaluator can make some judgments about your psychological state of mind.
Wechsler Adult Intelligence Scale (WAIS)
Is the most commonly used intelligence test for adults. - Like all of the Wechsler intelligence tests, it is organized by subtests that provide subscales and identify problem areas. - The third edition is called the WAIS-III; it was succeeded by the WAIS-IV in 2008.
Bayley Scales of Infant Development
Not intelligence tests. - They measure the sensory and motor development of infants in order to identify mentally retarded children. The Bayley scales are poor predictors of later intelligence.
Ordinal scale
Observations are ranked in terms of size or... Ex. order of finish in a horse race.
Parallel forms reliability
Parallel forms reliability is a measure of reliability obtained by administering different versions of an assessment tool (both versions must contain items that probe the same construct, skill, knowledge base, etc.) to the same group of individuals. The scores from the two versions can then be correlated in order to evaluate the consistency of results across alternate versions. Example: If you wanted to evaluate the reliability of a critical thinking assessment, you might create a large set of items that all pertain to critical thinking and then randomly split the questions up into two sets, which would represent the parallel forms.
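The "randomly split the questions up into two sets" step can be sketched in Python. After the split, the same group would take both forms and the two sets of total scores would be correlated, as described above. The item labels and seed are hypothetical.

```python
import random
from typing import Optional


def make_parallel_forms(item_pool: list[str], seed: Optional[int] = None) -> tuple[list[str], list[str]]:
    """Randomly split a pool of items on one construct into two forms of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(item_pool)  # copy so the original pool is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical pool of 20 critical-thinking items
pool = [f"CT-{i:02d}" for i in range(1, 21)]
form_a, form_b = make_parallel_forms(pool, seed=42)
print(len(form_a), len(form_b))  # 10 10
```

Random assignment is used so that neither form systematically gets the easier or harder items.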
Sampling Validity (similar to content validity)
Sampling Validity (similar to content validity) ensures that the measure covers the broad range of areas within the concept under study. Not everything can be covered, so items need to be sampled from all of the domains. This may need to be completed using a panel of "experts" to ensure that the content area is adequately sampled. Additionally, a panel can help limit "expert" bias (i.e., a test reflecting what an individual personally feels are the most important or relevant areas). (sampling: taking or supplying samples; trial) Example: When designing an assessment of learning in the theatre department, it would not be sufficient to cover only issues related to acting. Other areas of theatre, such as lighting, sound, and the functions of stage managers, should all be included. The assessment should reflect the content area in its entirety.