Psy 603
Be able to recognize examples of nominal, ordinal, interval, and ratio scaled variables.
1. Nominal Scale. Nominal variables (also called categorical variables) can be placed into categories. They don't have a numeric value and so cannot be added, subtracted, multiplied, or divided. They also have no order; if they appear to have an order, you probably have ordinal variables instead.
2. Ordinal Scale. The ordinal scale contains things that you can place in order: hottest to coldest, lightest to heaviest, richest to poorest. Basically, if you can rank data by 1st, 2nd, 3rd place (and so on), then you have data that's on an ordinal scale.
3. Interval Scale. An interval scale has ordered numbers with meaningful divisions. Temperature is on the interval scale: a difference of 10 degrees between 90 and 100 means the same as a difference of 10 degrees between 150 and 160. Compare that to high school ranking (which is ordinal), where the difference between 1st and 2nd might be .01 and the difference between 10th and 11th might be .5. If you have meaningful divisions, you have something on the interval scale.
4. Ratio Scale. The ratio scale is exactly the same as the interval scale with one major difference: zero is meaningful. For example, a height of zero is meaningful (it means you don't exist). Compare that to a temperature of zero, which, while it exists, doesn't mean anything in particular (although admittedly, on the Celsius scale it's the freezing point of water).
What it means when a test is referred to as "standardized"
A standardized test is a test given in a consistent manner: the questions on the test are the same, the time allotted is the same, and the way in which the test is scored is the same for all participants.
Ways in which couples assessment is different/unique from individual assessment
Commonalities with individual assessment:
-- Methods should be empirically linked to functionally related targeted problems and constructs
-- Methods have evidence of reliability, validity, and cost-effectiveness
-- Findings can be linked within a conceptual framework of presumed causes of distress and to prevention/intervention strategies
What is unique about couples assessment:
-- Focus is on relationship processes and interactions between individuals
-- Allows for direct observation of target complaints related to communication and other interpersonal exchanges
-- Must be sensitive to potential challenges in establishing a collaborative alliance in a conjoint context
Difference between norm-referenced and criterion-referenced testing*
Criterion-referenced test results are often based on the number of correct answers provided by students, and scores might be expressed as a percentage of the total possible number of correct answers. On a norm-referenced exam, however, the score would reflect how many more or fewer correct answers a student gave in comparison to other students. Hypothetically, if all the students who took a norm-referenced test performed poorly, the least-poor results would rank students in the highest percentile. Similarly, if all students performed extraordinarily well, the least-strong performance would rank students in the lowest percentile. It should be noted that norm-referenced tests cannot measure the learning achievement or progress of an entire group of students, but only the relative performance of individuals within a group. For this reason, criterion-referenced tests are used to measure whole-group performance.
Face validity
Face validity is simply whether the test appears (at face value) to measure what it claims to measure. This is the least sophisticated measure of validity. A test item such as 'I have recently thought of killing myself' has obvious face validity as an item measuring suicidal cognitions and may be useful when measuring symptoms of depression.
What are important factors to keep in mind when introducing an assessment measure to a client (how can your approach to this task make it more likely the client will agree to complete assessment measures and do so in an open, honest manner)?
Gender role socialization: some problems, such as depression, can manifest differently in men, disguising the disorder and leading to underdiagnosis or misdiagnosis. In addition, different screening or assessment settings (e.g., prisons, outpatient programs, primary care offices) influence whether and how clients present their struggles. Culture also plays a role; some nonmainstream cultures may be reluctant to share information about difficulties or illnesses. Counselors must be sensitive to these nuances and create an environment in which clients feel open to sharing their vulnerabilities or perceived shortcomings, especially in an assessment. Counselors also need to keep in mind how a client may perceive taking an assessment, especially if the client has test anxiety.
Goal Attainment Scaling (what this is; what kind of assessment this is)
Goal attainment scaling (GAS) is a therapeutic method in which the client and the counselor develop a written follow-up guide used for monitoring client progress. Three steps in developing and testing a GAS:
1. Goal selection and scaling
2. Random assignment of the patient to one of the treatment modalities
3. A follow-up of each patient with regard to the goals and scale values chosen at intake
A specific goal is selected, and a scale is composed that ranges from least to most favorable outcomes. At least two points on the scale should have sufficiently precise and objective descriptions that anyone could understand the client's status. The points are assigned numerical values (-2 for the least favorable outcome, 0 for the most likely treatment outcome, and +2 for the most favorable treatment outcome). Thus, the scale has a mean value of zero and a standard deviation of one. Each scale is specific to the individual, and the defined points are indirectly related to mental health goals. GAS can therefore be individualized, yet universal in its meaningfulness. Communication is enabled through the specificity and well-defined nature of the measure.
As an evaluation method, GAS has many uses. GAS can be used to compare treatments or simply to evaluate treatment effectiveness with one client. GAS is used to scale treatment goals, and then their level of attainment is measured. It is a valid individualized treatment outcome and program evaluation measure. Further, GAS is an easy, low-cost evaluation technique. As many treatments incorporate several goals, GAS can be used to track multiple goals; the goals can be prioritized and differentially weighted to reflect treatment objectives. This goal-oriented measurement tool creates specific operational indicators of progress and can focus case planning and treatment, which often results in better outcomes. GAS produces specific goal attainment indicators, making effectiveness readily apparent. It also promotes positive perceptions of progress toward a goal, which further aids in goal attainment. GAS combines behavioral definitions, mutually defined goals, clear expectations, and continuous evaluation to improve client outcomes and effectively measure change.
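The notes above don't give a summary formula, but the GAS literature commonly cites the Kiresuk-Sherman T-score for combining several weighted goals into one standardized number. A minimal sketch, assuming hypothetical goal values and weights and the conventional inter-goal correlation of .3:

```python
import math

def gas_t_score(attainments, weights, rho=0.3):
    """Kiresuk-Sherman summary T-score across weighted GAS goals."""
    # Weighted sum of attainment levels (each rated on the -2..+2 scale)
    weighted_sum = sum(w * x for w, x in zip(weights, attainments))
    # Denominator assumes a common inter-goal correlation rho (.3 by convention)
    denom = math.sqrt((1 - rho) * sum(w ** 2 for w in weights)
                      + rho * sum(weights) ** 2)
    return 50 + 10 * weighted_sum / denom

# Hypothetical client: three goals weighted 3, 2, 1, rated at follow-up
print(round(gas_t_score([+1, 0, -1], [3, 2, 1]), 1))  # ~54.4
```

A T-score near 50 means the client, on balance, reached the "most likely" outcome levels; scores above 50 indicate better-than-expected attainment.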
Know what a "derived score" refers to.
In psychometrics, a derived score is a score obtained by applying a mathematical transformation to a raw score. Common forms of derived scores are IQ scores, stanine scores, sten scores, T scores, and z scores.
MSI-R: Be familiar with general interpretation principles such as higher scores mean greater problems (except for the Role Orientation scale). What are some advantages of this measure (why is it good to use)?
Includes one global scale and several other scales for assessing areas of specific concern. Multidimensional nature of scale allows for identification of couple's strengths as well as areas for growth. Easy for clients to use. Multidimensional self-report of marital interaction.
Inter-rater reliability
Inter-rater reliability: Many behavioral measures involve significant judgment on the part of an observer or a rater. Inter-rater reliability is the extent to which different observers are consistent in their judgments. For example, if you were interested in measuring university students' social skills, you could make video recordings of them as they interacted with another student whom they are meeting for the first time. Then you could have two or more observers watch the videos and rate each student's level of social skills. To the extent that each participant does in fact have some level of social skills that can be detected by an attentive observer, different observers' ratings should be highly correlated with each other. Inter-rater reliability would also have been measured in Bandura's Bobo doll study. In this case, the observers' ratings of how many acts of aggression a particular child committed while playing with the Bobo doll should have been highly positively correlated. Inter-rater reliability is often assessed using Cronbach's α when the judgments are quantitative or an analogous statistic called Cohen's κ (the Greek letter kappa) when they are categorical.
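As a concrete illustration of the categorical case, here is a minimal sketch of Cohen's κ for two raters; the observer codes below are hypothetical:

```python
def cohens_kappa(rater_a, rater_b, categories):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    # Observed agreement: proportion of cases both raters coded identically
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal category proportions
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Two observers coding the same 10 behaviors as aggressive (A) or not (N)
obs1 = list("AANNANAANA")
obs2 = list("AANNNNAAAA")
print(round(cohens_kappa(obs1, obs2, ["A", "N"]), 2))  # 0.58
```

Here the raters agree on 80% of cases, but κ corrects that downward because some agreement would be expected by chance alone.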
Internal consistency (split-half; coefficient alpha)
Internal consistency, which is the consistency of people's responses across the items on a multiple-item measure. In general, all the items on such measures are supposed to reflect the same underlying construct, so people's scores on those items should be correlated with each other. On the Rosenberg Self-Esteem Scale, people who agree that they are a person of worth should tend to agree that they have a number of good qualities. If people's responses to the different items are not correlated with each other, then it would no longer make sense to claim that they are all measuring the same underlying construct. This is as true for behavioral and physiological measures as for self-report measures. For example, people might make a series of bets in a simulated game of roulette as a measure of their level of risk seeking. This measure would be internally consistent to the extent that individual participants' bets were consistently high or low across trials. Like test-retest reliability, internal consistency can only be assessed by collecting and analyzing data. One approach is to look at a split-half correlation. This involves splitting the items into two sets, such as the first and second halves of the items or the even- and odd-numbered items. Then a score is computed for each set of items, and the relationship between the two sets of scores is examined. Coefficient alpha (Cronbach's α) is conceptually the mean of all possible split-half correlations for a set of items and is the most commonly reported index of internal consistency.
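A minimal sketch of both indices on simulated item responses. The data are made up, and the Spearman-Brown step (which corrects the half-length correlation up to full test length) is a standard addition not mentioned in the passage:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated data: 50 respondents x 10 Likert items (1-5) that all share
# a common underlying trait, so responses should be internally consistent
trait = rng.normal(size=(50, 1))
items = np.clip(np.round(3 + trait + rng.normal(scale=0.8, size=(50, 10))), 1, 5)

# Split-half: correlate totals from odd-numbered vs. even-numbered items
odd = items[:, ::2].sum(axis=1)
even = items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]
# Spearman-Brown correction estimates reliability of the full-length test
r_full = 2 * r_half / (1 + r_half)

# Coefficient (Cronbach's) alpha: k/(k-1) * (1 - sum(item var) / total var)
k = items.shape[1]
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                         / items.sum(axis=1).var(ddof=1))
print(f"split-half r = {r_half:.2f}, corrected = {r_full:.2f}, alpha = {alpha:.2f}")
```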
Marital Conventionalization Scale: what does this measure; why is this important in the field of marital self-report inventories
It measures an apparent social desirability bias in marital quality measurement at the individual level. Marital conventionalization is defined as the extent to which a person distorts the appraisal of his or her marriage in the direction of social desirability. Since the measurement of the variable is direct, the major focus is upon establishing content validity for a short scale of marital conventionalization. Research indicates that marital conventionalization is both extensive and intensive, so it is necessary to control for its effect in any study of highly ego-involved areas, particularly the area of marital adjustment.
Constructed response vs. multiple choice formats for tests
MC formats are useful for testing cognitive knowledge, especially at higher levels. They are most efficient for use with large groups of examinees because the time spent preparing test items is less than the time required to read and score CR items after the test, and because MC items can be easily and rapidly computer scored. MC formats are also the most efficient way to test large knowledge domains broadly. A major advantage of CR formats is that they are easy to construct and clients can answer from their own perspective. However, brief CR items are unsuitable for measuring complex intelligence/skill outcomes, and extended CR items are often difficult to score. Completion and short-answer tests are sometimes called objective tests because there is no variability in scoring. A major disadvantage of extended-response items is the difficulty of scoring them reliably: different examiners can score the same response differently, and a variety of steps need to be taken to improve the reliability and validity of scoring.
Idiographic vs. nomothetic assessment
The nomothetic approach involves establishing laws or generalizations that apply to all people. Laws can be categorized into three kinds: (1) classifying people into groups (such as the DSM-IV classification of people with mood disorders); (2) establishing principles (such as the behaviorist laws of learning); and (3) establishing dimensions (such as Eysenck's personality inventory, which allows for comparisons between people). This approach typically uses scientific methods such as experiments and observations to obtain quantitative data. Group averages are statistically analyzed to create predictions about people in general.
Strengths: Regarded as scientific because of its precise measurement, prediction and control of behavior, investigation of large groups, and objective and controlled methods allowing replication and generalization. It has helped psychology as a whole become scientific by developing laws and theories that can be empirically tested.
Limitations: Predictions can be made about groups, but these may not apply to individuals. The approach has been accused of losing sight of the 'whole person.'
The idiographic approach tends to include qualitative data, investigating individuals in a personal and detailed way. Methods of research include case studies, unstructured interviews, self-reports, autobiographies, and personal documents.
Strengths: A major strength of the idiographic approach is its focus on the individual. Gordon Allport argued that it is only by knowing the person as a person that we can predict what that person will do in any given situation. Findings can serve as a source of ideas or hypotheses for later study.
Limitations: The idiographic approach is very time consuming; it takes a lot of time and money to study individuals in depth. By contrast, if a researcher is using the nomothetic approach, data can be collected relatively quickly once a questionnaire, psychometric test, or experiment has been designed.
Difference between norm-referenced* and criterion-referenced testing.
Norm-referenced refers to standardized tests that are designed to compare and rank test takers in relation to one another. Norm-referenced tests report whether test takers performed better or worse than a hypothetical average student, which is determined by comparing scores against the performance results of a statistically selected group of test takers, typically of the same age or grade level, who have already taken the exam. Calculating norm-referenced scores is called the "norming process," and the comparison group is known as the "norming group." Norming groups typically comprise only a small subset of previous test takers, not all or even most previous test takers. Test developers use a variety of statistical methods to select norming groups, interpret raw scores, and determine performance levels. Norm-referenced scores are generally reported as a percentage or percentile ranking. For example, a student who scores in the seventieth percentile performed as well or better than seventy percent of other test takers of the same age or grade level, and thirty percent of students performed better (as determined by norming-group scores). Norm-referenced tests often use a multiple-choice format, though some include open-ended, short-answer questions. They are usually based on some form of national standards, not locally determined standards or curricula. IQ tests are among the most well-known norm-referenced tests, as are developmental-screening tests, which are used to identify learning disabilities in young children or determine eligibility for special-education services. A few major norm-referenced tests include the California Achievement Test, Iowa Test of Basic Skills, Stanford Achievement Test, and TerraNova.
Predictive validity
Predictive validity is the degree to which a test accurately predicts a criterion that will occur in the future. For example, a prediction may be made on the basis of a new intelligence test that high scorers at age 12 will be more likely to obtain university degrees several years later. If the prediction is borne out, then the test has predictive validity.
Multi-trait, multi-method matrix (know what this is and what it gives you information about; be able in a sample matrix to identify monotrait-heteromethod, heterotrait-monomethod, etc correlation coefficients; know differences between what these types of correlation coefficients mean).
SEE NOTES. The multitrait-multimethod (MTMM) matrix is an approach to examining construct validity. It organizes convergent and discriminant validity evidence for comparison of how a measure relates to other measures. Multiple traits are used in this approach to examine (a) similar or (b) dissimilar traits (constructs), so as to establish convergent and discriminant validity between traits. Similarly, multiple methods are used in this approach to examine the differential effects (or lack thereof) caused by method-specific variance. There are six major considerations when examining a construct's validity through the MTMM matrix:
1. Evaluation of convergent validity: Tests designed to measure the same construct should correlate highly with one another.
2. Evaluation of discriminant (divergent) validity: The construct being measured by a test should not correlate highly with different constructs.
3. Trait-method unit: Each task or test used in measuring a construct is considered a trait-method unit, in that the variance contained in the measure is part trait and part method. Generally, researchers desire low method-specific variance and high trait variance.
4. Multitrait-multimethod: More than one trait and more than one method must be used to establish (a) discriminant validity and (b) the relative contributions of trait or method-specific variance. This tenet is consistent with the ideas proposed in Platt's concept of strong inference (1964).
5. Truly different methodology: When using multiple methods, one must consider how different the actual measures are. For instance, administering two self-report measures does not constitute truly different methods, whereas using an interview scale or a psychosomatic reading would.
6. Trait characteristics: Traits should be different enough to be distinct, but similar enough to be worth examining in the MTMM.
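A toy example may help with reading a sample matrix. All correlations below are hypothetical, chosen only to show where each type of coefficient sits:

```python
import numpy as np

# Hypothetical MTMM matrix: two traits (T1, T2), each measured by two
# methods (M1 = self-report, M2 = observer rating).
# Row/column order: T1M1, T2M1, T1M2, T2M2.
# The 1.00 diagonal (monotrait-monomethod) would normally hold
# reliability coefficients.
r = np.array([
    [1.00, 0.30, 0.65, 0.15],   # T1M1
    [0.30, 1.00, 0.20, 0.60],   # T2M1
    [0.65, 0.20, 1.00, 0.25],   # T1M2
    [0.15, 0.60, 0.25, 1.00],   # T2M2
])

# Monotrait-heteromethod ("validity diagonal"): same trait, different
# method. Should be HIGH -- evidence of convergent validity.
print("T1 across methods:", r[0, 2])   # 0.65
print("T2 across methods:", r[1, 3])   # 0.60

# Heterotrait-monomethod: different traits, same method. Should be
# LOWER; if high, shared method variance is inflating the correlation.
print("T1 vs T2 within M1:", r[0, 1])  # 0.30
print("T1 vs T2 within M2:", r[2, 3])  # 0.25

# Heterotrait-heteromethod: different trait AND different method.
# Should be the LOWEST values (evidence of discriminant validity).
print("T1M1 vs T2M2:", r[0, 3])        # 0.15
```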
Test-retest reliability
Test-retest: When researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time. Test-retest reliability is the extent to which this is actually the case. For example, intelligence is generally thought to be consistent across time. A person who is highly intelligent today will be highly intelligent next week. This means that any good measure of intelligence should produce roughly the same scores for this individual next week as it does today. Clearly, a measure that produces highly inconsistent scores over time cannot be a very good measure of a construct that is supposed to be consistent. Assessing test-retest reliability requires using the measure on a group of people at one time, using it again on the same group of people at a later time, and then looking at the test-retest correlation between the two sets of scores. This is typically done by graphing the data in a scatterplot and computing Pearson's r.
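A minimal sketch of that computation, with hypothetical scores for the same people at two time points:

```python
from scipy.stats import pearsonr

# Hypothetical intelligence-test scores for the same 8 people, two weeks apart
time1 = [98, 112, 105, 121, 93, 130, 104, 110]
time2 = [101, 110, 107, 118, 95, 128, 102, 113]

r, p = pearsonr(time1, time2)
print(f"test-retest r = {r:.2f}")  # r near 1 -> scores consistent over time
```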
MSI-R:
Widely used to assess the nature and extent of conflict within a marriage or relationship, the MSI-R helps couples communicate hard-to-express feelings, providing an easy, economical way to gather information about a broad range of issues. Because the items refer to "partner" and "relationship" rather than "spouse" and "marriage," the test is useful with both traditional and nontraditional couples.
Ways to select items for a test: rational approach (1/3)
a) Rational scales: Theory guides item selection; Usefulness depends upon soundness of theory. This approach is a reasonable place to start but inadequate by itself. It encompasses three important strengths—the simplicity of the approach, the transparency of measures, and the intuitive appeal of test results—that also entail serious limitations. For example, there are no mechanisms to identify poorly functioning items, to evaluate whether scales or subscales include homogeneous items, to iteratively improve the psychometric properties of score reliability and validity, or to prevent the proliferation of measures of constructs that are differently named but that actually represent a common, core construct.
Ways to select items for a test: empirical keying approach (2/3)
b) Empirical or criterion-keyed approach: Item selection based on ability to statistically differentiate between groups; Item's meaning may not be transparent to test taker; Test taker may not understand why groups differ on certain items. Examining relationships among item responses is a factor-analytic approach to test construction. Factor analysis is a statistical method for exploring and confirming the dimensionality of a set of item responses. Items that are strongly and positively correlated with one another can be estimated to share a common cause, referred to as an unobserved factor. Another empirical approach to test construction resembles norm referencing, in that items are chosen as being representative of the construct when examinees with known characteristics respond to them in consistent ways. This is referred to as the criterion-group approach to test construction. A criterion group of examinees is purposively selected because they are known to represent the construct in some way. The main drawbacks to both of these empirical approaches to test construction are: (a) they require a large and representative sample of the test population, and can lead to biased results when the examinee sample is not representative; and (b) they may lead to test content that lacks face validity and that contradicts both logic and theory.
Measurement bias, accessibility, universal design
o A test may be biased if:
-- Content or construction of items gives an unfair advantage to one group over another
-- Items show differential item functioning
-- The test doesn't allow equal access to the construct being measured
-- Formatting, mode of test administration, or examiner personality factors favor one group over another
-- The test is inappropriately applied
o Related equivalence questions:
-- Linguistic: If a test has been translated, has this been done accurately to ensure test items have the same meaning in the translated and original tests?
-- Conceptual: Do the constructs assessed by the test have the same meaning across cultures?
-- Metric: Does the test have similar psychometric properties across different groups/cultures?
o Accessibility refers to the extent to which all individuals are given an equal chance to demonstrate their capabilities or level of functioning during the assessment process. Ideally, we want tests that are highly accessible.
o A highly accessible test is probably following principles of universal design: the test developer puts themselves in the shoes of the test taker and thinks through what barriers in the way a test is constructed could make it challenging for some individuals to demonstrate their full capabilities or knowledge on that test.
Recommendations for an approach to selecting a battery of tests for a couple from Snyder, Heyman, and Haynes
o Assessment should be parsimonious
o Integrate findings from multiple assessment methods
o Consider psychometric properties and limitations of measures chosen
o Assessment should be ongoing (Snyder, Heyman, & Haynes, 2009)
o Screen individuals for couple distress; screen couples for individual distress
o Progress from broad to narrow focus in assessment
o Include areas with well-established connections to overall relationship difficulties (e.g., communication processes)
Challenges in test translation
o Challenges of test translation:
-- The "psychotechnical" nature of test directions
-- Underlying psychological constructs may not be universal across cultures
-- Back translation procedures not always followed
-- Examinee test-taking behaviors and orientations to test directions/procedures can vary across cultures
Different types of validity (Content: face, construct. Criterion: concurrent, predictive)
o Content-related: appropriate content
-- Face validity: does the test appear to test what it aims to test?
-- Construct validity: does the test relate to underlying theoretical concepts?
o Criterion-related: relationship to other measures
-- Concurrent validity: does the test relate to an existing similar measure?
-- Predictive validity: does the test predict later performance on a related criterion?
Considerations in choosing a test: be able to list and explain at least 2-3 of these
o Factors pertaining to the test:
• Is the test reliable?
• Is there evidence that the test is valid for assessing the trait or construct it purports to measure?
• Is the client likely to cooperate with the test procedures (e.g., does the test have face validity)?
• Is the test nonreactive?
o Factors pertaining to the client:
• Does the client possess the required reading level for the test?
• Does the client have any physical or other limitations that might make completion of the test difficult (e.g., visual impairment, easily fatigued)?
• Is the client likely to misreport symptoms or experiences (i.e., under- or overreport)? Has sufficient rapport been established with the client?
• How closely does the client match the characteristics (e.g., ethnicity, age, education level) of individuals in the normative sample?
o Factors pertaining to the test administrator:
• Are you familiar with the administration, scoring, and interpretation of the test?
• Is specialized training required?
• Does the test require the administrator to adhere to/understand a particular theoretical basis?
• Is the test feasible to give (time, efficiency considerations)?
Defining features of psychological tests
o Five main characteristics of a good psychological test:
1. Objectivity: The test should be free from subjective judgment regarding the ability, skill, knowledge, trait, or potentiality to be measured and evaluated.
2. Reliability: This refers to the extent to which the obtained results are consistent. When the test is administered to the same sample more than once with a reasonable gap of time, a reliable test will yield the same scores; the test is trustworthy. There are many methods of testing the reliability of a test.
3. Validity: This refers to the extent to which the test measures what it intends to measure. For example, when an intelligence test is developed to assess the level of intelligence, it should assess the intelligence of the person, not other factors. Validity tells us whether the test fulfills the objective of its development. There are many methods to assess the validity of a test.
4. Norms: Norms refer to the average performance of a representative sample on a given test. They give a picture of the average standard of a particular sample in a particular aspect. Norms are the standard scores developed by the person who develops the test; future users of the test can compare their scores with the norms to know the level of their sample.
5. Practicability: The test must be practicable in terms of the time required for completion, the length, the number of items or questions, and scoring. The test should not be too lengthy, and it should not be difficult to answer or score.
Know the general guidelines for what kinds of assessment LMFTs and LPCCs can do
o In California, MFTs can administer psychological tests so long as:
-- They have received adequate training in the instruments they plan to use and are competent in their use.
-- The tests are used for the purpose of assessing and treating their own clients; MFTs cannot hire out their services to test people who are not their clients.
o The broader practice of psychological testing (e.g., comprehensive personality or neurocognitive assessment) is done by psychologists.
o Basically, both LMFTs and LPCCs "may administer and interpret such tests as long as they have received the appropriate training, and thus, are qualified to perform such procedures."
Ethical considerations in testing (know what some of these are and be able to briefly describe/define them)
o Know what's required in the setting in which you work
o Only evaluate in a professional context
o Understand the impact of rapport, expectation, and emotional state on test results
o Obtain informed consent
o Protect the security of test materials
o Know your test instruments
o Know when to refer
o Be careful in your write-up
o Provide feedback
Special challenges in interviewing couples and families
o More people—more observations, but less time
o How do you define a couple/family (and who do you interview)?
o Consider group and individual interviews
o Maintaining balance; triangulation
o Potential for more overt conflict
o Countertransference issues
o Dealing with "secrets" and requests for individual time
Test equivalence (linguistic, conceptual, metric)
o Linguistic equivalence is achieved if the target language (in a specific linguistic medium) carries the same intended meaning or message that the source language carries.
o Conceptual equivalence of a questionnaire indicates that an item measures the same concept in all languages into which the questionnaire has been translated.
o Metric equivalence exists "when the psychometric properties of two (or more) cultural groups exhibit essentially the same coherence or structure."
Dyadic Adjustment Scale [DAS] (know general issues: structure of measure, strengths, relation to LWMAT)
o The DAS is a 32-item measure developed to measure dyadic adjustment, defined as "... a process, the outcome of which is determined by the degree of: (1) troublesome dyadic differences; (2) interpersonal tensions and personal anxiety; (3) dyadic satisfaction; (4) dyadic cohesion; and (5) consensus on matters of importance to dyadic functioning" (Spanier, 1976)
o The 32 items are scored on a 6-point Likert scale and are summed to create a total score ranging from 0 to 151, with higher scores indicating more positive dyadic adjustment. There are four subscales: Dyadic Consensus (13 items; the degree to which the couple agrees on matters of importance to the relationship), Dyadic Satisfaction (10 items; the degree to which the couple is satisfied with their relationship), Dyadic Cohesion (5 items; the degree of closeness and shared activities experienced by the couple), and Affective Expression (4 items; the degree of demonstrations of affection and sexual relationships).
o Takes 5-10 minutes to complete; can be adapted for interview format
o Can be easily scored via computer or QuickScore Forms
o Higher scores indicate better adjustment; raw scores on subscales and total scores are converted to T-scores
o Scores can be compared to married or divorced normative samples
o No scales to assess response set
o Widely used in research studies
o Good evidence of reliability
o Validity: able to differentiate between distressed/nondistressed couples and divorced vs. married couples
Locke-Wallace Marital Adjustment Test [LWMAT] (know general issues: e.g., one of first marital self-report inventories, easy to administer, some items are outdated)
o The Locke & Wallace Marital Adjustment Test (MAT) measures marital satisfaction, which is realized when "the mates feel satisfied with the marriage and each other, develop common interests and activities and feel that marriage is fulfilling their expectations" (Locke, 1951, p. 45). The scale focuses on issues such as involvement in joint activities, demonstration of affection, frequency of marital complaints, level of loneliness and well-being, and partner agreement on significant issues. A score of 100 is the dividing point between distressed and non-distressed individuals; the average score for distressed couples is 72, and the average score for non-distressed individuals is 136.
o Designed to assess levels of satisfaction and accommodation of marital partners
o 15-item scale; takes 10 minutes to complete
o Ninth-grade reading level
o Cut score of 100 typically used: >= 100 indicates satisfaction; <= 85 indicates significant distress
Advantages and disadvantages of unstructured interviewing (2/2)
o Unstructured: These are sometimes referred to as 'discovery interviews' and are more like a 'guided conversation' than a strict structured interview. They are sometimes called informal interviews. An interview schedule might not be used, and even if one is used, it will contain open-ended questions that can be asked in any order. Some questions might be added or skipped as the interview progresses.
Strengths:
1. Unstructured interviews are more flexible, as questions can be adapted and changed depending on the respondents' answers. The interview can deviate from the interview schedule.
2. Unstructured interviews generate qualitative data through the use of open questions. This allows the respondent to talk in some depth, choosing their own words, and helps the researcher develop a real sense of a person's understanding of a situation.
3. They also have increased validity, because the interviewer has the opportunity to probe for a deeper understanding and ask for clarification, and the interviewee can steer the direction of the interview.
Limitations:
1. It can be time consuming to conduct an unstructured interview and analyze the qualitative data (using methods such as thematic analysis).
2. Employing and training interviewers is expensive, and not as cheap as collecting data via questionnaires. Certain skills may be needed by the interviewer, including the ability to establish rapport and knowing when to probe.
Construct validity
Construct validity refers to the extent to which a test captures a specific theoretical construct or trait, and it overlaps with some of the other aspects of validity.
Back translation (know what this is)
Refers to the process of translating an already-translated statement or passage back into its original language. A person would do back translation to check whether the translation preserved the meaning of the original message, i.e., to know what the message in the original language said prior to translation.
Definition of reliability
Reliability refers to the consistency of a measure. Psychologists consider three types of consistency: over time (test-retest reliability), across items (internal consistency), and across different researchers (inter-rater reliability).
Advantages and disadvantages of self-report vs. observational method of assessment (insider vs. outside perspectives)
Self-report:
Advantages: Convenient; usually inexpensive; can compare results to a normative sample; can capture the client's attributions/cognitions; client may be more willing to share information compared to face-to-face interaction
Disadvantages: Perceptions may be inaccurate/biased/distorted; clinician must reconcile inconsistencies in self-reports; provides limited information on fine-grained details of moment-to-moment interactions among a couple or family members
Observational:
Advantages: Provides information about actual interchanges among a couple or family members; can help provide empirical evidence to support theories of family interaction patterns
Disadvantages: Generally requires use of recording equipment (cameras, audio recording devices); coding systems to evaluate interactions can be complex, time-consuming, and difficult to learn; questions of ecological validity of interaction tasks (i.e., do family members' behaviors generalize outside of the task?)
Know how to calculate a z-score if provided with normative data; be able to translate this score into a %ile rank using the z-score to %ile conversion table. (Don't forget to bring a calculator to the exam)
Tells you how far and in what direction the individual's score is from the mean, in SD units:
z = (X - M) / SD
X = client score
M = normative sample mean
SD = normative sample standard deviation
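The z-to-percentile conversion table is a tabulation of the normal cumulative distribution, so the same lookup can be sketched in code (the client score and normative values here are hypothetical):

```python
from scipy.stats import norm

def z_and_percentile(x, m, sd):
    """z = (X - M) / SD, then percentile rank from the normal CDF."""
    z = (x - m) / sd
    return z, 100 * norm.cdf(z)

# Hypothetical: client scores 65 on a scale whose normative sample
# has M = 50 and SD = 10
z, pct = z_and_percentile(65, 50, 10)
print(f"z = {z:.2f}, percentile rank ~ {pct:.0f}")  # z = 1.50, ~93rd %ile
```

On the exam the middle step is the same; the table simply replaces norm.cdf.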
Areas typically covered in an assessment interview
o Demographics
o Presenting problem(s)
o Past psychiatric history (including family history)
o Medical history (including family history)
o Educational and work history (including military service)
o Social history (family relationships, friendships)
Be able to list 3 reasons to conduct assessments
o Educational placement/vocational evaluation
o Identify client strengths
o Identify differences in perceptions, expectations, etc. among members of families/couples
o Understand systemic contributions to individual problems (e.g., substance use, depression, etc.)
o Self-knowledge
o Research
Special challenges in assessment of families and couples (be able to list at least one issue)
o Family members getting sidetracked in session
o Family members reluctant to talk about issues in front of other members
Elements of a mental status examination
o General appearance and behavior
o Psychomotor activity
o Attitude toward examiner
o Orientation (person, place, time)/consciousness
o Emotions (mood and affect)
o Perceptual disturbances
o Speech and thought processes
o Cognition (e.g., memory, intelligence)
o Insight and judgment
MSI-R: Know the general areas assessed by this measure (so that you can recognize in a question what domains are and are not assessed by the instrument)
o Validity scales:
-- Inconsistency (INC)
-- Conventionalization (CNV)
o Global distress scale
o Communication and conflict scales:
-- Problem-solving communication (PSC)
-- Affective communication (AFC)
-- Aggression (AGG)
-- Time together (TTO)
-- Disagreement about finances (FIN)
-- Sexual dissatisfaction (SEX)
-- Role orientation (ROR)
-- Family history of distress (FAM)
o Child-related scales:
-- Dissatisfaction with children (DSC)
-- Conflict over child rearing (CCR)
What does it mean to make an assessment parsimonious?
Parsimony is a guiding principle suggesting that, all things being equal, you should prefer the simplest possible explanation for a phenomenon or the simplest possible solution to a problem. In assessment, this means using the smallest, least burdensome set of measures that will adequately answer the clinical question, rather than administering every potentially relevant instrument.
Advantages and disadvantages of structured interviewing (1/2)
o Structured: Consists of a specific set of questions used to determine whether a person meets the criteria for a particular condition. The questions are asked in a set/standardized order, and the interviewer will not deviate from the interview schedule or probe beyond the answers received (so they are not flexible).
Strengths:
1. Structured interviews are easy to replicate, as a fixed set of closed questions is used, which are easy to quantify; this makes it easy to test for reliability.
2. Structured interviews are fairly quick to conduct, which means that many interviews can take place within a short amount of time. A large sample can be obtained, resulting in findings that are representative and can be generalized to a large population.
Limitations:
1. Structured interviews are not flexible. New questions cannot be asked impromptu (i.e., during the interview), as an interview schedule must be followed.
2. The answers from structured interviews lack detail, as only closed questions are asked, generating quantitative data. This means a researcher won't know why a person behaves in a certain way.
Know how validity scales on a test are different from the concept of test validity
A validity scale, in psychological testing, is a scale used in an attempt to measure the reliability of responses, for example with the goal of detecting defensiveness, malingering, or careless or random responding. This is different from test validity, which concerns whether the test as a whole measures the construct it is intended to measure; a validity scale instead flags whether a particular respondent's answers can be trusted. More generally, scales are a manifestation of latent constructs; they measure behaviors, attitudes, and hypothetical scenarios we expect to exist as a result of our theoretical understanding of the world but cannot assess directly. Scales are typically used to capture a behavior, a feeling, or an action that cannot be captured in a single variable or item. The use of multiple items to measure an underlying latent construct can additionally account for, and isolate, item-specific measurement error, which leads to more accurate research findings.
Concurrent validity
Concurrent validity is the degree to which a test corresponds to an external criterion that is known concurrently (i.e. occurring at the same time). If the new test is validated by a comparison with a currently existing criterion, we have concurrent validity. Very often, a new IQ or personality test might be compared with an older but similar test known to have good validity already.
MSI-R: Is there evidence that this measure has cross-cultural applications (e.g., with different ethnic groups, with same sex couples)?
Cross-cultural adaptations of the MSI-R can be helpful for building upon the research base of the original measure. This approach assumes the constructs assessed by the MSI-R have similar relevance/applicability across cultural groups. There is also the challenge of ensuring linguistic equivalence of adapted measures. The MSI-R has been translated into several languages, including Spanish, German, Korean, and Russian. It has also been studied with diverse couple types, including same-sex couples, interracial couples, and religious couples. In some cases, additional scales may be needed to assess areas of potential conflict important for particular cultures. Example: the conflict-with-in-laws scale on the Korean MSI-R.
Internal validity
Internal validity refers to whether the effects observed in a study are due to the manipulation of the independent variable and not some other factor. In other words, there is a causal relationship between the independent and dependent variable. External validity refers to the extent to which the results of a study can be generalized to other settings (ecological validity), other people (population validity), and over time (historical validity).
What are normative data? What are characteristics that a normative sample should typically have?
Normative data are data from a reference population that establish a baseline distribution for a score or measurement, against which an individual's score or measurement can be compared. Normative data are typically obtained from a large, randomly selected, representative sample of the wider population. They can be used to transform individual scores or measurements directly into standardized z-scores, T-scores, or quantiles. Examples of psychological tests that make use of normative data in scoring include the Wechsler Adult Intelligence Scale (WAIS), the Wechsler Intelligence Scale for Children (WISC), and the Vineland Adaptive Behavior Scales. Norms can also incorporate additional variables such as age and gender when these variables are expected to have significant effects on the distribution of measurements (e.g., head-circumference-for-age, height-for-age, and weight-for-age norms).
Alternate or parallel forms (reliability)
Parallel/alternative forms are different versions of a test which are designed to be equivalent. Parallel-forms reliability measures the correlation between two such tests. If you want to use multiple different versions of a test (for example, to avoid respondents repeating the same answers from memory), you first need to make sure that all the sets of questions or measurements give reliable results. In educational assessment, it is often necessary to create different versions of tests to ensure that students don't have access to the questions in advance. Parallel-forms reliability means that, if the same students take two different versions of a reading comprehension test, they should get similar results in both tests. The most common way to measure parallel-forms reliability is to produce a large set of questions that evaluate the same thing, then divide these randomly into two question sets. The same group of respondents answers both sets, and you calculate the correlation between the results. High correlation between the two indicates high parallel-forms reliability.
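A minimal sketch of the random-split procedure just described, using simulated examinees and items:

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated item pool: 40 examinees answer 20 items (1 = correct) that
# all tap the same construct via a shared "ability" signal
ability = rng.normal(size=(40, 1))
pool = (ability + rng.normal(size=(40, 20)) > 0).astype(int)

# Randomly divide the pool into two 10-item forms, as described above
order = rng.permutation(20)
form_a = pool[:, order[:10]].sum(axis=1)
form_b = pool[:, order[10:]].sum(axis=1)

# Parallel-forms reliability: correlation between scores on the two forms
print(f"parallel-forms r = {np.corrcoef(form_a, form_b)[0, 1]:.2f}")
```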
How is assessment different from testing?
Test and assessment are often used interchangeably, but they mean something different. A test is a "product" that measures a particular behavior or set of objectives. Assessment, meanwhile, is a procedure rather than a product: it is used during and after instruction has taken place.
Definition of validity
Validity refers to a test's ability to measure what it is supposed to measure.
Ways to select items for a test: analytical approach (3/3)
c) Analytic approach: Begin with theory to guide initial item selection; Administer test to a large sample; Conduct a factor analysis to determine which items cluster together; Determine if items loading on factors make sense according to theory
Areas to inquire into to help ensure you are conducting a culturally competent assessment
o Psychological assessment is made culturally sensitive through a continuing and open-ended series of substantive and methodological insertions and adaptations, designed to mesh the process of assessment and evaluation with the cultural characteristics of the group being studied. Questions to consider:
-- To what degree does the test attend to issues of diversity?
-- How do attitudes/beliefs/values affect test session behaviors?
-- How do your values, theoretical assumptions, cultural biases, and countertransference issues affect the assessment process?
-- Have you accessed resources that will help you appropriately select, administer, and evaluate test results?
-- How do social issues (acculturation, SES stressors, discrimination, socialization experiences, intrafamilial conflicts regarding cultural identity, etc.) impact understanding of assessment findings?
-- Have you considered etic and emic perspectives?
o Areas to inquire into:
-- Language proficiency
-- Cultural/ethnic identity and acculturation
-- Cultural explanations/idioms used to explain distress/symptoms
-- Sources of stress/support related to culture
-- Culturally relevant resources/adjunctive treatments
-- How do cultural/ethnic differences impact the client-clinician relationship?