PSY 451- Midterm
Time Sampling
1. Error associated with administering a test at two or more different times Ex: Intelligence test
Settings of the Testing Process- Educational
Achievement test, diagnosis, diagnostic test
Settings of the Testing Process- Geriatric
May look at quality of life, dementia, pseudodementia (severe depression)
Range
Q3-Q1 (note: Q3-Q1 is technically the interquartile range; the simple range is the highest score minus the lowest)
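As a quick numeric check, a short Python sketch (the scores are hypothetical; the median-of-halves quartile convention is one of several, so statistics packages may differ slightly):

```python
# Computing the simple range and the interquartile range (Q3 - Q1)
# for a small score distribution.

def quartiles(scores):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    s = sorted(scores)
    n = len(s)

    def median(vals):
        m = len(vals)
        mid = m // 2
        return vals[mid] if m % 2 else (vals[mid - 1] + vals[mid]) / 2

    lower = s[: n // 2]        # scores below the median position
    upper = s[(n + 1) // 2 :]  # scores above the median position
    return median(lower), median(s), median(upper)

scores = [55, 60, 62, 65, 70, 72, 75, 80, 95]
q1, med, q3 = quartiles(scores)
print("range =", max(scores) - min(scores))   # 95 - 55 = 40
print("IQR   =", q3 - q1)
```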
Psychometric Soundness
Technical quality
Ethical Guidelines
A body of principles of right, proper, or good conduct Ex: APA Ethics Code
Computer Test Pro
o Test administrators have greater access to potential assessees due to the Internet o Scoring and interpretation of the data tend to be quicker than with paper-and-pencil tests o Costs are typically lower than paper and pencil o Can reach more populations (isolated or those with disabilities)
High Construct validity
.3-.4
Code of Fair Testing Principles in Education
1. Developing/selecting tests 2. Interpreting scores 3. Striving for fairness 4. Informing test takers
How Can Ethical Issues Be Resolved
1. Describe the problem situation 2. Define the potential legal and ethical issues involved. Review guidelines, consult others as needed. 3. Evaluate the rights, responsibilities and welfare of all affected parties 4. Consider alternative actions and the consequences of each action 5. Make the decision and accept responsibility for it. Monitor outcomes.
Internal Consistency
1. Error associated with different sets of items within one test a. Ex: How well do items measure the concept being measured
Three Types of Rational Theoretical Approach
1. Intuitively developed by the test author 2. Content validation method using the judgement of experts in developing and selecting items Ex: including others who identify with other groups for validity 3. Theory-based, according to a recognized theory of personality or social-emotional functioning
Some Assumptions About Psychological Testing and Assessment
1. Psychological traits and states exist (there are internal things we don't typically observe but that do exist) 2. Psychological traits and states can be quantified and measured (ex: how we define engagement) 3. Test-related behavior predicts non-test-related behavior (relates to validity- Ex: wanting to understand how a student's engagement can tell us about what is happening in their life~ potential for achievement) 4. Tests and other measurement techniques have strengths and weaknesses (often, we need to remember legal and ethical guidelines) 5. Various sources of error are part of the assessment process (reliability~ are we measuring what we want to measure) 6. Testing and assessment can be conducted in a fair, unbiased manner (equity lens) 7. Testing and assessment benefit society
Process of Assessment
1. Referral for assessment from a source (teacher, school psychologist, counselor, judge, clinician, corporate human resources specialist) 2. Typically one or more referral questions are posed Ex: Can this child function in a general education environment? Is this defendant competent to stand trial? 3. The assessor may meet with the assessee or others before the final assessment to clarify reasons for referral 4. Assessment 5. The assessor writes a report about findings, referring to the referral questions 6. More feedback sessions with the assessee and/or third parties may be scheduled
Process of Developing a Test
1. Test conceptualization 2. Test construction 3. Test tryout 4. Item analysis 5.Test revision
Testing People With Disabilities- challenges
1. Transforming the test into a form that can be taken by the test taker 2. Transforming the responses of the test taker so they are scorable 3. Meaningfully interpreting test data
Test Conceptualization- Things to Consider
1. What is the test designed to measure? a. How is the construct defined? 2. Is there a need for the test? a. Self-report for students 3. Who will take this test? 4. How will the test be administered? 5. What is the ideal format of this test? 6. Is there any potential harm as the result of an administration of this test? a. Using anonymous names 7. How will meaning be attributed to scores on this test?
Three Steps of Split Half Reliability
1. Divide the test into two equivalent halves 2. Calculate Pearson's r between the two halves 3. Adjust the half-test reliability using the Spearman-Brown formula
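The three steps can be sketched in Python (the item data are hypothetical; rows are examinees, columns are items scored 0/1):

```python
from statistics import mean, pstdev

def split_half_reliability(item_matrix):
    """Split-half reliability with Spearman-Brown adjustment."""
    # Step 1: split into two equivalent halves (odd- vs. even-numbered items)
    odd = [sum(row[0::2]) for row in item_matrix]
    even = [sum(row[1::2]) for row in item_matrix]
    # Step 2: Pearson's r between the two half-test scores
    mo, me = mean(odd), mean(even)
    cov = mean([(o - mo) * (e - me) for o, e in zip(odd, even)])
    r_half = cov / (pstdev(odd) * pstdev(even))
    # Step 3: Spearman-Brown adjustment up to full-test length
    return (2 * r_half) / (1 + r_half)

data = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(split_half_reliability(data), 3))
```

The adjusted coefficient is always at least as large as the half-test correlation, since each half contains only half the items.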
Code of Professional Ethics
A body of guidelines that sets forth standards of conduct for members of a profession
Ethics
A body of principles of right, proper or good conduct; contrast with laws
Psychological Test
A device or procedure designed to measure variables related to psychology (such as intelligence, personality, aptitude, interest, attitudes or values)
Ecological Validity
A judgement regarding how well a test measures what it purports to measure at the time and place that the variable being measured is actually emitted
Test
A measuring device or procedure
Interview
A method of gathering info through direct communication involving a reciprocal exchange. Interviews differ in length, purpose and nature.
Cumulative Scoring
A method of scoring whereby points or scores accumulated on individual items or subtests are tallied; then, the higher the total sum, the higher the individual is presumed to be on the ability, trait or other characteristic being measured; contrast with class scoring and ipsative scoring
Assent
A participant is willing to do what we want them to do
Generalizability Theory
A person's test scores vary from testing to testing because of variables in the testing situation
Group Think
A result of the varied forces that drive decision-makers to reach a consensus
Quota System
A selection procedure whereby a fixed number or percentage of applicants with certain characteristics or from certain backgrounds are selected regardless of other factors such as documented ability
Kuder-Richardson Formula 20 (KR-20)
One of a series of equations designed to estimate the inter-item consistency of tests; KR-20 applies to dichotomously scored (e.g., right/wrong) items
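A minimal sketch of the KR-20 computation in Python, assuming dichotomous 0/1 item scores and using the population variance of total scores (the item matrix is hypothetical):

```python
def kr20(item_matrix):
    """KR-20 for dichotomous (0/1) items: rows = examinees, columns = items."""
    n = len(item_matrix)
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n   # population variance
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n       # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_t)

data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(kr20(data), 3))   # → 0.8
```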
Criterion Contamination
A state in which a criterion measure is itself based, in whole or in part, on a predictor measure- When contamination does occur, results cannot be taken seriously
Item Response Theory
A system of assumptions about measurement (including the assumption that the trait being measured is unidimensional) and the extent to which each test item measures that trait
Classical Test Theory (True Score Theory)
A system of assumptions about measurement that includes the notion that a test score is composed of a relatively stable component that actually is what the test or individual item is designed to measure, as well as a component that is error
Portfolio
A work sample; referred to as a portfolio assessment when used as a tool in an evaluative or diagnostic process. Has been used in instructor hiring, where the portfolio may contain documents such as lesson plans, published writings and visual aids.
APA Ethics Code- General Principles
A. beneficence and nonmaleficence B. fidelity and responsibility C. integrity D. justice E. respect for the peoples right and dignity
Item Branching
Ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items
Measurement
Act of assigning numbers or symbols to characteristics of things
Role Play
Acting an improvised or partially improvised part in a simulated situation
Test Revision
Action taken to modify a test's content or format for the purpose of improving the test's effectiveness as a tool of measurement
Accommodation
Adaptation of a test, procedure, situation or the substitution of one test for another, to make the assessment more suitable for an assesee with exceptional needs
Measurement Error
All factors associated with the process of measuring some variable, other than the variable being measured
Frequency Distribution
All scores listed alongside the number of times each score occurred
Spearman Brown Formula
Allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test Can be used to estimate the effect of shortening a test on its reliability Can be used to determine the number of items needed to obtain a desired level of reliability
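Both uses can be illustrated with the generalized Spearman-Brown prophecy formula, r_new = n·r / (1 + (n − 1)·r); a small Python sketch (the reliabilities plugged in are hypothetical):

```python
def sb_prophecy(r, n):
    """Predicted reliability when test length is changed by a factor of n."""
    return (n * r) / (1 + (n - 1) * r)

def sb_length_needed(r, r_desired):
    """Factor by which the test must be lengthened to reach r_desired."""
    return (r_desired * (1 - r)) / (r * (1 - r_desired))

# Doubling a test whose half-test correlation is .70:
print(round(sb_prophecy(0.70, 2), 3))          # → 0.824
# How much longer must a test with r = .60 be to reach .80?
print(round(sb_length_needed(0.60, 0.80), 3))  # → 2.667
```

Setting n = 2 recovers the familiar split-half adjustment, 2r / (1 + r).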
Source of Measurement Error- Item Sampling
Alternate, equivalent or parallel forms reliability
Parallel Forms Reliability
An estimate of the extent to which item sampling and other errors have affected test scores on two versions of the same test when, for each form, the means and variances of observed test scores are equal
Alternate Forms Reliability
An estimate of the extent to which different forms of the same test have been affected by item sampling or other errors
Reliability Coefficient
An index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance
Overt behavior
An observable action or the product of an observable action including test or assessment related responses Ex: shy, very shy, not shy
Assessment Center
An organizationally standardized procedure for evaluation involving multiple assessment techniques. The term acknowledges that tests are only one type of tool used by professional assessors.
Test Development
An umbrella term for all that goes into the process of creating a test
Trait
Any distinguishable, relatively enduring way in which one individual varies from another (trait is used very broadly and ambiguously when considering the concept)
Panel Interview (Board Interview) Pro
Any idiosyncratic biases of a lone interviewer are minimized
Developmental Norms
Any trait, ability, skill or other characteristic that is presumed to develop, deteriorate or otherwise be affected by chronological age, school grade or stage of life
Construct Validity
Appropriateness of making inferences about the construct you are trying to measure based on the test scores from the test you developed
Assessment Process
Assessment is usually individualized. In contrast to testing, assessment more typically focuses on how an individual processes rather than simply the results of that processing
Assessment Skill of Evaluator
Assessment typically requires an educated selection of tools of evaluation, skill in evaluation, and thoughtful organization and integrating of data
Collaborative Psychological Assessment
Assessor and assessee may work as "partners" from initial contact through final feedback
Naturalistic Observation
Behavioral observation that takes place in a naturally occurring setting for the purpose of evaluation and info gathering
Laws
Body of rules that must be obeyed for the good of society
Variance
The standard deviation squared; most meaningful when distributions are approximately normal
Score
A code or summary statement, usually but not necessarily numerical, that reflects an evaluation of performance on tests, tasks, interviews or other samples of behavior
Assessment Approaches
Collaborative psychological assessment Therapeutic psychological assessment Dynamic assessment
Error
Collective influence of all the factors on a test score or measurement beyond those specifically measured by the test or measurement
Ipsative Scoring
Comparing a test taker's score on one scale within a test to another scale within the same test
CAT
Computer adaptive testing; the computer has the ability to tailor the test to the test taker's ability or test-taking patterns
Computer Test
The computer can serve as test administrator and as a highly effective test scorer
Test Conceptualization
Conceiving an idea for a test
Threats to Fairness
Construct-irrelevant variance Test content Test context Test response Opportunity to learn
Reference Sources- Online Databases
Contain abstracts of articles, original articles and links to other useful websites
Split Half Reliability
Correlating 2 pairs of scores obtained from equivalent halves of a single test administered once
Techniques Used to Calculate Measurement Error and Reliability- Alternate, Equivalent or parallel form Reliability
Correlation between equivalent forms of test with different items Ex: Look at scores between 2 PSY 100 exams
Techniques Used to Calculate Measurement Error and Reliability- Test Retest Reliability
Correlation between scores obtained on two occasions
Techniques Used to Calculate Measurement Error and Reliability- Split Half Reliability
Correlation between two halves of a test (Spearman Brown formula) Alpha (Use when test doesn't have right or wrong answers, Ex: Psychological scale)
Spearman's Rho
A correlation coefficient frequently used when the sample size is small (fewer than 30 pairs of measurements) and when both sets of measurements are ordinal
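A sketch of the classic d² computational formula for rho, which assumes no tied ranks (the two rank orders below are hypothetical):

```python
def spearman_rho(x, y):
    """Classic Spearman rho via the d^2 formula (assumes no tied ranks)."""
    n = len(x)
    rank = lambda v: [sorted(v).index(val) + 1 for val in v]
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank(x), rank(y)))
    return 1 - (6 * d2) / (n * (n ** 2 - 1))

# Two judges rank-ordering the same five essays (hypothetical ranks):
print(spearman_rho([3, 1, 4, 2, 5], [2, 1, 5, 3, 4]))   # → 0.8
```

With tied ranks, average ranks must be assigned first and rho computed as a Pearson r on the ranks.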
Validity Coefficient
Correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure
Privileged Info
Data protected by law from disclosure in a legal proceeding; typically, exceptions to privilege are also noted in law
Coefficient of Equivalence
Degree of the relationship between various forms of a test, evaluated by means of an alternate-forms or parallel-forms coefficient of reliability
Homogeneity
Degree to which a test measures a single factor. The more homogeneous a test is, the more inter-item consistency it can be expected to have.
Heterogeneity
Degree to which a test measures different factors
Incremental Validity
Degree to which an additional predictor explains something about the criterion measure that is not explained by predictor already used
Discrimination
Degree to which an item differentiates among people with higher or lower levels of what is being measured
User Norms (Program Norms)
Descriptive statistics based on a group of test takers in a given period of time · Sampling used to develop norms
Criterion-Referenced Test
Designed to provide an indication of where a test taker stands with respect to some variable or criterion
Alternate Forms
Different version of a test that have been constructed to be parallel
Floor Effect
Diminished utility of an assessment tool for distinguishing test takers at the low end of the ability, trait or other attribute being measured
Ceiling Effect
Diminished utility of an assessment tool for distinguishing test takers at the high end of the ability, trait or other attribute being measured
State
Distinguishes one person from another but is relatively less enduring
Threats to Fairness- opportunity to learn
Do all test takers (e.g., on a student engagement measure) have the opportunity to become cognitively and affectively engaged, given the instruction provided to them in the secondary school they attend?
Threats to Fairness- test content
Does the test measure student engagement only as defined by the test authors (which may or may not represent diverse identities), or does it represent student engagement across all the diverse identities that individual participants might exhibit or identify with? It is important that authors who do not identify with those diverse identities collaborate with people who do, so that both the research and the individual recommendations reflect those with diverse identities.
Parallel Forms
For each form of a test, the means and variances of observed test scores are equal
Variables of Assessment
Educational Assessment Retrospective Assessment Remote Assessment Ecological momentary Assessment (EMA)
Settings of the Testing Process
Educational, clinical, counseling, geriatric, business, military, gov and organizational credentialing, academic, program evaluation, health psychology
Observer Differences
Error associated with observers judging same behavior differently using some instrument
Item Sampling
Error associated with the selection of one set of items from the potential items within a domain for inclusion in a test Ex: items that could be in an SEI
Test Retest Reliability
Estimate of reliability obtained by correlating pairs of scores from the same people on 2 different administrations of the same test
Estimate of Inter item Consistency
Estimate of reliability of a test obtained from a measure of inter item consistency
Confidentiality
Ethical obligation of professionals to keep confidential all communications made or entrusted to them in confidence, although professionals may be compelled to disclose such confidential communications under court order or extraordinary conditions, such as when such communications refer to a third party in immediate danger; contrast with privacy right
Alternative Assessment
Evaluative or diagnostic procedure or process that varies from the usual, customary or standardized way a measurement is derived, either by virtue of some special accommodation made to the assessee or by means of alternative methods designed to measure the same variables
Correlation
Expression of the degree and direction of a correspondence between 2 things
Meta Analysis
Family of techniques used to statistically combine info across studies to produce single estimates of the data
Case History Data Example
Files or excerpts of files maintained by schools, employers, religious institutions; letters, written correspondence, photos, newspaper and magazine clippings, etc.
Summative Scale
Final test score is obtained by summing the ratings across all items Ex: Likert scale, method of paired comparisons, categorical scaling, Guttman scale/scalogram analysis
Protocol
Form, sheet or booklet on which a test taker's responses are entered
Format
Form, plan, structure, arrangement and layout of test items Ex: computerized, pencil and paper ect
Psychological Assessment
Gathering and integration of psychology-related data for the purpose of making a psychological evaluation, accomplished through the use of tools such as tests, interviews, case studies, behavioral observation and specially designed apparatuses and measurement procedures
Settings of the Testing Process- Gov and Organizational Credentialing Example
Government licensing, certification or other credentialing of professionals- passing the bar exam
Normative Sample
Group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test takers
Rating Scale
Grouping of words, statements or symbols on which judgements of the strength of a particular trait, attitude or emotion are indicated by the test taker
Active Consent
Having someone sign something before the assessment is administered
Face Validity
How relevant do the test items appear to be · Cannot be statistically measured Ex: Reading fluency
Content Referenced Testing and Assessment
How scores relate to particular content area or domain
Content Validity
How well a test samples items measuring what it is intended to measure
Criterion- Related Validity
How well can you infer a test taker's performance on another measure (the criterion) based on the test that was given
Threats to Fairness- test context
Importance of the environment, including the test itself as an environment Ex: directions for a student engagement measure should be given identically to all test takers~ reading instructions aloud so administration is standardized for all
Testing People With Disabilities
Important to note that not all items translate, such as questions with artwork for people who are blind
Threats to Fairness- construct irrelevance variance
Variance in test scores attributable to factors other than the specific construct the test is intended to measure
Ethics beneficence and nonmaleficence
In their professional actions, psychologists seek to safeguard the welfare and rights of those with whom they interact professionally and other affected persons
Coefficient of Determination r2
Indication of how much variance is shared by the X and Y variables
Construct
An informed, scientific concept developed or constructed to describe or explain behavior
Dynamic Assessment
Interactive approach to psychological assessment that usually follows a model of 1. evaluation 2. Intervention of some sort 3. evaluation
Source of Measurement Error- Observer Differences
Interrater, inter scorer, interobserver, inter judge reliability
Panel Interview (Board Interview)
Interview conducted with one interviewee by more than one interviewer at a time
Scalogram Analysis
Item analysis procedure and approach to test development that involves a graphic mapping of a test taker's responses
Guttman Scale
Items on it range sequentially from weaker to stronger expressions of the attitude, belief or feeling being measured
Comparative Scaling
Judgements of a stimulus in comparison with every other stimulus on the scale
Notification
Letting people know the assessment will occur and to reply if they have any questions; not asking for active or passive consent
Inference
A logical result or deduction
Concurrent Validity
Looking at the relationship between, and our ability to make inferences from, a test and a criterion measure administered at the same time Ex: IQ used to help make an inference about an achievement test
Testing and Assessment with Communities of Color
Majority of tests were standardized, validated and found reliable primarily with white, middle-class, English-speaking samples · But historically they have still been viewed as objective, culture-free and generalizable across cultures
Settings of the Testing Process- Business and Military
May use tests, interviews and other tools of assessment
Measures of Central Tendency
Mean, median and mode
Average Proportional Distance (APD) Method
Measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores
Criterion-Referenced Testing and Assessment
Method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard § Ex: Taking a driver's test
Norms Referenced Testing and Assessment
Method of evaluation and a way of deriving meaning from test scores by evaluating an individual test taker's score and comparing it to the scores of a group of test takers
Behavioral Observation
Monitoring the actions of others or oneself by visual or electronic means while recording qualitative and/or quantitative info regarding those actions
Ethics Fidelity and responsibility
Must remember professional and scientific responsibilities
Passive Consent
Needing someone to send back something saying they are not willing to participate in the assessment
Types of Scale Measurement
Nominal- everything is mutually exclusive and exhaustive Ordinal- rank order- most frequently used in psychology Interval- e.g., IQ (easier to manipulate statistically than ordinal) Ratio- has a true zero
National Norm
Norms derived from a standardized sample that was nationally representative of the population Ex: age, gender, racial/ethnic background, socioeconomic status, geographical background
Extended Scoring Report
Not only provides a listing of scores but statistical data as well
Correlation Coefficient
Number that provides us with an index of strength of the relationship between 2 things
Pearson R
Obtaining an index of the relationship between 2 variables when that relationship is linear and when the 2 correlated variables are continuous
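A hand-rolled Pearson r in Python (the hours/scores data are hypothetical); squaring r gives the coefficient of determination (r²):

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation: covariance over the product of the SDs."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

hours = [2, 4, 6, 8, 10]
scores = [65, 70, 78, 85, 92]
r = pearson_r(hours, scores)
print(round(r, 3), round(r ** 2, 3))   # r and the variance shared (r^2)
```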
Types of Scaling
Ordinal, nominal, interval ratio Age based Unidimensional, multidimensional Compositive, categorical
Techniques Used to Calculate Measurement Error and Reliability- Inter rater, Inter Scorer, Inter Observer, Inter Judge Reliability
Percent agreement (most common, BUT NOT best method; does not consider the level of agreement expected by chance) Kappa (better method; actual agreement as a proportion of potential agreement, corrected for chance; ranges from 1 (perfect agreement) to -1 (less agreement than expected by chance alone), 1. >.75 = excellent agreement 2. .40-.74 = fair to good agreement 3. <.40 = poor agreement)
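Cohen's kappa can be computed directly from two raters' judgments; a minimal sketch with hypothetical ratings:

```python
def cohens_kappa(rater1, rater2):
    """Kappa: observed agreement corrected for chance, range -1 to +1."""
    n = len(rater1)
    cats = set(rater1) | set(rater2)
    # Observed proportion of agreement
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance from each rater's marginal proportions
    p_chance = sum(
        (rater1.count(c) / n) * (rater2.count(c) / n) for c in cats
    )
    return (p_obs - p_chance) / (1 - p_chance)

r1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
r2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
print(cohens_kappa(r1, r2))   # → 0.5 (raw agreement is .75, chance is .50)
```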
Percentiles
Percentage of people whose score on a test or measure falls below a particular raw score § A converted score that refers to a percentage of test takers § A popular way of organizing all test-related data
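A percentile rank under the simple "percentage of scores falling below" convention (some conventions also count half of the tied scores); the data are hypothetical:

```python
def percentile_rank(score, all_scores):
    """Percentage of scores in the distribution falling below a raw score."""
    below = sum(s < score for s in all_scores)
    return 100 * below / len(all_scores)

scores = [45, 50, 55, 60, 60, 65, 70, 75, 80, 90]
print(percentile_rank(65, scores))   # → 50.0 (5 of 10 scores fall below 65)
```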
Informed Consent
Permission to proceed with a (typically) diagnostic, evaluative or therapeutic service on the basis of knowledge about the service and its risks and potential benefits
Types of Kurtosis
Platykurtic --> relatively flat Leptokurtic--> relatively peaked Mesokurtic--> somewhere in the middle
Types of Criterion-Related Validity
Predictive, concurrent
Pilot Work (Pilot study, pilot research)
Preliminary research surrounding the creation of a prototype of the test o Items may be evaluated as to whether they should be used in the final form o Pilot work precedes and informs test construction
Standardization (test standardization)
Process of administering a test to a representative sample of test takers for the purpose of establishing norms
Scoring
Process of assigning such evaluative codes or statements to performance on tests, tasks, interviews or other behavioral samples
Validation
Process of gathering and evaluating evidence about validity
Psychological Testing
Process of measuring psychology related variables by means of devices or procedures designed to obtain a sample of behavior
Scaling
Process of setting rules for assigning numbers in measurement
Psychometrist/Psychometrician
A professional who uses, analyzes and interprets psychological test data
Reliability
Proportion of the total variance attributed to true variance o The greater the proportion of total variance attributed to true variance, the more reliable the test is
Psychological Assessment- Test
Psychological tests or other tools of assessment may vary by content, format, administration procedures, scoring, interpretation procedure, technical quality
Ethics Justice
Psychologists exercise reasonable judgement and take precautions to ensure that their potential biases, the boundaries of their competence and the limitations of their expertise do not lead to or condone unjust practices
Ethics Respect for the peoples Right and Dignity
Psychologists respect the dignity and worth of all people, and the rights of individuals to privacy, confidentiality and self-determination
CAPA (Computer Assisted Psychological Assessment) Example
Interactive presentation of questions
Scaling Methods
Rating scales, summative scales
Standard Scores
Raw score that has been converted from one scale to another scale, where the latter scale has some arbitrarily set mean and SD · Raw scores may be converted to standard scores because standard scores are more easily interpretable than raw scores
Case History Data
Records, transcripts and other accounts in written, pictorial or other form that preserve archival information, official and informal accounts, and other data and items relevant to an assessee
Cut Score (cutoff scores)
Reference point, usually numerical, derived by judgement and used to divide a set of data into 2+ classifications (Cut scores on tests, usually in combination with other data, are used in many school contexts) Ex: Used by employers as aids to decision making about personnel hiring, placement and advancement
Ecological momentary Assessment (EMA)
Refers to the "in the moment" evaluation of specific problems and related cognitive and behavioral variables at the very time and place that they occur
Panel Interview (Board Interview) Con
Relates to utility; the cost of using multiple interviewers may not be justified
Item Bank
Relatively large and easily accessible collection of test questions
Characteristics of Criterion
Relevant, valid, uncontaminated
Case Study (case history)
Report or illustrated account concerning a person or an event that was completed on the basis of case history
Validation Study
Research that entails gathering evidence relevant to how well a test measures what it purports to measure, for the purpose of evaluating the validity of a test or other measurement
Item Pool
Reservoir or well from which items will or will not be drawn for the final version of the test
Defining Fairness in Testing
Responsiveness to individual characteristics and testing contexts, so that test scores will yield valid interpretation for intended users (pg 50)
Z- Score
Results from the conversion of a raw score into a number indicating how many SD units the raw score is below or above the mean of the distribution
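The z-score conversion, plus the common T-score linear transformation (mean 50, SD 10), sketched with hypothetical scores:

```python
from statistics import mean, pstdev

def z_score(raw, scores):
    """How many SD units a raw score lies above (+) or below (-) the mean."""
    return (raw - mean(scores)) / pstdev(scores)

def t_score(raw, scores):
    """Linear transformation of z to a scale with mean 50 and SD 10."""
    return 50 + 10 * z_score(raw, scores)

scores = [60, 70, 80, 90, 100]
print(round(z_score(90, scores), 3))   # → 0.707 (0.71 SD above the mean of 80)
print(round(t_score(90, scores), 2))   # → 57.07
```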
Continuous Scale
Scale used to measure a continuous variable
Discrete Scale
Scale used to measure a discrete variable Ex: mental health status- a group previously hospitalized vs. a group never hospitalized
Settings of the Testing Process- Counseling Examples
Schools, prisons, government or privately owned institutions
Psychometrics
Science of psychological measurement
Central Processing
Scoring conducted at a central location
Local Processing
Scoring done onsite
Simple Scoring Report
Scoring report providing only a listing of scores
Domain Sampling Theory
Seeks to estimate the extent to which specific sources of variation under defined conditions contribute to the test score
Scale
A set of numbers (or other symbols) whose properties model empirical properties of the objects to which the numbers are assigned
Distribution
Set of test scores arranged for recording or study
Culture
Socially transmitted behavior patterns, beliefs and products of work of a particular population, community or group of people
Systematic Error
Source of error in measuring a variable that is typically consistent or proportionate to what is presumed to be the true value of the variable being measured
Random Error
Source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies in other variables in the measurement process
Health Psychology
Specialty area of psychology that focuses on understanding the role of psychological variables in the onset, course, treatment, prevention of illness, disease or disability
Source of Measurement Error- Internal Consistency
Split half reliability
Stanine
Standard score derived from a scale with a mean of 5 and SD of approximately 2 Ex: achievement tests, SAT
Linear Transformation
Standard score that retains a direct numerical relationship to the original score
Criterion
A standard on which a judgement or decision may be based
Standard Error of the Difference
Statistical measure that can aid a test user in determining how large a difference between two scores should be before it is considered statistically significant
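One common formula for this statistic is sd·√(2 − r₁ − r₂), which equals the two standard errors of measurement combined; a sketch with hypothetical values:

```python
def sem(sd, reliability):
    """Standard error of measurement for one test."""
    return sd * (1 - reliability) ** 0.5

def se_difference(sd, r1, r2):
    """SE of the difference between two scores on the same scale.

    sd*sqrt(2 - r1 - r2), which equals sqrt(sem1**2 + sem2**2).
    """
    return sd * (2 - r1 - r2) ** 0.5

# Hypothetical: two subtest scores on a scale with SD = 15,
# reliabilities .90 and .84.
sed = se_difference(15, 0.90, 0.84)
print(round(sed, 2))           # → 7.65
print(round(1.96 * sed, 2))    # → 14.99 (minimum difference at the .05 level)
```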
Kurtosis
Steepness of a distribution in its center
Categorical Scaling
Stimuli are placed in one of 2+ alternative categories that differ quantitatively with respect to some continuum
Raw Score
Straightforward, unmodified accounting of performance that is usually numerical Ex: Number of items responded to correctly on an achievement test
Types of Standardization Sampling
Stratified random sample Purposive sampling Convenience sampling (incidental sampling)
Fixed Reference Group Scoring System
System of scoring wherein the distribution of scores obtained on the test from one group of test takers (the fixed reference group) is used as the basis for the calculation of test scores for future administrations Ex: SAT
Motivational Interviewing
Therapeutic dialogue that combines person-centered listening skills, such as openness and empathy, with the use of cognition-altering techniques designed to positively affect motivation and effect therapeutic change; aims at targeted changes in the interviewee's thinking and behavior
Tools of Psychological Assessment
Test Interview Portfolio Case history data Role play test Computer test
Where to go for Authoritative Info: Reference Sources
Test catalogue Test manual Professional book Reference volume Journal article Online data bases Directory of Unpublished Experimental Mental Measures
Sources of Error
Test construction (item/content sampling) Test administration (test environment, test taker variables, examiner-related variables) Test scoring and interpretation (scoring glitch) Other (margin of error)
Teleprocessing
Test data may be sent to and returned from a central facility by phone lines, mail or courier
Who are the parties in the Testing Process
Test developers, test users, test taker, society at large, other parties (organizations, companies, gov. agencies)
Polytomous Test Item
Test items or questions with 3+ alternative responses, only one of which is correct or scored as being consistent with the targeted trait or construct
Norms
Test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individual test scores
Source of Measurement Error- Time Sampling
Test retest reliability
Grouped Frequency Distribution
Test score intervals replace the actual test scores
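A quick Python sketch of the idea (the scores and the interval width of 10 are made up for illustration):

```python
from collections import Counter

# Hypothetical raw test scores
scores = [42, 45, 47, 51, 53, 55, 58, 61, 63, 64, 68, 72]
width = 10  # class-interval width (an arbitrary choice here)

# Replace each raw score with the lower bound of its interval and tally
intervals = Counter((s // width) * width for s in scores)
for lower in sorted(intervals):
    print(f"{lower}-{lower + width - 1}: {intervals[lower]}")
```

Each interval (e.g., 40-49) replaces the individual scores that fall within it, trading detail for a more readable distribution.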
Class Scoring (category scoring)
Test takers' responses earn credit toward placement in a particular class or category with other test takers whose responses were scored in a similar way
Testing Process
Testing may be individual or group in nature. After test administration, the tester will typically add up the number of correct answers or the number of certain types of responses... with little if any regard for the how or mechanics of such content
Testing Skill of Evaluator
Testing typically requires technician like skills in terms of administering and scoring a test as well as interpreting a result
Culture Specific Tests
Tests designed for use with people from one culture but not from another
Settings of the Testing Process- Clinical
Tests employed may include intelligence tests, personality tests, neuropsychological tests, and other specialized instruments. o Ex: Public, private, and military hospitals, inpatient and outpatient clinics, private-practice consulting rooms, schools, and other institutions.
Assessment Role of Evaluator
The assessor is key to the process of selecting tests and/or other tools of evaluation as well as in drawing conclusions from the entire evaluation
Error Variance
The component of a test score attributable to sources other than the trait or ability measured
Privacy Right
The freedom of people to choose the time, circumstances, and extent to which they wish to share with or withhold from others their personal beliefs, opinions, and behavior; contrast with confidentiality
Standard of Care
The level at which the average, reasonable, and prudent professional would provide diagnostic or therapeutic services under the same or similar conditions
Testing Role of Evaluator
The tester is not key to the process; practically speaking, one tester may be substituted for another tester without appreciably affecting the evaluation
Therapeutic Psychological Assessment
Therapeutic self-discovery and new understanding are encouraged throughout the assessment process
Diagnostic Test
Tool of assessment used to help narrow down and identify areas of deficit to be targeted for intervention
Role Play Tests
Tool of assessment wherein assessees are directed to act as if they were in a particular situation. May be used in clinical settings to get a baseline and again at the end of treatment
Standard Error of Measurement
Tool used to estimate or infer the extent to which an observed score deviates from the true score
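If the test's standard deviation and its reliability are known, the standard error of measurement follows from the usual formula SEM = SD·√(1 − r). A minimal Python sketch (the SD of 15 and reliability of .91 below are made-up illustrative values):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# A test with SD = 15 and reliability .91 (illustrative numbers):
print(round(sem(15, 0.91), 1))  # 4.5
```

A band of roughly ±2 SEM around an observed score is often used when reasoning about where the true score likely falls.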
Consultative Report
Type of interpretive report designed to provide expert and detailed analysis of test data that mimics the work of an expert consultant
Effect Size
Typically expressed as a correlation coefficient · Can be replicated · Conclusions tend to be more precise · More focus on effect size than statistical significance · Promotes evidence-based practice o Clinical and research findings
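One common bridge between effect-size metrics is the conversion from Cohen's d to a correlation-type effect size, r = d / √(d² + 4) (assuming equal group sizes). A small sketch with an illustrative value:

```python
import math

def d_to_r(d):
    """Convert Cohen's d to a correlation effect size (equal group sizes assumed)."""
    return d / math.sqrt(d * d + 4)

# A "large" d of .8 corresponds to a correlation of about .37
print(round(d_to_r(0.8), 2))  # 0.37
```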
Assessment Objective
Typically to answer a referral question, solve a problem or arrive at a decision through the use of tools of evaluation
Testing Objective
Typically to obtain some gauge, usually numerical in nature, with regard to an ability or attribute
Local Validation Study
Typically undertaken in conjunction with a population different from the population for whom the test was originally validated
Assessment Outcome
Typically, assessment entails a logical problem-solving approach that brings to bear many sources of data designed to shed light on a referral question
Testing Outcome
Typically, testing yields a test score or set of test scores
Retrospective Assessment
Use of evaluative tools to draw conclusions about psychological aspects of a person as they existed at some point in time prior to the assessment
Educational Assessment
Use of tests and other tools to evaluate abilities and skills relevant to success or failure in a school or pre-school context Ex: Intelligence tests, achievement tests, reading comprehension
Remote Assessment
Use of tools of psychological evaluation to gather data and draw conclusions about a subject who is not in physical proximity to the person or people conducting the evaluation
Utility
Usefulness or practical value that a test or other tools of assessment has for a practical purpose
Predictive Validity
Validity coefficient, incremental validity
True Score
Value that genuinely reflects an individual's ability (or trait) level as measured by a particular test o Domain sampling and generalizability theory
Error Variance
Variance from irrelevant, random sources
True Variance
Variance from true differences
Item Analysis
Various procedures, usually statistical, designed to explore how individual test items work as compared with other items and in the context of the whole test; contrast with qualitative item analysis
Issues Regarding Culture and Assessment
Verbal communication (vocab may change, translator skill or professionalism, unintentional hints, knowing proficiency in the language of the assessment) Nonverbal communication (cultural norms that may be missed but make a difference to the test taker's answers) Standards of evaluation (preferences someone may have, individualist vs. collectivist culture)
Affirmative Action
Voluntary and mandatory efforts undertaken by federal, state, and local governments, private employers, and schools to combat discrimination and promote equal opportunity for all in education and employment
Right to the Least Stigmatizing Label
When reporting test results, the least stigmatizing labels should always be assigned
Non-Linear Transformation
When the data under consideration are not normally distributed, yet comparisons with normal distributions need to be made
Coefficient of Stability
When the interval between testing is greater than 6 months
Predictive Validity
Whether a measure you are using at one point in time can help predict future scores Ex: predicting how students would do at state tests at the end of a year, by giving them a test earlier in the year
Convergent Validity
Tells you whether a test correlates highly, in the predicted direction, with a test that measures the same or a similar construct; a high correlation shows that the scale you are developing is measuring the same thing (converging) as the other measure you are comparing it to How does it correlate?
Threats to Fairness- test response
Will test takers be able to understand how to respond to the five-point Likert scale Ex: looking at age to understand a Likert scale starting in grade 6
Rapport
Working relationship between the examiner and the examinee
Test Construction
Writing test items (or rewriting or revising existing items), formatting items, setting scoring rules, and otherwise designing and building the test
Types of Standard Scores
Z score T score Stanine Linear transformation Non-linear transformation
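The linear members of this family are easy to sketch in Python (the raw score, mean, and SD below are made up; the stanine here uses the common mean-5, SD-2 approximation, rounded and clipped to 1-9):

```python
def z_score(x, mean, sd):
    """Standard score: distance from the mean in SD units."""
    return (x - mean) / sd

def t_score(z):
    """T score: linear transformation of z with mean 50 and SD 10."""
    return 50 + 10 * z

def stanine(z):
    """Stanine: 1-9 scale with mean 5 and SD of about 2 (rounded, clipped)."""
    return max(1, min(9, round(5 + 2 * z)))

z = z_score(65, 50, 10)  # raw score 65 on a test with mean 50, SD 10
print(z, t_score(z), stanine(z))  # 1.5 65.0 8
```

Non-linear transformations (such as normalizing a skewed distribution) change the shape of the distribution and cannot be written as a single `a + b*z` formula like the T score above.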
Minimum Competency Testing Program
formal evaluation program in basic skills, such as reading, writing, and arithmetic, designed to aid in educational decision making that ranges from remediation to graduation
Other Considerations
o A good test is one that trained examiners can administer, score, and interpret with a minimum of difficulty o A good test is a useful one § Yields results that will benefit individual test takers or the larger society
Graphic Representation of Correlation
o Bivariate distribution o Scattergram o Scatter diagram o Scatterplot o Curvilinear~ how curved a graph is
Reference Sources- Professional Book
o Book may shed light on how or why the test may be used for a particular assessment purpose, or administered to members of some special population o May provide useful guidelines for pretest interviews, drawing conclusions, and making inferences about data derived from the test o May alert the user to common errors made
Reliability
o Consistency of a measuring tool o Psychological tests are consistent to varying degrees
Reference Sources- Journal Articles
o Current journals may contain reviews of the test, updates, or independent studies of its psychometric soundness, or examples of how the instrument was used in either research or applied contexts o Some journals specifically focus on matters related to testing and assessment
Grade Norms
o Designed to indicate the average test performance of test takers in a given school grade § Only useful with respect to years and months of schooling completed § Grade norms are one type of developmental norm
Reference Sources- Test Manual
o Detailed info on a particular test and technical info relating to it should be found in the test manual o Test publishers typically require documentation of professional training before filling an order for a test manual o Universities often have test manuals
Cultural Difference
o Differences between groups explained § As differences and potential strengths and not deficits
Cultural Deficit
o Differences between groups explained § By genetic and biological differences, by cultural beliefs, values, and practices, or by lack of assimilation to the majority culture
Reference Sources- Test Catalogue
o Distributed by the publisher of the test o Usually contains only brief descriptions of tests and seldom contains the detailed technical info a prospective user may require o The catalogue's objective is to sell the test; highly critical reviews are seldom found in a publisher's test catalogue
Evoking Interests in Culture Related Issues
o Ex: immigrants and intelligence tests o Minorities were often left out of the sample population o Culture-specific tests (verbal and nonverbal communication, standards of evaluation)
Family Educational Rights and Privacy Act
o Guarantees privacy and confidentiality of educational records (including test results) o Can only be released to school employees with "legitimate educational interest" Ex: Grades when they needed to post grades somewhere before being able to submit electronically
Right to Be Informed of Test Findings
o If a test is voided, test takers have a right to know o Test takers are entitled to know what recommendations are being made as a consequence of test data
Age Norms (Age Equivalent Scores)
o Indicate the average performance of different samples of test takers who were at various ages at the time the test was administered § Can be done with physical characteristics like height or psychological characteristics like intelligence
Test Users Qualifications
o Level A~ tests or aids that can adequately be administered, scored, and interpreted with the aid of the manual and a general orientation to the kind of institution or organization in which one is working § Ex: achievement or proficiency tests o Level B~ tests or aids that require some technical knowledge of test construction and use, and of supporting psychological and educational fields such as statistics, individual differences, psychology of adjustment, personality psychology, and guidance § Ex: aptitude tests, adjustment inventories applicable to normal populations o Level C~ tests and aids that require substantial understanding of testing and supporting psychological fields, together with supervised experience in the use of the devices § Ex: projective tests, individual intelligence tests
Test Tryout
o Look at data, get feedback o Narrow down items for a final test
Coefficient Alpha
o Mean of all split-half correlations o Preferred method for obtaining an estimate of internal consistency reliability o Ranges from 0 to 1
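In practice, alpha is usually computed directly from item variances via α = k/(k−1) · (1 − Σ item variances / total-score variance), which is equivalent to averaging all possible split-half estimates. A pure-Python sketch using made-up item scores for five respondents on a three-item scale:

```python
def variance(xs):
    """Sample variance (n - 1 in the denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(rows):
    """rows: list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    item_vars = sum(variance([r[i] for r in rows]) for i in range(k))
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical data: 5 respondents x 3 items
data = [[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 2], [5, 4, 5]]
print(round(cronbach_alpha(data), 2))  # 0.93
```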
Norm Referenced vs Criterion Referenced Evaluation
o In norm-referenced interpretation of test data, a usual area of focus is how an individual performed relative to other people who took the test o In criterion-referenced interpretation of test data, a usual area of focus is the test taker's performance relative to a set standard or criterion o Culture and inference
Reference Sources- Reference Volume
o Often contains a lot of details about test related info Ex: publishers, test author, intended population, test administration time
Example of Discrimination from Class
o Ricci v. DeStefano (2009) § New Haven (CT) Fire Department § Exam to determine eligibility for promotion to lieutenant and captain § No African American and only 1 Hispanic firefighter would have been among the 15 promoted based on the exam § Civil Service Board threw out the results and did not promote anyone § Frank Ricci, a white firefighter who would have been promoted based on the results, and others sued the New Haven Fire Department § Case eventually went to the U.S. Supreme Court § Argument · Ricci o Should have been promoted because: § Had dyslexia § Studied 13 hr./day § Paid someone to read textbook audio tapes to prepare flashcards · New Haven Fire Department o Was right in throwing out the test results because: § Disparate impact of the test on firefighters of color § If they hadn't thrown out the results, firefighters of color would likely have sued
Testing and Assessment with Communities of Color- Biases Can Involve
o Tests being more accurate for one group than another Ex: Tests designed by and for the white middle class, and white middle-class students do better o One group scoring higher than another on a test designed to predict an outcome on which the groups are equal o Interpreting those differences as cultural deficit
Computer Test Con
o Verification of identity o Refers in more general terms o May have unstructured access to materials and other tools despite guidelines for test administrators
Major Issues with CAPA
§ Access to test administration, scoring, and interpretation · Computerized tests are easily copied and duplicated § Compatibility of paper-and-pencil and computerized versions of a test § Value of computerized test interpretation § Unprofessional, unregulated "psychological" testing online
Competency, Based on MacCAT-T
§ Being able to evidence a choice as to whether one wants to participate or not § Demonstrating factual knowledge of the issues § Being able to reason about the facts of a study, treatment, or whatever it is to which consent is sought
Confidentiality Vs. Privilege
§ Confidentiality concerns matters of communication outside the courtroom § Privilege protects clients from disclosure in judicial proceedings
Deception
§ Do not use deception unless absolutely necessary § Do not use deception at all if it will cause participants emotional distress § Fully debrief participants
Written Form of Consent Specifies
§ General purpose of the testing § Specific reason it is being undertaken in the present case § General types of instruments to be administered
Restriction or Inflation of Range
§ If the variance of either variable in a correlational analysis is restricted by the sampling procedure used, the resulting correlation coefficient tends to be lower § If the variance of either variable in a correlational analysis is inflated by the sampling procedure, the resulting correlation coefficient tends to be higher
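The restriction effect is easy to see by simulation: correlate a predictor with a criterion on a full sample, then again after keeping only the top-scoring half on the predictor (all data here are simulated, not from any real test):

```python
import random

random.seed(0)  # make the simulation reproducible

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Simulated predictor (e.g., a selection test) and a correlated criterion
x = [random.gauss(0, 1) for _ in range(2000)]
y = [a + random.gauss(0, 1) for a in x]

r_full = pearson_r(x, y)

# Restrict the range: keep only cases scoring above the predictor's mean
kept = [(a, b) for a, b in zip(x, y) if a > 0]
r_restricted = pearson_r([a for a, _ in kept], [b for _, b in kept])

print(r_full > r_restricted)  # the restricted-sample correlation comes out lower
```

This is why a validity study run only on admitted or hired candidates (a range-restricted group) tends to understate the test's correlation with the criterion in the full applicant pool.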
Disparate Treatment
§ Practice intentionally designed to result in a discriminatory outcome § Possibly due to social prejudice or a desire to maintain the status quo
Disparate Impact
§ Practice that unintentionally results in a discriminatory outcome § Not viewed as stemming from planning or intent
Ethics Integrity
§ Psychologists seek to promote accuracy, honesty, and truthfulness in the science, teaching, and practice of psychology § Psychologists always minimize or avoid harm, and if harm does occur, they need to correct it
Item format
§ Selected-response format · Select a response from alternative responses § Constructed-response format · Test takers supply or create the correct answer § Multiple-choice item § Matching item § True-false item/binary item § Completion item (fill in the blank) § Short-answer item § Essay item
Testing and Assessment Benefits Society
· Allows decisions to rest on the merit of a person's hard work and performance on an assessment rather than, say, nepotism · Can help identify educational difficulties
Factor Analysis Approach
· Approach relies on factor analysis and related methods to sort and arrange individual test items into clusters or scales that are mathematically related or have specific properties · Test developers typically start with a rational-theoretical approach to develop items first
Potential harm as the Result of an Administration of a Test
· Discrimination · Informed consent (make sure the person giving permission is well educated about the purpose) · Privacy and confidentiality (only share data with those who have a legitimate interest)
Various Sources of Error Are Part of the Assessment Process
· Error refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test o Test scores are always subject to questions about the degree to which the measurement process includes error
Discriminant Validity
· Asks whether your scale is not measuring what it is not supposed to measure o You do not want a correlation o The test you are giving should not tell you anything about another test that taps a completely different construct
SEI
· Model used by school psychologists · Broadens the idea that assessment is more than just testing · SEI → Student Engagement Instrument · A school psychologist might use this to measure a student's engagement by observing the student, seeing how instruction looks for them, the layout of classes, etc. o Also look for student engagement~ opportunities for students to speak · Testing the learner is only one part of a global process that involves multiple forms of assessment · Assessment is a greater process while testing is only one component
How Are Assessments Conducted
· Responsible test users have obligations before, during, and after any measurement procedure is administered · All appropriate materials and procedures must be collected prior to the assessment · Test users have the responsibility to make sure the room they use is suitable for testing
APA Ethics Code- Intro and Applicability
· The Preamble and General Principles are aspirational goals to guide psychologists toward the highest ideals of psychology · Most ethical standards are written broadly in order to apply to psychologists' varied roles · This Ethics Code applies only to psychologists' activities that are a part of their scientific, educational, or professional roles · APA may impose sanctions on its members for violations of the standards of the Ethics Code, including termination of APA membership, and may notify other bodies and individuals of its actions
Concerns of the Public
· Concern over the use of psychological tests first arose after WWI, prompted by articles on "the abuse of tests" · A year after the Russians launched Sputnik, a satellite, the U.S. pushed the assessment of testing ability and aptitude to identify gifted and academically talented students, which led to another wave of public discussion reflected in magazine articles · Assessment has been affected in numerous and important ways by the activities of the legislative, executive, and judicial branches of federal and state government
Test Related Behavior Non-Tester Related Behavior
· The objective of the test is to provide some indication of other aspects of the examinee behavior · Test related behavior can be used to aid in understanding of behavior that has already taken place
Legislation
· The public has also been quick to judge the utility of a test, calling it unfair and discriminatory, leading to group-based hiring and quotas · State and federal legislatures, executive bodies, and courts have been involved in many aspects of testing and assessment. There has been little consensus about whether validated tests on which there are social differences can be used to assist with employment-related decisions · Rule 702 allowed more experts to testify in court regarding the admissibility of the original expert testimony. Previously, beyond expert testimony indicating that some research method or technique enjoyed general acceptance in the field, other experts were not allowed to testify and present their opinions with regard to the admissibility of evidence · The Daubert case gave trial judges a great deal of leeway in deciding what juries would be allowed to hear · In General Electric Co. v. Joiner (1997), the Court emphasized that the trial court had a duty to exclude unreliable expert testimony as evidence · Kumho Tire Co. v. Carmichael (1999) expanded on Daubert to include all experts, including psychologists
APA Ethics Code- Preamble
· This Ethics Code is intended to provide specific standards to cover most situations encountered by psychologists · It has as its goal the welfare and protection of the individuals and groups with whom psychologists work