Selection
Barrett et al (2010)
(Special jobs) -Selection in public sector police & fire - Testing process can occur at two distinct times: first is for entry level or initial entry into the organization; second is for promotional testing - Adding a personality test to a cognitive ability test does not eliminate adverse impact -To decrease AI, increase time limits, reduce reading load, use job sample tests, etc. -Practitioners designing selection/promotion tests for the public sector face a number of legal challenges. AI is likely to always be an issue since many common tests have significant subgroup differences. Personality tests shouldn't be used as a means to reduce AI since they're likely to result in other subgroup differences. Instead, perform item analyses, reduce reading demands, increase time limits, include memory tests, and use job sample tests. The most common allegation by plaintiffs is that there is an alternative selection procedure which is equally valid and has less AI. *Also in Intro IO*
Sackett et al (2001)
(Affirmative Action) -REMOVED *Describe the diversity-validity dilemma & review research on different strategies for addressing it (promoting diversity and validity at the same time). They recommend (1) Assessing the full range of KSAs with a test format that minimizes verbal content (as much as possible while maintaining job relatedness) (2) Maximizing face validity, but conclude that AI is unlikely to be eliminated as long as you're assessing domain-relevant constructs that are cognitively loaded* - Also evaluate: Use of noncognitive constructs (unlikely to influence AI if also use cognitive), Remove culturally laden/high DIF items (doesn't make an impact & effect on validity unknown); Minimize reading/writing requirements with computer media (can't tell if it works because of construct or method change); Enhancing motivation & test prep (might not help but doesn't hurt); More generous time limits (may widen gap)
Pyburn et al. (2008)
(Affirmative action)-Removed- Review of law/court decisions on preferential treatment *Title VII*-Only allowed to remedy "manifest imbalances in traditionally segregated jobs." - Must be limited in extent (can't "trammel" rights of others) and in time (to eliminate underrepresentation, but not to maintain proportional representation) *Strict scrutiny*: Public sector AA must (1) Further a "compelling government interest" (2) Not "trammel" the interests of others - SC hasn't clearly defined (a) "sufficient compelling interest" to justify public sector AA (b) how to demonstrate that plans have been sufficiently customized and do not violate the interests of the other group (in both the public and private sectors)
Aguinis & Smith (2007)
(Bias/Fairness) - Propose a framework that allows you to consider 4 important issues simultaneously and lets them inform selection procedure decisions: (1) Test validity (correlation coefficient) (2) Bias (assessed w/ regression) (3) Selection errors (false pos/neg) (4) Adverse impact - Most conceptualizations of bias consider predictive selection errors (imperfect regression predictions), but bias can also come in the form of *bias-based selection errors*, or false positives and negatives that arise from the use of a biased test as if it were unbiased - CRA (1991)- The use of differential selection cutoff scores and group-based regression lines is unlawful (read: orgs must use the same regression equation and same cutoffs w/ all applicants regardless of group membership)
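The regression check referenced above is usually the Cleary-style moderated regression (criterion regressed on test score, group, and their interaction). A minimal sketch, assuming a hypothetical data frame with columns test, perf, and group coded 0/1 (not the authors' code):

```python
import pandas as pd
import statsmodels.formula.api as smf

def check_predictive_bias(df: pd.DataFrame) -> dict:
    """Cleary-style test: regress the criterion on test score, a group dummy (0/1),
    and their interaction. Column names (test, perf, group) are hypothetical."""
    model = smf.ols("perf ~ test * group", data=df).fit()
    return {
        "intercept_bias_p": model.pvalues["group"],    # significant -> intercepts differ by group
        "slope_bias_p": model.pvalues["test:group"],   # significant -> slopes differ by group
    }
```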
Aguinis, Culpepper, & Pierce (2010)
(Bias/Fairness) - Response to the common belief that intercept-based bias (favoring minorities) exists, but not slope bias - In a simulation, they showed that nearly all previous tests of slope bias had insufficient power to detect a difference in slopes. - There are slope, but not intercept, differences. - Researchers are more likely to find performance overpredicted for minority group members when (a) the minority group's mean test score is lower than the majority group's & (b) the test has low reliability -The accepted procedure to assess test bias is itself biased.
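A rough Monte Carlo sketch of that power argument (the group sizes, slopes, and noise level below are illustrative assumptions, not the article's values): even with a real slope difference, the interaction test is rarely significant when the minority group is small.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

def slope_test_power(n_maj=200, n_min=40, slope_maj=0.5, slope_min=0.3,
                     n_sims=500, alpha=0.05) -> float:
    hits = 0
    for _ in range(n_sims):
        x = rng.normal(size=n_maj + n_min)              # standardized test scores
        g = np.r_[np.zeros(n_maj), np.ones(n_min)]      # 0 = majority, 1 = minority
        y = np.where(g == 0, slope_maj, slope_min) * x + rng.normal(size=x.size)
        df = pd.DataFrame({"test": x, "group": g, "perf": y})
        p = smf.ols("perf ~ test * group", data=df).fit().pvalues["test:group"]
        hits += p < alpha
    return hits / n_sims                                 # proportion of sims detecting the true slope gap

print(slope_test_power())   # typically far below the conventional .80 power benchmark
```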
Kuncel & Hezlett (2010)
(Bias/Fairness) Evaluates 5 common beliefs about cognitive ability tests: 1) Other characteristics (eg personality) matter/predict success in life. Prediction of performance using CA tests can be improved by using measures of personality, values, interests, & habits (as long as they're valid). 2) Cognitive ability tests predict JP, leadership effectiveness, & creativity 3) Do not demonstrate predictive bias- a test is NOT biased if it accurately reflects a capability difference between groups and if the nature of the relationship between capability and performance is similar for all groups. 4) SES variables do not eliminate the predictive power of tests--SES does NOT explain the relationship between test scores & subsequent performance. 5) No predictive threshold--the test score-performance relationship is essentially linear; there is no point beyond which higher scores stop mattering
Hough (2010)
(BioData & Experience) - *BioData*- Information about an individual's past behavior or experience - Valid predictors, but validity depends on study design (concurrent higher), sample studied (incumbent higher), & type of criterion (*better predictor of training criteria than JP*) - If well-developed, VG does exist - Presents strategies for item generation & scoring (functional job analysis; *Mumford & Stokes Item generation*- identify life or job experiences necessary for dev of KSAOs IDed in JA)
Stokes & Cooper (2004)
(BioData & Experience) - Biodata successfully predicts (r = .3-.4), but sometimes lacks incremental validity in combo w/ GMA and personality (depending on content). Shows more incremental validity over interviews. - Best for jobs that don't use extensive selection processes or when traditional measures (e.g., cognitive ability, personality) have been unsuccessful -Less AI than cognitive ability - Orgs need a good understanding of performance criteria to develop effectively - Response distortion continues to be an issue of concern.
Levine et al (2004)
(BioData & Experience) Review of experience/education - Experience & education--> knowledge & task proficiency-->JP - Experiences most likely to be useful=(1) Past accomplishments (2) Jobs held (3) Task proficiency (4) Experience-based KSAs - Education experience: (1) Graduate education (2) Degree (3) Specific courses (4) GPA & extracurricular activities - Online administration & standardization will improve validity
Murphy (2009)
(Content validity) - Even though content-valid tests will "almost certainly" show criterion-related validity, so will tests that have not been content-validated for that job. In fact, non-content valid tests may even be more predictive. - While content validity is useful for many things (face validity, legal defensibility, transparency etc.) it should be considered a test construction method, rather than a validity strategy.
Binning & Barrett (1989)
(Content validity) -Validation=process of accumulating various forms of judgmental & empirical evidence to support the inferences we make based on tests - Unitary concept- construct, content, and criterion-related validity all sources of evidence - Predictor sampling strategy= construct (evoke psych construct domain) +content (evoke JP domain) vs. Criterion = assesses quality of predictor sampling strategy. - You need content or construct validity to build your case for inference 9 (the link between predictor test scores and job performance domain) because they allow you to sample from your predictor. - Criterion-based validity without content validity is dust bowl empiricism
McDaniel, Kepes, & Banks (2011a)
(Legal) - UG are outdated/out of touch with modern science & should be revised - Based on the situational specificity hypothesis and the 4/5 rule - Don't credit meta-analysis or VG evidence, but should - Unrealistic requirements for local criterion-related/content validity studies, which most companies cannot comply with - AI is not a sign that a test is flawed; it's probably accurately measuring subgroup differences - When AI is found, UG require a search for an equally valid measure with less AI, but this is impossible because of the validity-diversity tradeoff- they're opposing goals
Ployhart & Holtz (2008)
(Legal) * -Review of strategies for addressing the diversity-validity dilemma -Most effective strategies: (1) Alternative predictor measurement methods (interviews, assessment centers), (2) Assessing entire range of KSAs (3) Banding w/ minority pref (4) Minimizing verbal ability requirements of predictor. - Only effective strategy w/o a validity tradeoff = assess the full range of KSAs. Actually enhances validity.
Gutman & Dunleavy (2012)
(Legal) * MIGHT BE WRONG - Pattern & practice article - Justice O'Connor proposed a change to the adverse impact framework established in Griggs v. Duke Power (1971) such that: (1) plaintiffs must identify the cause(s) of adverse impact; (2) plaintiffs must show the causal relationship is statistically significant; (3) the defendant may then articulate a legitimate nondiscriminatory reason for the identified cause(s). -You can have both disparate impact and disparate treatment in the same case
Landy (2008)
(Legal) *Also in Org Psych* -Stereotyping research might have little relevance to the workplace. Characteristics of work decisions vary substantially from the characteristics of stereotyping research. Research needs to start incorporating these characteristics (individuating info). - Interviews help eliminate discrimination by providing individuating information. - Meta-analysis is a problem because it aggregates data from a long time ago with modern data, and the workplace has changed substantially over that period of time - IAT is faulty (unreliable, low validity for predicting behavior, individuating info mitigates implicit bias)
Arthur & Doverspike (2005)
(Legal) -Removed - Increase diversity/decrease discrimination through HR practices (recruitment, selection, training, performance management) - Best option within Selection practices = use valid job-related tests and *maximize face validity*. - Audit access to training and compensation strategies - Use multiple PA raters & frame of reference training to increase accuracy - Increase perceived fit to recruit minorities
Tippins (2009)
(UIT) - Panel agreed that UIT alone is never acceptable in high-stakes testing, that when UIT is used, verification testing is needed, and that it's important to consider the nature of the test in making the decision to use UIT (non-cognitive tests have fewer cheating opportunities) - *Pros*: Minimize costs, no need to travel to test, easier/less costly maint. of testing systems, consistency in test administration, faster staffing process - *Cons*: Main problem = cheating (difficult to identify), verification tests don't provide unequivocal evidence of cheating, cheating reduces validity, inconsistent testing conditions, identity verification - *Ethical issues*: (1) Not being able to ensure the reliability and validity of the test since the psychologist was not able to monitor its administration, (2) Highest ethical concern is that UIT allows for cheating
Arthur et al. (2010)
(UIT) -REMOVED -UIT speeded cog test did not exhibit signs of cheating- Use of speeded tests (if in line with job requirements) one of the means of alleviating malfeasance concerns with UIT ability tests (but this doesn't do much about surrogate test takers) - UIT personality measures display response distortions similar to those that have been observed in proctored settings (you distort your response regardless of proctor) -*If proctored vs. unproctored testing affects validity, the effect is extremely small* - Even under conditions where it is very intuitive to expect widespread malfeasant behavior, we don't see it.
Reynolds & Rupp (2010)
(UIT) Technology-facilitated assessment, computer adaptive testing *Pros*- High fidelity, Technology availability (ease of use, global application, internet speed and cost), business efficiency, better insight about people, strategic perspective and impact (organizations monitor data trends, optimize job selection procs) - *Applicable guidelines?* APA Task Force report on Internet-based testing--> consider test security, identity protection (require ID confirmation) and equality of access across populations of applicants. Emphasis should still be on validity. - *Issues:* (1) Assessment equivalence, (2) Appropriate deployment conditions (protect info passed over internet and consider test-taker identity), (3) Cultural adaptation, (4) Data security and privacy
Mullins & Rogers (2008)
(Utility) - A lot of what we know about selection and predictors does not apply to academic positions because of range restriction of cognitive ability/job knowledge, greater emphasis on fit, etc. - We rely a lot on fit and intuition in academia - More research is needed on how to predict success in academic positions -In universities, faculty are often hired based on subjective judgments and not based on actual cognitive ability tests, and this seems acceptable. Different contexts should be considered.
Ployhart & Moliterno (2011)
(Utility) - Multilevel model of human capital emergence - Individual KSAs and unit-level HC resources both need to be considered- not just aggregates of each other; literatures from one do not necessarily generalize to the other - What allows individual KSAs to be connected to HC resources is the joint interaction of the task environment and the unit members' behaviors, cognitions, and affect. The more interactive and supportive the environment, the more likely emergence will occur - When human capital emergence occurs, organizations can leverage it as a resource in a way that gives them a competitive advantage
Highhouse (2008)
(Utility) - Selection is probabilistic and subject to error - Employers still want to rely on intuition and think that it's possible to predict JP perfectly -Believe their experience allows them to predict behavior - results in an over-reliance on intuition and a reluctance to undermine one's own credibility by using a selection decision aid - Obvious remedy to the limitations of expertise is to structure expert intuition and mechanically combine it with other decision aids (e.g., paper-pencil measures)
Nowack & Mashihi (2012)
(360 FB) - Most motivated to use FB = (1) Conscientious (2) Achievement-oriented (3) Extraverted (4) High self-efficacy (5) Internal locus of control (6) LGO (7) Low anxiety - Leverage 360 FB by managing initial neg rxns and directing the awareness caused by FB toward goal implementation
Buster et al. (2005)
(Min Quals) - UG require validation of minimum qualifications, but not much literature on how to do this. Authors present a framework that held up in the courts/was judged as being consistent with UG. 1) Develop tentative MQ statements (Select SME sample- incumbents, supervisors, exclude probationary EEs or those with unsatis JP, include minorities & EEs from different areas; Meet with SMEs & provide list of qualifying KSAs, individually generate potential MQs, analysts monitor) 2) Develop/admin MQ questionnaire (bracket potential MQs, collect individual ratings of level of MQs necessary, linkage to KSAs, supplemental info) 3) Select MQs based on SME judgment of level (appropriate? Not too high/low?), SME agreement on appropriateness, link to KSAs, possible AI, SMEs' supplemental responses (licenses, coursework, substitutions etc) - Education and experience MQs were associated with less adverse impact than task-based MQs. Overall, process may produce less AI than traditional methods.
Arthur et al (2006)
(PO Fit) -PO Fit predicts attitudes and turnover. - Only predicts JP through attitudes, but the direct PO-JP relationship is minimal - P-O fit may not be appropriate for hiring/selection decisions, particularly without local validity evidence - May be more appropriate for posthire decisions (eg placement) -A meta-analysis found that P-O fit is not a good predictor of job performance. It is a better predictor of turnover. The relationship between P-O fit and job performance/turnover is partially mediated by attitudes.
Oswald & Hough (2008)
(Personality) - Be more proactive about preventing rather than detecting faking on personality tests--> (1) Explain test to applicant in terms of fit; (2) provide warning against faking; (3) examine test formats (forced choice) - Need to consider role of context---> Validities differ across situations
Tett & Christiansen (2007)
(Personality) - Pro-personality test reply to Morgeson et al. (2007a)- Validity evidence is strong enough to warrant use of personality tests even in the face of faking. Weak correlations are due to use of overly broad personality measures, failure to consider the situation, use of exploratory rather than confirmatory research, and failure to look at multivariate measures. Must use narrow, multivariate measures based on theory, consider the context/situational cues, and conduct confirmatory research - *Trait Activation Theory* suggests that measures should be situationally specific. More predictive if we consider situational cues - Faking weakens validity and isn't a good thing (there is no evidence that faking demonstrates social competence). Not everyone fakes, and faking does influence rank order
Oswald & Hough (2011)
(Personality) - FFM use may have a faulty theoretical basis. The HEXACO is slightly better, but both models combine facets into factors that are less useful than the individual facets themselves - Personality measures are generally valid because personality --> goals --> motivational states --> JP - Small to moderate subgroup differences (Agreeableness increases with age; men score higher on dominance, women on dependability) - Use proactive approaches to prevent faking - Alternative strategies to prevent faking = (1) forced choice, (2) conditional reasoning, (3) 3rd-party ratings
White et al (2008)
(Personality) - Personality tests have high utility in predicting military performance, even when validities are low -*Faking can lead to highly inflated test scores that have little to no criterion validity* - Faking & counterfaking research doesn't generalize to high-stakes testing environments--It is difficult to simulate the pressures of high stakes testing and the results of faking experiments can underestimate the score inflation observed in real settings - Be wary of generalizing between predictive & concurrent research design, & research vs. operational settings (i.e., military) - Need methodological research on how to transition personality tests to operational use. Context matters.
Ones et al (2007)
(Personality) Pro-personality test reply to Morgeson et al. (2007a) that (1) Personality tests are valid (r=.27), (2) Not ruined by faking (doesn't lower validity) (3) Better prediction when *CWBs, OCBs, interpersonal behaviors of getting along, teamwork included* (4) Compound measures of personality (integrity tests, customer service scales, etc.) are useful above and beyond GMA (5) Consider specific facets of personality factors (6) *Self + observer ratings* (likely produces validities that are comparable to the most valid selection measures) *A lot of research supports the use of personality testing for selection purposes. There is little evidence that response distortion ruins validity.*
Morgeson et al. (2007b)
(Personality) Reply to the replies. Personality tests are likely relevant for understanding work behavior (research context), but aren't good predictors of job performance and should not be used in high-stakes decisions (NOT in selection) - When *predictive* validation studies are conducted with actual job *applicants* where independent criterion measures are collected, *observed* (uncorrected) validity is very low and often close to zero - Must use observed/uncorrected validity b/c orgs don't correct - *Reactions to personality tests are among the least favorable* The use of personality testing for selection is still criticized because observed validities are consistently low; the value of personality tests for selection should be determined by relating them to job performance (not other criteria), and it is the observed, uncorrected validities that matter.
Hough & Oswald (2008)
(Personality) Review of personality research (1) *Look beyond FFM*- doesn't include all relevant pers variables, lose predictive ability by looking at broad constructs, facets better left separate; (2) *Maintain construct-method distinction*- meta-analysis is a problem; interviews can measure multiple constructs; need more research separating effects of personality from self-report (examine validity of non-self report pers measures) (3) *Examine situation* as moderator- validities are higher when the measures are based on a job analysis and the item content is contextualized; validities vary between context (4) *Incremental validity over GMA* (5) *Low group diffs & AI* (6) *Faking*: Hard to fix, associated with social desirability, lab studies overestimate prevalence, SD doesn't weaken validity
Morgeson et al. (2007a)
(Personality) 5 former journal editors comment on the use of personality testing for selection. Conclude that faking is common on personality tests because the right answer is apparent to applicants. *Faking on self-report personality tests cannot be avoided and perhaps is not the issue (might not influence validity, might be good/indicative of social awareness); the issue is the very low validity of personality tests for predicting job performance.* - Might add some incremental validity to a battery of cognitive tests, but they should never replace them -Don't use off-the-shelf published measures. If you use personality, use customized measures that are clearly job-related and face valid - Maybe try bogus items or forced choice - Corrections for faking do not improve validity
Gebhardt & Baker (2010)
(Physical ability testing) - Overview of physical ability tests- Different types of physical ability can be tested (muscular strength, endurance, aerobic, anaerobic, flexibility, equilibrium, coordination) - Historically valid (content and criterion) - Differential passing rates for men and women (typically result in AI against women) - Physical tests are seen as a means to reduce injuries, which in turn reduces work time and productivity lost - *Make plans for testing with legal issues in mind: offer practice tests, pre-employment fitness programs, use compensatory scoring, regress out body fat and muscle mass to ID only relevant variance*
Ployhart & MacKenzie (2011)
(SJTs) - Big review shows SJTs (multidimensional measurement methods): (1) Moderate criterion-related validities (.20-.23) (2) Face valid (3) Content valid (rely on critical incident technique) (4) Tend to exhibit smaller subgroup differences than GMA (but higher than personality) - Subgroup diffs: White>Blacks (d=.40), Hispanics (d=.37), Asians (d=.47). Women>Men; may depend on cognitive loading and may get bigger after controlling for unreliability (internal consistency) - Overview of development (generate stems & options w/ critical incidents or a theoretical basis) - Consider stem complexity (influences subgroup diffs), fidelity (MMSJTs), response instructions, job content/complexity - More complex/detailed stems = higher reading requirements --> subgroup diffs - Higher fidelity (eg multimedia SJTs) --> Increased face validity and slightly reduced subgroup diffs - Nearly all literature based on incumbent samples
McDaniel et al (2007)
(SJTs) -Response instructions influence the constructs measured by SJTs. *Knowledge instructions* (Evaluate the effectiveness of possible responses) are more strongly correlated with cognitive/knowledge constructs; *Behavioral tendency instructions* (How would you behave?) are more strongly related to personality constructs - Both valid, but knowledge has higher validity- response instructions had little moderating effect on criterion-related validity (SJT r = .20) - Situational judgment tests have incremental validity over cognitive ability, the Big 5, and over a composite of cognitive ability and the Big 5 - Knowledge may be more susceptible to response distortion
Kehoe & Olson (2005)
(Cutoffs) - Describes the treatment of cutoff scores in employment litigation and compares legal considerations to features of common methods for setting cutoff scores. - Cut score framework: (1) *Purpose* (2) *Relevance* (3) *Threshold* level of desired work behavior/corresponding test score (min qual vs. business relevance interpretations) (4) *Certainty desired* (ability to distinguish candidates who are likely to perform well from those who aren't vs. exclusion of candidates who would succeed OTJ) (5) Tradeoffs between expected level of work vs. other interests (AI vs. employment process costs) - Court rulings highlight several principles likely to apply in Title VII cases: (1) Demonstrate the cut score is related to business value (e.g. upgrading the workforce) (2) May need to show relevance to a minimum qualification standard (especially in licensure cases) (3) Involve experts (4) Can set the cut score higher than some incumbents' scores if you demonstrate those incumbents are below JP standards (5) Use professional expertise if you want to adjust according to employment-relevant factors (cost, fairness, goals etc) (6) Post hoc reductions of cut scores to avoid AI not likely to be supported w/o a rationale other than litigation avoidance. - Different strategies for setting will depend on the context/availability of work & test info: *SEDiff banding*- Norm-referenced method generally accepted by courts where bandwidths are defined as 2x SEDiff; *Criterion banding*- Define bandwidths in terms of practical indifference; candidates within a band are expected to perform similarly once hired - Cut scores cannot be validated, but we can provide evidence that people with the target score generally perform well by demonstrating JP associated w/ the cut score and demonstrating JR w/ WMT content judgments
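A worked sketch of the SEDiff banding arithmetic just described (the SD, reliability, and top score are illustrative; the 2x multiplier follows the note above):

```python
import math

def sed_band(sd: float, reliability: float, top_score: float, c: float = 2.0):
    """Return (SEM, SED, band floor): scores from top_score down to the floor
    are treated as not meaningfully different."""
    sem = sd * math.sqrt(1 - reliability)   # standard error of measurement
    sed = sem * math.sqrt(2)                # standard error of the difference between two scores
    return sem, sed, top_score - c * sed

# e.g., SD = 10, rxx = .85, top score = 95 -> band from 95 down to roughly 84
print(sed_band(10, 0.85, 95))
```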
Cascio et al. (1988)
(Cutoffs) - Cutoffs are set somewhat arbitrarily *Guidelines for setting cutoffs:* (1) Courts/law do not specify a "best" method (2) Cutoff should be consistent with JA results (also include the level of KSA needed to perform effectively in the JA to inform cutoff setting) (3) How the test is used affects selection and the meaning of a cutoff score (eg MH) (4) Provide as much evidence as possible (SEs of measurement, validity, rationale for cut score) to justify decisions (5) Set high enough to ensure that minimum standards of the job are met--setting it too low to avoid AI destroys test credibility--while still permitting selection of qualified applicants (6) Cutoff should be consistent with normal expectations of acceptable proficiency within the workforce (UG) - Ideally (when reliable/accurate predictor & criterion data on applicants are available), base a norm-referenced cutoff score on the base rate (# of currently successful EEs) - When possible, just use top-down selection & avoid setting a cutoff - Overview of methods: Norm-referenced (method of predictive yield); Expert judgment (Angoff, Ebel, Nedelsky); Utility analysis; Contrasting groups - Angoff well-received in courts *In setting cutoffs, the best that can be hoped for is that the basis be defined clearly and the definition be rational*
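A minimal sketch of the Angoff arithmetic referenced above (the ratings are hypothetical): each SME estimates the probability that a minimally qualified candidate would answer each item correctly, and the cut score is the average of the SMEs' summed ratings.

```python
# Hypothetical per-item probability judgments from two SMEs for a 10-item test.
ratings = {
    "sme_1": [0.7, 0.6, 0.8, 0.5, 0.9, 0.6, 0.7, 0.8, 0.5, 0.6],
    "sme_2": [0.6, 0.7, 0.7, 0.6, 0.8, 0.5, 0.8, 0.7, 0.6, 0.7],
}

per_sme_cuts = [sum(probs) for probs in ratings.values()]   # expected raw score per SME
angoff_cut = sum(per_sme_cuts) / len(per_sme_cuts)          # average across SMEs
print(per_sme_cuts, angoff_cut)                             # [6.7, 6.7] -> cut score of about 6.7/10
```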
O'Boyle et al (2010)
(EI) - Of the 3 streams, measuring EI as an ability accounted for the largest variance in JP-- only beaten by conscientiousness and GMA-- May be a good option for selection. Also more objective and less fakable. - EI related to personality-positively related to E, O, A, C, and cognitive ability, and neg. rel. to N - When you measure emotional intelligence as an ability, it may overlap with cognitive ability (probably shouldn't include both in the same battery--My note), and can be viewed as an intelligence test, rather than personality. (My note-- might have implications for AI)
Schmidt et al (2007)
(GMA) - Reply to critiques of Le et al (2007) - Utility equation does NOT require assumption of normality, but DOES require linear predictor-JP rel & that applicants are selected in a top-down fashion - Any other strategy (non top down) for selection (eg random selection above a cut score) will result in reduced utility (utility equation will overestimate econ gains) -GMA is more valid than years of education - Validity estimates do not change, even when we correct for range restriction - GMA isn't the only useful tool, but the most valid one - Meta-analysis is useful
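The utility equation at issue here is typically the Brogden-Cronbach-Gleser form; a hedged sketch with illustrative numbers (not figures from the article):

```python
def bcg_utility(n_selected, tenure_years, validity, sd_y, mean_z_selected,
                n_applicants, cost_per_applicant):
    """Delta-U = Ns * T * r_xy * SDy * (mean standardized predictor score of selectees)
    minus total testing cost."""
    gain = n_selected * tenure_years * validity * sd_y * mean_z_selected
    return gain - n_applicants * cost_per_applicant

# e.g., 10 hires staying ~5 years, r = .51, SDy = $15,000, mean z of hires = 1.0,
# 100 applicants tested at $50 each -> roughly $377,500 in expected gains
print(bcg_utility(10, 5, 0.51, 15_000, 1.0, 100, 50))
```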
Le et al. (2007)
(GMA) "Meta-analysis is awesome" - Meta-analysis is an important/useful tool that can help unite science and practice. It allows for the comparison of predictive power of different selection methods and for translating conflicting results from single studies into conclusive and credible answers - Offers measurable value to orgs-Used in utility analyses to determine the economic value of a selection method. - Best prediction of JP & Utility (medium complexity jobs) in order: (1) combination, (2) general mental ability, (3) structured interviews, (4) personality *Practitioners are often reluctant to use good tools from I/O research because of inconsistencies in the research and concerns about the generalizability of the results. Meta-analysis is a useful tool for both researchers & practitioners because it yields generalized results which can yield large gains in selection settings & can be used to calculate utility. It's also accepted by courts.*
Biddle (2010)
(General validity) *Validity Generalization* - Uses meta-analysis to combine the results of several similar research studies to form general theories about relationships between similar variables across different situations - Local criterion-related validity studies are more legally defensible. Fundamental disconnect between the goals of VG and Title VII. - Only use VG as a supplement. Unlikely to hold up against the Uniform Guidelines and Title VII. - *6th Circuit EEOC v. Atlas Paper case*- VG is at odds with Griggs v. Duke Power and Title VII at its core
Arthur & Villado (2008)
(General validity) - Research needs to maintain the construct-method distinction. Failure to do so (e.g., in meta-analysis and validity generalization) results in inaccurate/inconsistent/misleading findings and uninterpretable research (e.g., subgroup differences) - One of the reasons why claims that things like work samples and assessment centers have less AI than cognitive ability tests may be suspect. Research that supports this claim often fails to disentangle whether these effects occur as a result of a construct change or a method change
Cohen, Aamodt, & Dunleavy (2010)
(Legal) - Summary of "best practices" from committee AI experts (no one completely agreed with every point) - Multiple methods of adverse impact detection should be used. The more used, the more confident analysts, lawyers, and decision makers can be in judging the meaningfulness of a disparity. -Minimize redundancy/combine only methods that each add unique information. -*Bottom-line analysis* = whether whole selection produces AI -*Step/component analysis*- AI from a component; Usually necessary if BL analysis reveals AI/someone claims that a component is discriminatory - Most decisions (type of AI analysis, whether to aggregate, whether AI detection is legally actionable, etc.) depends on the context and the nature of the hiring process - Distinction between job seekers and applicants. Only applicants should be calculated in AI analyses. - Only count multiple applicants once - 4/5ths rule poorly accepted bc poor psychometric properties
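Two of the detection methods mentioned above, sketched with made-up counts: the 4/5ths (80%) impact ratio and a two-proportion significance test.

```python
from statsmodels.stats.proportion import proportions_ztest

def impact_ratio(sel_min, app_min, sel_maj, app_maj):
    """Minority selection rate divided by majority selection rate; < 0.80 flags potential AI."""
    return (sel_min / app_min) / (sel_maj / app_maj)

hires = [20, 60]        # minority, majority hires (hypothetical)
applicants = [80, 120]  # minority, majority applicants (hypothetical)

print(impact_ratio(20, 80, 60, 120))          # 0.25 / 0.50 = 0.50 -> fails the 4/5ths rule
print(proportions_ztest(hires, applicants))   # z statistic and p-value for the rate difference
```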
Van Iddekinge & Ployhart (2008)
(General validity) - Review of the criterion-related validity literature (validity coefficient corrections, multiple predictors, differential prediction analyses, validation sample characteristics, criterion issues) -Sample validity coefficients can differ from those in the population because of statistical artifacts (e.g., indirect & direct RR). Can be corrected, but only if we don't violate assumptions (e.g., no indirect RR, homoscedasticity holds, linear predictor-criterion relationship) - For multiple predictors, make sure each is making a unique contribution - Assertions that there is no differential prediction for cognitive ability testing may be unfounded-- research may be biased because of statistical artifacts (power, RR, subgroup sample sizes, predictor-subgroup correlations) - Validation study results depend on study design (predictive vs. concurrent) and sample (incumbents? repeat applicants?) - Selection is moving towards expanding the criterion domain (contextual performance, dynamic criteria) and debates about broad vs narrow criteria (best predicted by a correspondingly broad or narrow predictor)
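For the corrections referenced above, a sketch of the standard textbook formulas (not code from the article; the example values are arbitrary):

```python
import math

def correct_attenuation(r_obs: float, ryy: float) -> float:
    """Correct an observed validity for criterion unreliability (ryy)."""
    return r_obs / math.sqrt(ryy)

def correct_direct_rr(r_obs: float, u: float) -> float:
    """Thorndike Case II correction for direct range restriction;
    u = SD(unrestricted) / SD(restricted) on the predictor."""
    return (r_obs * u) / math.sqrt(1 + r_obs**2 * (u**2 - 1))

# e.g., observed r = .25, criterion reliability = .60, u = 1.5
r = correct_attenuation(0.25, 0.60)
print(r, correct_direct_rr(r, 1.5))
```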
Schmidt & Hunter (1998)
(General validity) - We want predictors that are (1) Valid, (2) Provide high incremental validity, & (3) High utility - Advocate for use of GMA because (1) Highest validity (.51) & lowest application cost (2) Strong research foundation for validity (3) Stronger research basis for definition and development of the construct (what we're actually measuring) (4) Best available predictor of job related learning (.56) - GMA+Integrity (.65), GMA + Structured interview (.63) GMA+Work Sample (.63). Integrity combo better bc low correlation with GMA. - Situational specificity hypothesis has been disproven - Work samples had a higher validity (.54) compared to GMA but have much higher application costs. Structured interviews validity = .51 (but higher cost)
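The composite validities above follow from the standard two-predictor multiple-R formula; a small sketch (the near-zero GMA-integrity correlation is an assumption used for illustration):

```python
import math

def multiple_r(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation of a criterion with two predictors whose validities
    are r1 and r2 and whose intercorrelation is r12."""
    return math.sqrt((r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2))

# GMA (r = .51) plus an integrity test (r = .41) assumed to be roughly uncorrelated with GMA
print(round(multiple_r(0.51, 0.41, 0.0), 2))   # ~0.65, in line with the GMA+Integrity figure above
```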
Huffcutt & Culbertson (2011)
(Interviews) - Overview of interviews - When structured effectively, interviews are more reliable and reach validity levels comparable with GMA - Critical incident technique vs. developing questions directly from KSAs -Construct validity is an issue - Interviews are good predictors, but it isn't always clear to what degree they capture specific job-related KSAOs vs. general constructs - Impression management and interviewer biases can influence ratings - (1) Structure the interview based on KSAOs and critical incidents, (2) Train interviewers, (3) Use behavioral and situational questions
Schippmann (2010)
(JA Test creation) *Competencies*=Measurable, org relevant, behaviorally-based capabilities; more specific & behavioral than KSAs *Competency modeling*= ID broadly applicable individual capabilities - CM allows for clear link to business strategy and a greater emphasis on org change
Morgeson & Dierdorff (2011)
(JA Test creation) *Work analysis*-Investigation of work role requirements & the broader context in which roles are enacted; broader than job analysis, focus on roles rather than jobs/tasks/responsibilities - Less static conceptualization of jobs, considers broad org context - Encompasses competency modeling - Sources of variance: (1) Rater (cognitive ability, personality, work experience, performance level of workers) (2) Social & cognitive influences (conformity pressures, extremity shifts, motivation loss, impression management, social desirability, demand effects, information overload, heuristics, categorization) (3) Contextual influences
Doverspike & Arthur (2012)
(JA Test creation) - Test development 1st step=JA, produces *job narrative* (MWBs & Tasks, KSAOs, MQs, Linkage w/ work behaviors) 1) Review info (Professional/scientific lit, JDs, training) 2) 1st JD draft (all but linkage) & get SME FB 3) Interviews (Semi-structured; review/develop/refine statements), On-site visit/obs; 2nd draft dist for SME FB 4) Create linkage questionnaires (Task importance & frequency, Task-KSAO linkages, Need upon entry) 5) Distribute/analyze questionnaire (all incumbents, include supervisors, identifying info); use data for final JD 6) ID/define constructs to be measured (lit review, JD, combos of KSAOs) 7) Identify measurement methods (consider reliability, validity, usability, cost, practicality, subgroup differences; Local or off the shelf?; Determine weights) 8) Reverse linkage questionnaires (validity evidence linking test & JA info; Rating extent item/test captures KSAO/Task statement) 9) Distribute RLQs to SMEs (consider security)/analyze reverse linkage questionnaires 10) Final test specification plan (link test to tasks/KSAOs, review items) 11) Determine cutoffs (Angoff, questionnaire w/reverse linkage Q) & weights (unit-weights or based on importance ratings) Common JA Mistakes = Analyzing person rather than job; Too little detail (need both tasks & KSAs); Test security - Also covers item development *Also in Intro/Ethics*
Barrett, Miguel, & Doverspike (2011)
(Legal) - Agree with McDaniel et al (2011)- UG inconsistent w/ sci/prac & AI shouldn't trigger legal action, but UG isn't the problem. UG never intended to be scientific (just a framework for companies). Main problem=disparate impact (DI) theory (result of Civil Rights Act), not UG. -Likely to find DI b/c: (1) Numerous stat. procedures & (2) Subgroup differences/unequal distribution of KSAs in population - DI cases: (1) Plaintiff must show evidence of DI w/ the 80% rule or a stat sig test. (2) Defendant demonstrates test validity (3) Even if defendant succeeds, plaintiff just needs to find an equally valid alternative with less adverse impact
McDaniel, Kepes, & Banks (2011b)
(Legal) - Group diffs in KSAs exist, will continue to exist, and need to be addressed -Best way to reduce mean racial differences in JP=high-validity selection procedures without consideration of race - Acknowledge that changing UG wouldn't help - Concede that the disparate impact theory of discrimination is the main problem, not the UG - Selection procedures do not cause mean racial differences-- they just measure existing differences.
Arthur et al (2013)
(Legal) - Impossible to guarantee adverse impact reduction/elimination. This includes guarantees based on the inclusion of "alternative" selection devices (which have been purported to help reduce AI). - What we can guarantee are sound/valid tests/assessment decisions that can be defended. - Subgroup difference reduction and AI reduction require fundamentally different strategies - *Subgroup diff techniques* = pre-test: (1) Remove internal bias (2) Improve perceptions (3) Pretest coaching - *AI reduction* = post-test: (1) cut score (2) weights (3) banding (maybe illegal) - AI is situationally specific (depends on hiring rate, proportion of minority applicants, variance of subgroup scores, score skewness, test-taker motivation), so strategies may work in some situations but not others (see the sketch below) - Statistically likely to find AI (too many subgroups, applicants, stat. procs) - AI doesn't necessarily mean discrimination *Also in Intro/Ethics*
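A quick illustration of that situational specificity (all values assumed): holding the subgroup d constant, the impact ratio still changes as the cut score (and hence the hiring rate) changes.

```python
from statistics import NormalDist

def impact_ratio_at_cut(d: float, cut_z: float) -> float:
    """Majority scores ~ N(0,1), minority scores ~ N(-d,1); cut_z is the cut score in z units."""
    pass_maj = 1 - NormalDist(0, 1).cdf(cut_z)
    pass_min = 1 - NormalDist(-d, 1).cdf(cut_z)
    return pass_min / pass_maj

for cut in (-1.0, 0.0, 1.0):                            # lenient -> strict cut scores
    print(cut, round(impact_ratio_at_cut(0.5, cut), 2)) # with d = .5: ~0.82, ~0.62, ~0.42
```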
Campion et al (2001)
(Weights/Banding) - Extensive disagreement exists regarding whether banding should be used in selection procedures. - Some argue that there are meaningful differences in performance within bands and that banding ignores these differences; others argue that the differences are arbitrary - May help reduce AI & increase diversity, but might be a legal risk because of minority preference, and might undermine the meaningfulness of testing. -No court decisions have outlawed banding & the premise and logic of banding have been upheld; however, what has been successfully challenged is how candidates are selected from within the band--giving minorities preference is of questionable legality - Authors support the use of banding, but disagree on how - *If banding is NOT used with minority preference, then it's OK. Courts appear to reject banding w/ systematic minority preferences. Banding is psychometrically sound.* - The Civil Rights Act of 1991 led to an increase in banding because it forbade subgroup norming. Banding can reduce adverse impact because it treats everyone within a band the same way, but it may be hard to get people to believe it's fair
Bobko et al. (2007)
(Weights/Banding) Evaluate use of different weighting systems (Regression, archival expert info, expert judgments, unit) - Unit weights have substantial predictive ability when compared to regression weights and other methods - as useful, sometimes more - Unit weights do *not* lead to higher adverse impact -Weighting systems should *not* be used as a mechanism for reducing AI (i.e. assigning highest weights to tests with least AI). - Exceptions to their support of unit weights: (1) When underlying latent weights differ substantially (2) When reactions need to be considered (i.e. when experts have a strong argument regarding a certain portion of test battery, or when trying to increase SME ownership/involvement)
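A small simulation sketch of the unit-weights point (all parameters assumed): when predictors are standardized and the true weights are not wildly different, a simple sum predicts about as well as regression weights.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 300, 4
X = rng.normal(size=(n, k))
y = X @ np.array([0.4, 0.3, 0.3, 0.2]) + rng.normal(size=n)   # criterion with fairly similar true weights

Xz = (X - X.mean(axis=0)) / X.std(axis=0)
unit_composite = Xz.sum(axis=1)                               # unit weights: just add the z-scores

design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
reg_composite = design @ beta                                 # regression-weighted composite (in-sample)

print(np.corrcoef(unit_composite, y)[0, 1])                   # unit-weight validity
print(np.corrcoef(reg_composite, y)[0, 1])                    # regression-weight validity
```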
Bobko et al. (2005)
(Work Sample) - Many suggest that WS produce less AI than other methods, but these are based on studies that do not differentiate between method and construct, and studies that use job incumbents (RR on d estimates) - Study compensated for these problems and found that d was significantly larger than previously thought (closer to .71, which is comparable with Wonderlic GMA test), suggesting higher levels of AI. - WS might not be a good alternative to cognitive ability tests - can have as much or almost as much AI as cognitive tests
Roth et al. (2008)
(Work Sample) - Work sample literature subject to major limitations-> Indirect RR in incumbent samples causes us to underestimate WS group differences. The benchmark for black-white group differences (effect size d) is .38 (from previous studies), but it looks like it's more like .73. - Values of d can depend on exercise type (Technical and in-basket exercises (ds in .70s) > role plays and oral briefings (.20 ish) and construct saturation (with cognitive ability)-- highly saturated = .80; low =.20 - There is reason to believe that Work Samples (WS) produce higher levels of AI than previously thought
Marie & Tippins (2010)
* (Global jobs) - Global assessments help (1) Maintain consistency in quality of hires, (2) Identify EEs capable of working in multiple locations, (3) Help maintain a consistent staffing brand image, BUT they're complicated to implement - Make sure jobs equivalent across locations - Maintain test equivalence (are items understood/interpreted correctly given cultural/contextual differences?) - Consider differences in their legal system
Bott et al (2007)
* (Personality) - Applicants DO fake/inflate scores - Significant personality score differences between incumbent & applicant samples and it DOES matter -Social desirability-(+)->Personality composite --- Implies that we should correct personality scores based on SD scores - Setting cutoffs based on incumbent data leads to the underestimation of the number of applicants who will pass. This has major implications for the cost and time spent in future screening - Other suggestions = (1) warnings; (2) use situational judgment rather than Likert format
Vinchur et al (1998)
* (Special jobs) *A meta-analysis revealed that the best predictors of objective and subjective job performance for salespeople include (1) personality (potency & achievement), (2) sales ability tests, (3) interest inventories, and (4) biodata inventories.* -There are aspects of sales jobs (greater degree of autonomy & rejection) that make unique demands on EEs & may contribute to a pattern of validity coefficients different from other jobs -Cognitive ability and age predict some criteria well (sales), others not so much (performance ratings)
Cascio & Boudreau (2011)
* (Utility) Supply-chain analysis - Aim to optimize investments at each step of the staffing process by using an integrative framework of selection, placement, performance management, onboarding, retention etc (instead of looking at each independently) - Allows for diagnosis of which areas can be improved to maximize supply-chain decisions
Kravitz (2008)
*(Affirmative action) - Discuss nonpreferential forms of affirmative action. The diversity-validity issue can partially be eliminated by using attraction, selection, inclusion, and retention techniques to increase employment of targeted groups. - Other forms of AA are almost always illegal - Need to attract more minorities to increase their representation in the workplace (Attraction-Selection-Inclusion-Retention of minority EEs)
Dean et al. (2008)
*(Assessment centers) - ACs associated with more AI than originally thought, possibly because of overlap with GMA - Black-white d = .52, Hispanic-white = .28, Male-female = -.19 (F>) - ACs may be associated with more AI against blacks than is portrayed in the literature, but may have less AI and can be more diversity-friendly for Hispanics and females
Arthur & Day (2011)
*(Assessment centers) - Overview of assessment centers (methods used to collect info regarding multiple behavioral dimensions) - Include in-baskets, leaderless group discussions, 1:1 role plays, oral presentations, written case analyses - (1) high validity, (2) high utility, (3) favorable participant reactions, but (4) content validity has been debated
Berry et al. (2007)
*(Personality) -Integrity tests measure a combination of C+A+N+ honesty (HEXACO) - Measures a lot of C, which is why it predicts JP well - *Predicts CWB better than FFM* - Faking can be a problem but might be able to detect it with response times - *Applicants may react negatively*
Foldes et al. (2008)
*(Personality) Overall, racial differences in personality factors are negligible and won't result in adverse impact. However, some comparisons (e.g., Blacks-Asians) and subfacets could result in adverse impact at the factor or facet level.
Cascio & Aguinis (2011)
Selection Textbook