MGMT 485 Exam 1
measurement in HR
"involves the systematic application of rules for assigning numbers to objects (usually people) to represent the quantities of a person's attributes or traits" Measurement allows us to (attempt to) categorize these differences Hopefully, we have an idea of which differences are related to the subsequent job performance of those individuals selected from the applicant pool People are different
precision
"the number of distinct scores or gradations permitted by the selection procedure and criterion used" Much of what we do in HR results in categorization (dichotomization) of people even when we use scales with more precision Do not offer a job vs. offer a job Do not promote vs. promote It is better to use tools/scales that have more precision
task performance
(old thinking) how well workers completed job activities (commonly referred to as job tasks) was how well they performed on their jobs. Production data consist of the results of work: things that can be counted, seen, and compared directly from one worker to another. Other terms that have been used to describe these data are output, objective, and nonjudgmental performance measures (in contrast to judgmental data).
selection program development steps
1. job analysis 2. identification of relevant job performance dimensions 3. identification of work-related characteristics (WRCs) necessary for the job 4. development of assessment devices to measure WRCs 5. validation of assessment devices (content, criterion) 6. use of assessment devices in the processing of applicants
Job Adaptability Inventory (JAI)
A job analysis method that taps the extent to which a job involves eight types of adaptability: handling emergencies or crisis situations; handling work stress; solving problems creatively; dealing with uncertain and unpredictable work situations; learning work tasks, technologies, and procedures; demonstrating interpersonal adaptability; demonstrating cultural adaptability; and demonstrating physically oriented adaptability.
percentiles
A percentile compares a score to all the other scores on the test; a percent compares the score to the test itself. Someone with a 50% got half the questions right (a 60% means 60 right out of 100). Someone with a score at the 50th percentile did better than half of the people who took the test. A 50% may be a really good score on a test that is really hard, or it may be a terrible score on an easy test.
reliability and validity
A test can be reliable but not valid; a test that is not reliable cannot be valid. r_xy = sqrt(r_xx * r_yy), where r_xy = maximum possible correlation between predictor X and criterion Y (the validity coefficient), r_xx = reliability coefficient of predictor X, and r_yy = reliability coefficient of criterion Y. Example: r_xy = sqrt(.85 * .52) = .66 < keep in mind, this is the maximum
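A minimal Python sketch of this reliability ceiling (not from the course materials; the .85 and .52 values are simply the example above):

```python
import math

def max_validity(r_xx: float, r_yy: float) -> float:
    """Upper bound on the predictor-criterion correlation, given the two reliabilities."""
    return math.sqrt(r_xx * r_yy)

# Example values from above: predictor reliability .85, criterion reliability .52
print(round(max_validity(0.85, 0.52), 2))  # 0.66 -- the ceiling, not the expected validity
```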
validation strategies
A validation study provides the evidence for determining the legitimate inferences that can be made from scores on a selection measure. Most often, such a study is carried out to determine the accuracy of judgements made from scores on a predictor about important job behaviors as represented by a criterion
Test-Retest Reliability Estimates
Administer the measure twice and then correlate the two sets of scores using the Pearson product-moment correlation: r = sum_i[(x_i - x_bar)(y_i - y_bar)] / [sqrt(sum_i(x_i - x_bar)^2) * sqrt(sum_i(y_i - y_bar)^2)]. Example: give a math ability test to applicants on one occasion, then give it to them again 8 weeks later. Considered by some to be the best estimate of reliability. Not frequently used.
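A short Python sketch of the test-retest idea; the scores and the pearson_r helper are hypothetical illustrations, not course material:

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical math-ability scores for the same six applicants, 8 weeks apart
time_1 = [82, 75, 90, 68, 77, 85]
time_2 = [80, 78, 88, 70, 74, 86]
print(round(pearson_r(time_1, time_2), 2))  # the test-retest reliability estimate
```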
issues with internal consistency
Alpha, specifically, is the lower bound estimate for internal consistency reliability. That is to say, unless the items are tau equivalent, α will be lower than the real internal consistency reliability. Internal consistency estimates are affected by test length. Shorter tests should have lower reliability than longer tests (if you take more shots at the target, you're more likely to hit the bullseye)
predictive validity
Also called the "future employee method" or "follow-up" method 1. Conduct a Job Analysis 2. Determine the relevant WRCs required to perform the job successfully 3. Chose or develop the experimental predictors of those WRCs 4. Select or develop criteria of job success 5. Administer predictors to current employees and file results 6. After passage of a suitable period of time, collect criterion data 7. Analyze predictor and criterion data relationships differences from text 5. Administer predictors to job applicants 6. Do Not Use the predictor data from job applicants to make selection decisions 7. Analyze predictor and criterion data relationships for those job applicants who were hired at a later date Note that this is a description of one way of conducting a predicative validity study Strengths and weaknesses: 1. Must have large sample sizes of people willing to participate (think more than 100) 2. Must control for differences tenure 3. Must consider representativeness of present employees and job applicants 4. Certain people might not elect to participate 5. Motivation of those who do participate may be lacking 6. Time gap between when the predictor and criterion data are collected can cause problems
concurrent validity
Also called the "present employee method," "local validity," or "direct validity" 1. Conduct a Job Analysis 2. Determine the relevant WRCs required to perform the job successfully 3. Chose or develop the experimental predictors of those WRCs 4. Select or develop criteria of job success 5. Administer predictors to current employees and collect criterion data 6. Analyze predictor and criterion data relationships Note that this is a description of the ideal concurrent validity study Strengths & Weaknesses 1. Must have large sample sizes of people willing to participate (think more than 100) 2. Must control for differences tenure 3. Must consider representativeness of present employees and job applicants 4. Certain people might not elect to participate 5. Motivation of those who do participate may be lacking
developing selection measures
Analyze the job for which a measure is being developed; select the method of measurement to be used; plan and develop the measure; administer, analyze, and revise the preliminary measure; determine the reliability and validity of the revised measure for the jobs studied; implement and monitor the measure in the human resource selection system.
characteristics of selection for an initial job
Applicants are external to the organization. They are commonly students, people who have recently completed an education, those who are currently not employed, or those who are employed at other organizations. Applicants are recruited through formal mechanisms such as media advertisement, Internet contact, employment agencies, and suggestions of present or former employees of the organization. These recruitment mechanisms frequently produce a large number of applicants, especially when jobs are in short supply. When there is a large number of applicants, the costs of selection become an important factor for an organization. Frequently, this number is reduced drastically by a brief selection instrument, such as an application form that collects only limited information. Only a small number of applicants complete additional selection instruments that gather more extensive information. These remaining applicants go through a formalized program that has a series of steps such as interviews, ability tests, and job simulations. Decisions about to whom to extend employment offers also are formalized. Either statistical analysis is used or multiple people meet to discuss the candidates and identify those who are offered positions.
ordinal
Can be rank ordered, but the degree of difference in the rankings cannot be quantified. Restaurant rankings (how much better is a 3-star Yelp review than a 2-star Yelp review?). Properties: labeled, meaningful order.
characteristics of selection for promotion
Candidates are already internal to the organization; that is, existing members of the organization compete for a position. A limited number of recruitment techniques are used, for example, postings of job vacancies either online or on bulletin boards, announcements by HR specialists or managers of the organization, and requests for nominations including self-nominations. Often no formal recruitment techniques are used. One or a small group of managers identify a small number of individuals who are thought to be able to do the job. Frequently these individuals do not even know they are being considered (are actual applicants) for the job. Because the applicants are members of the organization, there is already a great deal of information about them, such as performance reviews, training records, work history, records of attendance, reprimands, awards, and so on. Few formal selection instruments are used. Often the evaluation of applicants is not formalized—that is, the decision makers make the decision about whom to promote based on subjective decision making. As we will explain many times, we do not agree with such subjective selection decisions. Actually, we hate them.
what should small organizations do?
Cannot use traditional criterion-related validity studies because sample sizes (number of current employees) are too small. Use content validation. If you can group enough jobs together to get a large sample size, then use component/synthetic validity. Meta-analytic results can be useful; just be careful and back up your use of a tool with local/content validation.
Interpreting Validity Coefficients
Cohen in the 1980s defined correlations as: small = |.10| to |.29|, medium = |.30| to |.49|, large = |.50| or greater. Percent variance explained = r^2 * 100; for example, .3^2 * 100 = 9%. When correlations are low: we are not predicting very much variance in the criterion, and we are likely wildly misclassifying most job applicants.
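A quick calculation of percent variance explained at Cohen's three benchmarks, assuming the r^2 * 100 rule above:

```python
# Percent of criterion variance explained: r^2 * 100
for r in (0.10, 0.30, 0.50):
    print(f"r = {r:.2f} -> {r ** 2 * 100:.0f}% of variance in the criterion explained")
# prints 1%, 9%, and 25% for small, medium, and large correlations
```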
Cross-Validation
Collect data from two samples to see if the results (see below) are the same < considered better than the more commonly used method outlined below More commonly: We collect data from a big sample Split it in half randomly (a "weighting group" and a "holdout group") Compute a regression equation in the weighting group Use the regression equation in the weighting group to predict scores in the holdout group Compute the correlation between the predicted and actual performance in the holdout group. If this correlation is significant then the regression equation is useful
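A rough Python sketch of the more common split-sample approach described above; the simulated data, sample size, and variable names are hypothetical, not from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predictor and criterion scores for 200 people
n = 200
predictor = rng.normal(50, 10, n)
performance = 0.4 * predictor + rng.normal(0, 8, n)

# Split randomly into a "weighting group" and a "holdout group"
idx = rng.permutation(n)
weight_idx, holdout_idx = idx[: n // 2], idx[n // 2:]

# Compute the regression equation in the weighting group only
b, a = np.polyfit(predictor[weight_idx], performance[weight_idx], deg=1)

# Use that equation to predict scores in the holdout group, then correlate
predicted = a + b * predictor[holdout_idx]
cross_validity = np.corrcoef(predicted, performance[holdout_idx])[0, 1]
print(round(cross_validity, 2))  # cross-validated correlation between predicted and actual
```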
Content Validity
Content Validation - General Information: also called "local" or "indirect" validity. It is the best option with small samples (small businesses) and can be used when attaining criterion data is impossible. Content-valid selection tools are often viewed favorably by job applicants.
Content validation includes sampling behavior from the actual job, but may also include knowledge, skills, and abilities that are necessary for effective work performance. The behaviors and other WRCs are referred to as the job content domain. The closer and more realistic the match, the better: the measure's content must be representative of the requirements for successful job performance. Important thing to consider: it differs greatly from most other validation concepts because it relies almost exclusively on expert judgment.
Major aspects of content validation:
1. Conduct a comprehensive job analysis: describe the tasks performed on the job, measure the criticality or importance of the tasks, specify the WRCs required to perform critical tasks, measure the criticality of the WRCs, and link important job tasks to important WRCs. (Pretend that the job analysis has been done; your job is to evaluate their work.)
2. Select experts (SMEs) to participate in a content validation study (job incumbents, supervisors; they must be qualified; use a diverse group).
3. Specify selection procedure content < this overlaps with job analysis. You will probably use SMEs for this, but they may not be the same SMEs who conduct the content validation work. Consider the selection procedure as a whole (adequate coverage? most important content?), an item-by-item analysis (will it predict performance? safety consequences?), and supplementary indications of content validity (does it align with job requirements? will people who are and are not "test wise" perform similarly?). Consider physical and psychological fidelity: physical fidelity concerns the match between how a worker actually behaves on the job and how an applicant for that job is asked to behave on the predictor used in selection; psychological fidelity occurs when the same WRCs required to perform the job successfully are also required on the predictor. Misalignment means 1) you aren't assessing what you think you're assessing, 2) the measure may not predict what you hope it predicts, and 3) the tool you're using may be biased whereas what you should have used would not have been.
4. Assess selection procedure and job content relevance. This is an essential step; one could argue that this step is content validation. SMEs (hired experts, managers, expert job incumbents) actually take the test and then rate the extent of test content-to-WRC overlap: "To what extent are each of the following work-related characteristics needed to answer these questions (or perform these exercises)? 1 = Not at all, 2 = To a slight extent, 3 = To a moderate extent, 4 = To a great extent" (p. 287). Note: it is possible to compute some statistics on these ratings as well; see p. 288.
The Effect of Reliability on Validity Coefficients
Correction for reliability in both criterion and predictor: r-hat_xy = r_xy / sqrt(r_xx * r_yy); example: .4 / sqrt(.85 * .6) = .56. Correction for reliability in just the criterion (operational validity): r-hat_xy = r_xy / sqrt(r_yy); example: .4 / sqrt(.6) = .52. Where r-hat_xy = corrected validity coefficient of the predictor if the criterion were measured without error, r_xy = correlation between the predictor and the criterion, r_xx = reliability coefficient (usually alpha) of the predictor, and r_yy = reliability coefficient (often an interrater reliability) of the criterion. For the examples: r_xy = .4, r_xx = .85, r_yy = .6.
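The two corrections as a small Python sketch, using the same numbers as the example above:

```python
import math

def correct_both(r_xy, r_xx, r_yy):
    """Correct the observed validity for unreliability in predictor and criterion."""
    return r_xy / math.sqrt(r_xx * r_yy)

def correct_criterion_only(r_xy, r_yy):
    """Operational validity: correct for unreliability in the criterion only."""
    return r_xy / math.sqrt(r_yy)

print(round(correct_both(0.4, 0.85, 0.6), 2))      # 0.56
print(round(correct_criterion_only(0.4, 0.6), 2))  # 0.52
```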
variation
Data may not be normally distributed. This busts most of the assumptions for the types of statistics we use in HRM. We'll ignore that point. We need measures that exhibit variation (and we'll hope they are normally distributed or nearly so). If subjective measures of job performance do not vary (every supervisor rates his/her employees at the highest level, which they are inclined to do), then we can't use them... there's no variation to predict. If the predictors do not vary, then we cannot use them to predict whatever variation might exist in the criterion. You cannot correlate a variable with a constant.
job performance
Defining job performance is much harder than it sounds and often includes multiple dimensions that must be considered simultaneously: objective performance (number of widgets produced, number of publications), subjective performance (supervisor ratings, student teacher ratings), attrition, absenteeism, other CWBs, and teamwork. For teamwork and task performance, supervisors are often asked to judge/rank; ratings can also include fellow workers and customers in addition to the supervisor.
nominal
Differ in categorization only; the categories cannot be ordered. What state are you from? Properties: labeled.
selection measure suggestions
Difficulty; homogeneity; response format (ability tests should look like the ability tests you have been taking your whole life; Likert scales should use 5, 7, or 9 rating categories); administration and scoring (as similar as possible for all candidates); screen out bad data < (not in book): use instructed questions (select "4" or "c" for this question), look for unlikely response patterns including "making Christmas trees," straight-lining, etc.; bad data make for poor reliability estimates. Do not be afraid of long tests / long application materials < (not in book)
Fixing Interrater Reliability Statistics
Do not use subjective ratings. If you must use subjective ratings: consider behaviorally anchored rating scales; train the raters (calibration); track agreement and kick out "bad raters."
construct validation
Does the measure indeed reflect a specific construct?
error
Error, on average, is expected to be normally distributed: error is as likely to cause someone to have a higher score than their true score as it is to cause them to have a lower score than their true score. Errors are not expected to be correlated: it is assumed that error of measurement is related only to the measure itself, so predictor reliability is not expected to be correlated with criterion reliability. Error attenuates the relationship between predictors and criteria: the observed correlation between a predictor (IQ) and a criterion (performance) will be lower to the extent that both measures have error.
Internal Consistency Reliability Estimates
Estimate of the extent to which items used in the selection procedure (test) measure the same thing. Are the questions homogeneous? If the questions are homogeneous then they should correlate highly with one another. Split-half reliability; Kuder-Richardson reliability (KR-20); coefficient alpha (α), popularized by Cronbach; and more! Congeneric reliability, omega...
regression equation
For a single predictor, the regression equation tells you the slope (b) and the intercept (a) of the regression line: Y-hat = a + bX. For every one-unit increase in X we expect a b increase in Y. For multiple predictors (multiple regression), we get a b for every predictor variable: Y-hat = a + b1X1 + b2X2 + ... + bnXn. Notice the little hat over the Y; that is because it is the predicted level of Y after having used the fitted equation.
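A small Python illustration of fitting a single-predictor equation; the scores are made up for illustration:

```python
import numpy as np

# Hypothetical predictor scores (X) and criterion scores (Y) for eight employees
x = np.array([55, 61, 47, 72, 66, 58, 80, 69])
y = np.array([3.1, 3.4, 2.8, 4.2, 3.9, 3.3, 4.5, 4.0])

b, a = np.polyfit(x, y, deg=1)        # slope (b) and intercept (a)
y_hat = a + b * x                     # predicted criterion scores (the "hat" values)
print(f"Y-hat = {a:.2f} + {b:.3f}X")  # for every 1-unit increase in X, expect a b increase in Y
```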
appropriateness of content validation
From the Uniform Guidelines: "A selection procedure based upon inferences about mental processes cannot be supported solely or primarily on the basis of content validity. Thus, a content strategy is not appropriate for demonstrating the validity of selection procedures which purport to measure traits or constructs, such as intelligence, aptitude, personality, common sense, judgment, leadership, and spatial ability." Do not use content validation for the above (as stated). Do use content validation for knowledge, skills, abilities, or observable behaviors shown to be job related. Your book authors note that: Frank Schmidt says that general cognitive ability (intelligence tests) can be used for all jobs; content validation is best used for less abstract concepts when the inferential leap is small; if the inferential leap is large, some courts will not accept content validation, and criterion-related validation must be used; if the WRC is something learned on the job, then do not select for that WRC.
averaging studies
Good - correct for sampling error. The normally distributed errors do average out (or close enough) to 0. We do tend to use a weighted average where we give more credit to studies with big sample sizes in the hopes that they are better estimates (less error) Bad - Differences in study quality, measures used, standardization (remember chapter 6!) matter. Putting a bunch of low quality studies in a meta-analysis doesn't fix them. Mixing high quality and low quality studies does not fix the low quality studies. It contaminates the good ones.
split-half reliability issues
How do you split the test in half? Odd/even? First half/last half? Random? It turns out the split matters: some splits result in higher estimates than others. What if we split the test in all possible ways it can be split, compute the correlations for all of those possible split-half reliabilities, and then average those correlations together?
tests
Hundreds of options including ability, personality, simulation tests, and integrity predictor
how high should reliability be?
In the early stages of research on predictor tests or hypothesized measures of a construct, one saves time and energy by working with instruments that have only modest reliability, for which purposes reliabilities of .60 or .50 will suffice. In those applied settings [selection] where important decisions are made with respect to specific test scores, a reliability of .90 is the minimum that should be tolerated, and a reliability of .95 should be considered the desirable standard.
Placement
In the ~1940s we selected people into an organization (the military) and then placed them in one of many open positions. Today, the two functions occur largely simultaneously: we do a job analysis and then select and place people directly into a position in the organization.
rater error
Inadvertent bias in responses. This bias is most frequently described in one of the following four ways: halo, leniency, severity, and central tendency. Halo - rating the subordinate equally on different performance items because of a rater's general impression of the worker; the rater does not pay specific attention to the wording of each individual scale but rather makes the rating on the basis of the general impression. Leniency or severity - occurs when a disproportionate number of workers receive either high or low ratings, respectively; this bias is commonly attributed to distortion in the supervisor's viewpoint of what constitutes acceptable behavior. Central tendency - occurs when a large number of subordinates receive ratings in the middle of the scale; neither very good nor very poor performance is rated as often as it actually occurs.
factors that cause error
Individual factors: physical and mental health, motivation, mood, level of stress, understanding of instructions, content of the items... Characteristics of the test administrator (for example, one applicant interviews with a friendly person and another interviews with a more stoic person). When the selection procedures use subjectivity and judgment. Characteristics of the environment, such as temperature of the test location, lighting, and noise.
range restriction
Is a real effect and is common. Is a great reason to conduct a predictive validation study, though that does not fix all problems. Correction for range restriction (the formula is massive, look on ch8slide37): r-hat_xy = estimated validity if restriction had not occurred; r_xyr = validity computed on restricted scores; SD_u = standard deviation of predictor scores from the unrestricted group; SD_r = standard deviation of predictor scores from the restricted group.
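The slide's formula is not reproduced here; below is a hedged Python sketch using the commonly cited direct (Case II) range-restriction correction, which uses the same four quantities defined above. Treat the exact form as an assumption and confirm against ch8slide37.

```python
import math

def correct_range_restriction(r_xy_restricted, sd_unrestricted, sd_restricted):
    """Direct (Case II) range-restriction correction -- assumed form, check against the slide."""
    u = sd_unrestricted / sd_restricted
    return (r_xy_restricted * u) / math.sqrt(
        1 - r_xy_restricted ** 2 + (r_xy_restricted ** 2) * (u ** 2)
    )

# Hypothetical numbers: validity of .25 among hires; applicant-pool SD twice the hired-group SD
print(round(correct_range_restriction(0.25, 10.0, 5.0), 2))  # roughly .46
```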
errors of measurement (general)
It is important to keep in mind that reliability deals with errors of measurement. All measures have error, even really fine measures of physical properties ("measure twice, cut once"). If error could be measured directly, it wouldn't be error. Measurement error is NOT how closely the quantity actually measured aligns with the quantity we desire to measure: early IQ tests asked people about current events, which was highly reliable but a poor assessment of IQ.
homogeneity
Items should correlate at about the same level with one another. Items that do not correlate with the others should be deleted. But there is a construct breadth v. homogeneity tradeoff to consider
content validation
Judge whether the content of selection tools is similar to job performance measures
work related characteristics (WRCs)
KSAs. Identification of WRCs (the things necessary in an employee to perform the job well) often relies on the subjective judgment of HR specialists or Subject Matter Experts (SMEs). After the WRCs have been identified, it becomes necessary to either find or construct the appropriate selection measures or instruments. Measured via: application blanks, biographical data forms, reference checks, the selection interview, mental and special abilities tests, and personality assessment inventories.
constraints
Limited information on applicants - collecting lots of data is expensive. Applicant and organization at cross-purposes - the necessity to discern what is true and what may be false in the data collected from applicants; the applicant wants the job, the organization wants the best applicant, and people fake their WRCs and work experience. Predicting job performance is probabilistic - measurement is not perfect and unmeasured factors predict performance; the measurement of many WRCs is difficult and not as precise as we would wish. Selection research vs. selection practice - many organizations do not use evidence-based HRM (managing by translating principles based on evidence into organizational practice) and instead use "in house" best practices (that are not actually best practices).
good selection programs
Lower employee turnover, higher employee performance, lower friction between employees and the employer, higher customer satisfaction, greater innovation, higher firm performance (good selection costs money, but the gain from good employees outweighs the selection costs), higher employee productivity, greater post-recession recovery.
Component / Synthetic Validity
Multiple approaches to component/synthetic validity exist. The argument is that if a selection procedure predicts task performance on one type of job, we can use that same selection procedure for other jobs that require that same task. The task and what constitutes performance on that task must be consistent across jobs (math ability of middle school students is not the same as math ability for college students even if the construct, math ability, is called the same thing). Good because: you can group jobs together; it can simplify some validation efforts; it is great for smaller organizations because it provides an increased sample size to conduct a criterion validity study; and it has been viewed favorably in courts. Bad because: some validities have been lower for component validity compared to a full validation study; you are only looking at part of a job with component validity; important WRCs for the job in total may be left out; and what constitutes high levels of work performance may not be similar across the jobs that are grouped together.
Parallel/Equivalent Forms Reliability issues
Not as easy as it sounds to make truly parallel forms. Time consuming to make. Time consuming to go through the process to make sure they really are equivalent. If they are not equivalent then you have introduced some source of error and are underestimating the true reliability. Easier to do for some characteristics (math, spelling, vocabulary) than others. Are you really going to get applicants to take two of the same test?
interrater agreement
Not as strict as interrater reliability. Estimates include: percentage of agreement, Kendall's coefficient of concordance (W), Cohen's kappa (κ), and r_wg < not mentioned by your book's authors
criteria
Objective production data; personnel data; judgmental data; job or work sample data; training proficiency.
interviews
Often used to determine employee "fit" with the organization predictor
assumptions for using corrections
Overall Assumptions Bivariate normal distributions Homoscedasticity of error variances Specific to reliability The reliability estimate is a good estimate Tau-equivalent items if you use alpha That the time interval is appropriate for test-retest reliability That reliability in the predictor and criterion are not correlated That reliability is not also affected by range restriction (it probably is) If reliability(ies) is(are) affected by range restriction then you need to correct the reliability(ies) first before correcting the validity Specific to range restriction That selection ratios are not extreme (they probably are). Extreme meaning that 20% or fewer job applicants are selected. If selection ratios are extreme then the range restriction formula can way under correct. If selection ratios are extreme and the distributions are not bivariate normal then the range restriction formula can way over correct That range restriction is direct, meaning you selected people on the predictor you are correcting (it probably isn't in a validation study) If you have indirect range restriction then you need to apply a different formula. And there are several different forms of indirect range restriction
alpha
Popularized by Cronbach and often referred to as Cronbach's α. He *stole* borrowed it from Guttman: it is Guttman's λ3. It is easy to compute, is the average of all possible split-half reliabilities, and is equivalent to KR-20 for dichotomous items. α = [k / (k − 1)] * [1 − (Σσ_i^2) / (σ_y^2)], where k = number of items on the selection measure, σ_i^2 = variance of respondents' scores on each item (i) on the measure, and σ_y^2 = variance of respondents' total scores on the measure.
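A minimal Python version of the alpha formula; the 5-person, 4-item response matrix is hypothetical:

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha; rows = respondents, columns = items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

responses = np.array([  # 5 hypothetical respondents x 4 items on a 5-point scale
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
])
print(round(cronbach_alpha(responses), 2))
```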
mismatch of interests
Rapid and costly turnover, lower performance levels, and friction between an employee and the organization are among the results
Interrater Reliability Statistics problems
Raters view the same behavior differently or at different times. Raters interpret the same behavior differently. The role of idiosyncratic halo for each rater, such that general impressions bias ratings of specific behaviors. The sample of behavior itself may be inappropriate or misleading. Error in rating or recording each impression.
interpreting reliability
Reliability coefficients range from 0 to 1. They can be thought of as a measure of error: percent error = (1 - reliability estimate) * 100; for example, (1 - .7) * 100 = 30%, meaning 30% error and 70% true variability in obtained scores. Reliability is specific to the sample on which it was calculated. It is a necessary but not sufficient condition for validity: a measure that is not reliable, by definition, cannot be valid. It is expressed by degree (it is not yes vs. no). Whether it is adequate is determined by judgment.
standard error of measurement
Reliability is a group-based statistic; it does not tell us how error will affect an individual score. σ_meas = σ_x * sqrt(1 − r_xx), where σ_meas = the standard error of measurement for measure X, σ_x = the standard deviation of obtained scores on measure X, and r_xx = the reliability of measure X. Example: r_xx = .90, σ_x = 10, so 10 * sqrt(1 − .9) = 3.16. A score of 50 has a possible individual range of 46.84 to 53.16.
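The same calculation as a tiny Python sketch, using the values from the example above:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement for a measure with the given SD and reliability."""
    return sd * math.sqrt(1 - reliability)

se = sem(10, 0.90)
score = 50
print(round(se, 2))                                      # 3.16
print(round(score - se, 2), "to", round(score + se, 2))  # 46.84 to 53.16 (score +/- 1 SEM)
```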
purchasing tests
Reliability is not validity. If a test is not reliable then it cannot be valid; just because a test is reliable does not mean that it is valid. Of over 1,000 commercially available tests published: over 22 percent appeared without any reliability information, 7 percent showed neither reliability nor validity data, 9 percent showed no reliability data for certain subsets or forms, and 28 percent did not report any normative data.
Validity Generalization & Meta-Analysis
Sampling error is a fact (no way around it). Small sample sizes are a problem, and in small companies samples will always be small (more than 95% of all companies employ fewer than 100 people). But sampling error is supposed to be normally distributed with an average of 0. What if we could develop some kind of grand average by combining lots of validity studies? The error should average out, and the grand average should be the average of the sampling distribution, the "real" effect (the real validity). There is only very little wrong with this thinking. But Schmidt and Hunter took things to the next level.
concurrent vs predictive validity
Some argue that predictive is better. Often you need both. Concurrent is often where we try to figure out what is going to work, but this can be problematic (range restriction), and measures that work one way with current employees might not work the same with job applicants (e.g., sales self-efficacy). Sampling error (problems with sample sizes smaller than a few thousand) can cause big problems with either method. Should consider continuing to collect data to see if anything has changed.
attributes
Some of what we measure about people are psychological characteristics, and we cannot measure these directly: intelligence, grit, conscientiousness, locus of control, achievement motivation. Even objective measures often do not directly measure the construct of interest: degree attained and GPA when we want to know "education." We measure constructs - things that are not directly observable - so we must make inferences from our measures (e.g., that the numbers from the test are a high-quality representation of the underlying construct).
standards, principles and guidelines
Standards for Educational and Psychological Testing (Standards): do not limit what can be validated using content validity. Principles for the Validation and Use of Personnel Selection Procedures (Principles): do not limit what can be validated using content validity. Uniform Guidelines: restrict the use of content validation so that it cannot be used for psychological constructs.
The Courts and Criterion-Related Validity
Statistical Significance - necessary but insufficient (sample size) Some courts also want to know content Some courts want to know about legal history and may dismiss positive local empirical results (they ignore data!) Some courts differentially weigh results from predictive versus concurrent validity studies Utility matters - a small significant effect is not as good as a big significant effect Courts differ on their acceptance of statistical corrections (SCOTUS has never found in favor of the use of corrections)
difficulty
Test questions of moderate difficulty (where roughly half, or 50 percent, of the test takers answer a question correctly) will spread out the test scores; questions that are too easy or too hard do not, which is bad because we need variation.
Test-Retest Reliability issues
Test-retest is affected by time. Short time periods between testing mean the person remembers how they responded to the questions (tests memory, not the characteristic we desire). People change, such that long time periods may capture real change in a person rather than unreliability. Not appropriate for characteristics of individuals that are expected to change over time (example: mood). How often do you think we get to test job applicants on the same materials eight weeks apart as part of the application process?
interval
The differences between numbers take on meaning and there is a constant unit of measurement. Most ability tests used in selection. Properties: labeled, meaningful order, measurable difference.
reliability
The extent to which we can quantify how well our measurement device will consistently and dependably measure the same thing whenever it is that we try to measure it the degree of dependability, consistency, or stability of scores on a measure used in selection research often denoted as rxx
utility analysis
The goal of utility analysis is to translate the results of a validation study into terms [i.e., money] that are important to and understandable by managers. Expected $ gain from selection = Ns * r_xy * SD_y * Zs − NT * C, where: expected $ gain from selection = the return in dollars to the organization for having a valid selection program; Ns = number of job applicants selected; r_xy = validity coefficient of the selection procedure; SD_y = standard deviation of work performance in dollars; Zs = average score on the selection procedure of those hired, expressed in z (standardized) form relative to the applicant pool (an indication of the quality of the recruitment program); NT = number of applicants assessed with the selection procedure; C = cost of assessing each job applicant with the selection procedure. Example: Ns = 10, r_xy = .51, SD_y = $12,000, Zs = 1, NT = 100, C = $20; expected gain = 10(.51)(12,000)(1) − 100(20) = $59,200. Sure looks like it would be handy, but it is easy to get lost in the details (where does SD_y come from?) such that people end up distrusting the results (and the hoped-for benefit of doing the analysis disappears). Not many HR experts currently use utility analysis. Other factors (political acceptance of the tool) can be more important.
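The utility equation as a tiny Python function, plugging in the same example values (the exact product is $59,200):

```python
def expected_gain(n_selected, r_xy, sd_y, z_s, n_tested, cost_per_applicant):
    """Expected dollar gain from selection, per the equation above."""
    return n_selected * r_xy * sd_y * z_s - n_tested * cost_per_applicant

# Example values from above
print(round(expected_gain(10, 0.51, 12_000, 1, 100, 20)))  # 59200
```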
the validity coefficient: r
The last step of a validity study is "analyze predictor and criterion data relationships." Often done by exploring properties of the Pearson product-moment correlation (aka the zero-order correlation, correlation coefficient, or r) < yes, there's more than one type of correlation. Size: correlations range from -1 to 1; courts seem to like correlations > |.3|. Statistical significance. The Pearson product-moment correlation is (generally) the correlation that Excel computes ("=CORREL()"). It is good for quantifying linear relationships. A positive number means the two variables are positively related to one another; a negative number means the two variables are negatively related to one another. Your book is wrong: a correlation does not tell you about the steepness of a slope in an X-Y scatter plot. A correlation tells you how much the observations deviate from the best-fitting straight line.
BARS vs BES
The main difference between BARS and BES is in the wording of the incidents. BARS incidents are worded to reflect actual work behaviors, for example, "greets customer as he or she approaches." BES incidents are phrased as expected behavior, for example, "can be expected to greet a regular customer as he or she approaches." The wording difference points out to supervisors that the employee does not need to demonstrate the actual behavior of the incident to be scored at that level. The incident is to be interpreted as representative of the performance of the employee. Scores can be obtained for each dimension or summed for a total score of job performance
what is high quality
The operationalization of the construct (the measurement and the numbers that result) closely aligns with the actual construct we intend to measure; that is to say, the score represents what it is that we think it represents. The measure is reliable (see chapter 7). The measure is valid (see chapters 8-14).
meta-analyses
The original term, "validity generalization" is no longer used. It is long known that validities do not generalize. Today, we use the term "meta-analysis" for the same math, but we attempt to draw different conclusions Meta-analysis cannot be ignored. It has its place You cannot take the validity from a meta-analysis and assume that you will get the same validity in your workplace. Use it, but Use a local validity study too Use content validation too Meta-analyses conducted with different assumptions often come to different conclusions (so much for a final word) Most modern meta-analyses find moderation, meaning there is not one answer, the true score itself is myth, and situational and other factors still matter. If we ignore those factors, we are messing up (possibly big time). Meta-analysis has never won in the SCOTUS The prominence of meta-analysis in other fields has dropped considerably. Management cannot be far behind
finding and constructing measures
The process of identifying selection measures to be used in an HR selection study should not be taken lightly. Identification of measures is not accomplished simply by familiarity with or personal whims about what measures are best in a specific situation. Predictors: What does the predictor measure? Is the predictor cost effective? Has the predictor been standardized? Is the predictor easy to use? Is the predictor acceptable to the organization? To management? To the candidate? Criteria: Is the criterion relevant to the job for which it is chosen? Is the criterion acceptable to management? Are work changes likely to alter the need for the criterion? Is the criterion uncontaminated and free of bias, so that meaningful comparisons among individuals can be made? Will the criterion detect differences among individuals if differences actually exist (discriminability)? Are meaningful differences among individuals actually scored with respect to the criterion? Does the measure unfairly discriminate against sex, race, age, or other protected groups? Does the measure lend itself to quantification? Is the measure scored consistently? How reliable are the data provided by the measure? How well does the device measure the construct for which it is intended (construct validity)?
validation procedures
The purpose of validation is to provide evidence that data from the selection instruments are related to job performance. Statistical data analysis, usually correlational analysis (which measures how closely related two different sets of scores are), is the most straightforward manner of producing this evidence. We need to be able to assess whether or not the data we are going to collect from job applicants are really useful for predicting their later performance in our work organization. Predictor = WRC/KSA; criterion = job performance; r = ?
lack of variance
The scores that are generated must have differences among them. It is conceptually useless and statistically difficult to make sense out of numbers in which there is little, if any, difference in performance levels. If every worker performs at the same level, the differences in WRCs make no difference in performance levels. This lack of variance can be caused by two factors: standardization in output due to the work process; inappropriate use of the measurement device.
interpreting scores based on the standard error
The standard error allows us to see how a lack of perfect reliability affects our certainty of individual scores Are scores of 50 and 51 really that different? Using the data from the previous slide, "no" The difference between two individuals' scores should not be considered significant unless the difference is at least twice the standard error of measurement of the measure Said differently, if you were trying to pick between two candidates whose scores were 50 and 51 you would not be able to justify picking the 51 over the 50 based on the data from the previous slide
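A small sketch of the "two standard errors" rule; the SD and reliability below are the hypothetical figures from the earlier example:

```python
import math

def meaningfully_different(score_a, score_b, sd, reliability):
    """Treat two scores as different only if they differ by at least 2 SEMs."""
    sem = sd * math.sqrt(1 - reliability)
    return abs(score_a - score_b) >= 2 * sem

# Candidates scoring 50 and 51 on a measure with SD = 10 and reliability = .90
print(meaningfully_different(50, 51, 10, 0.90))  # False -- cannot justify picking 51 over 50
```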
ratio
The value of zero has meaning: it is the absence of the quantity. Money, length, mass, ... and intelligence? No. Properties: labeled, meaningful order, measurable difference, true zero starting point.
importance of measurement
There is no substitute for using high-quality measures in selection. Quality measurement is critical in HR selection: the use of meaningless numbers will result in meaningless conclusions, no matter how complex the statistical analyses. Measurement is to selection in HR what the foundation is to a home: if the foundation is bad, the home is unsafe no matter how well the rest of the house is built, no matter how refined the finishes, and no matter how good the decorating.
schmidt and hunter
They claimed that once we accounted for sampling error and 1. differences in predictor reliability, 2. differences in criterion reliability, 3. differences in restriction in range, 4. and a few other things that we can account for (though they list several things that cannot be accounted for too), all or nearly all between-study variation would be accounted for. And. And! Validities are much bigger than we always thought. They argued that situational specificity was a myth and that their math could allow you to show that validities generalized across all work settings (well, except for the ones where their own work showed variation). They promised us the truth. Literally. They claimed that through validity generalization we would be able to compute the "true score validity" for different selection procedures. We would no longer need significance testing: once application of the math told us the one true number, we would be able to take that number safely anywhere. Validity generalization was custom made to tell selection experts what they had always wanted to hear. Their calculations have been shown to have major problems: they tend to overcorrect validities (the "true" validity is too high); they tend to account for too much between-study variation (often over 100%); updated tools show that there's still considerable between-study variation even after accounting for all of the things Schmidt and Hunter listed (long live situational specificity); and the statistical assumptions that need to be met to use corrections (for reliability and range restriction) are almost never met in practice, and the way the assumptions are broken tends to overcorrect validities. Basically, much of what they promised was a scam. But don't tell the book authors, because they still seem to believe it.
Classical Test Theory (CTT)
True scores - the real, actual quantity of the characteristic we are trying to measure about a person (or anything else, really) with a measure (predictor or criterion). In people, true scores probably do not really exist; they are an "ideal conception." X_obtained = X_true + X_error, where X_obtained = obtained score for a person on a measure; X_true = true score for a person on the measure, that is, the actual amount of the attribute measured that a person really possesses; X_error = error score for a person on the measure, that is, the amount that a person's score was influenced by factors present at the time of the measurement that are unrelated to the attribute measured. These errors are assumed to represent random fluctuations or chance factors.
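A hypothetical Python simulation of the X_obtained = X_true + X_error idea (all numbers invented): two error-laden measurements of the same unseen true scores correlate well below 1.

```python
import numpy as np

rng = np.random.default_rng(1)

true_scores = rng.normal(100, 15, 1_000)        # the "ideal conception" we never observe
form_a = true_scores + rng.normal(0, 8, 1_000)  # obtained scores = true + random error
form_b = true_scores + rng.normal(0, 8, 1_000)  # a second, independently error-laden measurement

# The correlation between the two obtained-score sets estimates reliability;
# it falls below 1.0 purely because of the random error terms
print(round(np.corrcoef(form_a, form_b)[0, 1], 2))
```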
Parallel/Equivalent Forms Reliability
Two forms (Form A and Form B) for the selection procedure (usually a test) are generated Provide one form at one time and then the second form at a later time period The reliability is estimated by correlating (Pearson product-moment correlation) the scores from the two tests and then correcting this correlation for length using the Spearman-Brown Prophecy formula Considered a coefficient of equivalence and stability
internet-based selection measures
Two general types of internet selection testing: (1) Computer-based, supervised assessment - applicants come to a physical facility designated by the company and equipped with computers, either in individualized spaces or in a large room that houses multiple computer stations. Such testing is very similar to traditional selection methods, with the big difference being the computer versus paper-and-pencil or discussion-based testing. (2) Unproctored assessment - the applicant can complete the selection measures at any location he or she chooses, at any time, and with any electronic device that can access the Web. Issues: computer literacy, graphic transmission (interpretation of the same message), technical failures, equivalency (same results as traditional testing such as paper-and-pencil?), cheating, security (do others know the questions you're asking?), and standardization of the test environment.
criterion related validity
Two major methods of determining criterion-related validity: concurrent validity (most likely with job incumbents) and predictive validity (can be job incumbents or job applicants, but will always be what we are doing with job applicants). You build the nomological network (you show, using empirical results, that things are related to what you think they should be related to). Content validity efforts are good; criterion validity can be better. If you have the statistical power to conduct a criterion-related validity study, do it.
errors of measurement (selection)
Unlike physical measurements of static characteristics (the weight, height, thickness, and width of the textbook), characteristics of people are much more difficult to assess. If errors of measurement could be assessed directly, a measure's reliability could be determined exactly (impossible); instead, we estimate reliability based on variability in the measurements. If you took the ACT/SAT twice, did you get the same scores both times? What changed? You? The test?
existing measures
Use existing measures when you can. They are less expensive and less time-consuming than developing new ones. Reliability and validity information will already exist (and may be similar to what you will find in your workplace). Existing measures (developed and trialed by experts) are often superior to what can be developed in house (by non-experts with limited data).
Interrater Reliability Estimates
Used when "scoring is based on the individual judgment of the examiner (the interviewer) Examples including ratings of: Jobs by job analysis Subordinates' job performance by supervisors Candidates' performances in a selection interview by interviewers Applicants' performances on a behavioral-based measure Two major sources of error Variation in what is being rated Characteristics of the rater(s) Empirical studies of interrater reliability estimates of supervisor ratings suggest these reliabilities range from the high .4s to the low .6s
intraclass correlation
Used when three or more raters have made ratings on one or more targets Many different ways to estimate intraclass correlations. Most of them rely on ANOVA models and require boatloads of math or the ability to use SPSS
interclass correlation
Used when two raters make judgments about a series of targets or objects being rated (such as interviewees, jobs, or subordinates). Can use the Pearson product-moment correlation or Cohen's kappa (κ).
estimating reliability
Ways of conceptualizing reliability: How dependably can an individual be assessed with a measure at a given moment? How dependably will data collected by a measure today be representative of the same individual at a future time? How accurately will scores on a measure represent the true ability of an individual on the trait being sampled by a measure? [this is not reliability and I don't know why they list it here] When individuals are being rated by more than one rater, to what degree do evaluations vary from one rater to another? Or, to what extent is an individual's score due to the rater rather than to the individual's behavior or other characteristics being rated?
Spearman-Brown Prophecy Formula
We can estimate how much more reliable a test would be if we were to make that test longer (we need to apply this to split-half correlations because we cut the test in half). We can also estimate how much we will hurt reliability by making a test shorter. r_tt = n * r_xx / (1 + (n − 1) * r_xx), where r_tt = the estimated reliability of the lengthened test, r_xx = the average correlation of the items on the current test, and n = the number of times the test is increased in length. Reliability is affected by how highly the items on the test are correlated with one another. Reliability is affected by test length. Is it good for a measure if all the items are very highly similar?
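A short Python version of the prophecy formula; the .70 split-half value is hypothetical:

```python
def spearman_brown(r_xx, n):
    """Projected reliability when a test is lengthened (n > 1) or shortened (n < 1) by a factor of n."""
    return (n * r_xx) / (1 + (n - 1) * r_xx)

# Stepping a split-half correlation of .70 up to full length (n = 2)
print(round(spearman_brown(0.70, 2), 2))    # 0.82
# Cutting that full-length test in half (n = 0.5): shorter tests lose reliability
print(round(spearman_brown(0.82, 0.5), 2))  # 0.69
```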
predictor variables
a category of variables required in determining job applicants' success. Predictors are selection procedures, including tests and interviews. We attempt to assign numbers to these procedures as objectively as possible.
adaptive performance
a deliberate change in the thinking or behavior of an individual because of an anticipated or existing change in the work activities or work environment. Change can include new individuals or groups of people, change in work activities or technology, organizational structure or systems change, or marketplace changes that affect the work of the individual. Differences in WRCs can be used to predict differences in AP. The field of selection is at the stage that it has accepted that both OCBs and AP are facets of job performance and can be included in operational selection programs.
validation study
a study "designed to determine which selection procedures are related to job success and should be used in selection decision making
Personnel data
absenteeism, tardiness, turnover, accidents, salary history, promotions, special awards
valid
accuracy of a measure
judgmental data
an individual familiar with the work of another is required to judge this work. This measurement is usually obtained by using a rating scale with numerical values. In most cases, the individual doing the evaluation is the immediate supervisor of the worker being evaluated Many jobs such as managerial, service, professional, and staff positions no longer produce tangible, easily counted products on a regular basis—for which the use of production data would be appropriate. Almost by default, judgmental data are increasingly being used for performance measurement The information is supplied by individuals who should know firsthand the work and the work circumstances; after appropriate initial development, the use of the judgment scales should be relatively easy and quite accurate
application blanks
asked applicants for superficial information regarding education, work history, and previous personal behaviors
predictors
background information, interviews, tests
trait rating scales
bad method. Requires the supervisor to evaluate subordinates on the extent to which each individual possesses personal characteristics thought to be necessary for good work performance: personality traits such as dependability, ambition, positive attitude, initiative, determination, assertiveness, and loyalty. Frequently these traits are not defined on the rating scales that evaluators are asked to use. Trait ratings are measures of personality characteristics that have no proven relationship to performance. Moreover, the accurate assessment of such traits by a supervisor is nearly impossible.
360 degree feedback
based upon the assumption that the nature of managerial work is so complex and includes so many interpersonal relationships that ratings from only a supervisor would provide very limited information. Gathering judgmental information from all the levels of people with whom the manager works would provide more useful and more complete information about the manager's job performance. Gathers judgmental performance ratings from three groups: superiors, peers, and subordinates of the individual being reviewed. Ratings of the three groups are averaged separately, which provides three scores on each scale. The ratings are interpreted by a trained evaluator who then discusses the results of the surveys (hence the term feedback) with the manager. All raters are guaranteed anonymity.
Organizational Citizenship Behaviors (OCBs)
behaviors individuals do at work that are not formally part of their job task behaviors but are done by the individual to assist other workers or the organization itself. OCBs are conceptually related to two other very similar concepts: prosocial behaviors and contextual performance. Examples: teaching new workers, assisting other workers, putting extra time and effort into work.
Behaviorally Anchored Rating Scales (BARS)
best method. Judgmental measures developed to define the scale's rating points by using job behaviors as examples. Such definitions are intended to reduce the difficulty for supervisors of consistently interpreting the performance associated with various points on the scale. Systematically developed by obtaining information from workers and supervisors involved with a particular job. This development starts with gathering descriptions of important, specific job behaviors that make a difference between good and poor performance. Similar behaviors are grouped into dimensions and then assigned points depending on the extent to which the job behavior is indicative of good performance. These assigned points are then used to select behaviors that serve as the scale points for rating a worker's performance on each dimension.
simple behavioral scale
better method. Based on information about tasks determined from job analysis, the supervisor is asked to rate each subordinate on major or critical tasks of the job. The number of tasks used in the evaluation differs according to the complexity of the job, but commonly the range is between 4 and 10 tasks. A supervisor scores the subordinate using a rating scale similar to that described for use with trait rating scales, that is, usually 3-point to 7-point scales using integers and adjectives. Scores can be added across all task scales to produce an overall measure of job performance, or an individual task scale can be used to obtain a measure of a specific aspect of job performance. The major limitation in using this type of measure is that supervisors of the same job often disagree on what level of performance on a task is required for a specific score (for example, a score of 3 or "average").
Selection
bringing in a person using tools/analysis; the process of collecting and evaluating information about an individual in order to extend an offer of employment. Could be either a first position for a new employee or a different position for a current employee. The selection process is performed under legal and market constraints and addresses the future interests of the organization and of the individual. Should be coordinated with the activities the firm carries out under recruitment, training, compensation, and job performance review.
production data
consist of the results of work. The data comprise things that can be counted, seen, and compared directly from one worker to another. Other terms that have been used to describe these data are output, objective, and nonjudgmental performance measures. Advantages: easy to gather; obvious and easily understood; a direct result of job actions (they are the objectives of the work process); thought to be unchallengeable and easily accepted by workers. However, these measures are often limited and often must be corrected. Most correction factors require that a manager make a judgment about how to correct the raw data, and these judgments can vary considerably in their effects on performance measurement.
reliable
consistency of a measure
interpersonal predictors
cooperativeness, sociability, and social intelligence
validity coefficient
correlates individual workers' selection test scores with the same individuals' performance scores. Therefore, if accurate individual worker data cannot be gathered, then validation is difficult to carry out and interpret
Training proficiency
error rates in training, training success
relevancy
extent to which the normed sample matches your test population (those in your jobs or your applicant pool)
selection measures to predict job performance
four general categories of characteristics: abilities, occupational interests, work values, and work styles. Among the most important selection instruments are various measures of cognitive ability, personality traits, integrity, and judgment of the best option to pursue in specific job situations. The traditional application blank that asked applicants for superficial information regarding education, work history, and previous personal behaviors has also been updated to provide information that is clearly job- and task-related
hiring vs selection
hiring = bringing in a person without much thought (ex. giving your brother a job) Selection = bringing in a person using tools / analysis (ex. matching WRCs of job to those of applicant pool to find best candidate)
empirical validation
involves calculating correlation coefficients between scores on the selection instruments (WRCs) and scores on the job performance measure. Two types of data are collected: the scores on the selection devices from a representative sample of individuals, and measures of how well each of these individuals is performing in important parts of the job.
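A minimal sketch of what this correlation step might look like in code; the predictor and criterion values below are made-up illustration data, not from the text.
```python
# Hypothetical empirical validation sketch: correlate predictor (WRC) scores
# with criterion (job performance) scores to estimate the validity coefficient r_xy.
import numpy as np

predictor = np.array([42, 55, 61, 48, 70, 66, 53, 59, 45, 63])            # e.g., test scores
criterion = np.array([3.1, 3.8, 4.2, 3.0, 4.6, 4.1, 3.5, 3.9, 2.9, 4.0])  # e.g., supervisor ratings

r_xy = np.corrcoef(predictor, criterion)[0, 1]  # Pearson product-moment correlation
print(f"validity coefficient r_xy = {r_xy:.2f}")
```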
the inference points
job > inferential leap 1 specification of job tasks > inferential leap 2 Identification of Work-Related Characteristics (WRCs) Required to Perform Job Tasks > inferential leap 3 Development of Selection Procedure Form and Content to Assess Applicants' Work-Related Characteristics (WRCs)
selection system model
job analysis > Identification of Relevant Job Performance Dimensions > Identification of Work Related Characteristics (WRCs) Necessary for Job > Development of Assessment Devices to Measure WRCs > Validation of Assessment Devices Content & Criterion > Use of Assessment Devices in the Processing of Applicants
Evidence-based management
managing by translating principles based on evidence into organizational practice. This means that managers should become knowledgeable about research results in specific topics of management and about how this research is translated into practice
relevant job performance measures
for many jobs in which individuals produce an object or serve customers, measurement is pretty easy: the objects can be counted and inspected for quality, or the customers who receive a service (for example, from a teller) can be counted and surveyed about their satisfaction with the service. For team-based jobs measurement is not as direct; it is difficult to determine how much any one individual has accomplished. In research and development work, it may take an extended amount of time to translate an idea into a product. In these cases the best source of information about job performance is usually the judgment of the supervisor or of other work team members. The information about what constitutes successful job performance is used to help decide which WRCs are to be measured in the selection program
selection measures
means both predictors and criteria because both are used to determine which selection procedures work (and which do not). There are two requirements for choosing selection devices: the device must measure the WRCs the selection specialist has identified as needed for the job, and it should be able to differentiate among applicants
content
measurement method: all people assessed are measured by the same information or content. This includes the same format (e.g., multiple-choice, essay) and medium (e.g., paper and pencil, computer, video). Content does not mean the exact same questions, but the question content should be equivalent (e.g., 2 + 2 = ? and 3 + 1 = ?)
administration
measurement method: information is collected the same way in all locations and across all administrations, each time the selection measure is applied. Same directions, time to complete, and physical test conditions
scoring
measurement method rules for scoring are specified before administering the measure and are applied the same way for each scoring All subjective scoring requires the development of an answer key (yes, really) All subjective scoring requires training All subjective scoring should be recorded and tracked to compute and ensure reliability
standardization of selection measurement
measurement method: content, administration, scoring. Variation in content, administration, and/or scoring can lead to differences in selection tool outcomes that are not related to true differences between the applicants
criterion variables
measures that indicate "employee success on the job" (p. 205) and "serve as a standard for evaluating how well predictors do the job they were intended to do" The most frequently used criterion is job performance. Criterion measurement is sadly poor; thus, inferences about quality, especially the validity of predictors, come into question.
majority of HR measures
nominal or interval (the book says most are interval)
transitory
norms change over time Younger generations are more narcissistic than past generations Younger generations, globally, are also smarter
Objective production data
number of widgets made, scrap, sales completed
leniency/severity
occurs when a disproportionate number of workers receive either high or low ratings respectively. This bias is commonly attributed to distortion in the supervisor's viewpoint of what constitutes acceptable behavior
central tendency
occurs when a large number of subordinates receive ratings in the middle of the scale. Neither very good nor very poor performance is rated as often as it actually occurs
background information
often collected on the application form (chapter 9). Includes contact information, prior training and education, licenses, reference information, and biographical information. Used as a predictor
Job or work sample data
often used as predictors but would include the "job in miniature" (coding trials for programmers)
Big 5 personality dimensions
conscientiousness, extraversion, openness to experience, stability, and agreeableness
outcome-based control systems
control systems that rely primarily on objective task performance to measure performance
big data
people analytics. Big companies are collecting loads of data on current employees and, through people analytics (statistics), are using that data to design new selection systems. Xerox went from interviews and basic measures to using an online series of tests of personality, cognitive skill, and multiple-choice questionnaires about how the applicant would handle specific situations; the rate of employees leaving the job fell by 20 percent and the promotion rate rose
dimensions of organizational citizenship behaviors
Philip Podsakoff: helping behavior, sportsmanship, organizational loyalty, organizational compliance, individual initiative, civic virtue, self-development
sampling error
error that arises because a sample does not perfectly represent the population; a problem with sample sizes smaller than a few thousand
turnover
quitting, firing, furlough
halo effect
rating the subordinate equally on different performance items because of a rater's general impression of the worker. The rater does not pay specific attention to the wording of each individual scale but rather makes the rating on the basis of the general impression
validity
refers to the degree to which available evidence supports inferences made from scores on selection measures. Not really a "yes" vs. "no" question; it is more vs. less valid, and valid for what? Validity is often estimated by a correlation, 𝑟𝑥𝑦, the validity coefficient: predictor (WRC/KSA) > rxy = ? > criterion (job performance)
factor analysis
relies on statistical correlational analysis. In this approach, all individual criteria measures are correlated with one another. The intercorrelation matrix is factor analyzed; this, statistically, combines these separate measures into clusters, or factors. Ideally, a majority of the separate measures would be combined into one factor that would then serve as the composite performance measure. The factor analysis procedure also provides weights that could be applied to each specific measure in forming the composite
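A rough sketch of the factor-analytic approach to forming a composite criterion, using simulated criterion measures and a simple principal-components style factoring of the intercorrelation matrix as a stand-in for a full factor analysis; all data and names here are hypothetical.
```python
# Simulate five correlated criterion measures, factor their intercorrelation matrix,
# and use the dominant factor's loadings as weights for a composite performance score.
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=200)                                   # latent "overall performance"
criteria = np.column_stack(
    [latent + rng.normal(scale=0.8, size=200) for _ in range(5)]
)

R = np.corrcoef(criteria, rowvar=False)                         # intercorrelation matrix
eigvals, eigvecs = np.linalg.eigh(R)
loadings = eigvecs[:, np.argmax(eigvals)]                       # dominant factor's loadings
weights = loadings / loadings.sum()                             # normalize into composite weights

z = (criteria - criteria.mean(axis=0)) / criteria.std(axis=0)   # standardize each measure
composite = z @ weights                                         # weighted composite criterion
print(np.round(weights, 2))
```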
Selection vs promotion
selection: Applicant pool is external (e.g. newly graduate students) Use formal recruiting (e.g., advertisements, employment agencies) Applicant pool is usually large (meaning selection ratios are low) Frequently use screening tools to winnow the applicant pool down (e.g., application form) Formalized tools for remaining applicants (e.g., interviews, tests) Formal decision tools for evaluating applicants. Then the job offer is extended promotion: Candidates are internal Recruitment is limited (e.g., company bulletin board, manager nomination) Few selection tools used. Lots of existing information (e.g., past work evaluations) Decision is often informal
physical abilities specifically for the dimension of physically oriented adaptability
stamina, strength, endurance, and agility
Judgement data
supervisor performance ratings
norms (normative sample, standardized groups)
the distribution of "scores of relevant others in groups" Ex: IQ tests are standardized so that the average is 100 and the standard deviation is 15. Ex: Test publishers provide information about the distributions of people, often grouped into jobs, who have taken the test
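As a small illustration of using a normative sample, the sketch below compares one applicant's score to a simulated norm-group distribution; the scores are invented, not real published norms.
```python
# Where does an applicant's raw score fall relative to a (simulated) norm group?
import numpy as np

rng = np.random.default_rng(1)
norm_scores = rng.normal(loc=100, scale=15, size=1_000)   # pretend normative distribution
applicant_score = 118

percentile_rank = (norm_scores < applicant_score).mean() * 100
print(f"applicant scored higher than roughly {percentile_rank:.0f}% of the norm group")
```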
job analysis
the gathering of information about a job in an organization ... including the tasks, outcomes produced (products or services), equipment, material used, and environment (working conditions, hazards, work schedule, and so on) that characterize the job Used to convey information about the job to potential job applicants and develop a database of information used in the rest of the selection process
dollar criterion
the logical basis of combination: develop a composite that represents economic worth to the organization. This entails expressing the various job performance measures that we have discussed as a monetary amount that represents the value of worker performance to the organization. Put simply, the main purpose of selection is to improve the economic position of the firm through better workers, who are identified through a valid selection program; expressing job performance in dollars is a direct reflection of this thinking. At least three major methods are used to determine the dollar value of job performance: (1) job experts estimate the monetary value of various levels of job performance; (2) the dollar value of job performance is assumed to be a direct function of yearly salary; (3) employee salary is partitioned among a job's activities, so that each activity is assigned a proportion of that salary in accord with the job analysis results. Associated with Hubert Brogden and Erwin Taylor
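A hedged illustration of the third method (partitioning salary across activities); the activities, weights, and salary below are hypothetical, not from the text.
```python
# Partition a yearly salary across job activities using job-analysis proportions,
# giving a rough dollar value for performance on each activity.
yearly_salary = 60_000
activity_weights = {            # hypothetical proportions from a job analysis
    "customer service": 0.40,
    "order processing": 0.35,
    "inventory checks": 0.25,
}
dollar_value = {activity: yearly_salary * w for activity, w in activity_weights.items()}
print(dollar_value)             # {'customer service': 24000.0, 'order processing': 21000.0, ...}
```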
judgmental surveys
the most commonly used method for collecting CWB data
inferences
the numbers from the test are a high quality representation of the underlying construct
measurement method
the systematic application of preestablished rules or standards for assigning scores to the attributes or traits of an individual
single vs multiple criteria
the use of a single measure of job performance translates into viewing it as overall or global performance. The argument for using a single performance measure in validation is partially based on the fact that validation procedures become more straightforward with one criterion. This makes both the interpretation of the findings and the judgment about the adequacy of the selection program relatively simple The argument for the use of multiple criteria measures is made on the basis of research findings and logic. Essentially, this argument starts with the fact that job analysis studies identify multiple tasks within jobs. Multiple tasks are indicative of the multiple aspects of job performance. Also, studies of job performance have concluded that a global measure of performance may not reflect all of the job activities even for simple entry-level jobs
demographic group differences
there are consistent differences among demographic, ethnic, and racial groups in these WRC scores. Some groups score consistently higher than do other groups. Because scores are used to make decisions on whom to select for employment, you can guess one of the effects of making such decisions—various groups are selected at different rates Some measures show large and repeatable demographic differences. Sometimes useful, sometimes not. We need to work on this
constructs
things that are not directly observable
frame changing
to alternate between multiple ways of attending to and interpreting problems and solution strategies
learning agility
to apply lessons learned from previous experience
cognitive complexity
to consider and integrate conflicting information
resiliency
to persist and recover quickly
problem solving
to persist and work through the details of a problem
applicant pool
total number of people who have applied for an open position
judgemental instruments
trait rating scales, simple behavioral scales, BARS/BES
counterproductive work behavior (CWB)
undesirable performance actions that harm the organization itself and often its employees and customers. A whole industry is devoted to developing, marketing, and selling selection tests, referred to as integrity tests, that identify applicants who have a higher than normal probability of committing CWBs. A CWB is any intentional behavior by an organization member viewed by the organization as contrary to its legitimate interests. Behaviors such as stealing, punching a manager, and destroying property are the most commonly thought of as CWBs. CWBs often are linked to loss of property, money, reputation, customers, and suppliers and therefore are damaging and costly to the organization
expert judgement
uses judgments of job experts to form the composite. Essentially, the problem for these judges is to identify the relative weights of the specific performance aspects
standardized scores
z-scores z = (X - M)/SD z = the standard score X = an individual's raw or obtained score M = the mean (average) of the normative group's raw scores SD = the standard deviation of the normative group's raw scores Z-scores are in "standardized" format They are interpreted as "standard deviation units" away from the mean Negative scores are below the mean, positive above the mean If scores are normally distributed then we can use the cumulative standard normal table to determine the proportion of people who fall above or below the obtained z-score
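A short worked example of the z-score formula above; the raw score, mean, and SD are made up for illustration.
```python
# Convert a raw score to a z-score and look up the proportion of the normative
# group scoring below it using the cumulative standard normal distribution.
from statistics import NormalDist

X, M, SD = 118, 100, 15
z = (X - M) / SD                      # standard-deviation units from the mean (here 1.2)
prop_below = NormalDist().cdf(z)      # cumulative standard normal "table" lookup
print(f"z = {z:.2f}; about {prop_below:.0%} of the normative group falls below this score")
```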
Standard Error of the Estimate
𝑠𝑑𝑦.𝑥 = 𝑠𝑑𝑦 * sqrt(1 − 𝑟𝑥𝑦^2) 𝑠𝑑𝑦.𝑥 = standard error of estimate 𝑠𝑑𝑦 = standard deviation of criterion scores on Y 𝑟𝑥𝑦 = validity coefficient for predictor X and criterion Y Example: 𝑠𝑑𝑦 = 5, 𝑟𝑥𝑦 = .52, so 𝑠𝑑𝑦.𝑥 = 5 * sqrt(1 − .52^2) = 4.27 An applicant with a predicted score of 80 then has a: 68% chance of having actual performance of 75.73 to 84.27 95% chance of having actual performance of 71.63 to 88.37
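A minimal sketch reproducing the worked example above (𝑠𝑑𝑦 = 5, 𝑟𝑥𝑦 = .52, predicted score of 80).
```python
# Standard error of the estimate and approximate 68% / 95% ranges for actual performance.
import math

sd_y, r_xy = 5, 0.52
see = sd_y * math.sqrt(1 - r_xy**2)          # standard error of estimate, about 4.27
predicted = 80

print(f"SEE = {see:.2f}")
print(f"68% range: {predicted - see:.2f} to {predicted + see:.2f}")            # 75.73 to 84.27
print(f"95% range: {predicted - 1.96*see:.2f} to {predicted + 1.96*see:.2f}")  # 71.63 to 88.37
```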