Industrial psych


Jacobs et al 2011

Adverse impact is far more complicated than the Uniform Guidelines indicate

Aliger et al 1997

Cognitive skills training has been linked to greater skills, self-efficacy, and performance. Although research in a variety of organizational settings has demonstrated training efficacy, few studies have assessed cognitive skills training using rigorous, longitudinal, randomized trials with active controls. The present study examined cognitive skills training in a high-risk occupation by randomizing 48 platoons (N = 2,432 soldiers) in basic combat training to either (a) mental skills training or (b) an active comparison condition (military history). Surveys were conducted at baseline and three times across the 10-week course. Multilevel mixed-effects models revealed that soldiers in the mental skills training condition reported greater use of a range of cognitive skills and increased confidence relative to those in the control condition. Soldiers in the mental skills training condition also performed better on obstacle course events, rappelling, physical fitness, and initial weapons qualification scores, although effects were generally moderated by gender and previous experience. Overall, effects were small; however, given the rigor of the design, the findings clearly contribute to the broader literature by providing supporting evidence that cognitive skills training can enhance performance in occupational and sports settings. Future research should address gender and experience to determine the need for targeting such training appropriately. A large, rigorous, longitudinal group-randomized trial with soldiers in basic combat training showed that cognitive skills training resulted in (a) greater use of a variety of cognitive skills; (b) higher levels of self-confidence at earlier phases of the training; and (c) to a modest extent, better performance relative to an active control condition.
Specifically, soldiers in the mental skills training condition reported using the mental skills of self-talk, relaxation, control of negative thinking, and automaticity more than did soldiers in the military history condition, although by the end of training, ratings of self-talk and relaxation had begun to converge. Measures of self-confidence showed a similar pattern in that mental skills training led to higher self-confidence at T2 and T3 relative to military history, followed by convergence at T4. Contrary to study hypotheses, there were no significant training effects for goal setting, imagery, emotion control, and attention control. In terms of performance measures, only the slide to victory performance task demonstrated a main effect associated with mental skills training. Overall, however, the performance results indicated a more complex pattern in that both gender and previous experience interacted with training condition. Women appeared to benefit from mental skills training more than did men on the wall hanger, slide to victory, and sit-up performance tasks. In addition, women with prior weapons experience in the mental skills training condition showed a 13% higher performance score on basic rifle marksmanship than did women in the military history condition, whereas men showed improvement associated with mental skills training if they had no prior experience. The gender effect may be partially explained by a ceiling effect for men with previous weapons experience, given that 90% passed the grouping task. Although it is not clear why women benefited from training on these tasks and men did not, the results highlight the need to consider moderator effects when examining training efficacy (Aguinis & Kraiger, 2009). Interactions between previous task experience and training were more consistently found for Army Physical Fitness Test (APFT) tasks, regardless of gender.
Those soldiers with previous competitive sport experience performed better if they had been assigned to the mental skills training condition. This finding suggests that experience may facilitate receptivity to and use of cognitive skills for gross motor tasks like running, push-ups, or sit-ups that do not require fine motor skills and procedural instruction, as does basic rifle marksmanship. From a statistical perspective, our ability to test for effects related to basic rifle marksmanship may have also been hindered by low power. High ICC(1) values will generally reduce the power to detect effects in group-randomized trials, and basic rifle marksmanship demonstrated a high ICC(1) value (.27). Interestingly, the high ICC(1) value suggests that platoons differed significantly in terms of initial grouping scores and implies that differences in platoon-level training had a strong influence on platoon performance. Taken together, this randomized trial demonstrated small but significant effects of cognitive training principles as a performance enhancer within the context of a well-established training program (basic combat training). Our findings underscore the point that cognitive skills essential to employee training (Aguinis & Kraiger, 2009) can be effectively taught (Salas et al., 2012). Although this study did not include a cost-benefit analysis associated with training, the results demonstrated a positive training effect in a high-risk occupation, and the training can serve as a basis for improving efficacy in the future. We also believe the results make a significant contribution by adding positive (albeit weak) evidence of the efficacy of cognitive skills training from a particularly strong research design.
Weak evidence from a strong design bolsters the impact of findings from studies that have found stronger effects using weaker designs: The combined evidence supports the broader idea that cognitive skills training is a useful and viable tool for enhancing performance.
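The point that a high ICC(1) erodes power in a group-randomized trial can be made concrete with the standard (Kish) design-effect approximation, DEFF = 1 + (m - 1) × ICC, where m is the average cluster size. The sketch below is an illustration only. It is not the authors' analysis, since they fit multilevel mixed-effects models. It assumes equal platoon sizes (2,432 / 48 ≈ 50.7 soldiers per platoon) and treats the reported ICC(1) of .27 for basic rifle marksmanship as the intraclass correlation. The .05 value is a hypothetical comparison point. The function names are illustrative.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect: variance inflation from randomizing intact clusters."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_n(total_n: int, n_clusters: int, icc: float) -> float:
    """Approximate number of independent observations, assuming equal cluster sizes."""
    avg_cluster_size = total_n / n_clusters  # e.g., soldiers per platoon
    return total_n / design_effect(avg_cluster_size, icc)


if __name__ == "__main__":
    total_soldiers, platoons = 2432, 48  # sample sizes reported in the study
    for icc in (0.05, 0.27):  # 0.05 is hypothetical; 0.27 is the reported BRM ICC(1)
        deff = design_effect(total_soldiers / platoons, icc)
        n_eff = effective_n(total_soldiers, platoons, icc)
        print(f"ICC(1) = {icc:.2f}: DEFF = {deff:.2f}, effective N = {n_eff:.0f}")
```

Under these assumptions, the 2,432 soldiers carry roughly the information of about 169 independent observations at ICC(1) = .27, compared with about 698 at .05. This is why a high ICC(1) makes effects on basic rifle marksmanship harder to detect.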

Jackson et al 2016

Despite a substantial research literature on the influence of dimensions and exercises in assessment centers (ACs), the relative impact of these two sources of variance continues to raise uncertainties because of confounding. With confounded effects, it is not possible to establish the degree to which any one effect, including those related to exercises and dimensions, influences AC ratings. In the current study (N = 698) we used Bayesian generalizability theory to unconfound all of the possible effects contributing to variance in AC ratings. Our results show that 1.11% of the variance in AC ratings was directly attributable to behavioral dimensions, suggesting that dimension-related effects have no practical impact on the reliability of ACs. Even when taking aggregation level into consideration, effects related to general performance and exercises accounted for almost all of the reliable variance in AC ratings. The implications of these findings for recent dimension- and exercise-based perspectives on ACs are discussed. After over 60 years (see Sakoda, 1952), the literature on ACs still sways between a focus on dimension- and a focus on exercise-related sources as the major contributors to reliable variance in AC ratings. Scrutiny of this literature reveals confounding, which raises challenges to ascertaining which factors define reliable AC variance. Because ACs are multifaceted measures incorporating numerous different effects that could potentially influence ratings, confounding is a threat to the interpretation of findings from studies involving AC data. Capitalizing on the advantages of Bayesian generalizability theory, ours is the first known study to decompose, and thus, unconfound, all of the 29 sources of variance that could potentially contribute to variance in AC ratings.
Of these 29 variance sources, two effects are relevant to reliable dimension-based variance, three effects are relevant to reliable exercise-based variance, and one effect is akin to a general performance effect. Our proposition was that if dimension-based sources of variance contributed the majority of reliable variance in AC ratings, then the dimension perspective (e.g., Kuncel & Sackett, 2014; Meriac et al., 2014) would prevail. If, however, exercise-based sources of variance explained most of the reliable variance, then the exercise perspective would prevail (e.g., Jansen et al., 2013; Speer et al., 2014). If both dimension- and exercise-based sources contributed meaningfully to reliable AC variance, then the mixed approach would prevail (e.g., B. J. Hoffman et al., 2011). Our research aimed to address the problem of confounding in studies of AC ratings. Our results reveal that when sources of variance in AC ratings are appropriately decomposed, and even when taking aggregation level into consideration, dimension-based sources explain very little of the variance and have very little impact on the reliability of the ratings. In our unconfounded study, much more variance was explained by general performance and exercise-based sources. These findings call for further investigation into the primary reasons for correlations between summative scores based on AC ratings and outcomes. They suggest a challenge to the belief apparently espoused by proponents of the dimension approach that such relationships are the result of the dimensions purportedly measured in ACs. In challenging this view, our findings also present a challenge to the mixed perspective that reliable variance results from a combination of dimension- and exercise-related variance. Our findings partly support the use of exercise- or task-based ACs; however, they also suggest that the role of general performance requires a greater emphasis and more thorough investigation.

Adler et al 2016

Despite years of research and practice, dissatisfaction with performance appraisal is at an all-time high. Organizations are contemplating changes to their performance management systems, the most controversial of which is whether to eliminate performance ratings. The pros and cons of retaining performance ratings were the subject of a lively, standing-room-only debate at the 2015 Society for Industrial and Organizational Psychology conference in Philadelphia (Adler, 2015). Given the high interest in this topic, this article recaps the points made by the panelists who participated in the debate. The arguments for eliminating ratings include these: (a) the disappointing results of interventions to improve ratings, (b) the disagreement when multiple raters evaluate the same performance, (c) the failure to develop adequate criteria for evaluating ratings, (d) the weak relationship between the performance of ratees and the ratings they receive, (e) the conflicting purposes of performance ratings in organizations, (f) the inconsistent effects of performance feedback on subsequent performance, and (g) the weak relationship between performance rating research and practice in organizations. The arguments for retaining ratings include (a) the recognition that changing the rating process is likely to have minimal effect on the performance management process as a whole, (b) performance is always evaluated in some manner, (c) "too hard" is no excuse for industrial-organizational (I-O) psychology, (d) ratings and differentiated evaluations have many merits for improving organizations, (e) artificial tradeoffs are driving organizations to inappropriately abandon ratings, (f) the alternatives to ratings may be worse, and (g) the better questions are these: How could performance ratings be improved, and are we conducting the entire performance management process properly? The article closes with questions organizational members have found useful for driving effective performance management reform.
If managers and employees engaged in effective day-to-day performance management behavior as needed, in real time, there should be little, if any, need for formal performance management systems, including formal performance ratings. Some managers, in fact, do this and realize high team and individual performance as a result, in spite of the formal performance management system. However, not all or even most managers regularly engage in effective performance management behavior, especially given that the formal system often gets in the way and drives ineffective performance management behavior and reactions. Although organizations can abandon ratings and even their formal performance management systems entirely, many are not ready to consider such extreme steps. Furthermore, many organizations feel the need to maintain evaluations of record, for legal and other purposes. The merits of these positions have been argued above, but what remains is a practically more important question: How can organizations make the right decisions about performance management reform, including the question of ratings, to best mitigate negative impact on effective performance management behavior and performance? The concept of performance management is squarely aimed at helping individuals and organizations maximize their productivity by enabling employees to perform to their potential. To achieve this, performance needs to be managed with three critical goals in mind: (a) enable employees to align their efforts to the organization's goals; (b) provide guideposts to monitor behavior and results and make real-time adjustments to maximize performance; and (c) help employees remove barriers to success. Each aspect of the performance management process should be designed to efficiently and directly affect one or more of these goals.
Many organizations seeking to improve their performance management approaches want to start with questions such as "Should we have ratings?" or "Should we use a forced distribution, and if so, what should the cutoff percentage be for the lowest rating?" These are the wrong questions to ask at the start. The better questions to start with are these: What are the critical outcomes we want to achieve, and how can we best ensure employees deliver against key goals and outcomes? Framed from this perspective, there is no right answer to the ratings question. It really is "It depends," based on the organization's goals, strategies, maturity, trust, openness to change, management philosophy, and other contextual factors. Taking a broader, more strategic approach rather than a narrow view focused simply on ratings will help keep the focus on what the performance management system needs to attain holistically. This will require answering the key questions that matter most in deciding on the right performance management strategy for each situation. We close with those questions organizational members have found most useful for driving effective performance management reform.

Keith & Frese 2008 (Meta)

Error management training (EMT) is a training method that involves active exploration as well as explicit encouragement for learners to make errors during training and to learn from them. Past evaluation studies, which compared skill-based training outcomes of EMT with those of proceduralized error-avoidant training or of exploratory training without error encouragement, have yielded considerable variation in effect sizes. The present meta-analysis compiles the results of the existing studies and seeks to explain this variation. Although the mean effect of EMT across all 24 identified studies (N = 2,183) was positive and significant (Cohen's d = 0.44), there were several moderators. Moderator analyses showed effect sizes to be larger (a) for posttraining transfer (d = 0.56) than for within-training performance and (b) for performance tasks that were structurally distinct (adaptive transfer; d = 0.80) than for tasks that were similar to training (analogical transfer). In addition, both active exploration and error encouragement were identified as effective elements in EMT. Results suggest that EMT may be better suited than error-avoidant training methods for promotion of transfer to novel tasks. The present meta-analysis compiled results from 24 studies that investigated the effectiveness of EMT. These studies compared training outcomes of EMT with those of proceduralized or exploratory training methods that did not involve explicit encouragement of errors (i.e., relative effectiveness of EMT). The average effect across all studies was positive (Cohen's d = 0.44), indicating that EMT leads to, on average, better training outcomes than do these alternative training methods.
This result demonstrates that deliberately incorporating errors into training can be an effective means for promotion of learning—a result that is in contrast to many traditional training approaches that focus exclusively on correct behaviors and that deny any positive functions of errors during training (e.g., Bandura, 1986; Skinner, 1953). This meta-analysis further identified several moderator variables that affected the magnitude of the effect size. First, EMT appeared to be effective only when posttraining performance and not within-training performance was considered. This result is in line with training theory and research that emphasizes the distinction between within-training and posttraining transfer performance (Goodman & Wood, 2004; Hesketh, 1997; R. A. Schmidt & Bjork, 1992). From a practical perspective, this result implies that trainers should not focus on optimizing within-training performance, which may be slowed down in EMT as participants make errors, but should keep in mind that a training method can be effective despite apparently impaired initial performance, as may be the case with EMT. Also, this result underscores empirically the call for evaluation of training effectiveness, be it of EMT or of any other training method, on the basis of posttraining outcome measures rather than of performance during training itself (Hesketh, 1997; R. A. Schmidt & Bjork, 1992). Second, the present results showed EMT to be particularly effective when adaptive transfer rather than analogical transfer is involved. Thus, employing EMT to deliver training seems most useful when the major training goal is to transfer learned skills to novel problems that require the development of new solutions (i.e., adaptive transfer), for example, in situations in which the skills required on the job are too diverse to be covered completely during the allotted training time. 
When the training goal is to learn and to apply just one particular procedure, however, other training methods that involve direct instruction of this procedure may also be effective while being less time consuming and less effortful than EMT (see Ivancic & Hesketh, 1995/1996). Third, the present meta-analysis found significant mean effect sizes not only in studies that compared EMT with proceduralized error-avoidant training (without active exploration and without error management instructions that encourage errors) but also in studies that compared EMT with exploratory training (with active exploration but without error management instructions). This finding suggests that both elements of EMT (namely, active exploration and explicit encouragement of errors) are effective and that any exploratory practice should be supplemented with error management instructions, given that these simple, easy-to-administer instructions can produce significant incremental effects. It would be desirable to conduct more studies that include both proceduralized and purely exploratory training in one experimental design to further examine the tenability of this interpretation. Finally, for one moderator, clarity of task-generated feedback, results were mixed, and it cannot be concluded that EMT was effective only when task-generated feedback was clear. This result may be due to the general usefulness of feedback for learning and performance: Although clarity of feedback may be important if EMT is to be effective, it may be just as important for the other training methods that served as comparison conditions for EMT. In addition, the relatively low interrater reliability for the feedback variable (Cohen's κ = .65) may have contributed to the inconclusive findings regarding this moderator.
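The effect sizes in this meta-analysis are Cohen's d values. The sketch below shows the textbook pooled-standard-deviation form of d, a bare-bones sample-size-weighted mean across studies, and Cohen's U3, which translates a d into "share of the treatment group above the control mean" under a normality assumption. This is not a reproduction of Keith and Frese's procedure. Their exact weighting and correction steps are not described here, and the function names and toy numbers are illustrative assumptions.

```python
from __future__ import annotations

import math
from statistics import NormalDist, mean, stdev


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardized mean difference using the pooled (n - 1) standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


def weighted_mean_d(studies: list[tuple[float, int]]) -> float:
    """Sample-size-weighted mean of (d, N) pairs; a simple pooling illustration only."""
    total_n = sum(n for _, n in studies)
    return sum(d * n for d, n in studies) / total_n


def u3(d: float) -> float:
    """Cohen's U3: proportion of the treatment group above the control mean (normality assumed)."""
    return NormalDist().cdf(d)


if __name__ == "__main__":
    # Toy scores, not data from the meta-analysis.
    print(f"d = {cohens_d([2, 4, 6], [1, 3, 5]):.2f}")
    for d in (0.44, 0.56, 0.80):  # effect sizes reported above
        print(f"d = {d:.2f} -> U3 = {u3(d):.2f}")
```

For the overall d = 0.44, U3 is about 0.67. If scores were normal with equal variances, roughly two thirds of EMT trainees would score above the average trainee in the comparison conditions.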

Levy & Williams 2004

Performance appraisal research over the last 10 years has begun to examine the effects of the social context on the appraisal process. Drawing from previous theoretical work, we developed a model of this process and conducted a systematic review of the relevant research. This review of over 300 articles suggests that as a field we have become much more cognizant of the importance of the social context within which the performance appraisal process operates. First, research has broadened the traditional conceptualization of performance appraisal effectiveness to include and emphasize ratee reactions. Second, the influence that the feedback environment or feedback culture has on performance appraisal outcomes is an especially recent focus that seems to have both theoretical and applied implications. Finally, there appears to be a reasonably large set of distal variables such as technology, HR strategies, and economic conditions that are potentially important for understanding the appraisal process, but which have received very little research attention. We believe that the focus of recent performance appraisal research has widespread implications ranging from theory development and enhancement to practical application. Our goal in completing this review was to examine the extent to which researchers have heeded Bretz et al.'s (1992) call to better understand the social context of performance appraisal. For example, research in the last few years on the feedback culture or environment has suggested completely new approaches to performance management and coaching that were not previously well established or even considered. In fact, it seems to us that we now have well-developed theoretical frameworks, measurement technologies, and some early empirical results suggesting that the dynamic nature of the feedback environment is important. 
A second area that has emerged as extremely important in the recent performance appraisal (PA) literature is the newer ways in which appraisal systems are evaluated. Our review of the literature and its placement in the historical context in which PA has developed (Farr & Levy, in press) resulted in our model of Appraisal Effectiveness (Figure 2). We think this model accurately portrays the ways in which the effectiveness or success of performance appraisal systems can be evaluated. Our review indicates that Appraisal Reactions is where there has been the most growth in PA research since 1995 and also where practitioners see the most potential benefit. Third, it is clear that more empirical work should be conducted to better isolate and understand the various relationships discussed throughout the paper. This observation is exciting, and we are hopeful that the current review will help identify and clarify new avenues for researchers. In writing this paper, we observed that research areas often tended to focus on either rater or ratee effects but neglected to examine the effects of variables on both participants simultaneously. In some instances, this singular focus makes sense (e.g., rater training). However, in other areas, this focus on either the rater or the ratee seems to leave the other side of the coin unexamined. Although much has been learned in various studies focused on either rater or ratee variables, we believe that understanding of the PA process would be well served by research examining both sides of the coin simultaneously. Finally, while it appears that these initial studies are yielding useful information, it will take time to see whether these results actually benefit the practice of appraisal.
Our review suggests that as a field, we seem to be moving in that direction; however, the goal should continue to be two-pronged: (1) gain a better understanding of the PA process and (2) apply that enhanced understanding to organizations so as to improve performance appraisals in use. The focus on the social context of PA has taken us down the appropriate road, but there are still many more miles to cover.

Bye & Sandal 2016

Purpose: We investigated how job applicants' personalities influence perceptions of the structural and social procedural justice of group selection interviews (i.e., a group of several applicants being evaluated simultaneously). We especially addressed trait interactions between neuroticism and extraversion (the affective plane) and between extraversion and agreeableness (the interpersonal plane). Design/Methodology/Approach: Data on personality (pre-interview) and justice perceptions (post-interview) were collected in a field study among job applicants (N = 97) attending group selection interviews for positions as teachers in a Norwegian high school. Findings: Interaction effects in hierarchical regression analyses showed that perceptions of social and structural justice increased with levels of extraversion among high scorers on neuroticism. Among emotionally stable applicants, however, being introverted or extraverted did not matter to justice perceptions. Extraversion did not affect the perception of social justice for applicants low in agreeableness. Agreeable applicants, however, experienced the group interview as more socially fair when they were also extraverted. Implications: The impact of applicant personality on justice perceptions may be underestimated if trait interactions are not considered. Procedural fairness ratings for the group selection interview were high, contrary to the negative reactions predicted by other researchers. There was no indication that applicants with desirable traits (i.e., traits predictive of job performance) reacted negatively to this selection tool. Originality/Value: Despite the widespread use of interviews in selection, previous studies of applicant personality and fairness reactions have not included interviews. The study demonstrates the importance of previously ignored trait interactions in understanding applicant reactions.
The results showed that applicants' traits explained an important part of the variance in perceptions of the fairness of group interviews. Applicants' perceptions of social justice increased with levels of extraversion among high scorers on neuroticism. Among emotionally stable applicants, however, being introverted or extraverted did not matter to perceptions of social justice. Paralleling the findings for social justice, our results showed that among applicants high in neuroticism, being introverted was associated with lower levels of perceived structural fairness. Among emotionally stable applicants, levels of extraversion were not important to ratings of structural justice. Similarly, levels of extraversion did not affect the perception of social justice for applicants low in agreeableness. Agreeable applicants, however, experienced the group interview as more socially fair when they were also extraverted. Contrary to our predictions, extraversion and agreeableness did not interact significantly in the prediction of structural fairness. Rather, there was a main effect of agreeableness; applicants who were trusting and flexible experienced the group interview as more structurally fair. It is especially interesting to note that none of the five traits was a significant predictor of fairness perceptions in the first steps of the regression analyses. Put differently, the conditional effects were not significant, but the incremental interactive effects were. Admittedly, the confidence intervals indicate that the variance in fairness perceptions explained by the interactions may range in size from essentially zero to medium/large. On the other hand, three out of four interaction effects were strong enough to reach significance, despite our modest sample size. Thus, the main contribution of the present study lies in the inclusion of the trait interactions and the finding that three out of four interaction effects were significant.
Although our study is also limited in that we only considered two specific trait interactions (i.e., neuroticism × extraversion and extraversion × agreeableness), we believe that our results provide valuable nuances to the discussion on the impact of applicant personality on fairness perceptions. For example, contrary to the results of previous studies suggesting that extraversion is unrelated to fairness perceptions (Merkulova et al. 2014; Oostrom et al. 2010; Truxillo et al. 2006), our findings suggest that extraversion does relate to fairness perceptions, but at specific levels of neuroticism and agreeableness. Our study also adds to the growing literature on how personality traits interact to shape individuals' work-related behaviors more generally (Burns et al. 2014; Jensen and Patel 2011; Judge and Erez 2007; Witt et al. 2002). Contrary to our hypotheses, openness to experience did not predict perceptions of fairness. This is inconsistent with previous research showing relationships between openness and aspects of the perceived fairness of cognitive ability, personality, situational judgment, multiple choice, and computerized in-basket tests (Bernerth et al. 2006; Oostrom et al. 2010; Truxillo et al. 2006; Van Vianen et al. 2004; Wiechmann and Ryan 2003). One possible explanation for the inconsistency is that openness to experience predicts the perceived fairness of individual-based and cognitively oriented testing, but not of group-based, socially oriented testing. This interpretation is consistent with the results of Merkulova et al. (2014), who did not find a relationship between openness to experience and reactions to an assessment center consisting of group exercises, role-plays, and oral presentations. Another, more technical, explanation concerns the measures employed in the studies. Openness to experience was significantly correlated with the four other traits (rs ranging from ±0.17 to 0.51) in the studies by Oostrom et al. (2010) and Truxillo et al. (2006).
In Merkulova et al. (2014), openness was significantly correlated only with extraversion (r = 0.34), and in our study it correlated only with agreeableness (r = 0.23). Thus, it is possible that the correlations between openness and perceived fairness observed by Truxillo et al. (2006) and Oostrom et al. (2010) were somewhat inflated due to the overlap between openness and the other personality traits in their measures. Beyond the effective prediction of who will be a good employee, a selection procedure should not negatively affect applicants' attraction to the job or organization (Ryan and Huth 2008). Contrary to concerns raised by Tran and Blackman (2006), our results showed that, on average, the applicants rated the group interview as both socially and structurally fair, with means above four on a five-point scale. This demonstrates that it is possible to conduct group interviews that applicants experience as fair.
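The trait-interaction findings in this study come from hierarchical (moderated) regression. Main effects enter in a first step, the product term enters in a second step, and the gain in R² is the "incremental interactive effect." A simple slope, b_x + b_xz × z, then describes how strongly extraversion relates to the outcome at a given level of the moderator. The sketch below runs that two-step logic on simulated data. It is not the study's data: the coefficients are invented so that extraversion matters only when neuroticism is high. The sketch assumes NumPy is available, uses ordinary least squares via `numpy.linalg.lstsq`, and omits the significance tests and confidence intervals the authors report. The function names are illustrative.

```python
from __future__ import annotations

import numpy as np


def fit_ols(predictors: np.ndarray, y: np.ndarray) -> tuple[float, np.ndarray]:
    """OLS with an intercept column; returns (R^2, coefficients [b0, b1, ...])."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coefs
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered), coefs


def simple_slope(b_x: float, b_xz: float, z: float) -> float:
    """Slope of the outcome on x when the moderator equals z."""
    return b_x + b_xz * z


# Hypothetical simulated data: standardized extraversion (x) and neuroticism (z).
# The true slope of x is 0.2 + 0.2 * z: zero at z = -1 and 0.4 at z = +1.
rng = np.random.default_rng(0)
n = 97  # matches the study's sample size; the values themselves are simulated
x = rng.standard_normal(n)
z = rng.standard_normal(n)
y = 0.2 * x + 0.2 * x * z + rng.standard_normal(n)

# Step 1: main effects only. Step 2: add the x*z product term.
r2_step1, _ = fit_ols(np.column_stack([x, z]), y)
r2_step2, coefs = fit_ols(np.column_stack([x, z, x * z]), y)
delta_r2 = r2_step2 - r2_step1  # incremental variance explained by the interaction

b_x, b_xz = coefs[1], coefs[3]
print(f"R^2 step 1 = {r2_step1:.3f}, step 2 = {r2_step2:.3f}, delta = {delta_r2:.3f}")
for z_value in (-1.0, 1.0):
    print(f"slope of x at z = {z_value:+.0f}: {simple_slope(b_x, b_xz, z_value):.2f}")
```

Because the outcome is generated with a true interaction, the estimated slope should tend to be near zero at low neuroticism and positive at high neuroticism. Exact values depend on the random seed and, with only 97 cases, can be noisy. This noise mirrors the wide confidence intervals the authors acknowledge.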

Wilhelmy et al 2016

To remain viable in today's highly competitive business environments, it is crucial for organizations to attract and retain top candidates. Hence, interviewers have the goal not only of identifying promising applicants but also of representing their organization. Although it has been proposed that interviewers' deliberate signaling behaviors are a key factor for attracting applicants and thus for ensuring organizations' success, no conceptual model about impression management (IM) exists from the viewpoint of the interviewer as separate from the applicant. To develop such a conceptual model on how and why interviewers use IM, our qualitative study elaborates signaling theory in the interview context by identifying the broad range of impressions that interviewers intend to create on applicants, what kinds of signals interviewers deliberately use to create their intended impressions, and what outcomes they pursue. Following a grounded theory approach, multiple raters analyzed in-depth interviews with interviewers and applicants. We also observed actual employment interviews and analyzed memos and image brochures to generate a conceptual model of interviewer IM. Results showed that the spectrum of interviewers' IM intentions goes well beyond what has been proposed in past research. Furthermore, interviewers apply a broad range of IM behaviors, including verbal and nonverbal as well as paraverbal, artifactual, and administrative behaviors. An extensive taxonomy of interviewer IM intentions, behaviors, and intended outcomes is developed, interrelationships between these elements are presented, and avenues for future research are derived. Previous research on IM in interviews has been fruitful, but this literature has lacked a conceptual model to aid in understanding how and why interviewers try to make impressions on applicants. 
Instead, previous work has been based on the assumption that interviewers use the same IM behaviors as applicants without acknowledging what intentions and opportunities interviewers actually have when they interact with applicants. Thus, as a response to repeated calls for research on interviewer IM (e.g., Dipboye & Johnson, 2013; Gilmore et al., 1999; Macan, 2009), our study offers a new perspective on the selection interview by systematically examining interviewer IM. Following a grounded theory approach, we identified how interviewers apply IM in terms of what they intend to signal to applicants (i.e., interviewer IM intentions) and which signals interviewers deliberately use to create their intended impressions (i.e., interviewer IM behaviors). Furthermore, we examined why interviewers apply IM in terms of the outcomes they want to achieve by deliberately sending signals to applicants (i.e., intended interviewer IM outcomes). We developed a conceptual model of interviewer IM that comprises interviewer IM intentions, behaviors, and intended outcomes, which also shows patterns of relationships among these elements. In addition to the model, we generated an extensive taxonomy of different interviewer IM intentions, behaviors, and outcomes. Specifically, we found that interviewers' primary intentions are to signal attractiveness and authenticity, while their secondary intentions are to signal closeness and distance (i.e., distance in terms of professionalism and in terms of superiority). Another finding was that interviewer IM may have different aims—aims in terms of creating a certain impression of the interviewer as a person, an impression of the job, of the team, and of the organization as a whole. In order to create these impressions on applicants, interviewers may deliberately apply a broad spectrum of signals such as verbal, nonverbal, paraverbal, artifactual, and administrative IM behaviors. 
Additionally, we found that interviewers use IM behaviors in order to improve a wide range of different outcomes related to recruitment, selection, and the interviewers themselves.

McCarthy et al 2017

We provide a comprehensive but critical review of research on applicant reactions to selection procedures published since 2000 (n = 145), when the last major review article on applicant reactions appeared in the Journal of Management. We start by addressing the main criticisms levied against the field to determine whether applicant reactions matter to individuals and employers ("So what?"). This is followed by a consideration of "What's new?" by conducting a comprehensive and detailed review of applicant reaction research centered upon four areas of growth: expansion of the theoretical lens, incorporation of new technology in the selection arena, internationalization of applicant reactions research, and emerging boundary conditions. Our final section focuses on "Where to next?" and offers an updated and integrated conceptual model of applicant reactions, four key challenges, and eight specific future research questions. Our conclusion is that the field demonstrates stronger research designs, with studies incorporating greater control, broader constructs, and multiple time points. There is also solid evidence that applicant reactions have significant and meaningful effects on attitudes, intentions, and behaviors. At the same time, we identify some remaining gaps in the literature and a number of critical questions that remain to be explored, particularly in light of technological and societal changes. Our review indicates that the field of applicant reactions has advanced considerably and made substantial and meaningful contributions to theory, methods, and practice of recruitment and selection over the last 15 years. First, the field boasts a broader theoretical base, which has prompted a more nuanced understanding of the mechanisms underlying applicant reactions. This is reflected in our updated conceptual framework (see Figure 1). 
Second, the field demonstrates stronger research designs, with studies incorporating greater control, broader constructs, and multiple time points (see Table S1 in the supplemental appendix). There is also an increased emphasis on moderators that facilitate the understanding of when and with whom applicant reactions matter. The field has expanded its international focus, with studies emerging from all corners of the world. Third, there have been significant strides in bridging the research-practice gap by providing specific practical recommendations that organizations can adopt. Researchers have also started examining interventions that can be used to leverage applicant reactions (see Table S1). Finally, the field is examining applicant reactions to new selection technologies. Moving forward, we feel that as a field it is time to significantly increase the sophistication and impact of research in this area in order to keep up with the revolutionary changes that are taking place in the way that organizations recruit and select employees. One example is the beer company Heineken and its new "Go Places" campaign. Part of the campaign includes an interactive online job interview. The purpose is to show that Heineken is a dynamic, purposeful, and fun place to work in order to attract the millennial generation (Heineken, 2016). The field of applicant reactions is well positioned for this kind of dynamic thinking, as we have a solid theoretical and empirical foundation from which to explore innovative research questions. Below, we speculate on ways in which new studies can make revolutionary advances in what we know. First, in line with Heineken's interviews, we could provide applicants with real-time feedback about exam scores. This feedback could range from simple information about whether the applicant passed or failed the exam to more comprehensive information on how the applicant scored within each section and/or question.
If real-time feedback is provided, it is also essential to consider how to contextualize the feedback so that applicants understand it in a way that does no harm. For instance, telling someone that he or she is highly neurotic without any contextualization may do more harm than good. The field must also stay on the cutting edge of new proposed methods for hiring people, such as neuroscientific and biometric assessments (Tippins, 2015). Such tools could, for example, assess brain structures to determine people's cognitive ability, personality types, and emotion regulation. While this may seem very futuristic, there are suggestions that it is coming soon and is even partially here (Bolton, 2015; Maffin, 2013). Importantly, these new technologies need to be assessed not only in terms of validity and ethics but also from the applicant's perspective to see whether they are perceived as fair and good. There are also important questions about how to manage the complex issue of applicant privacy in a way that protects the employer while still giving applicants what they perceive as a fair experience. How do we make applicants less worried about their privacy so that they will give employers the information they want, while at the same time explaining the risks to applicants? Relatedly, how do we track the ever-fluctuating attitudes of applicants about the hiring process and the tests we use? For example, when online testing first emerged, many applicants feared that their test scores would be tracked from one employer to another. Now many applicants see this as a convenience. This is a prime example of how societal norms continue to change and render research and practice surrounding applicant reactions more complicated and challenging. We suspect that the tracking of applicant attitudes, often with the aid of big data analytics as described earlier, will prove to be an important aspect of generating practical organizational solutions.
Finally, it has been pointed out that the simple validity coefficient showing a linear relation between a test and job performance that was started in the early 20th century may be supplanted by the alignment of selection tests with alternative criteria, such as unit- and organization-level outcomes (Ryan & Ployhart, 2014). This raises a number of important questions, including how we explain to applicants how test scores relate to performance and other criteria, perhaps in a nonlinear fashion. Relatedly, how do we—as social scientists who are uniquely qualified in managing people's reactions—initiate meaningful collaborations with experts from relevant fields, such as engineers and mathematicians? Such collaborations could go far in advancing the way we think about validity from both the academic and applied lens. To conclude, there is a multitude of ways in which the field of applicant reactions can push the boundaries in order to advance knowledge about the nature, meaning, and implications of applicant reactions into the future. We look forward to seeing what the next 15 years bring and hope that our conceptual model serves as a valuable foundation for future work.

Giumetti et al 2015

In the current study, we examined the probability of an organization encountering an adverse impact (AI) violation after implementing a forced distribution rating system (FDRS), and how this probability might change under several conditions: (a) the length of time over which an FDRS is carried out, (b) whether and how employees are replaced each year, (c) the size of the organization, (d) the percentage of the workforce laid off, (e) the definition of the employment action, and (f) the method of calculating AI. Results from the simulation revealed that, over a 5-year period, the risk of an AI violation was greatest in the first year of layoffs. Results also showed that replacing laid-off employees each year slightly increases the likelihood of AI violations over time compared with not replacing them. Additionally, replacing employees randomly results in roughly the same likelihood of an AI violation as replacing employees using a valid selection system with no subgroup differences. This suggests that the FDRS approach may be more worrisome when used as a tool to replace subpar workers with new hires than when used for downsizing, although both uses are likely to result in AI violations. Organization size also affected the likelihood of violations, with larger organizations risking 6/5ths rule violations nearly 100% of the time. Generally, the 4/5ths and 6/5ths rules appeared largely unaffected by organization size, whereas the statistical significance tests (χ² and the Z test) were affected by size. In addition, our results indicate that the probability of AI violations may not differ greatly when layoff ratios are higher (i.e., laying off 15% of the workforce instead of 5% or 10%).
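The 4/5ths and 6/5ths rules are impact-ratio checks. As they are commonly described, the 4/5ths rule flags AI when the minority group's retention (selection) rate is less than 80% of the majority group's rate, while the 6/5ths rule flags AI when the minority group's layoff rate exceeds 120% of the majority group's layoff rate. The following is a minimal Python sketch under those assumptions; the function names and workforce counts are hypothetical and are not taken from the simulation in Giumetti et al. (2015).

```python
# Minimal sketch of the 4/5ths (retention) and 6/5ths (layoff) impact-ratio rules.
# Hypothetical: function names and counts are illustrative only.


def retention_ratio(min_kept: int, min_total: int, maj_kept: int, maj_total: int) -> float:
    """Minority retention rate divided by majority retention rate."""
    return (min_kept / min_total) / (maj_kept / maj_total)


def layoff_ratio(min_cut: int, min_total: int, maj_cut: int, maj_total: int) -> float:
    """Minority layoff rate divided by majority layoff rate."""
    return (min_cut / min_total) / (maj_cut / maj_total)


def four_fifths_flag(ratio: float) -> bool:
    # Retention framing: AI is flagged when the ratio falls below 0.8.
    return ratio < 0.8


def six_fifths_flag(ratio: float) -> bool:
    # Layoff framing: AI is flagged when the ratio exceeds 1.2.
    return ratio > 1.2


if __name__ == "__main__":
    # Hypothetical workforce: 1,000 majority and 200 minority employees.
    maj_total, min_total = 1000, 200
    maj_cut, min_cut = 100, 30  # 10% vs. 15% laid off
    r_ret = retention_ratio(min_total - min_cut, min_total, maj_total - maj_cut, maj_total)
    r_lay = layoff_ratio(min_cut, min_total, maj_cut, maj_total)
    print(f"retention ratio {r_ret:.3f}, 4/5ths flag: {four_fifths_flag(r_ret)}")
    print(f"layoff ratio    {r_lay:.3f}, 6/5ths flag: {six_fifths_flag(r_lay)}")
```

With these hypothetical counts, the retention ratio is about 0.94 and passes the 4/5ths rule, whereas the layoff ratio is 1.5 and fails the 6/5ths rule. Framing the same action as a retention or as a layoff can therefore change whether AI is flagged.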
Results also indicated that the definition of the employment action (i.e., whether it is classified as a retention [and the 4/5ths rule is used] or a layoff [and the 6/5ths rule is used]) had a large impact on the likelihood of an AI violation, with violations occurring much more frequently under the 6/5ths rule for layoffs than under the 4/5ths rule for retentions. The method of calculating AI also affected the percentage of AI violations: the 6/5ths rule indicated AI most frequently, followed by the Z test, Fisher's exact test (FET), chi-square, and then the 4/5ths rule. Finally, consistent with Scullen et al. (2005), workforce quality increased the most in the first year after FDRS implementation and declined thereafter. One of the most noteworthy results of this study is that the likelihood of violating any AI method is highest in the first year of implementing an FDRS. As noted in Tables 1-3, the change in workforce quality was also highest in Year 1. Taken together, these results suggest that whereas an FDRS might improve workforce potential in the first few years after implementation, it is also most likely to create AI against protected groups during this period. The current study also sheds additional light on the use of practical significance testing in determining AI. More specifically, practical significance tests tend to reduce the likelihood of an AI flag, but the size of this reduction depends on organization size and on which AI test is being examined. In small organizations, the likelihood of a 6/5ths or chi-square violation is reduced to near zero when a practical test is applied. In a medium-sized organization, the likelihood of a 6/5ths violation or a significant chi-square is reduced by 13% to 23%. In large organizations, the likelihood of a 6/5ths rule violation or chi-square violation after applying PT4 is reduced by only 2%, whereas it is reduced greatly (54%) when the MJI is applied.
One possible reason for these differences is that the 4/5ths and 6/5ths rules are susceptible to the law of small numbers. That is, small organizations have fewer minority employees, so retaining one or two more can make a larger difference than in a large organization. Additionally, the MJI appears to signal AI less often than a traditional practical significance test that examines the impact of retaining one or two more minority employees on the significance of the chi-square test. This is likely because the MJI involves a two-stage process that minimizes the impact of sample size on the results of significance testing. Specifically, when a significant chi-square is found, an AI flag is signaled only if the standardized difference between subgroups is greater than d = .20. The results of the current simulation suggest that, particularly for larger organizations, it may be important to pair a practical significance test such as the MJI with statistical significance testing results in order to obtain a clearer indication of the extent of AI (Murphy & Jacobs, 2012).
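The two-stage logic (significance test first, then a d > .20 threshold) can be sketched in Python. This is an illustration under stated assumptions, not Murphy and Jacobs' (2012) published procedure: the first stage uses a pooled two-proportion Z test (for a 2 × 2 table without continuity correction, the Pearson chi-square equals z²), and the second stage estimates d as the difference between the probits of the two retention rates, which assumes normally distributed scores with equal variance and a common cutoff. The function names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def two_proportion_z(kept_a: int, n_a: int, kept_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion Z test; returns (z, two-sided p).

    Group a = majority, group b = minority. A positive z means the
    minority group is retained at a lower rate than the majority group.
    """
    p_a, p_b = kept_a / n_a, kept_b / n_b
    pooled = (kept_a + kept_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - _STD_NORMAL.cdf(abs(z)))
    return z, p_value


def probit_d(kept_a: int, n_a: int, kept_b: int, n_b: int) -> float:
    """Assumed effect size: difference between the probits of the two retention rates."""
    return _STD_NORMAL.inv_cdf(kept_a / n_a) - _STD_NORMAL.inv_cdf(kept_b / n_b)


def two_stage_flag(kept_a: int, n_a: int, kept_b: int, n_b: int,
                   alpha: float = 0.05, d_min: float = 0.20) -> bool:
    """Flag AI only if the test is significant AND the estimated d exceeds d_min."""
    _, p = two_proportion_z(kept_a, n_a, kept_b, n_b)
    return p < alpha and probit_d(kept_a, n_a, kept_b, n_b) > d_min


if __name__ == "__main__":
    # Hypothetical large organization: 90% of 10,000 vs. 88% of 2,000 retained.
    z, p = two_proportion_z(9000, 10_000, 1760, 2000)
    print(f"large: z = {z:.2f}, chi-square = {z * z:.2f}, p = {p:.4f}")
    print("large: d =", round(probit_d(9000, 10_000, 1760, 2000), 3))
    print("large: two-stage flag:", two_stage_flag(9000, 10_000, 1760, 2000))
    # Hypothetical small organization: 90 of 100 vs. 15 of 20 retained.
    print("small: two-stage flag:", two_stage_flag(90, 100, 15, 20))
```

In the hypothetical large organization, the test is significant (z is roughly 2.7) but the estimated d is only about 0.11, so the two-stage rule does not flag AI. In the small organization, the estimated d is large but the test is not significant at .05, which illustrates how strongly significance testing depends on sample size.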

Landers & Reddock 2017

Purpose: Research examining learner control of adult web-based instruction has been inconsistent, showing both positive and negative effects on learning outcomes. In addition, the specific implementation decisions labeled "learner control" often differ dramatically across studies. The purpose of the present study was to provide a theoretical framework by which to understand objective learner control and to test it empirically. Design/Methodology: In this study, a nine-dimensional hierarchical framework of objective learner control was developed from an extensive literature review. This framework includes instructional control (skip, supplement, sequence, pace, practice, and guidance control), style control (i.e., control of aesthetic training characteristics), and scheduling control (time and location control). Hypothesized effects were tested meta-analytically. Findings: Findings suggested that (1) types of learner control are almost always confounded in experimental learner control research; (2) objective learner control is not a multidimensional construct but instead a set of related design choices; (3) across types, learner control is generally effective in skill training but varies greatly in knowledge training and in terms of reactions; and (4) sequence control is the only type that generally does not harm either learning or reactions across contexts. Implications: Given the significant confounding present in most of the literature, learner control researchers are advised to isolate specific control features. Practitioners should identify specific targeted outcomes and choose features according to those goals. Originality/Value: This is the first study to propose and test a theoretically derived framework of objective learner control, providing a roadmap for research and state-of-the-art practice. For training designers, the most important conclusion is that the overall effects of learner control are generally small and subtle.
Implementation of learner control will not produce dramatically improved learning, and the types of control most beneficial to learning may harm reactions. If assessments have been developed to be part of a training program, trainees should be required to complete them. Furthermore, there is very little literature exploring the impact of learner control on transfer, and the effects on transfer are likely to be smaller than the effects on learning. As a result, the current pervasiveness of learner control in training is not scientifically justified. In fact, given the results presented here, we can currently recommend only the use of sequence control. We recommend it not because it is the most impactful but because it is the most consistent; it is the only dimension that has either a positive effect or no effect on all outcomes. A similar pattern of results was reported for practice control, but we felt too few studies had been conducted to be confident in its impact on reactions or skill gain. Beyond those, all other types of control studied generally bring disadvantages to either reactions or learning, and we therefore recommend that practitioners carefully consider the likely effects of the control they are currently implementing or plan to implement by consulting Tables 3, 4, and 5. Types of control likely to help one outcome are often likely to harm another, so the specific goals of implementing control should be identified clearly and types of control chosen to meet those goals. For example, for those implementing knowledge training who are concerned about both reactions to training and knowledge gains, incorporating sequence, practice, and guidance control is unlikely to be harmful. Although other types of control may ultimately be shown to help learning designers under certain circumstances, there is currently insufficient evidence to suggest what those circumstances might be.
Pace, scheduling, and style control have not even been studied extensively enough to draw any meta-analytic conclusions about their mean effects. Although intuitively appealing, learner control is not always in the best interests of either the learner or the organization, and much additional research in organizational training settings is needed to better understand the boundary conditions of success.

Pulakos et al 2015

In spite of numerous attempts over decades to improve performance management (PM) systems, PM is viewed as more broken than ever, with managers and employees seeing it as a burdensome activity of little value. Yet the behaviors that PM is meant to achieve are in fact important drivers of engagement and performance. So where is the disconnect? The problem is that formal PM systems have reduced PM to intermittent steps and processes that are disconnected from day-to-day work and from the behaviors that actually drive performance: communicating ongoing expectations, providing informal feedback in real time, and developing employees through experience. To deliver on its promise, PM needs to shift from focusing on the formal system to focusing on the PM behaviors that matter every day. We describe a 5-step PM reform process that helps organizations achieve this change and that shows promise for increasing satisfaction and positive outcomes from PM processes. Central to the intervention is that organizational members need to intentionally practice and solidify effective PM behavior through a structured, on-the-job, experiential learning intervention that yields meaningful behavior change. The change-management and training interventions discussed here provide a model for organizational culture and behavior change efforts beyond PM. We propose fundamental changes in how we design and execute PM by streamlining formal PM processes and redirecting attention to critical, day-to-day PM behaviors. To provide a more concrete view of how this transformation can be accomplished, examples were provided from Cargill's PM reform journey, in which the company implemented its "Everyday PM" process. The new overall PM approach described here has shown promising results, at Cargill and elsewhere, especially with respect to engagement-related factors, such as satisfaction with the quality and frequency of feedback and perceptions about the value of PM.
The positive comments from Cargill managers and employees shown in Table 6 contrast sharply with the high levels of dissatisfaction and the negative comments typically seen in relation to PM systems. Although longer-term studies are needed to evaluate ROI, prior research has established strong linkages between the focal PM behaviors and performance outcomes. The core PM behaviors discussed here are not new. However, recent research suggests that ideas of what constitutes effective PM behavior might be misinformed. For example, behavioral feedback models that train managers to describe what the employee did ineffectively and needs to improve are unlikely to be effective in many feedback situations, most of which are teachable moments. For these situations, coaching skills are more effective than "tell me" models in driving both engagement and performance. We additionally pointed out the importance of connecting people to work and clarifying their purpose and importance to drive engagement. We also provided a new framing of goal setting that enables better customization to match the work. Finally, we suggested a stronger emphasis on leveraging work itself for employee development. Thus, we are proposing a redefinition of what several of the target PM behaviors mean and a redirection of them in ways that are more likely to drive engagement and performance. The changes in mindset and behavior that are needed for PM reform are not trivial and will not happen overnight. In fact, multiple PM cycles may be needed to realize improved individual and organizational performance. However, once the right PM behaviors are solidified, they become self-perpetuating because, done right, they give people what they want from work.
Implementation success relies on taking a holistic approach (considering how PM fits into the larger talent management landscape and aligning change with this bigger picture), treating PM reform as a change-management effort, and driving complex behavior change throughout the organization. The last of these, driving complex behavior change, is the most difficult. The goal, however, is for managers and employees to form very deep associations that hardwire new PM behaviors so that these become automatic and habitual. To achieve this, it is necessary to go beyond discrete training events that sit outside work tasks and instead leverage the work itself for learning, with its embedded, strong learning drivers. Although there are many types of on-the-job experiential learning (McCauley, DeRue, Yost, & Taylor, 2013), driving large-scale PM behavior change in practice requires scalable models. We offered one such approach here that builds purpose, structure, and direction into on-the-job learning of PM behavior by focusing learners on what to deliberately practice as part of their ongoing work, how to practice, how to extract learning from these experiences, and what indicators signal learning effectiveness. Although we have focused on PM reform, the proposed experiential learning techniques are applicable to any effort that necessitates complex behavior change (e.g., driving innovation, agility, collaboration, or aspects of the organization's culture). In addition, given the significant resources organizations invest in formal training, with very little evidence of ROI, the present on-the-job experiential learning model also holds significant promise for individual contributor, leader, and high-potential development. With respect to the focal topic of PM reform, however, we believe it is important to stop reinventing and overinvesting in formal PM systems.
Instead, these should be streamlined to the greatest possible extent, and the substantial time and resources invested in them should be redirected to more productive work. There is no best way to streamline current PM systems, and there are no specific PM components that should always be retained or eliminated. These decisions need to be made on a case-by-case basis, and they will depend on the way that PM system information is currently used; the value of this information; and the specific strategy, PM goals, and needs of the given organization.

Naemi et al 2015

(SJTs) to be conceptualized as measures of general domain knowledge, which the authors define as knowledge of the effectiveness of general domains such as integrity, conscientiousness, and prosocial behaviors in different jobs. This argument comes from work rooted in the use of SJTs as measures of implicit trait policies (Motowidlo & Beier, 2010; Motowidlo, Hooper, & Jackson, 2006), measured with a format described as a "single-response SJT" (Kell, Motowidlo, Martin, Stotts, & Moreno, 2014; Motowidlo, Crook, Kell, & Naemi, 2009). Given evidence that SJTs can be used as measures of general domain knowledge, the focal article concludes by suggesting that general knowledge can be measured not only by traditional text-based or paper-and-pencil SJTs but also through various alternate formats, including multimedia SJTs and interactive SJTs. We extend this point by exploring several ways this conceptualization of SJTs as measures of general domain knowledge might interact with different formats, pointing out issues and concerns across differing format types and presenting areas in need of further research. It is clear that the use of multimedia SJTs presents an important issue for consideration when casting SJTs as measures of general domain knowledge. We contend that media-rich video or virtual SJTs present a case in which the situational content of SJTs matters, as these SJT formats may compensate for the construct underrepresentation (Messick, 1995) of text-based SJTs by measuring test takers' ability to accurately perceive situations. In comparison with text-based SJTs, which frequently describe the details of a given situation or scenario flatly (e.g., "your boss angrily tosses the report into the trash can"), video and virtual SJTs require the respondent to synthesize verbal and nonverbal information and infer traits or emotional states that are expressed in the presented situation.
In this way, video and virtual SJTs can be said to measure at least two constructs: situational perception and the ability to identify effective behavioral responses. The extent to which both of these constructs are incorporated as a measure of general domain knowledge is a question that deserves further exploration. Perhaps future studies can examine the degree to which latent classes emerge in response data reflecting differing perceptions of a situation and situational cues or accurate identification of effective behavioral responses to that situation. Scoring for "general domain knowledge" in this case would reflect not only accurate appraisal of the situation (possibly measured against subject matter expert ratings if rationally scored) but also accurate selection or rating of effective responses to the situation. The idea of SJTs as measures of general domain knowledge is an intriguing one for further research, and examining the way in which this idea extends across multimedia SJTs can help clarify not only implications across formats but also the construct of "general domain knowledge" itself.

McDaniel et al 2011/2015

2011: Uniform Guidelines are a detriment to the field of personnel selection. The primary federal regulation concerning employment testing has not been revised in over 3 decades. The regulation is substantially inconsistent with scientific knowledge and professional guidelines and practice. We summarize these inconsistencies and outline the problems faced by U.S. employers in complying with the regulations. We describe challenges associated with changing federal regulations and invite commentary as to how such changes can be implemented. We conclude that professional organizations, such as the Society for Industrial and Organizational Psychology (SIOP), should be much more active in promoting science-based federal regulation of employment practices.

2011: Encouraging debate on the Uniform Guidelines and the disparate impact theory of discrimination. This response summarizes commentaries on the M. A. McDaniel, S. Kepes, and G. C. Banks (2011) article, which argued that the Uniform Guidelines on Employee Selection Procedures are a detriment to the field of personnel selection. Several themes were present in the commentaries. No compelling arguments were presented to dispute the assertion that mean racial differences in job-related attributes will be with us for a long time. However, compelling arguments were made that the disparate impact theory of discrimination is a more central issue for personnel selection than the Uniform Guidelines. Similarly, arguments were presented that the assessment of adverse impact is problematic and that expert witness testimony needs improvement. Areas in need of further investigation were also identified. Finally, the role of the Society for Industrial and Organizational Psychology (SIOP) in guiding regulatory, legislative, and court actions was considered.

Outtz 2011

Abolishing the Uniform Guidelines: Be careful what you wish for. Abolishing or overhauling the Uniform Guidelines would be an inappropriate objective (particularly for SIOP) for a number of reasons, including the following:
• The stated purpose of the Uniform Guidelines is to "prohibit discrimination in employment practices on grounds of race, color, religion, sex, or national origin." Why should SIOP take the lead or participate in an effort to abolish such a document? In my opinion, it would be a public relations nightmare. Many provisions of the Uniform Guidelines are unrelated to scientific research, and they should be. Therefore, why abolish or revamp the entire document?
• If the Uniform Guidelines were abolished or overhauled, what would be the replacement? There is no readily available or feasible process for replacing the Uniform Guidelines (e.g., too many stakeholders with competing or conflicting interests).
• Even if the Uniform Guidelines were abolished or revamped, the case law generated by them would remain.
• Some provisions of the Uniform Guidelines (e.g., the search for alternatives) have been the impetus for major advances in employment selection research. Again, why abolish or revamp the entire document?
The appropriate goal of a revision of the Uniform Guidelines should be to better ensure substantive, informed, unbiased evaluation of employment selection procedures. Here are several less intrusive and more feasible suggestions for achieving this goal:
• SIOP should prepare position papers that address key provisions of the Uniform Guidelines that warrant more careful interpretation in light of current scientific research and standards of best practice.
• SIOP should lobby federal enforcement agencies to adopt these position papers, respond to them, or produce their own.
• SIOP should be more active in employment litigation to inform courts of current scientific research and standards of accepted professional practice. The goal would be to obtain court opinions that address and clarify key technical provisions of the Uniform Guidelines.

Gonzalez-Mule et al 2014 (Meta)

Although one of the most well-established research findings in industrial-organizational psychology is that general mental ability (GMA) is a strong and generalizable predictor of job performance, this meta-analytically derived conclusion is based largely on measures of task or overall performance. The primary purpose of this study is to address a void in the research literature by conducting a meta-analysis to determine the direction and magnitude of the correlation of GMA with 2 dimensions of nontask performance: counterproductive work behaviors (CWB) and organizational citizenship behaviors (OCB). Overall, the results show that the true-score correlation between GMA and CWB is essentially 0 (.02, k = 35), although the rating source of CWB moderates this relationship. The true-score correlation between GMA and OCB is positive but modest in magnitude (.23, k = 43). The 2nd purpose of this study is to conduct meta-analytic relative weight analyses to determine the relative importance of GMA and the five-factor model (FFM) of personality traits in predicting nontask and task performance criteria. Results indicate that, collectively, the FFM traits are substantially more important for CWB than GMA, that the FFM traits are roughly equal in importance to GMA for OCB, and that GMA is substantially more important for task and overall job performance than the FFM traits. Implications of these findings for the development of optimal selection systems and the development of comprehensive theories of job performance are discussed, along with study limitations and future research directions. The purpose of this meta-analytic study was to enhance our understanding of the way GMA predicts job performance criteria by expanding the criterion space to include two nontask performance criteria: OCB and CWB. Our results show that GMA is a weak predictor of CWB and a moderately useful predictor of OCB.
Additional results also show that GMA is a less important predictor of CWB than the FFM and roughly equivalent with the FFM when predicting OCB. This finding augments the evidence that CWB and OCB are related yet distinct from each other. Overall, these results address a void in the industrial- organizational psychology literature by providing essential information about the way GMA relates to the three major domains of job performance. We hope that this information aids scholars in refining existing theories of job performance and in informing relevant personnel selection practices.

Sinaceur et al 2015

Although research on emotional expressions in negotiations has recently accumulated, there is little research on whether expressing sadness has any effect in negotiations. We propose that sadness expressions can increase the expressers' ability to claim value in negotiations because they make recipients experience greater other-concern for the expresser. However, recipients will act on their other-concern, and eventually concede more to a sad expresser, only when the social situation gives them reasons to experience concern for the expresser in the first place. Three experiments tested this proposition by examining face-to-face, actual negotiations (in which participants interacted with each other). In all 3 experiments, recipients conceded more to a sad expresser when, but only when, features of the social situation provided reasons to experience other-concern for the expresser, namely (a) when recipients perceived the expresser as low power (Experiment 1), (b) when recipients anticipated a future interaction (Experiment 1), (c) when recipients construed the relationship as collaborative in nature (Experiment 2), or (d) when recipients believed that it was inappropriate to blame others (Experiment 3). All 3 experiments showed that the positive effect of sadness expression was mediated by the recipients' greater other-concern. These findings extend previous research on emotional expressions in negotiations by emphasizing a distinct psychological mechanism. Implications for our understanding of sadness, negotiations, and emotions are discussed. Three experiments involved face-to-face, actual interactions and tested the proposition that sadness expression would increase value claiming in negotiation when, but only when, features of the social situation provide reasons to experience concern for the expresser.
Experiment 1 examined structural features of the social situation, whereas Experiments 2 and 3 examined relationship-related and normative features of the social situation. More specifically, we examined four features of the social situation that might provide justifications to experience other-concern for a sad expresser, namely when recipients (a) perceived the opponent as low-power; (b) anticipated a future interaction; (c) construed the relationship as collaborative in nature; or (d) construed blame as inappropriate. Further, all three experiments showed the mediating role of concern for the expresser. In this way, the current research departs from prior research on emotional expressions in negotiation by emphasizing the importance of the recipients' noninstrumental reactions (rather than more strategic, informational inferences; Kopelman et al., 2006; Van Kleef, 2014; Van Kleef et al., 2010; Van Kleef & Sinaceur, 2013). Even in mixed-motive interactions such as negotiations, expressing sadness can be an effective strategy to appeal to the other party.

Meinecke et al 2017

Despite a wealth of research on antecedents and outcomes of annual appraisal interviews, the ingredients that make for a successful communication process within the interview itself remain unclear. This study takes a communication approach to highlight leader-follower dynamics in annual appraisal interviews. We integrate relational leadership theory and recent findings on leader-follower interactions to argue (a) how supervisors' task- and relation-oriented statements can elicit employee involvement during the interview process and (b) how these communication patterns affect both supervisors' and employees' perceptions of the interview. Moreover, we explore (c) how supervisor behavior is contingent upon employee contributions to the appraisal interview. We audiotaped 48 actual annual appraisal interviews between supervisors and their employees. Adopting a multimethod approach, we used quantitative interaction coding (N = 32,791 behavioral events) as well as qualitative open-axial coding to explore communication patterns among supervisors and their employees. Lag sequential analysis revealed that supervisors' relation-oriented statements triggered active employee contributions and vice versa. These relation-activation patterns were linked to higher interview success ratings by both supervisors and employees. Moreover, our qualitative findings highlight employee disagreement as a crucial form of active employee contributions during appraisal interviews. We distinguish what employees disagreed about, how the disagreement was enacted, and how supervisors responded to it. Overall, employee disagreement was negatively related to ratings of supervisor support. We discuss theoretical implications for performance appraisal and leadership theory and derive practical recommendations for promoting employee involvement during appraisal interviews.
Using a blend of quantitative interaction coding, qualitative open-axial coding, and survey methodology in a field study design, our study generated several findings. In focusing on the functionality of different communicative behaviors, our multimethod approach highlighted emergent leader-follower interaction patterns while accounting for the temporal nature of the communication data. In line with our predictions, lag sequential findings revealed the role of relation-oriented supervisor communication in initiating active employee involvement in the interview process. Active employee contributions, in turn, were linked to (even more) relation-oriented communication by supervisors. Task-oriented supervisor statements, on the other hand, led to passive employee agreement. Moreover, perceptions of the interview were not related to isolated counts of supervisor or employee behavior, but rather to patterns of interaction, namely sequences of relation-oriented supervisor behavior followed by active employee contributions. Specifically, both supervisors and employees perceived interviews as more successful when they contained more patterns of relation-orientation and subsequent active employee contributions. Our qualitative analysis further revealed that employee disagreement, even though infrequent, was meaningfully related to perceptions of supervisor support. Findings suggest that the way employee disagreement was enacted is more important in shaping supervisor responses than the actual content of the disagreement.
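Lag sequential analysis asks whether one coded behavior follows another more often than chance would predict. A minimal sketch, assuming lag-1 transitions and the adjusted residual (z) commonly used for this purpose: the behavior codes (REL = relation-oriented supervisor statement, ACT = active employee contribution, TASK = task-oriented supervisor statement, PASS = passive employee agreement) and the event stream are hypothetical, and Meinecke et al.'s exact coding scheme and statistics may differ.

```python
from collections import Counter

def lag1_adjusted_residuals(events):
    """Lag-1 transition counts and adjusted residuals for one coded event stream.

    events: ordered list of behavior codes (hypothetical labels).
    Returns {(given, target): (observed, expected, z)}.
    """
    pairs = list(zip(events, events[1:]))   # consecutive (given -> target) pairs
    n = len(pairs)
    observed = Counter(pairs)
    row = Counter(g for g, _ in pairs)      # how often each code is the "given" behavior
    col = Counter(t for _, t in pairs)      # how often each code is the "target" behavior
    results = {}
    for g in row:
        for t in col:
            x = observed.get((g, t), 0)
            e = row[g] * col[t] / n         # expected count if codes followed each other at random
            p_g, p_t = row[g] / n, col[t] / n
            var = e * (1 - p_g) * (1 - p_t)
            z = (x - e) / var ** 0.5 if var > 0 else float("nan")
            results[(g, t)] = (x, e, z)
    return results

if __name__ == "__main__":
    # Hypothetical stretch of an appraisal interview.
    stream = ["REL", "ACT", "REL", "ACT", "TASK", "PASS", "REL", "ACT", "TASK", "PASS"]
    for pair, (x, e, z) in sorted(lag1_adjusted_residuals(stream).items()):
        print(pair, x, round(e, 2), round(z, 2))
```

In this toy stream, REL is followed by ACT three times against an expected count of 1, giving z of about 3; a large positive z for (REL, ACT) is the kind of "relation-activation" pattern described above, while values near 0 indicate transitions at roughly chance rates.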

Bindl et al 2018

Employees often self-initiate changes to their jobs, a process referred to as job crafting, yet we know little about why and how they initiate such changes. In this paper, we introduce and test an extended framework for job crafting, incorporating individuals' needs and regulatory focus. Our theoretical model posits that individual needs provide employees with the motivation to engage in distinct job-crafting strategies (task, relationship, skill, and cognitive crafting) and that work-related regulatory focus will be associated with promotion- or prevention-oriented forms of these strategies. Across three independent studies and using distinct research designs (Study 1: N = 421 employees; Study 2: N = 144, using experience sampling data; Study 3: N = 388, using a lagged study design), our findings suggest that distinct job-crafting strategies, and their promotion- and prevention-oriented forms, can be meaningfully distinguished and that individual needs (for autonomy, competence, and relatedness) at work differentially shape job-crafting strategies. We also find that promotion- and prevention-oriented forms of job crafting vary in their relationship with innovative work performance, and we find partial support for work-related regulatory focus strengthening the indirect effect of individual needs on innovative work performance via corresponding forms of job crafting. Our findings suggest that both individual needs and work-related regulatory focus are related to why and how employees choose to craft their jobs, as well as to the consequences job crafting has in organizations. Although job crafting is vital to modern workplaces (Bakker et al., 2012; Leana et al., 2009; Wrzesniewski et al., 2013), previous research has provided only limited insights into why and how employees craft their jobs.
Several frameworks have incorporated individual needs as motivators, but without specifying which particular need drives which types of crafting, and therefore, these frameworks only serve as a coarse guide as to what motivates job crafting. Furthermore, researchers have primarily focused on promotion-oriented forms of job crafting whereby employees seek to add to existing domains in the job, with very little attention given to the prevention-oriented forms, thus failing to reveal more nuanced means of crafting one's job. In this paper, we developed an extended framework for job crafting, drawing from regulatory focus theory (Higgins et al., 2001), to include both promotion- and prevention-oriented forms of four distinct types of job crafting (task, relationship, skill, and cognitive crafting). We developed a measure for this framework and tested a theoretically derived model linking different individual needs with specific types, and work-related regulatory focus (promotion- vs. prevention) with different forms, of job crafting. Finally, we showed that different forms of job crafting were differentially associated with overall innovative work performance. Based on our investigations across the three independent studies in this paper, we provide an answer for the why and how of individuals' engagement in job crafting. Below we describe how our findings inform both theory and practice.

Ones et al 2012

Examination of the Van Iddekinge, Roth, Raymark, and Odle-Dusseau (2012) meta-analysis reveals a number of problems. They meta-analyzed a partial database of integrity test validities. An examination of their coded database revealed that measures coded as integrity tests and meta-analyzed as such often included scales that are not in fact integrity tests. In addition, there were important deficiencies in their analytic approach relating to application of range restriction corrections and identification of moderators. We found the absence of fully hierarchical moderator analyses to be a serious weakness. We also explain why empirical comparisons between test publishers versus non-publishers cannot unambiguously lead to inferences of bias, as alternate explanations are possible, even likely. In light of the problems identified, it appears that the conclusions about integrity test validity drawn by Van Iddekinge et al. cannot be considered accurate or reliable. Our analysis of the Van Iddekinge et al. (2012) meta-analysis focused on the database, coding, and analytic approaches utilized. Given space constraints, our comment could only address some of the shortcomings. Van Iddekinge et al. meta-analyzed only a partial database of integrity test validities. Van Iddekinge et al. miscoded some data and included misleading data. When a meta-analysis includes erroneous input, its conclusions are suspect. Focusing only on the job performance criterion, there were 20 post-1993 studies that contributed 26 validity estimates to the analyses, constituting over one third of the data. As we showed in this comment, most of these studies had serious coding and analysis errors. Van Iddekinge et al.'s (2012) meta-analysis also suffers from significant analytic problems, especially in identifying moderators. Their article is not, as they claim, an "updated" meta-analysis.
Rather, it can be characterized as a flawed summary of the validities of integrity tests combined with non-integrity test validities. The methodological and measurement problems discussed here undermine the credibility of results, inferences, and conclusions. The cumulative evidence on integrity tests comprehensively summarized in meta-analyses elsewhere (e.g., Ones et al., 1993) shows that they predict a variety of criteria well. Operational validities of integrity tests for overall job performance are among the highest encountered from non-cognitive predictor domains (Ones, Viswesvaran, & Dilchert, 2005) and second only to cognitive ability tests (Schmidt & Hunter, 1998). In this comment, we highlighted some of the troubling issues that cast doubt on Van Iddekinge et al.'s (2012) results and conclusions. In light of these problems, the statements about integrity test validity by Van Iddekinge et al. cannot be considered accurate or reliable and should not be used to evaluate the utility of integrity tests for employee selection.

Dahlke, Sackett, & Kuncel 2019

Examines the effects of range restriction and a form of criterion contamination (individual differences in course-taking patterns) on the validity of SAT scores. We found that college performance is considerably more predictable than is suggested by college GPA's correlations with SAT scores, after accounting for range restriction and for differences in course-taking patterns that contribute to the noncomparability of GPAs. Whereas previous research found that SAT scores were less valid predictors of college academic performance for Black and Hispanic students than for White students, we found that validity differences for predicting first-year college performance from SAT scores were not significant after controlling for differential course-taking and correcting for subgroup-specific range restriction. When predicting 4-year cumulative performance after making these corrections, however, we did detect significant White-Black validity differences. Overall, our results indicate that the effects of criterion contamination and differential range restriction are important for understanding differential validity. We emphasize the importance of attending to sources of artifactual variation that could be contributing to observed differences in validities among subgroups in future research on differential validity in school and work settings.
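Why subgroup-specific range restriction matters can be seen with the classic Thorndike Case II correction for direct restriction on the predictor. This is a minimal sketch: Dahlke et al. may have used a different (e.g., indirect or multivariate) correction, and the validities and SDs below are hypothetical.

```python
import math

def correct_direct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case II correction for direct range restriction on the predictor.

    r_restricted: validity observed in the selected (restricted) group.
    sd_unrestricted, sd_restricted: predictor SDs in the applicant pool and
    in the selected group, respectively.
    """
    u = sd_unrestricted / sd_restricted  # > 1 when selection shrank predictor variance
    return (u * r_restricted) / math.sqrt(1 + (u ** 2 - 1) * r_restricted ** 2)

if __name__ == "__main__":
    # Hypothetical subgroups with the same observed validity but different restriction.
    for group, r, sd_pool, sd_admitted in [("A", 0.30, 150, 100), ("B", 0.30, 120, 100)]:
        print(group, round(correct_direct_range_restriction(r, sd_pool, sd_admitted), 3))
```

The same observed .30 becomes about .43 in group A but only about .35 in group B. Applying one pooled correction to both groups would hide this difference, which is why the correction is computed with each subgroup's own SD ratio.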

Parker, Andrei, & Van den Broeck

Few studies have systematically considered how individuals design work. In a replication study (N = 211, Study 1), we showed that students naturally tend to develop simplified, low variety work. In two further simulation studies, we quantitatively assessed participants' work design behaviors via two new measures ("enriching task allocation", "enriching work strategy selection"). As a comparison measure, we assessed individuals' tendency to choose individualistic rather than work design strategies ("person-focused strategy selection"). We then investigated how work design behaviors are affected by capacity (professional expertise, explicit knowledge, job autonomy) and willingness (life values). For a sample of human service professionals (N = 218, Study 2), participants scored higher on enriching task allocation and enriching work strategy selection if they had expertise as an industrial/organizational psychologist and if they had high autonomy in their own job. Explicit knowledge about work design predicted lower scores on person-focused strategy selection, and mediated the effects of professional expertise on this outcome. Individuals high in openness values scored higher on enriching work strategy selection, and those high in conservation values scored lower on enriching task allocation. These findings were replicated in Study 3 among working professionals (N = 602). We then showed that openness to change values predicted enriching work strategy selection via the more proximal processes of valence (valuing intrinsic work characteristics) and affect (positive affect when enriching others' work). This article opens up a new area of inquiry: how and why individuals design work for others in the way they do. Our initial descriptive study showed that 'naïve' work designers (students) continue to have a tendency to rely on a functional approach that leads to simplified jobs. 
A strikingly low proportion of the sample (less than three percent) reported giving any attention to human or psychological issues such as satisfaction or motivation when making their work design choices. These findings echo the conclusion of Campion and Stevens (1991, p. 188) that "a mechanistic or work simplification approach may be the most natural or predisposed orientation for untrained individuals". Even in a sample of professionals within the human services, almost one third allocated mostly repetitive, low-autonomy tasks to an already simplified job. It seems that understanding how to design work that is motivating and healthy does not necessarily come naturally, which might be one factor contributing to the continued, rather high proportion of low-quality psychosocial work design in contemporary workplaces, despite much evidence of its negative human and organizational effects.

Lievens et al 2015

In assessment centers (ACs), research on eliciting candidate behavior and research on evaluating candidate behavior have largely followed independent paths. This study integrates trait activation and trait rating models to posit hypotheses about the effects of behavior elicitation via situational cues on key assessor observation and rating variables. To test the hypotheses, a series of experimental and field studies was conducted. Only when trait-expressive behavior activation and evaluation models work in conjunction are increases in observability coupled with increases in the interrater reliability, convergent validity, discriminant validity, and accuracy of AC ratings. Implications of these findings for AC theory and practice are formulated. This study examined the interplay between behavior elicitation and evaluation. The results obtained across different exercises, settings, and studies add to our conceptual understanding of behavioral assessment and have several implications for improving current AC practice. This study presents an integrative framework that simultaneously considers behavior elicitation and assessor rating issues. Both experimental and field studies demonstrate the importance of the interplay between behavior elicitation and evaluation via situational cues in order to improve the quality of AC ratings. At a practical level, this study's recommendation for better aligning stimulus development with rating systems has implications for the design of exercises, assessor training, and rating instruments. These theoretical and practical implications should inspire both researchers and practitioners to work together in developing theory-driven strategies that further improve the domain of behavioral assessment.

Weng et al 2018

In recent years, situational judgment tests (SJTs) have made strong inroads in assessment practices. Despite the importance of scoring for the validity of SJTs, little attention has been paid to different SJT scoring methods. This study investigated the influence of scoring methods on the criterion-related validity of SJTs. We examined five different consensus scoring methods (i.e., raw, standardized, dichotomous, mode, and proportion scoring) and several integrated scoring methods for scoring the same SJT. Results showed that one of the most popular scoring approaches (raw consensus scoring) is associated with an extreme response tendency and yields the lowest scale validity of all scoring approaches examined. Moreover, the mean item validity of midrange items was good only when they were scored by the mode consensus method. Thus, this study extends previous work (McDaniel et al., 2011) by deepening our understanding of how different scoring methods improve the validities of SJTs. Our findings suggest that using scoring methods that control the influence of extreme response tendency on the scores of SJTs yields higher validities. Finally, this study is the first to suggest that scoring SJTs with integrated methods yields higher mean item validities than using any single method. Our study introduces various alternatives to the SJT scoring literature, namely two scoring methods (i.e., mode and proportion consensus) and one integrated scoring approach. It also compares the criterion-related validity of the two introduced scoring methods with raw consensus, standardized, and dichotomous consensus scoring, and tests the performance of an integrated scoring strategy. We go beyond previous studies not only by demonstrating the mechanism by which the raw scoring method yields low criterion-related validity, but also by showing the advantages of both alternative scoring methods and an integrated scoring approach.
Specifically, this study provides evidence suggesting that the scoring methods that fail to rule out the effect of extreme response tendencies lead to low criterion-related validity, which may lead to unfair and unintended discrimination effects against Blacks who tend to use more extreme responses in SJTs. Moreover, two newly introduced methods and the integrated scoring strategy yield higher scale validities than other typical SJT scoring methods. Thus, our findings offer meaningful theoretical and practical implications for understanding the effectiveness of scoring methods of SJTs as key assessment instruments in the context of career decision making and counseling. We hope that our results stimulate more interest in these important issues.

Levashina et al 2014

In the 20 years since frameworks of employment interview structure were developed, a considerable body of empirical research has accumulated. We summarize and critically examine this literature by focusing on the 8 main topics that have been the focus of attention: (a) the definition of structure; (b) reducing bias through structure; (c) impression management in structured interviews; (d) measuring personality via structured interviews; (e) comparing situational versus past-behavior questions; (f) developing rating scales; (g) probing, follow-up, prompting, and elaboration on questions; and (h) reactions to structure. For each topic, we review and critique research and identify promising directions for future research. When possible, we augment the traditional narrative review with meta-analytic review and content analysis. We conclude that much is known about structured interviews, but there are still many unanswered questions. We provide 12 propositions and 19 research questions to stimulate further research on this important topic. Within each section above we have summarized recent findings and discussed recommendations for future research in many promising areas. Although much is known about structured interviews, there are many unanswered questions. Table 6 summarizes some of the major findings and areas needing additional studies identified in our review. Structured employment interviews are an important area of research because they are more valid than unstructured interviews, can improve decision making, and are widely used in practice. In addition, they are easy to use, the techniques are well known, and they are simple and low-cost to implement. We hope that our review stimulates further research on this important topic.

Bracken et al 2016

In the 25+ years that the practice of 360° Feedback has been formally labeled and implemented, it has undergone many changes. Some of these have been positive (evolution) in advancing theory, research, and practice, and others less so (devolution). In this article we offer a new definition of 360° Feedback, summarize its history, discuss significant research and practice trends, and offer suggestions for all user communities (i.e., researchers, practitioners, and end users in organizations) moving forward. Our purpose is to bring new structure, discussion, and some degree of closure to key open issues in this important and enduring area of practice. In the conclusions section of their powerful article, London et al. (1997) provided a list of signs that multisource (their term) feedback has become part of an organization's culture. Given the date of the article, it is amazingly prescient, aspirational, and discouraging at the same time: a) Collect ratings at regular intervals; b) use feedback to evaluate individuals and make organizational decisions about them; c) provide feedback that is accompanied by (norms); d) encourage or require raters, as a group, to offer ratees constructive, specific suggestions for improvement; e) encourage ratees to share their feedback and development plans with others; f) provide ratees with resources . . . to promote behavior change; g) are integrated into a human resource system that selects, develops, sets goals, appraises, and rewards the same set of behaviors and performance dimensions . . . ; and h) track results over time. (London et al., 1997, p. 181) Here, almost 20 years later, we would be hard pressed to come up with a list of any significant length of organizations that have accomplished these things for a sustainable period (though a few do exist). Maybe some of you will say that it is because it is just not a good idea or practical. Maybe there are success stories of which we are not aware. Maybe we are not trying hard enough.

Derfler-Rozin et al 2018

In this paper, we explore referral-based hiring practices and show how a referrer's power (relative to the hiring manager) influences other organizational members' support (or lack thereof) for who is hired, through perceptions of the hiring manager's motives and morality. We apply principles derived from the literature on attribution of motives to research on relational power to delineate a model that explains employees' moral evaluations of and reactions to referral practices based on the power relationship between a referrer and a hiring manager. Specifically, we predict that employees are more likely to see the acceptance of a referral from a higher- (as opposed to a lower-) power referrer as a way for the hiring manager to gain more power in the relationship with the referrer, thereby attributing more self-interested motives and more counterorganizational motives to the hiring manager in such situations. These motives are then associated with harsher moral judgments of the hiring manager, which in turn lead to less support for the hiring decision. We find support for our model in two experimental studies and two field studies. We discuss implications for the literature on referral practices, ethics, and observers' reactions to power dynamics. This paper uses power and ethics lenses to examine a common and important organizational phenomenon: employees' acceptance of referral practices. 
Consistent with the ethics literature, we document different employees' reactions to the same outcome/act of hiring, depending on the different moral attributions employees make. Because moral attributions are malleable and may or may not be grounded in reality (that is, the hiring manager may or may not have acted out of self-interest or counter to the organization's benefit), it behooves us to examine such attributions to better inform organizations on how to leverage the benefits of referral practices without paying unnecessary costs associated with the use of these practices. Given the prevalence of referral practices, we hope this research will inspire other management scholars to examine additional micro-mechanisms that affect employees' support for such practices.

Van Iddekinge et al 2012 (Meta)

Integrity tests have become a prominent predictor within the selection literature over the past few decades. However, some researchers have expressed concerns about the criterion-related validity evidence for such tests because of a perceived lack of methodological rigor within this literature, as well as a heavy reliance on unpublished data from test publishers. In response to these concerns, we meta-analyzed 104 studies (representing 134 independent samples), which were authored by a similar proportion of test publishers and non-publishers, whose conduct was consistent with professional standards for test validation, and whose results were relevant to the validity of integrity-specific scales for predicting individual work behavior. Overall mean observed validity estimates and validity estimates corrected for unreliability in the criterion (respectively) were .12 and .15 for job performance, .13 and .16 for training performance, .26 and .32 for counterproductive work behavior, and .07 and .09 for turnover. Although data on restriction of range were sparse, illustrative corrections for indirect range restriction did increase validities slightly (e.g., from .15 to .18 for job performance). Several variables appeared to moderate relations between integrity tests and the criteria. For example, corrected validities for job performance criteria were larger when based on studies authored by integrity test publishers (.27) than when based on studies from non-publishers (.12). In addition, corrected validities for counterproductive work behavior criteria were larger when based on self-reports (.42) than when based on other-reports (.11) or employee records (.15). The goal of this study was to provide an updated understanding of the criterion-related validity of integrity tests.
Overall, the results reinforce some of the concerns that have been raised about the general quality of studies that comprise the integrity test literature and the validity evidence based upon this research. Indeed, when we estimate validity on the basis of studies whose conduct is consistent with professional standards for test validation, and whose results focus on the validity of integrity tests for predicting individual work behavior, the validity evidence appears to be somewhat less optimistic than that suggested by earlier reviews. With the notable exception of self-report CWB criteria, most of the corrected validity estimates for integrity tests are smaller than .20, and many estimates are closer to .10. Thus, although integrity tests yield small subgroup differences and low correlations with cognitive ability, the present results suggest the criterion-related validity of these tests generally is quite modest. We hope our findings may be informative to researchers and practitioners who wish to consider integrity tests for research purposes or personnel selection.
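The step from observed to "corrected for unreliability in the criterion" validities above is the standard disattenuation formula applied to the criterion side only, r_c = r_xy / sqrt(r_yy). A minimal sketch follows. The reliability of .64 is an illustrative value back-solved from the reported job performance pair (.12 to .15). It is not a figure reported by Van Iddekinge et al., who may have used different reliabilities for different criteria.

```python
import math

def correct_for_criterion_unreliability(r_observed, criterion_reliability):
    """Disattenuate a validity coefficient for measurement error in the criterion only.

    The predictor side is left uncorrected, as is usual for operational validity.
    """
    if not 0 < criterion_reliability <= 1:
        raise ValueError("criterion_reliability must be in (0, 1]")
    return r_observed / math.sqrt(criterion_reliability)

if __name__ == "__main__":
    # .64 is illustrative only (back-solved from .12 -> .15), not the authors' value.
    print(round(correct_for_criterion_unreliability(0.12, 0.64), 2))
```

Because the correction divides by a square root smaller than 1, it always raises the estimate, and it raises it more as criterion reliability falls. Range restriction corrections (such as the indirect correction that moved job performance from .15 to .18) are applied as a separate step.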

Buehl et al 2018

Interviews are a prevalent technique for selection and admission purposes. However, interviews are also viewed as potentially fakeable, raising the question of whether interviewees' faking behavior impairs the quality of selection decisions. To address these concerns, our study examined whether interviewees can actually improve their interview score by faking and the role that interviewee ability factors play in interview faking.We also explored the effect of faking on criterion-related validity with regard to successfully predicting interviewees' task and contextual performance.We conducted simulated interviews in an honest and an applicant instruction condition using a within-subjects design. In line with our hypotheses, interviewees were able to improve their interview scores when asked to respond as an applicant. The size of the improvement of these interview scores correlated with interviewees' cognitive ability and their ability to identify the targeted interview dimensions. Concerning the effects of faking on criterion-related validity, we found that academic performance was better predicted in the applicant instruction condition whereas contextual performance was better predicted in the honest condition. Thus, it appears that claims that Bfaking impairs criterion-related validity^ are too simplified and that we have to consider the kind of criterion predicted. The aim of our study was to shed light on the relationship between faking in interviews and criterion-related validity. Our results revealed that it is possible to improve interview scores by faking. Our results also showed that faking can have positive effects on criterion-related validity in terms of predicting task performance but not with regard to predicting contextual performance. 
These findings, therefore, reveal that the question of whether faking impairs criterion-related validity cannot be answered with a simple yes or no; instead, we need to differentiate between the facets of performance that we want to predict.

Dierdorff & Jensen 2018

Job crafting theory purports that the consequences of revising one's work role can be simultaneously beneficial and detrimental. Previous research, however, has almost exclusively emphasized the beneficial outcomes of job crafting. In the current study, we proposed dysfunctional consequences of crafting for performance-related outcomes in the form of a U-shaped relationship between job crafting and performance effectiveness (managerial ratings of job proficiency and peer ratings of citizenship behavior). We further predicted that elements of the task context (autonomy and ambiguity) and the social context (interdependence and social support) moderate these curvilinear relationships. Consistent with previous research, job crafting displayed positive and linear effects on work-related attitudes (job satisfaction and affective commitment). Consistent with our predictions, moderate levels of crafting were associated with dysfunctional performance-related outcomes, and features of work context either exacerbated or dissipated these dysfunctional consequences of job crafting for individuals. We sought to bring much-needed empirical evidence to bear on one of the central propositions of job crafting theory. Taken collectively, our results reveal a more nuanced and complex picture of the consequences of job crafting than has been depicted in prior scholarship. As theorized in the seminal work by Wrzesniewski and Dutton (2001), our study confirms that job crafting simultaneously holds both functional and dysfunctional outcomes for those who engage in such idiosyncratic and agentic revisions to role enactment. Further still, job crafting is clearly a contextually embedded phenomenon, and our results show that task and social features of the environment in which individuals work will impinge upon the relationships between job crafting and the consequences of such actions.

Klein et al 2015

Legal and fairness concerns necessitate that organizations consider group mean-score differences on assessment tools used at various stages of the human resources process. Too often, research and applied investigations fail to comprehensively examine potential group differences on our most important predictor tools. Cognitive ability tests are widely used in making personnel decisions, and thus it is important for human resource professionals to be aware of age differences on these measures. As employees remain in the workforce longer, it is important to be cognizant of how selection systems may impact older workers. This study found that older executives performed slightly worse on tests of GMA and figural reasoning, and generally much worse on tests of inductive reasoning, which assess fluid intelligence. Older executives do seem to have an advantage, however, when it comes to some tests of verbal ability, a type of crystallized intelligence. This is in contrast to Avolio and Waldman's (1994) finding with respect to verbal ability as measured by the General Aptitude Test Battery (GATB), a finding that does not appear to generalize to managerial and executive assessment. Including a measure of verbal ability in cognitive ability composites should help organizations reduce the risk of creating adverse impact against older individuals, and is particularly relevant for high-complexity jobs and positions for which such applicants are more common. Even when overall scores on cognitive ability test batteries are used in making personnel decisions, awareness of the intricate patterns of age differences on the specific tests constituting the composite is crucial to responsibly estimate adverse impact potential with the goal of avoiding age discrimination.

Bernerth et al 2012

Many organizations use credit scores as an employment screening tool, but little is known about the legitimacy of such practices. To address this important gap, the reported research conceptualized credit scores as a biographical measure of financial responsibility and investigated dispositional antecedents and performance-related outcomes. Using personality data collected from employees, objective credit scores obtained from the Fair Isaac Corporation, and performance data provided by supervisors, we found conscientiousness to be positively related and agreeableness to be negatively related to credit scores. Results also indicate significant relationships between credit scores and task performance and organizational citizenship behaviors. Credit scores did not, however, predict workplace deviance. Implications for organizations currently using or planning to use credit scores as part of the screening process are discussed. By conceptualizing credit scores as a biographical measure of financial responsibility, this study provides insights into a measure that has been traditionally viewed as little more than a simple and accessible metric. We investigated dispositional antecedents and work-related outcomes that organizations typically try to assess and predict during the employment screening process. Results linking personality to credit scores and credit scores to supervisor-rated performance are particularly important for organizations that have used, are currently using, or are considering using credit scores for screening purposes. Ultimately, these organizations will be compelled to present evidence in favor of such practices, and we hope this research provides the foundation for what may be a long and rich exploration in years to come.

Kuncel & Sackett 2014

Ongoing concern about the construct validity of assessment center dimensions has focused on postexercise dimension ratings (PEDRs) that are consistently found to reflect exercise variance to a greater degree than dimension variance. Here, we present a solution to this problem. Based on the argument that PEDRs are an intermediate step toward an overall dimension rating, and that the overall dimension rating should be the focus of inquiry, we demonstrate that correlated sources of dimension variance accumulate and increasingly displace uncorrelated sources of both systematic variance and error. Viewing overall dimension ratings as a composite of PEDRs, we show that dimension variance will commonly, and quickly, overtake exercise-specific variance as the dominant source of variance as ratings from multiple exercises are combined. We embed our results in a new framework for categorizing different levels of construct variance dominance, and our results indicate that with as few as two exercises, dimension variance can reach our lowest level of construct variance dominance. However, the largest source of dimension variance is a general factor. We conclude that the construct validity problem in assessment centers never existed as historically framed, but the presence of a general factor may limit interpretation for developmental purposes. Here we have argued and presented evidence that the concept of composites and accumulated shared variance resolves the long-enduring controversy about construct validity for assessment centers. In individual PEDRs, the predominant pattern is that exercise-specific variance is the largest source of variance. However, when aggregating ratings of a given dimension to form an overall dimension rating, the role of exercise-specific variance quickly diminishes. 
Although the various sources of assessment center data that we presented differ in the degree to which dimension factors grow in importance as ratings are aggregated across exercises, and in the extent to which dimension-general or exercise-general factors capture an overall performance factor, all of the data sources consistently show this pattern: once one shifts one's focus from an individual PEDR to an overall dimension rating, exercise-specific variance no longer dominates. Dimension variance commonly very rapidly accumulates to become the largest source of variance and, in some scenarios, reaches what we termed the full dominance of variance. However, the results also reveal that although dimension-specific variance accumulates, it does so at a much slower pace than dimension-general variance. In many centers, final assessment center dimension ratings are dominated by dimension variance, but this source of variance is general across dimensions, making both intra-individual and inter-individual dimension comparisons problematic unless enough exercises are used to ensure that exercise-specific variance is solidly the second largest source of variance. The primary message is that construct validity questions are best focused on the overall ratings used as the basis for organizational action and for feedback, rather than on postexercise dimension ratings.
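The composite argument above can be illustrated numerically. The sketch below rests on simplifying assumptions that are not taken from the paper: each PEDR for one dimension is modeled as a shared dimension component (variance d, identical across exercises), an exercise-specific component (variance x, uncorrelated across exercises), and random error (variance u). Summing n exercises multiplies the shared variance by n² but the uncorrelated variances only by n, so the dimension share of the overall rating grows with every exercise added. The class name, function name, and variance values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PedrVariance:
    """Hypothetical variance components of one standardized PEDR."""
    dimension: float  # shared by the same dimension across exercises (assumed r = 1)
    exercise: float   # exercise-specific; assumed uncorrelated across exercises
    error: float      # random error; uncorrelated across exercises


def composite_shares(v: PedrVariance, n_exercises: int) -> dict:
    """Share of variance in an overall dimension rating (sum of n PEDRs) by source."""
    if n_exercises < 1:
        raise ValueError("need at least one exercise")
    n = n_exercises
    dim = n * n * v.dimension  # a shared component's variances and covariances add up: n^2 * d
    ex = n * v.exercise        # independent components contribute only their n variances
    err = n * v.error
    total = dim + ex + err
    return {"dimension": dim / total, "exercise": ex / total, "error": err / total}


if __name__ == "__main__":
    components = PedrVariance(dimension=0.25, exercise=0.45, error=0.30)  # made-up values
    for n in range(1, 5):
        shares = composite_shares(components, n)
        print(n, {source: round(share, 2) for source, share in shares.items()})
```

With these made-up values, a single PEDR is dominated by exercise variance (0.45 vs. 0.25 for the dimension), but with two exercises the dimension share (0.40) already exceeds the exercise share (0.36). The sketch deliberately ignores the general factor the authors identify, so it says nothing about whether the accumulated dimension variance is dimension-specific or general across dimensions.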

Salas et al 2012 (READ)

Organizations in the United States alone spend billions on training each year. These training and development activities allow organizations to adapt, compete, excel, innovate, produce, be safe, improve service, and reach goals. Training has successfully been used to reduce errors in such high-risk settings as emergency rooms, aviation, and the military. However, training is also important in more conventional organizations. These organizations understand that training helps them to remain competitive by continually educating their workforce. They understand that investing in their employees yields greater results. However, training is not as intuitive as it may seem. There is a science of training that shows that there is a right way and a wrong way to design, deliver, and implement a training program. The research on training clearly shows two things: (a) training works, and (b) the way training is designed, delivered, and implemented matters. This article aims to explain why training is important and how to use training appropriately. Using the training literature as a guide, we explain what training is, why it is important, and provide recommendations for implementing a training program in an organization. In particular, we argue that training is a systematic process, and we explain what matters before, during, and after training. Steps to take at each of these three time periods are listed and described and are summarized in a checklist for ease of use. We conclude with a discussion of implications for both leaders and policymakers and an exploration of issues that may come up when deciding to implement a training program. Furthermore, we include key questions that executives and policymakers should ask about the design, delivery, or implementation of a training program. Finally, we consider future research that is important in this area, including some still unanswered questions and room for development in this evolving field. 
Training research has come a long way. Today it is empirical in nature, and theoretically based. Moreover, it is grounded in the science of learning, has been applied to training in a variety of settings and populations, and has spawned innovative strategies and techniques. Training is now viewed as a system that is essential to promote learning and enhance on-the-job performance. It is not just an event that occurs in a classroom. We hope that training research can increasingly inform and guide the design of effective training. And so we conclude as we began and note again that well designed training works, and that what the organization does around it matters.

McLarty & Whitman 2016

Purpose Drawing from core self-evaluations (CSE) theory, we argue and demonstrate that disposition plays an important role in explaining the way job applicants respond to testing procedures in the selection process. We demonstrate that CSE predicts job candidate reapplication intentions, acceptance intentions, and recommendation intentions, even after controlling for test performance. Moreover, we show that CSE moderates the relationship between perceived fairness and applicant behavioral intentions. Design/Methodology/Approach Drawing from a sample of 194 applicants for the position of police officer, this research uses data at four different time periods to explain the impact that applicant CSE has on outcomes in a high-stakes (i.e., civil service) testing environment. Findings Our results indicate that behavioral intentions resulting from selection processes are attributable at least in part to applicant CSE and that self-serving attributions are not the only relevant driving factor. We also show that CSE influences the relationship between perceptions of fairness and behavioral intentions. Implications Theoretically, this manuscript explains why and shows how CSE is a driving force behind intention formation. This research provides practitioners with insight into the formation of applicant reactions and intentions, showing that important perceptions about the organization can be impacted by CSE. We also demonstrate that CSE impacts selection test performance. Originality/Value This is the first study to examine the impact of CSE on applicant responses related to the formation of organizationally relevant outcomes. As opposed to many studies that have taken a situational approach to understand how applicants react to the selection process, this study examines these issues from a dispositional perspective. 
Specifically, we found that CSE has a positive influence on behavioral intentions—even after applicants found out about their selection test performance and whether or not they would be offered the position. Additionally, we show that low levels of CSE strengthen the positive relationship between fairness perceptions and behavioral intentions. By theoretically linking disposition and behavioral intentions, this study adds to both the applicant reactions and CSE literatures and helps provide a better understanding of the impact that personality has in the hiring process.

Schmidt et al 2015

Purpose Grounded in person-environment fit theory, this field experiment was designed to test the effects of job advertisements emphasizing information about demands-abilities (D-A) or needs-supplies (N-S) fit on the size and quality of the applicant pool. The wording used in 56 actual job ads was manipulated to emphasize D-A or N-S fit, and data were collected about application behavior and applicant quality based on ratings of the resumes submitted by 991 applicants. Other study hypotheses were tested using survey data collected from a subsample (n = 91). Findings Job ads emphasizing N-S fit, rather than D-A fit, elicited more applications (relative to job ad views) and a higher quality applicant pool. Analyses of survey data provided support for mediated and moderated effects that provide insight into how and for whom N-S fit information in job ads is ultimately linked to greater attraction. Implications The findings indicate that recruiting organizations can craft job ads to emphasize specific types of fit and favorably affect applicants' perceived fit, attraction, and application behavior, as well as the quality of the applicant pool. Originality/value This study is one of only a few field experiments containing manipulations of the content of job ads in the recruitment literature. The distinction between two important fit constructs that have received surprisingly little empirical attention in recruitment contexts was found to have effects on application behavior and applicant quality, two critically important yet rarely examined outcomes. We conducted a field experiment to test hypotheses derived from theory on P-E fit about the effects of manipulating the wording in job ads to emphasize either D-A or N-S fit. 
We found support for the two primary study hypotheses about the effects of the experimental manipulation on two critically important recruitment outcomes: application behavior and the quality of the applicant pool based on the ratings of resumes submitted by 991 applicants. Other hypotheses received support from the analyses of survey data collected from a subsample of applicants, providing insight into the process through which information about N-S fit in a job ad is ultimately associated with higher perceptions of N-S fit and applicant attraction. Support was also found for a boundary condition for the effect of N-S fit perceptions on attraction, which was stronger among applicants who had higher perceived marketability. We discuss the implications of these and other findings for recruitment theory and practice, as well as study limitations and directions for future research.

Russell et al 2017

Purpose Our objective was to generate, define, and evaluate behavioral dimensions of ethical performance at work that are common across United States occupations. Design/Methodology/Approach This project involved three studies. Study 1 involved (a) qualitative review of published literature, professional codes of ethics, and critical incidents of (un)ethical performance and resulted in (b) behavioral dimensions and ethical performance rating scales. The second and third studies used a retranslation methodology to evaluate the ethical performance dimensions from Study 1. The behavioral dimensions were linked to the performance determinants (personal attributes) in Study 3. Findings Study 1 resulted in draft dimension definitions and rating scales for 10 ethical performance dimensions. In Studies 2 and 3, retranslation data provided strong support for 10 behavioral dimensions of ethical performance at work. Results from Study 3 shed light on possible relationships among the performance dimensions based on their underlying performance determinants. Implications Communicating an organization's ethical standards to employees is important because some ethical breakdowns can be attributed to simply failing to recognize an ethical matter (in: DeCremer, Managerial ethics: Managing the psychology of morality, Routledge, New York, 2011). Definitions of ethical behavior in the workplace provide a tool for researchers, employers, and employees to communicate about ethical situations and a foundation for folding ethics into employee training and performance management. Originality/Value These studies provide a taxonomy of ethical performance at work that generalizes to a diverse array of occupations and industries, and dimensions and rating scales have value for performance management, training/curriculum development, job analysis, predictor development and/or validation, and additional research. 
These studies break new ground by developing a taxonomy of ethical performance at work that generalizes well to a diverse array of occupations and industries, and moving forward, can serve as a foundation upon which to develop theoretically grounded assessments. Moreover, our use of comprehensive qualitative reviews coupled with quantitative evaluation represents a comprehensive approach to taxonomy development.

Cerasoli et al 2018 (Meta)

Purpose Over the past two decades, research has shown a growing consensus that 70% to 90% of organizational learning occurs not through formal training but informally, on the job, and in an ongoing manner. Despite this emerging consensus, primary data on the nature and correlates of informal learning remain sparse. The purpose of this study was to provide an integrative definition of informal learning behaviors (ILBs) and to synthesize existing primary data through meta-analysis to explore ILB correlates. Design/Methodology/Approach Given that there has been little systematic treatment of ILBs, we defined their construct domain and tested relationships suggested by our research questions with antecedents (personal factors, situational factors) and outcomes (attitudes, knowledge/skill acquisition, performance) using random-effects meta-analyses (k = 49, N = 55,514). Findings Our results showed that both personal and situational antecedent factors predict ILBs, and they also revealed ILB-outcome relationships. Implications Findings indicate that engagement in ILBs for working adults is linked to valued criteria such as attitudes (ρ = .29), knowledge/skill acquisition (ρ = .41), and performance (ρ = .42). We provide suggestions for future research and actionable advice for organizations to support the development of ILBs. Originality/Value Although hundreds of studies and over a dozen meta-analyses have explored the nature and effectiveness of formal learning in the workplace, our work is the first attempt to conceptualize a unified definition of ILBs and to aggregate...
Despite the acknowledgment that a great deal of organizational learning occurs outside of formal training, companies have devoted relatively little attention and few resources toward discovering ways to foster IL in working adults (Berg & Chyung, 2008). Given the effects observed in the current study, we echo the sentiments expressed by Marsick et al. (1999) that "...informal learning from experience cannot be left completely to chance..." (pp. 93-94). Fortunately, we also identified several factors that organizations can leverage to promote and institutionalize informal learning. To support this organizational need, research should continue to explore and clarify the antecedents and consequences of ILBs. This review and meta-analysis helped highlight what is currently known about ILBs and raised some additional questions. We encourage researchers to explore these questions in future research.
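The abstract reports k (number of independent samples), N (total sample size), and ρ, which in I-O meta-analyses usually denotes an estimated population correlation corrected for measurement artifacts. The sketch below shows, for hypothetical data, a bare-bones Hunter-Schmidt style calculation behind such figures: a sample-size-weighted mean correlation, a comparison of observed variance with the variance expected from sampling error alone, and an optional correction for unreliability. It is an illustration only; it does not reproduce the authors' estimator or corrections, and the function names, correlations, sample sizes, and reliabilities are all made up.

```python
import math


def bare_bones_meta(rs, ns):
    """Sample-size-weighted mean r and variance decomposition (no artifact corrections)."""
    if len(rs) != len(ns) or not rs:
        raise ValueError("rs and ns must be non-empty and the same length")
    k = len(rs)
    total_n = sum(ns)
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    # Observed (sample-size-weighted) variance of correlations across studies
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    # Expected variance from sampling error alone, using the average study N
    n_bar = total_n / k
    var_err = (1 - mean_r ** 2) ** 2 / (n_bar - 1)
    # Residual ("true") variance, floored at zero
    var_true = max(var_obs - var_err, 0.0)
    return {"k": k, "N": total_n, "mean_r": mean_r,
            "var_obs": var_obs, "var_err": var_err, "var_true": var_true}


def correct_for_attenuation(r, rxx, ryy):
    """Disattenuate a correlation for unreliability in predictor (rxx) and criterion (ryy)."""
    return r / math.sqrt(rxx * ryy)


if __name__ == "__main__":
    # Hypothetical studies relating ILB engagement to performance
    result = bare_bones_meta(rs=[0.30, 0.45, 0.38], ns=[100, 200, 150])
    print(result)
    print(round(correct_for_attenuation(result["mean_r"], rxx=0.80, ryy=0.70), 3))
```

Weighting by sample size lets larger studies count more, and comparing var_obs with var_err indicates how much of the spread across studies might be explained by sampling error rather than by moderators.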

Stoughton et al 2015

Purpose Social networking websites such as Facebook allow employers to gain information about applicants that job seekers may not otherwise share during the hiring process. This multi-study investigation examined how job seekers react to this screening practice. Design/Methodology Study 1 (N = 175) employed a realistic selection scenario examining applicant reactions to prospective employers reviewing their social networking website. Study 2 (N = 208) employed a simulated selection scenario where participants rated their experience with a proposed selection process. Findings In Study 1, social networking website screening caused applicants to feel their privacy had been invaded, which ultimately resulted in lower organizational attraction. Applicants low in agreeableness had the most adverse reactions to social networking website screening. In Study 2, screening again caused applicants to feel their privacy had been invaded, resulting in lower organizational attraction and increased intentions to litigate. The organization's positive/negative hiring decision did not moderate the relationship between screening and justice. Implications The results suggest organizations should weigh the costs and benefits of social media screening, because such screening could reduce the attractiveness of the organization. Additionally, applicants may need to change their conceptualization of social networking websites, viewing them through the eyes of a prospective employer. Originality/Value This investigation proposed and tested an explanatory model of the effects of screening practices on organizational outcomes, demonstrating how electronic monitoring, privacy, and applicant reactions can be integrated to better understand responses to technological innovations in the workplace. This investigation provides an initial examination of the effects of social networking website screening on job applicants' perceptions. 
Based on anecdotal accounts in the popular press (Goldberg 2010; Levinson 2011; McNichol 2010), this practice is quite common. This examination applies theory from the electronic performance monitoring and privacy literature to the selection context by proposing a model of job applicant reactions to employers' use of social networking sites for screening purposes (see Fig. 1), in addition to investigating the conditions under which these relationships may be altered (e.g., individual differences, hiring decision, different screening methods).

Feiler & Powell 2016

Purpose The aim of this study was to investigate (a) the behavioral cues that are displayed by, and trait judgments formed about, anxious interviewees, and (b) why anxious interviewees receive lower interview performance ratings. The Behavioral Expression of Interview Anxiety Model was created as a conceptual framework to explore these relations. Design/Methodology/Approach We videotaped and transcribed mock job interviews, obtained ratings of interview anxiety and interview performance, and trained raters to assess several verbal and nonverbal cues and trait judgments. Findings The results indicated that few behavioral cues, but several traits, were related to interviewee and interviewer ratings of interview anxiety. Two factors emerged from our factor analysis on the trait judgments: Assertiveness and Interpersonal Warmth. Mediation analyses were performed and indicated that Assertiveness and Interpersonal Warmth mediated the relation between interview anxiety and interview performance. Speech rate (words spoken per minute) and Assertiveness were found to mediate the relation between interviewee and interviewer ratings of interview anxiety. Implications Overall, the results indicated that interviewees should focus less on their nervous tics and more on the broader impressions that they convey. Our findings indicate that anxious interviewees may want to focus on how assertive and interpersonally warm they appear to interviewers. Originality/Value To our knowledge, this is the first study to use a validated interview anxiety measure to examine behavioral cues and traits exhibited by anxious interviewees. We offer new insight into why anxious interviewees receive lower interview performance ratings.

Bauer et al 2016 (Meta)

Purpose The extant research has not been consistent in the way motivation is conceptualized and measured in learning contexts, with prior research utilizing five different types of motivation derived from three theoretical frameworks: self-determination theory, expectancy theory, and the expectancy-value model. The purpose of the present study was to examine whether type of motivation impacts the motivation-training outcome relationships. Design/Methodology We conducted a meta-analysis investigating the impact of motivation type (i.e., intrinsic motivation, motivation to learn, motivation to transfer, expectancy motivation, and task value) on four training outcomes. The review of the literature yielded 136 independent samples and a total of 25,012 trainees. Relative weights analysis was also used. Findings Results suggest that all types of motivation had stronger relationships with trainee reactions than with declarative knowledge, initial skill acquisition, or transfer. Yet, there was variability in the strength of the motivation-training outcome relationships across motivation type. Implications We recommend that motivation to learn be used to predict trainee reactions, declarative knowledge, and initial skill acquisition; motivation to transfer should be measured when predicting distal post-training outcomes (i.e., transfer of training). Although this recommendation may seem intuitive, clearly prior research/practice has used other motivation types in the prediction of these training outcomes. Accordingly, we advise that measures of motivation to learn and motivation to transfer be used more uniformly. Originality/Value This is the first study to meta-analytically test whether the relationship between motivation and training outcomes varies based on the type of motivation utilized. 
The extant research has not been consistent in the way motivation is conceptualized and measured in learning contexts, with prior research utilizing five different types of motivation derived from three theoretical frameworks. The present meta-analysis examined whether the type of motivation influences the strength of the effect of motivation on four key training outcomes. Our findings suggest that when it comes to assessing and enhancing the various outcomes, not all motivation types are equivalent. First, we suggest that a single measure of motivation may not be adequate when multiple training outcomes are of interest. Second, we recommend that professional trainers and researchers focus their efforts on motivation to learn when reactions and/or learning outcomes are of interest, and on motivation to transfer for maximizing the outcome of transfer. In sum, we suggest that by introducing strategies designed to increase students' motivation to learn and motivation to transfer throughout the learning process, training practitioners can help facilitate positive learning outcomes and the actual transfer of critical skills into the workplace, leading to beneficial outcomes for individuals and organizations alike.

Ellington & Wilson 2016

Purpose The purpose of this study was to take an inductive approach in examining the extent to which organizational contexts represent significant sources of variance in supervisor performance ratings, and to explore various factors that may explain contextual rating variability. Design/Methodology/Approach Using archival field performance rating data from a large state law enforcement organization, we used a multilevel modeling approach to partition the variance in ratings due to ratees, raters, and rating contexts. Findings Results suggest that much of what may often be interpreted as idiosyncratic rater variance may actually reflect systematic rating variability across contexts. In addition, performance-related and non-performance factors, including contextual rating tendencies, accounted for significant rating variability. Implications Supervisor ratings represent the most common approach for measuring job performance, and understanding the nature and sources of rating variability is important for research and practice. Given the many uses of performance rating data, our findings suggest that continuing to identify contextual sources of variability is particularly important for addressing criterion problems and improving ratings as a form of performance measurement. Originality/Value Numerous performance appraisal models suggest the importance of context; however, previous research had not partitioned the variance in supervisor ratings due to omnibus context effects in organizational settings. The use of a multilevel modeling approach allowed the examination of contextual influences while controlling for ratee and rater characteristics. Employee performance ratings are at least in part dependent on the supervisor/rater who produces them, as well as the work context in which they are produced. 
Given the many issues associated with ratings, there is currently a debate as to whether ratings should be abandoned altogether, or if continued efforts should be made to improve upon them as a component of performance management (Adler et al., in press). Going forward, it remains to be seen whether the latter goal can be achieved; however, if future efforts are to be made toward improving ratings, we believe that continuing to identify contextual influences in appraisal is a worthy endeavor. Although many questions remain regarding the nature of contextual rating variability, this line of research (among others) may help us better understand and hence improve ratings as a form of performance measurement.

Edelman & van Knippenberg 2017

Purpose The purpose of this study was to test whether we could train the regulation of affective displays of leaders in terms of the emotion regulation strategy of deep acting (displaying feelings one also experiences) and display of positive affect. We also tested whether this resulted in improved leadership effectiveness (i.e., a mediation model in which the training results in greater leadership effectiveness through improved emotion regulation). Design/Methodology/Approach Data were obtained from a field experiment. We randomly assigned N = 31 leaders (rated by N = 60 subordinates) to a control group without training or an experimental group with emotion regulation training. Before and 2 weeks after the intervention, deep acting (leader-rated) and positive affective displays and leadership effectiveness (subordinate-rated) were assessed. Findings The training had positive effects on deep acting, positive affective displays, and leadership effectiveness. Deep acting and positive affect mediated the relationship between the intervention and leadership effectiveness. Implications We discuss how this helps build the case both for an emotional labor approach to leadership and for the leadership development potential of such an emotional labor approach. Originality/Value The findings of this study represent the first causal evidence that leader emotion regulation can be trained and that improved emotion regulation results in greater leadership effectiveness, and this is one of the first empirical studies to integrate emotional labor theory with leadership effectiveness. It is therefore important from a theory development perspective. Our findings add important causal evidence to the case for an emotional labor perspective on leadership. They also provide first evidence that leader emotion regulation can successfully be included in leadership training and development. 
Our study thus extends an invitation to leadership researchers as well as to practitioners in leadership education, training, and development to further develop the emotional labor perspective in leadership research and education.

Volpone et al 2015

Purpose: The use of credit checks or credit scores in personnel selection has received widespread media attention of late. Although there is speculation that basing hiring decisions (even partially) on credit-related variables may produce or increase adverse impact, virtually no empirical literature exists to support or refute this claim. The present study explores how using credit scores, in the context of a larger selection system, affects adverse impact. Design/Methodology/Approach: We conducted Monte Carlo simulations representing various real-world selection systems (i.e., multiple hurdle, multiple hurdle with cut-off score, and single hurdle). In addition to applicant credit scores, each simulation included variables that organizations commonly use during selection (i.e., educational background and personality). Findings: Results showed that in a majority of simulated hiring scenarios, using credit scores (as opposed to a random, race-neutral variable) widened the Black-White gap in hiring, producing more violations of the 4/5ths rule and more instances of statistically significant adverse impact. Implications: These results imply that organizations should be cautious when using credit scores to evaluate potential or current employees. Originality/Value: This is one of the first studies to provide empirical evidence of a relationship between the use of credit scores in selection and adverse impact. The use of simulations helps organizations be proactive with regard to choosing selection practices. In particular, our results pinpoint the situations in which implementing credit scores as part of a larger selection process might be most problematic in terms of adverse impact, thereby providing much-needed guidance to those considering credit scores for their selection processes. In sum, despite the methodological limitations, several important conclusions can be drawn from this study. 
Specifically, the results suggest that (a) when simulated credit scores are used, fewer Black applicants are hired across nearly all scenarios than when they are not used; (b) this difference in the hiring rates of Black applicants produced more adverse impact than when a random variable (with no Black-White difference) was used; and (c) multiple hurdle systems that used a cut-score showed lower levels of adverse impact than multiple hurdle systems that used a top-down approach, although adverse impact rates were still meaningfully larger regardless of how simulated credit scores were used. As such, organizations should exercise caution when using credit scores during the hiring process because of concerns about adverse impact.
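The 4/5ths (80%) rule mentioned above reduces to a simple ratio check. The following is a minimal Python sketch, assuming the common reading of the rule of thumb: each group's selection rate is divided by the highest group's selection rate, and a ratio below 0.8 is flagged. The function names and applicant counts are hypothetical illustrations, not data from Volpone et al., and the sketch does not reproduce their Monte Carlo simulations or their statistical significance tests.

```python
"""Minimal sketch of a 4/5ths (80%) rule check for adverse impact.

The helper names are hypothetical; the counts in the example are illustrative only.
"""


def selection_rate(hired: int, applicants: int) -> float:
    """Proportion of a group's applicants who were hired."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    if not 0 <= hired <= applicants:
        raise ValueError("hired must be between 0 and applicants")
    return hired / applicants


def impact_ratios(groups: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Each group's selection rate divided by the highest group's rate."""
    rates = {name: selection_rate(h, n) for name, (h, n) in groups.items()}
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has any hires; ratios are undefined")
    return {name: rate / highest for name, rate in rates.items()}


def four_fifths_flags(groups: dict[str, tuple[int, int]],
                      threshold: float = 0.8) -> dict[str, bool]:
    """True where a group's impact ratio falls below the 4/5ths threshold."""
    return {name: ratio < threshold
            for name, ratio in impact_ratios(groups).items()}


if __name__ == "__main__":
    # Hypothetical (hired, applicants) counts per group.
    pool = {"White": (50, 100), "Black": (30, 100)}
    print(impact_ratios(pool))      # Black ratio is 0.3 / 0.5 = 0.6
    print(four_fifths_flags(pool))  # → {'White': False, 'Black': True}
```

The ratio is only a screening heuristic: with small applicant pools it can dip below 0.8 by chance, which is one reason studies such as this one also report significance tests rather than relying on the ratio alone.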

Ingold et al 2015

Purpose: This study aimed to shed light on why situational interviews (SIs) predict job performance. We examined an explanation based on the importance of interviewees' Ability to Identify Criteria (ATIC, i.e., the ability to discern the dimensions an interview is targeting) for SI performance. Design/Methodology/Approach: Data were obtained from 97 interviewees who participated in a mock interview to train for future applications. This approach enabled us to conduct the SIs under standardized conditions, to assess interviewees' ATIC, and, at the same time, to collect job performance data from interviewees' current supervisors. Findings: We found that interviewees' ATIC scores were not only positively related to their interview performance but also predicted job performance as rated by their supervisors. Furthermore, controlling for interviewees' ATIC significantly lowered the relationship between performance in the SI and job performance. Implications: A better understanding of the mechanisms that underlie the criterion-related validity of SIs is crucial for theoretical progress and for improving personnel selection procedures. This study highlights the relevance of interviewees' ATIC for predicting job performance. It also underscores the importance of constructing interviews that enable candidates to show their criterion-relevant abilities. Originality/Value: This study shows that interviewees' ATIC contributes to a better understanding of why the SI predicts job performance. In sum, the present study contributes to understanding the mechanisms by which the SI predicts job performance and highlights the insights that can be gained from research that focuses on factors contributing to both interview performance and job performance. The results support the idea that interviewees' ATIC, measured in the SI, predicts job performance and that ATIC also helps explain the criterion-related validity of the SI. 
We look forward to future research on the nomological network of ATIC, interviewee-related factors, and research that takes an interactionist perspective on interviews to extend our understanding of the criterion-related validity of employment interviews and other selection procedures.

Waung et al 2017

Purpose: This study contributes to the ecological validity of resume research by systematically examining the impression management (IM) content of actual resumes and cover letters and empirically testing its effect on applicant evaluation. Design/Methodology/Approach: A content analysis of the frequency and intensity of IM tactic use in 60 resumes and cover letters was completed (Study 1). Next, an experiment was conducted in which IM tactic use was manipulated and its effect on applicant evaluation was examined, using a sample of MTurk workers as evaluators (Study 2). Findings: In Study 1, four self-promotion categories, three ingratiation categories, and one hybrid category were delineated. In Study 2, ingratiation and lower-intensity self-promotion were found to increase perceptions of job and organization fit. Implications: Employers should be aware that resumes and cover letters contain IM tactics that may influence applicant evaluation. In addition, employment training programs might communicate the benefits of using ingratiation and lower-intensity self-promotion while emphasizing the importance of accurately conveying one's qualifications. Furthermore, the present taxonomy of IM resume content might be applied to resume database search engines to identify and index IM tactic use. Originality/Value: This research is the first to develop a taxonomy of IM tactics based on actual resumes and cover letters, and it may facilitate more comprehensive manipulations of IM tactic use and better integration of IM research across the selection process.

Padgett et al 2015

Purpose: This study tested competing predictions about the impact of nepotistic hiring on perceptions of nepotism beneficiaries, focusing specifically on the performance attributions made about nepotism hires. Of particular interest is how the family member's qualifications, relative to those of other applicants, affect perceptions of the nepotism hire. Methodology: Two experimental studies using scenarios that simulated the hiring process were conducted. Participants reviewed materials describing the hiring process for a manager and then completed a questionnaire assessing their perceptions of the person hired. Findings: Results showed that the successful performance of nepotism beneficiaries was attributed more to political skills and relationships with upper management, and less to ability and effort, than was the case for non-beneficiaries, and that beneficiaries were perceived as less competent and as having fewer characteristics of successful managers. These negative perceptions occurred regardless of the family member's qualifications. Originality/Value: This is one of the first studies to examine the consequences of nepotistic hiring for nepotism beneficiaries and the first to examine how nepotistic hiring affects the performance attributions made about nepotism beneficiaries. It is also the only study to empirically examine how the qualifications of the nepotism beneficiary influence others' reactions to them. Despite these limitations, the results of these two studies suggest that giving preference in the hiring process to family members may have some unanticipated negative consequences for those who benefit from it and, indirectly, for the organization. 
These findings indicate a need to better understand the practice of nepotism so that family-owned firms and other organizations that might want to reap some of the potential benefits of hiring family members can be aware of both the positive and negative consequences associated with this practice and can manage the process effectively.

Lacerenza et al 2017

Recent estimates suggest that although a majority of funds in organizational training budgets tend to be allocated to leadership training (Ho, 2016; O'Leonard, 2014), only a small minority of organizations believe their leadership training programs are highly effective (Schwartz, Bersin, & Pelster, 2014), calling into question the effectiveness of current leadership development initiatives. To help address this issue, this meta-analysis estimates the extent to which leadership training is effective and identifies the conditions under which these programs are most effective. In doing so, we estimate the effectiveness of leadership training across four criteria (reactions, learning, transfer, and results; Kirkpatrick, 1959) using only employee data, and we examine 15 moderators of training design and delivery to determine which elements are associated with the most effective leadership training interventions. Data from 335 independent samples suggest that leadership training is substantially more effective than previously thought, leading to improvements in reactions (.63), learning (.73), transfer (.82), and results (.72); however, the strength of these effects differs based on various design, delivery, and implementation characteristics. Moderator analyses support the use of needs analysis, feedback, multiple delivery methods (especially practice), spaced training sessions, an on-site location, and face-to-face delivery that is not self-administered. Results also suggest that the content of training, attendance policy, and duration influence the effectiveness of the training program. Practical implications for training development and theoretical implications for the leadership and training literatures are discussed. The current meta-analysis offers several contributions to the leadership and training literatures. 
First, our results suggest that leadership training is substantially more effective than previously thought, leading to improvements in perceptions of utility and satisfaction, learning, transfer to the job, organizational outcomes, and subordinate outcomes. Moreover, all but seven of the 120 effect sizes were positive and significantly different from zero, indicating that leadership training likely improves outcomes regardless of its design, delivery, and implementation elements (i.e., leadership training is rarely a "failure"; cf. Myatt, 2012). Second, the current results suggest that leadership training is most effective when the training program is based on a needs analysis, incorporates feedback, uses multiple delivery methods (especially practice), uses spaced training sessions, is conducted on-site, and uses face-to-face delivery that is not self-administered. Third, our results also have a variety of practical implications for the development of training programs, which we have summarized in Table 8 and illustrated with examples in Table 9 in order to guide scientists and practitioners in the development of evidence-based leadership training programs. Finally, we note that although the current meta-analysis suggests leadership training is effective, it does not promote a one-size-fits-all approach; many of the moderators of leadership training effectiveness investigated in the current study were important for some criteria but not all, indicating that training program developers should first choose their desired criterion (or criteria) and then develop the training program based on it.

Sackett et al 2017

Separate meta-analyses of the cognitive ability and assessment center (AC) literatures report higher criterion-related validity for cognitive ability tests in predicting job performance. We instead focus on 17 samples in which both AC and ability scores are obtained for the same examinees and used to predict the same criterion. Thus, we control for differences in job type and in criteria that may have affected prior conclusions. In contrast to Schmidt and Hunter's (1998) meta-analysis, reporting mean validity of .51 for ability and .37 for ACs, we found using random-effects models mean validity of .22 for ability and .44 for ACs using comparable corrections for range restriction and measurement error in the criterion. We posit that 2 factors contribute to the differences in findings: (a) ACs being used on populations already restricted on cognitive ability and (b) the use of less cognitively loaded criteria in AC validation research. Our findings are in dramatic contrast to prior meta-analytic conclusions about the relative relationships with job performance criteria for ability tests and ACs. Schmidt and Hunter (1998) reported mean criterion-related validity of .51 for ability and .37 for ACs. In contrast, we found mean corrected criterion-related validity of .22 for ability and .44 for ACs. Note that our corrected value of .44 for ACs is reasonably similar to the .37 value reported in Schmidt and Hunter (1998). This suggests that the subset of AC validity studies in which a head-to-head comparison with ability tests is possible is roughly comparable to the broader AC literature. Thus, differences in findings are largely due to differences in how ability tests function in the current versus prior work.
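The comparison above hinges on correcting validities for range restriction and for measurement error in the criterion. As a rough illustration of how such corrections move a coefficient, here is a minimal Python sketch of two textbook formulas: disattenuation for criterion unreliability (r divided by the square root of the criterion reliability) and Thorndike's Case II correction for direct range restriction, where u is the ratio of the unrestricted to the restricted predictor standard deviation. These formulas are assumptions chosen for illustration; the abstract does not say exactly which correction procedures the authors used, and all input values below are hypothetical. The sketch applies the criterion correction first and the range restriction correction second; other orders and reliability estimates are possible methodological choices.

```python
"""Sketch of two classic validity corrections (hypothetical inputs only)."""
import math


def correct_criterion_unreliability(r_obs: float, r_yy: float) -> float:
    """Disattenuate an observed validity for unreliability in the criterion only."""
    if not 0 < r_yy <= 1:
        raise ValueError("criterion reliability must be in (0, 1]")
    return r_obs / math.sqrt(r_yy)


def correct_direct_range_restriction(r: float, u: float) -> float:
    """Thorndike Case II, with u = SD_unrestricted / SD_restricted on the predictor."""
    if u <= 0:
        raise ValueError("u must be positive")
    return (u * r) / math.sqrt(1 + r ** 2 * (u ** 2 - 1))


if __name__ == "__main__":
    observed = 0.20     # hypothetical observed validity in a restricted sample
    reliability = 0.52  # hypothetical reliability of the criterion ratings
    for u in (1.0, 1.5, 2.0):  # larger u means more severe prior restriction
        r_cc = correct_criterion_unreliability(observed, reliability)
        r_full = correct_direct_range_restriction(r_cc, u)
        print(f"u={u}: corrected r = {r_full:.2f}")
```

Varying u shows why the first factor the authors propose matters: if an assessment center sample was already selected on cognitive ability, the ability test's observed validity in that sample is attenuated, and how much any correction recovers depends heavily on the assumed degree of restriction.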

Christian et al 2010 (Meta)

Situational judgment tests (SJTs) are a measurement method that may be designed to assess a variety of constructs. Nevertheless, many studies fail to report the constructs measured by the situational judgment tests in the extant literature. Consequently, the situational judgment test literature lacks a construct-level focus, and researchers and practitioners know little about the specific constructs typically measured. Our objective was to extend the efforts of previous researchers (e.g., McDaniel, Hartman, Whetzel, & Grubb, 2007; McDaniel & Nguyen, 2001; Schmitt & Chan, 2006) by highlighting the need for a construct focus in situational judgment test research. We identified and classified the construct domains assessed by situational judgment tests in the literature into a content-based typology. We then conducted a meta-analysis to determine the criterion-related validity of each construct domain and to test for moderators. We found that situational judgment tests most often assess leadership and interpersonal skills and that situational judgment tests measuring teamwork skills and leadership have relatively high validities for overall job performance. Although based on a small number of studies, we found evidence that (a) matching the predictor constructs with criterion facets improved criterion-related validity and (b) video-based situational judgment tests tended to have stronger criterion-related validity than pencil-and-paper situational judgment tests, holding constructs constant. Implications for practice and research are discussed. In conclusion, we have highlighted the importance of a construct-based focus in SJT research. We urge researchers to present results at the construct level when possible (Arthur & Villado, 2008). Such information, as noted by Huffcutt et al. (2001), Arthur et al. (2003), and Roth et al. (2008) in their similar requests with regard to interviews, assessment centers, and work samples, will provide future researchers and practitioners with a better conceptual, theoretical, and practical understanding of SJTs.

Whetzel & Reeder 2015

Situational judgment tests (SJTs) occasionally fail to predict job performance in criterion-related validation studies, often despite considerable effort to follow scholarly recipes for their development. This commentary provides some plausible explanations for why this may occur as well as some tips for SJT development. In most cases, we frame the issue from an implicit trait policy (ITP) perspective (Motowidlo, Hooper, & Jackson, 2006a, 2006b) and the measurement of general domain knowledge. In other instances, we believe that the issue does not have a direct tie to the ITP concept, but our experience suggests that it is of sufficient importance to include in this response. The first two issues involve challenges in gathering validity evidence to support the use of SJTs, and the remaining issues deal more directly with SJT design considerations.

Roth et al 2013

Social media (SM) pervades our society. One rapidly growing application of SM is its use in personnel decision making. Organizations are increasingly searching SM (e.g., Facebook) to gather information about potential employees. In this article, we suggest that organizational practice has outpaced the scientific study of SM assessments in an area that has important consequences for individuals (e.g., being selected for work), organizations (e.g., successfully predicting job performance or withdrawal), and society (e.g., consequent adverse impact/diversity). We draw on theory and research from various literatures to advance a research agenda that addresses this gap between practice and research. Overall, we believe this is a somewhat rare moment in the human resources literature when a new class of selection methods arrives on the scene, and we urge researchers to help understand the implications of using SM assessments for personnel decisions. SM is a relatively new technology that has enabled organizations to use, or contemplate using, a new predictor; that is, assessments of individuals based on their SM information. As such, it is a unique and important occasion for organizational decision makers and researchers. The rapid increase in the use of SM information raises the question of whether organizations are able to mine a wealth of potentially useful information from this new technology. Or, the rush might result in "ghost towns" (of former users) with many complications. Unfortunately, there does not appear to be an agenda for research in this area. While the gold rush is likely to continue, it is up to researchers to answer key research questions and understand the process and results of SM assessments. We hope some of the ideas discussed herein help provide an operational and theoretical starting point for future research and understanding.

Call et al 2015

Stars, that is, employees with disproportionately high and prolonged (a) performance, (b) visibility, and (c) relevant social capital, have garnered attention in economics, sociology, and management. However, star research is often isolated within these disciplines. Thus, 3 distinct star research streams are evolving, each disconnected from the others and each bringing siloed theoretical perspectives, terms, and assumptions. A conceptual review of these perspectives reveals a focus on the ex post effects that stars exert in organizations, with little explanation of who a star is and how one becomes a star. To synthesize the stars literature across these 3 disciplines, we apply psychological theories, specifically motivation theories, to create an integrative framework for stars research. Thus, we present a unified stars definition and extend theory on the making, managing, and mobility of stars. We extend research about how and why employees may be motivated to become stars, how best to manage stars and their relationships with colleagues, and how to motivate star retention. We then outline directions for future research. Stars research has developed within traditional siloed research domains, leading to poor construct clarity and inconsistent findings. The disconnected state of stars research also creates confusion regarding what we know, what we do not know, and what research opportunities exist. To address this confusion, we integrate stars research across the economics, sociology, and management literatures. We extend knowledge about stars by applying individual motivation theories to create a typology and framework for stars research. Specifically, we present a unified star definition and develop arguments explaining the making, managing, and mobility of stars, thus providing a platform for future research. 
We see this as an initial step in an interdisciplinary discourse that seeks to understand how star employees emerge and develop as well as to describe the various ways that stars influence, and are influenced by, their colleagues and organizations. We encourage further engaging in active stargazing so that we may better understand and capitalize on these shining examples.

Grand 2017

Stereotype threat describes a situation in which individuals face the risk of confirming a negative stereotype about their subgroup through their actions. Empirical work in this area has primarily examined the impact of negative stereotypes on performance for threatened individuals. However, this body of research seldom acknowledges that performance is a function of learning, which may also be impaired by pervasive group stereotypes. This study presents evidence from a 3-day self-guided training program demonstrating that stereotype threat impairs the acquisition of cognitive learning outcomes for females facing a negative group stereotype. Using hierarchical Bayesian modeling, results revealed that stereotyped females demonstrated poorer declarative knowledge acquisition, spent less time reflecting on learning activities, and developed less efficiently organized knowledge structures compared with females in a control condition. Findings from a Bayesian mediation model also suggested that despite stereotyped individuals "working harder" to perform well, their underachievement was largely attributable to failures in learning to "work smarter." Building on these empirical results, a computational model and computer simulation are also presented to demonstrate the practical significance of stereotype-induced impairments to learning for the development of an organization's human capital resources and capabilities. The simulation results show that even small effects of stereotype threat during learning/training have the potential to exert a significant negative impact on an organization's performance potential. Implications for future research and practice examining stereotype threat during learning are discussed. This research demonstrated that the presence of negative domain stereotypes interfered with the acquisition of cognitive learning outcomes and with engagement during training activities for individuals facing a negative group stereotype. 
Results from a computer simulation further revealed that such learning deficiencies have the potential to accumulate over time and across organizational levels to generate substantial human capital deficiencies. Investigating the effects of stereotype threat (ST) on learning/training carries a number of conceptual and practical implications, including a better understanding of how ST experienced during performance differs from or is similar to ST experienced during learning, the need to evaluate training practices/procedures to ensure individuals learn and engage with information in stereotyped content domains, and further explication of the boundary conditions, cognitive mechanisms, and consequences of ST beyond performance/evaluative testing contexts. Finally, this research reveals the importance of exploring how ST effects manifest over time and in context rather than simply whether such effects can be elicited. Investigating the impact of ST effects on both short- and long-term longitudinal outcomes is a critical need for continued research in this area.

Hunt 2016

The company I work for is one of the leading providers of performance management technology (Jones & Wang-Audia, 2013). This technology is used by more than 3,000 organizations worldwide, including several of the companies mentioned in Adler et al. (2016). The technology is highly configurable. It is currently being used to support performance management processes with no annual manager ratings, processes with traditional annual rating evaluations, processes that only evaluate competencies, processes that only evaluate goal accomplishment, processes that mix goals and competencies, processes that require forced-ranked comparisons between employees, processes that make no direct comparisons between employees, and much more. The capabilities of this and other human resources (HR) technology systems are allowing companies to radically rethink performance management because they enable companies to do things far differently from what was possible when they were constrained to more fixed electronic or paper forms (Hunt, 2011, 2015a). The result is an explosion in the diversity of approaches being taken toward performance management design. My company naturally believes in the value of using performance management technology, but we do not have a strong opinion on what sort of performance management process companies should use. For example, it does not matter to us whether customers do or do not choose to collect annual manager ratings. What does matter is that whatever performance management processes they use add value to their organization, as this directly affects the value they get from using our performance management technology. We frequently have discussions with our clients about the value and design of performance rating methods. This often includes questions about getting rid of performance ratings. 
Experience working with hundreds of companies around the world has taught us there is no one "best practice" when it comes to performance ratings or performance management in general (Hunt, 2015a). Methods that work in one company can fail in another. The following are some additional observations we've gained through our work that are relevant to the topic of "getting rid of performance ratings."

Madera et al 2018

The extent of gender bias in academia continues to be an object of inquiry, and recent research has begun to examine the particular gender biases evident in letters of recommendation. This two-part study examines differences in the number of doubt raisers written in 624 authentic letters of recommendation for 174 men and women applying for eight assistant professor positions (Study 1) and the impact of these doubt raisers on 305 university professors who evaluated recommendation letters (Study 2). The results show that both male and female recommenders use more doubt raisers in letters of recommendation for women than for men and that the presence of certain types of doubt raisers in letters of recommendation results in negative outcomes for both genders. Because doubt raisers are more frequent in letters for women than for men, women are at a disadvantage relative to men in their applications for academic positions. We discuss the implications and the need for additional future research and practice that (1) raises awareness that letter writers are gatekeepers who can improve or hinder women's progress and (2) develops methods to eliminate the skewed use of doubt raisers. The implications of the current research on letters of recommendation are particularly important because their use in academia is well established (Johnson et al., 1998; Landrum et al., 1994; Sheehan et al., 1998). Our studies show how bias in the letter-writing process can be propagated even if evaluators do not necessarily display overt gender biases. The differences in word choice may seem negligible, but, as our data show, doubt raisers carry discernible penalties for women in academia (Eagly & Karau, 2002; Eagly & Johannesen-Schmidt, 2001; Wood & Eagly, 2000). 
Awareness of and attention to these differences are critical areas of future research and application if we want to maximize fairness in occupations, such as academia, that rely on letters of recommendation.

Schmidt & Hunter 1998

This article discusses quantifying the dollar amount each employee produces as a performance indicator. (Liana) Is it possible to quantify this amount? Is there anything that might be missed when quantifying each employee's contribution? What would be the best method of quantifying the dollar amount an employee produces? If you successfully quantify the dollar amount each employee produces, are there any problems that may arise (e.g., a Nordstrom employee working on commission)? Performance on the job was typically measured by supervisor ratings. What are our thoughts on this method? Other measures used were production records and sales records. (Liana) The article discusses reference checks, noting that at the time of publication employers avoided providing negative information about former employees because of legal concerns (fear of being sued). Reference checks provided a 12% increase in validity over the general mental ability (GMA) measure alone. (Liana) Has this changed? Are employers now more willing to provide negative information about applicants? Do you believe reference checks are useful?

Schleicher et al 2019

This integrative conceptual review is based on a critical need in the area of performance management (PM), where there remain important unanswered questions about the effectiveness of PM that affect both research and practice. In response, we create a theoretically grounded, comprehensive, and integrative model for understanding and measuring PM effectiveness, comprising multiple categories of evaluative criteria and the underlying mechanisms that link them. We then review more than 30 years (1984-2018) of empirical PM research vis-à-vis this model, leading to conclusions about what the literature has studied and what we do and do not know about PM effectiveness as a result. The final section of this article further elucidates the key "value chains" or mediational paths that explain how and why PM can add value to organizations, framed around three pressing questions with both theoretical and practical importance (How do individual-level outcomes of PM emerge to become unit-level outcomes? How essential are positive reactions to the overall effectiveness of PM? and What is the value of a performance rating?). This discussion culminates in specific propositions for future research and implications for practice. This review sets forth a theoretically grounded, comprehensive, and integrative model for understanding and measuring PM effectiveness. In using this model as a framework for reviewing and synthesizing the empirical research in PM, we find that although there has been a great deal of empirical work on the relationship between aspects of PM and each evaluative criterion considered separately, very little work has examined the longer "value chains" of PM. This represents an important opportunity for future work. We believe that this model and review (including the propositions we develop) can be very helpful for advancing both research and practice in PM, moving the field from more simplistic questions like "Is PM effective?" 
and "What is the ultimate criterion for PM?" to more nuanced and fruitful inquiries regarding how PM creates value and for whom.

Hammer et al 2019

This randomized controlled trial involved the development and evaluation of a supervisor support training intervention for the civilian workforce called VSST: Veteran-Supportive Supervisor Training. A theoretically based workplace intervention is critical to ensuring a smooth transition to civilian life for service members and their families, leading to improved psychological and physical health and improved work outcomes among service members. Thirty-five organizations were recruited and randomized to the VSST training program or a waitlist control group. Within those organizations, 497 current or former (post-9/11) service member employees were asked to complete baseline and 3- and 9-month follow-up surveys covering work, family, and health domains. The computerized 1-hr training and the behavior tracking that followed were completed by 928 supervisors from the participating organizations. Intervention training effects were evaluated using an intent-to-treat approach, comparing outcomes for service members in organizations assigned to the training group with those in organizations assigned to the control group. Moderation effects revealed that the intervention was effective for employees who reported higher levels of supervisor and coworker support at baseline, demonstrating the importance of organizational context and trainee readiness. The results did not show evidence of direct effects of the intervention on health and work outcomes. Qualitative data from supervisors who took the training also demonstrated its benefits. This study affirms and adds to the literature on the positive effects of organizational programs that train supervisors to provide social support, thereby improving the health and work outcomes of employees who receive more support. 
This RCT was based on the development and evaluation of a computer-based supervisor training program to improve the health and work outcomes of our veterans who have transitioned to the civilian workplace after the post-9/11 conflicts. This training can be extended to support veterans transitioning to the workforce following deployments to any conflicts in the future, and is based on leadership, training, and social support theories. Results show that when veterans reported that their supervisors and coworkers were more supportive at baseline, the training had more beneficial effects on health and work outcomes. We suggest that future research and practice focus on further development of supervisor social support training. We also urge more research on postdeployment veteran transition efforts into the civilian workforce to better support our service members.

Ali et al 2016

This study addresses how job seekers' experiences of rude and discourteous treatment (incivility) can adversely affect self-regulatory processes underlying job searching. Using the social-cognitive model (Zimmerman, 2000), we integrate social-cognitive theory with the goal orientation literature to examine how job search self-efficacy mediates the relationship between incivility and job search behaviors and how individual differences in learning goal orientation and avoid-performance goal orientation moderate that process. We conducted 3 studies with diverse methods and samples. Study 1 employed a mixed-method design to understand the nature of incivility within the job search context and highlight the role of attributions in linking incivility to subsequent job search motivation and behavior. We tested our hypotheses in Studies 2 and 3, employing time-lagged research designs with unemployed job seekers and new labor market entrants. Across both Studies 2 and 3, we found evidence that the negative effect of incivility on job search self-efficacy and subsequent job search behaviors is stronger for individuals low, rather than high, in avoid-performance goal orientation. Theoretical implications of our findings and practical recommendations for how to address the influence of incivility on job seeking are discussed. This study bridges the research streams on incivility and job searching and highlights the importance of understanding contextual factors that can influence job seeker motivation and job search success. To our knowledge, our study is the first to extend the incivility construct beyond the organizational boundaries by articulating the ways in which it affects self-regulatory processes that are important for job hunting and employment attainment. 
Future studies should build on these findings by examining compensatory strategies that job seekers can utilize to reduce the negative experiences of incivility and incorporating other self-regulatory constructs that further shed light on the process by which incivility affects job searching.

Pinto & Ramalheira 2017

This study examines whether academic performance and participation in extracurricular activities affect the perceived employability of business graduates, using an experimental between-subjects factorial design. Eight fictitious résumés of business graduates varying in terms of academic performance, participation in extracurricular activities, and gender were rated by 349 Portuguese working adults. The results showed that high academic performance combined with participation in extracurricular activities resulted in higher perceived employability, whereas participation in extracurricular activities combined with a modest academic performance resulted in lower job suitability but nearly identical high ratings of personal organization and time management, and learning skills. These inferences were unaffected by either applicant gender or respondents' characteristics. The findings highlight the prominence of academic performance, which, when combined with extracurricular activities, can be a valuable distinctiveness approach to ease business graduates' entrance into the labour market.

Hartwell & Champion 2016

This study explores normative feedback as a way to reduce rating errors and increase the reliability and validity of structured interview ratings. Based on control theory and social comparison theory, we propose a model of normative feedback interventions (NFIs) in the context of structured interviews and test our model using data from over 20,000 interviews conducted by more than 100 interviewers over a period of more than 4 years. Results indicate that lenient and severe interviewers reduced discrepancies between their ratings and the overall normative mean rating after receipt of normative feedback, though changes were greater for lenient interviewers. When various waves of feedback were presented in later NFIs, the combined normative mean rating over multiple time periods was more predictive of subsequent rating changes than the normative mean rating from the most recent time period. Mean within-interviewer rating variance, along with interrater agreement and interrater reliability, increased after the initial NFI, but results from later NFIs were more complex and revealed that feedback interventions may lose effectiveness over time. A second study using simulated data indicated that leniency and severity errors did not impact rating validity, but did affect which applicants were hired. We conclude that giving normative feedback to interviewers will aid in minimizing interviewer rating differences and enhance the reliability of structured interview ratings. We suggest that interviewer feedback might be considered as a potential new component of interview structure, though future research is needed before a definitive conclusion can be drawn.
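To make "discrepancy from the normative mean" concrete, here is a minimal sketch, assuming the normative mean is simply the mean of all ratings pooled across interviewers and that an interviewer is flagged lenient or severe when their own mean sits more than a fixed tolerance above or below it. The function name `interviewer_discrepancies` and the `tolerance` value are hypothetical illustrations, not the procedure Hartwell and Champion actually used.

```python
from statistics import mean


def interviewer_discrepancies(ratings_by_interviewer, tolerance=0.25):
    """Compare each interviewer's mean rating with the pooled normative mean.

    ratings_by_interviewer: dict mapping an interviewer ID to a list of ratings.
    tolerance: hypothetical band (in rating-scale points) treated as on-norm.
    Returns a dict mapping interviewer ID -> (discrepancy, label).
    """
    # Normative mean: the mean of all ratings pooled across every interviewer.
    pooled = [r for ratings in ratings_by_interviewer.values() for r in ratings]
    norm = mean(pooled)

    report = {}
    for interviewer, ratings in ratings_by_interviewer.items():
        # Positive discrepancy = rates above the norm (leniency);
        # negative discrepancy = rates below the norm (severity).
        discrepancy = mean(ratings) - norm
        if discrepancy > tolerance:
            label = "lenient"
        elif discrepancy < -tolerance:
            label = "severe"
        else:
            label = "on-norm"
        report[interviewer] = (round(discrepancy, 2), label)
    return report
```

For example, with ratings `{"A": [4, 5, 5], "B": [2, 2, 3], "C": [3, 4, 3]}` the pooled mean is about 3.44, so interviewer A comes out lenient (+1.22), B severe (-1.11), and C on-norm (-0.11). A feedback report built this way would show each interviewer where they stand relative to colleagues, which is the comparison the normative feedback interventions rely on.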

Champion et al 2017

This study proposes that reaching applicants through more diagnostic recruitment sources earlier in their educational development (e.g., in high school) can lead them to invest more in their occupation-specific human capital (OSHC), thereby making them higher quality candidates. Using a sample of 78,157 applicants applying for jobs within a desirable professional occupation in the public sector, results indicate that applicants who report hearing about the occupation earlier, and applicants who report hearing about the occupation through more diagnostic sources, have higher levels of OSHC upon application. Additionally, source timing and diagnosticity affect the likelihood of candidates applying for jobs symbolic of the occupation, selecting relevant majors, and attending educational institutions with top programs related to the occupation. These findings suggest a firm's recruiting efforts may influence applicants' OSHC investment strategies. The attraction of qualified individuals to firms is one of the primary outcomes of interest in recruitment scholarship (Barber, 1998; Breaugh, 2013; Chapman et al., 2005; Uggerslev, Fassina, & Kraichy, 2012). Insofar as research has focused on identifying factors that enable firms to attract high-quality applicants, however, theoretical approaches have remained fairly "timeless" in their orientations (Ployhart & Hale, 2014), and ignored how information communicated through recruitment sources enables applicants to enhance their employability. That is, they have largely neglected that applicants require time to develop their talent prior to application, and that the timing and diagnosticity of recruitment sources to which individuals are initially exposed may have important implications for a given firm's ability to enable such development. 
Therefore, the purpose of this study was to present a conceptual approach that considers how various research perspectives might help to explain how source timing and diagnosticity influence the development of applicant OSHC.

Hoffman et al 2015

This study uses meta-analysis and a qualitative review of exercise descriptions to evaluate the content, criterion-related, construct, and incremental validity of 5 commonly used types of assessment center (AC) exercises. First, we present a meta-analysis of the relationship of 5 types of AC exercises with (a) the other exercise types, (b) the 5-factor model of personality, (c) general mental ability (GMA), and (d) relevant criterion variables. All 5 types of exercises were significantly related to criterion variables (correlations of .16 to .19). The nomological network analyses suggested that the exercises tend to be modestly associated with GMA, Extraversion and, to a lesser extent, Openness to Experience but largely unrelated to Agreeableness, Conscientiousness, and Emotional Stability. Finally, despite sparse reporting in primary studies, a content analysis of exercise descriptions yielded some evidence of complexity, ambiguity, interpersonal interaction, and fidelity but not necessarily interdependence. This study presents a large-scale analysis of the criterion-related validity, nomological network, and content of managerial simulation exercises. The findings indicate that, although exercise scores might be used as a supplement to more traditional dimension-based scores, replacing dimensions with exercises would be a mistake. Future research is encouraged to more closely consider the scoring and interpretation of AC exercises, their characteristics, and the extent to which they adequately capture prosocial and prorelational behaviors.

Harman et al 2015 (Read)

Training reactions are the most common criteria used for training evaluation, and reaction measures often include opportunities for trainees to provide qualitative responses. Despite being widely used, qualitative training reactions are poorly understood. Recent trends suggest commenting is ubiquitous (e.g., tweets, texting, Facebook posts) and point to a currently untapped resource for understanding training reactions. In order to enhance the interpretation and use of this rich data source, this study explored commenting behavior and investigated 3 broad questions: who comments, under what conditions, and how do trainees comment? We explore both individual difference and contextual influences on commenting and characteristics of comments in 3 studies. Using multilevel modeling, we identified significant class-level variance in commenting in each of the 3 samples of trainees. Because commenting has only been considered at the individual level, our findings provide an important contribution to the literature. The shared experience of being in the same class appears to influence commenting in addition to individual differences, such as interest in the topic (Studies 1 and 2), satisfaction (Studies 2 and 3), and entity beliefs (Study 3). Furthermore, we demonstrated that item wording may have an impact on commenting (Study 3) and should be considered as a potential lever for training professionals to influence commenting behavior from trainees. Training professionals, particularly those who regularly administer training evaluation surveys, should be aware of nonresponse to open-ended items and how that may impact the information they collect, use, and present within their organizations. As noted by Kraiger (2002), feedback, marketing, and decision making are the three main goals of training evaluation. 
Capturing trainee reactions related to the quality of training and instruction can provide valuable information for feedback (program or instructional improvement) as well as for program marketing and decision making. Although the value of collecting trainee reactions is debated (e.g., Long, DuBois, & Faley, 2008), reactions are widely used, and many organizational stakeholders find them of at least some value (ASTD, 2009). Many training evaluation surveys ask trainees to respond to open-ended questions as well as provide quantitative ratings of reaction items. The response rate is typically very different for quantitative and qualitative items, with fewer trainees providing comments. This raises potential representativeness issues with comments (e.g., Rogelberg & Stanton, 2007), and suggests caution should be taken until more is known about commenting, such as who makes comments, what individual and contextual factors influence commenting, and the nature of comments made. It has been our experience in working with training directors, managers, and supervisors for several years that they are always keenly interested in reading comments from trainees, but they do not necessarily interpret comments with appropriate caution. Training professionals should be aware of nonresponse to open-ended items and how that may impact the information they collect, use, and present within their organizations. Our studies show that in training contexts, aggregating commenting data across classes could hide potentially important class-level differences. It is important not to overgeneralize comments beyond the appropriate level. Our research should be considered an important first step in gaining a deeper understanding of trainee commenting on training evaluation surveys by providing empirical evidence that can inform the use of qualitative reactions in organizations for the three goals noted by Kraiger (2002). 
Previous work looking at commenting has been done with organizational climate surveys or 360-degree feedback, but the training context needs to be investigated because it differs in important ways that may impact commenting. Related to AET, training comments are often made within the context of a discrete training event (i.e., event focused), not the job or the overall organizational experience. Training can be totally separated from other work responsibilities; one's manager and organization's climate cannot be so easily separated. Because trainees are being asked to provide feedback posttraining (typically) to inform future training that will not directly benefit them (they typically will not participate in the same training course again), they may be less likely to provide in-depth feedback than if there were a direct future benefit for them. Because the results likely would have no impact on their day-to-day jobs, the commenting and the characteristics of comments provided may differ substantially. Additionally, organizational training is increasingly being delivered online through vendors (Burgess & Russell, 2003), and learners are increasingly going online for self-directed learning (Brown, 2001). As commenting has become ubiquitous online (e.g., Facebook), instructional or training review sites have emerged with user comments in addition to quantitative ratings (e.g., Ratemyprofessor, MoocAdvisor). Just as comments on training evaluation surveys can impact decisions related to training, comments on social media or review sites have the same potential to impact the adoption or selection of learning opportunities. Comments made via social media and review sites should be investigated empirically, and the findings from our three studies can inform this needed research.

Olsen & Martins 2016

We draw on the values literature from social psychology and the acculturation literature from cross-cultural psychology to develop and test a theory of how signals about an organization's diversity management (DM) approach affect perceptions of organizational attractiveness among potential employees. We examine the mediating effects of individuals' merit-based attributions about hiring decisions at the organization, as well as the moderating effects of their racioethnicity and the racioethnic composition of their home communities. We test our theory using a within-subject policy-capturing experimental design that simulates organizational DM approaches, supplemented with census data for the participants' home communities. Results of hierarchical linear modeling (HLM) analyses suggest that the manipulated instrumental value for diversity leads to higher perceptions of organizational attractiveness, in part through heightened expectations of merit-based hiring decisions. Further, the manipulated assimilative and integrative DM approach signals are positively related to organizational attractiveness, and the effect of integrative DM is strongest for racioethnic minorities from communities with especially high proportions of Whites and for Whites from communities with especially low proportions of Whites. We used research on acculturation and values to examine how different DM signals may influence potential employees' perceptions of organizations. We found that these signals affect potential recruits' perceptions in general and in interactions with other factors. First, our study suggests that organizations signaling an instrumental value for diversity will see positive effects on potential recruits' perceptions of organizational attractiveness, in part through their merit-based attributions. 
Second, and of particular importance, is our finding of a significant three-way interaction among integrative DM signal, potential recruits' racioethnic group membership, and racioethnic composition of their home community. This research highlights the importance not just of the content of DM signals in influencing perceptions of the organization, but also of the demographic makeup of the contexts of potential employees toward whom the signals are directed. As such, it advances our theoretical understanding of the effects of DM programs, and provides actionable implications for management practitioners seeking to attract and retain a diverse workforce.

Aguinis et al 2018

We examined the gender productivity gap in STEM and other scientific fields specifically among star performers. Study 1 included 3,853 researchers who published 3,161 articles in mathematics. Study 2 included 45,007 researchers who published 7,746 articles in genetics. Study 3 included 4,081 researchers who published 2,807 articles in applied psychology and 6,337 researchers who published 3,796 articles in mathematical psychology. Results showed that (a) the power law with exponential cutoff is the best-fitting distribution of research productivity across fields and gender groups, and (b) there is a considerable gender productivity gap among stars in favor of men across fields. Specifically, the underrepresentation of women is more extreme as we consider more elite ranges of performance (i.e., top 10%, 5%, and 1% of performers). Conceptually, results suggest that individuals vary in research productivity predominantly due to the generative mechanism of incremental differentiation, which is the mechanism that produces power laws with exponential cutoffs. Also, results suggest that incremental differentiation occurs to a greater degree among men and certain forms of discrimination may disproportionately constrain women's output increments. Practically, results suggest that women may have to accumulate more scientific knowledge, resources, and social capital to achieve the same level of increase in total outputs as their male counterparts. Finally, we offer recommendations on interventions aimed at reducing constraints for incremental differentiation among women that could be useful for narrowing the gender productivity gap specifically among star performers. 
Adopting a falsification epistemological approach and using the distribution pitting methodology to implement it, we examined the research productivity of 59,278 researchers who have published at least one article in the most cited journals in mathematics, genetics, applied psychology, and mathematical psychology from 2006 to 2015. Results revealed that productivity distributions follow the power law with exponential cutoff for both women and men. This finding points to incremental differentiation as the likely dominant generative mechanism for the production of star performers because power laws with exponential cutoffs are generated via incremental differentiation. As another unique contribution, results showed that the right tails of the productivity distributions are significantly lighter for women compared to men, across all the scientific fields we examined. This finding indicates that the gender productivity gap is even more extreme for star performers and, more specifically, underrepresentation of women is more and more extreme as we move higher and higher along the productivity continuum. Taken together, our results make a contribution to our understanding of the emergence of star performers and the predominant reason for the existence of a gender productivity gap among star performers: gender-based differences in accumulation rates, which are better explained by gender discrimination than gender-based differences in abilities or career and life choices. Our results also suggest that interventions aimed at reducing constraints for incremental differentiation among women can be useful for narrowing the gender productivity gap specifically among star performers. Narrowing this gap is especially important among stars because they are highly visible and play a powerful role in shaping people's attitudes and organizational policies. 
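For reference, one standard way to write the power law with exponential cutoff named above is shown below. This is the textbook form from the distribution-fitting literature, not an equation reproduced from Aguinis et al.; the symbols α, λ, and x_min are conventional parameter names assumed here, not the authors' notation.

$$
p(x) \propto x^{-\alpha} \, e^{-\lambda x}, \qquad x \ge x_{\min}, \quad \alpha > 0, \quad \lambda > 0
$$

Here x is a researcher's output (e.g., number of publications). Setting λ = 0 recovers a pure power law; a larger λ makes the right tail fall off sooner, so very high outputs become rarer. Read this way, a "lighter" right tail for women roughly means a smaller share of women at the extreme high end of productivity.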
Overall, based on the finding that women are even more underrepresented among stars, our results highlight the urgent need to address the gender productivity gap in STEM and other scientific fields.

Kooij et al 2017

We introduce 2 novel types of job crafting (crafting toward strengths and crafting toward interests) that aim to improve the fit between one's job and personal strengths and interests. Based on Berg, Dutton, and Wrzesniewski (2013), we hypothesized that participating in a job crafting intervention aimed at adjusting the job to personal strengths and interests leads to higher levels of job crafting, which in turn will promote person-job fit. Moreover, we hypothesized that this indirect effect would be stronger for older workers compared with younger workers. Results of an experimental field study indicated that participating in the job crafting intervention leads to strengths crafting, but only among older workers. Strengths crafting was, in turn, positively associated with demands-abilities and needs-supplies fit. Unexpectedly, participating in the job crafting intervention did not influence job crafting toward interests and had a negative effect on crafting toward strengths among younger workers. However, our findings suggest that some types of job crafting interventions can indeed be an effective tool for increasing person-job fit of older workers. This study introduced two novel types of job crafting: crafting toward strengths (JC-strengths) and crafting toward interests (JC-interests). In addition, we tested a job crafting intervention aimed at stimulating participants to craft their job to improve its fit with their personal interests and strengths. We found initial evidence for a positive indirect effect of the job crafting intervention on person-job fit via JC-strengths among older workers. Although we expected that the job crafting intervention would be more beneficial for older workers, we did not expect to find a negative effect of the job crafting intervention on JC-strengths and in turn on PJ-fit for younger workers. 
A speculative explanation for this unexpected effect might be that younger employees react differently to the increased awareness of a potential PJ-misfit induced by the job crafting workshop. As younger employees are less dominant, self-confident, conscientious, and self-controlling (Roberts et al., 2006) and more likely to engage in learning (Maurer, 2001), they may tend to use skill development as a way of addressing their PJ-misfit, leading to a lower need to engage in job crafting behavior than before the intervention. However, we cannot substantiate this explanation with our data and future research will have to shed more light on this issue. Besides the unexpected effects on younger workers, we also did not find that the job crafting intervention was more beneficial for older employees' level of JC-interests. Possibly, since older workers are more loyal and committed to the organization (Ng & Feldman, 2010) and more likely to engage in organizational citizenship behavior (Ng & Feldman, 2008), the job crafting intervention motivated them to make better use of their strengths to serve their organization, but did not encourage them to make changes to try to make their jobs more interesting. Finally, although JC-strengths was positively related to both NS- and DA-fit, JC-interests was only positively related to NS-fit. Possibly, the activities that employees find interesting are not necessarily those they are good at, especially not for employees with a strong growth mindset (Dweck, 2006).

DeNisi & Murphy 2017

We review 100 years of research on performance appraisal and performance management, highlighting the articles published in JAP, but including significant work from other journals as well. We discuss trends in eight substantive areas: (1) scale formats, (2) criteria for evaluating ratings, (3) training, (4) reactions to appraisal, (5) purpose of rating, (6) rating sources, (7) demographic differences in ratings, and (8) cognitive processes, and discuss what we have learned from research in each area. We also focus on trends during the heyday of performance appraisal research in JAP (1970-2000), noting which were more productive and which potentially hampered progress. Our overall conclusion is that JAP's role in this literature has not been to propose models and new ideas, but has been primarily to test ideas and models proposed elsewhere. Nonetheless, we conclude that the papers published in JAP made important contributions to the field by addressing many of the critical questions raised by others. We also suggest several areas for future research, especially research focusing on performance management. Our title included a question mark suggesting potential doubts about whether the substantial body of research published in the last 100 years in JAP has made a substantial contribution to our understanding of performance appraisal and performance management. The answer is both "yes" and "no." It should be clear that we have come a long way from examining rating scale formats to determine their effects on rating errors, and JAP has contributed substantially to this progress. We have certainly learned that the specific format of the rating scale used is not the most important consideration in developing appraisal systems and that traditional error measures are not the best way to evaluate such systems. 
We have learned that demographic characteristics may have less influence on ratings than we had believed, that some rater cognitive processes are related to appraisal decisions, and that it is possible to train raters to do a better job. Certainly, these accomplishments can be considered progress. However, perhaps the most significant progress we have made during this time is to come to better appreciate the critical influence of the context in which performance appraisal occurs on the process and outcomes of appraisal (Murphy & DeNisi, 2008), and the role of JAP in this area is smaller and more indirect. Performance appraisal is used for a variety of purposes in organizations (Cleveland et al., 1988), and these purposes influence the way performance is defined (e.g., task performance vs. contextual performance; Podsakoff, Ahearne, & MacKenzie, 1997) and the way raters and ratees approach the task of performance appraisal (Murphy & Cleveland, 1995). The appraisal effectiveness model proposed by Levy and Williams (2004) summarizes much of the research on the role of social context and emphasizes the importance of ratee reactions to appraisals and the acceptability of ratings, and some of the work summarized in this review has appeared in JAP. However, most of the research published in JAP has been decontextualized, examining different facets of the rating process (e.g., cognitive processes, rating scales, rater training) in isolation, and it has become clear that we will not make progress in understanding how or why appraisals succeed without considering why appraisals are done in the first place, and how the climate, culture, norms, and beliefs in organizations shape the appraisal process and the outcomes of appraisals. 
Contextualizing performance appraisal research implies paying attention to when and why performance appraisal is carried out, and the contextual variables that are likely to be important range from quite distal (e.g., national cultures) to quite proximal (e.g., supervisor-subordinate relationships). For example, there may be aspects of national culture (or organizational culture) that make it less acceptable to give anyone negative feedback, and this may put pressure on raters to intentionally inflate ratings. In fact, we know little about how culture and societal norms really affect appraisal decisions and processes; JAP has made few contributions here. There is descriptive research that indicates that different practices and policies are more likely in some parts of the world than in others (e.g., Varma, Budhwar, & DeNisi, 2008), but we do not fully understand how cultural norms may make certain practices more or less effective. Also, we need more research on the effectiveness of individual-level performance management techniques in different cultures. The archive for this issue also includes a model of various factors that might affect performance appraisal processes and changes in individual performance. This model is adapted from Murphy and DeNisi (2008). At the most fundamental level, the question mark in our title really refers to the uncertainty in moving from the level of individual-level performance to firm-level performance. DeNisi and Smith (2014) concluded that although we have learned a great deal about how to improve individual performance through appraisal and performance management programs, there is no evidence to show that improving individual-level performance will eventually lead to improvements in firm-level performance. As noted earlier, it has always been implied or assumed that improving individual-level performance would eventually lead to improvements in firm-level performance. 
It is the ongoing failure to establish a clear link between individual and firm performance that leads us to raise questions about overall progress in this field. Even if we succeed in using performance appraisal, feedback, and other components of performance management to improve individual job performance, it is not clear that this will lead to more effective organizations. We believe that identifying how (if at all) the quality and the nature of performance appraisal programs contribute to the health and success of organizations is a critical priority. JAP is not alone in its failure to address this priority; the research literature in the organizational sciences simply has not grappled with this question in a credible way. In conclusion, we believe that JAP has made some worthwhile contributions to our understanding of performance appraisal and performance management. More important, JAP can and should have a critical role in the future progress of our field. JAP has always placed a strong emphasis on rigorous empirical tests of theories and models, and this is an orientation that is not universally shared across journals in this domain. As a consequence, we believe that JAP should be a natural home for rigorous tests of performance management programs and their components. It is disconcerting to see how much discussion of performance management exists, and how little evidence there is about how it actually works. It is our hope that JAP can take a lead in combining the concern for organizational performance, often shown in other parts of the organizational sciences, with its traditional concern for scientific rigor to produce a better understanding of how and why performance appraisal and performance management actually function in organizations, and how attempts to evaluate and improve individual performance influence the lives of employers and employees and the organizations in which they are found. 
Specifically, we believe that JAP should seek to publish research that (a) is conducted in organizational settings, (b) involves processes and outcomes with real stakes for participants (for example, studies of performance appraisals that are used for promotion and pay decisions), (c) includes assessments of both distal and proximal context variables, and (d) includes assessments of performance/success at a range of levels, including individual, group, and firm performance measures when possible. Research on cognitive processes has focused on how raters form judgments, but as Murphy and Cleveland (1995) point out, there is a difference between judgments and actual ratings, and future research also needs to focus on the reasons why raters might choose not to provide ratings consistent with their judgments. All of this will require something that JAP once routinely did but that is now challenging and rare: carrying out research in organizations or in cooperation with practitioners. For decades, we have bemoaned the gap between research and practice. It is time to stop the moaning and start the process of rebuilding the essential links between our research as psychologists and the topic we claim to care about: understanding behavior in organizations.

Koch et al 2015 (Meta)

Gender bias continues to be a concern in many work settings, leading researchers to identify factors that influence workplace decisions. In this study we examine several of these factors, using an organizing framework of sex distribution within jobs (including male- and female-dominated jobs as well as sex-balanced, or integrated, jobs). We conducted random effects meta-analyses including 136 independent effect sizes from experimental studies (N = 22,348) and examined the effects of decision-maker gender, amount and content of information available to the decision maker, type of evaluation, and motivation to make careful decisions on gender bias in organizational decisions. We also examined study characteristics such as type of participant, publication year, and study design. First, our findings revealed that men were preferred for male-dominated jobs (i.e., gender-role congruity bias), whereas no strong preference for either gender was found for female-dominated or integrated jobs. Second, male raters exhibited greater gender-role congruity bias than did female raters for male-dominated jobs. Third, gender-role congruity bias did not consistently decrease when decision makers were provided with additional information about those they were rating, but gender-role congruity bias was reduced when information clearly indicated high competence of those being evaluated. Fourth, gender-role congruity bias did not differ between decisions that required comparisons among ratees and decisions made about individual ratees. Fifth, decision makers who were motivated to make careful decisions tended to exhibit less gender-role congruity bias for male-dominated jobs. Finally, for male-dominated jobs, experienced professionals showed smaller gender-role congruity bias than did undergraduates or working adults. Findings regarding all hypotheses and research questions are summarized in Table 7. 
Our results supported the predictions made by role congruity theory (Eagly & Karau, 2002; Kark & Eagly, 2010), which suggests that the greater the incongruence between stereotypical gender traits and the gender stereotype of a job, the greater the gender bias, particularly for masculine jobs (represented by male-dominated jobs in our study). Our findings suggest that women may be more likely to face discrimination in male-dominated environments, whereas, on average, neither gender has an advantage in female-dominated or integrated environments. The extent to which a job is female dominated is negatively related to occupational salary and prestige (Glick, 1991; Lyness, 2002), so women may tend to face the most discrimination in jobs that generally produce the highest pay and status. In our examination of rater gender, we found that male raters tended to favor males, regardless of the sex distribution within the job. The finding that male raters exhibited stronger gender-role congruity bias than female raters for male-dominated jobs is consistent with the idea that men may be sensitive to changes in the traditional gender hierarchy and may disapprove of women working in male-dominated, high-status occupations. Because the workplace has historically been the male domain, males may feel as though their roles are being threatened by women entering the workforce, especially when women seek male-dominated jobs. It may also be that males, compared to females, tend to see male-dominated positions as more masculine or tend to adhere more strongly to gender stereotypes. Our results show that female raters did not exhibit a large bias for male-dominated jobs. 
This finding could have resulted partially from the tendency of women (compared to men) to hold less traditional stereotypes about women, to see women as having more masculine traits, and to view some traditionally masculine jobs as more feminine (e.g., Brenner et al., 1989; Koenig et al., 2011; Massengill & DiMarco, 1979; Schein, 2001), leading women to be more likely than men to believe that women are compatible with masculine or male-dominated roles. Rater gender analyses also revealed a surprising pro-male bias by both male and female raters for female-dominated jobs. This finding is consistent with a "glass escalator" effect, where men in female-dominated professions enjoy advantages such as being more likely to be hired, to be promoted, and to earn pay raises than women in the same occupations (see Williams, 1992). Explanations for this effect include men being steered toward more masculine positions or specialties within female-dominated occupations, such as managerial and administrative roles, which tend to be higher paying and more prestigious. Our results offer limited support for the claim that providing more individuating information decreases gender bias in workplace decisions (e.g., Landy, 2008). Compared to when information was limited, one or more substantial pieces of information decreased bias, particularly for male-dominated jobs. However, mean effect sizes for the number of pieces of information suggested that bias did not always decrease when adding each additional piece of information. Perhaps more information leads to a higher cognitive load, resulting in the decision maker failing to consider the information and relying on stereotypes. The findings on the content of individuating information may also shed light on why we did not discover the expected pattern of bias for the amount of information. If information was ambiguous regarding a ratee's potential for success in a job, it did not tend to reduce gender-role congruity bias. 
Even if a decision maker had a large amount of information about a ratee, if that information was ambiguous, gender-role congruity bias did not decrease. This supports the idea that individuating information must be highly diagnostic to counteract stereotypes. These results do not support predictions about a backlash effect where highly competent ratees are punished for violating traditional stereotypes. In the case of female-dominated jobs, males were rewarded rather than punished for being highly competent in gender-incongruent jobs. This appears consistent with what Kunda and Thagard (1996) called a contrast effect, in which unambiguous information indicating a counterstereotypical trait of a ratee leads the ratee to be viewed as extreme on that trait. For male-dominated jobs, neither gender was favored when ratees were highly competent. This appears to be an encouraging sign for competent women wishing to enter male-dominated professions. Our findings failed to provide support for the shifting standards model (Biernat, 2003), which predicts smaller gender-role congruity bias for individual ratings than for comparative ratings. We found no substantial differences between individual and comparative ratings. It appears that the within-gender comparisons for individual, subjective rating scales thought to induce the shifting standards pattern (e.g., "this person is competent . . . for a woman") are not consistently made. One encouraging finding from our study comes from the moderator analyses on characteristics expected to increase participants' motivation to make careful decisions, particularly for male-dominated jobs. When participants felt accountable for their decisions, believed their decisions had real-life consequences, or were reminded of equity norms, they tended to make less biased decisions about male-dominated jobs than when none of these features were present. 
This finding provides support for the idea that when held accountable, decision makers are more careful and thorough about processing information, leading to more accurate decisions. Our findings suggest that increasing feelings of accountability or highlighting equity norms in an organization may help to reduce gender bias in decision making, specifically for male-dominated jobs. We found some evidence indicating that professionals with experience and/or training in organizational decision making exhibit less bias than untrained working adults or undergraduate students. For male-dominated jobs, experienced professionals tended to show smaller gender-role congruity bias than working adults or undergraduates, providing some support for ideas expressed by those who question the generalizability of findings from laboratory studies with undergraduate participants to real-life employment settings (e.g., Landy, 2008). It may be that experienced decision makers have learned to avoid stereotypical thinking or are more aware of norms that discourage them from appearing biased.

McCausland et al 2015

Purpose: The purpose of this study was to investigate whether chronological age sparks negative expectancies, thus initiating a self-fulfilling prophecy in technology training interactions. Design/Methodology/Approach: Data were obtained from undergraduate students (age ≤ 30) paired in 85 trainer-trainee dyads and examined through the actor-partner interdependence model. Trainer and trainee age (younger or older) were manipulated in this laboratory experiment by presenting pre-selected photographs coupled with voice-enhancing software. Findings: As compared to younger trainees, ostensibly older trainees evoked negative expectancies when training for a technological task, which ultimately manifested in poorer training interactions and trainer evaluations of trainee performance. Implications: Identifying a connection between chronological age and negative expectancies in technology training advances our theoretical understanding of sources contributing to older trainees' poorer performance in workforce training programs. This study provides evidence of a negative relationship between trainees' chronological age and trainers' expectations for trainee success and subsequent training evaluations. Such knowledge offers initial support for a "train-the-trainer" intervention through educating trainers on the potential dangers of age-based stereotypes, which could help to reduce age-based performance discrepancies. Originality/Value: This is the first study to manipulate age during training, thus isolating the influence of age-based stereotypes on training experiences. Given that potential age-related performance decrements in capability and motivation can be eliminated as explanations, this evidence of poorer interactions and outcomes for older workers is critical. By 2018, 24% of the United States workforce will be composed of individuals age 55 or older (U.S. Bureau of Labor Statistics 2008). 
The aging workforce can be attributed to a number of health, societal, and economic trends. Since 1950, life expectancy has increased by approximately 12 years and is currently estimated at 78 years (Arias 2012). Additionally, retirement is no longer an all-or-nothing event. Workers may transition out of the workforce more slowly through part-time or "bridge employment" (Kim and Feldman 1994). Lastly, unpredictable economic conditions may require working later in life. Combined, these trends have resulted in the most age-diverse workforce in modern history, and this shift does not represent a temporary flux but a new standard. Increases in age diversity are occurring simultaneously with dramatic increases in the use of technology. Consequently, the workplace is transforming into a domain that is almost unrecognizable compared with a decade ago. To remain competitive, organizations must adopt a continuous learning philosophy (Noe 2010) so that the knowledge and skills of employees "keep pace" with ever-evolving technological capabilities. The most common activity from this philosophy is, perhaps, formal training. Research suggests that training, when implemented and tailored appropriately, produces noteworthy gains (Aguinis and Kraiger 2009; Noe 2010). Unfortunately, the benefits derived from training may be limited by additional barriers for certain workers. At the onset of this study, the authors posed the following question: Why do age-related differences emerge in technology training? Likely, there are multiple factors contributing to the full explanation (e.g., mean-level declines in cognitive functioning and fewer technology-related experiences), but the current investigation ignores these employee characteristics and focuses exclusively on trainer beliefs evoked by older employee characteristics. 
This study finds that, when lacking relevant information on which to base expectations, trainers may utilize widely available stereotypes, which translate to poorer training quality and ultimately facilitate expectancy confirmation, serving to disadvantage stereotyped employees. This finding is particularly concerning because training is intended to eliminate competency gaps, not contribute to them. Differing performance levels can then justify future personnel decisions (e.g., promotion, training, and performance appraisal), which, once again, penalize stereotyped employees. In short, researchers and practitioners alike must devote future efforts to better understand the role of stereotypes in workplace interactions and outcomes.

DeOrtentiis, Van Iddekinge, Ployhart, & Heetderks (2018)

At some point, hiring managers in all organizations face the decision of whether to fill open positions with internal candidates (e.g., through promotions) or to hire external candidates (e.g., from competitors or new entrants into the labor market). Despite this ubiquitous choice, surprisingly little research has compared the effectiveness of internal and external selection or has identified situations in which one approach may be better than the other. The authors use theory on human capital resources to predict differences between internal and external hires on manager- and unit-level outcomes. Analysis of data from a quick-service retail organization (N = 3,697) suggested that internally hired managers demonstrated higher levels of individual job performance and commanded lower starting salaries than externally hired managers. At the unit level, operations led by internal hires demonstrated higher performance on organization-specific criteria (i.e., service performance), whereas no internal-external differences were found on more general criteria (i.e., financial performance). They also found some evidence that differences in unit service performance decreased over time (but did not diminish completely) as external hires improved at a slightly faster rate than internal hires. Overall, these findings underscore the complexity of the recurring "build or buy" decision. The results also suggest that internal hires generally outperform external hires, both individually and collectively, and they do so for less money. The present study examined a staffing issue that has received surprisingly little scholarly attention despite its considerable importance to practitioners and organizations. The overall results suggest that organizations should strongly consider filling managerial positions by promoting from within. Indeed, internal hires generally outperform external hires and cost less in terms of salary and other incentives (e.g., promotions). 
Yet, there are situations in which selecting internal candidates or external candidates may be less consequential, such as when the most valued performance criteria are general rather than organization-specific. We hope these results provide useful information to organizations faced with internal-external staffing decisions. We also hope this study will provide a springboard for future research on this important but largely neglected topic.

Judge et al 2013

Integrating two theoretical perspectives on predictor-criterion relationships, the present study developed and tested a hierarchical framework in which each five-factor model (FFM) personality trait comprises two DeYoung, Quilty, and Peterson (2007) facets, which in turn comprise six Costa and McCrae (1992) NEO facets. Both theoretical perspectives (the bandwidth-fidelity dilemma and construct correspondence) suggest that lower order traits would better predict facets of job performance (task performance and contextual performance). They differ, however, as to the relative merits of broad and narrow traits in predicting a broad criterion (overall job performance). We first meta-analyzed the relationship of the 30 NEO facets to overall job performance and its facets. Overall, 1,176 correlations from 410 independent samples (combined N = 406,029) were coded and meta-analyzed. We then formed the 10 DeYoung et al. facets from the NEO facets, and 5 broad traits from those facets. Overall, results provided support for the 6-2-1 framework in general and the importance of the NEO facets in particular. In reviewing the literature on the relationships of direct measures of the broad Big Five traits to job performance, Hurtz and Donovan (2000) commented, "Although these theoretically meaningful relations are rather low in magnitude at the broad dimension level of the Big Five, the magnitude of these correlations might be enhanced if the most relevant specific facets of these broad dimensions could be specified" (pp. 876-877). Through applying two related taxonomic structures of lower order traits to three job performance criteria and developing a 6-2-1 framework that includes broad and narrow traits, this study suggests that specific facets do indeed have something to add to the prediction of job performance. Overall, our results suggest that it is time to reconsider the dominant way in which personality is assessed. 
Hierarchical approaches such as the 6-2-1 framework developed here appear to have much to offer.

Roth, Goldberg, & Thatcher 2017

Organizational researchers have studied how individuals identify with groups and organizations and how this affiliation influences behavior for decades (e.g., Tajfel, 1982). Interestingly, investigation into political affiliation and political affiliation similarity in the organizational sciences is extremely rare. This is striking, given the deep political divides between groups of individuals described in the political science literature. We draw from theories based on similarity, organizational identification, and person-environment fit, as well as theoretical notions related to individuating information, to develop a model, the political affiliation model (PAM), which describes the implications of political affiliation and political similarity for employment decisions. We set forth a number of propositions based on PAM to spur future research in the organizational sciences on a timely topic that has received little attention. We suggest that issues around political affiliation are extremely timely given the current very strong affect attached to political dissimilarity, in which feelings toward members of opposing parties are very negative and getting more negative over time, as evidenced by the demonstrations following recent U.S. election results. By assembling the first model, in either applied psychology/management or political science, of how political affiliation similarity might influence organizational decisions, we hope to prompt research on this potentially meaningful variable. We suggest that PAM forms a rich source for much-needed research, and we urge our colleagues to engage in such research in order to learn how this variable influences decisions in the organizational sciences.

Lindsey et al 2015

Purpose: The purpose of this paper is to examine method, motivation, and individual difference variables as they impact the effectiveness of a diversity training program in a field setting. Design: We conducted a longitudinal field experiment in which participants (N = 118) were randomly assigned to participate in one of three diversity training methods (perspective taking vs. goal setting vs. stereotype discrediting). Eight months after training, dependent measures on diversity-related motivations, attitudes, and behaviors were collected. Findings: Results suggest the effectiveness of diversity training can be enhanced by increasing motivation in carefully framed and designed programs. Specifically, self-reported behaviors toward LGB individuals were positively impacted by perspective taking. Training effects were mediated by internal motivation to respond without prejudice, and the model was moderated by trainee empathy. Implications: These findings serve to demonstrate that diversity training participants react differently to certain training methods. Additionally, this study indicates that taking the perspective of others may have a lasting positive effect on diversity-related outcomes by increasing individuals' internal motivation to respond without prejudice. These effects may be particularly powerful for training participants who are low in dispositional empathy. Originality/Value: This study is among the first to examine trainee reactions to diversity training exercises focused on different targets using different training methods. Additionally, we identify an important mediator (internal motivation to respond without prejudice) and boundary condition (trainee empathy) for examining diversity training effectiveness. The current study provides novel evidence regarding the relative effectiveness of three diversity training methods on diversity-related attitudes and behaviors. 
Specifically, while the perspective taking method was most successful overall, we showed that diversity trainers and researchers may need to consider the empathy levels of the individuals in their sample before deciding which training method to use. Appealing to individuals who are low in empathy may be a fruitful opportunity for future research and practice.

Fell, König, & Kammerhoff 2016

Purpose: This study questions whether applicants with different cultural backgrounds are equally prone to fake in job interviews, and thus systematically examines cross-cultural differences on a large scale regarding the attitude toward applicants' faking (an important antecedent of faking and a gateway for cultural influences). Design/Methodology/Approach: Using an online survey, attitudes toward faking were collected from employees (N = 3,252) in 31 countries. Cultural data were obtained from the Global Leadership and Organizational Behavior Effectiveness project (GLOBE). Findings: Attitude toward faking can be differentiated into two correlated forms (severe/mild faking). On the country level, attitudes toward faking correlate in the expected manner with four of GLOBE's nine cultural dimensions: uncertainty avoidance, power distance, in-group collectivism, and gender egalitarianism. Furthermore, humane orientation correlates positively with attitude toward severe faking. Implications: For international personnel selection research and practice, an awareness of whether and why there are cross-cultural differences in applicants' faking behavior is of utmost importance. Our study urges practitioners to be conscious that applicants from different cultures may enter selection situations with different mindsets, and offers several practical implications for international personnel selection. Originality/Value: Cross-cultural research has been expected to answer questions of whether applicants with different cultural backgrounds fake to the same extent during personnel selection. This study examines and explains cross-cultural differences in applicants' faking in job interviews with a comprehensive sample and within a coherent theoretical framework.

Bell, Tannenbaum, Ford, Noe, & Kraiger 2017 (Read)

Training and development research has a long tradition within applied psychology dating back to the early 1900s. Over the years, not only has interest in the topic grown but there have been dramatic changes in both the science and practice of training and development. In the current article, we examine the evolution of training and development research using articles published in the Journal of Applied Psychology (JAP) as a primary lens to analyze what we have learned and to identify where future research is needed. We begin by reviewing the timeline of training and development research in JAP from 1918 to the present in order to elucidate the critical trends and advances that define each decade. These trends include the emergence of more theory-driven training research, greater consideration of the role of the trainee and training context, examination of learning that occurs outside the classroom, and understanding training's impact across different levels of analysis. We then examine in greater detail the evolution of four key research themes: training criteria, trainee characteristics, training design and delivery, and the training context. In each area, we describe how the focus of research has shifted over time and highlight important developments. We conclude by offering several ideas for future training and development research. Private and public organizations spend vast amounts of money on training and development, and almost every working adult will spend hours of their life participating in learning experiences. There is both a business and personal imperative to better understand how humans learn at work and how best to design, implement, and support training and development activities. The state of knowledge regarding training and development has come a long way in the last 100 years, with research yielding many practical insights that can help guide practice (Salas, Tannenbaum, Kraiger, & Smith-Jentsch, 2012). 
The more than 450 related articles published in JAP are a rich legacy, contributing to that growth in knowledge. We hope that by examining that legacy, researchers can continue to produce meaningful studies, further promoting effective learning in the workplace.

Hammer et al 2016

We tested the effects of a work-family intervention on employee reports of safety compliance and organizational citizenship behaviors in 30 health care facilities using a group-randomized trial. Based on conservation of resources theory and the work-home resources model, we hypothesized that implementing a work-family intervention aimed at increasing contextual resources via supervisor support for work and family, and employee control over work time, would lead to improved personal resources and increased employee performance on the job in the form of self-reported safety compliance and organizational citizenship behaviors. Multilevel analyses used survey data from 1,524 employees at baseline and at 6-month and 12-month postintervention follow-ups. Significant intervention effects were observed for safety compliance at the 6-month follow-up and for organizational citizenship behaviors at the 12-month follow-up. More specifically, results demonstrate that the intervention protected against declines in employee self-reported safety compliance and organizational citizenship behaviors compared with employees in the control facilities. The hypothesized mediators of perceptions of family-supportive supervisor behaviors, control over work time, and work-family conflict (work-to-family conflict, family-to-work conflict) were not significantly improved by the intervention. However, baseline perceptions of family-supportive supervisor behaviors, control over work time, and work-family climate were significant moderators of the intervention effect on the self-reported safety compliance and organizational citizenship behavior outcomes. In summary, we conducted one of the few work-family intervention studies to date using a group-randomized design. We further believe that it is important to continue to find ways of improving the work environment in lower wage, hourly workforce settings. 
Our results demonstrate that the STAR intervention protected against declines in organizational citizenship behaviors (OCBs) and safety compliance compared with control facilities. We did not identify mediating mechanisms related to increased family-supportive supervisor behaviors (FSSB) and control over work time and decreased work-family conflict (WFC). However, we did find significant and important moderators related to the organization's readiness to change that further buffered the declines in the outcomes. This study is important given the significance of work-family stress in the working population, the related negative effects of work-family stress on the health and well-being of employees (see Hammer & Sauter, 2013, for a recent review), and the potential negative effects on work performance and return-on-investment outcomes for employers (see Hammer et al., in press, for a review). Future research is needed to better understand the mechanisms through which the STAR intervention operates and the workplace moderators that affect its effectiveness, and to extend this intervention to further promote beneficial workplace, work-family, and health outcomes.

Borman & Motowidlo 1997

This article distinguishes between task and contextual activities, and a taxonomy of contextual performance containing elements of organizational citizenship behavior and prosocial organizational behavior is offered. Evidence is presented demonstrating that supervisors weight subordinates' task and contextual performance roughly equally when making overall judgments of their performance. This, along with data showing that personality successfully predicts contextual performance, provides an alternative explanation for recent meta-analytic findings that personality correlates moderately with overall performance. Personality may be predicting the contextual component of overall performance. Results from studies using the Hogan Personality Inventory confirm that correlations between personality and contextual criteria are higher than correlations between personality and overall performance. We argue that finding such links between predictors and individual criterion elements significantly advances the science of personnel selection. There are three major conclusions we would draw from the discussion here. First, in our judgment, the contextual performance domain is important: it seems conceptually, and to some extent at least empirically, distinct from the task performance domain. In fact, we believe that in the future (a) as global competition continues to raise the effort levels required of employees, (b) as team-based organizations become even more popular, (c) as downsizing continues to make employee adaptability and willingness to exhibit extra effort more of a necessity, (d) as customer service is increasingly emphasized, and (e) as fields of work, in part at least, replace jobs as the envelope of work, contextual performance will become more and more important in organizations. Second, research shows that experienced supervisors consider contextual performance on the part of subordinates when making overall performance ratings. 
A consistent finding is that supervisors weight contextual performance approximately as highly as task performance in making these judgments. And finally, very important for personnel selection, when contextual performance dimensions are included as criteria, personality predictors are more likely to be successful correlates. We have seen this fairly clear-cut pattern of validities, especially in Project A, the Motowidlo and Van Scotter (1994) research, and the Midili and Penner (1995) study, where personality predicts performance in the contextual domain. As discussed earlier, these results may help explain recent meta-analytic findings, for example, Barrick and Mount's (1991) demonstration of consistent, moderate-level correlations between the Big Five personality dimension Conscientiousness and job performance. The personality measures may be picking up on the contextual component of the criteria. This implies that where the contextual elements of performance can be measured separately, these personality-performance validities might be higher. In fact, that is our strong hypothesis. If, for example, we could conduct a meta-analysis of personality validities similar to Barrick and Mount, but with studies included only if the criteria were identified as contextual in nature, our prediction is that the results would be more favorable to personality predictors. In sum, we argued for the importance of contextual activities as performance criteria. These criterion elements can be distinguished from task activities, and they appear important for organizational effectiveness, with organizational and business trends likely to make contextual performance even more important in the future. In addition, evidence was presented demonstrating that measures of personality constructs correlate substantially with contextual criterion measures. 
More broadly, evidence toward establishing empirical links between personality constructs and relatively specific criterion constructs contributes importantly to the science of personnel selection.

Taylor et al 2005 (Meta)

A meta-analysis of 117 studies evaluated the effects of behavior modeling training (BMT) on 6 training outcomes, across characteristics of training design. BMT effects were largest for learning outcomes, smaller for job behavior, and smaller still for results outcomes. Although BMT effects on declarative knowledge decayed over time, training effects on skills and job behavior remained stable or even increased. Skill development was greatest when learning points were used and presented as rule codes and when training time was longest. Transfer was greatest when mixed (negative and positive) models were presented, when practice included trainee-generated scenarios, when trainees were instructed to set goals, when trainees' superiors were also trained, and when rewards and sanctions were instituted in trainees' work environments. As with most meta-analyses, missing information was encountered in many original studies, leading to the exclusion of some studies from the present analysis. Missing data also precluded corrections for all theoretical aspects of measurement error and prevented further analyses, for example, refinements on the presentation of learning points (e.g., trainee-generated learning points), the modeling component (e.g., the inclusion of learning points in modeling videos, having trainees take notes during modeling, use of multiple rather than single models, mixed vs. positive-only models for technical training), and behavioral rehearsal (e.g., spaced vs. massed practice, skill practice group size, use of video feedback). Insufficient information was most prevalent in unpublished reports provided by training firms. 
Although training firms' encouragement of their clients to collect evaluation information on training programs is laudable, more careful documentation of the organizational context in which the training was conducted, as well as basic descriptive statistics (group means and standard deviations; sample sizes; reliabilities of measures; and, in the case of repeated measures designs, pretest-posttest correlations), would be helpful. For many training outcomes, considerable variance in training effects across studies remained after removing variance due to sampling error, as indicated by large residual variances and low lower bound 90% credibility values. Large residual variances often remained even after studies were broken down by methodological variables, suggesting that moderator variables other than those assessed in the present study are likely to be responsible for the remaining variability of BMT effects across studies. We suspect that some between-studies variability in BMT effects on learning outcomes may be due to differences in the sensitivity of measures used and organization-level differences in trainees' (pretraining) levels of skills and motivation (Colquitt, LePine, & Noe, 2000). In the case of job behavior and results (productivity and climate) outcomes, residual variability may result from unmeasured, organization-level differences in work environment variables other than those assessed in the present study. Training practitioners may be tempted to calculate the return on investment of conducting BMT by applying training utility analysis (e.g., Cascio, 1989) and using the population effect size estimates for job behavior here, but doing so would result in an upwardly biased utility estimate. 
Measures of job behavior in virtually all studies included here were focused on specific aspects of trainees' performance that were expected to change as a result of training and did not assess trainees' performance across the entire performance domain of trainees' jobs. Consequently, these effect size estimates would overstate the effect that BMT would have on overall job performance to the extent of criterion deficiency in the performance measures used. The appropriate correction for utility analysis would involve either reducing obtained effect sizes by the extent of criterion deficiency or obtaining estimates of the standard deviation of job performance based on the dollar value of variation in only those aspects of performance measured in each study. Such information was not reported in the original studies included in this meta-analysis, and so utility estimates cannot be calculated directly. Nevertheless, returns on investment with BMT training, based on effect size estimates reported here, are likely to be high enough to justify the value of using behavior modeling to improve performance in organizational settings. Morrow et al. (1997) collected necessary information to correct managerial training (not BMT) effect sizes in one organization for criterion deficiency and demonstrated an average return on investment of 45%, based on an average effect size of 0.31 (uncorrected). Mathieu and Leonard (1987) estimated even greater returns for a BMT program with an effect size estimate of 0.31 for overall job performance. Assuming comparability of other variables between the training contexts of studies included here and in both Morrow et al.'s meta-analysis and Mathieu and Leonard's study (e.g., extent of criterion deficiency, dollar value of performance, and training costs), the similar effect size found here for BMT could be expected to produce similar financial returns. 
The utility of BMT is further supported by the finding of no decay in training effects on skill use over time. The body of published and unpublished evaluation research on BMT reviewed here has demonstrated the approach to be an effective, psychologically based training intervention that has been used to produce sustainable improvements in a diverse range of skills and posttraining behavior. We have highlighted features of training design that are associated with larger training effect sizes for learning and transfer outcomes, which can provide practical guidance for the application of BMT in the future.
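The utility argument above can be made concrete with the Brogden-Cronbach-Gleser style training-utility formula common in this literature, ΔU = T × N × d_t × SD_y - N × C (years of effect, number trained, effect size, dollar SD of performance, cost per trainee). The sketch below is a minimal illustration under stated assumptions, not the authors' analysis: the trainee count, duration, SD_y, and cost are hypothetical; d = 0.31 simply echoes the effect size quoted above; and the `coverage` parameter is a hypothetical, deliberately crude way of shrinking d_t to reflect criterion deficiency.

```python
def training_utility(n_trained, years, d_t, sd_y, cost_per_trainee, coverage=1.0):
    """Estimated net dollar gain from training: T * N * d_t * SD_y - N * C.

    coverage is a hypothetical multiplier (0-1) that shrinks d_t to reflect the
    share of the job-performance domain the criterion actually captured;
    1.0 means no adjustment (the upwardly biased case described in the text).
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    effective_d = d_t * coverage
    gain = years * n_trained * effective_d * sd_y
    cost = n_trained * cost_per_trainee
    return gain - cost


def roi(n_trained, years, d_t, sd_y, cost_per_trainee, coverage=1.0):
    """Return on investment: net utility divided by total training cost."""
    net = training_utility(n_trained, years, d_t, sd_y, cost_per_trainee, coverage)
    return net / (n_trained * cost_per_trainee)


# Hypothetical inputs: 100 trainees, effect lasting 2 years, SD_y of $10,000,
# $1,500 per trainee; d = 0.31 echoes the effect size quoted in the text.
naive = training_utility(100, 2, 0.31, 10_000, 1_500)
adjusted = training_utility(100, 2, 0.31, 10_000, 1_500, coverage=0.5)
print(f"unadjusted: ${naive:,.0f}; half-domain coverage: ${adjusted:,.0f}")
```

Because the gain term is linear in d_t, an effect size inflated by a narrow, training-specific behavior measure inflates the gain term proportionally, which is the upward bias the authors warn about.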

Rockstuhl et al 2015

Although the term situational judgment test (SJT) implies judging situations, existing SJTs focus more on judging the effectiveness of different response options (i.e., response judgment) and less on how people perceive and interpret situations (i.e., situational judgment). We expand the traditional SJT paradigm and propose that adding explicit assessments of situational judgment to SJTs will provide incremental information beyond that provided by response judgment. We test this hypothesis across 4 studies using intercultural multimedia SJTs. Study 1 uses verbal protocol analysis to discover the situational judgments people make when responding to SJT items. Study 2 shows situational judgment predicts time-lagged, peer-rated task performance and interpersonal citizenship among undergraduate seniors over and above response judgment and other established predictors. Study 3 shows providing situational judgment did not affect the predictive validity of response judgment. Study 4 replicates Study 2 in a working adult sample. We discuss implications for SJT theory as well as the practical implications of putting judging situations back into SJTs. SJT scholars have repeatedly called for research to open the black box of situational judgment in SJTs (Ployhart, 2006; Schmitt & Chan, 2006; Whetzel & McDaniel, 2009). Our study responds to these calls and reinvigorates SJT research by highlighting the importance of putting situational judgments back into SJTs. Specifically, results of our verbal protocol analysis of SJT responses identified the dominant types of situational judgments made. More important, results of two time-lagged, multiple-source studies demonstrate the value of asking respondents to make both situational judgments and response judgments. Results consistently show that situational judgment predicts task performance and interpersonal OCB over and above response judgment and other established predictors. 
Overall, our results provide timely insights to situational judgment in SJTs and suggest promising benefits— both theoretical and practical—for future research on SJTs.

Jones & Stout 2015

Antinepotism policies are common in work organizations. Although cronyism appears to be commonplace as well, official policing of cronyism is less common. We argue that social connections in some crony relationships and apparently nepotic ones may add considerable value to organizations. We also argue that policing of nepotic relationships can be a form of unfair discrimination when the perception of inequity, rather than its reality, is being policed. Finally, we consider effective approaches that simultaneously preserve the value of social connection, avoid the actual ethical breaches associated with some social connections, and avoid any unfair discrimination on the basis of group memberships (in this case, family and friends). Popular stereotypes about the nature and effects of social connections should not be the basis for sweeping, pervasive organizational policies. In fact, given (a) the actual evidence about the effectiveness of SCP, (b) the discriminatory effects of sweeping policies, and (c) the cultural differences in perspectives and practice, I-O psychologists have a professional responsibility to empirically evaluate both the nature of SCPs and their actual risks and benefits. Science-based practice and ethics both demand that I-O psychologists refrain from "engineering" such relationships in work organizations before carefully surveying them. Using the analogy of natural systems (e.g., rivers) and engineering solutions (e.g., dams), organizations trying to deliberately "dam" natural family and friend systems without empirical evidence about their forms and contexts are likely to experience effects that are at least ineffective and potentially catastrophic.

Pitesa & Thau 2018

Based on evolutionary theory, we predicted that cues of resource scarcity in the environment (e.g., news of droughts or food shortages) lead people to reduce their effort and performance in physically demanding work. We tested this prediction in a 2-wave field survey among employees and replicated it experimentally in the lab. In Study 1, employees who perceived resources in the environment to be scarce reported exerting less effort when their jobs involved much (but not little) physical work. In Study 2, participants who read that resources in the environment were scarce performed worse on a task demanding more (carrying books) but not less (transcribing book titles) physical work. This result was found even though better performance increased participants' chances of additional remuneration, and even though scarcity cues did not affect individuals' actual ability to meet their energy needs. We discuss implications for managing effort and performance, and the potential of evolutionary psychology to explain core organizational phenomena. Physical work is part of many jobs even today. We found that in physically demanding work, people reduce their level of effort and performance when they are exposed to cues of environmental resource scarcity. Such cues are frequent in media, conversations, and organizational communication, highlighting the importance of the effect we document for organizations. Our results are consistent with an evolutionary perspective, which we find to be a highly generative framework that can help give new answers to longstanding questions in organizational behavior. Finally, we believe that understanding the physical aspect of work is important and largely neglected by contemporary organizational behavior research. We thus hope that our work inspires a greater use of the evolutionary perspective in explaining employee behavior as well as a greater focus on the physical aspects of work.

Cucina et al 2018

Common criticisms of personality measures: 1) low criterion validity and 2) effects of faking. Item empirical keying involves differentially weighting each item in a personality inventory based on its relationship with the criterion. Option empirical keying involves determining the relationship between each response option and the criterion and assigning weights for endorsement of each response option accordingly. Rational keying involves unit weighting each item and using Likert scoring for response options. Found evidence that empirical keying can increase the criterion validity of personality measures and reduce the effects of faking, but validity was still not as high as that of general mental ability.
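The three keying approaches can be contrasted in a short sketch. This is an illustrative assumption about how each key might be built, not Cucina et al.'s procedure: item weights here are simple item-criterion correlations, and option weights are the mean criterion score of respondents who chose that option minus the grand mean; these are only two of several possible empirical-keying schemes. The development sample is hypothetical toy data.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def rational_scores(responses):
    """Rational key: unit-weight every item and sum the Likert responses."""
    return [sum(person) for person in responses]


def item_empirical_weights(responses, criterion):
    """Item key: weight each item by its correlation with the criterion."""
    n_items = len(responses[0])
    return [pearson([person[i] for person in responses], criterion) for i in range(n_items)]


def option_empirical_weights(responses, criterion):
    """Option key: weight each response option by the mean criterion score of
    respondents who chose it, relative to the overall criterion mean."""
    grand_mean = mean(criterion)
    weights = []
    for i in range(len(responses[0])):
        by_option = {}
        for person, c in zip(responses, criterion):
            by_option.setdefault(person[i], []).append(c)
        weights.append({opt: mean(cs) - grand_mean for opt, cs in by_option.items()})
    return weights


def item_keyed_score(person, item_weights):
    """Score a respondent as the weighted sum of their Likert responses."""
    return sum(w * r for w, r in zip(item_weights, person))


def option_keyed_score(person, option_weights):
    """Score a respondent by summing the weights of the options they endorsed."""
    # Options never chosen in the development sample get a weight of 0.
    return sum(w.get(r, 0.0) for w, r in zip(option_weights, person))


# Hypothetical development sample: 4 respondents x 3 items (1-5 Likert), plus criterion.
dev_responses = [[5, 1, 3], [4, 2, 3], [2, 4, 1], [1, 5, 2]]
dev_criterion = [10, 8, 4, 2]

item_w = item_empirical_weights(dev_responses, dev_criterion)
option_w = option_empirical_weights(dev_responses, dev_criterion)
print("rational:", rational_scores(dev_responses))
print("item weights:", item_w)
print("option-keyed score, respondent 0:", option_keyed_score(dev_responses[0], option_w))
```

Because empirical keys are fit to the development sample, they capitalize on chance; in practice their validity would be checked on a separate holdout sample.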

Hardy et al 2017

Conventional wisdom suggests that assessment length is positively related to the rate at which applicants opt out of the assessment phase. However, restricting assessment length can negatively impact the utility of a selection system by reducing the reliability of its construct scores and constraining coverage of the relevant criterion domain. Given the costly nature of these tradeoffs, is it better for managers to prioritize (a) shortening assessments to reduce applicant attrition rates or (b) ensuring optimal reliability and validity of their assessment scores? In the present study, we use data from 222,772 job-seekers nested within 69 selection systems to challenge the popular notion that selection system length predicts applicant attrition behavior. Specifically, we argue that the majority of applicant attrition occurs very early in the assessment phase and that attrition risk decreases, not increases, as a function of time spent in assessment. Our findings supported these predictions, revealing that the majority of applicants who quit assessments did so within the first 20 min of the assessment phase. Consequently, selection system length did not predict rates of applicant attrition. In fact, when controlling for observed system length and various job characteristics, we found that systems providing more conservative (i.e., longer) estimates of assessment length produced lower overall attrition rates. Collectively, these findings suggest that efforts to curtail applicant attrition by shortening assessment length may be misguided. They also support recommendations that the reliability and validity of assessment scores should be the primary drivers of assessment length, not concerns about applicant attrition rates (Ryan & Huth, 2008). This is not to say that attrition risk is unimportant as a criterion. Rather, we believe organizations need to get more creative when looking for ways to combat this behavior. 
Fortunately, our findings point to one relatively simple solution that can help combat attrition behavior during assessment. Specifically, we found that more conservative time estimates (i.e., estimates that are slightly longer than actual protocol length) were associated with reduced attrition rates across the systems in our sample. As such, we recommend that organizations favor slight overestimates when quoting applicants the total amount of time they need to set aside for completing the assessments early in the assessment phase. Communicating time expectations up front allows job-seekers to plan their time more effectively, increasing the likelihood that they will see the application through to completion.

Gorman & Rentsch 2009

Frame-of-reference training has been shown to be an effective intervention for improving the accuracy of performance ratings (e.g., Woehr & Huffcutt, 1994). Despite evidence in support of the effectiveness of frame-of-reference training, few studies have empirically addressed the ultimate goal of such training, which is to teach raters to share a common conceptualization of performance (Athey & McIntyre, 1987; Woehr, 1994). The present study tested the hypothesis that, following training, frame-of-reference-trained raters would possess schemas of performance that are more similar to a referent schema, as compared with control-trained raters. Schema accuracy was also hypothesized to be positively related to rating accuracy. Results supported these hypotheses. Implications for frame-of-reference training research and practice are discussed. Previous research has found consistently positive effects of frame-of-reference training for improving rating accuracy. Many researchers have recognized the need for a better understanding of the cognitive mechanisms involved in frame-of-reference training, and consequently, numerous studies have been devoted to examining cognitive issues such as rater memory and recall for performance-related information. Despite the encouraging results of these studies, they have neglected to account for the positive effects of frame-of-reference training in situations in which memory and recall are not relied on heavily (e.g., the recent interest in applying frame-of-reference training to assessment center rating situations; Schleicher et al., 2002). The use of direct structural assessment techniques for measuring performance schemas is highly appropriate for these contexts. The results of the present study are only the first step toward attaining a more complete picture of the complex cognitive mechanisms that underlie rating accuracy.

Kuncel et al 2013 (Meta)

In employee selection and academic admission decisions, holistic (clinical) data combination methods continue to be relied upon and preferred by practitioners in our field. This meta-analysis examined and compared the relative predictive power of mechanical methods versus holistic methods in predicting multiple work (advancement, supervisory ratings of performance, and training performance) and academic (grade point average) criteria. There was consistent and substantial loss of validity when data were combined holistically, even by experts who are knowledgeable about the jobs and organizations in question, across multiple criteria in work and academic settings. In predicting job performance, the difference between the validity of mechanical and holistic data combination methods translated into an improvement in prediction of more than 50%. Implications for evidence-based practice are discussed. The results of this meta-analysis demonstrate a sizable predictive validity difference between mechanical and clinical data combination methods in employee selection and admission decision making. For predicting job performance, mechanical approaches substantially outperform clinical combination methods. In Lens Model language, the Achievement Index (clinical validity) is substantially lower than the Ecological Validity. This finding is particularly striking because in the studies included, experts were familiar with the job and organizations in question and had access to extensive information about applicants. Further, in many cases, the expert had access to more information about the applicant than was included in the mechanical combination. Yet, the lower predictive validity of clinical combination can result in a 25% reduction of correct hiring decisions across base rates for a moderately selective hiring scenario (SR = .30; Taylor & Russell, 1939). 
That is, the contribution our selection systems make to the organization in increasing the rate of acceptable hires is reduced by a quarter when holistic data combination methods are used. Yet, this is an underestimate because we were unable to correct for measurement error in criteria or range restriction. Corrections for these artifacts would only serve to increase the magnitude of the difference between the methods. Despite the results obtained here, it might be argued that one great advantage of the clinical method is that frequent changes in jobs or circumstances will lead to a situation where the equation is no longer appropriate while a clinical assessment can accommodate the change in circumstances. There are three problems with this argument. First, there is no empirical evidence supporting this scenario in the literature. The performance dimensions of jobs have remained quite stable over time. For example, early evaluations of the dimensional structure of the job of managers yielded much the same dimensions as contemporary models (e.g., Borman & Brush, 1993; Campbell, Dunnette, Lawler, & Weick, 1970; Flanagan, 1951). Second, linear models are quite robust to changes in weights. That is, unless the weights suddenly change from positive to negative (another situation that has never been observed in the literature), the overall predictive power of the composite remains strong (Dawes, 1979). Finally, if such a situation were to occur, the use of an expert's subjective weights, integrated into a modified equation, would still outperform the clinician. Small N situations are also sometimes raised as a concern. It is sometimes argued that these settings prevent the use of mechanical methods. This is not the case. Each predictor can be weighted by evidence from the literature (e.g., dominance is typically a moderately valid predictor of leadership effectiveness). 
The advent of validity generalization and considerable number of meta-analyses in the literature provides some solid ground for differential weighting. Alternatively, expert judgment (preferably aggregated across multiple experts) can be used to set weights (e.g., our stock in-basket should get only nominal attention given the job level and functional area). These values can then be used to weight and combine the assessment results. The field would benefit from additional research that investigates specific, and hopefully controllable, features of the assessment, assessee, and decision process that contribute to reduced predictive power. It is possible that assessors are overly influenced by aspects of candidates' personality or demeanor that are not associated with subsequent job performance. Such evidence could be used for assessor training to reduce such systematic errors and could be combined with methods to increase the use of effective predictors and data combination methods (Kuncel, 2008). Although the results presented here are wholly consistent with a broader literature, ongoing research is important for expanding on the modest number of studies presenting evidence of this comparison. The file drawer problem could also be present although we expect that, given common practice, results would tend to skew in favor of mechanical rather than holistic judgment.
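The weighting strategies described above (literature-based or expert-aggregated weights applied to assessment results) can be sketched as a simple mechanical composite. This is a minimal illustration, assuming predictors are first standardized within the applicant pool and that expert weights are aggregated by simple averaging; the predictor names, scores, and weights are all hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw predictor scores to z-scores within the applicant pool."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def average_expert_weights(expert_weight_sets):
    """Aggregate weights across experts by averaging each predictor's weight."""
    return [mean(ws) for ws in zip(*expert_weight_sets)]


def mechanical_composite(predictor_columns, weights):
    """Weighted sum of standardized predictors for each applicant.

    predictor_columns: one list of raw scores per predictor (same applicant order).
    weights: one weight per predictor (e.g., from meta-analyses or experts).
    """
    z_cols = [standardize(col) for col in predictor_columns]
    n_applicants = len(z_cols[0])
    return [sum(w * col[i] for w, col in zip(weights, z_cols)) for i in range(n_applicants)]


# Hypothetical pool of four applicants scored on three assessment exercises.
cognitive = [110, 95, 120, 100]
interview = [3.5, 4.0, 3.0, 4.5]
in_basket = [60, 70, 55, 65]

# Two hypothetical experts; the in-basket gets only nominal weight.
weights = average_expert_weights([[0.5, 0.4, 0.1], [0.6, 0.3, 0.1]])
composite = mechanical_composite([cognitive, interview, in_basket], weights)
ranking = sorted(range(len(composite)), key=lambda i: composite[i], reverse=True)
print("weights:", weights)
print("rank order (applicant index):", ranking)
```

Swapping in unit weights (`[1, 1, 1]`) is the simplest alternative; consistent with the Dawes (1979) point cited above, a linear composite of this kind tends to remain predictive across reasonable changes in its weights.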

Dimotakis et al 2017

In this field study we examined both positive and negative developmental feedback given in managerial assessment centers in relation to employees' self-efficacy for their ability to improve their relevant skills assessed in the centers, the extent to which they sought subsequent feedback from others at work, and the career outcome of being promoted to a higher level position within the organization. We found that feedback was related to self-efficacy for improvement, which was in turn positively related to feedback seeking, which was positively linked to the career outcome of promotion (i.e., feedback leads to self-efficacy for improvement, which leads to feedback seeking, which leads to promotion). In addition, we tested boundary variables for the effects of feedback in this model. Both social support for development and implicit theory of ability moderated the effects of negative feedback on self-efficacy. Having more support and believing that abilities can be improved buffered the detrimental impact of negative feedback on self-efficacy. We discuss implications for theory, future research and practical implications drawing upon literature on assessment centers, feedback and feedback seeking, employee development and career success. Assessment centers can be costly to implement for management development purposes, and there is ambiguity in the research literature regarding whether, and under what conditions, there is a beneficial effect of assessment center feedback for organizations and those who are assessed. Research that brings together assessment center feedback, development, psychological mediators or moderators, and career outcomes is rare in the literature and can therefore currently offer significant utility to both theory building and practice. 
We combined ideas from research on assessment centers, feedback and feedback seeking, employee development and career success to test a model of the relationships among positive and negative feedback from the centers, self-efficacy for the ability to improve relevant skills, feedback seeking behavior and promotion into a higher level job. Positive and negative feedback were related to self-efficacy for improvement such that positive feedback related to higher self-efficacy and negative feedback related to lower efficacy, and differences in self-efficacy for improvement subsequently related to greater feedback seeking. Seeking feedback was positively related to career outcomes in the form of being promoted into higher level positions. We uncovered key boundary variables for the effects of feedback in this model, including social support for development and implicit theory of abilities: more support for development in the workplace and believing that it is possible to improve abilities resulted in negative feedback having less of a detrimental effect on self-efficacy.

Campion et al (2016)

Looks at advantages and disadvantages of using text mining and predictive computer modeling as an alternative to human raters in a selection context. 1) It is possible to program or train a computer to emulate human raters when scoring accomplishment records (ARs). 2) The computer program was capable of producing scores that were as reliable as those of human raters. 3) Scores produced by the computer appeared to demonstrate construct validity. 4) Computer scoring can result in cost savings. 5) The computer would produce only a small number of hiring decisions that differed from those of a three-rater hiring panel.

Pritchard et al 2008 (Meta)

Meta-analytic procedures were used to examine data from 83 field studies of the Productivity Measurement and Enhancement System (ProMES). The article expands the evidence on effectiveness of the intervention, examines where it has been successful, and explores moderators related to its success. Four research questions were explored and results indicate that (a) ProMES results in large improvements in productivity; (b) these effects last over time, in some cases years; (c) the intervention results in productivity improvements in many different types of settings (i.e., type of organization, type of work, type of worker, country); and (d) moderator variables are related to the degree of productivity improvement. These moderator variables include how closely the study followed the original ProMES methodology, the quality of feedback given, whether changes were made in the feedback system, the degree of interdependence of the work group, and centralization of the organization. Implications based on these findings are discussed for future use of this intervention, and the system is discussed as an example for evidence-based management. The single most important overall conclusion from this research is that productivity improvements were very large with the ProMES intervention. Such large improvements suggest that there is great potential in organizations that is not being utilized. In one sense, this is a tragedy because untapped potential is wasted. The problem does not lie with the individuals themselves because the individuals in the studies were the same before and after the productivity improvements. The difficulties must be related to the jobs and the organizational practices. People are working in jobs that severely limit what they can contribute. In another sense, this state of affairs is not a tragedy, because it offers a hopeful challenge for the future. These findings suggest that changes can be made that unlock this huge potential. 
The task is to discover ways to make these changes. ProMES is one approach, but other approaches can be developed to use this potential and improve the work lives of people in organizations.

Motro & Ellis 2017

Our experiment is aimed at understanding how employee reactions to negative feedback are received by the feedback provider and how employee gender may play a role in the process. We focus specifically on the act of crying and, based on role congruity theory, argue that a male employee crying in response to negative performance feedback will be seen as atypical behavior by the feedback provider, which will bias evaluations of the employee on a number of different outcome variables, including performance evaluations, assessments of leadership capability, and written recommendations. That is, we expect an interactive effect between gender and crying on our outcomes, an effect that will be mediated by perceived typicality. We find support for our moderated mediation model in a sample of 169 adults, indicating that men who cry in response to negative performance feedback will experience biased evaluations from the feedback provider. Theoretical and practical implications are discussed. The expression of emotion at work is an impactful event, particularly when the expression violates our expectations of appropriate behavior. Unfortunately, our expectations are not without bias and, when it comes to crying, we feel it is less appropriate for men to engage in such behavior. When they do, we label them as atypical and downgrade their standing on a number of outcomes that are personally and professionally significant for the employee. Our findings contribute to the literature on gender, role congruity, and the expression of emotion at work; they highlight potential discrimination against men in performance feedback settings.

Speer et al 2018

Performance narratives are qualitative text descriptions of an employee's work performance. Despite containing rich information that can be leveraged by practitioners and researchers, few efforts have systematically examined performance narratives. This study investigated whether performance narratives can automatically and reliably be scored into meaningful performance dimensions. Using the Great Eight as a conceptual framework, a custom dictionary was developed and comments were scored via automated text mining. This dictionary, labeled the Great Eight Narrative Dictionary, was then validated against a set of convergent measures to establish construct validity evidence for the derived narrative scores. Inter-rater agreement in linking word phrases to performance dimensions was high, and the derived performance dimensions had acceptable internal consistency. Narrative scores also displayed evidence of construct validity, with an expected pattern of correlations with text scores from an alternative text mining dictionary and with developmental performance ratings made using traditional numerical formats. Collectively, findings support the use of the Great Eight Narrative Dictionary to score performance narratives, and the dictionary is provided openly to facilitate future use. Narrative data are being leveraged with greater frequency in the organizational sciences, and there is much utility in systematically analyzing narrative text in performance contexts. This study sought to examine whether reliable performance themes could be identified within narratives, and results were encouraging. Reliability and construct validity evidence were obtained, and the output of this endeavor is an open-source dictionary to score performance narratives using the Great Eight. We encourage researchers to use this resource and to conduct additional research to improve upon it, as well as to research boundary conditions that are likely to influence the quality of derived scores.
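A dictionary-based scoring step of the kind described above can be sketched as counting dictionary matches per dimension. The entries below are hypothetical placeholders for three of the Great Eight dimensions, not entries from the published Great Eight Narrative Dictionary, and each entry is assumed to act as a word stem (so "collaborat" matches "collaborates").

```python
import re

# Hypothetical mini-dictionary: NOT the published Great Eight Narrative Dictionary.
# Each entry is a word stem or phrase; matching is case-insensitive.
DICTIONARY = {
    "Supporting & Cooperating": ["team player", "collaborat", "helps others"],
    "Organizing & Executing": ["deadline", "organiz", "plans ahead"],
    "Analyzing & Interpreting": ["analy", "data"],
}

# One pattern per entry: start at a word boundary, allow any word-character suffix.
PATTERNS = {
    dim: [re.compile(r"\b" + re.escape(entry) + r"\w*", re.IGNORECASE) for entry in entries]
    for dim, entries in DICTIONARY.items()
}


def score_narrative(text):
    """Count dictionary hits for each dimension in one performance narrative."""
    return {dim: sum(len(p.findall(text)) for p in pats) for dim, pats in PATTERNS.items()}


comment = "She is a team player who collaborates well and always meets deadlines."
print(score_narrative(comment))
```

Raw counts favor longer comments; dividing by word count is one simple normalization, though this sketch does not claim that is how the authors scored narratives.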

Oostrom et al 2016

Purpose: The present study examined two theoretical explanations for why situational interviews predict work-related performance, namely (a) that they are measures of interviewees' behavioral intentions or (b) that they are measures of interviewees' ability to correctly decipher situational demands. Design/Methodology/Approach: We tested these explanations with 101 students, who participated in a 2-day selection simulation. Findings: In line with the first explanation, there was considerable similarity between what participants said they would do and their actual behavior in corresponding work-related situations. However, the underlying postulated mechanism was not supported by the data. In line with the second explanation, participants' ability to correctly decipher situational demands was related to performance in both the interview and work-related situations. Furthermore, the relationship between the interview and performance in the work-related situations was partially explained by this ability to decipher situational demands. Implications: Assessing interviewees' ability to identify criteria (ATIC) might be of additional value for making selection decisions, particularly for jobs where it is essential to assess situational demands. Originality/Value: The present study made an effort to open the 'black box' of situational interview validity by examining two explanations for their validity. The results provided only moderate support for the first explanation. However, the second explanation was fully supported by these results. Although situational interviews are a valid predictor of job performance, the underlying reasons why they predict performance have not yet been resolved. The present study made an effort to open the 'black box' of interview validity by examining two explanations for their validity, namely (a) that the situational interview measures interviewees' behavioral intentions (e.g., Latham 1989; Latham et al. 1980) and (b) that situational interviews measure whether interviewees are able to correctly decipher the situational demands they are faced with in social situations (cf. Kleinmann et al. 2011). We provided the first direct test of the behavioral intentions explanation of situational interview validity. In support of this explanation, we found considerable similarity in what interviewees say they would do and their actual behavior in corresponding situations. Furthermore, we replicated Sue-Chan et al.'s (1995) finding of a positive relationship between self-efficacy and interview performance. In addition, we found that self-efficacy was also positively related to performance on the job simulation. Yet, this last finding would also have been predicted by the second explanation. In contrast to the behavioral intentions explanation, our results indicated that perceived control did not affect situational interview performance and that neither self-efficacy nor control moderated the relationship between situational interview performance and performance on the job simulation. Although we found that the content of interviewees' answers to the situational interview questions was similar to their behaviors when confronted with the same situations in a job simulation, the validity for the situational interview was just as high when the situations in the interview and in the job simulation did not correspond. If situational interviews do capture intentions, their validity should have been higher for corresponding situations compared to non-corresponding situations. Hence, we believe our findings stress that situational interviews are measuring some valuable performance-related information beyond or in addition to behavioral intentions. Our results supported the role of ATIC for situational interview validity: ATIC was a significant predictor of performance in situational interviews and job simulations. 
Furthermore, ATIC explained part of the validity of the situational interview: the correlation between situational interview performance and performance in the simulations dropped when ATIC was partialled out of this relationship. These findings add to the evidence that the assessment of situational demands explains part of the validity of these selection instruments (e.g., Ingold et al. 2015; Jansen et al. 2013). For the ATIC explanation of situational interview validity, it did not matter whether interviewees' actual behaviors were in line with the intentions they conveyed during the interview, because ATIC reflects a general ability that helps individuals to better read the situational demands in varying social situations, including selection and job contexts. Our results supported this view of ATIC as a more general ability, as ATIC from the interview predicted behavior equally well in corresponding and non-corresponding situations in the job simulation.
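"Partialling out" ATIC amounts to computing a first-order partial correlation between interview and simulation performance with ATIC held constant. A minimal Python sketch of the standard formula follows; the three input correlations are hypothetical placeholders chosen for illustration (not values reported by Oostrom et al.), and the function name `partial_correlation` is our own.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z partialled out."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        # z perfectly correlated with x or y: the partial r is undefined.
        raise ValueError("control variable is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical correlations (illustration only, not the study's data).
r_interview_sim = 0.30   # situational interview vs. job simulation performance
r_interview_atic = 0.40  # situational interview vs. ATIC
r_sim_atic = 0.35        # job simulation vs. ATIC

r_partial = partial_correlation(r_interview_sim, r_interview_atic, r_sim_atic)
print(f"zero-order r = {r_interview_sim:.2f}, partial r (ATIC removed) = {r_partial:.2f}")
```

With these made-up inputs the interview-simulation correlation falls from .30 to about .19 once ATIC is removed, which is the kind of drop the study describes; the actual magnitudes in the paper may differ.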

Henle et al 2018

Resume fraud is pervasive and has detrimental consequences, but researchers lack a way to study it. We develop and validate a measure for empirically investigating resume misrepresentations purposely designed to mislead recruiters. In Study 1, an initial set of items designed to measure three theorized resume fraud dimensions (fabrication, embellishment, omission) is rated for content validity. In Study 2, job seekers complete the measure and its factor structure is evaluated. In Study 3, another sample of job seekers is surveyed to verify the measure's factor structure and to provide evidence regarding construct validity. In Study 4, working adults who recently conducted a job search are surveyed to determine which individuals are more likely to commit resume fraud and whether resume fraud relates to critical work behaviors. We confirm the three-factor structure of our measure and offer evidence of construct validity by showing that socially desirable responding, Machiavellianism, moral identity, conscientiousness, emotional stability, and agreeableness are related to resume fraud. Additionally, we find that resume fraud predicts reduced job performance and increased workplace deviance beyond deceptive interviewing behavior. Resume fraud is rarely studied despite the negative impact it can have on job-related outcomes. Researchers can use this measure to further explore the antecedents and outcomes of resume fraud and to advise recruiters on how to minimize it. We develop a measure focusing on intentional resume misrepresentations designed to deceive recruiters. This is one of the first studies to examine the antecedents and outcomes of resume fraud. Resume fraud, common among job seekers, can cost organizations their reputations, damage their performance, erode their ethical cultures, and subject them to legal liabilities.
To minimize resume fraud impacts on selection processes and on subsequent job performance, managers must be able to identify which job seekers are more likely to intentionally distort their resumes and when they are likely to do so. We hope that our resume fraud measure will encourage continued research into this critical issue.

Berry & Zhao 2015

The longstanding conclusion that cognitive ability test scores overpredict the job performance of African Americans has appropriately been called into question on two main bases: bias in the typical intercept test in predictive bias studies and a lack of control for indirect range restriction. The current study presented a method that draws on the strengths of past research (i.e., meta-analytic estimates) while addressing the two weaknesses of past predictive bias research in employment settings. The results of the current study support the conclusion that cognitive ability test scores do not generally underpredict the job performance of African Americans. This represents some of the strongest evidence to date that cognitive ability tests are not predictively biased against African Americans.

Campion et al 2011

The purpose of this article is to present a set of best practices for competency modeling based on the experiences and lessons learned from the major perspectives on this topic (including applied, academic, and professional). Competency models are defined, and their key advantages are explained. Then, the many uses of competency models are described. The bulk of the article is a set of 20 best practices divided into 3 areas: analyzing competency information, organizing and presenting competency information, and using competency information. The best practices are described and explained, practice advice is provided, and then the best practices are illustrated with numerous practical examples. Throughout, the article explains how competency modeling differs from and complements job analysis.
Best practices:
1. Considering organizational context
2. Linking competency models to organizational goals and objectives
3. Starting at the top
4. Using rigorous job analysis methods to develop competencies
5. Considering future-oriented job requirements
6. Using additional unique methods
7. Defining the anatomy of a competency (the language of competencies)
8. Defining levels of proficiency on competencies
9. Using organizational language
10. Including both fundamental (cross-job) and technical (job-specific) competencies
11. Using competency libraries
12. Achieving the proper level of granularity (number of competencies and amount of detail)
13. Using diagrams, pictures, and heuristics to communicate competency models to employees
14. Using organizational development techniques to ensure competency modeling acceptance and use
15. Using competencies to develop human resource systems (e.g., hiring, appraisal, promotion, compensation)
16. Using competencies to align the human resource systems
17. Using competencies to develop a practical "theory" of effective job performance tailored to the organization
18. Using information technology to enhance the usability of competency models
19. Maintaining the currency of competencies over time
20. Using competency modeling for legal defensibility (e.g., test validation)
Whether competency modeling is anything new is a source of debate among I-O psychologists. The term "competency" can be traced back in the applied psychology literature nearly 40 years (e.g., McClelland, 1973). In addition, many aspects of competency modeling have been practiced by job analysis researchers for years. Perhaps what is new is how competency modeling brings together so many of these best practices into one program. The result is an impact on organizations that far surpasses that of traditional job analysis, and competency modeling may provide a platform and opportunity for I-O psychologists and our colleagues to elevate our talent discussions in the organizations we serve. We hope that describing these best practices in this paper and illustrating them through the experiences of several large organizations will promote good practice around competencies. We believe the practical advice and examples contained here can guide and inspire more effective and efficient use of competencies. In addition, we hope that the principles and approaches outlined around analyzing, organizing, presenting, and using competency information may guide and inspire greater empirical research on competencies.

Lievens et al 2018

This paper proposed response format as an as-yet unexplored driver of minority-majority differences and contributed to response format and SJT research by examining two newer constructed response (CR) formats in SJTs: written and audiovisual. Importantly, this paper found substantial reductions in minority-majority differences for these newer CR formats. Across the studies, ds were reduced from .92 (multiple-choice [MC] format) to .09 (audiovisual CR format). The audiovisual CR format also met the provision of "less adverse impact with equal validity." Employers often search for such selection alternatives, and this study shows how response format modifications constitute a viable alternative in the interpersonal domain.
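The ds above are standardized mean differences (Cohen's d): the majority-minority gap in mean scores divided by a pooled standard deviation. The sketch below computes d with the common pooled-SD formula on small score lists; the scores are invented for illustration (not Lievens et al.'s data), the function name `cohens_d` is our own, and the paper may have used a different variant of d.

```python
import math
import statistics


def cohens_d(majority, minority):
    """Cohen's d: (mean_majority - mean_minority) / pooled SD (sample variances, n - 1)."""
    n1, n2 = len(majority), len(minority)
    var1 = statistics.variance(majority)
    var2 = statistics.variance(minority)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(majority) - statistics.mean(minority)) / pooled_sd


# Hypothetical SJT scores under two response formats (invented data).
mc_majority = [5, 6, 7, 8, 9]
mc_minority = [4, 5, 6, 7, 8]
av_majority = [5, 6, 7, 8, 9]
av_minority = [4.8, 5.8, 6.8, 7.8, 8.8]

d_mc = cohens_d(mc_majority, mc_minority)
d_av = cohens_d(av_majority, av_minority)
print(f"MC format d = {d_mc:.2f}; audiovisual CR format d = {d_av:.2f}")
```

Shrinking the group mean gap while the spread of scores stays the same shrinks d proportionally, which is how a response-format change can move d from a large value toward zero.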

Morris et al 2015

Though individual assessments are widely used in selection settings, very little research exists to support their criterion-related validity. A random-effects meta-analysis of 39 individual assessment validation studies was conducted. For the current research, individual assessments were defined as any employee selection procedure that (a) involved multiple assessment methods, (b) was administered to an individual examinee, and (c) relied on assessor judgment to integrate the information into an overall evaluation of the candidate's suitability for a job. Assessor recommendations were found to be useful predictors of job performance, although the level of validity varied considerably across studies. Validity tended to be higher for managerial than nonmanagerial occupations and for assessments that included a cognitive ability test. Validity was not moderated by the degree of standardization of the assessment content or by use of multiple assessors for each candidate. However, higher validities were found when the same assessor was used across all candidates than when different assessors evaluated different candidates. These results should be interpreted with caution, given the small number of studies for many of the moderator subgroups as well as considerable evidence of publication bias. These limitations of the available research base highlight the need for additional empirical work to inform individual assessment practices. Overall, the results indicate that individual assessments are useful predictors of job performance, especially for managerial positions. However, the wide range of validities suggests that not all assessment practices are equally valid. Additional research is needed to identify the features of individual assessments that are most effective. We hope that this work will prompt more researchers and practitioners to conduct validation studies on individual assessments and to publish their results.
Building the empirical literature on individual assessment validity will provide a foundation for improving the effectiveness of this important area of practice.

Windscheid et al 2016

To attract a gender-diverse workforce, many employers use diversity statements to publicly signal that they value gender diversity. However, this often represents a misalignment between words and actions (i.e., a diversity mixed message) because most organizations are male dominated, especially in board positions. We conducted 3 studies to investigate the potential indirect effect of such diversity mixed messages on employer attractiveness through perceived behavioral integrity. In Study 1, following a 2 × 2 design, participants (N = 225) were shown either a pro-gender-diversity statement or a neutral statement, in combination with either a gender-diverse board (4 men and 4 women) or a uniform all-male board (8 men). Participants' perceptions of the organization's behavioral integrity were assessed. In Study 2, participants (N = 251) read either positive or negative reviews of the organization's behavioral integrity. Employer attractiveness was then assessed. Study 3 (N = 427) investigated the impact of board gender composition on perceived behavioral integrity and employer attractiveness using a bootstrapping procedure. Both the causal-chain design of Studies 1 and 2 and the significance test of the proposed indirect relationship in Study 3 revealed that a diversity mixed message negatively affected an organization's perceived behavioral integrity, and low behavioral integrity in turn negatively affected employer attractiveness. In Study 3, there was also evidence for a tipping point (more than 1 woman on the board was needed) with regard to participants' perceptions of the organization's behavioral integrity. Organizations generally tend to portray themselves as being concerned about or committed to gender diversity, ideally to reflect their commitment to a positive ethical organization (Verbos, Gerard, Forshey, Harding, & Miller, 2007), and pragmatically to try to attract a more gender-diverse workforce (Schaubroeck, Ganster, & Jones, 1998).
However, as noted by Avery and Johnson (2008), merely stating that a company is committed to diversity is not enough, because stakeholders look for signals and evidence refuting or supporting such claims. In this study, we outlined the central role of women on boards for organizations that seek to attract a gender-diverse workforce by publicly espousing their commitment to gender diversity. Our findings suggest that firms "walking the talk" in gender diversity management might be rewarded, while those sending diversity mixed messages might alienate applicants. Without reaching a critical mass of women (at least two) on the executive board, corporate gender diversity statements can be perceived as mere lip service. The negative effects of appointing no or few women to the board while claiming to value gender diversity support the conclusion that actions speak louder than words.

Vial, Brescoll & Napier 2018

Two studies evaluated the lay belief that women feel particularly negatively about other women in the workplace, especially in supervisory roles. The authors tested the general proposition, derived from social identity theory (Tajfel & Turner, 1979, 2004), that women, compared to men, may be more supportive of other women in positions of authority, whereas men would respond more favorably to other men than to women in positions of authority. Consistent with predictions, data from an online experiment (n = 259), in which the authors randomly assigned men and women to evaluate identical female (vs. male) supervisors in a masculine industry, and from a correlational workplace study using a Knowledge Networks sample (n = 198) converged to demonstrate a pattern of gender in-group favoritism. Specifically, in Study 1, female participants (vs. male participants) rated the female supervisor as higher status, were more likely to believe that she had attained her supervisory position because of high competence, and viewed her as warmer. Study 2 results replicated this pattern. Female employees (vs. male employees) rated their female supervisors as higher status and engaged in both in-role and extra-role behaviors more often when their supervisor was female. In both studies, male respondents tended to rate male supervisors more favorably than female supervisors, whereas female respondents tended to rate female supervisors more favorably than male supervisors. Thus, across both studies, the authors found a pattern consistent with gender in-group favoritism and inconsistent with lay beliefs that women respond negatively to women in authority positions. As expected, experimental (Study 1) and correlational (Study 2) data converged to show that women (vs. men) viewed female supervisors as higher status. In Study 1 we also found, as predicted, that women (vs. men) rated a hypothetical female supervisor more positively on two fundamental dimensions of social cognition: competence and warmth (Fiske et al., 2007). Study 2 complemented and extended these findings by revealing that female employees, compared with male employees, engaged in more positive behaviors when working for a woman, meeting basic job requirements more often and going above and beyond those requirements more frequently. Further, across measures, men had a stronger preference for gender in-group (vs. out-group) supervisors relative to women, in line with past work showing that in-group bias tends to be stronger in high-status (vs. low-status) groups (Bettencourt et al., 2001). As predicted, women's slight preference for female supervisors coupled with men's stronger preference for male supervisors led to a respondent gender gap in ratings of female supervisors, and no gender difference for male supervisors. These findings are consistent with social identity theory and dovetail with past research on in-group favoritism (Dasgupta, 2004; Greenwald & Pettigrew, 2014; Tajfel & Turner, 1979, 2004). They also challenge the popular (but problematic) belief that women are unsupportive of female supervisors (Bryan, 2008; Schupak, 2012). Instead, the studies reported here suggest that women are generally supportive of female leaders. However, our findings do not preclude the possibility that, although discrimination in the United States is illegal, individuals making hiring decisions could feel justified in not placing a woman in a supervisory role based on the belief that male employees would resist a female supervisor, a belief that is not inconsistent with our findings. Compared to women, men showed consistently lower acceptance of female supervisors, and this gap was often driven by male respondents' preference for a male (vs. female) supervisor. In Study 1, men randomly assigned to evaluate a female (vs. male) supervisor viewed the female supervisor as marginally less competent and as deserving significantly lower status. Thus, a female leader in charge of an all-male work unit may be at a significant disadvantage. Further, although female respondents were consistently more supportive of female supervisors than male respondents were, in-group favoritism was weaker among women (vs. men), and our findings cannot rule out the possibility that, under certain circumstances, women may withdraw their support for female supervisors and exhibit attitudes and behaviors indistinguishable from men's. For example, given that people often derogate in-group members who deviate from prescriptive group norms and behaviors (Pinto, Marques, Levine, & Abrams, 2010; Rullo, Presaghi, & Livi, 2015), women might resist female supervisors who adopt masculine management styles or who behave in explicitly dominant ways (Williams & Tiedens, 2016), as these behaviors are proscribed for women (Prentice & Carranza, 2002). Future research should examine whether perceptions of supervisors' masculinity affect women's support for female supervisors more strongly than men's (e.g., Kark, Waismel-Manor, & Shamir, 2012). Our findings are in line with evidence that women (vs. men) tend to harbor more positive implicit and explicit attitudes toward female authority (Richeson & Ambady, 2001; Rudman & Kilianski, 2000; see also Koch, D'Mello, & Sackett, 2015). However, past experiments have often failed to find participant gender differences in attitudes toward female leaders (e.g., Brescoll, 2011; Heilman et al., 2004; Rudman et al., 2012). It is possible that the nature of the work context or the specific responses tested (e.g., compensation and promotion vs. behaviors representing the employee-supervisor relationship) plays a critical moderating role.
Further, whereas many past investigations have focused exclusively on hypothetical targets, raising generalizability questions, we contribute evidence that respondent gender can moderate attitudes toward female supervisors both in minimal experimental paradigms (Study 1) as well as in ongoing supervisory relationships (Study 2). Similarly, this is the first investigation that we know of to examine whether male and female employees behave differently depending on the gender of their supervisor, making an important contribution to various literatures and supporting the view that gender bias can have far-reaching consequences in organizational contexts, as we recently proposed (Vial, Napier, & Brescoll, 2016). However, one limitation is that we only examined employee behavior based on correlational data. Additional experimental work is necessary to establish whether the behavioral differences that we observed in Study 2 were due specifically to employee/supervisor gender. Further, although the patterns of employee behavior were consistent with predictions, employee self-reports and supervisor reports of employee behavior may be biased in different ways. For example, employees might overreport positive work behaviors, and both employee and supervisor recall may be influenced by gender stereotypes (e.g., expectations that women help more than men; Farrell & Finkelstein, 2007; Heilman & Chen, 2005). One important next step is to examine more objective measures of employee behavior (e.g., experience-based sampling). Study 2 may also be limited by the lack of industry information, as men and women are distributed differently across industries, and it is thus possible that female-female (vs. male-male) pairs were more likely to come from industries where cooperation and caring behaviors might be common (e.g., health care, education; Eagly & Steffen, 1984; Koenig & Eagly, 2014). 
Although Study 1 focused on a masculine industry and found converging results (i.e., suggesting that our findings would generalize across more feminine and more masculine industries), future investigations need to examine whether the behavioral effects we uncovered would hold regardless of industry or field. It could be argued that Study 2 findings may be due to other factors that tend to co-occur with gender. For instance, female managers appear to be more transformational and democratic than male managers (Eagly & Johnson, 1990; Van Engen & Willemsen, 2004), and there is some evidence that female (vs. male) employees respond more positively to this style (Ayman et al., 2009). Study 2 was also potentially limited by participant self-selection, as employees who overall feel closer to their supervisors may be more comfortable asking them to join the study. However, the convergence of correlational and experimental data, which consistently revealed the same interaction patterns, suggests that these issues do not necessarily compromise our conclusions. Nevertheless, future investigations should examine whether our findings would generalize to non-White or mixed-race employee-supervisor pairs, as the majority of employees and supervisors in Study 2 were White (see the online supplementary materials). Although the sample was drawn from a nationally representative panel, it is possible that non-White employees may feel uncomfortable approaching White supervisors about joining the study, resulting in a disproportionately White sample. In line with social identity theory, we would expect that greater demographic similarity between supervisor and employee would lead to more positive employee behaviors. These limitations notwithstanding, across two studies, women compared to men were more supportive of female supervisors, and this difference appeared to be driven by an overall preference for same-gender (vs. other-gender) supervisors, which was more robust among male respondents than among female respondents. These findings suggest that gender intergroup dynamics play a more complex role in attitudes toward female supervisors than is often portrayed in the literature.

Cottrell et al 2015

We examined the extent to which specific developmental conditions could account for Black-White gaps in cognitive test scores, to theoretically explain the origins of adverse impact. Results showed that Black-White gaps were large at every time point from 54 months to 15 years of age, but that the gap did not grow (or shrink) over time. Finally, we accounted for the association between race and cognitive test scores using our covariates, and provided a parsimonious theoretical model that explicates relations among the covariates, offering a 3-Step Model of the relations among the racially disparate conditions that appear to give rise to the race gap in test scores. This study therefore attempts to fill a hole in adverse impact theory by pinpointing how cognitive test score gaps might arise.

Sackett & Schmitt 2012 (Meta)

We react to the Van Iddekinge, Roth, Raymark, and Odle-Dusseau (2012a) meta-analysis of the relationship between integrity test scores and work-related criteria, the earlier Ones, Viswesvaran, and Schmidt (1993) meta-analysis of those relationships, the Harris et al. (2012) and Ones, Viswesvaran, and Schmidt (2012) responses, and the Van Iddekinge, Roth, Raymark, and Odle-Dusseau (2012b) rebuttal. We highlight differences between the findings of the 2 meta-analyses by focusing on studies that used predictive designs, applicant samples, and non-self-report criteria. We conclude that study exclusion criteria, correction for artifacts, and second-order sampling error are not likely explanations for the differences in findings. The lack of detailed documentation of all effect size estimates used in either meta-analysis makes it impossible to ascertain the bases for the differences in findings. We call for increased detail in meta-analytic reporting and for better information sharing among the parties producing and meta-analytically integrating validity evidence. Both meta-analyses (Ones et al., 1993; Van Iddekinge et al., 2012a) concluded that integrity tests display meaningful relationships with job performance and counterproductive work behavior, though Ones et al. reported substantially larger effect sizes, particularly for predictions of job performance. Given the data available, it is impossible to ascertain why these estimates differ. There are perhaps two plausible explanations. One is the inability of Van Iddekinge et al. to obtain a large portion of the available database on the validity of integrity tests. A second possible reason is that effect sizes were calculated differently by the two groups of authors. The latter possibility underscores the necessity of reporting the nature of such calculations when reporting meta-analyses.
We believe this interchange highlights several broader issues both for the integrity literature per se and for the progress of our discipline in general. We summarized several broader issues raised by this interchange, including the need for developing better modes (and perhaps climate) for data sharing across members of the academic community as well as between academics and members of the practice community who have multiple proprietary concerns. We also think that more attention should be directed to the constructs addressed by the various measures of integrity and how they might affect the integrity test-outcome relationships. Questions about the quality of the primary database that is the subject of meta-analyses and the nature and appropriateness of corrections to observed relationships were also mentioned as areas that should be addressed by scholars interested in this area of research.

Krumm et al 2014

Whereas situational judgment tests (SJTs) have traditionally been conceptualized as low-fidelity simulations with an emphasis on contextualized situation descriptions and context-dependent knowledge, a recent perspective views SJTs as measures of more general domain (context-independent) knowledge. In the current research, we contrasted these 2 perspectives in 3 studies by removing the situation descriptions (i.e., item stems) from SJTs. Across studies, the traditional contextualized SJT perspective was not supported for between 43% and 71% of the items because it did not make a significant difference whether the situation description was included or not for these items. These results were replicated across construct domains, samples, and response instructions. However, there was initial evidence that judgment in SJTs was more situational when (a) items measured job knowledge and skills and (b) response options denoted context-specific rules of action. Verbal protocol analyses confirmed that high scorers on SJTs without situation descriptions relied upon general rules about the effectiveness of the responses. Implications for SJT theory, research, and design are discussed. As "situational" is an explicit part of the term SJT, it has been traditionally assumed that test takers require a contextual description for solving SJT items. This study did not take this assumption for granted and sought to examine how "situational" judgment on SJT items actually is. We found that even without a contextual description test takers could solve on average more than half of the items of various existing SJTs. This result does not support the traditional contextualized perspective underlying SJTs for a large set of SJT items. There was initial evidence that judgment in SJTs became more situational when (a) items measured job knowledge and skills and (b) response options denoted context-specific rules of action. 
Conceptually, these results bring up questions regarding the interactionist assumptions underlying SJTs. At a practical level, they suggest that generic SJT items that approximate the work context might suffice for SJTs developed for entry-level, admissions, and recruitment purposes. Future SJT research should examine when and why contextualization matters for SJT performance, applicant perceptions, and validity.

