Chapter 4. Performance Appraisal. Industrial and Organizational Psychology


Four Steps in Developing a Behavior-Focused Rating Form to Assess Job Performance

- Step 1: Perform job analysis to define job dimensions
- Step 2: Develop descriptions of effective and ineffective job performance from critical incidents
- Step 3: Have knowledgeable judges place descriptions into job dimensions
- Step 4: Have knowledgeable judges rate the effectiveness of the descriptions
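Steps 3 and 4 come down to simple tallying of the judges' responses. The sketch below is a minimal illustration under stated assumptions, not a prescribed procedure. The behavior descriptions, the judges' sortings, the 1-to-7 effectiveness scale, and the 60% agreement cutoff (`AGREEMENT_THRESHOLD`) are all hypothetical. A description is kept only if enough judges place it in its intended dimension (Step 3). Each kept description is then located on the effectiveness continuum by its mean rating (Step 4).

```python
from statistics import mean

# Hypothetical data: the dimension each description was written for (Step 2),
# the dimensions five judges sorted it into (Step 3), and their effectiveness
# ratings on an assumed 1-7 scale (Step 4).
descriptions = [
    {"text": "Can be counted on to be at work every day on time",
     "intended": "Attendance",
     "sorted_into": ["Attendance", "Attendance", "Attendance", "Attendance", "Work quality"],
     "effectiveness": [7, 6, 7, 7, 6]},
    {"text": "Comes to work late several times per week",
     "intended": "Attendance",
     "sorted_into": ["Attendance", "Attendance", "Attendance", "Work quantity", "Attendance"],
     "effectiveness": [2, 1, 2, 2, 1]},
    {"text": "Sometimes double-checks figures",  # hypothetical, ambiguous item
     "intended": "Work quality",
     "sorted_into": ["Work quality", "Attendance", "Work quantity", "Work quality", "Attendance"],
     "effectiveness": [4, 5, 3, 4, 4]},
]

AGREEMENT_THRESHOLD = 0.6  # assumed cutoff; the text does not specify one


def retained(desc, threshold=AGREEMENT_THRESHOLD):
    """Step 3: keep a description only if enough judges placed it in its intended dimension."""
    agreement = desc["sorted_into"].count(desc["intended"]) / len(desc["sorted_into"])
    return agreement >= threshold


def scale_value(desc):
    """Step 4: locate the description on the effectiveness continuum by its mean rating."""
    return mean(desc["effectiveness"])


kept = [(d["intended"], d["text"], scale_value(d)) for d in descriptions if retained(d)]
for dimension, text, value in sorted(kept, key=lambda k: (k[0], -k[2])):
    print(f"{dimension}: {value:.1f}  {text}")
```

Here the third description is dropped because only two of five judges sorted it into Work quality. The two attendance descriptions land near the top (6.6) and bottom (1.6) of the scale.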

Three concepts help explain why actual criteria are imperfect indicators of the theoretical criteria they are meant to assess:

- criterion contamination,
- criterion deficiency, and
- criterion relevance.

Managers' views of subordinate motivation can be a factor in their ratings of job performance, but interestingly such views can be subject to cultural factors. DeVoe and Iyengar (2004) assessed manager perceptions of their employees as being ...

- intrinsically motivated (wanting to do a good job for its own sake) and
- extrinsically motivated (working hard for rewards),

and then linked those perceptions to ratings of job performance. American and Latin American managers considered intrinsic motivation to be more important for performance than extrinsic motivation, whereas Asian managers considered both types of motivation to be equally important.

WHY DO WE APPRAISE EMPLOYEES?

- Administrative Decisions
- Employee Development and Feedback
- Research

Werner (1994) conducted a study in which he asked experienced supervisors to rate the performance of secretaries as described in a series of incidents. One of the variables of interest in this study was the sort of information that the supervisors used in making their ratings. Werner found that the following dimensions were seen as most important:

- Attendance
- Job knowledge
- Work accuracy
- Work quantity

There are several different types of behavior-focused rating forms. We will discuss three of them:

- Behaviorally Anchored Rating Scale (BARS; Smith & Kendall, 1963)
- Mixed Standard Scale (MSS; Blanz & Ghiselli, 1972)
- Behavior Observation Scale (BOS; Latham & Wexley, 1977)

All three of these scales provide descriptions of behavior or performance rather than traits, but they differ in the way they present the descriptions and/or the responses.

METHODS FOR ASSESSING JOB PERFORMANCE
The job performance of individuals can be assessed in many ways. The most common procedures can be divided into two categories:

- objective performance measures and
- subjective judgments.

Objective measures are counts of various behaviors (e.g., number of days absent from work) or of the results of job behaviors (e.g., total monthly sales). Subjective measures are ratings by people who should be knowledgeable about the person's job performance. Usually supervisors provide job performance ratings of their subordinates.

Using objective measures to assess job performance has several advantages.
- First, it can be easy to i......
- Second, the quantitative nature of objective measures makes it easy to .......
- Third, objective measures can be tied directly to .....
- Finally, objective measures can often be found in .......

- First, it can be easy to interpret the meaning of objective measures in relation to job performance criteria. For example, it is obvious that no absences in the past year is a good indicator of satisfactory attendance and that four work-related traffic accidents in the prior 6 months are an indicator of unsatisfactory driving performance.
- Second, the quantitative nature of objective measures makes it easy to compare the job performance of different individuals in the same job. For attendance measures, comparisons can be made of individuals across different jobs as long as they all require that the person work on a particular schedule.
- Third, objective measures can be tied directly to organizational objectives, such as making a product or providing a service.
- Finally, objective measures can often be found in organizational records, so that special performance appraisal systems do not have to be initiated. These data often are collected and stored, frequently in computers, for reasons other than employee performance appraisal, making performance appraisal a relatively easy task to accomplish.

In this chapter, we are concerned with issues involved in the appraisal of employee job performance. Issues:

- First, there is the issue of the criterion or standard of comparison by which performance is judged and measured. Before we can appraise performance, we must have a clear idea of what good performance is.
- Once we know that, we can address the second issue of developing a procedure to assess it. Performance appraisal is a two-step process of first defining what is meant by good performance (criterion development) followed by the implementation of a procedure to appraise employees by determining how well they meet the criteria. Before we discuss criteria and procedures for appraising performance, we look at the major reasons for engaging in this potentially time-consuming activity.

TABLE 4.5 Three Items for an MSS to Assess the Dimension of Relations With Other People
- Good Performance....
- Satisfactory Performance...
- Poor Performance....

- Good Performance: Is on good terms with everyone. Can get along with people even when he or she doesn't agree with them.
- Satisfactory Performance: Gets along with most people. Only very occasionally does he or she have conflicts with others on the job, and these are likely to be minor.
- Poor Performance: Has the tendency to get into unnecessary conflicts with other people.

Note: Each item is rated on the following scale. For each item, indicate if the employee is:
- Better than the item
- As good as the item
- Worse than the item

Subjective Measures of Job Performance
Subjective measures are the most frequently used means of assessing the job performance of employees. Most organizations require that supervisors complete performance appraisal rating forms on each of their subordinates annually. There are many types of rating forms that different organizations use to assess the performance of their employees. In this section, we discuss several different types.

- Graphic Rating Forms
- Behavior-Focused Rating Forms
  - Behaviorally Anchored Rating Scale (Smith & Kendall, 1963)
  - Mixed Standard Scale (Blanz & Ghiselli, 1972)
  - Behavior Observation Scale (Latham & Wexley, 1977)

Job performance measures can be classified as either objective or subjective. Objective measures are ..... Subjective measures are .....

- Objective measures are counts of the output of a job, such as the number of sales for a salesperson or the number of units produced for a factory worker. Subjective measures are ratings by supervisors (or other individuals who are familiar with the person's job performance).
- Subjective measures are the more commonly used of the two methods, but they suffer from biases and errors attributable to human judgment. Two different approaches have been taken to reduce rating errors in subjective measures: design of the rating form and rater training.

Models of the Rating Process
There have been several competing models of the cognitive processes that influence ratings of performance (e.g., DeNisi, Cafferty, & Meglino, 1984; Feldman, 1981). These models suggest that the rating process involves several steps (see Ilgen, Barnes-Farrell, & McKellin, 1993), including:

- Observing performance
- Storing information about performance
- Retrieving information about performance from memory
- Translating retrieved information into ratings

Control of Rater Bias and Error
Two approaches have been developed to control and eliminate rater bias and error. One approach is ...... The other is to ........

- One approach is to design better performance appraisal forms that will be resistant to these problems.
- The other is to train raters to avoid rating errors.

Although both approaches have shown promise, research studies have yielded conflicting results about their ability to reduce errors (Bernardin & Beatty, 1984).

Development of Behavior-Focused Forms
Development of behavior-focused forms takes considerable effort from several people in an organization. Because such a form focuses on specific behaviors, it must be developed for a specific job or family of jobs. The process involves four steps and can take a long time to complete. Step 1 is a ..... Step 2 involves ..... Step 3 involves having ...... The final step is to ......

- Step 1 is a job analysis that identifies the specific dimensions of performance, such as making arrests and writing reports for a police officer.
- Step 2 involves writing the descriptions of behaviors that vary in their effectiveness or ineffectiveness on the job. This can be done by collecting critical incidents from people who are knowledgeable about the job in question, such as employees who do the job or their supervisors. Critical incidents can provide examples that vary from extremely effective to extremely ineffective performance.
- Step 3 involves having judges (knowledgeable people) sort the descriptions of behavior into dimensions to verify that the descriptions reflect the intended dimensions.
- The final step is to have judges rate the descriptions of behavior along a continuum of effectiveness. With a BARS, these ratings allow for the placement of the descriptions along

TABLE 4.9 Six Points of a Legally Defensible Performance Appraisal System

1. Perform job analysis to define dimensions of performance
2. Develop rating form to assess dimensions from step 1
3. Train raters in how to assess performance
4. Have higher management review ratings, and allow employees to appeal their evaluations
5. Document performance, and maintain detailed records
6. Provide assistance and counseling to poor-performing employees prior to actions against them

The Mixed Standard Scale (MSS) provides the rater with a list of behaviors that vary in their effectiveness. For each statement, the rater is asked to indicate if:

1. The ratee is better than the statement
2. The statement fits the ratee
3. The ratee is worse than the statement
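The text does not say how the three responses for a dimension are combined into a score. One way to do it, assumed here purely for illustration, starts from the ordering of the items (good above satisfactory above poor). Only seven response patterns are logically consistent with that ordering, and they can be numbered from 7 (better than even the good item) down to 1 (worse than even the poor item). Any other pattern is treated as inconsistent, such as a ratee marked better than the good item but worse than the poor one. The `+`/`0`/`-` codes, the `SCORING` table, and `score_dimension` are hypothetical names for this sketch.

```python
# Hypothetical response codes for one MSS item:
#   "+" = ratee is better than the statement
#   "0" = the statement fits the ratee
#   "-" = ratee is worse than the statement
# Assumed scoring: the seven patterns consistent with the ordering
# good > satisfactory > poor, numbered from best (7) to worst (1).
SCORING = {
    ("+", "+", "+"): 7,  # better than even the good item
    ("0", "+", "+"): 6,  # fits the good item
    ("-", "+", "+"): 5,  # between good and satisfactory
    ("-", "0", "+"): 4,  # fits the satisfactory item
    ("-", "-", "+"): 3,  # between satisfactory and poor
    ("-", "-", "0"): 2,  # fits the poor item
    ("-", "-", "-"): 1,  # worse than even the poor item
}


def score_dimension(good, satisfactory, poor):
    """Return the 1-7 score for one dimension, or None if the pattern is inconsistent."""
    return SCORING.get((good, satisfactory, poor))


print(score_dimension("-", "0", "+"))  # 4
print(score_dimension("+", "-", "-"))  # None
```

Returning `None` rather than forcing a number keeps illogical response patterns visible, so they can be reviewed instead of averaged away.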

Werner and Bolino (1997) analyzed the outcomes of 295 U.S. court cases in which performance appraisals were challenged as being discriminatory. Performance appraisal systems that were based on a job analysis, gave written instructions to raters, offered employees the opportunity to have input, and used multiple raters were far less likely to result in an organization losing the case. For example, while organizations overall lost 41% of cases, those that used multiple raters lost only

11% of cases. The use of these four practices combined should result in a relatively safe performance appraisal system from a legal perspective.

There are two ways to deal with the complex nature of criteria. The composite criterion approach involves combining individual criteria into a single score. If employees receive a number to represent performance on each of four dimensions, a composite would be the average of the four dimension scores for each employee. If a person received the following performance scores on a 1-to-5 scale:

- Attendance = 5
- Professional appearance = 4
- Work quality = 4
- Work quantity = 5

his or her composite performance score would be the average of the dimension scores, or 4.5, computed as (5 + 4 + 4 + 5)/4. A grade point average would be a composite score for school performance.
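The composite calculation can be written out as a short sketch. The scores are the ones from the example above. The function name `composite` is hypothetical, and the sketch assumes an unweighted average, as the text describes.

```python
from statistics import mean

# Dimension scores on a 1-to-5 scale, taken from the example above.
scores = {
    "Attendance": 5,
    "Professional appearance": 4,
    "Work quality": 4,
    "Work quantity": 5,
}


def composite(dimension_scores):
    """Composite criterion: the unweighted average of the dimension scores."""
    return mean(list(dimension_scores.values()))


print(composite(scores))  # 4.5
```

Keeping the `scores` dictionary as it is corresponds to the multidimensional approach, which preserves dimension-level detail for feedback. Collapsing it with `composite` gives the single number that makes employees easy to compare.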

Example of a Performance Appraisal Form With Eight Criterion Dimensions

Rating categories: Poor, Fair, Adequate, Good, Outstanding
Dimensions:
- Attendance
- Communicating with others
- Following directions
- Instructing others
- Motivating others
- Professional appearance
- Work quality
- Work quantity

TABLE 4.3 Examples of Objective Measures of Job Performance

Performance: Measure
- Absences: Days absent per year
- Accidents: Number of accidents per year
- Incidents at work (e.g., assaults): Number of incidents per year
- Latenesses: Days late per year
- Productivity (e.g., sales): Dollar amount of sales

Cognitive Processes Underlying Ratings
The development of sound performance appraisal methods requires that we understand the cognitive processes that affect rating behavior. I/O psychologists have studied these processes and have devised several models to explain ratings. Some of these models focus on ..... Others are concerned with .....

Some of these models focus on how people utilize information to make judgments. Others are concerned with how people's views of job performance influence their evaluation of an employee.

Five common objective measures of job performance are listed in Table 4.3. Each is an objective count of the number of behaviors or amount of work produced. Such data are usually found in organization records, but they can be collected specifically to assess performance.

Two of the measures are concerned with attendance—number of times absent and number of times late for work. Accidents include both automotive and nonautomotive, such as being injured by a machine in a factory. Incidents are the number of times the individual is involved in a work incident that is considered important for the particular job. For example, in a psychiatric inpatient facility incident reports record the number of times a staff person is assaulted by a patient. For a police officer, shooting incident reports become part of the employee's record. Productivity is the amount of work produced by an individual.

Several factors have been shown to relate to job performance ratings, although it is not entirely clear whether they result in rater bias or not.

Whether the rater likes the subordinate, rater mood, perceived motives of the employee for performance, cultural factors, and both rater and ratee race all affect ratings.

Perhaps the most promising rater training approach is frame-of-reference training (Day & Sulsky, 1995), which attempts to provide

a common understanding of the rating task. Raters are given specific examples of behavior that would represent various levels of performance for each dimension to be rated. Results with this kind of training have thus far proven to be promising in increasing rating accuracy and providing the rater with a more accurate understanding of the criteria for good performance (Gorman & Rentsch, 2009). One limitation to this line of research is that it has mostly been conducted in laboratory settings with college students, so it is not clear how well the results would generalize to managers rating their employees in the field.

Feedback from multiple sources can be helpful for employees wishing to improve their performance. Managers receive 360-degree feedback from

a comparison of their self-ratings with those of peers, subordinates, and supervisors.

Another concern with halo has been explaining cognitive processes that would lead a rater to exhibit halo error. Several researchers have theorized that raters rely on

a general impression of the employee when making dimension ratings (Lance, LaPointe, & Fisicaro, 1994; Nathan & Lord, 1983). According to this view, salient pieces of information are used to form a general impression of an employee. The impression forms the basis of performance ratings. This suggests that raters may be better able to provide information about global performance than dimensions of performance.

A BARS performance evaluation form contains several individual scales, each designed to assess an important dimension of job performance. A BARS can be used to assess the same dimensions as

a graphic rating form. The major difference is that the BARS uses response choices that represent behaviors, while the graphic rating form asks for a rating of how well the person performs along the dimension in question. Thus both types of rating forms can be used to assess the same dimensions of performance for the same jobs.

The attendance measures are applicable to the majority of jobs because most have scheduled work hours. For jobs that are unstructured in terms of work schedule (e.g., college professor), attendance is not a criterion for job performance. The other three objective measures are specific to

a particular job. For example, the type of incidents recorded is a function of the nature of the job and job environment. Records of incidents of assaults by students might be kept for urban public school teachers, but they are not likely to be kept for college professors. Teachers are assaulted relatively frequently in large American cities, but college professors are rarely the target for violence.

Error-Resistant Forms to Assess Performance
The behavior-focused rating scales, such as the BARS and the MSS, were originally developed in part to eliminate rating errors. The idea is that raters will be

able to make more accurate ratings if they focus on specific behaviors rather than traits. These behaviors are more concrete and require less idiosyncratic judgment about what they represent. For example, it should be easier to rate accurately how often a person is absent from work than the somewhat abstract trait of dependability.

Imagine that you are a manager for a large organization and you are given the task of determining how well your subordinates are doing their jobs. How would you go about appraising their job performance to see who is and who is not doing a good job? Would you watch each person perform his or her job? If you did watch people, how would you know what to look for? Some people might appear to work very hard but in fact

accomplish little that contributes to the organization's objectives. For many jobs, it might not be readily apparent how well a person is doing just by observing him or her unless you have a good idea of what constitutes good job performance. Performance is best appraised by measuring a person's work against a criterion or standard of comparison.

Although a pattern of similar ratings might indicate a rating error, it is possible that employee performance is consistent across dimensions. This means that halo patterns might

accurately indicate that dimensions of actual performance are related. This possibility has led to considerable discussion in the I/O literature about the meaning of halo (e.g., Balzer & Sulsky, 1992; Murphy, Jako, & Anhalt, 1993; Solomonson & Lance, 1997; Viswesvaran, Schmidt, & Ones, 2005). Part of this discussion concerns how to separate the error from "true" halo. True halo means that an employee performs at the same level on all dimensions.


Criterion deficiency means that the actual criterion does not adequately cover the entire theoretical criterion. In other words, the actual criterion is

an incomplete representation of what we are trying to assess. This concept was referred to in Chapter 2 as content validity. For example, student achievement test scores in mathematics could be used as an actual performance criterion for elementary school teachers. It would be a deficient criterion, however, because elementary school teachers teach more than just mathematics. A less deficient criterion would be student scores on a comprehensive achievement test battery, including mathematics, reading, science, and writing.

In the United States and many other countries, performance appraisal is a legal process as well as a technical one. Organizations are required by U.S. law to avoid discrimination in their performance appraisal procedures. Failure to comply with such legal requirements can make

an organization subject to lawsuits. A number of specific practices, such as basing the system on a job analysis and providing rater training, reduce the chances that an organization will lose in court if challenged.

Graphic Rating Forms
The most popular type of subjective measure is the graphic rating form, which is used to

assess individuals on several dimensions of performance. The graphic rating form focuses on characteristics or traits of the person or the person's performance. For example, most forms ask for ratings of work quality and quantity. Many include personal traits such as appearance, attitude, dependability, and motivation.

Level of Specificity
Most jobs are complex and involve many different functions and tasks. Job performance criteria can be developed for individual tasks or for entire jobs. For some purposes, it may be better to assess performance on an individual task, such as making arrests for a police officer or selling products for a salesperson, whereas for other purposes the person's entire job performance is of interest. For developing an employee's skills, it is better to focus

at the individual task level so that feedback can be specific. The person might be told that he or she types too slowly or makes too many errors. This sort of specific feedback can be helpful for an employee who wishes to improve performance. For administrative purposes, overall job performance might be of more concern. The person who gets promoted might be the one whose overall performance has been the best.

Content of Subordinate Effectiveness
If schemata affect job performance ratings, it is important that we understand the schemata of people who appraise performance. In other words, appraisal techniques might be improved if they were designed to effectively utilize the schemata of supervisors. If the dimensions on an appraisal form match the dimensions in supervisors' schemata about performance, it will

be easier for supervisors to do their ratings. There has been some research that is relevant to this issue.

Behavior-Focused Rating Forms
The graphic rating forms just discussed focus on dimensions that are trait oriented, such as dependability, or on general aspects of performance, such as attendance. The behavior-focused forms concentrate on

behaviors that the person has done or could be expected to do. Behaviors are chosen to represent different levels of performance. For attendance, an example of a good behavior would be "can be counted on to be at work every day on time," whereas a poor behavior would be "comes to work late several times per week." The rater's job is to indicate which behaviors are characteristic of the person being rated. The way in which the form is scored is dependent on the particular type of form.

Allowing employees to have input into performance appraisals also has benefits beyond legal issues. Research has shown that giving employees the opportunity to sit down with supervisors and discuss appraisals openly can lead to

better attitudes (Korsgaard & Roberson, 1995). In one study, this occurred even though employees who were allowed input actually had lower ratings than those who were not (Taylor et al., 1995). Perceptions of fairness in this study even reduced employee intentions to quit the job. To be effective and perceived as fair, performance appraisal systems should include Barrett and Kernan's (1987) six components, as well as input by employees.

Contamination, Deficiency, and Relevance
Our actual criteria are intended to assess the underlying theoretical criteria of interest. In practice, however, our actual criteria are imperfect indicators of their intended theoretical performance criteria. Even though an actual criterion might assess a piece of the intended theoretical criterion, there is likely some part of the theoretical criterion that is left out. On the other hand, the actual criterion can be

biased and can assess something other than the theoretical criterion. Thus the actual criterion often provides only a rough estimate of the theoretical criterion it is supposed to assess.

Criterion contamination refers to that part of the actual criterion that reflects something other than what it was designed to measure. Contamination can arise from

biases in the criterion and from unreliability. Biases are common when people's judgments and opinions are used as the actual criterion. For example, using the judgments of art experts as the actual criterion for the quality of someone's artwork can reveal as much about the biases of the judges as it does about the work itself. Because there are no objective standards for the quality of art, experts will likely disagree with one another when their judgments are the actual criterion for performance.

The composite approach is preferred for ... The multidimensional approach is preferred when....

comparing the performance of individual employees. It is easier to compare employees when each has a single performance score. The multidimensional approach is preferred when feedback is given to employees. It gives specific information about the various dimensions of performance rather than general feedback about overall performance.

There are two places in which we see technology having an impact on performance appraisal: monitoring of objective productivity and implementation of performance management systems. Many employees today, such as airline reservation specialists and telephone operators, work on computer systems. The systems that allow them to

complete their job tasks are also capable of tracking productivity, and such data are routinely collected in many organizations. The use of computers allows for the easy analysis of performance across millions of employee-customer transactions, and it can be a built-in feature of the task software employees use to do their jobs each day.

A graphic rating form consists of a multipoint scale and several dimensions. The scale represents a

continuum of performance from low to high and usually contains from four to seven values. The scale in the table contains five scale points, ranging from "poor" to "outstanding," with "adequate" in the middle. The form also contains several dimensions of job performance along which the employee is to be rated. This form includes attendance and work quality. To use the form, a supervisor checks off his or her rating for each of the dimensions.

Objective measures are often deficient as indicators of job performance criteria. They tend to focus on specific behaviors, which may be only part of the criterion, and they may ignore equally important parts (Borman, Bryant, & Dorio, 2010). Measures of productivity focus on work quantity rather than quality. Although quantity might be more important in some jobs, it is difficult to imagine a job in which quality is not also somewhat important. Finally, what is reflected in an objective measure is not necessarily under the control of the individual being assessed (Borman et al., 2010). Differences in the productivity of factory workers can be caused by

differences in the machinery they use, and differences in the sales performance of salespeople can be caused by differences in sales territories. A person who is assaulted at work may have done nothing wrong and may have been unable to avoid the incident. A police officer who uses his or her weapon might have been forced into it by circumstances rather than poor job performance. In using objective measures to assess individuals, these other factors should be taken into account.

Barrett and Kernan (1987) suggested six components that should be part of a legally defensible performance appraisal system. As shown in Table 4.9, the system should begin with a job analysis to derive the dimensions of performance for the particular job. The job analysis will ensure that the dimensions chosen are job relevant. Raters should receive training in how the rating form is to be used to assess performance. To help minimize personal bias, upper management should review performance appraisals. Performance and the reasons for the employee action should be

documented and recorded. It is easier to take action against an employee when the performance, good or poor, has been documented for a long period of time. This eliminates the appearance that the latest appraisal was given to justify a particular action affecting an employee. Finally, it is a good idea to provide assistance and counseling to employees whose performance is unsatisfactory. This shows that the organization has done everything possible for an unsatisfactory employee before taking action against him or her.

Contextual Performance
Criteria for most jobs concern tasks that are specifically required and that would be listed in a job analysis of that job. However, it has been recognized that employees do a great deal more for organizations than what is required, and these extra behaviors are essential for organizations to function smoothly. Contextual performance consists of

extra voluntary things employees do to benefit their coworkers and organizations, such as volunteering to carry out extra tasks or helping coworkers (Borman, Buck, Hanson, Motowidlo, Stark, & Drasgow, 2001). Although not specifically required, contextual performance is noticed and appreciated by managers, and their ratings of subordinate performance will be affected by it (Johnson, 2001). This all suggests that contextual performance should be considered in developing criteria for jobs.

Research comparing the behavior-focused rating forms with other types of measures has

failed to find consistent evidence for greater accuracy.


The idea that supervisors give better ratings to subordinates they like is supported by research (e.g., Ferris, Judge, Rowland, & Fitzgibbons, 1994). This has led some to worry that ratings might be biased and reflect favoritism. However, there is some evidence that liking can be the result of

good job performance, as supervisors like those who work well for them (Robbins & DeNisi, 1994). It is particularly important for a new employee to be seen as a good performer because that perception will likely lead to being liked by supervisors, which can result in receiving extra support that leads to even better performance in the future (Lefkowitz, 2000).

There are several dimensions of performance in an MSS, and each dimension has several behaviors associated with it. Table 4.5 shows an example of three statements that reflect performance for the dimension of Relations With Other People. The three statements represent

good, satisfactory, and poor job performance along the dimension.

These within-form and across-form patterns are called

halo and distributional errors, respectively.

PERFORMANCE CRITERIA
A criterion is a standard against which you can judge the performance of anything, including a person. It allows you to distinguish good from bad performance. Trying to assess performance without criteria is like

helping a friend find a lost object when the friend will not tell you what it is. You cannot be of much help until you know what it is you are looking for. In a similar way, you cannot adequately evaluate someone's job performance until you know what the performance should be.

Halo Errors
Halo error occurs when a rater gives an individual the same rating across all rating dimensions, despite differences in performance across dimensions. In other words, if the person is rated as being outstanding in one area, he or she is rated outstanding

in all areas, even though he or she may be only average or even poor in some areas. For example, a police officer might be outstanding in completing many arrests (high quantity) but might do a poor job in paperwork. A supervisor might rate this officer high on all dimensions, even though it is not uniformly deserved. Similarly, if a person is rated as poor in one area, the ratings are poor for all areas, even though he or she may be satisfactory on some performance dimensions. This rating error occurs within the rating forms of individuals as opposed to occurring across the forms of different individuals.
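This within-form pattern can be screened for mechanically. The sketch below is a hypothetical illustration, not an established halo index. It flags any ratee whose ratings show no spread across dimensions, using the population standard deviation and an assumed `max_spread` cutoff of 0. Uniform ratings can also reflect true halo, where the employee really performs at the same level on every dimension, so a flag only marks a form for a closer look. The ratee names, dimensions, and scores are made up; Officer B mirrors the text's example of an officer with many arrests but poor paperwork.

```python
from statistics import pstdev

# Hypothetical ratings on a 1-5 scale (1 = poor, 5 = outstanding).
ratings = {
    "Officer A": {"Arrests": 5, "Paperwork": 5, "Attendance": 5, "Work quality": 5},
    "Officer B": {"Arrests": 5, "Paperwork": 2, "Attendance": 4, "Work quality": 3},
    "Officer C": {"Arrests": 2, "Paperwork": 2, "Attendance": 2, "Work quality": 2},
}


def halo_pattern(dimension_ratings, max_spread=0.0):
    """Return True if ratings vary across dimensions by no more than max_spread.

    max_spread is an assumed cutoff. A True result cannot by itself separate
    halo error from true halo.
    """
    return pstdev(list(dimension_ratings.values())) <= max_spread


flagged = [name for name, r in ratings.items() if halo_pattern(r)]
print(flagged)  # ['Officer A', 'Officer C']
```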

Objective Measures of Job Performance
Organizations keep track of many employee behaviors and results of behaviors. Human resource departments record the number of absences, accidents, incidents, and latenesses for each employee. Some organizations keep track of the productivity of each employee, as well. Productivity data must be collected if an organization has an

incentive system that pays employees for what they produce, such as a commission or piece rate.

Deadrick, Bennett, and Russell (1997) pointed out that employee performance tends to improve over time, at least early in an employee's tenure, and that factors that determine the performance of new employees are not necessarily the same as those that determine later performance improvement. Thus looking at people's performance over time will show that

it is variable and that the best performers don't necessarily remain the best performers in the long run.

WHY DO WE APPRAISE EMPLOYEES? The first question that we address is the rationale for organizations to appraise the performance of their employees. Performance appraisal can be a time-consuming chore that most managers and their subordinates dislike. Why then do most large organizations appraise employee job performance at least once per year? The reason is that

job performance data can benefit both employees and organizations. Performance data can be used for administrative decisions, employee development and feedback, and research to determine the effectiveness of organizational practices and procedures.

Employee Development and Feedback In order for employees to improve and maintain their job performance and job skills, they need

job performance feedback from their supervisors. One of the major roles of supervisors is to provide information to their subordinates about what is expected on the job and how well they are meeting those expectations. Employees need to know when they are performing well so that they will continue to do so, as well as when they are not so that they can change what they are doing. Even employees who are performing well on the job can benefit from feedback about how to perform even better. Feedback can also be helpful in telling employees how to enhance their skills to move up to higher positions.

Many fired U.S. government employees have been reinstated because of long records of satisfactory performance on the job. The United States is not the only country that has laws requiring administrative decisions to be based on job performance. In Canada, for example, the legal requirement that employee firing must be based on

job performance has been extended to private companies, as well as the government.

Many studies have compared the various behavior-focused rating forms with graphic rating forms, as well as with one another. These comparisons have found that sometimes the behavior-focused forms yield fewer errors (such as halo and leniency) than the graphic rating scales and sometimes they do not (Bernardin & Beatty, 1984; Latham, Skarlicki, Irvine, & Siegel, 1993). Furthermore, scales that merely ask raters to check whether or not individuals have engaged in specific behaviors may result in

less leniency than graphic rating scales (Yun, Donahue, Dudley, & McFarland, 2005). After reviewing the literature on rating forms, Borman et al. (2010) concluded that there is little advantage to using behavior-based scales over graphic rating scales. It seems that efforts to improve rater accuracy should focus on things other than the design of the rating instruments.

Computerized, web-based employee performance management systems help managers clarify goals and expectations, provide coaching and feedback, and evaluate performance (see the I/O Psychology in Practice case study in this chapter). Such systems automate the entire process, making 360-degree feedback systems economically feasible for large companies. Each target manager can

log onto the system and nominate the peers, subordinates, and others who will provide ratings. The system notifies individuals to do those ratings and then pulls together all of the rating information into a report. Consulting firms can be found that specialize in providing the computer services to conduct a 360-degree feedback project.

It has been well established that Black employees on average receive lower performance appraisal ratings than White employees (McKay & McDaniel, 2006). Interestingly, the race of the rater seems to have no effect on ratings for Whites, but it does on ratings for Blacks. As shown by Stauffer and Buckley (2005), Black and White raters gave similar ratings to Whites, and both rated Blacks

lower on average than Whites. However, the difference between the ratings is much larger for White than for Black raters. If it is presumed that Black raters would have less bias than White raters against Black employees, these findings suggest the possibility that White raters are biased against Black employees. Of course, alternative explanations are that Black raters are biased in favor of Blacks and overrate them, and that both Black and White raters are biased in favor of Whites and overrate them relative to Blacks. At this time, we don't know the extent to which bias is operating in these ratings either for or against Black and White employees.

The mood of the rater at the time of appraisal can affect ratings. In a laboratory study, Sinclair (1988) assigned participants to a condition in which their mood was experimentally manipulated to be more depressed or elated. They were then asked to read a description of a professor's behavior and rate that professor's performance. Results showed that participants in a depressed mood rated the professor's performance

lower than subjects in the elated mood condition. The depressed participants were also more accurate and exhibited less halo. Sinclair explained the results as reflecting the better information-processing ability of people when they are in depressed moods.

Unreliability in the actual criterion refers to errors in measurement that occur any time we try to assess something. Measurement error is part of the

measurement process and consists of random errors that make our measurement inaccurate. It is reflected in the inconsistency of measurement over time. If we were to assess the job performance of someone repeatedly over time, the measure of performance would vary from testing to testing even if the performance (theoretical criterion) remained constant. This means that our actual performance criterion measures will have less than perfect reliabilities.
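The idea that repeated measurement varies even when performance stays constant can be simulated. The sketch below assumes the classical test theory model, in which each observed score equals a true score plus random error; the scores, error sizes, and seed are made-up values for illustration. With true scores spread with a standard deviation of 10 and errors with a standard deviation of 5, reliability (true-score variance divided by observed-score variance) should come out near 100 / (100 + 25) = 0.8.

```python
import random
from statistics import pvariance

def measure(true_score: float, error_sd: float, rng: random.Random) -> float:
    """One assessment = true (theoretical) performance + random measurement error."""
    return true_score + rng.gauss(0.0, error_sd)

rng = random.Random(42)  # fixed seed so the run is repeatable

# One employee whose true performance never changes, assessed five times.
repeated = [round(measure(70.0, 5.0, rng), 1) for _ in range(5)]
print(repeated)  # the observed scores differ even though true performance is constant

# Across many employees: share of observed-score variance that is true-score variance.
true_scores = [rng.gauss(70.0, 10.0) for _ in range(5000)]
observed = [measure(t, 5.0, rng) for t in true_scores]
reliability = pvariance(true_scores) / pvariance(observed)
print(round(reliability, 2))  # expected to be close to 0.8
```

Making `error_sd` larger lowers the reliability, which is the sense in which actual criterion measures fall short of perfect reliability.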

Criterion Complexity Because most jobs involve multiple tasks and most tasks can be evaluated from several perspectives, criteria can become quite complex. Job performance even on a single task can usually be assessed along a quality dimension (how well the person does the task) and a quantity dimension (how much or how quickly the person does the task). The complexity of job performance means that

multiple criterion measures are necessary to assess performance adequately. These might involve only quality, only quantity, or both. It can be at the level of specificity of a single task or at the level of the person's entire job. The nature of the job and the purposes of the assessment information determine the nature of the criteria that are used, as well as the level of specificity.

Use of the frequency ratings has been criticized by Kane and Bernardin (1982). They point out that frequency of a behavior is ....

not a good indicator of performance because a given frequency might reflect good performance for one behavior and poor performance for another. They give as examples two behaviors for police officers. An 85% to 94% frequency of occurrence would be outstanding for obtaining arrest warrants but abysmal for being vindicated in the use of lethal force. Thus considerable judgment can be required in interpreting the meaning of frequency ratings with the BOS. Of course, judgment is required in interpreting many measures of job performance.

The theoretical and actual criteria can be quite different for some jobs. For others, the correspondence between them is quite close. For example, for an insurance salesperson, the theoretical criterion is to sell, and the actual criterion is a count of the sales the person made. For an artist, the correspondence is

not as close. The theoretical criterion of producing great works of art is matched to the actual criterion of asking art experts for an opinion about the person's work. In this case, there is room for subjectivity about who is deemed an art expert and about the expert judgments of what is and is not good art. As these cases illustrate, the criteria for different jobs may require quite different assessment approaches.

The multidimensional approach does

not combine the individual criterion measures into a single score. In the previous example, there would be four scores per employee.

Variability of performance over time is referred to as the dynamic criterion, although it is the performance and not the standard that changes. The dynamic criterion idea has generated some controversy among I/O psychologists, with some believing performance is stable and others suggesting it is not (Schmitt & Chan, 1998). On the one hand, Deadrick and Madigan (1990) provided data for sewing machine operators in a clothing factory that indicated performance was stable over short periods of time (weeks) but was

not very consistent over long periods of time (months). On the other hand, Vinchur, Schippmann, Smalley, and Rothe (1991) found that the job performance of manufacturing employees was reasonably stable over a 5-year time span.

Results have been more promising with types of training other than RET. Those training procedures teach raters how to

observe performance-relevant behavior and how to make judgments based on those observations. Hedge and Kavanagh (1988), for example, found that this observation training increased rating accuracy but did not reduce rating errors (see the Research in Detail box).

A new trend is for companies to go beyond the once per year evaluation in designing a comprehensive performance management system. In addition to the annual appraisal, such systems can include goal setting and periodic coaching and feedback sessions between the employee and supervisor. Whereas the annual review might be used for administrative purposes, the interim reviews would be used

only for feedback, thus reducing some of the anxiety and defensiveness employees experience when being evaluated for raises and promotions.

The use of multiple perspectives for manager feedback has been called 360-degree feedback (Baldwin & Padgett, 1993). A manager is evaluated by peers, subordinates, and supervisors on several dimensions of performance. In addition, the manager completes a rating of his or her own performance. Research has shown that people in these different positions show

only modest agreement in their ratings (Brett & Atwater, 2001; Carless, Mann, & Wearing, 1998; Fletcher & Baldry, 2000), suggesting that they provide different perspectives on a person's performance. Another advantage of using multiple raters is that the effects of individual raters' biases can be reduced. For example, it has been shown that people give higher ratings in 360-degree evaluations to those they like (Antonioni & Park, 2001). With multiple raters, the effects of favoritism on the part of the immediate supervisor are diminished because additional information from other raters is added to the appraisal. This can lead to increased trust of and better attitudes about the appraisal system on the part of those being evaluated (Mayer & Davis, 1999).

The basis for using job performance data for administrative decisions can be found in both contract and law. A union contract will often specify that job performance is the basis for particular administrative decisions, such as pay raises. A contract can also state that

performance appraisals will not be done. Civil service (government) employees in the United States can be fired only for unsatisfactory job performance or violation of work rules. Rule violations include assaulting a coworker, being convicted of a felony, falling asleep on the job, and not showing up for work when scheduled. Even so, many fired U.S. government employees have been reinstated because of long records of satisfactory performance on the job.

In theory, it should be possible to make use of these cognitive models to help raters do a more accurate job of evaluating job performance. Jelley and Goffin (2001) attempted this with an experiment in which college students were asked to rate the performance of a videotaped college instructor using a BOS. Although results were somewhat inconsistent, the authors were able to find some accuracy increases after

priming the raters' memory. This was done by having them do some preliminary global ratings designed to stimulate recall of the performance observed. This approach shows some promise in helping improve ratings, but more research will be needed to determine whether these models will ultimately prove useful.

360-Degree Feedback In most organizations, the direct supervisor of each employee is responsible for assessing job performance. However, it can be helpful to get multiple perspectives on job performance (Furnham & Stringfield, 1994), and the use of multiple perspectives is becoming standard practice in the evaluation of managers and others (Rowson, 1998). Ratings by peers, self, and subordinates (for supervisors) can be a useful complement to supervisor ratings and can be helpful in

providing feedback for employee development (Maurer, Mitchell, & Barbeite, 2002). In particular, discrepancies between the ratings by self (the employee's own ratings of performance) and others can show those areas in which other people see the employee differently from how the employee views himself or herself.
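The self-versus-others comparison described here amounts to a per-dimension subtraction. In this hypothetical sketch, the function `self_other_gaps`, the dimension names, and the 1-to-5 ratings are invented for illustration; averaging the other raters is one simple way to summarize them, and it also shows how pooling several raters dilutes any single rater's bias.

```python
from statistics import mean

def self_other_gaps(self_ratings: dict, other_ratings: list) -> dict:
    """For each dimension, return the self rating minus the average rating
    given by other raters (supervisor, peers, subordinates).
    Positive gap: the employee rates himself or herself higher than others do.
    Negative gap: others rate the employee higher than the employee does."""
    return {
        dim: self_ratings[dim] - mean(rater[dim] for rater in other_ratings)
        for dim in self_ratings
    }

self_view = {"planning": 5, "communication": 3}
others = [
    {"planning": 3, "communication": 4},  # supervisor
    {"planning": 4, "communication": 4},  # peer
    {"planning": 2, "communication": 4},  # subordinate
]
print(self_other_gaps(self_view, others))  # {'planning': 2, 'communication': -1}
```

Here the employee sees his or her planning as stronger than others do, and communication as weaker than others do.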

There are many other possible criteria beyond work quality and quantity. Table 4.2 contains a performance appraisal form that has eight rather general criteria that are relevant to many jobs. For example, maintaining a professional appearance on the job is relevant when

public image is important. Many organizations expect employees who meet the public to display a certain image. This might involve a dress code that specifies the sort of clothing that is appropriate for work, such as a business suit. Factories can have dress codes that are concerned not with public image but with safety. Ties are often forbidden because they could get caught in machinery, resulting in a serious accident and injury.

The nature of some jobs requires that quality be the major focus, whereas for others quantity may take priority. In athletics, sometimes one or the other serves as the criterion for winning a competition. In gymnastics, quality is the criterion used. Judges score each gymnast's performance along a quality dimension, and the person with the highest score wins. In track and field events, the criterion is concerned with

quantity—jumping farthest, jumping highest, running fastest, or throwing farthest. The quality of jumping form or running style is not relevant, and so there are no judges to rate performance in these events. With jobs, there can be an emphasis on quality or quantity, often depending on the nature of the tasks involved. For a sales job, the emphasis is usually on the quantity of sales, whereas for a teacher it is on the quality of instruction.

The Behavior Observation Scale (BOS) asks

raters to indicate how often ratees perform each of the listed behaviors.

The Mixed Standard Scale (MSS) asks

raters to indicate if the individual's performance is worse than, as good as, or better than each of several items of performance behavior.

Several different types of rating forms have been devised to increase the accuracy of performance ratings. The Behaviorally Anchored Rating Scale (BARS) asks

raters to indicate which of several behaviors comes closest to representing the job performance of the individual.

Rater Bias and Error It is the nature of human judgment to be imperfect. When supervisors or other people make performance ratings, they are likely to exhibit

rating biases and rating errors. These biases and errors can be seen in the pattern of ratings, both within the rating forms for individuals and across rating forms for different people. These within-form and across-form patterns are called halo and distributional errors, respectively.

Rater training is another approach that has been attempted to reduce errors. Research has suggested that rater error training can

reduce rating accuracy, even if it is successful in reducing rating errors. Observation training that focuses on observing performance-related behavior and making judgments of performance has shown promise in increasing accuracy. At the present time, however, it would be premature to conclude that either approach will prove useful in helping supervisors provide accurate performance ratings.

Research Many of the activities of practicing I/O psychologists concern the improvement of employee job performance. The efforts of I/O psychologists can be directed toward designing better equipment, hiring better people, motivating employees, and training employees. Job performance data can serve as the criterion against which such activities are evaluated. To do so, one can conduct a

research study. -A common design for such a study involves comparing employee performance before and after the implementation of a new program designed to enhance it. -A better design would be an experiment in which one group of employees receives a new procedure, while a control group of employees does not. The two groups could be compared to see if the group that received the new procedure had better job performance than the control group that did not. Better job performance by the trained group would serve as good evidence for the effectiveness of the training program.

The process begins with observation of the employee by the supervisor. Next, observations of performance are stored in the supervisor's memory. When asked to rate performance, the supervisor must

retrieve information about the employee from his or her memory. The information is then used in some manner to decide what performance rating to give for each dimension of job performance.

The various models describe how humans process information at each step. One idea is that people use

schemata (categories or frames of reference) to help interpret and organize their experiences (Borman, 1987). Perhaps the best-known schema is the stereotype, a belief about characteristics of the members of a group. The characteristics can be favorable or unfavorable. For example, one stereotype might be that private-sector managers are hardworking.

-Attendance -Job knowledge -Work accuracy -Work quantity Werner suggested that these four dimensions might represent the characteristics that define the

schemata of his supervisors. He also suggested that supervisors should let subordinates know the content of their schemata. Subordinates are likely to attempt to perform well in those areas that the supervisor believes are important for good performance.

How could it be possible that the reduction of errors also results in a reduction in accuracy? One possible explanation lies in the nature of the rating errors. As noted earlier in this discussion, rater errors are inferred from the pattern of ratings. It is possible that the performance of individuals is

similar across different performance dimensions (true halo) or that all individuals in a supervisor's department perform their jobs equally well. Training raters to avoid the same ratings across either dimensions or people will result in their concentrating on avoiding certain patterns rather than on accurately assessing job performance. Bernardin and Pence (1980) suggested that RET might be substituting one series of rating errors for another.

Other Factors That Influence Job Performance Ratings So far we have discussed how the ratings of supervisors can be affected by their cognitive processes and by the design of the rating form (and training in how to use it). Other factors can also affect the ratings given by supervisors, including

supervisor feelings about the subordinate, supervisor mood, supervisor perceptions about subordinate motives for performance, cultural factors, and the race of both the rater and the ratee.

Administrative Decisions Many administrative decisions that affect employees are based, at least in part, on their job performance. Most large organizations use job performance as

the basis for many negative and positive actions. Negative actions toward an employee include both demotion and termination (firing), and some organizations have policies that require the firing of unsatisfactorily performing employees. Positive actions include promotion and pay raises, and many organizations have merit pay systems that tie raises to the level of job performance.

The purpose of 360-degree systems is to enhance performance, especially for those individuals who are the most in need of performance improvement. These systems have been shown to have positive effects for some individuals but not all. Contrary to these systems' intended purpose, it is

the best and not the worst performers who seem to benefit most from 360-degree feedback (Bailey & Austin, 2006). Furthermore, Atwater and Brett (2005) found that those individuals who received low ratings from others and rated themselves low as well had the worst reactions to feedback, suggesting that if one knows his or her performance is poor, having those beliefs corroborated by others was not helpful.

Dynamic Criteria Criteria are usually considered as constant or static standards by which employee performance can be judged. Some I/O psychologists believe, however, that job performance itself is variable over time. This means that

the best performer on the job at one point in time might not be the best at another point in time. Performance variability makes assessment difficult because performance will not be the same throughout the time period for which it is measured. If someone performs well for part of the year and not well for the other part, how should his or her performance be assessed?

Borman (1987) studied the content of U.S. Army officers' schemata of subordinate job performance. When asked to describe the differences in characteristics between effective and ineffective soldiers, these officers generated 189 descriptive items. Borman then used complex statistical analysis to reduce the 189 items to six meaningful dimensions. Effective soldiers were seen as having the following characteristics: -Working hard -Being responsible -Being organized -Knowing the technical parts of the job -Being in control of subordinates -Displaying concern for subordinates Borman concluded that these dimensions represent

the characteristics that officers use to judge soldiers' performance. He also noted that in his sample of experienced officers, there was good agreement about what constituted good job performance. These results suggest that experienced supervisors might have schemata that accurately represent effective performance. These six dimensions could be used as the basis for any of the rating forms we discussed earlier.

Characteristics of Criteria Actual Versus Theoretical Criterion Criteria can be classified as either actual or theoretical. The theoretical criterion is...

the definition of what good performance is rather than how it is measured. In research terminology, the theoretical criterion is a theoretical construct. It is the idea of what good performance is. The actual criterion is the way in which the theoretical criterion is assessed or operationalized. It is the performance appraisal technique that is used, such as counting a salesperson's sales.

THE IMPACT OF TECHNOLOGY ON PERFORMANCE APPRAISAL Advances in technology, particularly the web, have greatly expanded what is practical in performance appraisal. For large companies, the amount of data that is involved in monitoring performance can be staggering. For example, one of the difficulties with 360-degree feedback is

the logistics of organizing this large rating task. Each target manager must nominate several subordinates and several peers to provide ratings, do a self-rating, and get his or her supervisor's rating. In some organizations, this might represent 8 or more ratings completed per manager, and if there are 10,000 managers, there are 80,000 ratings to track and process. This is an expensive and difficult task for a company to do manually.
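The arithmetic behind these figures can be made explicit. The breakdown below is an assumption for illustration: one self-rating, one supervisor rating, three peers, and three subordinates gives the 8 ratings per manager mentioned in the text; the function name `ratings_to_track` is hypothetical.

```python
def ratings_to_track(managers: int, peers: int, subordinates: int) -> int:
    """Total 360-degree ratings for one rating cycle.

    Each target manager receives one self-rating and one supervisor rating,
    plus the nominated peer and subordinate ratings."""
    per_manager = 1 + 1 + peers + subordinates
    return managers * per_manager

print(ratings_to_track(managers=10_000, peers=3, subordinates=3))  # 80000
```

Adding just one more peer per manager in this example raises the total to 90,000 ratings, which is why automated systems become attractive at this scale.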

In an MSS (Mixed Standard Scale), the statements for the various dimensions are presented in a random order. The rater is not told the specific dimensions associated with each behavior, although the nature of the behaviors is certainly clear. The original idea of Blanz and Ghiselli (1972) was that

the mixed order of presentation of the statements would make it more difficult for the raters to bias their ratings than is true of the other types of rating forms. When Dickinson and Glebocki (1990) compared responses to both the mixed and the sorted (by dimension) orders, they found that subjects responded similarly in their ratings with both orders. Thus it does not seem to matter if the dimensions are identified or if the statements are mixed up.
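Because the MSS asks whether the ratee is better than, as good as, or worse than each statement, the three responses for a dimension still have to be turned into a score. The sketch below uses a seven-point scheme commonly described for the MSS; treat the exact mapping, the response symbols, and the function name `score_mss_dimension` as assumptions rather than the scale developers' specification. Response patterns that contradict themselves (for example, better than the good statement but worse than the poor one) return None, which could help spot careless or inconsistent rating.

```python
from typing import Optional

# Responses for one dimension, ordered (good, satisfactory, poor) statement.
# "+" = ratee performs better than the statement, "0" = statement fits the ratee,
# "-" = ratee performs worse than the statement.
MSS_SCORES = {
    ("+", "+", "+"): 7,  # better than even the good statement
    ("0", "+", "+"): 6,  # the good statement fits
    ("-", "+", "+"): 5,  # between good and satisfactory
    ("-", "0", "+"): 4,  # the satisfactory statement fits
    ("-", "-", "+"): 3,  # between satisfactory and poor
    ("-", "-", "0"): 2,  # the poor statement fits
    ("-", "-", "-"): 1,  # worse than even the poor statement
}

def score_mss_dimension(good: str, satisfactory: str, poor: str) -> Optional[int]:
    """Return a 1-7 score, or None for a logically inconsistent response pattern."""
    return MSS_SCORES.get((good, satisfactory, poor))

print(score_mss_dimension("-", "0", "+"))  # 4
print(score_mss_dimension("+", "-", "-"))  # None (inconsistent pattern)
```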

The Behaviorally Anchored Rating Scale (BARS) is a rating scale in which the response choices are defined in behavioral terms. The rater chooses the behavior that comes closest to describing

the performance of the person in question. The behaviors are ordered from bottom to top on the scale along the continuum of performance effectiveness.

LEGAL ISSUES IN PERFORMANCE APPRAISAL Many countries have laws that prohibit discrimination against minorities and women (as well as other groups) in the workplace. These laws cover organizational actions that affect the employment status of people, such as promotions and terminations. Such employee actions are often based at least in part on

the person's performance; therefore, the performance appraisal system of an organization can become the target for legal action. In many countries, it is illegal to discriminate in performance appraisal on the basis of certain non-performance-related factors, such as age, gender, mental or physical disability, or race.

Another type of schema is a prototype, which is a model of some characteristic or type of person. One might think of a particular fictional or real person as

the prototype of a good manager. Some people might consider Bill Gates, the founder of Microsoft, to be a prototype of a good corporate manager. A person who has the salient characteristics of the prototype might be thought of as a good manager. If the salient characteristics of the prototype are blond hair (or looking like Gates), managers who are blond (or look like Gates) might be seen as better in performance than their counterparts who have brown hair (or do not resemble Gates). The prototype is the standard used to assign people to the good manager category.

The particular methods used to assess performance should be based on

the purposes of the assessment information.

The continuation of good performance ratings can be influenced by supervisor expectations about performance independent of liking. Murphy, Gannett, Herr, and Chen (1986) found that judgments of performance were influenced by ....

the rater's expectations about the ratee's performance. People are likely to forget instances of behavior that do not fit their view of the person they are evaluating. Thus a person who is liked and performs well will continue to be seen as a good performer, even if performance has recently slipped. This can produce biased ratings when performance changes over time.

The BOS differs from the MSS in that

the raters indicate frequency rather than comparing employee behavior with the item. In theory, it should indicate how often employees engage in performance-relevant behavior.

Criterion relevance is the extent to which the actual criterion assesses the theoretical criterion it is designed to measure, or its construct validity (see Chapter 2). The closer the correspondence between the actual and theoretical criteria, the greater

the relevance of the actual criterion. All of the actual criteria would seem to have some degree of relevance for assessing their intended theoretical criteria. Theoretical criteria can be quite abstract, such as producing great works of art; therefore, it can be difficult to determine the relevance of a criterion. As with the validity of any assessment device, relevance concerns the inferences and interpretations made about the meaning of our measurements of performance.

Distributional Errors Distributional errors occur when a rater tends to rate everyone

the same. -Leniency errors occur when the rater rates everyone at the favorable end of the performance scale. -Severity errors occur when the rater rates everyone at the unfavorable end of the performance scale. -Central tendency errors occur when a rater rates everyone in the middle of the performance scale. It is possible, however, that a distributional error pattern does not reflect errors. All ratees might have performed the same, leading to similar ratings.
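Since distributional errors are inferred from the pattern across ratees, a rater's ratings can be summarized by their average and spread. This is a hypothetical sketch: the function name, the 1-to-5 scale, and the cutoffs (a spread of 0.5 or less counts as rating everyone the same; the top, middle, and bottom thirds of the scale decide the label) are arbitrary choices for illustration. As the text notes, a flagged pattern may simply mean all ratees performed alike.

```python
from statistics import mean, pstdev
from typing import List, Optional

def distributional_pattern(ratings: List[float], scale_min: float = 1,
                           scale_max: float = 5, max_spread: float = 0.5) -> Optional[str]:
    """Label one rater's ratings of different ratees, or return None if they vary.

    The label describes a pattern only; it does not prove a rating error."""
    if pstdev(ratings) > max_spread:
        return None  # ratings differ across ratees: no distributional pattern
    avg = mean(ratings)
    third = (scale_max - scale_min) / 3
    if avg >= scale_max - third:
        return "leniency"          # everyone near the favorable end
    if avg <= scale_min + third:
        return "severity"          # everyone near the unfavorable end
    return "central tendency"      # everyone near the middle

print(distributional_pattern([5, 5, 4, 5]))     # leniency
print(distributional_pattern([1, 2, 1, 1]))     # severity
print(distributional_pattern([3, 3, 3, 3]))     # central tendency
print(distributional_pattern([1, 3, 5, 2, 4]))  # None
```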


In the United States, there have been an increasing number of court challenges to performance-based employee actions, such as promotions and terminations (Latham et al., 1993). Organizations that have lost such cases have been unable to demonstrate to the court's satisfaction that

their performance appraisal systems did not discriminate against certain groups. Subjective methods are especially likely to evoke legal challenges because they allow room for supervisors to express prejudices against certain groups of people. It can be difficult for a supervisor to prove in court that his or her ratings were fair and unbiased when, for example, Blacks get lower performance ratings than Whites (McKay & McDaniel, 2006), as noted earlier.

Schemata might influence all four steps in the evaluation process. They might affect what behaviors a supervisor chooses to observe, how the behaviors are organized and stored in memory, how they are retrieved, and how they are used to decide on ratings. The use of schemata, however, does not necessarily imply that

they lead to inaccurate ratings. In many ways, the use of schemata can simplify experience so that it can be more easily interpreted. It is possible that this leads to accurate judgments about employee performance (Lord & Maher, 1989).

CHAPTER SUMMARY Job performance data have many organizational uses, including administrative decision making, employee development, employee feedback, and research. The first step in evaluating job performance is

to develop performance criteria that define good and poor performance. Once criteria are set, specific methods to measure them can be chosen.

Rater Training to Reduce Errors Rater training has also been attempted in many studies, with mixed results (Hedge & Kavanagh, 1988; Latham, 1986). At least some of the discrepancy in research findings may be the result of differences in the types of training that have been studied. Perhaps the most popular training is rater error training (RET). The objective of RET is

to familiarize raters with rater errors and to teach them to avoid these rating patterns. Although most studies have found that this sort of training reduces rating errors, it is often at the cost of rating accuracy (e.g., Bernardin & Pence, 1980; Hedge & Kavanagh, 1988). In other words, the raters might reduce the number of halo and leniency patterns in their ratings by forcing ratings to vary, whether or not they accurately reflect how well the person has done, but those ratings are less accurate in reflecting the true level of performance.

The Behavior Observation Scale (BOS) contains items that are based on critical incidents, making it somewhat like an MSS. A critical incident (Flanagan, 1954) is an event reflecting either effective or ineffective behavior by an employee. An example of an ineffective incident for a teacher would be "slapping a child who made a disrespectful comment." With the BOS, raters are asked to indicate for each item the amount of time the employee engaged in that behavior. The developers of the scale recommend having raters indicate the percentage of time the employee engages in each behavior, using the following options:

-0% to 64%
-65% to 74%
-75% to 84%
-85% to 94%
-95% to 100%
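A sketch of how responses on these options might be turned into a score follows. Coding the five options as 1 to 5, summing items into a total, and reverse-scoring items that describe ineffective behavior are assumptions for illustration, not details given here; the teacher items are hypothetical.

```python
from bisect import bisect_right

# Lower edges of bands 2-5 in the recommended options
# (0-64%, 65-74%, 75-84%, 85-94%, 95-100%).
BAND_EDGES = [65, 75, 85, 95]


def band_score(percent: float) -> int:
    """Map a percentage of time to a 1-5 scale point (assumed coding)."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percentage out of range: {percent}")
    return 1 + bisect_right(BAND_EDGES, percent)


def bos_total(items: list[tuple[str, float, bool]]) -> int:
    """Sum item scores; items flagged ineffective are reverse-scored (6 - score)."""
    total = 0
    for _behavior, percent, ineffective in items:
        score = band_score(percent)
        total += (6 - score) if ineffective else score
    return total


# Hypothetical items: (behavior, % of time observed, describes ineffective behavior?)
items = [
    ("Explains assignments clearly", 90, False),
    ("Returns graded work on time", 70, False),
    ("Loses temper with students", 10, True),
]
print(bos_total(items))  # 4 + 2 + 5 = 11
```

Using the band edges with `bisect_right` also handles fractional percentages such as 64.5, which would otherwise fall between the listed ranges.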

Nathan and Tippins (1990) offered a different explanation of why halo errors are associated with greater accuracy in job performance ratings. They speculated that raters who exhibited less halo in their ratings might have given too much weight to inconsequential negative events. For example, a supervisor might have given an otherwise reliable employee a low rating in attendance because he or she was sick for one week in the prior year. Raters who exhibited a halo pattern in their ratings paid less attention to such rare instances and tended to consider the person's usual performance. This may have resulted in more accurate ratings, because they were influenced more by general performance than by rare instances of good or poor performance in one or more dimensions.

The productivity measure chosen must match the nature of the work done. The nature of productivity can be very different from job to job, which makes it difficult to compare the performances of people who hold different jobs.
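The comparison problem can be seen in a small sketch with hypothetical counts. Units assembled and claims processed are on different scales. Re-expressing each count as a z-score within its own job, an assumed technique not described here, shows only where a person stands relative to others in the same job; it does not say whether a given level of output in one job is better performance than a level in another.

```python
from statistics import mean, pstdev

# Hypothetical daily output counts for two different jobs.
output_by_job = {
    "assembler": {"Ana": 120, "Ben": 100, "Cy": 110},       # units assembled
    "claims processor": {"Dee": 30, "Eli": 20, "Fay": 25},  # claims processed
}


def within_job_z(counts: dict[str, float]) -> dict[str, float]:
    """Express each count as a z-score relative to others in the same job."""
    values = list(counts.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:  # everyone produced the same amount
        return {name: 0.0 for name in counts}
    return {name: (value - mu) / sd for name, value in counts.items()}


for job, counts in output_by_job.items():
    for name, z in within_job_z(counts).items():
        print(f"{job:16} {name}: raw={counts[name]}, z={z:+.2f}")
```

Here Ana and Dee receive the same z-score (about +1.22) despite raw counts of 120 and 30, which illustrates that raw numbers from different jobs cannot be compared directly.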

Unfortunately, objective performance measures also have several limitations. Many objective measures are not appropriate for all jobs. When jobs do not involve countable output, productivity is not a feasible measure of performance. Also, it is not always obvious what number constitutes satisfactory performance. For example, how many absences per year should be considered good performance? Data taken from records can also be contaminated and inaccurate. Sometimes behaviors and productivity are attributed to the wrong person or are never recorded. People can also distort records by omitting bad incidents for individuals they favor, and employees might fail to report accidents and injuries.

Both objective and subjective measures can be useful, but studies have shown that when both are used for the same employees, they don't always agree on the level of performance (Sundvik & Lindeman, 1998, see International Research), suggesting that they likely reflect different aspects of job performance.
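Agreement between the two kinds of measures is often summarized as a correlation. The sketch below computes a Pearson correlation between supervisor ratings and an objective output count for the same employees; the numbers are hypothetical and are not taken from Sundvik and Lindeman (1998).

```python
from math import sqrt
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data for five employees.
supervisor_ratings = [3, 4, 5, 2, 4]   # subjective, 1-5 scale
units_produced = [50, 42, 55, 48, 40]  # objective count

r = pearson_r(supervisor_ratings, units_produced)
print(f"r = {r:.2f}")  # prints r = 0.11
```

A correlation this close to zero means the two measures order these employees quite differently, which is the kind of disagreement the text describes.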