HRM 465 - Ch. 7 Quiz
Measurement error
Actual score = true score + error. This type of error represents "noise" in the measure and measurement process. Its occurrence means that the measure did not yield perfectly consistent scores (so-called true scores) for the attribute. There is an element of error in any measure; there are no perfect measures.
Statistical significance
DEF: Refers to the likelihood that a correlation exists in the population, based on knowledge of the actual value of r in a sample from that population. If the organization were to use a selection measure based on a statistically significant correlation, the correlation is likely to hold when the measure is used again to select another sample (ex: future job applicants). You are basically answering this question: is the correlation between the two variables specific to the data I collected, or can I assume the same correlation for data collected from a different sample? In other words, is the correlation we found in the sample true for the whole population, or just for this sample?
Scatter Diagrams
Used to plot the joint distribution of two sets of scores. r = 0.10 indicates very little relationship between the 2 sets of scores; r = 0.25 shows a modest relationship; r = 0.60 shows a somewhat strong relationship between the two sets of scores.
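A minimal sketch (Python, assuming numpy and matplotlib are available; the data are purely illustrative) of plotting scatter diagrams at the three r values above:
```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, r in zip(axes, [0.10, 0.25, 0.60]):
    # Draw 100 score pairs from a bivariate normal with correlation r.
    cov = [[1.0, r], [r, 1.0]]
    x, y = rng.multivariate_normal([0, 0], cov, size=100).T
    ax.scatter(x, y)
    ax.set_title(f"r = {r:.2f}")
    ax.set_xlabel("Score set 1")
    ax.set_ylabel("Score set 2")
plt.show()
```
The larger r is, the more the cloud of points tightens around a straight line.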
Practical significance
[For practical significance, look at the sign (it tells you the direction of the relationship), then look at the value of r in absolute terms and calculate the variance explained • the closer |r| is to 1, the stronger the relationship] Refers to the size of the correlation coefficient. The greater the degree of common variation between two variables, the more one variable can be used to understand the other. We use r² to express the variance explained.
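A minimal sketch (Python, not from the course materials) of turning r into shared variance, as described above:
```python
def shared_variance(r: float) -> float:
    """Return the proportion of variance two variables share, given their correlation r."""
    return r ** 2

print(shared_variance(0.60))   # 0.36 -> 36% common variation
print(shared_variance(-0.70))  # 0.49 -> 49% overlap; the sign only gives direction
```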
Procedures to calculate reliability estimates are?
- Coefficient alpha
- Interrater agreement
- Test-retest reliability
- Intrarater agreement
Deficiency error
Failure to measure some aspect of the attribute assessed. Ex: if knowledge of programming languages should involve Java and our test does not have any items (or has an insufficient number of items) covering this aspect, the test is deficient. In other words, I failed to evaluate a key component such as organization, communication, or word processing.
Coefficient alpha
Internal consistency.
• Should be at least .80 for a measure to have an acceptable degree of reliability [this means that 80% of the variation in scores is "true" variation and 20% is "error"].
• Ex: measures the consistency among 5 questions - whether the questions are measuring the same information; an alpha around .7 to .9 is OK.
• There needs to be a positive correlation between the items.
• Coefficient alpha depends on just 2 things - the number of items and the amount of correlation between them (so to increase internal consistency, increase both the number of items and the amount of agreement between the items).
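A minimal sketch (Python; the respondent data are hypothetical, not from the course) of computing coefficient (Cronbach's) alpha from a matrix of item scores:
```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: rows = respondents, columns = test items."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical scores of 4 respondents on 5 questions measuring the same attribute.
scores = np.array([
    [4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2],
    [5, 4, 5, 5, 4],
    [3, 3, 3, 2, 3],
])
print(round(cronbach_alpha(scores), 2))  # ~0.96: well above the .80 guideline
```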
Contamination error
Occurrence of an unwanted or undesirable influence on the measure and on the individuals being measured. Ex: the electricity is turned off during the test - the result of a factor out of your control.
Z Scores
Converts the raw score to standard deviation units: z = (raw score - mean) / standard deviation.
o You need three data points in order to calculate the Z score for every candidate: the raw score, the mean, and the standard deviation.
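A minimal sketch (Python; the numbers are hypothetical) of the z-score computation just described:
```python
def z_score(raw: float, mean: float, sd: float) -> float:
    return (raw - mean) / sd

# A candidate scoring 85 on a test with mean 75 and SD 5 sits 2 SDs above the mean.
print(z_score(85, 75, 5))  # 2.0
```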
Variability (interpreting scores)
Describes the spread of data around the midpoint [how spread out the data is]. We describe variability in terms of the RANGE (lowest to highest score); the problem with ranges is that they are affected by outliers. We also use the VARIANCE and the STANDARD DEVIATION. The STANDARD DEVIATION is the average distance of a set of scores from the mean (the positive square root of the variance).
Standardizing scores: STANDARD SCORES are converted raw scores that indicate where a person's score lies in comparison to a referent group. A Z SCORE is a standard score that indicates the distance of a score from the mean in standard deviation units (you are standardizing the unit; therefore you are able to compare scores from different tests).
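A minimal sketch (Python standard library; hypothetical test scores) computing the three variability statistics named above:
```python
import statistics

scores = [70, 75, 80, 85, 90]

score_range = max(scores) - min(scores)  # RANGE: affected by outliers
variance = statistics.variance(scores)  # sample VARIANCE
sd = statistics.stdev(scores)            # STANDARD DEVIATION = sqrt(variance)

print(score_range, variance, round(sd, 2))  # 20 62.5 7.91
```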
Use of measures in staffing (steps - 6)
o 1. Choose and define attribute - Using the report summary for the Call Center Worker, choose an important job attribute to measure during the selection process (attribute and construct mean the same here).
o 2. Develop measure of attribute - Select/develop a measure of the attribute; in other words, what is a tool you can use to determine a candidate's score on that attribute?
o 3. Measure the attribute - In other words, the candidate now takes the test.
o 4. Determine number or score - What is the candidate's score?
o 5. Make evaluation - Evaluate the score (what does it mean to have a 3/5 score on a structured interview, or to be at the 75th percentile on a conscientiousness measure?).
o 6. Make a decision.
Criterion deficiency vs. contamination
o A component of the job of Administrative Support is word processing. In the selection system used to hire an Administrative Support staff member, there is no test that assesses competency in performing word-processing tasks.
o Is that an example of criterion deficiency or contamination?
o Explain how this (deficiency or contamination) would affect the criterion-related validity of the selection system.
CENTRAL TENDENCY (interpreting scores)
o A. Central tendency describes the midpoint, or the center, of the data. We describe central tendency using the MEAN, a measure of central tendency reflecting the average score; the MEDIAN, the middle score, or the point below which 50 percent of the scores fall; and the MODE, the most commonly observed score. Central tendency: how does the data behave in the middle?
Which procedures ASSESS RELIABILITY WITHIN A SINGLE TIME PERIOD??
o Coefficient alpha and interrater agreement - ASSESS RELIABILITY WITHIN A SINGLE TIME PERIOD
Correlation between Scores
o Correlation is the strength of a linear relationship between two variables.
o Correlation between 2 variables does not imply causation between them.
o Correlation coefficient: the value of r summarizes both
• the strength of the relationship between the two sets of scores and
• the direction of the relationship. Values can range from r = -1.0 to r = 1.0.
• If r = -1 there is an inverse relationship: if one variable increases, the other decreases, and vice versa.
• The sign tells you whether the variables change in the same direction (+) or opposite directions (-).
o Interpretation - A correlation simply states how two variables covary or relate to each other; it says nothing about one variable necessarily causing the other one!
o Correlation formula (Pearson): r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² · Σ(y - ȳ)²]
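A minimal sketch (Python; the interview and performance scores are hypothetical) of the Pearson correlation formula written out by hand:
```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    ss_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (ss_x * ss_y)

interview = [3, 4, 2, 5, 4]
performance = [2.5, 4.0, 2.0, 4.5, 3.5]
print(round(pearson_r(interview, performance), 2))  # ~0.97: strong positive relationship
```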
Criterion-Related Validation
o Criterion Measures: measures of performance on tasks and task dimensions o Predictor Measure: it taps into one or more of the KSAOs identified in job analysis o Predictor-Criterion Scores: must be gathered from a sample of current employees or job applicants o Predictor-Criterion Relationship: the correlation must be calculated.
Measurement
o Def: the process of assigning numbers to objects to represent quantities of an attribute of the objects. The process of assigning numbers, according to a rule or convention, to aspects of people, jobs, job success, or aspects of the staffing system (the words in bold are what can be measured).
o Measures are the methods or techniques for describing and assessing attributes of objects (e.g., job performance ratings: the rating is the tool (measure) used to assess performance (the attribute)).
Quality of Measures: Reliability
o Definition: consistency of measurement of an attribute. A measure is reliable to the extent it provides a consistent set of scores to represent an attribute.
o Reliability of measurement is of concern both within a single time period [e.g., a panel, so there are different raters] and between time periods, and for both objective and subjective measures.
o [Comparing scores within T1 or T2: focusing on a single attribute at a single moment in time]
o [Comparing between T1 and T2: looking at the stability of the measure]
Quality of Measures: Validity
o Definition: the degree to which a measure truly measures the attribute it is intended to measure (there are different types of validity).
Accuracy of measurement: does the measure really capture, say, verbal ability or knowledge? [Remember, once we establish a measure, we assume it is an indicator of the attribute. Validity is understanding how much of the measure really reflects the attribute.]
Accuracy of prediction: does the measure predict job performance? A tool has higher validity when both measurement and prediction are accurate.
SCORES
o Definition: measures provide scores to represent the amount of the attribute being assessed.
o Scores are the numerical indicator of the attribute.
o Using the measure, we collect raw scores. Raw scores are the unadjusted scores on a measure; on their own they are not very useful and don't provide meaningful information, so they need to be modified.
o So how do we go from a list of numbers to meaningful indicators?
Quality of Measures: Reliability
o Implications of reliability: the standard error of measurement. Since only one score is obtained from an applicant, the critical issue is how accurate that score is as an indicator of the applicant's true level of knowledge.
o Relationship to validity: the reliability of a measure places an upper limit on the possible validity of the measure. A highly reliable measure is not necessarily valid. Reliability does not guarantee validity - it only makes it possible.
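Two standard classical-test-theory formulas behind the points above (not written out in these notes): the standard error of measurement, where SD_x is the standard deviation of observed scores and r_xx is the reliability, and the ceiling that reliability puts on the validity coefficient r_xy:
```latex
\mathrm{SEM} = SD_x \sqrt{1 - r_{xx}}, \qquad r_{xy} \le \sqrt{r_{xx}}
```
So a test with reliability .80 can have a validity coefficient of at most about .89, no matter how good the criterion is.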
VALIDITY OF MEASURES IN STAFFING
o Importance of validity to the staffing process: predictors must be accurate representations of the KSAOs to be measured, and predictors must be accurate in predicting job success.
o Validity of predictors is explored through validation studies.
o Three types of validity:
Face validity - whether the test seems relevant to the non-expert taking it (e.g., personality tests often seem irrelevant to candidates).
Content validity - seeking the evaluation/judgment of someone who knows the material and is an expert in the field.
Criterion-related validity - predictions.
Face and content validity concern what I am measuring.
Measurement: Standardization
o Involves controlling the influence of extraneous factors on scores generated by a measure, ensuring that the scores obtained reflect the attribute measured.
o Properties of a standardized measure:
CONTENT IS IDENTICAL for all objects measured (e.g., everybody is asked the same interview questions).
ADMINISTRATION of the measure is IDENTICAL for all objects (e.g., everyone is interviewed for 45 minutes; everyone is greeted similarly).
RULES FOR ASSIGNING NUMBERS ARE CLEARLY SPECIFIED and agreed on in advance (e.g., the scoring key is developed before the interview and is used to evaluate answers).
What do the correlations mean? • If r between a structured interview and job performance ratings is .9
o It is a positive relationship, meaning they move in the same direction. It is a strong relationship because it is close to 1, with 81% overlap (r² = .9² = .81): they have a lot of common area.
MEASUREMENT: DIFFERENCES IN OBJECTIVE AND SUBJECTIVE MEASURES
o Objective measures: rules used to assign numbers to the attribute are predetermined, communicated, and applied through a system. Multiple choice, with no room for interpretation. Ex: an OCEAN personality inventory - you get a number.
o Subjective measures: the scoring system is more elusive, often involving a rater who assigns the numbers. Includes raters who make the gray area less gray; requires a rater's (a person's) evaluation.
o We need both to see different aspects of applicants.
o Research shows the two may not be strongly related, but purely objective measures can miss important parts of job performance.
Interpreting the practical significance
o Practical significance: the value of r will be affected by the variability of each variable. If there is no variation in a set of scores, there won't be a correlation (restriction of range: lack of variation in scores).
• If everybody gets 85%, you can't see any variability, so you can't analyze the overlap.
We are assuming a linear relationship between the two variables; if that is not the case, then r is smaller than it should be.
Correlation does not mean causation: the two test scores simply vary together (co-vary).
Prediction Process
o Predictors are measured during the hiring process:
An interview to evaluate verbal communication.
A report-writing test to evaluate written communication and performance-management skills.
The measures above yield the PREDICTED criterion.
o CRITERIA: performance - project management skills, verbal communication, written communication.
o In the future, we can measure the ACTUAL criterion: actual performance, actual data.
Quality of measures
o Reliability of measures
o Validity of measures
o Validity of measures in staffing
o Validity generalization
o If a tool is not reliable, it cannot be valid (reliability makes validity possible but does not guarantee it).
o The quality of the decisions made and the actions taken is unlikely to be any better than the quality of the measures on which they are based.
Importance and use of measures
o Results of the measurement process: scores become indicators of the attribute. The initial attribute and its operational definition are transformed into a numerical expression of the attribute. In other words, the number is no longer just a score but a description of the attribute. E.g., if you score 20/20 on the exam, then we can say that you have excellent knowledge of HR. Again: knowledge of HR is the attribute, and the exam is the measure.
Statistical significance: significance level & important NOTE
o The significance level is expressed as p < value. Interpretation: if p < .05, there are fewer than 5 chances in 100 of concluding there is a relationship in the population when, in fact, there is not [a relatively small probability of error, which usually leads to the conclusion that the correlation is indeed statistically significant]. I'm OK with a 5% error rate, p < .05.
• If I have a sample of 100 and I'm wrong 5% of the time, that is OK.
o ** Sample size does matter [if you have too big a sample size, you are bound to find a correlation]; use significance levels with caution (i.e., while understanding your sample size: if the sample size is big enough, even small correlations will be significant).
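A minimal sketch (Python, assuming scipy is installed; the scores are hypothetical) of checking whether a sample correlation is statistically significant at p < .05:
```python
from scipy.stats import pearsonr

test_scores = [3, 4, 2, 5, 4, 3, 5, 2, 4, 3]
performance = [2, 4, 2, 5, 3, 3, 4, 1, 4, 2]

r, p = pearsonr(test_scores, performance)
print(f"r = {r:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Statistically significant: the correlation likely holds in the population.")
else:
    print("Not significant: the correlation may be specific to this sample.")
```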
Data analytics
o Strategic staffing requires the use of data in making hiring decisions: accurately assessing and measuring candidate characteristics in order to hire the right people.
o Data analytics is the process of using data and analytical systems to arrive at optimal decisions, including statistical analyses of data. This is like HR analytics: we look at the collected data to help hire someone who will be a good employee.
Which procedures ASSESS RELIABILITY BETWEEN TIME PERIODS??
o Test-retest and intrarater agreement - ASSESS RELIABILITY BETWEEN TIME PERIODS
What do the correlations mean? • If r between a personality test and a job knowledge test is zero (or near zero)
o Then there is no correlation between personality test and job knowledge test. There is no shared variance so no relationship.
What do the correlations mean? • If r between a personality test and job performance ratings is -.7
o They have an inverse relationship, and it is a relatively strong relationship because there is 49% overlap: r² = (-.7)² = .49 = 49%.
How do we interpret correlations?
o We use both practical significance and statistical significance
Q: Give an example of when you would want a low coefficient alpha (e.g., α = .35) for a written knowledge test.
• Coefficient alpha measures internal consistency/reliability.
• It reflects how related the scores on the questions are; the higher the coefficient alpha, the more related they are.
• Ex: a test for an HR job covering staffing, T&D, and compensation - topics that don't have to be related.
• Ex: if you really know staffing but not measurement, the internal consistency between those topics is low - and that is fine for a test spanning distinct topics.
Test-Retest Reliability
• Concerned with the stability of measurement.
• The level of r should range between r = .50 and r = .90 or higher.
• The examinee is scored at time 1 and then again at time 2.
• You need to know how much the attribute may be expected to change, and what the appropriate time interval between tests is:
o For short time intervals (hours or days), most attributes are stable, and a large test-retest r (r = .90 or higher) should be expected.
o For longer time intervals, it is common to expect lower r's; for example, over 6 months or a year, individuals' knowledge of programming languages might change, so lower test-retest reliabilities (r = .50) are expected.
Intrarater agreement
• Same rater scoring at different times (within-rater consistency).
• For short time intervals between measures, a fairly high relationship is expected: r = .80 or .90.
Interrater agreement
• Compares the scores given by 2 different raters; the measure is reliable if they give similar scores. Ex: comparing Mona's and Maria's interview scores when they each interviewed Elizabeth.
• Minimum level of interrater agreement: 75% or higher.
• The more important the end use of the ratings, the greater the required agreement should be.
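A minimal sketch (Python; the two raters' scores are hypothetical) of percentage interrater agreement:
```python
def percent_agreement(rater1: list[int], rater2: list[int]) -> float:
    """Percentage of candidates on whom the two raters gave identical scores."""
    matches = sum(a == b for a, b in zip(rater1, rater2))
    return 100 * matches / len(rater1)

mona = [4, 3, 5, 2, 4, 3, 5, 4]
maria = [4, 3, 4, 2, 4, 3, 5, 5]
print(percent_agreement(mona, maria))  # 75.0 -> just meets the 75% minimum
```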
Q: Give an example of when you would want a low test-retest reliability for a written knowledge test.
• Pretest - posttest designs: what you are measuring is not stable.
• If you take the exam on the 1st day of class and then take it again at the end, the scores shouldn't correlate (you learned the material in between).
• When do you want test-retest reliability to be high?
o When you are measuring over the short term.
o Ex: personality should be stable (if you are open to experience today, you will be open to experience next week too).
o Ex: IQ, verbal ability.