psych testing exam 2


t score **************************

T = (z x 10) + 50, where 10 is the set standard deviation and 50 is the set mean
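The formula chains with the z score: a raw score is first standardized, then rescaled to the T scale. A minimal sketch in Python (the raw score and distribution values are invented for illustration):

```python
def z_score(x, mean, sd):
    """Standardize a raw score: how many SD units it lies from the mean."""
    return (x - mean) / sd

def t_score(z):
    """T = (z x 10) + 50: rescale to a mean of 50 and an SD of 10."""
    return z * 10 + 50

# A raw score one standard deviation above the mean:
z = z_score(110, mean=100, sd=10)
print(z)           # 1.0
print(t_score(z))  # 60.0
```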

correlation and inference ***************************

---A coefficient of correlation (or correlation coefficient) is a number that provides us with an index of the strength of the relationship between two things
---Correlation coefficients vary in magnitude between -1 and +1; a correlation of 0 indicates no relationship between two variables
---Positive correlations indicate that as one variable increases or decreases, the other variable follows suit
---Negative correlations indicate that as one variable increases, the other decreases
---Correlation between variables does not imply causation, but it does aid in prediction

reliability estimates (continued) ****************************

--Parallel forms: For each form of the test, the means and the variances of observed test scores are equal
--Alternate forms: Different versions of a test constructed so as to be parallel; they do not meet the strict requirements of parallel forms, but item content and difficulty are similar between tests
--Parallel- and alternate-forms reliability is checked by administering two forms of a test to the same group; scores may be affected by error related to the state of the testtakers (e.g., practice, fatigue) or to item sampling
•Split-half reliability: Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once; entails three steps:
•Step 1 - Divide the test into equivalent halves
•Step 2 - Calculate a Pearson r between scores on the two halves of the test
•Step 3 - Adjust the half-test reliability using the Spearman-Brown formula
•The Spearman-Brown formula allows a test developer or user to estimate internal consistency reliability from a correlation between two halves of a test

sources of error variance

--Test construction - Variation may exist within items in a test or between tests (i.e., item sampling or content sampling)
--Test administration - Sources of error may stem from the testing environment; from testtaker variables such as pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication; and from examiner-related variables such as physical appearance and demeanor
--Test scoring and interpretation - Computer scoring reduces error in test scoring, but many tests still require expert interpretation (e.g., projective tests); subjectivity in scoring can also enter into behavioral assessment
Surveys and polls usually contain a disclaimer as to the margin of error associated with their findings; two sources of that error:
--Sampling error - The extent to which the sample in the study (e.g., the voters polled) actually was representative of the population (e.g., voters in the election)
--Methodological error - Interviewers may not have been trained properly, the wording in the questionnaire may have been ambiguous, or the items may have somehow been biased to favor one or another of the candidates

reliability estimates *****************************

--Test-retest reliability: An estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
--Most appropriate for variables that should be stable over time (e.g., personality) and not appropriate for variables expected to change over time (e.g., mood)
--As time passes, the correlation between the scores obtained on each testing decreases
--With intervals greater than 6 months, the estimate of test-retest reliability is called the coefficient of stability

measures of variability *********************************

--Variability is an indication of the degree to which scores are scattered or dispersed in a distribution
1) Range: Difference between the highest and the lowest scores
2) Interquartile range: Difference between the third and first quartiles of a distribution
3) Semi-interquartile range: The interquartile range divided by 2
4) Average deviation: All the absolute deviation scores are summed and divided by the total number of scores (n)
5) Variance: The arithmetic mean of the squares of the differences between the scores in a distribution and their mean
6) Standard deviation: The square root of the variance; the typical distance of scores from the mean
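All six measures can be computed with the standard library; the score list is invented, and `statistics.quantiles` (Python 3.8+) uses the exclusive method by default, so quartile values may differ slightly from hand calculations:

```python
import statistics

scores = [2, 4, 4, 5, 6, 7, 9, 11]

range_ = max(scores) - min(scores)              # 1) range
q1, q2, q3 = statistics.quantiles(scores, n=4)  # quartiles
iqr = q3 - q1                                   # 2) interquartile range
semi_iqr = iqr / 2                              # 3) semi-interquartile range
mean = statistics.mean(scores)
avg_dev = sum(abs(x - mean) for x in scores) / len(scores)  # 4) average deviation
variance = statistics.pvariance(scores)         # 5) variance
sd = statistics.pstdev(scores)                  # 6) standard deviation = sqrt(variance)
```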

variance and measurement error

-Reliability is the proportion of the total variance attributed to true variance
-Measurement error: All of the factors associated with the process of measuring some variable, other than the variable being measured

extra credit

1) Confidence interval: A range or band of test scores that is likely to contain the true score
2) Calculate the true score: formula -->
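One standard way to estimate a true score and build a confidence band around it uses Kelley's formula together with the standard error of measurement; this is a hedged sketch (all numbers are illustrative, and the 1.96 multiplier assumes a 95% band):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Band of scores likely to contain the true score."""
    e = z * sem(sd, reliability)
    return (observed - e, observed + e)

def estimated_true_score(observed, mean, reliability):
    """Kelley's formula: regress the observed score toward the mean."""
    return reliability * (observed - mean) + mean

lo, hi = confidence_interval(observed=110, sd=15, reliability=0.91)
print(round(lo, 1), round(hi, 1))  # 101.2 118.8
print(round(estimated_true_score(110, mean=100, reliability=0.91), 1))  # 109.1
```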

scales of measurement ***********************

1) Continuous scales: Exist when it is theoretically possible to divide any of the values of the scale
2) Discrete scales: Used to measure a discrete variable
3) Error: The collective influence of all of the factors on a test score beyond those specifically measured by the test
4) Nominal scales: Involve classification or categorization based on one or more distinguishing characteristics; all things measured must be placed into mutually exclusive and exhaustive categories (e.g., apples and oranges, or the various DSM disorders)
5) Ordinal scales: Involve classifications such as nominal scales but also allow rank ordering (e.g., Olympic medalists)
6) Interval scales: Contain equal intervals between numbers; each unit on the scale is exactly equal to any other unit on the scale (e.g., IQ scores and most other psychological measures)
7) Ratio scales: Interval scales with a true zero point (e.g., height or reaction time)

factors affecting utility pt 2

1) Cost: One of the most basic elements of utility analysis is the financial cost associated with a test
--Cost in the context of test utility refers to disadvantages, losses, or expenses in both economic and noneconomic terms
--Economic costs may include purchasing a particular test, a supply of blank test protocols, and computerized test processing
--Other economic costs, such as the cost of not testing or of testing with an inadequate instrument, are more difficult to calculate
--Noneconomic costs include costs related to human life and safety
2) Benefits: We should take into account whether the benefits of testing justify the costs of administering, scoring, and interpreting the test
--Benefits can be defined as profits, gains, or advantages
--Successful testing programs can yield higher worker productivity and profits for a company
--Some potential benefits include an increase in the quality and quantity of workers' performance, a decrease in the time needed to train workers, a reduction in the number of accidents, and a reduction in worker turnover
--Noneconomic benefits may include a better work environment and improved morale
3) Utility analysis:
•A family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment
•Some utility analyses are straightforward, while others are more sophisticated and employ complicated mathematical models
•Utility analyses address the question of "Which test gives us the most bang for the buck?"
•The endpoint of a utility analysis is an educated decision as to which of several alternative courses of action is optimal (in terms of costs and benefits)

Measures of Central Tendency ***************

1) Measure of central tendency: A statistic that indicates the average or midmost score between the extreme scores in a distribution
2) Mean: Sum of the observations (or test scores in this case) divided by the number of observations
3) Median: The middle score in a distribution; it is particularly useful when there are outliers, or extreme scores, in a distribution
4) Mode: The most frequently occurring score in a distribution; when two scores occur with the highest frequency, a distribution is said to be bimodal
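A quick illustration with the standard library; the scores are invented, and the 180 acts as an outlier to show why the median is more robust than the mean:

```python
import statistics

scores = [85, 90, 90, 95, 100, 100, 100, 180]  # 180 is an outlier

print(statistics.mean(scores))    # 105.0 - pulled upward by the outlier
print(statistics.median(scores))  # 97.5  - robust to the outlier
print(statistics.mode(scores))    # 100   - the most frequent score
# statistics.multimode(scores) would list every mode in a bimodal distribution
```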

correlation and inference (continued)

1) Pearson r: A method of computing correlation when both variables are linearly related and continuous
--Once a correlation coefficient is obtained, it needs to be checked for statistical significance (typically at a probability level between .01 and .05)
--By squaring r, one obtains the coefficient of determination, or the variance that the variables share with one another
2) Spearman rho: A method for computing correlation used primarily when sample sizes are small and when the variables are ordinal in nature
a. Scatterplot: Involves simply plotting one variable on the x-axis (the horizontal axis) and the other on the y-axis (the vertical axis)
--Scatterplots of strong correlations feature points tightly clustered together in a diagonal line; for positive correlations, the line extends from bottom left to top right
b. Outlier: An extremely atypical point (case) lying relatively far away from the other points in a scatterplot; a scatterplot can also reveal a curvilinear (nonlinear) relationship between variables
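Pearson r and the coefficient of determination can be computed directly from the definition; the data points below are invented:

```python
import math

def pearson_r(x, y):
    """Pearson r: covariance of x and y over the product of their SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
r = pearson_r(x, y)
print(round(r, 2))       # 0.85 - strength and direction of the linear relation
print(round(r ** 2, 2))  # 0.73 - coefficient of determination: shared variance
```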

the concept of reliability: measurement error

1) Random error: A source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process (i.e., noise)
2) Systematic error: A source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured

measures of variability (continued) ****************************

7) Skewness: The nature and extent to which symmetry is absent in a distribution
8) Positive skew: Relatively few of the scores fall at the high end of the distribution
9) Negative skew: Relatively few of the scores fall at the low end of the distribution
10) Kurtosis: The steepness of a distribution in its center
11) Platykurtic: Relatively flat
12) Leptokurtic: Relatively peaked
13) Mesokurtic: Somewhere in the middle
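A moment-based skewness check, as a sketch (the score lists are invented; a positive value indicates a tail toward the high end of the distribution, a negative value a tail toward the low end):

```python
import statistics

def skewness(scores):
    """Sample skewness g1 = m3 / m2**1.5 (moment-based)."""
    n = len(scores)
    mean = statistics.fmean(scores)
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5

# A long tail toward high scores gives positive skew, and vice versa:
print(skewness([1, 2, 2, 3, 10]) > 0)  # True (positively skewed)
print(skewness([1, 8, 8, 9, 10]) < 0)  # True (negatively skewed)
```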

Decision Theory and Utility by Cronbach and Gleser (1965) ********** also false pos & neg

Cronbach and Gleser (1965) presented:
--A classification of decision problems
--Various selection strategies ranging from single-stage processes to sequential analyses
--A quantitative analysis of the relationship between test utility, the selection ratio, cost of the testing program, and expected value of the outcome
--A recommendation that in some instances job requirements be tailored to the applicant's ability instead of the other way around (adaptive treatment)
Two kinds of misses:
1) False positive: the person does not have the trait, but the test says they do
2) False negative: the person has the trait, but the test says they don't
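The hit/miss terminology can be made concrete with a small tally; the counts below are invented for illustration:

```python
# Each pair: (test says trait present, person actually has the trait)
results = ([(True, True)] * 40 + [(False, False)] * 45 +
           [(True, False)] * 10 + [(False, True)] * 5)

hits      = sum(pred == actual for pred, actual in results)   # correct calls
false_pos = sum(pred and not actual for pred, actual in results)
false_neg = sum(actual and not pred for pred, actual in results)

hit_rate  = hits / len(results)                    # proportion correctly identified
miss_rate = (false_pos + false_neg) / len(results) # proportion misclassified
print(hit_rate, miss_rate)  # 0.85 0.15
```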

discriminant analysis

A family of statistical techniques used to shed light on the relationship between identified variables (such as scores on a battery of tests) and two (and in some cases more) naturally occurring groups (such as persons judged to be successful at a job and persons judged unsuccessful at a job)

histogram

A histogram is a graph with vertical lines drawn at the true limits of each test score (or class interval), forming a series of contiguous rectangles

multiple hurdles

Achievement of a particular cut score on one test is necessary in order to advance to the next stage of evaluation in the selection process (e.g., Miss America contest)

meta-analysis

--Allows researchers to look at the relationship between variables across many separate studies
--A family of techniques used to statistically combine information across studies to produce single estimates of the data under study
***The estimates are in the form of effect size, which is often expressed as a correlation coefficient

characteristics of criterion **********************************

--An adequate criterion should be relevant for the matter at hand, valid for the purpose for which it is being used, and uncontaminated, meaning it is not based on predictor measures
--Validity coefficient: A correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure
--Validity coefficients are affected by restriction or inflation of range
--Incremental validity: The degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use

bar graph

Bar graph: Numbers indicative of frequency appear on the Y-axis, and reference to some categorization (e.g., yes/ no/ maybe, male/female) appears on the X-axis

validity and test bias

--Bias: A factor inherent in a test that systematically prevents accurate, impartial measurement; bias implies systematic variation in test scores
--Prevention during test development is the best cure for test bias
--Rating error: A judgment resulting from the intentional or unintentional misuse of a rating scale; raters may be too lenient, too severe, or reluctant to give ratings at the extremes (central tendency error)
--Halo effect: A tendency to give a particular person a higher rating than he or she objectively deserves because of a favorable overall impression
--Fairness: The extent to which a test is used in an impartial, just, and equitable way

validity in 3 categories

1. Content validity - A measure of validity based on an evaluation of the subjects, topics, or content covered by the items in the test
2. Criterion-related validity - A measure of validity obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures
3. Construct validity - A measure of validity arrived at by executing a comprehensive analysis of:
a. How scores on the test relate to other test scores and measures
b. How scores on the test can be understood within some theoretical framework for understanding the construct that the test was designed to measure

content validity

--Content validity: A judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample
--Do the test items adequately represent the content that should be included in the test?
--Test blueprint: A plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, and the organization of the items in the test

criterion related validity

--Criterion-related validity: A judgment of how adequately a test score can be used to infer an individual's most probable standing on some measure of interest (i.e., the criterion)
--Concurrent validity: An index of the degree to which a test score is related to some criterion measure obtained at the same time (concurrently)
--Predictive validity: An index of the degree to which a test score predicts some criterion, or outcome, measure in the future; tests are evaluated as to their predictive validity
•A criterion is the standard against which a test or a test score is evaluated

known group method (method for setting cut scores)

--Entails collection of data on the predictor of interest from groups known to possess, and not to possess, a trait, attribute, or ability of interest
--Based on an analysis of these data, a cut score is chosen on the test that best discriminates between the groups' test performance
--There is no standard set of guidelines for choosing contrasting groups

face validity*********************

Face validity: A judgment concerning how relevant the test items appear to be If a test appears to measure what it purports to measure "on the face of it," it could be said to be high in face validity

frequency polygon

Frequency polygon: Test scores or class intervals (as indicated on the X-axis) meet frequencies (as indicated on the Y-axis)

IRT Based Methods (method for setting cut scores)

--In an IRT framework, each item is associated with a particular level of difficulty
--In order to "pass" the test, the testtaker must answer items that are deemed to be above some minimum level of difficulty, which is determined by experts and serves as the cut score

construct validity

--Judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a construct
--If a test is a valid measure of a construct, then high scorers and low scorers should behave as theorized
--All types of validity evidence, including evidence from the content- and criterion-related varieties of validity, come under the umbrella of construct validity

angoff method (method for setting cut scores)*****************

--Judgments of experts are averaged to yield cut scores for the test
--Can be used for personnel selection based on traits, attributes, and abilities
--Problems arise if there is disagreement between experts

fixed-cut scores

Made on the basis of having achieved a minimum level of proficiency on a test (e.g., a driving license exam)

psychological measurement

Most psychological measures are truly ordinal but are treated as interval measures for statistical purposes

reliability estimates (3)

Other methods of estimating internal consistency:
--Inter-item consistency: The degree of relatedness of items on a scale; helps gauge the homogeneity of a test
--Kuder-Richardson formula 20 (KR-20): Statistic of choice for determining the inter-item consistency of dichotomous items
--Coefficient alpha: Mean of all possible split-half correlations, corrected by the Spearman-Brown formula; the most popular approach to internal consistency, with values ranging from 0 to 1
--Average proportional distance (APD): Focuses on the degree of difference between scores on test items; involves averaging the difference between scores on all of the items and dividing by the number of response options on the test minus 1
Measures of inter-scorer reliability:
--Inter-scorer reliability: The degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure; often used with behavioral measures; guards against biases or idiosyncrasies in scoring
--Coefficient of inter-scorer reliability: The scores from different raters are correlated with one another
The nature of the test will often determine the reliability metric; some considerations include whether:
--The test items are homogeneous or heterogeneous in nature
--The characteristic, ability, or trait being measured is presumed to be dynamic or static
--The range of test scores is or is not restricted
--The test is a speed or a power test
--The test is or is not criterion-referenced
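Coefficient alpha can be sketched with the standard library; the dichotomous item data below are invented, and with 0/1 items alpha coincides with KR-20:

```python
import statistics

def cronbach_alpha(item_scores):
    """item_scores: list of rows, one row of item scores per testtaker."""
    k = len(item_scores[0])                  # number of items
    columns = list(zip(*item_scores))        # one column per item
    item_vars = sum(statistics.pvariance(col) for col in columns)
    totals = [sum(row) for row in item_scores]
    total_var = statistics.pvariance(totals)
    return (k / (k - 1)) * (1 - item_vars / total_var)

data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # dichotomous items
print(cronbach_alpha(data))  # 0.75
```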

factors affecting utility

Psychometric soundness - The higher the criterion-related validity of test scores, the higher the utility of the test. Exceptions exist, as many factors may enter into an estimate of a test's utility, and there are variations in the ways in which utility is determined. Valid tests are not always useful tests.

the concept of reliability ****************************test

--Reliability: Consistency in measurement
--The reliability coefficient is an index of reliability: a proportion that indicates the ratio between the true score variance on a test and the total variance
--Observed score = true score plus error (X = T + E)
--Error refers to the component of the observed score that does not have to do with a testtaker's true ability or the trait being measured
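The variance decomposition implied by X = T + E can be shown with made-up numbers: observed-score variance splits into true-score variance plus error variance, and reliability is the true-score share.

```python
# Hypothetical variance components for a test:
true_var  = 80.0
error_var = 20.0
total_var = true_var + error_var  # X = T + E, so variances add

reliability = true_var / total_var  # proportion of total variance that is true
print(reliability)  # 0.8
```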

the standard error of measurement ***********************

--Standard error of measurement (SEM): Provides a measure of the precision of an observed test score; an estimate of the amount of error inherent in an observed score or measurement
--The higher the reliability of the test, the lower the standard error
--Standard error can be used to estimate the extent to which an observed score deviates from a true score
--Confidence interval: A range or band of test scores that is likely to contain the true score
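The inverse relationship between reliability and SEM follows the standard formula SEM = SD * sqrt(1 - r); a sketch with an illustrative SD of 15 and made-up reliabilities:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Higher reliability -> lower standard error (SD = 15 throughout):
for r in (0.75, 0.91, 0.99):
    print(round(standard_error_of_measurement(15, r), 2))  # 7.5, 4.5, 1.5
```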

what is the relationship between reliability and standard error

The two are inversely related: the higher the reliability of a test, the lower its standard error of measurement. Reliability is the proportion of the total variance attributable to true score variance, while the standard error of measurement estimates the amount of error inherent in an observed score. Because observed score = true score plus error (X = T + E), a more reliable test contains less error variance, so observed scores deviate less from true scores and the confidence interval (the band of scores likely to contain the true score) is narrower.

Taylor-Russell Tables

--Taylor-Russell tables provide an estimate of the percentage of employees hired by the use of a particular test who will be successful at their jobs, given different combinations of three variables: the test's validity, the selection ratio used, and the base rate
--Here, validity refers to the validity coefficient; selection ratio refers to a numerical value that reflects the relationship between the number of people to be hired and the number of people available to be hired; and base rate refers to the percentage of people hired under the existing system for a particular position
•Expectancy data - The likelihood that a testtaker will score within some interval of scores on a criterion measure

normal curve ***************************************

--The normal curve is a bell-shaped, smooth, mathematically defined curve that is highest at its center and perfectly symmetrical
--Area under the normal curve: the curve can be conveniently divided into areas defined by units of standard deviation
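The areas within 1, 2, and 3 standard deviations of the mean can be checked with `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

nd = NormalDist(mu=0, sigma=1)  # the standard normal curve

# Percentage of the area within k standard deviations of the mean:
for k in (1, 2, 3):
    area = nd.cdf(k) - nd.cdf(-k)
    print(round(area * 100, 1))  # 68.3, 95.4, 99.7
```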

sources of variance in a hypothetical test

The purpose of a reliability estimate will vary depending on the nature of the variables being studied; if the purpose is to break down error variance into its constituent parts, a number of tests would be used

the standard error of the difference

--The standard error of the difference: A measure that can aid a test user in determining how large a difference in test scores should be before it is considered statistically significant
--It can be used to address three types of questions:
1. How did this individual's performance on test 1 compare with his or her performance on test 2?
2. How did this individual's performance on test 1 compare with someone else's performance on test 1?
3. How did this individual's performance on test 1 compare with someone else's performance on test 2?
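The standard error of the difference combines the two scores' standard errors of measurement; a sketch assuming both tests are on the same scale (SD 15) with illustrative reliabilities:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement for one test."""
    return sd * math.sqrt(1 - reliability)

def standard_error_of_difference(sd, r1, r2):
    """sqrt(SEM1**2 + SEM2**2); equivalently sd * sqrt(2 - r1 - r2)
    when both tests share the same standard deviation."""
    return math.sqrt(sem(sd, r1) ** 2 + sem(sd, r2) ** 2)

sed = standard_error_of_difference(sd=15, r1=0.90, r2=0.85)
# A score difference larger than about 1.96 * SED is significant at the .05 level:
print(round(1.96 * sed, 1))  # 14.7
```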

true score models vs alternatives****************************

--The true-score model is often referred to as classical test theory (CTT); it is the most widely used model because of its simplicity
--True score: A value that, according to classical test theory, genuinely reflects an individual's ability (or trait) level as measured by a particular test
--CTT assumptions are more readily met than those of item response theory (IRT)
--A problematic assumption of CTT has to do with the equivalence of items on a test
--Domain sampling theory: Estimates the extent to which specific sources of variation under defined conditions contribute to the test score
--Generalizability theory: Based on the idea that a person's test scores vary from testing to testing because of variables in the testing situation
•Instead of conceiving of variability in a person's scores as error, Cronbach encouraged test developers and researchers to describe the details of the particular test situation, or universe, leading to a specific test score
•This universe is described in terms of its facets, including the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration
--Item response theory (IRT): Provides a way to model the probability that a person with X ability will be able to perform at a level of Y
•IRT refers to a family of methods and techniques used to distinguish specific approaches
•IRT incorporates considerations of an item's level of difficulty and discrimination
•Difficulty relates to an item not being easily accomplished, solved, or comprehended
•Discrimination refers to the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or other variable being measured

multiple cut scores

The use of multiple cut scores for a single predictor (e.g., students may achieve grades of A, B, C, D, or F)

utility

Utility: The usefulness or practical value of testing to improve efficiency

the concept of validity

--Validity: A judgment or estimate of how well a test measures what it purports to measure in a particular context
--Validation: The process of gathering and evaluating evidence about validity; both test developers and test users may play a role in the validation of a test
--Local validation: Test users may validate a test with their own group of testtakers

describing data

a. Distribution: A set of test scores arrayed for recording or study
b. Raw score: A straightforward, unmodified accounting of performance that is usually numerical
c. Frequency distribution: All scores are listed alongside the number of times each score occurred

Naylor Shine Tables

--Naylor-Shine tables help obtain the difference between the means of the selected and unselected groups, to derive an index of what the test (or some other tool of assessment) is adding to already established procedures
--For both the Taylor-Russell and Naylor-Shine tables, the validity coefficient comes from concurrent validation procedures
--Many other variables may play a role in selection decisions, including applicants' minority status, general physical or mental health, or drug use

base rate************

is the extent to which a particular trait, behavior, characteristic, or attribute exists in the population

Brogden-Cronbach-Gleser formula

is used to calculate the dollar amount of a utility gain resulting from the use of a particular selection instrument under specified conditions
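A hedged sketch of the formula's typical form: productivity gain of (number selected) x (tenure in years) x (validity coefficient) x (SD of job performance in dollars) x (mean standard test score of those selected), minus the total cost of testing all applicants. The variable roles and every number below are illustrative assumptions, not the authoritative formula:

```python
def bcg_utility_gain(n_hired, tenure_years, validity, sd_performance_dollars,
                     mean_z_hired, n_applicants, cost_per_applicant):
    """Illustrative Brogden-Cronbach-Gleser-style utility gain in dollars."""
    gain = n_hired * tenure_years * validity * sd_performance_dollars * mean_z_hired
    cost = n_applicants * cost_per_applicant  # cost of testing every applicant
    return gain - cost

# 10 hires staying 2 years, validity .40, SDy $10,000, mean z of hires 1.0,
# 100 applicants tested at $50 each:
print(bcg_utility_gain(10, 2, 0.40, 10000, 1.0, 100, 50))  # 75000.0
```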

miss rate ***************

may be defined as the proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute

cut scores

may be relative, which implies that they are determined in reference to normative data (e.g., selecting people in the top 10% of test scores)

shapes that frequency distributions make ******************

--Normal
--Bimodal
--Positive skew
--Negative skew
--J-shaped
--Rectangular
Skewness refers to distortion or asymmetry in a set of data relative to a symmetrical bell curve (normal distribution). If the curve is shifted to the left or to the right, it is said to be skewed; skewness can be quantified as a representation of the extent to which a given distribution varies from a normal distribution.

utility gain

refers to an estimate of the benefit (monetary or otherwise) of using a particular test or selection method

hit rate******************

the proportion of people a test accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute

coefficient of determination (extra credit)******

the variance that the variables share with one another

z score **************************

z = (X - mean) / standard deviation

standard scores **********************

•A standard score is a raw score that has been converted from one scale to another scale, where the latter scale has some arbitrarily set mean and standard deviation
a. Z score: Results from the conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution
b. T score: Can be called a "fifty plus or minus ten" scale, that is, a scale with a mean set at 50 and a standard deviation set at 10
c. Stanine: A standard score scale with a mean of 5 and a standard deviation of approximately 2, divided into nine units
d. Normalizing a distribution: Involves "stretching" the skewed curve into the shape of a normal curve and creating a corresponding scale of standard scores
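Stanines can be approximated from z scores with a linear rule (round z x 2 + 5 and clip to the nine units); note this is a common approximation, not the percentile-band definition:

```python
def stanine(z):
    """Approximate stanine: mean 5, SD ~2, clipped to the units 1-9."""
    return min(9, max(1, round(z * 2 + 5)))

print(stanine(0.0))   # 5 (an average score)
print(stanine(1.5))   # 8
print(stanine(-3.0))  # 1 (clipped to the lowest unit)
```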

evidence of construct validity ******************************

•Evidence of homogeneity - How uniform a test is in measuring a single concept
•Evidence of changes with age - Some constructs are expected to change over time (e.g., reading rate)
•Evidence of pretest-posttest changes - Test scores change as a result of some experience between a pretest and a posttest (e.g., therapy)
•Evidence from distinct groups - Scores on a test vary in a predictable way as a function of membership in some group
•Convergent evidence: Scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established tests designed to measure the same (or a similar) construct
•Discriminant evidence: A validity coefficient showing little relationship between test scores and/or other variables with which scores on the test should not theoretically be correlated
•Factor analysis: A class of mathematical procedures designed to identify specific variables on which people may differ

