Psych 309 Test
What is the difference between structured and projective personality tests?
structured: you are given a definite stimulus and a fixed set of possible responses (e.g., a personality test with a Likert scale). projective: you are given something vague and ambiguous and project meaning onto it.
Define psychological testing and psychological assessment. How are they different?
testing: how we obtain information. assessment: gathering the information from tests and other sources and integrating it to come up with a diagnosis.
What is psychometry? What are the two major properties of psychometry?
Psychometry is the branch of psychology that deals with psychological tests. Its two major properties are validity and reliability.
What are norm- and criterion referenced tests? How is each unique?
criterion-referenced test: measures performance against an established criterion; predicts a performance outcome outside of the test. norm-referenced test: compares a test taker's performance with others'; the question can be answered many ways, e.g., class standing, ranking, or percentile rank.
What types of questions are answered by psychologists through assessment?
Applied-research questions: how we can use a test in a clinical setting for diagnosis and treatment, for monitoring progress, and for helping clients make more effective life choices.
In what settings do psychologists assess and what is their primary responsibility in each?
Everywhere: clinics, schools, and workplaces. In each setting, the primary responsibility is to assess the client accurately and to help the client.
What are the three properties of scales that make scales different from one another? Describe each
magnitude: one score can be more or less than another. equal intervals: the distance from zero to one is the same as from one to two. absolute zero: the measure has a true zero point at which none of the property exists.
Know the four scales of measurement and be able to differentiate between these scales
nominal: values symbolize category membership; they cannot be ranked, added, subtracted, or divided. ordinal: ranking is possible, but the intervals between ranks are not equal and there is no true zero. interval: intervals are equal, but there is no true zero. ratio: intervals are equal and a true zero exists.
Think of concrete examples of each of the different scales of measurement.
nominal: boys vs. girls. ordinal: gold, silver, and bronze. interval: temperature, intelligence scores. ratio: weight, age, height.
Define frequency distribution and histogram. What kind of data are shown in each?
histogram: shows the number of cases for each value; the x-axis spans the range of observed values in equal intervals, and the column height on the y-axis tells you how many times each value was observed. Used for numerical (interval) data. frequency distribution: an overview of all distinct values and the frequency with which each occurs; mostly used for summarizing categorical data.
Be able to define, recognize, and differentiate between states and traits
states: a specific condition in the here and now (e.g., a state of fear). traits: enduring characteristics of a person.
Understand the concept of percentiles.
Percentiles divide the total frequency for a set of observations into hundredths, marking specific scores or points within a distribution. Often used in measuring child development (height and weight).
Define central tendency. Know the three types of central tendency and how to calculate each.
mean: the average; easily affected by outliers. median: the middle score (the 50th percentile). mode: the most common score.
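A minimal sketch of all three calculations using Python's standard library; the scores are hypothetical:

```python
import statistics

scores = [2, 3, 3, 4, 5, 9]          # hypothetical test scores

mean = statistics.mean(scores)       # (2+3+3+4+5+9)/6 = 4.33; pulled up by the outlier 9
median = statistics.median(scores)   # middle of the sorted scores = 3.5
mode = statistics.mode(scores)       # most frequent value = 3

print(mean, median, mode)
```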
Know the advantages and disadvantages of the different measures of central tendency and when to use them.
Mean: pros - the most appropriate index of central tendency; takes every score into account ("most democratic"); an interval-level statistic, which allows more options than the median or mode. cons - outliers and a long tail pull the mean, skewing it. Median: pros - not affected by outliers or skewed distributions. cons - "undemocratic," because all scores other than the middle one are ignored; an ordinal-level statistic, subject to the limitations of ordinal statistics. Mode: pros - can be an interesting statistic; lets you describe the distribution's shape regardless of the level of measurement. cons - a nominal-level statistic, subject to those limitations; extreme scores (outliers) can have the highest rate of occurrence in a distribution.
Define variance and standard deviation.
variance: measures variability; the average of the squared deviations around the mean. standard deviation: the positive square root of the variance; not literally an average deviation, but it approximates how far a typical score falls above or below the mean. The sample standard deviation is most often used to estimate the population's standard deviation.
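A sketch of both calculations with hypothetical scores; note that the standard-library functions use the sample (n - 1) denominator, the usual estimate of the population value:

```python
import statistics

scores = [4, 8, 6, 5, 3, 7]

var = statistics.variance(scores)   # sample variance: squared deviations / (n - 1)
sd = statistics.stdev(scores)       # positive square root of the variance

print(var, sd)
```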
Understand Normal Distribution conceptually.
In a normal distribution, roughly 95% of people fall within 2 standard deviations of the mean; beyond that are the extremes. Related concepts: central tendency, interquartile range, standard deviation, z scores.
Define skewness and be able to identify positive and negative skew.
Skewness is an index of the degree to which symmetry is absent. positive skew: the tail points to the right, the positive side. negative skew: the tail points to the left. A normal distribution has a skewness of zero.
Define kurtosis and be able to identify its different types, including leptokurtic, platykurtic, and mesokurtic.
Kurtosis asks, "how normal is your bell curve?" It is an index of the "peakedness" vs. "flatness" of a distribution. leptokurtic: the peak is very high and the curve is narrow; scores cluster close to the central tendency (positive kurtosis, more peaked). platykurtic: little difference among scores; the curve sits close to the ground like a platypus bill (negative kurtosis, flatter distribution). mesokurtic: a normal-looking bell curve (kurtosis of zero).
What is a z score? How is it calculated?
A z score is the number of standard deviations a score falls from the mean: z = (X - M) / S, where X is the raw score, M is the sample mean, and S is the sample standard deviation.
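A worked example of the formula with hypothetical values:

```python
x = 85   # raw score
m = 70   # sample mean
s = 10   # sample standard deviation

z = (x - m) / s   # 1.5 standard deviations above the mean
print(z)          # 1.5
```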
How are T scores different from Z scores?
T score: mean of 50, SD of 10, with effective upper and lower limits; anything more than 2 SD from the mean is considered significant. z score: mean of 0, SD of 1, no upper or lower limits. A z score converts to a T score via T = 50 + 10z.
What are quartiles? What is the interquartile range?
quartiles: points that divide the frequency distribution into equal fourths. Q1 = 25th percentile (1/4 of cases below), Q2 = 50th percentile (the median; 1/2 of cases below), Q3 = 75th percentile (3/4 of cases below). interquartile range: the scores bounded by the 25th and 75th percentiles, i.e., the middle 50%.
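A sketch of the quartile cut points and IQR on hypothetical data, using the standard library:

```python
import statistics

data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 10, 12]

q1, q2, q3 = statistics.quantiles(data, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1                                 # spread of the middle 50%

print(q1, q2, q3, iqr)
```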
Define achievement, aptitude, and intelligence testing.
Achievement: assesses what a person has learned following a specific course of instruction; measures what the person has actually acquired or done with their potential. aptitude: attempts to evaluate a student's potential rather than how much the student has already learned; typically predicts potential in a specific area such as math, science, or music. intelligence: measures general ability and attempts to predict future performance.
Define norm, norming, and standardization. For what is each used?
standardizing a test: developing specific, standardized procedures for administering, scoring, and interpreting the test. norming: the process of creating norms; mapping the bell curve and how scores distribute so you know what is typical. norm: the end product; once we have a norm, we can administer the test and interpret the results against it.
Define and differentiate between norm-referenced and criterion-referenced tests
criterion-referenced: the test relates scores to some established unit of measure (the criterion). norm-referenced: scores are percentage rankings compared with a reference population; norms include age norms, grade norms, national norms, subgroup norms, and local norms.
To avoid bias, how should error be distributed in a psychological test?
Error should be unsystematic and random.
What are the five characteristics of a good theory?
1. explanatory power: explains patterns of behavior that we know or expect to exist, with accuracy and precision. 2. broad scope: applies to a broad range of phenomena. 3. systematic: internally consistent and coherent. 4. generative (or fruitful): stimulates new research and predicts useful patterns. 5. parsimonious: explains and predicts phenomena using the minimum variables, concepts, and propositions needed to perform the task.
What is a scatterplot (scatter diagram)? How does it work?
A scatterplot graphs paired observations: each case is plotted as a point with its X value on one axis and its Y value on the other. The shape of the resulting cloud shows the direction and strength of the relationship between the two variables.
What is the Correlation Coefficient? With what concept should correlation not be confused?
Pearson's correlation coefficient (r): covariance in standardized units; a numerical measure of the statistical relationship between two variables. Correlation should not be confused with causation.
Understand and be able to differentiate and plot positive, negative, and 0 correlation
positive correlation: as X increases, Y increases; the points slope upward. negative correlation: as X increases, Y decreases; the points slope downward. zero correlation: no linear relationship; the points form a shapeless cloud.
What is the principle of least squares? How does it relate to the regression line?
The principle of least squares defines the "best choice" for a line through the data: it minimizes the squared deviations (residuals), and in doing so determines the line of best fit, i.e., the regression line.
Define covariance
Covariance is the degree to which two variables (X and Y) move, change, or vary together: as one value changes, the other changes in the same or the opposite direction. To put the two variables on the same metric, convert X and Y to z scores.
What is the Pearson product moment correlation? What meaning do the values -1.0 to 1.0 have?
The standardized covariance of X and Y. The closer r is to +1, the more the variables covary in the same direction (as one increases, the other increases); the closer to -1, the more one gets higher as the other gets lower; 0 means no linear relationship.
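A sketch computing covariance by hand and then standardizing it into Pearson r; the data are hypothetical:

```python
import statistics

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

mx, my = statistics.mean(x), statistics.mean(y)
n = len(x)

# sample covariance: average cross-product of deviations (n - 1 denominator)
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

# dividing by both SDs puts covariance in standardized units -> r, bounded by -1 and +1
r = cov / (statistics.stdev(x) * statistics.stdev(y))

print(cov, r)
```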
If a test is reliable its results are what?
Consistent.
Define residual.
The error in your prediction: the difference between the predicted score and the observed score.
What is the standard error of estimate? What is its relationship to the residuals?
An index of the accuracy of prediction: the standard deviation of the residuals. It tells you how far off your predictions were from the actual measurements, i.e., the average distance of the data from the regression line.
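A sketch of the relationship, assuming hypothetical observed scores and the predictions from some already-fitted line:

```python
import math

observed = [3.0, 5.0, 4.0, 6.0, 8.0]
predicted = [2.5, 4.5, 4.5, 6.5, 7.5]   # Y' values from a fitted regression line

# residual = observed - predicted, one per case
residuals = [o - p for o, p in zip(observed, predicted)]

# SEE is essentially the SD of the residuals; the n - 2 denominator reflects
# the two estimated parameters (intercept and slope)
n = len(observed)
see = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(residuals, see)
```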
What is shrinkage?
A good test has minimal shrinkage. When you replicate a study and apply the model to a different population, it won't work quite as well; shrinkage is that drop in predictive accuracy.
What is restricted range? To what does it lead?
Restricted range ties in with ceiling and floor effects: when scores cluster around one point, variance is restricted, which alters (weakens) the correlation coefficient. Example: the GRE looks like a poor predictor because programs admit only people at the top of the GRE range. Likewise, comparing religiosity and blood pressure at BYU is a bad design because everyone is highly religious, so there is no group to compare. Restricted range leads to reduced variance, weaker correlations, and less accurate conclusions.
What is factor analysis?
A data-reduction technique. It identifies the underlying factors that drive responses to your items: factors are groups of variables, and the analysis reduces a large number of variables to a few meaningful factors.
What is the co-efficient of determination? What is the purpose of the co-efficient of determination?
R squared (R² = r²). It tells you how much of the variation in Y is explained and accounted for by X. You want a high r to get a high R².
Know the different types of correlations and when they are used.
Pearson r for interval- or ratio-level data; Spearman's rho for ordinal (ranked) data; point-biserial when one variable is dichotomous (e.g., an item score) and the other is continuous (e.g., a total test score).
What is the regression formula? Understand the different components of the formula and how they are applied.
Y' = a + bX, where Y' (Y prime) is the predicted Y value, a is the intercept, b is the slope, and X is the predictor value.
What is the difference between simple linear regression and multiple regression?
simple linear regression: Y' = a + bX, with one independent variable (X) and one dependent variable (Y). multiple regression: Y' = a + b1X1 + b2X2 + ..., with two or more independent variables.
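A sketch of the least-squares estimates for Y' = a + bX, computed directly from the deviation cross-products; the data are hypothetical:

```python
import statistics

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

mx, my = statistics.mean(x), statistics.mean(y)

# slope: covariance of x and y divided by the variance of x
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)

# intercept: the line of best fit passes through the point of means
a = my - b * mx

y_pred = [a + b * xi for xi in x]   # predicted Y values on the regression line
print(a, b, y_pred)
```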
What is reliability?
Consistency, stability, and accuracy of assessment results.
What are test batteries?
A series or group of tests used together to make an assessment; you would never base an assessment on a single test. fixed battery: the same set of tests is used every time. flexible battery: the set of tests can vary.
What contributes to measurement error?
Test construction, test administration, and test interpretation and scoring.
What components make up Classical Test Score Theory?
Errors of measurement are random, and we assume there will be error whenever we try to measure something. Because of this, the observed score differs from the true score: Observed Score = True Score + Error (OS = TS + E).
Know what an observed score is.
The observed score is the true score plus the measurement error that is always involved: OS = TS + E.
In what ways can error impact the observed score?
Because error is random, it can push the observed score above or below the true score, so any single observed score may overestimate or underestimate what it is meant to measure.
Test reliability is usually estimated in one of what three ways? Know the major concepts in each way.
test-retest: the same test given at two different time points. parallel forms: the same test in two different forms. internal consistency: measures the homogeneity with which a test measures a construct (do all items measure the same thing?); assessed using split-half, Spearman-Brown, KR-20, or Cronbach's alpha. A related approach, interrater reliability, checks consistency between two or more researchers using the kappa coefficient, an index of agreement between observers that shows how much variance is explained by the raters versus actual performance.
What is a carryover effect?
Someone who took the test at time 1 may remember the questions when they take it at time 2. Because of this, test-retest usually overestimates (inflates) the true reliability.
Define parallel/alternate forms reliability. What are its advantages and disadvantages?
The correlation between equivalent forms of a test: one hypothetical construct measured by two different tests, so you have two different ways to get the information. Advantage: you get good information. Disadvantages: it is very rigorous and hard to create two different forms of the same test; it can be expensive and difficult. Example: for depression, a self-report scale and an observational scale measuring the same construct. Use Pearson r for interval- or ratio-level data, Spearman for ordinal (ranked) data.
Define split half reliability. How is this measured?
split-half: split the test in half, give the first half, then the second, and compare the scores. The resulting reliability is deflated, because half a test can never be as reliable as the whole; to correct for this, use the Spearman-Brown formula, which estimates the internal consistency of the full-length test.
How do the different aspects of internal consistency differ?
alpha (Cronbach's coefficient alpha): compares all the items with each other; the general-purpose reliability statistic for items without right/wrong answers, such as a Likert scale. KR-20: for dichotomous, true/false (two-option) items. Spearman-Brown: measures internal consistency, suited for a test that has been split in half.
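A sketch of all three statistics computed by hand; the item-response matrix is hypothetical (rows = test takers, columns = items scored 0/1, which is what KR-20 assumes):

```python
import statistics

items = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
k = len(items[0])
totals = [sum(row) for row in items]
total_var = statistics.pvariance(totals)

# Cronbach's alpha: general-purpose; works for Likert-type items too
item_vars = [statistics.pvariance([row[j] for row in items]) for j in range(k)]
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# KR-20: the same idea restricted to dichotomous items, using p*q per item
# (for 0/1 data, p*q equals the item variance, so KR-20 matches alpha here)
pq = []
for j in range(k):
    p = sum(row[j] for row in items) / len(items)
    pq.append(p * (1 - p))
kr20 = (k / (k - 1)) * (1 - sum(pq) / total_var)

# Spearman-Brown: steps a half-test correlation up to full-test length
def spearman_brown(r_half):
    return 2 * r_half / (1 + r_half)

print(alpha, kr20, spearman_brown(0.70))
```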
Understand the major components of inter-rater reliability.
An index of agreement for two or more judges: the consistency between different researchers recording the same measures, quantified with kappa scores.
Define standardization. Why is it important to obtain a standardization sample?
Standardization is administering a test under standard conditions: precisely the same instructions and conditions for everyone. This allows you to compare the results from a new sample with the standardization sample and see whether they fall above or below it. It is not appropriate to compare an individual with a group that does not share the individual's characteristics or that was not tested under standardized conditions.
What is the Kappa statistic and how does it relate to reliability?
Kappa is the best method for assessing observer/assessor agreement: compare two observers' scores, and the closer kappa is to +1, the better the agreement (if it is low, you go again). It ranges from -1 to +1; you are hoping for something close to positive 1.
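A sketch of Cohen's kappa for two raters assigning hypothetical yes/no codes; kappa corrects the observed agreement for the agreement expected by chance:

```python
rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
rater2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
n = len(rater1)

# observed agreement: proportion of cases the raters coded identically
p_o = sum(a == b for a, b in zip(rater1, rater2)) / n

# chance agreement: probability both raters pick a category independently
categories = set(rater1) | set(rater2)
p_e = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)

kappa = (p_o - p_e) / (1 - p_e)
print(kappa)   # ranges -1 to +1; closer to +1 = better agreement
```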
Know the Summary of Reliability Table from lecture
See the Summary of Reliability table on the last posted lecture slides; know the sources of error variance for each method.
What does the standard error of measurement do?
It tells us, on average, how much an observed score varies from the true score (the score you would get in a perfect world, without error).
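A sketch of the standard formula, SEM = SD × √(1 − reliability), with hypothetical values:

```python
import math

sd = 15.0           # the test's standard deviation
reliability = 0.91  # e.g., a test-retest or alpha coefficient

sem = sd * math.sqrt(1 - reliability)  # typical spread of observed scores
print(sem)                             # around the true score: 4.5
```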
What factors should be considered when choosing a reliability coefficient?
Know the names of each coefficient and match them to the type of data (ordinal, interval, ratio): Spearman-Brown for split-half, KR-20 for dichotomous items, alpha for the general case.
What types of irregularities might make reliability coefficients biased or invalid?
Researcher bias, observer bias, and inconsistent scoring criteria.
How can one address/improve low reliability?
Increase the length of the test; throw out items that bring reliability down (a discriminability analysis examines the correlation between each item and the total test score — if it is low, the item is measuring something else); estimate what the true correlation would be if the test had no measurement error; and use a bigger sample, since a bigger sample gives better reliability estimates.
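For the "increase the length of the test" strategy, the standard projection is the Spearman-Brown prophecy formula (an assumption here — lecture named Spearman-Brown only for split-half); a sketch with hypothetical values:

```python
def prophecy(r, k):
    """Predicted reliability if the test were made k times longer."""
    return k * r / (1 + (k - 1) * r)

print(prophecy(0.60, 2))   # doubling a .60-reliable test -> 0.75
```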
What is the purpose of factor and item analysis?
factor analysis: finds the factors associated with your items, so you can interpret scores. item analysis: evaluates how good your individual items are. Both help you know how accurately you are measuring what you intend to measure.
What example was given in class regarding reliability?
verbal fluency: how many words you can think of beginning with a given letter — the score depends on the letter. bullseye example: shots clustered together but away from the bullseye are reliable but not valid; shots clustered on the bullseye are both reliable and valid.
What are the stages of test development?
step 1: test conceptualization (preliminary questions: what are your constructs, audience, and objectives?). step 2: test construction (scaling methods, scoring methods, writing items). step 3: test tryout. step 4: item analysis (a good test should be reliable, valid, discriminating, and unbiased). step 5: test revision.
Define and know examples of incremental validity and ecological validity.
incremental validity: does the test add something new to the existing literature? (e.g., a "love of cheese" scale may be new but not useful). ecological validity: does it apply to the real world?
Define representative sample and stratified sample. Know when and why representative and stratified samples are collected.
representative sample: a randomized sample that you compare to the population. stratified sample: sampling to meet specific criteria for subgroups. Example: the population of LA would look different from the population at BYU, so you would sample different percentages of each ethnicity to match the population of interest.
Define dichotomous and polytomous format. Common examples? Advantages? Disadvantages?
dichotomous format: gives two options (true/false, yes/no). advantages: simplicity, quick scoring, absolute judgments. disadvantages: encourages memorization; a 50% chance of guessing right; many items are needed for reliability; truth is often in shades of gray. polytomous format: multiple choice and matching, with distractors (wrong answers) — 3 or 4 is probably best. advantages: ease of administration and scoring; a lower probability of guessing the correct answer. disadvantages: harder to write; still tests recognition rather than recall; needs excellent distractors for high reliability.
Which types of questions are "selected-response format"?
True/false and multiple choice: you choose among the responses provided, and there is a correct answer.
What are the two major formats of summative scales, as given in lecture? What type of data do they create?
Likert format: a multiple-point scale measuring degree of agreement; popular in personality tests and measurements of attitude. category format: a scale of, e.g., 1 to 10 — similar to Likert but with many more points, used to make finer-grained discriminations; requires clearly defined anchoring intervals (midpoints and endpoints). In both, items are summed to create composite or cumulative scores, which are typically treated as interval-level data.
Be able to define and recognize the Likert Format. What scales most frequently use the Likert format?
Statements rated for degree of agreement, usually with 4 or 5 options (e.g., strongly disagree to strongly agree). Most frequently used in personality tests and attitude scales.
What are the primary differences between the Likert and Category formats?
The Likert format uses relatively few points (typically 4-5) anchored on degree of agreement; the category format uses many more points (e.g., 1 to 10) to make finer-grained discriminations and depends on clearly defined anchors at the endpoints and midpoint.
In creating a category format, the use of what will reduce error variance?
The category format runs, for example, from 1 to 10. Error variance is reduced by clearly defined anchor points and intervals: defining the endpoints and the midpoint helps people give accurate answers.
When does the category format begin to reduce reliability?
When more than about 9-10 points are used.
What are the four questions that should be asked when generating a pool of candidate test items?
1. What content domain (construct) should the test items cover? 2. How many items should I generate? 3. What are the demographics of my population? 4. How shall I word my items?
What are the four ways to score tests and how is each differentiated from the others?
1. cumulative scoring: items are summed to create a total score. 2. subscale scoring: test scores are divided into subscales that are summed independently. 3. class or category scoring: test takers are classified according to the degree to which they meet the criteria for a category (e.g., the DSM). 4. ipsative scoring: forced choice — respondents must choose between two traits.
Define item analysis. What two methods are closely associated with item analysis?
Item analysis evaluates how individual items perform; the two closely associated methods are item difficulty and item discriminability. A high item difficulty index means many people got the item right — e.g., item 1 = .98 means 98% answered it correctly.
Define hypothetical construct
Something that is not directly measurable, but which is inferred to exist and to produce measurable phenomena. Examples: intelligence, faith, resilience, hope, self-esteem, love.
Define item difficulty. What does the proportion of people getting the item correct indicate?
The item difficulty index is the percentage of people who got the item right; a high value means an easy question.
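A minimal sketch with hypothetical item responses (1 = correct, 0 = incorrect):

```python
responses = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]

difficulty = sum(responses) / len(responses)  # proportion answering correctly
print(difficulty)                             # 0.8 -> an easy item
```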
Define item discriminability. What is good discrimination? What are two ways to test item discriminability?
discriminability: people who do well on the test overall should also do well on the individual item. 1. extreme group method: compares the top third with the bottom third of scorers; only the extremes are taken into consideration. 2. point-biserial method: correlates one item with the whole test; everyone is taken into consideration.
Will guessing help you on an exam?
When there is a scoring correction (a guessing penalty), don't guess unless you can narrow the options down to about two, because wrong answers are penalized; otherwise, guessing can help.
Know and be able to identify examples of a double-barreled item.
A double-barreled item packs two concepts into one question — for instance (an illustrative example), "I am happy and productive at work" asks about two things at once, so the answer cannot be interpreted.
Define and explain how the extreme group and point biserial methods differ.
extreme group method: calculate the proportion of people answering correctly in each group (usually the top and bottom quartiles) and take the difference between the groups. High values mean good discrimination, zero means none, and negative values mean reverse discrimination (a bad item). point-biserial method: find the correlation between item performance and total test performance — hard with a test of only a few items. The closer the value is to 1.0, the better the item; if it is low or negative, get rid of the item.
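A sketch of both indexes on hypothetical data; rows are test takers already sorted from highest to lowest total score:

```python
import statistics

item =   [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]   # 0/1 scores on one item, best takers first
totals = [9, 9, 8, 7, 7, 6, 4, 3, 2, 1]   # total test scores, same order

# extreme group method (top/bottom thirds here): proportion correct in the
# high group minus the low group; high positive values = good discrimination
k = len(item) // 3
d = sum(item[:k]) / k - sum(item[-k:]) / k

# point-biserial: Pearson correlation between the 0/1 item and the total score
mi, mt = statistics.mean(item), statistics.mean(totals)
num = sum((i - mi) * (t - mt) for i, t in zip(item, totals))
den = (sum((i - mi) ** 2 for i in item) *
       sum((t - mt) ** 2 for t in totals)) ** 0.5
r_pb = num / den

print(d, r_pb)
```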
Define item characteristic curve. Know what information the X and Y axes give as well as slope
The item characteristic curve shows whether an item discriminates well. x-axis: ability (low- vs. high-scoring groups). y-axis: probability of a correct response. slope: the extent of item discrimination — a steeper positive slope means good discrimination (high scorers get the item right more often than low scorers); a negative slope means low scorers outperform high scorers.
When shown an item characteristic curve, be able to determine good or poor discrimination
A steep positive slope (probability of a correct answer rising with ability) indicates good discrimination; a flat or negative slope indicates poor discrimination.
What is systematic error variance called? Is it good or bad and why?
Bias. It is bad, because it distorts your data and leads you to the wrong conclusions.
Know ceiling effects, floor effects, and indiscriminant items.
ceiling effect: everyone gets the item right, which is bad because you cannot tell whether it is really getting at your construct. floor effect: everyone gets it wrong. indiscriminate item: everyone scores in the middle, so it also fails to discriminate or tell you anything.
Define operational definition, measurable phenomenon, and hypothetical construct
operational definition: a defined way to measure a hypothetical construct — precisely defined, measurable, replicable, reliable, valid, and unbiased. Example: "total number of chocolate boxes, flowers, thoughtful note cards, and plates of goodies exchanged during the week preceding St. Valentine's Day." measurable phenomenon: includes "indicators" of romantic love such as PDAs, moonlight walks, and manifestations of cooking prowess. hypothetical construct: something that is not directly measurable, but which is inferred to exist and to produce measurable phenomena.
Define test and item.
test: a measurement device used to quantify behavior or to understand and predict behavior; can be individual or group. items: the specific questions, problems, or stimuli that make up a test; they can be scored or evaluated.