Psych 309 exam 1

79. Define item analysis. What two methods are closely associated with item analysis?

A set of methods used to evaluate the "goodness" of test items; one of the most important aspects of test construction. The two methods: item difficulty and item discriminability.

Define achievement testing.

Achievement: measures previous learning (the past). Are they learning what you expect them to learn?

Think of concrete examples of each of the different scales of measurement.

Activity: Measure ghosts Nominal: ghost species/names, temple worthiness Ordinal: ranking scariness Interval: give each ghost an IQ test?, temperature shifts Ratio: Age, weight, height. Frequency reader

Interval

Adjacent values represent equal intervals. No true zero point. Can be added and subtracted, averaged, ranked. Examples: calendars, Fahrenheit and Celsius, IQ.

Understand Normal Distribution conceptually.

Also known as the symmetrical binomial probability distribution: a function that represents the distribution of many random variables as a symmetrical, bell-shaped graph.

88. Know ceiling effects, floor effects, and indiscriminate items.

Ceiling effect: all scores at the top. Floor effect: all scores at the bottom. Indiscriminate: mediocre, so that everyone scores in the middle.

Define central tendency. Know the three types of central tendency and how to calculate each.

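Central tendency: a central or typical value for a probability distribution. It may also be called the center or location of the distribution. Measures of central tendency: Mean*** (the average: sum of the scores divided by the number of scores); Median (the middle score when the scores are ordered); Mode (the most frequent score).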
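
As a minimal sketch of how each measure is calculated, the snippet below uses Python's standard `statistics` module on a made-up list of exam scores (the data are hypothetical, not from the lecture).

```python
from statistics import mean, median, multimode

scores = [70, 85, 85, 90, 100]  # hypothetical exam scores

average = mean(scores)         # sum of the scores divided by how many there are
middle = median(scores)        # middle value once the scores are sorted
most_common = multimode(scores)  # most frequent value(s), returned as a list

print(average, middle, most_common)
```

`multimode` is used instead of `mode` so that a tie between two equally frequent scores is reported rather than hidden.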

68. What are the stages of test development?

Conceptualizing: objectives, use, settings, decisions, scaling. Constructing: writing test items. Scoring. Testing/analyzing.

What is reliability?

Consistency, stability, and dependability of measurement. Tests that are relatively free of measurement error are "reliable". Getting the same results repeatedly. There will always be some error in measurement.

What is the coefficient of determination? What is the purpose of the coefficient of determination?

The correlation coefficient squared (r^2). The proportion of the total variation in scores on Y that we know as a function of information about X. Tells us what percentage of the variation in performance we can explain. The amount of shared variance between the variables.
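
A small sketch of the arithmetic: squaring a correlation gives the proportion of shared variance. The function name and the example correlations are illustrative assumptions.

```python
def coefficient_of_determination(r):
    """Square a correlation coefficient to get the proportion of shared variance."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1.0 and 1.0")
    return r ** 2

# Hypothetical correlations: note that the sign disappears when squaring.
for r in (0.3, 0.5, -0.8):
    shared = coefficient_of_determination(r)
    print(f"r = {r:+.1f} -> {shared:.0%} of the variance in Y is explained by X")
```

A correlation of 0.5 therefore explains only a quarter of the variance, which is why r^2 is the more conservative figure to report.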

78. What are the four ways to score tests and how is each differentiated from the others?

Cumulative: all items summed together (summative scales). Subscale scoring: the test is divided into subscales that are independently summed (ACT, GRE). Class or category scoring: participants are classified according to the degree to which they meet certain criteria (diagnostic scoring, e.g. DSM-5 diagnosis). Ipsative scoring: respondents choose which of two traits best describes them.

Define item discriminability. What is good discrimination? What are two ways to test item discriminability?

Determines whether the people who have done well on a particular item have also done well on the whole test. Good discrimination means those who know the material do well on the item. Two ways to test it: the extreme group method and the point biserial method.

What types of questions are answered by psychologists through assessment?

Diagnosis and treatment planning Evidence based treatment Monitor treatment progress Help clients make more effective life choices/changes Helping third parties make informed decisions (employers, police academies, etc.)

70. Define dichotomous and polytomous format. Common examples? Advantages? Disadvantages?

Dichotomous format (true/false, 2 options): easy to administer and requires absolute knowledge, BUT it encourages memorization without true learning, there is a 50% chance of getting an item right by guessing, and you need a lot of questions to be reliable. Polytomous format (multiple choice): includes distractors (multiple wrong answers). Guessing hurts if there is a correction for guessing, so it is better to leave the item blank; if there is no correction, always guess. It's easy to administer, BUT you need good distractors and it tests recognition rather than recall.

Define residual?

Difference between the observed value and the value predicted (expected) by the regression line. You want smaller residuals (smaller error).

85. Define item characteristic curve. Know what information the X and Y axes give as well as slope

Does the test item discriminate between the top and bottom quartiles? Graphs the characteristics of a test item; a graph can be prepared for each item. X-axis: ability. Y-axis: probability of a correct response. The slope of the curve represents the extent of item discrimination. Higher slope: higher discrimination. Positive slope: more high scorers than low scorers got the item right. A flat curve (no slope) means the item does not discriminate.

Know and be able to identify examples of a double-barreled item.

Double-barreled item: The question asks two things at once Ex: Agree or disagree that cars should be faster AND safer

66. What is the purpose of factor and item analysis?

Factor Analysis: powerful data reduction technique that enables researchers to investigate concepts that cannot easily be measured directly. By boiling down a large number of variables into a handful of comprehensible underlying factors, factor analysis results in easy-to-understand, actionable data. Item Analysis: intended to assess and improve the reliability of your tests

Define frequency distribution and histogram. What kind of data are shown in each?

Frequency distribution: displays scores on a variable or measure to reflect how frequently each value was obtained. Shows all possible scores and how many people obtained each score. Horizontal axis: lowest to highest value. Vertical axis: how many times each value was observed. Usually bell shaped. Histogram: a type of graph that shows frequency; a diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval.

What are the five characteristics of a good theory?

1. Generative (or fruitful): predicts the existence of regularities (patterns) that may not have been suspected before the theory was proposed; facilitates discovery; not circular. 2. Explanatory power: explains, with some accuracy, patterns of a variable's behavior that we know or suspect to exist. 3. Broad scope: generalizable. 4. Systematic: all parts of the theory fit together coherently. 5. Parsimonious (Occam's Razor): the simplest explanation of an event or observation is the preferred explanation.

86. When shown an item characteristic curve, be able to determine good or poor discrimination

Good: people who know the material better should score better on the item.

What is a scatterplot (scatter diagram)? How does it work?

A graphical tool for identifying linear and nonlinear relationships. Bivariate scatterplot: one variable on the x-axis, one on the y-axis. The data can be inspected visually to determine the degree of relationship. Reveals curvilinear relationships. Useful for detecting bivariate outliers and restricted range problems (ceiling effect, floor effect).

82. Will guessing help you on an exam?

Guessing will only help if it's not counted against you to get it wrong

53. In what ways can error impact the observed score?

Error can push the observed score above or below the true score; the more error there is, the further the observed score can fall from the true score. Try to minimize the error so we're closer to the true score.

65. How can one address/improve low reliability?

Increase test length. Throw out items that run down the reliability after individual item analysis (we want the items to measure a common characteristic): discriminability analysis. Estimate what the true correlation would be if the test had no measurement error (correction for attenuation).

76. When does the category format begin to reduce reliability?

Increasing the number of choices beyond nine or so can reduce reliability because responses may be more likely to include an element of randomness when there are so many alternatives that respondents cannot clearly discriminate between the fine-grained choices.

69. Define and know examples of incremental validity and ecological validity.

Incremental validity: validity that is used to determine whether a new psychometric assessment will increase predictive ability beyond that provided by an existing method of assessment. Ex: a medical practitioner is more likely to correctly diagnose a kidney infection if a urine test is ordered, rather than relying on a physical examination and discussion of symptoms alone. Ecological validity: a measure of how well test performance predicts behaviors in real-world settings. Ex: a study in which a participant drove with a steering wheel would have more ecological validity than one in which the participant drove by moving a cursor on a computer screen with a mouse.

What is the standard error of estimate? What is its relationship to the residuals?

Index of accuracy of prediction approximately how much error (residual) you make when you use the predicted value for Y (on the least-squares line) instead of the actual value of Y.
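
To make the link to residuals concrete, here is a hedged sketch that computes the standard error of estimate from observed and predicted Y values. It assumes the common textbook form for simple regression, which divides the sum of squared residuals by n - 2; the data and the function name are hypothetical.

```python
from math import sqrt

def standard_error_of_estimate(observed, predicted):
    """Typical size of a residual: sqrt(sum of squared residuals / (n - 2))."""
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
    ss_residual = sum(e ** 2 for e in residuals)
    # Assumption: n - 2 degrees of freedom, as used for one-predictor regression.
    return sqrt(ss_residual / (len(observed) - 2))

observed = [2, 4, 5, 4, 5]             # hypothetical actual Y scores
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]  # hypothetical Y values on the least-squares line

see = standard_error_of_estimate(observed, predicted)
print(round(see, 3))
```

Smaller residuals give a smaller standard error of estimate, meaning the predictions sit closer to the actual scores.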

In what settings do psychologists assess and what is their primary responsibility in each?

Inpatient units: diagnosing psychopathology Forensic: insanity defenses, competency, psychopathology, evaluations Schools: diagnosing learning disabilities, ADHD, etc. Career counseling: testing/assessing interests and aptitudes Employment: HR, employee performance evaluation Premarital counseling: assessing compatibility

Define intelligence testing.

Intelligence: overall mental ability; potential to solve problems (IQ). Focuses on current ability.

60. What is the Kappa statistic and how does it relate to reliability?

Inter-rater reliability a statistical measure of the degree of agreement or concordance between two independent raters that takes into account the possibility that agreement could occur by chance alone
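
The chance correction can be sketched as follows, using the usual form of Cohen's kappa for two raters: (observed agreement - chance agreement) / (1 - chance agreement). The ratings below are invented for illustration, and the function name is an assumption.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical yes/no judgments from two independent raters on ten cases.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

Here the raters agree on 8 of 10 cases, but because about half of that agreement would be expected by chance, kappa is noticeably lower than 0.80. The sketch does not handle the degenerate case where chance agreement equals 1.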

62. What does the standard error of measurement do?

It allows us to estimate the degree to which a test provides inaccurate readings...how much rubber there is The larger it is, the less certain we can be about the accuracy
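
As a rough illustration, the standard error of measurement is commonly computed as SD * sqrt(1 - reliability). The sketch below assumes that formula; the SD of 15 and reliability of 0.91 are hypothetical values, and the 1.96 multiplier assumes normally distributed error.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * sqrt(1 - reliability)

sem = standard_error_of_measurement(sd=15, reliability=0.91)

# Approximate 95% band around a hypothetical observed score of 100.
observed = 100
low, high = observed - 1.96 * sem, observed + 1.96 * sem
print(round(sem, 2), round(low, 1), round(high, 1))
```

The less reliable the test, the larger the SEM and the wider the band around any observed score.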

To avoid bias, how should error be distributed in a psychological test?

It needs to be distributed unsystematically (no systematic error is wanted). Content validity. Positively and negatively worded items. Good reliability. Good discriminability. Start with a larger item pool so you can weed out bad items. Cross-validation: different populations.

What is the regression formula? Understand the different components of the formula and how they are applied.

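It says how much one thing predicts another: Y' = a + bX, where Y' is the predicted score, a is the intercept (the value of Y' when X is 0), b is the slope (the regression coefficient: the change in Y' for each one-unit change in X), and X is the predictor score. Regression can be extended to multiple predictors, which gives greater predictive ability.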
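
A minimal sketch of how the intercept a and slope b are obtained by least squares and then applied to make a prediction. The data (hours studied and quiz scores) and the function name are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept (a) and slope (b) for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope: sum of cross-products of deviations over sum of squared x deviations.
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    a = mean_y - b * mean_x  # the line always passes through (mean_x, mean_y)
    return a, b

hours = [1, 2, 3, 4, 5]   # hypothetical predictor scores (X)
scores = [2, 4, 5, 4, 5]  # hypothetical outcome scores (Y)

a, b = fit_line(hours, scores)
predicted = a + b * 6     # predicted score for someone who studies 6 hours
print(round(a, 2), round(b, 2), round(predicted, 2))
```

The slope says how many points the predicted score rises per additional hour; the intercept is the predicted score at zero hours.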

58. How do the different aspects of internal consistency differ?

Kuder-Richardson formula 20 (KR20): for dichotomous items. Considers all possible ways of splitting the items. ^^^All measure how well the items on a test measure the same ability or trait.
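
For reference, a hedged sketch of KR20 in its usual form: (k / (k - 1)) * (1 - sum(p * q) / variance of total scores), where p is the proportion passing each item and q = 1 - p. The sketch assumes the population variance of the total scores, and the response matrix is invented.

```python
from statistics import pvariance

def kr20(responses):
    """KR20 for a matrix of 0/1 item responses (rows = people, columns = items)."""
    n_people = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_people  # proportion correct
        sum_pq += p * (1 - p)
    # Assumption: population variance of total scores (divide by N, not N - 1).
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))

# Hypothetical data: 5 test takers, 4 dichotomous items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

reliability = kr20(responses)
print(round(reliability, 2))
```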

Define kurtosis and be able to identify its different types, including leptokurtic, platykurtic, and mesokurtic.

Kurtosis: index of "peakedness" vs. "flatness". Leptokurtic: distributions with positive excess kurtosis, i.e. kurtosis larger than that of a normal distribution. Platykurtic: a distribution in which the excess kurtosis is negative. Mesokurtic: a distribution in which the excess kurtosis is close to zero, so extreme (rare) events are about as likely as in a normal distribution.

73. Be able to define and recognize the Likert Format. What scales most frequently use the Likert format?

The Likert format is used to determine degree of agreement. Used in surveys and behavioral tests.

74. What are the primary differences between the Likert and Category formats?

Likert tests agreement. Category has more fine-grained discriminations, 9- or 10-point max.

63. What factors should be considered when choosing a reliability coefficient?

Link the name of the coefficient with the type of test (look for the chart). Consider the type of data (ordinal, nominal, etc.) and the type of test. Reliability is a ratio: variance of the true scores vs. variance of the observed scores. Consider the differences in variability between each test and the nature of the test (e.g. multiple choice, continuum of agreement). All of the measures of internal consistency evaluate the extent to which the different items on a test measure the same ability or trait. When the two halves of a test have unequal variances, use Cronbach's alpha for internal consistency; when the variances are equal, the Spearman-Brown coefficient gives the same results. Coefficient alpha (Cronbach's alpha) determines the general reliability of a test when there is no right or wrong answer (a continuum of agreement). KR20 should be used to calculate the reliability of a test in which the items are dichotomous. Test-retest reliability: a measure of the accuracy of a test or measuring instrument obtained by measuring the same individuals twice and computing the correlation of the two sets of measures. Strength and similarity.

What is a z score? How is it calculated?

Transforms data into standardized units with a mean of 0 and an SD of 1. Normal distribution: symmetrical on both sides of the middle. z = (X - Xbar)/s, i.e. (score - mean)/SD. Standard scores: raw scores that have been transformed from one scale to another with an arbitrarily set mean and SD.
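
A short sketch of the calculation, (score - mean) / SD, applied to a made-up set of scores. It assumes the sample standard deviation (dividing by n - 1); a course may use the population form instead.

```python
from statistics import mean, stdev

def z_scores(scores):
    """Convert raw scores to z scores: (score - mean) / standard deviation."""
    m = mean(scores)
    s = stdev(scores)  # assumption: sample SD (n - 1 in the denominator)
    return [(x - m) / s for x in scores]

raw = [70, 80, 90]  # hypothetical raw scores
standardized = z_scores(raw)
print(standardized)
```

A score equal to the mean becomes 0, and each z score tells you how many standard deviations the raw score sits above or below the mean.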

Know the advantages and disadvantages of the different measures of central tendency and when to use them.

Mean: use for normal data. Outliers affect the mean. Takes all scores into account. Median: use for skewed data. Shows the very middle. Doesn't take all scores into consideration. Mode: least used, but good for nominal (categorical) data. Not defined when there are no repeats. Not based on all values. Very unstable with small sets. Easy to calculate. Not affected by extreme values.

Define measurable phenomenon

Measurable phenomenon: All phenomena the construct generates "the tip of the iceberg". There's a lot we can't see Veil of measurability- can't see Underlying constructs

What is the difference between simple linear regression and multiple regression?

Simple linear regression: predicts the value of a variable from one other variable. Multiple regression: used when we want to predict the value of a variable based on the values of two or more other variables.

In creating a category format, the use of what will reduce error variance?

Need to make the endpoints of the scale clearly defined Subjects need to be frequently reminded of the definitions of the endpoints

What are norm- and criterion referenced tests? How is each unique?

Norm-referenced tests: compare a test taker's performance with that of others. Class standing, ranking, percentile rank. Is the norm group an appropriate comparison group for this individual? How similar are they? Criterion-referenced tests: measure performance against an established criterion. Ex: 70% is passing. Predicts performance outside the test. No Child Left Behind, gifted placement, EPPP. Is the criterion a good measure of performance? Is the criterion relevant?

Define and differentiate between norm-referenced and criterion-referenced tests

Norm-referenced tests: standardized tests that are designed to compare and rank test takers in relation to one another. Ex: SAT, IQ tests, curved tests. Criterion-referenced tests: compare performance with some clearly defined criterion for learning, to determine whether or not an objective has been achieved, not against other test takers. Ex: AP tests.

Define norm, norming, and standardization. For what is each used?

Norm: psychometric definitions of the comparison group. Gender: girls to girls, boys to boys. Race: controversial; ethnicity to ethnicity. Age: compare to same-age peers. Grade: students in the same grade. National and local norms. Norming: administer the test to a normative sample to create test norms. Standardization: develop specific procedures for the administration, scoring, and interpretation of a test.

56. Define parallel/alternate forms reliability. What are its advantages and disadvantages?

Parallel forms reliability: compares two equivalent forms of a test that measure the same attribute. They use different items; however, the rules used to select items of a particular difficulty level are the same. When given on the same day, the only sources of variation are random error and the difference between the forms of the test. Doesn't occur often in practice: people don't like making more than one test.

Understand the concept of percentiles.

Percentiles: the percentage of test takers whose scores fall below a given raw score. Answers the question: what is the percentile rank of a given score? In a normal distribution, z = 0 corresponds to the 50th percentile.
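
Following the definition above (percentage of scores falling below a given score), a percentile rank can be sketched like this. The scores and the function name are hypothetical; some textbooks also count half of any tied scores, which this sketch does not.

```python
def percentile_rank(scores, score):
    """Percentage of scores in the group that fall below the given score."""
    below = sum(s < score for s in scores)
    return 100 * below / len(scores)

group = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical test scores

rank = percentile_rank(group, 80)
print(rank)  # five of the ten scores are below 80
```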

Understand and be able to differentiate and plot positive, negative, and 0 correlation

Positive: up Negative: down 0: none

What is the principle of least squares? How does it relate to the regression line?

Principle: states that the most probable values of a system of unknown quantities upon which observations have been made, are obtained by making the sum of the squares of the errors a minimum. Helps you find the..... Regression line: line of best fit...

Define psychological testing and psychological assessment. How are they different?

Psychological testing: obtaining information that measures characteristics of human beings that pertain to behavior. Psychological assessment: gathering data for the purpose of making an evaluation, using tests, interviews, case studies, and behavioral observations.

What is psychometry? What are the two major properties of psychometry?

Psychometry: the branch of psychology dealing with the properties of psychological tests Evaluating how good a test is Reliability: is it accurate and repeatable and dependable? Validity: does the test measure what it purports to measure?

What are quartiles? What is Interquartile range?

Quartiles: points that divide the frequency distribution into equal fourths Median: middle number Interquartile range: bounded by the range of scores that represent the middle 50% of the distribution The IQR may also be called the midspread, middle 50%, or H‑spread. It is defined as the difference between the 75th and 25th percentiles of the data
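
As an illustration, the quartiles and interquartile range can be computed with `statistics.quantiles` from the Python standard library. The scores are made up, and different quartile conventions (this sketch uses the function's default "exclusive" method) can give slightly different cut points.

```python
from statistics import quantiles

scores = [52, 55, 60, 64, 67, 70, 73, 78, 81, 85, 90]  # hypothetical scores

# n=4 returns the three cut points that split the data into equal fourths.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1  # spread of the middle 50% of the distribution

print(q1, q2, q3, iqr)
```

The second cut point, `q2`, is the median, which matches the 50th percentile.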

If a test is reliable its results are what?

Repeatable, consistent, dependable. (Reliability alone does not guarantee validity.)

Define representative sample and stratified sample. Know when and why representative and stratified samples are collected.

Representative sample: a sample drawn in an unbiased or random fashion so that it is composed of individuals with characteristics similar to those for whom the test is to be used. The ideal choice for sampling analysis because it is more likely to align with the entire population group. A sample that is not representative is biased. BYU: representing the proportions of each race present. Stratified sample: a method of sampling that involves dividing a population into smaller subgroups known as strata. This is more time consuming, but higher quality, because it requires more upfront information. BYU: trying to get equal numbers from each race.

What types of irregularities might make reliability coefficients biased or invalid?

Researcher, observer, administrator and scorer bias, different scoring criteria (eg. the Rorschach)

50. What contributes to measurement error?

Situational factors How the test was created Test construction, administration Test scoring/interpretation

Define skewness and be able to identify positive and negative skew.

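Skewness: index of the degree to which symmetry is absent. Positive: most values cluster at the low (left) end, with the long tail extending to the right. Negative: most values cluster at the high (right) end, with the long tail extending to the left.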

Define hypothetical construct

Something that is not directly measurable, but which is inferred to exist and to produce measurable phenomena.

Define standardization? Why is it important to obtain a standardization sample?

Standardize a test: develop specific procedures for the administration and scoring of a test. Standard scores: raw scores that have been transformed from one scale to another with an arbitrarily set mean and SD.

Be able to define, recognize, and differentiate between states and traits

A state is a temporary condition of being. A trait is a long-lasting characteristic.

What is the difference between structured and projective personality tests?

Structured: can be unequivocally scored; assumes that a subject's response can be taken at face value. Provides a relatively unambiguous test stimulus and specific alternative responses. First one: Woodworth Personal Data Sheet (WWI). Many were later created, but they were later analyzed and nearly driven out of existence. Projective: often subjective scoring; ambiguous stimulus and unclear response requirements. Rorschach inkblot test. Thematic Apperception Test (TAT), Henry Murray: more structured; the subject makes up a story about an ambiguous scene.

What are the two major formats of summative scales, as given in lecture? What type of data do they create?

Summative scale: items are summed to form composite or cumulative scales (not pass or fail). ***Likert scales: multiple-point scales indicating degree of agreement; negatively worded items are reverse scored; requires an assessment of item discriminability. Category format: many-point scales; fine-grained discriminations; scale of 1-10 (shouldn't be more than that).

How are T scores different from Z scores?

For T scores the mean is 50 and the SD is 10 (z scores have a mean of 0 and an SD of 1), so it is harder to get into negative numbers. T = z(10) + 50.
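
The conversion can be sketched directly from the formula T = 10z + 50. The z values are arbitrary examples and the function name is an assumption.

```python
def t_score(z):
    """Convert a z score to a T score (mean 50, SD 10)."""
    return 10 * z + 50

# A z score would have to fall below -5 before the T score turned negative.
for z in (-1.5, 0, 2):
    print(z, "->", t_score(z))
```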

Define split half reliability. How is this measured?

The test is given once and divided into separately scored halves (first/second, even/odd). Both halves need to be equivalent. Use the Spearman-Brown formula. Cronbach's alpha**** (best one for this class): provides the lowest estimate of reliability that one can expect.
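
Because each half is only half as long as the full test, the correlation between halves underestimates full-test reliability. The sketch below assumes the standard Spearman-Brown correction for doubling test length, 2r / (1 + r); the half-test correlation of 0.60 is hypothetical.

```python
def spearman_brown(half_test_r):
    """Estimate full-test reliability from the correlation between two halves."""
    return 2 * half_test_r / (1 + half_test_r)

half_r = 0.60  # hypothetical correlation between odd-item and even-item scores
corrected = spearman_brown(half_r)
print(round(corrected, 2))
```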

What is the Pearson product moment correlation? What meaning do the values -1.0 to 1.0 have?

Tests the degree of relationship between two variables. Covariance in standardized units. -1.0 is a perfect negative relationship, 1.0 is a perfect positive relationship, and 0 means no linear relationship.
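
"Covariance in standardized units" can be shown literally: convert both variables to z scores and average the products. This sketch uses population standard deviations so that the average of the products equals r; the data are hypothetical.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson r as the mean cross-product of z scores."""
    mean_x, mean_y = mean(x), mean(y)
    sd_x, sd_y = pstdev(x), pstdev(y)  # population SDs (divide by N)
    z_x = [(v - mean_x) / sd_x for v in x]
    z_y = [(v - mean_y) / sd_y for v in y]
    return mean([a * b for a, b in zip(z_x, z_y)])

hours = [1, 2, 3, 4, 5]   # hypothetical X scores
scores = [2, 4, 5, 4, 5]  # hypothetical Y scores

r = pearson_r(hours, scores)
print(round(r, 3))
```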

Define test and item

Tests: A measurement device or technique to quantify behavior or aid in the prediction of behavior (can be individual or group) Item: a specific stimulus to which a person responds overtly (can be scored or evaluated)

What is the Correlation Coefficient? With what concept should correlation not be confused?

The Pearson product-moment correlation coefficient is a statistic that is used to estimate the degree of linear relationship between two variables. Correlation should not be confused with causation

What example was given in class regarding reliability

The bullseye!

Equal intervals

The difference between two points on a scale has the same meaning as the difference between two other points that differ by the same number of units

Define and explain how the extreme group and point biserial methods differ.

The extreme group method: compares those who have done well with those who have done poorly The point biserial method: performance on the item versus performance on the whole test
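
The two methods can be contrasted in a short sketch. It assumes the extreme groups are the top and bottom thirds of total scores (other cutoffs are also used) and computes the point biserial as an ordinary Pearson correlation between a 0/1 item and the total score. All data and function names are hypothetical.

```python
from statistics import mean, pstdev

def extreme_group_index(item, totals):
    """Proportion correct in the top third minus proportion correct in the bottom third."""
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    k = max(1, len(totals) // 3)  # assumption: thirds define the extreme groups
    bottom, top = order[:k], order[-k:]
    return mean([item[i] for i in top]) - mean([item[i] for i in bottom])

def point_biserial(item, totals):
    """Correlation between a dichotomous (0/1) item and the total test score."""
    mean_item, mean_total = mean(item), mean(totals)
    covariance = mean([(a - mean_item) * (b - mean_total) for a, b in zip(item, totals)])
    return covariance / (pstdev(item) * pstdev(totals))

item = [0, 1, 0, 1, 1, 1]    # hypothetical: 1 = answered this item correctly
totals = [2, 3, 4, 6, 7, 8]  # hypothetical total test scores for the same six people

d_index = extreme_group_index(item, totals)
r_pb = point_biserial(item, totals)
print(round(d_index, 2), round(r_pb, 3))
```

The extreme group method uses only the highest and lowest scorers, while the point biserial uses every test taker's total score.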

Define item difficulty. What does the proportion of people getting the item correct indicate?

The proportion of people who get a particular item correct. A higher proportion correct means the item is easier, not that it is a better item.
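
Item difficulty is usually reported as the proportion of test takers answering the item correctly. A minimal sketch with invented responses:

```python
def item_difficulty(responses):
    """Proportion of test takers who answered the item correctly (1 = correct)."""
    return sum(responses) / len(responses)

responses = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]  # hypothetical answers from ten people

p = item_difficulty(responses)
print(p)  # values near 1.0 indicate an easy item, values near 0.0 a hard one
```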

Magnitude

The property of "moreness", when an instance of the attribute represents more, less, or equal amounts of quantity

87. What is systematic error variance called? Is it good or bad and why?

The variance of the random or unexplainable component of a model Reliable: free from systematic error variance

Ratio

True zero point, you can do anything with it mathematically Age, weight, height ***scale defines which mathematical operations we can use

What is shrinkage?

Values are "shrunk" towards a central value. Often occurs when a regression equation is calculated using one group of subjects and used to predict performance in another group of subjects. Regression analysis is prone to overestimating the strength of a relationship between variables: it takes into account not only the true nature of the relationship but also "chance relationships". Likely in small samples.

Define variance and standard deviation.

Variance: the average of the squared deviations about the mean. Standard deviation: the positive square root of the variance; an average measure of spread-outness.
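
Both definitions can be worked through step by step. The sketch follows the "average squared deviation" wording, so it divides by N (population form); the scores are made up.

```python
from math import sqrt

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

mean_score = sum(scores) / len(scores)
squared_deviations = [(x - mean_score) ** 2 for x in scores]

variance = sum(squared_deviations) / len(scores)  # average squared deviation
standard_deviation = sqrt(variance)               # positive square root of the variance

print(mean_score, variance, standard_deviation)
```

When estimating from a sample, many courses divide by N - 1 instead, which gives a slightly larger variance.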

7. What are the four questions that should be asked when generating a pool of candidate test items?

What content domain (construct) should the test items cover? How many items should I generate? Make them clearly defined Use expert judgment...

Absolute Zero

When nothing of the measured quantity exists, extremely difficult if not impossible to define an absolute zero

55. What is a carryover effect?

When the first testing session influences scores from the second session When this happens, the test-retest correlation usually overestimates the true reliability

51. What components make up Classical Test Score Theory?

X = T + E: the observed value equals the true value plus the error. Error is RANDOM. A scale that is always 3 lbs. heavy has systematic error; a scale that is always off, but randomly so, has the random error the theory assumes.

ordinal scale

a scale of measurement in which the measurement categories form a rank order along a continuum Ranks, counted, proportioned Can't be added/subtracted. We don't know distance between them, just that one has a higher magnitude than the other Best college, 1st, 2nd, 3rd

What is factor analysis?

A statistical procedure that identifies clusters of related items on a test. Studies interrelationships among a set of variables without reference to an external criterion. A data reduction technique. Identifies underlying constructs that drive responses to your items. When two items correlate highly, it is because they measure the same thing or load onto the same factor. A factor is extracted for each "cluster" of highly intercorrelated items. Factors are groups of variables.

Define covariance

How much both variables change together: the degree to which two variables vary/change/move together. As one value changes, the other changes in the same or opposite direction. Converting to z-scores puts both variables on the same metric, which gives the Pearson correlation coefficient [r]. Outliers can have large impacts and distort it.

Define operational definition

Identifying variables and quantifying them so they can be measured. Defining a way to measure a hypothetical construct. Precisely defined, measurable, replicable, reliable, valid, unbiased. Ex: total number of chocolate boxes or flowers bought on Valentine's Day. Can we ever measure something 100%? NO, THERE IS ALWAYS MEASUREMENT ERROR.

Define aptitude testing.

Measures potential for acquiring a specific skill; the likelihood of learning a certain thing (the future).

Which types of questions are "selected-response format"?

Multiple choice, true/false, and matching (short answer and fill-in questions are constructed-response, since the test taker must supply the answer).

Test reliability is usually estimated in one of what three ways? Know the major concepts in each way.

over time (test-retest reliability), across items (internal consistency), and across different researchers (inter-rater reliability)

Know the different types of correlations and when they are used.

positive, negative, no correlation

Nominal

qualitative Boy vs girls, political, religious

Know the Summary of Reliability Table from lecture

see notes

What is restricted range? To what does it lead?

Term applied to the case in which observed sample data are not available across the entire range of interest. Leads to reduced variance and may alter the strength of the correlation coefficient.

52. Know what an observed score is.

The score you actually obtained: the true score plus any error.

59. Understand the major components of inter-rater reliability.

the degree of agreement among independent observers who rate, code, or assess the same phenomenon

Discrete:

two or more mutually exclusive categories One or the other (pregnant or not pregnant)

What are test batteries?

two or more tests used in conjunction A series or group of tests that are used together Trying to test different sides of a thing All tests are delivered in one administration, together Ex: Diagnosing medical disorder, test different sides

Continuous

Values may theoretically be divided into progressively smaller units. Ex: age.

