PSYCH 309 Midterm #1

Know the four scales of measurement

(NOIR) nominal, ordinal, interval, ratio

magnitude

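(property of scales) the property of "moreness": one instance of the attribute represents more, less, or an equal amount of the quantity than another instance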

equal intervals

(property of scales) Difference between 2 points on a scale has the same meaning as the difference between 2 other points that differ by the same number of units

absolute zero

(property of scales) nothing of the measured property exists

Define and explain how the extreme group and point biserial methods differ.

-Extreme Group Method: compares the people who did well on the test with those who did poorly, and checks how each group performed on the item
-Point Biserial Method: finds the correlation between performance on the item and performance on the total test
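
A minimal sketch of the point biserial method, assuming items are scored 0/1 and using the population standard deviation of the total scores. The function name `point_biserial` and the nine test takers are hypothetical, made up for illustration.

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(item, totals):
    """Correlation between a 0/1 item and total test scores."""
    passed = [t for i, t in zip(item, totals) if i == 1]
    failed = [t for i, t in zip(item, totals) if i == 0]
    p = len(passed) / len(item)  # proportion who got the item correct
    q = 1 - p                    # proportion who got it wrong
    return (mean(passed) - mean(failed)) / pstdev(totals) * sqrt(p * q)

# Hypothetical data: item score and total score for nine test takers
item = [1, 1, 1, 1, 0, 1, 0, 0, 1]
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2]
print(round(point_biserial(item, totals), 3))
```

A value near 0 or below suggests the item does not separate high scorers from low scorers.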

What factors should be considered when choosing a reliability coefficient?

-Homogeneity vs. heterogeneity of items: is the test measuring a uni-faceted or a multi-faceted construct?
-Dynamic vs. static characteristics: is the true score fluctuating or relatively stable over time? Does it change frequently or from situation to situation?

When shown an item characteristic curve, be able to determine good or poor discrimination

-High positive slope: good discrimination
-No slope or negative slope: poor discrimination

How do the different aspects of internal consistency differ?

-Internal consistency: examines how people perform on similar subsets of items selected from the same form of the measure, i.e. the consistency of items within the same test. Evaluates the extent to which the different items on a test measure the same ability or trait
-Split half: corrected correlation between two halves of the test
-KR20: for items scored right or wrong; requires the proportion of people who got each item "correct"
-Alpha: generalizes KR20 to tests where there is no right or wrong answer (such as a Likert scale test); a more general reliability estimate
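
A rough sketch of how KR20 and alpha relate, assuming the standard textbook formulas with population variances (these formulas are not spelled out in the notes). The function names and the response matrix are hypothetical.

```python
from statistics import pvariance

def kr20(scores):
    """scores: one row per person, one 0/1 column per item."""
    n, k = len(scores), len(scores[0])
    p = [sum(row[i] for row in scores) / n for i in range(k)]  # proportion correct per item
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_pq / total_var)

def cronbach_alpha(scores):
    """Same idea, but item variances replace p*q, so items need not be 0/1."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical right/wrong responses: five people, four items
responses = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(round(kr20(responses), 3), round(cronbach_alpha(responses), 3))
```

On right/wrong data the two estimates agree, because the variance of a 0/1 item is p times q.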

Polytomous format

-Resembles the dichotomous format but has more than two alternatives (ex. multiple choice tests)
-Easy to score and takes little time for the test taker
-Incorrect choices are called distractors; well-chosen distractors are an essential ingredient of good items
-ADVANTAGE: good for telling whether someone really knows the information
-DISADVANTAGE: poorly written distractors can adversely affect the quality of a test

how do you measure split half reliability (internal consistency)

-Split the test into two halves, score each half separately, and find the correlation between the two halves
-A first-half/second-half split can cause problems when one half is more difficult than the other; in that case it is better to use the odd-even system, where one subscore comes from the odd-numbered items and the other from the even-numbered items
-Spearman-Brown formula: corrects for the half length: corrected r = 2r / (1 + r), where r is the correlation between the two halves; the corrected r estimates what the correlation would have been if each half had been the length of the whole test
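
A small sketch of the Spearman-Brown correction as stated on this card. The function name `spearman_brown` and the half-test correlation of .60 are made up for illustration.

```python
def spearman_brown(r_half):
    """Corrected split-half reliability: 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

# Hypothetical correlation of .60 between the odd and even halves
print(round(spearman_brown(0.60), 2))
```

The corrected value is always at least as large as the half-test correlation when r is positive, since a longer test is more reliable.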

Dichotomous Format

-Two alternatives for each item; the most common is T/F
-ADVANTAGES: easy construction and easy scoring; T/F items require absolute judgment
-DISADVANTAGES: encourages students to memorize material; "truth" comes in shades of gray; does not allow test takers to show they understand complexity; tends to be less reliable and less precise

skewness

A measure of the asymmetry of a distribution's shape (whether one tail is longer than the other)

criterion validity

How well scores on a test correspond to a well-defined external criterion (the standard against which the test is compared).

interval scale

Adjacent values on the scale represent equal intervals in magnitude of the attribute being measured. What properties does it have? -data can be: Classified, Counted, Proportioned, Rank-ordered, Added (creating a total-scale score), Subtracted, Divided to form averages (calculating a scale mean) -data does not have a true "0" point and cannot be: Divided to form ratios -ex. temperature in degrees Fahrenheit or Celsius

advantages and disadvantages of mode

Advantages: easy to obtain; the only measure that can be used for a nominal scale
Disadvantages: not stable from sample to sample; there may be more than one mode for a particular set of scores

advantages and disadvantages of median

Advantages: less sensitive to extreme scores; for skewed distributions the median is the best measure
Disadvantages: it responds to how many scores are above or below it, but not to how far away the scores are

advantages and disadvantages of mean

Advantages: the best choice when we need a measure of central tendency that reflects the total of the scores; stable from sample to sample; most resistant to chance sample variation
Disadvantages: reactive to the exact position of each score; gives undue weight to extreme values

measurable phenomenon

All phenomena the construct gives rise to (ex. panic attacks; an operational definition might be minutes spent worrying or # of anxious thoughts)

ordinal scale

Assignment of ranks according to the degree to which the measured attribute is present/absent. What properties does it have? -data can be: Classified, Counted, Proportioned, Rank-ordered -data cannot be: Added/Subtracted or Divided to form averages/ratios -ex. how do you feel on a scale of 1-10

Know and be able to identify examples of a double-barreled item.

Items that convey two or more constructs at the same time, so a single answer cannot be interpreted (ex. "I enjoy my job and my coworkers"). Avoid "double-barreled" items.

Define Content Validity: How is it measured?

Based on the correspondence between the item content and the domain the items represent. Assessed by asking whether the items are a fair sample of the total potential content and by considering the wording of the test items (Does my test get at the whole domain?)

What are the primary differences between the Likert and Category formats?

Category Format: like the Likert format but with an even greater number of choices (ex. a 10-point rating scale). People will change their ratings depending on the context; this problem can be avoided if the endpoints of the scale are clearly defined and the subjects are frequently reminded of the definition of the endpoints

What are the two types of evidence in construct validity?

Convergent and discriminant

What is the Correlation Coefficient? With what concept should correlation not be confused?

Describes how much two measures or items covary, i.e. how closely variation in one variable tracks variation in the other. Not to be confused with causation

What types of questions are answered by psychologists through assessment?

Diagnosis and Treatment Planning, Monitor Treatment Progress, Help clients make more effective life choices/changes, Program evaluation, Helping third parties make informed decisions

IQR

Discards the distribution's upper and lower 25% and takes what remains. IQR = Q3 - Q1 (the middle 50%, from the 25th to the 75th percentile)
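
A quick sketch of the IQR using Python's standard library. Note the assumption: `statistics.quantiles` uses its default "exclusive" method here, and other quartile conventions can give slightly different values. The scores are made up.

```python
from statistics import quantiles

def iqr(scores):
    """Interquartile range: Q3 - Q1."""
    q1, _median, q3 = quantiles(scores, n=4)  # the three quartile cut points
    return q3 - q1

# Hypothetical set of seven scores
scores = [1, 2, 3, 4, 5, 6, 7]
print(iqr(scores))
```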

To avoid bias, how should error be distributed in a psychological test?

Double blind, random sampling, want error to be unsystematic and random!

Which two types of validity are logical and not statistical? Why?

Face validity and content validity. They require good logic, intuitive skills, and perseverance rather than statistical analysis

What are the five characteristics of a good theory?

Has explanatory power, broad scope, systematic, fruitful, Parsimonious

What types of irregularities might make reliability coefficients biased or invalid?

Anything that could introduce bias: environment, personal effects, evaluator bias, inconsistent scoring materials, tired participants

How can one address/improve low reliability?

Increase the number of items. Use factor and item analysis: the reliability of a test depends on the extent to which all of the items measure one common characteristic. Correction for attenuation: estimate what the correlation between tests would have been if there had been no measurement error

Kurtosis

Index of the "peakedness" vs. "flatness" of a distribution

In what settings do psychologists assess and what is their primary responsibility in each?

Inpatient, Schools, Forensic (legal) settings, Employment settings, such as corporations and law firms, Career counseling, Pre-marital counseling

What is the Pearson product moment correlation? What meaning do the values -1.0 to 1.0 have?

It is a ratio used to determine the degree of variation in one variable that can be estimated from knowledge about variation in the other variable (used with continuous variables). The closer r is to +1, the stronger the positive correlation. The closer r is to -1, the stronger the negative correlation. If |r| = 1 the variables are perfectly correlated!

ratio scale

Measured on a scale with a true "0" point. Allows all mathematical operations. It can be meaningfully: Classified, Counted, Proportioned, Rank-ordered, Added (creating a total-scale score), Subtracted, Divided to form averages (calculating a scale mean), Divided to form ratios -ex. weight scale

median

Middle score in the distribution (50% ↑ 50% ↓); a measure of central tendency. Rank the scores (including repeated scores) from lowest to highest. If there is an odd number of scores, select the middle score. If there is an even number of scores, take the average of the two middle scores.

Platykurtic

Negative kurtosis = flatter distribution (Plate)

percentiles

Percentage of test-takers whose scores fall below a given raw score

Leptokurtic

Positive kurtosis = more peaked distribution (Leaping)

hypothetical construct

Processes that are not directly measurable, but which are inferred to have real existence and to give rise to measurable phenomena

What is psychometry? What are the two major properties of psychometry?

Psychometry: the branch of psychology dealing with the properties of psychological tests -Reliability: dependability, consistency, or repeatability of the test results (measuring tool) -Validity: Does a test measure what it purports to measure? Accuracy

What is the relationship between reliability and validity?

Reliability is necessary but not sufficient for validity

What is factor analysis?

Studies interrelationships among items within a test. Data-reduction technique. Can be used as measure of internal consistency

How are T scores different from Z scores?

T scores (unlike z scores): mean = 50 and standard deviation = 10, so realistic scores are all positive. Values > 70 are often considered "clinically significant" (2 SDs above the mean)
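
A short sketch converting a raw score to a z score and then to a T score, using the definitions on these cards (z = difference from the mean divided by the SD; T has mean 50 and SD 10). The function names and the test with mean 70 and SD 10 are hypothetical.

```python
def z_score(x, mean, sd):
    """Difference between a score and the mean, divided by the SD."""
    return (x - mean) / sd

def t_score(z):
    """Rescale a z score to a distribution with mean 50 and SD 10."""
    return 50 + 10 * z

# Hypothetical test with mean 70 and SD 10; a raw score of 85
z = z_score(85, 70, 10)
print(z, t_score(z))
```

A z of 2 maps to a T of 70, which is the "2 SDs above the mean" cutoff mentioned above.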

What are the stages of test development?

Test conceptualization, Test construction, Test Tryout, Item Analysis, Test revision

What should be asked when generating a pool of candidate test items?

Test is covering universe of construct. What content domain should the test items cover? How many items? What are the demographics of population? How should I word my items?

Norm (testing)

Testing in which scores are compared with the average performance of others. Test norms are created during the standardization process and must be periodically updated.

Ecological validity

The extent to which a study is realistic or representative of real life.

What is the co-efficient of determination? What is the purpose of the co-efficient of determination?

The proportion of the total variation in scores on Y that we explain through X (r^2)
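
A minimal sketch that computes Pearson's r from raw scores and squares it to get the coefficient of determination. The function name `pearson_r` and the five X, Y pairs are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product moment correlation between two lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of X and Y
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores on X and Y for five people
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3), round(r ** 2, 3))
```

Here r squared is .60, so X accounts for 60% of the variation in Y in this made-up sample.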

standardization. Why is it important to obtain a standardization sample?

The uniform procedures used in the administration and scoring of a test. This is important because without it the meaning of scores would be almost impossible to evaluate

What is restricted range? To what does it lead?

Using a sample of people who do not fit the test, or a test that is too easy or too hard for them. It reduces the range and variability of scores (leads to floor and ceiling effects)

nominal scale

Variables can be named - put into categories. What properties does it have? -Values symbolize category membership, and can be: Classified, Counted, Proportioned -data cannot be meaningfully: Ranked, Added/Subtracted, divided to form averages/ratios -ex. labels

Define item analysis. What two methods are closely associated with item analysis?

a general term for a set of methods used to evaluate test items, one of the most important aspects of test construction. The basic methods involve assessment of item difficulty and item discriminability

test

a measurement device or technique used to quantify behavior or aid in the understanding and predictions of behavior

What is a scatterplot (scatter diagram)? How does it work?

a picture of the relationship between 2 variables. each point on the diagram shows where a particular individual scored on both X and Y

item

a specific stimulus to which a person responds overtly (questions on a test)

Define split half reliability

a test is given and divided into halves that are scored separately. the results from each half are compared to one another

If a test is reliable its results are what?

accurate, dependable, consistent, or repeatable

What is a construct?

An indicator variable that measures a characteristic or trait. For example, college admission scores are constructs that measure how well a student is likely to do in their first year. Construct validity measures how well the observed construct predicts the outcome expected.

What is incremental validity?

Whether your test offers something new: does it add information beyond what existing measures already provide?

What components make up Classical Test Score Theory?

Assumes that each person has a true score that would be obtained if there were no errors in measurement. This is used to understand and improve the reliability of a test. X = T + E (observed score = true score + error)

What is systematic error variance called? Is it good or bad and why?

bias, bad

Know the different types of correlations and when they are used.

biserial correlation, point biserial correlation, phi coefficient, spearmans rho

norm referenced tests

compare a test taker's performance to the performance of others

psychological assessment

comprised of tests, interviews, case studies, behavioral observations, apparatus, etc.; psychological tests are one component of it

What are the three main types of validity evidence?

construct related, criterion related, content related

concurrent validity

the criterion is measured at the same time as the test. ex: you work at a factory and they give you a test to make sure you know what you're doing while you are there

Define item difficulty. What does the proportion of people getting the item correct indicate?

defined by the proportion of people who get a particular item correct; the higher the proportion, the easier the item. One of the first things a test constructor needs to determine is the probability that an item could be answered correctly by chance alone

operational definition

defining a way to measure a hypothetical construct

Reliability

dependability, consistency, or repeatability of the test results (measuring tool) -The ratio of true score variance to total observed score variance

Define item discriminability. What is good discrimination? What are two ways to test item discriminability?

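Determines whether the people who have done well on a particular item have also done well on the whole test. Good discrimination: high scorers on the test get the item right more often than low scorers.
2 ways to test item discriminability:
-Extreme group method: compares the top third and bottom third of test takers
-Point biserial method: uses all people tested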
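
A rough sketch of item difficulty and the extreme group method, assuming 0/1 item scoring and groups made of the top and bottom thirds by total score. The function names and the nine test takers are hypothetical.

```python
def item_difficulty(item):
    """Proportion of people who got the item correct."""
    return sum(item) / len(item)

def extreme_group_index(item, totals):
    """Proportion correct in the top third minus proportion correct in the bottom third."""
    ranked = [i for _, i in sorted(zip(totals, item), reverse=True)]  # best total first
    third = len(ranked) // 3
    top, bottom = ranked[:third], ranked[-third:]
    return sum(top) / third - sum(bottom) / third

# Hypothetical data: item score and total score for nine test takers
item = [1, 1, 1, 1, 0, 1, 0, 0, 1]
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2]
print(round(item_difficulty(item), 2), round(extreme_group_index(item, totals), 2))
```

A positive index means the top group outperformed the bottom group on the item; an index near zero or negative flags a poorly discriminating item.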

standardization

develop specific (standardized) procedures for the administration, scoring, and interpretation of a test

What is a z score? How is it calculated?

difference between a score and the mean, divided by the SD

Define parallel/alternate forms reliability. What are its advantages and disadvantages?

Different forms of the same test (ex. ACT/SAT). Disadvantage: the forms are hard to make equal in difficulty. Advantage: they are better tests to administer.

content validity

Does the test represent the whole content? Does it get at the whole breadth and depth of the content? ex: if you are being tested on the first 6 chapters of a book, then content related validity is provided by the correspondence between the items on the test and the information in the chapters.

construct validity

does your test measure what it should measure. ex: trying to find out if an educational program increases emotional maturity in elementary school age children. Construct validity would measure if your research is actually measuring emotional maturity.

What is the regression formula? Understand the different components of the formula and how they are applied.

Equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0)
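
A small sketch of fitting Y = a + bX by least squares and using it to predict. The formulas for b and a are the standard ones (not spelled out in the notes); the function name `fit_line` and the data are made up.

```python
def fit_line(x, y):
    """Return (a, b) for the least squares line Y = a + bX."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx  # the line passes through the point (mean of X, mean of Y)
    return a, b

# Hypothetical scores on X and Y for five people
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(round(a, 2), round(b, 2))   # intercept and slope
print(round(a + b * 6, 2))        # predicted Y for X = 6
```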

In what ways can error impact the observed score?

Error pulls the observed score away from the true score (X = T + E), so the observed score can land above or below the true score

test-retest method

Evaluates the error associated with taking a test at two different times (Rorschach inkblot tests are not appropriate for this evaluation). There is a possibility of a carryover effect, when the first testing session influences scores from the second session (ex. some people remember their answers from the first test), or of practice effects, when some skills improve with practice

Internal consistency

examines how people perform on similar subsets of items selected from the same form of the measure

What is the content validity ratio and how is it calculated?

Experts look at each item and rate whether it is essential, and the ratings are summarized as a content validity ratio (CVR). The formula is CVR = (Ne - N/2) / (N/2), where Ne is the number of panelists indicating "essential" and N is the total number of panelists (good for educational testing)
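
A minimal sketch of the CVR formula from this card. The function name `content_validity_ratio` and the panel of 10 experts are hypothetical.

```python
def content_validity_ratio(n_essential, n_panelists):
    """CVR = (Ne - N/2) / (N/2)."""
    half = n_panelists / 2
    return (n_essential - half) / half

# Hypothetical panel: 8 of 10 experts rate the item "essential"
print(content_validity_ratio(8, 10))
```

The ratio runs from -1 (no panelist says "essential") through 0 (exactly half do) to +1 (all do).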

biserial correlation

expresses the relationship between a continuous variable and an artificial dichotomous variable.

What is Criterion-related Validity?

how well the test relates to the criterion we are using.

postdictive validity

if the test is a valid measure of something that happened before. For example, does a test for adult memories of childhood events work?

Be able to define and recognize the Likert Format. What scales most frequently use the Likert format?

indicate the degree of agreement with a particular question, used with attitude and personality, "Strongly agree... strongly disagree"

Central tendency

indices of the central value or location of a frequency distribution with respect to the X Axis.

The Normal distribution

The most common continuous probability distribution. The area under the curve between any two real number limits gives the probability that a value falls between them. The curve approaches 0 on either side of the mean, and the total area underneath the normal curve is always equal to 1

Mesokurtic

kurtosis at zero. normal distribution (Medium)

Define item characteristic curve. Know what information the X and Y axes give as well as slope

learn about items by graphing their characteristics. X is ability level Y is probability of correct responses

What are the three properties of scales that make scales different from one another?

magnitude, equal intervals, absolute zero

Three types of central tendency

mean, median, mode

mode

measure of central tendency, the most frequently occurring score in the distribution

What is the Kappa statistic and how does it relate to reliability?

Measures inter-rater reliability for judgments sorted into categories. Indicates the actual agreement as a proportion of the potential agreement, following correction for chance agreement
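
A rough sketch of the kappa idea for the simplest case of two raters (Cohen's kappa; this is an assumption, since the notes do not say which version was covered). The function name and the ratings are made up.

```python
def cohens_kappa(rater1, rater2):
    """(observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater1)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    categories = set(rater1) | set(rater2)
    # Chance agreement: product of the two raters' base rates, summed over categories
    p_chance = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical yes (1) / no (0) judgments by two observers on ten cases
rater1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
rater2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
print(round(cohens_kappa(rater1, rater2), 2))
```

The raters agree on 80% of cases, but half of that agreement is expected by chance, so kappa comes out lower than the raw percentage.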

Criterion referenced tests

measures performance against an established criterion

spearman's rho

Method of correlation for finding the association between two sets of ranks. (Tetrachoric correlation: used if both dichotomous variables are artificial)

Which type of validity has been referred to as "the mother of all validities", or "the big daddy" and why?

mother: construct validity. it measures what it should. you want your test to measure the construct

multiple regression

multivariate analysis that considers the relationship among combinations of three or more variables; the goal is to find the linear combination of the three variables that provides the best prediction

Name and define the three subtypes of criterion related validity. Be able to give examples.

predictive, concurrent, postdictive

covariance

relationships between variables (How much both variables change together)

What example was given in class regarding reliability

rubber yardstick

Define representative sample. Know when and why representative samples are collected.

A sample comprised of individuals similar to those for whom the test is to be used
When: when the test is used for the general population, a representative sample must reflect all segments of the population
Why: it can be used as a standardization sample that is representative of an entire population, which raises validity

predictive validity

the test is trying to predict something in the future. ex: SAT and freshman GPA

simple linear regression

seeks to find the linear explanation for the relationship between 2 variables

convergent evidence of validity

shows how similar your test is to other tests that are measuring the same thing

discriminant evidence of validity

shows that your test is different from other tests of different constructs

variance

standard deviation squared

What is the principle of least squares? How does it relate to the regression line?

Statistical procedure to find the best fit for a set of data points by minimizing the sum of the squared residuals (the vertical offsets of the points from the line). The regression line is the line for which this sum is smallest

negative skew

tail points to the left (towards - end)

positive skew

tail points to the right (towards + end)

What contributes to measurement error?

test construction, test administration, test scoring and interpretation

What prerequisites exist for validity?

test needs to be RELIABLE. you can't have validity without reliability (you can have reliability without validity)

Test reliability is usually estimated in one of what three ways?

test retest method, parallel forms method, internal consistency

intelligence testing

testing a person's general potential to solve problems

aptitude testing

testing potential for learning or acquiring a specific skill

achievement testing

testing previous learning

incremental validity

the test adds something new: information beyond what existing measures already provide

shrinkage

the amount of decrease observed when a regression equation is created for one population and applied to another

mean

the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores. measure of central tendency

What is Construct-related Validity?

the degree to which a test measures what it claims, or purports, to be measuring

residual

the difference between the predicted and the observed values

Define Face Validity. How does it differ from other aspects of validity?

The mere appearance that a measure has validity. Not technically validity, but still important. Present when it is obvious what you are measuring.

percentile rank

the percentage of scores below a specific score in a distribution of scores. Equation is (B/N) x 100, where B is the # of cases below the individual score and N is the total # of scores -Ex. Runner finishes 62nd out of 63: 1/63 = .016, so the percentile rank is 1.6
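
A short sketch of the percentile rank equation, reusing the runner example from this card. The function name `percentile_rank` is hypothetical.

```python
def percentile_rank(below, total):
    """(B / N) x 100: B cases below the score, N cases in all."""
    return below / total * 100

# Runner finishes 62nd out of 63, so only 1 runner finished below
print(round(percentile_rank(1, 63), 1))
```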

What is the meaning of a squared validity coefficient?

the percentage of variation in the criterion that we can expect to know in advance because of our knowledge of the test scores. ex: with an SAT validity coefficient of .40, we will know .40 squared, or 16%, of the variation in college performance because of info we have from the SAT. The remaining 84% of the variation in how students perform is still unexplained

standard deviation

the positive square root of the variance

psychological testing

the process of measuring psychology-related variables by obtaining information

What is the validity coefficient?

the relationship between a test and a criterion is usually expressed as a correlation (r) called the validity coefficient. ex: the SAT has a validity coefficient of .40 for predicting GPA at a particular university. Because the coefficient is significant, we can say that it tells us more about how well people will do in college than we would know by chance.

How do construct underrepresentation and construct-irrelevant variance relate to content validity?

The score that you get on a test should represent your comprehension of the content you are expected to know. Construct underrepresentation is the failure to capture important components of a construct. Construct-irrelevant variance occurs when scores are influenced by factors irrelevant to the construct.

What is the standard error of estimate? What is its relationship to the residuals?

the standard deviation of the residuals
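
A rough sketch of the standard error of estimate as the standard deviation of the residuals around the regression line. One assumption to flag: the sum of squared residuals is divided by n - 2, which is the usual convention for simple regression but is not stated in the notes. The function name and data are made up.

```python
from math import sqrt

def standard_error_of_estimate(x, y):
    """Standard deviation of the residuals around the least squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # observed minus predicted
    return sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Hypothetical scores on X and Y for five people
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(standard_error_of_estimate(x, y), 3))
```

The smaller this value, the closer the observed scores sit to the regression line.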

Define and be able to apply the broad definition of validity.

The usefulness and meaning of the results. Can be defined as the agreement between a test score or measure and the quality it is believed to measure. It can also be defined as the answer to a question: does the test measure what it is supposed to measure?

Parallel forms method

Compares two equivalent forms of a test that measure the same attribute
Advantages: reduces memory bias; one of the most rigorous assessments of reliability
Disadvantages: hard to construct

Understand the major components of inter-rater reliability.

three different ways to do this: most common method is to record the percentage of times that two or more observers agree. Kappa statistic is the best method for assessing the level of agreement among several observers

What is the purpose of factor and item analysis?

to see if a certain item is bringing the reliability down. see how many factors there are in the test

observed score

true score plus error

what are test batteries?

two or more tests used in conjunction

point biserial correlation

used when the dichotomous variable is true, meaning that the variable naturally forms two categories

What does the standard error of measurement do?

Uses the standard deviation of errors as the basic measure of error. Allows us to estimate the degree to which a test provides inaccurate readings. The larger the standard error of measurement, the less certain we can be about the accuracy with which an attribute is measured
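
A minimal sketch of the standard error of measurement, assuming the common formula SEM = SD x sqrt(1 - r), where r is the reliability coefficient (the formula itself is not given in the notes). The function name and the test with SD 15 and reliability .91 are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

# Hypothetical test with SD = 15 and reliability = .91
print(round(standard_error_of_measurement(15, 0.91), 2))
```

As reliability approaches 1 the SEM shrinks toward 0, which matches the card: a smaller SEM means more certainty about the measured attribute.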

phi coefficient

when both variables are dichotomous and at least one of the dichotomies is true

