IO Psychology Test 2

What is an example of a hypothesis you could test with an independent samples t-test?

An independent samples t-test compares the means of TWO different groups and tells us whether the difference between the two group means is greater than we would expect by chance. - t = the difference between the means divided by the standard error (a measure of variability); the bigger the SE, the smaller the t. - e.g., in our study #2, comparing an intervention group vs. a control group on the dependent variable. - e.g., assigning one group to a no-study condition and another group to a 24-hour study condition, then comparing the two. - Independent samples: not the same people in the two groups, but drawn from the same population.
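
A minimal sketch in Python (with scipy), mirroring the hypothetical no-study vs. 24-hour-study example above; all numbers are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
no_study = rng.normal(loc=70, scale=10, size=30)    # simulated exam scores
study_24h = rng.normal(loc=78, scale=10, size=30)

# t = (difference between the group means) / (standard error of the difference)
t, p = stats.ttest_ind(study_24h, no_study)
print(f"t = {t:.2f}, p = {p:.4f}")
```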

a test can be _____ but not _______ but a test cannot be ______ unless it is ______

A test can be reliable but not valid, but a test cannot be valid unless it is reliable.

What is the null hypothesis?

Still the most common approach - it states that there is no effect, e.g., that the intervention group will not score higher than the control group.

What does a p-value actually tell us?

If the populations really have the same mean (really no difference, i.e., the null is true), what is the probability that random sampling alone would lead to a difference between sample means as large (or larger) than the one you observed? As long as that probability is less than 0.05, we reject the null and find support for our hypothesis. e.g., with a p-value of 0.03, there is a 3% chance of observing a difference as large as (or larger than) we did, even if the population means are identical. What it does NOT tell you: a p-value of .03 does not mean there is a 3% chance the difference is due to chance, and therefore a 97% chance that the difference you observed is real.
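
A rough simulation of that frequency interpretation, assuming (hypothetically) an observed difference of 4 points: if both samples really come from the same population, how often does random sampling alone produce a difference that large?

```python
import numpy as np

rng = np.random.default_rng(0)
observed_diff = 4.0              # hypothetical observed mean difference
n, reps = 30, 10_000
diffs = np.empty(reps)
for i in range(reps):
    a = rng.normal(70, 10, n)    # both groups drawn from the SAME population,
    b = rng.normal(70, 10, n)    # so the null hypothesis is true by construction
    diffs[i] = a.mean() - b.mean()

p_sim = np.mean(np.abs(diffs) >= observed_diff)
print(f"{p_sim:.3f} of null samples show a difference as large or larger")
```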

What would be the problem with describing a bimodal distribution, with the modes in the extremes, with only a mean?

In Lisa's college-liking example, half the class (12 students) gave a 1 for strongly disagree (not liking college) and the other 12 gave a 5 for strongly agree (loving college). The mean would then be 3, i.e., neutral. But would this mean actually describe the group? No single student is actually neutral, so the mean alone would not describe the students' attitudes; we need to look at variability and the frequency distribution to really see what is going on with the data, not only the mean.

What is the difference between statistical and practical significance?

Statistical significance means the differences in group means are not likely due to sampling error, so the null is rejected. It doesn't tell you how much the difference matters to the world - only how likely we would be to see something this big if there were no real difference. Practical significance asks the larger question: "are the differences between samples big enough to have real meaning in the world?"

When you group items together, what happens? How many items are needed?

Mixing items up (rather than grouping similar ones together) gets people out of the habit of answering the same thing (4, 4, 4...) without reading the items; it forces them to read. But it can also confuse people when you start switching things around. It's better to have more items than you need - write twice as many as you think you'll need.

Think carefully about instructions

It's important to think about the presentation of items - whether it will be filled out with pen or pencil, and the ease of reading the items. Make sure there are no typos, to avoid people making errors.

Be able to describe the process (the steps involved) in creating a new scale in detail, from construction through validation. (Note: validity is covered in the next section, so this is an integrative question.)

Scale Construction
1. Identify a construct and develop a definition (e.g., intelligence).
2. Generate a set of items related to the construct while avoiding the early-influence problem, leading items, double-barreled items, etc.; put sensitive items in the middle. Consider closed-ended vs. open-ended items and whether to use a midpoint.
3. Assess content validity - make sure the items assess the construct you have in mind. When a Likert-type response scale is used, the points on the scale should reflect the entire measurement continuum; responses should be presented in an ordinal manner (ascending order without overlap), and each point on the response scale should be meaningful and interpreted the same way by every participant, to ensure data quality.
4. Pilot on people similar to those who will be part of the sample.
Utilization
5. Pilot the items on the target population of interest and have them give a lot of feedback; administer to 5 to 10 times as many pilot participants as you have items.
6. Assess item characteristics: means, SDs, inter-item correlations. Check whether each item correlates properly with the rest of the scale. Item-total correlations examine the extent to which scores on one item are related to scores on all other items in the scale (see the sketch below).
7. Play around: reduce items and recalculate - e.g., drop items that are nearly constant, since we want variability - and revise the scale. Look at what the scale would look like with an item taken out; we can analyze which items to drop to raise alpha, since removing an item can actually improve internal consistency. Make sure no item is repeated (unless one is repeated deliberately to check that people are actually reading the items). Watch for odd patterns, e.g., an item we expected to correlate positively that actually correlates negatively. Before testing hypotheses, examine the scale to decide whether some items should be left out. Estimate test-retest reliability - the strength of the relationship between scale scores across two to three points in time. Then the scale moves into the validation process.
Validation
8. Test validity through:
- predictive validity: determine whether scores predict future outcomes (predictor measured first, criterion later)
- concurrent validity: how the test correlates with other, established tests measured at the same time
- correlating the scale with constructs that theoretically should be related to it
- giving it to experts to make sure the items are not contaminated
- building a multitrait-multimethod matrix, which lets us examine reliability and systematic measurement bias, plus convergent validity (whether two measures of constructs that theoretically should be related are in fact related) and discriminant validity.
Do not just write "and then I would validate the scale" - describe the evidence.
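
For step 6, a minimal sketch of corrected item-total correlations, assuming a hypothetical pandas DataFrame `items` with one column per scale item:

```python
import pandas as pd

def corrected_item_total(items: pd.DataFrame) -> pd.Series:
    """Correlate each item with the sum of the remaining items."""
    total = items.sum(axis=1)
    # Subtract the item from the total so it isn't correlated with itself,
    # which would inflate the estimate.
    return pd.Series({col: items[col].corr(total - items[col])
                      for col in items.columns})
```

Items with low or negative corrected item-total correlations are candidates to drop in step 7.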

What do we use for effect size?

Cohen's d, reported alongside p-values. Rough benchmarks: 0.2 is small, 0.5 medium, and 0.8 large.
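
A minimal sketch of computing Cohen's d with the pooled standard deviation (one common formula; variants exist):

```python
import numpy as np

def cohens_d(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    # Pooled SD weights each group's variance by its degrees of freedom
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd
```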

What is Finkelstein's favorite article?

Schwarz: the questions shape the answers

How might the order of items impact how people respond to the scale?

- If more personal or threatening items are placed first, respondents might not answer honestly because rapport (a comfortable relationship) has not been built; people might feel discomfort answering a question. Ease people into the items with blander, more general ones first. - Get them to agree with you on something, then ask questions in line with it; people do not want to be hypocritical. - Early-item influence problem: if a question asks "should freedom of speech be guaranteed in the US?", the respondent will most likely respond yes, and subsequent questions will then be looked at through a liberal perspective. - Group similar items or not? If similar items are not grouped together, there might be a bit more variance, but you risk more error.

Features of Semantic Differential Scale?

- The stem of the item is the attitude object. - Responses are made on scales composed of evaluative bipolar adjectives. - Typically a 7-point scale where only the ends are labeled with adjectives. - Easier to develop, but you still have to think about appropriate adjectives. - Only appropriate for certain kinds of constructs.

How do we determine split-half reliability, and why isn't it done much anymore?

- Take one sample, administer one measure, split the measure in half (e.g., first half of items vs. second half), and correlate the scores from one half with the scores from the other half. - Problems: 1. there might be something a little different about the beginning of the measure; 2. fatigue sets in toward the end. We often correlate the odd items with the even items instead; depending on how you split the measure, you may get a different correlation among items. - It's not done much anymore because coefficient alpha also assesses internal consistency, but much more efficiently (with the use of computers): rather than choosing one way to split the test in half, it effectively averages across all possible ways to split the items in half.
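
A quick illustration of an odd/even split, assuming a hypothetical (people x items) numpy array of item scores (the result is often stepped up with the Spearman-Brown formula, not shown here):

```python
import numpy as np

def split_half_r(items):
    odd_total = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
    even_total = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...
    return np.corrcoef(odd_total, even_total)[0, 1]
```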

What's a Likert Scale?

- Contains response options from "strongly agree" to "strongly disagree". - The classic version is usually 5-point; there's some argument for 7-point with "slightly" and "moderately" as anchors. - It's a summative scale: you add the items up. Don't use just one item, because a single item can attract a lot of error, and we want to cover everything in our construct. - Scores are often averaged (add them up and divide by the number of items) to make the mean more interpretable; this puts it back on the response metric and doesn't change its mathematical properties.

What are the goals of scale development? What else do we have to observe about scale development?

1. A scale that reliably taps into only the construct we intend to measure. - Reliability = consistency. - Validity = how well an instrument represents the construct. 2. Differences among people on the scale will indicate differences among people on the construct. 3. The scale should be as easy and quick to read as possible, for participants and interpreters alike, without sacrificing the first two goals. We also need to think about practical concerns.

What are the two interpretations of the standard error?

1. Average distance between a sample statistic and population parameter 2. How much distance is reasonable to expect between two sample means if the null hypothesis is true
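
A minimal sketch of interpretation 1: the SE of the mean is the sample SD divided by the square root of n (scores are made up):

```python
import numpy as np

scores = np.array([3, 4, 4, 5, 2, 4, 3, 5], float)
se = scores.std(ddof=1) / np.sqrt(len(scores))
print(f"SE of the mean = {se:.3f}")
```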

Why should Likert items be worded moderately favorable or unfavorable rather than extremely favorable or unfavorable?

1. Because respondents are indicating their level of agreement or disagreement, it is unnecessary to generate items of varying favorability - the response format itself provides an indication of extremity. 2. Using extreme items would also be a waste of effort, because later scaling operations would indicate those items have to be discarded.

What are some reasons that wise interventions sometimes do not work?

1. Lack of precision in understanding the underlying process - you have to do a good amount of background research to understand it very well. 2. Lack of precision in translating the process into an intervention - the research is done and you understand it, but something is off and you're not actually targeting the right thing (the "whether"s and "why"s). 3. Context dependence - if something depends on contextual factors (e.g., things in our environment), it will work in certain conditions but not others; context is a MODERATOR ("it depends"). 4. Interventions are often improved by active engagement from participants. 5. Timing - there are times of day or year when people are paying attention and more open to things, e.g., a company going through change.

What are the different indicators of variability and how do they differ from one another?

1. Frequency - uses a bar graph; lets us eyeball range restriction, skew, and whether items might be problematic (useful in early stages of scale development); used more for categorical data. (Can you report a mean for a categorical variable? No - except for something numeric like age.) 2. Range - highest minus lowest score. Helpful to see how much of the scale is covered, but limited: 1. it ignores the whole distribution except the endpoints; 2. it capitalizes on extreme scores. It's important to have variance on these variables. 3. Variance - the average squared distance of scores from the mean of scores: "on average, how far away is each score?" Why square it? So the deviations turn positive and don't cancel back to zero. 4. Standard deviation - the average distance of each score from the mean (are scores really spread out?). It's the square root of the variance, more commonly reported than the variance, and more interpretable; we report the SD at the descriptive level.
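
Computing each indicator on a made-up set of scale scores, as a quick reference:

```python
import numpy as np

scores = np.array([1, 2, 2, 3, 3, 3, 4, 5], float)
vals, counts = np.unique(scores, return_counts=True)  # frequency distribution
score_range = scores.max() - scores.min()             # endpoints only
variance = scores.var(ddof=1)       # average squared distance from the mean
sd = np.sqrt(variance)              # back on the original score metric
print(f"range={score_range}, variance={variance:.2f}, SD={sd:.2f}")
```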

What are the different indicators of central tendency and how do they differ from each other?

1. Mean (average) - used most often, and used as a component in inferential tests, e.g., t-tests. 2. Median (middle). 3. Mode (most frequent). We examine these prior to testing our inferential hypotheses; the means also become part of the inferential equations. The median and mode are used depending on what you're looking for - e.g., a teacher may look at the mode of responses to a test question to see which answer was most popular compared to the correct one, in case the popular answer has some component that may be true.

Describe and give examples of three different threats to measurement validity

1. Mood - respondents' mood affects responses. e.g., a telephone survey asking people about their quality of life, where some respondents were in a sunny place and others in a rainy place; reported quality of life may be influenced by temporary states such as weather. 2. Social desirability - people responding not according to how they truly feel, but according to social norms or what they think others would consider the best response. 3. Language difficulty - the language used within measures is unfamiliar or has different meanings to different respondents. 4. Acquiescence - the tendency to agree with positively worded statements.

What are some challenges with estimating test-retest reliability?

1. Practicality - it's hard to get participants once, and even harder to get them twice. 2. Memory - if the administrations are too close in time, participants may still remember their exact responses and just try to be consistent, rather than paying attention to what they're feeling in the moment. 3. History - something could happen between the first and second administration that systematically changes people's answers, e.g., a political event that affects everyone. It changes people over time and is not random error, which is what reliability estimation assumes.

What are the steps to scale development? When does Validation end?

1. Scale construction - developing the idea of the construct and turning it into a scale. 2. Scale utilization - figuring out how the initial scale is working: pretest and look at reliability evidence. 3. Validation of the scale - further study before we actually put the scale into use to test research questions; works toward gathering validation evidence. Technically, validation never ends; in reality you reach "good enough" - we have enough validation evidence to go forward. As we collect data we might change our minds about which measures to use.

Distinction between reliability and validity

If Finkelstein operationalized your exam knowledge as your height, the measure would be reliable (consistent) but not valid.

Identify reasons why items are worded poorly

1. Avoid jargon and acronyms. 2. Avoid absolutes like "always" and "never". 3. Avoid double-barreled items, e.g., "the Xfinity guy was fast and polite?" 4. Don't use leading or loaded items, e.g., "have you stopped beating your wife?" - answering yes implies he was beating his wife but isn't anymore; answering no implies he still is. Don't imply something. 5. Don't use double negatives - don't confuse people about whether they're agreeing or disagreeing. 6. Keep items simple and at a good reading level.

e_r vs. e_s (random error vs. systematic error)

With a large N (sample size), the sum of random error = 0.

What is item discrimination? What is an example? More information on Item discrimination?

Item discrimination means our scale is sensitive enough to appropriately tell people apart on the construct we are interested in. When low-ability individuals (e.g., Ron and Malfoy) score high on a given item and high-ability individuals do not, something is clearly wrong with the item. Most likely it is scored incorrectly, perhaps even miskeyed - scored with the wrong answer keyed as correct. If the item is correctly keyed and scored, we would expect a positive relationship between performance on the item and level on the construct: being high on the construct should correspond to high item performance, and being low on the construct to low item performance. This relationship between item response and level on the construct is referred to as item discrimination. Good items discriminate between individuals of high and low ability, so that knowing an individual's item score tells us something about where they are on the construct.

Why do we need to take the square root of the variance to get the standard deviation, rather than just looking at the average deviation?

Because the average raw deviation from the mean is always 0 for any set of scores - the positive and negative deviations cancel out. Squaring the deviations makes them all non-negative so they can't cancel, and taking the square root of the variance puts the result back in the original units of the scores.
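
A two-line demonstration of why raw deviations can't measure spread (numbers invented):

```python
import numpy as np

scores = np.array([2, 4, 6, 8], float)
deviations = scores - scores.mean()
print(deviations.sum())          # 0.0 - positives and negatives always cancel
print((deviations ** 2).mean())  # variance; its square root is the SD
```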

How do low or high frequency response alternatives create a frame of reference for respondents? Open vs closed ended items?

It depends on the order in which you ask. When you ask about marriage first, it calls to mind information about the marriage; when the person is then asked about life overall, feelings about the marriage get assimilated into how they feel about their life, so they respond more consistently (marriage first, life second: r = .67). When asked about life first, people think about all aspects of life, not just marriage, and when then asked about marriage specifically there is more distinction (life first, marriage second: r = .32). When both questions are presented together, the correlation is no longer significant - people separate them in their heads and answer them as distinct questions (r = .18). Open vs. closed ended: when parents were asked what is most important for students to prepare them for life, the closed-ended version (much easier to analyze) provided a big list; "to think for themselves" was chosen by 61.5% when they saw it, the most popular answer. With the open-ended version, only 4.6% came up with that answer - respondents might get anxious about whether they're understanding the item right.

What is the difference between descriptive and inferential statistics?

Descriptive - asks "what do the data look like?" and "who was there?": descriptions of the variables in the study, e.g., central tendency, variability, skew, internal consistency reliability. Inferential - 1. do the variables measured in the sample likely represent population characteristics? (accounts for sampling error); 2. decisions about how likely it is that observed differences are due to chance - the essence of what we are doing with statistical tests of hypotheses: does it generalize to the population? Generally, descriptive statistics come first, inferential second.

Will you be going to stats jail if you dichotomize a continuous variable? What gets you a ticket?

Do not dichotomize continuous variables: a continuous variable ranges from low to high. For example, saying "conscientious people are more X than non-conscientious people" treats conscientiousness as two categories when it's actually on a continuum - you can be anywhere in between, so you cannot dichotomize it. What gets you a ticket: telling Finkelstein only what's happening at one end of the scale, leaving off the continuous nature of the relationship.

What are the advantages of open items?

- They do not force respondents to choose among a perhaps overly limited set of response options. - They require lots of coding, but you get answers right off the top of respondents' heads. - They tap respondents' feelings with greater fidelity. - The question shapes the response, e.g., "Dr. Finkelstein eats pizza frequently... how often does she eat pizza?" - They make respondents determine what is most important to the researcher and discard things the researcher is obviously aware of (e.g., "took a shower" or "took a survey today").

Why should we ever care about face validity?

Face validity - whether the measurement procedure used in a study appears, on its face, to be a valid measure of the given variable or construct. If a measure is not direct, or doesn't have face validity, your questions appear to be getting at something different; participants could sense that and therefore mistrust you. Thus question design matters for impression management - measures could get contaminated.

What do extreme high scores (or low scores) do to the mean and variance?

Extreme scores pull the mean and the distribution's tail toward them. - e.g., one extremely high score increases the mean, produces positive skew (tail to the right), and increases variance. - e.g., one extremely low score decreases the mean, produces negative skew (tail to the left), but also INCREASES variance. Thus extreme scores, whether high or low, INCREASE variance, while the mean always follows the tail.

Why is the standard error important in inferential statistics? Does it appear in the numerator or denominator of the t-test equation?

It's important because it helps us estimate the amount of error we have due to basing our conclusions on a sample rather than the whole population - "how accurately does the statistic represent the parameter?" It determines how comfortable we are that the difference between the means is not just chance. It appears in the denominator (bottom) of the t-test equation.

What is important when looking ahead to the goals of reliability and validity?

It's important to think about the construct, the research goals, the potential sample, and practical constraints. - Understand what the construct actually means. - How important is the research? Will it affect people's lives? Make sure the validation evidence is strong. - If you have a theory about certain people vs. others, will you be able to test the measure on that particular sample? - Will you have variance in your sample? If you only have access to participants who are unlikely to range across the scale, that won't help - you want variance on your construct in your sample. - Practical constraints include time, money to do the research, and access to the participants you need.

Difference between Likert and Semantic Differential scales?

A Likert item asks the person to agree or disagree with a statement, while a semantic differential item offers two evaluative polar opposites and asks about the attitude object directly. e.g., Likert: "the picture above is beautiful" a. agree b. neutral c. disagree. Semantic differential: "the picture above is:" beautiful......ugly. Semantic differential - numbers range between the poles and you only label the endpoints; you get more variance across different adjectives in how people feel on the scale. There is no labeled neutral point, and it's usually 7 points. It's easier to develop, but you still have to think of appropriate adjectives, and it's only appropriate for certain kinds of constructs and/or research questions. It's good at capturing gut-level reactions such as emotions.

What is validity? What is the common thread of making correct inferences?

Validity is measuring what you intend to measure. Systematic error means stuff can get in there that does not belong and affects everybody - the measure could be contaminated or deficient. The common thread is the inference we want to make: can I infer a person's standing on the construct from their score on the measure? The validation process never really ends - it is a continued search for evidence that the inferences we are making are likely to be accurate.

What are some advantages and disadvantages of Midpoints?

The midpoint is the neither-agree-nor-disagree point on a Likert scale, aka the NEUTRAL POINT. Advantages: - it allows people to legitimately express themselves; sometimes neither of the other options represents their feelings - people prefer having a neutral/no-response option, which maintains respondents' interest - people may get upset if there is no midpoint, feeling that you are manipulating them. Disadvantages: - it's an easy way out - the midpoint is difficult to interpret; it can have different meanings (neutral can mean undecided or indifferent), yet responses get put into the same category even though they connote different things - the neutral category may be used even when the correct response option would be "don't know" - it allows reluctance to give a true response.

When is the statement "this is a valid test" actually accurate?

Never! The statement should read "this measure has demonstrated adequate evidence of validity to date"

Should we take the mean of a categorical/nominal variable; why or why not? e.g. ethnicity

No, because we can't calculate a mean on categorical data; it wouldn't make sense to say the mean of ethnicity or gender was 3.45. Percentages and frequencies would be suitable, but not the mean.

What is a true score?

Observed score = True score + Random error, so True score = Observed score - Random error. The true score is the part of an observed score that recurs in the absence of error; it represents replicability and underlies reliability. It is hypothetical - we never know the true score, and we never know the error score. To estimate it, we estimate how much random error is present; what's left over is the true score. A true score is not necessarily a valid representation of the construct: it's a mathematical concept and doesn't mean we are actually capturing the thing we think we are capturing.

Does a p-value give you information about statistical or practical significance? Does an effect size give you information about statistical or practical significance? and why

P-value = statistical significance. Effect size tells us how big/meaningful the effect is = practical significance. Effect size is not the same thing as statistical significance: you can have a very small p-value, indicating statistical significance, while the actual effect itself is so small that it has little practical significance.

Conceptual definition of Correlation and what are the two pieces of information a correlation gives us?

A Pearson correlation allows us to estimate the relationship between two variables - to what degree do two variables co-vary in a systematic way? Two pieces of information it gives us: 1. strength, 2. direction.
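
A minimal sketch with scipy, using invented data; the sign of r gives direction and its magnitude gives strength:

```python
import numpy as np
from scipy import stats

hours_studied = np.array([1, 2, 3, 5, 6, 8], float)
test_score = np.array([55, 60, 62, 75, 80, 88], float)
r, p = stats.pearsonr(hours_studied, test_score)
print(f"r = {r:.2f} (positive and strong), p = {p:.4f}")
```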

What is the difference between random and systematic error?

Random error (unbiased error) relates to reliability - it's due to chance. - Examples: misreading something, guessing correctly on a test, having a bad day; Lisa's example where rain during the test affected how people reacted in ways that have nothing to do with their actual knowledge, interfering in some random way. - It can be reduced, e.g., by controlling the environment, but some error is inevitable. - It degrades the relationship between observed and true scores. A bigger error term in the denominator results in a smaller t, blocking our ability to see real differences in the t-test: more variance makes t smaller. If the error in the denominator is smaller, we have more power to detect differences and deem them statistically reliable. An unreliable measure requires a greater difference between groups before we can conclude the difference is real. Random error does not change the treatment and control group means, because the errors sum to zero - it changes the variability, not the averages. Systematic error (biased error) relates to validity - it consistently and artificially inflates or deflates scores within a given group of participants. - Sometimes intentional, typically not, but it impacts the whole group - it affects everybody. - Something gets in there that affects the way people respond (the "gunk"), leading us to a different conclusion than reality, e.g., the treatment group scoring higher than the control group only because of systematic bias. - It causes problems with our conclusions and keeps people less variable within a group. Reliability and validity problems in our measures can mess with our tests of hypotheses.

How can random and systematic error each impact statistical tests (hint: see Table 3.1)?

Random error (due to chance) does not change the difference between the group means; rather, it increases variability (more spread) within the groups. We therefore divide by a bigger number, producing a smaller test statistic, and if the statistic is smaller we are less likely to reach statistical significance. Thus if a test is lower in reliability because of more random error, we are less likely to find a difference between groups even when a difference is ACTUALLY there. Systematic error consistently and artificially inflates or deflates scores within a given group of participants; it decreases variability within the group, so we divide by a smaller number, making the difference look larger.
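
A toy illustration of that logic (all numbers invented): adding random noise to a group leaves its mean roughly unchanged but widens its spread, while a constant systematic bias shifts the mean without widening it:

```python
import numpy as np

rng = np.random.default_rng(2)
treatment = rng.normal(75, 5, 100)

noisy = treatment + rng.normal(0, 5, 100)         # random error
print(noisy.mean() - treatment.mean())            # near 0: mean barely moves
print(noisy.std() > treatment.std())              # True: more spread in group

biased = treatment + 3                            # systematic error (constant)
print(biased.mean() - treatment.mean())           # exactly 3: mean shifts
print(np.isclose(biased.std(), treatment.std()))  # True: spread unchanged
```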

Does a recursive process imply a chain of mediation or moderation and why?

A recursive process implies MEDIATION. It does not directly trigger the end events; rather, it triggers a mediating process that tells us WHY we get from one point to another - the thing in the middle keeps occurring over and over, hence the spiraling. e.g., a perspective-taking intervention with married couples: after the intervention, couples may take each other's perspective, leading to less anger at the spouse, which spirals into being better prepared to solve problems in future conflicts. The intervention DID NOT directly reduce conflict, but it did reduce distress during conflicts, and that process variable MEDIATED the stabilization in marital quality.

What are the advantages of closed ended items?

They reduce the likelihood that respondents will miss answers that simply didn't come to mind. When parents answered the question of what is the most important thing for children to prepare for life, those given the closed-ended version saw a big list, and 61.5% chose the most popular answer; when presented with the open-ended version (a blank space to write an answer), only 4.6% came up with it.

What is the difference between reliability and validity?

Reliability - the consistency with which a measure assesses a given concept; the absence of random error. Validity - the degree of relationship between an instrument (operationalization) and the construct it is intended to measure; the absence of systematic error. Look at reliability first, but they are separate things: if a measure has no reliability evidence, it is impossible for it to have validity. Consistency vs. accuracy.

How does the "criterion problem" affect criterion-related validity?

People focus heavily on the predictor (e.g., GRE scores), and to validate it we choose an outcome measure (e.g., future job performance). If the correlations turn out weak, our conclusion is that we lack validation evidence for the new measure, when in reality we chose a bad outcome measure (criterion) of future job performance. Contaminated example: a work engagement criterion might also include items on job satisfaction along with the engagement items. Criterion quality is typically assessed by subject matter experts.

What does coefficient alpha tell us?

It tells us the average correlation between all possible split halves, i.e., all possible combinations of splitting the items in half. In general, the more items, the higher the alpha (up to a point). Always keep the practical issues of collecting data in view: come up with enough items, then pare them down while focusing on internal consistency, avoiding repetition of the exact same kinds of items (unless we are deliberately checking that people are answering consistently). We can analyze which items to drop to raise alpha - sometimes removing an item will improve internal consistency.
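
A minimal implementation of coefficient alpha from its variance formula, assuming a hypothetical (people x items) numpy array:

```python
import numpy as np

def cronbach_alpha(items):
    items = np.asarray(items, float)
    k = items.shape[1]                              # number of items
    sum_item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)       # variance of scale totals
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)
```

Dropping an item and recomputing alpha on the remaining columns shows whether removing it would improve internal consistency.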

Inferential tests are used to?

Test hypotheses and infer representativeness to a population - i.e., whether relationships between variables are real and not just due to chance.

What is criterion validity?

The degree of relationship between the predictor (test) measure and an outcome/criterion (the level of performance the test is trying to predict). - Predictive: the predictor is measured first, the criterion later. - Concurrent: both are collected at the same time, e.g., asking people to fill out the measure while observing them perform the behavior. The whole purpose is to gather validation evidence for the measure based on theoretical reasoning; we can test the prediction to gather criterion-related validity evidence. Concurrent designs are easier and give larger Ns, but suffer from range restriction. Criterion validity concerns the likelihood that the behavior of interest can be predicted by the measure.

How do item anchors (strongly agree/disagree, good/bad, happy/sad, etc.) create a frame of reference for respondents?

The length of the reference period and the frequency range of the response alternatives affect question interpretation. With low-frequency categories (e.g., "how often are you irritated?" with options like "less than once a month"), respondents infer that the researcher is interested in major, rare occurrences rather than minor frequent ones; with high-frequency categories, they infer the question refers to minor annoyances. e.g., among hospital patients, pain twice a week was interpreted as relatively severe under the high-frequency response format.

What should happen to random error in a large sample?

The sum of random error = 0 - i.e., if we had the whole population of the world (the largest possible N), the random errors would sum to 0. Systematic error will not do that: it's consistent, so its sum does not equal zero.
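
A quick simulation of that claim: the average random error shrinks toward zero as N grows, while a constant systematic error does not:

```python
import numpy as np

rng = np.random.default_rng(3)
for n in (10, 1_000, 100_000):
    random_error = rng.normal(0, 5, n)
    print(n, round(random_error.mean(), 3))  # heads toward 0 as n grows

systematic_error = np.full(100_000, 2.0)     # constant bias for everyone
print(systematic_error.mean())               # stays at 2.0, never cancels
```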

Why should Likert items be worded as moderately favorable or unfavorable rather than extremely favorable or unfavorable?

They should not be worded in the extreme because extreme wording creates a lot of error at the opposite end of the scale: you don't know what people are responding to - the extremity or the actual item content. Extreme frequency wording also introduces vagueness, since you don't know what frequencies people have in mind.

Modern thinking vs. traditional thinking about measurement validity?

Traditional thinking: content, criterion-related, and construct validity were treated as different types, each appropriate in certain situations - there were certain conditions under which you needed to choose among them. Modern thinking: these are all really part of construct validity, which is really hypothesis testing. Everything falls under the umbrella of construct validity, i.e., making inferences about our construct. We develop hypotheses that may use approaches from the traditional types: IF this measure really taps my construct, it should do the things my construct should do. The modern view takes in evidence from different kinds of studies to support the inference we want to make; the more approaches we use and the more evidence we gather, the better.

What is an example of a hypothesis you could test with a correlation?

We use a Pearson correlation to look for a linear relationship between two continuous variables: is one thing related to another? NO GROUPS - the IV and DV are measured on everybody. e.g., if resilience goes up, work engagement (WE) goes up (a positive correlation); e.g., the more you study, the better you do on a test.

What is the problem with frequency scales that use terms like "often" or "sometimes"? What does it mean when there's problems with frequency?

"Often" and "sometimes" mean very different things to different people. e.g., the pizza example: for "how often do you eat pizza?", "often" may mean every single day to some people and only once a week to others. Terms like "occasionally" or "frequently" are vague - "frequently" to one person means something different than to someone else. Respondents bring their own frames of reference for what counts as "occasionally," so we may not be ordering people on the measure the way they are ordered on the construct. e.g., the TV consumption study: respondents were randomly assigned to response scales with low-frequency vs. high-frequency alternatives built from quantifiable time frames, with the comparison of interest at the 2.5-hour and 4.5-hour time frames.

Be able to identify an example of a heterotrait-monomethod correlation, a heterotrait-heteromethod correlation, and a monotrait-heteromethod correlation, and know which we would like to be higher than the others and why. (The multitrait-multimethod matrix allows us to look at reliability and systematic measurement bias, plus convergent and discriminant validity.)

Setup for the example: traits t1 = self-esteem (SE), t2 = sociability, t3 = intelligence; methods A = self-report, B = peer checklist, C = behavioral observation. Heterotrait-monomethod correlation - the correlation between different traits assessed by the same measurement method, e.g., correlating SE with sociability, SE with intelligence, and sociability with intelligence when all three are measured by self-report; these tend to run higher than heterotrait-heteromethod correlations because the shared method contributes common variance. Heterotrait-heteromethod correlation - the relationship between different traits assessed by different methods, e.g., the correlation between sociability measured by peer checklist and SE measured by self-report; we want these lowest (discriminant validity). Monotrait-heteromethod correlation - the association between the same trait assessed by different methods; because these express convergent validity, we want them higher than the other two: different measures of the same trait should be strongly related.

What does the term "discrimination" mean in the context of measurement scales? Do we want it or not want it?

Constructs are ideas we have created; we can only test their operationalizations (measures). Discrimination here means that differences among people on the scale indicate differences among people on the construct: if people score differently on the measure, that really reflects differences on the construct. We WANT it - in this context, discrimination is something positive: the scale is sensitive enough to tell people apart.

What is test-retest reliability(temporal measure)?

Test-retest reliability is the consistency of a measure when the same test is administered to the same people over time: we give the measure to the same sample again at a later time and look at whether the scores correlate / are consistent over time, i.e., whether people are stable over time. Problem: it's hard to get people to participate once, and even harder to get them to do it twice.

