Psych 303 Final

confounds

alternative explanations, threats to internal validity. a design confound (often just called confound) refers to a second variable that happens to vary systematically along with the intended independent variable and therefore is an alternative explanation for the results.

Outliers

an extreme score: a single case (or sometimes a few) that stands out far away from the pack. Depending on where it sits in relation to the rest of the sample, a single outlier can have a strong effect on the correlation coefficient, r. Raises questions about statistical validity. Can be problematic for an association claim because, even though they are only one or two data points, they may exert disproportionate influence. Can impact direction or strength of correlation. In bivariate correlations, outliers are mainly problematic when they involve extreme scores on both of the variables. In evaluating the correlation between height and weight, for example, a person who is both extremely tall and extremely heavy would make the r appear stronger. Best way to look for them is to inspect scatterplots and see if one or a few data points stand out. Matter the most when a sample is small.
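A minimal sketch of how one case that is extreme on both variables can inflate r in a small sample. The heights and weights below are made-up numbers for illustration, and `pearson_r` is a helper written for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical small sample: heights (cm) and weights (kg).
heights = [160, 165, 170, 175, 180]
weights = [70, 62, 75, 64, 72]

r_without = pearson_r(heights, weights)
# Add one person who is extreme on BOTH variables.
r_with = pearson_r(heights + [210], weights + [130])

print(f"r without outlier: {r_without:.2f}")
print(f"r with outlier:    {r_with:.2f}")
```

With only five original cases, the single added point dominates the result, which is why outliers matter most when the sample is small.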

control variable

any variable that an experimenter holds constant on purpose. keep levels the same for all participants. internal validity .

Attrition threat in quasi experiments

applies when people drop out of a study for some systematic reason. occurs mainly in designs with pretests and posttest.

control groups

level of an independent variable that is intended to represent no treatment or a neutral condition.

Maturation effects in quasi experiments

occur when, in an experimental or quasi experimental design that has a pretest and posttest, a treatment group shows an improvement over time, but it is not clear whether the improvement was caused by the treatment or whether the group would have improved spontaneously even without a treatment.

cross-sectional method

taking info from population at one point in time. look at multiple groups of people (different cohorts/ ages) at one point in time

Repeated Measure factorial design

within-groups (same people participate at each level). both IVs are manipulated as within-groups. Therefore, if the design is 2 x 2 there is only one group of participants, but they participate in all four combinations, or cells, of the design. researcher would counterbalance the order of presentation to help protect against order effects and ensure internal validity

order (carry over effects)

within-groups designs: threat to internal validity. An order effect is a confound: participants' performance at later levels of the independent variable might be caused not by the experimental manipulation but rather by the sequence in which the conditions were experienced. Such a sequence might lead to practice, fatigue, boredom or some other contamination that carries over from one condition to the next. Counterbalancing used to avoid these effects. Types: learning (practice), fatigue, contrast.

nominal

Nominal scales When measuring using a nominal scale, one simply names or categorizes responses. Gender, handedness, favorite color, and religion are examples of variables measured on a nominal scale. The essential point about nominal scales is that they do not imply any ordering among the responses. For example, when classifying people according to their favorite color, there is no sense in which green is placed "ahead of" blue. Responses are merely categorized. Nominal scales embody the lowest level of measurement. Chi Square test

Type 1 error

Reject the null when the null is true. (most feared mistake) alpha is our threshold for Type 1 error (it's the Type 1 error rate). False positive. By setting alpha lower (1%) researchers are trying to minimize their chance of making a Type 1 error. preventing Type 1 error only involves one factor: alpha level.

Content Analysis

Systematic way of describing the content of textual material or audio-visual media. Similar to other observational studies - needs clear coding scheme and operationalizations, need exhaustive categories, multiple raters would be nice. Such studies, like most observational studies do not have very good internal validity. They are more descriptive than explanatory.

Literature reviews

a qualitative assessment of a theory's validity based on previous research. Generally narrative in style; brief versions can be seen in papers' introductions. come to narrative conclusions about the validity of a theory.

Meta-analysis

a quantitative assessment of a theory's validity based on previous research. mathematically averaging the results of all the studies that have tested the same variables to see what conclusions the whole body of evidence supports. generally results are combined to create an average effect size. when two group means are compared, the effect size d evaluates how far apart the two group means are in standard deviation units, or how much the two groups overlap. In contrast, the effect size r is most appropriate when a study evaluates the association between two numeric variables. A meta-analysis averages all the effect sizes to find an overall effect size; the type of effect size used (r or d) depends on the research question: whether it is a correlation or a group difference.
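A rough sketch of the averaging step. The three effect sizes and sample sizes are invented for illustration. The simple mean follows the description above; the sample-size-weighted mean is an added assumption, shown only because larger studies are often given more weight.

```python
# Hypothetical effect sizes (r) and sample sizes from three studies.
effect_sizes = [0.10, 0.30, 0.20]
sample_sizes = [50, 100, 50]

# Simple (unweighted) average effect size.
simple_average = sum(effect_sizes) / len(effect_sizes)

# Assumption: weight each study by its sample size.
weighted_average = (
    sum(r * n for r, n in zip(effect_sizes, sample_sizes)) / sum(sample_sizes)
)

print(f"simple average r:   {simple_average:.3f}")
print(f"weighted average r: {weighted_average:.3f}")
```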

phi coefficient

bivariate correlations, descriptive stats. used to evaluate the association between two categorical variables.
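A small sketch of the standard phi formula for a 2 x 2 table of counts. The cell labels a, b, c, d and the counts passed in are assumptions made for this example.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table of counts laid out as:

                 var2 = yes   var2 = no
    var1 = yes       a            b
    var1 = no        c            d
    """
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Hypothetical counts: most cases fall on the a/d diagonal.
phi = phi_coefficient(30, 10, 10, 30)
print(f"phi = {phi:.2f}")
```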

instrumentation threats in quasi experiments

can threaten internal validity when participants are tested or observed twice. a measuring instrument could change over repeated uses. If a study uses two versions of a test with different standards (one test more difficult) or if a study uses coders who change their standards over time, then participants might appear to change, when in reality there is no change between one observation and the next. comparison group almost always helps rule out a testing threat to validity.

measures of central tendency

central tendency: a measure of what individual scores tend to center on. mode, median, mean. Mode: value of the most common score, the score received by more members of the group than any other. Can look at frequency histogram or stemplot. Some distributions have more than one mode (bimodal or multimodal). Median: value at the middlemost score of a distribution of scores, the score that divides a frequency distribution into halves. the median is a typical score in the sense that if we were to guess that every student in the class received the median score, we would not be consistently off in either direction. We would guess too high and too low equally often. A stemplot makes the median easy to find. Mean: average; add up all the scores in the batch and then divide by the number of scores (Σx / n), where x stands for each student's score. Which to use? mean is usually the most appropriate measure of central tendency, but when a set of data has outliers, the median (or sometimes mode) may provide a better description.
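A quick sketch using Python's standard `statistics` module. The scores are made up, with one outlier (180) included to show why the median can describe the batch better than the mean.

```python
import statistics

# Hypothetical exam scores; 180 is an outlier.
scores = [70, 72, 75, 75, 78, 80, 180]

mean = statistics.mean(scores)      # sum of scores / number of scores
median = statistics.median(scores)  # middlemost score
mode = statistics.mode(scores)      # most common score

print(mean, median, mode)
```

Here the mean is pulled up to 90 by the single extreme score, while the median and mode both stay at 75.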

within-groups (within subjects) designs

concurrent-measure; repeated-measures

control for/correcting for

cue that researchers used multiple regression. perhaps recess and behavioral problems are correlated only because poorer children both are more likely to have behavior problems and are more likely to be in schools that have less time for recess. recess and behavior are correlated, recess and income are also correlated, and income and behavior are also correlated. Researchers want to know whether economic disadvantage, as a third variable correlated with both recess and behavior, can account for the relationship between recess and behavior problems. To answer the question, they need to see what happens to the relationship between recess and behavior when they control for income. The statistically accurate way to describe controlling for income is to talk about proportions of variability. Researchers are asking whether, after they take the relationship between income and behavior into account, there is still a portion of variability in classroom behavior attributable to recess. Testing a third variable with multiple regression is similar to identifying subgroups. Think of the process of controlling for a variable like this: we start by looking only at the highest level of income and see whether recess and behavior are still correlated, then move to the next highest level, the next highest and so on until we have analyzed the relationship at the lowest level of income. We ask whether the bivariate relationship still holds at each level of income. (When we only look at poorer classrooms, or middle or upper, we still find the key relationship between behavior problems and recess: it is still negative even within these income subgroups, therefore the relationship is still there even when we control for income.) If the relationship did go away within the individual income groups, then income would indeed be the third variable responsible for the relationship. Regression does not establish causation: regression analyses cannot establish temporal precedence and cannot control for variables that were not measured.
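The subgroup way of thinking about "controlling for" can be sketched as below. This is not a multiple regression; it only illustrates checking whether the recess/behavior correlation stays negative within each income level. All classroom data and the `pearson_r` helper are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical classrooms: (income level, recess minutes, behavior problems).
classrooms = [
    ("low", 10, 9), ("low", 20, 8), ("low", 30, 6),
    ("middle", 10, 7), ("middle", 20, 5), ("middle", 30, 4),
    ("high", 10, 5), ("high", 20, 4), ("high", 30, 2),
]

# Compute the recess/behavior correlation separately at each income level.
r_by_income = {}
for level in ("low", "middle", "high"):
    recess = [r for inc, r, _ in classrooms if inc == level]
    problems = [p for inc, _, p in classrooms if inc == level]
    r_by_income[level] = pearson_r(recess, problems)

for level, r in r_by_income.items():
    print(f"{level:>6}: r = {r:.2f}")
```

In this made-up data the correlation is negative at every income level, which is the pattern described above for a relationship that holds even when income is controlled for.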

effect size

effect size describes the strength of the association. Stat significance is related to effect size: the stronger the correlation (and the larger its effect size) the more likely the correlation will be stat sig. But you can't tell whether a particular correlation is stat sig by looking at its effect size alone; you need to look at the p value associated with it. Many researchers use r to measure effect size. when a study has found a relationship between two variables, the effect size describes how weak or strong the relationship is.

problem generalizing to other researchers

experimenter effects are often overlooked. gender, race and demeanor of the researcher can influence the responses of participants. This is more of an issue in some contexts (social psychology) than others (cognitive psych). To the degree that experimenter effects play a role, it's a good idea to standardize procedures and sufficiently train experimenters, and to use multiple experimenters with diverse characteristics if feasible. Include experimenter as an IV in analyses

independent-groups vs within-groups designs

independent-groups: different groups of participants are placed into different levels of the independent variable. Within-groups: there is only one group of participants and each participant is presented with all levels of the independent variable. Advantages of within-groups: ensures that the participants in the 2 treatment groups will be equivalent; ability to use each participant as their own control. also gives researcher more power to notice differences between conditions. disadvantages of within-groups: demand characteristics.

taking into account

means researchers conducted regression analyses

behavioral measures

measuring DV. less reactive, particularly if done on the sly. Ex) frequencies/tallies, duration, reaction time/response latency. More difficult to implement (need equipment for some measures)

physiological measure

measuring DV. less reactive, reduced measurement error, even more difficult to implement (need specialized equipment). problem of biological reductionism: biology is not always the cause. Examples) GSR reflects anxiety, EEGs show brain activity, fMRI does too with greater precision, eyeblink startle response can index affective responses. Implicit Association Test: unconscious beliefs.

measurement scales

nominal, ordinal, interval, ratio

test for pearson's r

r-test. how do we know whether a sample correlation coefficient reflects a population correlation? We want to test the null: ρ = 0. Software will provide df and p-value.

Direct/exact replication

repeating a study's procedures as closely as possible, to see whether the original effect shows up in the newly collected data. Not done very often. Successful replication is reduced chance of type 1 error. unsuccessful replication is ambiguous: was it type 2 error? was the original study a type 1 error? did some important component get left out? or is there an experimenter effect? or fraud?

replication plus extension

researchers replicate their original study but add variables to test additional questions. introduce a participant variable or situational variable. look for interaction.

quasi-experiments

similar to true experiments: researchers select an IV and a DV, then they study participants who are exposed to each level of the IV. But in a quasi-experiment, the experimenters do not have full experimental control. For ex) they may not be able to randomly assign participants to one level or the other. Instead, participants are assigned to the IV conditions by teachers, political regulations, acts of nature- or even by choice.

expectancy effects

single-blind studies: participants don't know which group they are assigned to. experimenter bias: experimenters may unconsciously try to confirm their hypotheses (differential treatment of participants, bias in recording responses). solution = double-blind: neither researchers nor participants know which group they are in.

naturalistic observation

strengths: useful in complex and novel settings. Limitations: less useful when studying well-defined hypotheses under precisely specific conditions; Must constantly reanalyze and revise hypotheses based on subjective interpretation; both internal and external validity can be low.

staged manipulations

the ones that are most interesting to people-and have created the most controversy. Milgrams. Whether staged or straightforward you want the manipulation to have impact. Strength of manipulation, experimental realism, deception and risks, pilot testing, manipulation checks.

Alternative (research) hypothesis

this is the logical alternative for the null hypothesis. You assume this to be true if you reject the null. for some (but not all) tests, this can be 2 tailed or 1 tailed. There is a difference between first-born (2-tailed).

one-group pretest-posttest design (quasi experiments)

we can compare people's scores both before and after. With the repeated measure, each person serves as their own control. confounds threaten internal validity.

field setting

when a study takes place in the real world, it has a built in advantage for external validity because it clearly applies to real-world settings. mundane realism- referring to how similar a study's manipulations and measures are to the kinds of situations participants might encounter in their everyday lives.

sensitivity of measures

you don't want error variability. Recall that we want our measured scores to be close to people's true scores. This points to the importance of having clear, judicious questions with clear response options. relates to IAT- with the IAT, response latencies are recorded in milliseconds

post hoc analyses

determine where the difference is. test to determine stat sig pairwise differences. Bonferroni, Tukey, LSD, Duncan

Null hypothesis

assume there is no effect. a null hypothesis states that the data were generated just by chance and this usually carries the meaning that just by chance, there would be no difference, no relations or no effect in our data.

types of t tests

(are 2 group means different?) Independent groups t-test is the one you use when you want to compare two independent groups of participants on a certain variable which is measured only once. For instance, you might compare the height of girls and boys. Or you can compare 2 stress reduction interventions, e.g. when one group practiced mindfulness meditation while the other learned progressive muscle relaxation. Repeated measures t-test (dependent t-test) is used when the same participants are being tested on two occasions (so your dependent variable is measured twice) and you want to know whether the scores on the two occasions were different. For instance, you might want to examine the efficacy of some intervention for depression so you first measure all participants' depression levels prior to the treatment. Then you run a 6 week intervention and after it's completed your participants complete the same depression assessment they did before the intervention. A repeated measures t-test allows you to see whether those two scores are significantly different and whether your intervention worked. So in this t-test each participant's score on time 1 is compared against the same participant's score on time 2. Also called matched-pairs t test and paired-samples t test.
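A hand-computed sketch of both t statistics, using only the standard library. The data are invented, and the independent-groups version assumes equal variances (the pooled-variance form), which is an assumption added for this example. In practice, software also supplies the p-value.

```python
import math
import statistics

def paired_t(time1, time2):
    """Repeated-measures t: mean difference divided by its standard error."""
    diffs = [a - b for a, b in zip(time1, time2)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1  # t statistic and degrees of freedom

def independent_t(group1, group2):
    """Independent-groups t, assuming equal variances (pooled)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * statistics.variance(group1)
        + (n2 - 1) * statistics.variance(group2)
    ) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, n1 + n2 - 2

# Hypothetical depression scores before and after a 6 week intervention.
before = [20, 18, 25, 22, 19]
after = [15, 16, 20, 18, 17]
t_paired, df_paired = paired_t(before, after)

# Hypothetical scores from two separate groups.
t_ind, df_ind = independent_t([4, 5, 6, 7, 8], [2, 3, 4, 5, 6])

print(f"paired:      t({df_paired}) = {t_paired:.2f}")
print(f"independent: t({df_ind}) = {t_ind:.2f}")
```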

multiple regression

1 DV (criterion variable) and multiple predictors (IVs). we can include more than one predictor of an outcome variable. Such analytic designs have many benefits: allow us to build a richer model that predicts even more variation on an outcome variable. We can compare the relative contributions of predictors. We can test more complex effects like interactions/moderation. We can "control for" variables that might not be of interest but that might explain variation on the outcome (i.e. third variable problem). when researchers use regression, they are testing whether some key relationship holds true even when a suspected third variable is statistically controlled for. can address questions of internal validity. Use both beta and b. a nice aspect of r is that it is standardized. for the slope, we can also get a standardized coefficient beta; this tells us how many SDs y changes for each SD that X increases. These coefficients are very important for multiple regression.

measures of effect size

Cohen's d: when a study involves group or condition means (such as the mean amount of popcorn consumed from a large bucket compared with the mean amount of popcorn consumed from a smaller bucket) we can describe the effect size in terms of how far apart the two means are, in SD units. This will tell us not only how far apart the two group means are but also how much overlap there is between the 2 sets of scores. d = (M1 - M2)/(SD pooled). pooled SD is an average of the two standard deviations for the two groups. if d = 1.45 that means the average of the large-bucket group is 1.45 SDs higher than the average of the medium-bucket group. d also represents the amount of overlap between groups. The larger the effect size, the less overlap between two groups. If the effect size is zero, there is full overlap between the two groups. Usually use the effect size r to determine the strength of the relationship between two quantitative variables. We usually use the effect size d when one variable is categorical. d can be higher than 1 or lower than -1. η² (eta squared) is another effect size measure. used when researchers describe differences among several groups (that is, more than two means) or when they describe the effect sizes of interaction effects. partial eta-squared can be used in ANOVA. Partial eta-squared is a measure of variance, like r-squared. It tells us what proportion of the variance in the dependent variable is attributable to the factor in question.
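A minimal sketch of d = (M1 - M2)/(SD pooled). The two groups are made-up numbers. The pooled SD here is computed as the square root of the df-weighted average of the two sample variances, which is one common way to "average" the two SDs and is an assumption of this example.

```python
import math
import statistics

def cohens_d(group1, group2):
    """Difference between group means in pooled-SD units."""
    n1, n2 = len(group1), len(group2)
    # Assumption: pool by weighting each sample variance by its df.
    pooled_var = (
        (n1 - 1) * statistics.variance(group1)
        + (n2 - 1) * statistics.variance(group2)
    ) / (n1 + n2 - 2)
    return (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(pooled_var)

# Hypothetical amounts eaten by two groups.
large_bucket = [4, 5, 6, 7, 8]
medium_bucket = [2, 3, 4, 5, 6]

d = cohens_d(large_bucket, medium_bucket)
print(f"d = {d:.2f}")
```

Note that d comes out above 1 here, consistent with the point that d is not bounded by -1 and +1 the way r is.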

experimental realism

Experimental realism is the extent to which an experiment can involve the participant and get them to behave in a way that it is meaningful to what you're doing. If you're looking at the effect of fear on performance in some task for example, your manipulation needs to actually make the participant afraid; they need to take the threat/consequences of the stimulus seriously, or you won't be getting the reaction you want. when high in experimental realism, create settings in which people experience authentic emotions, motivations and behaviors.

p-value

In statistical significance testing, the p-value is the probability of obtaining a test statistic result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. A researcher will often "reject the null hypothesis" when the p-value turns out to be less than a certain significance level, often 0.05. Such a result indicates that the observed result would be highly unlikely under the null hypothesis. Many common statistical tests, such as chi-squared tests or Student's t-test, produce test statistics which can be interpreted using p-values. A small p-value is taken as evidence that the null hypothesis is incorrect. It does not necessarily mean that we have a large effect. It could be that we have a large effect, but it could also mean that we had little error. A big sample size gives great sensitivity to reject the null hypothesis even for small differences (statistical power).

posttest-only

Independent groups experimental design. in this design, participants are randomly assigned to independent variable groups and are tested on the dependent variable once. (very common and simple) meet all three rules for causation- allow researcher to test covariance, establish temporal precedence, and when conducted well, internal validity. unless you're particularly concerned with measuring change over time, pretests are not needed.

Interval scales

Interval scales are numerical scales in which intervals have the same interpretation throughout. As an example, consider the Fahrenheit scale of temperature. The difference between 30 degrees and 40 degrees represents the same temperature difference as the difference between 80 degrees and 90 degrees. This is because each 10-degree interval has the same physical meaning (in terms of the kinetic energy of molecules). Interval scales are not perfect, however. In particular, they do not have a true zero point even if one of the scaled values happens to carry the name "zero." The Fahrenheit scale illustrates the issue. Zero degrees Fahrenheit does not represent the complete absence of temperature (the absence of any molecular kinetic energy). In reality, the label "zero" is applied to its temperature for quite accidental reasons connected to the history of temperature measurement. Since an interval scale has no true zero point, it does not make sense to compute ratios of temperatures. For example, there is no sense in which the ratio of 40 to 20 degrees Fahrenheit is the same as the ratio of 100 to 50 degrees; no interesting physical property is preserved across the two ratios.

qualitative vs quantitative research

Qualitative: observational measures; focuses on behavior in natural settings; small groups and limited setting; researcher describes or captures themes that emerge from the data; data are non-numerical and expressed verbally and/or images; more exploratory. Observer bias. concealment. participation Quantitative: focuses on specific behaviors that can be easily quantified; assigns numerical values to responses and measures; typically uses large samples; data are analyzed using inferential statistics; more theory-based

shape of frequency distributions

Skewness: skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive or negative, or even undefined. negative skew: The left tail is longer; the mass of the distribution is concentrated on the right of the figure. The distribution is said to be left-skewed, left-tailed, or skewed to the left. positive skew: The right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be right-skewed, right-tailed, or skewed to the right. If the distribution is symmetric then the mean is equal to the median and the distribution will have zero skewness. If, in addition, the distribution is unimodal, then the mean = median = mode. This is the case of a coin toss or the series 1,2,3,4,... Note, however, that the converse is not true in general, i.e. zero skewness does not imply that the mean is equal to the median. the skewness does not determine the relationship of mean and median. bimodal/multimodal.

ordinal

The items in this scale are ordered, ranging from least to most satisfied. This is what distinguishes ordinal from nominal scales. Unlike nominal scales, ordinal scales allow comparisons of the degree to which two subjects possess the dependent variable. the difference between two levels of an ordinal scale cannot be assumed to be the same as the difference between two other levels. In our satisfaction scale, for example, the difference between the responses "very dissatisfied" and "somewhat dissatisfied" is probably not equivalent to the difference between "somewhat dissatisfied" and "somewhat satisfied." Nothing in our measurement procedure allows us to determine whether the two differences reflect the same difference in psychological satisfaction.

ratio scales

The ratio scale of measurement is the most informative scale. It is an interval scale with the additional property that its zero position indicates the absence of the quantity being measured. You can think of a ratio scale as the three earlier scales rolled up in one. Like a nominal scale, it provides a name or category for each object (the numbers serve as labels). Like an ordinal scale, the objects are ordered (in terms of the ordering of the numbers). Like an interval scale, the same difference at two places on the scale has the same meaning. And in addition, the same ratio at two places on the scale also carries the same meaning. Another example of a ratio scale is the amount of money you have in your pocket right now (25 cents, 55 cents, etc.). Money is measured on a ratio scale because, in addition to having the properties of an interval scale, it has a true zero point: if you have zero money, this implies the absence of money. Since money has a true zero point, it makes sense to say that someone with 50 cents has twice as much money as someone with 25 cents

problem of multiple comparisons in ANOVA and solutions

The term "comparisons" in multiple comparisons typically refers to comparisons of two groups, such as a treatment group and a control group. "Multiple comparisons" arise when a statistical analysis encompasses a number of formal comparisons, with the presumption that attention will focus on the strongest differences among all comparisons that are made. Failure to compensate for multiple comparisons can have important real-world consequences, as illustrated by the following examples. Suppose the treatment is a new way of teaching writing to students, and the control is the standard way of teaching writing. Students in the two groups can be compared in terms of grammar, spelling, organization, content, and so on. As more attributes are compared, it becomes more likely that the treatment and control groups will appear to differ on at least one attribute by random chance alone. Suppose we consider the efficacy of a drug in terms of the reduction of any one of a number of disease symptoms. As more symptoms are considered, it becomes more likely that the drug will appear to be an improvement over existing drugs in terms of at least one symptom. Suppose we consider the safety of a drug in terms of the occurrences of different types of side effects. As more types of side effects are considered, it becomes more likely that the new drug will appear to be less safe than existing drugs in terms of at least one side effect. In all three examples, as the number of comparisons increases, it becomes more likely that the groups being compared will appear to differ in terms of at least one attribute. Our confidence that a result will generalize to independent data should generally be weaker if it is observed as part of an analysis that involves multiple comparisons, rather than an analysis that involves only a single comparison. The problem comes from the increase in Type I error that occurs when statistical tests are used repeatedly. Solution: Bonferroni correction.
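A short sketch of why repeated testing inflates Type I error and what the Bonferroni correction does about it. The familywise formula 1 - (1 - alpha)^k assumes the k tests are independent, which is a simplifying assumption of this example. The function names are made up for illustration.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(alpha, k):
    """Per-test alpha after a Bonferroni correction for k tests."""
    return alpha / k

alpha = 0.05
for k in (1, 5, 10):
    fwer = familywise_error_rate(alpha, k)
    print(f"{k:>2} tests: familywise error = {fwer:.3f}, "
          f"Bonferroni per-test alpha = {bonferroni_alpha(alpha, k):.4f}")
```

With ten comparisons at alpha = .05, the chance of at least one false positive is roughly 40%, so each test is instead held to alpha = .005.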

participant variables and IV x PV designs

You can test designs with both manipulated and measured IVs. Participant variable: a variable whose levels are selected (measured), not manipulated, like age. Ex) Rabinowitz tested the symbolic racism x policy beneficiary interaction as a predictor of people's attitudes toward contract set-asides. good for theory testing, removing variability associated with known factors; increases statistical power.

nonequivalent control group design

a quasi-experimental study that has at least one treatment group and one comparison group, but participants have not been randomly assigned to the two groups. perhaps one classroom's students vs another's in testing a new curriculum. No random assignment to condition: they are frequently pre-existing groups or participants self-select into condition. Such a design will rule out history effects and regression to the mean, but not person variables as confounds.

generalization mode

although much of the research in psychology is conducted in theory-testing mode, there are times when researchers work in generalization mode, in which they want to generalize the findings from the sample in their study to a larger population and therefore are careful to use probability samples with appropriate diversity. because external validity is of primary importance when researchers are in generalization mode, they might strive for a representative sample of a population. but they might also try to enhance the ecological validity of their study in order to ensure its generalizability

Pearson's r correlation

bivariate association, descriptive stats. value is constrained between +1 and -1. The sign communicates the direction of the relationship. The value communicates the strength of the linear relationship. +1 or -1 is a perfect linear relationship. 0 means no linear relationship.

coefficient of determination (r²)

bivariate correlation, descriptive stats. tells you the proportion of variability in one variable that is explained by the other variable. indicates how well data points fit a statistical model - sometimes simply a line or curve. It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses, on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, as the proportion of total variation of outcomes explained by the model. Regression- builds on the principles of correlation. At its simplest, it is a way to describe the line that best predicts the DV using info from the IV.
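A small sketch connecting the two descriptions above: squaring r gives the same number as the proportion of variability in the DV that the best-fitting line accounts for. The five data points are invented for illustration.

```python
import math

# Hypothetical IV (x) and DV (y) scores.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variability in y

# Route 1: square the correlation coefficient.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Route 2: fit the least-squares line and compare leftover vs total variability.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
proportion_explained = 1 - ss_residual / syy

print(f"r^2 = {r_squared:.2f}, proportion explained = {proportion_explained:.2f}")
```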

point-biserial correlation

bivariate correlations, descriptive stats. When at least one of the variables in an association claim is categorical researchers may use point-biserial correlations to describe the relationship. It is correlation coefficient similar to r, but it is especially intended for evaluating the association between one categorical variable and one quantitative variable.

Spurious correlations

bivariate correlations, descriptive stats. sometimes when you have an association between two variables the apparent overall association is spurious, meaning that the overall relationship is attributable only to systematic mean differences on subgroups within the sample. related to the notion that within subgroups the association can be different than when everyone is lumped together. Subgroups can cause potential problems. When interrogating an association claim, it is important to think about subgroups.

complete counterbalancing vs partial counter balancing

dealing with order effects (within-groups designs). present the levels of the independent variable to participants in different orders. When counterbalancing is used, any order effects should cancel each other out when all the data are collected. when researchers counterbalance, they must split participants into groups; each group receives one of the condition orders (randomly). 2 methods for counterbalancing an experiment: when a within-groups experiment has only two or three levels of an IV, experimenters can use full counterbalancing. With two conditions there are two orders (A->B and B->A). In a repeated-measures experiment with three conditions, each group of participants would be randomly assigned to one of six orders. As the number of conditions increases, however, the number of possible orders needed for full counterbalancing increases dramatically, so researchers might use partial counterbalancing, in which only some of the possible condition orders are represented. could do this by presenting conditions in a randomized order for each subject. or could use a Latin square: a formal system of partial counterbalancing that ensures that each condition appears in each position at least once.
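A sketch of both approaches. Full counterbalancing lists every possible order with `itertools.permutations`; the Latin square here is built by simple rotation, which is one basic construction chosen for this example (the function names are made up).

```python
from itertools import permutations

def full_counterbalancing(conditions):
    """Every possible order of the conditions."""
    return [list(order) for order in permutations(conditions)]

def latin_square(conditions):
    """Rotate the list so each condition appears once in each position."""
    n = len(conditions)
    return [conditions[i:] + conditions[:i] for i in range(n)]

two_orders = full_counterbalancing(["A", "B"])
six_orders = full_counterbalancing(["A", "B", "C"])
square = latin_square(["A", "B", "C", "D"])

print(len(two_orders), len(six_orders))  # number of orders needed
for row in square:
    print(" -> ".join(row))
```

Four conditions would need 24 orders under full counterbalancing, while the rotated square uses only four orders and still places every condition in every position once.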

z score

describes whether an individual's score is above or below the mean and how far it is from the mean, in standard deviation units. computing z score: z = (X - M)/SD. any score below the mean will have a negative z score. Any score above the mean will have a positive z score. any score that is directly at the mean will have a z score of zero. Using z scores: allows us to compare individual cases' relative standing on variables that might have been measured in different units. standardized score. If we convert each person's scores on two or more variables to z scores, we can meaningfully compare the relative standings of each person on those variables, even when the variables are measured in different units.
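
A small worked example of z = (X - M)/SD. The exam and height numbers are hypothetical, chosen only to show a comparison across different units.

```python
def z_score(x, mean, sd):
    """How far x is from the mean, in standard deviation units."""
    return (x - mean) / sd

# Hypothetical person: exam score in points, height in centimeters.
z_exam = z_score(85, mean=75, sd=5)       # 2.0  -> two SDs above the mean
z_height = z_score(170, mean=175, sd=10)  # -0.5 -> half an SD below the mean
z_at_mean = z_score(75, mean=75, sd=5)    # 0.0  -> exactly at the mean
```

Even though points and centimeters cannot be compared directly, the z scores show this person stands relatively higher on the exam than on height.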

bivariate association

descriptive statistics. relationship between two measured variables. main types of associations: positive, negative, zero, curvilinear. quantitative: correlation coefficient r (direction and strength) and scatterplots. when both variables in an association are measured on quantitative scales, a scatterplot is usually the best way to represent the data. But if one is categorical, use a bar graph- would examine the difference between the average scores to see whether there is an association.

Matched-group designs

especially good when assigning small numbers of participants to groups. the researchers first measure the participants on a particular variable that might matter to the dependent variable (e.g., IQ). then they match participants up set by set; that is, they would take the three participants with the highest IQ scores and then, within that matched set, randomly assign one of them to each of the three groups. has the advantage of randomness: because each member of the matched set is randomly assigned, the technique prevents selection effects. but this method also ensures that the groups are equal on some important variable, such as IQ, before the manipulation of the independent variable. downside- the matching process requires an extra step. Not too common because random assignment takes care of most issues. Advantage- you can think of these as falling between between-groups and within-subjects designs in terms of control. with between-groups designs, any inherent differences between people become error (or noise) in the data. With within-subjects designs we can estimate variability due to inherent differences and subtract it from our error term (smaller error means larger test stat). with matching, although it's between groups, the similarity between matched people allows us to reduce the error term. this works only if your matching criterion is important for variability on the DV.

artificiality of studies

experimenters' emphasis on control tends to make them more artificial. However, this does not necessarily entail meaninglessness. One could argue that making your experiment mimic real life is not as important as the psychological meaning it has for participants

main effect

factorial design results. main effect is the overall effect of one IV on the DV, averaging over the levels of the other IV. that is, a main effect is a simple difference. in a factorial design with 2 IVs there are 2 main effects. It's based on the means of the conditions. (marginal means- the means for each level of an IV, averaging over levels of the other IV - if the sample size in each cell is exactly equal, marginal means are a simple average. if the sample sizes are unequal, the marginal means will be computed using the weighted average, counting the larger sample more) the test of main effects concerns marginal means. main effects may or may not be stat sig. main effects described as overall effects. it is equivalent to a t test (when you have 2 levels) or a one-way ANOVA (when you have 3+ levels). describing a main effect: when describing the main effect of IV A, don't mention any aspect of IV B.
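
A sketch of marginal means for a 2 x 2 design, with made-up cell means and cell sizes. `marginal_mean` is an illustrative helper that always weights by cell n, so it reduces to a simple average when the cells are equal in size.

```python
def marginal_mean(cell_means, cell_ns, factor_index, level):
    """Mean for one level of one IV, averaging over the other IV
    (weighted by the number of participants in each cell)."""
    cells = [cell for cell in cell_means if cell[factor_index] == level]
    total_n = sum(cell_ns[cell] for cell in cells)
    return sum(cell_means[cell] * cell_ns[cell] for cell in cells) / total_n

# Hypothetical cell means, keyed by (level of IV A, level of IV B).
cell_means = {("A1", "B1"): 10.0, ("A1", "B2"): 14.0,
              ("A2", "B1"): 20.0, ("A2", "B2"): 16.0}
equal_ns = {cell: 10 for cell in cell_means}

mean_a1 = marginal_mean(cell_means, equal_ns, 0, "A1")   # 12.0
mean_a2 = marginal_mean(cell_means, equal_ns, 0, "A2")   # 18.0
main_effect_a = mean_a2 - mean_a1                        # 6.0

mean_b1 = marginal_mean(cell_means, equal_ns, 1, "B1")   # 15.0
mean_b2 = marginal_mean(cell_means, equal_ns, 1, "B2")   # 15.0
main_effect_b = mean_b2 - mean_b1                        # 0.0

# Unequal cell sizes: the larger cell counts more.
unequal_ns = dict(equal_ns)
unequal_ns[("A1", "B1")] = 30
weighted_mean_a1 = marginal_mean(cell_means, unequal_ns, 0, "A1")  # 11.0
```

Whether either difference is statistically significant still requires the F test; this only computes the descriptive marginal means.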

Type 2 error

fail to reject the null when the null is false. a miss. preventing type 2 error involves several factors, collectively known as power. when predicting the chance of making a type 2 error (and power), several factors to consider simultaneously: 1) the alpha level (when researchers set alpha lower in a study it will be more difficult for them to reject the null hypothesis, so when they decrease alpha to avoid type 1 errors, they increase the chance of type 2 errors) 2) sample size (a study that has a larger sample will have more power to reject the null if there really is an effect) 3) effect size (when there is a large effect size in the pop there is a greater chance of conducting a study that rejects the null. when there is a small effect size in a population, type 2 errors are more likely to occur) (sample size and effect size interact. large samples are necessary only when researchers are trying to detect a small effect size. therefore the smaller the effect size in the pop, the larger the sample needed to reject the null and therefore to avoid a type 2 error) 4) degree of unsystematic variability (when a study's design introduces more unsystematic variability into the results, researchers have less power to detect effects that are really there. sources: measurement error, individual differences, and situation noise) 5) statistical choices (some tests make it more difficult to find significant results and thus increase the chance of type 2 errors, e.g., use of a one-tailed vs a two-tailed test. do we want to know if a drug improves symptoms (one-tailed) or do we want to know if it improves or worsens them (two-tailed)? in general, a one-tailed test is more powerful than a two-tailed test when we have a good idea about the direction in which the effect will occur).

Consumer reports survey of psychotherapy (Seligman paper and Brock paper)

how to find out whether psychotherapy works? efficacy studies and effectiveness studies. Efficacy study- more popular method- control and experimental groups with a lot of detailed requirements. Seligman argues that deciding whether one treatment under highly controlled conditions is better than another treatment or control group is a different question than deciding what works in the field. Argues that the rigid controls of efficacy studies do not reflect real-world conditions. Instead, effectiveness studies of how patients fare under the actual conditions of treatment in the field can yield useful validation. Argues for a survey of large numbers of people who have gone through treatments. captures how and to whom treatment is delivered. Efficacy studies disadvantages- inertness assumption (if it hasn't been empirically validated it's assumed to be inert). they don't contain the five properties that characterize psychotherapy in the field (not of fixed duration, self-correcting, active shopping, patients with multiple problems, improvement in general functioning). fixed treatment durations. survey advantages- sampling (important that the sample represents people who choose to go to treatment), naturalistic so treatment duration went as long as the patient needed, self-correction, multiple problems, general functioning. survey flaws: sampling bias- low return rate; time-consuming; threat to generalizability that participants chose their treatment (they believe in it); no control groups; self-report; no blindness; inadequate outcome measures (poor question wording); retrospective; nonrandom assignment; therapy junkies (influencing the result that long-term therapy appears better than short-term). Brock- regression toward the mean in the CR data, minuscule percentage return.

Frequency distributions

illustrates in graphical or tabular form how often each score occurs. It's important to look at frequency distributions early on; you can catch errors. Technique for organizing a column of data in a data matrix. It clearly shows how many of the cases scored each possible value on the variable. To make a frequency distribution, we list possible values for the variable from lowest to highest and tally how many people obtained each score. From a frequency distribution, can create a frequency histogram.
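
The tallying procedure above can be sketched as follows, assuming whole-number scores. The scores are made up, and `frequency_distribution` and `text_histogram` are illustrative helper names.

```python
from collections import Counter

def frequency_distribution(scores):
    """List each possible value from lowest to highest with its tally.
    Assumes integer scores; values nobody obtained get a count of 0."""
    counts = Counter(scores)
    return [(value, counts[value])
            for value in range(min(scores), max(scores) + 1)]

def text_histogram(distribution):
    """One row per value, one asterisk per case."""
    return "\n".join(f"{value}: {'*' * count}" for value, count in distribution)

# Hypothetical column of scores from a data matrix.
scores = [3, 5, 4, 5, 2, 5, 4, 3, 5, 4]
distribution = frequency_distribution(scores)
histogram = text_histogram(distribution)
```

A value far outside the possible range (say, a 55 on a 1 to 5 scale) would show up immediately in the table, which is how early inspection catches data-entry errors.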

subject mortality (attrition)

in many studies that have a pretest and a posttest, attrition or mortality occurs when people drop out of the study before it ends. threatens internal validity when it is systematic--that is, when only a certain kind of participant drops out. a comparison group is not always a cure-all for attrition. if both groups experience the same pattern of dropouts then attrition is not an internal validity threat, but attrition can be a threat when only one group experiences attrition. attrition is easy for researchers to identify and correct. when participants drop out of a study, most researchers will remove those participants' original scores from the pretest.

solomon four-group design

independent groups experimental design. 2 of the four groups get pretests and the other 2 don't. Can compare means on the groups to see whether the pretest had an effect. Downside: need more people.

pretest/posttest designs

independent groups experimental design. participants are randomly assigned to at least two groups and are tested on the key dependent variable twice--once before and once after exposure to the independent variable. researchers might use this design when they want to evaluate whether random assignment made the groups equal. may be especially important when group sizes are on the small side, because chance is more likely to lead to lopsided groups when samples are small. allows researchers to be sure that there is no selection effect in a study - the two groups are equivalent at pretesting. works well to track how participants in the experimental groups have changed over time. disadvantages: the pretest can sensitize participants to the hypothesis/subject under study. The pretest affords an opportunity for practice and learning. Adding a pretest lengthens the procedure, which can increase subject mortality. we can say that the new curriculum is better than the old one but the degree of improvement over time is not clear: was it the intervention or was it the pretest itself?

samples vs. populations

inferential stats. differentiate population distributions and sampling distributions. A population distribution is the distribution of raw scores. A sampling distribution of, say, the mean is the distribution of the sample means obtained from all of the possible samples of size n. Using data from a sample to make inferences about some populations, whose characteristics are often unknown.
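
A tiny illustration of the difference between the two distributions. The four-score population is made up, and "all of the possible samples" is taken here to mean ordered samples drawn with replacement, which is an assumption made to keep the enumeration simple.

```python
from itertools import product
from statistics import fmean, pvariance

def sampling_distribution_of_mean(population, n):
    """Means of every possible ordered sample of size n (with replacement)."""
    return [fmean(sample) for sample in product(population, repeat=n)]

# Hypothetical population of raw scores.
population = [2, 4, 6, 8]
sample_means = sampling_distribution_of_mean(population, n=2)

population_mean = fmean(population)                # 5.0
mean_of_sample_means = fmean(sample_means)         # 5.0
population_variance = pvariance(population)        # 5.0
variance_of_sample_means = pvariance(sample_means) # 2.5
```

The sampling distribution is centered on the population mean but is less spread out than the raw scores: its variance is the population variance divided by n.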

small n (single case) designs

instead of gathering a little info from a larger sample, researchers use small-n designs to obtain a lot of info from just a few cases or just one case. each participant is treated as a separate experiment. Small-n designs are almost always repeated-measures designs, in which researchers observe how the subject responds to several systematically designed conditions. Individuals' data are presented rather than group averages. Researchers decide whether a result is replicable by repeating the experiment on a new participant rather than by doing a test of statistical significance as they would with large-n designs.

one-group posttest-only design (quasi experiments)

it's not scientific. No comparison, we have no idea how behavior has changed.

Self-report

measuring the DV. most survey questions; very easy to use. More susceptible to reactivity and social desirability influences. Susceptible to lots of measurement error

conceptual replication

more common than exact replication and arguably more useful. Although the hypothesis and underlying theory may be the same, the method differs.

mundane realism or ecological validity

mundane realism- referring to how similar a study's manipulations and measures are to the kinds of situations participants might encounter in their everyday lives. study's similarity to real-world contexts- one aspect of external validity. can be limited in laboratory settings. best in field settings

interaction

new form of effect in which the combination of IVs influences the DV. Definition: An interaction (or moderation) occurs when the influence of one IV on the DV depends on the level of another IV. whether the effect of one IV (cell phone use) depends on another IV (driver age). "depends on". mathematical definition: difference in differences - in the driving ex, suggesting that the difference between the cell phone and control conditions (cell phone minus control) might be different for older drivers than for younger drivers. cell means- when looking at factorial designs, each cell has the mean score on the DV. The test of the A x B (2 IVs) interaction concerns these cell means. detecting interactions from a table (difference in differences)- find the difference between on sale and regular price for small bottles and the difference between on sale and regular price for large bottles, then compare those two differences. If they are significantly different, then there is an interaction. interactions on graphs- are the lines parallel? (non-parallel lines suggest an interaction) or bar graph- connect the tops of the bars for the same level and see if the lines would be parallel. describing an interaction: start with one level of the first IV and explain what is happening with the second IV at that level, and then move to the next level of the first IV and do the same thing. ex) "when people poured from the small bottle, they poured more detergent when it was on sale than when it was regular price. When people poured from the large bottle, they poured the same amount of detergent when it was on sale as when it was regular price." As you move from level to level you make it clear that the size of the effect of the other IV (product price) is changing. the interaction is almost always more important than a main effect. Simple effects (simple main effects): subtract the two dots at the ends of the crossing lines in an interaction graph - if the two simple effects differ, it suggests that we have an interaction.
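
A sketch of the difference-in-differences check for the detergent example. The cell means (in ml) are invented to match the verbal description, and `simple_effect` is an illustrative helper.

```python
# Hypothetical cell means: amount of detergent poured, keyed by
# (bottle size, price condition).
cell_means = {
    ("small", "sale"): 60.0, ("small", "regular"): 40.0,
    ("large", "sale"): 50.0, ("large", "regular"): 50.0,
}

def simple_effect(bottle):
    """Effect of price (sale minus regular) at one level of bottle size."""
    return cell_means[(bottle, "sale")] - cell_means[(bottle, "regular")]

effect_small = simple_effect("small")   # 20.0: poured more when on sale
effect_large = simple_effect("large")   # 0.0: poured the same amount
difference_in_differences = effect_small - effect_large  # 20.0
```

A nonzero difference in differences is the descriptive signature of an interaction; deciding whether it is statistically significant still requires the test of the A x B term.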

Random sampling distributions

normally distributed: meaning that if you plot the IQ scores of a very large random sample on a frequency histogram you will notice that about 68% of people fall between one standard deviation above and one standard deviation below the mean. About 14% fall between one and two SDs above the mean and about 14% between one and two SDs below. About 2% fall more than 2 SDs above the mean and about 2% more than 2 SDs below. Central limit theorem: as n increases, the sampling distribution of the mean will become more normal--even if the population distribution has a non-normal shape.
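
The percentages above can be checked against the standard normal curve. This sketch uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Proportion of cases in each region, expressed in SD units from the mean.
within_one_sd = std_normal.cdf(1) - std_normal.cdf(-1)      # about .68
between_one_and_two = std_normal.cdf(2) - std_normal.cdf(1) # about .14 (each side)
beyond_two = 1 - std_normal.cdf(2)                          # about .02 (each tail)
```

The four outer regions plus the middle region add up to 100% of the distribution.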

selection effects

occurs in an experiment when the kinds of participants at one level of the independent variable are systematically different from the kinds of participants at the other level of the independent variable. example of confound. problem with self-selection: willingness/volunteering indicates more motivation or another confound.

regression toward the mean

occurs when an extreme score is caused by a combination of random factors that are unlikely to happen in the same combination again, so the extreme score gets less extreme over time. ex) an extreme score that was a lucky combination of random factors that did not repeat itself in the next game, so the next game's score regressed back toward the average. threat to validity primarily when a group is selected because of its extremely high or low scores. true experiments use random assignment to place subjects into groups, a practice that eliminates regression to the mean as an internal validity threat.

file drawer problem

problem with meta-analysis. because meta-analyses usually contain data that have been published in empirical journals, there could be a publication bias in psychology: significant relationships are more likely to be published than null effects. This can lead to the file drawer problem where, instead of being published, studies with null effects sit forgotten in the researchers' filing cabinets. The file drawer problem refers to the idea that a meta-analysis might be overestimating the true size of an effect because null effects have not been included. To combat the problem, researchers who are conducting a meta-analysis usually contact colleagues requesting both published and unpublished data.

Internal validity

one of Mill's conditions for causal assessments. The most difficult condition to satisfy. Are there design confounds? Are there selection effects? Are there order effects? all three involve an alternative explanation for the results. Design confounds- there is an alternative explanation because another variable happened to vary systematically along with the intended independent variable. Selection effect- a confound exists because the different independent variable groups have different types of participants. Order effect (in a within-groups design)- there is an alternative explanation because the outcome might be caused by the independent variable but it also might be caused by the order in which the levels of the variable are presented. when there is an order effect we do not know whether the independent variable is really having an effect or whether the participants are just getting tired, bored, or well-practiced.

mixed factorial designs

one iv is manipulated as independent groups and the other is manipulated as within groups. study on cell phone use and driving among different age groups is example- age was an independent groups participant variable: participants in one group were old, and those in the other group were young. But the cell phone condition independent variable was manipulated as within groups. each participant drove in both the cell phone and control conditions of the study.

pilot study

a pilot study is a small-scale preliminary study conducted in order to evaluate feasibility, time, cost, adverse events, and effect size (statistical variability) in an attempt to predict an appropriate sample size and improve upon the study design prior to performance of a full-scale research project. uses a separate group of participants, before or sometimes after the main study, to confirm the effectiveness of manipulations.

Between-groups designs

posttest-only; pretest/posttest

problem of generalizing research with pretesting

pretesting effects. the pretest itself can have a testing effect (learning/practicing) that can affect results of the posttest. the Solomon four-group design allows you to assess the pretest's effect. you hope not to find an IV x Pretest interaction.

control series design

quasi experimental design. incorporates long runs of data and a comparison group. Finding differences in these designs means that confounds are not likely. helmet law repealed in FL but not GA (look at long runs of data to see motorcycle fatalities - essentially a state x time interaction)

nonequivalent control group pretest-posttest design

quasi-experimental design. adding a pretest. we can see how equivalent the groups are at the beginning. Even with an initial difference, we can investigate differential change from pretest to posttest.

measure of variability or dispersion

range, variance (SD^2) and standard deviation (SD): descriptive techniques that capture the relative spread of scores. How spread out the scores are. Sets can have the same number of scores and the same mean but have different variability. To compute variability, start by calculating how far each score is from the mean, so first calculate the mean. Then create a deviation score for each participant by subtracting the mean from each score. Square each deviation to eliminate the negative/positive problem- compute the average of the squared deviations = variance. Square root of that = SD. variance: SD^2 = Σ(X - M)^2 / n, and SD = square root of SD^2. Standard deviation is more commonly reported than variance because it better captures how far, on average, each score is from the mean. When the standard deviation is large, there is a great deal of variability in the set- the scores are spread out far from the mean, either above or below.
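
The computation steps above can be sketched directly. This follows the divide-by-n formula given in the card (not the n - 1 version that some software reports by default), and the score sets are made up.

```python
from math import sqrt

def variance(scores):
    """Average squared deviation from the mean (divides by n)."""
    mean = sum(scores) / len(scores)
    deviations = [x - mean for x in scores]     # how far each score is from M
    squared = [d ** 2 for d in deviations]      # removes the +/- problem
    return sum(squared) / len(scores)

def standard_deviation(scores):
    """Square root of the variance."""
    return sqrt(variance(scores))

# Two hypothetical sets: same number of scores, same mean (5), different spread.
set_a = [2, 4, 4, 4, 5, 5, 7, 9]
set_b = [1, 1, 1, 1, 9, 9, 9, 9]

variance_a, sd_a = variance(set_a), standard_deviation(set_a)  # 4.0, 2.0
variance_b, sd_b = variance(set_b), standard_deviation(set_b)  # 16.0, 4.0
score_range_a = max(set_a) - min(set_a)                        # 7
```

Both sets have a mean of 5, but set B's scores sit farther from it, which the larger SD captures.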

problem of generalizing to other participants and cultures

rats and college students are available and cheap so they are often used. But how representative are these groups of general populations? for much of its history, scientific psychology concentrated on WEIRD people: those from Western, educated, industrialized, rich, democratic societies. Any given study will generally have restricted diversity. Volunteers may differ from the general population. generalization as (lack of) interaction: in all of these cases you can think of the potential problem as an interaction.

selection effects/selection threats in quasi-experiments

relevant only for independent groups design, not for repeated-measures designs. a selection threat to internal validity applies when the groups at the various levels of an IV contain different types of participants. in such cases, it is not clear whether it was the IV or the different types of participant in each group that led to a difference in DV between groups.

mediation

researchers propose a mediating step between two of the variables. a study does not have to be correlational to include a mediator- even experimental studies can test them. Mediation analyses often rely on multivariate tools such as regression analyses. we know there is an association between recess and behavior, but researchers might next propose a reason for the association, or a mediator- could be physical activity. Using regression you could show that recess was associated with classroom behavior in the first place only because physical activity was responsible. mediators and third variables function differently with respect to a bivariate correlation. a third variable can be seen as an external lurking variable that is problematic- often seen as a nuisance and not of interest to researchers. with mediation, researchers are interested in isolating which aspect of the causal variable is responsible for that relationship. it is internal to the causal variable and often of direct interest to researchers.

tips for consuming research

say this: "is that result stat sig? was that result replicated? that is a single study. how does the study fit in with the entire literature? how well do the methods of this study get at the theory they were testing? Was the study conducted in generalization mode? if so, were the participants sampled randomly? Would the results also apply to other settings? Would that result hold up in another cultural context?" Not this: "this single study is definitive! this study has nothing to do with the real world. that's only a theory. This is a bad study because they didn't use a random sample. This is a bad study because they used only north american participants. they used thousands of participants so it must have great external validity. That psychological principle seems so basic, i'm sure it's universal."

ABAB Designs

single case/small n designs. Add another treatment session to see if scores change yet again. Also, with effective treatments, this aids participants' outcomes, because the design ends on treatment. baseline, treatment, baseline, treatment

Reversal (ABA) design

single case/small n designs. Logic is similar to the time series designs: you're looking for a steady baseline, followed by change that corresponds with treatment onset, which is then followed by a return to baseline with treatment removed. Such a pattern makes history effects unlikely. You can try to identify a control group too. Must be concerned about carryover effects. appropriate mainly for situations in which the treatment would not cause lasting change.

Stable-baseline designs

single case/small n designs. example comes from a study of a memory strategy called expanded rehearsal that was used with an Alzheimer's patient. before teaching her the new strategy, the researcher spent several weeks recording baseline info about her memory. then she taught her the new strategy as researchers continued to monitor how many words she could remember. Researchers noticed sudden improvement in memory ability. baseline was stable. if the researchers had done a single pretest before the new training and a single test afterward, the improvement could be explained by any number of factors such as maturation or regression. reporting an extended stable baseline made it unlikely that some sudden spontaneous recovery happened to occur right at the time of the new therapy. AKA interrupted time series.

Multiple baselines

single case/small n designs. researcher-practitioners stagger their introduction of an intervention across a variety of contexts, times, or situations. if effective treatment means that you shouldn't remove the treatment --and you can recruit multiple participants-- this design can help establish effectiveness. Because the treatments' onsets are at different times --but the results are the same-- one can feel pretty confident that this isn't due to history effects. the multiple baselines could be represented by a set of behaviors within one person, by different situations for one person, or by three different people. In any format, the multiple baselines provide comparison conditions to which a treatment or intervention can be compared.

systematic variability vs. unsystematic (or error) variability.

a potential confound is a problem for internal validity only if it shows systematic variability with the independent variable- that is, did the surly experimenters work only with the red ink group and the sweet ones only with the green and black ink groups? then it would be a design confound. However, if the experimenters' temperaments showed unsystematic (random or haphazard) variability across all three ink groups, then temperament would not be a confound.

ceiling effects and floor effects

special cases of weak manipulations and insensitive measures. These effects cause IV groups to score almost the same on the DV. All the scores are squeezed together either at the high end (ceiling effect) or at the low end (floor effect). Can be the result of a problematic IV or poorly designed DVs. These effects can obscure a true difference between groups -if problems on a math test are all too easy, everyone would get a perfect score. if the problems are too difficult, everyone will score low.

the research cycle

start with an idea (maybe from casual or systematic observation or prior research) -> hypotheses (parsimonious, testable, observable) -> choose research design -> decide on a population/sampling -> operationalize variables (manipulate and/or measure) -> conduct study -> analyze data and interpret findings -> report results (paper or presentation) and contribute to the larger body of knowledge. From interpret findings -> go back to idea or re-conduct study. not a linear pattern. the research cycle is an idealized portrayal of how we do research.

systematic observation

studies consist of careful observation of specific behavior in a particular setting. Methodological issues: coding systems, equipment, reactivity, reliability: multiple observers?, sampling.

Rabinowitz paper

symbolic racism (SR): conceptualized as a blend of basic anti-black antipathy and the sense that African Americans are violating consensually held values, such as working hard. Construct validity: assessed discriminant validity by examining whether SR predicts policy attitudes that do not involve a focus on Black Americans. If Black-focused SR truly represents racism, it should work well in predicting Black-targeted policy attitudes and less well in predicting other, non-Black-targeted policy attitudes. first study involved analysis of data from national samples. second study experimentally manipulated the beneficiaries of a policy (Blacks vs women) to assess whether removing race targeting weakened the ability of SR to predict preferences among college students.

coding system

systematic observation. Should have clear and valid operationalizations for different behaviors.

alpha level

the point at which researchers will decide whether the p is too high (and therefore will retain the null hypothesis) or very low (and therefore will reject the null hypothesis). Also, alpha is our threshold for Type I error. Therefore if p < alpha then we can reject the null hypothesis. Usually 5%. when we set an alpha of .05 we are admitting that 5% of the time, when the null is true, we will reject the null anyway.

statistical power

the probability that a researcher will be able to reject the null if it should be rejected. several factors to consider simultaneously: 1) the alpha level (when researchers set alpha lower in a study it will be more difficult for them to reject the null hypothesis, so when they decrease alpha to avoid type 1 errors, they increase the chance of type 2 errors) 2) sample size (a study that has a larger sample will have more power to reject the null if there really is an effect) 3) effect size (when there is a large effect size in the pop there is a greater chance of conducting a study that rejects the null. when there is a small effect size in a population, type 2 errors are more likely to occur) (sample size and effect size interact. large samples are necessary only when researchers are trying to detect a small effect size. therefore the smaller the effect size in the pop, the larger the sample needed to reject the null and therefore to avoid a type 2 error) 4) degree of unsystematic variability (when a study's design introduces more unsystematic variability into the results, researchers have less power to detect effects that are really there. sources: measurement error, individual differences, and situation noise) 5) statistical choices (some tests make it more difficult to find significant results and thus increase the chance of type 2 errors, e.g., use of a one-tailed vs a two-tailed test. do we want to know if a drug improves symptoms (one-tailed) or do we want to know if it improves or worsens them (two-tailed)? in general, a one-tailed test is more powerful than a two-tailed test when we have a good idea about the direction in which the effect will occur).
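
A rough sketch of how these factors move power, under a deliberately simple assumption: a one-sample z test with a known SD, where `effect_size` is the true mean difference in SD units and is positive. `z_test_power` is an illustrative function, not a general power calculator.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def z_test_power(effect_size, n, alpha=0.05, two_tailed=True):
    """Probability of rejecting the null in a one-sample z test when the
    true effect is `effect_size` SDs. Type 2 error rate = 1 - power."""
    shift = effect_size * sqrt(n)  # where the test statistic is centered
    if two_tailed:
        critical = _STD_NORMAL.inv_cdf(1 - alpha / 2)
        upper = 1 - _STD_NORMAL.cdf(critical - shift)
        lower = _STD_NORMAL.cdf(-critical - shift)
        return upper + lower
    critical = _STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - _STD_NORMAL.cdf(critical - shift)

baseline = z_test_power(effect_size=0.5, n=30)
stricter_alpha = z_test_power(effect_size=0.5, n=30, alpha=0.01)   # lower power
smaller_sample = z_test_power(effect_size=0.5, n=10)               # lower power
smaller_effect = z_test_power(effect_size=0.2, n=30)               # lower power
one_tailed = z_test_power(effect_size=0.5, n=30, two_tailed=False) # higher power
no_effect = z_test_power(effect_size=0.0, n=30)                    # equals alpha
```

When the true effect is zero, the rejection rate is just alpha, which ties power back to the Type 1 error threshold. Unsystematic variability enters through the effect size: more noise means a larger SD and therefore a smaller effect in SD units.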

sequential designs

this represents a mix of longitudinal and cross-sectional designs. look at multiple groups of people for long periods of time. You'll get a lot more info in the same amount of time. You can distinguish between cohort effects and developmental change. However, you will need to recruit more people.

history effects in quasi-experiments

threatens internal validity. occurs when an external, historical event happens for everyone in a study at the same time as the treatment variable. With a history threat, it is unclear whether the outcome is caused by the treatment or by the common external event. selection history threat- the history threat applies to only one group, not the other so that the historical event systematically affects the subjects only in the treatment group or in control group, not in both. Quasi-experiments with only one group (no control) are more susceptible.

random assignment

used to avoid selection effects. each participant has an equal chance of being placed in any of the groups.

longitudinal method

to see how people change, we could track people over time. these can take a long time though. and may be interested in cohort effects too. look at one group of people (one cohort) for a long period of time.

one way anova

are two or more groups different? tests the significance of a difference between two or more groups (comparing group means). One variable is categorical. F value. Allows us to test multiple mean differences at once. Uses the F statistic, which is also a ratio of effect/error. Most common test in psych; useful for factorial designs. F is a ratio of squared terms, like χ2. like χ2, it is not centered at 0; it ranges from 0 to infinity. omnibus test. With three or more groups, you will need a post hoc test to determine statistically significant pairwise differences. For each test, software will report F, df, and p-value. 2 values for df: one associated with the effect (numerator) and one associated with error (denominator).
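
A hand computation of F as a ratio of effect to error, using three small made-up groups. `one_way_anova` is an illustrative helper; it returns F and both df values but not the p-value, which would need the F distribution from statistical software.

```python
def one_way_anova(groups):
    """Return (F, df_effect, df_error) for a list of groups of scores."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Effect: how far the group means are from the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    # Error: how far individual scores are from their own group mean.
    ss_within = sum((x - mean) ** 2
                    for group, mean in zip(groups, group_means)
                    for x in group)

    df_effect = len(groups) - 1               # numerator df
    df_error = len(all_scores) - len(groups)  # denominator df
    f_value = (ss_between / df_effect) / (ss_within / df_error)
    return f_value, df_effect, df_error

# Hypothetical scores for three groups (means 5, 7, and 9).
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_value, df_effect, df_error = one_way_anova(groups)  # F(2, 6) = 12.0
```

Because both parts of the ratio are built from squared deviations, F cannot be negative, which matches the point that it ranges from 0 to infinity.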

repeated-measures designs

type of within-groups design in which participants are measured on a dependent variable more than once, that is after exposure to each level of the independent variable. (taste coke, taste pepsi, choose favorite).

Beta

use of beta to test for third variables (multiple regression). there will be one beta value for each predictor variable. Beta is similar to r but it reveals more than r. A positive beta, like a positive r, indicates a positive relationship between the predictor variable and the dependent variable, when the other predictor variables are statistically controlled for. a beta that is zero, or not significantly different from zero, means that there is no relationship when the other predictors are controlled for. betas denote direction and strength. not appropriate to compare the strengths of betas from one regression table to those of another table. may see small b instead of beta. coefficient b is also called an unstandardized coefficient. it denotes a positive or negative relationship when the other predictors are controlled for. unlike two betas, you cannot compare 2 b values within the same table to each other because b values are computed from the original units of the predictors (dollars, percentages), not standardized units as with beta. a predictor variable that shows a large b may not actually denote a stronger relationship to the dependent variable than a predictor variable with a smaller b. INTERPRETATION: for example, the predictor variable "number of minutes for recess" has a beta of -.05, a negative relationship, so as recess minutes go up, behavior problems go down, even while we statistically control for the other predictor variable in this table- free lunch. sig or p indicates whether each beta is statistically significantly different from zero. the p value gives the probability that the beta came from a pop in which the relationship is zero. when p is less than .05, the beta (that is, the relationship between predictor and DV when other predictors are controlled for) is considered stat sig. If beta is not significant- the relationship goes away when the potential third variable is controlled for.
--adding several predictors to a regression analysis can help answer 2 kinds of questions- first, it helps control for several third variables at the same time. can get the researchers closer to making a causal statement because the relationship between the suspected cause and the suspected effect does not appear to be attributable to any of the other variables measured. Second, by looking at betas for all the other predictor variables, we can get a sense of which factors most strongly affect classroom behavior problems (strength of beta).
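
A minimal sketch of what "controlling for" the other predictor does to a coefficient, assuming the simplest case of exactly two predictors. In that case each beta can be computed from the three pairwise correlations. The data and the names `recess_minutes`, `free_lunch_pct`, and `behavior_problems` are hypothetical, made up for illustration; they are not the values from the table the card refers to.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def standardized_betas(x1, x2, y):
    """Betas for predicting y from two predictors, each controlling for the other.

    Two-predictor formula: beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2).
    """
    r_y1 = pearson_r(x1, y)
    r_y2 = pearson_r(x2, y)
    r_12 = pearson_r(x1, x2)
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2

# Hypothetical classroom data (assumption, for illustration only).
recess_minutes = [10, 15, 20, 25, 30, 35, 40, 45]
free_lunch_pct = [60, 20, 50, 30, 40, 10, 35, 25]
behavior_problems = [9, 5, 8, 6, 6, 3, 5, 4]

beta_recess, beta_lunch = standardized_betas(
    recess_minutes, free_lunch_pct, behavior_problems
)
print("bivariate r, recess vs problems:", round(pearson_r(recess_minutes, behavior_problems), 2))
print("beta for recess, controlling for free lunch:", round(beta_recess, 2))
print("beta for free lunch, controlling for recess:", round(beta_lunch, 2))
```

Comparing the bivariate r with the beta for the same predictor shows how much of the relationship remains once the other predictor is held constant. When the two predictors are uncorrelated with each other, each beta equals the corresponding r. This sketch does not compute the sig/p column.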

manipulation checks

researchers use them to collect empirical data on the construct validity of their IVs. a manipulation check is an extra dependent variable that researchers can insert into an experiment to help them quantify how well an experimental manipulation worked. for ex, researchers might have asked each participant to report how socially included or popular they felt right after writing about the social memory. if they found that people in the social inclusion group felt more popular and included than the people in the social exclusion group, the researchers would have some evidence that their memory manipulation worked.

factorial design

used to test for interactions. a design in which there are two or more independent variables (IVs may also be called factors). in the most common factorial design, researchers cross the two IVs; that is, they study each possible combination of the IVs. for ex, test whether the effect of driving while talking on the phone depended on the driver's age (2 IVs: cell phone use and driver age). Each IV must have 2+ levels. All combinations of levels are represented. Simplest possible factorial design: a 2 x 2. "two-way" or "three-way" gives the number of factors/IVs. in the notation, the number of values shown (each separated by a multiplication sign) represents the number of IVs, and each value tells how many levels that IV has. So a 3 x 4 design has 2 factors, one with 3 levels and the other with 4 levels. There is a practical limit to the number of IVs. key terms: levels, conditions, cells. Still need to run an F test for stat significance. In a three-way A x B x C factorial design, we can test 7 effects (A, B, C main effects; AxB, AxC, BxC, AxBxC interactions).

t test

used to test hypotheses about 1 or 2 means when the population SD needs to be estimated. When t = 0, there is no difference between the means. however, values other than 0 do not necessarily mean we would be comfortable rejecting the null. As with any test stat, the calculated value of t would have to fall in the outermost tails of the sampling distribution. Suppose we're testing gender differences in aggression among children, and boys did aggress more than girls in the observational period. Is the difference significant? If we find a stat sig difference, we could report it. for independent groups, 2 features of the data determine how far apart the two groups are. One is the difference between the two means themselves and the other is how much variability there is within each group. the t test is a ratio of these two values. the less the two groups overlap, either because their means are farther apart or because there is less variability within each group, the larger the value of t will be. sample size also influences t: if n is larger, the denominator of the t formula becomes smaller, making t larger. sampling distribution of t: used to estimate the probability of obtaining the t we got just by chance from a null-hypothesis population. always centered at 0 because if the null is true, the t values will average 0. as n increases, the t distribution approaches normal. What is the probability of getting the t we obtained, or an even more extreme value of t, if the null hypothesis is true in the population? That is the p value.
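
A minimal sketch of the ratio described above, assuming the pooled-variance (equal variances) form of the independent-groups t test; the card does not say which form is meant. The aggression scores are hypothetical. The p value is not computed here, since that needs the t distribution from a stats package.

```python
import math
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Independent-groups t: difference between means over its standard error."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled estimate of within-group variability (sample variances, n - 1).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    # A larger n makes this denominator smaller, which makes t larger.
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df

# Hypothetical counts of aggressive acts per child (assumption).
boys = [5, 7, 6, 8, 9]
girls = [3, 4, 5, 4, 4]
t, df = independent_t(boys, girls)
print(f"t({df}) = {t:.2f}")

# Same mean difference, more variability within each group: smaller t.
noisy_boys = [2, 10, 4, 9, 10]
noisy_girls = [1, 7, 2, 6, 4]
t_noisy, _ = independent_t(noisy_boys, noisy_girls)
print(f"noisier groups: t = {t_noisy:.2f}")
```

Both pairs of groups differ by the same amount on average; only the within-group spread changes, which is what shrinks t in the second comparison.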

f distribution

we derive the sampling distribution of F by assuming that the null hypothesis is true in the population and then imagining that we run the study many more times, drawing different samples from that null hypothesis population. what values of F would we expect from such a population if the null hypothesis is true? If there is no difference between means in the population, then most of the time the variance between the groups will be about the same as the variance within the groups. When these two variances are about the same, the F ratio will be close to 1 most of the time. Therefore, the sampling distribution of F is centered on 1. Not a symmetrical distribution: it is not possible to get an F that is less than zero. degrees of freedom for the F distribution contain two values. the df value for the numerator (that is, for the between-groups variance) is the number of groups minus 1. the df value for the denominator (within-groups variance) is the number of participants in the study minus the number of groups. a computer program will calculate the probability of getting the F we got, or one more extreme, if the null hypothesis is true. the larger the F we obtained, the less likely it is to have happened just by chance if the null hypothesis is true.
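
The F ratio and its two df values can be sketched as follows, assuming a simple one-way, independent-groups design. The scores are hypothetical and the function name `one_way_f` is made up for illustration. The probability for the obtained F is left to a stats program, as the card says.

```python
from statistics import mean

def one_way_f(groups):
    """F = between-groups variance / within-groups variance, with both df."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([score for g in groups for score in g])

    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = mean(g)
        # How far each group mean sits from the grand mean.
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        # How far each score sits from its own group mean.
        ss_within += sum((score - group_mean) ** 2 for score in g)

    df_between = k - 1        # number of groups minus 1
    df_within = n_total - k   # number of participants minus number of groups
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups (assumption).
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f, df_b, df_w = one_way_f(groups)
print(f"F({df_b}, {df_w}) = {f:.2f}")
```

Because both the numerator and the denominator are built from squared deviations, F cannot be negative, which is why the distribution is not symmetrical.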

theory-testing mode

when a researcher is in theory-testing mode, external validity and real-world applicability are lower priorities. in terms of the theory being tested, generalizability may or may not be valuable to the researcher. the theoretical understanding gained from some seemingly artificial studies can still contribute to our understanding of human processes. Theory-testing mode often demands that experimenters create artificial situations that allow them to minimize distractions, eliminate alternative explanations, and isolate individual features of some situation. prioritizes internal validity at the expense of all other considerations, including ecological validity.

demand characteristics

when an experiment contains cues that lead participants to guess its hypothesis, the experiment is said to have demand characteristics or experimental demand. if demand characteristics are high, they may create an alternative explanation for results. disadvantage of within-groups designs. these are aspects of the study that elicit expectancies in participants and yield unnatural responses. Most people try to figure out the purpose of a study. When they think they know, they may try to help the experimenter. For this reason, hide info that may provide hints to the hypothesis: deception, between-groups manipulations, single-blind procedures, embedding measures among filler items.

factors determining an appropriate sample size

when researchers are striving to generalize a frequency claim from a sample to a population, the size of a sample is in fact much less important than how the sample was selected. a researcher will choose a sample size in order to optimize the margin of error, which quantifies the degree of sampling error in a study's results. the larger the sample size, the smaller the margin of error. however, after a sample of 1,000 people, it takes many more people to gain just a little more accuracy in the margin of error. sample size and effect size interact. large samples are necessary only when researchers are trying to detect a small effect size. therefore, the smaller the effect size in the pop, the larger the sample needed to reject the null and thus to avoid a type 2 error.
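
A rough illustration of the diminishing returns past about 1,000 people. The card gives no formula, so this sketch assumes the usual margin of error for a proportion from a simple random sample, with a 95% confidence level (z = 1.96) and the worst case p = 0.5; all of these are assumptions made here.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion (simple random sample)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 500, 1000, 2000, 4000):
    print(f"n = {n:>4}: margin of error = +/- {100 * margin_of_error(n):.1f} percentage points")
```

Under these assumptions the margin of error shrinks with the square root of n, so cutting it in half takes four times as many people.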

testing threats in quasi experiments

whenever researchers measure participants more than once, they need to be concerned about testing threats to internal validity. a testing threat is a kind of order effect in which participants tend to change as a result of having been tested before. repeated testing might cause people to improve regardless of the treatment they receive. Repeated testing might also cause performance to decline because of fatigue or boredom.

chi-square test

χ² (chi-square). Used when you have DVs that are measured on a nominal (categorical) scale. The ratio of effect/error is distributed in a manner that is called χ². Unlike t, it is not centered at 0; it ranges from 0 to positive infinity. Like t, it uses a single value for df. One sort of chi-square test checks whether observed frequencies fit a hypothesis such as "Republicans and Democrats are equally represented in the population". Another sort of chi-square test allows you to test contingency or dependency, that is, whether or not the frequency of one category depends on the category of another variable. These are more interesting sorts of hypotheses and are related to the concept of interaction. phi coefficient: used to evaluate the association between two categorical variables.
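
Both sorts of chi-square test can be sketched with plain arithmetic. The counts below are hypothetical, the function names are made up for illustration, and the phi shown is the magnitude for a 2 x 2 table, computed as the square root of χ² / N. No p value is computed; a stats package would look that up from χ² and df.

```python
import math

def chi_square_goodness_of_fit(observed, expected):
    """Sum of (observed - expected)^2 / expected over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_independence(table):
    """Chi-square and df for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Frequency expected in this cell if the two variables were unrelated.
            exp = row_totals[i] * col_totals[j] / n
            chi2 += (obs - exp) ** 2 / exp
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

def phi_coefficient(table):
    """Strength of association in a 2 x 2 table (magnitude only)."""
    chi2, _ = chi_square_independence(table)
    n = sum(sum(row) for row in table)
    return math.sqrt(chi2 / n)

# Are two parties equally represented? Hypothetical sample of 100 people.
print(chi_square_goodness_of_fit(observed=[60, 40], expected=[50, 50]))

# Does the frequency of one category depend on another? Hypothetical 2 x 2 counts.
table = [[30, 10], [10, 30]]
print(chi_square_independence(table))
print(phi_coefficient(table))
```

When the cell counts match what the row and column totals alone would predict, χ² is 0; the further the counts depart from that pattern, the larger χ² and phi become.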

