Research methods

What is meant by a linear contrast effect of order-0?

A linear contrast effect of order-0 is one in which (i) all positively valued weights sum to +1 and (ii) all negatively valued weights sum to -1.

Familywise alpha

Probability of rejecting the null hypothesis for at least one true H0 among j tests when all j null hypotheses are true

Actual false rejection rate

Rate of rejecting a true null hypothesis in practice over a large number of repeated applications

Nominal false rejection rate

Rate set a priori by the alpha criterion

Standardised mean contrast value

Raw mean contrast divided by the square root of MS within

Dummy categories

Dummy variables can only ever be 0 or 1, with a reference category coded 0 on all of them. Categorical variables can't be given a continuous ordering because it is meaningless: R squared will change depending on the values assigned and the sums of squares will be incorrect. In regression with dummy variables, the intercept is the mean of the reference category, and each coefficient is the difference between a dummy category's mean and the reference category's mean.
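A minimal Python sketch of this interpretation, using made-up group names and scores (everything here is hypothetical):

```python
# Dummy coding sketch: with dummy variables, the regression intercept
# equals the reference-category mean, and each dummy coefficient equals
# the difference between that category's mean and the reference mean.
from statistics import mean

scores = {"control": [4, 5, 6], "drug_a": [7, 8, 9], "drug_b": [10, 11, 12]}
reference = "control"

intercept = mean(scores[reference])              # mean of the reference category
coefs = {g: mean(v) - intercept                  # coefficient = group mean - reference mean
         for g, v in scores.items() if g != reference}

print(intercept, coefs)   # 5 {'drug_a': 3, 'drug_b': 6}
```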

Standardisers in dependent groups design

Can use Cohen's d or Hedges' g for change scores (standardiser = s × square root of 2(1 − r)). Use Dunlap's d or Bonett's delta for matched groups (using the standard deviation of one group, or pooling both groups' variances).

F distribution of regression

Cannot be negative. Positively skewed, so the region of rejection is in the upper tail only. The shape of the distribution is defined by df regression and df residual.

Error rates in multiple hypothesis testing

Carrying out multiple non-independent null hypothesis tests increases the familywise error rate!

Efficiency of estimator

The degree of variation of an estimate of the PP indicates its efficiency. The estimator with the *smaller variation/standard error* is the more efficient one. All other things being equal, prefer efficient estimators.

Variance calculation

Deviation scores are squared to calculate the variance because the sum of deviation scores around the mean will always equal 0.
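A quick numerical check of why squaring is needed (the scores are hypothetical):

```python
from statistics import mean

x = [2, 4, 4, 4, 5, 5, 7, 9]
m = mean(x)
deviations = [xi - m for xi in x]

print(sum(deviations))                                     # 0: deviations always cancel out
variance = sum(d ** 2 for d in deviations) / (len(x) - 1)  # n - 1 gives the unbiased estimator
print(variance)                                            # ~4.57
```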

Chi square

Expected frequency = (row total × column total) / total n (what we would expect if there was no association). Chi square = sum of (observed − expected)² / expected. The bigger the discrepancy, the more association: a bigger chi square and a smaller p value. df = (r − 1)(c − 1). Corresponds to the chi-square distribution, which is positively skewed with a one-tailed region of rejection.
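A short sketch of the calculation for a hypothetical 2 x 2 table of counts:

```python
observed = [[30, 10], [20, 40]]                    # hypothetical 2 x 2 contingency table
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n     # expected = row total x column total / n
        chi_sq += (fo - fe) ** 2 / fe              # sum of (observed - expected)^2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)
print(chi_sq, df)                                  # ~16.67, 1
```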

Confidence interval on regression coefficient

Gives a 95% confidence range of plausible values for how much of the change in the DV could be accounted for by a one-unit change in the IV. SPSS only gives this for the unstandardised regression coefficient; ESCI gives it for the standardised and semi-partial coefficients as well.

Features of 2x2 contingency table

Row and column marginal effects should sum to zero (because they are deviation scores from the grand mean). Interaction effects in the cells should also sum to zero within each column and within each row.

Benefits of orthogonality

Shows the contrasts are independent and non-redundant. Can provide an additive accounting of SS between, as each contrast has a unique component. k − 1 orthogonal contrasts can be created for k levels of a factor. The contrasts need to be orthogonal and the sample sizes equal to be perfectly additive though!!

What are the implications of violating the assumption of sphericity for planned comparisons in a within-subjects ANOVA?

The implication of violating the assumption of sphericity for planned comparisons in a within-subjects ANOVA is that the statistical tests associated with the planned comparisons are likely to have actual false rejection rates higher than the nominal levels set by the defined α for each comparison. This means that some contrast tests will reject the null hypothesis, when it is true, more often than the rate defined by the α value set for each comparison.

Name the five components of the additive decomposition of means in a two-way factorial design:

The five components of the additive decomposition of means in a two-way factorial design are: (i) the cell means, (ii) the grand mean, (iii) the row effects, (iv) the column effects, and (v) the interaction effects. It can be shown that each cell mean can be decomposed in an additive fashion into the grand mean + a row effect + a column effect + an interaction effect.

If the ANOVA table from a two-way ANOVA was compared to ANOVA tables from two separate one-way ANOVAs, what features would be different?

If the two separate one-way ANOVAs were for the marginal means of the factors from the two-way ANOVA then the following differences in general would be observed when these separate results were compared to the two-way ANOVA results: (i) the respective sum of squares between and mean sum of squares between for each factor; (ii) the within sum of squares; (iii) the denominator degrees of freedom; and (iv) the F values and obtained p values for each factor.

Effect size for a contrast within ANOVA

The contrast mean (sum of aj Mj) divided by the square root of MS within. Analogous to Hedges' g. Using the MS within standardiser assumes homogeneity of variance.

F contrast

MS contrast / MS within. If focused, the df in the numerator is always 1; if omnibus, it will be > 1.

Standardised score

The number obtained on a variable when the raw score is transformed into a new value with a predefined mean AND a predefined scale or metric based on the number of raw units equated to each standard deviation. Used when the raw score has no natural scale. Example: an IQ test with a mean of 100 and SD = 15 units.

Raw score on construct measure

The numerical value obtained directly from the original method of measurement in research; also known as the observed score. Noted as Xi, where i is the individual's reference number in the sample (the i-th person). X is a *random variable* because each person's construct value has a likelihood of being observed; we assume a normal distribution of scores (a probability distribution). Some measures have a naturally defined unit of measurement (e.g., RT in ms, age in years, GSR in ohms).

Transformed raw score

The numerical values on a variable after applying a mathematical formula to each participant's raw score.

t observed score

(Observed value − null hypothesised value) / standard error. Aka the observed test statistic. A sample statistic transformed by the standard error into the metric of a theoretical probability distribution. Indicates the degree of deviation from the null hypothesis.

NHST on regression coefficients

Applied to regression coefficients, whether simple or partial. The test statistic is (bj − Bj)/sbj. The df is that of the residuals (n − number of IVs − 1).

What does the term a priori mean in reference to the concept of planned comparisons in ANOVA?

priori is "from the former" or "from before". It means that the set of contrasts, and their weights, are decided upon and specified before the sample means of each group are known to the researcher (i.e., this is directly analogous to specifying a research question that defines how the research design and data analysis are to be undertaken in a particular research context).

Confidence intervals help...

testing multiple null hypotheses

By multiplying Z scores together with the covariance formula ---

you get a correlation coefficient i.e. Pearson's correlation

Confidence intervals around score with standard error

Statistic ± 1.96 × SE (for a 95% interval when the appropriate theoretical distribution is normal)
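For example, a rough 95% interval around a sample mean (the scores are hypothetical):

```python
from statistics import mean, stdev

scores = [12, 15, 11, 14, 13, 16, 12, 15]      # hypothetical sample
m = mean(scores)
se = stdev(scores) / len(scores) ** 0.5        # SE = SD / sqrt(n)

lower, upper = m - 1.96 * se, m + 1.96 * se    # 95% CI: statistic +/- 1.96 x SE
print(round(lower, 2), round(upper, 2))        # ~12.27, 14.73
```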

What is meant by a focused approach (to either or both research questions and statistical tests)?

A *focused approach* is one in which the research question or the statistical analysis immediately leads to an answer that does not require any further specification or analysis to be meaningful and fully interpretable. Within statistical testing, a focused approach is one characterised by only *one degree of freedom* in the numerator of the F test in linear regression and in ANOVA (and 1 df for the chi-square test for contingency tables). Any statistical analysis that results in a t-distributed test statistic can also be classified as a focused approach, because a *t-distributed statistic, when squared, becomes F-distributed* with 1 degree of freedom in the numerator.

Estimand

Aka population parameter i.e. μ and σ²

Multiple regression model properties

Each IV has its own regression coefficient. There is only ever one intercept (a). Assumes each person has a score on every IV.

Robustness of one-way ANOVA

If equal variances are not assumed under Levene's test, then use the F* or W test, which are more robust. They are similar in that they adjust the df and the F value (like the separate-variances t test) and result in the actual false rejection rate being close to the nominal rate when the null hypothesis is true.

T contrast

Square root of f contrast

Multiple correlation coefficient

The square root of R²; ranges between 0 and 1.

What are the three basic procedures often used in calculating sample statistics?

Three commonly used mathematical procedures in calculating statistics are: (i) summing up scores (or some transformation of raw scores) over the complete sample, (ii) squaring (or some other transformation of) scores (iii) averaging the total sum of these values using the number of people in the sample (or some function of sample size).

What is required to translate a focused null hypothesis into a set of planned comparison weights?

To translate a focused null hypothesis into a set of planned comparison weights, we require the null hypothesis to be formulated directly on the relevant groups to be compared such that the hypothesis becomes analogous to an independent sample t-test, with one combination of group means being compared to a second combination of group means (where either combination can be represented by a single group mean if so chosen).

Balance in mixed-design ANOVA

Unbalanced between-subjects groups make the design unbalanced, even though the sample size is the same at both time points.

Standard error in sampling distribution

The variability of the sampling distribution (i.e., the spread) gets smaller as the sample size gets larger: a skinnier bell curve. Standard error: a measure of the average variation in a sampling distribution; equivalent to its standard deviation. SE = standard deviation / square root of n.
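A small simulation illustrating both points (the population mean and SD are made up):

```python
import random
from statistics import mean, stdev

random.seed(1)
population = [random.gauss(100, 15) for _ in range(100_000)]

for n in (10, 100, 1000):
    # Build an approximate sampling distribution of the mean for sample size n.
    means = [mean(random.sample(population, n)) for _ in range(1000)]
    # Its SD (the standard error) should be close to sigma / sqrt(n).
    print(n, round(stdev(means), 2), round(15 / n ** 0.5, 2))
```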

Linear combination of means

The weighted sum of the means of all levels of the factor: sum of aj Mj. The weights need to sum to zero.
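A minimal sketch with hypothetical group means:

```python
means = {"g1": 5.0, "g2": 7.0, "g3": 9.0}             # hypothetical group means
weights = {"g1": 1.0, "g2": -0.5, "g3": -0.5}         # g1 vs the average of g2 and g3

assert abs(sum(weights.values())) < 1e-9              # the weights must sum to zero
contrast = sum(weights[g] * means[g] for g in means)  # sum of aj * Mj
print(contrast)                                       # -3.0
```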

P value function

What the graph would look like if the NHST were repeated across a range of null hypothesised values; it resembles a spiky bell curve.

What is different about a planned comparison compared to an omnibus null hypothesis?

When it is *rejected*, an omnibus null hypothesis provides *no direct evidence* of where the difference in group means may be occurring among the levels of the factor. A planned comparison, however, can provide direct specific evidence about which levels of the factor may be different (depending on the groups chosen to make the planned comparison).

Homogeneity of group variance assumption

Violation is worse with small sample sizes and non-normal distributions. A small group with a small variance makes the test conservative; with a large variance, liberal. The inverse holds for large group sizes!! But the test is robust with equal sample sizes even if variances are heterogeneous. Levene's test will show the violation; you can then adjust the df or run the analysis without pooling variances.

Sum of residuals always equals

Zero

Critical test statistic

analogue to alpha

Observed test statistic

analogue to p value

Covariance

The measure of the degree of concurrent variation in people's scores on two variables that covary together; it measures the strength and direction of the linear association between the two variables. Can be both a PP and a SS.

Independent samples t test

1) Calculate s²p, the pooled variance: the weighted average of the two group variances, s²p = [(n1 − 1)s²1 + (n2 − 1)s²2] / (n1 + n2 − 2). 2) Calculate the standard error of the mean difference: sM1−M2 = square root of [s²p × (n1 + n2)/(n1 × n2)]. 3) Calculate t from the mean difference: t = [(M1 − M2) − (μ1 − μ2)] / sM1−M2.
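The same three steps in a runnable sketch (the group scores are hypothetical):

```python
from statistics import mean, variance

g1 = [5.1, 6.2, 5.8, 7.0, 6.5]
g2 = [4.2, 4.9, 5.3, 4.4, 5.0]
n1, n2 = len(g1), len(g2)

# 1) pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
# 2) standard error of the mean difference
se = (sp2 * (n1 + n2) / (n1 * n2)) ** 0.5
# 3) t = (observed mean difference - hypothesised difference) / SE
t_obs = ((mean(g1) - mean(g2)) - 0) / se
print(t_obs, n1 + n2 - 2)                     # observed t and its df
```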

Assumptions of Pearson's correlation

1) independent observations 2) continuous variables 3) normally distributed 4) linearly related 5) measured without error 6) values not restricted in their range

level of confidence

100 × (1 − α)%; if α = 0.05, then the confidence level is 100 × (1 − 0.05) = 95%

Estimators of odds ratio

There are 3 you can use: (i) unadjusted; (ii) adjusted by adding 0.5 to each cell to deal with undefined cell frequencies; (iii) adjusted by smoothing the frequencies. All are biased but consistent and equally efficient. Variability is smaller with bigger n ➡️ smaller standard error.

Estimator

A *formula*: a mathematical expression we apply to our sample scores to obtain an estimated value for the population parameter. There can be different estimators for the same statistic, judged on bias, efficiency and consistency.

Define a 95% confidence interval that is placed around a sample R² value, which is analogous to the interpretation of the sample R² itself:

A 95% confidence interval on R² means that we can be 95% confident that the population R² value will lie between the lower bound value LB and the upper bound value UB (where LB and UB refer to two values found between 0 and 1, with LB < UB).

What is meant by a design being balanced? What is the opposite of a balanced design, and how would it be identified?

A balanced design is one in which each level of the factor has the same sample size. The opposite of a balanced design is one where the sample size of at least one level is not the same as the other levels. In these circumstances, the design is called unbalanced.

Confidence Interval (Ci)

A confidence interval defines the set of null hypothesised PP values that would not be rejected, for a chosen value of alpha, when a null hypothesis test is applied to a sample statistic. It contains two values that define the interval: (i) a lower bound and (ii) an upper bound.

What is meant by a hypothesis test being too conservative, and what would be the implication of this occurring when using a hypothesis test?

A conservative hypothesis test is one in which the actual rate of false rejections of a true null hypothesis is *smaller* than the nominal maximum rate set by the α value. The implications of a conservative test are that (i) we will reject a true null hypothesis less often than the nominal rate allows, because (ii) the obtained p value is larger on average than the value it would be if the test was robust.

What is the difference between a covariance and a correlation?

A covariance is a measure of the strength and direction of association between scores on two continuous variables that is in an unstandardised metric (i.e., the value of the covariance reflects the scaling of the two sets of scores for the two variables). A correlation is a measure of the strength and direction of association between scores on two continuous variables that is in a standardised metric because it is calculated on Z scores.

Describe briefly how a covariance matrix of observed scores would show compound symmetry. If a covariance matrix meets the requirement of compound symmetry, what does that imply about sphericity?

A covariance matrix exhibiting compound symmetry at a population level would have the same value on all diagonal elements of the matrix (i.e., for the variances) and the same value for all off-diagonal elements in the matrix (i.e., for the covariances). Any covariance matrix exhibiting compound symmetry would also exhibit sphericity of difference scores constructed from the observed scores comprising the compound symmetry matrix. This is because compound symmetry is a sufficient condition for sphericity (it is possible, however, for difference scores to exhibit sphericity without the observed scores exhibiting compound symmetry, because compound symmetry is not a necessary condition for sphericity).

What is meant in general terms by a distribution of scores?

A distribution is a set of different numerical values that can be found, or conceived to occur, in at least four different contexts: (i) a sample distribution of observed scores; (ii) a population distribution of construct values; (iii) a sampling distribution of sample statistics; and (iv) a theoretical distribution

What is meant by a factorial design for investigating group differences?

A factorial design is a logical extension of a one-way design, and the cross-classification of the levels of the two factors is the key feature that enables interaction effects to be investigated. A factorial design involves two (or more) factors within an ANOVA framework, whereby each level of one factor is fully crossed with all levels of the other factor (or factors). A fully crossed design is indicated by the following kind of depiction being possible in a cross-classified table, whereby each cell in the table can be defined by the intersection of a particular level of one factor and a particular level of the other factor.

What do we mean by a parameter in psychological research? What is its role?

A parameter is a quantitative summary characteristic of all people in a population that may define, e.g., the average amount or average variability of some construct, or the strength of relationship between constructs. A population parameter is regarded as having only one possible fixed value (i.e., the population value).

What do we mean by a sample? How is it defined?

A sample is a finite set of people of size n who are selected from a relevant population to investigate the research question, and on whom we take measurements in order to undertake the research.

What is the difference between a simple regression model and a multiple regression model?

A simple linear regression model will contain only one independent variable, whereas a multiple linear regression model will contain two or more independent variables (this makes common sense because "multiple" implies "more than one").

Population parameter

A single numerical value that specifies a particular summary feature of an unknown population

What is the meaning of an odds ratio?

An odds ratio is the ratio of two sets of odds. Each set of odds is the ratio of the probability of one category of a variable being present to the probability of it not occurring, within each of two categories of a second variable. It therefore indicates how much larger (or smaller) the odds of the event are in one category of the second variable relative to the other category.

How is the standardized partial regression coefficient in a multiple regression model interpreted?

A standardised partial regression coefficient is the expected (or, equivalently, predicted) change in the dependent variable for a change of one standard deviation (i.e., an increase, or decrease, of one standard deviation/a Z score unit) on the independent variable, while holding constant (i.e., not changing) scores on all other independent variables in the multiple regression model.

What is meant by an unbiased estimator?

An unbiased estimator is a mathematical function for a sample statistic for which the mean of that statistic's sampling distribution is equal to the population parameter value. A biased estimator is, in contrast, a mathematical function for which the mean of the sampling distribution does not equal the population parameter value. As an aside, it is not necessary to explicitly know the actual value of the population parameter to know if it is unbiased or not; mathematical statisticians have devised ways of calculating the degree of bias from the mathematical expression for the estimator.

Construct

An unobservable attribute we use in theories and research to explain human behaviour, cognition and affect

What kind(s) of analysis immediately comes to mind when we talk of association among variables?

Association represents a symmetric form of relationship between construct measures in that all variables have the same functional form and role in the analysis. The two most common forms of analysis here are correlation (for continuous measures) and contingency tables (for categorical measures).

Estimation

Calculating values for a population parameter based on the data you collected in a sample. Can be either point estimation (produces a single value) or interval/confidence estimation (produces a range of values).

Intercept parameter

Generally equals zero in a standardised regression equation. It is arbitrary and meaningless at zero, and is not generally interpreted.

Unbiased estimator

If the mean of the sampling distribution of the SS equals the PP, then the formula is *unbiased*; anything else is biased. Dividing by n is biased, but dividing by *n − 1* is an unbiased estimator (of the variance).

(a) A sample mean has a value of 35. Is this value an estimate or an estimator of the population mean? (b) What reason(s) did you base your answer on?

(a) It is an estimate because it is a particular value calculated from sample scores. (b) The question gives a specific value for the sample mean, and therefore this value is considered an estimate of the population mean.

Standardising transformation

A mathematical function to change the current value of an individual score OR sample statistic into a new value with a known metric scaling, expressed in standardised units. E.g., calculating an observed test statistic from a sample statistic, expressed as a number of SD units.

Linear contrast weights

Need to sum to zero. Give a weight of 0 to any level you want to exclude. Can be undertaken with CIs for raw or standardised mean differences.

Linear contrasts

The weights need to sum to zero for a meaningful analysis. Reduce the k levels into 2 distinct groups (positively weighted and negatively weighted).

False rejection error rate

The number of times we would reject a true null hypothesis over the long run: Pr(H0 = true | H0 has been rejected). The error rate is unknown but relates to the known probability set by alpha.

2x2 contrast weights

Order-0 for main effects: weights sum to zero, using −1 and +1. Order-1 for interaction effects: positive weights sum to +2 and negative weights to −2, summing to zero overall.

Implied research design of prediction

Regression analysis

Summing squaring and averaging

The same underlying ideas recur in formulas. E.g., the sample mean involves summing the individual scores in the numerator and dividing by the sample size in the denominator to average them.

How can an alternative hypothesis be expressed in two equivalent ways for an independent samples t test?

The alternative hypothesis can be equivalently expressed either as H1: μ1 ≠ μ2 or H1: μ1 - μ2 ≠ 0, where μ1 and μ2 again indicate the population means of Group 1 and Group 2.

What is meant by the cross-product of planned contrast weights and samples means in a planned comparison?

The cross-product of contrast weights and sample means is where the sample mean for a particular level of a factor is multiplied by its corresponding planned comparison weight.

How are the degrees of freedom calculated for a sample correlation?

The degrees of freedom for a sample correlation are n ‒ 2, where n is the sample size.

What are the properties of main effects in relation to the grand mean?

The row and column effects are the values obtained by subtracting the grand mean from the marginal row means and marginal column means respectively.

What is the standardiser used to calculate a standardised mean contrast that is analogous to Hedges' g in a two group design?

The square root of the MSwithin from the omnibus ANOVA table is typically used as a standardiser to transform the raw contrast mean difference into a standardised metric.

How is the statistical significance of R² assessed using a null hypothesis test?

The statistical significance of an observed R2 value is assessed using an F test obtained from an ANOVA table that contains the sum of squares for the regression model and the sum of squares of the residual plus their respective degrees of freedom.

What does the term association mean in the context of a research question?

The term association in the context of a research question means that researchers are proposing that a particular kind of relationship between two constructs exists. If the measurements of the two constructs are both categorical in nature, then the occurrence of particular categories in one construct is contingent with particular categories in the other construct. If the measurements of the two constructs are both continuous in nature, then values on one construct measure tend to vary in a systematic way with values on the second construct (i.e., higher values of one construct are related to either higher values [a positive association] or lower values [a negative association] of the other construct).

If a scatterplot demonstrates a pattern of values in which high scores on the Y-axis tend to co-occur with high scores on the X-axis, what kind of correlation might this indicate?

This is indicative of a positive correlation between the two sets of scores.

If a scatterplot demonstrates a pattern of values in which low scores on the X-axis tend to co-occur with low scores on the Y-axis, what kind of correlation might this indicate?

This is indicative of a positive correlation between the two sets of scores.

What is the most important difference between an independent sample t-test and a dependent samples t-test? (a) The standard error in a dependent samples t-test takes into account the correlation between scores on the two groups. - Why is this correct?

This statement is right because a dependent sample t-test is calculated using individual difference scores, the variability of which is in part determined by the strength of the correlation between the raw scores in the two groups used to calculate the difference scores. The observed scores for the two groups in an independent samples t-test cannot correlate because each person belongs to only one of the two groups, and therefore people in the sample do not have a score for each group.

SS in ANOVA

Total SS = the sum of squared deviations of individual scores from the grand mean. SS between = the sum of squared deviations of group means from the grand mean (weighted by group size). SS within = the sum of squared deviations of individual scores from their group mean.
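A check that the three sums of squares behave as described (hypothetical balanced data):

```python
from statistics import mean

groups = {"a": [3, 4, 5], "b": [6, 7, 8], "c": [9, 10, 11]}   # hypothetical groups
scores = [x for g in groups.values() for x in g]
grand = mean(scores)

ss_total = sum((x - grand) ** 2 for x in scores)
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups.values())

print(ss_total, ss_between, ss_within)   # 60, 54, 6: between + within = total
```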

Conditional probability

The long-run relative frequency of one outcome among a set of all possible relative outcomes, *given* a particular outcome in a second set of possibly related outcomes --> Pr(A|B), which is not equal to Pr(B|A) - the B events are conditioned on/have occurred

Estimator of correlation

Biased but consistent. A confidence interval should capture the actual PP at the same rate as the nominal confidence level. Small sample sizes and non-normality have drastic effects: the estimator is then not consistent or efficient. Kurtosis is a bigger threat to the variance than skew is. If a point estimator is unbiased and consistent, it doesn't follow that the interval estimate will be.

Confidence interval for r squared

Calculates the range of plausible values for R². Is precise because it must lie between zero and one. If the interval captures zero, the R² can be regarded as non-significant. Can indicate extreme bias, such as when the point estimate falls outside the CI interval estimate. R² is a biased but consistent estimator. Ideally you want a big sample size (R² will otherwise be over-inflated), a small number of IVs, and a big R² value. Can take the square root and it will apply to the multiple correlation coefficient.

Orthogonal comparison sets

Can be difference contrasts or Helmert contrasts. Test orthogonality by comparing every pairwise combination in the set. Difference contrasts: each successive level is compared to the average of all levels preceding it. Helmert contrasts are the inverse: they start with the last group and break down from there.

Find the interaction in cells of contingency table

Interaction effect = cell mean − grand mean − column effect − row effect, where the column and row effects are deviation scores from the grand mean.

Types of design of association

Continuous --> correlation. Categorical --> contingency tables and odds ratios.

Semi partial correlation

The correlation between observed scores on the DV and that part of the scores on an IV that isn't accounted for by the other IVs.

Implied research design of association

Correlation or contingency table

Consistency of estimator

If an estimator gets closer to the true PP as sample size increases, then it is consistent (the bias corrects itself); otherwise it is inconsistent. The degree of bias may therefore get smaller as sample size increases, so you can use a biased estimator that is consistent.

Orthogonality

If the contrast weights of 2 planned comparisons have cross-products that sum to zero, then they are orthogonal.
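A one-line check on two hypothetical weight sets:

```python
w1 = [1, -1, 0, 0]               # contrast 1: group 1 vs group 2
w2 = [0.5, 0.5, -0.5, -0.5]      # contrast 2: first pair vs second pair

cross = sum(a * b for a, b in zip(w1, w2))   # sum of the cross-products
print(cross == 0)                            # True: the contrasts are orthogonal
```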

How is a Bonferroni correction applied to an α value, and which kind of α value is it applied to?

If there are j hypothesis tests being undertaken and all are considered to belong to a family, then the Bonferroni correction involves using a per comparison αPC value given by α/j, where α is the value chosen to achieve control of familywise αFW at a particular level (e.g., .05).

What is required to translate a planned comparison into a focused hypothesis test?

If we wish to translate a planned comparison into a focused hypothesis test, then we require a set of linear contrast weights (i) to multiply each of the group sample means by, and (ii) to form the denominator of the equation used to derive the sum of squares for the contrast

In one-way ANOVA, what is the total variation in the dependent variable decomposed into?

In both ANOVA and linear regression, total sum of squares on the dependent variable is decomposed into one part being accounted for by the model (i.e., the regression sum of squares in linear regression and the between-subjects sum of squares in ANOVA) and another part not being accounted for by the model (i.e., the residual sum of squares in linear regression, and the sum of squares within for ANOVA).

Relationship between sample size and confidence interval for odds

A larger sample size will give a narrower (more precise) confidence interval, for both correlations and odds ratios.

Partial regression coefficient

The least squares estimator partials out the overlap caused by correlations with the other IVs to find the unique contribution of each IV.

Within-subjects ANOVA F test

MS occasion / MS error

Type 1 error

Rejecting null hypothesis when actually true

Within-subjects SS breakdown

SS total = SS between-subjects + SS within-subjects. SS within is further broken down into SS occasion and SS individual × occasion (S×O). SS S×O = SS total − SS occasion − SS between. It is called "error" in output.

R squared

SS reg / SS total (where SS total = SS reg + SS resid). Three sources of variability: in observed scores (the sum of squared deviation scores); in predicted scores (the sum of (Ŷ − M)²); and in residuals (the sum of (Y − Ŷ)²). A higher R² accounts for more of the variation in the DV.
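A sketch with a hypothetical simple regression, checking that the three sources add up:

```python
from statistics import mean

x = [1, 2, 3, 4, 5]                  # hypothetical IV scores
y = [2.1, 3.9, 6.2, 7.8, 10.1]       # hypothetical DV scores

mx, my = mean(x), mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx                      # least squares slope and intercept
y_hat = [a + b * xi for xi in x]

ss_total = sum((yi - my) ** 2 for yi in y)
ss_reg = sum((yh - my) ** 2 for yh in y_hat)
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

print(ss_reg / ss_total)                       # R squared
print(round(ss_reg + ss_res - ss_total, 10))   # ~0: the decomposition is additive
```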

SS contrast

SS contrast = (sum of aj Mj)² / sum of (aj²/nj). If the design is balanced, this simplifies to n × (sum of aj Mj)² / sum of aj².
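The general form in a short sketch (the means, weights and sample sizes are hypothetical):

```python
means = [5.0, 7.0, 9.0]          # hypothetical group means
ns = [10, 10, 10]                # group sample sizes
a = [1.0, -0.5, -0.5]            # contrast weights (sum to zero)

psi = sum(ai * mi for ai, mi in zip(a, means))                       # contrast value
ss_contrast = psi ** 2 / sum(ai ** 2 / ni for ai, ni in zip(a, ns))  # general form
print(ss_contrast)               # 60.0
```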

What are the advantages of using a standardised mean difference when undertaking planned comparisons?

The advantage of using a standardised mean contrast value is that the size of the effect can be better understood when the DV has an arbitrary metric with no directly interpretable meaning. The standardised contrast mean is measured in standard deviation units, and can be more meaningfully interpreted using that metric.

Where would main effects be observed in the cross-classified table?

The main effects would be found in the marginal row and column cells of the two-way cross-classified table of means.

What are the observed frequencies in a contingency table?

The observed frequencies in a contingency table are the number of people in the sample who are uniquely defined by one of the row categories and by one of the column categories, with the set of all observed frequencies in the I x J cells in the contingency table summing up to the sample size.

Population

The set of all individuals (N) relevant to the constructs in a research question OR to whom a psychological theory applies (often defined by the constructs themselves: depressed people, women, etc.). Denoted by N.

What are the three statistical assumptions for the one-way between-subjects ANOVA that are also assumptions for the independent two-group t test?

The three assumptions are: (i) independence of observations; (ii) normally distributed scores; and (iii) homogeneity of group variances

What does the total sum of squares of a dependent variable get decomposed into in a linear regression model?

The total sum of squares (SS total) is decomposed into (i) that part being explained, or accounted for, by the regression model (i.e., by the set of independent variables specified in the model), and therefore called the regression sum of squares (SS reg); and (ii) that part not being explained, or accounted for, by the regression model, which is called the residual sum of squares (SS res).

In which of the following conditions will the actual false rejection rate differ from the nominal false rejection rate for a hypothesis test using an independent samples t test? (a) Unequal group sample sizes; normal distribution of scores; and unequal group variances. Why is this statement true?

This statement is true because a hypothesis test is not robust when the actual false rejection rate differs from the nominal rate defined by alpha. The two key conditions for lack of robustness for an independent samples t-test are (i) unequal sample sizes, and (ii) unequal group variances (whether or not the observed scores are normally distributed is not as critical, compared to these two). The latter condition is a violation of the assumption of homogeneity of variance (but there is no assumption about equal sample size in each group).

How is this decomposition different to that undertaken in a one-way between-subjects ANOVA?

This within-subjects decomposition of total sum of squares differs from a between-subjects design in that (i) the within-subjects sum of squares is further decomposed in two sources (between-occasions and subject-by-occasions interaction), and (ii) the between-subjects sum of squares is not considered relevant after it has been excluded from initial consideration by separating it from the proportion of total sum of squares explained by within-subjects variation.

Z score

A type of standardised score. The Z score mean equals 0 and each numerical unit of Z score = 1 SD. Z = (raw score − sample mean, aka the deviation score) / sample standard deviation: Z = (Xi − M)/SD = xi/SD, where xi = (Xi − M). To go from a Z score to another standardised score, we rescale the Z score by the mean and SD of the desired standardised score, e.g., 100 + (z × 15).
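The two transformations in code (the raw scores are hypothetical):

```python
from statistics import mean, stdev

raw = [96, 104, 110, 88, 102]                 # hypothetical raw scores
m, sd = mean(raw), stdev(raw)

z = [(x - m) / sd for x in raw]               # Z = (Xi - M) / SD
iq_style = [100 + zi * 15 for zi in z]        # rescale to mean 100, SD 15
print([round(v, 1) for v in iq_style])
```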

Bob's raw score on a resilience measure is 23; Ali's deviation score on the same measure is -4. (a) Which one of the two people has greater resilience? (b) What reason(s) did you base your answer on? (c) What can you conclude about Ali's level of resilience? (d) The sample mean for scores on the resilience measure that included both Bob and Ali's scores equals 35. What can you now conclude (if at all) after knowing this additional piece of information?

(a) We cannot say, without knowing the value of the sample mean. This is because if the sample mean for resilience was, e.g., 50 (we find out below that it is in fact 35, but I am just using this possible value to explain the issue), then Bob's deviation score would be 23 - 50 = -27 (which is much less than Ali's deviation score). (b) The way in which deviation scores are defined and calculated implies that raw scores and deviation scores cannot be compared directly without knowing the value of the sample mean used to transform the original raw score into a deviation score. (c) All we know is that Ali's score is 4 units below the mean (but it is not clear what these 4 units represent, because the metric of this score has no specified scale). (d) Bob's score was in raw score form, whereas Ali's was in deviation score form. To make a meaningful comparison of their two scores, they need to either be both in raw score form, or both in deviation score form. We can obtain Bob's deviation score using the formula x = X - M; i.e., 23 - 35 = -12; we can also obtain Ali's raw score using the formula X = x + M; i.e., -4 + 35 = 31. Therefore, in raw score form, Bob = 23 and Ali = 31; in deviation score form, Bob = -12 and Ali = -4. In either form, Bob's score is 8 units below that of Ali's, and therefore Ali's score indicates she is exhibiting a higher amount of resilience.

Properties of Pearson correlation

- Gives strength and direction; a natural effect size
- Between -1 and 0 = negative association of the two variables
- Between 0 and +1 = positive association of the two variables
- Values closer to -1 and +1 ==> stronger association
- 0 = no linear association between the two scores
- Is symmetrical in size and direction e.g. +0.7 and -0.7
- Has df of n - 2
- Is a biased but consistent estimator
- Confidence intervals often use the Fisher r-to-z transformation (or ESCI); t obs = (r - 0)/SE, where SE = square root of (1 - r²)/(n - 2)
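The t test on a sample correlation in a short sketch (r and n are made up):

```python
import math

r, n = 0.45, 30                              # hypothetical sample correlation and n
se = math.sqrt((1 - r ** 2) / (n - 2))       # SE of r under H0: rho = 0
t_obs = (r - 0) / se                         # compare to a t distribution with n - 2 df
print(round(t_obs, 2), n - 2)                # ~2.67, 28
```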

Understanding the Sampling Distribution for an assumed null hypothesis

- How consistent is the SS with the sampling distribution? - How close is it to the PP value (the mean)? - If the sample value is sufficiently far away from the null hypothesis value, then we reject it and say it is more likely to have occurred in a sampling distribution from some other PP value. The alpha value (0.05) determines how far away the sample value must be. I.e., a small p value (smaller than 0.05) = the null hypothesised PP value is unlikely to be compatible with our sample data value; it is not a true representative of the PP.

What justifies using mathematical probability distributions to make our inferences, given we calculate a single sample statistic like the mean?

- In many instances, a mathematical probability distribution approximates the sampling distribution of a SS as the number of replication samples gets larger. - Everything relevant to sampling distributions can be obtained from a mathematical probability distribution.

Benefits of Ci

- Gives a range of plausible null hypothesis values - Directly meaningful to infer from - Upper and lower bounds give the best and worst cases - If it captures 0, the result is non-significant - Probability interpretation: if we ran the test 100 times, 95 times the interval would capture the population value.

Criteria of a good research question (6)

1) has a question mark at end (?) 2) lists all relevant constructs 3) indicates relationship among constructs 4) specifies relevant population 5) derived from rationale 6) indicate what type of analysis/design (predictive, group difference, association)

Define group

1) naturally occurring/observable categories e.g male or female - independent 2) defined by researcher e.g. control and treatment groups - independent 3) matched set of measurements e.g time 1 and time 2 - dependent i.e. each person can belong to each group or can be paired in two groups for comparison

Rationale for group differences

1) people may be identified by a grouping variable of two or more categories [which represents a type of group] 2) Evidence for a group difference implies that they represent *different populations* in this construct = the mean of the sampling distribution does not equal zero 3) No evidence for difference suggests that each group comes from *same* population in the construct = the mean of the sampling distribution equals zero

14 principles of psych research

1) research questions 2) populations and samples 3) constructs, measurements and scores 4) types of construct scores 5) parameters and statistics 6) distribution of individuals scores 7) summing, squaring and averaging 8) distributions of sample statistics 9) standardising transformations of statistics 10) theoretical probability distributions 11) estimation, estimators and estimates 12) inferences using null hypothesis tests 13) types and meaning of probability 14) inferences using confidence intervals

Properties of a sampling distribution

1) shape depends on the statistic used to construct it, the sample size, and the population distribution 2) the mean of the sampling distribution of an unbiased statistic equals the population parameter value 3) the SD of a sampling distribution = the standard error of that sampling distribution 4) the sampling distributions of many, but not all, sample statistics converge to a mathematical probability distribution as the number of replications gets large

Ritual of NHST

1) specify the null (H₀) and alternative hypothesis (Hₐ) 2) decide on an alpha level (0.05) 3) obtain the sample statistics: mean and standard error 4) calculate the observed t statistic 5) if using a p value, find the p value in the theoretical probability distribution 6) if p is less than the defined alpha, reject the null and infer that the population mean is unlikely to be/not compatible with μ = 100. Otherwise, compare the t observed value to a critical test statistic value.
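The ritual as a runnable sketch for a one-sample t test (the scores are hypothetical; the critical t is the tabled two-tailed value for df = 7):

```python
from statistics import mean, stdev

scores = [104, 98, 110, 112, 101, 99, 107, 103]      # hypothetical IQ scores
mu0, alpha = 100, 0.05                               # 1) H0: mu = 100; 2) alpha = .05

m, sd, n = mean(scores), stdev(scores), len(scores)  # 3) sample statistics
se = sd / n ** 0.5
t_obs = (m - mu0) / se                               # 4) observed t, df = n - 1

t_crit = 2.365                                       # critical t for df = 7, two-tailed .05
print(round(t_obs, 2), abs(t_obs) > t_crit)          # 6) reject H0 if |t| exceeds t_crit
```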

How can we justify making an inference from a sample to the unknown population using NHST?

1. We make an initial assumption about the value of the population parameter. 2. Then we look for sufficient evidence that this assumption is not true/compatible. 3. We wish to nullify this assumption (i.e. rule out this assumed value based on evidence from our sample statistic value)

How can you increase the correlation between test scores and constructs?

1. increase the relationship between psychological construct and the test 2. remove sources of inconsistency in test administration and interpretation 3. increase the number of items on the test

Different types of construct scores

> raw score (Xi, Yi) > deviation score > standardised and z scores

What is a per comparison false rejection error rate (ERPC)? How is it related to, and distinct from, a per comparison α value (αPC)?

A *per comparison false rejection error rate* is the probability that the null hypothesis is true, given that we have rejected it for one hypothesis test among a set of null hypotheses being undertaken; i.e., Pr(H0 = true | H0 is rejected for comparison j), where j indicates a particular comparison each time that a decision is made following the application of a hypothesis test. If we undertake, for example, 10 hypothesis tests then ERPC is the probability of each null hypothesis being true when considered on its own given that we might reject the null hypothesis for one of the tests. The term "per comparison" can be used generically and applied to a set of multiple hypothesis tests, or it can be used specifically to refer to a set of j planned comparisons in an ANOVA context. The ERPC probability therefore applies to each one of the j hypothesis tests underlying the set of planned comparisons. The per comparison α value (αPC) is the value we assign to the α criterion in each one of the planned comparisons, and αPC therefore sets the maximum proportion of times a ERPC will occur in each one of the j hypothesis tests.

What is the general interpretation of a 95% confidence interval placed around a Hedges' g sample estimate, either from an independent or dependent two-group design?

A 95% confidence interval around a Cohen's d or Hedges' g point estimate can be equivalently interpreted as indicating (i) the range of null hypothesised population standardised mean difference values that cannot be rejected if α is set at .05, or (ii) the range of values within which we would expect the unknown population parameter value of the standardised mean difference to be captured 95% of time over repeated samplings.

If a 95% confidence interval for a correlation does include zero, what can be inferred from this result?

A 95% confidence interval on a sample correlation that does capture the value 0 indicates that 0 is one of the plausible values for the population correlation parameter that is between the lower and upper bound values of the confidence interval. Therefore, we cannot rule out no association (i.e., a correlation of 0) as being a likely result based on our sample data. Furthermore, the fact that the lower and upper bound values must differ in sign (i.e., one is positive and the other is negative) means that we cannot reasonably infer what the direction of the association might be, even if one of the non-zero values within the interval were the actual population parameter value.

What is meant by a Bonferroni correction to an α value?

A Bonferroni correction to an α value is an adjustment being made to α to continue to maintain control of the false rejection error rate at the family level by enforcing a more restrictive per comparison α value.

What is a P value function? What does it tell us about the correspondence between confidence intervals and null hypotheses?

A P value function is a plot of obtained p values (on the Y-axis) against null hypothesised values (on the X-axis) for a null hypothesis test being applied to a sample statistic. If a horizontal line is drawn across the plot at a level of .05 on the Y-axis, then the intersection of that horizontal line with the line defining the P value function delineates two boundaries on the X-axis: the two null hypothesised values at which the obtained p value equals .05. These two values on the X-axis also represent the lower bound and the upper bound of a 95% confidence interval.

Between subjects factors

A between-subjects factor is one in which each person can only belong to *one* of the *k levels* of the *factor* (i.e., only belong to one of the k groups), either because they have been randomly allocated to one group by virtue of the *research design* being used, or because they can only belong to one of the *k naturally occurring*, independently identified groups used to form the factor.

Why might we want to calculate a confidence interval around the estimate of an odds ratio from a 2 x 2 contingency table? Why might we prefer using it to make inferences rather than a χ2 null hypothesis test?

A confidence interval around a sample odds ratio provides all the information that a hypothesis test gives about the unknown population parameter value plus it also gives us additional information that is not contained in the chi-square value of the hypothesis test. If the confidence interval does not include the value 1 (i.e., both the lower and upper bounds of the interval are either both < 1 or both > 1), then we immediately know that the null hypothesis of no association would be rejected. In addition, the confidence interval provides a range of plausible values for the population parameter that would not be rejected by a null hypothesis test (cf. the null hypothesis test only tells us about one plausible value—that of no association, which equals an odds ratio of 1). Finally, the confidence interval tells us how precisely this range of plausible values is being estimated: If the difference between the lower and upper bounds of the interval is relatively small (e.g., less than about 5 units of odds ratio), then it is reasonable to say that the unknown population value of the odds ratio is being estimated quite precisely.

What are the three components needed to construct a confidence interval? What are the resultant two features of a confidence interval?

A confidence interval requires (i) a sample statistic value that has been calculated from the sample data; (ii) a standard error value, also calculated from the sample data (or it may sometimes use a known population standard deviation value); and (iii) a critical test statistic value from an appropriate theoretical probability distribution for the sample statistic that corresponds to the α value inherent in the size of the confidence interval (e.g., if α is set at .05, and the appropriate theoretical probability distribution is normal, then Tcrit = 1.96). The resultant two features of a confidence interval are its lower bound and upper bound values.

What is meant by a consistent estimator?

A consistent estimator is one for which the formula provides an estimate that becomes increasingly closer to the population parameter value as sample size gets increasingly larger. A biased estimator may still be consistent in the sense that the degree of bias gets increasingly smaller as sample size gets increasingly bigger. As an example, an estimator formed by adding the term 1/n to the usual formula for the sample mean is obviously biased, but it is consistent because 1/n becomes increasingly small as n increases (wk 2, q5).

How is a contingency table constructed? What are the typical units of observation in a contingency table?

A contingency table is constructed by cross-classifying each category in one variable with all categories in a second variable. Therefore, any contingency table is made up of I x J cells, where I is the number of categories in the variable making up the rows of the table and J is the number of categories in the variable making up the columns of the table. The values in each cell of the contingency table represent the number of people in the sample who are members of the row category and the column category defining that cell. Because the values are "number of people", the values are referred to equivalently by any one of frequencies, frequency counts or just counts.

What is a contrast mean?

A contrast mean is the resultant value of the additive function formed by contrast weights and group sample means. It equals the value obtained when each sample mean is weighted by its contrast value and all resultant weighted means are added up.

How does a raw score differ from a deviation score?

A deviation score is the value obtained from a raw (i.e., observed) score when some constant value is subtracted from it. The most typical form of a deviation score is when the sample mean is subtracted from each person's raw score. For example, if a person's raw score was +7.1 and the sample mean was 5, then that person's deviation score would be 7.1 - 5 = +2.1. It follows that any deviation score that is positive in value implies that the original raw score is above the sample mean. In contrast, any deviation score that is negative in value corresponds to a raw score that is less than the sample mean. Deviation scores always have a mean of zero when the sample mean was used to create them.

Why can a dichotomous categorical variable be used directly in linear regression whereas a categorical variable containing three or more categories cannot be so used directly?

A dichotomous categorical variable has only two values (typically, e.g., 0 and 1, or 1 and 2, etc). If these dichotomous values (which are arbitrary in how they correspond to each category) are used as scores on an IV directly in a linear regression, then the regression coefficient can still be directly and meaningfully interpreted. This is because a unit change on the IV represents a change from one category to another (e.g., 0 to 1, or 1 to 2) and therefore indicates the expected change in the DV for being in one category compared to the other (holding constant scores on all other IVs). When a categorical variable contains three or more categories and these categories are given arbitrary values (e.g., {0,1,2} or {1,2,3} for a three-category variable), then the resultant regression coefficient value will depend on which categories are assigned to which values—in short, the regression coefficient value is uninterpretable and without meaning because the correspondence between category and value is arbitrary.

How are sample distributions often displayed graphically, and what features of the distribution does each method emphasise?

A distribution of sample scores can be displayed graphically using a histogram or a boxplot (and by a Q-Q plot; but the last will not be explained and compared here). A histogram emphasises the relative frequency of small groupings of scores by the height of the bins used to construct it. It can also indicate symmetry in the frequency of scores around a central point (i.e., bell-shaped look) or skewness in which scores are asymmetrically thinned out to the left (negative skewness) or to the right (positive skewness). A box plot clearly indicates the 25th and 75th percentile values of the distribution of scores (i.e., the lower and upper quartile respectively) by the top and bottom of the box. It also indicates the median value (i.e., the 50th percentile) in the distribution by the line within the box. An equal distance above and below this middle line in the box indicates symmetric distributions (if the distance is unequal, then this indicates skewness in the middle 50% of scores). The two whiskers attached to either end of the box can indicate the spread of scores from the lower and upper quartile to approximately the 1% and 99% percentiles respectively. If they are also of equal length, then this again indicates symmetry in the shape of the distribution.

What is a false rejection error, when can it be made, and how is it related to α?

A false rejection error occurs when a decision is made to reject a null hypothesis when it is actually true. We never know in undertaking a single hypothesis test whether or not we have committed a false rejection error. All that we can ever know is that we would only make this error 5% of the time in the long run if the null hypothesis was true and if α was set to equal .05. It therefore follows that α controls the maximum number of times that a false rejection error can be made in the long run. If the null hypothesis is false, then it is not possible to make a false rejection error. Even in these circumstances α still remains at the value it was set to originally, because its value is independent of the actual truth or falsity of the null hypothesis (this must be so, because its value was specified before even looking at the data and doing the hypothesis test).

Why might we prefer to use a focused approach to investigating group differences rather than an omnibus approach?

A focused approach will provide a final, unambiguous answer to the research question or research hypothesis being posed. An omnibus approach will at best provide an interim answer that will then require some additional investigation or analysis to come to a final answer about where group differences may reside.

What is a joint cell in a contingency table? What do the subscripts in the general notation fo-ij mean? How is the concept of a joint cell related to this kind of notation?

A joint cell in a contingency table is the cell in the table uniquely defined by one of the row categories and by one of the column categories. Therefore, an I x J contingency table will have I x J joint cells. The subscripts ij in the notation fo-ij mean the joint cell defined by i and j, where i = 1, 2,...,I and j = 1, 2,...,J .

What is meant by a hypothesis test being too liberal, and what would be the implication of this occurring when using a hypothesis test?

A liberal hypothesis test is one in which the actual rate of false rejections of a true null hypothesis is *larger* than the nominal maximum rate set by the α value. The implications of a liberal test are that (i) we will reject a true null hypothesis more often than we should, because (ii) the obtained p value is smaller on average than the value it would actually be if the test was robust.

What do we mean by the term a linear contrast (or, equivalently, a linear combination)?

A linear contrast is a set of weights (i.e., hypothesised values that are positive or negative in sign) that combine the means in a linear fashion (i.e., by each group mean being multiplied by its corresponding weight and then all weighted means being added together). Linear contrasts have the property that the sum of the complete set of weights equals 0. In notation form, each mean Mj is multiplied by its corresponding comparison weight aj to form the linear combination of means depicted as: ∑ajMj = a1M1 + a2M2 + ...

What is meant by a main effect in a two-way factorial design?

A main effect in a two-way factorial design corresponds to the effects found in the cross-classified table when the grand mean is subtracted from each of the marginal row means and the marginal column means. Main effects for the rows always sum to zero, and main effects for the columns always sum to zero.

What is meant by a planned comparison of group means, and what is used to construct it?

A planned comparison of group means is reflected by an a priori-focused research question or focused research hypothesis that specifies a particular linear combination of group means making up the comparison. The main feature used to undertake a planned comparison is a set of a priori-defined comparison weights that determine which of the means of the k groups are going to be compared and in what way. Each group mean has its own comparison weight in the planned comparison.

What is a region of rejection in a theoretical probability distribution? How is it related to the alpha criterion and to the obtained p value? What is its role in null hypothesis testing?

A region of rejection in a theoretical probability distribution corresponds to values of the distribution (e.g., Z statistic values for standard normal distribution) in the extremes of the distribution whose probability value equals the specified value for α. That is, the region of rejection is defined by a criterion value (e.g. a Z statistic of ±1.96 when α = .05) that defines the boundary cut point beyond which any observed test statistic value located in that region would result in a null hypothesis being rejected.

What is meant by a robust hypothesis test?

A robust hypothesis test is one in which the rate of false rejection of a true null hypothesis is unaffected by violation of the statistical assumptions of the test, and therefore the actual false rejection rate will remain very close to the upper bound defined by α.

How does a sample differ from a population in psychological research?

A sample differs from a population in that the former is a subset of people selected from the latter. It is feasible that many different samples can be selected from the same population.

How do a sample distribution and population distribution differ? How are they similar?

A sample distribution is the set of values obtained from measuring people on a construct in a single sample drawn from a population; sample statistics are summary characteristics of a sample distribution. A population distribution is the set of values for all people in a population on a construct; population parameters are summary characteristics of a population distribution.

How is a sample related to a population?

A sample is related to a population by the fact that each sample is considered representative of the population in some way due to the way it is selected. We use the sample as a basis for making inferences about relationships among constructs at the population level.

How does a statistic differ from a parameter?

A sample statistic differs from a population parameter in the following three ways: (a) There is only one possible value of a parameter, whereas there can be many different values of a statistic. (b) The value of a population parameter is invariably unknown, whereas the value of a sample statistic is always known. (c) The value of a sample statistic will almost certainly be different to the population parameter.

How is a statistic related to a parameter?

A sample statistic is related to a population parameter by (i) the value of the former being an estimate of value of the latter, and (ii) a statistical inference being made about the value of an unknown population parameter from the known value of a single sample statistic.

Briefly explain what a sampling distribution is. In doing so, explain what it comprises, and what is its role in understanding hypothesis testing:

A sampling distribution represents the relative frequency for the set of values that may occur for a sample statistic over a large number of independent random draws of size n from the population. Its role in hypothesis testing is primarily twofold: (i) the standard deviation of the sampling distribution corresponds to the standard error of the sample statistic, which is used to transform the raw sample statistic into a value that is distributed according to some theoretical probability distribution; and (ii) the sampling distribution provides a justification for using a theoretical probability distribution as a basis for undertaking null hypothesis testing.

Explain briefly in words how a semipartial correlation is calculated when the number of IVs in a regression analysis is three, including which variables are involved, how are they involved, and what scores are used in calculating the correlation:

A semipartial correlation is the Pearson correlation between observed scores on the dependent variable in a regression model and the residual scores of a focal independent variable after the overlapping associations of all remaining IVs (here, the other two IVs) have been partialled out from the focal independent variable.
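
A minimal numpy sketch of this calculation (the function name and variables are hypothetical; x_focal is the focal IV and x_others is a list holding the two remaining IV arrays):

    import numpy as np

    def semipartial_r(y, x_focal, x_others):
        # Regress the focal IV on the remaining IVs by least squares
        X = np.column_stack([np.ones(len(x_focal))] + list(x_others))
        beta, *_ = np.linalg.lstsq(X, x_focal, rcond=None)
        resid = x_focal - X @ beta  # the part of the focal IV not shared with the other IVs
        # The semipartial r is the Pearson correlation of the DV with these residuals
        return np.corrcoef(y, resid)[0, 1]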

Distribution

A set of different numerical values on a random variable that has particular observable characteristics, i.e., shape, height, spread, frequency of occurrence.

Sample

A set of individuals (n) who are selected by some sampling scheme AND assumed to be representative of the population. Denoted by n. Selected by a random sampling scheme (simple, clustered, or stratified) or a non-random one (e.g., convenience sampling or volunteer samples like the REP).

Sample statistic

A single numerical value for a particular summary feature that is calculated on scores from one sample. Typically an *estimate* of the population parameter. We assume there is only one population parameter value, but many sample statistic values are possible because each may be obtained from a separate sample drawn from the same population - therefore the sample statistic is a random variable. In practice, we usually have only one sample statistic available.

What is the standard error, and what is the effect of using it as a standardiser in the calculation of an observed test statistic?

A standard error is a measure of the variation in a sample statistic. It corresponds to the standard deviation of the sampling distribution for that sample statistic. In null hypothesis testing, the standard error is used to transform the difference between the value of the sample statistic and the null hypothesised value into an observed test statistic that is distributed as a theoretical probability distribution. This standard error is used as the denominator in this transformation, much like the standard deviation in a Z score transformation.

In order to obtain an accurate standardised mean contrast estimate and/or an accurate confidence interval, what assumption(s) need to be met in undertaking the analysis?

A standardised mean contrast requires (i) the mean contrast value, and (ii) the square root of the error mean sum of squares. A prerequisite for the calculation of (i) is that the contrast weights have been chosen such that all positive weights sum to +1 and all negative weights sum to -1. If this has been specified, and values for (i) and (ii) have been obtained, then the standardised mean contrast is given by the contrast mean divided by the square root of the error mean sum of squares. This calculation rests on the homogeneity of variance assumption not being violated.

What are two standardised effect size measures that can be derived for the mean difference in the independent samples t test, and how do these two measures differ?

A standardised mean difference in an independent two-group design can be calculated by either *Cohen's d* or *Hedges' g*. These two measures differ by the former using the total sample size N = n1 + n2 in the denominator of the pooled variance calculation, whereas the latter uses n1 + n2 - 2 in the denominator of the pooled variance. In large samples, the difference between the two forms is negligible.
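
A small numpy sketch of the distinction, assuming the denominators described above (x1 and x2 are hypothetical arrays of scores for the two groups):

    import numpy as np

    def d_and_g(x1, x2):
        n1, n2 = len(x1), len(x2)
        diff = np.mean(x1) - np.mean(x2)
        ss = np.sum((x1 - np.mean(x1))**2) + np.sum((x2 - np.mean(x2))**2)
        d = diff / np.sqrt(ss / (n1 + n2))      # Cohen's d: pooled variance uses N
        g = diff / np.sqrt(ss / (n1 + n2 - 2))  # Hedges' g: pooled variance uses N - 2
        return d, g

With large n1 and n2 the two denominators converge, which is why the difference between the two measures is negligible in large samples.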

What is meant by the term a standardiser?

A standardiser is a measure of average variation, which is expressed in units *analogous* to a standard deviation, that is used to *transform* a raw difference between means into a standardised difference between means; the transformation is directly analogous to the Z transformation used to calculate standardised scores from raw scores.

For two different values of x in the function y = mx + b, what will the corresponding values of y form in a two-dimensional scatterplot?

A straight line, because the two different values of x will result in two different values of y (given that the slope value m and intercept b are the same for both x values).

Mean deviation score

A transformed score that involves subtracting the sample mean from each participant's raw score. Signified by xi or yi, i.e., raw score - mean (xi = Xi - M). The mean of all deviation scores always equals zero. *Negative* deviation scores correspond to raw scores *below* the sample mean (Mx), and *positive* scores are *above* Mx.

What should any well-formulated research question contain?

A well-formulated research question should contain the following features: (a) It should be grammatically expressed as a question, with a question mark "?" at the end. (b) It should list all relevant constructs to be investigated. (c) It should indicate the population relevant for these constructs and the research context. (d) It should specify the general form of the relationship among constructs to be investigated by use of a term such as "associate", "predict", or "different".

List two advantages in using a confidence interval on R² instead of using a hypothesis test:

Advantages of using a confidence interval on R² instead of a hypothesis test include: (i) the CI can indicate whether a null hypothesis of population R² equalling zero would be rejected, namely if the lower bound of the interval is greater than 0; (ii) the width of the interval provides an immediately interpretable indication of the precision of the interval estimation of population R²; (iii) the lower bound being very close to, but not strictly equal to, 0 indicates that the regression model may be explaining a trivial amount of variation in the dependent variable; and (iv) the extent to which the observed R² value is close to the upper bound of the interval is indicative of the observed R² point estimate having significant upward bias in its estimation of the unknown population parameter value.

Family wise error rate

Aka the Bonferroni inequality!! The family wise error rate is less than or equal to the sum of the per comparison error rates across the j comparisons; therefore the alpha per comparison is set at .05/j. This tells us the upper bound of the family wise error rate in the worst circumstance, i.e., when all j null hypotheses are true.

Alpha criterion

Also called the significance level. Decided 'a priori'; it is the probability of rejecting the null hypothesis given the null hypothesis is true, i.e., α = Pr(rejecting H₀ | H₀ = T).

How do an estimator and an estimate differ?

An estimator is a mathematical formula or procedure used to obtain a sample statistic from raw data. The numerical value of a sample statistic is an estimate (of the unknown population parameter value).

What are different, but equivalent, names for referring to an independent samples t test that have exactly the same meaning?

An independent samples t test can equivalently be referred to as either an independent groups t test or a between-subjects two-group test (the last of these is not commonly used, however).

Why can an independent two-group design be analysed as a linear regression model, and what features in the regression model are most relevant to the analysis of group differences?

An independent two-group design can be analysed as a regression model because the two possible scores on the IV in the regression model (i.e., coding that indicates which group a person belongs to) can be used to predict people's scores on the DV, such that the difference between predicted scores on the DV is interpreted as the difference between groups. If the coding is 0/1 on the IV, then the regression coefficient for the IV just represents the expected increase (i.e., difference) on the DV for a unit increase on the IV (i.e., going from a score of 0 to a score of 1), which is the usual interpretation we give to a regression coefficient. Moreover, the regression intercept indicates the expected score on the DV for people who have a score of 0 on the IV (which corresponds to the people who belong to the group coded 0).
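
A brief simulated illustration of this equivalence (all values here are hypothetical and chosen only for the demonstration):

    import numpy as np

    rng = np.random.default_rng(1)
    group = np.repeat([0, 1], 20)  # 0/1 coding for the two groups
    y = np.where(group == 0, 50.0, 55.0) + rng.normal(0, 5, 40)

    X = np.column_stack([np.ones(40), group])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    # b[0] is (approximately) the mean of the group coded 0;
    # b[1] is the mean difference (group 1 - group 0)
    print(b[0], y[group == 0].mean())
    print(b[1], y[group == 1].mean() - y[group == 0].mean())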

Where would interaction effects be observed in the cross-classified table?

An interaction effect is demonstrated by finding non-zero values in the cells of the cross-classified table of means after removing the grand mean and the marginal main effects.

What are the three components (in general terms) making up an observed test statistic used in assessing how many units a sample statistic is from an assumed population parameter value?

An observed test statistic in null hypothesis testing generally comprises: (i) the value of a sample statistic, (ii) a null hypothesised population parameter value, and (iii) a standard error for the sample statistic.

What is the range of possible values for an odds ratio? What does a value of 1 for an odds ratio signify?

An odds ratio can range in value from 0 to +∞, and it can never be negative (because all frequency counts by definition cannot be negative themselves). A population or sample odds ratio of 1 indicates independence between the row and column variables in a contingency table.

Why might we want to calculate an odds ratio from the sample cell frequencies in a 2 x 2 contingency table? Why might we prefer using it rather than a χ2 null hypothesis test?

An odds ratio provides a direct measure of the strength of the association between the two categorical variables. The chi-square value for the null hypothesis test does not provide a direct measure of strength of association (i.e., the size of the chi-square value cannot be used as an indicator of how strongly associated the two variables are, because the chi-square value can be affected by sample size).

What is meant by an omnibus approach (to either or both research questions and statistical tests)?

An omnibus approach is one in which either the research question or the statistical analysis is specified in such a way that each one only leads to interim answers about group differences that require further research questions or hypothesis testing to arrive at an unambiguous answer. Within statistical testing, an omnibus approach is one characterised by *more than one degree of freedom in the numerator* of the F test in linear regression and in ANOVA (and by more than 1 df in the chi-square test for contingency tables).

What is meant by an orthogonal contrast? How would we identify if two contrasts were orthogonal?

An orthogonal contrast is one in which the cross-products of its weights with the weights of a second contrast sum to zero (which implies that the second contrast is also orthogonal to the first contrast being considered here). A contrast can only be defined as orthogonal in relation to one other contrast (or, more specifically, the weights of one contrast can only be defined as orthogonal to the weights of a second contrast). It follows from this that one set of weights can be orthogonal to a second set of weights (if the cross-products of their respective weights sum to zero) and non-orthogonal to a third set of weights (if the cross-products of their respective weights do not sum to zero).

How is the standardized regression coefficient in a simple regression model interpreted?

A standardised regression coefficient is the expected (or, equivalently, predicted) change in the dependent variable for a change of one standard deviation (i.e., an increase, or decrease, of one standard deviation/a Z score unit) on the independent variable.

How is the unstandardized partial regression coefficient in a multiple regression model interpreted?

An unstandardised partial regression coefficient is the expected (or, equivalently, predicted) change in the dependent variable for a unit change (i.e., an increase, or decrease, of one unit) on the independent variable, while holding constant (i.e., not changing) scores on all other independent variables in the multiple regression model.

How is the unstandardized regression coefficient in a simple regression model interpreted?

An unstandardised regression coefficient is the expected (or, equivalently, predicted) change in the dependent variable for a unit change (i.e., an increase, or decrease, of one unit) on the independent variable.

Distribution of sample statistics

Any sample statistic is a random variable because different samples from the same population will yield different values. The distribution is obtained by drawing independent random samples from a population and calculating the sample statistic for each, e.g., the mean of each sample can be plotted in a distribution. As the number of samples increases, the distribution becomes increasingly normal; its shape depends on the statistic measured and on the underlying distribution of the population. If the size of each sample is larger, then the *variance* of the sampling distribution is *narrower*. The mean of the sampling distribution is an unbiased statistic corresponding to the population parameter, and its standard deviation is called the standard error.
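
A short simulation sketch of these points (the population values 100 and 15 are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 30, 10_000
    means = rng.normal(100, 15, size=(reps, n)).mean(axis=1)  # one mean per sample
    # The SD of the sampling distribution approximates the standard error, 15/sqrt(30)
    print(means.std(ddof=1), 15 / np.sqrt(n))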

Linear regression

Asymmetric. Y = mx + b. We want the line drawn such that the residuals (the distances between the observed data points and the imposed line) are as small as possible. Residuals change from data point to data point, and the same data point will have a different distance depending on how the line is drawn. This is resolved by the least squares estimator.
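
A minimal sketch of a least squares fit (the data values are hypothetical):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
    m, b = np.polyfit(x, y, deg=1)     # least squares slope and intercept
    residuals = y - (m * x + b)        # vertical distances from points to the line
    print(m, b, (residuals**2).sum())  # least squares minimises this sum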

Why would you expect the partial regression coefficients for a set of IVs in a multiple regression analysis to differ from the value of the regression coefficients obtained when each IV is used separately in a set of simple regression analyses?

Because in general IVs correlate with each other, as well as with the DV. The partial regression coefficient in a multiple regression represents the effect of each IV where the overlap with other IVs in predicting the DV has been removed (or partialled out) through using the least squares estimator. When separate simple regression models are used, in contrast, the overlap in effects with other IVs cannot be partialled out because no other IV is in the regression model.

Why is the numerator degrees of freedom in an F test for a planned comparison always equal to this value?

Because the linear contrast in the planned comparison essentially ends up defining a two "group" mean difference - i.e., the linear combination of group means with positive comparison weights is compared to the linear combination of group means with negative comparison weights. By definition, a two group comparison has only one degree of freedom in the numerator of the F test (or it can be equivalently expressed as a t test), because there can only be one fundamentally possible way in which a difference can occur between two groups (i.e., the difference defined by (Group 1 - Group 2) is not fundamentally different from the difference defined by (Group 2 - Group 1), apart from them being opposite in sign).

Why can linear regression be regarded as an approach to analysing variation in the DV?

Because the result from linear regression is a decomposition of total variation in the DV into that part being accounted for by the set of IVs (the sum of squares for the regression) and that part *not* accounted for by the set of IVs (i.e., the sum of squared residuals, which corresponds to the within-groups sum of squares). If the IVs are dummy variables created to represent a strictly categorical variable being used as a predictor, then the sum of squares for the regression indicates the variation in the DV that is being explained by differences among the categories.

What is meant by an efficient estimator?

Because the same estimator will give a different estimated value of the population parameter for different samples drawn from the same population, the set of values of a sample statistic obtained from a large number of repeated samplings will form a distribution. One characteristic of such a distribution is the degree of variability in the estimates; i.e., the average amount of deviation in the distribution from the true population parameter. The smaller the amount of average variability, the more efficient the estimator is. A more efficient estimator implies a smaller standard error for the sample statistic (because the standard error measures the variability of a sampling distribution - i.e., the standard error is the square root of the variance of a sampling distribution).

What is the difference between a Z score and a Z statistic?

Both Z scores and Z statistics have a mean of 0 and a standard deviation of 1. Both indicate how many standard deviation units any score is located from the mean, and therefore can convey information about the location of any value relative to the mean. Z statistics differ from Z scores in that Z statistics always follow a standard normal distribution, whereas Z scores may not necessarily be normally distributed. Z statistics can therefore be viewed as Z scores that are normally distributed.

What do both the Brown-Forsyth and Welch tests achieve that the usual F test in ANOVA does not achieve in the same circumstances referred to in Q3?

Both the Brown-Forsyth (F*) and Welch (W) tests make a correction to both the observed F test statistic value and to its denominator degrees of freedom such that the test statistic values for F* and W result in actual false rejection rates of a true omnibus null hypothesis that are much closer to the nominal value set by α when homogeneity of variance does not hold and the design is unbalanced.

Orthogonal polynomial planned contrasts

Breaks down change over occasions into its constituent shapes defined by the order of the polynomial, which depends on the number of levels in the factor. Can be linear, quadratic, cubic, quartic. A full set accounts for the between-occasions SS (e.g., 4 bends suggest a 4th-order trend). Assumes homogeneity is met because the SS error is used. The contrast sums of squares add up to the between-occasions SS, and the contrast error sums of squares add up to the within error SS. Contrast values can be positive or negative as a function of how the weights are ordered relative to the size of the means.

How do we decide when a SS is "far enough away" from an assumed population value to infer that the null hypothesis assumption is unlikely to be correct?

By declaring regions in the tails of a theoretical probability distribution that corresponds to a sampling distribution. If the observed test statistic falls within such a tail region, the sample statistic is sufficiently far away to make the null hypothesised value unlikely, and the population value is more likely to be some other value.

Expression of Relative Frequency over 'the long run'

By repeating a process, whose outcome is uncertain, over and over again, the relative frequency of each outcome comes to reflect the probability of that outcome - this underlies *frequentist* statistics (most commonly used).

Robust confidence interval

A CI on a sample statistic is robust if the actual coverage matches the nominal coverage; i.e., if the interval is set at 95%, the actual correct coverage rate (the proportion of times the interval contains the actual population parameter value) will be 95% over repeated constructions of the interval using a large number of independent samples drawn from the population. If the actual coverage is less than 95%, the interval is too liberal; if it is more than 95%, the interval is too conservative.

What is the metric being used in effect size measures like Hedges' g or Dunlap's d or Bonett's delta?

Cohen's d, Hedges' g, Dunlap's d, and Bonett's delta are all measures of a difference in group means that is scaled in standard deviation units. Dunlap's d and Bonett's delta are specifically used for matched groups; Hedges' g (and Cohen's d) are used for individual change differences.

Population distribution of individual scores

Complete set of scores on a construct measured for the entire population. The main characteristics of a population distribution are summarised by population parameter values. We assume population scores follow a probability distribution, and typically that individual variability is quantitative, i.e., people differ in degree. Variability will either be *continuous* (quantitative), modelled by the normal distribution bell curve, OR *categorical* (differing in kind or type), assessed via the chi-square (positively skewed) statistic: (observed - expected)² / expected.

Continuous examples and categorical examples

Continuous --> age, summed/overall scores on questionnaires, IQ scores - no distinction between integers and real numbers; may not be strictly interval in scaling. Categorical --> religions, football teams, gender, grades - no distinction between nominal or ordinal scaling; numbers assigned to categories are arbitrary in meaning.

What are contrast weights used for in planned comparisons?

Contrast weights are positive, negative, or zero values that are multiplied into the sample means of groups (with each group mean having its own contrast weight). The set of contrast weights and sample means form an additive function that effectively compares the means of groups with positive contrast weights to the means of the groups with negative contrast weights. Any group mean with a zero contrast weight is effectively removed from the additive function.
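
A tiny numeric sketch of a contrast value (the means and weights are hypothetical; the weights compare the average of groups 1 and 2 with group 3):

    import numpy as np

    means = np.array([10.0, 12.0, 15.0])   # hypothetical group means, k = 3
    weights = np.array([0.5, 0.5, -1.0])   # positive weights sum to +1, negative to -1
    assert weights.sum() == 0              # defining property of a linear contrast
    psi = np.sum(weights * means)          # (10 + 12)/2 - 15 = -4
    print(psi)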

What is a Cook's d statistic useful for? How many values of a Cook's d statistic will a linear regression model typically contain?

Cook's d statistic is used to identify cases in your sample data which are aberrant in their values on the DV and IVs relative to the scores of other members of the sample, and which may substantially change the results of the regression model (i.e., the R² value or, more particularly, one or more of the partial regression coefficient values) as a consequence of the very different pattern of scores compared to most other cases in the sample data. This effect is likely to be more noticeable in small sample sizes than in large ones. There will be as many Cook's d values as there are cases in the sample data (i.e., number of Cook's d values = n), with each person in the sample having one Cook's d value calculated on them.

How are a covariance, a correlation, and a variance similar to each other?

Covariance, correlation, and variance are all similar to each other in that (i) they are calculated using the summation of products of deviation scores of one form or other, and (ii) they are an average measure of covariability (i.e., covariance and correlation) or variability (i.e., variance).

Odds and Odds ratio calculation

Cross multiplication of the ratios from the respective odds: i.e. (9/1)/(1/4) = (9x4)/(1x1) = 36. The odds of Black Caviar winning the race are 36 times the odds of Hoof Hearted winning (horses).

How does a deviation score differ from a Z score?

Deviation scores have a mean of zero, but their standard deviation will equal whatever the standard deviation of the sample raw scores is (i.e., creating deviation scores does not change the average amount of variability in the scores themselves). Z scores, on the other hand, can be thought of as standardised deviation scores, because Z scores also have a mean of zero but they additionally have a standard deviation equal to one. This is because Z scores are calculated as deviation scores that have each been divided by the standard deviation. Deviation scores tell us whether a value is above or below the mean, but they do not tell us how far any value is from the mean. Z scores, in comparison, tell us (i) whether a person's value is above or below the mean and (ii) how far that person's score is from the mean (because Z scores are measured in standard deviation units). Both deviation scores and Z scores can be regarded as ways of applying a transformation to raw scores.

What are the three fundamental ways that differences between three or more groups can be investigated in one-way ANOVA?

Differences between means on one-way ANOVA can be investigated by: (i) undertaking an *omnibus null hypothesis* for all populations means considered together; OR (ii) proposing a priori defined, *focused null hypotheses* concerning specifically *chosen subsets of levels of a factor* that accord with the research rationale; OR (iii) investigating *differences between means* of all possible non-duplicated pairs of groups in a *post hoc* fashion.

Sample distribution

The different numerical values of the observed scores in a sample. Its main characteristics are summarised by sample statistics (e.g., shape, height, spread, frequency of occurrence). Often displayed graphically (histogram, Q-Q plot). Will not contain all possibly relevant individuals. Scores from different samples will have different distributions.

Theoretical probability distributions

Distribution of values for a sample statistic defined by a mathematical function involving one or more parameters. Indicates the probability of observing a value for a sample statistic given specific parameter values. Sampling distributions in standardised form = theoretical probability distributions (given an assumed population parameter, the location of this sample statistic relative to all possible sample statistics). We use the TPD as a proxy so we don't need to draw heaps of samples and form an empirical distribution. The process of standardising a sample statistic is very similar to Z scores - subtract the assumed population parameter from the sample statistic to centre it at zero, then divide by the standard error.

How are dummy variables formed from a categorical variable so that the former can be used in linear regression? Write out the required stages based on the explanation provided in the lecture:

Dummy variables are formed by: (i) identifying the number of discrete categories in the categorical variable (e.g., let's use 4 categories to make the explanation more clear); (ii) choosing (arbitrarily) one category out of the four to be the reference category for the set of dummy variables; (iii) defining one dummy variable with values of 0 and 1 for each of the three remaining categories left over after the reference category has been defined; (iv) assigning the value of 1 in each dummy variable in turn to one of the three categories remaining after the reference category has been excluded, otherwise the value of 0 is assigned. Each case with a value of 1 on the dummy variables therefore identifies a different category from the original categorical variable. In general, if a categorical variable has k categories, then k - 1 dummy variables need to be created.
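
A minimal sketch of these stages (the category labels are hypothetical, with "A" chosen arbitrarily as the reference category):

    import numpy as np

    categories = np.array(["A", "B", "C", "D", "B", "A"])  # k = 4 categories
    levels = ["B", "C", "D"]                               # k - 1 = 3 dummy variables
    dummies = np.column_stack([(categories == lev).astype(int) for lev in levels])
    print(dummies)  # all-zero rows identify the reference category "A"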

What are the properties of the weights used in any planned comparison?

Each additive function of group means needs to be translated into two sets of linear contrast weights, one set being positive in value and the other set being negative in value, such that the sum of the positive set of contrast weights equals the absolute sum of the negative weights (and therefore the sum of all the linear contrast weights is zero). The set of k weights [a1 a2 a3 ... ak] in a planned comparison are linear contrasts that sum to zero. This property enables us to develop a meaningful way to translate research questions into a statistical analysis.

Epsilon values

Epsilon equals 1 if sphericity is being met. The lowest possible value is 1/(k-1). Greenhouse-Geisser is the most conservative estimate - use this if in doubt. Epsilon adjusts the df for the F test so the test will be more robust if sphericity is not assumed - e.g., ε̂ x (k-1) for df 1 and ε̂ x (n-1)(k-1) for df 2.

Test-retest reliability

Estimate reliability by obtaining scores from the original test and a retest and working out the correlation between them. Problems = carryover effects (e.g., participants googled test answers between test and retest, or got bored during the second test and slacked off); participants fail to return for the retest. Works well for stable traits, not transitory traits.

Parallel form reliability

Estimate reliability by obtaining scores from two parallel forms of the test and working out the correlation between them. Parallel forms must measure the same set of true scores and have equal variance. Practical (no need to worry about memory for specific items). Problems = hard to know whether the forms are really parallel, and it might not fix carryover effects.

Split-half reliability

Estimate reliability by splitting your test into two parallel subtests and working out the correlation between the two subtests. Problems = the assumption that the two subtests are parallel is questionable, and the fact that each subtest is half the size of the full test will tend to deflate the reliability estimate.

Assumptions of Classical Test Theory

Expected error value is zero. Errors do not correlate with each other or with true scores. The expected value of the test is equal to the true score.

How do expected frequencies differ from observed frequencies?

Expected frequencies differ from observed frequencies in that the latter count the number of people who are measured in each of the cells in a contingency table defined by joint categories from the two variables making up the table. The former, in contrast, are calculated from the observed marginal row and column frequencies under the assumption of independence between the row and column variables.

Pearson's correlation formula

The extent to which scores vary together, standardised (as opposed to the covariance, which uses mean deviation scores): r = Sum[(Zx)(Zy)] / (n - 1)
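
A direct numpy rendering of this formula (x and y are hypothetical score arrays):

    import numpy as np

    def pearson_r(x, y):
        zx = (x - x.mean()) / x.std(ddof=1)    # Z scores using the sample SD (n - 1)
        zy = (y - y.mean()) / y.std(ddof=1)
        return np.sum(zx * zy) / (len(x) - 1)  # Sum[(Zx)(Zy)] / (n - 1)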

Planned comparison

A focused statement about a difference, defined a priori. Must be carried out in SPSS with contrast weights. Beneficial because it can focus on one or more pre-defined sets of group means. Takes into account the design of the study and the hypothesis through the distribution of weights.

Omnibus F test

A global hypothesis that there is some difference without specifying where; weak and uninformative. Equivalent to the F test in multiple linear regression. Tested by Fobs = MS between / MS within.

What are three ways in which group differences can be defined and identified?

Group differences can be specified in terms of (i) *between-subjects differences*, whereby people are either allocated or belong to only *one* of the groups being defined by the factor; (ii) *within-subjects differences*, whereby people are either being *repeatedly measured* at different time points or individuals are being paired with one or more other individuals according to some shared characteristic or feature; or (iii) mixed = both between-subjects and within-subjects differences being investigated in the same research design, whereby group membership is according to the methods identified in (i) and (ii) previously.

How is heteroscedasticity identified in a linear regression analysis?

Heteroscedasticity can be identified in linear regression by a scatterplot of studentised deleted residuals (Y-axis) against standardised predicted values of the DV (X-axis). If the range of variability of the residuals in a vertical direction changes as we go from lower predicted values to higher predicted values (i.e., by examining the variability of residuals going from left to right along the X axis), then this indicates the presence of heteroscedasticity. We need to be concerned about the presence of heteroscedasticity only when it is systematic and very obvious from the scatterplot. Most real data will either (i) display minor levels of it, or (ii) have insufficient sample size to clearly identify its presence, especially at the extremes of predicted values on the far left and far right of the X axis.

What is meant by the term "heteroscedasticity"?

Heteroscedasticity means that the residuals from the regression model do not have the same variance for different predicted values of the dependent variable. The strong and obvious presence of heteroscedasticity represents a threat to making valid inferences from a regression model.

Properties of odds ratios

If the odds ratio is 1, the two variables are independent with no relationship (a CI covering this value cannot reject the null hypothesis). The further from 1, the stronger the association. The odds ratio can go to zero but not below; if any cell frequency is 0, it is undefined. The relationship is symmetric. A CI can be calculated in ESCI or SPSS. If the odds ratio is less than 1, we can take the reciprocal by inverting the rows, or explain the effect in terms of the odds being lowered.

If the confidence interval in Q21 directly above was calculated on a larger sample, and the sample statistic was the same value, what one feature of the new interval do we immediately know will result without any further information being required? Why will that occur? [HINT: THIS QUESTION ALSO RELIES ON HAVING DONE THE LECTURE PROBLEM SET FOR THIS WEEK'S LECTURE].

If a larger sample size had been used and a 95% confidence interval was calculated on the same value of the sample statistic, then the width of the confidence interval would be more narrow (all other things being equal). That is, the lower bound of a 95% interval using a larger n would be a value greater than 23, and the upper bound of the 95% interval would be a value less than 37. This is because, all other things being equal, a larger sample size will mean that the standard error is smaller; and therefore a smaller standard error will result in a more narrow confidence interval.

If a hypothesis test finds that a sample R² value is not significantly different from zero when α is set at .05, what would be the lower bound value of a 95% confidence interval?

If a null hypothesis test on R2 is not rejected when the null hypothesized value is 0, then we can immediately know that the lower bound of a 95% confidence interval would be equal to 0.

What do the values of the contrast sum of squares from polynomial planned comparisons equal when summed up? Why does this always occur?

If a within-subjects factor contains k levels, and a set of k - 1 orthogonal polynomial contrasts are used, the additive sum of the k - 1 contrast sum of squares will equal the between-occasions sum of squares. This implies that all variation being explained by change of time is being accounted for by the k - 1 polynomial contrasts.

What do the values of the error sum of squares from polynomial planned comparisons equal?

If a within-subjects factor contains k levels, and a set of k - 1 orthogonal polynomial contrasts are used, the additive sum of the k - 1 error sums of squares for the k - 1 planned contrasts will equal the subject-by-occasions interaction sum of squares (i.e., the error sum of squares in the ANOVA table). This implies that all error variation used to assess omnibus change over time is also being accounted for by the k - 1 polynomial contrasts.

What are the two different ways in which a standardised mean difference in dependent samples can be conceptualised, and how would researchers know which one to use in which context?

If researchers are investigating differences over time using a dependent samples design, then their primary interest in forming a standardised mean difference is to measure *average individual change over time* in a standardised metric. In contrast, if researchers are examining differences between two *matched groups* using a dependent samples design, then their primary interests in forming a standardised mean difference is to measure the *average difference between the two groups* in a standardised metric. Standardised mean differences for individual change and for group differences use a different kind of standardiser in the calculation. The former uses the *standard deviation of the difference scores* (change over time), whereas the latter uses either the *standard deviation of one group* viewed as a "control" or the *pooled standard deviation of both* groups. It follows that individual change takes into account the strength of correlation between the two time points whereas group differences do not take the correlation into account.

Summary of robust hypothesis testing

If the *actual* proportion of false rejections of a true null hypothesis remains the same as the *nominal* value set by the a priori alpha value when statistical assumptions are not met, then the null hypothesis test is robust - will we falsely reject a TRUE null hypothesis only 5% of the time over many replications of the same test when an assumption is violated? If the actual rate of false rejections is: 1) too large: the test is too liberal, so the obtained p value is smaller than its expected value (p < α) --> more likely to reject a true null hypothesis; 2) too small: the test is too conservative, so the obtained p value is larger than its expected value (p > α) --> less likely to reject a true null hypothesis than the proportion defined by alpha.

What would we infer about the plausible population value of the regression coefficient referred to above?

If the 95% confidence interval does not contain 0, then we can immediately infer that a zero regression coefficient is not a plausible population value for the corresponding independent variable. This can be done because the values contained within a confidence interval indicate the set of null hypothesised values that would not be rejected if any one of those values was defined as the null hypothesised population value in a null hypothesis test.

If a sample covariance matrix is not derived from a population matrix demonstrating sphericity, what will be the effect on the observed F test statistic from the usual ANOVA table derived from the decomposition of total sum of squares?

If the assumption of sphericity is not being met by a sample covariance matrix then the observed F test from the usual ANOVA Table summarising the decomposition of the total sum of squares will not be robust to this violation and it will become too liberal (i.e., the observed F value will be falsely rejected more times than it should be given the nominal α value being set for the null hypothesis test).

If the confidence interval in Q21 directly above was changed to a 99% interval, what one feature of the new interval do we immediately know will result without any further information being required? Why will that occur? [HINT: THIS QUESTION RELIES ON HAVING DONE THE LECTURE PROBLEM SET FOR THIS WEEK'S LECTURE].

If the confidence interval had been set at 99% instead of 95% for this sample statistic, then the width of the interval would automatically be larger than the 95% width. That is, the lower bound of the 99% interval would be a value smaller than 23, and the upper bound of the 99% interval would be a value larger than 37. This is because the width of a confidence interval placed around a sample statistic gets wider as the level of confidence being sought gets bigger.

If the full set of interaction contrasts are all orthogonal to each other and the design is balanced, what do we immediately know about the sum of the contrast sum of squares for each main effect and for the interaction effect?

If the design is balanced (i.e., same sample size in all cells of the design), and the set of interaction contrasts are orthogonal, then the omnibus sum of squares for the interaction effects in the ANOVA table can be completely decomposed into independent, non-overlapping sums of squares for the contrasts such that their sum total will equal the omnibus sum of squares for the interaction. Therefore, the set of interaction contrasts are completely accounting for all differences among levels of both factors over and above that part being explained by omnibus sum of squares for the two main effects.

List three features of the research design and/or results of the linear regression analysis that would affect how far towards 1 the upper bound of the confidence interval above would be?

If the lower bound of a confidence interval was 0, then the degree to which the upper bound extends towards 1 is an indication of the precision of the confidence interval—lower precision (or, equivalently, greater imprecision) is shown by the upper bound being closer towards 1, all other things being equal. Therefore, the precision of confidence interval on R² is affected by (i) sample size (with smaller n resulting in a wider interval, all other things being equal), (ii) the number of independent variables (with a larger number of IVs resulting a wider interval, all other things being equal), and (iii) the value of sample R² itself (with a smaller sample value also resulting in a wider interval, all other things being equal).

What is the difference between the terms odds in favour and odds against? How can one be converted into the other?

If the probability of some event occurring is given by P, the odds in favour of that event are given by P / (1 - P), and the odds against that event occurring are given by (1 - P)/P. We can therefore always convert odds in favour into odds against by taking the reciprocal (or inverse) of the former; i.e., by taking 1 / (P / (1 - P)) = (1 - P) / P; and convert odds against into odds in favour by again taking the reciprocal of the value being converted...i.e., 1/((1 - P) / P) = P / (1 - P).

What is the meaning of the term odds?

If the probability of some event occurring is given by P, then the probability of that event not occurring must be equal to 1 - P, because the probability of an event occurring plus an event not occurring must always equal 1. The ratio of the probability of event occurring to it not occurring is given by P / (1 - P), and this ratio is called the odds. Odds are therefore a different way of expressing the probability of any event occurring.

How can we immediately identify if a 95% confidence interval on an unstandardised partial regression coefficient contains zero?

If the sign of the lower and upper bound values are either both negative or both positive, then we immediately know that the confidence interval does not capture 0. If the lower bound value is negative (positive) and the upper bound value is positive (negative) then the change in sign immediately alerts us to 0 being within the confidence interval.

If the ANOVA table from a two-way ANOVA was compared to ANOVA tables from two separate one-way ANOVAs, what features would be the same?

If the two separate one-way ANOVAs were investigating differences among means of the two respective factors (i.e., analysing the marginal means for the row and column factors of the two-way table), then the only features that would be the same in all three analyses would be the following. First, the total sum of squares for the DV (assuming that the same set of responses on the DV are being analysed in all instances), and second, the respective numerator degrees of freedom for each factor. If a one-way ANOVA used all the cross-classified cell means from the two-way ANOVA, then the within sum of squares in the one-way analysis would be equal to the corresponding value in the two-way ANOVA as well.

How is the confidence level for a confidence interval related to the alpha criterion in NHST?

If α is set at some defined value, then this defines a chosen level of confidence for a confidence interval according to the relation (1-α)x100%. It follows that, if α is set at .05, then the corresponding confidence interval is set at (1-.05)x100% = 95%.

What is meant in general by the term degree of freedom, when applied to sample data?

In general, degrees of freedom in reference to sample data refers to the number of sample values on a variable that can freely vary in value, given that the value of a particular sample statistic obtained from all sample scores is known. For example, if there are 10 observed scores, and the sample mean value is calculated and therefore known, then only 9 of those scores are free to vary in value - once the values of those 9 scores are chosen, the value of the final score is determined by those nine values and the known value of the sample mean. It does not matter which particular nine scores are chosen; the restriction that is placed on the 10th and final score remains the same.

What is meant by the assumption of independence of observations in statistical testing? Illustrate your answer using different examples of the way in which this assumption can be violated:

Independence of observations means that (i) each person's sample score can only be counted as occurring once in the sample for each variable, and (ii) that the score has not occurred as a result of any restriction in scores on other variables arising from their measurement or from the design of the study. Observations would not be independent if, e.g., a person's score was used twice in a sample of scores because of duplication, or if people only responded to one variable according to having given a particular response to a second variable that is related to the first.

Independent versus dependent groups

Independent groups aka between-subjects groups/designs/samples, or independent samples. Dependent groups aka within-subjects groups/designs/samples, or dependent samples, repeated measures, or paired groups.

Assumptions of dependent samples t test

Independent observations within each group. Difference scores normally distributed (robust if n is over 30; investigated using the standardised mean difference).

Statistical assumptions of one-way BS ANOVA

Independent observations. Normal distribution of scores (robust if non-normality is moderate and n is over 30). Equal variance within the levels of the factor - if sample sizes are equal then the test is robust; if variances are heterogeneous and the design is unbalanced, a smaller group paired with the larger variance makes the test liberal, etc.

Assumptions of independent samples t test

Independent observations (each person in only one group). Normal distribution of scores. Variances are equal (homogeneous).

Odds ratio

Indicates the odds of one category occurring in one variable relative to the odds of a second and different category occurring in another variable. Odds = a probability measure of some event occurring compared to the probability of that event not occurring - an uncertainty expressed as a probability.

Obtained p value

Is a conditional probability - the probability of a given b, Pr(a|b). The obtained p value tells us nothing about the probability that the null hypothesis is either true or false. The truth of the null hypothesis is assumed but never known, either before or after the test.

Epsilons and F stats

If the GG epsilon estimate is under 1, then the df for the adjusted F statistic will be smaller than the sphericity-assumed df. The sample epsilon value for GG and HF is multiplied into the numerator and denominator df of the ratio of mean sums of squares forming the F statistic. They cancel out, so the GG-adjusted F statistic will equal the sphericity-assumed F statistic.

If the sample size differs for the two groups, what effect (if any) does this have on the independent samples t test calculation?

It does not have any effect on the degrees of freedom, because they are given by n1 + n2 - 2 and it has no effect if the variances of the two groups are the same (or very close to being the same). However, it can have an indirect effect if the variances of the two groups differ, especially if this difference is large, in that this occurrence can adversely affect the accuracy of the null hypothesis test when the sample sizes of the two groups are also different. If the sample size for each group is the same, then the effect of difference variances is much less (if not negligible).

How many different sets of orthogonal contrasts can be constructed for a factor containing k levels, and what is common to all these different sets?

It is always possible to specify more than one set of k - 1 orthogonal contrasts for a factor with k levels, such that the sum of squares SSContrast for the k - 1 orthogonal contrasts within each set will always sum to SSbetween. The actual values of the k - 1 SSContrast across different sets will differ for the same set of group means according to the value of the contrast weights in each orthogonal set. If, e.g., k = 3, then two such sets of weights, each containing k - 1 = 2 orthogonal contrasts, would be, for example, {[1, -1, 0], [1, 1, -2]} and {[1, 0, -1], [1, -2, 1]}.
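
A quick numpy check of these two example sets (the weights are the hypothetical ones given above):

    import numpy as np

    set1 = np.array([[1, -1, 0], [1, 1, -2]])
    set2 = np.array([[1, 0, -1], [1, -2, 1]])
    for s in (set1, set2):
        print(s.sum(axis=1))       # [0 0]: each row is a valid linear contrast
        print(np.dot(s[0], s[1]))  # 0: the cross-products of the weights sum to zero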

Why is linear regression said to reflect an asymmetric relationship among variables?

Linear regression represents an asymmetric form of relationship between construct measures because not all variables have the same functional form and role in the analysis. In linear regression, one variable is identified and specified as being in a dependent relationship with the other variables, each of which is called an independent variable in the analysis. The basis for specifying one measure as the dependent variable and allowing all other measures to be independent variables is determined by the research question being asked. If a different measure is chosen to be the dependent variable (e.g., one of the existing independent variables becomes the dependent variable, and the existing dependent variable swaps places to become the replacement independent variable), then the outcome of the analysis will be different.

F statistic for r squared

MS reg / MS resid. The larger the R², the bigger the F statistic: as R² gets bigger, the p value gets smaller, which only happens with a large F statistic.
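
The link can be written directly in terms of R²; a small sketch with hypothetical values (R² = .30, 3 IVs, n = 104):

    # F = (R² / df_reg) / ((1 - R²) / df_resid), with df_reg = k and df_resid = n - k - 1
    r2, k, n = 0.30, 3, 104
    f = (r2 / k) / ((1 - r2) / (n - k - 1))
    print(f)  # about 14.3; a larger R², other things equal, gives a larger F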

Construct measure

A method of measuring people on a construct, used to obtain a construct score, for which there is reliability and validity. E.g., tests, questionnaires, coding of clinical symptoms in an interview, observation of behaviour, reaction time, psychophysiological observations (e.g., MRI).

In what way can non-linearity be identified in linear regression?

Non-linearity in linear regression can be identified through a scatterplot of studentised deleted residuals versus standardised predicted values in which the data points display an obvious systematic U-shaped pattern (or inverted U-shaped pattern), or any other pattern that displays a systematic change in residual values along the X-axis.

What are two ways that non-normality of residuals can be identified in linear regression?

Non-normality of residuals can be identified in linear regression by (i) a histogram of the residuals, (ii) a Q-Q plot of the residuals, or (iii) examining the scatterplot of residuals versus predicted values and observing a larger proportion of data points either below or above an imagined horizontal line drawn from 0 on the Y-axis.

Statistical assumptions of regression

Normally distributed residuals (it doesn't matter if the IVs are not normal, so long as the residuals are); robust if the violation is minor. Homoscedastic residuals: residual scores have the same variance regardless of the predicted score on the DV (which is a linear combination of the IVs); cannot be checked with a bar plot - use residual scatterplots, large studentised residual values, or Cook's d statistics. Independent scores. Linear relationships (scatterplot of residuals against predicted values). Error-free measurement.

NHST in regression

The null hypothesis is that rho equals 0 (no correlation). Fobs = MS reg / MS resid, where each MS is its SS divided by the appropriate df. df for regression = number of IVs; df for residual = n - number of IVs - 1. The observed value corresponds to the F distribution.

Null hypothesis and alternative hypothesis

Null hypothesis: a focused statement about the value of an unknown PP that is assumed and is wished to be nullified by finding evidence against (using the NHST) i.e. μ=100 Alternative hypothesis: an unfocused statement about the possible values of an unknown PP that excludes the null hypothesis value i.e. μ not equal to 100

Confidence interval calculation using t crit

Observed sample statistic +/- (t crit (e.g., 1.96) x standard error)
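
A tiny worked sketch with hypothetical values (statistic 30, standard error 3.5):

    stat, se, t_crit = 30.0, 3.5, 1.96
    lower, upper = stat - t_crit * se, stat + t_crit * se
    print(lower, upper)  # 23.14 and 36.86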

Observed test statistics

Observed test statistic = (sample statistic - assumed population parameter) / standard error. Signified by Tobs. A sample statistic transformed by using the standard error into the standardised metric corresponding to a theoretical mathematical probability distribution. It is used in NHST and confidence intervals. Similar in form to the Z score calculation.

Odds calculations

*Odds:* p/(1-p), where p = the probability of one event occurring out of all possible relevant events. E.g., the probability of a horse winning is 0.2, so the probability of not winning = 1 - 0.2 = 0.8, giving 0.2/0.8 = 1:4 odds IN FAVOUR of winning but 4:1 odds AGAINST winning (not in the horse's favour). *Odds ratio* = (a x d) / (b x c)
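
The same arithmetic as a short sketch (the probability and cell frequencies are the hypothetical ones above):

    p = 0.2                      # probability of the horse winning
    odds_for = p / (1 - p)       # 0.25, i.e., 1:4 in favour
    odds_against = 1 / odds_for  # 4.0, i.e., 4:1 against

    a, b, c, d = 9, 1, 1, 4      # hypothetical 2 x 2 cell frequencies
    odds_ratio = (a * d) / (b * c)
    print(odds_for, odds_against, odds_ratio)  # 0.25 4.0 36.0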

In what circumstances does an unbalanced one-way ANOVA design have an F test statistic that is not robust?

One-way ANOVA with an unbalanced design, together with lack of homogeneity of variances for the different levels of the factor, can combine to make the usual F test statistic either too liberal or too conservative (depending on the mix of sample size and size of variances).

In what ways does a one-way within-subjects ANOVA design differ from a one-way between-subjects ANOVA design?

One-way within-subjects designs differ from one-way between-subjects designs in the following ways: (i) the design is always balanced because values on the DV are obtained either by repeated measurements of the same person or by matching people across three or more levels of the factor (e.g., taking measurements on triplets); (ii) a dependency exists between scores on the different levels of the within-subjects factor arising from the structure of the design, and this imparts a correlation between scores on the different levels; (iii) people in the between-subjects design can belong to only one level of the factor, whereas the same people (or matched cases) belong to all levels of the within-subjects factor; (iv) the error sums of squares in the two designs differ in terms of how they are calculated; (v) within-subjects designs involve a decomposition of the within-subjects sum of squares, whereas this sum of squares is not decomposed in between-subjects ANOVA; (vi) the between-subjects sum of squares in a within-subjects design is excluded from any consideration after it has been calculated, whereas it is the primary focus of explaining differences between groups in a between-subjects analysis.

Why are orthogonal polynomial contrasts often used to form a set of planned comparisons in within-subjects ANOVA?

Orthogonal polynomial contrasts are frequently used for planned contrasts of a within-subject factor because, if the factor represents repeated measurements of people over a number of occasions, then the polynomial method allows the between-occasions sum of squares to be decomposed into an additive set of polynomial effects that account for linear change over time, plus different orders of non-linear change (e.g., quadratic, cubic, quartic, etc).

What kind(s) of analysis immediately comes to mind when we talk of prediction of one variable by other variables?

Prediction immediately brings to mind the notion of regression of some form, most notably and commonly linear regression which involves the prediction of scores on a continuous dependent variable by one or more independent variables (the latter of which can be either continuous or categorical).

Expression of Belief/Reasonable expectation

Probability is the plausibility, i.e., belief, about each possible outcome of some process, which may be changed after gathering further evidence about it, e.g., the belief that the chance of getting a heads from tossing a biased coin is 60% - underlies *Bayesian* statistics.

Per comparison alpha

Probability of rejecting a null hypothesis given it is true, for a single comparison amongst a set of all comparisons.

Describe the progressive transformation from raw score to z scores to another standardised score

Raw score to deviation score: xi = Xi - M. Deviation score to Z score: Z = (Xi - M)/SD. Z score to another standardised score: IQ = 100 + (Z x 15) - converts a Z score into a standardised IQ score with a population mean of 100 and a population standard deviation of 15.
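
The full chain as a worked sketch (the raw score, mean, and SD are hypothetical):

    X, M, SD = 120.0, 110.0, 20.0  # hypothetical raw score, sample mean, sample SD
    x = X - M                      # deviation score: 10.0
    z = x / SD                     # Z score: 0.5
    iq = 100 + z * 15              # standardised IQ-metric score: 107.5
    print(x, z, iq)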

Independent samples t test as regression model

The regression intercept is the sample mean of the group coded 0. The regression coefficient is the mean difference. The observed F statistic is the squared t test result. R² gives the proportion of variation in the DV accounted for by group membership.

Semi partial correlation

Regress the focal IV onto the remaining IVs; the residual part of the focal IV is then correlated with the DV. Shown as the 'part' correlation in SPSS. It is the Pearson correlation between scores on the DV and that part of the scores on an IV not accounted for by the remaining IVs in the regression model.

Cross classified contingency table

Relative frequency of group members in two or more joint categories - if group membership varies systematically across categories, the variables are dependent or contingent.

Research vs. Statistical hypothesis vs. Statistical alternative hypothesis

Research: what we expect to find out about the hypothesised relationship between variables/psychological constructs. Statistical null hypothesis: a focused statement about the value of an unknown population parameter that researchers aim to *nullify* by applying a statistical significance test to a sample statistic. Statistical alternative hypothesis: an unfocused statement about the possible values of an unknown population parameter that excludes the null hypothesised value.

SS in mixed anova

SS total decomposes into SS between-subjects and SS within-subjects. SS between-subjects splits into between-groups SS and within-groups SS (the between-subjects error). SS within-subjects splits into between-occasions SS, the group-by-occasion interaction SS, and the subject-by-occasion interaction SS (the within-subjects error).

Compound symmetry

A SUFFICIENT but not necessary condition for sphericity: the same value for every variance and the same value for every off-diagonal covariance. This is defined at the population level; a sample covariance matrix is unlikely to meet it exactly.

Research design for dependent group t test

Sample sizes are always balanced. The mean difference is formed as time 1 - time 2. The standard error uses the variance of the individual difference scores. t_obs = (M_D - μ_D)/s_MD, where μ_D is the null-hypothesised mean difference (usually 0). The sample variance of the differences depends on how strongly the paired scores correlate; if they correlate strongly, s_MD will be small. The variance of the differences can also be represented as s_D^2 = s1^2 + s2^2 - 2 r s1 s2.
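
A brief sketch with invented paired scores, verifying the variance identity and checking the hand-computed t statistic against scipy's paired test:

```python
import numpy as np
from scipy import stats

t1 = np.array([10.0, 12.0, 9.0, 14.0, 11.0])   # time 1 scores (hypothetical)
t2 = np.array([12.0, 15.0, 10.0, 15.0, 14.0])  # time 2 scores, same people

d = t1 - t2                       # difference scores
r = np.corrcoef(t1, t2)[0, 1]

# variance of differences: s_D^2 = s1^2 + s2^2 - 2*r*s1*s2
s_d2 = t1.var(ddof=1) + t2.var(ddof=1) - 2 * r * t1.std(ddof=1) * t2.std(ddof=1)
print(np.isclose(s_d2, d.var(ddof=1)))          # True

t_obs = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
print(t_obs, stats.ttest_rel(t1, t2).statistic)  # same value
```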

Sampling distributions of the mean difference

Sampling distributions are formulated by recognising: two groups; observations within each of the two groups; a mean in each of the two groups; the difference between the two sample means; and the (conceptual) replication of this group difference over many repeated samples to form a sampling distribution. If the observed mean difference would be unlikely to occur when there is no population difference, we have evidence of a real mean difference; the sampling distribution underpins both confidence intervals and hypothesis testing.

Observed t score

t_obs = (sample statistic - assumed population parameter) / standard error. It compares the distance of the statistic from the assumed population parameter in standard error units.

Construct score

A score on a construct measure, typically a numerical value assigned to an individual. Scaling units are arbitrary, i.e. there is no true zero and the scale is not strictly interval. A score of zero on neuroticism doesn't mean a complete absence of neuroticism. E.g. summed total score on a neuroticism questionnaire, milliseconds of reaction time, frequency count of errors.

Regression coefficient

The slope function b in a simple regression model; the partial regression coefficient b_j in a multiple regression model.

Types of residuals

Standardised residuals: z-transformed raw residual scores, with M = 0 and SD = 1. Studentized residuals: each residual divided by its standard error. Studentized deleted residuals: successively fit the regression model with one data point excluded at a time; the difference between a score and its prediction from the model excluding that point is the deleted residual, which is then divided by its standard error.
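
A sketch of the three kinds of residuals using statsmodels on simulated data; OLSInfluence exposes the internally and externally studentized versions:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 2 + 0.5 * x + rng.normal(size=30)    # made-up data

res = sm.OLS(y, sm.add_constant(x)).fit()
infl = OLSInfluence(res)

raw = res.resid
standardised = (raw - raw.mean()) / raw.std(ddof=1)   # z-transformed raw residuals
studentized = infl.resid_studentized_internal          # residual / its standard error
deleted = infl.resid_studentized_external              # leave-one-out (deleted) version

print(standardised[:3], studentized[:3], deleted[:3])
```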

Statistical vs scientific inference

Statistical: the process of drawing a principled but *not definitive* conclusion about the value of an unknown population parameter from a known sample, mainly through NHST and CIs. Scientific: the process of making rational conclusions about the truth of a theory based on empirical research; it may involve statistical inferences but also takes into account theory, evidence, assumptions, etc. The former does not prove the latter.

Summary feature

A summary feature, such as a mean, is what we use to make inferences about the corresponding population parameter. It summarises the entire sample rather than any single result. In a sample it is called a sample statistic; in a population it is called a population parameter. Examples of summary features: mean, group difference, regression coefficient, correlation.

Implied research design of group difference

T test, ANOVA or linear contrast

How are sample estimates of epsilon used in omnibus null hypothesis testing in a one-way within-subjects ANOVA?

The Greenhouse-Geisser and Huynh-Feldt estimates of epsilon are used to adjust the numerator and denominator degrees of freedom for the F test used to assess the omnibus null hypothesis for the within-subjects factor. Because both estimates are less than 1 when sphericity has been violated, the correction means that the Greenhouse-Geisser and Huynh-Feldt adjusted F statistics have fewer degrees of freedom in their numerator and denominator than the corresponding sphericity-assumed values.
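
A minimal sketch of how the adjustment works (the function name and numbers are illustrative only):

```python
def adjusted_df(epsilon, k, n):
    """Epsilon-adjusted degrees of freedom for the within-subjects F test.

    k = number of levels of the within-subjects factor, n = number of subjects.
    """
    df1 = epsilon * (k - 1)            # numerator df
    df2 = epsilon * (k - 1) * (n - 1)  # denominator df
    return df1, df2

print(adjusted_df(1.0, 4, 20))   # sphericity assumed: (3.0, 57.0)
print(adjusted_df(0.7, 4, 20))   # Greenhouse-Geisser-style correction: (2.1, 39.9)
```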

What do sample estimates of epsilon like Greenhouse-Geisser and Huynh-Feldt measure? What would be their population values if a population covariance matrix demonstrated sphericity?

The Greenhouse-Geisser and Huynh-Feldt estimators of epsilon measure the extent to which a covariance matrix departs from sphericity. If sphericity was being strictly met by a population covariance matrix, then population epsilon would equal 1 (which is its maximum value). The greater the difference from 1 towards 0, the greater the departure from sphericity. The minimum value for epsilon for any within-subjects factor is given by 1/(k - 1), where k equals the number of levels of the within-subjects factor.

How might a set of orthogonal contrasts be related to the omnibus F test in an ANOVA table, and how many contrasts are required in the set for this relationship to be considered complete?

The SSbetween used in the omnibus F test, with k - 1 degrees of freedom in the numerator, can be additively decomposed into a set of k - 1 orthogonal contrasts, such that the sums of squares for the k - 1 contrasts will sum to the between sum of squares. The relationship is considered complete when the set contains the full k - 1 orthogonal contrasts.
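
A numerical sketch, with made-up group means for a balanced design, showing the contrast sums of squares adding up to SSbetween:

```python
import numpy as np

# Hypothetical balanced one-way design: k = 3 groups, n = 4 per group
means = np.array([4.0, 6.0, 8.0])
n = 4
grand = means.mean()

ss_between = n * ((means - grand) ** 2).sum()

# A set of k - 1 = 2 orthogonal contrasts (weights sum to zero, dot product zero)
contrasts = [np.array([1.0, -1.0, 0.0]),
             np.array([0.5, 0.5, -1.0])]

# SS for a contrast in a balanced design: n * (sum c_j * M_j)^2 / sum c_j^2
ss_contrasts = [n * (c @ means) ** 2 / (c @ c) for c in contrasts]

print(sum(ss_contrasts), ss_between)   # the two values match (32.0, 32.0)
```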

In a plot of residual versus predicted values from a linear regression analysis, which variable (usually) corresponds to the Y-axis and which one is placed on the X-axis?

The Y axis is specified to be studentised deleted residuals (which have a mean of zero and a standard deviation approximately equal to one), and the X-axis is defined as standardised predicted values on the dependent variable (standardised predicted values are used because it enables the deviation of any predicted value from the mean in standard deviation units to be easily identified).

What is an alternative hypothesis? How is it related to a null hypothesis? Can a null hypothesised value ever be used as the value of an alternative hypothesis? If so, what restrictions might apply to the value?

The alternative hypothesis is a value (or range of values) for the unknown population parameter that is proposed as a replacement for the null hypothesised value when the latter is rejected in using a null hypothesis test. It invariably is set to be the negated value of the null hypothesised value: e.g., if H0 = 0, then Ha ≠ 0 (where Ha signifies the alternative hypothesis). It is, however, also quite acceptable to set the alternative hypothesis to equal a specific competing value to the null hypothesised one, such as Ha = 3 (if desired). Any null hypothesised value in one null hypothesis test may be used as an alternative hypothesised value in a second null hypothesis test (if desired). The only restriction on specifying H0 and Ha in a single null hypothesis test is that the value in H0 is mutually exclusive of the value(s) used in Ha (and vice versa).

What is the advantage of always using Bonett's delta and its associated confidence interval when estimating a standardised mean contrast in a within-subjects design?

The available evidence currently for the robustness of standardised mean contrasts in within-subjects ANOVA is that Bonett's delta is more robust to violation of the sphericity assumption than is the pooled estimator based on the error mean sum of squares. Therefore, because the sphericity assumption is frequently violated in practice in within-subject designs, Bonett's delta provides a better general option for calculating a standardised mean contrast estimate and its corresponding confidence interval.

What is the primary objective of one-way ANOVA that is in common with linear regression?

The common thread between the objectives of one-way ANOVA and linear regression is to maximise the amount of variation in the dependent variable being accounted for by scores on one or more independent variables. In linear regression, this is achieved through application of *the method of least squares* as the estimator. In ANOVA, this is also achieved by *squaring the deviation of groups means* from the overall *grand mean*, summing up these squared deviations, and *multiplying by the sample size in the groups*. While it may not look like it, this formula is equivalent to applying the least squares estimator in ANOVA.

What does the term partial mean when we speak of a partial regression coefficient?

The concept of partial, when applied to a regression coefficient, implies that the overlapping effects of the other independent variables in the regression model on the dependent variable have been removed (i.e., partialled out). By doing so, each independent variable's effect on the dependent variable is separate from all other independent variables in the model. A related notion underlying partial is that the regression coefficient represents that part of the independent variable which is uniquely related to the dependent variable. It follows that the independent effect of one variable on the dependent variable will depend, in general, on what other independent variables are also in the model.

How are the value of observed test statistic and the critical test statistic value related, and how do they differ? How is the obtained p value related to the observed test statistic value?

The critical test statistic is the value from a theoretical probability distribution (e.g., the standard normal distribution) that corresponds to obtaining a probability equal to the defined alpha value used in the null hypothesis test. For instance, if alpha = .05 and the relevant theoretical probability distribution is the standard normal distribution, then Tcrit = ±1.96. It is related to Tobs by it being the criterion value against which Tobs is compared in performing the null hypothesis test. If the absolute size of Tobs is greater than the absolute size of Tcrit, then the null hypothesised value is rejected; otherwise, it is not rejected. The p value is the probability value obtained from a theoretical probability distribution for a particular value of Tobs. For instance, if Tobs = 1.64 and the relevant theoretical probability distribution is again the standard normal distribution, then p = 0.10.

Describe in words how the decomposition of total sum of squares of the dependent variable is undertaken in a one-way within-subjects design:

The decomposition of the total sum of squares in a within-subjects ANOVA initially derives two main sources of variation: a between-subject sum of squares and a within-subject sum of squares. The latter source is then further decomposed into a between-occasions sum of squares and subject-by-occasion interaction sum of squares (which is equivalent to the error sum of squares in a within-subject design). The between-subjects sum of squares in the first step is not considered further once it has been accounted for.
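
A small numerical sketch of this decomposition (the score matrix is invented):

```python
import numpy as np

# Hypothetical scores: rows = 5 subjects, columns = 3 occasions
Y = np.array([[3.0, 4.0, 6.0],
              [2.0, 4.0, 5.0],
              [5.0, 6.0, 8.0],
              [4.0, 4.0, 7.0],
              [3.0, 5.0, 6.0]])
n, k = Y.shape
grand = Y.mean()

ss_total = ((Y - grand) ** 2).sum()
ss_between_subj = k * ((Y.mean(axis=1) - grand) ** 2).sum()  # set aside after this step
ss_within_subj = ss_total - ss_between_subj

ss_occasions = n * ((Y.mean(axis=0) - grand) ** 2).sum()
ss_error = ss_within_subj - ss_occasions   # subject-by-occasion interaction

print(ss_total, ss_between_subj, ss_occasions, ss_error)
```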

How in general are degrees of freedom calculated for contingency table?

The degrees of freedom for a contingency table equal (I ‒ 1) x (J ‒1) in value (where I = the number of rows and J = the number of columns). They represent the number of cells for which the frequency count is free to vary in value, given the constraints that the values for the sample size and the row and column frequencies are all known.

What are different, but equivalent, names for referring to a dependent samples t test that have exactly the same meaning?

The dependent samples t test can be referred to equivalently as (i) a repeated measures t test; (ii) a repeated samples t test; (iii) a paired samples t test; (iv) a paired groups t test; (v) a dependent groups t test; and (vi) a within-subjects two-group t test (that should cover all possible variants).

What is the dependent variable in a one-way design?

The dependent variable in a one-way ANOVA design is the outcome variable on which participants in the research are measured and on which the researchers are expecting to observe differences between the groups according to the group means.

What is the distance between the observed and predicted Y values called in such a graph?

The distance between any observed value Y_i and its corresponding predicted score Ŷ_i is called the residual, and it is signified as e_i.

What are the expected frequencies in a contingency table?

The expected frequencies are the number of people who would have been uniquely defined by one of the row categories and by one of the column categories if there actually was independence between the row and column categories

What are three different ways that the assumption of sphericity can be investigated in output from the GLM procedure in SPSS?

The extent to which the sample covariance matrix of scores on the levels of the within-subjects factor demonstrates evidence of sphericity can be investigated by (i) Mauchly's statistical test of sphericity, (ii) the value of the Greenhouse-Geisser estimator for epsilon, and (iii) the value of the Huynh-Feldt estimator for epsilon.

What is a familywise false rejection error rate (ERFW)? How is it related to, and distinct from, a familywise α value (αFW)?

The familywise false rejection error rate ERFW is the probability of making at least one false rejection error among j hypothesis tests when all j null hypotheses are true. If αPC is set equal to some value, e.g. .05, then this α value no longer ensures that ERFW ≤ .05 (i.e., α no longer maintains control of the maximum false rejection error rate that may occur when more than one null hypothesis is being tested within the same family; see below for the meaning of a family). The term family just means a set of hypothesis tests for which we desire to achieve some known level of control over the possibility of making a false rejection error. It could be the set of hypothesis tests used in a series of planned comparisons, the set of hypothesis tests for all partial regression coefficients in a multiple linear regression, or any defined set of hypothesis tests being undertaken in a statistical analysis.
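
A short simulation sketch (all numbers are illustrative) showing how ERFW inflates with j tests, approximating 1 - (1 - α)^j for independent tests:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, j, reps, n = 0.05, 5, 10_000, 30

false_rejections = 0
for _ in range(reps):
    # j independent tests on data for which every null hypothesis is true
    z = rng.normal(size=(j, n)).mean(axis=1) * np.sqrt(n)   # j z statistics
    if (np.abs(z) > 1.96).any():        # at least one (false) rejection
        false_rejections += 1

print(false_rejections / reps)          # close to the theoretical value below
print(1 - (1 - alpha) ** j)             # 0.2262...
```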

How is familywise α value (αFW) related to a per comparison α value (αPC)?

The familywise α value αFW is the value being specified to maintain control of the maximal ERFW by reducing the per comparison α value αPC to some smaller value (i.e., αPC will always be less than αFW). If there is only one hypothesis test being undertaken in the family, then αPC will equal αFW, and researchers typically drop any subscripting by just referring to α per se. Therefore, αPC and αFW can be thought of as particular kinds or forms of α, which have particular properties and functions when more than one hypothesis test is being undertaken within the one family.

In (a) what circumstances will the independent samples t test not be robust to violation of (b) which kind of assumptions?

The independent samples t test will not be robust to violation of the assumption of equal variances in the two groups when the sample size of the two groups is noticeably different (e.g., one being more than about 25-50% smaller than the other). This violation will result in either the test becoming too liberal (when the smaller group also has the larger variance) or too conservative (when the smaller group has the smaller variance). If sample sizes are the same and the variances differ, then the test can be shown to be reasonably robust to violation of the equality of variances assumption.

What is the independent variable in a one-way design?

The independent variable in a one-way ANOVA design corresponds to the *factor* for the ANOVA design. It represents the variable being "manipulated" by researchers, either through random allocation to some treatment or control condition in an experimental design, or through selection of naturally occurring groups that the researchers wish to identify differences between. Different values on the independent variable correspond to different levels of the factor and therefore, if the factor contains 4 levels, the independent variable will contain 4 distinct scores (or codings). The terms independent variable and factor are therefore interchangeable and equivalent in their function.

What is meant by an interaction effect in a factorial design?

The interaction effect in a factorial design is one where the effects of one factor vary according to the different levels of the other factor, occurring in addition to any main effects of the two factors themselves.

Levels

The levels of a factor correspond to the *specific grouping categories* used to comprise the factor; the minimum number of levels is 2 (because we need at least two groups to compare means and potentially identify a mean difference)

What is the main characteristic in data being assessed by an independent samples t test?

The main data features being assessed in an independent samples t test are the means of the two groups (because the test is predicated on there being an expected difference between the two groups). This comparison assumes that the variances of the two groups are the same (i.e., that the variances are homogenous).

Which effects in a two-way ANOVA design would have order-0, and which effects would have order-1?

The main effects for each of the two factors in a two-way between-subjects ANOVA would have order-0 linear contrast weights. The interaction effects in a two-way between-subjects ANOVA would have order-1 linear contrast weights.

What are the main features in the data being assessed in a dependent samples t test?

The main feature of the data being assessed in a dependent samples t test is the mean of the difference scores that are (typically) formed by subtracting the paired score at time 2 (i.e., in the second group) from the paired score at time 1 (i.e., in the first group). The variance of the difference scores is used as the basis for forming a standard error.

What are the four main types of means in a cross-classified table from a factorial design? How are they similar and different?

The main types of means in a factorial design are: (i) the grand mean, (ii) the marginal row means, (iii) the marginal column means, and (iv) the cell means. The grand mean is the mean of all the means found in each of the three other types (e.g., the mean of the marginal rows means is equal to the grand mean, etc, etc). The marginal row and column means are the means of the cell means making up each row and each column in the cross-classified table.

How are the two kinds of mean sums of squares calculated in this test?

The mean regression sum of squares is obtained by dividing the sum of squares for the regression model by its degrees of freedom, which equal the number of independent variables being used in the regression model (signified as J). The mean residual sum of squares is obtained by dividing the residual sum of squares by its degrees of freedom, which equal the sample size minus the number of independent variables being used in the regression model minus one (i.e., n - J - 1). The ratio of the regression mean sum of squares to the residual mean sum of squares gives the observed test statistic which has a F probability distribution (i.e., it is a F test statistic), with J and n - J - 1 degrees of freedom.

What is the minimum number of levels necessary in a factor in a factorial design?

The minimum number of levels for a factor in factorial design is the same as that required in a one-way design, namely two.

What is the kind of factor typically used in a repeated-measures one-way ANOVA? How many levels at the least must that factor have? When would it be equivalent to a dependent-samples t test?

The most typical kind of factor used in a within-subjects (repeated-measures) one-way ANOVA is different measurements over time being made on the same group of people (or different treatment conditions being applied to the same group of people). A within-subjects factor typically comprises 3 or more levels; but the minimum requirement is 2 levels (which then makes the one-way design directly equivalent to a dependent samples t test).

What is the most typical standardiser to use in calculating a standardised mean difference in planned comparisons?

The most typical standardiser used is the square root of the error mean sum of squares that is reported in the usual ANOVA table.

How can the null hypothesis of zero mean difference be expressed in two equivalent ways for an independent samples t test?

The null hypothesis can be equivalently expressed as either H0 : μ1 =μ2 or as H0 : μ1 -μ2 = 0 , where μ1 and μ2 indicate the population means of Group 1 and Group 2.

How many independent interaction contrasts can be specified to investigate interaction effects in a two-way ANOVA?

The number of independent interaction contrasts will equal the number of numerator degrees of freedom for the omnibus interaction effect in the ANOVA table. In general, if Factor A has a levels and Factor B has b levels, then the numerator degrees of freedom equal (a ‒ 1) (b ‒ 1). This value—i.e., (a ‒ 1) (b ‒ 1)—also equals the number of independent interaction contrasts.

What value is the numerator degrees of freedom of an F test for a planned comparison always equal to, and how is this value related to a t test?

The numerator degrees of freedom df1 for an F test will always be equal to 1 for a planned comparison. The square root of any F test statistic with df1 = 1 will be equal to a t statistic with degrees of freedom equal to the denominator degrees of freedom of the F test.

What is the range of values that R2 can potentially take?

The observed R2 value can range between 0 (whereby the set of independent variables explains no variation in the dependent variable) and 1 (whereby the set of independent variables explain all the variation in the dependent variable). In practice the observed R2 value will be strictly greater than 0 and strictly less than 1 for sample data

How do we interpret what the observed R2 value means in a multiple regression?

The observed R2 value in a multiple linear regression (and also in a simple linear regression) is interpreted as the proportion of variation in the dependent variable scores that is being explained (or, equivalently, accounted for) by the set of independent variables being used in the regression model.

What is the difference between peoples' observed scores and their raw scores?

The observed score and the raw score in sample data mean exactly the same thing: They are the actual untransformed values recorded for each person when measuring a construct.

How are the obtained p value and the alpha criterion related, and how do they differ? How is alpha related to the critical test statistic value?

The obtained p value is the conditional probability we get from applying a null hypothesis test to a sample statistic calculated on data from a single sample (i.e., p = Pr(Tobs | H0 = True)). The α value is the conditional probability we set as a criterion for deciding to reject, or not reject, the null hypothesised population parameter value (i.e., α = Pr(Rejecting H0 | H0 = True)). For this decision, the obtained p value is compared to α, and the decision rule is: reject the null hypothesis if p < α, otherwise do not reject the null hypothesis. Both values are conditional on assuming that the null hypothesis is true.

What is meaning of the obtained p value from a null hypothesis test? What does it tell us about the state of the null hypothesised value of the population parameter? In doing so, write down and explain the notation typically used to indicate the obtained p value.

The obtained p value is the probability of obtaining an observed test value as large as the one observed for the sample statistic (or larger), given the null hypothesised population parameter value is true (i.e., the correct population parameter value was assumed in the null hypothesis). This p value can be represented in notation form as Pr(Tobs | H0 = True). Because the p value obtained from the null hypothesis test is conditional on the assumption of the null hypothesised value being true, the p value tells us nothing about whether or not this assumed value is the actual true population value.

What is the opposite to heteroscedasticity, and what does it mean (apart from being the opposite of heteroscedasticity)?

The opposite of heteroscedasticity is called homoscedasticity, and it refers to the residuals having the same variance for any chosen predicted value on the dependent variable.

How is familywise false rejection error rate (ERFW) related to per comparison false rejection error rate (ERPC)?

The per comparison false rejection error rate ERPC applies to each separate individual hypothesis test being undertaken within a family of hypothesis tests. The familywise false rejection error rate ERFW applies to all hypothesis tests being undertaken within a family of hypothesis tests.

What can we use to quantify the precision of a sample correlation estimate? How can the degree of precision for a sample correlation be clearly indicated when using this method?

The precision of a sample correlation is quantified by the width of its 95% confidence interval (the range of correlation values between the lower bound and the upper bound). That is, we can say that an interval equal to (0.10, 0.15) is more precise than one that is (0.05, 0.20) because the former has a narrower width than the latter. The closer the two bounds are together, the more precisely is the population correlation being estimated by the interval (for a given level of confidence). The main determinant of precision is sample size, with larger sample sizes resulting in narrower intervals (all other aspects being equal). For a positive correlation, the range of possible values lies between 0 and +1. Therefore, the extent to which a confidence interval covers that potential range of possible values is a direct indication of the degree of precision. If the interval is (0.20, 0.90) then the interval is imprecise because it covers nearly 70 percent of the possible range. In contrast, if the interval is (0.20, 0.30) then the interval is much more precise because it covers only 10 percent of the possible range.

If we have a scatterplot between two variables and a regression line is placed on the graph, where will we find the predicted values from the regression of Y on X?

The predicted values of the dependent variable will be located on the regression line. This is because the regression line is a straight-line function expressed by Ŷ = a + bX, where a is the intercept parameter and b is the slope parameter (more commonly called the regression coefficient) of the regression equation. A predicted value Ŷ is obtained by choosing any value of the independent variable X, multiplying it by the regression coefficient value b, and adding the intercept value a. For successive values of X, the resultant Ŷ values will form a straight line in the scatterplot.

Why might we apply a Bonferroni correction to the α value?

The rationale for undertaking a Bonferroni correction to our overall α is to define an upper limit on the familywise false rejection error rate ERFW when undertaking j planned comparisons (where j equals the number of hypothesis tests being undertaken) by setting the per comparison αPC equal to α/j (where α is typically set at the conventional .05 value).
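
A minimal sketch of the correction (the p values are hypothetical):

```python
alpha_fw = 0.05
j = 4                      # number of planned comparisons
alpha_pc = alpha_fw / j    # Bonferroni-corrected per comparison alpha = .0125

p_values = [0.030, 0.004, 0.020, 0.0001]   # hypothetical comparison results
for p in p_values:
    print(p, "reject" if p < alpha_pc else "do not reject")
```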

If we calculate the reciprocal of an odds ratio value that is < 1 (i.e., we invert the value), what is the resultant value of the odds ratio now?

The reciprocal of an odds ratio value < 1 will be a value that is > 1; e.g., if the odds ratio is 0.25, then 1/0.25 = 4.

If we calculate the reciprocal of an odds ratio value that is > 1, what is the resultant value?

The reciprocal of an odds ratio value > 1 will be a value that is < 1; e.g., if the odds ratio is 5, then 1/5 = 0.20.

Why is a regression coefficient in a multiple regression analysis referred to as being a partial regression coefficient?

The regression coefficient is called a partial regression coefficient because the value of this coefficient indicates the expected change in the dependent variable for that independent variable when the overlap in the effect of that IV with all other independent variables, due to their joint correlation, has been removed (or partialled out). The partial regression coefficient therefore represents the expected change in the DV from each IV that is due to the part of that IV unrelated to any other IV also in the regression model (the partialling out of the overlap in effects of IVs on the DV is achieved through the method of least squares estimator).

What is the residual term in linear regression equal to?

The residual e_i indicates the difference between the observed and predicted scores on the dependent variable and it is calculated as e_i = Y_i - Ŷ_i (by convention, predicted scores are subtracted from observed scores when calculating a residual).

How is ANOVA as an analytic technique for investigating group differences typically different to multiple linear regression with dummy variables for investigating differences among categories?

The results obtained from both analytic methods are equivalent in terms of the F statistic and degrees of freedom obtained in the ANOVA table for linear regression and for formal analysis of variance. However, linear regression will provide statistics like R2 and adjusted R2 when using dummy variables whereas formal analysis of variance from the ONEWAY or GLM procedures in SPSS do not provide R2 values by default
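
A sketch of the equivalence using hypothetical data; statsmodels' formula interface dummy-codes the factor via C(), and scipy's f_oneway gives the formal ANOVA:

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.DataFrame({
    "y": [4, 5, 6, 7, 8, 6, 9, 10, 11, 9],
    "group": list("aaabbbbccc"),
})

# Regression with the factor dummy-coded automatically by C()
fit = smf.ols("y ~ C(group)", data=df).fit()
print(fit.fvalue, fit.f_pvalue, fit.rsquared)   # R-squared comes for free

# One-way ANOVA on the same data gives the identical F and p
groups = [g["y"].values for _, g in df.groupby("group")]
print(stats.f_oneway(*groups))
```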

If a correction is made to the usual independent samples t test to ensure that it is robust, which features of the t test have an adjustment made to them?

The robust version of the independent samples t test makes a correction to (i) the degrees of freedom (which is why the df can be real numbers with decimal values rather than just whole numbers), and (ii) the observed t statistic value. The net effect of these two corrections is to make the actual false rejection rate more closely approximate the maximum nominal rate being set by the α value.
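
As a sketch, scipy exposes this Welch-type correction through the equal_var argument (data invented); note the fractional degrees of freedom in the corrected version:

```python
from scipy import stats

g1 = [4.0, 5.0, 6.0, 5.0, 4.5, 5.5, 6.5, 4.0]   # larger group
g2 = [7.0, 12.0, 3.0]                            # smaller, more variable group

print(stats.ttest_ind(g1, g2))                   # assumes equal variances
print(stats.ttest_ind(g1, g2, equal_var=False))  # Welch-corrected version
```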

What will always be the ordering of the three sample estimates of population epsilon from smallest to largest in size? What is the smallest possible size that epsilon can be in a within-subjects design?

The sample estimates of epsilon will always be ordered in size by lower-bound (smallest value), Greenhouse-Geisser (middle-sized value), and Huynh-Feldt (largest value). The lower-bound estimate will invariably be too conservatively biased (i.e., smaller than the population value), the Greenhouse-Geisser can be conservative, and the Huynh-Feldt estimate may sometimes be a little too liberally biased (i.e., bigger than the true value). The Greenhouse-Geisser estimate is probably the safest option to choose when uncertain.

What do we always know for sure about the sample size in each group in a dependent two group design?

The sample size in each group must be the *same* when used in the dependent samples t test, because the basis of the test is the difference score formed from each pairing of scores. There will be as many paired scores as there are people in the sample. If there are more people in one group than the other (e.g., people drop out between time 1 and time 2, and therefore time 2 has a smaller sample than time 1), then any person who does not have a paired score at each time point is dropped from the analysis.

Irrespective of whether or not we know the population standard deviation value, what feature of a sample affects the size of a standard error, and how is the effect manifested?

The size of a standard error is inversely proportional to the square root of the sample size. This means that, for any given value of a standard deviation (either population- or sample-based), the size of the standard error will get smaller as sample size gets larger. The general formula for a standard error is SE = σ/√n.

What is the square root of R2 called, and what does it represent?

The square root of R2 is called the multiple correlation coefficient and it is signified by R. It is equal to the Pearson correlation between the set of observed scores and the set of predicted scores on the dependent variable in the regression model.

How does the interpretation of a standardised partial regression coefficient differ from that of its corresponding unstandardised partial regression coefficient?

The standardised partial regression coefficient is interpreted in terms of standard deviation units of all variables in the model. It indicates the size of the expected change in standard deviation units on the DV for a one standard deviation increase (or decrease) in scores on the focal IV, while holding constant scores on all remaining IVs. Standardised partial regression coefficients can be compared in terms of relative size of effects, because they are all on a common metric (i.e., standard deviations). In contrast, the unstandardised partial regression coefficient is in the raw score units of each IV and therefore it indicates the size of the expected change in the DV for a one unit increase (or decrease) in scores on the focal IV, while holding constant scores on all remaining IVs. Unstandardised partial regression coefficients in general cannot be directly compared in size because their size depends on the metric of the IV to which each is attached (which may have no relationship to the strength of the effect being observed in the coefficient).

What is the role or function of each of these three basic procedures?

The summing procedure enables us to get a total value over all people who are in the sample (i.e., the total "amount" of what is being measured), from which some kind of summary characteristic of the scores can be derived. The squaring procedure is a way to identify the amount of variability in scores (rather than the total amount per se). Dividing the amount obtained from either or both summing and squaring enables some kind of an average summary characteristic (by virtue of having summed up all scores over everyone in the sample). Averaging enables a summary characteristic value (e.g., a sample statistic like the mean, the variance, the standard deviation, etc.) to be obtained that does not depend on the number of people in the sample. That is, a mean of 10 will be obtained from a summed total of individual scores equal to 50 (when n = 5) or a summed total of scores equal to 500 (when n = 50). The average score (i.e., 10) is the same in each instance, whereas the summed total score differs because of the size of each sample. Averaging removes this kind of influence from sample size.

What is the meaning of the term null in the concept of a null hypothesis test? To what is it referring?

The term *null* in null hypothesis testing refers to the general aim in hypothesis testing being to have sufficient evidence that nullifies the assumed hypothesised population parameter value in the test. This nullification is based on the argument that it is rational to assume a certain condition exists, and then test the strength of that assumption against available evidence. If sufficient evidence is found which is counter to the assumed condition being true, then it is reasonable to reject the initial assumption in favour of some alternative (analogous to a jury trial in which a person is initially presumed innocent of a crime until the evidence suggests beyond a reasonable doubt that the person is guilty of that crime).

What is meant by referring to a focused research question or focused research hypothesis as being a priori?

The term a priori in this context means that the research question or research hypothesis has been determined, selected, and specified before looking at any sample data, and therefore they are established fully before researchers see any results from analysing the data.

What does the term omnibus mean in reference to the concept of an omnibus null hypothesis in ANOVA?

The term omnibus in the concept of an omnibus null hypothesis refers to a non-specific null hypothesis being expressed in terms of *all k populations means* for the *k levels of the factor being equal to each other* (i.e., there are no mean differences). The omnibus approach does not entail any specific consideration on a subset of population means. Its rejection can only provide evidence for at least one population mean being different from all remaining population means without providing any evidence of where that difference may reside among the k levels of the factor.

What are the similarities and differences among the terms association, dependency, contingency, and independence?

The terms *association*, *dependency*, and *contingency*, in the context of a cross-classified table, all mean that the number of people in particular categories of one variable tend to also occur with particular categories in a second variable. The term independence, on the other hand, means the opposite of this: The probability of occurrence for each category in one variable is the same for each category of the other variable making up the table. Independence means that there is no association or contingency among the categories in each variable.

What are the two basic kinds of standardised mean contrast values that can be estimated in a one-way within-subjects ANOVA?

The two basic kinds of standardised mean contrast values that can be estimated from a one-way within-subjects ANOVA are a measure of: (i) individual change, or (ii) group differences. The former is measured by an analogous measure to Hedges' g, whereas the latter is measured by either (a) standardised d based on the pooled error term, or (b) Bonett's delta.

What assumption does the standardised mean contrasts based on the pooled within groups variance rest on for it to be robust? Under what conditions will it definitely not be robust to violation of assumptions?

The use of MSwithin in calculating a standardiser for the standardised contrast mean difference rests on the assumption of homogeneity of group variances holding for the groups defining the levels of the between-subjects factor. Its use will definitely not be robust when this assumption is violated and the design is unbalanced (i.e., when the k levels of the factor do not all have the same group sample size). Even when the design is balanced, there has not been detailed investigation into the robustness of the pooled standardised mean contrast using MSwithin when heterogeneity of group variances is definitely present.

In a dependent samples t test, what is the equivalent measure to a pooled variance which is used in the independent samples t test?

The variance of the difference scores is the equivalent statistic in a dependent samples t test to the pooled variance in an independent samples t test. The variance of the difference scores is calculated in the normal way for a variance using the individual difference score for each set of paired scores being measured in the sample. The number of *paired scores* is the same as the *sample size*, and therefore the number of individual difference scores is the same as the sample size.

How are the weights for an interaction contrast derived from the weights of planned comparisons from the two main effects in a two-way ANOVA?

The weights of the interaction contrast are obtained by calculating the cross-product of weights from a main effect contrast for one factor, with the main effect contrast weights of the other factor.
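
A tiny sketch of the cross-product rule (the contrast weights are arbitrary examples):

```python
import numpy as np

a = np.array([1.0, -1.0])         # main effect contrast for Factor A (2 levels)
b = np.array([0.5, 0.5, -1.0])    # main effect contrast for Factor B (3 levels)

interaction = np.outer(a, b)      # cross-product gives the interaction weights
print(interaction)
# [[ 0.5  0.5 -1. ]
#  [-0.5 -0.5  1. ]]
```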

What is meant by the α criterion in hypothesis testing, and what is its function in the procedure?

The α criterion in hypothesis test is the probability of rejecting the null hypothesis, given that the null hypothesis is true; i.e., Pr(Rejecting H0 | H0 = true). It is specified before any data are examined and analysed, and it defines a critical value against which to decide whether or not the null hypothesis is to be rejected after the results of the hypothesis test are obtained. The α criterion functions by defining a critical value for comparing the obtained p value from the hypothesis test against. If p < α, then reject the null hypothesis; otherwise we do not reject it (NB: we do not accept the null hypothesis if p > α; instead we do not reject). If we set the α criterion to be equal to .05, then this value ensures that we would incorrectly reject a true null hypothesis no more than 5% of the time when this hypothesis test was performed a very large number of times (i.e., in the long run) on data for which the null hypothesis is in fact true.

How many main effects are there in a two-way factorial design?

There are two kinds of main effects in a factorial design: one for the factor defining the rows in the cross-classified table, and another for the factor defining the columns in the table.

Some textbooks advocate only undertaking planned comparisons if the omnibus null hypothesis test for mean differences is rejected first. What might be a fundamental counter-argument against this practice?

This recommendation, although not completely without merit, implicitly assumes that the result of the omnibus null hypothesis allows researchers to infer the truth or falsity of the omnibus null hypothesis. We know from General Principle 12 that both a null hypothesis test and a confidence interval provide no basis on which to make firm conclusions about the truth or otherwise of the null hypothesis—both provide evidence of some form, but not proof. Therefore, imposing a rule for using the omnibus F test as a prior screening test for planned comparisons is no better in an inferential sense than immediately undertaking planned comparisons as the primary form of analysis in between-subjects designs (and taking secondary note of the omnibus result if desired).

If a 95% confidence interval for a sample statistic is (23, 37), what information about the value of the unknown population parameter does this tell us?

This result tells us the following information: (i) any null hypothesised population parameter value between 23 (i.e., the lower bound value) and 37 (i.e., the upper bound value) would not be rejected if α = .05 and one of those values was used in a null hypothesis test; (ii) any value less than 23 or greater than 37 would be rejected if it was used in a null hypothesis test (and therefore 0 is a null hypothesised value which would be rejected in this instance); (iii) the width of the interval is 37-23 = 14 units (the metric of the sample statistic is not specified, so we do not know if the interval width indicates good precision in estimating the unknown population parameter value); and (iv) if we repeatedly calculated the value of the same sample statistic in a large number of repeated samplings of size n, then 95% of the resultant confidence intervals would contain the unknown population parameter value—in that sense, we can state that we are 95% confident that the unknown population parameter value is between 23 and 37.

Effect sizes for standardised mean diff in ind groups

Transform to a standardised mean difference when we need a common metric for scores whose observed units are arbitrary. Use Bonett's delta or Hedges' g. Both compute (M1 - M2)/s*, but with a different standardiser s*: Hedges' g pools the variances using n1 + n2 - 2, whereas Cohen's d uses n1 + n2. Bonett's delta is used if equal variances cannot be assumed. Hedges' g has an upward but consistent bias (negligible once total n exceeds about 120).

What are two different ways that two dependent groups can be formed for undertaking research, and how do these two forms fundamentally differ?

Two dependent groups in research can be formed by: (i) having each person being measured on two separate occasions, whereby the two occasions are designated as being two groups and the scores in each group indicate the score of each person at each time point; or (ii) having each person in one group being matched on a pairwise basis with the score of a different individual person in a second group.

What are three different ways that two groups can be formed for undertaking research, and how do these three forms differ fundamentally?

Two groups for investigating differences between groups can be formed by: (i) each person being allocated, or belonging, to only one of the two groups, which then makes the two-group design classified as being independent; (ii) each person being measured on two separate occasions, whereby the two occasions are designated as being the two groups, which then makes the two-group design classified as being dependent; and (iii) each person in one group being matched on a pairwise basis with a different individual person in a second group, which then makes the two-group design classified as being dependent (because of the pairwise matching).

Regression coefficient types

Unstandardised regression coefficient: expected change in the DV for a 1-unit change in the IV (subject to the arbitrary scaling of the IV's measurement units). Standardised regression coefficient: expected change in SD units of the DV for every one SD change in the IV (the beta coefficient). Partial regression coefficient: change in the DV for a one unit change in the IV while holding all other IVs constant.

Conditional vs unconditional probability

Unconditional probability: the chance of an outcome relative to all other possible outcomes. Conditional probability: the probability of an outcome occurring among all possible outcomes, given a particular outcome in a second set. It gives no indication about the chances of the conditioning event itself, only the likelihood of the outcome once that event has occurred. We only know the likelihood of getting Tobs given the null hypothesis; we do not know whether the null hypothesis is true or not, since it is an assumption.

What is the theoretical probability distribution of a partial regression coefficient?

Unstandardised partial regression coefficients are distributed according to a t probability distribution (note that the observed t value and its obtained p value for any unstandardised partial regression coefficient also directly apply to a standardised partial regression coefficient and to the semipartial correlation for that IV—that is, the one hypothesis test applies to all three sample statistic values that are the typical focus of interpretation in a multiple regression model).

How are the degrees of freedom calculated for testing the significance of a partial regression coefficient?

Unstandardised partial regression coefficients have degrees of freedom equal to the residual degrees of freedom for assessing R2 in a hypothesis test (i.e., n - J - 1, where n is the total sample size and J is the number of IVs in the regression model).

Critical test statistic

Value from a theoretical probability distribution that corresponds to the area in the tail of the distribution equivalent to the alpha level

Parameter estimate

The value of a single sample statistic that we calculate from sample scores, used as an estimate of a (usually unknown) population parameter; just another term for a sample statistic

Mauchly tests of sphercity

We want to not reject the null hypothesis that sphericity holds. But the test can be affected by mild departures from normality in the DV, so if it is significant, look at the epsilon values as well, because the significance could be reflecting non-normality rather than a genuine violation of sphericity.

What are the two features of a multiple regression model that are typically subject to statistical testing of their significance?

We can test whether or not the set of independent variables as a whole significantly predict the dependent variable by undertaking a hypothesis test of the observed R2 value through the observed F test statistic in the associated ANOVA table. We can also test whether or not each individual independent variable considered on its own significantly predicts the dependent variable by undertaking a hypothesis test of the partial regression coefficient, taking the ratio of the coefficient to its standard error, which is distributed as a t test statistic.

What are two ways of transforming an odds ratio value that is < 1 to a value that is > 1? [Hard Question]

We can transform an odds ratio value from being < 1 to it being > 1 in one of two possible ways. First, we can take the reciprocal of the original odds ratio value that was < 1 (i.e., invert the value by calculating 1/OR). Second, we could swap around the two rows of the contingency table on which the odds ratio was calculated using the formula (a x d)/(b x c). This second way has the equivalent effect of taking the reciprocal of the original odds ratio < 1 (of course, we could equally swap the two columns around in the contingency table also—but we cannot swap both the rows and the columns at the same time, because the odds ratio value will then still be < 1). See q24, wk3
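
A minimal sketch (cell counts invented) showing that swapping the rows yields the reciprocal:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table with cells [[a, b], [c, d]]."""
    return (a * d) / (b * c)

print(odds_ratio(10, 40, 20, 20))       # 0.25
print(1 / odds_ratio(10, 40, 20, 20))   # reciprocal: 4.0
print(odds_ratio(20, 20, 10, 40))       # rows swapped: also 4.0
```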

Why do we refer to an observed test statistic as being observed?

We refer to an observed test statistic as being *observed* because the value of the test statistic is a transformation of the observed value of a sample statistic calculated from sample data.

Why is a covariance or correlation said to reflect a symmetric relationship among variables?

We say that covariances and correlations (and contingency tables) reflect a symmetric relationship between variables because the variables have the same functional form and role in the analysis. This is because a covariance or correlation does not depend on a different specification for each variable (i.e., the correlation can be equally considered to be occurring between variables A and B, as it can between variables B and A), and the contingency table does not depend on a different specification for each variable (i.e., variables A and B can either be specified as forming the rows and the columns respectively, or the columns and the rows respectively, without any resultant change in the outcome of the analysis).

Linear regression model

Y = a + bX + e, where e is the residual score, a is the intercept, and b is the regression coefficient. If the model equation does not include the residual e, the residual can be calculated as e = Y - Ŷ. With one IV it is simple regression; with multiple IVs it is multiple regression, with partial regression coefficients.

How does a Z score differ from a standardised score?

Z scores are a particular kind of standardised score, which have a mean equal to zero and a standard deviation equal to one. Z scores are typically calculated by transforming people's raw score using the sample mean and sample standard deviation (or the population mean and standard deviation can be used if they are known, e.g., like for IQ tests).

P value interpretations

If p < alpha: the null hypothesis is *rejected* - this does not mean the null hypothesis is false or incorrect; people will say the p value is significant, but significant does not necessarily mean important - we should say *statistically significant*. If p ≥ alpha: the null hypothesis is *not rejected* - it is never said to be *accepted*, nor is it said to be *true* or *correct*; people will say the p value is not significant, which is *misleading*.

What does the term post hoc mean in reference to the concept of post hoc testing in ANOVA?

Post hoc means "after this". In the context of ANOVA and post hoc testing, it refers to the practice of either undertaking an omnibus null hypothesis test or inspecting the sample means, and then undertaking a series of null hypothesis tests on all non-redundant pairings of group means (where non-redundant implies that testing the difference given by A - B makes testing B - A redundant). If there are k groups, then the number of non-redundant pairings of group means is given by k*(k - 1)/2.

Unconditional probability

the long-run relative frequency of one outcome among the set of all possible relative outcomes --> p(A) or Pr(A)

The relative size of different covariances depends on

the metric of the two variables used to calculate it - has to be common between variables or its difficult to suggest a weak or strong association e.g. Z scores are just a deviation score in a metric of SD units of the variable and are therefore standardised scores

4 types of distributions in psych research

1) observed *sample distribution* 2) unobserved *population distributions * 3) sampling distribution of *sample statistics* 4) theoretical *probability* distribution

If a 95% confidence interval for a correlation does not capture zero, what can be inferred from this result?

A 95% confidence interval on a sample correlation that does not have the value 0 between the lower and upper bound values of the interval indicates that 0 (i.e., no linear association between the two variables) is not a plausible value for the population correlation, and therefore we can infer with 95% confidence that a non-zero association is occurring at the population level.

If a sample statistic is a biased, inconsistent estimator of a population parameter, what might be the implications of the bias and inconsistency for this statistic's sampling distribution?

A biased estimator is one for which the mean of the sampling distribution of the sample statistic is not equal to the population parameter value. An inconsistent estimator is one for which the degree of bias does not get smaller as the sample size increases. Therefore, the mean of the sampling distribution of an estimator that is both biased and inconsistent will not be increasingly equal to the population parameter value, even as sample size gets larger and larger.

Factor

A factor in a one-way design is the set of k discrete grouping categories used to define the *independent variable* on which researchers investigate differences in means among the groups.

What is meant by a linear contrast effect of order-1?

A linear contrast effect of order-1 is one in which (i) all positively valued weights sum to +2 and (ii) all negatively valued weights sum to -2.

What do we mean by a population in psychological research? How is it defined?

A population is the complete set of all persons for whom the research question or research hypothesis is relevant. It is defined in terms of either (i) a particular psychological construct (e.g., depression) or some defined condition (e.g., people with epilepsy), which identifies a homogeneous grouping, or (ii) the general population (when theory and the research question/hypothesis are not restricted to a homogeneous group). A research population can be theoretically infinite in size, or it may be finite

What do we mean by a statistic? What is its role?

A statistic is a quantitative summary characteristic of all people in one particular sample of size n drawn from the population. The value of a sample statistic will vary from one sample to the next when different samples are selected from the same population.

Least square estimator

Aka the linear least squares regression function. Uses the sum of squared residuals (if the residuals were not squared, their sum would always equal zero) to measure how much the line does not account for (we want the smallest possible value). Estimates the intercept and slope of the line by minimising the sum of (Y - (a + bX)) squared.
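
A short sketch of the closed-form least squares estimates on made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # made-up data

# Closed-form least squares estimates of slope b and intercept a
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()

residuals = y - (a + b * x)
print(a, b)
print(residuals.sum())            # ~0: residuals sum to zero by construction
print((residuals ** 2).sum())     # sum of squared residuals being minimised
```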

Type of relationship is distinguished by...

Association, prediction or difference

Properties of estimators

Bias Efficiency Consistency

Calculating a contrast mean in a 2x2 design

Multiply each cell mean by its contrast weight, then sum the products.

A p value of .000

A p value displayed as .000 does not mean p = 0; the value has been rounded to three decimal places and should be reported as p < .001.

