stats test 2
a correlational coefficient can be evaluated for significance (T/F)
T
what do subscripts mean in a cross tabulation chi square table
the different subscripts tell us that those column proportions are significantly different from each other; the subscripts refer to proportions, NOT counts
a calculated value of chi square compares
the observed frequencies of the categories in a sample to the frequencies that would be expected in the population under the null hypothesis
normal distribution of residuals
the residuals of the model are random and normally distributed; the mean of the differences between the model and the observed data is close to zero
linear regression means that for an increase in the X variable
there will be a constant change to Y
homoscedasticity
at each level of the predictor variable, the variance of the residual terms should be constant
SS T SS R SS M
SS T: total variability (between the scores and the grand mean); SS R: residual/error variability (between the regression model and the actual data); SS M: model variability (difference in variability between the model and the mean)
interpret p<0.05, d=0.59
the difference between Brandeis students and general college students is greater than expected by chance. This difference is statistically significant and has a medium (moderate) practical effect.
what do you need to know to calculate r(critical)
directionality, alpha level, and df
the chi square statistic is a __ of distributions. The shape of the distribution depends on the ___. Chi square statistic is __ and __. the chi square will be small when ___
family; df; nonnegative, positively skewed; the null hypothesis is true
how to compute sample statistic
first find the standard error (SE = sigma/sqrt(n)). Then calculate z(observed) = (M sample - mu)/SE
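The two steps above can be sketched in Python (the numbers here are hypothetical):

```python
import math

def z_observed(m_sample, mu, sigma, n):
    """z = (M_sample - mu) / SE, where SE = sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)          # step 1: standard error
    return (m_sample - mu) / se        # step 2: observed z

# hypothetical example: M = 110, mu = 100, sigma = 15, n = 36
print(z_observed(110, 100, 15, 36))    # 4.0  (SE = 2.5, z = 10/2.5)
```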
independence
for any two observations, the residual terms should be uncorrelated
a test that uses the sample data to test a hypothesis about the proportions in the general population
goodness of fit chi square
asymp sig
p value
if a significant result is obtained, how might you interpret the findings more thoroughly for a chi square test of independence?
refer to percentage totals in contingency table
total deviation
regression deviation plus residual deviation
what is a decision rule
reject H0 if Z(observed) &lt;= -Z(critical); reject H0 if Z(observed) &gt;= +Z(critical)
assumptions for a z-test
N = 30 or greater for the sample; the distribution of the raw variable does not have to be normal, and the sampling distribution of the mean will still be (approximately) normal
r^2 tells you
% of variance accounted for
degrees of freedom for chi square test of independence
(r-1)(c-1)
how are degrees of freedom calculated for a chi square test
(r-1)(c-1)
correlation varies between __ and __ what means no relationship
-1 and +1; 0 means no relationship
chi square assumptions
1) independence of observations, i.e., independent groups; each person, item, or entity contributes to only one cell of the contingency table 2) size of expected cell frequencies: a chi square test should not be performed when an expected cell frequency is less than 5
regression in SPSS conclusions
1. Overall model fit is good (p = 0.047), suggesting that the model is a significantly better fit to the data compared to the mean. 2. Anxiety score was a significant predictor of exam performance (b = -1.16, 95% CI [-2.31, -0.02], p = 0.047). 3. In addition, 41% of the variation in exam performance is explained by variation in anxiety scores (R^2 = 0.407)
how do you calculate covariance
1. calculate the error between the mean and each subject's score for the first variable (x) 2. calculate the error between the mean and their score for the second variable (y) 3. multiply these error values 4. add these values and you get the cross-product deviations 5. the covariance is the average of the cross-product deviations
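The five steps above, sketched in Python with made-up data (dividing by n - 1 gives the sample covariance):

```python
def covariance(x, y):
    """Covariance via cross-product deviations (sample version, n - 1)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # steps 1-3: deviations from each mean, multiplied together
    cross_products = [(xi - mx) * (yi - my) for xi, yi in zip(x, y)]
    # steps 4-5: sum, then average
    return sum(cross_products) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]        # y rises with x, so covariance is positive
print(covariance(x, y))     # 5.0
```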
Z test- formal hypothesis testing steps
1. convert the research question into a statistical hypothesis 2. set the decision criteria -select confidence level (alpha level) -calculate critical statistic (Zcritical) -establish decision rule 3. collect data and compute sample statistic (Zobserved) 4. Make the decision -compare observed and critical statistics -draw inference -calculate effect size 5. Report result (APA)
parts of a hypothesis test (11)
1. research question 2. statistical question 3. null and alternative hypothesis 4. alpha level 5. critical z 6. decision rule 7. observed z 8. decision 9. effect size 10. APA style report 11. conclusion
list three assumptions that must hold for correlation to be valid. In addition, list at least two additional misleading factors and/or possible problems with correlation
assumptions: 1. there is a linear relationship between variables 2. there is a pair of values for each participant or observation 3. an absence of outliers in either variable. misleading factors/problems: 1. there could be a third variable (measured or not) that is affecting the results, so causality can't be assumed between the variables 2. the correlation coefficient does not tell us which variable causes the change in the other variable.
what is the critical z value for one-tailed test alpha level 0.05
1.65
contamination with protein will give an A260/A280 ratio slightly less than
1.8
what is the minimum number of observations required for the expected frequencies in chi square
5
how much variance is explained when a correlation of .9 in your analysis
81%
APA style reporting (z-test brandeis creativity scores )
A 2-tailed (non-directional) z-test revealed that typical Brandeis creativity scores (M = 110) were significantly higher than the typical college population (mu = 100), z(two-tailed) = 3.23, p &lt; .05, d = 0.59
b. Below is an image of the gel that you ran to give you the data described above. Lane 1 is 0 minutes, lane 2 is 10 minutes, lane 3 is 20 minutes, and lane 4 is 30 minutes.
ALWAYS run a molecular weight ladder to determine size, and an unfolded control to make sure you are looking at the right band.
what Is the ordinary least squares method
b1 = r(xy)(sy/sx); b0 = My - b1*Mx; estimated model: Y = b0 + b1X
r and d effect size ranges
r=.1, d=.2 (small effect) r=.3, d=.5 (medium effect) r=.5, d=.8 (large effect)
how to find chi square goodness of fit statistic on SPSS
Data -- Weight Cases -- "Weight cases by" -- drag frequency into "Frequency Variable" -- OK; Analyze -- Nonparametric Tests -- Legacy Dialogs -- Chi-square -- drag frequency into "Test Variable List" -- OK
You would like to use Ni-NTA chromatography to purify the ω RNA Polymerase subunit from the pellet. How would you change the protocol we used in the lab in order to purify active, correctly-folded protein?
Denature with urea/GdnHCl and then refold. The protein needs to be soluble to be run on a column. You could also purify it using a hydrophobic solvent
a correlation of .9 has less error than a correlation of .3 (T/F)
T (a stronger correlation means less prediction error: r = .9 explains 81% of the variance, while r = .3 explains only 9%)
suppose you have a non-directional hypothesis and you are testing at an alpha level at 0.05. Your observed z is 0.25. Can you reject the null hypothesis?
No, because z observed (0.25) is less than z critical (1.96)
measures of association
Phi: accurate for 2 x 2 contingency tables; for larger tables it may not lie between 0 and 1. Contingency coefficient: seldom reaches its upper limit of 1. Cramer's V: when both variables have only two categories, phi and Cramer's V are identical; however, when variables have more than two categories, Cramer's statistic can still attain its maximum of 1
B. Another student shakes their tube very vigorously, instead of gently, after cells are lysed, and neutralization buffer was added. How might this affect their sample? Why?
The vigorous shaking may shear the genomic DNA, meaning it would be unable to properly precipitate, and many small pieces of genomic DNA would end up in the plasmid sample
what kinds of protein controls should you have in your experiment; why are controls important in experimental analysis
WT and a known aggregator - important to know what both look like to be able to interpret results and assess if your experimental procedure has worked.
Y(i)= b0 + b1Xi + e(i) describe each variable
Yi = outcome variable; b0 = intercept, the value of Y when X = 0 (the point at which the regression line crosses the y-axis); b1 = slope of the regression line, the regression coefficient for the predictor (direction/strength of the relationship); Xi = predictor variable; e(i) = the residual (error), what's left in Y(i) that cannot be explained by X(i)
what type of table is used to summarize purely categorical data
a contingency table
goodness of fit APA reporting example (student's soft drinks preference)
a chi square test for goodness of fit showed that students do not have a preference among the tested soft drinks, X^2 (df, n=sample size) = observed chi square value, p > 0.05, phi=0.135
how to report results using chi square test of independence (relationship between personality type and color preference)
a chi square test of independence showed that there was a significant association between the personality type and preferred color, X^2 (df, n =200) = 35.6, p < 0.05, V=0.422.
chi square test of independence
a test that uses frequencies found in sample data to test a hypothesis about the relationship between two variables in the population; the test determines whether the distribution in one variable depends on the distribution of the other variable in the population
what is regression
a way of predicting the value of one variable from another; it is a hypothetical model of relationship between two variables; the model used is a linear one; therefore, we describe the relationship using the equation of a straight line.
wash lane
all nonspecific binders
SDS-PAGE separates strictly by linear size because the SDS coats the denatured protein and gives all proteins a uniform charge to mass ratio. By which of the following protein characteristics will a native gel separate?
amino acid charge; shape; overall length
use ___ gel for gel electrophoresis in lab 3 to separate plasmids use __ for sanger sequencing use __ for SDS PAGE
agarose gel; acrylamide; polyacrylamide/bisacrylamide
lysate
all protein with crystals
unbound lane
all soluble protein - cry and nonspecific binders
CL
all soluble protein including crystals
how do you find critical chi square value for goodness of fit
alpha and df
chi square is a test of
association
a test statistic is used to determine
whether the observed data differ by more than would be expected by chance alone
A. You transformed your plasmid, plated onto LB-Amp and the next day counted your colonies. You know all the volumes you used during the procedure. What other information would you absolutely need in order to calculate transformation efficiency?
the number of colonies on the LB-Amp plate AND the concentration of plasmid you transformed
why does sample size affect goodness of fit
because the chi square statistic scales with sample size: with the same observed proportions, a larger n makes the squared deviations (fo - fe)^2 grow faster than the expected frequencies fe, so chi square gets larger and p gets smaller
bootstrapping in SPSS
the bootstrap CI tells us that the population b1 is likely to fall within a certain interval. The table gives: the p value; the bias of the bootstrap CI (the closer the bias is to zero, the better); and the lower and upper bounds of the 95% CI
what is special about calculating Cramers V for chi square test of independence
calculate both (r - 1) and (c - 1) and then use the smaller of the two as the df
the test determines whether the distribution in one variable depends on the distribution of the other variable in the population
chi square test of independence
calculate the transformation efficiency
the number of colonies that grew on LB/Amp divided by the amount of plasmid DNA plated (accounting for the volumes used in the procedure)
direction of causality
correlation coefficients say nothing about which variable causes the other to change
how do you calculate cohens d effect size
d = (M sample - mu)/sigma
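This formula as a one-liner in Python, using the Brandeis creativity numbers from this deck (M = 110, mu = 100); sigma = 17 is an assumption chosen so the result matches the deck's d = 0.59:

```python
def cohens_d(m_sample, mu, sigma):
    """d = (M_sample - mu) / sigma (population standard deviation)."""
    return (m_sample - mu) / sigma

# sigma = 17 is assumed here, back-calculated from the deck's d = 0.59
print(round(cohens_d(110, 100, 17), 2))   # 0.59
```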
how to do test of independence in on SPSS
Data -- Weight Cases -- select "Weight cases by", drag "frequency" into the "Frequency Variable" line, press OK. Analyze -- Descriptive Statistics -- Crosstabs. Drag "personality" into Rows and "color" into Columns. Click Statistics -- check "Chi-square" and "Phi and Cramer's V" -- Continue. Click Cells -- Counts: "Observed", "Expected"; Percentages: "Row", "Total" -- Continue. Check "Display clustered bar charts" -- OK
regression in SPSS output
descriptive stats. Correlations: Pearson's r; significance (the probability of getting the particular observed r value; we don't have r critical, but we can still tell its significance from p). Model fit output, Model Summary: absolute value of r; R^2: how much variation in the outcome variable can be explained by variation in the predictor variable, i.e., how much improvement we make by using our model compared to using no predictors and just the mean; ANOVA. Predictor fit output, Coefficients: b0; b1; p value for t (testing the significance of b1); the standardized beta is the same as r; when the effect is significant, the 95% CI for b1 does not include 0
how do you standardize covariance
divide by the standard deviations of both variables. The standardized version of covariance is known as the correlation coefficient.
non-parametric tests
do not make any assumptions about the distribution of the population; can operate on nominal data
b0=143 what does this mean
if someone had a score of 0 on X, then their predicted value of Y would be 143 (hypothetical)
the third variable problem
in any correlation, causality between two variables cannot be assumed because there may be other measured or unmeasured variables affecting the results.
what is correlation
it is a way to measure the extent to which two variables are related (strength of association); describes linear relationships; used to generate new research questions, without answering cause-effect questions; a high positive correlation (r) indicates a strong association but does not prove a causal relationship
what extra info does SPSS give you if you say percentages "row" "total" "continue" versus percentages "column" "total" "continue"
row percentages ("% within personality") tell you, for example, that 60% of extraverted people prefer red; column percentages ("% within color") tell you that 90% of those who picked red were extraverts
what are the regression assumptions
linearity; outliers; independence; homoscedasticity; normality
correlation in SPSS check what assumptions
linearity, outliers, and normality
how to determine if the regression model fits the observed data
compare it to the mean: if there were no relationship between advertising budget and album sales, the regression model would be a flat line equal to the mean
how do you find expected frequencies for chi square test of independence
(marginal row frequency x marginal column frequency) / total number of people
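This formula, applied to every cell of a hypothetical 2 x 2 table:

```python
def expected_frequencies(observed):
    """fe[i][j] = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

observed = [[30, 20],   # made-up 2 x 2 contingency table
            [10, 40]]
print(expected_frequencies(observed))   # [[20.0, 30.0], [20.0, 30.0]]
```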
when analyzing categorical variables, the __ of a categorical variable is meaningless; the numeric values we attach to different categories are ___. Therefore, we analyze frequencies, and we can tabulate these frequencies in a __
mean; arbitrary; contingency table
frequency expected =
n x proportion expected
degrees of freedom for correlation
n - 2, because there are two variables
df for goodness of fit chi square
number of categories -1
what does a b1=-1.16 represent
on average, a one-point increase in X is associated with a 1.16-point decrease in exam score
__ can influence the value of a correlation
one extreme data point (outlier)
in the regression equation what does b1 denote
one unit change in X is related to b1 change in Y
assumptions for regression
outliers/linearity: the means of each distribution of Y at a given X can be joined by a straight line; independence: for any two observations, the residual terms should be uncorrelated; homoscedasticity: at each level of the predictor variable, the variance of the residual terms should be constant; normal distribution of residuals: the residuals of the model are random and normally distributed, and the differences between the model and the observed data are close to zero
Why do we see the fluorescence of the samples decrease after being heated to 50-55 deg. C
protein aggregates or dye dissociates
what does an SDS-PAGE gel tell you about your protein
purity; size; presence of your protein
misleading factors for pearson moment correlation r
restricted range, outliers, curvilinearity.
how to calculate chi square goodness of fit
sum of (fo - fe)^2 / fe over all categories
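The sum above in Python, using made-up soft-drink preference counts (60 people, 3 drinks, equal proportions expected):

```python
def chi_square_gof(observed, expected):
    """chi^2 = sum((fo - fe)^2 / fe) over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

observed = [28, 20, 12]     # hypothetical counts
expected = [20, 20, 20]     # fe = n * proportion expected = 60 * 1/3
print(chi_square_gof(observed, expected))   # 6.4
```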
durbin Watson test
tests for independence of the residuals; you want the Durbin-Watson value to be between 1 and 3 (values near 2 indicate uncorrelated residuals)
goodness of fit chi square
tests whether frequencies in a sample match frequencies in a population
chi square test of independence
tests whether two variables are associated
what does significance mean
that you can be confident, at your alpha level, that the observed difference is bigger than sampling error
residual
the difference between each observation and the model fitted to the data
residual (chi square goodness of fit)
the error between what the model predicts (expected frequency) and the observed data (observed frequency); residual = observed - model; for a test of independence, model (fe) = (row total x column total) / grand total
linearity
the means of each distribution of y at a given x can be joined by a straight line
a small X^2 statistic means
the observed frequencies are close to the expected frequencies, so the data are consistent with the null hypothesis
in the regression equation what does X denote
the predictor variable
if you reject the null then
there is a difference
R^2 (regression)
the proportion of variance accounted for by the regression model; the Pearson correlation coefficient squared; SS M / SS T
what is represented by the R^2 Statistic
the proportion of variance in the outcome variable accounted for by the predictor variable.
when discussing the concept of covariance, the distance from a specific observation to that of the mean of a given variable is known as
the residual
when interpreting correlation coefficient it is important to know
the significance of the correlation coefficient the magnitude of correlation coefficient the sign of the correlation coefficient
what if you get an SPSS output that says .000 under "asymp sig (2-sided)"
this is your p value and you should report it as p < 0.001
assumptions of a test serve what function
to minimize sources of bias
how to report correlation in APA (exam performance and anxiety)
A two-tailed Pearson correlation was computed to evaluate the association between anxiety and exam performance. The correlation was negative and significantly different from 0, r(df) = -.638, p &lt; .05. The analysis shows that an increase in anxiety is associated with poorer exam performance. The calculated R^2 = .407 indicates that 41% of the variation in exam performance can be explained by variation in anxiety.
variance versus covariance
variance tells us how much scores deviate from the mean for a single variable whereas covariance tells us by how much scores on two variables differ from their respective means.
what is the covariance useful for
we need to see whether, as one variable increases, the other increases, decreases, or stays the same. This can be done by calculating the covariance. We look at how much each score deviates from the mean. If both variables deviate from their means in a similar way, they are likely to be related.
what do we look for when checking for normality (regression)
we want the dots to fall on the line or close to the line
what is homoscedasticity .
we want the shape of the scatterplot to be rectangular
when do you use correlation
when there are 2 dependent variables and no independent variable
when can we use a z test
when you are comparing a sample mean to a population whose mean and standard deviation are known, and the sample mean comes from a sample with n &gt; 30
does sample size effect goodness of fit chi square
yes, based on the sample size, the significance you calculate will be different
E. You then unfold the protein overnight in 4 M guanidine HCl. In the morning you come in and immediately add trypsin to the tube to digest. You are surprised to see that the full-length protein is not degraded over the thirty minute time course. a. Provide a scientific reasons that justifies this result.
you did not dilute the sample, so the 4 M GdnHCl denatured the trypsin
if all the bands for the different trypsin time points look the exact same, what could have gone wrong
you forgot to add trypsin; you forgot to add PMSF
what happens by squaring the correlational coefficient
you get the proportion of variance in one variable shared by the other (coefficient of determination)
your band looks like it's smiling; what could be wrong?
you may have run at too high a voltage; you may have loaded your samples unevenly; you may have run out of buffer during your run; you may have had the wrong running buffer composition
what is the difference between a z-score and a z-statistic
z-score: (X - Xbar)/s; z-statistic: (Xbar - mu)/SE (compares the mean of a sample to the mean and standard deviation of a population)
how to do goodness of fit in SPSS
• Data -&gt; Weight Cases... • Select "Weight cases by" and drag "Frequency" into the "Frequency Variable:" line and "OK" • Analyze -&gt; Nonparametric Tests -&gt; Legacy Dialogs -&gt; Chi-square... • Drag "Frequency" into "Test Variable List:" • "OK"
correlation is an effect size
• It is an effect size • ±.1 = small effect • ±.3 = medium effect • ±.5 = large effect