Statistics

Sampling with replacement

Each participant selected is replaced before the next selection, which ensures that the probability for each selection is the same. This method of sampling is used in the development of statistical theory.

Two-Independent-Sample t Test

Tests whether two groups differ from each other (whether the difference reaches significance) by determining the extent to which scores between the groups overlap: the less overlap, the more likely the difference is significant. Used when the variance in one or both populations is unknown.
Assumptions: normality, random sampling, independence, and equal variances (larger s^2 / smaller s^2 < 2).
1. State the null hypothesis: there is no difference between the two groups (μ1 - μ2 = 0).
2. Find df = df1 + df2 = (n1 - 1) + (n2 - 1) = N - 2, then find the critical values for the test (± for a two-tailed test).
3. Compute the pooled sample variance (the formula depends on whether the sample sizes are equal) and the estimated standard error for the difference, then plug them into t_obt = ((M1 - M2) - (μ1 - μ2)) / s_(M1-M2).
4. Make a decision.
Cohen's d = (M1 - M2) / sqrt(s_p^2): the difference of means divided by the pooled sample standard deviation (an unbiased estimate for the standard deviation of the difference between two population means).
Proportion of variance: computed the same way as for the one-sample t test.
Confidence interval: (M1 - M2) ± t(s_(M1-M2)).
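A minimal sketch of step 3 and the effect size, assuming two small made-up groups (the scores below are hypothetical and chosen only so the arithmetic is easy to follow). It computes the pooled sample variance, the estimated standard error for the difference, the obtained t, and Cohen's d.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical scores for two independent groups (illustrative data only).
group1 = [4, 6, 5, 7, 8]
group2 = [2, 3, 5, 4, 1]

n1, n2 = len(group1), len(group2)
df1, df2 = n1 - 1, n2 - 1
m1, m2 = mean(group1), mean(group2)
s1_sq, s2_sq = variance(group1), variance(group2)  # sample variances, SS/(n - 1)

# Pooled sample variance: each sample variance weighted by its df.
sp_sq = (s1_sq * df1 + s2_sq * df2) / (df1 + df2)

# Estimated standard error for the difference between the two sample means.
se_diff = sqrt(sp_sq / n1 + sp_sq / n2)

# Obtained t, with the null hypothesis stating mu1 - mu2 = 0.
t_obt = ((m1 - m2) - 0) / se_diff

# Cohen's d: mean difference divided by the pooled sample standard deviation.
cohens_d = (m1 - m2) / sqrt(sp_sq)

print(sp_sq, se_diff, t_obt, round(cohens_d, 3))
```

The obtained t would then be compared with the critical value for df = n1 + n2 - 2 from a t table.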

Semi-Interquartile Range (SIQR)

SIQR = IQR/2, that is, half the interquartile range. Smaller SIQR values indicate less spread or variability of scores in a data set. A good estimate of variability, but it excludes half of the scores.

Standard Deviation

Measure of variability for the average distance that scores deviate from their mean. Population: σ = sqrt(σ^2) = sqrt(SS/N). Sample: s = sqrt(s^2) = sqrt(SS/(n - 1)).
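As an illustration, the two formulas can be checked against Python's standard library. The scores are made up for the example; `pstdev` and `stdev` are the population and sample standard deviation functions in the `statistics` module.

```python
from math import sqrt
from statistics import pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
n = len(scores)
m = sum(scores) / n

# Sum of squares: squared deviations of scores from their mean.
ss = sum((x - m) ** 2 for x in scores)

sigma = sqrt(ss / n)    # population standard deviation, divide by N
s = sqrt(ss / (n - 1))  # sample standard deviation, divide by n - 1

print(sigma, pstdev(scores))
print(s, stdev(scores))
```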

Simple frequency distribution

Summary display for either the frequency of each individual score or category (ungrouped data) or the frequency of scores falling within defined groups or intervals (grouped data). It summarizes how often scores occur; with larger data sets, the frequency of scores contained in discrete intervals is summarized instead.

Bar Charts

Summarizes the frequency of discrete and categorical data that are distributed in whole units or classes. Same as histograms in looks but bars do not touch

Relative Frequency Distribution

Summary display that distributes the proportion of scores in each interval. It is computed as the frequency in each interval divided by the total number of frequencies recorded.

Cumulative Frequency distribution

Summary display that distributes the sum of frequencies across a series of intervals. Always sums to the total number of scores within a distribution

Cumulative Percent distribution

Summary display that distributes the sum of relative percents across a series of intervals.
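The simple, relative, cumulative, and cumulative percent distributions can all be built from the same counts. The sketch below assumes a small set of made-up ungrouped scores; the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # hypothetical ungrouped data

freq = Counter(scores)
values = sorted(freq)
total = sum(freq.values())

frequencies = [freq[v] for v in values]              # simple frequency
relative = [f / total for f in frequencies]          # relative frequency
cumulative = list(accumulate(frequencies))           # cumulative frequency
cumulative_percent = [100 * c / total for c in cumulative]

for row in zip(values, frequencies, relative, cumulative, cumulative_percent):
    print(row)
```

The last cumulative frequency equals the total number of scores, and the last cumulative percent is 100.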

Correlational Method

determines whether a relationship exists b/w variables, but lacks the appropriate controls needed to demonstrate cause and effect

effect size

difference b/w a sample mean and the population mean stated in the null. An effect is not significant when we retain the null hypothesis

Range

difference b/w largest and smallest value in a distribution... most informative for scores without outliers

between subjects design

different participants are observed one time in each group or at each level of one factor

sampling distribution

distribution of all sample means that could be obtained in samples of a given size from the same population

Frequency polygon

dot and line graph where the dot is the midpoint of each interval, and the line connects each dot. Midpoint is distributed along x axis and is calculated by adding upper and lower boundary of an interval and then dividing by 2. Used for same types of data as histograms

Ratio Scale

has a true zero, measurements equidistant in equal units, most informative scale of measurement

Law of large numbers

increasing the number of observations / samples in a study will decrease the standard error. Larger samples are associated with closer estimates of the population mean on average

t statistic

inferential statistic used to determine the number of standard deviations in a t distribution that a sample mean deviates from the mean value or mean difference stated in the null hypothesis. t = (M - μ) / s_M, where s_M = SD / sqrt(n)

Continuous variable

measured along a continuum at any place beyond the decimal point (fractions)

Estimated standard error for the difference

s_(M1-M2) = sqrt((s_p^2 / n1) + (s_p^2 / n2)). Estimate of the standard deviation of a sampling distribution of mean differences b/w two sample means.

estimated standard error for difference scores

s_MD = sqrt(s_D^2 / n_D) = s_D / sqrt(n_D). Estimate of the standard deviation of a sampling distribution of mean difference scores.

Pooled sample variance

s_p^2: the mean sample variance for two samples. Unequal sample sizes: s_p^2 = (s1^2(df1) + s2^2(df2)) / (df1 + df2). Equal sample sizes: s_p^2 = (s1^2 + s2^2) / 2.

Repeated measures design

The same participants are observed twice (in each treatment).
Pre-post design: measure a dependent variable before and after a treatment.
Within-subjects design: researchers observe the same participants across many treatments, but not necessarily before and after.

Grouped data

set of scores distributed into intervals, where the frequency can fall into any given interval

Type I Error

Probability of rejecting a null hypothesis that is actually true. Researchers control for it in an experiment by setting the alpha level.

Cramer's V

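V = sqrt(χ^2 / (N × df_smaller)), where df_smaller is the smaller of the two degrees of freedom. Used when one or both categorical variables have more than two levels. Effect size conventions by df_smaller:
df_smaller = 1: small = .10; medium = .30; large = .50
df_smaller = 2: small = .07; medium = .21; large = .35
df_smaller = 3: small = .06; medium = .17; large = .29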
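A short sketch of the formula, assuming the chi-square value, total N, and table dimensions are already known. The function name `cramers_v` and the input values are hypothetical.

```python
from math import sqrt

def cramers_v(chi_sq, n, rows, cols):
    """Cramer's V for a rows x cols table of frequencies."""
    # The smaller of the two degrees of freedom, (rows - 1) and (cols - 1).
    df_smaller = min(rows - 1, cols - 1)
    return sqrt(chi_sq / (n * df_smaller))

# Hypothetical result: chi-square = 12.0 from a 3 x 4 table with N = 100.
v = cramers_v(chi_sq=12.0, n=100, rows=3, cols=4)
print(round(v, 3))
```

With df_smaller = 2, this value would fall between the medium (.21) and large (.35) conventions.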

Eta-squared for t test

η^2 = t^2 / (t^2 +df) measure of proportion of variance

characteristics of mean

-- Changing an existing score will change the mean.
-- Adding or removing a score will change the mean, unless the value of that score equals the mean. Adding a score above the mean increases it; deleting a score below the mean also increases it.
-- Adding, subtracting, multiplying, or dividing each score in a distribution by a constant will cause the mean to change by that constant.
-- The sum of the differences of scores from their mean is zero, similar to placing weights on both sides of a scale.
-- The sum of the squared differences of scores from their mean is minimal (if the mean is replaced with any other value, the solution will be larger).
-- Used to describe interval and ratio scale data: normally distributed data that can be described by how far scores fall from the mean.

Characteristics of Standard deviation

-- Never negative (zero only when all scores are identical).
-- Used to describe quantitative data.
-- Most informative when reported with the mean.
-- The value of the standard deviation is affected by the value of each score in a distribution.
-- Adding or subtracting the same constant to each score will not change the value of the standard deviation.
-- Multiplying or dividing each score by the same constant will cause the standard deviation to change by that constant.

Identifying Percentile point / Rank

1. Identify the interval within which the specified percentile point falls.
2. Identify the real range for that interval (one more than the difference of scores).
3. Find the position of the percentile point within the interval: take the distance from the percentile point to the top (or bottom) of the interval, divide by the interval total, and multiply that fraction by the width of the real range.
4. Identify the percentile point: subtract that number of points from the top of the interval (or add it to the bottom, if the bottom was used in step 3).

Locate a score

1. locate a z score associated with a given proportion in the unit normal table 2. transform z score into a raw score

Nonparametric tests

1. test hypotheses that do not make inferences about parameters in a population 2. test hypotheses about data that can have any type of distribution 3. analyze data on a nominal or ordinal scale of measurement

Locate a proportion

1. transform a raw score (x) into a z score 2. locate the corresponding proportion for the z score in the unit normal table
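Both procedures (locating a proportion and locating a score) can be sketched with `statistics.NormalDist`, which here stands in for the unit normal table. The population mean of 100 and standard deviation of 15 are assumptions made for the example.

```python
from statistics import NormalDist

mu, sigma = 100, 15          # hypothetical population parameters
unit_normal = NormalDist()   # standard normal: mean 0, standard deviation 1

# Locate a proportion: raw score -> z score -> proportion below that z.
x = 130
z = (x - mu) / sigma
proportion_below = unit_normal.cdf(z)

# Locate a score: proportion -> z score -> raw score.
# Here: the z score that cuts off the top 10% of the distribution.
z_cut = unit_normal.inv_cdf(0.90)
raw_cut = mu + z_cut * sigma

print(z, round(proportion_below, 4))
print(round(z_cut, 4), round(raw_cut, 2))
```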

Bimodal distribution

2 scores occur most often (2 modes)... mean and median can be different values but typically b/w two modes

Histogram

Graphical display used to summarize the frequency of continuous data that are distributed in numeric intervals (grouped).
Rule 1: A vertical rectangle represents each interval, and the height of the rectangle equals the frequency recorded.
Rule 2: Cannot be constructed for open intervals, because open intervals do not have an upper or lower boundary. Each rectangle should have the same interval width.
Rule 3: All rectangles touch, because the data are assumed to be continuous.

Interpreting Chi-Square goodness of fit test

Not interpreted in terms of differences between categories: it is not appropriate to make comparisons across the levels of the categorical variable. Instead, compare the observed and expected frequencies at each level of the categorical variable.
-- As k increases (more conditions), it becomes more difficult to identify which observed frequencies differ significantly from the expected frequencies (large discrepancies tend to be the focus of a significant result).
-- Each observed frequency must come from different and unrelated participants.
-- The size of an expected frequency should never be smaller than 5 in a given category (smaller expected frequencies tend to overstate the size of a discrepancy). Two ways to address this: 1. increase the sample size so that it is five times larger than the number of levels of the categorical variable; 2. increase the number of levels of the categorical variable (the larger k is, the larger the critical values).

Empirical rule

For normal distributions: about 68% of data lie within one standard deviation of the mean, about 95% (not exactly) fall within two standard deviations, and 99.7% lie within three standard deviations.
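The three percentages can be checked numerically. This sketch uses `statistics.NormalDist` for the standard normal distribution and computes the area within 1, 2, and 3 standard deviations of the mean.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# Area between -k and +k standard deviations, for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
```

The value for two standard deviations comes out slightly above 95%, which is why the 95% figure is only approximate.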

Relative Frequency

Observed frequency / total frequency count

Sample Design (experimental)

Order does not matter. Sampling is without replacement (a sample of AA is not possible from a population of A, B, C). Total number of samples possible = N!/(n!(N - n)!)
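For a concrete case, the samples of size 2 from the population A, B, C can be listed with `itertools.combinations`, which ignores order and never repeats an item, and counted with `math.comb`.

```python
from itertools import combinations
from math import comb

population = ["A", "B", "C"]
n = 2

# Order does not matter and no participant can be selected twice.
samples = list(combinations(population, n))

print(samples)
print(len(samples), comb(len(population), n))  # both equal N! / (n!(N - n)!)
```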

Related samples

Participants are observed in more than one group (repeated measures design) or they are matched, experimentally or naturally, based on common characteristics or traits (matched-pairs design)

Measure of effect size for chi-square test for independence

Phi Coefficient & Cramer's V

μ

Population Mean; μ = Σx/N

N

Population size

σ^2

Population variance = Σ (x - μ )^2 / N or SS / N

Computational formula for variance

Population: SS = Σx^2 - (Σx)^2/N, where σ^2 = SS/N. Sample: SS = Σx^2 - (Σx)^2/n, where s^2 = SS/(n - 1).
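A quick check, on made-up scores, that the computational formula gives the same SS as the definitional formula (the sum of squared deviations from the mean).

```python
scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
n = len(scores)
mean = sum(scores) / n

# Definitional formula: sum of squared deviations from the mean.
ss_definitional = sum((x - mean) ** 2 for x in scores)

# Computational formula: sum of x^2 minus (sum of x)^2 / n.
ss_computational = sum(x ** 2 for x in scores) - sum(scores) ** 2 / n

population_variance = ss_computational / n      # SS / N
sample_variance = ss_computational / (n - 1)    # SS / (n - 1)

print(ss_definitional, ss_computational)
print(population_variance, round(sample_variance, 3))
```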

Skewed Distribution

Positively skewed: includes set of scores substantially above other scores (towards right tail in graph) Negatively skewed: includes set of scores that are substantially below the other scores (towards left tail in graph) When skewed use median to describe

Interval width

Real Range/ number of intervals

Cumulative Relative Frequency Distribution

Summary display that distributes the sum of relative frequencies across a series of intervals. Relative percents and relative frequencies summarize the percentage and proportion of scores falling into each interval, respectively.

Standard Error of the mean

SEM, σ_M, or SE: the standard deviation of a sampling distribution of sample means; the standard error, or distance, that sample mean values deviate from the value of the population mean. σ_M = sqrt(σ^2/n) = σ/sqrt(n). The larger the standard deviation, the larger the standard error.

M

Sample Mean; M = Σx/n

s^2

Sample variance: s^2 = Σ(x - M)^2/(n - 1). Dividing by n - 1 makes the sample variance an unbiased estimator; dividing by n would typically give a value smaller than the population variance, because a sample uses a smaller number of data points.

n

Sample size

μ_M

The mean of the sampling distribution of sample means (the average sample mean). The sample mean is an unbiased estimator, follows the central limit theorem, and has minimum variance.

one sample t test

Used to compare a mean value measured in a sample to a known value in the population. Tests hypotheses concerning the mean in a single population with an unknown variance; the sample variance is used to estimate the population variance.
Assumptions: normality, random sampling, independence.
1. State the null hypothesis (μ = the hypothesized value).
2. Find df = n - 1 in order to find the critical values (based on alpha).
3. Find the estimated standard error (s_M) and the sample mean (all scores summed, divided by the number of scores), then compute the test statistic.
4. Make a decision.
Estimated Cohen's d: a negative d indicates a change below the population mean.
Proportion of variance: a measure of effect size in terms of the proportion or percent of variability in a dependent variable that can be explained by a treatment (variability explained / total variability). Measured by eta-squared and omega-squared.
Treatment: any unique characteristic of a sample or any unique way that a researcher treats a sample.
Confidence interval: the interval or range of possible values within which a population parameter is likely to be contained, M ± t(s_M). When the value stated by the null is outside the confidence interval, there is a significant effect on the population.
Level of confidence: a 99% level of confidence corresponds to α = .01, and so on.
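The four steps can be sketched as follows, assuming a small made-up sample and a null value of 6. The critical value 2.776 is the two-tailed t table entry for df = 4 at α = .05; everything else is computed from the data.

```python
from math import sqrt
from statistics import mean, stdev

sample = [6, 8, 7, 9, 10]  # hypothetical scores
mu_null = 6                # value stated in the null hypothesis (assumed)

n = len(sample)
df = n - 1
m = mean(sample)
s = stdev(sample)          # sample standard deviation, sqrt(SS / (n - 1))
s_m = s / sqrt(n)          # estimated standard error

t_obt = (m - mu_null) / s_m

t_crit = 2.776             # two-tailed critical value for df = 4, alpha = .05
reject = abs(t_obt) > t_crit

# Effect size and 95% confidence interval.
cohens_d = (m - mu_null) / s
eta_sq = t_obt ** 2 / (t_obt ** 2 + df)
ci = (m - t_crit * s_m, m + t_crit * s_m)

print(round(t_obt, 3), reject)
print(round(cohens_d, 3), round(eta_sq, 3), ci)
```

In this example the null value of 6 falls just outside the confidence interval, which agrees with the decision to reject the null.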

One sample z test

Used to test hypotheses concerning the mean in a single population with a known variance: the population variance or standard deviation must be known. Use α to find the critical z scores (± for a two-tailed test). For a two-tailed test, α is split in half (α/2 in each tail) because the rejection region lies in two areas.

M_w

Weighted mean: the combined mean of two or more groups in which the number of scores in each group is disproportionate or unequal. M_w = Σ(M × n)/Σn = weighted sum / combined n.
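For example, with two hypothetical groups of unequal size, the weighted mean is pulled toward the mean of the larger group:

```python
# Each group is (group mean, number of scores); values are made up.
groups = [(70.0, 10), (80.0, 30)]

weighted_sum = sum(m * n for m, n in groups)  # sum of (M x n)
combined_n = sum(n for _, n in groups)        # sum of n

m_w = weighted_sum / combined_n

# For contrast: the simple average of the two group means.
unweighted = sum(m for m, _ in groups) / len(groups)

print(m_w, unweighted)
```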

Inferential statistics

applying statistics to interpret the meaning of information

Descriptive Statistics

applying statistics to organize and summarize information

Quasi-Experimental Method

Contains a quasi-independent variable: a preexisting variable that is often a characteristic inherent to an individual, which differentiates the groups or conditions being compared in a research study. Because the levels of the variable are preexisting, it is not possible to randomly assign participants to groups.

Ordinal Scale

convey order-- some value is greater than or less than another value

frequency observed (fo)

count/ frequency of participants recorded in each category or at each level of the categorical variable

frequency expected (fe)

count/frequency of participants in each category or at each level of the categorical variable, as determined by the proportion expected in each category. fe = Np, where N = number of participants and p = proportion expected in each category

Level of significance

criterion upon which a decision is made regarding the value stated in a null hypothesis. Based on the probability of obtaining a statistic measured in a sample if the value stated in the null were true. Typically set at 5%

Chi-square goodness-of-fit test

Determines whether the observed frequencies at each level of one categorical variable are similar to or different from the frequencies we expected at each level of the categorical variable. χ^2 is 0 when the observed and expected frequencies are equal and gets larger as the differences get larger. χ^2_obt = Σ((fo - fe)^2 / fe)
1. State the null hypothesis: the expected frequencies are correct (the proportions expected in each category are correct).
2. Find df = k - 1 (k = number of levels) and the critical value (the rejection region is always in the upper tail, so the test is always one-tailed).
3. Compute the test statistic.
4. Make a decision.
Can be used to confirm that a null hypothesis is correct.
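A minimal sketch of steps 2 to 4, assuming made-up observed counts and expected proportions. The critical value 5.99 is the chi-square table entry for df = 2 at α = .05.

```python
observed = [50, 10, 20]           # hypothetical observed frequencies (fo)
proportions = [0.5, 0.25, 0.25]   # hypothetical proportions stated in the null

n = sum(observed)
expected = [n * p for p in proportions]  # fe = Np

# Test statistic: sum of (fo - fe)^2 / fe across the k levels.
chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

df = len(observed) - 1            # k - 1
chi_crit = 5.99                   # critical value for df = 2, alpha = .05
reject = chi_sq > chi_crit        # rejection region is in the upper tail only

print(expected, chi_sq, df, reject)
```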

z statistic

determines the number of standard deviations in a standard normal distribution that a sample mean deviates from the population mean stated in the null hypothesis

Sampling without replacement

Each participant or item selected is not replaced before the next selection. Most commonly used in behavioral research. The probability of each selection is conditional, so the probabilities of each selection are not the same.

z transformation

Formula used to convert any normal distribution with any mean and any variance to a standard normal distribution with a mean of 0 and a standard deviation of 1. For population scores: z = (x - μ)/σ. For sample scores: z = (x - M)/SD. It is also used to determine the likelihood of measuring a particular sample mean from a population with a given mean and variance: z = (M - μ)/σ_M.
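The same transformation applies to a single score and to a sample mean; only the standard deviation in the denominator changes. The helper name `z_score` and all numbers below are hypothetical.

```python
from math import sqrt

def z_score(value, mean, sd):
    """Number of standard deviations that value lies above or below mean."""
    return (value - mean) / sd

mu, sigma = 70, 10   # hypothetical population mean and standard deviation

# z for a single raw score.
z_x = z_score(85, mu, sigma)

# z for a sample mean: divide by the standard error, sigma / sqrt(n).
n = 25
sigma_m = sigma / sqrt(n)
z_m = z_score(74, mu, sigma_m)

print(z_x, sigma_m, z_m)
```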

Unbiased estimator

Any sample statistic, obtained from a randomly selected sample, that equals the value of its respective population parameter on average. The sample mean is an unbiased estimator because it equals the population mean on average: when M = Σx/n, then M = μ on average.

test statistic

mathematical formula that identifies how far, or how many standard deviations, a sample outcome is from the value stated in a null hypothesis. Used to make a decision regarding a null hypothesis in comparison to a criterion value

Variability

measure of dispersion/ spread of scores in a distribution and ranges from 0 to + infinity Determines how dispersed scores are in a set of data. Measures of variability = range, variance, standard dev

Cohen's d

Measure of effect size in terms of the number of standard deviations that mean scores shifted above or below the population mean stated by the null hypothesis. The larger the d, the larger the effect in the population; when d = 0, there is no difference between the two means. Cohen's d = (M - μ)/σ. Conventions: small, d < .2; medium, .2 < d < .8; large, d > .8.

Interval Scale

measurements that have no true zero and are distributed in equal units

Hypothesis Testing

method for testing a claim or hypothesis about a parameter in a population using data measured in a sample. In this method, we test a hypothesis by determining the likelihood that a sample statistic would be selected if the hypothesis regarding the population parameter were true 1. State hypotheses 2. Set criteria 3. Compute test stat 4. Make decision

Experimental method

methods/procedures to make observations in which the researcher fully controls the conditions and experiences of participants by applying three required elements of control (manipulation, randomization, comparison/control) to isolate cause-and-effect relationships b/w variables

Median

Midpoint in a distribution (the middle score). Not affected by outliers. Can be estimated graphically from a cumulative percent distribution: the 50th percentile of the cumulative percent distribution is the median.

Multimodal distributions

more than two modes; only the mode is used to describe multimodal distributions

Nonmodal data

no modes, straight line, frequency of each score the same

Gosset's t distribution

Normal-like distribution with greater variability in the tails than a normal distribution, because the sample variance is substituted for the population variance to estimate the standard error. This leads to a larger probability of obtaining sample means farther from the population mean. The distribution is symmetrical and asymptotic, with the mean, median, and mode all located at the center. The more df in the t distribution, the more closely it resembles the normal distribution.

Nominal Scale

number is assigned to represent something or someone

Unimodal distributions

one mode in distribution

Real Range

one more than the difference b/w the largest and smallest value in a data set

directional test

One-tailed test: the alternative hypothesis is stated as greater than (>) or less than (<) the value in the null. The researcher is interested in a specific alternative to the null, so α is not split in half. Associated with greater power, assuming the value stated in the null is false.

Probability

p(x) = f(x) / sample space, where the sample space is the total number of possible outcomes. -- Probability varies b/w 0 and 1. -- Probability can never be negative.

Matched-pairs design

participants are selected and then matched, experimentally or naturally, based on common characteristics or traits

p value

probability of obtaining a sample outcome, given that the value stated in the null hypothesis is true. p value for obtaining a sample outcome is compared to the level of significance

Power

Probability of rejecting a false null hypothesis: the probability that a randomly selected sample will show that the null hypothesis is false when the null hypothesis is indeed false. Power = 1 - β.
-- As effect size increases, power increases.
-- As sample size increases, standard error decreases, increasing power.
-- Increasing the alpha level will increase power (the larger the rejection region, the greater the likelihood of rejecting the null, and the greater the power).
-- Decreasing beta, the standard deviation, or the standard error will increase power.

Independent samples

Quasi-experiment: take already known populations and sort them into groups. Experimental method: take a sample from the population and then split the sample into randomly assigned groups.

Interquartile Range (IQR)

Range of scores b/w the upper and lower quartiles of a distribution: IQR = Q3 - Q1. Each quartile interval contains 25% of the data.

Central Limit Theorem

Regardless of the distribution of scores in a population, the sampling distribution of sample means selected at random from that population will approach the shape of a normal distribution as the size of the samples increases. Because the probability distribution for obtaining a sample mean is then approximately normal, the empirical rule applies: about 95% of sample means fall within 2 standard errors of the population mean.

Mode

score/scores that occur most often in a distribution; used to describe nominal scale data that identify something or someone, nothing more b/c nominal scale is not a quantity

Pie Chart

Graphical display in the shape of a circle that is used to summarize the relative percent of discrete and categorical data into sectors (a sector is a particular portion that represents a category).

Sample Design (Theory)

specific plan / protocol for how individuals will be selected / sampled from a population of interest. Order matters- selecting A first and B second, and then B first and A second are two different possible samples that can be selected from the population Sample with replacement. Total number of samples possible = N^n

Effect size

statistical measure of size of an effect in a population... allows researchers to describe how far scores shifted in the population or the percent of variance that can be explained by a given variable.

Ogive

summarizes cumulative percents of continuous data at the upper boundary of each interval. Dot and line graph.

Estimated Standard error (s_M)

s_M = sqrt(s^2/n) = s/sqrt(n). Estimate of the standard deviation of a sampling distribution of sample means selected from a population with an unknown variance. It is an estimate of the standard error, or the standard distance that sample means deviate from the value of the population mean stated in the null.

Normal distribution

Symmetrical distribution in which scores are similarly distributed above and below the mean, the median, and the mode at the center of the distribution. The mean is typically used to describe such a distribution because all scores are included.
-- The normal distribution is theoretical.
-- The mean, median, and mode are located at the 50th percentile.
-- The normal distribution is symmetrical.
-- The mean can equal any value.
-- The standard deviation can equal any positive value.
-- The total area under the curve is equal to 1.
-- The tails of a normal distribution are asymptotic.

Parametric tests

t test, ANOVA, correlation, regression analysis. Tests are used to test hypotheses about parameters in a population in which the data are normally distributed and measured on an interval/ratio scale of measurement

Related-Samples t Test

Tests hypotheses concerning two related samples selected from populations in which the variance in one or both is unknown.
Difference score: must be found first, by subtracting one score in each pair from the other.
Error: any unexplained difference that cannot be attributed to having different treatments.
Advantages: can be more practical, reduces standard error (computing difference scores first eliminates the between-persons source of error), and increases power (because the standard error is reduced).
t_obt = (M_D - μ_D) / s_MD: (the mean difference between the two related samples minus the mean difference stated in the null hypothesis) / (estimated standard error for difference scores). df = n_D - 1 (number of difference scores minus 1).
Assumptions: normality, independence within groups.
1. State the null hypothesis (no difference).
2. Find df and the critical values.
3. Compute the mean of the difference scores (sum of difference scores / number of difference scores; the sign matters), the variance (same formula as the sample variance: s_D^2 = SS/(n_D - 1), where SS = ΣD^2 - (ΣD)^2/n_D), and the standard deviation of the difference scores (s_D = sqrt(s_D^2)). Then compute the estimated standard error for difference scores and the test statistic.
4. Make a decision.
Effect size: Cohen's d = M_D / s_D, or proportion of variance. Confidence interval: M_D ± t(s_MD).
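Step 3 can be sketched on made-up before and after scores for five participants. The null hypothesis is assumed to state a mean difference of 0, and the variable names are illustrative.

```python
from math import sqrt

before = [10, 12, 9, 11, 13]  # hypothetical scores, first observation
after = [12, 15, 10, 14, 14]  # hypothetical scores, second observation

# Difference scores: subtract one score in each pair from the other.
diffs = [a - b for a, b in zip(after, before)]
n_d = len(diffs)
df = n_d - 1

m_d = sum(diffs) / n_d                                  # mean difference
ss = sum(d ** 2 for d in diffs) - sum(diffs) ** 2 / n_d # computational SS
s_d = sqrt(ss / (n_d - 1))                              # SD of difference scores
s_md = s_d / sqrt(n_d)                                  # estimated standard error

mu_d = 0                                                # value stated in the null
t_obt = (m_d - mu_d) / s_md
cohens_d = m_d / s_d

print(diffs, m_d, s_d)
print(round(s_md, 4), round(t_obt, 3), cohens_d)
```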

nondirectional test

Two-tailed test: the alternative hypothesis is stated as not equal to the value stated in the null. More conservative, and it eliminates the possibility of committing a Type III error (the decision would have been to reject the null, but the researcher looked in the wrong tail).

When is data grouped vs ungrouped?

Data are typically ungrouped for data sets with only a few different scores and for qualitative or categorical variables; for ungrouped data, the frequency of each individual score or category is counted. Larger data sets are typically grouped into intervals.

between groups effect

The difference between groups, which is the difference we are interested in testing. A test statistic is used to determine whether this difference is significant.

Chi-square test for independence

Used to determine whether frequencies observed at the combination of levels of two categorical variables are similar to the frequencies expected. Used when observations are recorded across the levels of two categorical variables with any number of levels. If two categorical variables are independent, they are not related or correlated (frequencies do not vary across the table).
Before computing the test statistic, determine the expected frequencies:
-- When all row and column totals are equal, the expected frequencies will also be equal: divide the number of participants observed (N) by the number of cells to find the expected frequency in each cell.
-- When the row and column totals are not equal, compute fe = (row total × column total) / N.
-- The sum of the expected frequencies will equal N.
1. State the null hypothesis: the two categorical variables are independent and not related.
2. Find df = (k1 - 1)(k2 - 1), because each categorical variable is associated with k - 1 df, and find the critical value (level of significance = .05).
3. Compute the test statistic.
4. Make a decision.
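The expected frequencies and test statistic can be sketched for a made-up 2 × 2 table whose row totals are unequal. The critical value 3.84 is the chi-square table entry for df = 1 at α = .05.

```python
# Hypothetical observed frequencies: rows are levels of one variable,
# columns are levels of the other.
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# fe = (row total x column total) / N for each cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_sq = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (k1 - 1)(k2 - 1)
chi_crit = 3.84                                     # df = 1, alpha = .05
reject = chi_sq > chi_crit

print(expected)
print(round(chi_sq, 3), df, reject)
```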

z score

Value on the x-axis of a standard normal distribution. The numerical value specifies the distance, or number of standard deviations, that a value is above or below the mean.

Discrete variable

whole units / categories that are not distributed along a continuum

Error in related samples t test

Within-groups error: differences occur within each group, even though all participants experienced the same treatment in each group (the differences are not due to having different groups).
Between-persons error: differences occur between participants. The change is not consistent and is therefore a source of error (not due to having different groups, but based on who participated).

standard normal distribution

z distribution: mean = 0 and standard deviation = 1. Distributed in z score units along the x-axis. Used to determine the probability of a certain outcome in relation to all other outcomes: the area under the curve gives the probability.

Phi Coefficient

Φ = sqrt(χ^2 / N), where N = total number of participants observed. The square root of the proportion of variance; a type of correlation coefficient for the relationship b/w two categorical variables. Computing variance is not required to calculate phi. Can only be used for a 2 × 2 chi-square.

Alpha level

α: the level of significance, or criterion, for a hypothesis test. It is the largest probability of committing a Type I error that we will allow and still decide to reject the null hypothesis. To make a decision, compare the p value to the α level: when the p value is less than α, reject the null hypothesis.

Type II Error

β: the probability of retaining a null hypothesis that is actually false. Generally treated as a more acceptable error than a Type I error.

Variance of a sampling distribution

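σ_M^2 = Σ(M - μ_M)^2 / N^n, where N^n is the total number of possible samples of size n selected with replacement (order matters) from a population of size N.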
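To make the formula concrete, the sketch below lists every possible sample of size 2 (with replacement, order matters) from a made-up population of three scores and computes the variance of the resulting sample means. It also compares the result with σ^2/n, the square of the standard error of the mean.

```python
from itertools import product
from statistics import pvariance

population = [1, 2, 3]  # hypothetical population, N = 3
n = 2                   # sample size

# All N^n possible samples: with replacement, order matters.
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# Mean of the sampling distribution of sample means.
mu_m = sum(sample_means) / len(samples)

# Variance of the sampling distribution: squared deviations divided by N^n.
var_m = sum((m - mu_m) ** 2 for m in sample_means) / len(samples)

print(len(samples), mu_m, var_m)
print(pvariance(population) / n)  # sigma^2 / n, for comparison
```

Here the mean of the sample means equals the population mean, which illustrates the sample mean as an unbiased estimator.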

Chi-square test

The χ^2 test is used to test hypotheses about the discrepancy b/w the observed and expected frequencies for the levels of a single categorical variable or of two categorical variables observed together. It does not require computing variance. The chi-square distribution is positively skewed, and the rejection region is always placed in the upper tail.

Omega-squared for t test

ω^2 = (t^2 -1)/ (t^2 +df) smaller estimate of effect size... more conservative estimate making it less biased than eta-squared
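A small comparison of the two proportion-of-variance measures for the same result. The function names and the values t = 3.0 and df = 8 are hypothetical.

```python
def eta_squared(t, df):
    """Proportion of variance: t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)

def omega_squared(t, df):
    """More conservative estimate: (t^2 - 1) / (t^2 + df)."""
    return (t ** 2 - 1) / (t ** 2 + df)

t, df = 3.0, 8  # hypothetical test result

eta = eta_squared(t, df)
omega = omega_squared(t, df)

print(round(eta, 3), round(omega, 3))
```

Because 1 is subtracted in the numerator, omega-squared is always the smaller of the two for the same t and df.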

