Psych 9 Midterm
Describe Sample Mean with z-Score (Sample Mean to z-Score & Interpretation)
(1) Create the sampling distribution of means with a mean equal to the mean of the underlying raw score population. (2) Compute the z-score for the sample mean: Using the population SD of the underlying raw score population and your sample N, compute the standard error of the mean. Compute z, finding how far your sample mean (x bar) is from the mean (mu) of the sampling distribution, measured in standard error units. (3) Use the z-table to determine the relative frequency of z-scores above or below this z-score, which is the relative frequency of sample means above or below your mean. (C5: p84)
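The three steps above can be sketched in Python. This is a minimal illustration, not the textbook's procedure: the numbers (mu = 500, SD = 100, N = 25, sample mean = 520) are made up, and `statistics.NormalDist` stands in for the z-table.

```python
import math
from statistics import NormalDist

def sample_mean_z(sample_mean, mu, sigma, n):
    """z-score of a sample mean on the sampling distribution of means."""
    standard_error = sigma / math.sqrt(n)        # step 2a: standard error of the mean
    return (sample_mean - mu) / standard_error   # step 2b: distance from mu in SE units

# Hypothetical values chosen only for illustration
z = sample_mean_z(520, mu=500, sigma=100, n=25)

# Step 3: the normal CDF replaces the z-table lookup
prop_below = NormalDist().cdf(z)   # relative frequency of sample means below ours
prop_above = 1 - prop_below        # relative frequency of sample means above ours
print(z, round(prop_below, 4), round(prop_above, 4))
```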
Condition
(aka level) A specific amount of or category of the independent variable is called a/an (C1: p12)
Power
(of a statistical test) the probability of rejecting a false null hypothesis (i.e. probability of not making a Type II error); alternate definition: probability that we will detect a relationship and correctly reject a false null hypothesis (correctly concluding that the sample data represents a real relationship); maximize it by maximizing the chances that the results will be significant: parametric procedures (e.g. z-test), one-tailed test, and large N does this (C7: p123)
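To see why a large N raises power, here is a rough sketch for an upper one-tailed z-test. The formula used, power = 1 - Φ(z-crit - (true mean - H0 mean)/SE), is a standard result that is not spelled out on this card, and the means, SD, and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def power_upper_tail(mu_h0, mu_true, sigma, n, alpha=0.05):
    """Approximate power of an upper one-tailed z-test (assumed formula, see lead-in)."""
    se = sigma / math.sqrt(n)                  # standard error of the mean
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value for the upper tail
    shift = (mu_true - mu_h0) / se             # true mean's distance from H0 mean, in SE units
    return 1 - NormalDist().cdf(z_crit - shift)

# Hypothetical example: H0 mean 100, true mean 105, population SD 15
power_n36 = power_upper_tail(100, 105, 15, 36)
power_n100 = power_upper_tail(100, 105, 15, 100)
print(round(power_n36, 2), round(power_n100, 2))
```

With everything else held constant, the larger sample gives the larger power, because the standard error shrinks as N grows.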
z-Score
**If data is normal** reveals relative standing of the individual (location) among others in the sample or population: raw score's distance from the mean in standard deviations (zero distance: the raw score is equal to the mean); calculated from interval or ratio scores; focuses on the individual almost all the time; relative frequency of a particular z-score will be the same on all normal z-distributions; do NOT compute z-scores using the estimated population standard deviation (C5: p69-70)
Why N-1 for Populations?
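The 1 subtracted from N reflects the one parameter (the population mean) that is estimated from the sample when measuring variance; dividing by N-1 instead of N inflates the quotient (increases the size of the variance), which prevents an underestimate and gives an unbiased estimate of the population variance (C4: p60)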
Normal Distribution/Normal Curve
50% of scores above or below the mean; a symmetrical, bell-shaped polygon (curve); the far left and right portions, which contain the relatively low-frequency, extreme high or low scores, are called the tails of the distribution; a frequency distribution; for interval or ratio data; moving away from the mean, frequencies first decrease slightly but then decrease drastically; mean, median, and mode are equivalent (C2: p26)
Sampling Distribution of Means
A frequency distribution of all possible sample means occurring when an infinite number of samples of the same size N are selected from one raw score distribution (e.g. SAT scores at one college compared to the national average); alternate definition: frequency distribution showing all possible sample means that occur when samples of a particular size are drawn from population; 2nd alternate definition: a normal distribution having the same mean as the underlying raw score population used to create it, and it shows all possible sample means that can occur when sampling from that raw score population; always approximately normally distributed (C5: p80-81) Alternate definition: provides a picture of how often different sample means occur simply because of random chance (C6: p92)
Polygon
A frequency graph showing a data point above each score, with the adjacent points connected by straight lines; used with many different interval or ratio scores; e.g. normal distribution/curve, bimodal distribution (C2: p24)
Y-axis
A graph of a frequency distribution shows the frequencies on the ___, and scores on the x-axis
Parameter
A number that describes a population of scores; symbolized by a letter from the Greek alphabet; obtained when applying inferential procedures (C1: p11). Parametric statistics: inferential procedures that require certain assumptions about the raw score population represented by the sample; used when we compute the mean (i.e. when (1) the population of dependent scores is at least approximately normally distributed and (2) scores are interval or ratio) (C7: p108)
Statistic
A number that describes an aspect of the scores in a sample; obtained when applying descriptive procedures; represented by english letters (C1: p11)
Nominal Scale
A scale in which each score is used for identification and does not measure an amount; rather, it categorizes or classifies individuals--e.g. football jerseys; assumed to be discrete; no "true" zero value (C1: p15)
Non-parametric Statistics
Inferential procedures that do not require stringent assumptions about the raw score population represented by the sample; used with the median and mode (i.e. with nominal or ordinal scores or with skewed interval or ratio distributions) (C7: 108)
correlational study; experiment
In a/an ______________________, the experimenter measures scores on two variables without manipulation of either variable (if the means change as the conditions change, then the raw scores are changing, and a relationship is present). In a/an ______________________, the experimenter manipulates or changes one variable and measures scores on the other. (C1: p14 & 12 respectively)
Variables
In research, the aspects of the situation or behavior we measure--NOT people; qualitative or quantitative (C1: p6) Whether a variable is continuous or discrete and whether it is measured using a nominal, ordinal, interval, or ratio scale are factors that determine which statistical procedure to apply (C1: p17)
Measures of Variability
SD, variance, range; describe extent to which scores differ/spread out in a distribution (inconsistent behavior across participants); higher values=mean less accurately describes data; variance and standard deviation are the two measures of variability that indicate how much the scores are spread out around the mean for interval and ratio data that's normally distributed (C4: p53-55)
Random Selection/Sampling
Selecting samples so that all events or members of the population have the same, equal chance of being selected (C6: p89, p94)
Sample Mean
The best predictor of an individual score in a sample of scores; preferred measure of central tendency when distribution is symmetrical, and the scale of measurement is interval or ratio (C3: p41)
Simple Frequency
The number of times that a score occurs in data; multiply relative frequency by N (C2: p30)
Independent Variable
The variable systematically changed or manipulated by a researcher in an experiment is called the (C1: p12)
Why Square Deviations from Mean?
To compensate for the fact that deviations about the mean always sum to zero; prevents positive and negative deviations from cancelling (C4: p56)
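A quick check of this point, using a small made-up set of scores: the raw deviations cancel to zero, while the squared deviations do not.

```python
# Hypothetical scores, chosen so the mean is a whole number
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(scores) / len(scores)

deviations = [x - mean for x in scores]
sum_of_deviations = sum(deviations)                   # positives and negatives cancel
sum_of_squares = sum(d ** 2 for d in deviations)      # squaring removes the signs

print(mean, sum_of_deviations, sum_of_squares)
```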
Relationship
When a change in the scores of one variable is accompanied by a consistent change in the scores of another variable, we have what is known as a; positive, negative, perfectly consistent/constant types; if the means change as the conditions change, then the raw scores are changing, and a relationship is present (C1: p7) if one is indicated by the sample data: (1) relationship operates in nature and it produced our data OR (2) We are being misled by sampling error
Nonsignificant
When the z-obt does not fall beyond the critical value (statistic does not lie within the RoR) so we do not reject (fail to reject) H0; indicates the results are likely to occur if the predicted relationship does not exist in the population; sampling error could have produced our data; does NOT prove H0 is true, and Ha neither proven nor disproven (C7: p116-117) p>0.05 when reporting a nonsignificant result (C7: p120)
Data Point
a "dot" placed on any graph is called a
Bimodal Distribution
a distribution forming a symmetrical polygon with two humps where there are relatively high-frequency scores, with center scores that have the same frequency (C2: p28)
Empirical Probability Distribution
a probability distribution found by counting actual occurrences (relative frequency) of an event or by using a random sample (C6: 90-91)
Sample
a subset of a particular group of individuals selected from the population of individuals we are interested in to which a law is applied; accurately (if representative, i.e. no sampling error) or inaccurately represents population; (C1: p5)
Population Mean
average of the scores in the population (center of distribution); sum of the deviations around it equals zero; estimated by calculating the mean of a sample drawn from the population; the typical score, and the score we predict for any individual in the population (C3: p49); inferential description of how nature works (C3: p49)
Sample Variance
average of the squared deviations of scores around the sample mean (average squared deviation); biased underestimate of population variance; NOT inferential stat (C4: p55-56)
Population Variance
average squared deviation of scores around the population mean; the sample variance computed with N is a biased underestimate of it (C4: p59-60)
Continuous
can be measured in fractional amounts so decimals make sense--e.g weight (C1: p16-17)
Discrete
can only be measured in fixed amounts, which cannot be broken into smaller amounts; cannot be measured in fractional amounts--e.g tickets sold (C1: p16-17)
General Variance
computed and found to be rather large: scores are spread out around the mean (not similar to mean); greater it is, the less accurately the scores are represented by one central score; can NEVER be negative; low: consistent scores (distribution not as spread out), central tendency accurately describes, less distance b/w scores; high: inconsistent scores, central tendency inaccurate, more distance b/w scores (C4: p53)
Positive Skew
contains extreme high scores having low frequency, but does not contain low-frequency, extreme low scores. asymmetrical distribution with low-frequency, extreme high scores, but without corresponding low-frequency, extreme low scores (C2: p27)
Negative Skew
contains extreme low scores having low frequency, but does not contain low-frequency, extreme high scores. asymmetrical distribution with low-frequency, extreme low scores, but without corresponding low-frequency, extreme high scores; its polygon has only one pronounced tail, over the lower scores (C2: p27)
Critical Value
defines the minimum absolute z-value to be in the region of rejection (non-inclusive); alternate definition: inner edge of the region of rejection; must be exceeded in order to reject the null hypothesis and be in RoR; criterion and number of tails determine it: z-crit=±1.96 for a 0.05 criterion in a two-tailed test, and z-crit=+1.645 or -1.645 (depending on the predicted direction) for a one-tailed test with a 0.05 criterion (C6: p99)
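The critical values quoted above can be reproduced from the standard normal curve. This sketch uses Python's `statistics.NormalDist` in place of the z-table; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05  # the criterion

# Two-tailed: alpha is split between the tails, 0.025 in each
z_crit_two_tailed = NormalDist().inv_cdf(1 - alpha / 2)

# One-tailed: all of alpha sits in a single tail (sign follows the predicted direction)
z_crit_one_tailed = NormalDist().inv_cdf(1 - alpha)

print(round(z_crit_two_tailed, 2), round(z_crit_one_tailed, 3))
```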
Alternative Hypothesis (Ha)
describes the population parameters represented by the sample data if the predicted relationship exists in nature (C7: p110)
Null Hypothesis (H0)
describes the population parameters the sample data represent if the predicted relationship does not exist in nature; rejected when z-obt falls beyond the critical value (statistic lies in the RoR) (C7: p111)
Experimental Hypothesis (H1)
describes the predicted relationship we may or may not find in an experiment b/w IV and DV (C7: 108)
Range
distance b/w two most distant scores (highest-lowest); the lone variability measure for nominal or ordinal data (C4: p55-56)
z-Distribution (Standard Normal Curve is our model)
distribution produced by transforming all raw scores in the data into z-scores; SD: always be equal to 1; always has the same shape as the raw score distribution; mean of any z-distribution is 0; has the same frequency, relative frequency, and percentile as raw score; can be used to compare different variables (e.g. stat versus English classes' scores on same graph); Relative frequency can be computed using the proportion of the total area under the curve; z-table on p253-255 (C5: p72-75)
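As an illustration with made-up scores, transforming every raw score into a z-score yields a distribution whose mean is 0 and whose SD is 1. Here `pstdev` divides by N, treating the scores as the whole distribution.

```python
from statistics import mean, pstdev

# Hypothetical raw scores
raw_scores = [2, 4, 4, 4, 5, 5, 7, 9]
m = mean(raw_scores)
sd = pstdev(raw_scores)   # SD computed with N in the denominator

z_scores = [(x - m) / sd for x in raw_scores]

z_mean = mean(z_scores)
z_sd = pstdev(z_scores)
print(z_scores, z_mean, z_sd)
```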
Type 2 Error
failing to reject the null hypothesis when it is false (and Ha is true; false negative of significance); sample mean is so close to the mean described by H0 we conclude the predicted relationship does not exist when it really does; only researcher affected; if you can possibly make one error, you cannot make the other error (C7: p122)
Histogram
frequency graph similar to a bar graph but with adjacent bars touching; for a small range of interval or ratio data bc continuous data types (C2: p24)
Deviation
gives the score's location relative to (i.e. distance from) the mean (therefore more informative than raw score); equal to the score minus the mean; larger the deviation, the farther the score is from the mean; also, the amount of error between the X we predict for someone and the X actually gotten. The total error over all such predictions equals the sum of the deviations, which is zero (C3: p44-45)
Bar Graph
has a vertical bar centered over each X score and the height of the bar corresponds to the score's frequency. Notably, adjacent bars do NOT touch; for nominal or ordinal data, both are discrete data types (C2: p23-24)
Interval Scale
indicates an actual quantity, and there is an equal amount separating any adjacent scores. Interval scales do not have a "true" 0 and allow negative numbers--e.g. temperature; assumed to be continuous (C1: p16)
Ordinal Scale
indicates rank order. There is no score of 0 (zero), and the same amount does not separate every pair of adjacent scores--e.g. TV show preferences; assumed to be discrete (C1: p16)
Probability of Sample Means
larger the absolute value of a sample mean's z-score, the less likely the mean is to occur when samples are drawn from the underlying raw score population; sampling distribution of means is used to determine the probability of randomly obtaining any particular sample means (C6: p92)
Probability
likelihood of an event ONLY when a population is randomly sampled; equals the event's relative frequency over the long run in the population; Past relative frequency is an indicator of future frequency, always between zero and one, symbol is p (C6: p89-90) the proportion of the area under the curve for given scores is also the relative frequency of those scores. (C6: 92)
Quantitative
measures amounts (C1: p7)
Ratio Scale
measures an actual quantity with an equal amount separating any adjacent scores, and 0 truly means none of the variable is present--e.g height; assumed to be continuous (C1: p16)
Theoretical Probability Distribution
model based on how we assume nature distributes events in the population (i.e. expected relative frequency) (C6: 90-91) theoretical probability distribution is the standard normal curve; the proportion of the area under the curve for given scores is also the relative frequency of those scores. (C6: 92)
Frequency
number of times a given score occurs in a sample (C2: p21--includes f distribution)
Representative Sample
one in which the characteristics of the individuals and scores in the sample accurately reflect the characteristics of the individuals and scores in the population; lets one generalize the results of study to wider population; It is always possible to obtain a sample that is not representative=need for inferential statistics to narrow possibility (C6: p94) determined by where z-score (of sample mean) falls related to critical values (and the RoR they produce) examined on sampling distribution of means; if outside RoR: difference between sample and population means are acceptable and attributed to sampling error (C6: p100-101)
z-Test
parametric procedure used in a single sample experiment when the standard deviation of the raw score population is known; assumes (1) randomly selected one sample, (2) dependent variable is at least approximately normally distributed in the population and involves an interval or ratio scale, (3) know the mean of the population of raw scores under another condition of the independent variable, and (4) know the true standard deviation of the population described by the null hypothesis (C7: p113) mean of the sampling distribution always equals the mean of the raw score population that H0 says we are representing; final step is to interpret this z-obt by comparing it to z-crit (C7: p114) steps: (1) Identify predicted relationship and if one or two tailed test, (2) select criterion, RoR, and critical values, (3) compute standard error of the mean, then plug into formula for z-obt (value of the mean is the mean of the sampling distribution, which is also the mean of the raw score population that H0 says is being represented), and (4) compare z-obt to z-crit to interpret significance (C7: p117)
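The four steps can be sketched as a small function. This is only an illustration under stated assumptions: the function name `z_test`, its `tail` argument, and the example numbers (sample mean 105, H0 mean 100, SD 15, N 36) are all hypothetical, and `statistics.NormalDist` replaces the z-table.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu_h0, sigma, n, alpha=0.05, tail="two"):
    """Return (z_obt, z_crit, significant) for a one-sample z-test."""
    se = sigma / math.sqrt(n)            # step 3a: standard error of the mean
    z_obt = (sample_mean - mu_h0) / se   # step 3b: z-obt
    if tail == "two":                    # steps 1-2: tails, criterion, critical value
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
        significant = abs(z_obt) > z_crit
    elif tail == "upper":
        z_crit = NormalDist().inv_cdf(1 - alpha)
        significant = z_obt > z_crit
    elif tail == "lower":
        z_crit = NormalDist().inv_cdf(alpha)
        significant = z_obt < z_crit
    else:
        raise ValueError("tail must be 'two', 'upper', or 'lower'")
    return z_obt, z_crit, significant    # step 4: compare z-obt to z-crit

# Hypothetical two-tailed example
z_obt, z_crit, significant = z_test(105, 100, 15, 36)
print(z_obt, round(z_crit, 2), significant)
```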
Standard Normal Curve
perfect normal curve that serves as our model of any approximately normal z-distribution; most accurate when (1) we have a large sample (or population) of (2) interval or ratio scores that (3) come close to forming a normal distribution (C5: p75) combine w/z-table to find relative frequency of sample means in any part of a sampling distribution (C5: p83)
One Tailed Test
predicts relevant results will only fall in one direction (increase or decrease) and criterion=0.05; interested in positive z-scores: reject the idea the sample mean is representative only if it falls in the positive tail; interested in negative z-scores: reject the idea the sample mean is representative only if it falls in the negative tail (C6: p102) APA style ex: H0: μ ≤ 11; Ha: μ > 11
Probability of Specific Sample Mean
probability of selecting a particular sample mean is the same as the probability of randomly selecting a sample of participants whose scores produce that mean (finding area under curve, aka relative frequency); a sample mean having a larger z-score is less likely to occur when we are dealing with the underlying raw score population (and vice versa) (C6: p93-94)
Descriptive Statistics
procedures (represented by english letters) for organizing and summarizing sample data; describes relationships (C1: p10)
Inferential Statistics
procedures for drawing inferences about the scores and relationship that would be found in the population; decides whether our sample accurately represents the relationship found in the population (C1: p10) alternate definition: Uses sample data to make statements about the scores and relationships in the general population used to determine if sampling error or sample represents other population (C6: p96) (C7: p107)
Probability of Individual Scores
proportion of the total area under the standard normal curve for particular scores equals the probability of those scores (C6: p93)
Two Tailed Test
reject the idea the sample mean is representative if it falls in either the negative tail or the positive tail of the distribution; ±1.96 is the critical value of z for a criterion of .05 in a two-tailed test (C6: p99) used when we do not predict the direction in which dependent scores will change (C7: p109) APA Style Ex: H0: μ = 11; Ha: μ ≠ 11
Type 1 Error
rejecting H0 when H0 is true (false positive of significance); probability of committing increases with criterion's value (alpha); so much sampling error we concluded the predicted relationship exists when it really does not; its theoretical probability=criterion (actual probability slightly less); never know when we make it because we never know whether our variables are related in nature; dangerous bc claims relationship (e.g. drug effectiveness) exists when it doesn't; if you can possibly make one error, you cannot make the other error; criterion of 0.05 or less minimizes risk (C7: p121)
General mean
score located at the exact mathematical center of a distribution (i.e. one score that more or less describes everyone's score, with the same amounts of more and less); used to summarize interval or ratio data in situations when the distribution is symmetrical (normal) and unimodal; the most common measure of central tendency; considers the magnitude of every score, so it does not ignore any information; inaccurately describes skewed distribution: pulled towards tail; if the means change as the conditions change, then the raw scores are changing, and a relationship is present (C3: p41-42)
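A small made-up example of the mean being pulled toward the tail of a skewed distribution while the median stays put:

```python
from statistics import mean, median

# Hypothetical data: the second list replaces the top score with an extreme high score
symmetric = [1, 2, 3, 4, 5]
skewed = [1, 2, 3, 4, 50]

mean_symmetric, median_symmetric = mean(symmetric), median(symmetric)
mean_skewed, median_skewed = mean(skewed), median(skewed)

print(mean_symmetric, median_symmetric)  # mean and median agree
print(mean_skewed, median_skewed)        # mean moves toward the extreme score
```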
Mode
score(s) having the highest frequency in the data; used to describe central tendency when the scores reflect a nominal scale or with a distinctly bimodal distribution of any score scale; When data are normally distributed, the mode is the same score as the median and mean; issues: (1) no mode when all scores occur w/same frequency, & (2) doesn't account for less frequent scores; has no symbol; best MCT with bimodal distributions (C3: p39)
Sample Standard Deviation
square root of average of the squared deviations of scores around the sample mean (sqrt of sample variance); biased underestimate of population SD; shows how accurately the mean summarizes the scores; "somewhat like" the average deviation of scores around the mean (C4: p57)
General Standard Deviation
square root of the average of the squared deviations around the mean (i.e. sqrt of variance); tells score consistency, how far scores are spread about mean (higher value=wider distribution), average deviation (of scores) from mean
Standard Error of the Mean
standard deviation of the sampling distribution of means; "average" amount that the sample means deviate from the mean of the sampling distribution; size depends on size of population SD bc more variable raw scores are likely to produce very different samples each time, so their means will differ more=standard error of the mean is larger; The mean of the sampling distribution equals the mean of the underlying raw score population the sample is selected from. (C5: p81-82)
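For reference, the standard error of the mean is the population SD divided by the square root of N, so it grows with the population SD and shrinks as N grows. A short sketch with hypothetical values:

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by the square root of N."""
    return sigma / math.sqrt(n)

# Hypothetical population SDs and sample sizes
se_small_n = standard_error(15, 9)
se_large_n = standard_error(15, 100)
se_more_variable = standard_error(30, 9)   # more variable raw scores, same N

print(se_small_n, se_large_n, se_more_variable)
```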
Central Limit Theorem
statistical principle that defines the mean, standard deviation, and shape of a sampling distribution; we do not have to infinitely sample a population of raw scores bc: (1) A sampling distribution is always an approximately normal distribution, (2) The mean of the sampling distribution equals the mean of the underlying raw score population used to create the sampling distribution, and (3) The standard deviation of the sampling distribution is mathematically related to the standard deviation of the raw score population (C5: p80)
Population
the entire group to which a law of nature/behavior applies; represented by Greek symbols (C1: p5)
Cumulative Frequency
the number of scores in the data that are at or below a particular score; percentile preferred over it bc easier to interpret (C2: p33)
Percentile
the percent of all scores in the data located below a score; e.g. if you scored at the 75th percentile, then 75% of the group scored lower than you (and 25% scored above you); On normal curve, a score's percentile is the percent of the area under the curve to the left of the score (C2: p32-33)
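Cumulative frequency and percentile can be computed directly from a list of scores. The sketch below follows the definitions on these cards (cumulative frequency counts scores at or below a score, percentile counts the percent below it); the function names and the data are hypothetical.

```python
def cumulative_frequency(scores, x):
    """Number of scores at or below x."""
    return sum(1 for s in scores if s <= x)

def percentile(scores, x):
    """Percent of all scores located below x."""
    return 100 * sum(1 for s in scores if s < x) / len(scores)

# Hypothetical scores
scores = [2, 4, 4, 4, 5, 5, 7, 9]
cf_of_5 = cumulative_frequency(scores, 5)
pct_of_7 = percentile(scores, 7)
print(cf_of_5, pct_of_7)
```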
Median (Mdn)
the point at or below which 50% of the scores fall (50th percentile); When data are normally distributed, the median is the same score as the mode; for ordinal data or when you have interval or ratio scores in a very skewed distribution; If N is odd: the median is the score in the middle position; If N is even, the median is the average of the two middle scores; Better than mode bc (1) can have only one median and (2) median will usually be where most scores in a distribution are located; best MCT for skewed distribution (C3: p40)
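The odd/even rule for locating the median can be sketched as follows. The function name `median_score` and the example scores are illustrative only.

```python
def median_score(scores):
    """Median by the odd/even rule: middle score, or the average of the two middle scores."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # N is odd: the middle position
    return (ordered[mid - 1] + ordered[mid]) / 2   # N is even: average the two middle scores

# Hypothetical scores
median_odd = median_score([7, 1, 5])
median_even = median_score([7, 1, 5, 3])
print(median_odd, median_even)
```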
Measures of Central Tendency
the points around which most of the scores are located; tells whether scores are generally low or high; summarizes the location of a distribution on a variable; indicates where the center of the distribution tends to be located (describes group as a whole); located at the same score on a perfectly normal distribution (close to the same if roughly normal); INACCURATELY describes skewed distributions (off by even 0.03); Measure(s) used depends on: scale of measurement used on the dependent variable and shape of the distribution, if you have interval or ratio scores; first step in summarizing any set of data is to compute its central tendency (C3: p37-38)
Region of Rejection (RoR)
the portion of the sampling distribution in which values are considered too unlikely to have occurred by chance: we reject the idea that the sample represents the underlying raw score population; alternate definition: where a sample mean is so far above or below the population mean it is unbelievable that chance produced such an unrepresentative sample; if the z-value falls within it, conclude that sample mean likely represents some other population (and vice versa) (C6: p98)
Criterion
the probability defining samples as unlikely to be representing the raw score population; 0.05 in psych research (aka alpha level) (C6: p99)
Relative Frequency
the proportion of the time a score occurs in a sample; proportion of the total area under the normal curve between given scores represents the relative frequency of those scores (f/N, where f=frequency of the score(s) in the given range and N=total number of scores in the sample) (C2: p28-31)
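A short sketch of simple and relative frequency for a made-up set of scores. Relative frequency is f/N, and multiplying it by N recovers the simple frequency.

```python
from collections import Counter

# Hypothetical scores
scores = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(scores)

simple_freq = Counter(scores)                              # f for each score
rel_freq = {score: f / N for score, f in simple_freq.items()}

print(simple_freq[4], rel_freq[4], rel_freq[4] * N)
```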
Population Standard Deviation
the square root of the population variance, or the square root of the average squared deviation of scores around the population mean; the sample standard deviation computed with N is a biased underestimate of it (C4: p59-60)
Sum of Deviations about the Mean
the sum of all the differences between the scores and the mean; for any distribution having any shape: sum of the deviations around the mean always equals zero (C3: p44)
One-Sample Experiment
to perform it, one must know the population mean under some condition of the independent variable other than the one being tested (e.g. IQ from sample representing population w/smart pill compared to IQ of population w/o smart pill) (C7: p109)
z-Score to find Relative Frequency of Raw Scores
transform raw scores into z-scores and then use the standard normal curve and z-table on p253-255 (C5: p77)
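As a sketch of this procedure, the code below converts raw scores to z-scores and uses `statistics.NormalDist` in place of the z-table. The population values (mean 100, SD 15) and the raw scores are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Raw score's distance from the mean, in standard deviation units."""
    return (x - mu) / sigma

mu, sigma = 100, 15   # hypothetical population mean and SD
std_normal = NormalDist()

# Relative frequency of raw scores below 115
rel_freq_below_115 = std_normal.cdf(z_score(115, mu, sigma))

# Relative frequency of raw scores between 85 and 115
rel_freq_85_to_115 = (std_normal.cdf(z_score(115, mu, sigma))
                      - std_normal.cdf(z_score(85, mu, sigma)))

print(round(rel_freq_below_115, 4), round(rel_freq_85_to_115, 4))
```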
Estimated/Unbiased Population Standard Deviation
unbiased estimate of the population standard deviation calculated from sample data using N-1 (C4: p60-61)
Estimated/Unbiased Population Variance
unbiased estimate of the population variance calculated from sample data using N-1; amount each score deviates from the mean, which will then form our estimate of how much the scores in the population deviate from population mean (C4: p60-61)
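The difference between the biased sample variance (divide by N) and the unbiased estimate (divide by N-1) is easiest to see side by side. The function names and scores below are made up for illustration.

```python
import math

def sample_variance(scores):
    """Average squared deviation around the sample mean (divides by N; biased)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

def estimated_population_variance(scores):
    """Unbiased estimate of the population variance (divides by N-1)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / (len(scores) - 1)

# Hypothetical sample
scores = [2, 4, 4, 4, 5, 5, 7, 9]
s2_biased = sample_variance(scores)
s2_unbiased = estimated_population_variance(scores)
sd_biased = math.sqrt(s2_biased)       # sample standard deviation
sd_unbiased = math.sqrt(s2_unbiased)   # estimated population standard deviation

print(s2_biased, round(s2_unbiased, 3), sd_biased, round(sd_unbiased, 3))
```

Dividing by N-1 always gives the larger value, which is the "quotient inflation" that corrects the underestimate.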
Probability and Sampling Distribution of means
useful bc: without actually sampling the underlying raw score population, we can see all of the means that occur, and we can determine the probability of randomly obtaining any particular means; provides a picture of how often different sample means occur simply because of random chance (C6: p92)
Population Raw Score to z-Score
uses population SD, population mean, and individual score; conveys raw score's distance from population mean in population SD units (C5: p71)
Dependent Variable
variable measured as an experiment is being carried out is called the; behavior/attribute the IV expected to influence (C1: p12-13)
Sampling Error
when random chance produces a sample statistic (e.g., s²) not equal to the population parameter it represents (e.g., σ²) and is inaccurate (an error); It is always possible to obtain a sample that is not representative: any sample might poorly represent one population because of sampling error or accurately represent a different population (C6: p95)
(Statistical) Significance
when the null hypothesis (H0) has been rejected and (consequently) the alternative hypothesis (Ha) is accepted; does not prove IV caused score change (in experiment); indicates the results are unlikely to occur if the predicted relationship does not exist in the population (i.e. unlikely to be sampling error); never proves that H0 is false; we do not know the exact mean of the population represented by our sample (C7: p115-116) Reporting in APA style ex: z-obt = +2.12, p < 0.05