HTH 320 Midterm
weighted mean
Combined mean of two or more groups of scores, where the number of scores in each group is unequal -often used when samples are of disproportionate size -Mw = Σ(M × n)/Σn
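A minimal sketch of the weighted mean: each group mean is multiplied by its group size, the products are summed, and the result is divided by the total number of scores. The function name and the two groups are hypothetical.

```python
def weighted_mean(means, sizes):
    """Return the sum of (M * n) divided by the sum of n."""
    if len(means) != len(sizes):
        raise ValueError("means and sizes must have the same length")
    total_n = sum(sizes)
    return sum(m * n for m, n in zip(means, sizes)) / total_n

# Hypothetical groups: M = 80 with n = 10, and M = 90 with n = 30.
mw = weighted_mean([80, 90], [10, 30])
print(mw)  # 87.5, pulled toward the larger group
```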
cumulative frequency
"Bottom up" -add frequencies beginning from bottom and working up -top frequency equals total number of measures recorded
summation
-Summation is done after operations in parentheses, squaring, and multiplication or division. -Summation is done before other addition or subtraction
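A short illustration of the order of operations with summation, using hypothetical scores. Squaring inside the sum is done before summing, while an addition written after the sum is done last.

```python
scores = [1, 2, 3]  # hypothetical scores

sum_of_squares = sum(x ** 2 for x in scores)   # Σx²: square first, then sum
square_of_sum = sum(scores) ** 2               # (Σx)²: parentheses first
sum_then_add = sum(scores) + 5                 # Σx + 5: sum, then add 5 once
add_then_sum = sum(x + 5 for x in scores)      # Σ(x + 5): add 5 to every score

print(sum_of_squares, square_of_sum)  # 14 36
print(sum_then_add, add_then_sum)     # 11 21
```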
ungrouped data
-a set of scores or categories distributed individually, where the frequency for each individual score or category is counted -ungrouped when number of different scores is small, and for qualitative or categorical variables -to distribute, skip to final step for constructing a frequency distribution -not constructed/distributed into intervals
frequency distribution
-a summary display for a distribution of data -an organized way to present data, -showing the number of individuals located in each category on the scale of measurement -can be either a table or a graph -always shows the categories that make up the scale, and the frequency, or number of individuals, in each category -to determine how many subjects were in the study, sum all the numbers in the frequency (f) column
Nominal
-categorical in nature -ex: gender, favorite color, seasons -does not imply any order among responses -ex: if I asked you all to tell me your favorite season (fall, winter, spring, summer), I couldn't put fall people ahead of spring people.
relative percent
-distributes the percent of scores in each interval -multiply each relative frequency by 100 -the sum of relative percents equals 100%
cumulative relative frequency
-distributes the sum of relative frequencies across a series of intervals -can be summarized from bottom up or top down -the total sum of relative frequencies is equal to 1.0
cumulative percent
-distributes the sum of relative percents across a series of intervals -presented from the bottom up; also called percentile ranks -total cumulative percent equals 100%
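The frequency columns described in these cards can be built from one list of frequencies. This is a sketch with hypothetical intervals and counts, listed from the lowest interval to the highest so the running totals are "bottom up".

```python
from itertools import accumulate

# Hypothetical frequencies, lowest interval first.
intervals = ["0-9", "10-19", "20-29", "30-39"]
freqs = [2, 5, 8, 5]

total = sum(freqs)                              # N, the number of scores
rel_freq = [f / total for f in freqs]           # relative frequency (proportion)
rel_pct = [100 * f / total for f in freqs]      # relative percent
cum_freq = list(accumulate(freqs))              # cumulative frequency, bottom up
cum_pct = [100 * c / total for c in cum_freq]   # cumulative percent

for row in zip(intervals, freqs, rel_pct, cum_freq, cum_pct):
    print(row)
```

The last cumulative frequency equals N and the last cumulative percent equals 100.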
rules for a simple frequency distribution
-each interval is defined -each interval is equidistant -no interval overlaps -all values are rounded to same degree of accuracy measured in original data
percentages
-expresses relative frequency out of 100 -percentage = p(100) = (f/N)(100) -can be included as a separate column in a frequency distribution table
central tendency in skewed distributions
-mean, influenced by extreme scores, is found far toward the long tail (positive or negative) -median, in order to divide scores in half, is found toward the long tail, but not as far as the mean -mode is found near the short tail. -if Mean - Median > 0, the distribution is positively skewed. -if Mean - Median < 0, the distribution is negatively skewed
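A sketch of the Mean - Median check for skew direction. The function name and the data sets are hypothetical, and the check is only a rough indicator for small samples.

```python
from statistics import mean, median

def skew_direction(scores):
    """Classify skew from the sign of (mean - median)."""
    diff = mean(scores) - median(scores)
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "none indicated"

print(skew_direction([1, 2, 2, 3, 3, 3, 20]))       # high outlier pulls the mean up
print(skew_direction([1, 18, 18, 19, 19, 19, 20]))  # low outlier pulls the mean down
```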
variability
-measure for the dispersion or spread of data in a distribution -a quantitative measure of the differences between scores -describes the degree to which the scores are spread out or clustered together -interval and ratio data are used -Ranges from 0 to + ∞ -variability of scores can never be negative; scores can either not vary (variability is 0) or they can vary (variability is greater than 0) -as soon as one score differs from the others, there is variability
semi-interquartile range (SIQR)
-measure of half the distance between the cutoffs for the upper and lower quartiles of a data set -computed by dividing the IQR in half
Ordinal
-measurements that convey order or rank alone -allows us to see that one value is greater or less than another, but does not tell us meaningful info about the distance between levels -ex: order that you finished a race (first place, second place, third place) -ex: hospital rankings ("#1 U.S. cardio")
interval (NOIR)
-numerical scales in which the intervals have the same interpretation throughout -ex: Fahrenheit temperature -no true zero point, even if the scale has an interval at zero -ex: zero degrees Fahrenheit does not represent the absence of temperature
ratio
-ratio scales have all the properties of the interval scale, with the addition of a true zero (the zero position represents the absence of the quantity being measured) -ex: money... someone who has $50 has twice as much money as someone with $25
Quasi experiment
-structured similarly to an experiment, but meets one or both of the following conditions: -the study does not include a manipulated IV -the study lacks a comparison/control group -occurs when variables are preexisting or inherent to the participants themselves -based upon group affiliation you can't change or manipulate (sex, age, smoker or not) -ex: do boys and girls (IV) differ in number of aggressive behaviors (DV)? -ex: are there differences between 20-year-olds and 40-year-olds (IV) in reaction time (DV)?
characteristics of a normal distribution
1) Normal distribution is mathematically defined -the shape of a normal distribution is specified by an equation relating each score (along x-axis) with each frequency (along y-axis) 2) Normal distribution is theoretical 3) Mean, median, and mode are all located at the 50th percentile 4) Normal distribution is symmetrical 5) The mean can equal any value 6) The standard deviation can equal any positive value
empirical rule
For normally distributed data: 1) about 68% of all scores lie within one SD of the mean 2) about 95% of all scores lie within two SD of the mean 3) about 99.7% of all scores lie within three SD of the mean
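The three percentages can be checked against the standard normal curve. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name is hypothetical. The values are approximations that hold for normal distributions.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k SD of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```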
Characteristics of mean
1. Changing an existing score will change the mean 2. Adding a new score or completely removing an existing score will change the mean, unless the value equals the mean 3. Adding, subtracting, multiplying, or dividing each score in a distribution by a constant will cause the mean to change by that constant 4. The sum of the differences of scores from their mean is zero 5. The sum of the squared differences of scores from their mean is minimal
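Each characteristic can be checked numerically. This sketch uses hypothetical scores with a mean of 6; the helper `ss_around` is an illustrative name, not a library function.

```python
from statistics import mean

scores = [2, 4, 6, 8, 10]   # hypothetical scores
m = mean(scores)            # 6

with_mean_added = mean(scores + [6])       # adding a score equal to the mean
shifted = mean([x + 3 for x in scores])    # add 3 to every score
doubled = mean([x * 2 for x in scores])    # multiply every score by 2
deviation_sum = sum(x - m for x in scores)

def ss_around(center):
    """Sum of squared differences of the scores from a chosen center."""
    return sum((x - center) ** 2 for x in scores)

print(with_mean_added, shifted, doubled)           # 6 9 12
print(deviation_sum)                               # 0
print(ss_around(5), ss_around(m), ss_around(7))    # 45 40 45
```

The sum of squares is smallest when the center is the mean itself.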
characteristics of standard deviation
1. The SD is always positive or zero (never negative) -scores can either vary (greater than 0) or not vary (equal to 0) -a negative variability is meaningless 2. The SD is used to describe quantitative variables -SD is a numeric value and used to describe variables measured in numeric units 3. SD is most informative when reported with the mean -reporting mean and SD can inform reader of distribution for close to all recorded data -reported as the mean plus or minus SD (M ± SD) 4. The value for the SD is affected by the value of every score in the distribution -adding or subtracting the same constant to each score will not change the value of the SD -multiplying or dividing each score using the same constant will cause the SD to change by that constant
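A quick check of characteristic 4 with hypothetical scores, using the sample standard deviation from Python's `statistics` module. Adding a constant leaves the SD alone; multiplying by a constant scales it.

```python
import math
from statistics import stdev

scores = [2, 4, 6, 8, 10]   # hypothetical scores

sd = stdev(scores)
sd_shifted = stdev([x + 5 for x in scores])   # add 5 to every score
sd_scaled = stdev([x * 3 for x in scores])    # multiply every score by 3
sd_constant = stdev([5, 5, 5])                # no score differs from the others

print(round(sd, 4), round(sd_shifted, 4), round(sd_scaled, 4))
print(sd_constant)
```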
why we square each deviation in numerator when computing variance
1. The sum of the differences of scores from their mean is zero -to avoid this result, each deviation is squared to produce the smallest positive solution 2. The sum of the squared differences of scores from their mean is minimal -squaring deviations provides a solution with minimal error 3. Squaring scores can be corrected by taking the square root
data set
A collection of measurements or observations
datum (singular)
A single measurement or observation (commonly called a score or raw score)
inferential statistics
Allows sample results to be generalized to representative populations and interpret meaning of data -generalize to populations ex: common terms in inferential stats that you've probably heard- "statistically" and "practically significant"
scales of measurement
Degree to which measured variables conform to the abstract number system -determines the type of statistical analyses possible -includes: identity, order, equal distance, and absolute zero (i.e., complete absence of the variable) -popularly remembered by its acronym, NOIR
range
Difference between the largest value (L) and smallest value (S) in a set of data -range = L - S -calculation only considers the largest and smallest value in the distribution -simplest way to describe the dispersion of scores -most informative for data without outliers -crude, unreliable measure of variability
relative frequency
Distributes the proportion of scores in each interval -equals the frequency in an interval divided by the total frequency count -often used to summarize large data sets
population
ENTIRE set of individuals or items of interest -data are termed "parameters", which are usually a numerical value that describe a population. (derived from measuring individuals in population.)
percentile points and ranks
Identify individual rank by converting a frequency distribution to a cumulative percent distribution •Percentile point--value of a score on a scale below which a specified percentage of scores in a distribution fall •Percentile rank--percentage of scores with values that fall below a specified score in a distribution Prime example here is class rank or results on a standardized test. "top ___% of the class" or "scored better than __% of all those who took the test"
standard deviation
Measure of variability for the average distance that scores deviate from their mean -calculated by taking the square root of the variance
variance
Measure of variability for the average squared distance that scores deviate from their mean -value can be 0 (no variability) or greater than 0 (there is variability) -negative variance is meaningless -preferred measure of variability because all scores are included in its calculation
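A sketch of the computation, assuming the usual definitional formulas (not written out in these notes): sum the squared deviations from the mean, divide by N for a population or by n - 1 for a sample, and take the square root to get the standard deviation. The scores are hypothetical.

```python
import math

def sum_of_squares(scores):
    """Sum of squared deviations of the scores from their mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

scores = [4, 8, 6, 2]                                  # hypothetical, mean = 5
ss = sum_of_squares(scores)                            # 1 + 9 + 1 + 9 = 20
population_variance = ss / len(scores)                 # divide by N
sample_variance = ss / (len(scores) - 1)               # divide by n - 1
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(ss, population_variance, round(sample_variance, 3))
```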
data (plural)
Measurements or observations of a variable
proportions
Measures the fraction of the total group that is associated with each score -proportion= p= f/N -called relative frequencies because they describe the frequency ( f ) in relation to the total number (N)
mode is reported for
Modal distributions of data -unimodal distribution--one mode -bimodal distribution--two modes -multimodal distribution--more than two modes -nonmodal distribution--no modes Nominal scale data -nominal scale data represent something or someone; it is not a quantity -key phrases: most often, typical, or common
mean is reported for
Normal distributions of data -a normal distribution is a symmetrical distribution in which the mean, median, and mode all fall at the 50th percentile or center of the distribution -used because the mean includes all scores in its calculation Interval and ratio scale data -used because data on these scales can meaningfully convey information regarding differences between scores and their mean
median is reported for
Skewed distributions of data -skewed distributions occur when a data set includes a score or group of scores that fall substantially above (positively skewed) or below (negatively skewed) other scores -the median is not influenced by the value of outliers Ordinal scale data -ordinal data convey direction only; not distance -because the distance of ordinal scale data from their mean is not meaningful, the median is used to describe these data
State whether a cumulative frequency, relative frequency, relative percent, cumulative relative frequency, or cumulative percent is most appropriate for describing the following situations.
Q: The frequency of businesses with at least 20 employees. A: Cumulative frequency (from the top down). Q: The frequency of college students with less than a 3.0 GPA. A: Cumulative frequency (from the bottom up). Q: The proportion of elderly patients consuming at or above 1,400 calories per day. A: Cumulative relative frequency (top down).
sample
Representative subset of a population -data are termed "statistics", which are usually numeric and describe the sample. (derived from measuring individuals in the sample) -most behavioral research is done on samples
Meeting experimental criteria
Requirement 1 -must manipulate levels of an IV -IV --manipulated; the proposed cause -ex: effect of bold versus regular font on memory recall Requirement 2 -random assignment renders group equivalent Requirement 3 -at least two groups must be observed -DV -- what is measured; proposed effect -DV operational definition--how will DV be measured -experimental group-- exposed to IV -bolded key terms in short passage -control Group--not exposed to IV -regular font key terms in short passage
discrete
Separate and indivisible categories; whole numbers only ex: socioeconomic class, number of siblings
steps to summarize grouped data
Step 1: Find the real range -the real range is one more than the difference between the largest and smallest value in a list of data Step 2: Find the interval width -the interval width is the range of scores in each interval -divide real range by number of intervals chosen -round quotient to nearest whole number Step 3: Construct the frequency distribution -same number of intervals as Step 2
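The three steps can be sketched as a small function. Assumptions: scores are whole numbers, the number of intervals is chosen by the user, and the width is rounded up rather than to the nearest whole number so the top interval always reaches the largest score. The function name and data are hypothetical.

```python
import math

def grouped_frequency(scores, num_intervals):
    """Return (lower, upper, frequency) rows, lowest interval first."""
    low, high = min(scores), max(scores)
    real_range = high - low + 1                      # Step 1
    width = math.ceil(real_range / num_intervals)    # Step 2 (rounded up)
    table = []
    for i in range(num_intervals):                   # Step 3
        lower = low + i * width
        upper = lower + width - 1
        f = sum(lower <= x <= upper for x in scores)
        table.append((lower, upper, f))
    return table

scores = [1, 3, 4, 6, 7, 7, 9, 11, 12, 14, 15, 18, 20]
for row in grouped_frequency(scores, 4):
    print(row)
```

The frequencies sum to the number of scores, and no two intervals share a value.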
finding percentiles step
Step 1: Identify the interval Step 2: Identify the real range for the interval -real limits: 0.5 less than lower limit; 0.5 greater than upper limit -real range one point greater than observed range Step 3: Find the position of the percentile point within the interval -distance of percentile from top of the interval -divide distance from top by total range width of percentages -multiply fraction by width of real range Step 4: Identify percentile point -subtract position of percentile point from top of the real interval
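A sketch of Steps 2 through 4 for one interval, assuming the cumulative percents at the bottom and top of that interval are already known from a cumulative percent distribution. The function name, parameter names, and numbers are hypothetical.

```python
def percentile_point(target_pct, lower_limit, upper_limit, pct_below, pct_at_top):
    """Locate a percentile point inside the interval that contains it."""
    if not pct_below <= target_pct <= pct_at_top:
        raise ValueError("target percentile is not inside this interval")
    # Step 2: real limits extend 0.5 beyond the observed limits.
    real_lower = lower_limit - 0.5
    real_upper = upper_limit + 0.5
    real_width = real_upper - real_lower
    # Step 3: position of the percentile, measured from the top.
    distance_from_top = pct_at_top - target_pct
    fraction = distance_from_top / (pct_at_top - pct_below)
    position = fraction * real_width
    # Step 4: subtract the position from the top of the real interval.
    return real_upper - position

# Interval 10-19 holds cumulative percents from 10% up to 35%.
p25 = percentile_point(25, 10, 19, pct_below=10, pct_at_top=35)
print(p25)
```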
mean
Sum of a set of scores in a distribution, divided by the total number of scores summed -most commonly reported measure of central tendency -the "balance point" in a distribution -population mean: μ = Σx/N -sample mean= M = Σx/n
experimental method/design
The end goal is to demonstrate support for a CAUSAL relationship. -high level of control is needed to isolate cause and effect. Therefore, for a study to be considered an experiment, researchers must satisfy the following requirements. 1. Manipulation (of variables that operate in an experiment) 2. Randomization (of assigning participants to conditions) 3. Comparison/control
median
The middle value in a distribution of data listed in numeric order -represents the midpoint, where half the scores fall above and half fall below the value -not influenced by outliers in data -median position = (n + 1)/2 -calculation slightly differs between odd and even number sets -always begin by placing scores in numeric order
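A minimal sketch of the median procedure: sort the scores, find the position (n + 1)/2, and average the two middle scores when n is even. The function name and scores are hypothetical.

```python
def median_value(scores):
    """Return the median of a non-empty list of scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)          # always place scores in numeric order
    n = len(ordered)
    position = (n + 1) / 2            # 1-based median position
    if n % 2 == 1:
        return ordered[int(position) - 1]
    # Even n: the position falls between the two middle scores.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median_value([7, 1, 5]))      # 5
print(median_value([7, 1, 5, 3]))   # 4.0
```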
experimental sampling
The order of selecting individuals does not matter and each individual selected is not replaced before selecting again
theoretical sampling
The order of selecting individuals matters and each individual selected is replaced before sampling again -to determine the number of samples of any size that can be selected from a population: -total number of samples possible = N^n -ex: if we had samples of two participants (n = 2) from a population of 3 people (N = 3): N^n = 3^2 = 9 samples
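Both sampling strategies can be enumerated for the N = 3, n = 2 example. This sketch uses `itertools.product` for theoretical sampling (order matters, with replacement) and `itertools.combinations` for experimental sampling (order does not matter, without replacement). The population labels are hypothetical.

```python
from itertools import combinations, product

population = ["A", "B", "C"]   # N = 3
n = 2

theoretical = list(product(population, repeat=n))   # order matters, replaced
experimental = list(combinations(population, n))    # order ignored, not replaced

print(len(theoretical), theoretical)
print(len(experimental), experimental)
```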
standard error
The σM (standard error) tells us the typical distance that M (sample mean) lies from μ (population mean).
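A sketch assuming the standard formula σM = σ/√n, which is not written out in these notes. The population SD and sample size are hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)

print(standard_error(15, 25))    # 3.0
print(standard_error(15, 100))   # 1.5, larger samples give a smaller error
```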
The upper boundary of one interval and the lower boundary of the next interval do not overlap in a simple frequency distribution. Why?
To ensure that a single score cannot be counted in more than one interval.
descriptive statistics
Used to summarize, organize, and simplify data. -summarize sample results Ex: Tables, graphs, averages
if all the scores in a data set are the same, the SD is equal to...
ZERO. when all the scores are the same, they are all equal to the mean. Their deviations = 0, as does their Standard Deviation.
research method
a set of systematic techniques used to acquire, modify, and integrate knowledge concerning observable and measurable phenomena -also known as scientific method -common research methods include experimental method, quasi-experimental method, and correlational method
median when n is even
add the two middle numbers and divide by two
Correlational
analyses for prediction 1. quantifies the strength and direction of a relationship between two (or more) variables (X and Y) 2. variables measured as they naturally occur 3. lack of random assignment -lacks control to determine cause effect -ex: what is the relationship between SAT scores (X) and freshman college GPA (Y)?
measure of variability
describe the distribution, particularly, how spread out the distribution is -measure how well an individual score represents the distribution -if there is little difference between the scores, a single value is informative -if there is a lot of difference between the scores, a single value is not so informative
interval
discrete range of values within which the frequency of a subset of scores is contained
Σx^2 + 47
instructs you to square each score, add up the squared scores, then add 47 to that sum. answer is 100
statistics
mathematical procedures used to summarize, analyze, and interpret observations
continuous
measured along a continuum; measured at any place beyond the decimal point ex: Olympic sprinter's time to finish race
central tendency
measures that tend to be toward the center of a distribution -a statistical measure -a single score to define the center of a distribution -used to locate a single score that is most representative or descriptive of all scores in a distribution -mean, median, and mode (types of c.t.) -differences in notation: population size is N; sample size is n -purpose: find the single score that is most typical or best represents the entire group
unbiased estimator
-the sample variance is an unbiased estimator: on average, it equals the population variance when we subtract one from n in the denominator -unbiased estimator--any sample statistic obtained from a randomly selected sample that equals the value of its respective population parameter, on average
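The "on average" claim can be checked by brute force on a tiny hypothetical population: list every possible sample of size 2 (order matters, with replacement), compute each sample's variance, and average the results. Exact fractions are used to avoid rounding. The helper name is illustrative.

```python
from fractions import Fraction
from itertools import product

def variance(scores, denominator):
    """Sum of squared deviations from the mean, divided by `denominator`."""
    m = Fraction(sum(scores), len(scores))
    ss = sum((x - m) ** 2 for x in scores)
    return ss / denominator

population = [2, 4, 6]   # hypothetical population, N = 3
n = 2
pop_var = variance(population, len(population))

samples = list(product(population, repeat=n))   # all 9 possible samples
mean_unbiased = sum(variance(s, n - 1) for s in samples) / len(samples)
mean_biased = sum(variance(s, n) for s in samples) / len(samples)

print(pop_var, mean_unbiased, mean_biased)   # 8/3 8/3 4/3
```

Dividing by n - 1 recovers the population variance on average; dividing by n underestimates it.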
grouped data
set of scores distributed into intervals, where the frequency of each score can fall into any one interval
ogive
summarizes the cumulative percent of continuous data at the upper boundary of each interval (dot-and-line)
frequency polygons
summarizes the frequency of CONTINUOUS data at the midpoint of each interval (dot-and-line) -Midpoint is calculated by adding upper and lower boundary of interval, then dividing by 2
bar chart
summarizes the frequency of discrete and categorical data in whole units or categories -each category is represented by a rectangle -rectangles do not touch along the x-axis
pie chart
summarizes the relative percent of discrete and categorical data into sectors -sector--represents the relative percent of a particular category
simple frequency distribution
summary display for: -the frequency of scores falling within defined groups or intervals (grouped data) in a distribution -generally more clear -the frequency of each individual score or category (ungrouped data) in a distribution
sampling distributions
tell us what values we might or might not expect to obtain for a particular statistic under a set of predefined conditions
purpose of z score
the number of standard deviations that X lies from μ on a standard scale. z = 0 ---> X = μ z > 0 ---> X > μ z < 0 ----> X < μ
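A sketch assuming the standard formula z = (X - μ)/σ, which is described but not written out in this card. The scores, mean, and SD are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

print(z_score(130, 100, 15))   # 2.0, X is above the mean
print(z_score(100, 100, 15))   # 0.0, X equals the mean
print(z_score(85, 100, 15))    # -1.0, X is below the mean
```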
quantitative
varies by amount
qualitative
varies by form or class
interquartile range (IQR)
•(IQR)--the 75th percentile minus the 25th percentile •Eliminating the top and bottom 25% of data rids data set of outliers •The location of the 1st quartile: (N+1)/4 •The location of the 3rd quartile: 3*(N+1)/4
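A sketch of the quartile locations (N + 1)/4 and 3(N + 1)/4, followed by the IQR and the SIQR (half the IQR). When a location falls between two scores, this version interpolates between them, which is an assumption; other conventions exist. Function names and data are hypothetical.

```python
def value_at_position(ordered, position):
    """Score at a 1-based position, interpolating between neighbors."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)

def quartile_summary(scores):
    """Return (Q1, Q3, IQR, SIQR) for at least three scores."""
    ordered = sorted(scores)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 scores")
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    iqr = q3 - q1
    return q1, q3, iqr, iqr / 2

print(quartile_summary([13, 1, 9, 3, 11, 5, 7]))   # (3, 11, 8, 4.0)
```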
independent variable
•Those that are manipulated by the experimenter •"Predictor"
dependent variable
•Those that are not under the experimenter's control •"Criterion"
frequency
•describes the number of times or how often a category, score, or range of scores occurs
histograms
•summarizes the frequency of CONTINUOUS data that are grouped To construct, follow three rules: Rule 1: Vertical rectangle represents each interval, and height of the rectangle equals the frequency recorded for each interval Rule 2: Base of each rectangle begins and ends at upper and lower boundaries of each interval Rule 3: Each rectangle touches adjacent rectangles at the boundaries of each interval
degrees of freedom
•the number of scores in a sample that are free to vary •all scores except one are free to vary in a sample: n - 1 •if you know the mean, and the value of all scores in the data set except one, you can perfectly predict the last score
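The last point can be shown directly: once the mean and n - 1 scores are known, the remaining score is fixed. The function name and values are hypothetical.

```python
def last_score(mean, n, known_scores):
    """Recover the one score that is not free to vary."""
    if len(known_scores) != n - 1:
        raise ValueError("exactly n - 1 scores must be known")
    # The scores must sum to n * mean, so the last one is whatever is left.
    return n * mean - sum(known_scores)

# n = 4 scores with a mean of 5; three of the scores are 3, 6, and 7.
print(last_score(5, 4, [3, 6, 7]))   # 4
```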