Statistics
Empirical study
A study based on observation or experience.
Skewness Effect - Symmetric Unimodal Distribution
Mean = Median = Mode. Unimodal means it has ONE MODE. The NORMAL DISTRIBUTION is an example of a symmetric unimodal distribution.
Skewness Effect - Right-Skewed Distribution
Mean > Median > Mode
Baseline:
Measures taken at the start of a study before any interventions; sometimes referred to as the pretest.
Chebyshev's Rule
The proportion (or fraction) of any set of data lying within k standard deviations of the mean is always at least 1 - 1/k^2, where k is any positive number greater than 1.
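A minimal sketch of the bound in Chebyshev's Rule. The function name `chebyshev_lower_bound` is an illustrative choice, not part of the original notes.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule requires k > 1")
    return 1 - 1 / k**2


# The bound holds for any data set, whatever its shape.
for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} of the data")
```

For k = 2 the bound is 75%, which is weaker than the 95% of the Empirical Rule because it makes no assumption about the distribution.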
Interquartile range:
The range of values extending from the 25th to the 75th percentile.
Mode
The value that occurs most often in a set of data.
Independent variable
The variable that is seen as having an effect on the dependent variable. In experimental designs, it is the treatment that is manipulated.
Dependent variable:
The variable that measures the effect of some other variable (eg, the variable whose values are expected to be predicted by the independent variable). Also referred to as the outcome variable or the response variable.
Mean
This "measure of center" is the AVERAGE of the values in a data set. (Mean is sensitive to extreme values.)
The Empirical Rule
This says that, in a normal (bell-shaped) distribution, about 68% of the data fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three.
Scatterplot Variables x and y
x is horizontal axis, and y is vertical axis. x is the "predictor" variable, and y is the "response" variable.
Standard scores:
z-scores; represent the deviation of scores around the mean in a distribution with a mean of "0" and a standard deviation of "1."
Percentile Calculation
i = (P/100)n. MORE TO THIS....
Comparison Test for Linear Correlation
1. Find the absolute value of the correlation coefficient r, written |r| (for example, |0.5| = 0.5 and |-0.4| = 0.4). 2. Use the Table of Critical Values for the Correlation Coefficient and select the row corresponding to the sample size n. 3. Compare |r| from Step 1 to the critical value from Step 2: a.) If |r| is greater than the critical value, you can conclude that x and y are LINEARLY CORRELATED: i.) if r > 0, then x and y are POSITIVELY CORRELATED; ii.) if r < 0, then x and y are NEGATIVELY CORRELATED. b.) If |r| is not greater than the critical value, then x and y are NOT LINEARLY CORRELATED.
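The three steps above can be sketched as a small function. The critical value must still be looked up in the table for the sample size n. The function name and the numbers in the usage line are hypothetical and are not taken from any table.

```python
def linear_correlation_conclusion(r: float, critical_value: float) -> str:
    """Compare |r| with the table's critical value and report the conclusion."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if abs(r) > critical_value:
        # Step 3a: the sign of r gives the direction of the correlation.
        return "positively correlated" if r > 0 else "negatively correlated"
    # Step 3b: |r| did not exceed the critical value.
    return "not linearly correlated"


# Hypothetical inputs: r from your data, critical value from your table row.
print(linear_correlation_conclusion(r=-0.72, critical_value=0.6))
```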
The Empirical Rule in terms of z-Scores
About 68% of the data will have z-scores between -1 and 1, about 95% between -2 and 2, and about 99.7% between -3 and 3.
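As a quick check, the three percentages can be recovered from the standard normal distribution using Python's standard library. The helper name `proportion_within` is an illustrative assumption.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(z: float) -> float:
    """Proportion of a normal distribution with z-scores between -z and z."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


for z in (1, 2, 3):
    print(f"z between -{z} and {z}: {proportion_within(z):.2%}")
```

The printed values are close to 68%, 95%, and 99.7%, which are the rounded figures used in the rule.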
Bartlett's test:
A chi-square statistic used to test the significance of lambda.
Analysis of covariance (ANCOVA):
A combination of regression and analysis of variance techniques that allows comparison of group means after adjustment for the effect of the covariate.
Standard Deviation
A common measure of the variability, or spread, of a data set. It is a typical deviation from the mean.
Detecting Outliers - IQR Method
A data value is an outlier if a. it is located 1.5(IQR) or more below Q1, or b. it is located 1.5(IQR) or more above Q3.
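A minimal sketch of the IQR method. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of Python's `statistics.quantiles`, so Q1 and Q3 may differ slightly from hand calculations. The function name and the data are made up for illustration.

```python
from statistics import quantiles


def iqr_outliers(data):
    """Return values lying 1.5(IQR) or more below Q1 or above Q3."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    return [x for x in data if x <= low_cutoff or x >= high_cutoff]


scores = [2, 4, 5, 6, 7, 8, 9, 30]  # illustrative data
print(iqr_outliers(scores))  # [30]
```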
Bar graph:
A graph used for nominal or ordinal data. A space separates the bars.
Scatterplot
A graphed cluster of dots, each of which represents the values of two variables. The slope of the points suggests the direction of the relationship between the two variables. The amount of scatter suggests the strength of the correlation (little scatter indicates high correlation).
Boxplot
A graphic display that represents the distribution of data by focusing on five key measures: Min, Q1, Q2, Q3, Max.
Box plots:
A graphic display that uses descriptive statistics based on percentiles
Bell shaped:
A graphical shape, typical of the normal distribution.
Sample:
A group selected from the population in the hope that the smaller group will be representative of the entire population.
Median
A measure of center in a set of numerical data. The median of a list of values is the value appearing at the center of a sorted version of the list - or the mean of the two central values if the list contains an even number of values. (Median is NOT sensitive to extreme values)
Variance:
A measure of the dispersion of scores around the mean. It is equal to the standard deviation squared.
Skewness
A measure of the shape of an asymmetrical distribution.
Variable:
A measured characteristic that can take on different values
Nominal measure:
A measurement scale in which the numbers have no intrinsic meaning but are merely used to label different categories. Ethnic identity, religion, and health insurance status (eg, none, Medicaid, Medicare, private) are all examples of nominal-level data.
Ratio scale:
A measurement scale in which there are both equal intervals between units and a true zero. Most biologic measures (eg, weight, pulse rate) are ratio-level variables.
Ordinal scale:
A measurement scale that ranks participants on some variable. The interval between the ranks does not necessarily have to be equal. Examples of ordinal variables are scale items that measure any subjective state (eg, happiness: very happy, somewhat happy, somewhat unhappy, very unhappy; attitude: strongly agree, somewhat agree, somewhat disagree, strongly disagree; and military rank: general, colonel, sergeant, private).
Analysis of variance (ANOVA):
A parametric statistical technique used to compare the means of three or more groups as defined by one or more factors.
Interval-level measurement:
A rank-order scale with equal intervals between units but no true zero. IQ scores, SAT scores, and GRE scores are all examples of interval-level data.
Interquartile Range (IQR)
A robust measure of variability, calculated as IQR = Q3 - Q1. It is interpreted as the spread of the middle 50% of the data, and it is NOT affected by outliers since it ignores the highest 25% and the lowest 25% of the data set.
Correlation Coefficient
A statistic, r, that summarizes the strength and direction of the linear relationship between two variables. It always takes on a value between -1 and 1, inclusive.
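One way to compute r from paired data is the usual sum-of-products formula, sketched below. The function name `pearson_r` is an illustrative assumption, and the sketch does not handle the case where one variable is constant (the denominator would be zero).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r for paired observations xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, increasing
print(pearson_r([1, 2, 3], [3, 2, 1]))        # perfectly linear, decreasing
```

The first call gives a value of 1 and the second a value of -1 (up to floating-point rounding), the two extremes of the range.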
Normal distribution:
A theoretical probability distribution in which the horizontal axis represents all possible values of a variable and the vertical axis represents the probability that these values will occur. Normal distributions are unimodal (mean, median, and mode are the same), symmetrical about the mean, and have a shape commonly described as a bell-shaped curve.
Continuous variable:
A variable that can take on any possible value within a range. For example, weight is a continuous variable because a weight of 152.5 lb makes sense. In contrast, number of children is a discrete variable because it can take on only certain values (0, 1, 2, and so on). A value of 1.2 for children does not make any conceptual sense.
Histogram:
A way of graphically displaying ordinal-, interval-, and ratio-level data. It shows the shape of the distribution.
Type II error:
Failing to reject (accepting) the null hypothesis when it is false.
Five-Number Summary
An exploratory data analysis technique that uses five numbers to summarize the data: 1. smallest value, 2. first quartile, 3. median (second quartile), 4. third quartile, and 5. largest value.
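A short sketch of the five-number summary using the standard library. It assumes the "inclusive" quartile method of `statistics.quantiles`; other quartile conventions can give slightly different Q1 and Q3. The function name and data are illustrative.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest) for a list of numbers."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


scores = [2, 4, 5, 6, 7, 8, 9, 30]  # illustrative data
print(five_number_summary(scores))  # (2, 4.75, 6.5, 8.25, 30)
```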
Outlier
An extremely large or extremely small data value relative to the rest of the data set.
Sample Variance s²
Approximately the mean of the squared deviations in the sample.
Parameters:
Characteristics of the population.
Data set:
Collection of different values of all the variables used to measure the characteristics of the sample or population
Detecting Outliers - Z-score Method
Identify an outlier by determining whether it is farther than 3 standard deviations from the mean, i.e., Z-score less than -3 or greater than 3.
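A minimal sketch of the z-score method. The notes do not say which standard deviation to use, so this example assumes the sample standard deviation (`statistics.stdev`). The function name and data are made up for illustration.

```python
from statistics import mean, stdev


def z_score_outliers(data, threshold=3.0):
    """Return values whose z-score is below -threshold or above threshold."""
    center = mean(data)
    spread = stdev(data)  # assumption: sample standard deviation
    return [x for x in data if abs((x - center) / spread) > threshold]


values = [10] * 20 + [100]  # illustrative data with one extreme value
print(z_score_outliers(values))  # [100]
```

In very small data sets no value can reach a z-score of 3, so this method is more useful for larger samples.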
Percentile Rank
Percentage of scores falling at or below a specific score. A percentile rank of 95 means that 95% of all of the scores fall at or below this point. In other words, the score is as good as or better than 95% of the scores.
Type I error:
Rejecting the null hypothesis when it is true.
z-Score
Standardized scores calculated by subtracting the mean from an individual score and dividing the result by the standard deviation; represents the deviation from the mean in a normal distribution.
Quartiles
The 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and third quartile, respectively. The quartiles can be used to divide a data set into four parts, with each part containing approximately 25% of the data.
Deviation
The difference between a data value and the mean of the data set. (The distance between the data value and the mean) If data value x > mean, deviation will be positive. If data value x < mean, deviation will be negative. If data value x = mean, deviation will be zero.
Range
The difference between the largest value and smallest value of a data set. (Range = Largest Value - Smallest Value) (A larger range is an indication of greater VARIABILITY, or greater spread, in the data set)
Population:
The entire group having some characteristic (eg, all people with depression, all residents of the United States). Often a sample is taken of the population and then the results are generalized to that population.
Statistics:
The field of study that is concerned with obtaining, describing, and interpreting data; the characteristics of samples.
Quartile:
The four "quarters" of the data distribution. The first quartile is the 25th percentile, the second quartile is the 50th percentile, the third quartile is the 75th percentile, and the fourth quartile is the 100th percentile
Control group:
The group that is used for comparison in an experimental or quasi-experimental study
Ratio-level measurement:
The highest level of measurement. In addition to equal intervals between data points, there is an absolute zero.
Alternative hypothesis (Ha):
The hypothesis that states a statistically significant relationship exists between the variables. It is the hypothesis opposite to the null hypothesis. It is also referred to as the "acting" hypothesis or the research hypothesis.
Null hypothesis:
The hypothesis that states that two or more variables being compared will not be related to each other (ie, no significant relationship between the variables will be found).
Percentile
The location of a data value relative to other values in the data set. For example, a score at the 90th percentile means that 90% of all scores are at or below that level, and 10% are higher.
Nominal:
The lowest level of measurement; consists of organizing data into discrete units.
Population Variance σ²
The mean of the squared deviations in the population.
Zero-order correlation
The measured relationship between two variables
Absolute value:
The positive numeric value of a number (the minus sign in front of the number is disregarded).
Population Standard Deviation σ
The positive square root of the population variance.
Sample Standard Deviation s
The positive square root of the sample variance s².
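The population and sample versions differ only in the divisor: the population variance divides the sum of squared deviations by n, while the sample variance divides by n - 1. A small sketch with made-up data, using Python's standard library:

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, mean = 5

# Treating the data as a whole population (divide by n).
print(pvariance(data), pstdev(data))

# Treating the data as a sample (divide by n - 1).
print(variance(data), stdev(data))
```

Here the squared deviations sum to 32, so the population variance is 32/8 = 4 and the sample variance is 32/7, slightly larger.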
Boxplot Upper and Lower Fences
Upper Fence = Q3 + 1.5(IQR); Lower Fence = Q1 - 1.5(IQR)
Central limit theorem:
When many samples of sufficiently large size are drawn from a population, the means of these samples tend to be normally distributed, regardless of the shape of the population distribution.
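A small simulation can illustrate the idea. The sketch below assumes a population of fair die rolls (which is flat, not bell shaped) and draws samples with replacement. The function name, sample size, number of samples, and seed are arbitrary illustrative choices.

```python
import random
from statistics import mean, stdev


def sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated random samples (with replacement) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        mean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


die = [1, 2, 3, 4, 5, 6]  # flat population with mean 3.5
means = sample_means(die, sample_size=30, num_samples=2000)

print(f"mean of the sample means: {mean(means):.2f}")
print(f"spread of the sample means: {stdev(means):.2f}")
```

The sample means cluster around the population mean of 3.5, and a histogram of `means` would look roughly bell shaped even though the population itself is flat.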