Definitions and equations physics: Statistics data types and distribution
Interquartile range
The range of values lying between the 1st and 3rd quartiles (Q3 − Q1), which therefore covers the middle 50% of the data. Used with non-parametric tests. Negates the effect of extreme values in a data set.
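A minimal sketch of why the interquartile range resists extreme values. The two data sets are hypothetical and differ only in their largest value. Quartile conventions vary between textbooks and software; this example assumes the "inclusive" method of Python's `statistics.quantiles`.

```python
import statistics

# Hypothetical data sets: identical except for the largest value.
with_outlier = [2, 4, 5, 7, 8, 9, 11, 12, 90]
without_outlier = [2, 4, 5, 7, 8, 9, 11, 12, 13]


def iqr(values):
    """Interquartile range Q3 - Q1 (assumes the 'inclusive' quartile method)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def full_range(values):
    """Simple range: largest value minus smallest value."""
    return max(values) - min(values)


iqr_with = iqr(with_outlier)
iqr_without = iqr(without_outlier)
range_with = full_range(with_outlier)
range_without = full_range(without_outlier)

print("IQR:", iqr_with, "vs", iqr_without)
print("Range:", range_with, "vs", range_without)
```

The extreme value changes the full range substantially but leaves the IQR unchanged.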
Confidence intervals
The range of values that will contain the true population mean with a stated percentage confidence. Used in parametric tests. The 95% CI is the sample mean ± 1.96 × SEM (the standard error of the mean), i.e. there is 95% confidence that this range of values around the sample mean contains the population mean.
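As an illustration, a 95% confidence interval can be sketched for a small sample. The heart-rate values are hypothetical, and the sketch assumes the usual convention SEM = SD/√n with the normal multiplier 1.96; for a sample this small a t-distribution multiplier would give a somewhat wider interval.

```python
import math
import statistics

# Hypothetical sample of resting heart rates (beats per minute).
sample = [72, 75, 78, 80, 81, 83, 85, 88, 90, 94]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)      # sample SD, uses n - 1
sem = sd / math.sqrt(n)            # standard error of the mean (assumed SD / sqrt(n))

# 95% confidence interval, assuming the normal multiplier 1.96.
lower = mean - 1.96 * sem
upper = mean + 1.96 * sem

print(f"mean = {mean:.1f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```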
Positively skewed
Mean > median > mode. Can often be made approximately normal by logarithmic transformation.
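A small sketch of this ordering, using a hypothetical right-skewed data set. It also checks that a log transform pulls the mean and median closer together, measured here (as an illustrative choice, not a standard skewness statistic) by the mean-median gap in units of SD.

```python
import math
import statistics

# Hypothetical positively skewed data: a long tail to the right.
data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 20]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
print("mean > median > mode:", mean > median > mode)


def scaled_gap(values):
    """Distance between mean and median, expressed in standard deviations."""
    return (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)


raw_gap = scaled_gap(data)
log_gap = scaled_gap([math.log(x) for x in data])
print(f"gap before log: {raw_gap:.2f} SD, after log: {log_gap:.2f} SD")
```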
Negatively skewed data
Mean, median and mode are separated in the opposite direction: mean < median < mode.
Positively skewed box and whisker plot
median closer to 1st quartile
Measures of the central tendency
1. Mean: the average value, i.e. the sum of the data values divided by the number of data points. x̅ denotes the sample mean and µ the population mean.
2. Median: the middle value of a data series, with 50% of the data points above it and 50% below it.
3. Mode: the most frequently occurring value in a set of data points.
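The three definitions can be computed from first principles. This is a minimal sketch on a hypothetical data set; for an even number of points it assumes the common convention of averaging the two middle values.

```python
from collections import Counter

# Hypothetical data set.
data = [3, 5, 5, 6, 8, 9, 13]

# Mean: sum of the values divided by the number of values.
mean = sum(data) / len(data)

# Median: middle value of the ordered data
# (average of the two middle values when the count is even).
ordered = sorted(data)
n = len(ordered)
mid = n // 2
median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

# Mode: most frequently occurring value.
mode = Counter(data).most_common(1)[0][0]

print(mean, median, mode)
```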
Types of data distribution: Normal
A symmetrical, bell-shaped curve around the mean, which is the same as the median and the mode.
For normal distribution
1 SD either side of the mean contains 68% of all data points; 1.96 SD contains 95%; 2 SD contains 95.4%.
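These percentages can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `coverage` is introduced here for illustration only.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage(k):
    """Fraction of a normal distribution lying within k SD either side of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1.0, 1.96, 2.0):
    print(f"within {k} SD: {coverage(k):.1%}")
```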
Sample
A group taken from the wider population that aims to be representative of it. Samples are described by statistics.
Quartile
Any one of three values that divide an ordered data set into four equal parts; the mathematical equivalent of dividing a piece of paper into 4 equal pieces. First quartile Q1: median of the lower half of the data. Middle quartile Q2: median of the whole data set. Third quartile Q3: median of the upper half of the data.
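The "median of each half" description can be sketched directly. The function name `quartiles_by_halves` is hypothetical, and the sketch assumes that when the number of points is odd the overall median is excluded from both halves; other conventions exist and give slightly different quartiles.

```python
import statistics


def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) as the medians of the lower half, whole set and upper half."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: skip the middle value itself when n is odd.
    upper = ordered[half + (n % 2):]
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )


even_result = quartiles_by_halves([1, 3, 4, 6, 7, 9, 10, 12])
odd_result = quartiles_by_halves([1, 2, 3, 4, 5, 6, 7])
print(even_result, odd_result)
```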
Qualitative data
Categorical data, i.e. non-numerical names and labels, e.g. blood group, pain scores or hair colour. Nominal: data that have no numerically significant order, e.g. blood groups. Ordinal: data that have an implicit order of magnitude, e.g. ASA score.
Hierarchy of usefulness of data according to how well it can be statistically analysed
Continuous data > ordinal data > nominal data
Quantitative data
Numerical data.
1. Discrete: data that can take only distinct, countable values, e.g. number of children.
2. Continuous: data that can take any numerical value, including fractional values, e.g. weight or height.
3. Ratio: a data series that has zero as its baseline value, i.e. zero means there is nothing to measure, e.g. a heart rate of 0 means no heart rate.
4. Interval: a data series that includes zero as a point on a larger scale, e.g. 0 °C does not mean no temperature but is rather a point on a wider scale.
Describing data
Once data have been collected they will be distributed around a central point or points. They are described by: 1. the measure of central tendency; 2. the spread of the data.
Parametric vs non-parametric tests
Parametric tests make assumptions about the nature of the data (e.g. they assume a normal distribution). Non-parametric statistical procedures rely on few or no assumptions about the shape or parameters of the population distribution from which the sample was drawn. Non-parametric tests use less information and are therefore more conservative than their parametric alternatives: if you use a non-parametric test on parametric data you lose power, i.e. you are less likely to get a significant result when a real effect (a relationship, a difference, etc.) exists. Conversely, parametric tests use more information and are therefore usually more powerful. However, if you wrongly use a parametric test on non-parametric data, you may get the wrong outcome.
Population
The entire group of individuals that the sample aims to represent. Populations are described by parameters.
Degrees of freedom
The number of values that can vary independently within a sample. Once the sample mean has been calculated, only n − 1 of the n values are free to vary, because the last one is fixed by the mean; n − 1 is therefore often used in place of the actual number of values. More generally, the degrees of freedom (df) of a statistic is the number of values in its final calculation that are free to vary. Estimates of statistical parameters can be based on different amounts of information, and the number of independent pieces of information that go into an estimate is its df. In general this equals the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation (for the sample variance this is one, since the sample mean is the only intermediate step).
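A short sketch of why one degree of freedom is lost once the mean is known: deviations from the sample mean always sum to zero, so the last deviation can be recovered from the other n − 1. The data values are hypothetical.

```python
# Hypothetical sample of four values.
data = [4, 7, 9, 12]

mean = sum(data) / len(data)
deviations = [x - mean for x in data]

# Deviations from the sample mean always sum to zero ...
total = sum(deviations)

# ... so knowing n - 1 of them fixes the last one.
known = deviations[:-1]
implied_last = -sum(known)

print(deviations, total, implied_last)
```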
Standard error of the mean
The standard deviation of a group of sample means taken from the same population (SEM). SEM = SD/√n. The SD is a measure of the spread of data around the sample mean; the SEM is a measure of the spread of a group of sample means around the true population mean.
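The idea that the SEM is the SD of many sample means can be illustrated by simulation. This is a sketch under stated assumptions: a hypothetical normal population (mean 100, SD 15), repeated samples of 25, a fixed random seed, and the usual formula SD/√n for the predicted value.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population parameters and sampling plan.
POP_MEAN, POP_SD = 100.0, 15.0
SAMPLE_SIZE = 25
N_SAMPLES = 2000

# Draw many samples and record the mean of each one.
sample_means = []
for _ in range(N_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.fmean(sample))

# The SD of the sample means should be close to SD / sqrt(n).
observed_sem = statistics.stdev(sample_means)
predicted_sem = POP_SD / math.sqrt(SAMPLE_SIZE)

print(f"observed {observed_sem:.2f}, predicted {predicted_sem:.2f}")
```

The sample means cluster around the population mean far more tightly than individual values do, which is why the SEM shrinks as the sample size grows.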
Measures of spread: variance and standard deviation (SD = square root of variance)
Variance: a measure of the spread of data around a central point. Var = ∑(x̅ − x)² / (n − 1), where ∑ denotes summation.
Standard deviation: a measure of the spread of data around a central point, calculated as follows:
1. Find the sample mean and subtract each data point from it to find the differences: x̅ − x
2. Square the results so that all values are positive, (x̅ − x)², then sum them: ∑(x̅ − x)²
3. Divide by the number of observations minus 1 (the degrees of freedom): ∑(x̅ − x)² / (n − 1) = variance
The variance may have strange units, e.g. seconds², so the square root of the variance is taken to give the SD: SD = √[∑(x̅ − x)² / (n − 1)]
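The three steps can be followed line by line on a hypothetical data set. The result is compared with `statistics.variance` and `statistics.stdev`, which also divide by n − 1.

```python
import math
import statistics

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# Step 1: sample mean and the difference of each point from it.
mean = sum(data) / n
differences = [mean - x for x in data]

# Step 2: square the differences (all positive) and sum them.
sum_of_squares = sum(d ** 2 for d in differences)

# Step 3: divide by n - 1 to get the variance.
variance = sum_of_squares / (n - 1)

# The square root returns the result to the original units.
sd = math.sqrt(variance)

print(f"variance = {variance:.3f}, SD = {sd:.3f}")
```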