Descriptive Statistics: Summarizing the Data

sample size (n) population size (N)

units for sample/population size

Absolute Range: -"If we have a dataset that has extreme outliers (positive direction or negative) we're going to want to use the median to describe the central tendency of that dataset; if we use the median, we need to be able to measure variability too and absolute range is one of those ways."

- Difference between the largest and smallest observations in a dataset (ex: if your lowest value is 1 and your highest value is 10, the absolute range can be written as 9, the difference between the two numbers, or as the pair (1, 10)).
- *Disadvantage*: the range is based solely on two observations and is likely not representative of the whole dataset (if you have outliers, it doesn't tell you where they are).
- Absolute range is particularly susceptible to outliers.
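
A minimal sketch of the absolute range, written both ways the definition above allows (a single difference, or the smallest/largest pair). The function name `absolute_range` and the sample values are made up for illustration.

```python
def absolute_range(values):
    """Return (difference, (smallest, largest)) for a non-empty dataset."""
    lo, hi = min(values), max(values)
    return hi - lo, (lo, hi)


# Hypothetical data: lowest value 1, highest value 10
scores = [1, 3, 4, 6, 7, 10]
width, endpoints = absolute_range(scores)
print(width, endpoints)  # 9 (1, 10)

# A single extreme value is enough to change the range completely
with_outlier = scores + [100]
print(absolute_range(with_outlier))  # (99, (1, 100))
```

Only the two end values enter the calculation, which is why one outlier moves the result from 9 to 99.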

Standard Error of the Mean (SEM):

- Defined as the variability of the sample means (similar in concept to the SD, but describing sample means rather than individual observations).
- Calculated as the *SD divided by the square root of the sample size*: SEM = SD / √n.
- SEM will always be *smaller* than the sample SD whenever n > 1.
- Reported in the scientific and medical literature as an intermediary step in calculating confidence intervals.
- *It describes the variability among the sample means themselves.*
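
The SEM formula above (SD divided by the square root of the sample size) can be sketched as follows. The function name and the sample values are assumptions for illustration, and the SD here is the sample SD as computed by Python's `statistics.stdev`.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """SEM = sample SD / sqrt(n). Needs at least two observations."""
    n = len(sample)
    sd = statistics.stdev(sample)  # sample SD (n - 1 in the denominator)
    return sd / math.sqrt(n)


# Hypothetical sample of 8 observations
sample = [2, 4, 4, 4, 5, 5, 7, 9]
sd = statistics.stdev(sample)
sem = standard_error_of_mean(sample)

print(round(sd, 3))   # 2.138
print(round(sem, 3))  # 0.756
print(sem < sd)       # True: dividing by sqrt(n) shrinks the SD when n > 1
```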

Standard Deviation

- Most common measure of the variability around the mean.
- Used with the mean because it is most informative when the dataset is normally distributed.
- Mean and SD are commonly used together as the most informative measures of central tendency and variability of a dataset, respectively.
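
The card above does not give the SD formula, so the sketch below assumes the usual sample SD: the square root of the summed squared deviations from the mean, divided by n - 1. The function name `sample_sd` and the data are illustrative only.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation, assuming the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


# Hypothetical data: mean is 5.5, squared deviations sum to 17.5
data = [4, 8, 6, 5, 3, 7]
print(round(sample_sd(data), 3))  # 1.871

# Cross-check against the standard library
print(math.isclose(sample_sd(data), statistics.stdev(data)))  # True
```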

Median

- The number in the middle of the ranked values.
- 50% of all ranked values are smaller and the other 50% are larger.
- *Not influenced by extreme outliers*.
- Therefore, it's useful in situations in which unusually low or high values would render the *mean* unrepresentative of the data.
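
A small sketch of why the median resists outliers while the mean does not. The numbers are made up for illustration.

```python
import statistics

# Hypothetical dataset with no outliers
values = [2, 3, 4, 5, 6]
print(statistics.mean(values), statistics.median(values))  # 4 4

# Same data plus one unusually high value
with_outlier = values + [100]
print(statistics.mean(with_outlier))    # 20
print(statistics.median(with_outlier))  # 4.5
```

The mean jumps from 4 to 20, while the median only moves from 4 to 4.5.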

Central tendency

- Tells you where the average data, the middle data, or the most common data are located.
- "A general term used to describe a single value near a point in the data that represents where the largest portion of data are located."

For a bell-curve *(normal distribution)*, the highest point on the Y-axis *(top of the bell)* is the mean *(on the X-axis)*. What happens when you go one or two standard deviations from the mean?

- About 68% of values will fall within +/- 1 SD of the mean.
- About 95% of values will fall within +/- 2 SD of the mean.
- About 99.7% of values will fall within +/- 3 SD of the mean.
- *This is true only when dealing with a NORMAL DISTRIBUTION.*
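
The 68/95/99.7 figures above can be checked against the normal curve with Python's `statistics.NormalDist`. This is a sketch; the helper name `share_within` is an assumption for illustration.

```python
from statistics import NormalDist


def share_within(k, mean=0.0, sd=1.0):
    """Fraction of a normal distribution lying within +/- k SD of the mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```

The printed shares come out at roughly 68.3%, 95.4%, and 99.7%, so the memorized figures are rounded values. The result does not depend on which mean and SD are passed in, as long as the distribution is normal.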

Distribution:

- Statistical test selection relies heavily on the distribution of data. These distributions can be summarized by central tendency and variation around the center.
- The most important distribution is the normal, or Gaussian, curve (bell-shaped curve).

When the data are not normally distributed *(skewed in the positive or negative direction)*, the median is a better representation of central tendency because the skew pulls the mean to the left *(negatively skewed curves)* or to the right *(positively skewed curves)*.

- For skewed distributions, the median is a better representation of the central tendency than the mean, and the IQR is a better measure of the variability than the SD.
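
To illustrate how skew moves the mean away from the median, the sketch below compares the two on made-up data. Comparing mean to median is only a rough heuristic for skew direction, not a formal test, and the function name is hypothetical.

```python
import statistics


def skew_direction(values):
    """Rough heuristic: the mean is pulled toward the long tail."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive (mean pulled right)"
    if mean < median:
        return "negative (mean pulled left)"
    return "roughly symmetric"


# Hypothetical data with a long right tail: mean 4.7, median 3
right_tail = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
# Hypothetical data with a long left tail: mean 16.3, median 18
left_tail = [1, 16, 17, 17, 18, 18, 18, 19, 19, 20]

print(skew_direction(right_tail))  # positive (mean pulled right)
print(skew_direction(left_tail))   # negative (mean pulled left)
```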

Mean

- Most commonly used measure of central tendency (average of all observations in a dataset).
- It *IS* influenced by extreme outliers.
- Most useful when data are symmetrically distributed without outliers (i.e., normal distribution).

Interquartile Range (IQR): - Used much more often than the absolute range (which can misrepresent the data because of its sensitivity to outliers)

- Quartiles are calculated in a way similar to the median, which splits a dataset into two equally sized groups (50th percentile).
- *Quartiles split the data into four approximately equal groups*, with the lower quartile (Ql or Q1) at the 25th percentile and the upper quartile (Qu or Q3) at the 75th percentile.
- IQR is the range between the lower and upper quartiles.
- Like the median, the IQR is particularly *useful when data are not symmetrically distributed (there are outliers in either the positive OR negative direction, not both)*.
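
A minimal sketch of the quartile and IQR calculation using Python's `statistics.quantiles`. Software packages use slightly different quartile conventions, so other tools may give slightly different cut points. This example relies on the function's default method, and the data are made up.

```python
import statistics


def interquartile_range(values):
    """Return (Q1, Q3, IQR) using the default quantile method."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q1, q3, q3 - q1


# Hypothetical data: 11 evenly spaced values
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(interquartile_range(data))  # (3.0, 9.0, 6.0)

# Adding one extreme value barely moves the IQR,
# while the absolute range jumps from 10 to 999
with_outlier = data + [1000]
print(interquartile_range(with_outlier))  # (3.25, 9.75, 6.5)
print(max(with_outlier) - min(with_outlier))  # 999
```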

Mode

- The value with the greatest frequency of occurrence.
- Not generally used because it's often NOT representative of the data, particularly when the dataset is small.
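
A short sketch of finding the mode with `statistics.multimode`, which returns every value tied for the highest frequency. The datasets are invented to show how uninformative the mode can be for small samples.

```python
import statistics

# Hypothetical data where two values tie for most frequent
values = [2, 3, 3, 5, 7, 7, 9]
modes = statistics.multimode(values)
print(modes)  # [3, 7]

# In a small dataset with no repeats, every value is a "mode"
small = [4, 8, 15]
print(statistics.multimode(small))  # [4, 8, 15]
```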

What is inferential statistics used for?

- To make predictions (i.e., infer) about a large amount of information (population) based on a sample.
- Used in decision making in many fields, including regulatory drug approval and clinical drug usage.

What is descriptive statistics used for?

-to organize, summarize, categorize, and display data (basically just summarizes data)

What graphical method best depicts variability through median and IQR *(used when you have extreme outliers)*?

- Using *boxplots*.
- Boxplots are one-dimensional graphs that can be drawn from the range, IQR, and median.
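
The numbers a boxplot is drawn from (range end points, quartiles, and median) can be collected in one summary. The sketch below assumes the default quartile method of `statistics.quantiles`. The function name and data are hypothetical, and no plotting library is used.

```python
import statistics


def five_number_summary(values):
    """Minimum, Q1, median, Q3, and maximum: the values a boxplot displays."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


# Hypothetical data with one high outlier (40)
data = [4, 7, 8, 9, 10, 12, 13, 15, 18, 21, 40]
summary = five_number_summary(data)
print(summary)
# {'min': 4, 'q1': 8.0, 'median': 12.0, 'q3': 18.0, 'max': 40}

print(summary["q3"] - summary["q1"])   # 10.0 (IQR, the box)
print(summary["max"] - summary["min"])  # 36 (absolute range)
```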

Variability measures: - Central tendency (mean, median) does NOT provide any information about the variability in a given dataset; therefore, CT is usually reported together with a measure of variability to better describe the dataset

1. Standard Deviation (SD) ---- used with mean
2. Standard Error of the Mean (SEM) ---- used with mean
3. Absolute Range ---- used with median
4. Interquartile Range (IQR) ---- used with median

Statistics is broken down into two fundamental parts:

1. descriptive statistics
2. inferential statistics

Most commonly used central tendency measures:

1. median (m)
2. mean (x̄ or μ)
3. mode (not used as much)

A dataset with a large number of outliers is best described by a *BLANK* and *BLANK* as opposed to a *BLANK2* and *BLANK2*.

BLANK: median and IQR
BLANK2: mean and SD

