math 140 stats unit one exam

how to tell shape of a distribution in a box plot

- bigger part of the plot on the right side = skewed right, since the lower half of the data has less variability than the upper half - center described by the median, spread by the IQR

where do probability distributions come from

- can come from observed data - we then calculate the relative frequency of each outcome, which represents the empirical probability for that outcome

center of the distribution

- can think of as a typical value and choose a single value of the variable to represent the entire group

other techniques to mitigate effects of confounding variables

- control group (provides a baseline for comparison since it does not receive treatment) - with human participants, use of a control group may not be enough to establish whether a treatment really has an effect, because of the placebo effect

random assignment

- controls the effects of confounding variables that a researcher cannot control directly or that are difficult to identify in advance -assign children to random treatment groups - goal of random assignment is to create similar groups with respect to age, weight, and other characteristics that may be a part of confounding variables -if random assignment works, average age for each treatment group should be about equal -goal is to create similar treatment groups because then any differences in the response variable are due to treatments -often make random assignments by flipping a coin (each participant has an equal chance of receiving any of the treatment options)

first step in data analysis

- create a graph of the distribution of the variable - in a graph that summarizes the distribution, can see the possible values of the variable and the number of individuals with each variable value or interval of values

analyze distribution of a quantitative variable

- describe overall pattern of data (shape, center, spread) and any deviations (outliers) - can use dot plots, histograms, and box plots

quantitative data analysis

- determine categorical variable -draw histogram

2 common strategies to control the effects of confounding variables

- direct control and random assignment

deviations from the mean for each point

- distances between each point and the mean (point-mean) - negative difference= data point to the left of the mean and positive difference= data point to the right of the mean - take absolute value of these differences, add them, and find the average which is a measure of spread about the mean called the average deviation from the mean (ADM)
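The ADM steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the course materials: the function name `average_deviation_from_mean` and the `scores` list are made up for the example.

```python
def average_deviation_from_mean(values):
    """Return the ADM: the average of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    # Deviation = point - mean; negative means the point is left of the mean.
    deviations = [x - mean for x in values]
    return sum(abs(d) for d in deviations) / len(values)


scores = [2, 4, 6, 8, 10]  # hypothetical data; the mean is 6
adm = average_deviation_from_mean(scores)
print(adm)  # 2.4
```

Here the deviations are -4, -2, 0, 2, 4, so the absolute values add to 12 and the average is 12/5.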

quartile marks

- divide the data set into four groups with equal counts

to create a box plot from each distribution

- draw a box from Q1 to Q3 - draw a vertical line in the box at Q2 (median) - extend a tail from Q1 to the smallest value that is not an outlier and from Q3 to the largest value that is not an outlier -indicate outliers with asterisks -long box = large IQR = middle half has large variability in data and vice versa

how can we determine if two events are independent

- e.g. if being female does not depend on being in a health science program and vice versa - when two events are dependent, this does not mean a "cause and effect" relationship exists between them -if there is a large enough difference to suggest a relationship between being female and being enrolled in the health science program, these events are dependent - can also compare the probability that a student is female with the probability that a health science student is female -if P(A given B) = P(A), the two events A and B are independent
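A rough sketch of the comparison P(A given B) versus P(A), using a two-way table. The counts below are hypothetical and were chosen so that the two probabilities come out equal; real data will rarely match exactly, so in practice we ask whether the difference is large enough to suggest a relationship.

```python
import math

# Hypothetical two-way table: (gender, program) -> count
counts = {
    ("female", "health"): 120, ("female", "other"): 480,
    ("male", "health"): 80, ("male", "other"): 320,
}

total = sum(counts.values())
female_total = counts[("female", "health")] + counts[("female", "other")]
health_total = counts[("female", "health")] + counts[("male", "health")]

# P(A): probability that a randomly selected student is female
p_female = female_total / total
# P(A given B): probability that a health science student is female
p_female_given_health = counts[("female", "health")] / health_total

independent = math.isclose(p_female, p_female_given_health)
print(p_female, p_female_given_health, independent)
```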

Population

- entire group of individuals or objects we want to study -since not possible to study the whole population, we collect data from a part of the population called a sample and we use the sample to draw conclusions about the population

probability affected discrete random variable

- for a discrete random variable like shoe size, the probability is affected by whether or not we include the end point of the interval -e.g. the area (and corresponding probability) is reduced if we consider only shoe sizes strictly less than 9

sample size

- for random samples, the bigger the sample, the more accurate -if not a random sample, a large size does not guarantee reliable results - large samples tend to be more accurate than smaller samples if chosen randomly - precision of sample depends on sample size, not population size -size of population does not affect accuracy of a random sample as long as the population is large

relative frequency

- from these counts/frequencies, we can determine a percentage of individuals with a given interval of variable values -this percentage = relative frequency -probability of an event A is the relative frequency with which that event occurs in a long series of repetitions

creating box plots in stat crunch

- graph box plot, select column, group by.... compute

how to create a histogram on stat crunch

- graph, histogram, choose variable - to compare multiple histograms, choose weight command click, hold mouse down, drag for another variable

continuous random variables

- have numeric values that can be any number in an interval - ex: the exact weight of a person, foot length, measurements, height - with a discrete variable, you can count the possible values for the variable without rounding off - with a continuous variable, you cannot -random here means that the outcomes are uncertain in the short run but have a regular distribution or predictable pattern in the long run - we reserve the term random variable for quantitative variables

symmetrical

- identifying shape of distribution sets up our analysis

when can we add probabilities

- if two events are disjoint ; this works because the events have no outcomes in common= disjoint (e.g. can't have type A and type O blood at the same time, can't have no eggs and 1 egg in a nest at the same time)

modified vs regular box plots

- in modified box plot, outliers are marked with an asterisk - for a box plot that is not modified, the tails extend to the minimum and maximum values (can't see outliers)

observations about histograms

- individual variable values are not visible -grouping individuals into bins of equal-sized intervals is particularly useful when analyzing large data sets -can easily use percentages, also called relative frequencies, to describe the distribution -descriptions of shape, center, and spread, are affected by how the bins are defined

observations about dot plots

- individual variable values are visible particularly when data set is small -descriptions of shape, center, and spread are not affected by how the dot plot is constructed - we can accurately calculate the overall range (largest-smallest)

mean and standard deviation symbols

- mean of a normal distribution locates its center (mu; μ) - Greek letter sigma (σ) represents the standard deviation of a normal distribution - the standard deviation determines the spread of the distribution; a normal curve is completely determined by specifying its mean and standard deviation

measure of center when data has an outlier

- median is a better summary since an outlier does not affect the median (it doesn't affect the order of scores), but a low outlier makes the mean lower and a high outlier makes the mean higher; the mean is then too low/high to be a representative measure - the smaller the sample, the greater impact outliers have

confounding variables

- mix up our ability to determine if the explanatory variable causes a change in the response variable - weakens cause-effect relationship between explanatory and response variables

producing data

- need a representative sample from the population - observational studies and experiments -determine what you are measuring and collection of actual data

examples of quantitative variables

- number of boreal owl eggs in a nest - number of times a college student changes major -shoe size -weight of a student -foot lengths for adults - when the outcomes are quantitative, we call the variable a random variable

outliers

- observations that fall outside the overall pattern

common survey plans that produce unreliable and potentially biased results

- online polls; voluntary response sample (biased because only people with strong opinions participate) -mall surveys (ever-present white middle class/retired people); convenience sampling

how to add a distribution overlay to histogram stat crunch

- options, edit, under display options choose overlay distribution - when you click one (e.g. normal), can specify parameters for the distribution (sample mean and sample standard deviation for normal distribution), compute

how to customize summary stats in stat crunch

- options, edit, statistics has different options you can add to the table - to select or deselect options, hold Command and click, then compute

how to modify a histogram to see the shape better on stat crunch

- options, edit, bins, width, change width to be larger - changes # of bins, easier to see skew - larger bin width = fewer bars = less detail - more detail = make bin width smaller = more bars

how to show numerical value of each bin on a histogram stat crunch

- options, edit, display, check value above bar for each bin and click compute

to add percentile stats to table (statcrunch)

- options, edit, in percentile box can enter different values (10, 90) for 10th and 90th percentiles, compute

to draw box plots horizontally instead of vertically

- options, edit, mark other option draw boxes horizontally, compute -numerical scale now on x-axis

to identify outliers in box plots stat crunch

- options, edit, other options, use fences to identify outliers, compute -click and drag to highlight outlier, corresponding row in data table is highlighted -to clear highlight, use clear button in bottom corner

how to create own custom statistic stat crunch

- options, edit, other statistic, type in (column=x) , to see different functions click build to give you list of functions that can be used -Ex, type in std(x)*std(x)= variance, add variance back into table (statistics) to verify custom statistic created, compute, should match

confounding variables in observational studies

- other factors influencing results of an observational study - difficult to remove all of the factors that may have an influence, which is why an observational study gives weak or misleading evidence of a cause-and-effect relationship

features of a probability distribution

- outcomes described by the model are random; this means that individual outcomes are uncertain, but there is a regular, predictable distribution of outcomes in a large number of repetitions - the model provides a way of assigning probabilities to all possible outcomes - the probability of each possible outcome can be viewed as the relative frequency of the outcome in a large number of repetitions, so like any other probability, it can be any value between 0 and 1 - the sum of the probabilities of all possible outcomes must be 1
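The last two features (each probability between 0 and 1, all probabilities summing to 1) can be checked mechanically. A minimal sketch follows; the helper name `is_valid_distribution` and the blood type probabilities are assumptions for illustration, not values from the course.

```python
import math


def is_valid_distribution(probabilities):
    """Check that each probability is in [0, 1] and that they sum to 1."""
    values = list(probabilities.values())
    in_range = all(0 <= p <= 1 for p in values)
    # isclose allows for floating-point rounding in the sum.
    return in_range and math.isclose(sum(values), 1.0)


# Hypothetical probability distribution for blood type
blood_types = {"O": 0.45, "A": 0.40, "B": 0.11, "AB": 0.04}
print(is_valid_distribution(blood_types))
```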

information in a well stated research question

- population -variable (what we plan to measure) - numerical characteristics about the population related to variable (e.g. average, proportion, relationship, majority, etc.)

2 types of statistical research questions

- questions about population (select sample and observational study) - questions about cause and effect (experiment)

range vs IQR

- range measures the variability of a distribution by looking at the interval covered by all the data - the IQR measures the variability of a distribution by giving us the interval covered by the middle 50% of the data

role of the normal curve in statistical inference

- relate sample means or proportions to population means or proportions

research question cause and effect

- research question that focuses on a cause and effect relationship is common in disciplines that use experiments, such as medicine or psychology - how one variable responds as another variable is manipulated -questions involve two variables -to provide convincing evidence, researcher designs an experiment

statistical adjustments

- researchers may use advanced techniques for making statistical adjustments within an observational study to control the effects of confounding variables that can influence the results - also use criteria when inferring a cause-effect relationship from observational studies -reasonable explanation: smoking causes cancer in rats, so it plausibly would in humans too - multiple observational studies performed that vary in design, so factors that confound one are not present in another

biased sampling plan elections

- sample using magazine subscriptions, lists of registered car owners, etc. -not representative of the American public; Democrats were systematically underrepresented, so the poll results did not represent the population

describe distribution of data

- shape -typical value (center) -spread - want less detail for shape but more detail for center -for spread, if total = 50, 50% = 25 bears; how far out do you have to go to reach 25 bears?

how to describe patterns in quantitative data

- shape, center, spread, and outliers

descriptions of distributions with box plots

- shape, center, spread, outliers, and 5 number summary

compute summary statistics stat crunch

- stat menu, summary stats, columns, pick which column, compute - by default produces table with 11 summary stats

how to find q1 and q3 stat crunch

- stat, summary stats, columns, pick column, compute (or can personally add q1 and q3) - in small data sets, order numbers from lowest to highest divide into 4 parts - the value separating the 1 and 2 part= Q1, the value separating the 3rd and 4th part= Q3

contingency frequency table

- stat, table, contingency, data - for comparison of multiple categories: stat, table, contingency, decide which variable goes where (row and column), get 2 way table

categorical variables

- take category or label values and place an individual into one of several groups - each observation can be placed in only one category and the categories are mutually exclusive -e.g. smoker or nonsmoker, gender, race

quantitative variables

- take numerical values and represent some kind of measurement - age (can take on multiple numerical values), weight, height, etc.

how are quartiles used to measure variability about the median

- the IQR is the distance between first and third quartile marks - the IQR is a measurement of the variability about the median, tells us the range of the middle half of the data

in a table that summarizes the distribution of a categorical variable, we can see

- the different values (categories) the variable takes - how many times each value occurs (count) and how often each value occurs (converting counts to proportions)

joint probability equation

- the joint probability equals the product of the marginal and conditional probabilities - marginal probability * conditional probability = joint probability -P(A and B) = P(A) * P(B given A) - P(A and B) = P(A) * P(B) only when the two events are independent
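A small numeric check of P(A and B) = P(A) * P(B given A), using hypothetical counts (200 students, 120 female, 30 of whom are both female and in health sciences). The variable names are illustrative only.

```python
import math

# Hypothetical counts from a two-way table
total_students = 200
female = 120               # event A
female_and_health = 30     # event A and B

p_a = female / total_students            # marginal probability P(A)
p_b_given_a = female_and_health / female  # conditional probability P(B given A)

# Joint probability as marginal * conditional
p_a_and_b = p_a * p_b_given_a
# Joint probability computed directly: inner cell / overall total
p_direct = female_and_health / total_students

print(p_a_and_b, p_direct)
```

Both routes give the same joint probability, which is the point of the equation.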

Probability

- the machinery behind inferences since we infer something about a population based on a sample - since the sample is not necessarily the population, probability is involved since samples vary

margins of a two way table

- the numbers in the margins are totals for each row or column -where a row and column cross is where we see the number of individuals who fit a particular portion of each category

in a graph that summarizes the distribution of a quantitative variable, we can see...

- the possible values of the variable - the number of individuals with each variable value or interval of values

how to measure spread

- the spread of a distribution is a description of how the data varies -can use range, IQR, and standard deviation - when we use the median, Q1 to Q3 gives a typical range of values associated with the middle 50% of the data, and when we use the mean, mean + or - SD gives a typical range of values

when do we use standard deviation

- to compare the variability of two distributions - incorporate the standard deviation into our description of the pattern in the distribution of a quantitative variable

explanatory variable vs response variable

- to establish a cause and effect relationship, want to make sure the explanatory variable is the only thing that impacts the response variable - remove other factors affecting the response and manipulate only explanatory variable

exploratory data analysis

- to make sense of the data, need to explore and summarize it using graphs and different numerical measures (percentages and averages)

range of typical values

- to represent common variable values for the group (bin widths)

two way tables

- two-way tables for two categorical variables give us a useful snapshot of all of the data organized in terms of the two variables of interest - give us a practical context for talking about probability

ADM

- typical range of values (within 1 ADM of the mean) contains more than half of the values in the data set - ADM measures the average distance of the data from the mean - the larger the ADM, the more variable it is

how to adjust a histogram to show relative frequency (density) on y axis stat crunch

- under options, choose edit - in the type box, change frequency to relative frequency then click compute

scores with an outlier

- use median and IQR - outlier increases (or decreases) standard deviation and mean which makes it seem the data is more variable - the typical range based on the first and 3rd quartiles give a better summary since outlier does not affect quartile marks - same applies to skewed data

measuring spread about the mean

- use standard deviation (spread= +/- one std above and below the mean)

when to use mean vs median

- use the mean as a measure of center only for distributions that are reasonably symmetric with a central peak -use the median as a measure of center for all other cases - use median when outliers are present

choosing numerical summaries

- use the mean and standard deviation as measures of center and spread only for distributions that are reasonably symmetric with a central peak -when outliers are present or the data is skewed, use the IQR and 5 number summary

investigating the relationship between 2 categorical variables

- use the values of the explanatory variable to define the comparison groups - we then compare the distributions of the response variable for values of the explanatory variable - we look at how the pattern of conditional percentages differs between the values of the explanatory variable

likelihood/chance/probability statements

- use data to make a statement about the likelihood that a randomly selected student from a specific college is a health science major - the risk associated with not wearing a seat belt - the chance of a positive drug test for someone who does not use drugs when the test is 94% accurate

using the IQR to identify outliers

- a value is an outlier if it is greater than Q3 + 1.5*IQR or less than Q1 - 1.5*IQR
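The 1.5*IQR rule can be sketched as follows. The quartiles are passed in directly because software packages compute Q1 and Q3 with slightly different conventions; the function names and the example quartiles are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences from the 1.5*IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, q1, q3):
    """A value is an outlier if it falls outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper


# Hypothetical quartiles: Q1 = 10, Q3 = 20, so IQR = 10
print(outlier_fences(10, 20))   # (-5.0, 35.0)
print(is_outlier(40, 10, 20))   # True
print(is_outlier(30, 10, 20))   # False
```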

what do we mean when we say an event is random or due to chance

- we mean the event is unpredictable in the short run but has a regular and predictable behavior in the long run -true when tossing a coin; cannot predict whether an individual toss will be heads, but in the long run the outcomes have a predictable pattern (relative frequency of heads is close to 0.5) - can make probability statements only about random events

probabilities of all possible outcomes

- we think of all possible outcomes as variable values -each variable value has a probability - the variable values together with their probabilities are a probability distribution

risk

- when calculating the probability of a negative outcome - risk= another word for probability -often compare 2 risks by calculating the percentage change -calculate the difference (how much the risk is changed) and divide by the risk for the placebo group - percentage reduction of risk= (new treatment risk-reference (placebo) risk)/reference (placebo) risk
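A short sketch of the percentage change calculation described above. The risks are hypothetical (30 of 1000 negative outcomes with the new treatment versus 40 of 1000 with the placebo), and the function name is made up for the example.

```python
def percentage_change_in_risk(treatment_risk, reference_risk):
    """(treatment risk - reference risk) / reference risk, as a percentage."""
    return (treatment_risk - reference_risk) / reference_risk * 100


# Hypothetical risks: probability of the negative outcome in each group
placebo_risk = 40 / 1000
treatment_risk = 30 / 1000

change = percentage_change_in_risk(treatment_risk, placebo_risk)
print(round(change, 1))  # negative value means the risk was reduced
```

With these numbers the change is about -25%, that is, the treatment reduces the risk by about a quarter relative to the placebo.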

why does changing the bin size and the starting point of the first bin change the histogram so drastically

- when we change the bins, the data gets grouped differently, which affects the appearance of the histogram - avoid histograms with too large or too small bin widths since they don't help us see variability or patterns in data

what happens to the probability histogram when our continuous random variable has more precision

- when we increase the precision of the measurement, we will have a larger number of bins in our histogram because each bin contains measurements that fall within a smaller interval of values - as the width of the intervals of the bins gets smaller, the probability histogram gets closer to the curve - if we continue to reduce the size of the intervals, the curve becomes a better and better way to estimate the probability histogram -normal distributions are normally used for continuous random variables

spread about the median

- which distribution has more variability - determines how you measure spread (e.g. by range), but don't use range to describe how data is distributed about the median -to measure variability about the median, use quartiles - if it has more data close to the median, the data set has less variability about the median

standard deviation vs mean sample vs model

- x bar = mean of data in a sample, mu (μ) = mean of a density curve defined by a mathematical model -s represents standard deviation of data in a sample; sigma (σ) represents the standard deviation of a density curve defined by a mathematical model

conditional percentages

-A way to approximate a percentage by dividing the number of times an event occurred in an experiment by the total number of respondents in that row or column. See relative frequency. - based on a specific condition -conditional percentages = numeric summary - conditional percentages are calculated separately for each value of the explanatory variable - when we try to understand the relationship between 2 categorical variables, we compare the distributions of the response variable for values of the explanatory variable - we look at how the pattern of conditional percentages differs between the values of the explanatory variable

addition rule

-For mutually exclusive (disjoint) events, the probability of one or the other occurring is the sum of the probabilities of each event: P(A or B) = P(A) + P(B)
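A tiny illustration of adding probabilities for disjoint events, using the blood type example (a person cannot be both type A and type O). The probabilities are hypothetical.

```python
# Hypothetical probabilities for two disjoint events
p_type_a = 0.40
p_type_o = 0.45

# Disjoint events share no outcomes, so the probabilities simply add.
p_a_or_o = p_type_a + p_type_o
print(round(p_a_or_o, 2))  # 0.85
```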

marginal proportions

-Ratios of the row or column totals to the overall total number of observations -don't help us determine the relationship between two categorical variables because they involve only one of the variables

Theoretical Probability

-What should occur or what we expect to happen in an experiment -expect for a coin toss to be 50/50 heads or tails - we determine the number of ways an event can occur and divide by the total number of possible outcomes - in situations where the outcomes are equally likely, we can use mathematics to calculate the probability instead of collecting data

exploring the relationship between 2 categorical variables

-amounts to comparing the distributions of the response variable for different values of the explanatory variable

median

-another way to identify typical value - middle of the data when the values are listed in order - divides the data into 2 equal groups, with equal amounts of data below and above it

census

-attempt to include every individual from a population in a sample

mean

-average, x bar - to calculate the mean, add data values and divide by # of data points -mean is the fair share measure of center - we can understand the mean as the score Beth would have on every assignment if she always made the same grade -does not give us information about any individual homework score or about how the homework scores vary - also known as the balancing point of a distribution since the distances between each data point and the mean are balanced on each side of the mean -distances below the mean = negative, above the mean = positive

placebo effect

-because people in medical experiments improve when taking a placebo, a placebo group provides a baseline for comparing treatment - if a treatment produces better results than a placebo, have evidence that treatment is responsible for improvement

random sampling

-best way to eliminate bias - collecting a random sample is like pulling names from a hat; in a simple random sample everyone in the population has an equal chance of being chosen -also guarantees that the sample results do not change significantly from sample to sample; variability in results is due to chance

typical value

-center of distribution -normally the tallest bin (since it has the highest frequency) -can calculate by taking the entire amount, dividing by 2, and counting frequencies upward until you land on the bar with the middle value -changing bin width changes the typical value, but the results should be similar

Boxplots

-commonly used to summarize a distribution of a quantitative variable

inference

-conclusion we reach from our sample data that answers our original question about the population - to learn and draw conclusions about the opinions of the entire population based on our sample

examples of observational study and experiment

-conducting a survey, dividing class into two groups, one listening to music and one not, and having them keep a journal -experiment: word puzzles without music and word puzzles with music, then calculating average number of words found

sampling plan

-describes exactly how we will choose the sample - a sampling plan is biased if it systematically favors certain outcomes - focus on surveys as sampling plans

steps used to rule out confounding variables

-direct control - random assignment -control group -placebo group -blinding -does not rule out chance variation between treatment groups

single blind experiment

-either researcher or participant does not know which treatment the participants receive

why can raw counts be misleading

-ex: if there are more females than males, comparing raw counts is misleading -instead we compare the percentage of females who responded to each category - by converting to percentages, we are reporting the results as though there are 100 females and 100 males - in general, we need to supplement our display (2 way table) with numeric summaries that allow us to compare the distributions; therefore, we always convert counts to percentages
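To see why raw counts mislead, here is a sketch with hypothetical survey counts in which females outnumber males two to one. The helper `conditional_percentages` and the data are assumptions for illustration.

```python
def conditional_percentages(row):
    """Convert one group's counts to percentages of that group's total."""
    total = sum(row.values())
    return {category: 100 * count / total for category, count in row.items()}


# Hypothetical two-way table: explanatory variable (gender) defines the rows
survey = {
    "female": {"yes": 60, "no": 140},
    "male": {"yes": 30, "no": 70},
}

by_group = {group: conditional_percentages(row) for group, row in survey.items()}
print(by_group["female"]["yes"], by_group["male"]["yes"])  # 30.0 30.0
```

The raw counts (60 versus 30) suggest a difference, but the conditional percentages show that 30% of each group answered yes.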

two types of statistical investigations (producing data )

-experiments and observational studies -our approach to collecting data determines what we can conclude from the data

Dotplot

-graphs a dot for each case against a single axis - vertical axis = count/frequency, horizontal axis = variable values - can say how much protein varies (1-6 grams) by using range - most of the cereals have 1-2 grams of protein

discrete random variables

-have numeric values that can be listed and often can be counted -e.g. the variable number of boreal owl eggs is a discrete random variable -shoe size = discrete random variable - blood type is categorical

blinding

-in experiments that use a placebo, participants do not know whether they are receiving drug or placebo - are blind to the treatment to prevent their own beliefs about the drug or placebo from confounding the results

direct control

-influencing length of time in washing hands (want all groups to wash hands for the same time) - amount of soap (all participants use one squirt) - stabilizing impact of confounding variable across treatments; differences in response variable cannot be due to differences in confounding variables

experiment

-intentionally manipulates one variable in an attempt to cause an effect on another variable - cause and effect relationship between 2 variables

probability distribution

-list of possible outcomes with associated probabilities -each variable value is assigned a probability - sum of all probabilities = 1 - e.g. each blood type has a corresponding probability - the probabilities are numbers between 0 and 1 since each probability is a relative frequency - all outcomes are assigned a probability -the outcomes are random events; when we randomly choose a person we do not know their blood type, but there is a predictable pattern in the outcomes that is described by the relative frequencies

bell shaped curve

-normal distribution/normal curves -continuous random variables - indicates that values closer to the mean are more likely and it becomes increasingly unlikely to take values far from the mean in either direction -even though all normal curves have the same bell shape, they vary in their center and spread

what information does a box plot not give us

-number of data points in the data set - number of data points within each quartile (even though each quartile contains the same number of data points) -pattern of the data within each quartile

observational study

-observes individuals and measures variables of interest but does not attempt to influence the responses -main purpose is to describe a group of individuals or to investigate an association between two variables; can investigate a relationship, but since it does not manipulate one variable to cause an effect in another, it does not provide convincing evidence of a cause and effect relationship

how to change starting point/bin width of a histogram stat crunch

-options, edit, bins, (can enter start at point and width)

to represent a probability distribution of a random variable

-probability histogram -probability distribution of a random variable X can be represented by a table that provides a way to assign probabilities to outcomes

how to use normal calculator stat crunch

-stat, calculator, normal

well designed experiment

-takes steps to eliminate the effects of confounding variables including random assignment of people to treatment groups, use of a placebo, or blind conditions

Conditional Probability

-the likelihood that a target behavior will occur in a given circumstance -e.g. if we select a female student at random, what is the probability that she is in the health sciences program -starting with a female (condition), then asking what is the probability that female is in the health sciences - a condition is given -can also be represented by a vertical bar - the probability of a categorical variable taking on a particular value given the condition that the other categorical variable has some particular value -only using a subset of the data, which is determined by the given condition

Complement Rule

-the probability of an event occurring is 1 minus the probability that it doesn't occur -e.g. P(not a universal donor) = P(blood type is not O) = 1 - P(type O) -the complement of event A (blood type is O) is the event composed of all blood types except for O
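A one-step illustration of the complement rule for the universal donor example. The value used for P(type O) is hypothetical.

```python
# Hypothetical probability that a randomly chosen person has type O blood
p_type_o = 0.45

# Complement rule: P(not O) = 1 - P(O)
p_not_type_o = 1 - p_type_o
print(round(p_not_type_o, 2))  # 0.55
```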

Joint Probability

-the probability of the intersection of two events -e.g. the probability that a randomly selected student is both female and in the health sciences program - when we calculate a joint probability, we divide the count from an inner cell of the table by the overall total count in the lower right corner - the probability that 2 categorical variables each take on a specific value

Marginal Probability

-the values in the margins of a joint probability table that provide the probabilities of each event separately -same as marginal proportion - the probability of a categorical variable taking on a particular value without regard to the other categorical variable - use overall student data contained in the margins of the table - do not take into account the other categorical variable

spread

-the variability of the data - can measure using range but the outliers make it seem like data is much more variable than it is in reality (seen with salaries) -normally can look 1 bar above and 1 below - focus on middle 50% and how spread out is the middle 50% of the data

bins

-variable values divided into equal sized intervals (each bin is a bar) - height of bin= count/frequency

empirical probability

..., involves conducting an experiment to observe the frequency with which an event occurs -actually collecting data; actually flipping a coin multiple times (need a large sample) -empirical probability gets closer to theoretical probability with larger samples - an estimate, using data, of the likelihood that the event will happen
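
A small simulation sketch of the coin-flipping idea above, assuming a fair coin (theoretical probability of heads = 0.5). The function name and the seed are illustrative choices, not part of the course material.

```python
import random

def empirical_probability_of_heads(num_flips, seed=0):
    """Flip a simulated fair coin num_flips times and return the relative frequency of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# Larger samples tend to land closer to the theoretical value of 0.5.
for n in (10, 100, 10_000, 100_000):
    print(n, empirical_probability_of_heads(n))
```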

Big picture of stats

1. Producing Data 2. Exploratory Data Analysis 3. Probability 4. Inference

why can't you talk about the shape for categorical data

- bars can be rearranged - can talk about typical value -for variability of categorical data, look at the # of bars (e.g. 1-2 categories vs. 12 categories)

why it is important to identify the explanatory variable

- because we always use the totals for the explanatory variable to calculate percentages

inflection point normal curve

- the x-values of the inflection points correspond to 1 standard deviation above and below the mean - at an inflection point the curve changes the direction of its bend, going from bending upward to bending downward or vice versa

standard deviation

- a measurement of spread about the mean, similar to the average deviation - think of standard deviation as roughly the average distance of data from the mean, approximately equal to the average deviation

random variable

- a quantitative variable with outcomes that occur as a result of some random process (discrete and continuous) - a probability distribution of a random variable tells us the probabilities of all the possible outcomes (for discrete random variables) of the variable or ranges of values (for continuous random variables) - a probability distribution shows us the regular, predictable distribution of outcomes in a large number of repetitions of a random variable -for a discrete random variable, the probabilities of values are areas of the corresponding regions of the probability histogram for the variable

representative sample

- a subset of the population that reflects the characteristics of the population

histogram

- another way to display the distribution of a quantitative variable - histograms are useful for large data sets since they divide the variable values into equal sized intervals

questions asking to calculate relative frequency

- approximately what percentage of the sample has hip measurements between 85 and 90cm? - what percentage of the sample will wear large size sweat pants? - in these calculations, we assume that the value of the left-hand endpoint of each bin is included in the count for that bin and not the right-hand endpoint - bin for interval 85-90 has values of 85 but not 90
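
A short sketch of the bin convention above (left-hand endpoint included, right-hand endpoint excluded). The hip measurements are invented for illustration, and `relative_frequency` is a hypothetical helper name.

```python
def relative_frequency(values, left, right):
    """Share of values in the bin [left, right): includes left, excludes right."""
    in_bin = [v for v in values if left <= v < right]
    return len(in_bin) / len(values)

# Made-up hip measurements in cm.
hips = [82, 85, 86.5, 88, 89.9, 90, 92, 95, 97.5, 101]

# 85 is counted in the 85-90 bin; 90 is counted in the next bin.
share_85_to_90 = relative_frequency(hips, 85, 90)
print(f"{share_85_to_90:.0%} of the sample is between 85 and 90 cm")
```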

right skewed

a distribution with a tail that extends to the right

left skewed

A density curve where the left side of the distribution extends in a long tail. (Mean < median.)

mutually exclusive

Events that cannot occur at the same time.

uniform shape

a rectangular shape with the same amount of data for each variable value

empirical rule for normal curves

- 68% of values fall within 1 std of the mean (the central 68% of the data) - 95% of values fall within 2 stds of the mean; therefore it is unlikely for a value to fall more than 2 stds away from the mean -values more than 2 stds away from the mean in a normal distribution= outliers - 99.7% of values fall within 3 stds of the mean - extremely unlikely for a value to fall more than 3 standard deviations away from the mean -values more than 3 standard deviations away from the mean are often called extreme outliers
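
As a quick check of the 68 / 95 / 99.7 figures above, this sketch uses Python's standard-library `statistics.NormalDist` (Python 3.8+) to compute the area under a normal curve within k standard deviations of the mean. The helper name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} std: {share_within(k):.1%}")
```

The result is the same for any mean and standard deviation, since the rule is stated in units of standard deviations.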

to find mean of a data set stat crunch

stat, summary stats, columns, pick column (variable), pick statistics - to specifically choose one, options, edit, statistics column and look for IQR

independent events

The outcome of one event does not affect the outcome of the second event

std formula

subtract the mean from each data point, square each result, add up the squared results for all data points, divide by the number of data points - 1, then take the square root
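
A step-by-step sketch of the sample standard deviation: deviations from the mean, squared, summed, divided by n - 1, then square-rooted. The data values are made up, and the result is compared against the standard library's `statistics.stdev` as a sanity check.

```python
import math
import statistics

def sample_std(data):
    """Sample standard deviation computed one step at a time."""
    mean = sum(data) / len(data)
    squared_deviations = [(x - mean) ** 2 for x in data]  # subtract the mean, then square
    variance = sum(squared_deviations) / (len(data) - 1)  # add up, divide by n - 1
    return math.sqrt(variance)                            # square root gives the std

# Made-up data set for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_std(data), statistics.stdev(data))
```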

2 types of variables

categorical and quantitative

data

consists of individuals and variables that give us information about those individuals (object or person) -variable= an attribute, such as measurement or a label

what does IQR tell us

how spread out the middle 50% of the data is

5 number summary

min, Q1, median, Q3, max - median=Q2 -some quarters of the data exhibit more variability than others even though each contains the same amount of data - 25% of the data fall at or below Q1, 50% at or below Q2, 75% at or below Q3 - uses quartiles to identify center and spread of a distribution - values between Q1 and Q3 give a typical range of values
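
A minimal sketch of a five-number summary, assuming the "median of each half" convention for Q1 and Q3 (with the overall median left out of both halves when the number of values is odd). Software packages use several different quartile conventions, so results from StatCrunch or other tools may differ slightly. The data values are made up.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, skip the middle value so it is in neither half.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return min(data), median(lower), median(data), median(upper), max(data)

# Made-up data set for illustration.
low, q1, med, q3, high = five_number_summary([1, 3, 5, 7, 9, 11, 13, 15])
iqr = q3 - q1  # spread of the middle 50% of the data
print(low, q1, med, q3, high, iqr)
```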

shape

to describe the shape of a distribution imagine sketching the outline of the data to emphasize the general trend

