AP Stats Ch. 5/6
Normal model
A useful family of models for unimodal, symmetric distributions.
Statistic
A value calculated from data to summarize aspects of the data. For example, the mean and standard deviation are statistics.
Standardized value
A value found by subtracting the mean and dividing by the standard deviation.
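As a minimal sketch of that calculation (the exam scores and the helper name `z_scores` are made up for illustration), standardizing a small data set in Python might look like this:

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize: subtract the mean, then divide by the standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]

scores = [62, 70, 75, 81, 92]  # hypothetical exam scores
z = z_scores(scores)
print([round(v, 2) for v in z])
```

The standardized values have mean 0 and standard deviation 1, whatever units the original scores were in.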
Shifting
Adding a constant to each data value adds the same constant to the mean, the median, and the quartiles, but does not change the standard deviation or IQR.
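A quick way to see this is to shift a small made-up data set and compare summaries. This is only a sketch: the data, the constant 10, and the `iqr` helper are assumptions for illustration, and `statistics.quantiles` uses one of several quartile conventions, so the quartiles may differ slightly from a textbook's.

```python
from statistics import mean, median, stdev, quantiles

def iqr(values):
    """Interquartile range, Q3 - Q1, using the library's default quartile method."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

data = [3, 7, 8, 12, 15, 21, 30]   # hypothetical values
shifted = [v + 10 for v in data]   # add the same constant to every value

print("mean:  ", mean(data), "->", mean(shifted))      # moves by 10
print("median:", median(data), "->", median(shifted))  # moves by 10
print("SD:    ", stdev(data), "->", stdev(shifted))    # unchanged
print("IQR:   ", iqr(data), "->", iqr(shifted))        # unchanged
```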
assumptions.
All models make them. Whenever we model, we'll be careful to point out the assumptions that we're making, and we'll check the associated conditions in the data to make sure that those assumptions are reasonable.
Outlier
Any point more than 1.5 IQR from either end of the box in a boxplot is nominated as an outlier.
Far Outlier
If a point is more than 3.0 IQR from either end of the box in a boxplot, it is nominated as a far outlier.
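The 1.5 IQR and 3.0 IQR rules can be sketched in a few lines. The data and the function name `classify_outliers` are hypothetical, and the quartiles come from `statistics.quantiles`, whose default method may not match every textbook's quartile convention.

```python
from statistics import quantiles

def classify_outliers(values):
    """Nominate outliers (> 1.5 IQR from the box) and far outliers (> 3.0 IQR)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    result = {"outliers": [], "far_outliers": []}
    for v in values:
        distance = max(q1 - v, v - q3)  # how far v lies beyond either end of the box
        if distance > 3.0 * iqr:
            result["far_outliers"].append(v)
        elif distance > 1.5 * iqr:
            result["outliers"].append(v)
    return result

data = [10, 12, 13, 14, 15, 16, 18, 19, 20, 35, 60]  # made-up values
print(classify_outliers(data))
```

Here the box runs from 13 to 20 (IQR 7), so 35 is nominated as an outlier and 60 as a far outlier.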
outliers
If the data have outliers and you can correct them, you should do so. If they are clearly wrong or impossible, you should remove them and report on them. Otherwise, consider summarizing the data both with and without the outliers. Grouping data in different ways can allow different cases to emerge as possible outliers.
Normal probability plot.
If the distribution of the data is roughly Normal, the plot is roughly a diagonal straight line. Deviations from a straight line indicate that the distribution is not Normal.
Parameter
A numerically valued attribute of a model. For example, the values of μ and σ in a N(μ, σ) model are parameters.
Comparing boxplots
When comparing groups with boxplots: -Compare the shapes. Do the boxes look symmetric or skewed? Are there differences between groups? -Compare the medians. Which group has the higher center? Is there any pattern to the medians? -Compare the IQRs. Which group is more spread out? Is there any pattern to how the IQRs change? -Using the IQRs as a background measure of variation, do the medians seem to be different, or do they just vary much as you'd expect from the overall variation? -Check for possible outliers. Identify them if you can and discuss why they might be unusual. Of course, correct them if you find that they are errors.
Comparing distributions
When comparing the distributions of several groups using histograms or stem-and-leaf displays, consider their shape, center, and spread.
far outliers
Data values farther than 3 IQRs from the quartiles.
Standardizing
We standardize to eliminate units. Standardized values can be compared and combined even if the original variables had different units and magnitudes.
statistics
summaries of data
standardized values
z-scores. Standardizing uses the standard deviation as a ruler to measure distance from the mean, creating z-scores. Standardizing into z-scores does not change the shape of the distribution of a variable; it changes the center by making the mean 0 and the spread by making the standard deviation 1. Changing the center and spread of a variable is equivalent to changing its units. All other aspects of the context do not depend on the choice or modification of measurement units. This fact points out an important distinction between the numbers the data provide for calculation and the meaning of the variables and the relationships among them. Standardizing can make the numbers easier to work with, but it does not alter the meaning.
Timeplot
A timeplot displays data that change over time. Often, successive values are connected with lines to show trends more clearly. Sometimes a smooth curve is added to the plot to help show long-term patterns and trends.
Standard Normal model
A Normal model, N(μ, σ), with mean μ = 0 and standard deviation σ = 1. Also called the standard Normal distribution.
Boxplot
A boxplot displays the 5-number summary as a central box with whiskers that extend to the non-outlying data values. Boxplots are particularly effective for comparing groups and for displaying outliers.
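The 5-number summary behind a boxplot (minimum, Q1, median, Q3, maximum) can be computed as in the sketch below. The data and the helper name `five_number_summary` are made up, and the quartile method is the default of `statistics.quantiles`, which is an assumption that may differ from a given textbook or calculator.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, med, q3 = quantiles(values, n=4)
    return (min(values), q1, med, q3, max(values))

data = [10, 12, 13, 14, 15, 16, 18, 19, 20, 35, 60]  # hypothetical values
print(five_number_summary(data))
```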
Normal probability plot
A display to help assess whether a distribution of data is approximately Normal. If the plot is nearly straight, the data satisfy the Nearly Normal Condition.
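One way to build the points for such a plot is to pair each sorted data value with the z-score a Normal model would expect for that position. The sketch below assumes the plotting positions (i - 0.5)/n, which is only one common convention, and uses made-up data; software and calculators may use a slightly different formula.

```python
from statistics import NormalDist

def normal_plot_points(values):
    """Pair each sorted data value with the z-score expected under a Normal model."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(values)
    # (i - 0.5) / n is an assumed plotting-position convention
    expected_z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected_z, ordered))

mileage = [18.2, 21.5, 22.1, 23.0, 23.8, 24.4, 25.1, 26.3, 27.0, 30.9]  # hypothetical
points = normal_plot_points(mileage)
for z, y in points:
    print(f"{z:6.2f}  {y}")
```

Plotting the second column against the first gives the Normal probability plot; a roughly straight line suggests the data are roughly Normal.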
Nearly Normal Condition
A distribution is nearly Normal if it is unimodal and symmetric. We can check by looking at a histogram or a Normal probability plot.
68-95-99.7 Rule
In a Normal model, about 68% of values fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations of the mean, and about 99.7% fall within 3 standard deviations of the mean.
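These percentages can be checked against the standard Normal model with Python's built-in `statistics.NormalDist`. The helper name `fraction_within` is just for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a Normal model lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```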
Rescaling
Multiplying each data value by a constant multiplies both the measures of position (mean, median, and quartiles) and the measures of spread (standard deviation and IQR) by that constant.
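A unit conversion is a typical rescaling. The sketch below converts made-up heights from inches to centimeters (multiplying by 2.54) and compares the summaries; the `iqr` helper relies on the default quartile method of `statistics.quantiles`, which is an assumption.

```python
from statistics import mean, median, stdev, quantiles

def iqr(values):
    """Interquartile range, Q3 - Q1."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

heights_in = [60, 63, 65, 66, 68, 70, 74]     # hypothetical heights in inches
heights_cm = [h * 2.54 for h in heights_in]   # rescale every value

for name, stat in [("mean", mean), ("median", median), ("SD", stdev), ("IQR", iqr)]:
    ratio = stat(heights_cm) / stat(heights_in)
    print(f"{name}: ratio cm/in = {ratio:.2f}")  # each ratio is 2.54
```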
boxplot
Once we have a 5-number summary of a (quantitative) variable, we can display that information in a boxplot. Boxplots are very effective for comparing groups graphically. When we compare groups, we discuss their shapes, centers, and spreads, and any unusual features.
avoid inconsistent scales.
Parts of displays should be mutually consistent: no fair changing scales in the middle or plotting two variables on different scales but on the same display. When comparing two groups, be sure to compare them on the same scale.
N(0,1): standard Normal model (or the standard Normal distribution)
The Normal model with mean 0 and standard deviation 1.
Normal percentile
The Normal percentile corresponding to a z-score gives the percentage of values in a standard Normal distribution found at that z-score or below.
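With technology, the lookup can be done directly. This sketch uses Python's `statistics.NormalDist`; the model N(500, 100), the value 650, and the function name `normal_percentile` are hypothetical, chosen only to illustrate the steps.

```python
from statistics import NormalDist

def normal_percentile(value, mu, sigma):
    """Convert a value to a z-score, then find the percent of the model at or below it."""
    z = (value - mu) / sigma
    percentile = 100 * NormalDist().cdf(z)  # standard Normal model, N(0, 1)
    return z, percentile

z, pct = normal_percentile(650, mu=500, sigma=100)
print(f"z = {z:.2f}, percentile = {pct:.1f}%")  # z = 1.50, percentile = 93.3%
```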
Nearly Normal Condition.
The shape of the data's distribution is unimodal and symmetric. When we have data, check this by making a histogram or a Normal probability plot to be sure we can use the Normal model to model the data's distribution. Other times, we may be told that a Normal model is appropriate based on prior knowledge of the situation or on theoretical considerations.
normal percentiles
When the value doesn't fall exactly 1, 2, or 3 standard deviations from the mean, we can look it up in a table of Normal percentiles or use technology. To use the table, first convert the data value to a z-score.
rescale
When we multiply (or divide) all the data values by any constant, all measures of position (such as the mean, median, and percentiles) and measures of spread (such as the range, the IQR, and the standard deviation) are multiplied (or divided) by that same constant. Rescaling changes all the summary statistics: center, position, and spread.
Normality Assumption
When we use the Normal model, we assume that the distribution of the data is Normal. Practically speaking, there's no way to check whether this Normality Assumption is true.
Normal models
"Bell-shaped curves." Statisticians call them Normal models. They are appropriate for distributions whose shapes are unimodal and roughly symmetric.
empirical rule/68-95-99.7 Rule
In a Normal model, about 68% of the values fall within 1 standard deviation of the mean, about 95% of the values fall within 2 standard deviations of the mean, and about 99.7% (almost all) of the values fall within 3 standard deviations of the mean.
Chebyshev's rule
In any distribution, at least 1 - 1/k^2 of the values must lie within k standard deviations of the mean (for k > 1). Values beyond 3 standard deviations from the mean are uncommon, Normal model or not. Tchebycheff tells us that at least 96% of all values must be within 5 standard deviations of the mean. So even when we can't apply the 68-95-99.7 Rule, we can be sure that an observation 5 standard deviations above the mean is unusual.
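Chebyshev's bound, as it is usually stated, says at least 1 - 1/k^2 of the values lie within k standard deviations of the mean, which gives the 96% figure for k = 5. The sketch below computes that bound and checks it against a deliberately skewed, made-up data set; the function names are hypothetical.

```python
from statistics import mean, pstdev

def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def fraction_within(values, k):
    """Observed fraction of values within k population SDs of the mean."""
    m = mean(values)
    s = pstdev(values)
    return sum(abs(v - m) <= k * s for v in values) / len(values)

skewed = [1, 1, 1, 2, 2, 3, 4, 50]  # made-up, strongly right-skewed data
for k in (2, 3, 5):
    print(k, chebyshev_lower_bound(k), fraction_within(skewed, k))
```

Even for data that are nowhere near Normal, the observed fraction never falls below the bound.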
z-score
Tells how many standard deviations a value is from the mean. z-scores have a mean of 0 and a standard deviation of 1. When working with data, use the statistics: z = (y - ȳ)/s. When working with models, use the parameters: z = (y - μ)/σ.
shift
the data by adding (or subtracting) a constant to each value: all measures of position (center, percentiles, min, max) increase (or decrease) by the same constant, but measures of spread are left unchanged.
re-express, or transform,
the data by applying a simple function to make the skewed distribution more symmetric.
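A logarithm is one commonly used re-expression for right-skewed data. The sketch below is only an illustration: the data are made up, the choice of log base 10 is an assumption, and the `skewness` helper is a simple moment-based measure, not something defined in these notes.

```python
from math import log10
from statistics import mean, pstdev

def skewness(values):
    """Moment-based skewness: near 0 for symmetric data, positive for a right skew."""
    m = mean(values)
    s = pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

raw = [1, 2, 3, 5, 8, 13, 40, 120, 400]  # hypothetical right-skewed values
logged = [log10(v) for v in raw]          # re-express with a simple function

print(f"skewness before: {skewness(raw):.2f}")
print(f"skewness after:  {skewness(logged):.2f}")
```

The re-expressed values are much closer to symmetric, which is the goal of the transformation.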
parameters of the model.
This mean and standard deviation are not numerical summaries of data. They are part of the model. They don't come from the data. Rather, they are numbers that we choose to help specify the model.