Chapter 3-4

Lower and Upper Quartile

The 25th percentile is called the _________________ quartile. The 75th percentile is called the ______________ quartile. One quarter of the data fall below the lower quartile. One quarter fall above the upper quartile.

interquartile

The quartiles and the median split a distribution into four equal parts. The ___________________ range describes the spread of the middle half of the distribution.

Mean

The ___________ is the sum of the observations divided by the number of observations. The mean is often called the average.

Deviation

The ____________ of an observation $y_i$ from the sample mean $\bar{y}$ is $(y_i - \bar{y})$, the difference between them.

Variance

The ______________ is approximately the average of the squared deviations.

Pth percentile

The _______________________ is the point such that p% of the observations fall below or at that point and (100 − p)% fall above it.

box plot

The five-number summary consisting of (minimum, lower quartile, median, upper quartile, maximum) is the basis of a graphical display called the _____________________ that summarizes center and variability.
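As a minimal sketch of the five-number summary, the function below assumes one common textbook convention: the quartiles are the medians of the lower and upper halves of the ordered data, leaving out the middle observation when n is odd. Other conventions exist and can give slightly different quartiles. The function name `five_number_summary` and the data are made up for illustration.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Assumption: quartiles are the medians of the lower and upper halves of
    the ordered data; the middle observation is left out when n is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]       # observations below the median position
    upper_half = ordered[n - half:]   # observations above the median position
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))  # (1, 2.5, 4.5, 6.5, 8)
```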

outlier

The formula for the mean uses numerical values for the observations. The mean can be highly influenced by an observation that falls well above or well below the bulk of the data, called an _____________. The mean is pulled in the direction of the longer tail of a skewed distribution, relative to most of the data.

mutually exclusive

The intervals of values in frequency distributions are usually of equal width. The intervals should include all possible values of the variable. Any possible value must fit into one and only one interval; that is, they should be ________________________________.

Whiskers

The lines extending from the box are called _______________. The __________________ extend to the maximum and minimum, except for outliers, which are marked separately.

unimodal

The mean, median, and mode are identical for a ____________, symmetric distribution, such as a bell-shaped distribution. The mean, median, and mode are complementary measures. They describe different aspects of the data.

percentiles

The range uses two measures of position, the maximum value and the minimum value. The median is a measure of position, with half the data falling below it and half above it. The median is a special case of a set of measures of position called _________________.

Σ

The sample mean is defined as $\bar{y} = (y_1 + y_2 + \cdots + y_n)/n$. The symbol ________ (uppercase Greek sigma) represents the process of summing: $\sum y_i$ represents the sum $y_1 + y_2 + \cdots + y_n$. Using this summation symbol, the sample mean of n observations is $\bar{y} = (\sum y_i)/n$.
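The summation formula for the sample mean can be sketched directly in code. The function name `sample_mean` and the three observations are illustrative only.

```python
def sample_mean(observations):
    """Sum the observations, then divide by the number of observations."""
    total = sum(observations)         # the sum y1 + y2 + ... + yn
    return total / len(observations)  # divide by n


print(sample_mean([2, 4, 9]))  # 5.0
```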

52

This rate measures the number of violent crimes in that state per 10,000 population. If a state had 12,000 violent crimes and a population size of 2,300,000, its violent crime rate was (12,000/2,300,000) × 10,000 = _____.
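The arithmetic in this card can be checked with a short sketch. The function name `violent_crime_rate` and its `per` parameter are hypothetical; the numbers are the ones given in the card.

```python
def violent_crime_rate(crimes, population, per=10_000):
    """Number of violent crimes per `per` people in the population."""
    return crimes / population * per


rate = violent_crime_rate(12_000, 2_300_000)
print(round(rate))  # 52
```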

typical

This section presents statistics that describe the center of a frequency distribution for a quantitative variable. The statistics show what a __________ observation is like.

Interval

To summarize the data with a frequency distribution, divide the measurement scale for violent crime rate into a set of intervals and count the number of observations in each _____________. Here, we use the intervals {0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69}.
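Counting observations per interval might look like the sketch below. It assumes non-negative values and equal-width intervals of 10; the rates in the example are invented for illustration and are not data from the chapter.

```python
from collections import Counter


def frequency_distribution(values, width=10):
    """Count observations in intervals 0-9, 10-19, ... (width is an assumption)."""
    # Map each value to the lower endpoint of its interval, e.g. 17 -> 10.
    counts = Counter((int(v) // width) * width for v in values)
    starts = range(min(counts), max(counts) + width, width)
    # A Counter returns 0 for intervals that contain no observations.
    return {f"{start}-{start + width - 1}": counts[start] for start in starts}


hypothetical_rates = [4, 12, 17, 23, 28, 29, 35, 61]
for interval, count in frequency_distribution(hypothetical_rates).items():
    print(interval, count)
```

Each value lands in exactly one interval, so the intervals are mutually exclusive and together cover every observed value.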

relative frequency distribution

When the table shows the proportions or percentages instead of the numbers, it is called a _____________________.

s, larger, rescaled

____ ≥ 0, and s = 0 only when all observations have the same value. The greater the variability about the mean, the ___________ is the value of s. The reason for using (n − 1), rather than n, in the denominator of s is technical. If the data are rescaled, the standard deviation is also _________.
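Two of these properties can be checked numerically with the standard library's `statistics.stdev`, which uses the (n − 1) denominator. The income figures are made up for illustration.

```python
import math
from statistics import stdev

# Hypothetical incomes, first in dollars and then rescaled to thousands of dollars.
incomes_dollars = [20_000, 35_000, 50_000, 80_000]
incomes_thousands = [y / 1000 for y in incomes_dollars]

s_dollars = stdev(incomes_dollars)
s_thousands = stdev(incomes_thousands)

# Dividing every observation by 1000 divides s by 1000 as well.
print(math.isclose(s_dollars / 1000, s_thousands))

# When all observations share the same value, s equals zero.
print(stdev([7, 7, 7, 7]) == 0)
```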

s

_________ is a sort of typical distance of an observation from the mean. So, the larger the standard deviation, the greater the spread of the data.

Frequency Distribution

A ___________________ is a listing of possible values for a variable, together with the number of observations at each value.

Median

_____________________ is the observation that falls in the middle of the ordered sample. When the sample size n is odd, a single observation occurs in the middle. When the sample size is even, two middle observations occur, and the median is the midpoint between the two. The middle observation has the index (n+1)/2. That is, the median is the value of observation (n+1)/2 in the ordered sample.
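The odd and even cases can be sketched as follows. The function name `sample_median` is illustrative; the standard library's `statistics.median` behaves the same way.

```python
def sample_median(values):
    """Middle observation of the ordered sample, or the midpoint of the two middle ones."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: a single middle observation, at position (n + 1) / 2 counting from 1.
        return ordered[middle]
    # Even n: midpoint between the two middle observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(sample_median([3, 1, 2]))     # 2
print(sample_median([1, 2, 3, 4]))  # 2.5
```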

binary (0, 1) data

For _______________________, proportion = mean. When observations take values of only 0 or 1, the mean equals the proportion of observations that equal 1. Generally, for highly discrete data, the mean is more informative than the median.
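A quick check of this identity, using a made-up list of 0/1 responses:

```python
# Hypothetical binary observations: 1 = "yes", 0 = "no".
responses = [1, 0, 0, 1, 1]

mean = sum(responses) / len(responses)                # sum of observations / n
proportion_ones = responses.count(1) / len(responses)  # share of observations equal to 1

print(mean, proportion_ones)  # 0.6 0.6
```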

Empirical Rule

_______________________: If the histogram of the data is approximately bell shaped, then 1. About 68% of the observations fall between $\bar{y} - s$ and $\bar{y} + s$. 2. About 95% of the observations fall between $\bar{y} - 2s$ and $\bar{y} + 2s$. 3. All or nearly all observations fall between $\bar{y} - 3s$ and $\bar{y} + 3s$.
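The three percentages can be compared against an exact normal curve. This sketch assumes a perfectly normal (bell-shaped) distribution, modeled with the standard library's `statistics.NormalDist`; real data will only match approximately.

```python
from statistics import NormalDist

# Assumption: an exactly normal distribution with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
```

The results are close to 0.68, 0.95, and nearly 1, in line with the rule.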

stem-and-leaf plot

A __________________________ represents each observation by its leading digit(s) (the stem) and by its final digit (the leaf). Each stem is a number to the left of the vertical bar and a leaf is a number to the right of it.
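A small sketch of how such a plot could be built, assuming non-negative integer observations so that the final digit is the leaf and the remaining leading digits are the stem. The function name and data are illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group each observation's final digit (leaf) under its leading digits (stem).

    Assumption: observations are non-negative integers.
    """
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    return dict(stems)


plot = stem_and_leaf([12, 15, 21, 23, 23, 34])
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```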

Range

The ________________ is the difference between the largest and smallest observations. For nation A, in the previous figure, the range of income values is about $50,000 − $0 = $50,000. For nation B, the range is about $30,000 − $20,000 = $10,000. Nation A has greater variability of incomes.

Standard deviation

$s = \sqrt{\dfrac{\sum (y_i - \bar{y})^2}{n - 1}} = \sqrt{\dfrac{\text{sum of squared deviations}}{\text{sample size} - 1}}$.
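The formula can be traced step by step in code. The function name `sample_standard_deviation` and the five observations are illustrative; the result agrees with the standard library's `statistics.stdev`.

```python
import math
from statistics import stdev


def sample_standard_deviation(observations):
    """Square root of the sum of squared deviations divided by (n - 1)."""
    n = len(observations)
    mean = sum(observations) / n
    sum_of_squares = sum((y - mean) ** 2 for y in observations)
    variance = sum_of_squares / (n - 1)
    return math.sqrt(variance)


data = [1, 2, 3, 4, 5]  # mean 3, squared deviations 4 + 1 + 0 + 1 + 4 = 10
print(sample_standard_deviation(data))  # sqrt(10 / 4), about 1.58
print(math.isclose(sample_standard_deviation(data), stdev(data)))
```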

observation

An _______________ is an outlier if it falls more than 1.5(IQR) above the upper quartile or more than 1.5(IQR) below the lower quartile. In box plots, the whiskers extend to the smallest and largest observations only if those values are not outliers.
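The 1.5(IQR) criterion might be sketched as below. As an assumption, the quartiles are taken as the medians of the lower and upper halves of the ordered data; other quartile conventions can shift the cutoffs slightly. The function name and data are made up.

```python
from statistics import median


def iqr_outliers(values):
    """Observations more than 1.5 * IQR beyond the lower or upper quartile.

    Assumption: quartiles are medians of the lower and upper halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_quartile = median(ordered[:half])
    upper_quartile = median(ordered[n - half:])
    iqr = upper_quartile - lower_quartile
    low_cutoff = lower_quartile - 1.5 * iqr
    high_cutoff = upper_quartile + 1.5 * iqr
    return [y for y in ordered if y < low_cutoff or y > high_cutoff]


# Quartiles 2.5 and 6.5, IQR 4, cutoffs -3.5 and 12.5.
print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 30]))  # [30]
```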

bar graph

A ____________ has a rectangular bar drawn over each category. The height of the bar shows the frequency or relative frequency in that category. The bars are separated to emphasize that the variable is categorical rather than quantitative. The order of presentation for an ordinal variable is the natural ordering of the categories.

pie chart

A _____________ is a circle having a "slice of the pie" for each category. The size of a slice represents the percentage of observations in the category. A bar graph is more precise than a pie chart for visual comparison of categories with similar relative frequencies.

histogram

A graph of a frequency distribution for a quantitative variable is called a ________________. Each interval has a bar over it, with height representing the number of observations in that interval.

U bell

A group for which the distribution is bell shaped is fundamentally different from a group for which the distribution is U-shaped. A _______-shaped distribution indicates a polarization on the variable between two sets of subjects. A ________-shaped distribution indicates that most subjects tend to fall near a central value.

center

A measure of __________ is not adequate for numerically describing data for a quantitative variable. It describes a typical value, but not the spread of the data about that typical value. This section introduces statistics that describe the variability of a data set.

data

A stem-and-leaf plot conveys information similar to a histogram. Turned on its side, it has the same shape as the histogram. In fact, since the stem-and-leaf plot shows each observation, it displays information that is lost with a histogram. Stem-and-leaf plots are useful for quick portrayals of small ________ sets.

(2/3)s (4/3)s

An advantage of the IQR is that it is not sensitive to outliers. For bell-shaped distributions, the distance from the mean to either quartile is about two-thirds of a standard deviation, ________________. Then, IQR equals approximately __________________.
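The two-thirds figure can be checked against an exact normal curve, here modeled with the standard library's `statistics.NormalDist`. This is a sketch for a perfectly normal distribution; for real bell-shaped data the relationship is only approximate.

```python
from statistics import NormalDist

# Assumption: an exactly normal distribution with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Distance from the mean to the upper quartile, in standard deviations.
quartile_distance = standard_normal.inv_cdf(0.75)
iqr_in_sd_units = 2 * quartile_distance

print(round(quartile_distance, 3))  # about 0.674, close to 2/3
print(round(iqr_in_sd_units, 3))    # about 1.349, close to 4/3
```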

position

Another way to describe a distribution is with a measure of _______________. This tells us the point at which a given percentage of the data fall below (or above) that point. As special cases, some measures of position describe center and some describe variability.

Shape

Another way to describe a sample or a population distribution is by its __________.

Z-score

Another way to measure position is by the number of standard deviations that a value falls from the mean. The number of standard deviations that an observation falls from the mean is called its ______________________: z = (observation − mean) / (standard deviation).
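As a short illustration of the formula, with a hypothetical observation of 130 from a distribution with mean 100 and standard deviation 15:

```python
def z_score(observation, mean, standard_deviation):
    """Number of standard deviations that an observation falls from the mean."""
    return (observation - mean) / standard_deviation


print(z_score(130, 100, 15))  # 2.0, two standard deviations above the mean
print(z_score(85, 100, 15))   # -1.0, one standard deviation below the mean
```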

3

By the Empirical Rule, for a bell-shaped distribution it is very unusual for an observation to fall more than ________ standard deviations from the mean. An alternative criterion regards an observation as an outlier if it has a z-score larger than 3 in absolute value.

two

Choosing intervals for frequency distributions and histograms is primarily a matter of common sense. Ideally, ____ observations in the same interval should be similar in a practical sense.

weighted average

Denote the sample means for two sets of data with sample sizes $n_1$ and $n_2$ by $\bar{y}_1$ and $\bar{y}_2$. The overall sample mean for the combined set of $(n_1 + n_2)$ observations is the ______________________________.
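Weighting each group mean by its sample size gives $(n_1 \bar{y}_1 + n_2 \bar{y}_2)/(n_1 + n_2)$. The sketch below uses made-up group sizes and means, and the function name `combined_mean` is illustrative.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Overall mean of two groups: each group mean weighted by its sample size."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Hypothetical groups: 10 observations with mean 4.0, 30 observations with mean 8.0.
print(combined_mean(10, 4.0, 30, 8.0))  # 7.0
```

The result sits closer to the mean of the larger group, which is why a plain average of the two means (6.0 here) would be wrong.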

Bell-Shaped

For _______________ frequency distributions, the Empirical Rule specifies approximate percentages of data within 1, 2, and 3 standard deviations of the mean.

discrete continuous software

For a ________________ variable with relatively few values, a histogram has a separate bar for each possible value. For a _______________ variable or a discrete variable with many possible values, you need to divide the possible values into intervals. Statistical ____________ can automatically choose intervals for us and construct frequency distributions and histograms.

quantitative

Frequency distributions and graphs are useful for ___________________ variables.

population, sample

Frequency distributions and histograms apply both to a population and to samples from that population. The first type is called the ________________ distribution, and the second type is called a ________________ data distribution.

Numerical

Generally, for highly discrete data, the mean is more informative than the median. If a distribution is highly skewed, the median is better than the mean in representing what is typical. If the distribution is close to symmetric or only mildly skewed, or if it is discrete with few distinct values, the mean is usually preferred over the median, because it uses the _____________ values of all the observations.

μ σ

Greek letters denote parameters. For example, ___ (mu) and __ (sigma) denote the population mean and standard deviation of a variable.

Proportion percentage

Of 116.3 million households, 23.3 million were a married couple with children, for a proportion of 23.3/116.3 = 0.20. Since 0.20 is the ____________ of families that are married couples with children, the _______________ is 100(0.20) = 20%. The table also shows the percentages from the year 1970. We see a substantial drop since 1970 in the relative number of married couples with children.

Notation for Observations

Notation for observations and sample mean: The sample size is symbolized by n. For a variable denoted by y, its observations are denoted by $y_1, y_2, \ldots, y_n$. The sample mean is denoted by $\bar{y}$.

quantile

In proportion terms, a percentile is called a ________________.

Positive Negative

In computing the standard deviation, each observation has a deviation. The deviation is ____________ when the observation falls above the mean. The deviation is ____________ when the observation falls below the mean.

Relative proportion percentage

List the categories and show the number of observations in each category. Report proportions or percentages in the categories, also called ____________ frequencies. The _______________ equals the number of observations in a category divided by the total number of observations. The ________________ is the proportion multiplied by 100.

relative frequency, histograms, box

Many studies compare different groups on some variable. _________________________ distributions, ___________________, and side-by-side ____________________ plots are useful for making comparisons.

Tail

Most distributions encountered in the social sciences are not symmetric. The parts of the curve for the lowest and the highest values are called the __________. A distribution is said to be skewed to the right or skewed to the left, according to which _________ is longer. The longer tail indicates the direction of the skew.

mode

The ___________ is the value that occurs most frequently. The ___________ is most commonly used with highly discrete variables, such as with categorical data.

Median Mean Same

The __________ is usually more appropriate than the mean when the distribution is very highly skewed. The _______ can be greatly affected by outliers, whereas the median is not. For the mean we need quantitative (interval-scale) data. The median also applies for ordinal scales. Quite different patterns of data can have the __________ median.

mean

The ________ is the point of balance on the number line when an equal weight is at each observation point.

50%

The box of a box plot contains the central _________ of the distribution, from the lower quartile to the upper quartile. The median is marked by a line drawn within the box.

Interquartile range

The difference between the upper and lower quartiles is called the ______________________, denoted by IQR. This measure describes the spread of the middle half of the observations.

symmetric

The distributions are ____________. The side of the distribution below a central value is a mirror image of the side above that central value.

Sum of Squares

The expression $\sum (y_i - \bar{y})^2$ in these formulas is called a ________________________.

Insensitive

The median is ____________ to the distances of the observations from the middle, since it uses only the ordinal characteristics of the data.

outliers

The median is not affected by __________.

quantitative

The median, like the mean, is appropriate for _____________ variables. For symmetric distributions, the median and the mean are identical. For skewed distributions, the mean lies toward the longer tail relative to the median.

Center

The median, the quartiles, and the maximum and minimum are five positions often used as a set to describe _____________ and spread.

bimodal

The mode is appropriate for all types of data. A frequency distribution is called _____________ if two distinct mounds occur in the distribution.

