chapter 1
A physician survey for a common medication in five different pharmacies and reported the average price as $16. Four of the five stores and had prices $12, $15, $17, and $22. Assuming that the survey is correct, what is the price of the medication at the other store? Hint: Use the formula for the mean and solve for the missing value.
$14
variable
a characteristic of a case
label
a special variable used in some data sets to uniquely identify different cases
boxplots
based on the five-number summary are useful for comparing several distributions. the box spans the quartiles and shows the spread of the central half of the distribution. the median is marked within the box. lines extend from the box to the extremes and show the full spread of the data
side-by-side boxplots
can be used to display box plots for more than one group of the same graph
a variable is a characteristic of a
case
how many
cases does the data set contain?
the five-number summary
consisting of the median, the quartiles, and the smallest and largest individual observations, provides a quick overall description of a distribution. the median describes the center, and the quartiles and extremes show the spread
The distribution of a categorical variable is displayed using _______.
counts or percentages
shape, center, & spread
describe the overall pattern of a distribution
observation
describes the data for a particular case
values
different cases can have different values of the variables
bar graphs & pie graphs
display the distribution of a categorical variable
stemplots & histograms
display the distributions of a quantitative variable
why, what purpose
do the data have? do we hope to answer some specific questions? do we want to draw conclusions about cases other than the ones we actually have data for? are the variables that are recorded suitable for the intended purpose?
The tails of a distribution show the _______.
extreme values
the 1.5 x IQR rule
flags observations more than 1.5 x IQR beyond the quartiles as possible outliers
the first quartile
has 1/4 of the observations below it
the third quartile
has 3/4 of the observations below it
linear transformations
have the form Xnew = a + bX. changes the origin of a doesn't equal zero and changes the size of the unit of measurement if b > 0. do not changes the overall shape of a distribution. multiplies a mature of spread by b and changes a percentile or measure of center m into a + bm
what
how many variables do the data contain? what are the exact definitions of these variables? what are the units of measurement for each quantitative variable?
units of measurement
important part of the description of a quantitative variable, may not always be obvious
resistant measure
is relatively unaffected b y changes in the numerical value of a small proportion of the the total number of observations, not matter how large these changes are. the median and quartiles are resistant, but the mean and standard deviation are not
the standard deviation
is zero when there is no spread and gets larger as the spread increases
distribution of a categorical variable
lists the categories and give other the count of the percent of cases that fall in each category
median
midpoint
outliers
observations that lie outside the overall pattern of a distribution
distribution
of a variable tells us what values it takes and how often it takes these values
Categorical variables are best displayed by ______.
pie charts or bar graphs
categorical variable
places a case into one of several groups or categories
modified boxplot
points identified by the 1.5 x IQR rule are plotted individually
Variables that take numeric values for which arithmetic operations make sense are called _______.
quantitative
Suppose you are interested in comparing the quality of different hospitals based on infections occurred from surgery. What would be the best way to measure such a variable?
rate of infections
qualitative variable
takes numerical values for which arithmetic operations such as adding and averaging make sense
mean
the arithmetic average of the observations
The first day of class the professor collects information on each student to make a data set that will be analyzed throughout the semester. The information asked includes hometown, GPA, number of classes, number of siblings, and favorite subject. What are the cases in this data set?
the college students
proportion
the count divided by the sum of the counts
Researchers are conducting a study on cortisol levels in different socio-economic demographic groups. Which of the following variables is categorical?
the county of residence
interquartile range
the difference between the quartiles. it is the spread of the center half of the data
A description of different houses on the market includes the following three variables. Which of these variables is quantitative?
the monthly gas bill, the square footage of the house, the monthly electric bill
cases
the objects described by a set of data
What are labels used for in data sets?
to identify cases
spreadsheet
useful for doing simple computations
exploratory data analysis
uses graphs and numerical summaries to describe the variables in a data set and the relations among them
predictive analytics
using descriptions to predict something in the future
Variables take on ________.
values
who
what cases do the data describe?
time plot
when observations on a variable are taken over time, graphs time horizontally and the values of the variable vertically, can reveal interesting patterns in a set of data
A particularly common question in the study of wildlife behavior involves observing contests between "residents" of a particular area and "intruders." In each contest, the residents either win or lose the encounter (assuming there are no ties).Observers might record several variables, some of which are listed below. Which of these variables is categorical?
whether the residents win or lose