Math Chapter 12
If n data items are arranged in order, from smallest to largest, the data item in the middle is the value in _____________ position.
(n+1)/2
A percentile interpretation of 𝑧-scores provides the percentage of data items that are less than any one data item. So, ......
......if 𝑛% of the items in a distribution are less than a particular data item, we say that the data item is in the 𝑛th percentile of the distribution.
Changing standard deviation affects a graph's height, while changing a graph's mean affects the graph's......
......location on the horizontal axis.
To Find the Quartiles of a Data Set: 1) ______________ the data from smallest to largest. 2) Find the median (__________________) of the data set. If there are an odd number of data items, the median is the middle value. If there are an even number of data items, the median is half-way between the two middle items. 3) Find the first quartile. The first quartile is the ____________ of the _________________ of the data set. 4) Find the third quartile, which is the ____________ of the ___________________ of the data set.
1) Order 2) 2nd quartile 3) median, lower half 4) median, upper half
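The quartile steps above can be sketched in Python. This is a minimal illustration, not a definitive method: the function name `quartiles` is made up here, and it assumes that when the number of items is odd, the median itself is left out of both halves (other textbooks and software use slightly different conventions).

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves steps on the card."""
    ordered = sorted(data)            # Step 1: order smallest to largest
    n = len(ordered)
    q2 = median(ordered)              # Step 2: median = second quartile
    half = n // 2
    lower = ordered[:half]            # lower half of the data
    # Assumption: for odd n, the middle item is excluded from the upper half too.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), q2, median(upper)   # Steps 3 and 4

print(quartiles([7, 1, 9, 3, 5, 11, 13]))  # (3, 7, 11)
```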
Steps for standard deviation
1) Find the mean of the data set. 2) Make a chart or table with three columns: Data, Data − Mean, (Data − Mean)^2. 3) List the data vertically under the column labeled "Data." 4) Subtract the mean from each piece of data and write the difference in the Data − Mean column. 5) Square the values obtained in the previous step and write the squares in the (Data − Mean)^2 column. 6) Compute the sum of the values in the last column, (Data − Mean)^2. 7) Divide the sum from Step 6 by 𝑛−1, where 𝑛 is the number of pieces of data. 8) Compute the square root of the number obtained in Step 7. This number is the standard deviation of the data set.
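The eight steps can be followed line by line in a short Python sketch. The function name `sample_std_dev` and the sample data are illustrative only, and the code assumes at least two data items.

```python
from math import sqrt

def sample_std_dev(data):
    """Sample standard deviation, following the card's steps."""
    n = len(data)
    mean = sum(data) / n                      # Step 1: mean
    deviations = [x - mean for x in data]     # Step 4: Data - Mean
    squares = [d ** 2 for d in deviations]    # Step 5: (Data - Mean)^2
    total = sum(squares)                      # Step 6: sum of the squares
    quotient = total / (n - 1)                # Step 7: divide by n - 1
    return sqrt(quotient)                     # Step 8: square root

# Mean is 5, the squares sum to 32, so the result is sqrt(32 / 7).
print(sample_std_dev([2, 4, 4, 4, 5, 5, 7, 9]))
```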
(T/F) The sum of the deviations from the mean for a data set is always zero.
True
(T/F) The mean, median, and mode of a normal distribution are all equal.
True
(T/F) My table showing z-scores and percentiles displays the percentage of data items less than a given value of z.
True. The statement makes sense, as this is the standard way such tables are organized.
Data distribution types
Uniform (rectangular; all bars about the same height), bimodal (a curve with two peaks), skewed right (vertical bar graph with the tallest bars on the left side and a tail to the right), skewed left (vertical bar graph with the tallest bars on the right side and a tail to the left)
The measure of central tendency that is the data item in the middle of ranked, or ordered, data is called the _____.
median
6) The __________________________ is the value halfway between the lowest (𝐿) and highest (𝐻) values in a set of data. Midrange=(lowest value + highest value)/(2)
midrange
If a statistic is obtained from a random sample of size n, there is a 95% probability that it lies within ((1/(sqrt(n)))*100%) of the true population percent, where ±((1/(sqrt(n)))*100%) is called the _____.
margin of error
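As a quick check of the margin of error formula, here is a minimal sketch. The function name `margin_of_error` is illustrative, and the result is returned as a percent.

```python
from math import sqrt

def margin_of_error(n):
    """Margin of error, in percent, for a random sample of size n."""
    return 100 / sqrt(n)   # same as (1 / sqrt(n)) * 100%

print(margin_of_error(1600))  # 2.5, i.e. plus or minus 2.5%
```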
The measure of central tendency that is found by adding the lowest and highest data values and dividing the sum by 2 is called the ________.
midrange
5) The __________________ is the piece of data that occurs most frequently.
mode
A data set can contain more than one _________, or no ________ at all.
mode
A data value that occurs most often in a data set is the measure of central tendency called the _____.
mode
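A small sketch shows how a data set can have one mode, several modes, or none. The function name `modes` is illustrative, and the code assumes a non-empty data set and treats "every value occurs exactly once" as having no mode.

```python
from collections import Counter

def modes(data):
    """Return a list of the most frequent values (empty if there is no mode)."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []          # assumption: no value repeats, so there is no mode
    return [value for value, count in counts.items() if count == highest]

print(modes([4, 4, 5]))        # [4]
print(modes([1, 2, 2, 3, 3]))  # [2, 3]
print(modes([1, 2, 3]))        # []
```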
1) The graph of a normal distribution is called a ____________________________.
normal curve
Relative frequency histograms that are symmetric and bell-shaped are ___________________________. This shape is called a __________________________.
normal distributions, normal curve
If n% of the items in a distribution are less than a particular data item, we say the data item is in the nth __________ of the distribution.
percentile
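The percentile idea can be illustrated by counting how many items fall below a given item. This is only a sketch: `percent_below` is a made-up helper name, and it reports the raw percentage rather than any particular rounding convention.

```python
def percent_below(data, item):
    """Percentage of data items that are less than the given item."""
    below = sum(1 for x in data if x < item)
    return 100 * below / len(data)

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(percent_below(scores, 8))  # 70.0, so 8 is in the 70th percentile here
```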
1) The __________________________ is the set containing all the people or objects whose properties are to be described and analyzed by the data collector.
population
A ______________ is the set of all the people or objects whose properties are to be described and analyzed by the data collector
population
For data that is approximately normally distributed, the 𝑧-score describes how many standard deviations a data item lies above or below the mean. To compute a 𝑧-score, use the formula: 𝑧-score = (data item − mean)/(standard deviation). Data items above the mean have ________________ 𝑧-scores. Data items below the mean have __________________ 𝑧-scores. The 𝑧-score for the mean is __________.
positive, negative, 0
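The sign behavior described on this card can be seen in a short sketch. The function name `z_score` and the numbers (mean 70, standard deviation 10) are made up for illustration.

```python
def z_score(data_item, mean, standard_deviation):
    """z-score = (data item - mean) / (standard deviation)."""
    return (data_item - mean) / standard_deviation

print(z_score(85, 70, 10))  # 1.5  (above the mean: positive)
print(z_score(55, 70, 10))  # -1.5 (below the mean: negative)
print(z_score(70, 70, 10))  # 0.0  (the mean itself)
```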
Measures of position are often used to make comparisons. Two common measures of position are ________________________ and ____________________________.
quartiles, percentiles
A sample obtained in such a way that every element in the population has an equal chance of being selected is called a/an _______ sample.
random
1) A _____________________________________ is a sample obtained in such a way that every element in the population has an equal chance of being selected for the sample.
random sample
2) The ____________________ is the difference between the highest and lowest values. It indicates the total spread of data. Range = highest value− lowest value
range
The difference between the highest and lowest data values in a data set is called the _______.
range
The line that best fits a set of points is called a/an ___________.
regression line
5) _______________________ uses data that is easily and readily obtained; and can be extremely biased.
Convenience sampling
1) ________________________ is concerned with the collection, organization, and analysis of data.
Descriptive statistics
3) A ______________________________________ is a sample that exhibits characteristics typical of those possessed by the target population. It is a small replica of the entire population.
representative sample
2) A ___________________ includes some of the items in the population.
sample
1) A _____________________________ is a graph that uses dots to represent values of two different variables. Scatter plots are used to observe the relationship between the two variables.
scatter plot
A set of points representing data is called a/an __________.
scatter plot
Rules for Data Grouped by Classes: 1) The classes should be the same ____________________. 2) The classes should not ___________________________. 3) Each data item should belong to only ________________ class.
width, overlap, one
Regression line equation
y=mx+b or y=ax+b
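The cards do not say how the slope m and intercept b are found. A minimal sketch, assuming the standard least-squares formulas (which are not stated in these notes), is shown below; the function name `regression_line` is illustrative.

```python
def regression_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Assumed formulas: m = Sxy / Sxx, b = mean_y - m * mean_x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# These points lie exactly on y = 2x + 1.
print(regression_line([1, 2, 3, 4], [3, 5, 7, 9]))
```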
z-score formula
z-score=((data item - mean)/(standard deviation))
The sum of the deviations from the mean for a data set is always _________.
zero
Mean=𝑥̅=(Σ𝑥f)/(𝑛)
𝑥 represents each data value; f represents the frequency of the data value; Σ𝑥f represents the sum of all the products obtained by multiplying each data value by its frequency; And 𝑛 represents the total frequency of the distribution.
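The frequency-distribution mean can be computed directly from a table of values and frequencies. In this sketch the table is a Python dict mapping each data value x to its frequency f; the name `frequency_mean` and the sample table are illustrative.

```python
def frequency_mean(frequencies):
    """Mean = (sum of x * f) / n, where n is the total frequency."""
    total = sum(x * f for x, f in frequencies.items())  # sum of x * f
    n = sum(frequencies.values())                       # total frequency
    return total / n

# Values 10, 20, 20, 30 written as a frequency table.
print(frequency_mean({10: 1, 20: 2, 30: 1}))  # 20.0
```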
In a normal distribution, approximately __________% of the data items fall within 1 standard deviation of the mean, approximately __________% of the data items fall within 2 standard deviations of the mean, and approximately __________% of the data items fall within 3 standard deviations of the mean.
68, 95, 99.7
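These percentages can be checked numerically. The sketch below assumes the standard result that the proportion of a normal distribution within k standard deviations of the mean equals erf(k/√2); that formula is not part of these notes, and the name `percent_within` is illustrative.

```python
from math import erf, sqrt

def percent_within(k):
    """Percent of a normal distribution within k standard deviations of the mean."""
    return 100 * erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(percent_within(k), 1))
```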
2) ________________________ is concerned with making generalizations about and drawing conclusions and predictions from the data collected. _______________ is the numerical information obtained.
Inferential statistics, Data
The 68-95-99.7 rule only applies to percentages between z-scores of ±1, ±2, and ±3, respectively.
Integers means whole numbers
____________________________ are used to describe the spread of data items in a data set.
Measures of dispersion
(Highest number+lowest number)/2
Midrange
____________________________ is the art and science of gathering, analyzing, and making inferences (predictions) from numerical information, data, obtained in an experiment. Statistics is divided into two main branches, they are:
Statistics
4) _____________________________________ involves dividing the population by characteristics called stratifying factors such as gender, race, religion, or income.
Stratified sampling
What does a z-score measure?
The number of standard deviations above or below the mean a specified data item is.
(T/F) If r=1, changes in one variable cause changes in another variable.
The statement is false because while an increase in one variable is always accompanied by an increase in the other variable, the relationship is not necessarily causative.
(T/F) In a normal distribution, the z-score for the mean is 0.
The statement is true because a z-score describes how many standard deviations a data item in a normal distribution lies above or below the mean.
(T/F) A call-in poll on radio or television is not reliable because the sample is not chosen randomly from a larger population.
The statement is true because people choose to call in. They are not randomly selected.
(T/F) Numbers representing what is average or typical about a data set are called measures of central tendency.
The statement is true because these numbers are generally located toward the center of a distribution.
(T/F) A score in the 50th percentile on a standardized test is the median.
The statement is true because the median, or second quartile, is the 50th percentile.
(T/F) If r=0, there is no correlation between two variables.
The statement is true because when r=0, a change in one variable tends not to be accompanied by any specific change in the other variable.
2) The __________________________________________, or simply ___________, has the symbol 𝑥̅ when it is computed for a sample of a population, and the Greek letter μ when it is computed for the entire population.
arithmetic mean, mean
1) An ____________________ is a number that is representative of a group of data.
average
2) The normal curve is ____________________ and ________________________ about the mean.
bell-shaped, symmetric
3) In a normal distribution, the MEAN, MEDIAN, and MODE all have the same value; and all occur at the ___________________ of distribution.
center
3) A _________________________________ is sometimes referred to as an area sample because it is frequently applied on a geographical basis.
cluster sample
The _______________________, r is a number between -1 and 1. When all points in a scatter plot fall on the regression line, the value of the correlation coefficient will be either -1 or 1.
correlation coefficient
A measure that is used to describe the strength and direction of a relationship between variables whose data points lie on or near a line is called the ___________, ranging from r=_______ to r=________.
correlation coefficient, -1, 1
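The cards do not give a formula for r. A minimal sketch, assuming the usual Pearson formula (not stated in these notes), shows the two extreme cases where every point lies on a line; the name `correlation_coefficient` is illustrative.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

print(correlation_coefficient([1, 2, 3, 4], [3, 5, 7, 9]))  # rising line: r is 1
print(correlation_coefficient([1, 2, 3, 4], [9, 7, 5, 3]))  # falling line: r is -1
```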
6) A piece of data, or _______________________ is a single response to an experiment.
data item
7) A ____________________ is the value of the data item.
data value
8) The ______________________ of a data value or item is the number of times that value occurs.
frequency
9) A _____________________________ is the listing of observed values and the corresponding frequency occurrence of each value.
frequency distribution
If data values are listed in one column and the adjacent column indicates the number of times each value occurs, the data presentation is called a _______.
frequency distribution
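A frequency distribution can be built by counting each value. This is a small sketch using Python's `collections.Counter`; the function name `frequency_distribution` and the data are illustrative.

```python
from collections import Counter

def frequency_distribution(data):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(data)          # value -> number of occurrences
    return sorted(counts.items())

for value, frequency in frequency_distribution([3, 1, 3, 2, 3, 1]):
    print(value, frequency)
```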
11) A _____________________________________ is a line graph with observed values on its horizontal scale and frequencies on the vertical scale
frequency polygon
If the midpoints of the tops of the bars of a histogram are connected with straight lines, the resulting line graph is a data presentation called a/an _______. To complete such a graph at both ends, the lines are drawn down to touch the _______.
frequency polygon, horizontal axis
Suppose a data presentation is given where data values are listed in one column and the adjacent column indicates the number of times each value occurs. If this data presentation is varied by organizing the data into classes, the data presentation is called a/an _______.
grouped frequency distribution
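Grouping into classes can be sketched as follows. The code assumes whole-number data and builds classes of equal width that do not overlap, so each item lands in exactly one class; the function name and its parameters (`start`, `width`, `num_classes`) are illustrative choices.

```python
def grouped_frequency_distribution(data, start, width, num_classes):
    """Count whole-number data items in equal-width, non-overlapping classes."""
    counts = {}
    for i in range(num_classes):
        low = start + i * width
        high = low + width - 1       # e.g. 50-59, 60-69, ...
        counts[(low, high)] = sum(1 for x in data if low <= x <= high)
    return counts

scores = [52, 67, 71, 75, 88, 93, 95]
for (low, high), frequency in grouped_frequency_distribution(scores, 50, 10, 5).items():
    print(f"{low}-{high}: {frequency}")
```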
10) A ________________________________ is a graph with observed values on the horizontal scale and frequencies on the vertical scale.
histogram
Data can be displayed using a bar graph with bars that touch each other. This visual presentation of the data is called a/an _______. The heights of the bars represent the _______ of the data values.
histogram, frequencies
Often, we are interested in predicting or inferring from collected data, what may happen ____________ _______________. A tool useful in this endeavor is the equation of the line of best fit, which is the equation that "best fits" the data. The regression line is: y=𝑚𝑥+𝑏.
in the near future
2) A ________________________________ is used to determine whether there is a linear relationship between two quantities, and if so, the strength of that relationship.
linear correlation
The strength of the linear relationship between two variables can be measured by the ___________________________________ ___________________, denoted 𝑟. • If the value of 𝑟 is positive, as one variable ______________, the other also ________________. • If the value of 𝑟 is negative, as one variable ______________, the other ___________________. • The variable 𝑟 will always be a value between −1 and 1, ______________.
linear correlation coefficient, increases, increases, increases, decreases, inclusive
3) The ___________, 𝑥̅ is the sum of the data divided by the number of pieces of data. The formula for calculating the mean is: 𝑥̅=(Σ𝑥)/(𝑛) where Σ𝑥 represents the sum of all data and 𝑛 represents the number of pieces of data.
mean
A z-score describes how many standard deviations a data item in a normal distribution lies above or below the _____.
mean
Σx/n, the sum of all the data items divided by the number of data items, is the measure of central tendency called the ________.
mean
4) The ______________________________ is the value in the middle of a set of ranked data. If the middle is two values, then the mean of the middle two values is the median.
median
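The rule on this card, including the case where the middle falls between two values, can be sketched like this. The function name `median` is chosen for illustration (Python's standard library also provides `statistics.median`).

```python
def median(data):
    """Middle value of ranked data; mean of the two middle values if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2:                                   # odd n: one middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle two

print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```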
If one or more data items are much greater than the other items, the _________, rather than the mean, is more representative of the data.
median
Measures of dispersion get __________ as the spread of data decreases
smaller
1) Measures of dispersion are used to indicate the ______________________.
spread of data
The _______________________________ measures how much the data differ from the mean. The notation is 𝑠 when it is calculated for a sample, and σ, the lowercase Greek letter sigma, when it is calculated for a population. The formula is: s=sqrt((Σ(x-𝑥̅)^2)/(n-1))
standard deviation
The formula sqrt((∑(data item−mean)^2)/(n−1)) gives the value of the _____ for a data set.
standard deviation
12) A _________________________________________ is a tool that organizes and groups data while allowing us to see the actual values of each data item. The left group of digits is called the ___________and the right group of digits is called the ___________.
stem & leaf plot, stem, leaf/leaves
One advantage of a _________ plot is that it displays the data items.
stem-and-leaf
A data presentation that separates each data item into two parts is called a _______.
stem-and-leaf plot
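Splitting each data item into two parts can be sketched in a few lines. This assumes non-negative whole numbers, with the last digit as the leaf and the remaining digits as the stem; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem to its list of leaves, in increasing order."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)   # e.g. 38 -> stem 3, leaf 8
        plot[stem].append(leaf)
    return dict(plot)

for stem, leaves in stem_and_leaf([23, 38, 31, 25, 42, 38]).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```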
2) The sample is a ________________________________ if the sample is obtained by selecting every nth item on a list or production line.
systematic sample