Statistics Exam 1


How many classes are recommended in a histogram of a data set with more than 50 observations?

15-20 classes. Rule of thumb for the number of classes: fewer than 25 observations, 5-6 classes; 25-50 observations, 7-14 classes; more than 50 observations, 15-20 classes.
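As a rough sketch, the rule of thumb can be written as a lookup on the number of observations n. The function name `recommended_classes` is hypothetical, and placing exactly 25 and exactly 50 observations in the 7-14 band is an assumption about how the stated ranges are meant to be read.

```python
def recommended_classes(n: int) -> tuple:
    """Return the (min, max) number of histogram classes suggested by the
    rule of thumb for a data set of n observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    if n < 25:
        return (5, 6)
    if n <= 50:          # 25 through 50 observations
        return (7, 14)
    return (15, 20)      # more than 50 observations


print(recommended_classes(20))   # (5, 6)
print(recommended_classes(40))   # (7, 14)
print(recommended_classes(200))  # (15, 20)
```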

To what kinds of data sets can the empirical rule be applied?

The empirical rule is a rule of thumb based on observed evidence, and it can be applied to data sets whose frequency distributions are mound-shaped and approximately symmetric. In such distributions the mean lies at the center, directly under the peak.

In a histogram, what are the class intervals?

The possible numerical values of the quantitative variable are partitioned into class intervals, each of which has the same width.
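A minimal sketch of partitioning measurements into equal-width class intervals and counting the frequency in each. The helper name, the choice to start the first class at the smallest measurement, and the rule that the largest measurement falls in the last class are illustrative assumptions; textbooks and software often pick rounder class boundaries.

```python
def class_frequencies(data, num_classes):
    """Split [min, max] of the data into num_classes equal-width class intervals
    and count the measurements in each one. Intervals are [lower, upper), except
    the last, which also includes the maximum value."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all measurements are equal; no spread to partition")
    width = (hi - lo) / num_classes
    counts = [0] * num_classes
    for x in data:
        # index of the class containing x; the maximum is clamped into the last class
        i = min(int((x - lo) // width), num_classes - 1)
        counts[i] += 1
    edges = [lo + k * width for k in range(num_classes + 1)]
    return edges, counts


edges, counts = class_frequencies([2, 3, 5, 7, 8, 11, 12, 14], 3)
print(edges)   # [2.0, 6.0, 10.0, 14.0]
print(counts)  # [3, 2, 3]
```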

Describe how the mean compares with the median for a distribution as follows: a) Skewed to the left b) Skewed to the right c) Symmetric

a) The mean is less than the median. b) The mean is greater than the median. c) The mean and the median are equal.
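A quick check with small invented data sets: one large value pulls the mean above the median, one small value pulls it below, and a symmetric set keeps them equal. The numbers are hypothetical and chosen only to make the pattern visible.

```python
from statistics import mean, median

samples = {
    "right-skewed": [1, 2, 2, 3, 3, 3, 4, 20],   # one large value in the right tail
    "left-skewed": [-20, 1, 2, 2, 3, 3, 3, 4],   # one small value in the left tail
    "symmetric": [1, 2, 3, 4, 5],
}

for name, data in samples.items():
    print(f"{name:13} mean={mean(data):6.2f}  median={median(data):5.2f}")
```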

Explain the difference between a bar graph and a histogram.

-A bar graph is a graph in which the categories (classes) of the qualitative variable are represented by bars, where the height of each bar is either the class frequency, class relative frequency, or class percentage. -A histogram is a graph in which the possible numerical values of the quantitative variable are partitioned into class intervals, each of which has the same width. These intervals form the scale of the horizontal axis. The frequency or relative frequency of observations in each class interval is determined. A vertical bar is placed over each class interval, with the height of the bar equal to either the class frequency or class relative frequency.

Explain the difference between a bar graph and a Pareto diagram.

-A bar graph is a graph in which the categories (classes) of the qualitative variable are represented by bars, where the height of each bar is either the class frequency, class relative frequency, or class percentage. -A Pareto diagram is a bar graph with the categories (classes) of the qualitative variable (i.e., the bars) arranged by height in descending order. A line showing the cumulative total is often added. One goal of the Pareto diagram is to make it easy to locate the most important categories, those with the largest frequencies.

Explain the difference between a bar graph and a pie chart.

-A bar graph is a graph in which the categories (classes) of the qualitative variable are represented by bars, where the height of each bar is either the class frequency, class relative frequency, or class percentage. -A pie chart is a graph in which the categories (classes) of the qualitative variable are represented by slices of a pie (circle). The size of each slice is proportional to the class relative frequency.

Give three different measures of central tendency.

1) The mean of a set of quantitative data is the sum of the measurements divided by the number of measurements in the data set. It is the most popular and best understood measure of central tendency. 2) The median of a quantitative data set is the middle number when the measurements are arranged in ascending (or descending) order. When the number of measurements is even, the median is the average of the two middle measurements. 3) The mode is the measurement that occurs most frequently in the data set.
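The three definitions translate directly into code. This is a hand-written sketch with hypothetical function names; Python's standard library functions `statistics.mean`, `statistics.median`, and `statistics.multimode` compute the same quantities. The mode helper returns a list because a data set can have more than one most frequent value.

```python
from collections import Counter


def sample_mean(data):
    # sum of the measurements divided by the number of measurements
    return sum(data) / len(data)


def sample_median(data):
    # middle value of the ordered data; average the two middle values when n is even
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def sample_modes(data):
    # every measurement that occurs most frequently (there may be ties)
    counts = Counter(data)
    top = max(counts.values())
    return sorted(x for x, c in counts.items() if c == top)


data = [3, 5, 6, 8, 8, 9, 10]  # hypothetical measurements
print(sample_mean(data), sample_median(data), sample_modes(data))  # 7.0 8 [8]
```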

Why would a statistician consider an inference incomplete without an accompanying measure of its reliability?

A statistician would consider an inference incomplete without an accompanying measure of its reliability because every inference drawn from a sample involves uncertainty, and we need to know how good, or how reliable, the inference is. The measure of reliability that accompanies an inference, such as a margin of error, is what separates the science of statistics from the art of fortune-telling. An inference without an accompanying measure of its reliability is nothing more than a guess.

What is the primary disadvantage of using the range to compare the variability of data sets?

The primary disadvantage of using the range to compare the variability of data sets is that the range does not always detect differences in data variation for large data sets. Two data sets can have the same range and be vastly different with respect to data variation.
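Two invented data sets with the same range but different spread illustrate the point: the range looks only at the two extreme values, while the standard deviation uses every measurement.

```python
from statistics import stdev

a = [1, 5, 5, 5, 5, 5, 9]   # most values packed at the center
b = [1, 1, 1, 5, 9, 9, 9]   # most values out at the extremes

for name, data in (("a", a), ("b", b)):
    data_range = max(data) - min(data)
    print(f"{name}: range={data_range}  standard deviation={stdev(data):.2f}")
# a: range=8  standard deviation=2.31
# b: range=8  standard deviation=4.00
```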

What is the range of a data set?

The range of a quantitative data set is equal to the largest measurement minus the smallest measurement.

Explain the difference between a dot plot and a stem and leaf display.

-A dot plot is a graph in which the numerical value of each quantitative measurement in the data set is represented by a dot on a horizontal scale. When data values repeat, the dots are stacked vertically. -A stem-and-leaf display is a display in which each value of the quantitative variable is partitioned into a "stem" and a "leaf". The possible stems are listed in order in a column. The leaf for each quantitative measurement in the data set is placed in the corresponding stem row, and leaves for observations with the same stem value are listed in increasing order horizontally. In computer output, a count shown in parentheses usually marks the row that contains the median and gives that row's frequency.
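A sketch of a stem-and-leaf display, assuming non-negative whole-number data where the leaf is the units digit and the stem is everything before it (a common but not universal split). The function name is hypothetical. Every stem between the smallest and largest is listed, even when its row has no leaves, and leaves within a row are sorted in increasing order.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return a stem-and-leaf display (one 'stem | leaves' row per line) for
    non-negative whole numbers, using the units digit as the leaf."""
    if any(x < 0 or x != int(x) for x in data):
        raise ValueError("this sketch handles non-negative whole numbers only")
    rows = defaultdict(list)
    for x in data:
        stem, leaf = divmod(int(x), 10)
        rows[stem].append(leaf)
    lines = []
    # list every possible stem in order, including stems with no leaves
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in sorted(rows[stem]))
        lines.append(f"{stem} | {leaves}")
    return "\n".join(lines)


print(stem_and_leaf([12, 15, 21, 27, 27, 33, 48, 41]))
```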

Explain how populations and variables differ.

-A population is a set of all units (usually people, objects, transactions, or events) that we are interested in studying. -A variable is a characteristic or property of an individual experimental (or observational) unit in the population.

Explain how populations and samples differ.

-A population is a set of all units (usually people, objects, transactions, or events) that we are interested in studying. -A sample is a subset (portion) of the units of the population. Samples are often used to study very large populations and learn about them. The sample must be drawn from the population of interest, and it is usually much smaller than that population.

What is a representative sample? What is its value?

-A representative sample exhibits characteristics typical of those possessed by the target population. -Its value is that inferences based on it are likely to hold for the population. The usual way to obtain one is random sampling, in which every subset of n units in the population has the same chance of being selected; random selection protects against selection bias, although some sampling error always remains.

Explain the difference between descriptive and inferential statistics.

-Descriptive statistics utilizes numerical and graphical methods to look for patterns in a data set, summarize the information revealed in a data set, and present that information in a convenient form. -Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.

Explain the difference between a measure of central tendency and a measure of variability.

-Measures of central tendency measure the tendency of the data to cluster, or center, about certain numerical values. -Measures of variability measure the spread of data.

List and define the five elements of an inferential statistical analysis.

-Population of interest: a population is a set of all units (usually people, objects, transactions, or events) about which we collect data. -One or more variables that are to be investigated: variables are characteristics or properties of an individual experimental (or observational) unit in the population. -The sample of population units: a sample is a subset of the units of a population. -The inference about the population based on information contained in the sample: a statistical inference is an estimate, prediction, or some other generalization about a population based on information contained in a sample. -A measure of the reliability of the inference: a measure of reliability is a statement (usually quantitative) about the degree of uncertainty associated with a statistical inference.

Explain the difference between quantitative and qualitative data.

-Quantitative data are measurements that are recorded on a naturally occurring numerical scale. Some examples are temperature or unemployment rate. -Qualitative (categorical) data are measurements that cannot be recorded on a natural numerical scale; they can only be classified into one of a group of categories. Some examples are political party affiliation or the size of a car.

Describe the sample variance in words rather than with a formula. Do the same with the population variance.

-The sample variance, denoted s^2, of a sample of n measurements is the sum of the squared deviations of the measurements from the sample mean, x̄, divided by (n - 1). Its square root, s, is the sample standard deviation. -The population variance, denoted σ^2, is the average of the squared deviations of the measurements on all units in the population from the population mean, μ. Its square root, σ (sigma), is the population standard deviation.
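The two verbal definitions differ only in the mean used and the divisor. A minimal sketch with hypothetical helper names; the results should match Python's `statistics.variance` and `statistics.pvariance` for the same data.

```python
import math


def sample_variance(data):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


def population_variance(data):
    """sigma^2: average squared deviation from the population mean over all N units."""
    N = len(data)
    mu = sum(data) / N
    return sum((x - mu) ** 2 for x in data) / N


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
s2, sigma2 = sample_variance(data), population_variance(data)
print(f"treated as a sample:     s^2 = {s2:.3f}, s = {math.sqrt(s2):.3f}")
print(f"treated as a population: sigma^2 = {sigma2:.3f}, sigma = {math.sqrt(sigma2):.3f}")
```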

List the four major methods of collecting data and explain their differences.

1) From a published source such as a book, journal, or newspaper. 2) From a designed experiment, in which the researcher exerts strict control over the units (people, objects, or things) in the study. 3) From an observational study, in which the researcher observes the experimental units in their natural setting and records the variable(s) of interest. 4) From a survey, the most common type of observational study, in which the researcher samples a group of people, asks one or more questions, and records the responses. Most respondents answer truthfully, but some answers may still be distorted.

What two factors affect the accuracy of the sample mean as an estimate of the population mean?

1) The size of the sample. The larger the sample, the more accurate the estimate will tend to be. 2) The variability, or spread, of the data. All other factors remaining constant, the more variable the data, the less accurate is the estimate.
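Both factors can be seen in a small simulation. This sketch assumes a normal population with mean 100 and a chosen standard deviation (all hypothetical settings), repeatedly draws samples of size n, and measures how much the sample means vary from sample to sample. A smaller spread means a more accurate estimate; the results should sit close to the theoretical value σ/√n.

```python
import math
import random
from statistics import stdev


def spread_of_sample_means(pop_sd, n, reps=2000, seed=1):
    """Standard deviation of `reps` sample means, each from a sample of size n
    drawn from a hypothetical normal population with mean 100 and sd pop_sd."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = [sum(rng.gauss(100, pop_sd) for _ in range(n)) / n for _ in range(reps)]
    return stdev(means)


for pop_sd, n in [(10, 10), (10, 100), (30, 10)]:
    observed = spread_of_sample_means(pop_sd, n)
    print(f"sd={pop_sd:>2}, n={n:>3}: spread of sample means {observed:.2f} "
          f"(theory sigma/sqrt(n) = {pop_sd / math.sqrt(n):.2f})")
```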

Explain the concept of a skewed distribution.

A data set is said to be skewed if one tail of the distribution has more extreme observations than the other tail. -If the data are skewed to the right (positive skewness), the mean is typically greater than the median, which is greater than the mode. Right skewness results from a few unusually high measurements in the right tail. -If the data are skewed to the left (negative skewness), the mean is typically less than the median, which is less than the mode. Left skewness results from a few unusually low measurements in the left tail. -If the data are symmetric and mound-shaped, the mean, median, and mode are equal and there is no skewness.

To what kind of data sets can Chebyshev's rule be applied?

Chebyshev's rule can be applied to any data set, regardless of the shape of the frequency distribution of the data. It does not have to be mound-shaped.

If the standard deviation increases, does this imply that the data are more variable or less variable?

If the standard deviation increases, it implies that the data is more variable.

Explain the difference between class frequency, class relative frequency, and class percentage for a qualitative variable.

Qualitative data are nonnumerical in nature, so the values of a qualitative variable can only be classified into categories called classes. We can summarize such data numerically in three related ways. -Class frequency is the number of observations in the data set that fall into a particular class. -Class relative frequency is the proportion of the total number of observations falling into each class: the class frequency divided by the total number of observations, n, in the data set. That is, class relative frequency = class frequency/n. -Class percentage is the class relative frequency multiplied by 100. That is, class percentage = (class relative frequency) x 100.
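A short sketch computing all three summaries for a qualitative variable. The political-party responses are invented, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter


def frequency_table(observations):
    """Map each class to (class frequency, class relative frequency, class percentage)."""
    n = len(observations)
    table = {}
    for cls, freq in Counter(observations).most_common():
        rel = freq / n               # class relative frequency = class frequency / n
        table[cls] = (freq, rel, rel * 100)
    return table


parties = ["Democrat", "Republican", "Independent", "Democrat",
           "Republican", "Democrat", "Independent", "Democrat"]  # hypothetical responses

for cls, (freq, rel, pct) in frequency_table(parties).items():
    print(f"{cls:12} frequency={freq}  relative frequency={rel:.3f}  percentage={pct:.1f}%")
```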

Define statistical thinking.

Statistical thinking involves applying rational thought and the science of statistics to critically assess data and inferences. Fundamental to the thought process is that variation exists in populations of data.

What is Statistics?

Statistics is the science of data, which involves collecting, classifying, summarizing, organizing, analyzing, presenting, and interpreting numerical and categorical information.

Explain the difference between the stem and the leaf in a stem-and-leaf display.

The stem is the leading part of the measurement (usually all but the last digit), and the leaf is the remaining trailing digit. For example, if the split is placed at the decimal point, the measurement 7.3 has stem 7 and leaf 3.

What is the symbol used to represent the sample mean? The population mean?

The symbol used to represent the sample mean is x̄. The symbol used to represent the population mean is the Greek letter μ.

Can the variance of a data set ever be negative? Explain. Can the variance ever be smaller than the standard deviation? Explain.

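The variance of a data set cannot be negative because it is a sum of squared deviations (each zero or positive) divided by a positive number (n - 1 or N). It equals zero only when all the measurements are identical. The variance can be smaller than the standard deviation: because the standard deviation is the square root of the variance, this happens whenever the variance is between 0 and 1.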
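Two tiny data sets show both cases, using Python's `statistics.pvariance` and `statistics.pstdev` (the population versions). With [1, 2] the variance is 0.25 and the standard deviation 0.5; with [0, 4] the variance of 4 exceeds the standard deviation of 2.

```python
from statistics import pstdev, pvariance

for data in ([1, 2], [0, 4]):
    var, sd = pvariance(data), pstdev(data)   # variance is never negative
    relation = "smaller" if var < sd else "larger"
    print(f"{data}: variance={var}, standard deviation={sd} (variance is {relation})")
```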

Treasury deficit prior to Civil War. In Civil War History, historian Jane Flaherty researched the condition of the U.S. Treasury on the eve of the Civil War in 1861. Between 1854 and 1857 (under President Franklin Pierce), the annual surplus/deficit was +18.8, +6.7, +5.3, and +1.3 million dollars, respectively. In contrast, between 1858 and 1861 (under President James Buchanan), the annual surplus/deficit was -27.3, -16.2, -7.2, and -25.2 million dollars, respectively. Flaherty used these data to aid in portraying the exhausted condition of the U.S. Treasury when Abraham Lincoln took office in 1861. Does this study represent a descriptive or inferential statistical study? Explain.

This study represents a descriptive statistical study because it uses numerical methods to look for patterns in a data set (the annual surpluses and deficits), summarizes the information revealed, and presents that information in a convenient form. No generalization is made beyond the years observed.

For a set of data with a mound-shaped relative frequency distribution, what can be said about the percentage of the measurements contained in each of the intervals x̄ - s to x̄ + s, x̄ - 2s to x̄ + 2s, and x̄ - 3s to x̄ + 3s?

a) Approximately 68% of the measurements will fall within one standard deviation of the mean. b) Approximately 95% of the measurements will fall within two standard deviations of the mean. c) Approximately 99.7% (essentially all) of the measurements will fall within three standard deviations of the mean.
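The percentages of the empirical rule match the areas under a normal curve, the idealized mound-shaped, symmetric distribution. A sketch using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} standard deviation(s) of the mean: {share:.1%}")
# within 1 standard deviation(s) of the mean: 68.3%
# within 2 standard deviation(s) of the mean: 95.4%
# within 3 standard deviation(s) of the mean: 99.7%
```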

The output from a statistical computer program indicates that the mean and standard deviation of a data set consisting of 200 measurements are $1,500 and $300 respectively. a) What are the units of measurement of the variable of interest? On the basis of the units, what type of data is this, quantitative or qualitative? b) What can be said about the number of measurements between $900 and $2100? between $600 and $2400? between $1200 and $1800? between $1500 and $2100?

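a) Dollars; the data are quantitative. b) Because nothing is known about the shape of the distribution, use Chebyshev's rule. $900 to $2100 is x̄ ± 2s, so at least 3/4 of the 200 measurements (at least 150) fall in it. $600 to $2400 is x̄ ± 3s, so at least 8/9 of the measurements (at least 178) fall in it. For $1200 to $1800 (x̄ ± s) and $1500 to $2100 (x̄ to x̄ + 2s), nothing useful can be said.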
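The arithmetic behind part b) can be sketched as follows, applying Chebyshev's rule because nothing is known about the shape of the distribution. The helper name is hypothetical. Counts are rounded up: at least 8/9 of 200 measurements means at least 177.8, so at least 178 whole measurements.

```python
import math

n, xbar, s = 200, 1500, 300  # values from the computer output in the exercise


def chebyshev_lower_bound(k):
    """Minimum fraction of measurements within k standard deviations of the mean.
    Chebyshev's rule gives 1 - 1/k^2 for k > 1 and says nothing useful for k <= 1."""
    return 1 - 1 / k**2 if k > 1 else 0.0


for k in (1, 2, 3):
    low, high = xbar - k * s, xbar + k * s
    frac = chebyshev_lower_bound(k)
    print(f"${low}-${high}: at least {frac:.3f} of the data, "
          f"i.e. at least {math.ceil(n * frac)} of {n} measurements")
```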

In the Journal of Earthquake Engineering, a team of civil and environmental engineers studied the ground motion characteristics of 15 earthquakes that occurred around the world since 1940. Three (of many) variables measured on each earthquake were the type of ground motion (short, long, or forward directive), the magnitude of the earthquake (on the Richter scale), and peak ground acceleration (feet per second squared). One of the goals of the study was to estimate the inelastic spectra of any ground motion cycle.

a) Identify the experimental units for this study. b) Do the data for the 15 earthquakes represent a population or a sample? Explain. c) Define the variables measured and classify them as quantitative or qualitative.

a) The experimental units for this study are the earthquakes. b) The data for the 15 earthquakes represent a sample, because the population of earthquakes that have occurred around the world since 1940 is much larger than 15. c) Type of ground motion: qualitative; magnitude of earthquake: quantitative; peak ground acceleration: quantitative.

STEM Experiences for Girls: NSF promotes girls' participation in informal science, technology, engineering, and math (STEM). What has been the impact of these informal STEM experiences? This was the question of interest in the published study Cascading Influences: Long-Term Impacts of Informal STEM Experiences for Girls (Mar. 2013). A sample of 159 young women who recently participated in a STEM program were recruited to complete an online survey. Of these, only 27% felt that participation in the STEM program increased their interest in science.

a) Identify the population of interest to the researchers. b) Identify the sample. c) Use information in the study to make an inference about the relevant population.

a) All young women who recently participated in a STEM program. b) The 159 young women who were recruited to complete the online survey. c) Since only 27% of the sampled women felt that participation in the STEM program increased their interest in science, the researchers can estimate that roughly 27% of all young women who recently participated in a STEM program feel this way; that is, most participants do not feel the program increased their interest in science.

For any set of data, what can be said about the percentage of the measurements contained in each of the following intervals? a) x̄ - s to x̄ + s b) x̄ - 2s to x̄ + 2s c) x̄ - 3s to x̄ + 3s

a) It is possible that very few of the measurements will fall within one standard deviation of the mean. b) At least 3/4 of the measurements will fall within two standard deviations of the mean. c) At least 8/9 of the measurements will fall within three standard deviations of the mean.
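Chebyshev's bounds can be checked on any data set, however lopsided. This sketch uses an invented, strongly right-skewed data set and the sample standard deviation; the helper name is hypothetical.

```python
from statistics import mean, stdev


def share_within(data, k):
    """Fraction of measurements lying within k sample standard deviations of the mean."""
    xbar, s = mean(data), stdev(data)
    return sum(1 for x in data if abs(x - xbar) <= k * s) / len(data)


skewed = [1, 1, 1, 2, 2, 3, 4, 5, 8, 13, 21, 40]  # hypothetical right-skewed data

for k in (2, 3):
    print(f"k={k}: observed {share_within(skewed, k):.3f}, "
          f"Chebyshev guarantees at least {1 - 1 / k**2:.3f}")
```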

National Bridge Inventory. All highway bridges in the United States are inspected periodically for structural deficiency by the Federal Highway Administration (FHWA). Data from the FHWA inspections are compiled into the National Bridge Inventory (NBI). Several of the nearly 100 variables maintained by the NBI are listed next. Classify each variable as quantitative or qualitative.

a) Length of maximum span (feet) b) Number of vehicle lanes c) Toll bridge (yes or no) d) Average daily traffic e) Condition of deck (good, fair, or poor) f) Bypass or detour length (miles) g) Type of route (interstate, U.S. state, county, or city)

a) Quantitative b) Quantitative c) Qualitative d) Quantitative e) Qualitative f) Quantitative g) Qualitative

Sea buckthorn, a plant that typically grows at high altitudes in Europe and Asia, has been found to have medicinal value. The medicinal properties of berries collected from sea buckthorn were investigated in Academia Journal of Medicinal Plants. The following variables were measured for each plant sampled. Identify each as producing quantitative or qualitative data.

a) Species of sea buckthorn (H. rhamnoides, H. gyantsensis, H. neurocarpa, H. tibetana, H. salicifolia). b) Altitude of collection location (meters). c) Total flavonoid content in berries (milligrams per gram).

a) Qualitative b) Quantitative c) Quantitative

Study of quality of drinking water. Disasters published a study of the effects of a tropical cyclone on the quality of drinking water on a remote Pacific island. Water samples (size 500 milliliters) were collected approximately four weeks after Cyclone Ami hit the island. The following variables were recorded for each water sample. Identify each variable as quantitative or qualitative.

a) Town where sample was collected b) Type of water supply (river intake, stream, or borehole) c) Acidic level (pH scale 1 to 14) d) Turbidity level (nephelometric turbidity units = NTUs) e) Temperature (degrees centigrade) f) Number of fecal coliforms per 100 milliliters g) Free-chlorine residual (milligrams per liter) h) Presence of hydrogen sulphide (yes or no)

a) Qualitative b) Qualitative c) Quantitative d) Quantitative e) Quantitative f) Quantitative g) Quantitative h) Qualitative

Corrosion prevention of buried steel structures. Engineers have designed tests on underground steel structures that measure the potential for corrosion. In Materials Performance, two tests for steel corrosion called "instant-off" and "instant-on" potential were compared. The tests were applied to buried piping at a petrochemical plant. Both the "instant-off" and "instant-on" corrosion measurements were made at each of 19 different randomly selected pipe locations. One objective of the study is to determine if one test is more desirable (i.e., can more accurately predict the potential for corrosion) than the other when applied to buried steel piping.

a) What are the experimental units for this study? b) Describe the sample. c) Describe the population. d) Is this an example of descriptive or inferential statistics?

a) The experimental units for this study are the pipe locations. b) The sample is the 19 randomly selected pipe locations. c) The population is the set of all possible pipe locations. d) This is an example of inferential statistics, because the researchers use the 19 sampled locations to decide which test more accurately predicts the potential for corrosion when applied to buried steel piping in general.

Drafting NFL Quarterbacks: The Journal of Productivity Analysis published a study of how successful National Football League (NFL) teams are in drafting productive quarterbacks. Data were collected for all 331 quarterbacks drafted over a 38-year period. Several variables were measured for each QB, including draft position (one of the top 10 players picked, selection between picks 11-50, or selected after pick 50), NFL winning ratio (percentage of games won), and QB production score (higher scores indicate more productive QBs). The researchers discovered that draft position was only weakly related to a quarterback's performance in the NFL. They concluded that "quarterbacks taken higher [in the draft] do not appear to perform any better."

a) What is the experimental unit for this study? b) Identify the type (quantitative or qualitative) of each variable measured. c) Is the study an application of descriptive or inferential statistics? Explain.

a) The experimental unit for this study is an NFL quarterback drafted during the 38-year period; data were collected on all 331 of them. b) Draft position: qualitative; NFL winning ratio: quantitative; QB production score: quantitative. c) This study is an application of inferential statistics, because the researchers use the data to reach a general conclusion: draft position is only weakly related to a quarterback's NFL performance, so quarterbacks taken higher in the draft do not appear to perform any better.
