Statistics 201 Final


According to the empirical rule for symmetric distributions, how much of the data lies within two standard deviations of the mean?

95%

The mnemonic CVDOT is used to help remember the important characteristics of data. What does CVDOT stand for?

Center, Variation, Distribution, Outliers, Time

TRUE or FALSE: We may adjust the class limits of a frequency distribution to convenient or attractive values even if this causes the first or last class to be empty. Defend your answer.

FALSE. A frequency distribution can never have an empty first or last class

TRUE or FALSE: To find class width, subtract the lower class limit from the upper class limit. Defend your answer.

FALSE. Class width is the difference between any pair of consecutive lower class boundaries or consecutive upper class boundaries.

How do we identify "unusual values" in a data set using the range rule of thumb?

"unusual" values are more than two standard deviations away from the mean, either high or low

Give an example of a voluntary response study.

1) Email or Facebook surveys in which people decide whether and how to respond. 2) Magazine surveys in which people read the magazine and decide whether and how to respond. 3) Personal surveys or polls outside of a store or a mall in which people decide whether to stop and respond.

Examples of experimental studies

1) One section of Statistics students takes tests using a calculator, another section does not; the test results are compared. 2) In one group of new teachers, each teacher is assigned an experienced mentor; in another group of new teachers no mentors are assigned; satisfaction rates of teachers are compared at the end of the first year of teaching.

Examples of observational studies.

1) Watch the birds that come to a particular birdfeeder and record the number of each species that appears. 2) Watch the people entering the food court of a large mall and record the specific vendor they go to. 3) Watch people at a second floor landing in an office building and record how many use the elevator to go down to the first floor and how many use the stairs.

In a recent survey of 170 sociology students at the University of North Florida, 52.4% of the males strongly agreed that a married woman should take her husband's last name. How many male sociology students does this 52.4% represent?

170 students × 0.524 = 89.08 ≈ 89 male students

In 2012, third baseman Miguel Cabrera of the Detroit Tigers baseball team had 205 hits in 622 times at bat. What is Miguel's batting average?

205/622 ≈ 0.330
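The division can be checked with a short sketch. The variable names are illustrative only; the numbers come from the question.

```python
# Batting average = hits / at-bats, conventionally reported to three decimals.
hits = 205
at_bats = 622

batting_average = hits / at_bats
print(f"{batting_average:.3f}")  # prints 0.330
```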

Population vs. sample

A sample is a subset of a population that can help provide insight into the population. A sample should be randomly collected and adequate in size.

What can be gained by presenting data in a frequency distribution? What is lost by this presentation?

Advantage of frequency distribution: large amounts of data can be grouped so that patterns can be seen. Disadvantage of frequency distribution: individual data values are lost.

What can be gained by presenting data in a relative frequency distribution? What is lost by this presentation?

Advantages of relative frequency distribution: clustering or patterns of distribution can be seen. Disadvantages of relative frequency distribution: individual data values are lost; sample size not known.

Choose all that apply: A data set may have outliers because: a. data entry error when data was recorded (typographical error). b. error in gathering data (incorrect measurement, incorrect instrument reading, faulty instrument, etc.). c. person responding to a survey question misunderstood the question. d. person responding to a survey question intentionally gave a bad answer. e. a data set has a legitimate unusual value.

All the answer choices apply.

What one single value is used to represent all the elements in any one class of a frequency distribution?

Class midpoint

Descriptive vs. inferential statistics

Descriptive statistics provide a concise summary of data. Inferential statistics use a random sample of data taken from a population to describe and make inferences about the population.

Discrete vs. continuous variables

Discrete: A finite number of values between any two values. A discrete variable is always numeric. For example, the number of customer complaints or the number of flaws or defects. Continuous: An infinite number of values between any two values. A continuous variable can be numeric or date/time. For example, the length of a part or the date and time a payment is received.

What do we call a graph in which data values are plotted as points or dots along a scale of values?

Dot-plot

TRUE or FALSE: If some class other than the first or last class of a frequency distribution is empty, we must adjust the class limits so that there is at least one data value in the class. Defend your answer.

FALSE. Any class other than the first or last may be empty; this gives important information about the distribution of the data.

TRUE or FALSE: The classes in a frequency distribution may overlap in order to make all the data fit. Defend your answer.

FALSE. Classes in a frequency distribution can never overlap. Overlapping would lead to confusion about where an individual datum value would be placed.

TRUE or FALSE: In a data set with one really extreme value, the mean would be a good choice for a reliable measure of central tendency. Defend your answer.

FALSE. The mean is sensitive to the extreme value; a better choice is the median because it is not sensitive to an isolated extreme value.
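A minimal sketch of this effect, using made-up values (five typical values plus one extreme value) and Python's standard `statistics` module:

```python
from statistics import mean, median

# Hypothetical data, not from the course materials.
typical = [10, 12, 13, 14, 15]
with_outlier = typical + [500]

mean_before, median_before = mean(typical), median(typical)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean jumps from 12.8 to 94, while the median only moves from 13 to 13.5.
print(mean_before, median_before)
print(mean_after, median_after)
```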

TRUE or FALSE: The variance is the square root of the standard deviation. Defend your answer.

FALSE. This statement is backwards: the standard deviation is the square root of the variance.
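The relationship can be verified numerically. This sketch uses a small hypothetical data set and treats it as a whole population (`pvariance` and `pstdev`); the same relationship holds for the sample versions `variance` and `stdev`.

```python
import math
from statistics import pstdev, pvariance

# Hypothetical population values with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

var = pvariance(data)  # mean squared deviation from the mean: 32 / 8 = 4
sd = pstdev(data)      # square root of the variance: 2

print(var, sd)
print(math.isclose(sd, math.sqrt(var)))  # prints True
```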

TRUE or FALSE: Very large samples guarantee sound statistical results. Defend your answer with a brief explanation.

FALSE: If the sample data has been gathered in an inappropriate fashion, even a large sample will be bad.

TRUE or FALSE: Correlation implies causality. Defend your answer with a brief explanation.

FALSE: See page 9 in the text: Do not use a correlation between two variables as a justification for concluding that one of the variables is the cause of the other variable.

Example of determining percentile for a set.

For example, suppose you have 25 test scores, and in order from lowest to highest they look like this: 43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99. To find the 90th percentile for these (ordered) scores, start by multiplying 90% times the total number of scores, which gives 90% ∗ 25 = 0.90 ∗ 25 = 22.5 (the index). Rounding up to the nearest whole number, you get 23. Counting from left to right (from the smallest to the largest value in the data set), you go until you find the 23rd value in the data set. That value is 98, and it's the 90th percentile for this data set.

What do we call a line graph that displays cumulative frequency values?

Ogive

Identify two details that might make a graph misleading or inaccurate.

Any two of: a scale that does not start at zero; use of pictographs; use of areas or volumes to represent one-dimensional data.

What do we call a bar graph for qualitative data?

Pareto chart

What is the interquartile range?

Q3-Q1

What do we call a graph of pairs of data values using a horizontal and a vertical axis?

Scatter plot

Systematic sample at a large department store

Selecting every 5th shopper entering the store.
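As a rough sketch, selecting every 5th shopper corresponds to stepping through the arrival order with a fixed interval. The numbered shoppers are a hypothetical stand-in for real arrivals.

```python
# Hypothetical arrival order: shoppers numbered 1..20 as they enter the store.
shoppers = list(range(1, 21))

k = 5  # sampling interval: take every k-th shopper
systematic_sample = shoppers[k - 1::k]

print(systematic_sample)  # prints [5, 10, 15, 20]
```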

TRUE or FALSE: The standard deviation of a set of data values is never a negative number. Defend your answer.

TRUE. By definition, standard deviation is the principal square root of the variance, so it is always non-negative.

TRUE or FALSE: The modal class of a grouped frequency distribution is the class that has the highest frequency. Defend your answer.

TRUE. This is by definition.

TRUE or FALSE: A sample of 1250 responders to an online survey is an example of a self-selected sample. Defend your answer with a brief explanation.

TRUE: The responders are those who chose to answer the online survey.

Explain why there is no 100th-percentile.

The 100th percentile would mean that 100% of the data values lie below this position. No data value can satisfy that, because a value cannot lie below itself, so the group would be empty.

Binomial experiment

The experiment consists of n repeated trials. Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure. The probability of success, denoted by P, is the same on every trial. The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.
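Under these conditions, the probability of exactly k successes in n trials is given by the standard binomial formula, C(n, k) · p^k · (1 − p)^(n − k). The card does not state this formula, so the sketch below is an illustration; the function name is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with the same success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: exactly 2 heads in 4 tosses of a fair coin.
print(binomial_pmf(2, 4, 0.5))  # prints 0.375

# The probabilities over all possible counts of successes sum to 1.
total = sum(binomial_pmf(k, 4, 0.5) for k in range(5))
```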

Name one guaranteed characteristic of an ogive.

The graph never decreases; it is always either increasing or constant.
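The reason is that cumulative frequencies are running totals of non-negative counts. A small sketch with hypothetical class frequencies (including one empty interior class):

```python
from itertools import accumulate

# Hypothetical class frequencies; the third class is empty.
frequencies = [3, 7, 0, 5, 2]

# Running totals are the heights plotted on an ogive.
cumulative = list(accumulate(frequencies))
print(cumulative)  # prints [3, 10, 10, 15, 17]

# Each cumulative value is at least as large as the one before it.
never_decreases = all(a <= b for a, b in zip(cumulative, cumulative[1:]))
print(never_decreases)  # prints True
```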

Characteristics of a normal distribution.

The mean, median, and mode of a normal distribution are equal. The area under the normal curve is equal to 1.0. Normal distributions are denser in the center and less dense in the tails. Normal distributions are defined by two parameters, the mean (μ) and the standard deviation (σ).

How to determine percentiles for a data set.

The median is the 50th percentile (50% of the values above and 50% below). To calculate the kth percentile: 1) Order the data values in ascending order. 2) Multiply k percent by the number of values n to get the index. 3) If the index isn't a whole number, round it up to the nearest whole number, then count the values from left to right until you reach that position. 4) If the index is a whole number, count until you reach that position.
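The steps can be sketched in code as follows. The function name is hypothetical, and the whole-number case follows the steps as stated here; some textbooks instead average that value with the next one, so treat this as one convention rather than the only one.

```python
import math

def percentile_value(values, k):
    """Return the k-th percentile by the count-to-position steps.

    Assumes k is a whole number from 1 to 99 and values is non-empty.
    """
    ordered = sorted(values)            # step 1: ascending order
    index = k * len(ordered) / 100      # step 2: k percent of n
    position = max(math.ceil(index), 1) # steps 3 and 4: round up if not whole
    return ordered[position - 1]        # positions are counted from 1

scores = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77,
          78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

print(percentile_value(scores, 90))  # index 22.5 rounds up to 23; prints 98
```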

Type I error vs type II error

Type I error: the researcher rejects a null hypothesis that is true. The probability of committing a Type I error is called the significance level; it is also called alpha and is denoted by α. Type II error: the researcher fails to reject a null hypothesis that is false. The probability of committing a Type II error is called beta and is denoted by β. The probability of not committing a Type II error (1 − β) is called the power of the test.
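A simulation can illustrate what α means. The sketch below assumes a two-sided z-test with known standard deviation, run repeatedly on samples drawn from a population where the null hypothesis really is true. The function name, sample size, trial count, and seed are all arbitrary choices for illustration.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=1):
    """Fraction of trials in which a true null hypothesis (mean = 0) is rejected."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff
    rejections = 0
    for _ in range(trials):
        # The null hypothesis is true here: the population mean really is 0.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) / (1 / n**0.5)  # known sigma = 1
        if abs(z) > z_critical:
            rejections += 1  # a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(rate)  # expected to land near alpha = 0.05
```

The observed rejection rate should be close to α, which is the sense in which the significance level is the probability of a Type I error.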

Under what circumstances would we use the coefficient of variation to compare variation in different data sets?

Use the coefficient of variation when comparing two data sets with different scales or units, or when the data sets have substantially different means.
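The coefficient of variation is the standard deviation expressed as a percentage of the mean, which is what makes it unit-free. A minimal sketch with hypothetical height and weight samples:

```python
from statistics import mean, stdev

def coefficient_of_variation(sample):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(sample) / mean(sample) * 100

# Hypothetical samples measured in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 60, 70, 80, 85]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

# Percentages can be compared directly even though the units differ.
print(f"{cv_heights:.2f}% vs {cv_weights:.2f}%")
```

In this made-up example the weights vary more relative to their mean than the heights do, a comparison the raw standard deviations (in cm and kg) could not support.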

Random variable

When the value of a variable is the outcome of a statistical experiment, that variable is a random variable.

The selling prices of a sample of homes in a particular neighborhood are gathered. There is one extremely high value in this set (a high outlier). A. Which measures of center would be the most affected by this extreme value? B. Which measures of center would be least affected by this extreme value?

A. Most affected are the mean and midrange. B. Least affected is the median; it is not affected by the one outlier value.

Round-off rules: a. Round population parameter values to ___________________________________. b. Round sample statistic values to _______________________________________. c. Round z-scores to ___________________________________________________.

a. one more place than is present in the original data values b. one more place than is present in the original data values c. two decimal places

What is the formula for finding the class midpoint?

class midpoint = (lower class limit + upper class limit)/2

A percentile measures location by ________________________________________________.

dividing data into 100 groups each containing roughly 1% of data

Frequency distribution from data set

http://www.statisticshowto.com/how-to-draw-a-frequency-distribution-table/
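As a rough sketch of the idea, a frequency distribution can be tallied by counting how many values fall between each pair of class limits. The function name and the choice of classes (40-49 through 90-99) are assumptions for illustration, and the code assumes whole-number data.

```python
def frequency_distribution(values, first_lower, width, num_classes):
    """Return (lower limit, upper limit, frequency, relative frequency) rows.

    Assumes whole-number data, so each upper limit is lower + width - 1.
    """
    rows = []
    for i in range(num_classes):
        lower = first_lower + i * width
        upper = lower + width - 1
        count = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, count, count / len(values)))
    return rows

# Sample test scores.
scores = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77,
          78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

table = frequency_distribution(scores, first_lower=40, width=10, num_classes=6)
for lower, upper, count, relative in table:
    print(f"{lower}-{upper}: {count:2d}  ({relative:.0%})")
```

The class width here is 10, the difference between consecutive lower class limits, and the classes do not overlap.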

A measure of center that is not affected by extreme values is ___________________________________.

median

What is the "range rule of thumb?"

minimum "usual" value = (mean) − 2 × (standard deviation); maximum "usual" value = (mean) + 2 × (standard deviation)
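The rule translates directly into code. The function names and the example mean and standard deviation (100 and 15) are hypothetical.

```python
def usual_range(mean, sd):
    """Minimum and maximum "usual" values by the range rule of thumb."""
    return mean - 2 * sd, mean + 2 * sd

def is_unusual(x, mean, sd):
    """True when x lies more than two standard deviations from the mean."""
    low, high = usual_range(mean, sd)
    return x < low or x > high

print(usual_range(100, 15))      # prints (70, 130)
print(is_unusual(135, 100, 15))  # prints True
print(is_unusual(120, 100, 15))  # prints False
```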

A measure of center that may not be unique is _______________________________________________.

mode

A numerical measure describing some feature of a population is called a __________; a numerical measure describing some feature of a sample is called a __________.

parameter; statistic

The symbol for sample standard deviation is _______________________________________.

s

Identify two important characteristics of a normal or bell-shaped distribution.

Any two of: symmetric about the middle; mean = median = mode; asymptotic to the horizontal axis on both ends; 68% of the data falls within one standard deviation of the mean; 95% of the data falls within two standard deviations of the mean; 99.7% of the data falls within three standard deviations of the mean; total area under the curve = 1; the curve lies completely above the horizontal axis.
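The 68%, 95%, and 99.7% figures can be checked against the normal curve itself. This sketch uses `NormalDist` from Python's standard library; the helper name is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{share_within(k):.1%}")
```

The computed shares are roughly 68.3%, 95.4%, and 99.7%, which the empirical rule rounds to 68%, 95%, and 99.7%.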

To construct a boxplot for a data set you must first find ________________________________________.

the 5-number summary: x-min, Q1, median, Q3, x-max
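A minimal sketch of computing the summary follows. Quartile conventions differ between textbooks and software; this version assumes Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when the count is odd. The function name and data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (x-min, Q1, median, Q3, x-max). Needs at least two values."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Skip the middle value when n is odd so the halves have equal size.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

data = [1, 3, 5, 7, 9, 11, 13]  # hypothetical values
x_min, q1, med, q3, x_max = five_number_summary(data)
iqr = q3 - q1  # interquartile range, Q3 - Q1

print((x_min, q1, med, q3, x_max), iqr)
```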

In a time-series graph, the horizontal axis is marked off in units of ______________________________?

time (minutes, hours, days, weeks, quarters, years, etc.)

What is the name given to the value that "splits the difference" between the upper class limit of one class and the lower class limit of the very next class?

upper class boundary or lower class boundary

What do we call the highest value that is possible to be put in any class of a frequency distribution?

upper class limit

The symbol for sample mean is __________________________________________________.

x̄ (x-bar)

The values included in the 5-number summary of a data set are ________________________.

x-min, Q1, median, Q3, x-max

The formula for a z-score is _____________________________________________________.

z = (x − x̄)/s, for a sample, or z = (x − μ)/σ, for a population
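Both versions of the formula have the same shape, so one helper covers them. The function name and the example numbers are hypothetical.

```python
from statistics import mean, stdev

def z_score(x, center, spread):
    """Number of standard deviations that x lies from the mean."""
    return (x - center) / spread

# Population version: mu and sigma are given.
print(z_score(130, 100, 15))  # prints 2.0

# Sample version: estimate the mean and standard deviation from the data.
sample = [60, 65, 70, 75, 80]
x_bar = mean(sample)
s = stdev(sample)
z = z_score(80, x_bar, s)
print(round(z, 2))  # z-scores are rounded to two decimal places
```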

The measure that describes how many standard deviations away from the mean a data value lies is called ________________________________.

z-score

The symbol for population mean is _______________________________________________.

μ

The symbol for population standard deviation is ____________________________________.

σ

Stratified sample of college students

• Conduct a survey using 10 randomly selected freshmen, 10 randomly selected sophomores, 10 randomly selected juniors and 10 randomly selected seniors. • Conduct a survey using 10 randomly selected students in Calculus 1, 10 randomly selected in Calculus II, and 10 randomly selected students in Calculus III.
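The first example can be sketched as drawing a separate simple random sample from each class year. The roster of made-up student IDs, the function name, and the seed are all assumptions for illustration.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical roster: 50 made-up student IDs per class year.
roster = {
    year: [f"{year}-{i:02d}" for i in range(1, 51)]
    for year in ("freshman", "sophomore", "junior", "senior")
}

sample = stratified_sample(roster, per_stratum=10, seed=201)
for year, students in sample.items():
    print(year, len(students))
```

Every stratum contributes exactly 10 students, which is what separates this design from a single random draw across the whole student body.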

3 reasons to use a sample of a population

• It is too expensive to survey the entire population. • It takes too much time to survey the entire population. • It is not convenient or possible to survey the entire population.

Convenience sample of teenage drivers

• Randomly selected student drivers in a high school parking lot. • Randomly selected student drivers at a community youth center.

Give an example of statistical significance versus practical significance.

• A diet program produces an average weight loss of 1 oz per month, and the result is statistically significant. However, a loss that small is not practically significant. • In a simple random sample of 10,000 men and women, researchers find a statistically significant IQ difference (men average 100 and women average 101). A 1-point difference is practically meaningless; it reaches statistical significance only because the sample is so large.

Examples of data at the nominal level

• models of cars in a student parking lot • species of birds seen at a feeding station • kinds of injuries suffered by skiers on a particular ski slope

