DSC 210 Final Exam
In the textile industry, a manufacturer is interested in the number of blemishes or flaws occurring in each 100 feet of material. The probability distribution that has the greatest chance of applying to this situation is the _____.
Poisson distribution
If one wanted to find the probability of 10 customer arrivals in an hour at a service station, one would generally use the _____.
Poisson probability distribution
It is desired to take a random sample of individuals who are frequent fliers on Delta Airlines. Frequent fliers are classified based upon their frequent flier status (50% silver level, 30% blue level, 20% red level) and a simple random sample reflecting these percentages is taken from each of these groups. Which type of sampling method has been employed?
Stratified Random Sampling
T/F: The level of significance α (alpha) is the probability that the confidence interval does not contain the value of the parameter being estimated.
True
T/F: The median is used in creating a boxplot.
True
T/F: The number of defective items in a shipment is a discrete random variable.
True
T/F: The sample mean provides a point estimate for the population mean.
True
T/F: The symbol for the sample variance is s^2
True
Find z-score/value of x corresponding to a percentile: excel
=NORM.INV(percentile, mean, stdev); for a z-score, use mean = 0 and stdev = 1
Percentiles: excel
=PERCENTILE.EXC(data set, percentile as decimal)
Standard Deviation: excel
=STDEV.S(number 1, [number 2], ...)
Sample variance: excel
=VAR.S()
Events A and B are mutually exclusive. Which of the following statements is also true?
P(A ∪ B) = P(A) + P(B)
Which of the following is a point estimator? A. σ B. t C. s D. p
C. s
T/F: A histogram is appropriate to be used for categorical data.
False
T/F: A representative sample means that you select every member of the population to be used in your sample.
False
T/F: Every confidence interval constructed will contain the value of the parameter being estimated.
False
T/F: Highway patrol officers measure the speed of automobiles on a highway using radar equipment. The random variable in this experiment is speed, measured in miles per hour. This random variable is a discrete random variable.
False
T/F: In combinations, the order of selection is important.
False
T/F: Numerical calculations are always appropriate for nominal data.
False
T/F: The Central Limit Theorem can only be used when the population being sampled is normally distributed.
False
T/F: The RAND() function in Excel helps in calculating the margin of error for a confidence interval.
False
T/F: The binomial probability distribution is appropriate to use for "without replacement" type problems.
False
T/F: The correlation coefficient can never be negative.
False
T/F: The exponential distribution is a discrete probability distribution.
False
T/F: The exponential probability distribution is a symmetric distribution.
False
T/F: The mean is a better measure of center than the median if there is an outlier in the data set.
False
T/F: The mean of a standard normal random variable is one.
False
T/F: The standard deviation is used in creating a boxplot.
False
The weight of items produced by a machine is normally distributed with a mean of 8 ounces and a standard deviation of 2 ounces. What is the probability that a randomly selected item weighs exactly 8 ounces?
0
The assembly time for a product is uniformly distributed between 6 and 10 minutes. The probability of assembling the product in less than 8 minutes is ___________.
0.50
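The 0.50 above can be checked with a short Python sketch of the continuous uniform cumulative probability, (x - a)/(b - a). The function name `uniform_cdf` is only an illustrative choice, not something from the course materials.

```python
def uniform_cdf(x, a, b):
    """P(X <= x) for a continuous uniform distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# Assembly time uniform between 6 and 10 minutes; probability of less than 8
p_less_than_8 = uniform_cdf(8, 6, 10)
print(p_less_than_8)
```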
When the population has a normal distribution, the sampling distribution of x-bar is normally distributed _____.
for any sample size
The expected value of a random variable is the _____.
mean value
After the data have been arranged from smallest value to largest value, the value in the middle is called the _____.
median
Which of the following is not a graph that is used for quantitative data?
pie chart
Excel's __________ can be used to construct a frequency distribution for categorical data.
pivot table tool
A single numerical value used as an estimate of a population parameter is known as a _____.
point estimate
The general form of an interval estimate of a population mean is the _____ plus or minus the _____.
point estimate, margin of error
The key difference between binomial and hypergeometric distributions is that with the hypergeometric distribution the _____.
probability of success changes from trial to trial
The interquartile range is used as a measure of variability to overcome what difficulty of the range?
the range is influenced too much by extreme values
Quarterly sales data is an example of what type of data?
time-series
Which of the following descriptive statistics is NOT measured in the same units as the data?
variance
Which of the following is NOT a probability-based sampling method?
volunteer sampling
If we change a 95% confidence interval estimate to a 99% confidence interval estimate, we can expect the _____.
width of the confidence interval to increase
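A rough Python sketch of why the interval widens, assuming a z-based interval with a known population standard deviation. The values sigma = 2 and n = 25 are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(confidence, sigma, n):
    """Width of a z-based interval for the mean: 2 * z * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z-value with alpha/2 in the upper tail
    return 2 * z * sigma / sqrt(n)

# Hypothetical inputs: sigma = 2, n = 25
width_95 = ci_width(0.95, sigma=2, n=25)
width_99 = ci_width(0.99, sigma=2, n=25)
print(width_95, width_99)
```

The only thing that changes between the two calls is the z-value, which is larger at 99% confidence, so the margin of error and the width both grow.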
Sample mean
x-bar
The _____ denotes the number of standard deviations a data value is away from the mean of the data set.
z-score
In a post office, the mailboxes are numbered from 1 to 5,000. These numbers represent ______.
categorical data
Poisson Probability Distribution: excel
= POISSON.DIST(x, mu, T/F)
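For comparison, a minimal Python sketch of what POISSON.DIST computes: the FALSE case corresponds to P(x = k) and the TRUE case to P(x <= k). The function names and the example mean of 2 are illustrative assumptions.

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu (Excel: FALSE)."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

def poisson_cdf(x, mu):
    """P(X <= x), the sum of P(X = k) for k = 0..x (Excel: TRUE)."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))

# Hypothetical mean of 2 occurrences per interval
p_exactly_1 = poisson_pmf(1, 2)
p_at_most_1 = poisson_cdf(1, 2)
print(p_exactly_1, p_at_most_1)
```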
Standard Error of x-bar: excel
=stdev/SQRT(n), where n = sample size
Binomial Distribution: excel
=BINOM.DIST(x, n, p, TRUE or FALSE)
Sampling distribution of p-bar: excel
=NORM.DIST(p-bar, p, SQRT(p*(1-p)/n), TRUE)
T/F: The Poisson random variable is an example of a discrete random variable.
True
T/F: The area under the curve for a normal distribution must always equal 1.
True
positively skewed distribution
- skewed to the right (long tail on the right)
negatively skewed distribution
- skewed to the left (long tail on the left)
elements
- the entities on which data are collected - ex: persons, mutual funds, companies, products
observational study
- the researcher has no control over the variables and merely records the data - ex: surveys
population
- the set of all elements of interest in a particular study
Sampling Distribution of the sample proportion (p-bar)
- 0 <= p-bar <= 1 - mean: the mean of all possible p-bar values equals the population proportion p - stdev: sigma of p-bar = square root of (p(1-p)/n) (a.k.a. standard error)
Process for computation of Binomial Probability Distributions
- 1) define x - 2) specify the probability distribution of x --> ex: x~Binomial(n = _____, p = ______) - 3) write the probability statement in terms of values --> ex: P(x<= #)
Conditions for using the normal distribution of p-bar
- 1) n*p >= 5 - 2) n(1-p) >= 5
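The two conditions and the standard error of p-bar can be sketched in Python as follows. The values p = 0.3 and n = 100 are hypothetical, and the function names are illustrative only.

```python
from math import sqrt

def normal_approx_ok(n, p):
    """Check both conditions: n*p >= 5 and n*(1-p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

def standard_error_pbar(n, p):
    """Standard deviation of p-bar: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

# Hypothetical population proportion and sample size
ok = normal_approx_ok(100, 0.3)
se = standard_error_pbar(100, 0.3)
print(ok, se)
```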
Normal Probability Distribution: probability that x is greater than a number
- =1 - NORM.DIST(x-value, mu, sigma, TRUE)
Binomial distribution - "at least"
- =1-BINOM.DIST(x-1, n, p, TRUE), where x is the "at least" number - the cumulative probability is taken at one less than the "at least" number, because P(at least x) = 1 - P(at most x-1)
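A minimal Python sketch of the "at least" calculation, using the complement P(x >= k) = 1 - P(x <= k - 1). The example reuses n = 5 and p = 0.2 from the Pete question in this set; the function names are illustrative.

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x); an empty sum (x < 0) gives 0."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

def binom_at_least(k, n, p):
    """P(X >= k) = 1 - P(X <= k - 1)."""
    return 1 - binom_cdf(k - 1, n, p)

# Probability Pete is late at least 1 day out of 5 when p = 0.2
p_at_least_1 = binom_at_least(1, 5, 0.2)
print(p_at_least_1)
```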
What value of z puts # probability? (use excel)
- =NORM.S.INV(probability to the left of the z-value) - returns the z-value with given probability to its left - interpret: P(z < #)
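The same lookup can be sketched in Python with the standard library's `statistics.NormalDist`, whose `inv_cdf` returns the value with a given probability to its left. The helper name `value_at_percentile` is an illustrative assumption.

```python
from statistics import NormalDist

def value_at_percentile(probability, mean=0.0, stdev=1.0):
    """Return the value with `probability` of the area to its left."""
    return NormalDist(mu=mean, sigma=stdev).inv_cdf(probability)

# Standard normal (mean 0, stdev 1): z with 95% of the area to its left
z_95 = value_at_percentile(0.95)
# Same percentile for a normal distribution with mean 8 and stdev 2
x_95 = value_at_percentile(0.95, mean=8, stdev=2)
print(z_95, x_95)
```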
Excel: TRUE vs FALSE?
- FALSE when P(x=x) - TRUE when P(x<=x)
situations in which ethical issues in statistics may arise
- use of inappropriate statistics to summarize data - biased interpretation of the statistical results - use of misleading graphs - poor sampling methods (not a representative sample or fishing for supportive data)
variables
- a characteristic or property of an individual element
scatter diagram
- a graph that shows the degree and direction of relationship between two variables
class frequency
- a raw count of how many observations fell into a category
representative sample
- a sample whose members display characteristics of the target population - analyzing this data helps draw conclusions (make inferences) about the population
sample
- a subset selected from the population
crosstabulation
- a tabular summary of data for two variables. - the classes for one variable are represented by the rows; the classes for the other variable are represented by the columns
observations
- a value of something of interest you're measuring or counting during a study or experiment
Expected value of a Discrete random variable
- average of the random variable - E(x) = sum of x*f(x)
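As a quick illustration of E(x) = sum of x*f(x), here is a small Python sketch. The probability table is made up for the example.

```python
def expected_value(distribution):
    """E(x) = sum of x * f(x) over all values of the random variable."""
    return sum(x * fx for x, fx in distribution.items())

# Hypothetical discrete distribution: value -> probability (sums to 1)
f = {0: 0.5, 1: 0.3, 2: 0.2}
mean_x = expected_value(f)
print(mean_x)
```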
bar chart
- categorical data graph
pie chart
- categorical data graph
experiment
- certain variables are controlled by the researcher so that data can be obtained about how they influence the variable of interest - ex: a study to test the effectiveness of a new medication vs a placebo
class percentage
- compute: class relative frequency x 100
interval measurement
- data can be ordered and there is a fixed distance between values - ex: temperature, IQ, grade marking
ratio measurement
- data can be ordered with a fixed distance between values and there is a meaningful zero point which allows ratios to be useful - ex: cost of purchasing a share of stock, height
cross-sectional data
- data collected at the same or approximately the same point in time - like a snapshot in time
time-series data
- data collected over several time periods (daily, weekly, monthly, yearly) - seen in many businesses when trying to predict future values based on trends observed over time
symmetric distribution
- data that is evenly distributed between the left and the right side
nominal measurement
- data values are labels or categories with no logical ordering of values - ex: social security number, eye color, gender
ordinal measurement
- data values can be arranged, but difference between values cannot be computed mathematically - ex: survey responses on a scale of very poor, poor, average, good, very good
Discrete Uniform Probability Distribution: probability function
- f(x) = 1/n, where n = total number of sample points
When is a sampling distribution of x-bar considered a normal distribution?
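- if n > 30, then the Central Limit Theorem (CLT) applies and tells us that x-bar is approximately normally distributed (x-bar ~ N) - if the population itself is normally distributed, x-bar is normal for any sample size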
qualitative (categorical) variables
- information that can be classified into different categories based on a nonnumeric characteristic - scales of measurement: nominal and ordinal
quantitative variables
- information with numerical values that indicate how much or how many - scales of measurement: interval and ratio
Sampling Distribution of x-bar
- mean: the mean of the x-bar values equals the mean of the population x values - stdev: sigma of x-bar = sigma/SQRT(n) (a.k.a. standard error, or SE, of the mean)
inferential statistics
- methods used to draw conclusions about the population based upon the sample info - conclusions include estimates or predictions about the population
If x ~ Binomial (n = ___, p = ____), then...
- mu = E(x) = n*p - sigma^2 = var(x) = n*p(1-p)
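These shortcut formulas can be checked against the definition by summing over the whole distribution. The sketch below uses n = 5 and p = 0.2 (the Pete example from this set) and is only a sanity check, not a required method.

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.2

# Shortcut formulas
mu_formula = n * p
var_formula = n * p * (1 - p)

# Direct computation from the definition of mean and variance
mu_direct = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var_direct = sum((x - mu_direct) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))
print(mu_formula, var_formula)
```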
descriptive statistics
- numerical and graphical summaries of the data which help show any patterns in a set of data - describe the data to show any patterns in the data set - ex: a histogram
class relative frequency
- proportion of the total for a class - compute: class frequency/total
dot plot
- quantitative data graph
histogram
- quantitative data graph
cumulative percentage
- records the percentage of cases at or below any given value of the variable
how can graphs be misleading?
- scale not starting at zero - scale made very small to make graph look bigger - scale values/labels missing from graph - incorrect scale placed on graph - pieces of a Pie Chart are not the correct sizes - oversized volumes of objects that are too big for the vertical scale differences they represent - size of images used in Pictographs being different for the different categories being graphed - graph being a non-standard size or shape
stem-and-leaf display
- shows quantitative data values in a way that sketches the distribution of the data
Ex: A research organization surveyed 250 women between the ages of 35 and 50 who work in the state of Ohio and asked them about the amount of time they spend commuting to their jobs each week. The average amount of time these women spent commuting to work each week was 190 minutes. Identify the sample that was taken.
250 women in the state of Ohio who are between ages 35 and 50
Normal Probability Distribution: when x is given (excel)
= NORM.DIST(x-value, mu, sigma, TRUE)
Normal Probability Distribution: when the z-value is known (excel)
= NORM.S.DIST(z-value, TRUE)
The use of the normal probability distribution as an approximation of the sampling distribution of p-bar is based on the condition that both np and n(1 - p) equal or exceed _____.
5
Hypergeometric Probability Distribution: excel
= HYPGEOM.DIST(x, n, r, N, T/F), where - x = number of successes - n = number of trials - N = population size - r = total number of successes in the population
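A minimal Python sketch of the hypergeometric probability of exactly x successes, using the same x, n, r, N notation as above. The numbers in the example are hypothetical.

```python
import math

def hypergeom_pmf(x, n, r, N):
    """P(X = x): x successes in n draws without replacement,
    from a population of size N containing r successes."""
    return math.comb(r, x) * math.comb(N - r, n - x) / math.comb(N, n)

# Hypothetical: population of 10 with 4 successes, draw 3, want exactly 1 success
p_one_success = hypergeom_pmf(1, 3, 4, 10)
print(p_one_success)
```

Because draws are made without replacement, the chance of a success shifts after each draw, which is the difference from the binomial case.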
Normal Probability Distribution: probability that x is between two values
= NORM.DIST(larger x-value, mu, sigma, TRUE) - NORM.DIST(smaller x-value, mu, sigma, TRUE)
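The "between" and "greater than" cases can be sketched in Python with `statistics.NormalDist`, where `cdf(x)` plays the role of NORM.DIST(x, mu, sigma, TRUE). The example borrows mean 8 and standard deviation 2 from the item-weight question in this set; the cutoffs 6 and 10 are made up.

```python
from statistics import NormalDist

weights = NormalDist(mu=8, sigma=2)

# P(6 <= x <= 10): cumulative probability at the larger value minus the smaller
p_between = weights.cdf(10) - weights.cdf(6)

# P(x > 10): one minus the cumulative probability
p_greater = 1 - weights.cdf(10)
print(p_between, p_greater)
```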
Sampling Distributions of x-bar: excel
= NORM.DIST(x, mean, stdev/SQRT(n), T/F)
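A Python sketch of the same idea: the sampling distribution of x-bar is treated as normal with the population mean and a standard deviation of stdev/SQRT(n). The population values (mean 8, stdev 2) come from the item-weight question; n = 16 and the cutoff 8.5 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(stdev, n):
    """Standard error of the mean: stdev / sqrt(n)."""
    return stdev / sqrt(n)

def prob_xbar_at_most(x, mean, stdev, n):
    """P(x-bar <= x), treating x-bar as normal with the standard error above."""
    return NormalDist(mu=mean, sigma=standard_error(stdev, n)).cdf(x)

se = standard_error(2, 16)
p = prob_xbar_at_most(8.5, 8, 2, 16)
print(se, p)
```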
The margin of error in an interval estimate of the population mean is a function of all of the following EXCEPT _____. A. sample mean B. level of significance C. variability of the population D. sample size
A. sample mean
All of the following are true about the standard error of the mean EXCEPT _____. A. it measures the variability in sample means B. it is larger than the standard deviation of the population C. it decreases as the sample size increases D. its value is influenced by the standard deviation of the population
B. it is larger than the standard deviation of the population
Posterior probabilities are computed using _____
Bayes' Theorem
Which of the following is NOT a characteristic of an experiment where the binomial probability distribution is applicable? A. The experiment has a sequence of n identical trials. B. Exactly two outcomes are possible on each trial. C. The trials are dependent. D. The probabilities of the outcomes do not change from one trial to another.
C. The trials are dependent
All of the following are examples of observational studies except: A. an online survey to record your satisfaction with a company's service. B. the number of cars running a stop sign in a residential area during rush hour. C. the behavior of Walmart shoppers after they are given a $20 gift card from the store. D. a Gallup poll measuring the approval rating of the president.
C. the behavior of Walmart shoppers after they are given a $20 gift card from the store.
The fact that the sampling distribution of the sample mean can be approximated by a normal probability distribution whenever the sample size is large is based on the _____.
Central Limit Theorem
T/F: Taking repeated samples until you obtain the desired result is not an ethically acceptable statistical practice.
True
T/F: The Empirical Rule can only be applied if the data distribution is bell-shaped.
True
T/F: A pie chart is appropriate for categorical data
True
T/F: Cluster sampling is a probability-based sampling method.
True
T/F: If the random variable X = number of occurrences in a certain interval of time has a Poisson distribution, then the random variable Y = time between occurrences has an exponential distribution.
True
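One way to see the connection, sketched in Python: the probability of waiting longer than t for the next occurrence equals the Poisson probability of zero occurrences in an interval of length t. The rate of 3 per hour and t = 0.5 hours are made-up values.

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

def exponential_cdf(t, rate):
    """P(Y <= t) for an exponential waiting time with the given rate."""
    return 1 - math.exp(-rate * t)

rate, t = 3, 0.5  # hypothetical: 3 occurrences per hour, half an hour

p_wait_longer = 1 - exponential_cdf(t, rate)   # P(Y > t)
p_no_occurrence = poisson_pmf(0, rate * t)     # P(X = 0) with mean rate * t
print(p_wait_longer, p_no_occurrence)
```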
T/F: If two events are mutually exclusive, this means they cannot happen at the same time.
True
T/F: In a crosstabulation, the two variables can be either categorical or quantitative.
True
T/F: Statistical studies in which researchers do not control variables of interest are not of any value.
False
Ex: A research organization surveyed 250 women between the ages of 35 and 50 who work in the state of Ohio and asked them about the amount of time they spend commuting to their jobs each week. The average amount of time these women spent commuting to work each week was 190 minutes. Identify the variable of interest in this problem and indicate if it is categorical or quantitative.
Variable = how many minutes each week a woman commutes to work. The variable is quantitative.
Ex: A research organization surveyed 250 women between the ages of 35 and 50 who work in the state of Ohio and asked them about the amount of time they spend commuting to their jobs each week. The average amount of time these women spent commuting to work each week was 190 minutes. What is the population in this problem?
all women who work in the state of Ohio and are between ages 35 and 50
A graphical summary of data that is based on a five-number summary is a _____.
boxplot
A graphical device for depicting categorical data that have been summarized in a frequency distribution, relative frequency distribution, or percent frequency distribution is a(n) _____.
bar chart
The t distribution is a family of similar probability distributions, with each individual distribution depending on a parameter known as the _____.
degrees of freedom
The summaries of data, which may be tabular, graphical, or numerical, are referred to as:
descriptive statistics
The height and weight are recorded by the school nurse for every student in a school. What type of graph would best display the relationship between height and weight?
scatter diagram
Which of the following symbols represents the standard deviation of a population?
sigma
An important numerical measure related to the shape of a distribution is the _____.
skewness
The standard deviation of all possible x-bar values is called the _____.
standard error of the mean
The process of analyzing sample data to draw conclusions about the characteristics of a population is called _____.
statistical inference
Whenever the population standard deviation is unknown, which distribution is used in developing an interval estimate for a population mean?
t distribution
A dot plot can be used to display _____
the distribution of one quantitative variable
Which of the following would likely display a negative relationship when creating a scatter diagram?
the number of classes a student misses during a semester and the grade obtained in the course
The probability that Pete is late to work on a given day is .2. Pete has 5 days of work next week. The random variable in this problem is _____.
the number of days out of 5 that Pete is late to work