ISDS (Statistics) Exam 1
A small p-value indicates that the sample mean is relatively
'far' from the hypothesized mean and provides evidence for rejecting the null hypothesis.
Quantitative Variables
(also known as Numeric) - yields values that represent quantities (weight, salary, etc.)
Categorical Variables
(also known as Qualitative) - have values that can be placed into categories (Yes/No)
Position of the median (in an ordered data set of n values)
(n+1)/2
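Assuming this card refers to the position of the median in an ordered data set, a minimal sketch of how (n+1)/2 is used might look like this. The helper names and the sample values are hypothetical.

```python
def median_position(n):
    """Position (1-based) of the median in an ordered data set of size n."""
    return (n + 1) / 2


def median(values):
    """Find the median by locating position (n+1)/2 in the sorted data."""
    ordered = sorted(values)
    position = median_position(len(ordered))
    if position.is_integer():
        # Odd n: the position points at a single middle value.
        return ordered[int(position) - 1]
    # Even n: the position falls halfway between two middle values.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


odd_median = median([7, 1, 5])       # position 2 of [1, 5, 7]
even_median = median([4, 1, 3, 2])   # position 2.5 of [1, 2, 3, 4]
```

For an even n the position is not a whole number, so the two neighboring values are averaged.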
One-tailed test
(sometimes called directional test) is a hypothesis test for which only one tail of the sampling distribution is used.
For a set of data, the mean is 40 and the standard deviation is 10. Suppose one of the observations has a value of 65. What is the standardized z-score for this observation?
+2.5
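The arithmetic behind this answer can be checked with a short sketch. The function name is an illustrative choice; the numbers are the ones given in the question.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev


# Values from the question: mean 40, standard deviation 10, observation 65.
z = z_score(65, 40, 10)
print(z)  # 2.5
```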
If a 95% confidence interval for the mean value of a store's customer accounts is computed as $850 ± $70, then the null hypothesis of a two-tailed hypothesis test (at α = 0.05) would be rejected if the value of µ0 is less than ___ or greater than ___.
-$780 and $920
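A rough sketch of the link between the confidence interval and the two-tailed test: the interval endpoints are the point estimate minus and plus the margin of error, and H0 is rejected when µ0 lies outside them. The function names and the trial values of µ0 are hypothetical.

```python
def ci_bounds(point_estimate, margin):
    """Lower and upper endpoints of a confidence interval."""
    return point_estimate - margin, point_estimate + margin


def reject_two_tailed(mu0, lower, upper):
    """Reject H0: mu = mu0 when mu0 falls outside the interval."""
    return mu0 < lower or mu0 > upper


lower, upper = ci_bounds(850, 70)                      # from the question
outside = reject_two_tailed(950, lower, upper)         # hypothetical mu0
inside = reject_two_tailed(800, lower, upper)          # hypothetical mu0
```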
Which of the following statements is NOT correct concerning the p-value and critical value approaches to hypothesis testing?
-Both approaches use the same decision rule
Place in order, from beginning to end, the steps to calculate the mean absolute deviation.
-Calculate the arithmetic mean for the data set. -Find the absolute difference between each value and the mean. -Sum the absolute differences. -Divide by the sample (or the population) size.
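The four steps above can be sketched directly in code. The data set is made up for illustration.

```python
def mean_absolute_deviation(values):
    """Mean absolute deviation, following the four steps in order."""
    # Step 1: arithmetic mean of the data set.
    mean = sum(values) / len(values)
    # Step 2: absolute difference between each value and the mean.
    abs_diffs = [abs(v - mean) for v in values]
    # Step 3: sum the absolute differences.
    total = sum(abs_diffs)
    # Step 4: divide by the number of observations.
    return total / len(values)


data = [2, 4, 6, 8, 10]  # hypothetical data; mean is 6
mad = mean_absolute_deviation(data)
```

Here the absolute differences are 4, 2, 0, 2, 4, which sum to 12, so the result is 12/5 = 2.4.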
Which of the following is NOT a step we use when formulating the null and alternative hypotheses?
-Calculate the value of the sample statistic
A Type II error occurs when we
-Do not reject the null hypothesis when it is actually false
The null hypothesis for a two-sided test for a population mean would be denoted as
-H0: µ = µ0
We use hypothesis testing to
-Resolve conflicts between two competing opinions
Research suggests that depression significantly increases the risk of developing dementia later in life (BBC News, July 6, 2010). In a study involving 949 elderly persons, it was reported that 22% of those who had depression went on to develop dementia, compared to only 17% of those who did not have depression.
-The sample consists of 949 elderly people. -The population is all elderly people. -The numbers 22% and 17% represent sample statistics.
What about variance is MOST accurate?
-Variance is the average of the squared deviations from the mean.
Is it possible for the data set to have no MODE?
-Yes, if there are no observations that occur more than once.
For a given sample size n, α can only be reduced
-at the expense of increasing β
An important final conclusion to a statistical test is to
-clearly interpret the results in terms of the initial claim
In hypothesis testing, two correct decisions are possible
-do not reject the null hypothesis when it is true -reject the null hypothesis when it is false
Specify the competing hypotheses that would be used in order to determine whether the population mean is less than 250.
-H0: µ is greater than or equal to 250 and HA: µ is less than 250
We can generally reduce both Type I and Type II errors simultaneously by
-increasing the sample size
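To see why sample size matters, here is a rough sketch that computes β (the Type II error probability) for a right-tailed z-test with known σ. All the numbers (µ0 = 50, a true mean of 52, σ = 10) are assumptions chosen for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, n, mu0, mu1, sigma):
    """Beta for a right-tailed z-test of H0: mu <= mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)      # boundary of the rejection region
    shift = (mu1 - mu0) / (sigma / sqrt(n))     # true mean in standard-error units
    # Beta is the chance the test statistic stays below the critical value.
    return std_normal.cdf(z_crit - shift)


# Hypothetical setting: mu0 = 50, true mean 52, sigma = 10.
beta_alpha05_n25 = type_ii_error(0.05, 25, 50, 52, 10)
beta_alpha01_n25 = type_ii_error(0.01, 25, 50, 52, 10)    # smaller alpha, same n
beta_alpha01_n100 = type_ii_error(0.01, 100, 50, 52, 10)  # smaller alpha, larger n
```

With n fixed, lowering α from 0.05 to 0.01 raises β. Raising n from 25 to 100 lets α drop to 0.01 while β also ends up lower than in the first case.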
The basic principle of hypothesis testing is to first assume that the ____ hypothesis is true and then determine if the sample data ___ this assumption.
-null -contradict
The two equivalent methods to solve a hypothesis test are the
-p-value approach -critical value approach
In hypothesis testing, if the sample data provide significant evidence that the null hypothesis is incorrect, then we
-reject the null hypothesis
The critical value approach specifies a region of values, called the ___. If the test statistic falls into this region, we reject the ____.
-rejection region -null hypothesis
Which of the following hypothesis tests may be performed?
-right-tailed, left-tailed, and two-tailed
Put the following steps in the p-value approach to hypothesis testing in the correct order.
-specify the null and alternative hypotheses -calculate the value of the test statistic and its p-value -state the conclusion and interpret results
The critical value of a hypothesis test is
-the value that separates the rejection region from the non-rejection region.
Standard deviation of z-scores
1
In order to be effective decision-makers, you must be able to assess the credibility of statistical results. In short, you should ask the following questions:
1. Who carried out the study and provided the results? 2. Do they have a vested interest in the outcome of the study?
The following image was taken from poll results of data collected in Louisiana by Ed Renwick (New Orleans), December 2002, and reported in a January 2003 issue of The Advocate. Which of the following statements is true?
67% is an example of a statistic
Discrete
A discrete distribution is one in which the data can only take on certain values, for example integers. A continuous distribution is one in which data can take on any value within a specified range (which may be infinite).
The coefficient of variation is BEST described as
A relative measure of dispersion
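As a small illustration of why this measure is called relative, the sketch below assumes the usual sample definition CV = s / x̄. The two data sets are hypothetical; the second is the first multiplied by 10.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return stdev(values) / mean(values)


small_scale = [10, 20, 30]      # hypothetical data
large_scale = [100, 200, 300]   # same data, 10 times larger

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)
```

The standard deviations differ by a factor of 10, but both coefficients of variation come out to 0.5, which is what makes the measure useful for comparing data sets on different scales.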
Left-skewed
Always: mean < mode. Always: median < mode. Most of the time: mean < median < mode. Very few observations on the left end, very many observations on the right end.
Right-skewed
Always: mode < mean. Always: mode < median. Most of the time: mode < median < mean. Very few observations on the right end, very many observations on the left end.
The TWO types of variables are:
Categorical variables Quantitative variables
Which of the following variables are qualitative and which are quantitative? If the variable is quantitative, then specify whether the variable is discrete or continuous.
Colors of cars in a mall parking lot. -Qualitative Time it takes each student to complete a final exam. -Quantitative; continuous The number of patrons who frequent a restaurant. -Quantitative; discrete
Types of Data
Cross-sectional data; time series data
TWO branches of Statistics:
Descriptive & Inferential
Two types of quantitative variables:
Discrete Continuous
Categorical variables
Examples of categorical variables are race, sex, age group, and educational level.
Which summaries are used for qualitative data
For qualitative data, we can use a frequency distribution, relative frequency, and percent frequency. Frequency distribution: tabular summary of data indicating the number of data values in each of several nonoverlapping classes.
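A minimal sketch of these three summaries, using made-up car colors as the qualitative data. The function name and the tuple layout are illustrative choices.

```python
from collections import Counter


def frequency_table(observations):
    """Map each category to (frequency, relative frequency, percent frequency)."""
    counts = Counter(observations)
    total = len(observations)
    return {
        category: (count, count / total, 100 * count / total)
        for category, count in counts.items()
    }


# Hypothetical qualitative data.
colors = ["red", "blue", "red", "white", "blue", "red", "white", "red"]
table = frequency_table(colors)
```

Each observation is counted in exactly one category, so the classes do not overlap and the relative frequencies sum to 1.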
The null hypothesis
H0
The alternative hypothesis
H1
In the past, the average grade on the final examination in statistics is at least 85. A student taking the final thought that the final was hard and plans on taking a sample to test her belief that the average score has decreased. The correct set of hypotheses is
H0: µ is greater than or equal to 85 vs. Ha: µ is less than 85
How small must the p-value be to justify a decision to reject the null and minimize the chance of making a type I error?
In general, the Rejection Rule using the p-value Approach is as follows: If p-value < α, then reject the null hypothesis If p-value ≥ α, then do not reject the null hypothesis
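The rejection rule can be written as a one-line check. The default α of 0.05 and the example p-values are assumptions for illustration.

```python
def decide(p_value, alpha=0.05):
    """Apply the p-value rejection rule for a chosen significance level."""
    if p_value < alpha:
        return "reject H0"
    return "do not reject H0"


small_p = decide(0.03)      # hypothetical p-value below alpha
boundary_p = decide(0.05)   # p-value equal to alpha is not rejected
```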
Definition of population
In statistics, a population is the entire pool from which a statistical sample is drawn. A population may refer to an entire group of people, objects, events, hospital visits, or measurements.
Range
Largest Value - Smallest Value; influenced by outliers.
The average of the absolute differences between the values of the data set and the mean is the
Mean absolute deviation
Identify a situation of descriptive statistics
Measures of Frequency: * Count, Percent, Frequency. ... Measures of Central Tendency. * Mean, Median, and Mode. ... Measures of Dispersion or Variation. * Range, Variance, Standard Deviation. ... Measures of Position. * Percentile Ranks, Quartile Ranks.
Generally, the ____ is the best measure of central location when outliers are present.
Median
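A quick illustration with hypothetical salaries (in thousands): replacing one value with an extreme outlier moves the mean a lot and leaves the median where it was.

```python
from statistics import mean, median

salaries = [30, 32, 35, 38, 40]        # hypothetical data, no outlier
with_outlier = [30, 32, 35, 38, 400]   # same data with one extreme value

mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(with_outlier), median(with_outlier)
```

The mean jumps from 35 to 107, while the median stays at 35.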
Which of the following variables are qualitative and which are quantitative? If the variable is quantitative, then specify whether the variable is discrete or continuous.
Points scored in a football game. -Quantitative; discrete Racial comparison of a high school classroom. -Qualitative Heights of 15-year-olds. -Quantitative; continuous
N
Population size
The notation σ represents the
Population standard deviation
In a symmetric distribution,
The mean, median, and mode are equal (for a unimodal distribution). By contrast, RT skewed: mean and median are greater than the mode. LT skewed: mean and median are less than the mode.
The Need for Sampling
Reasons: (1) Studying the population is expensive. (2) It is impossible to examine every member of the population.
Statistics Avoidance and Misconception:
Statistics deals with outrageous formulae and tedious calculations that have no use in real life
What is Statistics?
Statistics is the branch of mathematics that uncovers patterns in data and transforms them into useful information for decision making.
Which summaries are used for numeric data
The mean and median are two numerical summaries to describe the center of a distribution for a quantitative variable; the range and interquartile range are two numerical summaries to describe dispersion in a distribution for a quantitative variable.
It came as a big surprise when Apple's touch screen iPhone 4, considered by many to be the best smartphone ever, was found to have a problem (The New York Times, June 24, 2010). Users complained of weak reception, and sometimes even dropped calls, when they cradled the phone in their hands in a particular way. A quick survey at a local store found that 2% of iPhone 4 users experienced this reception problem.
The population is all iPhone 4 users. 2% denotes the sample statistic.
An accounting professor wants to know the average GPA of the students enrolled in her class. She looks up information on Blackboard about the students enrolled in her class and computes the average GPA as 3.29.
The population is all students enrolled in the accounting class. The value 3.29 represents the population parameter.
Calculate and describe the characteristics of the measures of center.
The two most widely used measures of the "center" of the data are the mean (average) and the median. To calculate the mean weight of 50 people, add the 50 weights together and divide by 50. To find the median weight of the 50 people, order the data and find the number that splits the data into two equal parts.
The manager of a company notices that potential buyers on their website drop off during checkout. After consulting with the web master, the site is redesigned so that the 'Checkout' process is reduced to one, manageable web page. Q: Has the change been effective and how? (Note: this process uses data to monitor the web traffic, then uses data to assess any changes)
The data given show that the change was ineffective because of the major drop-off in numbers. Before making a change, it is important to determine whether the change is necessary and how it should be made.
True or false: The mean is the most widely used measure of central location for quantitative data.
True
TWO branches of statistics
Two branches, descriptive statistics and inferential statistics, comprise the field of statistics
Quantitative variables
Variables that are measured on a numeric or quantitative scale. Interval and ratio scales are quantitative. A country's population, a person's shoe size, or a car's speed are all quantitative variables.
Suppose you select a sample of size 18 from a normal population where σ is unknown. If a one-tailed test (upper tail) is performed and α = .05, then the critical value is
When σ is unknown, use a t-table with df = n-1 = 18-1 = 17, so look in the 17th row. The tail area, .05, indicates the 0.05 column, which gives CV = 1.740. Use the positive value since it is a one-tailed upper-tail test.
identify a situation of inferential statistics
With inferential statistics, you take data from samples and make generalizations about a population. For example, you might stand in a mall and ask a sample of 100 people if they like shopping at Sears.
When the following hypotheses are being tested at a level of significance of α = 0.05, the null hypothesis will be rejected if the test statistic Z
Z is less than -1.645
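The hypotheses for this card are not shown, but the answer implies a left-tailed test. Under that assumption, the critical value can be looked up from the standard normal distribution, as in this sketch. The function name and the `tail` labels are hypothetical.

```python
from statistics import NormalDist


def z_critical(alpha, tail):
    """Critical z value for a left-tailed, right-tailed, or two-tailed test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        # Split alpha between both tails; reject when |z| exceeds this value.
        return std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'left', 'right', or 'two'")


left_cv = z_critical(0.05, "left")
right_cv = z_critical(0.05, "right")
two_cv = z_critical(0.05, "two")
```

Rounded to three decimals these are -1.645, 1.645, and 1.96, matching the usual z-table values for α = 0.05.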
Type 1 error:
α: we reject the null hypothesis when in reality the null hypothesis is true
Symmetrical distribution
a distribution in which one half of the data are a mirror image of the other half
Skewed distribution
a distribution which is asymmetric
Pie Chart
a graphical display of data where slices of the pie, in degrees, are associated with the frequency or proportion of observations in that category.
z-Scores
a measure of relative location that describes how far an individual observation is from the mean, in standard deviations: z = (x - x̄)/s
Sample
a subset of the population selected for analysis
Parameter
a summary measure that describes a characteristic of an entire population. (µ or π) (number that can represent an entire group/number that describes a population)
Statistic
a summary measure that describes a characteristic of a sample. (number that describes a sample)
Frequency Table
a tabular summary of data showing the frequency (or percent) of items in each of the distinct categories. Lists each unique categorical outcome, then shows the frequencies within each category.
Mean
average; value around which observations tend to cluster; balance point of histogram.
Cross-Sectional
data collected at one point in time
Time Series Data
data collected over several periods of time
True or false: In the critical value approach, if the value of the test statistic does NOT fall within the rejection region, then we reject the null hypothesis.
false
Bar Graph
graphical display of data where each category is depicted by a bar representing the frequency or proportion of observations in that category. (Note: bars do not touch)
We do NOT reject the null hypothesis when the p-value is
greater than α
Types of Variables
in order to address statistical questions, one must FIRST be able to identify types of variables
Viewership studies using a sample of television households indicate that the share of the audience for the Channel 9 News has increased by 12% since the new anchor was added to the 10:00 P.M. news team. This is an example of
inferential statistics
What is the alternative hypothesis?
is the opposite of the null hypothesis and corresponds to what the researcher wants to prove.
What is the null hypothesis?
is the initial statement about the population and ordinarily represents a commonly accepted state of affairs, a general position, or the status quo. The null hypothesis is tentatively believed to be true unless overwhelmingly refuted by data.
range
largest-smallest
If the value of the test statistic falls in the rejection region, then the p-value must be
less than α
Variance
measure of variability that utilizes all data values. A measure that reflects how observations vary or deviate from the mean.
Continuous
measurements can take on infinitely many values within an interval
The measure of the central location that can BEST be labeled as the midpoint of the data set is the ___
median
When summarizing a qualitative data set, the ___ is the best measurement of central location.
mode
If we reject the null hypothesis when it is actually false we have committed
no error
Interval scale
not only can we categorize and rank the data, we are also assured that the differences between scale values are meaningful. Thus, the arithmetic operations of addition and subtraction are meaningful.
When testing µ, the p-value is the probability of obtaining a sample mean at least as extreme as the one derived from a given sample, assuming that the ___ hypothesis is true.
null
You are reviewing your portfolio and note the amount of interest earned for each of the stocks in which you have invested. The type of variable most appropriate to their measurements is
numeric, continuous
population standard deviation
σ = √(σ²)
Population variance
σ² = Σ(Xi - µ)² / N
The arithmetic mean is usually NOT a good measure of central location if a ___ exists.
outlier
Both population and sample variances and standard deviations are influenced by
outliers
The notation µ represents the
population mean
p-value testing
probability of getting your sample mean or some value further in the tail when in reality the null hypothesis is true.
Statistics
provides a means of using numbers to analyze the world we live in
Suppose after reviewing the data, it was determined that the age 89 was keyed incorrectly and that the age should have been 98. Once the data are corrected, which of the following values would change
range
Ordinal scale
reflects a stronger level of measurement. With ordinal data we are able to both categorize and rank the data with respect to some characteristic or trait. The weakness with ordinal data is that we cannot interpret the difference between the ranked values because the actual numbers used are arbitrary.
The probability of making a Type I error is the probability of
rejecting a true null hypothesis
Nominal scale
represents the least sophisticated level of measurement. If we are presented with nominal data, all we can do is categorize or group the data. The values in the data set differ merely by name or label.
Ratio scale
represents the strongest level of measurement. Ratio data have all the characteristics of interval data as well as a true zero point, which allows us to interpret the ratios of values. A ratio scale is used to measure many types of data in business analysis.
Discrete
result of counting
When testing µ and σ is known, H0 can never be rejected if z is less than or equal to 0 for a
right-tailed test
If the population standard deviation is unknown, it can be estimated by using ____.
s
Sample standard deviation
s = √(s²)
Sample variance
s² = Σ(Xi - x̄)² / (n - 1)
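The sample and population variance formulas differ only in the divisor (n - 1 versus N). A short sketch on hypothetical data, checked against Python's `statistics` module:

```python
from statistics import pvariance, variance


def sum_squared_deviations(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; mean is 5

ss = sum_squared_deviations(data)           # 32.0
population_variance = ss / len(data)        # divide by N
sample_variance = ss / (len(data) - 1)      # divide by n - 1

# The library functions use the same two divisors.
library_population = pvariance(data)
library_sample = variance(data)
```

Dividing by n - 1 makes the sample variance slightly larger than the population formula applied to the same values (32/7 versus 32/8 here).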
n
sample size
Standard deviation
square root of variance
A summary measure that is computed to describe a characteristic of a sample taken from a population is called
statistic (sample)
The Department of Transportation estimates that there is an average of 20 accidents per day. This is an example of
statistical inference
The process of using sample statistics to draw conclusions about the population parameters is called
statistical inference
Variable
the characteristic of an observation or individual
When constructing a frequency distribution, which of the following statements is true?
the classes do not overlap
Population
the entire set of observations for which conclusions are to be made. (Ex: all registered voters.)
Shape
the manner in which data are distributed
Median
the middle value of an ordered array
The hypothesis tentatively assumed to be true is
the null hypothesis
Mode
the observation value that occurs most often.
Consider a one-tailed, upper-tail test. The p-value is the tail area above the sample mean. So as the sample mean gets further from the hypothesized mean, the area above the sample mean (which is the p-value) gets smaller.
The smaller the p-value, the stronger the sample evidence for rejecting the null hypothesis.
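This relationship can be seen numerically. The sketch below assumes an upper-tail z-test with known σ; the hypothesized mean of 50, σ = 10, n = 100, and the three sample means are all made-up values.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p_value(sample_mean, mu0, sigma, n):
    """Tail area above the sample mean, assuming H0 (mean = mu0) is true."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)


# Hypothetical test: mu0 = 50, sigma = 10, n = 100, so the standard error is 1.
p_values = [upper_tail_p_value(m, 50, 10, 100) for m in (51, 52, 53)]
```

The sample means sit 1, 2, and 3 standard errors above µ0, and the p-values shrink accordingly (roughly 0.159, 0.023, and 0.001).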
Data
the values associated with a variable
Two widely used measures of dispersion are
the variance and the standard deviation
Descriptive Statistics
those methods involved in the collection, summarizing (numerically or graphically), presenting, and analyzing a set of data in order to describe the various features of that data set. (creating pie charts)
Inferential Statistics
those methods that use data from a smaller group (sample) to make conclusions/decisions about the characteristic of a larger group (population).
True or False: The optimal values of Type I and Type II errors require a compromise in balancing the costs of each type of error.
true
True or false: We choose a value for α before conducting a hypothesis test.
true
Population mean
µ = (Σ Xi) / N
As data become more concentrated,
variance (and standard deviation) decreases.
As data spread out,
variance (and standard deviation) increases
Data where all values are the same have no variation;
variance = 0 and standard deviation = 0
When a mean is calculated and some observations are given greater importance than others, we refer to this measure of central location as a
weighted mean
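A minimal sketch of a weighted mean, using hypothetical exam scores and weights:

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    weighted_total = sum(v * w for v, w in zip(values, weights))
    return weighted_total / sum(weights)


scores = [80, 90, 70]   # hypothetical scores
weights = [2, 3, 5]     # hypothetical importance of each score

result = weighted_mean(scores, weights)   # (160 + 270 + 350) / 10
plain_mean = sum(scores) / len(scores)    # every score weighted equally
```

The weighted mean is 78 while the ordinary mean is 80, because the lowest score carries the largest weight.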
Sample mean
x̄ = (Σ Xi) / n
Mean of z-scores
zero
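Both z-score facts (mean of zero, standard deviation of one) can be checked on any data set. The values below are hypothetical, and the sketch standardizes with the sample mean and sample standard deviation.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert each value to a z-score using the sample mean and sample s."""
    x_bar = mean(values)
    s = stdev(values)
    return [(x - x_bar) / s for x in values]


data = [12, 15, 9, 20, 14]  # hypothetical data
z_scores = standardize(data)

z_mean = mean(z_scores)   # approximately 0
z_sd = stdev(z_scores)    # approximately 1
```

Subtracting the mean centers the data at zero, and dividing by s rescales the spread to one, up to floating-point rounding.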
Both variances and standard deviations are either
zero or positive (never negative)