ISDS (Statistics) Exam 1


A small p-value indicates that the sample mean is relatively

'far' from the hypothesized mean and provides evidence for rejecting the null hypothesis.

Quantitative Variables

(also known as Numeric) - yields values that represent quantities (weight, salary, etc.)

Categorical Variables

(also known as Qualitative) - have values that can be placed into categories (Yes/ No)

Position of the median in an ordered data set

(n+1)/2

One-tailed test

(sometimes called directional test) is a hypothesis test for which only one tail of the sampling distribution is used.

For a set of data, the mean is 40 and the standard deviation is 10. Suppose one of the observations has a value of 65. What is the standardized z-score for this observation?

+2.5

If a 95% confidence interval for the mean value of a store's customer accounts is computed as $850 ± 70, then the null hypothesis of a two-tailed hypothesis test would be rejected if the value of μ0 is less than ___ or greater than ___.

$780 and $920
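
The equivalence between the interval and the two-tailed test can be sketched in Python. The helper names `ci_bounds` and `reject_two_tailed` are hypothetical. The sketch assumes α = 0.05 so that the test pairs with the 95% interval $850 ± 70 from the card.

```python
# Sketch: a 95% confidence interval and a two-tailed test at alpha = 0.05
# reach the same decision. Helper names are hypothetical.

def ci_bounds(center: float, margin: float) -> tuple[float, float]:
    """Return the (lower, upper) limits of the interval center +/- margin."""
    return center - margin, center + margin

def reject_two_tailed(mu0: float, center: float, margin: float) -> bool:
    """Reject H0: mu = mu0 exactly when mu0 falls outside the interval."""
    lower, upper = ci_bounds(center, margin)
    return mu0 < lower or mu0 > upper

lower, upper = ci_bounds(850, 70)
print(lower, upper)                       # 780 920
print(reject_two_tailed(900, 850, 70))    # False: 900 lies inside the interval
print(reject_two_tailed(950, 850, 70))    # True: 950 is above 920
```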

Which of the following statements is NOT correct concerning the p-value and critical value approaches to hypothesis testing?

-Both approaches use the same decision rule

Place in order, from beginning to end, the steps to calculate the mean absolute deviation.

-Calculate the arithmetic mean for the data set. -Find the absolute difference between each value and the mean. -Sum the absolute differences. -Divide by the sample (or the population) size.
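
Here is a minimal Python sketch of those four steps. The function name and the data values are hypothetical.

```python
def mean_absolute_deviation(values: list[float]) -> float:
    # Step 1: calculate the arithmetic mean
    mean = sum(values) / len(values)
    # Step 2: find the absolute difference between each value and the mean
    abs_diffs = [abs(x - mean) for x in values]
    # Step 3: sum the absolute differences
    total = sum(abs_diffs)
    # Step 4: divide by the sample (or population) size
    return total / len(values)

data = [2, 4, 6, 8, 10]                 # hypothetical data, mean = 6
print(mean_absolute_deviation(data))    # 2.4
```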

Which of the following is NOT a step we use when formulating the null and alternative hypotheses?

-Calculate the value of the sample statistic

A Type II error occurs when we

-Do not reject the null hypothesis when it is actually false

The null hypothesis for a two-sided test for a population mean would be denoted as

-H0: μ = μ0

We use hypothesis testing to

-Resolve conflicts between two competing opinions

Research suggests that depression significantly increases the risk of developing dementia later in life (BBC News, July 6, 2010). In a study involving 949 elderly persons, it was reported that 22% of those who had depression went on to develop dementia, compared to only 17% of those who did not have depression.

-The sample consists of 949 elderly people. -The population is all elderly people. -The numbers 22% and 17% represent sample statistics.

Which statement about variance is MOST accurate?

-Variance is the average of the squared deviations from the mean.

Is it possible for a data set to have no MODE?

-Yes, if there are no observations that occur more than once.

For a given sample size n, α can only be reduced

-at the expense of increasing β

An important final conclusion to a statistical test is to

-clearly interpret the results in terms of the initial claim

In hypothesis testing, two correct decisions are possible

-do not reject the null hypothesis when it is true -reject the null hypothesis when it is false

Specify the competing hypotheses that would be used in order to determine whether the population mean is less than 250.

-H0: μ ≥ 250 and HA: μ < 250

We can generally reduce both Type I and Type II errors simultaneously by

-increasing the sample size

The basic principle of hypothesis testing is to first assume that the ____ hypothesis is true and then determine whether the sample data ___ this assumption.

-null -contradict

The two equivalent methods to solve a hypothesis test are the

-p-value approach -critical value approach

In hypothesis testing, if the sample data provide significant evidence that the null hypothesis is incorrect, then we

-reject the null hypothesis

The critical value approach specifies a region of values, called the ___. If the test statistic falls into this region, we reject the ____.

-rejection region -null hypothesis

Which of the following hypothesis tests may be performed?

-right-tailed, left-tailed, and two-tailed

Put the following steps in the p-value approach to hypothesis testing in the correct order.

-specify the null and alternative hypotheses -calculate the value of the test statistic and its p-value -state the conclusion and interpret results

The critical value of a hypothesis test is

-the value that separates the rejection region from the non-rejection region.

Standard deviation of z-scores

1

In order to be effective decision-makers, you must be able to assess the credibility of statistical results. In short, you should ask the following questions:

1. Who carried out the study and provided the results? 2. Do they have a vested interest in the outcome of the study?

The following image was taken from poll results of data collected in Louisiana by Ed Renwick (New Orleans), December 2002, and reported in a January 2003 issue of The Advocate. Which of the following statements is true?

67% is an example of a statistic

Discrete

A discrete distribution is one in which the data can only take on certain values, for example integers. A continuous distribution is one in which data can take on any value within a specified range (which may be infinite).

The coefficient of variation is BEST described as

A relative measure of dispersion
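
As an illustration, here is a sketch that assumes the common definition CV = s / x̄: the sample standard deviation over the sample mean, often reported as a percentage. The data are made up. Because the units cancel, the CV can compare dispersion across data sets measured on very different scales.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean (unitless).
    Assumes the mean is not zero."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical data: same spread relative to the mean, very different scales.
small = [10, 12, 14]
large = [1000, 1200, 1400]
print(round(coefficient_of_variation(small), 4))   # 0.1667
print(round(coefficient_of_variation(large), 4))   # 0.1667
```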

Left-skewed

Always: mean < mode. Always: median < mode. Most of the time: mean < median < mode. Very few observations on the left end, very many observations on the right end.

Right-skewed

Always: mode < mean. Always: mode < median. Most of the time: mode < median < mean. Very few observations on the right end, very many observations on the left end.

The TWO types of variables are:

Categorical variables and quantitative variables

Which of the following variables are qualitative and which are quantitative? If the variable is quantitative, then specify whether the variable is discrete or continuous.

Colors of cars in a mall parking lot. -Qualitative Time it takes each student to complete a final exam. -Quantitative; continuous The number of patrons who frequent a restaurant. -Quantitative; discrete

Types of Data

Cross-sectional data and time series data

TWO branches of Statistics:

Descriptive & Inferential

Two types of quantitative variables:

Discrete and continuous

Categorical variables

Examples of categorical variables are race, sex, age group, and educational level.

Which summaries are used for qualitative data?

For qualitative data, we can use a frequency distribution, a relative frequency distribution, and a percent frequency distribution. Frequency distribution: a tabular summary of data indicating the number of data values in each of several nonoverlapping classes.
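
A frequency distribution with relative and percent frequencies can be sketched with Python's `collections.Counter`. The car-color data and the `frequency_table` helper are hypothetical. Each observation falls into exactly one category, so the classes do not overlap and the relative frequencies sum to 1.

```python
from collections import Counter

def frequency_table(observations):
    """Return (category, frequency, relative frequency, percent frequency) rows."""
    counts = Counter(observations)
    n = len(observations)
    rows = []
    for category, freq in counts.most_common():   # most frequent category first
        rel = freq / n
        rows.append((category, freq, rel, 100 * rel))
    return rows

# Hypothetical data: colors of cars in a parking lot (a qualitative variable).
colors = ["red", "blue", "red", "white", "blue", "red", "black", "white"]
for category, freq, rel, pct in frequency_table(colors):
    print(f"{category:<6} {freq:>2} {rel:.3f} {pct:5.1f}%")
```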

The null hypothesis

H0

The alternative hypothesis

H1 (also written HA)

In the past, the average grade on the final examination in statistics was at least 85. A student taking the final thought that it was hard and plans to take a sample to test her belief that the average score has decreased. The correct set of hypotheses is

H0: μ ≥ 85 vs. HA: μ < 85

How small must the p-value be to justify a decision to reject the null and minimize the chance of making a type I error?

In general, the Rejection Rule using the p-value Approach is as follows: If p-value < α, then reject the null hypothesis If p-value ≥ α, then do not reject the null hypothesis
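
The rejection rule can be sketched as below. The sketch assumes a right-tailed z test whose test statistic follows the standard normal distribution. The z value of 2.5 is borrowed from the z-score card above purely for illustration, and `decide` is a hypothetical helper.

```python
from statistics import NormalDist

def decide(p_value: float, alpha: float) -> str:
    """p-value approach: reject H0 when p-value < alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

# Hypothetical right-tailed z test with test statistic z = 2.5.
z = 2.5
p_value = 1 - NormalDist().cdf(z)    # upper-tail area beyond z
print(round(p_value, 4))             # 0.0062
print(decide(p_value, alpha=0.05))   # reject H0
```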

Definition of population

In statistics, a population is the entire pool from which a statistical sample is drawn. A population may refer to an entire group of people, objects, events, hospital visits, or measurements.

Range

Largest Value - Smallest Value; influenced by outliers.

The average of the absolute differences between the values of the data set and the mean is the

Mean absolute deviation

Identify a situation of descriptive statistics

Measures of frequency: count, percent, frequency. Measures of central tendency: mean, median, and mode. Measures of dispersion or variation: range, variance, standard deviation. Measures of position: percentile ranks, quartile ranks.

Generally, the ____ is the best measure of central location when outliers are present.

Median

Which of the following variables are qualitative and which are quantitative? If the variable is quantitative, then specify whether the variable is discrete or continuous.

Points scored in a football game. -Quantitative; discrete Racial composition of a high school classroom. -Qualitative Heights of 15-year-olds. -Quantitative; continuous

N

Population size

The notation σ represents the

Population standard deviation

In a symmetric distribution,

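the mean and the median are equal (and equal the mode if the distribution has a single peak). By contrast, in a right-skewed distribution the mean and median are greater than the mode; in a left-skewed distribution they are less than the mode.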

The Need for Sampling

Reasons: (1) Studying the population is expensive. (2) It is impossible to examine every member of the population.

Statistics Avoidance and Misconception:

Statistics deals with outrageous formulae and tedious calculations that have no use in real life

What is Statistics?

Statistics is the branch of mathematics that uncovers patterns in data and transforms them into useful information for decision making.

Which summaries are used for numeric data?

The mean and median are two numerical summaries to describe the center of a distribution for a quantitative variable; the range and interquartile range are two numerical summaries to describe dispersion in a distribution for a quantitative variable.

It came as a big surprise when Apple's touch screen iPhone 4, considered by many to be the best smartphone ever, was found to have a problem (The New York Times, June 24, 2010). Users complained of weak reception, and sometimes even dropped calls, when they cradled the phone in their hands in a particular way. A quick survey at a local store found that 2% of iPhone 4 users experienced this reception problem.

The population is all iPhone 4 users. 2% denotes the sample statistic.

An accounting professor wants to know the average GPA of the students enrolled in her class. She looks up information on Blackboard about the students enrolled in her class and computes the average GPA as 3.29.

The population is all students enrolled in the accounting class. The value 3.29 represents the population parameter.

Calculate and describe the characteristics of the measures of center.

The two most widely used measures of the "center" of the data are the mean (average) and the median. To calculate the mean weight of 50 people, add the 50 weights together and divide by 50. To find the median weight of the 50 people, order the data and find the number that splits the data into two equal parts.
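
The two calculations can be sketched in Python. The median is located at position (n + 1)/2 of the ordered data. The weights and function names are hypothetical; in practice `statistics.mean` and `statistics.median` do the same job.

```python
def mean(values):
    """Add the values together and divide by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered data, located at position (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position in the ordered data
    if position.is_integer():           # odd n: a single middle value
        return ordered[int(position) - 1]
    # even n: average the two values on either side of the position
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

weights = [150, 162, 171, 148, 190, 155]   # hypothetical weights in pounds
print(round(mean(weights), 2))             # 162.67
print(median(weights))                     # 158.5
```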

The manager of a company notices that potential buyers on its website drop off during checkout. After consulting with the webmaster, the manager has the site redesigned so that the 'Checkout' process is reduced to one manageable web page. Q: Has the change been effective, and how? (Note: this process uses data to monitor the web traffic, then uses data to assess any changes.)

The given data show that the change was ineffective, because of the major drop-off in numbers. It is important to determine whether a change is necessary before making it, and also how to make it.

True or false: The mean is the most widely used measure of central location for quantitative data.

True

TWO branches of statistics

Two branches, descriptive statistics and inferential statistics, comprise the field of statistics

Quantitative variables

Variables that are measured on a numeric or quantitative scale. Interval and ratio scales are quantitative. A country's population, a person's shoe size, or a car's speed are all quantitative variables.

Suppose you select a sample of size 18 from a normal population where σ is unknown. If a one-tailed (upper-tail) test is performed at α = 0.05, then the critical value is

When σ is unknown, use the t table with df = n − 1 = 18 − 1 = 17, so look in the row for df = 17. The tail area, 0.05, points to the 0.05 column, giving a critical value of 1.740. Use the positive value because this is a one-tailed, upper-tail test.
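
Instead of a printed t table, the critical value can be approximated numerically. This standard-library-only sketch integrates the t density with Simpson's rule and bisects for the upper-tail point. All function names are hypothetical. In practice a library routine such as SciPy's `scipy.stats.t.ppf(0.95, 17)` would normally be used instead.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow of math.gamma for large df
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0: 0.5 (left half, by symmetry) plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical_upper(alpha: float, df: int) -> float:
    """Critical value t* whose upper-tail area is alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid    # tail area beyond mid is still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

n = 18
print(round(t_critical_upper(0.05, n - 1), 3))   # 1.74, i.e. the table value 1.740
```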

Identify a situation of inferential statistics

With inferential statistics, you take data from samples and make generalizations about a population. For example, you might stand in a mall and ask a sample of 100 people if they like shopping at Sears.

When the following hypotheses are being tested at a level of significance of α = 0.05, the null hypothesis will be rejected if the test statistic Z

Z is less than -1.645
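
The card's hypotheses are not shown, but the answer −1.645 implies a left-tailed test. Assuming that, the standard normal critical values for α = 0.05 can be checked with Python's `statistics.NormalDist`. The right-tailed and two-tailed values are included for comparison.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()   # standard normal: mean 0, standard deviation 1

left_cv = z.inv_cdf(alpha)          # left-tailed: reject H0 if Z < left_cv
right_cv = z.inv_cdf(1 - alpha)     # right-tailed: reject H0 if Z > right_cv
two_cv = z.inv_cdf(1 - alpha / 2)   # two-tailed: reject H0 if |Z| > two_cv

print(round(left_cv, 3), round(right_cv, 3), round(two_cv, 3))
# -1.645 1.645 1.96
```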

Type 1 error:

If we reject the null hypothesis when in reality the null hypothesis is true

Symmetrical distribution

a distribution in which one half of the data are a mirror image of the other half

Skewed distribution

a distribution which is asymmetric

Pie Chart

a graphical display of data where slices of the pie, in degrees, are associated with the frequency or proportion of observations in that category.

z-Scores

a measure of relative location that describes how far, in standard deviations, an individual observation is from the mean: z = (x − x̄)/s
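
Here is a short sketch of standardizing data with the sample mean and sample standard deviation, as in the formula above. The data set and the `z_scores` helper are hypothetical. The sketch also reproduces the worked answer of +2.5 from the earlier card (mean 40, standard deviation 10, observation 65). It shows that z-scores have mean 0 and standard deviation 1.

```python
import statistics

def z_scores(values):
    """Standardize each observation: z = (x - x_bar) / s."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)   # sample standard deviation
    return [(x - x_bar) / s for x in values]

# Card example: mean 40, standard deviation 10, observation 65.
print((65 - 40) / 10)   # 2.5

# Hypothetical data: the resulting z-scores have mean 0 and standard deviation 1.
data = [12, 15, 18, 21, 24]
zs = z_scores(data)
print(round(statistics.mean(zs), 10), round(statistics.stdev(zs), 10))
```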

Sample

a subset of the population selected for analysis

Parameter

a summary measure that describes a characteristic of an entire population. (µ or π) (number that can represent an entire group/number that describes a population)

Statistic

a summary measure that describes a characteristic of a sample. (number that describes a sample)

Frequency Table

a tabular summary of data showing the frequency (or percent) of items in each of the distinct categories. It lists each unique categorical outcome, then shows the frequencies within each category.

Mean

average; value around which observations tend to cluster; balance point of histogram.

Cross-Sectional

data collected at one point in time

Time Series Data

data collected over several periods of time

True or false: In the critical value approach, if the value of the test statistic does NOT fall within the rejection region, then we reject the null hypothesis.

false

Bar Graph

graphical display of data where each category is depicted by a bar representing the frequency or proportion of observations in that category. (Note: bars do not touch)

We do NOT reject the null hypothesis when the p-value is

greater than or equal to α

Types of Variables

in order to address statistical questions, one must FIRST be able to identify types of variables

Viewership studies using a sample of television households indicate that the share of the audience for the Channel 9 News has increased by 12% since the new anchor was added to the 10:00 P.M. news team. This is an example of

inferential statistics

What is the alternative hypothesis?

is the opposite of the null hypothesis and corresponds to what the researcher wants to prove.

What is the null hypothesis?

is the initial statement about the population and ordinarily represents a commonly accepted state of affairs, a general position, or the status quo. The null hypothesis is tentatively believed to be true unless overwhelmingly refuted by data.

range

largest-smallest

If the value of the test statistic falls in the rejection region, then the p-value must be

less than α

Variance

measure of variability that utilizes all data values. A measure that reflects how observations vary or deviate from the mean.

Continuous

measurements can take on infinitely many values within an interval

The measure of the central location that can BEST be labeled as the midpoint of the data set is the ___

median

When summarizing a qualitative data set, the ___ is the best measurement of central location.

mode

If we reject the null hypothesis when it is actually false we have committed

no error

Interval scale

not only can we categorize and rank the data, we are also assured that the differences between scale values are meaningful. Thus, the arithmetic operations of addition and subtraction are meaningful.

When testing μ, the p-value is the probability of obtaining a sample mean at least as extreme as the one derived from a given sample, assuming that the ___ hypothesis is true.

null

You are reviewing your portfolio and note the amount of interest earned for each of the stocks in which you have invested. The type of variable most appropriate to their measurements is

numeric, continuous

population standard deviation

σ = square root of σ²

Population variance

σ² = Σ(xᵢ − μ)² / N

The arithmetic mean is usually NOT a good measure of central location if a ___ exists.

outlier

Both population and sample variances and standard deviations are influenced by

outliers

The notation μ represents the

population mean

p-value testing

probability of getting your sample mean or some value further in the tail when in reality the null hypothesis is true.

Statistics

provides a means of using numbers to analyze the world we live in

Suppose after reviewing the data, it was determined that the age 89 was keyed incorrectly and that the age should have been 98. Once the data are corrected, which of the following values would change

range

Ordinal scale

reflects a stronger level of measurement. With ordinal data we are able to both categorize and rank the data with respect to some characteristic or trait. The weakness with ordinal data is that we cannot interpret the difference between the ranked values because the actual numbers used are arbitrary.

The probability of making a Type I error is the probability of

rejecting a true null hypothesis

Nominal scale

represents the least sophisticated level of measurement. If we are presented with nominal data, all we can do is categorize or group the data. The values in the data set differ merely by name or label.

Ratio scale

represents the strongest level of measurement. Ratio data have all the characteristics of interval data as well as a true zero point, which allows us to interpret the ratios of values. A ratio scale is used to measure many types of data in business analysis.

Discrete

result of counting

When testing μ and σ is known, H0 can never be rejected if the test statistic z is less than or equal to 0 for a

right-tailed test

If the population standard deviation is unknown, it can be estimated by using ____.

s

Sample standard deviation

s= square root of s^2

Sample variance

s² = Σ(xᵢ − x̄)² / (n − 1)
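
The population and sample formulas differ only in the divisor: N versus n − 1. Here is a sketch with hypothetical data and function names. It also illustrates that data with no variation have a variance of 0.

```python
import math

def population_variance(values):
    """sigma^2 = sum((x_i - mu)^2) / N"""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

def sample_variance(values):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)"""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]                   # hypothetical values, mean = 5
print(population_variance(data))                 # 4.0
print(math.sqrt(population_variance(data)))      # 2.0 (population standard deviation)
print(round(sample_variance(data), 3))           # 4.571 (= 32 / 7)
print(sample_variance([3, 3, 3]))                # 0.0: no variation
```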

n

sample size

Standard deviation

square root of variance

A summary measure that is computed to describe a characteristic of a sample taken from a population is called

statistic (sample)

The Department of Transportation estimates that there is an average of 20 accidents per day. This is an example of

statistical inference

The process of using sample statistics to draw conclusions about the population parameters is called

statistical inference

Variable

the characteristic of an observation or individual

When constructing a frequency distribution, which of the following statements is true?

the classes do not overlap

Population

the entire set of observations for which conclusions are to be made. (Ex: all registered voters.)

Shape

the manner in which data are distributed

Median

the middle value of an ordered array

The hypothesis tentatively assumed to be true is

the null hypothesis

Mode

the value that occurs most often.

Consider a one-tailed, upper-tail test. The p-value is the tail area above the sample mean, so as the sample mean gets further from the hypothesized mean, the area above the sample mean (which is the p-value) gets smaller.

The smaller the p-value, the stronger the sample evidence for rejecting the null hypothesis.

Data

the values associated with a variable

Two widely used measures of dispersion are

the variance and the standard deviation

Descriptive Statistics

those methods involved in the collection, summarizing (numerically or graphically), presenting, and analyzing a set of data in order to describe the various features of that data set. (creating pie charts)

Inferential Statistics

those methods that use data from a smaller group (sample) to make conclusions/decisions about the characteristic of a larger group (population).

True or False: The optimal values of Type I and Type II errors require a compromise in balancing the costs of each type of error.

true

True or false: We choose a value for α before conducting a hypothesis test.

true

Population mean

μ = Σxᵢ / N

As data become more concentrated,

variance (and standard deviation) decreases.

As data spread out,

variance (and standard deviation) increases

Data where all values are the same have no variation;

variance = 0 and standard deviation = 0

When a mean is calculated and some observations are given greater importance than others, we refer to this measure of central location as a

weighted mean
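
A weighted mean divides Σwᵢxᵢ by Σwᵢ. Here is a sketch with hypothetical grades, weights, and function name.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical course grades: exams count more heavily than homework.
scores = [90, 80, 70]       # homework, midterm, final
weights = [0.2, 0.3, 0.5]
print(round(weighted_mean(scores, weights), 2))   # 77.0
```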

Sample mean

x̄ = Σxᵢ / n

Mean of z-scores

zero

Both variances and standard deviations are either

zero or positive (never negative)
