BMAL-590 Quantitative Research

What Is Statistics?

"Statistics is a way to get information from data." Statistics is a tool for creating new understanding from a set of numbers. You need data and information

Histogram

(or bar graph) can show if the data is evenly distributed across the range of values, if it falls symmetrically from a center peak (normal distribution), if there is a peak but more of the data falls on one side of the peak than the other (a skewed distribution), or if there are two or more peaks in the data (bi- or multi-modal).

Critical Concepts in Hypothesis Testing: Concept 4

There are two possible decisions that can be made: Conclude that there is enough evidence to support the alternative hypothesis (also stated as: rejecting the null hypothesis in favor of the alternative). Conclude that there is not enough evidence to support the alternative hypothesis (also stated as: not rejecting the null hypothesis in favor of the alternative). NOTE: We do not say that we accept the null hypothesis. Once the null and alternative hypotheses are stated, the next step is to randomly sample the population and calculate a test statistic (in this example, the sample mean). If the test statistic's value is inconsistent with the null hypothesis, we reject the null hypothesis and infer that the alternative hypothesis is true.

Probability Trees

Potential events are represented in a diagram with a branch for each possible outcome of the events. The probability of each outcome is indicated on the appropriate branch, and these values can be used to calculate the overall impact of risk occurrence in a project. At the ends of the "branches", we calculate joint probabilities as the product of the individual probabilities on the preceding branches. The advantage of a probability tree on this type of problem is that it restrains its users from making the wrong calculation. Once the tree is drawn and the probabilities of the branches inserted, virtually the only allowable calculation is the multiplication of the probabilities of linked branches.
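As a rough sketch of the multiplication along linked branches, the snippet below builds a two-stage tree and computes the joint probability at the end of each path. The events A and B and all probability values are made up for illustration; they do not come from the course material.

```python
# Hypothetical two-stage probability tree (values are illustrative only).
# First branch: A or not A. Second branch: B or not B, given the first branch.
first_stage = {"A": 0.3, "not A": 0.7}
second_stage = {
    "A": {"B": 0.6, "not B": 0.4},
    "not A": {"B": 0.2, "not B": 0.8},
}

def joint_probabilities(first, second):
    """Multiply the probabilities along each linked path of the tree."""
    return {
        (a, b): p_a * p_b
        for a, p_a in first.items()
        for b, p_b in second[a].items()
    }

joint = joint_probabilities(first_stage, second_stage)
for path, p in joint.items():
    print(path, round(p, 2))
```

The four joint probabilities at the ends of the branches add up to 1, which is a quick check that the tree was filled in consistently.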

A population of all college applicants exists who have taken the SAT exam in the United States in the last year. A parameter of the population is

SAT scores

Sampling Distribution of the Mean

Sampling distributions describe the distributions of sample statistics. There are two ways to create a sampling distribution. The first is to actually draw samples of the same size from a population, calculate the statistic of interest, and then use descriptive techniques to learn more about the sampling distribution. The second method relies on the rules of probability and the laws of expected value and variance to derive the sampling distribution.

Which of the following types of samples is almost always biased?

Self-selected samples

What can we infer about a Population's Parameters based on a Sample's Statistics?

Since statistical inference involves using statistics to make inferences about parameters, we can make an estimate, prediction, or decision about a population based on sample data. We can apply what we know about a sample to the larger population from which it was drawn!

Which of the following statements involve descriptive statistics as opposed to inferential statistics

The Alcohol, Tobacco and Firearms Department reported that Houston had 1,791 registered gun dealers in 1997.

Critical Concepts in Hypothesis Testing: Concept 3

The goal of the process is to determine whether there is enough evidence to infer that the alternative hypothesis is true.

There are three approaches to assign probability to outcomes. Each of these approaches must follow the two rules governing probabilities:

The probability of any outcome must lie between 0 and 1. That is: $0 \le P(O_i) \le 1$. The sum of the probabilities of all the outcomes in a sample space must be 1. That is: $\sum_{i=1}^{k} P(O_i) = 1$.
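A minimal sketch of checking both rules on a list of outcome probabilities. The function name and the tolerance are assumptions chosen for this example.

```python
import math

def is_valid_probability_assignment(probs):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    # Allow a small tolerance for floating-point rounding in the sum.
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=1e-9)
    return in_range and sums_to_one

print(is_valid_probability_assignment([1 / 6] * 6))   # fair die
print(is_valid_probability_assignment([0.5, 0.6]))    # sums to 1.1
```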

Interpreting Probability

The probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions. Probability is a long-term relative frequency.

Which of the following statements is true regarding the design of a good survey

The questions should be kept as short as possible

direct observation

The simplest method of obtaining data. There are many drawbacks to direct observation. One of the most critical limitations of this data collection method is that it is difficult to produce useful information in a meaningful way.

Critical Concepts in Hypothesis Testing: Concept 2

The testing procedure begins with the assumption that the null hypothesis is true.

Central Limit Theorem

The theory that, as sample size increases, the distribution of sample means of size n, randomly selected, approaches a normal distribution.
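One way to see the theorem at work is a small simulation. The sketch below draws repeated samples from a skewed (exponential) population with mean 1 and standard deviation 1, then looks at the center and spread of the sample means. The population choice, sample size, number of repetitions, and seed are all assumptions made for this example.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from an exponential population
    (mean 1, standard deviation 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

means = simulate_sample_means(n=50, reps=5000)
center = statistics.fmean(means)   # should be near the population mean, 1
spread = statistics.stdev(means)   # should be near 1 / sqrt(50)
print(round(center, 3), round(spread, 3))
```

Even though the population is strongly skewed, a histogram of `means` would look close to a bell curve centered near 1, with a spread close to the population standard deviation divided by the square root of the sample size.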

union of two events

The union of events A and B is the event containing all sample points that are in A or B or both

(part 2) Which of the following statements is true regarding the design of a good survey?

a. The questions should be kept as short as possible. b. A mixture of dichotomous, multiple-choice, and open-ended questions may be used. c. Leading questions must be avoided. d. All of these choices are true.

exhaustive

all possible outcomes must be included. Additionally, the outcomes must be mutually exclusive, which means that no two outcomes can occur at the same time. A list of exhaustive and mutually exclusive outcomes is called a sample space and is denoted by $S$. The outcomes are denoted by $O_1, O_2, \ldots, O_k$. Using set notation, we represent the sample space and its outcomes as: $S = \{O_1, O_2, \ldots, O_k\}$

Self-selected samples

almost always biased, because the individuals who participate in them are more keenly interested in the issue than are the other members of the population. As a result, the conclusions drawn from such surveys are frequently wrong.

A random experiment

an action or process that leads to one of several possible outcomes

relative frequency approach

an objective way of determining probabilities based on observing frequencies over a number of trials

range

calculated by subtracting the smallest number from the largest.

Each of the following are characteristics of the sampling distribution of the mean except

if the original population is not normally distributed, the sampling distribution of the mean will also be approximately normal for large sample sizes

The classical approach describes a probability

in terms of the proportion of times that an event can be theoretically expected to occur

personal interview

involves an interviewer soliciting information from a respondent by asking prepared questions. A personal interview has the advantage of having a higher expected response rate than other methods of data collection. In addition, there will probably be fewer incorrect responses resulting from respondents misunderstanding some questions, because the interviewer can clarify misunderstandings when asked. But, the interviewer must also be careful not to say too much, for fear of biasing the response. The main disadvantage of personal interviews is that they are expensive, especially when travel is involved.

hypothesis testing

make and test an educated guess about a problem/solution

average

mean

mutually exclusive

means that each member of the population must be assigned to exactly one stratum. After the population has been stratified in this way, we can use simple random sampling to generate the complete sample

significance level

measures how frequently the conclusion will be wrong in the long run. A 5% significance level means that, in the long run, this type of conclusion will be wrong 5% of the time.

non-sampling error

more serious than sampling error because taking a larger sample won't diminish the size, or the possibility of occurrence, of this error. Even a census can (and probably will) contain non-sampling errors. Non-sampling errors result from mistakes that are made in the acquisition of data. Non-sampling errors also result from the sample observations being selected improperly.

stratified random sample

obtained by separating the population into mutually exclusive sets, or strata, and then drawing simple random samples from each stratum. Examples of criteria for separating a population into strata (and of the strata themselves) include the following:
Gender: male or female
Age: under 20, 20-30, 31-40, 41-50, 51-60, over 60
Occupation: professional, clerical, blue-collar, other
Household income: under $25,000; $25,000-$49,999; $50,000-$74,999; over $75,000
We can acquire information about the total population, make inferences within a stratum (gender), or make comparisons across strata (gender and age). Figure 2 outlines gender within a stratum. It highlights gender and age as an example comparison that can occur across data.

Selection bias

occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample.

Type I Error

occurs when we reject a true null hypothesis. The probability of a Type I error is denoted by 𝛼, which is also called the significance level.

descriptive statistics

one of two branches of statistics which focuses on methods of organizing, summarizing, and presenting data in a convenient and informative way. One form of descriptive statistics uses graphical techniques which allow statistics practitioners to present data in ways that make it easy for the reader to extract useful information. Another form of descriptive statistics uses numerical techniques to summarize data. Rather than providing the raw data, the professor may only share summary data with the student.

Statistical inference problems involve three key concepts:

population, the sample, and the statistical inference.

Bayes's Law is used to compute

posterior probabilities

sampling error

refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample. Sampling error is an error that we expect to occur when we make a statement about a population that is based only on the observations contained in a sample taken from the population.

Non-response error

refers to error (or bias) introduced when responses are not obtained from some members of the sample. When this happens, the sample observations that are collected may not be representative of the target population, resulting in biased results.

Two methods exist to create a sampling distribution. One involves using parallel samples from a population and the other is to use the

rules of probability

A company has developed a new smartphone whose average lifetime is unknown. In order to estimate this average, 200 smartphones are randomly selected from a large production line and tested; their average lifetime is found to be 5 years. The 200 smartphones represent a

sample

survey

solicits information from people concerning such things as their income, family size, and opinions on various issues. The majority of surveys are conducted for private use.

Standard deviation

the square root of the variance and gets the variability measure back to the same units as the data. Standard deviation has many useful properties when the data is normally distributed

standard error of the proportion

the standard deviation of sample proportions, which measures the average variation around the mean of the sample proportions

When rolling two dice, what is the total number of possible outcomes?

there are 36 different outcomes but only 11 possible totals
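The count can be checked by enumerating every ordered pair of faces. This is only a sketch; the variable names are chosen for this example.

```python
from itertools import product

# Every ordered pair (first die, second die) is one outcome.
outcomes = list(product(range(1, 7), repeat=2))

# Distinct totals run from 2 (1+1) to 12 (6+6).
totals = {first + second for first, second in outcomes}

print(len(outcomes))    # 36
print(sorted(totals))   # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```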

independent

two events are independent if the probability of one event is not affected by the occurrence of the other event

classical approach

used by mathematicians to help determine probability associated with games of chance. If an experiment has n possible outcomes, this method would assign a probability of 1/n to each outcome.

Multiplication Rule

used to calculate the joint probability of two events. It is based on the formula for conditional probability defined earlier. That is, from the formula $P(A \mid B) = P(A \cap B) / P(B)$, multiplying both sides by $P(B)$ gives the multiplication rule: $P(A \cap B) = P(A \mid B) \, P(B)$.

Conditional Probability

used to determine how two events are related; that is, we can determine the probability of one event given the occurrence of another related event. Conditional probabilities are written as $P(A \mid B)$, read as "the probability of A given B," and calculated as $P(A \mid B) = P(A \cap B) / P(B)$.
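A small worked sketch of the formula using two fair dice. The events are chosen only for illustration: A is "the total is at least 10" and B is "the first die shows 6".

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered rolls of two dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event as (favorable outcomes) / (all outcomes)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def total_at_least_ten(o):   # event A (illustrative)
    return o[0] + o[1] >= 10

def first_is_six(o):         # event B (illustrative)
    return o[0] == 6

p_a = prob(total_at_least_ten)
p_b = prob(first_is_six)
p_a_and_b = prob(lambda o: total_at_least_ten(o) and first_is_six(o))

# P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(p_a, p_a_given_b)   # 1/6 1/2
```

Knowing that the first die shows 6 raises the probability of a total of at least 10 from 1/6 to 1/2, so these two events are related (dependent).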

telephone interview

usually less expensive, but it is also less personal and has a lower expected response rate. Unless the issue is of interest, many people will refuse to respond to telephone surveys. This problem is exacerbated by telemarketers trying to sell services or products.

self-administered questionnaire

usually mailed to a sample of people. This is an inexpensive method of conducting a survey and is, therefore, attractive when the number of people to be surveyed is large. But self-administered questionnaires usually have a low response rate and may have a relatively high number of incorrect responses due to respondents misunderstanding some questions.

The null hypothesis 𝐻0:

will always state that the parameter equals the value specified in the alternative hypothesis 𝐻1.

Bayes' Law

$P(A \mid B) = P(A \cap B) / P(B)$. The probabilities $P(A)$ and $P(A^C)$ are called prior probabilities because they are determined prior to the decision about taking the preparatory course. The conditional probability $P(A \mid B)$ is called a posterior probability (or revised probability), because the prior probability is revised after the decision about taking the preparatory course.
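A minimal sketch of turning a prior into a posterior. The function name and the numbers are hypothetical: a prior P(A) = 0.1, with P(B | A) = 0.8 and P(B | not A) = 0.3. P(B) is obtained by adding the two joint probabilities that include B.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) = P(A and B) / P(B)."""
    joint_a_b = prior_a * p_b_given_a                 # P(A and B)
    joint_not_a_b = (1 - prior_a) * p_b_given_not_a   # P(not A and B)
    p_b = joint_a_b + joint_not_a_b                   # P(B)
    return joint_a_b / p_b

# Hypothetical inputs, not taken from the course example.
p = posterior(prior_a=0.1, p_b_given_a=0.8, p_b_given_not_a=0.3)
print(round(p, 4))
```

With these inputs, observing B revises the probability of A upward from the prior of 0.1 to 0.08 / 0.35, or 8/35.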

Questionnaire Design

1. Keep the questionnaire as short as possible. Most people are unwilling to spend much time filling out a questionnaire.
2. Ask short, simple, and clearly worded questions to enable respondents to answer quickly, correctly, and without ambiguity.
3. Start with demographic questions to help respondents get started and become comfortable quickly.
4. Use dichotomous (yes/no) and multiple-choice questions because of their simplicity.
5. Use open-ended questions cautiously because they are time-consuming and more difficult to tabulate and analyze.
6. Avoid using leading questions that tend to lead the respondent to a particular answer.
7. Trial a questionnaire on a small number of people to uncover potential problems, such as ambiguous wording.
8. Think about the way you intend to use the collected data when preparing the questionnaire. First determine whether you are soliciting values for a quantitative variable or a categorical variable. Then consider which type of statistical techniques, descriptive or inferential, you intend to apply to the data to be collected, and note the requirements of the specific techniques to be used.

Critical Concepts in Hypothesis Testing

1. There are two hypotheses. One is called the null hypothesis and the other the alternative or research hypothesis.
2. The testing procedure begins with the assumption that the null hypothesis is true.
3. The goal of the process is to determine whether there is enough evidence to infer that the alternative hypothesis is true.
4. There are two possible decisions: Conclude that there is enough evidence to support the alternative hypothesis. Conclude that there is not enough evidence to support the alternative hypothesis.
5. Two possible errors can be made in any test. A Type I error occurs when we reject a true null hypothesis and a Type II error occurs when we don't reject a false null hypothesis. The probabilities of Type I and Type II errors are 𝑃(Type I error) = 𝛼 and 𝑃(Type II error) = 𝛽.

Suppose you are given 3 numbers that relate to the number of people in a university student sample. The three numbers are 10, 20, and 30. If the standard deviation is 10, the standard error equals

5.77
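The answer follows from dividing the standard deviation by the square root of the sample size (here n = 3). A short sketch, with a helper name chosen for this example:

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: standard deviation / sqrt(sample size)."""
    return sd / math.sqrt(n)

se = standard_error(sd=10, n=3)
print(round(se, 2))   # 5.77
```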

You are tasked with finding the sample standard deviation. You are given 4 numbers. The numbers are 5, 10, 15, and 20. The sample standard deviation equals

6.455
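The mean is 12.5, the squared deviations sum to 125, and dividing by n − 1 = 3 before taking the square root gives the answer. The same calculation can be checked with Python's standard library:

```python
import statistics

data = [5, 10, 15, 20]

# statistics.stdev uses the sample formula (divides by n - 1).
s = statistics.stdev(data)
print(round(s, 3))   # 6.455
```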

Types of Errors

A Type I error occurs when we reject a true null hypothesis (Reject 𝐻0 when it is TRUE). A Type II error occurs when we don't reject a false null hypothesis (Do NOT reject 𝐻0 when it is FALSE).

experiments

A more expensive but better way to produce data is through experiments. Data produced in this manner are called experimental.

Identifying the Correct Method

Although it is difficult to offer strict rules on which probability method to use, nevertheless we can provide some general guidelines. The key issue is whether joint probabilities are provided or are required. Where the joint probabilities were given, we can compute marginal probabilities by adding across rows and down columns. We can use the joint and marginal probabilities to compute conditional probabilities, for which a formula is available. This allows us to determine whether the events described by the table are independent or dependent. We can also apply the addition rule to compute the probability that either of two events occurs. The first step in assigning probability is to create an exhaustive and mutually exclusive list of outcomes. The second step is to use the classical, relative frequency, or subjective approach and assign probability to the outcomes. There are a variety of methods available to compute the probability of other events. These methods include probability rules and trees. An important application of these rules is Bayes' Law, which allows us to compute conditional probabilities from other forms of probability.

Which method of data collection is involved when a researcher counts and records the number of students wearing backpacks on campus on a given day?

Direct observation

Data

Facts and statistics collected together for reference or analysis

several drawbacks to the rejection region method:

Foremost among them is the type of information provided by the result of the test. The rejection region method produces a yes or no response to the question, "Is there sufficient statistical evidence to infer that the alternative hypothesis is true?" The implication is that the result of the test of hypothesis will be converted automatically into one of two possible courses of action: one action as a result of rejecting the null hypothesis in favor of the alternative and another as a result of not rejecting the null hypothesis in favor of the alternative. The rejection of the null hypothesis seems to imply that the new billing system will be installed. What is needed to take full advantage of the information available from the test result and make a better decision is a measure of the amount of statistical evidence supporting the alternative hypothesis so that it can be weighed in relation to the other factors, especially the financial ones. The p-value of a test provides this measure.

𝛼

The Greek letter "alpha." If we use 𝛼 to represent the significance level, then the confidence level is 1−𝛼, so Confidence Level + Significance Level = 1. Consider a statement from polling data you may hear about in the news: "This poll is considered accurate within 3.4 percentage points, 19 times out of 20." In this case, our confidence level is 95% (19/20 = 0.95), while our significance level is 5%. A 5% significance level means that, in the long run, this type of conclusion will be wrong 5% of the time.

Multiplication Rule For Independent Events

If A and B are independent events, P(A|B) = P(A) and P(B|A) = P(B). It follows that the joint probability of two independent events is simply the product of the probabilities of the two events. We can express this as a special form of the multiplication rule. The joint probability of any two independent events A and B is P(A ∩ B) = P(A) • P(B).
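A quick check of the rule by enumeration, assuming two fair dice. The events are chosen for illustration: A is "the first die shows 6" and B is "the second die shows 6".

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered rolls of two dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event as (favorable outcomes) / (all outcomes)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_a = prob(lambda o: o[0] == 6)                      # P(A)
p_b = prob(lambda o: o[1] == 6)                      # P(B)
p_a_and_b = prob(lambda o: o[0] == 6 and o[1] == 6)  # P(A and B)

print(p_a_and_b, p_a * p_b)   # 1/36 1/36
```

The joint probability found by counting matches the product of the two marginal probabilities, as the rule states for independent events.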

Which of the following statements is not correct?

If event A does not occur, then its complement A' will also not occur.

rejection region method

A range of values such that if the test statistic falls into that range, we decide to reject the null hypothesis in favor of the alternative hypothesis. It can be used in conjunction with the computer, but it is mandatory for those computing statistics manually.

We rejected the null hypothesis. Does this prove that the alternative hypothesis is true?

No. Our conclusion is based on sample data (and not on the entire population), so we can never prove anything by using statistical inference. Consequently, we summarize the test by stating that there is enough statistical evidence to infer that the null hypothesis is false and that the alternative hypothesis is true.

Critical Concepts in Hypothesis Testing: Concept 5

Two possible errors can be made in any test: A Type I error occurs when we reject a true null hypothesis. A Type II error occurs when we don't reject a false null hypothesis. There are probabilities associated with each type of error: 𝑃(Type I error)=𝛼 𝑃(Type II error)=𝛽

inferential statistics

a body of methods used to draw conclusions or inferences about characteristics of populations based on sample data. Exit polls are a very common application of statistical inference.

Marginal probability

a measure of the likelihood that a particular event will occur, regardless of whether another event occurs. Marginal probabilities, computed by adding across rows or down columns, are so named because they are calculated in the margins of the table

sampling plan

a method or procedure for specifying how a sample will be taken from a population. Three different sampling plans include simple random sampling, stratified random sampling, and cluster sampling.

simple random sample

a sample selected in such a way that every possible sample with the same number of observations is equally likely to be chosen. One way to conduct a simple random sample is to assign a number to each element in the population, write these numbers on individual slips of paper, toss them into a hat, and draw the required number of slips (the sample size, n) from the hat. Sometimes the elements of the population are already numbered such as Social Security numbers, employee numbers, or driver's license numbers. After each element of the chosen population has been assigned a unique number, sample numbers can be selected at random.

Sample

a set of data drawn from the population. A descriptive measure of a sample is called a statistic. We use statistics to make inferences about parameters.

Which of the following is a measure of the reliability of a statistical inference

a significance level

cluster sample

a simple random sample of groups or clusters of elements versus a simple random sample of individual objects. Cluster sampling is particularly useful when it is difficult or costly to develop a complete list of the population members (making it difficult and costly to generate a simple random sample). It is also useful whenever the population elements are widely dispersed geographically. However, cluster sampling also increases sampling error because households belonging to the same cluster are likely to be similar in many aspects, including household income. This can be partially offset by using some of the cost savings to choose a larger sample than would be used for a simple random sample. Whichever type of sampling plan you select, you still have to decide what sample size to use. We can rely on our intuition, which tells us that the larger the sample size is, the more accurate we can expect the sample estimates to be.

null hypothesis

a statement or idea that can be falsified, or proved wrong. It is represented by 𝐻0 (pronounced H-nought).

Three types of non-sampling errors are errors in:

data acquisition, non-response errors, and selection bias

subjective approach

defines probability as the degree of belief that we hold in the occurrence of an event. Subjective probabilities can also be described as hunches or educated guesses.

Type II Error

defined as not rejecting a false null hypothesis. Its probability is denoted by 𝛽 (Greek letter beta). The error probabilities 𝛼 and 𝛽 are inversely related, meaning that, for a fixed sample size, any attempt to reduce one will increase the other.

"Probability of Precipitation" (P.O.P.)

defined in different ways by different forecasters, but basically it's a subjective probability based on past observations combined with current weather conditions. A P.O.P. of 60% means that based on current conditions, there is a 60% chance of rain.

three of the most popular methods to collect data:

direct observation (ex: number of customers entering a bank per hour), experiments (ex: new ways to produce things to minimize costs), and surveys.

The process of using sample statistics to draw conclusions about population parameters is called

doing inferential statistics

Data acquisition

errors arise from the recording of incorrect responses. Incorrect responses may be the result of incorrect measurements taken because of faulty equipment, mistakes made during transcription from primary sources, inaccurate recording of data due to misinterpretation of terms, or inaccurate responses to questions concerning sensitive issues such as sexual activity or possible tax evasion.

If a set of events includes all the possible outcomes of an experiment, these events are considered to be

exhaustive

Sampling

Statistical inference permits us to draw conclusions about a population based on a sample. The chief motives for examining a sample rather than a population are cost and practicality. Statistical inference permits us to draw conclusions about a population parameter based on a sample that is quite small in comparison to the size of the population. Another illustration of sampling can be taken from the field of quality management. To ensure that a production process is operating properly, the operations manager needs to know what proportion of items being produced is defective. If the quality technician must destroy the item to determine whether it is defective, then there is no alternative to sampling: a complete inspection of the product population would destroy the entire output of the production process, which is impractical. The sample statistic can come quite close to the parameter it is designed to estimate if the target population (the population about which we want to draw inferences) and the sampled population (the actual population from which the sample has been taken) are equal. But in practice, these populations may not be equal. In any case, the sampled population and the target population should be close to one another.

The manager of the customer service division of a major consumer electronics company is interested in determining whether the customers who have purchased a videocassette recorder over the past 12 months are satisfied with their products. If there are four different brands of videocassette recorders made by the company, the best sampling strategy would be to use a ________

stratified random sample

variance

the average squared deviation from the mean
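A short sketch of the calculation on four illustrative values. Note one detail the short definition glosses over: for a whole population the sum of squared deviations is divided by N, while for a sample it is divided by n − 1.

```python
import statistics

data = [5, 10, 15, 20]   # illustrative values; mean is 12.5

# Squared deviations from the mean sum to 125.
population_variance = statistics.pvariance(data)   # 125 / 4
sample_variance = statistics.variance(data)        # 125 / 3

print(population_variance, round(sample_variance, 3))
```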

The concept that allows us to draw conclusions about the population based strictly on sample data without having any knowledge about the distribution of the underlying population is __________

the central limit theorem

Population:

the group of all items of interest to a statistics practitioner. It is frequently very large and may, in fact, be infinitely large. In the language of statistics, population does not necessarily refer to a group of people. It may, for example, refer to the population of diameters of ball bearings produced at a large plant. A descriptive measure of a population is called a parameter. In most applications of inferential statistics, the parameter represents the information we need.

mode

the most frequently occurring score(s) in a distribution

alternative or research hypothesis

the opposite of the null hypothesis; it consists of a statement about the expected relationship between the variables and is denoted 𝐻1.

Complement Rule

the probability of an event occurring is 1 minus the probability that it doesn't occur. The complement rule is $P(A^C) = 1 - P(A)$.

joint probability

the probability of the intersection of two events. The intersection of events A and B is the event, as seen in Figure 4, that occurs when both A and B occur. It is denoted as 𝐴∩𝐵 (read as A intersect B). By contrast, the union of two events, as seen in Figure 5, is denoted as 𝐴∪𝐵 (read as A union B) and is the event containing all sample points that are in A or B or both.

statistical inference

the process of making an estimate, prediction, or decision about a population based on sample data. Because populations are almost always very large, investigating each member of the population would be impractical and expensive. It is far easier and cheaper to take a sample from the population of interest and draw conclusions or make estimates about the population on the basis of information provided by the sample. However, such conclusions and estimates are not always going to be correct. For this reason, we build into the statistical inference a measure of reliability. There are two such measures, the confidence level and the significance level. The confidence level is the proportion of times that an estimating procedure will be correct. When the purpose of the statistical inference is to draw a conclusion about a population, the significance level measures how frequently the conclusion will be wrong in the long run. Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.

The response rate:

the proportion of all people selected who complete the survey, is a key survey parameter and helps in the understanding of the validity of the survey and sources of non-response error. Non-response errors can occur for a number of reasons. An interviewer may be unable to contact a person listed in the sample or the sampled person may refuse to respond for some reason. In either case, responses are not obtained from a sampled person and bias is introduced. The problem of non-response error is even greater when self-administered questionnaires are used rather than an interviewer who can attempt to reduce the non-response rate by means of callbacks.

response rate

the proportion of all people who were selected who complete the survey. A low response rate can destroy the validity of any conclusion resulting from the statistical analysis. Statistics practitioners need to ensure that data are reliable.

Confidence level

the proportion of times that an estimating procedure will be correct. A confidence level of 95% means that estimates based on this form of statistical inference will be correct 95% of the time.

the p-value approach

the researcher determines the exact probability of obtaining the observed sample difference, under the assumption that the null hypothesis is correct
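A minimal sketch of a one-tail p-value for a test about a mean when the population standard deviation is known. All inputs are assumed values for illustration: null value 170, sample mean 178, population standard deviation 65, and sample size 400. The helper name is also made up for this example.

```python
import math
from statistics import NormalDist

def one_tail_p_value(sample_mean, mu0, sigma, n):
    """P(Z >= z) under the null hypothesis, for a right-tail z-test."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return 1.0 - NormalDist().cdf(z)

# Hypothetical inputs; z works out to 8 / 3.25, about 2.46.
p_value = one_tail_p_value(sample_mean=178, mu0=170, sigma=65, n=400)
print(round(p_value, 4))
```

A p-value this small (under 1%) means a sample mean this far above 170 would be rare if the null hypothesis were true, which is the kind of graded evidence the rejection region method does not report.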

Each of the following are characteristics of the sampling distribution of the mean:

the sampling distribution of the mean has a different mean from the original population; the standard deviation of the sampling distribution of the mean is referred to as the standard deviation; if the original population is not normally distributed, the sampling distribution of the mean will be normal

