Stats

What is a p‐value? How is it used to reach a conclusion about a hypothesis test?

p-value - the probability of obtaining a test statistic value equal to or more extreme than that obtained from the sample data when the null hypothesis is true. Compare the p-value to the chosen level of significance alpha; whenever p < alpha, we reject the null hypothesis.
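
As a minimal sketch of the decision rule above, the following computes a two-tailed p-value for a one-sample z-test and compares it to alpha. The function name, the sample figures, and the assumption that the population standard deviation is known are all illustrative and not part of the original answer.

```python
from statistics import NormalDist

def two_tailed_z_test(sample_mean, mu0, sigma, n):
    """Return (z, p) for a one-sample z-test, assuming sigma is known."""
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    # Probability of a statistic at least this extreme in either tail under H0.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

alpha = 0.05  # chosen level of significance
z, p = two_tailed_z_test(sample_mean=52.0, mu0=50.0, sigma=8.0, n=64)  # hypothetical data
reject_null = p < alpha
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject_null}")
```

With these hypothetical numbers the p-value falls just below 0.05, so the null hypothesis would be rejected at the 5% level but not at the 1% level.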

Relative frequency definition -

- the probabilities are observed empirically from sample data. If we count 10 freshmen in a total sample of 50 students, we could conclude that the probability that a random student on campus is a freshman is 10/50 = 0.2.

What are the key assumptions of ANOVA? What should be done if these are seriously violated?

ANOVA assumptions are that samples from each of the groups being compared are randomly and independently obtained, are normally distributed, and have equal variances. The first two assumptions are relatively easy to satisfy, and ANOVA is fairly robust to minor violations of these two assumptions. Violating the third assumption does not have serious consequences when the sample sizes are equal; when they are not, we should run the Levene test for equality of variances. If that test indicates unequal variances, we should consider using one of the nonparametric tests instead.

What is analysis of variance? What hypothesis does it test? Provide some practical examples

ANOVA is a methodology used to compare the means of several different groups to determine if all are equal, or if any are significantly different from the rest. The null hypothesis is that the population means of all groups are equal; the alternative hypothesis is that at least one mean differs from the rest. Practical examples include comparing average returns for several mutual funds in the same category offered by different institutions, and comparing average prices of several brands of the same product.
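
The comparison of group means can be sketched by computing the one-way ANOVA F statistic by hand. This is only an illustration: the fund returns are made up, the function name is hypothetical, and the sketch stops at the F statistic because the p-value requires the F distribution, which the Python standard library does not provide.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic: between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical annual returns (%) for three mutual funds in the same category.
funds = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(funds)
print(f"F = {f_stat:.2f}")
```

A large F means the variation between group means is large relative to the variation within groups, which is evidence against the null hypothesis of equal means.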

Describe some ways in which data are used in different business functions.

Answer: Data is used in many different business functions: a. Finance and Accounting - data is the basic element from which a balance sheet is created and from which costs and profits are determined at a company or within a business unit. b. Marketing - data is used to determine advertising impact; how, when, and where coupons and sales promotions are used by customers; and, in market research, customer satisfaction and where new product interests might lie. c. Human Resources - data is used to determine employee turnover, attendance, success of orientation programs and the effectiveness of training programs. d. Strategic planning - data is used to determine which countries' markets a company may want to enter and where to build manufacturing and warehouse facilities.

Explain how to compute the mean and variance of a sample and a population. How would you explain the formulas in simple English?

Answer: If a population consists of N observations x1, . . . , xN, the population mean, µ, is calculated as the sum of the observations divided by the total number of observations, N. The mean of a sample of n observations, x1, . . . , xn, denoted by "x‐bar", is calculated as the sum of the observations divided by the number of observations, n. The variance of a population is the sum of the squared deviations of the observations x1, . . . , xN from the mean, µ, divided by the total number of observations, N. The variance of a sample is the sum of the squared deviations of the observations x1, . . . , xn from the sample mean, x‐bar, divided by the number of observations minus one, n − 1. In simple English, the mean is the average value, and the variance is the average squared distance of the observations from that average.
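
The difference between the two variance formulas can be checked with a short sketch. The data values are made up for illustration; `pvariance` and `variance` are the Python standard library functions for the population (divide by N) and sample (divide by n − 1) versions.

```python
from statistics import mean, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

avg = mean(data)
squared_deviations = sum((x - avg) ** 2 for x in data)

# Population variance divides by N; sample variance divides by n - 1.
population_var = squared_deviations / len(data)
sample_var = squared_deviations / (len(data) - 1)

# The standard library gives the same results.
assert abs(population_var - pvariance(data)) < 1e-12
assert abs(sample_var - variance(data)) < 1e-12
print(avg, population_var, round(sample_var, 4))
```

The sample variance is always slightly larger than the population formula applied to the same numbers, because the divisor is smaller.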

Define probability and explain its three perspectives. Provide an example of each.

Answer: Probability - the likelihood that an outcome occurs. It must be a number between 0 and 1, and the sum of probabilities over all outcomes must add up to 1. The three perspectives: Classical definition of probability - if the process that generates the outcomes is known, probabilities can be deduced from theoretical arguments. Games of chance are typical examples, and the probabilities are calculated by determining the number of possible ways to succeed out of the total number of possibilities. The probability of getting two heads after throwing two quarters in the air is 1/4, because there is only one way to get HH out of four equally likely outcomes HH, HT, TH, and TT.
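
The classical counting argument for the two-quarter example can be sketched by enumerating the outcomes. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
favorable = [o for o in outcomes if o == ("H", "H")]

# Classical probability: ways to succeed / total number of possibilities.
p_two_heads = Fraction(len(favorable), len(outcomes))
print(p_two_heads)
```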

How is the standard normal distribution used to compute normal probabilities?

Answer: Standard normal distribution - a special case of the normal distribution with mean 0 and standard deviation 1. A standard normal random variable is usually denoted by Z, and its density function by f(z). To compute a probability for a normal random variable X with mean µ and standard deviation σ, convert the value x to a z-value, z = (x − µ)/σ, and then find the cumulative probability using the standard normal table or the Excel 2010 function NORM.S.DIST(z, TRUE), which returns the cumulative probability P(Z ≤ z) for a standard normal distribution.
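
A normal probability for any mean and standard deviation reduces to a standard normal lookup after converting to a z-value, z = (x − mean) / standard deviation. The sketch below uses Python's `statistics.NormalDist` in place of a table; the function name and the numbers are hypothetical.

```python
from statistics import NormalDist

def normal_cumulative_probability(x, mu, sigma):
    """P(X <= x) for a normal X, computed through the standard normal Z."""
    z = (x - mu) / sigma          # standardize
    return NormalDist().cdf(z)    # cumulative probability for Z ~ N(0, 1)

# Hypothetical example: scores with mean 100 and standard deviation 15.
p_below_115 = normal_cumulative_probability(115, mu=100, sigma=15)
p_between = p_below_115 - normal_cumulative_probability(85, mu=100, sigma=15)
print(round(p_below_115, 4), round(p_between, 4))
```

Probabilities for an interval are obtained by subtracting two cumulative probabilities, as in the last computation.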

Explain the difference between a discrete and a continuous metric

Answer: A discrete metric takes a countable, finite number of distinct values and is usually expressed as counts or proportions. When expressed as "counts," there are "gaps" between the possible values of "discrete" metrics. Continuous metrics are results of measurements, such as length, time or weight, and assume an infinite (continuous) range of possibilities. As such, there are NO gaps between the possible values of "continuous" metrics. We are only bound by the precision of our measurement device.

What is a metric, and how does it differ from a measure?

Answer: A metric is a unit of measurement that provides a method for objectively quantifying performance. A measurement is the act of obtaining data. Measurement creates measures which are numerical values associated with a metric.

What is the difference between a population and a sample?

Answer: A population consists of all items of interest for a particular decision or investigation, such as all the residents of a county or all the students at a university. A sample is a subset of a population, such as the residents in a neighborhood or the students in a business statistics class.

Explain the differences between categorical, ordinal, interval, and ratio data.

Answer: Categorical data, or nominal data, are sorted into categories according to specified characteristics, without any natural order, such as male/female or geographic region. Ordinal data are ordered or ranked according to some relationship to one another. Rating a service as poor, average, good, very good, or excellent is an example of ordinal data. Interval data are ordered and have a specified measure of the distance between observations, but have no natural zero. Common examples are clock time and temperature (in degrees Fahrenheit or Celsius). Ratio data are interval data that have a natural zero. Most business and economic data fall into this category, and statistical methods are the most widely applicable to them.

Explain the difference between cross‐sectional and time‐series data.

Answer: Cross-sectional data are collected over a single period of time, such as responses to market questionnaires. Time-series data are collected over successive periods of time, such as NASDAQ's daily returns.

Explain the information contained in box plots.

Box plots - graphically display five key statistics of a data set, the minimum, first quartile, median, third quartile, and maximum, and are very useful in identifying the shape of a distribution and outliers in the data.
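
The five statistics behind a box plot can be sketched as follows. Quartile conventions differ between tools; this example assumes the "inclusive" interpolation method of Python's `statistics.quantiles`, and the data are made up.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

data = [7, 1, 5, 3, 9, 2, 8, 4, 6]  # hypothetical observations
summary = five_number_summary(data)
print(summary)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.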

Explain the coefficient of variation and how it can be used.

Coefficient of variation - the standard deviation divided by the mean; it provides a measure of the dispersion in data relative to the mean. This allows a researcher to compare 2 stocks that have different means and standard deviations. For the stock with the larger coefficient of variation, we could say that it took on more risk per unit of return than the other stock did.
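
A short sketch of the stock comparison, using the usual definition of the coefficient of variation as the standard deviation divided by the mean. The return figures and names are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(returns):
    """Sample standard deviation divided by the mean."""
    return stdev(returns) / mean(returns)

stock_a = [4, 5, 6]     # hypothetical returns (%): mean 5, std dev 1
stock_b = [5, 10, 15]   # hypothetical returns (%): mean 10, std dev 5

cv_a = coefficient_of_variation(stock_a)
cv_b = coefficient_of_variation(stock_b)
print(cv_a, cv_b)
```

In this made-up case stock B has the higher average return but also the larger coefficient of variation, so it carries more risk per unit of return.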

Explain the notion of conditional probability and how it is computed.

Conditional probability - the probability of occurrence of one event A, given that another event B is known to have already occurred. "Given that a drug passed clinical trials, how likely is it to be approved by the FDA?" is an example of a conditional probability question. In general, the conditional probability of an event A given that event B is known to have occurred is computed as P(A|B) = P(A and B)/P(B).
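
The formula P(A|B) = P(A and B)/P(B) can be worked through with counts. All of the numbers below are invented for illustration and do not describe real approval rates.

```python
from fractions import Fraction

# Hypothetical counts for a set of candidate drugs.
total_drugs = 200
passed_trials = 80            # event B
passed_and_approved = 60      # event A and B

p_b = Fraction(passed_trials, total_drugs)
p_a_and_b = Fraction(passed_and_approved, total_drugs)

# P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)
```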

Explain the concept of correlation and how to interpret correlation coefficients of 0.3, 0, and -0.95.

Correlation - a measure of the strength of a linear relationship between 2 variables, ranging from -1 to 1. A correlation of 0 implies the lack of a linear relationship, a correlation of 0.3 represents a weak positive relationship, and a correlation of -0.95 represents a strong negative relationship.
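
A minimal sketch of the Pearson correlation coefficient computed from its definition. The function name and data are illustrative; the three y-series are chosen to give a perfect positive, a perfect negative, and a zero linear relationship.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled to lie in [-1, 1]."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])   # rises with x
r_negative = pearson_r(x, [10, 8, 6, 4, 2])   # falls as x rises
r_zero = pearson_r(x, [3, 1, 4, 1, 3])        # no linear trend
print(r_positive, r_negative, r_zero)
```

Note that the last series is not constant; it simply has no linear trend, which is why a correlation of 0 does not rule out other kinds of relationships.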

Provide some examples of data profiles.

Data profiling is an analysis of data to better understand relationships in data, as well as similarities and differences. Data profiles are often expressed as percentiles and quartiles. Percentiles are used on standardized tests used for college or graduate school entrance examinations (SAT, ACT, GMAT, GRE, etc.). Percentiles specify the percentage of other test takers who scored at or below the score of a particular individual.

Discuss how confidence intervals can help in making decisions. Provide some examples different from those in the chapter.

Decision making: A confidence interval for the mean returns of a given stock portfolio can be used to make investment decisions. If the proportion of the interval that is negative is too large, the portfolio might be considered too risky. If a confidence interval for the proportion of customers who responded to TV advertising is both high enough and narrow enough, the company might consider rolling the product out nationally.
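
The advertising example can be sketched with a large-sample confidence interval for a proportion. This assumes the normal approximation is adequate, and the response counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample (normal approximation) interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 120 of 400 customers responded to the TV advertising.
low, high = proportion_confidence_interval(120, 400)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

A manager could compare the lower limit against the minimum response rate needed to justify a national rollout, rather than relying on the point estimate alone.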

Explain the principal types of descriptive statistics measures that are used for describing data.

Descriptive statistics - a collection of quantitative measures and methods of describing data. This includes measures of central tendency (mean, median, mode, and proportion), measures of dispersion (range, variance, standard deviation), measures of shape (skewness, kurtosis), and frequency distributions and histograms.

What are frequency distributions and histograms? What information do they provide?

Frequency distribution - a tabular summary that shows the frequency of observations in each of several nonoverlapping classes. Histogram - graphical depiction of a frequency distribution in the form of a column chart. Both frequency distribution and the histogram allow us to visually examine the center, dispersion (variability) and shape of a distribution.

Explain the importance of sampling from a managerial perspective

From a managerial perspective, sampling plays an important role in providing information for making business decisions. It is often impossible to survey all the customers or inspect every product; instead, a sample of customers is contacted, or a sample of products inspected, and the sample is used as a representative of the larger group.

Explain the notion of hypothesis testing. What is the general process that one should follow in conducting a hypothesis test?

Hypothesis testing - involves drawing inferences about two contrasting propositions (hypotheses) relating to the value of a population parameter, one of which is assumed to be true in the absence of contradictory data. In conducting a hypothesis test, we seek evidence based on the sample data, to determine if the assumed hypothesis can be rejected; if not, we can only assume it to be true. The process of hypothesis testing involves the following steps: a. Formulating the hypotheses to test. b. Selecting a level of significance, which defines the risk of drawing an incorrect conclusion about the assumed hypothesis that is actually true. c. Determining the decision rule on which to base the conclusion. d. Collecting data and calculating a test statistic. e. Applying the decision rule to the test statistic and drawing a conclusion.

What is the difference between paired and independent samples?

Independent samples typically come from different populations, may have different sample sizes, and refer to different subjects, like male and female customers. Paired samples refer to the same subjects measured twice, like students taking a test before and after extensive tutoring; they come from the same population and have the same sample size, and the difference within each pair is of particular interest, for example to check whether the results improved overall.
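
For paired samples the analysis works on the per-subject differences. The sketch below computes a paired t statistic for the tutoring example; the scores and function name are hypothetical, and the sketch stops at the statistic because the t distribution is not in the Python standard library.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """t statistic for the mean of paired differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same size")
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical test scores for the same five students.
before = [70, 65, 80, 75, 60]
after = [75, 70, 82, 80, 68]
t_stat = paired_t_statistic(before, after)
print(round(t_stat, 3))
```

The statistic would then be compared against a t distribution with n − 1 degrees of freedom, where n is the number of pairs.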

Explain the difference between the mean, median, mode, and midrange. In what situations might one be more useful than the others?

Mean - the arithmetic average of a set of observations, and is the most appropriate tool for interval and ratio data without significant outliers. Median - the middle point of a sorted set of observations, and is the most appropriate tool for ordinal, interval and ratio data and is not affected by outliers. Mode - the most frequent data point in a set of observations, and is appropriate only for nominal and ordinal data with few frequently occurring observations. Midrange - the average of the largest and smallest observations, and is appropriate when the number of observations is relatively small, but is adversely impacted by the presence of outliers.
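
The effect of an outlier on each measure can be seen in a small sketch. The data and the `midrange` helper are illustrative; `mean`, `median`, and `mode` come from the Python standard library.

```python
from statistics import mean, median, mode

def midrange(values):
    """Average of the smallest and largest observations."""
    return (min(values) + max(values)) / 2

def summarize(values):
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "midrange": midrange(values),
    }

scores = [2, 3, 3, 5, 7]          # hypothetical observations
with_outlier = scores + [100]     # same data plus one extreme value

print(summarize(scores))
print(summarize(with_outlier))
```

Adding the single extreme value moves the mean and midrange a long way, while the median shifts only slightly and the mode does not change.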

What is the difference between non-sampling error and sampling error? Why might each type of error occur?

Non-sampling error - occurs when the sample does not represent the target population adequately. This is generally the result of poor sample design or choosing the wrong sampling frame. Sampling (statistical) error - occurs because samples are only a subset of the total population, is inherent in any sampling process, and although it can be minimized, it cannot be totally avoided.

What is a PivotTable? Describe some of the key features that PivotTables have.

PivotTables allow you to create custom summaries and charts of key information in the data. PivotTables also provide an easy method of constructing cross‐tabulations for categorical data. The beauty of PivotTables is that if you wish to change the analysis, you can simply uncheck the boxes in the PivotTable Field List or drag the variable names to different field areas. You may easily add multiple variables in the fields to create different views of the data.

Explain the difference between the null and alternative hypothesis. Which one can be proven in a statistical sense?

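Null hypothesis - represents a theory or statement about the status quo that is accepted as correct. Alternative hypothesis - must be true if we conclude that the null hypothesis is false. The null hypothesis cannot be proven in a statistical sense; we can only fail to reject it. Rejecting the null hypothesis provides statistical evidence for the alternative hypothesis, so only the alternative hypothesis can be "proven" in a statistical sense.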

Explain how to compute the relative frequency and cumulative relative frequency.

Once the classes (bins, intervals) for the distribution are determined, based on the range of data and the desired number of bins, the relative frequency is computed by counting how many observations fall into each of the bins and then dividing by the total number of observations. Cumulative relative frequency - the running total of relative frequencies up to the upper limit of each bin.
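
The two computations can be sketched as follows. The bin edges, data, and function name are hypothetical, and the convention that each bin includes its lower edge (with the last bin also including its upper edge) is an assumption of this sketch.

```python
from itertools import accumulate

def relative_frequencies(data, bin_edges):
    """Return (relative, cumulative relative) frequencies for the given bins."""
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    for x in data:
        for i in range(len(counts)):
            lower, upper = bin_edges[i], bin_edges[i + 1]
            # Bins are [lower, upper); the last bin also includes its upper edge.
            if lower <= x < upper or (i == last and x == upper):
                counts[i] += 1
                break
    n = len(data)
    relative = [c / n for c in counts]
    cumulative = list(accumulate(relative))  # running total
    return relative, cumulative

data = [1, 2, 2, 3, 5, 6, 7, 8, 9, 10]
relative, cumulative = relative_frequencies(data, bin_edges=[0, 4, 8, 12])
print(relative, cumulative)
```

The cumulative relative frequency of the last bin should always come out to 1 (up to floating-point rounding) when every observation falls inside the bins.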

What is a proportion? Provide some practical examples where proportions are used in business

Proportion - the fraction of data that have a certain characteristic. It is used mostly with categorical data, such as marketing survey responses. A typical business example might be, "What proportion of school aged children buy a school lunch every day."

What statistical measures are used for describing dispersion in data? How do they differ from one another?

Range - the difference between the largest and the smallest observation, and is extremely sensitive to outliers. Variance - the average of squared deviations from the mean, and is also affected by outliers, but not to the same extent as the range. It is expressed in squared units. Standard deviation - the square root of the variance, and represents a typical deviation from the mean, expressed in the same units as the data.

Explain the concepts of skewness and kurtosis and what they tell about the distribution of data.

Skewness - represents the degree of asymmetry of a distribution around its mean. The closer skewness gets to zero, the closer the distribution is to a perfectly symmetrical one. Positive numbers represent right-skewed distributions, and negative numbers represent a distribution that is left skewed. Kurtosis refers to the peakedness (high and narrow) or flatness of a distribution. The higher the kurtosis, the more area the distribution has in its tails rather than in the middle.
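
Skewness and kurtosis can be sketched from the central moments of the data. This uses the simple moment-based (population) definitions, which is an assumption of the example; other conventions, including sample-adjusted formulas and "excess" kurtosis (kurtosis minus 3), give different numbers. The data are made up.

```python
from statistics import mean

def moment_skewness_kurtosis(values):
    """Return (skewness, kurtosis) using population central moments."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n   # variance
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]   # long tail to the right

skew_sym, kurt_sym = moment_skewness_kurtosis(symmetric)
skew_right, kurt_right = moment_skewness_kurtosis(right_skewed)
print(skew_sym, kurt_sym, skew_right, kurt_right)
```

The symmetric data give a skewness of zero, while the long right tail produces a positive skewness and a larger kurtosis.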

Explain the importance of statistics in business.

Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data. In business, statistics is quite important because it allows managers to make fact-based decisions instead of "gut feel" type decisions. In addition, if various claims are made about a product or service, the use of statistics may strongly support or strongly refute such claims which can discourage legal challenges and possibly allow for true and ethical decisions about a hypothesis.

Describe the difference between subjective and probabilistic sampling methods. What are the advantages and disadvantages of each?

Subjective sampling methods, such as judgmental and convenience sampling, rely on either an expert in the field to make a judgment about sample selection, or the convenience of data availability. Probabilistic sampling methods rely on some random procedure when selecting a sample. An advantage of subjective sampling is the relative ease of obtaining the sample, while the introduction of bias in such non-random samples is a major disadvantage. Probabilistic sampling has the advantage of creating samples that represent the general population much better, but such samples may be harder to generate and require some expertise.

List the different types of charts available in Excel, and explain characteristics of data sets that make each chart most appropriate to use.

Suggested Answer: There are many different types of charts that Excel can generate: a. Column and bar charts can be used to compare types of data against each other or against a standard. Column charts are vertical and bar charts are horizontal. b. Line charts provide a useful means for displaying data over time. c. Pie charts show the relative proportion of each data source to the total. d. Area charts combine the features of a pie chart with those of line charts. Area charts present more information than pie or line charts alone, but may clutter the observer's mind with too many details if too many data sets are used. e. Scatter diagrams show the relationship between two variables. f. Stock charts allow a manager to plot stock prices, including the high, low, and close. g. Doughnut charts are similar to pie charts, but can include more than one set of data. h. Surface charts show 3-dimensional data. i. A bubble chart is a type of scatter chart, but the size of the data marker corresponds to the value of a 3rd variable. j. A radar chart allows for the plotting of multiple dimensions of several data series.

Explain the peculiar nuances associated with the Excel tools for two‐sample t ‐tests. What issues must you consider to use the tests and interpret the results properly?

Test output in Excel provides results for a one-tail test only. For two-tailed two-sample t-tests, the sign of the critical value must be taken as plus or minus, and the obtained p-value must be multiplied by 2 to get the correct result.

Explain the central limit theorem. Why is it important in statistical analysis?

The central limit theorem states that if the sample size is large enough, the sampling distribution of the mean is approximately normally distributed, regardless of the distribution of the population, and that the mean of the sampling distribution will be the same as that of the population. The central limit theorem also states that if the population is normally distributed, then the sampling distribution of the mean will also be normal for any sample size. It is very important in statistical analysis as it allows us to use the theory we learned about calculating probabilities for normal distributions to draw conclusions about sample means.
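
The theorem can be illustrated with a small simulation. This is only a sketch under chosen assumptions: the population is exponential with mean 1 and standard deviation 1 (clearly non-normal), and the sample size, number of samples, and seed are arbitrary.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 40
NUM_SAMPLES = 2000

# Draw many samples from a skewed population and record each sample mean.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

center = mean(sample_means)                    # should be near the population mean, 1
spread = stdev(sample_means)                   # should be near sigma / sqrt(n)
expected_spread = 1.0 / SAMPLE_SIZE ** 0.5
print(round(center, 3), round(spread, 3), round(expected_spread, 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual observations are strongly right-skewed.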

What is the standard error of the mean? How does it relate to the standard deviation of the population from which a sample is taken?

The standard error of the mean is the standard deviation of the sampling distribution of the mean. It is the standard deviation of the population from which a sample is taken divided by the square root of the sample size.
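
A brief sketch of the relationship, with a hypothetical population standard deviation of 12:

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / sqrt(n)

se_36 = standard_error(12.0, 36)
se_144 = standard_error(12.0, 144)
print(se_36, se_144)
```

Quadrupling the sample size only halves the standard error, because the sample size enters through its square root.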

How does the t ‐distribution differ from the standard normal distribution?

The t-distribution has a larger variance than the standard normal, thus making confidence intervals wider than those obtained from the standard normal distribution.

Explain Type I and Type II errors. Which one is controllable by the experimenter?

Type I error - the null hypothesis is actually true, but the hypothesis test incorrectly rejects it. The probability of a type I error is called the level of significance, and is controllable by the experimenter. Type II error - the null hypothesis is actually false, but the hypothesis test incorrectly fails to reject it. The probability of a type II error is not directly controllable by the experimenter.

What do we mean by an unbiased estimator? Why is this important?

Unbiased estimator - an estimator whose expected value equals the population parameter it is intended to estimate. If this is not true, the estimator is called biased and will tend to systematically overestimate or underestimate the parameter.

Explain the importance of the standard deviation in interpreting and drawing conclusions about risk

When comparing financial investments such as stocks, investors compare average returns, but also risks. If 2 stocks have similar average returns, and the standard deviation of one is much higher than that of the other, then we may conclude that the stock with the higher standard deviation is riskier or more volatile.

How can one estimate the mean and variance of data that are summarized in a grouped frequency distribution? Why are these only estimates?

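When data are summarized in a grouped frequency distribution, each group is represented by its midpoint x_i and its frequency f_i. The mean of the data is estimated as x‐bar = Σ f_i x_i / n, where n = Σ f_i is the total number of observations. The variance of the data is estimated as s² = Σ f_i (x_i − x‐bar)² / (n − 1) for a sample (for a population, divide by N instead). They are only estimates since they are calculated from the group midpoints and frequencies rather than from the individual observations.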
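
A minimal sketch of the grouped-data estimates, assuming each group is represented by its midpoint and the sample (n − 1) form of the variance is wanted. The midpoints, frequencies, and function name are hypothetical.

```python
def grouped_mean_variance(midpoints, frequencies):
    """Estimate the mean and sample variance from group midpoints and counts."""
    n = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    variance = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / (n - 1)
    return mean, variance

# Hypothetical groups 0-10, 10-20, 20-30 with their observation counts.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
est_mean, est_variance = grouped_mean_variance(midpoints, frequencies)
print(est_mean, round(est_variance, 3))
```

Every observation in a group is treated as if it sat exactly at the midpoint, which is why the results generally differ from the values computed on the raw data.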

Subjective probability

- the probabilities are determined based on subjective judgment. For example: "There is a 50-50 chance that the Fed will cut interest rates tomorrow."

