statistics midterm
Which data scales of measurement are associated with QUALITATIVE Data?
Nominal and ordinal scales. Labels or names identify a distinguishing characteristic of each observation. EX: gender, race, business
What is a Poisson random distribution?
It counts the number of occurrences of a certain event over a given interval of time or space. The occurrences are called "successes". Like the Bernoulli process, the Poisson process fits many experiments; an experiment satisfies a Poisson process if: • the number of successes within a time or space interval equals any integer between zero and infinity; • the numbers of successes counted in nonoverlapping intervals are independent; • the probability that a success occurs in any interval is the same for all intervals of equal size and is proportional to the size of the interval.
nominal scale:
Least sophisticated level of measurement. Values differ only by name or label. Data are categorized/grouped by name.
How do you define the range?
The range is the simplest measure of dispersion. It is the difference between the Max and Min values in a data set. RANGE = MAX-Min. The range is NOT considered a good measure of dispersion because it focuses solely on the extreme values and ignores all other observations in the data set.
What is an example of cross-sectional data?
Values of a characteristic of many subjects at the same point (or approximately the same point) in time, or without regard to differences in time. EX: tween responses to 4 questions at end of ski season. EX: sale prices of sf homes last month EX: starting salaries of recent grads
What is an "estimate"?
When a statistic is used to estimate a parameter, it is referred to as an estimator. A particular value of the estimator is called an estimate. Commonly an estimator is referred to as a point estimator because it provides a single value as an estimate of the population parameter.
Population:
all items of interest in statistical problem
What is empirical probability
The empirical probability of an event is the observed relative frequency with which the event occurs. The experiment must be repeated a large number of times for empirical probabilities to be accurate.
What is a mutually exclusive event?
Events are mutually exclusive if they do not share any common outcome of an experiment. For two mutually exclusive events, the occurrence of one event precludes the occurrence of the other. EX: we define the two events "at least earning a silver medal" (outcomes of silver and gold) AND "at most earning a silver medal" (outcomes of silver, bronze, no medal). These events are exhaustive because no outcome is omitted, but they are NOT mutually exclusive because they share the outcome silver.
What is acceptance sampling?
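Acceptance sampling inspects a portion of the completed products at the end of the production process and accepts or rejects the lot based on that sample. It is one approach within statistical quality control, which involves statistical techniques used to develop and maintain a firm's ability to produce high-quality goods and services.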
What is the detection approach?
The detection approach is a preferred approach to quality control: the firm inspects the production process and determines at which point it fails to meet specifications.
What is required in a simple random sample?
A simple random sample is a sample of n observations that has the same probability of being selected from the population as any other sample of n observations. Most statistical methods presume simple random samples.
What is a Classical probability?
is based on logical analysis rather than observation or personal judgment. Often used in games of chance; based on assumption that all outcomes are equally likely. ** according to the famous law of large numbers, the empirical probability approaches the classical probability if the experiment is run a very large number of times.**
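The law of large numbers mentioned above can be illustrated with a small simulation. This is only a sketch: the fair six-sided die, the event "roll a six", and the function name `empirical_probability` are assumptions chosen for illustration.

```python
import random

def empirical_probability(trials, seed=0):
    """Relative frequency of rolling a six in `trials` rolls of a fair die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials

# Classical probability: one favorable outcome out of six equally likely ones.
classical = 1 / 6

for n in (100, 10_000, 100_000):
    print(n, round(empirical_probability(n), 4), round(classical, 4))
```

As the number of rolls grows, the empirical value should settle near the classical value of 1/6.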
What is a continuous random variable?
is characterized by uncountable values of an interval. A random variable is a function that assigns numerical values to the outcomes of an experiment.
According to the central limit theorem, as n gets larger for any distribution, what happens to the sampling distribution of the sample mean?
...
What is the probability that a normal random variable is less than its mean?
...
What are the most widely used measure(s) of dispersion? A good measure of dispersion should consider differences of all observations from the mean
The two most widely used measures of dispersion are the VARIANCE and the STANDARD DEVIATION. The variance is defined as the average of the squared differences between the observations and the mean; the standard deviation is the positive square root of the variance.
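A short worked computation may help here. The data values are hypothetical, and the sketch shows both the population form (divide by N) and the sample form (divide by n - 1) of the variance.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = statistics.mean(data)
squared_diffs = [(x - mean) ** 2 for x in data]

# Population variance: average of the squared differences from the mean.
pop_variance = sum(squared_diffs) / len(data)
# Sample variance: same numerator, divided by n - 1 instead of n.
sample_variance = sum(squared_diffs) / (len(data) - 1)

# Standard deviation is the positive square root of the variance.
pop_sd = pop_variance ** 0.5
sample_sd = sample_variance ** 0.5

print(mean, pop_variance, pop_sd, round(sample_variance, 3), round(sample_sd, 3))
```

The standard deviation is in the same units as the data, which is why it is usually reported alongside the mean.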
For a particular store, a marketing firm finds that 10% of $10-off coupons delivered by mail are redeemed. Suppose eight customers are randomly selected and are mailed $10-off coupons. What is the expected number of coupons that will be redeemed?
8 x .10 = .80
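The coupon calculation treats the number of redeemed coupons as a binomial random variable, whose mean is n times p. A minimal sketch follows; the variance and standard deviation lines use the standard binomial formulas and are an addition, not part of the original answer.

```python
n = 8      # customers who are mailed a coupon
p = 0.10   # probability that any one coupon is redeemed

expected_redeemed = n * p        # binomial mean: n * p
variance = n * p * (1 - p)       # binomial variance: n * p * (1 - p)
std_dev = variance ** 0.5

print(expected_redeemed, round(variance, 2), round(std_dev, 3))
```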
What may be revealed in a scatterplot?
>a linear relationship exists between the two variables; >a curvilinear relationship exists between the two variables; or >no relationship exists between the two variables
An experiment satisfies a POISSON PROCESS if:
>the number of successes within a time or space interval equals any integer between zero and infinity; >the numbers of successes counted in nonoverlapping intervals are independent; >the probability that a success occurs in any interval is the same for all intervals of equal size and is proportional to the size of the interval.
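For a Poisson random variable with mean mu, the standard formula is P(X = x) = e^(-mu) * mu^x / x!. The sketch below evaluates it; the mean of 4 successes per interval and the name `poisson_pmf` are assumptions for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu**x / factorial(x)

mu = 4  # hypothetical: an average of 4 successes per interval

for x in range(6):
    print(x, round(poisson_pmf(x, mu), 4))

# The probabilities over all possible counts 0, 1, 2, ... sum to 1.
total = sum(poisson_pmf(x, mu) for x in range(100))
```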
What graphical tool is best used to display relative frequency of grouped quantitative data?
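A histogram (or a polygon) displays the frequency or relative frequency of each class of grouped quantitative data. By contrast, a scatterplot helps determine if two quantitative variables are related in some systematic way; each point in the diagram represents a pair of observed values of the two variables.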
What is the difference between a discrete variable and continuous variable?
A discrete variable assumes a countable number of values, whereas a continuous variable can take on any value within an interval
ordinal scale:
Able to categorize and rank. Differences between ranked values are meaningless. EX: rate a product/professor where 1=worst...4=best
What are the two approaches used for statistical quality control?
Acceptance sampling and detection
What is a simple event?
An event is called simple if it contains a single outcome. An event is any subset of outcomes
What are two divided branches of statistics?
Descriptive statistics and inferential statistics
Inferential Statistics:
Drawing conclusions about large set of data called POPULATION, based on a smaller set of SAMPLE DATA
Does a discrete random variable have a probability mass function?
Yes. Every random variable is associated with a probability distribution that describes it completely. It is common to define discrete random variables in terms of their probability mass function and continuous random variables in terms of their probability density function. The probability mass function of a discrete random variable X is a list of all possible pairs (x, P(X=x)).
What is an example of time series data?
Values of a characteristic of a subject OVER time. EX: monthly sales of cars in 2010 EX: daily price of IBM stock in first quarter EX: weekly exchange rate between US$ and EURO
What is chance variation
caused by a number of randomly occurring events that are part of the production process.
Examples of Continuous Variable:
characterized by uncountable values within a certain interval; EX: weight, time. An unlimited number of values occur between the weights of 100 and 101: 100.1 100.2 100.3 100.43825......
qualitative variable
labels/names used to categorize distinguishing characteristics EX: gender, race, profession, manufacturer
What is Statistics?
language of data: the methodology of extracting info from a data set, which involves 3 steps: >find the right data >use appropriate tools >communicate numerical info in written language
How many parameters are needed to fully describe any normal distribution?
Only 2 are needed, namely μ = mean and σ² = variance (σ = standard deviation)
When does nonresponse bias occur?
It occurs when there is a systematic difference in preferences between respondents and non-respondents to a survey or a poll.
What is the empirical rule for normal distribution?
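see figure 6.17, page 187. Given a normal random variable X with mean μ and standard deviation σ, the exact percentages can be found: about 68% of values fall within μ +/- σ, about 95% within μ +/- 2σ, and about 99.7% within μ +/- 3σ.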
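The exact percentages for a normal random variable can be computed from the error function in Python's standard library. This is a sketch; the helper name `prob_within` is an assumption, and the result holds for any mean and standard deviation because only the number of standard deviations k matters.

```python
from math import erf, sqrt

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normal random variable X."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * prob_within(k), 2), "%")
```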
Sample:
subset of population
When constructing a frequency distribution for QUANTITATIVE data, what is important to remember?
that the data are more manageable using a frequency distribution, but some detail is lost because we no longer see the actual values.
When does selection bias occur?
It occurs when certain groups are systematically under-represented in, or excluded from, consideration for the sample.
What is the main drawback of the interval scale?
value of zero is arbitrarily chosen; this implies that ratios constructed from interval-scaled values bear no significance
What best describes a frequency distribution for qualitative data
For qualitative data, a frequency distribution groups data into categories and records the number of observations that fall into each category. A relative frequency distribution shows the proportion of observations per category.
CHAPTER 2: What is a useful tool to summarize qualitative data?
Frequency distributions (we visualize them by using pie or bar charts)
What is the empirical rule?
Given a sample mean xbar, a sample standard deviation s, and a relatively symmetric and bell-shaped distribution: >Approx. 68% of all observations fall in the interval xbar +/- s, >Approx. 95% of all observations fall in the interval xbar +/- 2s, and >Almost all observations fall in the interval xbar +/- 3s.
Which data scales of measurement are associated with QUANTITATIVE Data?
Interval and ratio scales. Values are meaningful numbers. EX: #of children, #of points
How is the mode defined?
The mode of a data set is the value that occurs most frequently. There can be more than one mode or no mode in a set. A data set with one mode is called unimodal, one with two modes is called bimodal, and one with two or more modes is called multimodal.
For both qualitative and quantitative data, what is the difference between the relative frequency and percent frequency?
RELATIVE FREQUENCY for each category = proportion of observations in each category. Sum should =1 PERCENT FREQUENCY is the % of observations in category; = relative frequency of category X 100
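A small sketch of how frequency, relative frequency, and percent frequency relate. The category labels and responses are hypothetical.

```python
from collections import Counter

responses = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]  # hypothetical

frequency = Counter(responses)    # number of observations per category
total = sum(frequency.values())   # total number of observations

# Relative frequency: proportion of observations in each category.
relative = {cat: count / total for cat, count in frequency.items()}
# Percent frequency: relative frequency times 100.
percent = {cat: 100 * rel for cat, rel in relative.items()}

for cat in sorted(frequency):
    print(cat, frequency[cat], relative[cat], percent[cat])
```

The relative frequencies sum to 1 and the percent frequencies sum to 100, up to rounding.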
What is the ratio scale?
Represents the strongest level of measurement. Ratio data have all the characteristics of interval data as well as a true zero point. Meaningful ratios can be calculated with values on the ratio scale. EX: profits, inventory levels, weight, time, distance, are measured on ratio scale since ZERO IS MEANINGFUL
interval scale
Stronger measure scale than nominal/ordinal. Can be categorized and ranked. Differences between scale values ARE meaningful. EX: Fahrenheit scale
Descriptive statistics:
Summary of important aspects of data set. This is the most viable application of statistics. EX: employment rate, president's approval rating, averages
T or F BLS takes a sample survey of a portion of the population to measure unemployment
TRUE
Is a z-score a unitless measure?
Yes, because its numerator and denominator have the same units, which cancel out. It measures the distance of a given sample value from the mean in standard deviations. EX: a z-score of 2 implies the given sample value is 2 standard deviations above the mean. A z-score, calculated as z = (x - xbar)/s, measures the relative location of the sample value x; it is also used to detect outliers.
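The z-score formula can be sketched in a few lines. The test scores below are hypothetical, and `z_score` is an illustrative helper name.

```python
import statistics

def z_score(x, xbar, s):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - xbar) / s

scores = [60, 70, 80, 90, 100]    # hypothetical sample
xbar = statistics.mean(scores)    # sample mean
s = statistics.stdev(scores)      # sample standard deviation (n - 1 divisor)

for x in scores:
    print(x, round(z_score(x, xbar, s), 3))
```

Because the units cancel, the same z-scores would result if every score were rescaled, for example from points to percentages.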
What is a priori probability?
a method to determine the likelihood an asset's price will behave a certain way based on odds, not history.
CHAPTER 4: What is probability?
a numerical value that measures the likelihood that an uncertain event occurs. This value is between zero and one, where a value of zero indicates impossible events and a value of one indicates definite events
What defines a statistic, such as the sample mean or sample proportion?
a random variable whose value depends on the chosen random sample.
What is required in a stratified random sample?
First the population is divided into mutually exclusive, collectively exhaustive groups (strata). Then a random sample is drawn from each stratum, with the number of observations proportional to the stratum's size in the population. Two advantages: 1-guarantees that the population subdivisions of interest are represented in the sample 2-estimates of parameters produced from stratified random sampling have greater precision than estimates obtained from simple random sampling.
variable:
general characteristic being observed on set of people, objects, events, when each observation varies in kind or degree
Frequency Distribution for qualitative data:
groups data into categories and records the number of observations that fall into each category
QUANTITATIVE Data and Frequency Distribution: (p 29)
groups data into intervals called CLASSES and records #observations that fall into each class. A CUMULATIVE frequency distribution records #observations that fall below the upper limit of each class
When are events collectively exhaustive?
if all possible outcomes of an experiment belong to the events. Ex: earning a medal and failing to earn a medal. They are exhaustive because they include all outcomes in the sample space.
Examples of Discrete:
# of children in a family; #of points scored in game. (you won't observe 1.3 children, or 92.5 points in a ball game). NOTE: a discrete value need not be a whole number, e.g., a stock price of $23.37 is countable in cents. A discrete variable can also have an infinite but countable number of values, such as #of cars crossing a bridge on Saturday: 0,1,2,3,etc
Examples of Poisson Random variables with respect to time:
#cars crossing Brooklyn bridge between 9am-10am Monday #customers that use McDs drive-thru in a day
Examples of Poisson random variables with respect to space:
#defects in 50 yd roll fabric #schools of fish in 100 sq miles
What properties does the probability density function of a continuous random variable have?
Definition: it is an equation used to compute probabilities of continuous random variables, and it must satisfy the following 2 properties: >The total area under the graph of the equation over all possible values of the random variable must equal 1. >The height of the graph of the equation must be greater than or equal to 0 for all possible values of the random variable. That is, the graph of the equation must lie on or above the horizontal axis for all possible values of the random variable. ***the probability density function f(x) of a continuous random variable X is non-negative and the entire area under this function = 1.
What is a binomial random variable?
A binomial random variable is defined as the number of successes achieved in the n trials of a Bernoulli process. A Bernoulli process consists of a series of n independent and identical trials of an experiment such that on each trial: *there are only two possible outcomes, conventionally labeled success and failure, and *each time the trial is repeated, the probabilities of success and failure remain the same. The possible values of a binomial random variable are 0, 1, . . . , n. EX: *a bank grants or denies a loan to a mortgage applicant *a consumer either uses or does not use a credit card *an employee travels or does not travel by public transportation *a life insurance policyholder dies or does not die. The binomial distribution shows the probabilities associated with the possible values of X.
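The probabilities of a binomial random variable follow the standard formula P(X = x) = C(n, x) * p^x * (1 - p)^(n - x). A minimal sketch, assuming a hypothetical process with n = 5 trials and success probability p = 0.3; `binomial_pmf` is an illustrative name.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): probability of exactly x successes in n Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, 0.3  # hypothetical number of trials and success probability

# The binomial distribution: every possible value 0..n with its probability.
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

for x, prob in distribution.items():
    print(x, round(prob, 4))
```

Since X must take one of the values 0 through n, the listed probabilities sum to 1.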
What does a sample space contain?
A sample space, S, of an experiment contains all possible outcomes of the experiment. Ex: the sample space representing the letter grade in a course is given by S = {A, B, C, D, F}. If the teacher also gives out an I for incomplete, then S is not valid because not all outcomes are included in S.
CHAPTER 7: What is "bias" in sampling?
It occurs when info from a sample is not typical of that in the population in a systematic way. Often happens from samples not representative of the population.
CHAPTER 6: What is a continuous random variable?
Characterized by uncountable values because it can take on any value within an interval. We CANNOT describe the possible values of a continuous random variable X with a list x1, x2, ... The probability that a continuous random variable X assumes a particular value x is zero. The counterpart to the probability mass function is called the probability density function f(x). The probability density function f(x) of a continuous random variable X has the following properties: f(x) ≥ 0 for all possible values x of X, and the area under f(x) over all values of x equals 1. To compute probabilities for continuous random variables we can use the cumulative distribution function.
Relative Frequency Distribution:
Raw frequencies can be hard to compare when totals differ (e.g., February and March have different #days), so we convert each category's frequency to a RELATIVE frequency: category's frequency / total #observations. >sum of relative frequencies should =1, or a very close value ****see page 18-19
How are horizontal bar charts constructed?
Horizontal bar and pie charts are used to show frequency distributions for qualitative data. A bar chart depicts frequency/relative frequency for each category of qualitative variable as series of horizontal/vertical bars, and their lengths are proportional to values depicted
How does the variance of the sample mean compare to the variance of the population?
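If we were to sample repeatedly from a given population, the variance of the sample mean would equal the variance of an individual observation drawn from the underlying population divided by the sample size: Var(Xbar) = σ²/n. The sample mean therefore varies less than the individual observations.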
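The standard result Var(Xbar) = σ²/n can be checked by simulation. This is a sketch under stated assumptions: a hypothetical normal population with mean 50 and standard deviation 10 (so σ² = 100), and an illustrative function name.

```python
import random

def variance_of_sample_means(n, draws=5_000, seed=1):
    """Draw `draws` samples of size n and return the variance of their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(draws):
        sample = [rng.gauss(50, 10) for _ in range(n)]  # population sd = 10
        means.append(sum(sample) / n)
    grand_mean = sum(means) / draws
    return sum((m - grand_mean) ** 2 for m in means) / draws

sigma_squared = 10 ** 2  # population variance

for n in (4, 25, 100):
    # Theoretical value sigma^2 / n next to the simulated value.
    print(n, sigma_squared / n, round(variance_of_sample_means(n), 2))
```

The simulated variances should fall close to 25, 4, and 1, shrinking as the sample size grows.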
What is the hypergeometric distribution?
It is appropriate in applications where we cannot assume the trials are independent. The probability of success may not be the same from trial to trial; it depends on the population size and on whether the sampling was done with or without replacement. Example: a box contains 20 items, of which 10%, or 2, are defective, and items are drawn without replacement. The probability of success (a defective item) on the first draw is 0.10 (=2/20). If the first item drawn was defective, the probability that the 2nd is defective is 0.0526 (=1/19); if the first item was not defective, the probability for the 2nd is 0.1053 (=2/19). THIS tells us the binomial distribution is not appropriate because the trials are NOT independent and the probability changes from trial to trial.
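The box example (20 items, 2 defective, drawn without replacement) can be reproduced with exact fractions. The general formula in `hypergeom_pmf` is the standard hypergeometric probability; the function and variable names are assumptions for illustration.

```python
from fractions import Fraction
from math import comb

N, defective = 20, 2  # box of 20 items, 2 of them defective

p_first = Fraction(defective, N)                    # 2/20
p_second_if_first_def = Fraction(defective - 1, N - 1)  # 1/19
p_second_if_first_ok = Fraction(defective, N - 1)       # 2/19

def hypergeom_pmf(x, n, S, N):
    """P(x successes in n draws without replacement; S successes among N items)."""
    return Fraction(comb(S, x) * comb(N - S, n - x), comb(N, n))

# Probability that both of two drawn items are defective.
p_both = hypergeom_pmf(2, 2, defective, N)

print(float(p_first), round(float(p_second_if_first_def), 4),
      round(float(p_second_if_first_ok), 4), p_both)
```

The formula agrees with multiplying the step-by-step conditional probabilities: (2/20) times (1/19).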
CHAPTER 1: Why are population parameters difficult to calculate?
It is expensive to get info on the entire population AND it is often impossible to examine every member of the population
T or F We analyze sample data and calculate a sample statistic to make inferences about the unknown population parameter
TRUE
What is the most commonly used measure of central location?
The mean (also called the arithmetic mean or average) is the primary measure. What is one weakness of this measure? It is unduly influenced by outliers.
CHAPTER 3: How is the median defined?
The median is the middle value of a data set. We arrange the data in ascending (smallest to largest) order and calculate the median as *the middle value if the number of observations is odd, or *the average of the two middle values if the number of observations is even. The median is especially useful when outliers are present. Outliers are extremely small or extremely large values such as a very low or very high grade.
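The odd/even rule for the median, and its resistance to outliers, can be sketched as follows. The grades are hypothetical.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)  # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

grades = [72, 85, 78, 90, 81]     # hypothetical grades, odd count
with_outlier = grades + [12]      # add one very low grade, even count

for data in (grades, with_outlier):
    mean = sum(data) / len(data)
    print(median(data), round(mean, 2))
```

Adding the single low grade moves the median only from 81 to 79.5, while the mean drops from 81.2 to about 69.67.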
What are the characteristics of the normal distribution?
The most extensively used continuous probability distribution and the cornerstone of statistical inference. >Familiar bell shape; symmetric around its mean, meaning the mean, median, and mode are all equal for a normally distributed random variable >Normal distribution is completely described by two parameters - population mean (describes central location) and population variance (describes dispersion of distribution) >Normal distribution is asymptotic in the sense that the tails get closer and closer to the horizontal axis but never touch it. Can assume any value between -infinity and +infinity EX: heights/weights of infants; SAT scores; Cumulative debt of grads; Advertising expenditures of firms; Rate of return on investment ***The standard normal distribution, also called the z distribution, is a special case of the normal distribution, with mean=0 and standard deviation=1
What does the Sharpe Ratio measure?
Measures extra reward per unit of risk. The higher the Sharpe ratio, the better the investment compensates its investors for risk. **Investors are advised to pick investments that have high Sharpe ratios. Sharpe ratios are computed in terms of the sample mean and sample standard deviation of returns, where return is usually expressed as %.
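A common form of the Sharpe ratio divides the mean return in excess of a risk-free rate by the standard deviation of the returns. The sketch below assumes that form; the annual returns, the 2% risk-free rate, and the function name are hypothetical.

```python
import statistics

def sharpe_ratio(returns, risk_free_rate):
    """Extra reward (mean return above the risk-free rate) per unit of risk (sample sd)."""
    excess = statistics.mean(returns) - risk_free_rate
    return excess / statistics.stdev(returns)

fund_returns = [10, 12, 8, 14, 6]  # hypothetical annual returns, in %
risk_free = 2                      # hypothetical risk-free return, in %

ratio = sharpe_ratio(fund_returns, risk_free)
print(round(ratio, 3))
```

When comparing two investments measured over the same period, the one with the higher ratio offers more reward per unit of risk.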
Summarizing Qualitative Data:
Nominal and ordinal data are types of qualitative data. In order to better organize qualitative data, it is useful to construct a frequency distribution.
All data measurements can be classified into 4 major categories:
nominal scale ordinal scale interval scale ratio scale
Is there any bias if information from the sample is typical of information in the population?
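No. Bias occurs only when information from the sample is systematically NOT typical of information in the population. EX: non-response bias, a systematic difference in preferences between respondents and non-respondents to a survey or poll.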
What is a discrete random variable?
see page 168. A discrete random variable assumes a countable number of distinct values, whereas a continuous random variable is characterized by uncountable values in an interval