MSIS Exam 2
Classical definition of probability
If the process that generated the outcome is known, probabilities can be deduced from theoretical arguments
Properties of Probability Density Functions (5)
1) A graph of the density function must lie at or above the x-axis. 2) The total area under the density function above the x-axis is 1. 3) For continuous random variables, there are an infinite number of values. 4) Calculates the probability of a random variable lying within a certain interval, such as between two numbers, or to the left or right of a number. 5) P(a <= X <= b) is the area under the density function between a and b.
Mathematical functions used in predictive analytic models (5)
1) Linear functions 2) Logarithmic functions 3) Polynomial functions 4) Exponential functions 5) Power functions
2 categories of regression analysis
1) Regression models of cross-sectional data 2) Regression models of time-series data
Properties of the normal distribution (4)
1) Symmetric - the distribution is symmetric, so its measure of skewness is zero. 2) Mean = Median = Mode; half the area falls above the mean and half falls below it. 3) The range of X is unbounded - the tails of the distribution extend to negative and positive infinity. 4) Empirical rules apply - exactly for the normal distribution (68-95-99.7 rule).
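The empirical rule percentages can be checked numerically. The sketch below is only an illustration using Python's standard-library `statistics.NormalDist` (the cards themselves use Excel); the helper name `area_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    # Prints roughly 68.27, 95.45 and 99.73 percent.
    print(k, round(100 * area_within(k), 2))
```

Because the normal distribution is symmetric, the same areas hold for any mean and standard deviation.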
What is the probability that a respondent is female and prefers Science? 1; Female; Science 2; Male; Science 3; Male; Math 4; Female; Arts 5; Female; Math 6; Male; Science 7; Female; Science 8; Male; Math 9; Female; Arts 10; Male; Arts 11; Male; Science 12; Female; Science 13; Female; Math
3/13
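The 3/13 answer is a simple count: respondents who are both female and prefer Science, divided by all respondents. A minimal Python sketch of that count, with the 13 rows typed in from the card (variable names are illustrative):

```python
from fractions import Fraction

# The 13 survey responses from the card, as (gender, subject) pairs.
responses = [
    ("Female", "Science"), ("Male", "Science"), ("Male", "Math"),
    ("Female", "Arts"), ("Female", "Math"), ("Male", "Science"),
    ("Female", "Science"), ("Male", "Math"), ("Female", "Arts"),
    ("Male", "Arts"), ("Male", "Science"), ("Female", "Science"),
    ("Female", "Math"),
]

# Count respondents matching both conditions, then divide by the total.
matches = sum(
    1 for gender, subject in responses
    if gender == "Female" and subject == "Science"
)
p_female_science = Fraction(matches, len(responses))
print(p_female_science)  # 3/13
```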
Normal distribution in Excel
=NORM.DIST(x,mean,standard_deviation,TRUE)
Excel function when cumulative probability is known, but the value of X isn't
=NORM.INV(probability,mean,standard_deviation) provides the x value for a given cumulative probability
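For readers working outside Excel, the same two calculations can be sketched in Python. This is an assumed analogue, not part of the course material: `NormalDist.cdf` plays the role of NORM.DIST with the cumulative flag TRUE, and `NormalDist.inv_cdf` plays the role of NORM.INV. The mean of 100 and standard deviation of 15 are hypothetical values.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Like =NORM.DIST(115, 100, 15, TRUE): cumulative probability P(X <= 115).
p = dist.cdf(115)

# Like =NORM.INV(p, 100, 15): the x value with that cumulative probability.
x = dist.inv_cdf(p)

print(round(p, 4), round(x, 4))
```

The two functions are inverses of each other, so feeding the probability back in recovers the original x.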
Probability distribution
A characterization of the possible values that a random variable may assume along with the probability of assuming these values; may be developed using any of the three perspectives of probability: classical, relative frequency, and subjective
Event
A collection of one or more outcomes from a sample space
Random variable
A numerical description of the outcome of an experiment; may be continuous or discrete
Simple regression
A regression model that involves a single independent variable
Multiple regression
A regression model that involves two or more independent variables
Outcome
A result that we observe in an experiment
Polynomial function (used in predictive analytic models)
A second-order polynomial is parabolic in nature and has only one hill or valley; A third-order polynomial has one or two hills or valleys; Revenue models that incorporate price elasticity are often polynomial functions
What is a stream of historical data known as?
A time series
Regression analysis
A tool for building mathematical and statistical models that characterize relationships between a dependent variable (ratio) and one or more independent, or explanatory, variables (ratio or categorical), all of which are numerical. Two categories - regression models of cross-sectional data and regression models of time-series data
Relative frequency definition of probability
Based on empirical data (the probability that an outcome will occur is simply the relative frequency associated with that outcome)
Subjective definition of probability
Based on judgement and experience; Often done in creating decision models for phenomena for which we have no historical data (Ex - sports experts might predict at the start of the football season - what is the probability of a specific team winning the national championship?)
How is the normal distribution characterized?
By two parameters: the mean and the standard deviation
Empirical Probability Distribution
Calculating the relative frequencies from a sample of empirical data to develop a distribution, based on sample data; an approximation of the probability distribution of the associated random variable, whereas the probability distribution of a random variable, such as one derived from counting arguments, is a theoretical model of the random variable
Which of the following is true when using the Excel Regression tool?
Checking the option Constant is Zero forces the intercept to zero
What are the three perspectives for defining probability?
Classical definition; Relative frequency definition; Subjective definition
In multiple regression, R Square is referred to as the....
Coefficient of multiple determination
Random variables may be ________ or ________
Continuous; discrete
Expected value of a random variable
Corresponds to the notion of the mean (average) for a sample; can be helpful in making a variety of decisions
Power functions (used in predictive analytic models)
Define phenomena that increase at a specific rate; Formula: y = ax^b; Learning curves that express improving times in performing a task are often modeled with power functions having a > 0 and b < 0
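As a rough illustration of a learning curve of the form y = ax^b, the sketch below assumes a hypothetical task where the first unit takes 100 minutes and the time falls to 80% each time cumulative output doubles. Those numbers, and the function name `time_for_unit`, are assumptions for the example only.

```python
import math

# Hypothetical learning curve: first unit takes 100 minutes (a > 0),
# and each doubling of output cuts the time to 80% of its prior value.
a = 100.0
b = math.log2(0.8)  # about -0.322, so b < 0

def time_for_unit(x):
    """Power function y = a * x**b giving the time for the x-th unit."""
    return a * x ** b

for unit in (1, 2, 4, 8):
    print(unit, round(time_for_unit(unit), 1))
```

With b negative, the time per unit keeps falling but by smaller amounts, which is the shape a learning curve is meant to capture.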
Triangular distribution
Defined by 3 parameters: the minimum, a; the maximum, b; and the most likely, c; Often used when no data are available to characterize an uncertain variable and the distribution must be estimated judgmentally; Along the x-axis the order is a, then c, then b; Positively skewed when the peak c sits toward the left, negatively skewed when it sits toward the right
In Regression Analysis, ___________ variable should be ratio and ___________ variable can be ratio or categorical
Dependent; independent
The _______ rules apply exactly for the normal distribution.
Empirical (68-95-99.7 rule)
For a random variable X, the ___________ ___________ is the weighted average of all possible outcomes, where the weights are the probabilities
Expected value
The _______ of a random variable corresponds to the notion of the mean, or average, of a sample.
Expected value
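A small worked example of the expected value as a probability-weighted average, using one fair six-sided die (an example chosen here, not taken from the cards):

```python
from fractions import Fraction

# Probability mass function of one fair die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Expected value: sum of each outcome times its probability.
expected_value = sum(x * p for x, p in pmf.items())
print(expected_value)  # 7/2
```

The result, 3.5, is not itself a possible roll; it is the long-run average over many rolls.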
Cumulative distribution function
F(x), specifies the probability that the random variable X assumes a value less than or equal to the specified value x: F(x) = P(X<=x)
In the normal distribution, _______ of the area falls _______ the _______, and _______ falls _______ it.
Half; above; mean; half; below
Continuous random variable Examples?
Has outcomes over one or more continuous intervals of real numbers; Examples: weekly change in DJIA, daily temperature, time between machine failures
Exponential functions (used in predictive analytic models)
Have the property that Y rises or falls at constantly increasing rates
The ________ the variance, the ________ the uncertainty of the outcome
Higher; higher
Rule 5
If an event A is comprised of the outcomes {A1, A2, ..., An} and event B is comprised of the outcomes {B1, B2, ..., Bn}, then P(Ai) = P(Ai and B1) + P(Ai and B2) + ... + P(Ai and Bn) and P(Bi) = P(A1 and Bi) + P(A2 and Bi) + ... + P(An and Bi)
Rule 3 Example - rolling two dice, A = {7,11}, B = {2, 3, 12}, P(A or B) = ?
If events A and B are mutually exclusive (meaning they have no outcomes in common), then P(A or B) = P(A) + P(B); EX - P(A) = 8/36, P(B) = 4/36, P(A or B) = 8/36 + 4/36 = 12/36
Which of the following is true about the classical definition of probability?
If the process that generates the outcomes is known, probabilities can be deduced from theoretical arguments
Rule 4 Example - rolling two dice, A = {2, 3, 12}, B = {even number}, P(A or B) = ?
If two events A and B are not mutually exclusive, then P(A or B) = P(A) + P(B) - P(A and B); EX - P(A) = 4/36, P(B) = 18/36, P(A and B) = 2/36, P(A or B) = 4/36 + 18/36 - 2/36 = 20/36
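The Rule 4 arithmetic can be confirmed by listing all 36 outcomes of two dice and counting. The sketch below is one way to do that in Python; the helper names `prob`, `in_a`, and `in_b` are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Fraction of the 36 outcomes whose total satisfies the event."""
    favorable = sum(1 for first, second in rolls if event(first + second))
    return Fraction(favorable, len(rolls))

def in_a(total):
    return total in {2, 3, 12}

def in_b(total):
    return total % 2 == 0

p_a = prob(in_a)                                  # 4/36
p_b = prob(in_b)                                  # 18/36
p_a_and_b = prob(lambda t: in_a(t) and in_b(t))   # 2/36 (totals 2 and 12)
p_a_or_b = prob(lambda t: in_a(t) or in_b(t))     # 20/36

# Direct counting agrees with P(A) + P(B) - P(A and B).
print(p_a_or_b, p_a + p_b - p_a_and_b)
```

Subtracting P(A and B) avoids double counting the totals 2 and 12, which belong to both events.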
Permutations definition Formula
If we want to select n objects from N and the order is important, the outcomes are permutations; The number of permutations of n objects selected from N is: P(n,N) = N! / (N-n)!
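A quick check of the permutation formula in Python, assuming a hypothetical case of selecting n = 2 objects from N = 5. Note that `math.perm` (Python 3.8+) takes the pool size first, so the card's P(n,N) corresponds to `math.perm(N, n)`.

```python
import math

# Hypothetical example: ordered selections of n = 2 objects from N = 5.
N, n = 5, 2

# Formula from the card: N! / (N - n)!
by_formula = math.factorial(N) // math.factorial(N - n)

# Built-in equivalent; argument order is (pool size, number selected).
by_builtin = math.perm(N, n)

print(by_formula, by_builtin)  # 20 20
```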
Which of the following is true about variance?
It measures the uncertainty of a random variable
Time-series models may exhibit seasonal effects or cyclical effects. A seasonal effect differs from a cyclical effect in that a seasonal effect....
Is one that repeats at fixed intervals over time, typically a year, month, week, or day
Which of the following is true about probability density functions?
It calculates the probability of a random variable lying within a certain interval
In forecasting, what is an index?
It is a single measure that weights multiple indicators and provides a measure of overall expectation
Which of the following is true of linear functions used in predictive analytical models?
It is used when there is a steady decrease or increase over a range of a variable
Using a cross-tabulation like this: Row Labels; Brand 1; Brand 2; Brand 3; Grand Total - Female; 0.09; 0.06; 0.22; 0.37 - Male; 0.25; 0.17; 0.21; 0.63 - Grand Total; 0.34; 0.23; 0.43; 1 - How do you find joint probabilities? (e.g., the probability that a respondent is female and prefers brand 1) How do you find marginal probabilities? (e.g., the probability that a respondent is female) How do you find conditional probability? (knowing a respondent is male, what's the probability they prefer brand 1) How do you identify that two events are independent? (are gender and brand preference independent)
Joint probability = 0.09 (read directly from the table); Marginal probability = 0.09 + 0.06 + 0.22 = 0.37, calculated by adding the joint probabilities across the rows and columns; Conditional probability = 0.25/0.63 = 0.397 (of every 63 males, 25 prefer brand 1); Determining independence - P(B1) = 0.34 and P(B1 | Male) = 0.25/0.63 = 0.397; because 0.397 ≠ 0.34, gender and brand preference ARE NOT INDEPENDENT
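The cross-tabulation calculations can be sketched in Python as follows. The joint probabilities are copied from the card; the dictionary layout and the helper name `marginal` are assumptions made for the example.

```python
# Joint probabilities from the cross-tabulation, keyed by (gender, brand).
joint = {
    ("Female", "Brand 1"): 0.09, ("Female", "Brand 2"): 0.06, ("Female", "Brand 3"): 0.22,
    ("Male", "Brand 1"): 0.25, ("Male", "Brand 2"): 0.17, ("Male", "Brand 3"): 0.21,
}

def marginal(position, value):
    """Sum the joint probabilities over the other variable (row or column total)."""
    return sum(p for key, p in joint.items() if key[position] == value)

p_female = marginal(0, "Female")      # row total, about 0.37
p_male = marginal(0, "Male")          # row total, about 0.63
p_brand1 = marginal(1, "Brand 1")     # column total, about 0.34

# Conditional probability: P(Brand 1 | Male) = P(Male and Brand 1) / P(Male).
p_brand1_given_male = joint[("Male", "Brand 1")] / p_male

# Independence would require P(Brand 1 | Male) to equal P(Brand 1).
independent = abs(p_brand1_given_male - p_brand1) < 1e-9

print(round(p_female, 2), round(p_brand1_given_male, 3), independent)
```

Here the conditional probability (about 0.397) differs from the marginal (0.34), so the check reports that the two are not independent.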
Expected value
The long-run average; appropriate for decisions that occur on a repeated basis; For one-time decisions, need to consider the downside risk and the upside potential of the decision
Which of the following is true of normal distributions?
Mean, median, and mode are all equal
Normal distribution
Most important distribution used in statistics; Continuous distribution described by the bell shaped curve
A linear regression model with more than one independent variable is called a.....
Multiple linear regression
Standard Normal Distribution
Normal distribution with mean = 0 and standard deviation =1; A standard normal random variable is denoted by Z; The scale along the x-axis represents the number of standard deviations from the mean of zero; Excel function =NORM.S.DIST(z) finds probabilities for the standard normal distribution
Before launching a new line of toys, Toys inc used the method of historical analogy to obtain a forecast. In this scenario, Toys inc.....
Noted the consumer response to similar previous products to marketing campaigns and used the responses as a basis to predict how the new marketing campaign might fare.
Which of the following is a discrete random variable?
Number of students in class
A(n) _________ is an extreme value that is different from the rest of the data.
Outlier
In regression analysis, which of the following is used to determine whether an independent variable is significant?
P-value
Excel's trendline tool
Provides a convenient method for finding the best fitting functional relationship among these alternatives for a set of data; R-squared is a measure of the "fit" of the line to the data; Will have a value between 0 and 1; The larger the value of R-squared, the better the fit
In Excel's trendline tool, the value of the ______ gives the measure of fit of the line to the data.
R-squared
EX - Classical definition of probability Rolling 2 dice, probability of rolling a 3.
Roll 2 dice - 36 possible outcomes; probability = number of ways of rolling a number divided by 36; ex - probability of rolling a 3 is 2/36 = 1/18
In modeling relationships and trends, which one is used for cross-sectional data?
Scatter chart
Linear functions (used in predictive analytic models)
Show steady increase or decrease over the range of X; simplest type of function used in predictive models
A regression model that involves a single independent variable is called...
Simple regression
The normal distribution is _________. Its measure of skewness is _________.
Symmetric; zero
Interaction is...
The dependence between two variables
Probability density functions
The distribution that characterizes the outcomes of a continuous random variable
Which of the following is true about the observed errors associated with estimating the value of the dependent variable using the regression line?
The errors can be negative or positive
Regression models of time-series data
The independent variables are time or some function of time, and the focus is on predicting the future
Probability definition
The likelihood that an outcome occurs; expressed as values between 0 and 1
In the normal distribution, what is equal?
The mean, median, and mode - half of the area falls above the mean and half falls below it.
Discrete random variable Examples?
The number of outcomes can be counted; Examples: outcome of dice rolls, whether a customer likes or dislikes a product, number of hits on a Web site link today
Marginal probability
The probability of an event, irrespective of the outcome of the other joint event
Rule 1 Example - rolling a 7 or 11 on two dice, probability = ?
The probability of any event is the sum of the probabilities of the outcomes that comprise that event EX - probability = 6/36 + 2/36 = 8/36
Conditional probability
The probability of occurrence of one event A, given that another event B is known to be true or has already occurred; P(A | B) = P(A and B) / P(B), read as "probability of A given B"
Rule 2 Example - rolling two dice A = {7,11}, complement = ?
The probability of the complement of any event A is P(A^c) = 1-P(A) EX - P(A^c) = 1 - 8/36 = 28/36
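Rules 1 and 2 for the event A = {7, 11} can be verified by counting outcomes of two dice. A minimal Python sketch, with illustrative names:

```python
from fractions import Fraction
from itertools import product

# Totals of all 36 equally likely outcomes of two dice.
totals = [first + second for first, second in product(range(1, 7), repeat=2)]

def p_total(value):
    """Probability that the two dice sum to the given value."""
    return Fraction(totals.count(value), len(totals))

# Rule 1: P(A) is the sum of the probabilities of the outcomes in A.
p_a = p_total(7) + p_total(11)   # 6/36 + 2/36

# Rule 2: the complement has probability 1 - P(A).
p_not_a = 1 - p_a                # 28/36

print(p_a, p_not_a)  # 2/9 7/9
```

The printed fractions are the reduced forms of 8/36 and 28/36.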
Joint probability
The probability of the intersection of two events
Experiment
The process that results in an outcome
What does it mean if events are mutually exclusive?
They have no outcomes in common
Regression models of _________ data focus on predicting the future
Time series
Independent events
Two events are independent if P(A | B) = P(A); Example: if the probability of preferring a brand depends on gender, we may say that brand preference and gender are not independent
In a normal distribution, the range of X is _______. What does that mean?
Unbounded; the tails of the distribution extend to negative and positive infinity
EX - Relative frequency definition of probability
Use relative frequencies as probabilities; probability a computer is repaired in 10 days = 0.076, using a graph (of days, frequency, relative frequency, and cumulative percentage)
Weighted average of the squared deviations from the expected value Used to compute..... Common measure of _______.
Used to compute the variance of a discrete random variable; Common measure of dispersion
Logarithmic functions (used in predictive analytic models)
Used when the rate of change in a variable increases or decreases quickly, then levels out; such as diminishing returns to scale
The Delphi method used for forecasting....
Uses a panel of experts, whose identities are usually kept confidential from one another, to respond to a sequence of questionnaires.
How is a continuous random variable characterized?
Using probability density functions
How is the variance of a discrete random variable computed?
Using the weighted average of the squared deviations from the expected value
Which of the following is the weighted average of the squared deviations from the expected value?
Variance
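To make the variance definition concrete, the sketch below applies it to a hypothetical discrete random variable (daily demand of 0, 1, or 2 units with probabilities 0.2, 0.5, and 0.3). The distribution is invented for illustration.

```python
from fractions import Fraction

# Hypothetical probability mass function for daily demand.
pmf = {0: Fraction(2, 10), 1: Fraction(5, 10), 2: Fraction(3, 10)}

# Expected value: probability-weighted average of the outcomes.
mean = sum(x * p for x, p in pmf.items())

# Variance: probability-weighted average of squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(mean, variance)  # 11/10 49/100
```

The standard deviation is the square root of the variance, here 0.7.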
Combinations definition Formula
When order does not matter and we only want to count unique outcomes, which are called combinations; The number of combinations for selecting n objects from a set of N is: C(n,N) = N! / (n!(N-n)!)
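A quick check of the combination formula in Python, assuming a hypothetical case of choosing n = 2 objects from N = 5. `math.comb` (Python 3.8+) takes the pool size first, so the card's C(n,N) corresponds to `math.comb(N, n)`.

```python
import math

# Hypothetical example: unordered selections of n = 2 objects from N = 5.
N, n = 5, 2

# Formula from the card: N! / (n! * (N - n)!)
by_formula = math.factorial(N) // (math.factorial(n) * math.factorial(N - n))

# Built-in equivalent; argument order is (pool size, number selected).
by_builtin = math.comb(N, n)

print(by_formula, by_builtin)  # 10 10
```

Dividing by n! is what removes the different orderings of the same n objects.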
May convert probabilities for any normal random variable X, having a mean and std dev, by converting it to a standard normal random variable Z; Formula?
Z = (X - mean) / std dev
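The sketch below illustrates why standardizing works: the cumulative probability of X under its own normal distribution matches the cumulative probability of Z under the standard normal. The mean of 50, standard deviation of 10, and x value of 65 are hypothetical.

```python
from statistics import NormalDist

# Hypothetical normal random variable X with mean 50 and std dev 10.
mean, std_dev = 50.0, 10.0
x = 65.0

# Standardize: Z = (X - mean) / std dev.
z = (x - mean) / std_dev  # 1.5 standard deviations above the mean

# P(X <= 65) computed on the original scale and on the standard normal scale.
p_original = NormalDist(mean, std_dev).cdf(x)
p_standard = NormalDist(0.0, 1.0).cdf(z)

print(z, round(p_original, 4), round(p_standard, 4))
```

Both probabilities come out to about 0.9332, so a single standard normal table (or function) serves every normal distribution.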
Which of the following is true about variance? a. It measures the uncertainty of a random variable. b. Higher variance implies low uncertainty. c. It is the square root of a random variable's standard deviation. d. It is the weighted average of all possible outcomes.
a. It measures the uncertainty of a random variable.
Regression models of ________ data focus on predicting the future. a. missing b. time-series c. panel d. cross-sectional
b. time-series
Empirical data
based on factual statements and statistics
A probability density function: a. is the probability distribution of discrete outcomes. b. suggests that the probability that a random variable assumes a specific value must be positive. c. characterizes outcomes of a continuous random variable. d. can yield negative values depending on the values of the random variable, X.
c. characterizes outcomes of a continuous random variable.
Before launching a new line of toys, Toys Inc. used the method of historical analogy to obtain a forecast. In this scenario, Toys Inc.: a. noted the behavior of its current customers while they use their products. b. used a panel of experts, whose identities were kept confidential from one another, to respond to a sequence of questionnaires. c. noted the consumer response to similar previous products to marketing campaigns and used the responses as a basis to predict how the new marketing campaign might fare. d. used a brainstorming session among a group of experts to draw new ideas.
c. noted the consumer response to similar previous products to marketing campaigns and used the responses as a basis to predict how the new marketing campaign might fare.
The Delphi method used for forecasting: a. obtains forecasts through a comparative analysis with a previous situation. b. uses measures that are believed to influence the behavior of a variable that the researcher wishes to forecast. c. uses a single measure that weights multiple indicators and provides a measure of overall expectation. d. uses a panel of experts, whose identities are typically kept confidential from one another, to respond to a sequence of questionnaires.
d. uses a panel of experts, whose identities are typically kept confidential from one another, to respond to a sequence of questionnaires.
The variance of a discrete random variable X is.....
A weighted average of the squared deviations from the expected value
Sample space
the collection of all possible outcomes of an experiment
Probability mass function
the probability distribution of the discrete outcomes, for a discrete random variable: f(x); Probability of each outcome must be between 0 and 1; Sum of all probabilities must add to 1
Probability is expressed as
values between 0 and 1