Business Stat Exam 1 Definitions
Nominal scale
- least sophisticated level of measurement - data are simply categories for grouping the data
Interval Scale
-categorize and rank data -differences between values are meaningful -no absolute zero or starting point defined -meaningful ratios may not be obtained -ex: Fahrenheit temperature
Ordinal Scale
-data may be categorized and ranked with respect to some characteristic or trait -ex: excellent, good, fair, poor -differences between categories are meaningless because the actual numbers used are arbitrary
Ratio Scale
-strongest level of measurement -can be categorized and ranked -differences in values are meaningful -absolute zero -ex: sales, weight, time, distance
Ratio scale; discrete
A discrete variable takes on individually distinct values. The ratio scale has a meaningful zero point and we can interpret ratios of values. In this case, a value of zero means the linebacker made no tackles.
covariance
A positive value of covariance indicates a positive linear relationship between x and y; on average, if x is above (below) its mean, then y tends to be above (below) its mean, and vice versa. A negative value of covariance indicates a negative linear relationship between x and y; on average, if x is above (below) its mean, then y tends to be below (above) its mean, and vice versa.
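A minimal sketch of the sign interpretation above, assuming the sample covariance formula with an n - 1 denominator. The data and the function name `sample_covariance` are made up for illustration.

```python
def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # y rises when x is above its mean
y_down = [10, 8, 6, 4, 2]  # y falls when x is above its mean

cov_up = sample_covariance(x, y_up)      # positive linear relationship
cov_down = sample_covariance(x, y_down)  # negative linear relationship
print(cov_up, cov_down)
```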
When interpreting the covariance between variables x and y, which of the following statements is the most accurate?
A positive value of covariance indicates that, on average, if x is above its mean, then y tends to be above its mean.
Event
A subset of the sample space. A collection of events may be mutually exclusive, exhaustive, or both, but events in general need not be.
Risk Loving
may accept a risky prospect even if the expected gain is negative
Which of the following statements is most accurate when defining percentiles?
Approximately p% of the observations are less than the pth percentile, and approximately (100 - p)% of the observations are greater than the pth percentile.
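A rough check of this definition, assuming Python's `statistics.quantiles` with its default "exclusive" method as the percentile convention. Other conventions give slightly different cut points. The data are illustrative.

```python
import statistics

data = list(range(1, 101))  # illustrative data: the integers 1 to 100

# With n=100, quantiles() returns the 99 cut points between percentiles;
# index p - 1 holds the pth percentile.
p = 75
pth_percentile = statistics.quantiles(data, n=100)[p - 1]

# Percentage of observations strictly below and strictly above the cut point.
share_below = 100 * sum(x < pth_percentile for x in data) / len(data)
share_above = 100 * sum(x > pth_percentile for x in data) / len(data)
print(pth_percentile, share_below, share_above)
```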
Which of the following represents a population and a sample from that population?
Attendees at a sporting event, and those who purchased popcorn at said sporting event: Those individuals who purchase popcorn at said sporting event are clearly a subset of all attendees at a given sporting event.
Which of the following is not a graphical technique to display quantitative data?
Bar Chart
Two defining properties of a probability
1) The probability of any event is a value between 0 and 1. 2) The sum of the probabilities of any list of mutually exclusive and exhaustive events equals 1.
What is an advantage of the correlation coefficient over the covariance?
Both answers are correct (it falls between -1 and 1, and it is a unit-free measure): The correlation coefficient is preferred for evaluating the direction and strength of the linear relationship between two variables. It is a unit-free measure that takes values in the interval [-1, 1].
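A small sketch of the unit-free property, assuming the usual sample formulas. The heights and weights are hypothetical, and `sample_cov` and `sample_corr` are illustrative helper names.

```python
import math

def sample_cov(x, y):
    """Sample covariance with an n - 1 denominator."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

height_m = [1.60, 1.70, 1.80, 1.90]      # hypothetical heights in meters
weight_kg = [55, 68, 72, 90]             # hypothetical weights in kilograms
height_cm = [100 * h for h in height_m]  # same heights in centimeters

cov_m, cov_cm = sample_cov(height_m, weight_kg), sample_cov(height_cm, weight_kg)
corr_m, corr_cm = sample_corr(height_m, weight_kg), sample_corr(height_cm, weight_kg)
print(cov_m, cov_cm)    # covariance changes when the units change
print(corr_m, corr_cm)  # correlation stays the same
```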
Sampling is used heavily in manufacturing and service settings to ensure high-quality products. In which of the following areas would sampling be inappropriate?
Custom cabinet making: Custom cabinets are not meant to be standardized in their characteristics. Therefore, sampling would make no sense.
Sample Space
Denoted S; contains all possible outcomes of an experiment
Events are considered _________ if the occurrence of one is related to the probability of the occurrence of the other.
Dependent: We generally test for the independence of two events by comparing the conditional probability of one event, P(A|B), to its unconditional probability, P(A). If they are the same, the two events A and B are independent; if they differ, the events are dependent.
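The comparison can be sketched as follows, assuming P(A|B) = P(A ∩ B) / P(B). The survey numbers and the function name `is_independent` are hypothetical.

```python
from fractions import Fraction

def is_independent(p_a_and_b, p_a, p_b):
    """A and B are independent when P(A|B) = P(A and B) / P(B) equals P(A)."""
    p_a_given_b = p_a_and_b / p_b
    return p_a_given_b == p_a

# Hypothetical survey of 100 people: A = "buys the product", B = "is under 40".
# Case 1: P(A|B) = 12/40 = 0.3, the same as P(A) = 0.3.
independent_case = is_independent(Fraction(12, 100), Fraction(30, 100), Fraction(40, 100))
# Case 2: P(A|B) = 20/40 = 0.5, which differs from P(A) = 0.3.
dependent_case = is_independent(Fraction(20, 100), Fraction(30, 100), Fraction(40, 100))
print(independent_case, dependent_case)
```

Exact fractions are used so the equality check is not disturbed by floating-point rounding.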
Unstructured Data
Does not conform to a pre-defined column format: reports, emails, multimedia
Objective probabilities
Empirical probability and classical probability
T or F: Cross-sectional data contain values of a characteristic of one subject collected over time.
False: Cross-sectional data contain values of a characteristic of many subjects at the same point or approximately the same point in time, or without regard to differences in time.
T or F: Geometric mean is greater than the arithmetic mean.
False: The geometric mean is less than or equal to the arithmetic mean (the two are equal only when all values are identical), and it is less sensitive to outliers.
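A quick illustration using `statistics.geometric_mean` from the standard library. The growth factors are hypothetical.

```python
import math
import statistics

growth_factors = [1.10, 0.90]  # hypothetical: +10% one year, -10% the next
arithmetic = statistics.mean(growth_factors)
geometric = statistics.geometric_mean(growth_factors)

# When every value is the same, the two means coincide.
equal_values = [1.05, 1.05]
arithmetic_equal = statistics.mean(equal_values)
geometric_equal = statistics.geometric_mean(equal_values)
print(arithmetic, geometric)
```

The geometric mean here is the square root of 1.10 × 0.90 = 0.99, slightly below 1, which reflects that the investment ends lower than it started.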
T or F: Population parameters are used to estimate corresponding sample statistics.
False: Sample statistics are used to estimate the corresponding population parameter.
T or F: The total probability rule is defined as P(A) = P(A ∩ B) P(A ∩ Bc).
False: The total probability rule is defined as P(A) = P(A ∩ B) + P(A ∩ Bc).
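A tiny numeric sketch of the rule, with hypothetical joint probabilities chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical joint probabilities of A with B and with the complement of B.
p_a_and_b = Fraction(12, 100)
p_a_and_not_b = Fraction(18, 100)

# Total probability rule: add the two mutually exclusive pieces of A.
p_a = p_a_and_b + p_a_and_not_b

# Multiplying the pieces instead (the false statement) gives a different number.
p_product = p_a_and_b * p_a_and_not_b
print(p_a, p_product)
```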
Which of the following is true when using the empirical rule for a set of sample data?
For a set of sample data, the empirical rule states that approximately 68% of all observations are in the interval x̄ ± s, approximately 95% of all observations are in the interval x̄ ± 2s, and almost all observations are in the interval x̄ ± 3s.
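A simulation sketch of these three intervals, assuming bell-shaped data drawn from a normal distribution. The mean of 50, standard deviation of 10, sample size, and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [rng.gauss(50, 10) for _ in range(20000)]  # simulated bell-shaped data

x_bar = statistics.mean(sample)
s = statistics.stdev(sample)

def share_within(k):
    """Proportion of observations inside the interval x_bar ± k * s."""
    return sum(abs(x - x_bar) <= k * s for x in sample) / len(sample)

shares = {k: share_within(k) for k in (1, 2, 3)}
print(shares)
```

The printed proportions should land close to 68%, 95%, and nearly 100%, though the exact figures depend on the simulated sample.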
What graphical tool is best used to display the relative frequency of grouped quantitative data?
Histogram: Histograms are used to display the relative frequency of quantitative data. An ogive is used to display the cumulative frequency, while the bar chart and pie chart display qualitative data.
The Fahrenheit scale for measuring temperature would be classified as a(n)
Interval Scale: Zero in Fahrenheit degrees does not mean "no temperature," so ratios are not meaningful. We cannot say, for example, that today is twice as warm as six months ago; a statement like that requires a ratio scale.
Which scales of data measurement are associated with quantitative data?
Interval and ratio: Two scales are associated with quantitative data: interval scale and ratio scale.
What graphical tool would you use to display the cumulative relative frequency of the grouped data?
Ogive
An undergraduate student's status (freshman, sophomore, junior, or senior) is an example of which scale of measurement?
Ordinal scale: Undergraduate students are classified into the four categories based on the number of credit hours earned. There is a natural ordering between the four categories; sophomores have more credit hours than freshmen, and so on.
For which of the following data sets will a pie chart be most useful?
Percentage of net sales by product for Lenovo in Year 1: Only this data set looks at multiple categories of a single qualitative variable (product), so the percentage of net sales for each category can be meaningfully displayed as a share of the whole.
The accompanying chart shows the number of books written by each author in a collection of cookbooks. What type of data is being represented?
Qualitative, nominal: The data are qualitative and nominal (no ordering is present in the categories).
Which of the following is an example of time series data?
Quarterly housing starts collected over the last 60 years: Time series data refers to data collected by recording a characteristic of a subject over several time periods.
Which of the following scales represents the strongest level of measurement?
Ratio Scale
San Francisco 49ers' linebacker Patrick Willis won the Defensive Rookie of the Year Award in 2007 with a total of 174 tackles. Tackles are measured on what kind of a scale? Is a variable measuring the number of tackles considered continuous or discrete?
Ratio scale; discrete
Which of the following is an example of cross-sectional data?
Results of market research testing consumer preferences for soda: Cross-sectional data refers to data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time.
A stem-and-leaf diagram is constructed by separating each value of a data set into two parts. What are these parts?
A stem consisting of the leftmost digits and a leaf consisting of the remaining digit (usually the last digit).
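A minimal sketch of the split, assuming two-digit values so the stem is the tens digit and the leaf is the ones digit. The ages and the function name `stem_and_leaf` are illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Split each two-digit value into a stem (tens digit) and a leaf (ones digit)."""
    diagram = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        diagram[stem].append(leaf)
    return dict(diagram)

ages = [24, 12, 38, 21, 15, 30, 24]  # hypothetical data
diagram = stem_and_leaf(ages)
for stem, leaves in diagram.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```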
Bar chart for qualitative data
The data are qualitative and the chart is a bar chart.
When using a polygon to graph quantitative data, what does each point represent?
The midpoint of a particular class and its associated frequency or relative frequency
Which of the following variables is not continuous?
The number of heads obtained when a fair coin is tossed 20 times: Although in practice the exact values of such variables as height, time, and temperature are approximated, they are continuous in nature. If a fair coin is tossed 20 times, the possible numbers of heads obtained are 0, 1, 2, ..., 20.
Which of the following are examples of cross-sectional data?
The sales prices of single-family homes sold last month in California; the current average prices of regular gasoline in different states; the test scores of students in a class: Cross-sectional data refers to data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time.
Which of the following can be represented by a continuous random variable?
The time of a flight between Chicago and New York: A discrete random variable assumes a countable number of possible values, whereas a continuous random variable can take uncountably many values within an interval.
Why do we sample?
Collecting data on the entire population is too expensive or too hard (often infeasible)
T or F: Events A and B are mutually exclusive and exhaustive, where A = The survey respondent is less than 40 years old and B = The survey respondent is 40 years or older.
True: Events are mutually exclusive if they do not share any common outcome of a random experiment. Events are exhaustive if together they include all possible outcomes of the random experiment.
T or F: Permutations are used when the order in which different objects are arranged matters.
True: If the order in which objects are arranged matters, we should use permutations.
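A short illustration using `math.perm` and `math.comb` (Python 3.8+). The contrast with combinations, where order does not matter, is added here for comparison; the choice of 5 objects taken 3 at a time is arbitrary.

```python
import math

# Choosing 3 of 5 objects.
n_objects, n_chosen = 5, 3

# Order matters: ABC and CBA count as different arrangements.
permutations = math.perm(n_objects, n_chosen)

# Order does not matter: ABC and CBA count as the same selection.
combinations = math.comb(n_objects, n_chosen)
print(permutations, combinations)
```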
T or F: Structured data tends to include numbers, dates, and groups of words and numbers called strings.
True: Structured data generally refers to data that has a well-defined length and format. This type of data is not open to interpretation.
T or F: The coefficient of variation is a unit-free measure of dispersion.
True: The coefficient of variation is computed as CV = s / x̄ (the sample standard deviation divided by the sample mean) and is a relative measure of dispersion; the units cancel in the ratio.
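A sketch of why the measure is unit-free, assuming the sample standard deviation from `statistics.stdev`. The prices are hypothetical and `coefficient_of_variation` is an illustrative helper name.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)

prices_dollars = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical prices in dollars
prices_cents = [100 * p for p in prices_dollars]  # same prices in cents

cv_dollars = coefficient_of_variation(prices_dollars)
cv_cents = coefficient_of_variation(prices_cents)
print(cv_dollars, cv_cents)  # the same value regardless of units
```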
T or F: Two events A and B are independent if the probability of one does not influence the probability of the other.
True: Two events are independent if the conditional probability of one event equals its unconditional probability. Using formulas, P(A | B) = P(A).
T or F: Ordinal scale reflects a stronger level of measurement than the nominal scale.
True: With ordinal data we are able both to categorize and rank the data with respect to some characteristic.
Scatterplot
When looking at the plotted points, the variables have a positive relationship (y tends to increase as x increases), and the relationship appears linear or slightly curvilinear.
Sample
a subset of the population
probability distribution
describes every possible value of a random variable together with the probability of each value; every random variable has one
The complement of an event A, within the sample space S, is the event consisting of ____________.
all outcomes in S that are not in A: The complement of event A, Ac, is the event consisting of all outcomes in the sample space S that are not in A.
Chebyshev's theorem is applicable when the data are ___________________.
any shape: There are no restrictions on the shape of a data distribution when using Chebyshev's theorem.
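A sketch of the theorem on deliberately skewed data, assuming the standard statement that at least 1 - 1/k² of observations lie within k standard deviations of the mean for k > 1. The data set and helper names are made up.

```python
import statistics

def chebyshev_bound(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    return 1 - 1 / k**2

skewed = [1, 1, 1, 2, 2, 3, 4, 5, 8, 40]  # hypothetical, strongly right-skewed
x_bar = statistics.mean(skewed)
s = statistics.stdev(skewed)

def share_within(k):
    """Observed proportion of values inside x_bar ± k * s."""
    return sum(abs(x - x_bar) <= k * s for x in skewed) / len(skewed)

for k in (2, 3):
    print(k, chebyshev_bound(k), share_within(k))
```

The observed proportions are at least as large as the bound even though the data are far from bell-shaped.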
Subjective probability
assigned based on personal judgment
Random Variable
a function that assigns numerical values to the outcomes of an experiment
We need statistics in order to:
avoid making uninformed decisions and costly mistakes; make sound statistical conclusions rather than questionable choices
Classical probability
based on logical analysis rather than on observation or personal judgement
Population parameters are difficult to calculate due to
both cost prohibitions on data collection and the infeasibility of collecting data on the entire population.
Sample Statistics
calculated from the sample data and used to make inferences about the unknown population parameters
Hypergeometric probability distribution
cannot assume trials are independent: use when sampling without replacement from a population whose size N is not significantly larger than the sample size n (population not much bigger than sample)
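A minimal sketch of the hypergeometric probability, assuming the standard formula C(S, x) · C(N - S, n - x) / C(N, n), where S is the number of successes in the population. The symbol S, the lot sizes, and the function name are assumptions for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(x, n, S, N):
    """P(X = x): x successes in n draws without replacement
    from a population of N items that contains S successes."""
    return Fraction(comb(S, x) * comb(N - S, n - x), comb(N, n))

# Hypothetical lot: 20 items, 5 defective, 4 drawn without replacement.
p_one_defective = hypergeometric_pmf(1, n=4, S=5, N=20)

# The probabilities over all possible counts of defectives sum to 1.
total = sum(hypergeometric_pmf(x, n=4, S=5, N=20) for x in range(5))
print(p_one_defective, total)
```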
When constructing a frequency distribution for quantitative data, it is important to remember that _____________.
classes should be mutually exclusive, exhaustive, and the total number of classes should be between 5 and 20.
Descriptive Statistics
collecting, organizing and presenting data
Population
consists of all items of interest
probability density function
used to describe continuous random variables
A(n) ____________ variable is characterized by uncountably many values and can take any value within an interval.
continuous: A continuous variable can take on any value within an interval, while a discrete variable assumes a countable number of distinct values.
Cross sectional data
data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time
Time series data
data collected over several time periods
positively skewed data
data forms a long, narrow tail to the right
Binomial random variable
defined as the number of successes achieved in n trials of a Bernoulli process
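A short sketch of the binomial probability, assuming the standard formula C(n, x) · p^x · (1 - p)^(n - x). The coin-toss numbers and the function name are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): exactly x successes in n independent trials,
    each with the same success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: exactly 2 heads in 4 tosses of a fair coin.
p_two_heads = binomial_pmf(2, n=4, p=Fraction(1, 2))

# The probabilities over x = 0, 1, ..., n sum to 1.
total = sum(binomial_pmf(x, 4, Fraction(1, 2)) for x in range(5))
print(p_two_heads, total)
```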
Risk averse
demands a positive expected gain as compensation for taking risk
Scatterplot
depicts relationship between x and y
Correlation Coefficient
describes both direction and strength of relationship
cumulative distribution function
gives P(X ≤ x); can be used to describe either discrete or continuous random variables
The two branches of the study of statistics are generally referred to as
descriptive and inferential statistics.
Your business statistics class had a test last week. The average score for the class is an example of
descriptive statistics: Descriptive statistics refers to summarizing a set of data.
Inferential Statistics
drawing conclusions about a population based on sample data: more important, more useful, harder to analyze
Frequency Distribution
for qualitative data, groups the data into categories and records how many observations fall into each category
negatively skewed data
forms a long, narrow tail to the left
In order to summarize qualitative data, a useful tool is a __________.
frequency distribution
Variable
general characteristic being observed on objects of interest
Risk neutral
ignores risk; always accepts a prospect that offers a positive expected gain
Bernoulli Process
independent and identical trials in which each trial has only two possible outcomes: success or failure. The probabilities of success and failure remain the same from trial to trial
Big Data
massive volume of data that is difficult to manage, process and analyze using traditional tools
median
measure of central location that is not affected by outliers
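A small comparison of the median and the mean when one extreme value is introduced. The numbers are hypothetical.

```python
import statistics

typical = [30, 32, 35, 38, 40]        # hypothetical values
with_outlier = [30, 32, 35, 38, 400]  # same data with one extreme value

median_typical = statistics.median(typical)
median_outlier = statistics.median(with_outlier)
mean_typical = statistics.mean(typical)
mean_outlier = statistics.mean(with_outlier)

print(median_typical, median_outlier)  # the median does not move
print(mean_typical, mean_outlier)      # the mean is pulled toward the outlier
```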
Nominal and Ordinal Scales
measure qualitative data
Interval and Ratio
measure quantitative data
Statistics
methodology of extracting useful information from a data set
Geometric mean
multiplicative average that incorporates compounding
Possible responses to the survey question were: "Yes," "No," or "Don't Know." This data is best classified as
nominal scale: With nominal data all we can do is categorize or group the data.
Probability
numerical value between 0 and 1 that measures the likelihood that an uncertain event occurs. 0 indicates an impossible event; 1 indicates a definite event
A sample statistic is an estimate of
population parameter: Population parameter is estimated by sample statistic.
Arithmetic mean
primary measure of central location that is often referred to as the average: an additive average that ignores the effects of compounding
experiment
process that leads to one of the several possible outcomes
Histograms and stem-and-leaf diagrams describe
quantitative data
Continuous
random variable that assumes uncountably many values in an interval
Discrete
random variable that assumes a countable number of distinct values ex: number of credit cards
Empirical probability
relative frequency of occurrence
weighted mean
relevant when some observations contribute more than others
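A minimal sketch, assuming the usual formula: the sum of weight × value divided by the sum of the weights. The scores, weights, and function name are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

scores = [80, 90, 70]   # hypothetical scores: homework, midterm, final
weights = [20, 30, 50]  # hypothetical weights in percent

weighted = weighted_mean(scores, weights)
unweighted = sum(scores) / len(scores)
print(weighted, unweighted)
```

The final counts for half the grade, so the weighted mean sits closer to the final score than the simple average does.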
cumulative frequency distribution
shows the number of items with values less than or equal to the upper limit of each class
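A sketch of the running totals using `itertools.accumulate`. The class limits and frequencies are made up for illustration.

```python
from itertools import accumulate

class_upper_limits = [10, 20, 30, 40]  # hypothetical upper limit of each class
frequencies = [4, 7, 5, 2]             # hypothetical count in each class

# Running total: observations at or below each class's upper limit.
cumulative = list(accumulate(frequencies))

# Dividing by the total gives the cumulative relative frequency (what an ogive plots).
cumulative_relative = [c / cumulative[-1] for c in cumulative]

for limit, count in zip(class_upper_limits, cumulative):
    print(f"<= {limit}: {count}")
```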
valid probability
sum of all probabilities equals 1
Covariance
indicates the direction (positive or negative) of the linear relationship between two variables
probability mass function
used to describe discrete random variables
When a characteristic of interest differs among various observations, then it can be termed a
variable: A variable is the general characteristic being observed on a set of people, objects, or events, where each observation varies in kind or degree.
Structured Data
well defined length and format: numbers, dates, strings of words
The empirical rule can be used to estimate some proportions _________________.
when the data are approximately symmetric and bell-shaped: The empirical rule is only applicable for approximately symmetric and bell-shaped data sets.