Chapter 1 - 5

A '____________________' random variable assumes a countable number of distinct values such as x1, x2, x3, and so on

discrete

A random variable summarizes the results of an experiment in terms of numerical values and can be classified as '________________' or '___________________' depending on the range of values that it assumes.

'discrete' or 'continuous'

The z value associated with a probability of .5040 is '_____________'

.01
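
A quick way to check a z-table lookup like this one is to invert the standard normal cumulative distribution function. This is a minimal Python sketch, assuming Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

# Inverse CDF: find z such that P(Z <= z) = 0.5040
z = standard_normal.inv_cdf(0.5040)
print(round(z, 2))  # 0.01, matching the z table
```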

Recall that the population proportion p is the essential descriptive measure for a qualitative variable, and that it is estimated on the basis of its sample counterpart. What is its sample counterpart? Options: null hypothesis; p-value; The sample proportion P̄; Alternative hypothesis

The sample proportion P̄

Extremely large or small observations for a variable are referred to as '______________'.

outliers

If sample evidence is inconsistent with the null hypothesis, we '____________' the null hypothesis.

reject

Many experiments fit the conditions of a Bernoulli process. Which of the following fit the conditions of a Bernoulli process? (Choose all that apply.) Options: A drug is either effective or ineffective; A customer defaults or does not default on a loan; A college graduate gets a job, quits school, or applies to graduate school; A college graduate applies or does not apply to graduate school

A drug is either effective or ineffective; A customer defaults or does not default on a loan; A college graduate applies or does not apply to graduate school

The degrees of freedom determine how broad the tails of the t distribution are. If there are fewer degrees of freedom, the tails of the distribution are more: Narrow; Broad; Furry; Complicated

Broad

A discrete random variable is characterized by uncountable values, whereas a continuous random variable assumes a countable number of distinct values. True False

False

The joint probability of events A and B is derived as P(A ∩ B) = P(A ∣ B)P(A). True False

False

True or false: XML and HTML markup tags both conform to standards maintained by the World Wide Web Consortium. True False

False

What is the term used in a confidence interval that accounts for the standard error of the estimator and the desired confidence level of the interval? Options: Estimate error; Point estimate; Sample proportion; Margin of error

Margin of error

When determining the value t_{α,df}, which two pieces of information do we need? Options: Degrees of freedom or alpha, and sample size; Sample size or degrees of freedom, and alpha; Alpha or degrees of freedom or sample size; Sample size or alpha, and degrees of freedom

Sample size or degrees of freedom, and alpha

True or false: A relational database consists of one or more logically related data files, where each data file is a two-dimensional grid that consists of rows and columns. True False

True

Finally, another common transformation of categorical variables is to create category '________'

scores

An event is any subset of outcomes of the experiment. It is called a '___________________' event if it contains a single outcome.

simple

Random variables can also be defined in terms of their cumulative distribution function, or, equivalently, P(X ? x). What is the correct mathematical sign (instead of the ?) in P(X ? x) for the cumulative distribution function? Options: > (greater than); ≤ (less than or equal); = (equal); < (less than)

≤ (less than or equal)

Data '________________' is a process that an organization uses to acquire, organize, store, manipulate, and distribute data

management

In addition to binning, another common approach is to create new variables through '________________' transformations of existing variables.

mathematical

For making statistical inferences, it is essential that the sampling distribution of X̄ is '____________' distributed.

normally

The 25th percentile is also referred to as the '_______________________' quartile, the 50th percentile is referred to as the '_________________________' quartile, and the 75th percentile is referred to as the '__________________' quartile.

'first' and 'second' and 'third'

For the 99% confidence interval, what is α/2? Options: .10; .005; .015; .05

.005

For the binomial distribution, p^x(1 − p)^(n − x) represents the probability of any particular sequence with x successes and n − x failures. Use this formula to answer the following: In the Southern area of the United States, approximately 20% of adults have a college degree. We randomly ask four adults whether they have a college degree. What is the probability that none of the adults have a college degree? Options: .4096; .512; .16; zero

.4096
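
The formula on this card gives the probability of one particular sequence; the full binomial probability multiplies it by the number of such sequences, C(n, x). For x = 0 there is only one sequence (no one has a degree), so both approaches give .4096. A minimal Python sketch; `binomial_pmf` is a hypothetical helper name, not something from the course materials.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    # comb(n, x) counts the sequences; p**x * (1 - p)**(n - x) is the probability of each one
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Four adults, 20% chance each has a college degree; probability that none do
prob_none = binomial_pmf(0, n=4, p=0.20)
print(round(prob_none, 4))  # 0.4096
```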

For z = .11, what is the corresponding cumulative probability, P(Z ≤ .11)? Options: .4562; .5438; .5398; None of the answers are correct

.5438
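
Instead of reading the z table, the cumulative probability can be computed directly. A minimal sketch using Python's standard library (3.8+):

```python
from statistics import NormalDist

# P(Z <= 0.11) for the standard normal distribution
prob = NormalDist().cdf(0.11)
print(round(prob, 4))  # 0.5438
```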

Calculate the mean absolute deviation (MAD) for the following data: We observed the ages of 3 individuals in a study, where the mean age is 40. The observed ages were 31, 40, and 49. What is the MAD? Options: Not enough data; 6; 9; −6

6
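
The MAD calculation above can be written out as a short Python sketch. `mean_absolute_deviation` is an illustrative helper; it averages the absolute deviations by dividing by n, which reproduces the answer of (9 + 0 + 9)/3 = 6.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each observation from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

ages = [31, 40, 49]
print(mean_absolute_deviation(ages))  # 6.0
```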

'_________________________' theorem uses the total probability rule to update the probability of an event that has been affected by a new piece of evidence

Bayes'

Examples of categorical variables include: (Select all that apply.) Options: Weight; Course Grade; Marital Status; Income per year

Course Grade; Marital Status

Which term describes a compilation of facts, figures, or other contents, both numerical and nonnumerical? Options: Knowledge; Data; Information

Data

What type of variable assumes a countable number of distinct values such as x1, x2, x3, and so on? Options: Random; Discrete; Continuous; Normally distributed

Discrete

We are conducting a hypothesis test using α = 0.05. H0: Do not build brick-and-mortar store. HA: Build brick-and-mortar store. We determine that the p-value is .20. What is our decision? Options: Reject the null hypothesis; Re-evaluate the alpha; Do not reject the null hypothesis; Collect more data

Do not reject the null hypothesis

The sum of the probabilities of any list of mutually exclusive and exhaustive events does not always equal 1. True False

False

Which of the following is true of the correlation coefficient? (Select all that are true.) Options: If the correlation coefficient equals −1, then x and y have a perfect negative linear relationship; If the correlation coefficient equals 0, then x and y are not linearly related; The value of the correlation coefficient falls between zero and 1; The correlation coefficient is unit free

If the correlation coefficient equals −1, then x and y have a perfect negative linear relationship; If the correlation coefficient equals 0, then x and y are not linearly related; The correlation coefficient is unit free

Gender is an example of which measurement scale? Options: Ordinal; Interval; Nominal; Ratio

Nominal

When constructing a box plot, the first step is to use a five-number summary. What does the five-number summary contain? (Select all that apply.) Options: Maximum value; Q1, Q2, Q3, Q4; Q1, Q2, Q3; Minimum value

Q1, Q2, Q3; Maximum value; Minimum value
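
A five-number summary, and the interquartile range derived from it, can be sketched in Python as follows. The data values are hypothetical, and quartile conventions differ between textbooks and software, so this sketch's choice of `method="inclusive"` in `statistics.quantiles` may not match a table-based or spreadsheet method exactly.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum) for a list of numbers."""
    # "inclusive" is one of several quartile conventions; results can differ slightly between methods.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

# Hypothetical data for illustration only
data = [12, 15, 17, 20, 22, 25, 28, 30, 41]
minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1  # interquartile range: the spread of the middle 50% of the data
print(minimum, q1, median, q3, maximum, iqr)
```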

The term '_____________' location relates to the way numerical data tend to cluster around some middle or central value.

central

We cannot describe the possible values of a '________________' random variable X with a list x1, x2,... because the value (x1 + x2)/2, not in the list, might also be possible

continuous

The '______________________' range is the difference between the third quartile and the first quartile.

interquartile

There are several measures of dispersion that gauge the variability of a data set. Which of the measures below are useful for measuring dispersion? (Select all that apply.) Options: Use the mean; Mean absolute deviation; Interquartile range; Range

Mean absolute deviation; Interquartile range; Range

The mean and the standard deviation of scores on an accounting exam are 74 and 8, respectively. The mean and the standard deviation of scores on a marketing exam are 78 and 10, respectively. Find the z-scores for a student who scores 90 in both classes. (Select all that apply.) Options: z-score in the accounting class is z = (90 − 74)/8 = 2; z-score in the marketing class is z = (90 − 78)/10 = 1.2; z-score in the accounting class is z = (90 − 74)/10 = 1.6; z-score in the marketing class is z = (90 − 78)/8 = 1.5

z-score in the accounting class is z = (90 − 74)/8 = 2; z-score in the marketing class is z = (90 − 78)/10 = 1.2
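
The z-score calculation for both classes, as a minimal Python sketch (`z_score` is an illustrative helper name):

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z_accounting = z_score(90, mean=74, std_dev=8)
z_marketing = z_score(90, mean=78, std_dev=10)
print(z_accounting, z_marketing)  # 2.0 1.2
```

Because 2.0 > 1.2, the same raw score of 90 is relatively stronger in the accounting class.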

When examining the relationship between two categorical variables, a '__________' table proves very useful

contingency

In which data format does each column start and end at the same place in every row? Options: Fixed; Delimited; Fixed-width

Fixed-width

Which of the measures of central location is defined as the observation that occurs most frequently? Options: Mode; Range; Mean; Median

Mode

True or false: The difference between cross-sectional and time series data is whether the data is evaluated at a single point in time or multiple points in time True False

True

True or false: Contingency tables and stacked column charts are two common tabular and graphical methods that help us summarize the relationship between two categorical variables. True False

True

True or false: Knowledge and information result from using data. True False

True

A '_________' plot shows the relationship between three numerical variables.

bubble

A '_____________' variable, also referred to as an indicator or a binary variable, is commonly used to describe two categories of a variable.

dummy

Data '_____________' is the process of defining the structure of a database.

modeling

Business analytics translates data into decisions to improve business performance through: (Select all that apply.) Options: qualitative reasoning; descriptive statistics; quantitative tools; data mining

qualitative reasoning; quantitative tools

A '____________' column chart is an advanced version of the column chart that we discussed. It is designed to visualize more than one categorical variable, plus it allows for the comparison of composition within each category.

stacked

Brad offers an annual bonus of $10,000 for superior performance, $6,000 for good performance, $3,000 for fair performance, and $0 for poor performance. Based on prior records, he expects an employee to perform at superior, good, fair, and poor performance levels with probabilities 0.10, 0.20, 0.50, and 0.20, respectively. Calculate the expected value of the annual bonus amount. Options: $3,700; $4,000; $6,000; $4,200

$3,700
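
The expected value is the probability-weighted sum of the bonus amounts: 10,000(0.10) + 6,000(0.20) + 3,000(0.50) + 0(0.20) = 3,700. A minimal Python sketch of the same arithmetic:

```python
# Bonus amounts and their probabilities from the card above
bonuses = [10_000, 6_000, 3_000, 0]
probabilities = [0.10, 0.20, 0.50, 0.20]

assert abs(sum(probabilities) - 1) < 1e-9  # a valid probability distribution sums to 1
expected_bonus = sum(b * p for b, p in zip(bonuses, probabilities))
print(round(expected_bonus, 2))  # 3700.0
```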

We generally test for the independence of two events by comparing the conditional probability of one event, for instance P(A∣B), to the probability, P(A). If these two probabilities are the '______________________', we say that the two events, A and B, are independent; if the probabilities differ, the two events are '________________________'.

'Same' and 'dependent'

When examining the relationship between two numerical variables, a scatter plot is a simple, yet useful, graphical tool. What does each point in a scatter plot represent? -Multiple paired observations such as (x1, x2), (y2, y3) -Two x-axis comparisons -A paired observation with one x-axis point and one y-axis point. (x1, y1) -An unpaired observation but two y-axis comparisons

-A paired observation with one x-axis point and one y-axis point. (x1, y1)

Which of the following is true of a data warehouse? (Select all that apply.) -One of its primary purposes is to support decision making -Data in a data warehouse are usually organized around subjects such as sales, customers, or products that are relevant to business decision making -It is a small-scale data warehouse or a subset of the enterprise data warehouse that focuses on one particular subject or decision area -It can be designed to support the marketing department for analyzing customer behaviors, and it contains only the data relevant to such analyses

-One of its primary purposes is to support decision making -Data in a data warehouse are usually organized around subjects such as sales, customers, or products that are relevant to business decision making

Which of the following is NOT a correct statement about entity-relationship diagram (ERD) attributes? -The relationships between entities can only be one-to-many -A foreign key is the primary key of a related entity. -A primary key is an attribute that uniquely identifies each instance of an entity -An entity is a generalized category to represent persons, places, things, or events.

-The relationships between entities can only be one-to-many

Johnny feels that he has an 85% chance of getting an A in Marketing and a 45% chance of getting an A in Managerial Economics. He also believes he has a 35% chance of getting an A in both classes. What is the probability that he does not get an A in either of these courses? Options: .15; .05; .25; .10

.05

An economist predicts a 70% chance that country A will perform poorly and a 35% chance that country B will perform poorly. There is also a 20% chance that both countries will perform poorly. What is the probability that country A performs poorly given that country B performs poorly? Options: .20/.70 = .29; .35/.70 = .50; None are correct; .20/.35 = .57

.20/.35 = .57
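
Applying the conditional probability formula P(A ∣ B) = P(A ∩ B)/P(B) to this card, as a short Python sketch (variable names are illustrative):

```python
p_a = 0.70          # P(A): country A performs poorly
p_b = 0.35          # P(B): country B performs poorly
p_a_and_b = 0.20    # P(A ∩ B): both perform poorly

# Conditional probability: P(A | B) = P(A ∩ B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(round(p_a_given_b, 2))  # 0.57

# Independence check: A and B are independent only if P(A | B) equals P(A)
dependent = abs(p_a_given_b - p_a) > 1e-9
print(dependent)  # True
```

Because P(A ∣ B) ≈ 0.57 differs from P(A) = 0.70, the two events are dependent.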

Johnny feels that he has an 85% chance of getting an A in Marketing and a 45% chance of getting an A in Managerial Economics. He also believes he has a 35% chance of getting an A in both classes. What is the probability that he gets an A in at least one of these courses? Options: .35; .45; .95; .85

.95
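
The two Johnny cards combine the addition rule and the complement rule. A minimal Python sketch of both steps (variable names are illustrative):

```python
p_marketing = 0.85   # P(M): A in Marketing
p_econ = 0.45        # P(E): A in Managerial Economics
p_both = 0.35        # P(M ∩ E): A in both

# Addition rule: P(M ∪ E) = P(M) + P(E) − P(M ∩ E)
p_at_least_one = p_marketing + p_econ - p_both
# Complement rule: "an A in neither course" is the complement of "an A in at least one"
p_neither = 1 - p_at_least_one
print(round(p_at_least_one, 2), round(p_neither, 2))  # 0.95 0.05
```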

Suppose we want to find the value t_{α,df} with α = 0.10 and df = 10; that is, t_{0.10,10}. Using Table 5.2, the value x satisfies P(T_10 ≥ x) = 0.10; what is x? Options: 3.078; 1.372; 1.282; 1.812

1.372

Brad offers an annual bonus of $10,000 for superior performance, $6,000 for good performance, $3,000 for fair performance, and $0 for poor performance. Suppose that, based on prior records, the expected value of the annual bonus is $4,000 per employee. What is the total annual amount that Brad can expect to pay in bonuses if he has 10 employees? Options: You do not have enough information; $40,000; We need the probability of each performance level; $4,000

$40,000

Which of the following statements is true of the skewness coefficient? (Select all that are true.) Options: A negatively skewed distribution has a zero skewness coefficient; A symmetric distribution has a skewness coefficient of zero; The normal distribution has a skewness coefficient of 1; A positively skewed distribution has a positive skewness coefficient

A positively skewed distribution has a positive skewness coefficient; A symmetric distribution has a skewness coefficient of zero

The sampling distribution of P̄ is closely related to which distribution? Options: Normal; Poisson; Uniform; Binomial

Binomial

Which of the following is true of a Type II error? Options: Is a correct decision; Can occur when the null hypothesis is true; Occurs when we reject the null hypothesis; Can occur when the null hypothesis is false

Can occur when the null hypothesis is false

A vertical bar chart is often referred to as which of the following? Options: Column chart; Horizontal chart; Pie chart; Line chart

Column chart

Often it is more informative to provide a range of values (an interval) rather than a single point estimate for the unknown population parameter. What two terms are used for this range of values? (Select all that apply.) Options: Confidence interval; Population range; Hypothesis test; Interval estimate

Confidence interval; Interval estimate

Are the following (the return on a mutual fund, the time to complete a task, and the volume of beer sold as 16 ounces) examples of continuous or discrete random variables? Options: They can be discrete or continuous; Discrete; Neither; Continuous

Continuous

A simple probability distribution for a continuous random variable is called the: Discrete uniform distribution; Continuous uniform distribution; Bell-shaped distribution; Normal distribution

Continuous uniform distribution

Which of the following is true of the covariance? (Select all that are true.) Options: If the covariance is negative, then x and y have a negative linear relationship; The covariance is sensitive to the units of measurement; Covariance can be negative, positive, or zero; We can comment on the strength of the relationships using the covariance

Covariance can be negative, positive, or zero; If the covariance is negative, then x and y have a negative linear relationship; The covariance is sensitive to the units of measurement
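
To see why the covariance is sensitive to units while the correlation coefficient is unit free, here is a minimal Python sketch with hypothetical height and weight data, measuring the same heights first in metres and then in centimetres. The helper functions are written out by hand (sample versions, dividing by n − 1) rather than taken from a library.

```python
import math

def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Correlation coefficient: covariance divided by the product of the standard deviations."""
    return sample_covariance(x, y) / math.sqrt(sample_covariance(x, x) * sample_covariance(y, y))

# Hypothetical data: heights in metres and weights in kilograms
heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weights_kg = [55, 65, 70, 72, 85]
heights_cm = [h * 100 for h in heights_m]  # same heights, different units

print(sample_covariance(heights_m, weights_kg), sample_covariance(heights_cm, weights_kg))
print(correlation(heights_m, weights_kg), correlation(heights_cm, weights_kg))
```

Rescaling heights by 100 multiplies the covariance by 100 but leaves the correlation unchanged, which is why the correlation coefficient, not the covariance, is used to judge the strength of a linear relationship.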

The chefs at a local pizza chain in Cambria, California, strive to maintain the suggested size of their 16-inch pizzas. Despite their best efforts, they are unable to make every pizza exactly 16 inches in diameter. The manager has determined that the size of the pizzas is normally distributed with a mean of 16 inches and a standard deviation of 0.8 inch. What are the expected value and the standard error of the sample mean derived from a random sample of 2 pizzas? Options: E(X̄) = 12 and se(X̄) = 0.8/√2 = 0.57; E(X̄) = 16 and se(X̄) = 0.8/√2 = 0.57; E(X̄) = 16 and se(X̄) = 0.8/√4 = 0.40

E(X̄) = 16 and se(X̄) = 0.8/√2 = 0.57
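
The expected value and standard error in the pizza example follow from E(X̄) = μ and se(X̄) = σ/√n. A minimal Python sketch:

```python
import math

population_mean = 16.0   # μ, inches
population_sd = 0.8      # σ, inches
n = 2                    # sample size

expected_sample_mean = population_mean          # E(X̄) = μ
standard_error = population_sd / math.sqrt(n)   # se(X̄) = σ / √n
print(expected_sample_mean, round(standard_error, 2))  # 16.0 0.57
```

With n = 4 the standard error would be 0.8/√4 = 0.40, which is the distractor in the last option.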

In many instances, we calculate probabilities by referencing data based on the observed outcomes of an experiment. Which probability category is defined as the observed relative frequency with which an event occurs? Options: Subjective probability; Empirical probability; Classical probabilities; None are correct

Empirical probability

Which of the following statements is true regarding the kurtosis coefficient? (Select all that are true.) Options: Excess kurtosis is calculated as the kurtosis coefficient minus 3; A platykurtic distribution is one that has shorter tails; A distribution that has tails that are more extreme than the normal distribution is leptokurtic; The kurtosis coefficient of a normal distribution is zero

Excess kurtosis is calculated as the kurtosis coefficient minus 3; A distribution that has tails that are more extreme than the normal distribution is leptokurtic; A platykurtic distribution is one that has shorter tails
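
Skewness and kurtosis can be computed from the third and fourth central moments. This is a minimal Python sketch using the population (moment) formulas; spreadsheet and statistical software often apply small-sample adjustments, so their values may differ somewhat. Both data sets are hypothetical.

```python
def moment_shape_measures(values):
    """Return (skewness, kurtosis, excess kurtosis) using population moment formulas."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment (variance)
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis, kurtosis - 3  # excess kurtosis = kurtosis - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]   # long right tail
print(moment_shape_measures(symmetric))
print(moment_shape_measures(right_skewed))
```

The symmetric data give a skewness of 0 and an excess kurtosis of −1.3 (shorter tails than the normal distribution, so platykurtic), while the data with a long right tail give a positive skewness. For a normal distribution the kurtosis coefficient is 3, so its excess kurtosis is 0.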

What do we call events that together include all outcomes in the sample space? Options: Simple; Inclusive; Exhaustive; Mutually exclusive

Exhaustive

What are some commonly used terms for the normal distribution? Options: Bell-shaped distribution; Poisson distribution; Discrete distribution; Gaussian distribution

Gaussian distribution; Bell-shaped distribution

Two events are '_________________' if the occurrence of one event does not affect the probability of the occurrence of the other event.

Independent

Which of the following is true of the population proportion p? (Select all that apply.) Options: Is used for quantitative variables of interest; Is used for both quantitative and qualitative variables; Is a descriptive measure for a qualitative variable; It is estimated on the basis of its sample counterpart, the sample proportion P̄

Is a descriptive measure for a qualitative variable; It is estimated on the basis of its sample counterpart, the sample proportion P̄

Which of the following is true regarding the graph depicting the normal probability density function f(x)? (Select all that apply.) Options: Is not always symmetric around the mean; Is symmetric around the mean; Is often referred to as the normal curve; Is often referred to as the bell curve

Is often referred to as the normal curve; Is often referred to as the bell curve; Is symmetric around the mean

Which of the following is true of the Central Limit Theorem? (Select all that apply.) Options: A sample size of approximately 30 is recommended; Works if population is larger than 30; Is useful to approximate Poisson distributions; Is used to approximate normal distributions

Is used to approximate normal distributions; A sample size of approximately 30 is recommended
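
More precisely, the central limit theorem says that the sampling distribution of X̄ is approximately normal for a sufficiently large sample size (n ≥ 30 is the usual rule of thumb), even when the population itself is not normal. A minimal Python simulation sketch, assuming an exponential population with mean 1 and standard deviation 1, a fixed random seed, and 5,000 samples; all of these choices are illustrative.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the simulation is reproducible

n = 30                # sample size from the usual rule of thumb
num_samples = 5_000   # number of samples drawn

# Population: exponential distribution with rate 1 (mean 1, standard deviation 1),
# which is strongly right-skewed rather than normal.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

# The CLT predicts that X̄ is approximately normal with mean 1 and standard error 1/√30 ≈ 0.183.
print(round(statistics.fmean(sample_means), 3))
print(round(statistics.stdev(sample_means), 3))
```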

The interquartile range (IQR) is the difference between the third quartile and the first quartile, or, equivalently, IQR = Q3 − Q1. Which of the following is true of the interquartile range? Options: It is the average of the squared differences from the mean; It is the difference between the maximum and the minimum observations of a variable; It is an average of the absolute differences between the observations and the mean; It is the range of the middle 50% of the variable

It is the range of the middle 50% of the variable

What does MAD stand for, when used as a measure of dispersion? Options: Mean Absolute Deviation; Main Absolute Description; Mean Absolute Data; Middle Absolute Deviation

Mean Absolute Deviation

Which of the following is an example of machine-generated data? Options: Mobile phone conversations; Information on price; Social media data; Meteorological data

Meteorological data

Which of the following summarizes the two correct decisions related to Type I and Type II errors? (Select all that apply.) Options: Rejecting the null hypothesis when the null hypothesis is true; Not rejecting the null hypothesis when the null hypothesis is false; Rejecting the null hypothesis when the null hypothesis is false; Not rejecting the null hypothesis when the null hypothesis is true

Not rejecting the null hypothesis when the null hypothesis is true; Rejecting the null hypothesis when the null hypothesis is false

An effective strategy for dealing with these issues is category reduction, where we collapse some of the categories to create fewer nonoverlapping categories. The first guideline states that categories with very few observations may be combined to create the '__________' category

Other

Which of the following is a true statement regarding outliers in data analysis? (Choose all that apply.) Options: Outliers may just be due to random variations; Outliers may indicate bad data due to incorrectly recorded observations; There are no universally agreed upon methods for treating outliers; Outliers will not unduly affect the mean of a sample

Outliers may indicate bad data due to incorrectly recorded observations; There are no universally agreed upon methods for treating outliers; Outliers may just be due to random variations

A manager believes that 20% of consumers will respond positively to the firm's social media campaign. Also, 24% of those who respond positively will become loyal customers. Find the probability that the next recipient of the social media campaign will react positively and become a loyal customer (R: responds positively; L: becomes a loyal customer). Options: P(R ∩ L) = P(L ∣ R)/P(R) = 0.24/0.20 = 1.2; P(R ∩ L) = P(L ∣ R)P(R) = 0.24 × 0.20 = .048; P(R ∩ L) = P(R)/P(L ∣ R) = .20/.24 = .833; None are correct

P(R ∩ L) = P(L ∣ R)P(R) = 0.24 × 0.20 = .048
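
The multiplication rule from the correct option, as a short Python check (variable names are illustrative):

```python
p_r = 0.20            # P(R): responds positively
p_l_given_r = 0.24    # P(L | R): becomes loyal, given a positive response

# Multiplication rule: P(R ∩ L) = P(L | R) × P(R)
p_r_and_l = p_l_given_r * p_r
print(round(p_r_and_l, 3))  # 0.048
```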

Scores on a management aptitude exam are normally distributed with a mean of 72 and a standard deviation of 8. If we are trying to find the probability that a randomly selected manager will score above 75, which probability statement in terms of Z do we need? Options: P(Z > 1.5); P(Z > −1.5); P(Z > −.375); P(Z > .375)

P(Z > .375)
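
Standardizing 75 gives z = (75 − 72)/8 = 0.375, and the requested probability is P(Z > 0.375) = 1 − P(Z ≤ 0.375). A minimal Python sketch using `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

scores = NormalDist(mu=72, sigma=8)
z = (75 - scores.mean) / scores.stdev   # 0.375
p_above = 1 - NormalDist().cdf(z)       # P(Z > 0.375)
print(z, round(p_above, 4))             # 0.375 and about 0.3538
```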

There is only one population, but many possible samples of a given size can be drawn from the population. Which of the following is a constant, even though its value may be unknown? Options: Sampling distribution; Population parameter; Population; Sampling statistic

Population parameter

Which of the analytic techniques answers the question 'What could happen in the future?' Options: Descriptive; Prescriptive; Predictive

Predictive

Which of the analytic techniques answers the question 'What should we do?' Options: Prescriptive; Descriptive; Predictive

Prescriptive

The probability distribution of the sample mean X̄ is also referred to as the: estimator; population of X; unbiased estimator; sampling distribution of X̄

Sampling distribution of X̄

Which of the following is a common graphical method that allows us to determine whether two numerical variables are related in some systematic way? Options: Pie chart; Scatter plot; Stacked column chart; Contingency table

Scatter plot

In a bubble plot, how is the third numerical variable represented? Options: Using a different color or symbol; Only two variables can be represented; By rescaling the first two variables; Size of the bubble

Size of the bubble

Which of the following are common measures of shape? (Select all that apply.) Options: Range; MAD or the mean absolute deviation; Skewness coefficient; Kurtosis coefficient

Skewness coefficient; Kurtosis coefficient

Another standardized statistic, which uses the estimator S in place of σ, is computed as T = (X̄ − μ)/(S/√n). Which distribution does the random variable T follow? Options: Uniform distribution; Poisson distribution; T distribution; Normal distribution

T distribution
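
A minimal Python sketch of the T statistic, using a hypothetical sample of five pizza diameters and a hypothesized mean of 16 inches (both illustrative, not from the course materials). When the population is normal, T follows the t distribution with n − 1 degrees of freedom.

```python
import math
import statistics

def t_statistic(sample, mu):
    """T = (X̄ − μ) / (S / √n), using the sample standard deviation S in place of σ."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
    return (x_bar - mu) / (s / math.sqrt(n))

# Hypothetical sample of 5 pizza diameters (inches), compared against μ = 16
diameters = [15.2, 16.1, 15.8, 16.4, 15.5]
t = t_statistic(diameters, mu=16)
print(round(t, 3), "with df =", len(diameters) - 1)
```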

Scores on a management aptitude examination are normally distributed with a mean of 72 and a standard deviation of 8. We want to find the lowest score that will place a manager in the top 10% (90th percentile) of the distribution. Which of the following is true to solve this problem? (Select all that apply.) Options: A score of 82.24 or higher will place a manager in the top 10% of the distribution; The 90th percentile is a numerical value x such that P(X < x) = 0.90; We will use the inverse transformation x + μ = zσ to solve these problems; z = 1.28

The 90th percentile is a numerical value x such that P(X < x) = 0.90; z = 1.28; A score of 82.24 or higher will place a manager in the top 10% of the distribution
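
The correct inverse transformation is x = μ + zσ. A minimal Python sketch comparing the table-based z = 1.28 with the unrounded inverse CDF from `statistics.NormalDist`:

```python
from statistics import NormalDist

scores = NormalDist(mu=72, sigma=8)

# Table approach from the card: z = 1.28, then x = μ + zσ
z_table = 1.28
x_table = scores.mean + z_table * scores.stdev   # 82.24

# Exact inverse CDF, without rounding z to two decimals
x_exact = scores.inv_cdf(0.90)                  # about 82.25
print(round(x_table, 2), round(x_exact, 2))
```

The small difference (82.24 vs. about 82.25) comes only from rounding z to two decimals.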

An experiment satisfies a Poisson process if: (Choose all that apply.) Options: The probability of success in any interval is the same for all intervals of equal size; The probability of success in any interval is proportional to the size of the interval; The number of successes counted in nonoverlapping intervals are dependent; The number of successes within a specified time or space interval equals any integer between zero and infinity

The number of successes within a specified time or space interval equals any integer between zero and infinity; The probability of success in any interval is proportional to the size of the interval; The probability of success in any interval is the same for all intervals of equal size

A Bernoulli process consists of a series of n independent and identical trials of an experiment such that on each trial: (Choose all that apply.) Options: The probabilities of success and failure remain the same from trial to trial; There are more than two possible outcomes; There are only two possible outcomes; The probabilities of success and failure change from trial to trial

The probabilities of success and failure remain the same from trial to trial; There are only two possible outcomes

Which of the following are the two defining properties of probability? (Select all that apply.) Options: The subjective probability is based on an individual's personal judgment or experience; The probability of any event A is a value between 0 and 1 (that is, 0 ≤ P(A) ≤ 1); The sum of the probabilities of any list of mutually exclusive and exhaustive events equals 1; The empirical probability of an event is the observed relative frequency with which an event occurs

The probability of any event A is a value between 0 and 1 (that is, 0 ≤ P(A) ≤ 1); The sum of the probabilities of any list of mutually exclusive and exhaustive events equals 1

Which of the following are key properties of a discrete probability distribution? (Select all that apply.) Options: The probabilities of success and failure remain the same from trial to trial; The sum of the probabilities equals 1 (that is, ΣP(X = xi) = 1, where the sum extends over all values x of X); The probability of each value x is a value between 0 and 1 (equivalently, 0 ≤ P(X = x) ≤ 1); The number of successes within a specified time or space interval equals any integer between zero and infinity

The probability of each value x is a value between 0 and 1 (equivalently, 0 ≤ P(X = x) ≤ 1); The sum of the probabilities equals 1 (that is, ΣP(X = xi) = 1, where the sum extends over all values x of X)

Which of the following is true of the variance and standard deviation? (Select all that apply.) Options: The variance is an average of the squared differences between the observations and the mean; The difference between the third quartile and the first quartile; The standard deviation is the positive square root of the variance; An average of the absolute differences between the observations and the mean

The variance is an average of the squared differences between the observations and the mean; The standard deviation is the positive square root of the variance

Which of the following is true of measures of association? (Select all that are true.) Options: These measures are not appropriate when the underlying relationship between the variables is nonlinear; These measures quantify the direction and strength of the linear relationship between two variables, x and y; These measures reflect the typical or central value of a variable; Measures the degree to which a distribution is not symmetric about its mean

These measures are not appropriate when the underlying relationship between the variables is nonlinear; These measures quantify the direction and strength of the linear relationship between two variables, x and y

What is the probability theory rule that is a tool for breaking the computation of a probability into distinct cases? Options: Total probability rule; Bayes' Theorem; Statistical analysis; Conditional probability

Total probability rule

In most applications, we require some form of the equality sign in the null hypothesis True False

True

The expected value of the sample proportion P̄ is equal to the population proportion; that is, E(P̄) = p. True False

True

The probability that A occurs given that B has occurred is derived as P(A∣B)=P(A∩B)/P(B) True False

True

True or false: A weakness of ordinal data is that we cannot interpret the difference between the ranked values. For example, if runners finish first, second, and third in a foot race, the difference in time between first and second place is not necessarily the same as the difference between second and third place. True False

True

True or false: z-score measures the relative location of an observation and indicates whether it is an outlier. True False

True

A measure of '______________' quantifies the direction and strength of the linear relationship between two variables, x and y.

association

The '__________________' coefficient describes both the direction and the strength of the linear relationship between x and y

correlation

An objective numerical measure that reveals the direction of the linear relationship between two variables is called the '_____________'.

covariance

We use hypothesis testing to resolve conflicts between two competing hypotheses on a particular population parameter of interest. Which of the following corresponds to the null hypothesis? (Select all that apply.) Options: Denoted H0; Contradicts the default state or status quo; Denoted HA; Corresponds to a presumed default state of nature or status quo

Denoted H0; Corresponds to a presumed default state of nature or status quo

When a statistic is used to estimate a parameter, it is referred to as an '____________'.

estimator

The '_____________' coefficient is a summary measure that tells us whether the tails of the distribution are more or less extreme than the normal distribution.

kurtosis

For the binomial distribution, p^x(1 − p)^(n − x) represents the probability of any particular sequence with x successes and n − x failures. Use this formula to answer the following: In the Southern area of the United States, approximately 20% of adults have a college degree. We randomly ask four adults whether they have a college degree. Which of the following statements is true? Select all that apply Multiple select question. p =.8 n=4 Probability that all 4 adults have a college degree =.00032 Probability that one adult will have a college degree = 10.24%

n=4 Probability that one adult will have a college degree = 10.24%
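Working the formula with p = .2 (so the option p = .8 is not correct) and n = 4: one particular sequence with a single degree holder, such as yes-no-no-no, has probability .2 × .8³ = .1024, which is the 10.24% in the answer. Because there are C(4, 1) = 4 such sequences, the probability that exactly one of the four adults (in any order) has a degree is 4 × .1024 = .4096, and the probability that all four do is .2⁴ = .0016 rather than .00032. A small Python sketch of both calculations; the function names are illustrative:

```python
from math import comb


def sequence_probability(x, n, p):
    """p**x * (1 - p)**(n - x): probability of ONE particular sequence
    with x successes and n - x failures."""
    return p ** x * (1 - p) ** (n - x)


def binomial_probability(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x): counts every order."""
    return comb(n, x) * sequence_probability(x, n, p)


n, p = 4, 0.2  # four adults; about 20% of adults hold a college degree
print(round(sequence_probability(1, n, p), 4))  # 0.1024 (one specific order)
print(round(binomial_probability(1, n, p), 4))  # 0.4096 (exactly one, any order)
print(round(binomial_probability(4, n, p), 4))  # 0.0016 (all four)
```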

What is the most widely used continuous probability distribution? the '__________________' distribution

normal

In customer satisfaction surveys, we often use ordinal scales, such as very dissatisfied, somewhat dissatisfied, neutral, somewhat satisfied, and very satisfied, to indicate the level of satisfaction. In such cases, we can recode the categories using numbers 1 through 5, with 1 being very dissatisfied and 5 being very satisfied. This transformation allows the categorical variable to be treated as a '_______________________' variable in certain analytical models.

numerical

Because almost all observations fall within three standard deviations of the mean, it is common to treat an observation as an '________________' if its z-score is more than 3 or less than −3

outlier

A '___________________' consists of all observations or items of interest in an analysis.

population

The formula for the variance differs depending on whether we have a sample or a '______________'.

population

On the basis of new information, we update the prior probability to arrive at a conditional probability called a '_______________________' probability.

posterior

The original probability is an unconditional probability called a '_________________' probability, in the sense that it reflects only what we know now before the arrival of any new information.

prior
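Updating a prior probability into a posterior probability is Bayes' theorem, P(B∣A) = P(A∩B)/P(A), and its denominator P(A) is computed with the total probability rule. A minimal Python sketch with hypothetical probabilities (B is the event of interest, A is the new information, B' is the complement of B):

```python
def total_probability(p_b, p_a_given_b, p_a_given_not_b):
    """Total probability rule: P(A) = P(A|B)P(B) + P(A|B')P(B')."""
    return p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)


def posterior(p_b, p_a_given_b, p_a_given_not_b):
    """Bayes' theorem: P(B|A) = P(A and B) / P(A), with P(A and B) = P(A|B)P(B)."""
    joint = p_a_given_b * p_b
    return joint / total_probability(p_b, p_a_given_b, p_a_given_not_b)


# Hypothetical inputs: prior P(B) = 0.01, P(A|B) = 0.90, P(A|B') = 0.05
print(round(total_probability(0.01, 0.90, 0.05), 4))  # 0.0585
print(round(posterior(0.01, 0.90, 0.05), 4))          # 0.1538 (the posterior)
```

Even with strong evidence (P(A∣B) = .90), a small prior keeps the posterior modest, which is why the prior matters.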

A standard normal table, also referred to as the z-table, provides what information about the area under the z curve? probabilities Mean and variance expected values standard values

probabilities

Because many choices we make involve some degree of uncertainty, we are better prepared for the eventual outcome if we can use '_____________________' to describe which events are likely and which are unlikely

probabilities

Match each probability concept with its definition. Concepts: Probability, Experiment, Sample space. Definitions: contains all possible outcomes of the experiment; a process that leads to one of several possible outcomes; a numerical value that measures the likelihood that an event occurs.

probability - a numerical value that measures the likelihood that an event occurs. experiment -a process that leads to one of several possible outcomes. sample space - contains all possible outcomes of the experiment.

The '________________' is the simplest measure of dispersion; it is the difference between the maximum and the minimum observations of a variable.

range

The '_____________' coefficient measures the degree to which a distribution is not symmetric about its mean.

skewness
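The skewness coefficient and the kurtosis coefficient can both be computed from moments about the mean. This Python sketch uses the simple moment-based formulas; spreadsheet functions such as Excel's SKEW and KURT apply small-sample adjustments, so their values will differ somewhat for small data sets. Excess kurtosis (kurtosis minus 3) is 0 for a normal distribution:

```python
def central_moments(data):
    """Second, third, and fourth moments about the mean (divided by n)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    return m2, m3, m4


def skewness(data):
    """0 for symmetric data, > 0 for a long right tail, < 0 for a long left tail."""
    m2, m3, _ = central_moments(data)
    return m3 / m2 ** 1.5


def excess_kurtosis(data):
    """Kurtosis minus 3, so a normal distribution scores 0."""
    m2, _, m4 = central_moments(data)
    return m4 / m2 ** 2 - 3


print(skewness([1, 2, 3, 4, 5]))       # 0.0 (symmetric)
print(skewness([1, 1, 1, 2, 10]) > 0)  # True (positively skewed)
```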

Which of the following defines a probability that is based on an individual's personal judgment or experience? Exhaustive probability classical probability subjective probability empirical probability

subjective probability

For a Poisson process, we define the number of '___________________' achieved in a specified time or space interval as a Poisson random variable.

successes

The basic principle of hypothesis testing is to first assume that the null hypothesis is '________________' and then determine if sample evidence contradicts this assumption.

true

An estimator is "____________________" if its expected value equals the population parameter of interest.

unbiased

The standard normal distribution is a special case of the normal distribution with a mean equal to '_________________'.

zero

True or false: A 'variable' is defined as when a characteristic of interest differs in kind or degree among various observations. True False

True

A heat map is an important visualization tool that uses '_________________' to display relationships between variables. (Please enter one word for one blank.)

color

'Each measure is a numerical value that equals zero if all observations are identical and increases as the observations become more diverse'. What measure does this describe? Dispersion Shape Mode Central Location

Dispersion

True or false: Raw data without knowledge of the business context is still useful True False

False

Which of the basic guidelines should you follow when constructing or interpreting charts or graphs? Choose all that apply! Multiple select question. -Give high values for upper limits on a graph -Axes should be clearly marked and labeled -The prettiest graph should be used for a given set of data -When creating a bar chart or a histogram, each bar/rectangle should be of the same width

-Axes should be clearly marked and labeled -When creating a bar chart or a histogram, each bar/rectangle should be of the same width

List the following steps in the order they are performed to 'bin' customers into equal groups using R. Import the customer data into a data frame and label it myData With the Customers worksheet active, choose Data Mining > Transform > Transform Continuous Data > Bin We use the cut function to bin the data. The breaks argument of the cut function specifies the ranges of the bins We now create 5 equal-sized bins for DaysSinceLastReverse (recency), NumOfOrders (frequency), and Spending2018 (monetary)

1. Import the customer data into a data frame and label it myData 2. We now create 5 equal-sized bins for DaysSinceLastReverse (recency), NumOfOrders (frequency), and Spending2018 (monetary) 3. With the Customers worksheet active, choose Data Mining > Transform > Transform Continuous Data > Bin 4. We use the cut function to bin the data. The breaks argument of the cut function specifies the ranges of the bins
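The steps above use R's cut function or Analytic Solver. As an illustration in Python rather than R, here is a sketch of equal-count (quantile) binning into 5 groups; the function name and the recency values are hypothetical, and in this simple version tied values may be split across adjacent bins:

```python
def equal_count_bins(values, k=5):
    """Assign bin labels 1..k so each bin holds (roughly) the same number of observations."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices sorted by value
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n + 1  # ranks 0..n-1 map evenly onto bins 1..k
    return labels


# Hypothetical recency values (days since last purchase) for 10 customers
recency = [5, 40, 12, 90, 3, 60, 25, 8, 70, 33]
print(equal_count_bins(recency, k=5))  # [1, 4, 2, 5, 1, 4, 3, 2, 5, 3]
```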

A large lecture class has 280 students. The professor has announced that the mean score on an exam is 74 with a standard deviation of 8. The distribution of scores is bell-shaped. How many standard deviations above the mean would a score of 90 be? 3 2 1 1.5

2
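The answer follows from the z-score formula z = (x − mean) / standard deviation = (90 − 74) / 8 = 2. A small Python sketch that also applies the |z| > 3 outlier rule of thumb from earlier in this set; the function names are illustrative:

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def is_outlier(z, cutoff=3.0):
    """Common rule of thumb: flag |z| > 3 as an outlier."""
    return abs(z) > cutoff


z = z_score(90, mean=74, sd=8)
print(z)              # 2.0
print(is_outlier(z))  # False
```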

Which of the following examples violates the 'mutually exclusive' guideline for interval construction? 300 < x ≤ 400 and 400 < x ≤ 500 200 < x ≤ 300 and 301 < x ≤ 500 100 < x ≤ 200 and 201 < x ≤ 300 300 < x ≤ 400 and 401 < x ≤ 500

300 < x ≤ 400 and 400 < x ≤ 500

If a bar chart depicts the relative frequency of occupation type (Doctor, Professor, Athlete, or Actor) as a series of vertical bars, the Doctor bar has a value of .4, and 10 employed individuals responded, how many Doctors were in the group of 10? 6 4 10 40

4

According to interviews and expert estimates, analytics professionals spend from ________________ of their time in the mundane task of collecting and preparing unruly data, before analytics can be applied (The New York Times ,August 17, 2014). 70-90% 10-40% 50-80% 20-30%

50-80%

A percentile is technically a measure of location; however, it is also used as a measure of relative position because it is so easy to interpret. If you know that your raw score corresponds to the 75th percentile, then approximately what percentage of students had scores lower than yours? We have no way of knowing 74% 75% 25%

75%

A scatter plot is a simple, yet useful, graphical tool. We plot each pairing: (x1, y1), (x2, y2), and so on. Once the data are plotted, according to the textbook, the graph may reveal which of the following? (Select all that apply) Multiple select question. A linear relationship exists between the two variables A bar chart would have been a better choice No relationship exists between the two variables A nonlinear relationship exists between the two variables

A linear relationship exists between the two variables A nonlinear relationship exists between the two variables No relationship exists between the two variables

Given a sample mean x̄, a sample standard deviation s, and a relatively symmetric and bell-shaped distribution, the empirical rule states that: (Select all that apply) Multiple select question. Approximately 90% of all observations fall in the interval x̄ ± 2s Almost all observations fall in the interval x̄ ± 3s Approximately 68% of all observations fall in the interval x̄ ± s Approximately 95% of all observations fall in the interval x̄ ± 2s

Almost all observations fall in the interval x̄ ± 3s Approximately 68% of all observations fall in the interval x̄ ± s Approximately 95% of all observations fall in the interval x̄ ± 2s
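The 68%, 95%, and "almost all" (about 99.7%) figures come from the normal distribution. A quick Python check using the identity P(−k < Z < k) = erf(k/√2) and the standard library's math.erf:

```python
from math import erf, sqrt


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```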

Which 'tool' depicts the frequency or the relative frequency for each category of the categorical variable as a series of horizontal or vertical bars, the lengths of which are proportional to the values that are to be depicted. Bar chart Frequency distribution Hypothesis test Pie chart

Bar chart

'______________' is the process of transforming numerical variables into categorical variables by grouping the numerical values into a small number of groups

Binning

Examples of transforming numerical data include transforming: Select all that apply Multiple select question. Individual's date of birth to age There is no need to transform data Calculating Percentages Combining height and weight to create body mass index

Combining height and weight to create body mass index Calculating Percentages Individual's date of birth to age
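A minimal Python sketch of the three transformations in the answer; the function names and sample values are hypothetical, and the BMI formula shown is the metric one (kg / m²):

```python
from datetime import date


def age_on(birth_date, as_of):
    """Convert a date of birth into age in whole years as of a given date."""
    birthday_passed = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if birthday_passed else 1)


def bmi(weight_kg, height_m):
    """Combine weight and height into body mass index (metric formula)."""
    return weight_kg / height_m ** 2


def percent_change(old, new):
    """Express a change in raw values as a percentage of the starting value."""
    return (new - old) / old * 100


print(age_on(date(1990, 6, 15), date(2024, 6, 14)))  # 33
print(round(bmi(70, 1.75), 1))                       # 22.9
print(percent_change(200, 250))                      # 25.0
```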

True or false: When constructing a graph, the vertical axis SHOULD be stretched so that an increase (or decrease) of the data appears more pronounced than warranted. This will help prove your point more graphically. True False

False

True or false: When defining the 3 Vs which are defining characteristics of big data, 'velocity' refers to the immense amount of data compiled from a single source or a wide range of sources. True False

False

Which of the following are described as valid methods for visualizing a numerical variable? Select all that apply Multiple select question. Tree diagram Frequency distribution Decision tree Histogram

Frequency distribution Histogram

When constructing a histogram, we typically mark off the interval limits along the horizontal axis. What does the height of each bar represent? Choose all that are correct responses. Multiple select question. Relative Frequency Frequency of each interval Number of intervals The type of response

Frequency of each interval Relative Frequency

The basic structure of a SQL statement is relatively simple and usually consists of three keywords. Which of the following is a SQL keyword? Select all that apply! Multiple select question. Choose From Where Select

From Where Select
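A minimal sketch of the SELECT / FROM / WHERE structure, run against a throwaway in-memory SQLite database through Python's built-in sqlite3 module; the customer table and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database that lives only in memory
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ana", "CA"), (2, "Ben", "NY"), (3, "Cho", "CA")],
)

# SELECT picks the columns, FROM names the table, WHERE filters the rows.
rows = conn.execute(
    "SELECT name FROM customer WHERE state = 'CA' ORDER BY id"
).fetchall()
print(rows)  # [('Ana',), ('Cho',)]
conn.close()
```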

Recall that we use nominal and ordinal measurement scales to represent categorical variables. Which of the examples below represent a nominal scale representation of a categorical variable? Marital status (single, married, widowed, divorced, separated) The temperature of the resort location Performance of a manager (excellent, good, fair, poor). Profit and inventory level of a distribution center

Marital status (single, married, widowed, divorced, separated)

Which of the following is not a common approach for transforming categorical data? Dummy variables Category reduction Category scores Mathematical transformation

Mathematical transformation

Which numerical descriptive measure shows whether two numerical variables have a linear relationship? Measures of dispersion Measures of association Measures of central location Measures of shape

Measures of association

We can use numerical descriptive measures to extract meaningful information from data. Which measure gauges the underlying variability of the data? Measures of dispersion Measures of association Measures of shape Measures of central location

Measures of dispersion

True or false: After arranging the data in ascending order (smallest to largest), we calculate the median as (1) the middle value if the number of observations is odd or (2) the average of the two middle values if the number of observations is even. True False

True
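The median rule above translates directly into code. A minimal Python sketch (the standard library's statistics.median does the same job):

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    s = sorted(values)  # arrange in ascending order first
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2  # even n: average of the two middle values


print(median([7, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```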

True or false: Data in a data mart are organized using a multidimensional data model called a star schema, which includes dimension and fact tables. True False

True

True or false: In a business setting, we might use a 1:1 relationship to describe a situation where each department can have only one manager and each manager can only manage one department. True False

True

True or false: There are many standards for file formats. Two common layouts for simple text files are fixed-width and delimited format. True False

True
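A short Python sketch contrasting the two layouts; the sample records and the fixed-width column positions are hypothetical:

```python
import csv
import io

# Delimited layout: fields are separated by a delimiter character (a comma here).
delimited_text = "id,name,city\n1,Ana,Austin\n2,Ben,Boston\n"
delimited_rows = list(csv.reader(io.StringIO(delimited_text)))

# Fixed-width layout: each field occupies the same character positions on every line.
# Hypothetical layout: id in positions 0-2, name in 3-9, city from 10 to the end.
FIELDS = [("id", 0, 3), ("name", 3, 10), ("city", 10, None)]
fixed_text = "1  Ana    Austin\n2  Ben    Boston\n"
fixed_rows = [
    [line[start:end].strip() for _name, start, end in FIELDS]
    for line in fixed_text.splitlines()
]

print(delimited_rows[1:])  # [['1', 'Ana', 'Austin'], ['2', 'Ben', 'Boston']]
print(fixed_rows)          # [['1', 'Ana', 'Austin'], ['2', 'Ben', 'Boston']]
```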

A line chart displays a numerical variable as a series of data points connected by a line. Which of the following are true of line charts? Select all that apply. Multiple select question. Only one line can be plotted on a chart We can plot two lines or more on a single chart Upward trends can be shown, but not downward trends Useful for tracking changes or trends over time

Useful for tracking changes or trends over time We can plot two lines or more on a single chart

Which of the following are the very first tasks most data analysts perform to gain a better understanding and insights into the data? Select all that apply Multiple select question. Counting the data Copying the data Sorting the data Visually reviewing

Visually reviewing Sorting the data Counting the data

If we have a third variable in the data set that is categorical, we can plot the two numerical variables and then add the third categorical variable. This scatter plot is called a scatter plot with a '___________' variable.

categorical

Data '___________' is the data conversion process from one format or structure to another

transformation

The only thing that differs between a population mean and a sample mean is the notation. The population mean is referred to as: x̄ (pronounced x-bar) μ, where μ is the Greek letter mu the number of observations in a population There are many differences

μ, where μ is the Greek letter mu

"_____________", "________________" and "_______________" are the three broad categories of analytics techniques designed to extract value from data? Note: You do not need to add the word 'analytics' in your responses.

"Descriptive", "Predictive" and "Prescriptive"

An entity-relationship diagram (ERD) is a graphical representation used to illustrate the structure of the data. An '___________' is a generalized category to represent persons, places, things, or events about which we want to store data in a database table. A single occurrence of an entity is called an '______________'.

"entity" & "instance"

There are two common strategies for dealing with missing values: '_____________________' and '________________'

'omission' & 'imputation'
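A minimal Python sketch of both strategies on hypothetical survey data; mean imputation is used here only as one simple choice of substitute value:

```python
# Hypothetical survey responses; None marks a missing income value.
incomes = [52000, None, 61000, 47000, None, 58000]

# Omission: drop the observations that have missing values.
omitted = [v for v in incomes if v is not None]

# Imputation: replace each missing value with a substitute,
# here the mean of the observed values (one simple choice among several).
fill_value = sum(omitted) / len(omitted)
imputed = [fill_value if v is None else v for v in incomes]

print(omitted)     # [52000, 61000, 47000, 58000]
print(fill_value)  # 54500.0
```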

We refer to the population mean as a '_______________' and the sample mean as a '___________________________'.

'parameter' and 'statistic'

If a variable has one mode, then we say it is '__________________'. If it has two modes, then it is common to call it '_______________'.

'unimodal' and 'bimodal'
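A small Python sketch that counts modes with statistics.multimode (Python 3.8 or later); treating a data set whose values are all distinct as having no mode is a labeling choice made here, not something the standard library decides:

```python
from statistics import multimode


def modality(values):
    """Label a data set by how many modes (most frequent values) it has."""
    if len(set(values)) == len(values):
        return "no mode"  # every value occurs once (labeling choice made here)
    modes = multimode(values)
    if len(modes) == 1:
        return "unimodal"
    if len(modes) == 2:
        return "bimodal"
    return "multimodal"


print(modality([3, 3, 5, 7]))     # unimodal
print(modality([1, 1, 2, 2, 3]))  # bimodal
```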

Examples of common necessary mathematical data transformations include: Select all that apply Multiple select question. -Transformation of date values is often performed to help bring useful information out of the data -In order to analyze trend, we often transform raw data values into Percentages -A company might convert sales into happy customers or sad customers -A retail company might convert customers' birth dates into ages

-A retail company might convert customers' birth dates into ages -Transformation of date values is often performed to help bring useful information out of the data -In order to analyze trend, we often transform raw data values into Percentages

Recall Organic Food Superstore from the introductory case; In that case, an Entity Relationship Diagram (ERD) for the store illustrates three entities: CUSTOMER, ORDER, and PRODUCT. The relationship between CUSTOMER and ORDER entities is 1:M because: -A customer places many orders over time -An order can contain ONLY one product -Each order can only belong to one customer

-Each order can only belong to one customer

List the following steps in the order they are performed to 'bin' customers into equal groups using Analytic Solver. -Open the Customer data file -Change #bin for variable to 5. Choose Equal Count for the Bins to be made with option -Choose Data Mining > Transform > Transform Continuous Data > Bin Select data range $A$1:$O$201. Check the box Variable names in the first row

-Open the Customer data file -Choose Data Mining > Transform > Transform Continuous Data > Bin Select data range $A$1:$O$201. Check the box Variable names in the first row -Change #bin for variable to 5. Choose Equal Count for the Bins to be made with option

Which of the following are reasons for data professionals to learn data wrangling skills? Select all that apply Multiple select question. -Organizations will be able to make decisions more rapidly -Analytics professionals can no longer rely on the IT department to provide data -Analytics professionals are superior to all other IT professionals -Analytics professionals need broader skill sets than data mining techniques

-Organizations will be able to make decisions more rapidly -Analytics professionals can no longer rely on the IT department to provide data -Analytics professionals need broader skill sets than data mining techniques

Which of the following is true of structured data? -Point-of-sale and financial data are examples of structured data -Social media data such as Twitter, YouTube, Facebook, and blogs are examples of structured data -Most experts agree that only about 5% of all data used in business decisions are structured data. -Data does not conform to a predefined, row-column format

-Point-of-sale and financial data are examples of structured data

According to the textbook, which of the following is true about databases? Select all that apply! Multiple select question. -Popular DBMS packages include Oracle, IBM DB2, SQL Server, MySQL, and Microsoft Access -Most organizations have adopted the database approach for storing and managing data. -A database is a collection of data logically organized to enable easy retrieval, management, and distribution of data -The most common type of database used in organizations today is the digital database. -A software application for defining, manipulating, and managing data in databases is called a database management system (DBMS).

-Popular DBMS packages include Oracle, IBM DB2, SQL Server, MySQL, and Microsoft Access -A database is a collection of data logically organized to enable easy retrieval, management, and distribution of data -Most organizations have adopted the database approach for storing and managing data. -A software application for defining, manipulating, and managing data in databases is called a database management system (DBMS).

Which of the following represent reasons that sampling is used to collect data? Multiple select question. -Obtaining all the population data is difficult, if not impossible -Researchers are lazy -Researchers do not want to invest the time in collecting data -Population data is expensive

-Population data is expensive -Obtaining all the population data is difficult, if not impossible

There are a number of ways to display a heat map, but they all share one thing in common—they use color to communicate the relationships between the variables that would be harder to understand by simply inspecting the raw data. Choose all the examples below that would be a good usage for a heat map. Select all that apply Multiple select question. -Show the most-or least-frequently downloaded music genres across various music streaming platforms -Show the trend of product sales over time, such as sales in one period and then the next -Show the inventory items which need to be replenished, which items have plenty on hand inventory, and which items should be evaluated to order -Show which products are the best-or worst-selling products at various stores

-Show which products are the best-or worst-selling products at various stores -Show the most-or least-frequently downloaded music genres across various music streaming platforms -Show the inventory items which need to be replenished, which items have plenty on hand inventory, and which items should be evaluated to order

Which of the following are reasons for missing values in data? Select all that apply Multiple select question. -Respondents decline to provide the information due to its sensitive nature -Some of the questions do not apply to every respondent -There are never missing values in data -Respondents always provide all the requested information

-Some of the questions do not apply to every respondent -Respondents decline to provide the information due to its sensitive nature

For a numerical variable, instead of categories, we construct a series of intervals (sometimes called classes). We must make certain decisions about the number of intervals, as well as the width of each interval. Which of the following is a guideline for developing the intervals? Select all that apply Multiple select question. -The total number of intervals in a frequency distribution usually ranges from 5 to 20 -Interval limits are easy to recognize and interpret -Intervals are NOT mutually exclusive -Intervals are exhaustive

-The total number of intervals in a frequency distribution usually ranges from 5 to 20 -Interval limits are easy to recognize and interpret -Intervals are exhaustive

Sometimes nominal or ordinal variables come with too many categories. This presents a number of potential problems. Which of the following are potential problems highlighted in the text? Select all that apply Multiple select question. -Categories should be overlapping to complicate the analysis -If a variable has some categories that rarely occur, it is difficult to capture the impact of these categories accurately -Variables with too many categories pull down model performance -Since collecting the data is simpler, categorical data never creates difficulties in data analysis

-Variables with too many categories pull down model performance -If a variable has some categories that rarely occur, it is difficult to capture the impact of these categories accurately

We use a scatter plot to display the relationship between two numerical variables. We can expand the usage of the scatter plots to include a categorical variable. If we plot property values against square footage, then we anticipate a positive relationship between these two variables. Which of the following describes the usage of scatter plots that include a categorical variable? Select all that apply Multiple select question. -Scatter plots should only contain categorical variables -We could plot property values and square footage and use different colors to differentiate between property type -Scatter plots cannot show categorical variables so you would not use them -We can incorporate a categorical variable within the scatter plot by using different colors or symbols

-We could plot property values and square footage and use different colors to differentiate between property type -We can incorporate a categorical variable within the scatter plot by using different colors or symbols

A frequency distribution for a categorical variable groups the data into categories and records the number of observations that fall into each category. In a survey, we asked 1000 respondents which car they would purchase if they had a choice between an Audi, a Mazda, a Toyota, or a Subaru. Of the 1000 respondents, 116 chose the Audi. What is the relative frequency of Audi respondents? 116 .884 We don't have enough information! .116 (Point 116)

.116 (Point 116)

We asked 1000 respondents whether they preferred Online teaching, hybrid teaching, or attending class in person. The relative frequency of the Online teaching proponents was point 252 or (.252). How many respondents preferred online teaching? 748 252 There is no way to know We can identify the relative frequency, but not the frequency

252
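Relative frequency = frequency / total, so frequency = relative frequency × total. A Python sketch using the numbers from this question, the Audi question, and the earlier bar chart question (.4 × 10 = 4 Doctors); the function names are illustrative:

```python
def relative_frequency(count, total):
    """Share of all observations that fall in a category."""
    return count / total


def frequency(rel_freq, total):
    """Recover the count from a relative frequency (rounded to a whole number)."""
    return round(rel_freq * total)


print(relative_frequency(116, 1000))  # 0.116 (Audi)
print(frequency(0.252, 1000))         # 252 (online teaching)
print(frequency(0.4, 10))             # 4 (Doctors in the bar chart question)
```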

Which of the following statements about 'binning' is accurate? Select all that apply Multiple select question. Bins must have equal intervals Binning reduces the noise in the data Bins must be consecutive Bins must be overlapping

Bins must be consecutive Binning reduces the noise in the data

Researchers were interested in the season records of the Major League baseball teams, including the Los Angeles Dodgers, and the Cincinnati Reds Baseball team. Researchers collected the data at the end of the season. This sample data collection method is considered to be: Population data Cross-sectional data Time Series Data

Cross-sectional data

Which is true of the use of the range as a measure of dispersion? Select all that apply Multiple select question. Focuses solely on the middle observations Is not considered a good measure of dispersion Is the simplest measure of dispersion Ignores the middle observation of a variable

Is not considered a good measure of dispersion Is the simplest measure of dispersion Ignores the middle observation of a variable

Which markup language is used for transmitting human readable data in compact files? JSON XML HTML

JSON
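JSON stores data as human-readable key-value pairs and arrays. A minimal Python sketch using the standard json module; the record is hypothetical:

```python
import json

# Hypothetical record: key-value pairs, an array, and a nested object.
text = '{"customer": "Ana", "orders": [101, 102], "address": {"state": "CA"}}'
record = json.loads(text)  # parse JSON text into Python dicts and lists

print(record["orders"])            # [101, 102]
print(record["address"]["state"])  # CA
print(json.dumps(record, separators=(",", ":")))  # compact form without extra spaces
```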

Which of the measures of central location is defined as the middle value of a data set; that is, an equal number of observations lie above and below it? Range Mean Median Mode

Median

What are the three most widely used measures of central location? Select all that apply Multiple select question. Mean Range Mode Median

Mode Median Mean

Which of the following are valid shapes of a histogram? Select all that apply Multiple select question. Negatively skewed Symmetric Positively skewed Correlated

Negatively skewed Symmetric Positively skewed

'__________________' data also allows us to review the range of values for each variable.

Sorting

Which of the following 'V's are considered to be the 3 V's which are characteristics of big data? Multiple select question. Variety Volume Velocity Valuable

Variety Volume Velocity

Converting the raw data into a '______________' distribution is often a first step in making the data more manageable and easier to assess.

frequency

For a numerical variable, a '______________' distribution groups data into intervals and records the number of observations that falls into each interval.

frequency

Oftentimes, a categorical variable is defined by more than two categories. For example, the mode of transportation used to commute may be described by three categories: Public Transportation, Driving Alone, and Car Pooling. Given k categories of a variable, the general rule is to create how many dummy variables? k-1 k 1 3

k-1
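With k = 3 commuting categories, two dummy variables suffice: the omitted (reference) category is the one where both dummies are 0. Including all k dummies alongside an intercept would make them sum to 1 in every row, creating perfect multicollinearity. A Python sketch; which category serves as the reference is an arbitrary, hypothetical choice:

```python
CATEGORIES = ["Public Transportation", "Driving Alone", "Car Pooling"]
REFERENCE = "Car Pooling"  # hypothetical choice of the omitted (reference) category


def dummy_encode(value, categories=CATEGORIES, reference=REFERENCE):
    """Return k - 1 indicators (1 = observation is in that category)."""
    return [1 if value == c else 0 for c in categories if c != reference]


for mode in CATEGORIES:
    print(mode, dummy_encode(mode))
# Public Transportation [1, 0]
# Driving Alone [0, 1]
# Car Pooling [0, 0]
```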

The most popular query language used today is '_______________'. This popular query language is used for manipulating data in a relational database using relatively simple and intuitive commands.

structured query language

