MIS 345 EXAM 1

Which correlation coefficient suggests the strongest relationship? +0.5 +1 -0.1 0

+1
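Strength is judged by the magnitude of the coefficient, not its sign. A minimal Python sketch using the four choices from the card above (the variable names are illustrative only):

```python
# Candidate coefficients taken from the question above.
choices = [0.5, 1.0, -0.1, 0.0]

# The strongest linear relationship has the largest absolute value,
# so a coefficient of -0.9 would count as stronger than +0.5.
strongest = max(choices, key=abs)
print(strongest)
```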

Expressed in percentiles, the interquartile range is the difference between the: 10th and 60th percentiles 15th and 65th percentiles 25th and 75th percentiles 35th and 85th percentiles 20th and 70th percentiles

25th and 75th percentiles
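As a rough illustration, the interquartile range can be computed with Python's standard statistics module. The sample below is hypothetical, and quartile conventions differ slightly between tools, so other software may report somewhat different cut points for the same data.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# quantiles(n=4) returns the three quartile cut points:
# the 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4)

# The interquartile range is the 75th percentile minus the 25th.
iqr = q3 - q1
print(q1, q3, iqr)
```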

With symmetric, "bell-shaped" distributions, approximately what percent of the observations are within two standard deviations of the mean? 99.7% 100% 95% 68% 50%

95%
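The percentages in this rule of thumb come from the normal distribution. A minimal sketch, assuming the data are exactly normal (real "bell-shaped" data will only match approximately); share_within is a hypothetical helper name:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Roughly 68%, 95%, and 99.7% for one, two, and three standard deviations.
for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```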

If a value represents the 95th percentile, this means that: there is a 5% chance that this value is incorrect there is a 95% chance that this value is correct 95% of the time you will observe this value 95% of all values are above this value 95% of all values are below this value

95% of all values are below this value

A useful way of comparing the distribution of a numerical variable across categories of some categorical variable is with: a side-by-side box plot a side-by-side plot or side-by-side pivot table neither a side-by-side box plot nor side-by-side pivot table a side-by-side pivot table

?

Displaying all correlations between 0.6 and 0.999 on a scatterplot as green and all correlations between -1.0 and -0.6 as red is known as: rank-order formatting conditional formatting numerical formatting categorical formatting

conditional formatting

Which of the following are considered numerical summary measures? correlation and covariance first quartile and third quartile mean and variance covariance and variance variance and correlation

Correlation and Covariance

To examine relationships between two categorical variables, we can use: scatter plots histograms none of these choices counts and corresponding charts of the counts

Counts and corresponding charts

T/F A data set is typically a rectangular array of data, with observations in columns and variables in rows.

FALSE

T/F A relatively new aspect of business analytics is big data, which typically implies the analysis of the very large data sets that companies currently encounter.

FALSE

T/F All nominal data may be treated as ordinal data.

FALSE

T/F Correlation can be affected by the measurement scales applied to X and Y variables.

FALSE

T/F Phone numbers, Social Security numbers, and zip codes are typically treated as numerical variables.

FALSE

T/F The cross-industry standard process for data mining (CRISP-DM) framework begins with data modeling.

FALSE

T/F The cutoff for defining a large correlation is 0.5.

FALSE

T/F We cannot attempt to interpret correlations numerically, with the one possible exception of indicating whether they are positive or negative.

FALSE

The mode is best described as the: Third quartile 50th percentile Same as the average Middle observation Most frequently occurring value

Most frequently occurring value

Which statement is true for the following data values: 7, 5, 6, 4, 7, 8, and 12? Only the mean and median are equal. Only the median and mode are equal. The mean, median, and mode are all equal. Only the mean and mode are equal.

The mean, median, and mode are all equal.
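The three measures for this data set can be checked directly. A minimal Python sketch using the standard statistics module:

```python
import statistics

values = [7, 5, 6, 4, 7, 8, 12]

mean = statistics.mean(values)      # sum of 49 divided by 7 values
median = statistics.median(values)  # middle of 4, 5, 6, 7, 7, 8, 12
mode = statistics.mode(values)      # 7 is the only value that occurs twice

print(mean, median, mode)
```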

As a measure of variability, what is defined as the maximum value minus the minimum value? mean standard deviation range variance median

Range

T/F A population includes all elements or objects of interest in a study, whereas a sample is a subset of the population used to gain insights into the characteristics of the population.

TRUE

T/F Age, height, and weight are examples of numerical data.

TRUE

T/F An example of a joint category of two variables is the count of all non-drinkers who are also nonsmokers.

TRUE
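A joint category is simply one combination of values of the two categorical variables. A small sketch of counting them, using made-up survey records (the data and labels are hypothetical):

```python
from collections import Counter

# Hypothetical survey records: (drinking status, smoking status).
records = [
    ("non-drinker", "nonsmoker"),
    ("non-drinker", "smoker"),
    ("drinker", "nonsmoker"),
    ("non-drinker", "nonsmoker"),
    ("drinker", "smoker"),
]

# Each distinct pair is one joint category; Counter tallies its occurrences.
joint_counts = Counter(records)
print(joint_counts[("non-drinker", "nonsmoker")])
```

Arranging these counts in a grid, with one variable on the rows and the other on the columns, gives a crosstab.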

T/F Comparing a numerical variable across two or more subpopulations is known as a comparison problem.

TRUE

T/F Correlation is a single-number summary of a scatterplot.

TRUE

T/F Data analysis includes data description, data inference, and the search for relationships in data.

TRUE

T/F If the coefficient of correlation is r = 0.80 and the standard deviations of X and Y are 20 and 25, respectively, then Cov(X, Y) must be 400.

TRUE

T/F If the standard deviations of X and Y are 15.5 and 10.8, respectively, and the covariance of X and Y is 128.8, then the correlation coefficient is approximately 0.77.

TRUE
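Both of the preceding cards rest on the identity Corr(X, Y) = Cov(X, Y) / (SD(X) * SD(Y)). A short sketch that checks the arithmetic; the function names are hypothetical helpers, not part of any library:

```python
def covariance_from_correlation(r, sd_x, sd_y):
    """Cov(X, Y) = r * SD(X) * SD(Y)."""
    return r * sd_x * sd_y

def correlation_from_covariance(cov_xy, sd_x, sd_y):
    """r = Cov(X, Y) / (SD(X) * SD(Y))."""
    return cov_xy / (sd_x * sd_y)

# First card: r = 0.80, standard deviations 20 and 25.
cov = covariance_from_correlation(0.80, 20, 25)

# Second card: covariance 128.8, standard deviations 15.5 and 10.8.
r = correlation_from_covariance(128.8, 15.5, 10.8)

print(round(cov, 2), round(r, 2))
```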

T/F It is possible that the data points close to a curve have a correlation close to 0, because correlation is relevant only for measuring linear relationships.

TRUE

T/F Strongly related variables may have a correlation close to zero if the relationship is nonlinear.

TRUE
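A quick way to see this is to compute the correlation for points that lie exactly on a parabola. The sketch below uses a hand-written Pearson correlation (pearson_r is a hypothetical helper) and made-up data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]

# y is completely determined by x, but the relationship is a curve.
r_curve = pearson_r(xs, [x ** 2 for x in xs])

# For comparison, an exact straight-line relationship.
r_line = pearson_r(xs, [2 * x + 1 for x in xs])

print(r_curve, r_line)
```

The curved data give a correlation of essentially zero even though y depends entirely on x, while the straight line gives a correlation of 1.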

T/F The advantage that correlation has over covariance is that the former has a set lower and upper limit.

TRUE

T/F The correlation between two variables is unitless and always between -1 and +1.

TRUE
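One way to see why correlation is unitless is to change the units of one variable and recompute both measures. A minimal sketch with hypothetical height and weight data; covariance and correlation here are hand-written helpers using the sample (n - 1) convention:

```python
import math

def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: heights in meters, weights in kilograms.
heights_m = [1.5, 1.6, 1.7, 1.8]
weights_kg = [50.0, 58.0, 63.0, 75.0]

# The same heights expressed in centimeters.
heights_cm = [h * 100 for h in heights_m]

# Covariance grows by the conversion factor; correlation does not change.
print(covariance(heights_m, weights_kg), covariance(heights_cm, weights_kg))
print(correlation(heights_m, weights_kg), correlation(heights_cm, weights_kg))
```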

T/F The median of a data set with 30 values would be the average of the 15th and the 16th values when the data values are arranged in ascending order.

TRUE
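The even-count rule can be checked against a library implementation. A small sketch using 30 hypothetical values (the integers 1 through 30):

```python
import statistics

# Hypothetical data: 30 values already in ascending order.
values = list(range(1, 31))

# With an even count n, the median averages the (n/2)th and (n/2 + 1)th values.
# Python lists are zero-indexed, so those are positions n//2 - 1 and n//2.
n = len(values)
manual_median = (values[n // 2 - 1] + values[n // 2]) / 2  # 15th and 16th values

print(manual_median, statistics.median(values))
```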

T/F The scatterplot is a graphical technique used to indicate the relationship between two numerical variables.

TRUE

T/F Three important themes run through the Business Analytics: Data Analysis & Decision Making text: data analysis, decision-making, and dealing with uncertainty.

TRUE

T/F We must specify appropriate bins for side-by-side histograms in order to make fair comparisons of distributions by category.

TRUE

T/F The authors of the Business Analytics: Data Analysis & Decision Making text describe three types of models: graphical models, algebraic models, and spreadsheet models.

TRUE

Scatter plots are also referred to as: all of these choices X-Y charts crosstabs contingency charts none of these choices

X-Y charts

If the correlation of variables is close to 0, then we expect to see: an upward sloping cluster of points on the scatterplot no explanation of how the scatterplot looks based on the correlation a cluster of points with no apparent relationship on the scatterplot a cluster of points around a trendline on the scatterplot a downward sloping cluster of points on the scatterplot

a cluster of points with no apparent relationship on the scatterplot

Examples of comparison problems include: recovery rate for a disease broken down by patients who have taken a drug and patients who have taken a placebo starting salary of recent graduates broken down by academic major all of these choices salary broken down by male and female subpopulations cost of living broken down by region of a country

all of these choices

We study relationships among numerical variables using: scatterplot charts covariance all of these choices none of these choices correlation

all of these choices

A sample of a population taken at one particular point in time is categorized as: time-series cross-sectional categorical

cross-sectional

One characteristic of "paired variables" is that: each variable has a different number of observations both variables are positive values each variable has the same number of observations one variable is a negative value and the other is a positive value

each variable has the same number of observations

Tables used to display counts of a categorical variable are called: crosstabs either crosstabs or contingency tables contingency tables neither crosstabs nor contingency tables

either crosstabs or contingency tables

The limitation of covariance as a descriptive measure of association is that it only captures positive relationships does not capture the units of the variables is very sensitive to the units of the variables none of these options is invalid if one of the variables is categorical

is very sensitive to the units of the variables

Correlation is useful only for: conveying the same information in a simpler format than a scatterplot measuring the strength of a linear relationship measuring the strength of a nonlinear relationship automatically calculating covariances assessing the weakness of a linear relationship

measuring the strength of a linear relationship

The most common data format is: stacked unstacked short long

stacked

Correlation and covariance measure: none of these choices the direction of a linear relationship between two numerical variables the strength of a linear relationship between two numerical variables the strength and direction of a linear relationship between two numerical variables the strength and direction of a linear relationship between two categorical variables

the strength and direction of a linear relationship between two numerical variables

The daily closing values of the Dow Jones Industrial Average over a period of 30 days are best described as _____ data. discrete nominal cross-sectional time-series

time-series

