Chapter 1-4


A frequency distribution for a categorical variable groups the data into categories and records the number of observations that fall into each category. In a survey, we asked 1000 respondents which car they would purchase if they had a choice between an Audi, a Mazda, a Toyota, or a Subaru. Of the 1000 respondents, 116 chose the Audi. What is the relative frequency of Audi respondents?

0.116 (116 / 1000)
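
The arithmetic behind this card can be sketched in a few lines of Python. The function names `relative_frequency` and `count_from_share` are illustrative only and do not come from the textbook.

```python
def relative_frequency(count, total):
    """Share of all observations that fall into one category."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


def count_from_share(share, total):
    """Recover the category count from a relative frequency."""
    # round() guards against floating-point noise in share * total
    return round(share * total)


audi_share = relative_frequency(116, 1000)
audi_count = count_from_share(audi_share, 1000)
print(audi_share, audi_count)
```

The same two helpers cover the reverse question, where the relative frequency is given and the count is asked for.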

In general, a boxplot is constructed as follows: Rank these steps in the correct order!

1. Plot the five-number summary values in ascending order on the horizontal axis.
2. Draw a box encompassing the first and third quartiles.
3. Draw a dashed vertical line in the box at the median.
4. Calculate the interquartile range. Draw a line from Q1 to the minimum value and from Q3 to the maximum value.
5. Use an asterisk (or other symbol) to indicate observations that are farther than 1.5 × IQR from the box. These observations are considered outliers.
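
The numbers needed for these steps can be computed as in the sketch below. It assumes Python's `statistics.quantiles` with the "inclusive" method; quartile conventions differ between textbooks and software, so the quartiles may not match a hand calculation exactly. The data set `sample` is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    # One of several quartile conventions; chosen here as an assumption
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


def boxplot_outliers(values):
    """Observations farther than 1.5 x IQR from the box."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


sample = [2, 4, 5, 7, 8, 9, 10, 12, 30]  # hypothetical observations
summary = five_number_summary(sample)
outliers = boxplot_outliers(sample)
print(summary, outliers)
```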

A large lecture class has 280 students. The professor has announced that the mean score on an exam is 74 with a standard deviation of 8. The distribution of scores is bell-shaped. How many standard deviations above the mean would a score of 90 be?

2 (reason: x̄ + 2s = 74 + (2 × 8) = 90)

We asked 1000 respondents whether they preferred online teaching, hybrid teaching, or attending class in person. The relative frequency of the online teaching proponents was 0.252. How many respondents preferred online teaching?

252

Which of the following examples violates the 'mutually exclusive' guideline for interval construction?

300 < x ≤ 400 and 400 ≤ x ≤ 500

If a bar chart depicts the relative frequency for type of occupation (with options Doctor, Professor, Athlete, or Actor) as a series of vertical bars, the Doctor bar has a value of 0.4, and 10 employed individuals responded, how many Doctors were in the group of 10?

0.4 × 10 = 4 doctors

Calculate the Mean Absolute Deviation for the following data: We have observed the age of 3 individuals in a study, where the mean age is 40. The observed ages were 31, 40, and 49. What is the MAD?

6, since MAD = (|31 − 40| + |40 − 40| + |49 − 40|) / 3 = 18 / 3 = 6
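
A small Python sketch of the calculation follows. It takes the absolute value of each deviation before summing; taking the absolute value of the summed deviations would always give zero, because deviations from the mean cancel out. The function name `mean_absolute_deviation` is illustrative only.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mean = sum(values) / n
    return sum(abs(v - mean) for v in values) / n


ages = [31, 40, 49]
mad = mean_absolute_deviation(ages)

# Signed deviations from the mean cancel, which is why each one
# is made absolute before the sum is taken.
signed_sum = sum(v - 40 for v in ages)
print(mad, signed_sum)
```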

A percentile is technically a measure of location; however, it is also used as a measure of relative position because it is so easy to interpret. If you know that your raw score corresponds to the 75th percentile, then approximately what percentage of students had scores lower than yours?

75%

A scatter plot is a simple, yet useful, graphical tool. We plot each pairing: (x1, y1), (x2, y2), and so on. Once the data are plotted, according to the textbook, the graph may reveal which of the following? (Select all that apply)

A nonlinear relationship exists between the two variables
A linear relationship exists between the two variables
No relationship exists between the two variables

When examining the relationship between two numerical variables, a scatter plot is a simple, yet useful, graphical tool. What does each point in a scatter plot represent?

A paired observation (x1, y1): one value on the x-axis and one value on the y-axis.

Which of the following statements is true of the skewness coefficient? Select all that are true.

A symmetric distribution has a skewness coefficient of zero
A positively skewed distribution has a positive skewness coefficient

The Empirical Rule provides precise statements regarding the percentage of observations that fall within a specified number of standard deviations from the mean. Which of the following is a correct statement? Select all that apply!

Approximately 68% of all observations fall within ±1 standard deviation of the mean
Approximately 95% of all observations fall within ±2 standard deviations of the mean
Almost all observations fall within ±3 standard deviations of the mean
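
The intervals behind these statements can be sketched as follows. As an illustration, the example reuses the exam figures from the earlier card (mean 74, standard deviation 8), and the rule only applies because that distribution is described as bell-shaped. The function name `empirical_rule_intervals` is illustrative only.

```python
def empirical_rule_intervals(mean, sd):
    """Map k = 1, 2, 3 to (lower, upper, approximate coverage)."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    coverage = {1: "about 68%", 2: "about 95%", 3: "almost all"}
    return {
        k: (mean - k * sd, mean + k * sd, label)
        for k, label in coverage.items()
    }


# Exam card: mean 74, standard deviation 8, bell-shaped scores
intervals = empirical_rule_intervals(74, 8)
for k, (low, high, label) in intervals.items():
    print(f"within {k} sd: {low} to {high} ({label} of scores)")
```

A score of 90 sits on the upper edge of the two-standard-deviation interval, which agrees with the z-score of 2 found for it.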

A measure of ____ quantifies the direction and strength of the linear relationship between two variables, x and y.

Association

Which of the basic guidelines should you follow when constructing or interpreting charts or graphs? Choose all that apply!

Axes should be clearly marked and labeled
When creating a bar chart or a histogram, each bar/rectangle should be of the same width

Which of the following is true of structured data?
A. Social media data such as Twitter, YouTube, Facebook, and blogs are examples of structured data
B. Point-of-sale and financial data are examples of structured data
C. Data does not conform to a predefined, row-column format
D. Most experts agree that only about 5% of all data used in business decisions are structured data

B. Point-of-sale and financial data are examples of structured data. (D is false: the figure should be about 20%, not 5%.)

Which tool depicts the frequency or the relative frequency for each category of a categorical variable as a series of horizontal or vertical bars, the lengths of which are proportional to the values being depicted?

Bar chart

A vertical bar chart is often referred to as which of the following?

Column chart

An objective numerical measure that reveals the direction of the linear relationship between two variables is called the ____.

Covariance

Which of the following is true of the covariance? Select all that are true!

Covariance can be negative, positive, or zero
The covariance is sensitive to the units of measurement
If the covariance is negative, then x and y have a negative linear relationship

Researchers were interested in the season records of Major League Baseball teams, including the Los Angeles Dodgers and the Cincinnati Reds. Researchers collected the data at the end of the season. This sample data collection method is considered to be:

Cross-sectional data

'Each measure is a numerical value that equals zero if all observations are identical and increases as the observations become more diverse'. What measure does this describe?

Dispersion

Which of the following statements is true regarding the kurtosis coefficient? Select all that are true.

Excess kurtosis is calculated as the kurtosis coefficient minus 3
A distribution that has tails that are more extreme than the normal distribution is leptokurtic
A platykurtic distribution is one that has shorter tails

True or false: When defining the 3 Vs which are defining characteristics of big data, 'velocity' refers to the immense amount of data compiled from a single source or a wide range of sources.

F (that describes volume; velocity refers to the speed at which data are generated)

T/F Data privacy evaluates moral problems related to data.

F (data ethics)

True or false: When constructing a graph, the vertical axis SHOULD be stretched so that an increase (or decrease) of the data appears more pronounced than warranted. This will help prove your point more graphically.

False

Which of the following are described as valid methods for visualizing a numerical variable?

Frequency distribution
Histogram

When constructing a histogram, we typically mark off the interval limits along the horizontal axis. What does the height of each bar represent? Choose all that are correct responses.

Frequency of each interval
Relative frequency

Which of the following is true of the correlation coefficient? Select all that are true!

If the correlation coefficient equals −1, then x and y have a perfect negative linear relationship
If the correlation coefficient equals 0, then x and y are not linearly related
The correlation coefficient is unit free
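
The unit-free property can be checked numerically against the covariance, which does depend on units. The sketch below uses the sample formulas (dividing by n − 1) and hypothetical data. Rescaling x by 100, as if converting metres to centimetres, multiplies the covariance by 100 but leaves the correlation unchanged.

```python
from math import sqrt


def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with n >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = sqrt(sample_covariance(x, x))
    sd_y = sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sd_x * sd_y)


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
x_rescaled = [100 * v for v in x]  # same quantity in smaller units

print(sample_covariance(x, y), sample_covariance(x_rescaled, y))
print(correlation(x, y), correlation(x_rescaled, y))
```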

Which is true of the use of the range as a measure of dispersion?

Ignores the middle observation of a variable
Is not considered a good measure of dispersion
Is the simplest measure of dispersion

For a numerical variable, instead of categories, we construct a series of intervals (sometimes called classes). We must make certain decisions about the number of intervals, as well as the width of each interval. Which of the following is a guideline for developing the intervals?

Interval limits are easy to recognize and interpret
Intervals are exhaustive
The total number of intervals in a frequency distribution usually ranges from 5 to 20

Which of the following is true of a stacked column chart?

It is designed to visualize more than one categorical variable, plus it allows for the comparison of composition within each category.

The interquartile range (IQR) is the difference between the third quartile and the first quartile, or, equivalently, IQR = Q3 − Q1. Which of the following is true of the interquartile range?

It is the range of the middle 50% of the variable

Examples of categorical variables include: (Select all that apply!)

Marital status
Course grade

When constructing a box plot, the first step is to use a five-number summary. What does the five-number summary contain?

Maximum value
Q1, Q2, Q3
Minimum value

What does MAD stand for, when used as a measure of dispersion?

Mean Absolute Deviation

Which numerical descriptive measure shows whether two numerical variables have a linear relationship?

Measures of association

We can use numerical descriptive measures to extract meaningful information from data. Which measure gauges the underlying variability of the data?

Measures of dispersion

Gender is an example of which measurement scale?

Nominal

Which of the following is a true statement regarding outliers in data analysis? (Choose all that apply)

Outliers may just be due to random variations
Outliers may indicate bad data due to incorrectly recorded observations
There are no universally agreed upon methods for treating outliers

Which of the following is a common graphical method that allows us to determine whether two numerical variables are related in some systematic way?

Scatter plot

There are a number of ways to display a heat map, but they all share one thing in common—they use color to communicate the relationships between the variables that would be harder to understand by simply inspecting the raw data. Choose all the examples below that would be a good usage for a heat map.

Show which products are the best- or worst-selling products at various stores
Show which inventory items need to be replenished, which have plenty on hand, and which should be evaluated for ordering
Show the most or least frequently downloaded music genres across various music streaming platforms

In a bubble plot, how is the third numerical variable represented?

Size of the bubble

Which of the following are common measures of shape?

Skewness coefficient
Kurtosis coefficient

Which of the following are valid shapes of a histogram?

Symmetric
Negatively skewed
Positively skewed

If the correlation coefficient equals -1, then x and y have a perfect negative linear relationship

T

If the correlation coefficient equals 0, then x and y are not linearly related

T

If the correlation coefficient equals 1, then x and y have a perfect positive linear relationship

T

If the covariance is negative, then x and y have a negative linear relationship

T

If the covariance is positive, then x and y have a positive linear relationship

T

If the covariance is zero, then x and y have no linear relationship

T

T/F The difference between cross-sectional and time series data is whether the data is evaluated at a single point in time or multiple points in time

T

T/F a good measure of dispersion should consider differences of all observations from the mean

T

True or false: A weakness of ordinal data is that we cannot interpret the difference between the ranked values. For example, if runners finish first, second, and third in a foot race, the difference in time between first and second place is not necessarily the same as the difference between second and third place.

T

True or false: After arranging the data in ascending order (smallest to largest), we calculate the median as (1) the middle value if the number of observations is odd or (2) the average of the two middle values if the number of observations is even.

T
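
The two cases in this rule translate directly into code. The sketch below is a plain implementation for illustration; Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)  # ascending order, smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("values must not be empty")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single middle value
        return ordered[mid]
    # Even number of observations: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([49, 31, 40]))      # odd count
print(median([31, 40, 49, 60]))  # even count
```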

True or false: z-score measures the relative location of an observation and indicates whether it is an outlier.

T

Perhaps we are interested in whether certain personality types are more prevalent among Generation X, Baby Boomer, or Millennial employees. What does the 51 in the table represent?
                Analyst   Diplomat
Generation X    51        42
Baby Boomers    35        22
Millennials     45        35

The number of Generation X employees who had personality types categorized as Analyst
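
One possible way to hold such a contingency table in code is a dictionary of dictionaries, sketched below with the counts from this card. The variable names are illustrative only. The row and column totals are not part of the card; they are computed here to show how the cells add up.

```python
# Counts copied from the contingency table in the card above
table = {
    "Generation X": {"Analyst": 51, "Diplomat": 42},
    "Baby Boomers": {"Analyst": 35, "Diplomat": 22},
    "Millennials": {"Analyst": 45, "Diplomat": 35},
}

# One cell: Generation X employees categorized as Analyst
gen_x_analysts = table["Generation X"]["Analyst"]

# Row totals: employees per generation
row_totals = {gen: sum(counts.values()) for gen, counts in table.items()}

# Column totals: employees per personality type
column_totals = {}
for counts in table.values():
    for personality, count in counts.items():
        column_totals[personality] = column_totals.get(personality, 0) + count

grand_total = sum(row_totals.values())
print(gen_x_analysts, row_totals, column_totals, grand_total)
```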

Which of the following is true of the variance and standard deviation?

The standard deviation is the positive square root of the variance
The variance is an average of the squared differences between the observations and the mean
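
These two statements can be illustrated with Python's `statistics` module. The sketch reuses the ages 31, 40, and 49 from the MAD card as example data. It also shows that the population formula divides by n while the sample formula divides by n − 1.

```python
import statistics
from math import sqrt

ages = [31, 40, 49]  # mean is 40, squared deviations sum to 162

population_variance = statistics.pvariance(ages)  # 162 / 3
sample_variance = statistics.variance(ages)       # 162 / (3 - 1)

# The standard deviation is the positive square root of the variance
sample_sd = statistics.stdev(ages)
population_sd = statistics.pstdev(ages)

print(population_variance, sample_variance)
print(sample_sd, sqrt(sample_variance))
```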

Which of the following is true of measures of association? Select all that are true.

These measures quantify the direction and strength of the linear relationship between two variables, x and y
These measures are not appropriate when the underlying relationship between the variables is nonlinear

True or false: Contingency tables and stacked column charts are two common tabular and graphical methods that help us summarize the relationship between two categorical variables.

True

A line chart displays a numerical variable as a series of data points connected by a line. Which of the following are true of line charts? Select all that apply.

Useful for tracking changes or trends over time
We can plot two lines or more on a single chart

We use a scatter plot to display the relationship between two numerical variables. We can expand the usage of the scatter plots to include a categorical variable. If we plot property values against square footage, then we anticipate a positive relationship between these two variables. Which of the following describes the usage of scatter plots that include a categorical variable?

We could plot property values and square footage and use different colors to differentiate between property types
We can incorporate a categorical variable within the scatter plot by using different colors or symbols

A ____ plot shows the relationship between three numerical variables.

bubble

If we have a third variable in the data set that is categorical, we can plot the two numerical variables and then add the third categorical variable. This scatter plot is called a scatter plot with a ____ variable.

categorical

The term ____ location relates to the way numerical data tend to cluster around some middle or central value.

central

A heat map is an important visualization tool that uses ____ to display relationships between variables. (Please enter one word for one blank.)

color

When examining the relationship between two categorical variables, a ____ table proves very useful.

contingency

The ____ coefficient describes both the direction and the strength of the linear relationship between x and y.

correlation

The 25th percentile is also referred to as the ____ quartile, the 50th percentile is referred to as the ____ quartile, and the 75th percentile is referred to as the ____ quartile.

first, second, third

Converting the raw data into a ____ distribution is often a first step in making the data more manageable and easier to assess.

frequency

For a numerical variable, a _________ distribution groups data into intervals and records the number of observations that falls into each interval.

frequency

The ____ range is the difference between the third quartile and the first quartile.

interquartile

The ____ coefficient is a summary measure that tells us whether the tails of the distribution are more or less extreme than the normal distribution.

kurtosis

A ____ chart displays a numerical variable as a series of data points connected by a line.

line

What is the primary measure of central location?

mean

Which of the measures of central location is defined as the middle value of a data set; that is, an equal number of observations lie above and below it?

median

What are the three most widely used measures of central location?

Mean, median, mode

Which of the measures of central location is defined as the observation that occurs most frequently?

mode

Because almost all observations fall within three standard deviations of the mean, it is common to treat an observation as an ____ if its z-score is more than 3 or less than −3.

outlier

Extremely large or small observations for a variable are referred to as ____.

outliers

We refer to the population mean as a ___ and the sample mean as a ___

parameter, statistic

The formula for the variance differs depending on whether we have a sample or a ____.

population

The ____ is the simplest measure of dispersion; it is the difference between the maximum and the minimum observations of a variable.

range

There are several measures of dispersion that gauge the variability of a data set. Select all of the measures below that are useful for measuring dispersion.

Range
Mean absolute deviation
Interquartile range

The ____ coefficient measures the degree to which a distribution is not symmetric about its mean.

skewness

A ____ column chart is an advanced version of the column chart that we discussed. It is designed to visualize more than one categorical variable, plus it allows for the comparison of composition within each category.

stacked

If a variable has one mode, then we say it is ____; if it has two modes, then it is common to call it ____.

unimodal, bimodal

The mean and the standard deviation of scores on an accounting exam are 74 and 8, respectively. The mean and the standard deviation of scores on a marketing exam are 78 and 10, respectively. Find the z-scores for a student who scores 90 in both classes.

z-score in the accounting class: z = (90 − 74) / 8 = 2
z-score in the marketing class: z = (90 − 78) / 10 = 1.2
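
The same comparison can be sketched in Python. The helper names `z_score` and `is_outlier` are illustrative only. The threshold of 3 follows the convention quoted in the outlier card, which treats an observation as an outlier when its z-score is above 3 or below −3.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd


def is_outlier(z, threshold=3.0):
    """Flag an observation whose z-score is beyond +/- threshold."""
    return abs(z) > threshold


accounting_z = z_score(90, mean=74, sd=8)
marketing_z = z_score(90, mean=78, sd=10)
print(accounting_z, marketing_z)
print(is_outlier(accounting_z), is_outlier(marketing_z))
```

The same raw score of 90 is relatively stronger in accounting (z = 2) than in marketing (z = 1.2), and neither score counts as an outlier.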

The only thing that differs between a population mean and a sample mean is the notation. The population mean is referred to as:

μ, where μ is the Greek letter mu

