Data Analytics

Ensuring that data visualizations are visually accessible, leveled for the intended audience, and scaffolded with context (in other words, accessible to everyone) is an example of...

universal design

Does this ______ measure what I think it does? Do I have ______ that aligns to my ______?

variable, data, question

Information redundancy means (1) _____ and is important because (2) _____.

(1) having the same information shown with multiple different visual cues in a chart; (2) it helps organize and prioritize information on a chart

Which one of these is NOT a categorical variable?

Blood pressure measured daily: 110/90, 120/92, 115/86, 120/90

The 5 main types of data analysis are:

Descriptive, Exploratory, Inferential, Causal, Predictive

If you want to run an A/B test for a feature on a website that receives 100,000 visitors per day from the U.S.A., Canada, the United Kingdom, Australia, and South Africa, which of the following would be an appropriate sample?

Every 10th site visitor from each country until you have sampled 10,000 visitors
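
A minimal Python sketch of this sampling rule, assuming visitors arrive as (visitor_id, country) pairs in arrival order. The function name every_kth_by_country and the evenly interleaved visitor stream are hypothetical. Counting separately per country keeps each country represented in proportion to its traffic.

```python
from collections import Counter, defaultdict


def every_kth_by_country(visitors, k=10, target=10_000):
    """Systematic sample: keep every k-th visitor within each country.

    Stops once `target` visitors have been sampled in total.
    """
    seen = defaultdict(int)  # visitors observed so far, per country
    sample = []
    for visitor_id, country in visitors:
        seen[country] += 1
        if seen[country] % k == 0:
            sample.append((visitor_id, country))
            if len(sample) >= target:
                break
    return sample


# Hypothetical day of traffic: 100,000 visitors spread over five countries.
countries = ["USA", "Canada", "United Kingdom", "Australia", "South Africa"]
visitors = [(i, countries[i % len(countries)]) for i in range(100_000)]

sample = every_kth_by_country(visitors)
print(len(sample))                           # 10000
print(Counter(c for _, c in sample)["USA"])  # 2000
```

Real traffic would not be evenly interleaved by country, but the per-country counter still selects roughly one in ten visitors from each country.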

True or False: Because it is a subset of the population, a random sample will always perfectly represent the population.

False

True or False: We can always get precise summary information about the variables in a dataset just by viewing them in the dataframe.

False
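
Scrolling through the rows of a dataframe rarely gives exact figures; a summary function does. A short pandas sketch using a small hypothetical dataset (the column names and values are illustrative only):

```python
import pandas as pd

# Hypothetical dataset; in practice this would be loaded from a file.
df = pd.DataFrame({
    "age": [23, 35, 31, 47, 29, 52],
    "income": [38_000, 52_000, 61_000, 75_000, 44_000, 90_000],
})

# describe() reports count, mean, std, min, quartiles, and max per numeric column.
summary = df.describe()
print(summary)
```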

True or False? If you choose a good algorithm, you don't need to worry about the quality of the data used to create your predictive model.

False

True or False? You should always fill in missing numeric variables with the mean of the column.

False
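
Mean imputation is a choice, not a rule: when a column is skewed or has outliers, the mean can be a poor stand-in for a missing value. A pandas sketch on a hypothetical column shows how far a mean fill and a median fill can differ; dropping rows or model-based imputation are other options.

```python
import numpy as np
import pandas as pd

# Hypothetical incomes (in thousands) with one missing value and one outlier.
income = pd.Series([30, 32, 35, np.nan, 400])

# Series.mean() and Series.median() skip NaN by default.
filled_mean = income.fillna(income.mean())      # fills with 124.25
filled_median = income.fillna(income.median())  # fills with 33.5

print(filled_mean[3], filled_median[3])
```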

True or False? Conclusions drawn from a descriptive analysis with one dataset can be extended to other datasets.

False

You are reading an article about retirement ages. The article reports that the median retirement age is 65.4 years old with an interquartile range of 7.2 years. Which of the following statements about the distribution of retirement ages is true?

Half of the ages in the distribution are less than 65.4 and the other half are greater than 65.4. The middle 50% of the data spans 7.2 years.
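
Both statistics can be computed with Python's standard library. The ages below are hypothetical, and quartile conventions differ between tools, so Q1, Q3, and the IQR can vary slightly depending on the method chosen.

```python
import statistics

# Hypothetical retirement ages (years), sorted for readability.
ages = [58, 60, 62, 64, 66, 68, 70, 72, 74]

median = statistics.median(ages)
# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
# method="inclusive" is one of several quartile conventions.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1  # width of the middle 50% of the data

print(median, iqr)  # 66 8.0
```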

In which group can ALL chart types be used to visualize the distribution of hair length in a classroom?

Histogram, box plot, density curve

Given the description of the dataset, determine the type of each categorical variable.

Variable: Description
ID: Student ID
Name: Name of the person
Age: Numerical value for each student's age
Letter_Grade: Categorical value for the student's grade (A, B, C, D, or F)
Class_Subject: Subject of class (Math, Chemistry, History, etc.)

______ is an ordinal variable and ______ is a nominal variable.

Letter_Grade, Class_Subject
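
In pandas, the ordinal/nominal distinction can be encoded explicitly with pd.Categorical. This sketch uses hypothetical values for the two columns; only the ordered categorical supports order-based operations such as max() or comparisons.

```python
import pandas as pd

# Ordinal: the categories have a meaningful order (F lowest, A highest).
letter_grade = pd.Categorical(
    ["B", "A", "C", "F", "B"],
    categories=["F", "D", "C", "B", "A"],
    ordered=True,
)

# Nominal: labels with no inherent order.
class_subject = pd.Categorical(["Math", "Chemistry", "History", "Math", "History"])

print(letter_grade.max())     # A
print(letter_grade >= "B")    # element-wise comparison is allowed for ordered data
print(class_subject.ordered)  # False
```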

Which of these issues would compromise the accuracy of a dataset?

Some researchers measure distances in meters and others in yards, and they record the values without units.

Which of the following is true about the distribution shown?

The distribution may be considered skewed because it is asymmetrical with a longer tail on the left side.
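
Skewness can be quantified as well as eyeballed. A minimal sketch of the moment-based (Fisher-Pearson) skewness coefficient on a hypothetical sample with a long left tail: a negative value indicates left skew, and the mean is pulled below the median.

```python
import statistics


def skewness(values):
    """Fisher-Pearson moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical sample: most values cluster high, one value trails far left.
data = [1, 8, 9, 9, 10, 10, 10]

print(skewness(data) < 0)                               # True: left-skewed
print(statistics.mean(data) < statistics.median(data))  # True
```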

You are reading about a survey on time spent cleaning different areas of the home. Participants were asked how many hours per week they spent cleaning each room of their home. The mean and standard deviation times in hours are shown in the table. Which statement about the distributions of cleaning times is true?

The distributions for the living room and dining room have the same center value, but the distribution for the living room has a wider spread.

You are creating a report on annual car sales from a dataset that contains information on every car sold at your dealership this year. One of the variables is the color of the car, which includes categories like "gray", "white", "red", and "blue". To describe the variable for color, you want to make a table of measurements. What kind of measurements could go in your table for this kind of variable?

The frequency and proportion of each color in the dataset
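
A pandas sketch of such a table, using hypothetical sales data: value_counts() gives the frequency of each category, and normalize=True turns those counts into proportions of the total.

```python
import pandas as pd

# Hypothetical colors of cars sold.
colors = pd.Series(["gray", "white", "red", "gray", "blue", "white", "gray", "red"])

table = pd.DataFrame({
    "frequency": colors.value_counts(),                 # count per color
    "proportion": colors.value_counts(normalize=True),  # count / total
})
print(table)
```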

In an exploratory analysis of weather station data, you plotted the temperature at 9 am each day against the daily maximum temperature and found that there was a strong linear relationship between morning temperature and the daily max: as morning temperature increased, so did the maximum temperature reached that day. As a result, you can say:

The morning temperature is correlated with the maximum temperature.

Which of the following is NOT a major goal of analyzing data?

To know the future with absolute certainty

True or False: Data viz authors can and should make annotations on graphs.

True

True or False: Finding the mean number of cities for every region of a country is an example of aggregating data.

True
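
Aggregation collapses many rows into one summary value per group. A pandas sketch using a hypothetical table with one row per province:

```python
import pandas as pd

# Hypothetical table: one row per province, with its region and city count.
provinces = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "province": ["A", "B", "C", "D", "E"],
    "cities": [4, 6, 3, 5, 10],
})

# Group rows by region, then aggregate each group's city counts with the mean.
mean_cities = provinces.groupby("region")["cities"].mean()
print(mean_cities)
```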

We can go beyond correlation and assign causation with causal analysis:

When we run a carefully designed experiment, or when we use advanced techniques on observational data that meet strict assumptions.

A correlation coefficient of -0.97 tells us about the ______ relationship between two ______ variables. The negative sign indicates that low values of the first variable are associated with ______ values of the second, and high values of the first variable are associated with ______ values of the second. The distance of the coefficient from zero tells us the ______ of the relationship, which is ______ because -0.97 is very close to -1.

linear, numeric, high, low, strength, strong.
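
For reference, the standard Pearson correlation coefficient for paired values (x_i, y_i), i = 1, ..., n, with sample means x̄ and ȳ, is shown below. When high x values pair with low y values, the products in the numerator are mostly negative, so r is negative; because r always lies between -1 and 1, a value such as -0.97 indicates a strong negative linear relationship.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```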

Which of the following is NOT a requirement for data visualizations to be truthful, legible, and accessible?

appropriate complexity

Which of the following is NOT a component of vision-based accessibility for data viz?

avoiding red and green

This plot shows the ______ of the numeric variable x, which gives the possible ______ of a variable and their ______.

distribution, values, frequencies.

When data visualizations are misleading or confusing, it's usually because...

chart-makers have conscious or unconscious bias, or lack of knowledge or skill

Predictive analyses allow us to make _____ predictions about the future. Predictive analyses often use _____ techniques to identify the _____ of future outcomes.

data-driven, supervised machine learning, likelihood
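
To make "likelihood of future outcomes" concrete, here is a minimal sketch of one supervised technique, logistic regression, fit by plain gradient descent with NumPy. The data (site visits before a purchase decision) and the function names are hypothetical; in practice a library such as scikit-learn would handle fitting and validation.

```python
import numpy as np


def fit_logistic(x, y, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient descent on log-loss."""
    X = np.column_stack([np.ones(len(x)), x])  # prepend an intercept column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # current predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)       # gradient of the mean log-loss
    return w


def predict_proba(w, x):
    """Predicted probability of the positive outcome for each x."""
    X = np.column_stack([np.ones(len(x)), x])
    return 1.0 / (1.0 + np.exp(-X @ w))


# Hypothetical labeled history: visits before deciding, and whether a purchase followed.
visits = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
purchased = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)

w = fit_logistic(visits, purchased)
print(predict_proba(w, np.array([2.0, 7.0])))  # probabilities, not certainties
```

The model outputs a probability for each new case rather than a definite answer, which matches the idea that predictions describe likelihoods.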

______ coding can result in ______ categories, ______ data can contribute to ______ summaries, and ______ can make grouping similar terms difficult.

inconsistent, splitting, missing, inaccurate, typos
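
A short Python sketch of how inconsistent coding and typos split one category into several, and one way to clean them. The survey answers are hypothetical, and the typo mapping is chosen by hand for this example.

```python
from collections import Counter

# Hypothetical free-text survey answers for car color.
raw = ["Red", "red ", "RED", "Blue", "blue", "bleu", "Gray", "grey"]

print(Counter(raw))  # inconsistent coding splits "red" into three categories

# Normalize case and surrounding whitespace.
cleaned = [s.strip().lower() for s in raw]

# Typos and spelling variants still need an explicit mapping.
fixes = {"bleu": "blue", "grey": "gray"}
cleaned = [fixes.get(s, s) for s in cleaned]

print(Counter(cleaned))  # Counter({'red': 3, 'blue': 3, 'gray': 2})
```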

Categorical variables can be either _______ (ordered) or _______ (unordered).

ordinal, nominal

Extreme values that are distant from the rest of the distribution are called _______. These extreme values may influence the _______ much more heavily than the _______. A measure that is not as heavily influenced by outliers and _______ is called a _______ statistic.

outliers, mean, median, skewness, robust
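
A quick illustration with hypothetical numbers: adding one extreme value moves the mean a long way, while the median (a robust statistic) barely changes.

```python
import statistics

# Hypothetical values, then the same values plus one extreme outlier.
values = [2, 3, 3, 4, 5]
with_outlier = values + [100]

print(statistics.mean(values), statistics.median(values))              # 3.4 3
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 19.5 3.5
```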

Inferential analysis lets us draw conclusions about a _____ based on results from a _____ .

population, sample
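
A minimal sketch of the idea: estimate a population mean from one random sample and attach an approximate 95% confidence interval. The population is simulated (hypothetical), and the interval uses a normal approximation, which is reasonable for a sample of this size; a t-based interval is more appropriate for small samples. The sample mean will generally differ somewhat from the true population mean.

```python
import math
import random
import statistics

random.seed(42)  # reproducible illustration

# Hypothetical population that we normally could not measure in full.
population = [random.gauss(50, 10) for _ in range(100_000)]

# Draw one simple random sample and estimate the population mean from it.
sample = random.sample(population, 400)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error of the mean
z = statistics.NormalDist().inv_cdf(0.975)               # about 1.96
low, high = mean - z * se, mean + z * se

print(f"sample mean {mean:.2f}, approx. 95% CI ({low:.2f}, {high:.2f})")
```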

You want to explore the relationship between two numeric variables: age and height. To visually check the relationship, it would be best to make a _______ of the variables. To get a numeric summary of the linear relationship between the variables, you should also get their _______.

scatter plot, correlation coefficient

Leaning on established color associations when designing data visualizations is...

sometimes good, and sometimes bad. Color associations can pull on both helpful prior knowledge and harmful stereotypes.

Descriptive analyses describe major patterns in data through _____ and visualization of measures of _____ and spread.

summary statistics, central tendency

When considering the audience of a data visualization, it's most important to keep in mind...

the audience's reading level and familiarity with the subject matter

