118D Exam 3

If you loaded the Yelp user data into Tableau and dragged the yelping_since date field to the columns shelf, columns would by default be created for: A. Each year that users joined Yelp B. Each day that users joined Yelp C. Each day of the week that users joined Yelp D. Each month that users joined Yelp

A

What is the difference between pie and donut charts? Which is considered a better option? What chart type is a good alternative to pie charts?

A pie chart is a circle divided into wedges whose sizes correspond to each category's percentage of the whole. A donut chart is the same chart with whitespace in the center, which can be used to clarify the data; it is considered slightly better. A bar chart is a good alternative to a pie chart.

What is the "viz" level of detail? (see the whitepaper on LODs)

A key aspect of exploring data is understanding the structure of the source. For example, you may have restaurant inspection data that at the most granular level is listed by street address. You may then want to aggregate the data to view properties by zip code, city, state, or even country. In Tableau, you typically do this by dropping the dimensions you care about into your view (e.g., city, state). The data is then aggregated according to the dimensions in the view, that is, to the "viz level of detail", or Viz LOD for short.
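
The aggregation described above can be sketched in plain Python. This is only an analogy, not how Tableau works internally; the records, field names, and the aggregate helper are hypothetical. The same rows are summed at two different levels of detail depending on which dimensions are "in the view":

```python
from collections import defaultdict

# Hypothetical inspection records; the most granular level is one row per address.
inspections = [
    {"address": "1 Main St", "city": "Phoenix", "state": "AZ", "violations": 2},
    {"address": "9 Oak Ave", "city": "Phoenix", "state": "AZ", "violations": 1},
    {"address": "5 Elm Rd", "city": "Tempe", "state": "AZ", "violations": 4},
    {"address": "7 Pine St", "city": "Las Vegas", "state": "NV", "violations": 3},
]

def aggregate(rows, dimensions, measure):
    """Sum `measure` for each distinct combination of `dimensions`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

# Only "state" in the view: one mark per state.
by_state = aggregate(inspections, ["state"], "violations")

# "state" and "city" in the view: a finer level of detail, one mark per city.
by_city = aggregate(inspections, ["state", "city"], "violations")
```

Adding a dimension never changes the underlying rows; it only changes how many groups the measure is summed into.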

When doing scatter plots, what happens if you include multiple fields on the row and/or column shelves?

A matrix of scatter plots is created.

When wrangling our data in Spark, we could filter the Yelp data to include only businesses in certain states or only users who had written at least 10 reviews. Which of these could we also do in Tableau? A. We could filter both businesses by state and users based on their review count. B. We could filter businesses based on state, but NOT users based on their review count. C. We cannot filter out rows or columns of data in Tableau; all filtering must be done before loading the data. D. We could filter users based on their review count since that field is numeric, but we could not filter businesses based on state because it is a text field.

A.
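
As a rough illustration of the two filters in the question, here is a plain-Python sketch over hypothetical rows (the records and the chosen states are assumptions, not the Yelp data). One filter is on a text field and the other on a numeric field:

```python
# Hypothetical business and user rows standing in for the Yelp tables.
businesses = [
    {"name": "Cafe Uno", "state": "AZ"},
    {"name": "Dive Shop", "state": "NV"},
    {"name": "Book Nook", "state": "ON"},
]
users = [
    {"user_id": "u1", "review_count": 12},
    {"user_id": "u2", "review_count": 3},
    {"user_id": "u3", "review_count": 10},
]

# Categorical filter: keep businesses whose state is in a chosen list.
selected_states = {"AZ", "NV"}
filtered_businesses = [b for b in businesses if b["state"] in selected_states]

# Range filter on a numeric field: keep users with at least 10 reviews.
active_users = [u for u in users if u["review_count"] >= 10]
```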

In the treemap exercise, why did we use a FIXED LOD expression in calculating the gender percentage for each name? A. To calculate the percentage using data for the entire year range from 1910 to 2018, regardless of the years selected in the filter. B. To exclude names where there were too few records for that name. C. To calculate a separate gender percentage for men and for women. D. To be able to filter on gender.

A. To calculate the percentage using data for the entire year range from 1910 to 2018, regardless of the years selected in the filter.
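
The effect can be sketched in plain Python with made-up counts (the records and the female_share helper are hypothetical, and this is an analogy rather than Tableau's implementation). The "fixed" share is computed over every year, while the "view" share is computed only over the rows that survive the year filter:

```python
# Hypothetical SSA-style records: (name, gender, year, count).
records = [
    ("Jordan", "M", 1950, 90),
    ("Jordan", "F", 1950, 10),
    ("Jordan", "M", 2010, 60),
    ("Jordan", "F", 2010, 40),
]

def female_share(rows, name):
    """Fraction of babies with `name` recorded as female in `rows`."""
    female = sum(c for n, g, _, c in rows if n == name and g == "F")
    total = sum(c for n, _, _, c in rows if n == name)
    return female / total

# A year filter as it would apply to the view.
filtered = [r for r in records if r[2] >= 2000]

# FIXED-style result: uses all years, ignoring the year filter.
fixed_share = female_share(records, "Jordan")

# View-level result: changes whenever the year filter changes.
view_share = female_share(filtered, "Jordan")
```

With these numbers the fixed share is 50 of 200 and the filtered share is 40 of 100, so the two values differ as soon as the filter excludes some years.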

Which of the following is true regarding dual-axis line charts? A. A dual-axis line chart will have two measures selected in the row shelf and the second measure is identified as dual-axis. B. One of the limitations of Tableau is that it cannot generate a dual-axis line chart. C. A dual-axis line chart requires two dimensions (on the row or column shelf), and the charts will be stacked vertically. D. A dual-axis line chart requires two dimension fields and the charts will be shown side-by-side.

A. A dual-axis line chart will have two measures selected in the row shelf and the second measure is identified as dual-axis.

The chapter discusses pie and donut charts. Which of the following is true about these charts? A. Both pie and donut charts are considered poor choices, but the donut chart is slightly better. B. To enhance the storytelling capability of the donut chart, the center should be kept blank. C. Pie charts are generally considered a poor choice since people are not good at understanding angles when there are too many slices, but donut charts solve this problem. D. Pie and donut charts are generally considered a poor choice, but work best when comparing a large number of categories (more than 10).

A. Both pie and donut charts are considered poor choices, but the donut chart is slightly better.

How did you generate the images in Tableau that you embedded back in your notebook?

In Tableau: Analysis → extract file. In the notebook, use the displayHTML and showimage functions: displayHTML(showimage(File))

When you create a calculated field, how is it identified in the data pane (on the left-hand side)?

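With an equals sign (=) next to its data type icon. A calculated field that returns a number is listed as a measure.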

In comparison to Spark, a parameter in Tableau is most similar to which of the following? A. DataFrame B. Integer variable C. Temporary view D. SQL query

B. Integer variable

Which of the following is true when creating a filter? A. To be used as a filter, a field must be included in either the row shelf or the column shelf B. A range filter could not be created for a discrete dimension such as the business category (e.g., shopping vs. restaurants) C. Filters must be defined before selecting the chart type (e.g., bar vs. line chart) D. To be used in a filter, a field cannot have more than 25 different values

B. A range filter could not be created for a discrete dimension such as the business category (e.g., shopping vs. restaurants)

In creating the title for our chart in the treemap video, which of the following did we use? A. A set B. A window function C. Two parameters D. An LOD expression

B. A window function

What is the difference between discrete and continuous fields? Is the state field in the business table discrete or continuous? What about the number of compliments a user has received or the number of votes on a review?

Blue indicates that a field is discrete. Green indicates that a field is continuous. Discrete fields can be sorted but continuous fields cannot. If the field is discrete, there will be a separate header for each value on the x axis, for example separate headers for Jan, Feb, and March. The state field is discrete; numeric counts such as compliments received or votes on a review are continuous by default.

When loading Yelp data into Tableau (as a CSV file), which of the following fields would be characterized as a Dimension in the data pane? A. The average stars field from the business data B. The review count from the user data C. The name field from the business data D. The number of useful votes for each review in the review data E. The number of likes a tip received from the tip data

C. The name field from the business data

Which of the following is true regarding selecting the chart type? A. The chart type must be selected before specifying the row and column fields B. The chart type must be selected before specifying any filters C. You can explore different common chart types with your data using the "show me" panel D. The line chart is the default chart type if no type is selected

C. You can explore different common chart types with your data using the "show me" panel

When accessing Spark from Tableau using ODBC, does our Spark cluster need to be running? Does the notebook used to generate the data need to be attached, calculated, and/or the DataFrame cached?

To connect to Spark: open Tableau and, in the Connect menu, choose More... and then Databricks (a connection menu pops up). In Databricks, from the running cluster, find the JDBC/ODBC tab and copy the following into the Tableau connection menu: the Server Hostname (never changes) and the HTTP Path (new for each cluster). The Databricks username and password are also required.

Using an ODBC driver, we can connect Tableau to Databricks. What do we need to write out our DataFrame as to be able to access it from Tableau?

Create a DataFrame in the notebook to import into Tableau:
df_businessData = spark.read.json('/yelpdata/business.bz2')
print("record count:", df_businessData.count())
df_businessData.printSchema()
Then create a DataFrame with the desired fields:
df_business = df_businessData.select("business_id", "name", "address", "state", "city", "latitude", "longitude", "postal_code")
df_business.show(truncate=50)

In creating a treemap, the initial chart that Tableau creates based on the dimensions and measures selected is a: A. Area Chart B. Box and Whiskers Plot C. Scatter Plot D. Bar Chart

D. Bar Chart

The categories used in the business data are hierarchical. For example, the top-level category "Active Life" includes a number of sub-categories for different activities, and one of those is "Diving". The Diving category includes the sub-categories "Free Diving" and "Scuba Diving". If a business has "Scuba Diving" as a category, it will also always include "Diving" and "Active Life". If we wanted to show the number of businesses in each of the categories in Yelp (where each business is usually in multiple categories based on the hierarchy), which of the following chart types would be a good choice for showing the relative size of each category (size being based on the number of businesses in that category)? A. Scatter plot B. Choropleth map C. Line chart D. Treemap

D. Treemap

After loading the Yelp business data into Tableau, assume we drag the following two measures up to the row and column shelves respectively: Latitude (generated) and Longitude (generated). Which of the following best describes why the map that is shown is still blank? A. We need to drag a numeric measure field such as the number of records onto the color field of the marks card. B. No data can be displayed because some of our data is for cities outside of the United States. C. We still need to drag either the actual latitude or longitude field in the data onto the marks card. D. We have not included a geographic dimension such as the state or city in our chart yet

D. We have not included a geographic dimension such as the state or city in our chart yet

What method is used to visualize a DataFrame in Spark?

The display() method

If you want to build a packed bubble chart, what other visualization type do you create first and then convert to a packed bubble chart?

Drag a dimension to the column shelf and a measure to the row shelf, which produces a bar chart by default; then open the Show Me card and select the packed bubble chart.

If your data was displayed on a map of the world, how could you create a group for just the North American cities? Can you do this interactively?

Drag a dimension (country) to the filter shelf and choose U.S. To do it interactively, right-click the dimension pill and select "Show filter".

Calculated fields can be used in the row or column shelf, but they cannot be used in the marks card such as in the color shelf. T/F

F

If we want to create multiple visualizations of the Yelp data, we need to re-import the data into a new workbook for each chart we want to create. T/F

F

Differences between parameters and filters

Filters are specific to the data source; parameters aren't. Filters are created at the worksheet level, while parameters can be used across the entire workbook. We must use a parameter instead of a filter when trying to filter across different data sources.

What are the measures listed in italic font? Where did they come from? Were they in the data file?

Generated fields. Names in italics are generated by Tableau; they were not in the data file.

In a heat map, you can set the colors to be "full color range" or the default which maximizes contrast (so very pale to full intensity). Think about data examples in Yelp where you would want to use each of these.

Heat map; compare categorical data using color, represents values by a variable in a hierarchy. Start by placing one or more dimensions on the columns shelf and one or more dimensions on the row shelf.

How did you turn a stacked bar chart into a 100% stacked bar chart?

To turn the stacked bar chart into a 100% stacked bar chart, right-click SUM(Sales) on the Columns shelf in the Primary Setup tab, and then click Add Table Calculation. In the Table Calculation dialog box, select Percent of Total in the Calculation Type drop-down menu. Under "Summarize the values from", select Cell or Table (Across), and then click OK. Note: You can also select Table (Down) if the measure is on the Rows shelf.
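
The Percent of Total calculation amounts to dividing each segment by the total of its bar. A minimal Python sketch with hypothetical review counts (the data and the percent_of_total helper are assumptions for illustration):

```python
# Hypothetical review counts per rating band, one stacked bar per metro area.
stacks = {
    "Phoenix": {"1-2 stars": 20, "3 stars": 30, "4-5 stars": 50},
    "Toronto": {"1-2 stars": 5, "3 stars": 5, "4-5 stars": 10},
}

def percent_of_total(segments):
    """Rescale one bar's segments so that they sum to 100."""
    total = sum(segments.values())
    return {label: 100 * value / total for label, value in segments.items()}

# Every bar now has the same height (100%), so only the mix is compared.
normalized = {metro: percent_of_total(segs) for metro, segs in stacks.items()}
```

Toronto has far fewer reviews than Phoenix, but after normalizing both bars are the same height and only the proportions differ.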

What is a dual-axis line chart? How is it created?

A line chart with multiple lines. A dual-axis line chart can be created by bringing two measures to the rows shelf and then right-clicking the second measure and selecting Dual Axis from the drop-down menu.

For the display function's plot options, what do "Key", "Values", and "aggregation" do? How would they be reflected in a bar, line, or area chart?

Key would be the control variable appearing on the x axis. Values will appear on the y axis. They must be numeric, so that they can be combined using the aggregation function.
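
A plain-Python sketch of that idea follows. The rows, the AGGREGATIONS table, and the chart_points helper are hypothetical and only imitate the grouping; they are not the display function's implementation. Rows are grouped by the key, and the numeric values in each group are combined with the chosen aggregation to give one y value per x position:

```python
from collections import defaultdict

# Hypothetical (key, value) rows, e.g. state and star rating.
rows = [("AZ", 4.0), ("AZ", 3.0), ("NV", 5.0)]

AGGREGATIONS = {
    "SUM": sum,
    "AVG": lambda values: sum(values) / len(values),
    "COUNT": len,
    "MIN": min,
    "MAX": max,
}

def chart_points(rows, aggregation):
    """Return one y value per key, combining values with `aggregation`."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    combine = AGGREGATIONS[aggregation]
    return {key: combine(values) for key, values in sorted(grouped.items())}

avg_points = chart_points(rows, "AVG")
sum_points = chart_points(rows, "SUM")
count_points = chart_points(rows, "COUNT")
```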

Are charts displayed in code or markdown cells when using the display function?

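Code cells. display() is a function call, so it runs in a code cell and the chart appears in that cell's output.

Is there a required number of columns in a DataFrame for generating a chart using display?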

No

Can the result for a formula be text or does it have to be numeric?

It can be text. A calculated field can return a string, number, date, or Boolean.

When we were working with the SSA data in Tableau, the M/F values were initially interpreted as Boolean. How did we change the data type?

On the data source page, click on the "ABC" in the Gender column and check the "string" option to change the datatype from Boolean.

For the rows and column shelves, which is the x-axis in our chart, which is the y-axis?

The columns shelf is the x-axis and the rows shelf is the y-axis.

Similar to using the IF function in SQL, or the when().otherwise() function in PySpark, a calculated field can be added in Tableau that contains an IF function based on other fields in the data. T/F

T

The Yelp dataset contains 10 metro areas which are identified by state and the cities within each metro area, but not by country. We could filter data by state to include just Canadian cities, but we could also create a categorization for the Canadian cities by using the lasso tool in the map visualization. T/F

T

When creating a treemap in Tableau, the size of the box and the color of each box both represent the aggregate value for that box (such as the sum of the number of people in our name treemap). T/F

T

When you use the display method, what type of visualization is initially shown?

Table

Does the display() function take a DataFrame, a temporary view, or a JSON file as a parameter?

A DataFrame

If you recalculate a DataFrame that was used to generate a chart using the display function, does the chart need to be recalculated, or is it updated automatically?

The cells would have to be run again.

When we loaded the business data, it included latitude and longitude fields. In Tableau, these showed up as measures, but there were also generated latitude and longitude measures. How do these differ? What are the generated latitude and longitude fields based on?

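The data imported from the file are referred to as the actual Lat/Lon. The generated latitude and longitude are created by Tableau rather than read from the file: they are based on Tableau's geocoding of fields that have a geographic role, such as state or city, so they stay empty until such a dimension is in the view.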

What are the shelves on the marks card and what do they do to the visualization?

The shelves on the marks card allow one to add context and detail to the marks in the view. A user can add fields to one or more properties on the marks card to change the level of detail of the visual. The properties of the marks card are color, size, tooltip, label, detail, and shape.

When visualizing popular names, why did we need a parameter and a filter for gender?

To see either only female, only male, or both (dependent on years). Filter provided flexibility

Does Tableau infer a schema for our data when it's loaded (similar to Spark), or do we need to first define the schema? Can we change the data type of a column in our data after importing?

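Tableau infers the data types when the data is loaded, and a column's data type can be changed afterwards on the data source page. To load the table: type "default" in the schema search and select the available option. A table tab will appear; click the search symbol (no input needed), then double-click the table (in this case businessmapping).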

When creating a heat map, we can either use the default setting for color (minimum to maximum intensity) or we can select the "full color range" option. For which of the following charts based on the Yelp data would the full color range option be most appropriate? A. In order to see if those users who have more fans also write more reviews, we are creating a chart of our users that shows the number of reviews written and the number of fans. B. We want to show those categories that tend towards the extremes, either 1 or 5 stars, so we are creating a chart with the average star ratings for each top-level category by metro area. C. In order to see if businesses that are open more get more reviews, we are creating a chart that shows the average number of reviews per year for businesses based on the number of hours they are open per week in each metro area. D. We want to show those categories that tend to get the most reviews in each metro area, so we are creating a chart with the number of reviews for each top-level category by metro area.

B. We want to show those categories that tend towards the extremes, either 1 or 5 stars, so we are creating a chart with the average star ratings for each top-level category by metro area.

Do dimensions or measures get aggregated when added to the column or row shelf?

Measures. When measures are dragged to a view, they get aggregated by default; dimensions are not aggregated.

Can an IF - THEN - ELSE - END formula check multiple conditions?

Yes
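
For example, a Tableau calculation can chain conditions with ELSEIF. The sketch below shows such a formula as a comment (the field name [Stars] and the band labels are hypothetical) next to an equivalent Python function:

```python
# Tableau calculated field (hypothetical field name [Stars]):
#   IF [Stars] >= 4 THEN "High"
#   ELSEIF [Stars] >= 3 THEN "Medium"
#   ELSE "Low"
#   END

def rating_band(stars):
    """Python equivalent: conditions are checked in order, first match wins."""
    if stars >= 4:
        return "High"
    elif stars >= 3:
        return "Medium"
    else:
        return "Low"

bands = [rating_band(s) for s in (4.5, 3.0, 1.5)]
```

Because the conditions are evaluated top to bottom, the second test only needs to check the lower bound.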

Can calculated fields be used with marks (such as color or size)?

Yes

In Spark if we wanted to exclude rows for certain states in our business data, we could use Spark SQL and specify in the WHERE clause which states we wanted to include. Can we exclude data in the same way in Tableau? What feature would we use? As an example, if you had the category associated with each review, how could you exclude reviews in certain categories?

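Yes, by using a filter. Drag the state column to the filter shelf, go to the wildcard tab, enter the state in the match value box, and choose "exactly matches". Reviews in certain categories could be excluded the same way by filtering on the category field.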

Can we define a range using a parameter? If so, how?

Yes. We can first create a calculated field with the desired range. Then, use that to create a parameter or simply add it to the filter shelf.

Assume we are creating a chart for our business data and want to show the number of businesses in each of the 10 metro areas in our data. If we wanted to show each metro area as a different color, which of the following features in Tableau would we use to set the color? A. Marks card B. Pages shelf C. Columns shelf D. Show me pane E. Filter shelf

A. Marks card

a.

After running the display function, which of the following can you change in the generated chart? The type of chart, columns used from the DataFrame, the sort order, rows used, aggregation?

all

The categories in Yelp are hierarchical with 22 categories at the top. If we wanted to show how many times each category was used, which chart type would work well?

bar chart, treemap

What are the dimensions and measures in the Tableau interface?

They are the two groups of fields in the data pane. Text or date fields in our business data are categorical and become dimensions; numeric fields become measures.

Our data fields are measures and dimensions. When creating a scatter plot (not a matrix of plots), your row and column shelves should contain dimensions, measures, or both?

measures

Do fields you filter on need to be on the rows or columns shelves?

no

Where are pie charts a good option?

There are not many good reasons to use them. They can be good for visualizing parts of a whole or percentages.

Can the values entered for parameters be restricted?

yes

If shown some data, can you pick the plot options for generating different charts?

yes

When filtering, can you set ranges for discrete values, numbers, or dates?

yes

