Data Vis Quiz Until Midterm
Suppose you know you have decimal number data that ranges in value from 0 to 10. If you separate the range into five equal bins, what histogram would result from the data: 1.1, 1.2, 2.1, 2.2, 2.3, 3.1, 6.1, 8.1, 8.2? * 2, 3, 1, 3, 0 * 2, 4, 0, 1, 2 * 2, 3, 1, 1, 2 * 2, 3, 1, 0, 3
2, 4, 0, 1, 2 The five equal bins are from 0-2, 2-4, 4-6, 6-8 and 8-10. (Fortunately we don't have to worry about values on the bin boundaries.) The histogram counts the number of values in each bin. Bin 0-2 counts two values: 1.1 and 1.2. Bin 2-4 counts four values: 2.1, 2.2, 2.3, and 3.1. Bin 4-6 counts no values. Bin 6-8 counts one value: 6.1. Bin 8-10 counts two values: 8.1 and 8.2.
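The bin counting described above can be sketched in a few lines of Python. This is only an illustration; the function name `histogram` and its parameters are hypothetical, and values that land exactly on the upper edge of the range are assumed to belong to the last bin.

```python
def histogram(values, low, high, bins):
    """Count how many values fall into each of `bins` equal-width bins."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that a value equal to `high` lands in the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


data = [1.1, 1.2, 2.1, 2.2, 2.3, 3.1, 6.1, 8.1, 8.2]
counts = histogram(data, low=0, high=10, bins=5)
print(counts)  # [2, 4, 0, 1, 2]
```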
How many items can human working memory (short-term memory) typically hold? * 3-7 items * 30-70 items * 300-700 items * 3000 - 7000 items
3-7 items Our working memory can only hold 3-7 items at a time, though a single item in our working memory can be a collection of items in our long-term memory.
If you have a table with five fields and ten records, and you pivot three of the fields, how many records does the resulting table have? * 7 records * 10 records * 13 records * 30 records
30 records. When you pivot three fields, you replace each record in the previous table with three records in the new table. Each of those three records would not have the three pivoted fields, but would instead have a new field indicating the previous record's field name and a second new field indicating the previous record's field value. Since each record has been replaced by three new records, 10 records become 30 records.
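A minimal sketch of the pivot, assuming a made-up table with five fields and ten records. The field names (`id`, `region`, `a`, `b`, `c`) and the helper `pivot` are hypothetical and exist only to make the record count concrete.

```python
def pivot(records, pivot_fields, name_field="Field Name", value_field="Field Value"):
    """Replace each record with one record per pivoted field."""
    pivoted = []
    for record in records:
        # Fields that are not pivoted are copied into every new record.
        kept = {k: v for k, v in record.items() if k not in pivot_fields}
        for field in pivot_fields:
            row = dict(kept)
            row[name_field] = field           # the old field's name
            row[value_field] = record[field]  # the old field's value
            pivoted.append(row)
    return pivoted


# Hypothetical table: five fields, ten records.
records = [{"id": i, "region": "R", "a": i, "b": 2 * i, "c": 3 * i} for i in range(10)]
long_table = pivot(records, ["a", "b", "c"])

print(len(long_table))     # 30 records
print(len(long_table[0]))  # 4 fields: id, region, Field Name, Field Value
```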
Which of these is the best choice to visualize the sale of a shirt according to its shirt size? * A line chart * A scatter plot * A bar chart * A pie chart
A bar chart This is an appropriate choice for comparing a quantitative dependent variable (sales) with a discrete or nominal independent variable (shirt size).
Why is filtering, which removes information from the display, an important part of the information visualization process? * Filtering removes data points which makes the visualization render faster. * A chart can sometimes contain too many visual elements that can be overwhelming, obscuring and distracting. * Removing information from a chart does not make the chart more effective. * Filtering reduces the number of records processed which makes the visualization respond faster.
A chart can sometimes contain too many visual elements that can be overwhelming, obscuring and distracting. Indeed, filtering can improve focus on a particular subset of the data of interest, and this is an important part of the information visualization process.
Which of these best demonstrates a cartogram? * A geographical map where the size of countries is proportional to their population. * A visualization of the population of countries is visualized as the area of disks, which are packed in an arbitrary order * A visualization where the names of the countries are scaled by their population and packed in an arbitrary order * A geographical map where the color of countries is assigned according to their population
A geographical map where the size of countries is proportional to their population.
Which best characterizes a social network? * A graph with many low degree nodes and fewer high degree nodes * A graph with similar number of low degree and high degree nodes * A graph with fewer low degree nodes and many high degree nodes * A graph with only low degree nodes
A graph with many low degree nodes and fewer high degree nodes
What is the main advantage to data visualization by using a lens for zooming? * The lens zooms only a portion of the display which is faster than zooming the entire display. * The lens zooms only a portion of the data which is faster than zooming all of the data. * A lens separates nearby data items to help resolve detail while retaining the spatial context of the whole dataset. * A lens provides easy access to changing the magnification of the zoom.
A lens separates nearby data items to help resolve detail while retaining the spatial context of the whole dataset. The lens has a visible location in the display area of the chart which provides important context for the area being zoomed.
Given the following dataset, which best communicates the sales data? Date Sales 1/1/2020 89,660.85 2/14/2020 74,373.60 7/4/2020 88,642.80 10/31/2020 40,488.95 * A line chart connecting vertices plotted with a vertical quantitative continuous sales axis and a horizontal quantitative continuous date axis. * A scatter plot of four disjoint datapoints plotted with a vertical quantitative continuous sales axis and a horizontal quantitative continuous date axis. * A bar chart where the height of four evenly spaced bars (one for each date) is plotted on a quantitative continuous vertical axis. * A bar chart in three-dimensional perspective, where the height of four evenly spaced bars (one for each date) is plotted on a quantitative continuous vertical axis, plotted such that the line of sight positions the earlier data in front and the later data further away.
A line chart connecting vertices plotted with a vertical quantitative continuous sales axis and a horizontal quantitative continuous date axis. Both sales and dates are quantitative continuous. The dates are not evenly spaced, so plotting dates along a quantitative continuous axis accurately spaces them according to the duration between the sales data points. The lines connecting the data points reasonably communicate that sale dollar amounts would be continuous between the dates of the data points, and are approximated by linear interpolation. The use of three-dimensional perspective rendering is not well motivated by this data. The data points only have two values, and additional values are more effectively integrated through mark variation in 2-D than by utilizing a third dimension. Also, the perspective distortion makes it difficult to determine the relative differences of the bar top heights.
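To see why evenly spaced bars would misrepresent this dataset, the sketch below computes the gaps between the four dates and the linearly interpolated value a line chart implies between two data points. The helper `interpolate` is a hypothetical name used only for this illustration.

```python
from datetime import date

sales = [
    (date(2020, 1, 1), 89660.85),
    (date(2020, 2, 14), 74373.60),
    (date(2020, 7, 4), 88642.80),
    (date(2020, 10, 31), 40488.95),
]

# Days between consecutive data points: the spacing is far from even.
gaps = [(later[0] - earlier[0]).days for earlier, later in zip(sales, sales[1:])]
print(gaps)  # [44, 141, 119]


def interpolate(day):
    """Linearly interpolate sales for a day between two data points."""
    for (d0, s0), (d1, s1) in zip(sales, sales[1:]):
        if d0 <= day <= d1:
            t = (day - d0).days / (d1 - d0).days
            return s0 + t * (s1 - s0)
    raise ValueError("day is outside the range of the data")


# Halfway between the first two data points (22 of 44 days).
print(interpolate(date(2020, 1, 23)))
```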
Which of the following would effectively visualize these four fields: Year, Country Name, Region, Population * A table of bar charts * A scatter-plot of tables * A table of tables * A table of scatterplots
A table of bar charts For example, the rows could be [Region][Country Name] and the columns could be [Year][Population] which would produce a table pane for each Country Name and Year, and this pane would hold a single horizontal bar indicating the population of that country that year.
Evaluate dashboard primary goal and aesthetics, to make the viewer become more engaged with the dashboard and its underlying data.
An appealing data visualization helps with engagement, but is not a primary concern.
Which of these is a more perceptually accurate mapping of a quantitative value than the other options? * As the color of a bar in a bar chart * As the length of a bar in a Gantt chart * As the y coordinate in a scatter plot * As the angle in a pie chart
As the y coordinate in a scatter plot Position is better than length, angle, and color for mapping a quantitative value.
On which of these colors does the human eye have the most difficulty focusing? * Blue * Green * Red * Yellow
Blue Because of the chromatic aberration of the eye's lens, the blue end of the optical spectrum of light tends to focus off the retina. If you have sharp details that need to be displayed in a shade of blue, try to avoid pure blue hues.
A light gray box drawn on top of a dark gray background will make the light gray box appear ____. * Darker * The same as it appears on a white background * Brighter
Brighter The dark gray background will make the light gray box appear even brighter because the human visual system's lateral inhibition will detect and accentuate the difference.
Given a plot of life expectancy based on country and birth year, you look up your country and birth year, find the displayed life expectancy, and conclude you will probably live that long. This is an example of ____. * Subductive reasoning * Inductive reasoning * Abductive reasoning * Deductive reasoning
Deductive reasoning. This is an example of deductive reasoning because we are drawing the conclusion implied by the given data.
When creating an overview visualization of a large dataset, it is most important to: * Use many different colors to make it appealing to draw the viewer in to investigate further * Display only an important subset of the datapoints so as to not overwhelm the user * Pack as many details as possible into the display to be as efficient and informative as possible * Display all of the data using a simple representation and axes that spread the data out as much as possible
Display all of the data using a simple representation and axes that spread the data out as much as possible. The goal of an overview is to allow the user to get their head around all of the data, without overwhelming the user with details.
Which of these is the least important criterion when visually ordering the elements of a chart? * Displaying plotted measure values in order from smaller to larger to understand their extremes. * Listing ordinal field values in order to make them easier to find in a list. * Clustering data based on similarity of one or more fields. * Displaying field values in database record order to facilitate interactivity through more rapid access times.
Displaying field values in database record order to facilitate interactivity through more rapid access times. The speed advantage for this ordering is likely negligible, and the user may infer some importance to the ordering.
Which of these is the least important criterion when visually ordering the elements of a chart? * Listing ordinal field values in order to make them easier to find in a list. * Displaying the field values in database record order to facilitate interactivity through more rapid access times. * Clustering data based on similarity of one or more fields. * Displaying plotted measure values in order from smaller to larger to understand their extremes.
Displaying field values in database record order to facilitate interactivity through more rapid access times. The speed advantage for this ordering is likely negligible, and the user may infer some importance to the ordering.
Multidimensional scaling (MDS) and finding the shortest route among available flight connections from one airport to another.
Laying out large cities according to the distances between them is an example of MDS, but routing specifically does not figure into MDS.
When visualizing data, you should keep your eyes focused on one point for the entire duration of the visualization. * False, because your visual system will play tricks on your perception of the data. * True, because your visual system will better detect any changes to datapoints during the visualization.
False, because your visual system will play tricks on your perception of the data. Focusing on a single point causes a temporal inhibition in the light sensors and can play tricks on your perception.
Listing ordinal field values in order to make them easier to find in a list.
Hick's law says that one finds things in logarithmic time in an ordered list versus linear time in an unordered list.
Which one of the 3-D depth cues below indicates surface orientation? * Shadowing * Occlusion * Stereopsis * Illumination
Illumination Occlusion and shadowing only indicate the surface closest to the observer (or light source), and stereopsis provides relative cues of distance across an image, whereas the illumination of a surface changes based on how the surface is facing the light source (and for specular reflection, the viewer).
Evaluate importance when visually ordering the element of a chart to display plotted measure values in order from smaller to larger to understand their extremes.
Important as this can easily answer questions about smallest to largest.
Evaluate: Filtering reduces the number of records processed which makes the visualization respond faster.
Indeed speed is important for interactive data visualization, but interactivity was not described as an important part of the information visualization process.
Which of the following is NOT an important part of the "process and provenance" of interactive dynamics of visualization as outlined by Heer and Shneiderman when documenting your visualization? * Recording previous charts the user has tried. * Indicating how to change the view, e.g., from one chart type to another. * Guiding someone else through a visualization story. * Sharing a visualization online so others can see.
Indicating how to change the view, e.g., from one chart type to another. This is an aspect of view manipulation and is indeed not part of the "process and provenance" used for documenting your visualization experience.
In the Data Visualization Framework, what does the Mapping Layer do? * It associates geometry with data. * It ensures the data is in the proper format for visualization. * It is the geographical map layer underneath a layer containing a chart of locations. * It maps a user interaction into chart actions.
It associates geometry with data. The mapping layer associates appropriate geometry with corresponding data channels.
Evaluate use many different colors to make it appealing to draw the viewer in to investigate further when creating an overview visualization of a large dataset.
It is helpful for a chart, including an overview, to be appealing, but this is neither necessary nor even a high priority for effective visualization. Furthermore, adding many colors for the sake of visual appeal could create confusion and misrepresent the data.
What does "brushing" mean in a dashboard visualization? * Selecting data points by sweeping a large circular cursor over them. * Manipulating a filter in one chart to see its effect in other charts. * Filtering every other data item from view. * User selection of colors for chart elements to make them more semantically meaningful.
Manipulating a filter in one chart to see its effect in other charts. This is done through an "action" that relates the selections and filters between multiple charts.
Betweenness centrality of a node for a graph with only low degree nodes.
Not quite; a social network has both low and high degree nodes.
Evaluate zooming a scatterplot happens when you drag a new field to the filter shelf and select a strict subset of the field's values.
Not zooming; this will reduce the number of records used to generate a scatterplot but won't necessarily zoom on any particular section of the scatterplot.
Which one of the 3-D depth cues below is the strongest? * Occlusion * Shadowing * Lighting * Stereopsis
Occlusion Because if a point on object A and object B project to the same point on the image plane, the fact that you see object A and not object B at that point provides incontrovertible evidence of a depth ordering that A is closer than B.
Evaluate "process and provenance" and recording previous charts the user has tried.
Part of "process and provenance"; it documents what was tried and rejected in the visualization process, and can be useful to perhaps try again.
Which is NOT an element of vector graphics? * Vertices * Strokes * Fills * Pixels
Pixels While vector graphics are sometimes converted to a rectilinear raster graphics array of pixel color values for display on a raster device, some applications work directly from the vector graphics specification, such as a plotter or a laser display.
Which of these is the best example of providing details on demand for the purpose of information visualization? * Placing the mouse pointer over a datapoint brings up a popup window with more information about the datapoint. * Progressively adding more detail to an overview visualization at the rate of one item every 10 seconds. * Clicking on a datapoint to remove it from the chart to make room for a label. * Interviewing the intended dashboard user to determine which information is important to display.
Placing the mouse pointer over a datapoint brings up a popup window with more information about the datapoint. This is the example used to demonstrate details on demand. Another good example would be selecting a data point to fill a neighboring window with further data about the data point, or clicking on a datapoint to go to a second screen showing further data on it.
Which of these mappings is the best choice if you want to visualize a data value but you don't know if the value corresponds to a shirt's price, size, or color? * Position * Length * Color * Area
Position It's the most effective perceptual mapping for quantitative (price), ordered (size), or nominal (color) values.
Which clause of an SQL query corresponds to fields dragged onto Tableau's filter shelf? * The "Where" clause. * The "From" clause. * The "Order" clause. * No clause, the fields would follow the "Select" statement as the fields that should be queried so they can be filtered from the result.
The "Where" clause. In SQL, the "Where" clause indicates a filter for the query.
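As a rough illustration of a WHERE clause acting as a filter, the sketch below runs a query against an in-memory SQLite database using Python's standard `sqlite3` module. The table name `sales`, its columns, and the rows are all made up for this example, and this is not meant to show the exact SQL that Tableau generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", 2018, 10.0), ("West", 2018, 12.0),
     ("East", 2019, 11.0), ("West", 2020, 15.0)],
)

# The WHERE clause plays the role of a filter on the year field:
# only records that satisfy the condition are returned.
rows = conn.execute(
    "SELECT region, year, amount FROM sales WHERE year >= 2019 ORDER BY year, region"
).fetchall()
print(rows)  # [('East', 2019, 11.0), ('West', 2020, 15.0)]
conn.close()
```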
Which one of these Tableau functions is not an aggregation of a measure that projects the values of the measure along one or more dimensions into a single value? * The attribute function ATTR() * The sum function SUM() * The minimum function MIN() * The function COUNT() that counts the number of values of a measure
The attribute function ATTR() The attribute function checks to make sure that only one value for a measure is reported by the query. (However, the one value could be reported multiple times.) The attribute function is not projecting multiple values into a single value as an aggregation would, because if there were multiple different values present, the attribute function would result in the non-numeric asterisk "*" character.
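The behavior described above can be mimicked with a small Python function. This is only a sketch of the described behavior, not Tableau's implementation; the name `attr` and the choice to return `None` for an empty input are assumptions made for this example.

```python
def attr(values):
    """Return the single distinct value, or "*" if the values differ."""
    values = list(values)
    if not values:
        return None  # assumption: nothing to report for an empty input
    # One distinct value (possibly repeated) passes through unchanged;
    # several different values collapse to the asterisk marker.
    return values[0] if min(values) == max(values) else "*"


print(attr([5, 5, 5]))  # 5
print(attr([5, 6]))     # *
```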
Evaluate display only an important subset of the datapoints so as to not overwhelm the user when creating an overview visualization of a large dataset.
The goal of an overview is to first display all of the data, and later to allow the user to focus on a subset of the data.
You're given two circles of the same size. The left one is surrounded by smaller circles and the right one is surrounded by larger circles. Which appears larger? * The left one * The right one * Neither
The left one The perceptual processing of the human visual system is designed to accentuate differences, so the circle surrounded by smaller circles appears larger.
Which of these choices is the most perceptually accurate way to map a quantitative value? * The gray level of a bar in a bar chart * The area of a bar in a bar chart * The volume of a box in a 3-D bar chart * The length of a bar in a bar chart
The length of a bar in a bar chart Length is more perceptually accurate at mapping quantitative values than the other choices.
Which best characterizes the betweenness centrality of a node? * The total distance to all other nodes * The portion of all the shortest paths between any two nodes that pass through the node * The inverse of the distance to the farthest node * The number of nodes connected to a node by some path
The portion of all the shortest paths between any two nodes that pass through the node The betweenness centrality (BC) is high for a node when it is visited often along the shortest path between other nodes, and can be used to simplify a graph (by removing edges between low BC nodes) or to reveal communities (by removing high BC nodes).
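A brute-force sketch of betweenness centrality on a small undirected graph, assuming an adjacency-list dictionary as input. The function names and the five-node path graph are hypothetical, and the scores are left as raw counts rather than normalized by the number of node pairs.

```python
from collections import deque


def bfs_counts(graph, source):
    """Return shortest-path distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]  # every shortest path to u extends to w
    return dist, sigma


def betweenness(graph):
    """Sum, over node pairs, the fraction of shortest paths through each node."""
    nodes = list(graph)
    info = {s: bfs_counts(graph, s) for s in nodes}
    bc = {v: 0.0 for v in nodes}
    for i, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[i + 1:]:
            if t not in dist_s:
                continue  # s and t are not connected
            for v in nodes:
                if v == s or v == t or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v lies on a shortest s-t path when the distances add up.
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    bc[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return bc


# Hypothetical path graph a - b - c - d - e: the middle node scores highest.
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
bc = betweenness(path_graph)
print(bc)
```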
What is crossfiltering? * Filtering the cross product of two fields. * Filtering one field but observing the effects in a second correlated field. * A selection region created by dragging the mouse diagonally in the shape of a "red cross" thickened plus sign instead of the usual triangle, designed to capture more datapoints along the field axes and less along the diagonals where the fields depend more on each other. * The same filter is used for multiple charts.
The same filter is used for multiple charts. The benefit is that controlling the filter through one chart has its effects realized in a second chart.
Which of these best characterizes what principal component analysis (PCA) is used for? * To find low dimensional structure in a high dimensional dataset * To find in which dimension of a high dimensional dataset the data varies the most * To increase the dimensionality of a low dimension dataset to reveal hidden structure * To find the best way to run an elementary school
To find low dimensional structure in a high dimensional dataset PCA uses the eigenvectors corresponding to the largest eigenvalues of the covariance matrix of a high dimensional dataset to show the directions in the high dimensional space of the data where the data is varying the most.
Recall that we had a Tableau table of regions that displayed the calculated field [Regional Total], which was defined as "{Include [Year]: SUM([Value])}". The dimension [Year] was filtered to include only years from a particular decade. Thus the AVG([Regional Total]) reported the average over a decade of years of the regional total value. What would be reported by MAX({Include [Country]: AVG([Value])})? * The values would be averaged over a decade for each country in the region, and the country's decade-average value that is the largest would be returned. * The values would be averaged over all countries in the region for each year, and the largest year's value would be returned. * The values would be averaged for each year for each country in the region, and the value for the country in a year that is the largest would be returned. * The values would be averaged over the decade for each country in the world, and the country's value that is the largest would be returned.
The values would be averaged over a decade for each country in the region, and the country's decade-average value that is the largest would be returned. The simple aggregation AVG([Value]) would have returned the average value for all countries in the region for each year. The LOD expression "Include [Country]" means that the country dimension is no longer part of the AVG([Value]) aggregation, so a separate AVG() aggregation is computed for each country that computes the average value over a decade, and then the maximum of those decade averages computed for each country is returned.
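The two-step computation can be emulated in plain Python: first average per country over the filtered years, then take the maximum of those averages. The rows below are invented for a single hypothetical region and stand in for the filtered decade; this sketch only mirrors the logic described above and is not Tableau code.

```python
from collections import defaultdict

# Hypothetical rows for one region after the year filter: (country, year, value).
rows = [
    ("France", 1990, 10.0), ("France", 1991, 20.0),
    ("Spain", 1990, 30.0), ("Spain", 1991, 50.0),
    ("Italy", 1990, 5.0), ("Italy", 1991, 15.0),
]

# Step 1: a separate average for each country over the filtered years.
values_by_country = defaultdict(list)
for country, year, value in rows:
    values_by_country[country].append(value)
country_avg = {c: sum(v) / len(v) for c, v in values_by_country.items()}

# Step 2: the outer MAX keeps the largest of those per-country averages.
result = max(country_avg.values())

print(country_avg)  # {'France': 15.0, 'Spain': 40.0, 'Italy': 10.0}
print(result)       # 40.0
```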
Which best characterizes a tree map? * The visualization of a tree of quantitative values as a hierarchy of rectangles where rectangle area corresponds to data value * The simplification of a tree to its essential backbone by removing nodes with low betweenness centrality. * The regions on a graphical choropleth map shaded green to indicate forests. * The visualization of a tree where the root node is placed at the center of a circle and the leaf nodes are positioned on the circle.
The visualization of a tree of quantitative values as a hierarchy of rectangles where rectangle area corresponds to data value The rectangles corresponding to sibling nodes fit inside the rectangle corresponding to their parent node.
Evaluate "brushing" in a dashboard with filtering every other data item from view.
This could help clarify the data in a view, but is not what "brushing" means.
Evaluate "brushing" in a dashboard with selecting data points by sweeping a large circular cursor over them.
This is a handy "painting" approach for selection, but is not what "brushing" means.
Which of these best characterizes what multidimensional scaling (MDS) is used for? * To find the shortest route among available flight connections from one airport to another * To lay out nodes in a complete graph (every node connected to every other node by an edge) according to desired edge lengths * To find the directions in a high dimensional dataset where the data varies the most * To lay out nodes in a planar graph so that the edges do not cross
To lay out nodes in a complete graph (every node connected to every other node by an edge) according to desired edge lengths MDS is used to lay out points when you are given the desired distances between the points.
What is the primary goal of the layout of a visualization dashboard? * Aesthetics, to make the viewer become more engaged with the dashboard and its underlying data * To minimize the mouse interaction distance between common user input options, such as selection, menus, and buttons. * To visually organize the charts so the viewer can better find and understand the data. * Size, to ensure that the charts use every bit of space available to maximize the visual information conveyed by the dashboard
To visually organize the charts so the viewer can better find and understand the data. The goal of a visualization dashboard is understanding the data through multiple charts. An important feature alongside the layout of the dashboard is a user's ability to navigate the dashboard.
In which one of the situations below would it be most appropriate to treat date as a categorical variable instead of a quantitative continuous variable? * Total annual sales for the year in order of best year to worst year. * Total sales for the decade in order of decade. * Sales on January 1st of each year in order from 2000 to 2020. * Sales at randomly chosen dates throughout the year, in date order.
Total annual sales for the year in order of best year to worst year. Because the order is not the date order, it would be inaccurate to treat measures over the dates as continuous and one would not want to interpolate between them.
In what order does a data visualization graphics pipeline process information? * Pixel processing, then vertex processing, then rasterization * Rasterization, then vertex processing, then pixel processing * Pixel processing, then rasterization, then vertex processing * Vertex processing, then rasterization, then pixel processing * Vertex processing, then pixel processing, then rasterization * Rasterization, then pixel processing, then vertex processing
Vertex processing, then rasterization, then pixel processing The graphics pipeline accepts vector graphics primitives described as vertices, so it processes vertices first. Rasterization converts vector graphics primitives into the pixel locations used to display them on a display screen. Pixel processing is used to further process the pixels output from rasterization, e.g., to compute their individual colors.
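The three stages can be mimicked with a toy example that draws a single bar. This is a deliberately simplified sketch: the function names, the scale and offset transform, and the restriction to an axis-aligned rectangle are all assumptions made for illustration, not how a real graphics pipeline is implemented.

```python
def vertex_processing(vertices, scale, offset):
    """Stage 1: transform data-space vertices into screen-space coordinates."""
    return [(x * scale + offset[0], y * scale + offset[1]) for x, y in vertices]


def rasterize(vertices):
    """Stage 2: convert an axis-aligned rectangle into the pixels it covers."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return [
        (px, py)
        for py in range(int(min(ys)), int(max(ys)))
        for px in range(int(min(xs)), int(max(xs)))
    ]


def pixel_processing(pixels, color):
    """Stage 3: compute a color for every pixel produced by rasterization."""
    return {pixel: color for pixel in pixels}


# A bar one unit wide and three units tall, described by its four vertices.
bar = [(0, 0), (1, 0), (1, 3), (0, 3)]
screen = vertex_processing(bar, scale=2, offset=(1, 1))
pixels = rasterize(screen)
image = pixel_processing(pixels, color=(70, 130, 180))

print(screen)      # [(1, 1), (3, 1), (3, 7), (1, 7)]
print(len(image))  # 12 pixels: 2 wide by 6 tall
```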
How could zooming be considered filtering? * Zooming is a filter on the range of values of the row and column fields of a scatterplot. * Zooming a scatterplot happens when you drag a new field to the size shelf. * Zooming is not in any way filtering. * Zooming a scatterplot happens when you drag a new field to the filter shelf and select a strict subset of the field's values.
Zooming is a filter on the range of values of the row and column fields of a scatterplot. A scatterplot uses the row/column field values as the x and y positions, so limiting their range would map a subset of the elements to the chart display area and effectively zoom the scatterplot.
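Seen this way, a zoom is just a range filter on the two positional fields. The sketch below uses made-up points and a hypothetical `zoom` helper to show which records remain visible after restricting both axes.

```python
def zoom(points, x_range, y_range):
    """Keep only the points whose x and y values fall inside the given ranges."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return [
        (x, y) for x, y in points
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


points = [(1, 1), (2, 5), (4, 4), (6, 2), (9, 9)]

# Zooming to the lower-left quarter of a 0-10 by 0-10 scatterplot.
visible = zoom(points, x_range=(0, 5), y_range=(0, 5))
print(visible)  # [(1, 1), (2, 5), (4, 4)]
```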
Clustering
While not a "sort", it becomes an important visualization tool for discovering structure in a dataset.
rasterization
converts vector graphics primitives into pixel locations used to display them on a display screen
Principal component analysis
dimensionality reduction method--like multidimensional scaling
cartogram
distortion of a map of regions by data values associated with those regions
deductive reasoning
drawing the conclusion implied by the given data
occlusion and shadowing
indicates surface closest to the observer (or light source)
Graph centrality
inverse of the distance to the farthest node
Which of the following is a measure, as opposed to a dimension, in the WDI database? * indicator code * population * country code * year
population Indeed population, specifically total population, is a measure. It is a quantitative value that is reported per country per year.
stereopsis
provides relative cues of distance across an image
Isolation metric
total distance to all other nodes
Vertices
used by vector graphics to describe shapes, such as polygons
Fills
used by vector graphics to indicate the interior of a shape
Strokes
used by vector graphics to indicate the outline of a shape
pixel processing
used to further process the pixels output from rasterization, e.g., to compute their individual colors
Suppose we have data in the following table. Country Name Year France 1980 France 1990 EastGermany 1980 WestGermany 1980 Germany 1990 What would be the result of the cross product operation [Country Name] x [Year]? * {France1980, France1990, EastGermany1980, WestGermany1980, Germany1990} * {France1980, France1990, EastGermany1980, EastGermany1990, WestGermany1980, WestGermany1990, Germany1980, Germany1990} * {France19801990, EastGermany1980, WestGermany1980, Germany1990} * {France, EastGermany, WestGermany, Germany, 1980, 1990}
{France1980, France1990, EastGermany1980, EastGermany1990, WestGermany1980, WestGermany1990, Germany1980, Germany1990} Every element of one field combined with every element of the other field, regardless of whether or not a particular combination appears in a record in the table.
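The cross product can be reproduced with `itertools.product` from Python's standard library. The variable names are hypothetical, and the table is the one given in the question.

```python
from itertools import product

table = [
    ("France", 1980), ("France", 1990),
    ("EastGermany", 1980), ("WestGermany", 1980),
    ("Germany", 1990),
]

# Distinct values of each field, in order of first appearance / sorted.
countries = list(dict.fromkeys(country for country, _ in table))
years = sorted({year for _, year in table})

# Every country combined with every year, whether or not the pair is in the table.
cross = [f"{country}{year}" for country, year in product(countries, years)]
print(cross)
print(len(cross))  # 8
```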
Suppose we have data in the following table. Country Name Year France 1980 France 1990 EastGermany 1980 WestGermany 1980 Germany 1990 What would be the result of the nest operation [Country Name] / [Year]? * {France19801990, EastGermany1980, WestGermany1980, Germany1990} * {France, EastGermany, WestGermany, Germany, 1980, 1990} * {France1980, France1990, EastGermany1980, WestGermany1980, Germany1990} * {France1980, France1990, EastGermany1980, EastGermany1990, WestGermany1980, WestGermany1990, Germany1980, Germany1990}
{France1980, France1990, EastGermany1980, WestGermany1980, Germany1990} The nest operation combines the values of a pair of fields but only the field values that co-exist in the same record.
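By contrast, the nest operation only keeps combinations that actually occur together in a record. A minimal sketch, using the table from the question and hypothetical variable names:

```python
table = [
    ("France", 1980), ("France", 1990),
    ("EastGermany", 1980), ("WestGermany", 1980),
    ("Germany", 1990),
]

# Only (country, year) pairs that co-exist in a record, without duplicates.
nest = list(dict.fromkeys(f"{country}{year}" for country, year in table))
print(nest)
print(len(nest))  # 5
```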
