Visualization
What is a simple graph?
-No multi-edges -No loops
What are ways to visualize the graph?
-Node-link diagram -Adjacency matrix -Enclosure
What are the navigation techniques used for temporal partitioning?
-Panning -Rotation -Zooming: Geometric and semantic
What are the different types of fields?
-Scalar: univariate (one attribute per cell) -Vector: one vector per cell -Tensor: multi-dimensional arrays at each point
For ordered data, what are the three types of colormaps?
(1) Sequential: a homogeneous range from min to max (e.g., population per country) (2) Diverging: two sequences meeting at a common zero point (e.g., elevation above and below sea level) (3) Cyclic: values wrap around, e.g., time (hours of the day, days of the week, months of the year)
How do you construct a scatterplot matrix (SPLOM)?
-N dimensions -> N^2 scatterplots arranged in an N x N grid -Plot every pair of dimensions against each other -The matrix is symmetric about the diagonal (panel (i, j) mirrors panel (j, i))
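A minimal sketch of the construction in Python (standard library only). `splom_panels` is a hypothetical helper that only decides which pair of dimensions each panel shows; a plotting library would then draw one scatterplot per panel.

```python
from itertools import product


def splom_panels(dimensions):
    """Return the N x N grid of panels in a scatterplot matrix.

    Panel (row, col) plots dimension `col` on the x-axis against dimension
    `row` on the y-axis. Diagonal panels would pair a dimension with itself,
    so they are usually used for a label or a histogram instead.
    """
    n = len(dimensions)
    panels = {}
    for row, col in product(range(n), repeat=2):
        x_dim, y_dim = dimensions[col], dimensions[row]
        panels[(row, col)] = "diagonal" if row == col else (x_dim, y_dim)
    return panels


panels = splom_panels(["height", "weight", "age"])
print(len(panels))     # 9 panels for 3 dimensions
print(panels[(0, 1)])  # ('weight', 'height')
print(panels[(1, 0)])  # ('height', 'weight'): the mirror image across the diagonal
```

Because panel (i, j) is panel (j, i) with its axes swapped, some SPLOMs draw only the upper or lower triangle.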
How do we deal with large datasets?
-Temporal partitioning -Spatial partitioning -Aggregation and filtering
What is Ames Room and what does it show us?
A distorted room that creates the optical illusion that a person standing in one corner is much bigger than a person standing in the other. It shows us that what we see depends on our expectations and that our relative judgement is stronger than our absolute judgement.
What is a directed graph?
A graph whose edges have a direction, so it distinguishes A -> B from B -> A
What is a hypergraph?
A graph with edges connecting any number of vertices
What is a tree?
A connected graph with no cycles
What are force directed layouts?
A physical model where edges act as springs and vertices repel each other like magnets (think of the spring layout in networkx) Aim: to reach an equilibrium where the forces on each vertex balance out, i.e., the total energy of the system is minimized
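A minimal spring-embedder sketch in plain Python, in the spirit of Fruchterman-Reingold: every pair of vertices repels with force k^2/d, every edge pulls its endpoints together with d^2/k, and each step moves vertices along their net force, capped by a cooling "temperature". The function name, the `step` damping factor, and all default values are assumptions for illustration; networkx's `spring_layout` is a fuller implementation.

```python
import math
import random


def spring_layout(nodes, edges, k=1.0, step=0.1, iterations=200, seed=0):
    """Simplified spring embedder in the style of Fruchterman-Reingold.

    `nodes` is a list of ids, `edges` a list of (u, v) pairs, and `k` the
    ideal edge length. Returns {node: (x, y)}.
    """
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for v in nodes}

    def add_pair_force(force, u, v, magnitude_of):
        # Positive magnitudes push u and v apart; negative ones pull them together.
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        d = math.hypot(dx, dy) or 1e-9
        m = magnitude_of(d)
        force[u][0] += m * dx / d
        force[u][1] += m * dy / d
        force[v][0] -= m * dx / d
        force[v][1] -= m * dy / d

    for it in range(iterations):
        # The maximum move per iteration "cools" linearly toward zero.
        temperature = 0.5 * (1 - it / iterations)
        force = {v: [0.0, 0.0] for v in nodes}
        # Every pair of vertices repels like charged particles: k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                add_pair_force(force, u, v, lambda d: k * k / d)
        # Every edge acts as a spring pulling its endpoints together: d^2 / k.
        for u, v in edges:
            add_pair_force(force, u, v, lambda d: -d * d / k)
        # Move each vertex a fraction of its net force, capped by the temperature.
        for v in nodes:
            fx, fy = step * force[v][0], step * force[v][1]
            length = math.hypot(fx, fy)
            if length > temperature:
                fx, fy = fx * temperature / length, fy * temperature / length
            pos[v][0] += fx
            pos[v][1] += fy
    return {v: (p[0], p[1]) for v, p in pos.items()}


layout = spring_layout(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
for node, (x, y) in layout.items():
    print(node, round(x, 2), round(y, 2))
```

With only two connected vertices, the repulsion and the spring force balance exactly when their distance equals `k`, which is a quick way to see the equilibrium idea.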
What is a network/graph composed of?
A set of vertices and a set of edges connecting the vertices
What are the different actions to achieve different targets?
Actions: Analyze, search, query Targets: All data, attributes, network data, spatial data
What are the three ways views can share data?
All data shared: same data, different encodings Overview + detail: one view shows the entire dataset, and the other shows a subset the user selects in more detail Small multiples: each view shows a different partition of the data with the same encoding
What are fields?
Attribute values associated with cells (which contain data from a continuous domain)
For parallel coordinates, how are attributes and items represented?
Axes represent the attributes and lines connecting them represent the items
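A sketch of how items become lines, assuming each attribute axis is scaled independently to [0, 1] with min-max normalization (one common choice, not the only one). The car data and the helper name are hypothetical.

```python
def parallel_coordinates(items, attributes):
    """Map each item to a polyline with one (axis index, height) vertex per attribute."""
    lows = {a: min(item[a] for item in items) for a in attributes}
    highs = {a: max(item[a] for item in items) for a in attributes}
    polylines = []
    for item in items:
        line = []
        for x, a in enumerate(attributes):
            span = highs[a] - lows[a] or 1  # avoid dividing by zero on a constant axis
            line.append((x, (item[a] - lows[a]) / span))
        polylines.append(line)
    return polylines


cars = [
    {"mpg": 30, "horsepower": 90, "weight": 2000},
    {"mpg": 15, "horsepower": 200, "weight": 4000},
    {"mpg": 20, "horsepower": 150, "weight": 3000},
]
lines = parallel_coordinates(cars, ["mpg", "horsepower", "weight"])
print(lines[0])  # [(0, 1.0), (1, 0.0), (2, 0.0)]
```

Each axis is one attribute (x = 0, 1, 2) and each car is one line crossing all three axes.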
Name a graph that can compare two or more attributes
Bar chart: for categorical data Line chart: for showing trends over time
What are marks?
Basic geometric elements in an image -points, lines, areas
What dimension of color should you use when encoding order?
Brightness or saturation
What is presentation?
Showing something you already know
What are attribute types?
Categorical Ordered
What are channels?
Ways to change the appearance of marks based on attribute values -Position (horizontal, vertical, both), color, shape, tilt, size (length, area, volume)
What are the operations for categorical data?
Equality comparisons only (same or different); there is no implicit order
What are the three types of map projections?
Cylinder projection: allows the entire surface to be visible at once Plane projection: the tangent point becomes the center point of the projection Cone projection: latitudes = circular arcs around the projection, longitudes = straight lines
What is the data-ink ratio?
Data-ink ratio = Data ink / Total ink used in the graphic
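Treating "ink" as any measurable quantity (e.g., the number of drawn pixels), the ratio is a simple division; the helper and the numbers below are hypothetical.

```python
def data_ink_ratio(data_ink, total_ink):
    """Tufte's data-ink ratio: the share of all ink that presents data."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    return data_ink / total_ink


# Hypothetical chart: the bars take 60 units of ink; gridlines, a heavy
# border, and a background pattern take another 40.
print(data_ink_ratio(60, 100))  # 0.6
# Erasing the 40 units of non-data ink pushes the ratio to its maximum of 1.0.
print(data_ink_ratio(60, 60))   # 1.0
```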
What is the difference between a data model and a conceptual model?
Data model: low-level description of data (e.g., floats) Conceptual model: adds a mental construction on top of the data model (includes semantics, supports reasoning)
What are the big 4 Vs of data?
Data... Volume Veracity Velocity Variety
What are Tufte's design principles?
Design should be... -Clear and detailed, with proper labelling and scales -The size of the effect shown in the graphic should be proportional to the numerical quantities -We should maximize the data-ink ratio -Chart junk should be avoided (extra visual elements that don't add to the message are a distraction)
How should you deal with overplotting (edge clutter) in node-link diagrams?
Edge bundling: edges that link the same group of nodes are tied together
What is the geometry dataset type?
Items with explicit spatial positions (shapes)
What is the purpose of visualization?
Exploration, confirmation, presentation
What are the two types of table?
Flat table: One item per row, each column is an attribute, unique (implicit) key, no duplicates Multidimensional table: Index based on multiple keys
What is Anscombe's Quartet?
Four datasets of (x, y) pairs with nearly identical summary statistics (mean, variance, correlation, and linear regression line), yet they look completely different when plotted Importance: Shows us what we would miss if we didn't visualize
What is hierarchical edge bundling?
Instead of drawing a straight line from the start node to the end node, we draw a curve that follows the path between them through the hierarchy
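A sketch of the core idea, assuming the hierarchy is stored as a child-to-parent dictionary: the path from one leaf up to the lowest common ancestor and back down to the other leaf gives the control points of the curve. Fitting a smooth curve (e.g., a B-spline) through these points is left out, and all names are hypothetical.

```python
def path_to_root(node, parent):
    """List `node` followed by its ancestors, ending at the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def bundling_control_points(start, end, parent):
    """Path from `start` up to the lowest common ancestor, then down to `end`."""
    up = path_to_root(start, parent)
    down = path_to_root(end, parent)
    ancestors_of_end = set(down)
    lca = next(n for n in up if n in ancestors_of_end)
    return up[: up.index(lca) + 1] + list(reversed(down[: down.index(lca)]))


# Hypothetical hierarchy: root -> {A, B}, A -> {a1, a2}, B -> {b1}
parent = {"A": "root", "B": "root", "a1": "A", "a2": "A", "b1": "B"}
print(bundling_control_points("a1", "b1", parent))  # ['a1', 'A', 'root', 'B', 'b1']
print(bundling_control_points("a1", "a2", parent))  # ['a1', 'A', 'a2']
```

Edges between leaves of the same two branches share most of their control points, which is what pulls them into a bundle.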
What are data types?
Fundamental units/building blocks of dataset types
What is the difference between general purpose tools and narrow tools?
General purpose (e.g., Tableau): -Flexible, can handle a wide range of data -Cannot be used to solve complex domain-specific problems Narrow tool: -Designed for specific contexts and datasets -The designer makes a lot of choices that the user cannot override
What is geometric vs semantic zooming?
Geometric: just makes things bigger Semantic: Adds details as it zooms
How does enclosure in treemaps work?
Given a hierarchy, each node is drawn as a region that encloses the regions of all of its children, so everything within a branch is nested inside it
Name some graphs that help convey many attributes
Glyphs: encode multiple attributes in one object or symbol Heat map: matrix data--each cell is a pixel whose value is encoded with color --strength: it scales well to large matrices, and it's good for homogeneous data --tip: heat map data must have a meaningful order, otherwise it cannot be encoded this way
What is great about the shape channel?
Good for many classes and doesn't require grouping/ordering
What is an adjacency matrix good for?
Good for neighborhood-related tasks (NOT for path-related tasks)
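A small sketch (hypothetical graph and helper names) of why the matrix suits neighborhood tasks: a node's neighbors are simply one row of the matrix. Following a path, in contrast, means hopping from row to row, which is hard to do by eye.

```python
def adjacency_matrix(nodes, edges):
    """Undirected adjacency matrix: 1 where two nodes are linked, else 0."""
    index = {v: i for i, v in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # undirected: mirror across the diagonal
    return matrix


def neighbors(node, nodes, matrix):
    """Neighborhood task: read off a single row of the matrix."""
    row = matrix[nodes.index(node)]
    return [v for v, linked in zip(nodes, row) if linked]


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("C", "D")]
m = adjacency_matrix(nodes, edges)
print(neighbors("A", nodes, m))  # ['B', 'C']
# The path A -> D needs two lookups (row A, then row C): a path task.
```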
Name some graphs to show data distribution
Histogram Density plot Box plot (this is a good one because it's very compact) Violin plot (box plot + probability density function)
What are the dimensions of color?
Hue Saturation Brightness
What are the Gestalt Laws? Name a few; which is the strongest?
In summary: Humans perceive elements as organized wholes rather than as separate features (the whole is greater than the sum of its parts) A few: Proximity, containment, connection, similarity, continuity, common fate, closure Connection is the STRONGEST
Why do we depend on vision?
It has higher bandwidth than the other senses, and it processes a lot of information in parallel
What is great (and not so great) about the color channel?
Hue is good for categorical data BUT it doesn't work well for quantitative data because it has no implicit order, and it only works well when you have a limited number of categories
What is great about the value/luminance/saturation channel?
It's good for ordered data (it has an implicit order), but it is only somewhat quantitative; when precise quantities matter, length/area works better
What is overplotting? How do you address this?
It's when you have too many items plotted on top of each other Solution: decrease opacity, sample your data, change the scale
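Two of the remedies above, sketched in plain Python: random sampling and a log scale. Opacity is usually a parameter of the plotting call itself (for example, matplotlib's `alpha`), so it is not shown. The helper names and the synthetic data are assumptions.

```python
import math
import random


def sample_points(points, max_points, seed=0):
    """Random sampling: keep at most `max_points` points so fewer marks overlap."""
    if len(points) <= max_points:
        return list(points)
    return random.Random(seed).sample(points, max_points)


def log_scale(values):
    """Change the scale: log10 (positive values only) keeps a few huge values
    from squeezing all the others into one corner of the plot."""
    return [math.log10(v) for v in values]


rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]
subset = sample_points(points, 2_000)
print(len(subset))  # 2000
print(log_scale([1, 10, 100, 1000]))
```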
What are the 5 data types? Describe them.
Items: Discrete individual entities Attributes: Measured/observed/logged properties Links: Relationships between items Positions: Spatial data providing location in the 2D/3D space Grids: Sampling strategies for continuous data
What is the difference between bar charts and line charts?
Lines imply connections (a trend) between adjacent items, so you cannot use them for categorical data; use bars instead
What is linking and brushing?
Linking: coordination between the views in multiple coordinated views (MCV) Brushing: selecting a group of data points in one view to see the effect (highlighting) across the other views
What is low-level vision driven by? What drives attention?
Low-level vision: Object features Attention: Existing knowledge, expectations, and goals
What are the two types of channels? What is the most effective channel?
Magnitude channels (how much?) -Ordinal/Quantitative: Position, length, saturation Identity channels (what/where) -Categorical data: shape, color (hue), spatial region Position is the MOST EFFECTIVE
What is a Phrase Net?
Network diagrams of a text that link words occurring together in a chosen phrase pattern (e.g., "X and Y")
What is perceptual hysteresis?
Once you see something, you cannot un-see it
What is an isarithmic map?
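One that shows a continuous field (e.g., elevation or temperature) with isolines or continuously varying color that blends smoothly, rather than coloring fixed regions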
What are the operations for ordered data?
Ordinal: greater than / less than comparisons Quantitative: --Interval: arbitrary zero point, so ratios cannot be compared directly; only differences can be compared, e.g., dates, locations, temperature in °C --Ratio: absolute zero point (meaning there is nothing to be measured), so ratios/proportions can be compared, e.g., length, mass
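A quick arithmetic check of the interval/ratio distinction, assuming the interval-scale temperature is in degrees Celsius and converting to Kelvin, which has an absolute zero:

```python
def celsius_to_kelvin(c):
    """Kelvin is a ratio scale: 0 K means no thermal energy at all."""
    return c + 273.15


warm, cool = 20.0, 10.0  # degrees Celsius (interval scale)
print(warm - cool)  # 10.0: differences are meaningful
print(warm / cool)  # 2.0, but 20 °C is NOT "twice as hot": the zero point is arbitrary
print(celsius_to_kelvin(warm) / celsius_to_kelvin(cool))  # about 1.035 on the Kelvin ratio scale
```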
What are the different types of preattentive features? Which is the most distracting?
Orientation, length, closure, size, curvature, density, hue, flicker, direction of motion Flicker is most distracting and grabs the most attention
What is the difference between perception and cognition?
Perception: identification and interpretation of sensory information from a physical stimulus; it requires no conscious effort Cognition: processing information, applying knowledge, and drawing conclusions Think: Hearing vs understanding what someone is saying
Name some graphs that can show data composition
Pie chart: encodes data using angle and area Stacked bar chart: encodes data using the length of each bar segment
What is a dot map?
Places a dot for every data item. No matter what data you have, this will result in a population map. Use it wisely.
For each of the channels, name their characteristics
(S = selective, A = associative, Q = quantitative, O = order, L = length, i.e., number of distinguishable values) Position: S: Yes, A: Yes, Q: Yes, O: Yes, L: fairly big; Length and area: S: Yes, A: Yes, Q: Yes, O: Yes, L: high; Shape: S: Yes, A: limited, Q: No, O: No, L: vast; Value/luminance/saturation: S: Yes, A: Yes, Q: Somewhat, O: Yes, L: limited; Color (hue): S: Yes, A: Yes, Q: No, O: No, L: limited
What is Mercator projection?
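Projection onto a cylinder wrapped around the globe such that angles are preserved: meridians and parallels become straight lines crossing at right angles, and any line of constant compass bearing is also straight; the cost is that areas are increasingly exaggerated toward the poles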
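The spherical Mercator formulas can be sketched as follows (angles in degrees, sphere radius `radius`; the helper name is hypothetical). The y-coordinate grows without bound toward the poles, so the poles themselves cannot be shown.

```python
import math


def mercator(lon_deg, lat_deg, radius=1.0):
    """Spherical Mercator: x = R * lon, y = R * ln(tan(pi/4 + lat/2)), angles in radians."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


# The local scale factor is 1 / cos(latitude): areas far from the equator are inflated.
for lat in (0, 30, 60, 80):
    print(lat, round(mercator(0, lat)[1], 3), round(1 / math.cos(math.radians(lat)), 2))
```

Because the map stretches east-west and north-south by the same factor at each latitude, local angles survive while areas do not.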
What are the 'semantics' of data?
Real world meaning of the data
What are the three spatial axis orientations?
Rectilinear layout: scatterplots, bar charts Parallel layout: parallel coordinates Radial layout: radial bar chart
What is the most common type of color blindness?
Red-green blindness
Why do we use computers in the visualization process?
Scale (easier than drawing by hand, interactions let us 'drill down' into the data, we can use algorithms) Efficiency (we reuse charts for different datasets) Quality (gives us precise data-driven rendering) Story-telling (scripted work)
Name a graph that can show the relationship between two or more attributes
Scatterplot: uses spatial position to encode values
What are the characteristics of channels?
Selective: Can we differentiate between two marks? Associative: Does it support grouping? Quantitative: Can we quantify the difference between two marks? Order: Can we perceive an order? Length: How many distinguishable values can the channel show?
What are the major design choices in multi-coordinated views?
Share encoding: Do the views use the same visual encoding? Share data: How much data is shared between the views? Share navigation: Does navigating in one view (e.g., panning or zooming) also move the others?
What is the lie factor?
Lie factor = Size of effect shown in graphic / Size of effect in data
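A worked sketch, assuming "size of effect" means the relative change between two values (as in Tufte's definition); the function name and the numbers are hypothetical.

```python
def lie_factor(shown_before, shown_after, data_before, data_after):
    """Relative change drawn in the graphic divided by relative change in the data."""
    effect_shown = (shown_after - shown_before) / shown_before
    effect_data = (data_after - data_before) / data_before
    return effect_shown / effect_data


# Hypothetical bar chart: the data grows from 100 to 150 (+50%),
# but the drawn bar grows from 2 cm to 5 cm (+150%).
print(lie_factor(2.0, 5.0, 100.0, 150.0))  # 3.0
```

A lie factor of 1 means the graphic is faithful to the data; here the graphic exaggerates the change threefold.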
What makes up spectral colors vs non-spectral colors?
Spectral color: a single wavelength Non-spectral: Multiple wavelengths
How do you show data composition over time?
Stacked area chart
What are the differences between structured and unstructured data?
Structured: Known data types and semantics Unstructured: No predefined model, may be text/image/video-heavy
What is a choropleth map? What is the problem with this?
Each area is uniformly colored/shaded in proportion to an attribute value. Problem: Larger areas appear more important Solution: Use a proportional symbol map--scale the size of a symbol placed in each area by the attribute (e.g., population)
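When sizing symbols, a common convention (assumed here) is to make the symbol's area, not its radius, proportional to the value, so the radius grows with the square root. The helper, the maximum radius, and the populations are hypothetical.

```python
import math


def symbol_radius(value, max_value, max_radius=20.0):
    """Radius of a circle whose AREA (not radius) is proportional to `value`."""
    return max_radius * math.sqrt(value / max_value)


populations = {"Region A": 1_000_000, "Region B": 250_000}  # hypothetical data
for region, pop in populations.items():
    print(region, symbol_radius(pop, 1_000_000))  # Region A 20.0, then Region B 10.0
```

Scaling the radius linearly instead would give Region B a circle with 1/16 of Region A's area for 1/4 of the population, a distortion the lie factor would flag.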
What is change blindness? What's an example of when we may be worried about it?
We fail to notice changes between separate scenes in details we have not really focused on. An interruption between the scenes amplifies this effect. We must mitigate this when using animation to encode time-dependent data.
Why do we use interactivity?
A single static view limits people's experience: it only shows one aspect of the data, while interaction lets them explore more of the visualization's potential
What is visualization?
The process that transforms data into interactive graphical representations
What is a cartogram?
The size of each region is scaled to an attribute value. Please don't ever do these.
What are dataset types? Name the 5 and what they are made of.
They are collections of information that are the target of our analysis. (1) Tables: items, attributes (2) Networks/trees: items (nodes), links, attributes (3) Fields: Grids, positions, attributes (4) Geometry: items, positions (5) Clusters/sets/lists: items
How do multiple coordinated views work?
They use space to show different parts of the data in linked views.
What is task abstraction? Give an example.
To strip away the jargon and transform domain-specific tasks into abstract tasks Domain specific: Is it better to water flowers once or twice a week? OR Do patients who are over 60 years old have a better prognosis? Abstract task: Are the values different between the two groups?
What is Exploratory Data Analysis?
To summarize the main characteristics of a dataset without having formulated a hypothesis beforehand; it encourages people to visually examine their datasets and formulate hypotheses to test Example: John Snow's map of the London cholera epidemic
What is confirmation used for?
To verify or falsify a hypothesis
What are map projections?
Unfold the surface of a sphere to fit onto a 2-dimensional plane
What is the strength of radial layouts?
Visualizing cyclic patterns
What is sampling vs interpolating used for?
We sample to represent continuous data with discrete values. We interpolate to reconstruct values between the samples (e.g., to render a new view of the data)
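A minimal sketch, assuming a 1D field given by a known function: sampling stores values at discrete positions, and linear interpolation (one simple reconstruction among many) estimates values in between. The helper names are hypothetical.

```python
import math


def sample(f, start, stop, n):
    """Sample a continuous function at n evenly spaced positions."""
    step = (stop - start) / (n - 1)
    return [(start + i * step, f(start + i * step)) for i in range(n)]


def lerp_at(samples, x):
    """Reconstruct a value between samples by linear interpolation."""
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x is outside the sampled range")


grid = sample(math.sin, 0.0, math.pi, 5)
print(lerp_at(grid, 1.0), math.sin(1.0))  # close, but not identical
```

More samples shrink the reconstruction error, at the cost of storing more data.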
What is redundant encoding?
When multiple channels are used to encode the same piece of information
What is preattentive processing?
When properties are detected by our low-level visual system-- we manipulate these to draw the audience's attention to certain areas of interest
What is inattentional blindness?
When someone fails to notice an otherwise clearly visible stimulus in plain sight. It happens because of our expectations and where we have focused our attention. Think of the invisible gorilla experiment 🐒
When should you automate/not visualize?
When we have well-defined questions and a well-defined dataset
What is the process related to data type?
When we obtain the data we structure it as such: *Data model: Given floats 32.5 and 54.0 *Conceptual model: Provide the semantics = temperature *Data type: Then construct the classes --Quantitative (numerical temperature) --Ordinal (hot, warm, cold) --Categorical (burned vs not burned)
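The same pipeline as a sketch, assuming the floats are temperatures in degrees Celsius; the thresholds for hot/warm/cold and for "burned" are made up for illustration.

```python
readings = [32.5, 54.0]  # data model: just floats

# Conceptual model (assumed): these floats are temperatures in degrees Celsius.


def to_ordinal(celsius):
    """Hypothetical thresholds turning a quantitative temperature into an ordered class."""
    if celsius >= 40:
        return "hot"
    if celsius >= 20:
        return "warm"
    return "cold"


def to_categorical(celsius, burn_threshold=50.0):
    """Hypothetical rule producing a categorical label."""
    return "burned" if celsius >= burn_threshold else "not burned"


for t in readings:
    print(t, to_ordinal(t), to_categorical(t))
# 32.5 warm not burned
# 54.0 hot burned
```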
When should you visualize?
When you don't know the questions to ask in advance/ you have ill-specified problems
Why is the large visualization idiom design space useful?
With a large design space we can make many choices (warning: many things can also go wrong)