Data Visualization
spaghetti chart
a chart depicting possible flows through a system using a line for each possible path
shot chart
a chart that displays the location of shot attempts by a basketball player during a basketball game with different symbols or colors indicating successful and unsuccessful shots
bar chart
a chart that shows a summary of categorical data using the length of horizontal bars to display the magnitude of a quantitative variable
column chart
a chart that shows numerical data by the height of a column for a variety of categories or time periods
high-low-close stock chart
a chart that shows the high value, low value, and closing value of the price of a share of stock over time
funnel chart
a chart that shows the progression of a numerical variable to typically smaller values through a process, for example, the percentage of website visitors who ultimately result in a sale
treemap
a chart that uses the size, color, and arrangement of rectangles to display the magnitudes of a quantitative variable for different categories, each of which is further decomposed into subcategories
stacked column chart
a column chart that shows part-to-whole comparisons, either over time or across categories. different colors or shades of color are used to denote the different parts of the whole within a column
color wheel
a common chart used to show the relationships between primary, secondary, and tertiary hues for a primary color model
data dashboard
a data visualization tool that gives multiple outputs and may update in real time
proximity
a gestalt principle stating that people consider objects that are physically close to one another as belonging to a group
enclosure
a gestalt principle that states that objects that are physically enclosed together are seen as belonging to the same group
connection
a gestalt principle that states that people interpret objects that are connected in some way as belonging to the same group
control chart
a graphical display in which a variable of interest is plotted over time relative to lower and upper control limits
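As a rough illustration of the control chart definition above, the sketch below sets the limits at the mean plus or minus three sample standard deviations. That three-sigma rule is an assumption (the definition does not say how the limits are chosen), and the function names and data are hypothetical.

```python
from statistics import mean, stdev

def control_limits(values, k=3.0):
    """Return (lower, center, upper) as center +/- k sample standard deviations.

    The k=3 default is an assumed convention, not something the glossary specifies.
    """
    center = mean(values)
    spread = stdev(values)
    return center - k * spread, center, center + k * spread

def out_of_control(values, lower, upper):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical baseline measurements used to set the limits.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
lower, center, upper = control_limits(baseline)

# Hypothetical later observations, checked in time order against the limits.
new_points = [10.0, 10.3, 11.5, 9.9]
flagged = out_of_control(new_points, lower, upper)
print(flagged)
```

On a chart, the flagged points are the ones that would be drawn above the upper limit line or below the lower limit line.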
stock chart
a graphical display of stock prices over time (a high-low-close chart is a type of stock chart)
scatter chart
a graphical representation of the relationship between two quantitative variables. one variable is shown on the horizontal axis and the other is shown on the vertical axis
flicker
a movement type of pre-attentive attribute that uses effects such as flashing to draw attention to something
motion
a movement type of pre-attentive attribute that involves directed movement and can be used to show changes within a visualization
random variable
a quantity whose values are not known with certainty
bubble chart
a scatter chart that displays a third quantitative variable using different-sized dots, which we refer to as bubbles
diverging color scheme (diverging color palette)
a set of colors used in a chart or in a series of related charts to describe values of a quantitative variable for which there is a meaningful reference value, such as a target value or the mean (heat map colors)
categorical color scheme (categorical color palette)
a set of colors used to describe a categorical variable when the categories have no inherent ascending or descending order
sequential color scheme (sequential color palette)
a set of colors used to describe the values of a quantitative variable or a categorical variable when the categories have an inherent ascending or descending order (for example, a map of America shaded with a gradient of a single color)
frequency distribution
a summary of data that shows the number (frequency) of observations in each of several nonoverlapping bins (classes)
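A minimal sketch of building a frequency distribution, assuming half-open bins of the form [low, high) so that no observation can land in two bins. The function name, bin edges, and data are hypothetical.

```python
def frequency_distribution(values, bin_edges):
    """Count observations in nonoverlapping bins [edge_i, edge_{i+1})."""
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    counts = {b: 0 for b in bins}
    for v in values:
        for low, high in bins:
            if low <= v < high:
                counts[(low, high)] += 1
                break  # bins do not overlap, so stop at the first match
    return counts

# Hypothetical ages, grouped into ten-year bins.
ages = [23, 35, 41, 29, 52, 38, 47, 31]
dist = frequency_distribution(ages, [20, 30, 40, 50, 60])
for (low, high), count in dist.items():
    print(f"{low}-{high - 1}: {count}")
```

Values outside the first and last edges are silently ignored here; a real analysis would choose edges that cover the full range of the data.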
orientation
attribute associated with form. it refers to the relative positioning of an object within a data visualization
form
attributes of orientation, size, shape, length, and width
geographical map
chart that shows characteristics and the arrangement of the geography of our physical reality
clustered column chart
chart that shows multiple variables of interest on the same chart, with the different variables usually denoted by different colors or shades of a color
sankey chart
chart that typically depicts the proportional flow of entities where the width of the line represents the relative flow rate compared to the widths of the other lines
cross-sectional data
collected from several entities at the same or approximately the same point in time
analogous colors
colors adjacent to each other on a color wheel
complementary colors
colors that are directly opposite each other on a color wheel
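On a color wheel, "adjacent" and "directly opposite" can be read as hue rotations. The sketch below assumes an RGB-based wheel (matching the red, green, blue primary hues in this set) and treats "adjacent" as a 30-degree step, which is an illustrative choice. On a traditional painter's wheel the opposites differ. The helper names are hypothetical.

```python
import colorsys

def rotate_hue(rgb, degrees):
    """Rotate an (r, g, b) color with components in 0..1 around the hue wheel."""
    h, l, s = colorsys.rgb_to_hls(*rgb)
    h = (h + degrees / 360.0) % 1.0
    return colorsys.hls_to_rgb(h, l, s)

def complementary(rgb):
    """Color directly opposite on the wheel (180-degree rotation)."""
    return rotate_hue(rgb, 180)

def analogous(rgb, step=30):
    """Two neighbors on the wheel; the 30-degree step is an assumption."""
    return rotate_hue(rgb, -step), rotate_hue(rgb, step)

red = (1.0, 0.0, 0.0)
comp = complementary(red)      # approximately cyan on an RGB wheel
neighbors = analogous(red)     # hues on either side of red
print(comp, neighbors)
```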
serif fonts
contain serifs - traditional text for print/paper viewing
categorical variable
data for which categories of like items are identified by labels or names. arithmetic operations cannot be performed on categorical variables
categorical data
data for which categories of like items are identified by labels or names. arithmetic operations cannot be performed on categorical data
quantitative variable
data for which numerical values are used to indicate magnitude, such as how many or how much (can be counted up)
hierarchical data
data that can be represented with a tree-like structure where the branches of the tree lead to categories and subcategories
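One way to picture the tree-like structure is a nested dictionary whose branches are categories and whose leaves are values. The sketch below uses hypothetical sales figures and a hypothetical helper to roll subcategory values up to their parent categories, which is the kind of total a treemap would size its rectangles by.

```python
def subtree_total(node):
    """Sum the leaf values beneath a node; a node is a number or a dict of children."""
    if isinstance(node, dict):
        return sum(subtree_total(child) for child in node.values())
    return node

# Hypothetical hierarchy: category -> subcategory -> sales.
sales = {
    "Electronics": {"Phones": 120, "Laptops": 80},
    "Clothing": {"Shirts": 40, "Shoes": 60},
}
category_totals = {name: subtree_total(child) for name, child in sales.items()}
grand_total = subtree_total(sales)
print(category_totals, grand_total)
```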
hue
defines the base of the color
variation
differences in values of a variable over observations
sans-serif fonts
do not contain "serifs" - cleaner text for screen viewing
warm hues feelings
energy, passion, and danger (yellow, orange, red)
preattentive attributes
features of a data visualization that can be processed by iconic memory. pre-attentive attributes related to visual perception are generally divided into four categories: color, form, spatial positioning, and movement
relative frequency
frequency measure in a distribution analysis that computes the fraction or proportion of observations in each of several nonoverlapping bins (classes)
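A short sketch of the calculation, assuming each distinct label is its own class: the relative frequency is the class count divided by the total number of observations. The function name and survey responses are hypothetical.

```python
from collections import Counter

def relative_frequencies(labels):
    """Return the proportion of observations falling in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical survey responses.
responses = ["yes", "no", "yes", "yes", "undecided", "no", "yes", "no"]
rel = relative_frequencies(responses)
print(rel)
```

Because every observation falls in exactly one class, the proportions sum to 1.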
choropleth map
geographic map that uses shades of a color, unique colors, or symbols to indicate quantitative or categorical variables by geographic region or area
length
horizontal, vertical, or diagonal distance of a line or bar/column
distribution
how items are dispersed
correlation/relationship
how two variables are related to one another
spatial positioning
location of an object within some defined space
prescriptive analytics
mathematical or logical models that suggest a decision or course of action (e.g., mathematical optimization models, decision analysis, and heuristic or rule-based systems)
data-ink ratio
measures the proportion of "data-ink" to the total amount of ink used in a table or chart where data-ink is the ink used that is necessary to convey the meaning of the data to the audience
quantitative data
data for which numerical values are used to indicate magnitude, such as how many or how much. Arithmetic operations such as addition, subtraction, multiplication, and division can be performed on quantitative data
color
pre-attentive attribute for data visualization that includes the attributes of hue, saturation, and luminance
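To make the three color attributes concrete, the sketch below splits an RGB color into hue, saturation, and a lightness value using Python's standard `colorsys` module. Treating HLS lightness as a stand-in for luminance is an assumption made for illustration; the function name is hypothetical.

```python
import colorsys

def describe_color(r, g, b):
    """Split an RGB color (components 0..255) into hue, saturation, and lightness."""
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    return {
        "hue_degrees": round(h * 360),  # position on the color wheel
        "saturation": round(s, 2),      # 0 = gray, 1 = pure hue
        "luminance": round(l, 2),       # 0 = black, 1 = white (HLS lightness)
    }

pure_blue = describe_color(0, 0, 255)
mid_gray = describe_color(128, 128, 128)
print(pure_blue, mid_gray)
```

A pure hue has full saturation, while a gray has none, which lines up with the idea that saturation reflects the amount of gray in a color.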
color
property of an object that results from the way the object reflects or emits light
colorblindness
reduced ability to accurately perceive some colors
gestalt principles
refer to the guiding principles of how people interpret and perceive what they see
size
refers to the relative amount of 2D space that an object occupies in a visualization
shape
refers to the type of object used in data visualization
ranking
relative order of items
saturation
represents the amount of gray in the color and determines the intensity or purity of the hue in the color
luminance
represents the relative degree of black or white in the color (dark to light)
population
set of all elements of interest in a particular study
color scheme
set of colors (hues, saturations, and luminances) that are to be used in a data visualization or a series of related data visualizations
funnel chart
shows progression of quantitative variable for various categories from larger to smaller values
line chart
similar to scatter charts in that each point represents a pair of quantitative variable values, but in a line chart, a line connects the points
serifs
small features at the ends of strokes in the characters of a font
cool hues feelings
soothing, calm, reassuring (purple, blue, green)
similarity
states that people consider objects with similar characteristics as belonging to the same group
sample
a subset of the population
primary hues
red, green, and blue
When should tables be used instead of a chart?
when the reader needs to refer to specific numerical values; when the reader needs to make precise comparisons between different values and not just relative comparisons; when the values being displayed have different units or very different magnitudes
charts that show ranking
bar, column
Charts that show composition
bar, stacked bar, stacked column, treemap, waterfall, funnel
charts to avoid
radar chart, pie chart, area chart, combo chart
Charts that show distribution
scatter, bubble, column, bar, choropleth map
Charts that show a relationship
scatter, bubble, line, stock, column, bar, heat map
big data
Any set of data that is too large or complex to be handled by standard data-processing techniques using a typical desktop computer. Big data includes text, audio, and video data.
time series data
Data collected over several points in time (minutes, hours, days, months, years, etc.)
Data Visualization
The graphical representation of data and information using displays such as charts, graphs, and maps
bins
The nonoverlapping groupings of data used to create a frequency distribution. Bins for categorical data are also known as classes
Analytics
The scientific process of transforming data into insights for making better decisions
predictive analytics
techniques that use mathematical models constructed from past data to predict future events or better understand the relationship between variables (e.g., regression analysis, time series forecasting, computer simulation, and predictive data mining)
decluttering
the act of removing the non-data-ink in the visualization that does not help the audience interpret the chart or table
four v's - volume
the amount of data generated
cognitive load
the amount of effort necessary to accurately and efficiently process the information being communicated by a data visualization
color symbolism
the cultural meanings and significance associated with color
four v's - variety
the diversity in types and structures of data generated
white space
the portion of a data visualization that is devoid of markings
short-term memory
the portion of memory that holds information for about a minute. it utilizes chunking, or grouping things together, to hold about four chunks of visual information at one time
iconic memory
the portion of memory that is processed fastest. it is processed automatically and the information is held there for less than a second
long-term memory
the portion of memory where information is stored for an extended amount of time. most long-term memories are formed through repetition and rehearsal but can also be formed through clever use of storytelling
visual perception
the process through which our brains interpret the reflections of light that enter our eyes
four v's - veracity
the reliability of the data generated
descriptive analytics
the set of analytical tools that describe what has happened (e.g., data queries, reports, descriptive or summary statistics, and data visualization)
four v's - velocity
the speed at which the data are generated
color psychology
the study of the innate relationships between color and human behavior
width
thickness of the line or bar/column
heat map
two-dimensional graphical representation of data that uses different shades of color to indicate magnitude
waterfall chart
visual display that shows the cumulative effect of positive and negative changes on a variable of interest. the basis of the changes can be time or categories, and each change is represented by a column anchored at the cumulative level of the previous time period or category
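The anchoring described in the waterfall chart definition can be sketched as a running total: each column starts at the cumulative level left by the previous change and ends at the new level. The function name and figures below are hypothetical, and `itertools.accumulate` with `initial=` assumes Python 3.8 or later.

```python
from itertools import accumulate

def waterfall_columns(start, changes):
    """Return (bottom_anchor, new_level) for each change, in order."""
    levels = list(accumulate(changes, initial=start))  # running cumulative levels
    return [(levels[i], levels[i + 1]) for i in range(len(changes))]

# Hypothetical starting balance followed by gains and losses.
columns = waterfall_columns(100, [30, -20, 10, -50])
final_level = columns[-1][1]
print(columns, final_level)
```

A positive change gives a column that rises from its anchor, and a negative change gives one that drops from it.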
4 V's
volume, velocity, variety, veracity
composition
what makes up the whole of an entity under consideration