Data Visualization

outliers in IQR

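values more than 1.5 × IQR below the 25th percentile or more than 1.5 × IQR above the 75th percentile (+/- 1.5 IQR)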
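
As a minimal sketch of the 1.5 × IQR rule, the snippet below computes the two fences and flags values outside them. The sample data are hypothetical, and the "inclusive" quartile method is just one of several conventions; other tools may give slightly different quartiles.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    # Quartiles via the "inclusive" method (an assumption; conventions vary).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical data
lower, upper, outliers = iqr_outliers(sample)
print(lower, upper, outliers)
```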

position attributes

-2D position -motion

what dashboards are not

-a display that is primarily used for data exploration and analysis -a portal -a scorecard -a report that people use to look up specific facts

rules to follow

-always label your axes for scale and variable encodings -always keep your geometry in check -always include your sources -always consider your audience -always avoid 3D effects -always avoid intercept deceptions

time series conventions

-always use the horizontal axis for time scale and the vertical axis for quant. scale -vertical bars when you want to emphasize individual values or compare categories of values, rather than overall pattern -lines only when you want to emphasize the pattern of change over time -points only when values were collected at irregular intervals of time or there is a little short term up and down fluctuation in the values -vertical box plots when you want to display how a distribution changes through time

serif

-appropriate font for data labels to speed processing -when there is a lot of text to read

characteristics of lines

-color hue and intensity -unless in black and white -points and gridlines are useful to compare values on different lines

interactions (few 2009)

-compare -sort -add variables -filter -highlight -aggregate -re-express -zoom/pan -re-scale -access details -annotate -bookmark

visualization amplifies cognition as it

-conveys meaning -increases working memory -facilitates search -facilitates discovery -supports inference -enhances detection hierarchy, relational, temporal, spatial

spatial

-dot distribution, graduated symbols, cartogram, choropleth

graphical inference (wickham)

-exploratory analysis may combine graphical methods, data transformations, and statistics -use questions to uncover more questions -formal methods may be used to confirm, sometimes on held-out or new data -visualization can further aid assessment of fitted statistical models

representation summary

-facilitate cognitive processing -some representations are innately better than others

common kernel functions for density estimation

-gaussian -rectangular -triangular -epanechnikov -cosine

why interaction?

-give control to the user -guide the user through your story -handle too much data or too many variables -allow for data exploration and new questions

color attributes

-hue -intensity

the roles of text

-label -introduce -explain -reinforce -highlight -sequence -recommend -inquire

form attributes

-length -width -orientation -shape -size -enclosure -curvature -added marks

common conventions

-like colors mean like things -color saturation indicates higher and lower values -categories are arranged and plotted from one extreme to another

temporal

-lines and motion (time)

common ways to summarize data are

-measures of average -measures of variation -measures of correlation -measures of ratio

visualization lies

-no zero line -dual axes -flipped y axis -doesn't add up -limited scope -strategic binning -problems with area/dimension

relationships among categorical items

-nominal -ordinal -interval -hierarchical

sorting

-often uncovers much more meaning in data -provide extremely quick and easy means to re-sort data in different ways -provide the means to link multiple graphs and easily sort the data in each graph the same way -provide the means to sort items in a graph based on various values, especially the values that are featured in the graph

dashboard design best practices

-organize information to support meaning and use -maintain consistency to enable quick and accurate interpretation -put supplementary information within reach -make the experience aesthetically pleasing -expose lower-level alerts -keep viewers in the loop -when needed, accommodate real-time monitoring

characteristics of bars

-orientation -proximity -fills -borders -base value

LOESS curves

-performs multiple local regressions that place higher weighting on closer points -provides a richer visual representation of the trend and doesn't require an a priori model -doesn't provide a simple regression function to describe the trend
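
The idea of "multiple local regressions that place higher weighting on closer points" can be sketched in a few lines. This is a simplified illustration, not a reference implementation: it assumes a local straight-line fit, tricube weights, and a window made of the nearest `alpha` fraction of points. The function name `loess` and its parameters are hypothetical.

```python
import math

def loess(xs, ys, alpha=0.5):
    """Fit a weighted local line at each x and return the fitted y values."""
    n = len(xs)
    k = min(n, max(3, math.ceil(alpha * n)))  # points in each local window
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0  # local bandwidth: distance to k-th nearest point
        # Tricube weights: closer points count more, points beyond h count zero.
        w = [max(0.0, 1 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)
        swx = sum(wi * x for wi, x in zip(w, xs))
        swy = sum(wi * y for wi, y in zip(w, ys))
        swxx = sum(wi * x * x for wi, x in zip(w, xs))
        swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
        denom = sw * swxx - swx ** 2
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate window: use the weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted

xs = list(range(10))
ys = [2 * x + 1 for x in xs]  # hypothetical, perfectly linear data
smooth = loess(xs, ys, alpha=0.5)
```

Note that the result is a list of fitted values rather than one equation, which is the trade-off the card describes: a richer picture of the trend, but no single regression function.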

relationships among quantitative values

-rankings -ratios -correlations

process and provenance (heer and shneiderman)

-record analysis histories for revisitation, review and sharing -annotate patterns to document findings -share views and annotations to enable collaboration -guide users through analysis task or stories

view manipulation (heer and shneiderman)

-select items to highlight, filter, or manipulate them -navigate to examine high-level patterns and low-level detail -coordinate views for linked, multi-dimensional exploration -organize multiple windows and workspaces

characteristics of points

-shape -fill -color

quantitative stories always feature relationships

-simple associations between quantitative values and categorical items -more complex associations among multiple sets of quantitative values

relational

-suggests patterns of connections -heatmap, chord diagram, sankey diagram

hierarchy

-suggests relationship direction -stacked schemes (vertical, horizontal, center/periphery relationship) -nested schemes (treemap)

which graphs to use with multivariate datasets

-tableplots -scatterplot matrices -star graph -icon solutions

use tables when

-the display will be used to look up individual values -precise values are required -the quantitative values include more than one measure -both detail and summary values are included -the display will be used to compare individual values

use graphs when

-the message is contained in the shape of the values -the display will be used to reveal relationships among whole sets of values

common components of a chart/graph

-title/subtitle -data region -data label -data encoding -annotation -legend -grid lines -note -x/y axis -tick mark

where is the boundary

-to show relationships consider linking graphical representations of data objects using lines or ribbons of color -consider putting related information inside a closed contour -color or texture can be used to define regions that have more complex shapes

secondary data component design

-trend lines -reference lines -annotations -scales -tick marks -grid lines -legends

properties of representation

-understanding without training -resistance to alternative conclusions -cross culture validity -immediacy (hard wired)

fundamental usage requirement features

-update frequency -user expertise -audience size -technology platform -screen type -data types

treemap

-use of color and form to represent two quantitative values -may also use relative position to represent nested/hierarchical data -precision not important -boxes represent entities -better for large data sets where smallest category still relatively significant

cartograms

-use of size of predefined units (states) to represent distribution of variable values -topology maintained -explains via familiar units

ranking designs basics

-use one axis for categorical items and use the other axis for a quantitative scale -bars are almost always preferred -except when the quantitative scale doesn't begin at zero, then use points -sorting is key to effectively communicate a ranking

aesthetics

-use subdued colors over bright colors -use off-whites instead of stark whites in background -align content and follow good layout principles -use legible font

data and view specification (heer and shneiderman)

-visualize data by choosing visual encodings -filter out data to focus on relevant items -sort items to expose patterns -derive values or models from source data

font sizing and spacing

1 inch = 72 points; 1 pica = 12 points; 12 points = 16 pixels
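
The conversions above can be checked with a small helper. This sketch assumes a 96 pixels-per-inch display, which is what makes 12 points equal 16 pixels; the function names are hypothetical.

```python
POINTS_PER_INCH = 72
POINTS_PER_PICA = 12

def points_to_inches(points):
    return points / POINTS_PER_INCH

def points_to_picas(points):
    return points / POINTS_PER_PICA

def points_to_pixels(points, dpi=96):
    # dpi=96 is an assumption; a different pixel density changes the result.
    return points * dpi / POINTS_PER_INCH

print(points_to_inches(72), points_to_picas(12), points_to_pixels(12))
```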

visualization principles

1. adopt novel approaches to visualization only when anticipated benefits are greater than the cost of learning + cost of inconsistency
2. when two visualizations can support the same task, adopt the tool that is innately more effective
3. visualization tool development cost is less than benefits from visualization tool

13 common mistakes in dashboard design (few 2013)

1. exceeding the boundaries of a single screen
2. supplying inadequate context for the data
3. displaying excessive detail or precision
4. choosing inappropriate media of display
5. expressing measures indirectly
6. introducing meaningless variety
7. using poorly designed display media
8. encoding quantitative data inaccurately
9. arranging the data poorly
10. ineffectively highlighting what's important
11. cluttering the screen with useless decoration
12. misusing or overusing color
13. designing an unappealing visual display

what should a good chart do?

1. show the data
2. induce the viewer to think about the substance rather than about methodology, graphic design, technology of graphic production
3. avoid distorting what the data has to say
4. present many numbers in a small space
5. make large data sets coherent
6. encourage the eye to compare different pieces of data
7. reveal the data at several levels of detail, from a broad overview to the fine structure
8. serve a reasonably clear purpose: description, exploration, tabulation, or decoration
9. be closely integrated with the statistical and verbal descriptions of a data set

tufte's principles of graphic integrity

1. the representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented
2. clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity. write out explanations of the data on the graphic itself. label important events in the data.
3. show data variation, not design variation
4. in time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units
5. the number of information carrying (variable) dimensions depicted should not exceed the number of dimensions in the data
6. graphics must not quote data out of context

tufte's principles of graphical excellence

1. the well-designed presentation of interesting data: a matter of substance, statistics, and design
2. consists of complex ideas communicated with clarity, precision, and efficiency
3. that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space
4. nearly always multivariate
5. requires telling the truth about the data

performance monitoring process

1. update high-level situation awareness
2. identify and focus on particular items that need attention: update awareness of this item in greater detail and determine whether an action is required
3. if action is required, access additional information that is needed, if any, to determine an appropriate response
4. respond

5 things to know about how people perceive charts

1. we don't go in order
2. we see first what stands out
3. we see only a few things at once
4. we seek meaning and make connections
5. we rely on conventions and metaphors

data-ink ratio (tufte 1983)

= data ink / total ink used to print the graphic; the proportion of a graphic's ink devoted to the non-redundant display of data-information; equivalently, 1.0 - the proportion of a graphic that can be erased

data density

= number of entries in data array/area of data graphic

the lie factor (tufte 1983)

= size of effect shown in graphic / size of effect in data; avoid confounding design variation with data variation; the scale of the graphic should always correspond to changes in the data being represented
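
The three Tufte measures above (data-ink ratio, data density, lie factor) are simple ratios, so a small sketch may help fix them in memory. It assumes "size of effect" means relative change between two values, which is a common reading; the function names and the numbers in the example are hypothetical.

```python
def data_ink_ratio(data_ink, total_ink):
    """Share of the graphic's ink that displays non-redundant data."""
    return data_ink / total_ink

def data_density(entries_in_data_array, graphic_area):
    """Number of data entries per unit area of the graphic."""
    return entries_in_data_array / graphic_area

def effect_size(first, second):
    # Assumption: effect size is the relative change from first to second.
    return (second - first) / first

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)

# Hypothetical case: the data rise from 10 to 15 (a 50% increase), but the bar
# drawn for them grows from 1 cm to 4 cm (a 300% increase).
print(lie_factor(1.0, 4.0, 10.0, 15.0))
```

A lie factor near 1 means the graphic is proportional to the data; the hypothetical bar above overstates the change sixfold.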

"Data viz is often the most effective way to describe, explore, and summarize a set of numbers by looking at a picture of those numbers... well-designed data graphics are usually the simplest and at the same time the most powerful"

Edward Tufte (The Visual Display of Quantitative Information)

"The greatest value of a picture is when it forces us to notice what we never expected to see"

John Tukey

bean plot

a more complete way to represent a distribution that shows the smoothed density of points over a window called the 'bandwidth'; combines a box plot, a density plot, and a rug in the middle

gestalt principles of perception

a psychological theory of perception that suggests the mind understands external stimuli as whole rather than the sum of their parts. the wholes are structured and organized using grouping laws

rows

a series of flowlines that create horizontal divisions of space on a page

dashboard

a visual display of the most important information needed to achieve one or more objectives, consolidated and arranged on a single screen so the information can be monitored at a glance (Few 2013)

LOESS curve bandwidth

aka the smoothing parameter (alpha), the trend line can look different depending on this

flowlines

alignments that break the space into horizontal bands

no chart junk

all visual elements in charts and graphs that are not necessary to comprehend the information represented on the graph, or that distract the viewer from this information (tufte)

isopleth

an isoline on a graph showing the occurrence or frequency of a phenomenon as a function of two variables; a quantitative value is used to impose isolines (lines of constant value)

where should legends appear on the graph

anywhere they fit as long as they don't interfere with more important components of the graph

numbers that summarize

better communicate your quantitative message by reducing large datasets to a few numbers that summarize the data

emphasize size

bigger objects, words, and numbers

should legends have borders

borders don't add any meaning and draw attention away from the data. avoid borders

ordinal data

categories with order -very happy to very sad -use a color scale

fundamental aspects of design

color, typography, layout and composition

analogous color palette

colors next to each other on the color wheel

split complementary color palette

a base color plus the two colors next to the one opposite it on the color wheel (triangle)

complementary color palette

colors opposite each other on the color wheel

design

communication and perception

boxes (box and whisker plots)

comparing distributions across categories

continuous data

connecting the dots warning- line implies continuity and the line between the dots may not be appropriate depending on the context

attention is drawn to

contrasts, similarity

alpha

controls the flexibility of the LOESS regression function; larger values produce the smoothest functions, which wiggle the least in response to fluctuations in the data

CMYK

cyan- 100% magenta- 0% yellow- 36.1% black- 0%

scripts

decorative, thin, and wide fonts are generally hard to read and should be used sparingly and only if they truly add to the design of the visualization

interquartile range

difference between the 75th and 25th percentiles

distribution quantitative message

displays the way in which one or more sets of quantitative values are distributed across their full quantitative range, from lowest to the highest and everything in between

rug plot

draws a small vertical tick at each observation in a histogram

purposes of data visualizations

explore: confirm and analyze; explain: inform and persuade

part-to-whole graph

features how individual values that make up the whole of something compare to each other and the whole

deviation relationships

features how one or more sets of quantitative values differ from a reference set of values

linear trend lines

fit the data with the best line using least squares regression; these trend lines provide easy-to-interpret equations describing the linear trend
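
For reference, a least squares line can be computed directly from the data without any library. This is a minimal sketch; `fit_line` is a hypothetical helper name and the points are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0, 1, 2, 3, 4]
ys = [2, 5, 8, 11, 14]  # hypothetical points lying exactly on y = 2 + 3x
slope, intercept = fit_line(xs, ys)
print(f"y = {intercept} + {slope}x")
```

The two returned numbers are the "easy to interpret equation" the card refers to.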

the four key pre-attentive attributes

form, color, position, motion

anscombe's quartet

four datasets with nearly identical summary statistics (mean, sample variance, correlation) but very different graphs
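
The claim is easy to check numerically. The sketch below uses the quartet's values as they are commonly reproduced from Anscombe (1973); treat the numbers as an assumption of this example and verify them against a trusted copy before relying on them.

```python
import math
import statistics

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared x for sets I-III
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def summarize(xs, ys):
    return {
        "mean_x": statistics.mean(xs),
        "mean_y": statistics.mean(ys),
        "var_x": statistics.variance(xs),  # sample variance
        "var_y": statistics.variance(ys),
        "r": pearson_r(xs, ys),
    }

for name, (xs, ys) in QUARTET.items():
    print(name, {k: round(v, 2) for k, v in summarize(xs, ys).items()})
```

The printed summaries agree to about two decimal places, which is why plotting the four sets is the only way to see how different they are.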

ranking quantitative message

graph that displays how a set of quantitative values relate to each other sequentially, sorted in ascending or descending order

statistics

graphical data analysis

the shrink principle (tufte 1983)

graphics can be shrunk way down

scatterplot matrices

great way to roughly determine if you have a linear correlation between multiple variables

spatial zones

groups of modules that cross multiple rows and columns

"simplify, simplify, simplify"

henry david thoreau

HSL

hue- 158° saturation- 100% lightness- 50%

when can you eliminate a legend

when the categories can be labeled directly on the graph; otherwise, if categorical variables are encoded using color, shape, etc., a legend can be used to label them

modules

individual units of space created from intersecting rows and columns

when designing a set of glyphs to represent quantity

mapping to any of the following glyph attributes will be effective: size (length/area), lightness, saturation, vertical position -never use volume of a 3D glyph to represent quantity

choropleth

maps with color -use of color to represent distribution of 'standardized' variable values using pre-defined boundaries -loses gradient subtlety -explains via familiar units

ratio

meaningful zero

ratio

measure the relationship between a single pair of values and can be expressed in four ways 1. statement 2. fraction 3. rate 4. percentage

nominal data

multiple categorical states -urban/suburb/rural

interval data

numbers with quantifiable differences -dates

emphasize color intensity

objects, words, and numbers that are darker or brighter than the norm

emphasize enclosure

objects, words, and numbers that are enclosed by lines or background fill colors

emphasize hue

objects, words, and numbers that have a hue that is different from the norm

pre-attentive processing

occurs below the level of consciousness at an extremely high speed and is tuned to detect specific visual attributes

list

one categorical variable

monochromatic color palette

one color

bar chart

one continuous and one categorical variable

pie chart

one continuous and one categorical variable

boxplot

one continuous variable

histogram

one continuous variable

nominal v. ordinal

order your categories if -the categorical variable has a specific order

visual information seeking mantra

overview, zoom and filter, then details-on-demand

voronoi diagram

partitioning of a plane into regions based on distances to points in a specific subset of the plane
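
One way to see the definition at work is a brute-force sketch on a small grid: every cell is assigned to whichever seed point is closest. The site coordinates and helper names are hypothetical, and real tools use far more efficient algorithms than this.

```python
import math

def nearest_site(point, sites):
    """Index of the site closest to point (ties go to the lowest index)."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))

def voronoi_grid(sites, width, height):
    """Label each integer grid cell with the index of its nearest site."""
    return [[nearest_site((x, y), sites) for x in range(width)]
            for y in range(height)]

sites = [(0, 0), (4, 0)]  # hypothetical seed points
grid = voronoi_grid(sites, width=5, height=1)
print(grid)
```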

the cleveland dotplot

plots of points that each belong to one of several categories. the bars are replaced by dots at the values associated with each category aka strip charts

icons

provide a universal communication mechanism and create visual interest in the reader

standard deviation

provides a single value that measures variation of a set of data values relative to the mean
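
As a small worked example, the standard deviation is the square root of the average squared distance from the mean. The sketch below assumes the population form by default (divide by n); pass `sample=True` for the n - 1 form. The function name and the data are hypothetical.

```python
import math

def std_dev(values, sample=False):
    """Standard deviation: typical distance of the values from their mean."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_deviations / divisor)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
print(std_dev(scores))
```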

quantitative stories include two types of values

quantitative and categorical

RGB

red- 0 green- 255 blue- 163

HEX codes

red- 00 green- FF blue- A3
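
The RGB, HEX, HSL, and CMYK cards in this set all describe the same color, which a short conversion sketch can confirm. It uses Python's standard `colorsys` module for HSL and a simple, uncalibrated formula for CMYK (an assumption; print workflows use color profiles instead). Hue is reported in degrees, the other components in percent. `describe_color` is a hypothetical helper name.

```python
import colorsys

def describe_color(r, g, b):
    """Return (hex, hsl, cmyk) for an 8-bit RGB color."""
    hex_code = f"#{r:02X}{g:02X}{b:02X}"
    rf, gf, bf = r / 255, g / 255, b / 255
    # colorsys returns hue, lightness, saturation (note the order), each in 0..1.
    h, l, s = colorsys.rgb_to_hls(rf, gf, bf)
    hsl = (round(h * 360), round(s * 100), round(l * 100))
    # Naive CMYK conversion: black is whatever the brightest channel lacks.
    k = 1 - max(rf, gf, bf)
    if k == 1:
        cmyk = (0.0, 0.0, 0.0, 100.0)
    else:
        c = (1 - rf - k) / (1 - k)
        m = (1 - gf - k) / (1 - k)
        y = (1 - bf - k) / (1 - k)
        cmyk = tuple(round(v * 100, 1) for v in (c, m, y, k))
    return hex_code, hsl, cmyk

hex_code, hsl, cmyk = describe_color(0, 255, 163)
print(hex_code, hsl, cmyk)
```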

correlation

relationships between variables. displayed in a graph when it is designed to show whether two paired sets of quantitative values vary in relation to one another

bubblechart

scatterplot with third attribute represented by pre-attentive selection of form (size)

time series data

series of quantitative values that show how something has changed over time

frequency distributions

show the number of times something occurs within consecutive intervals over the entire quantitative range
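
Counting occurrences within consecutive intervals is the computation behind a histogram. A minimal sketch, assuming equal-width bins that include their lower edge and exclude their upper edge; the function name and the data are hypothetical.

```python
def frequency_distribution(values, start, width, bins):
    """Count how many values fall in each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < bins:  # values outside the covered range are ignored
            counts[index] += 1
    return counts

ages = [1, 2, 2, 3, 5, 7, 8, 9, 12, 14]  # hypothetical data
counts = frequency_distribution(ages, start=0, width=5, bins=3)
print(counts)  # counts for [0, 5), [5, 10), [10, 15)
```

Changing `width` or `start` changes the picture, which is the "strategic binning" risk listed under visualization lies.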

violin plot

similar to box plot with a symmetrical rotated kernel density plot on each side

white space

similar to tufte's idea of maximizing the data ink and removing chart junk

nominal/categorical comparison

simplest of all and typically least interesting -goal: display a set of discrete quantitative values so they can be easily read and compared

emphasize orientation

slanted words and numbers

kernel density plot

smoothed line over a histogram based on the declared 'bandwidth'; the kernel function is used to estimate a density for each band, then all density estimates are added together. kernels are weighting functions used in non-parametric estimation
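
A kernel density estimate can be sketched as one small "bump" per observation, summed and rescaled so the total area is 1. The kernels below are some of those named in this set (gaussian, rectangular, triangular, epanechnikov), written in their usual textbook forms; the observations and bandwidth are hypothetical.

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def rectangular(u):
    return 0.5 if abs(u) <= 1 else 0.0

def triangular(u):
    return 1 - abs(u) if abs(u) <= 1 else 0.0

def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def kde(x, data, bandwidth, kernel=gaussian):
    """Estimated density at x: average of kernels centered on each observation."""
    total = sum(kernel((x - xi) / bandwidth) for xi in data)
    return total / (len(data) * bandwidth)

observations = [1.0, 2.0, 2.5, 4.0]  # hypothetical data
for x in (1.0, 2.25, 6.0):
    print(x, round(kde(x, observations, bandwidth=0.5), 3))
```

A larger bandwidth spreads each bump wider and gives a smoother curve; a smaller one follows individual observations more closely.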

median

sort values in order and then find the value that falls in the middle of the set

gutters

space that separates rows and columns or two facing pages

perception in a nutshell

specialized neurons extract features involuntarily --> rapid assembly of information into significance; object identification (and action) --> visual working memory (using three chunks) with active conscious attention

variation

spread, IQR, standard deviation

step chart

a standard line chart implies steady change from point a to point b; use a step chart when the values don't change in between time points

spread

subtract the lowest value from the highest value

mean

sum of all values divided by the number of values

glyphs

symbols meant to represent a numerical attribute of an entity, representing through the use of our building blocks of form, color, position, and motion

margin

the space that separates the content from the edge of the page

rule of thirds

the subject isn't centered in the image and draws the viewer's eye into the composition instead of glancing at the center

triad color palette

three colors evenly spaced around the color wheel

bubble

three continuous variables or two continuous variables and one categorical variable

xref list

two categorical variables

scatterplot

two continuous variables

line graph

two continuous variables with one being time -doesn't need a zero

binary data

two states -positive negative

when can you eliminate tick marks?

unnecessary for axes with categorical variables; most useful when knowing precise locations of values matters; the longer the scale line, the more tick marks it should contain

heatmap

use of color to represent quantitative value imposed upon an area/location (vs. leveraging predefined boundaries agnostic to the mapped values) can also be used in a highlight table

to make symbols in a set maximally distinctive

use redundant coding wherever possible, for example make symbols differ in both shape and color

when developing glyphs

use small, closed shapes to represent data entities, and use color, shape and size of those shapes to represent attributes of those entities

emphasize width

use thicker lines (including words and numbers that are boldfaced)

stacked bars

useful when there are subcategories and the sum of the subcategories is meaningful for your quantitative message

time series visualizations

useful when your quantitative message includes -change -rise -increase -fluctuate -grow -decline -decrease -or trend

midrange

value midway between the highest and lowest value in the set (different than median)

mode

value that occurs the most in the set of values- if no value appears more than once- there's no mode
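
The measures of average and spread defined in this set (mean, median, midrange, mode, spread) can be compared side by side on one small dataset. This is a sketch with hypothetical data; the `mode` helper follows the card above and returns `None` when no value repeats.

```python
from collections import Counter

def mean(values):
    return sum(values) / len(values)

def median(values):
    ordered = sorted(values)
    n, mid = len(ordered), len(ordered) // 2
    # Odd count: the middle value. Even count: average of the two middle values.
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def midrange(values):
    return (min(values) + max(values)) / 2

def spread(values):
    return max(values) - min(values)

def mode(values):
    # If several values tie for most frequent, the first one seen is returned.
    value, count = Counter(values).most_common(1)[0]
    return value if count > 1 else None

data = [1, 2, 2, 3, 12]  # hypothetical data with one large value
print(mean(data), median(data), midrange(data), mode(data), spread(data))
```

The single large value pulls the mean and especially the midrange upward while the median stays put, which is why midrange and median are different measures.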

bullet chart

variation of a bar graph developed by Few. features a single primary measure and compares that measure to one or more other measures to enrich its meaning. it's displayed in the context of qualitative ranges of performance (poor to good) with varying intensities of a single hue to make them discernible

coxcomb chart

variation of a pie chart that represents numbers using the area of the circle segments instead of the radius (polar)

columns

vertical divisions of space on a page

how should you arrange labels in legends

vertical or horizontal is fine, whichever best fits your design

how visible should legends be

visible and legible but not as prominent as the actual data

Data visualization

visual display of quantitative information through the use of -points -lines -coordinate systems -numbers -symbols -words -shading -color

principle of enclosure

we perceive objects as belonging together when they are enclosed in a way that appears to create a boundary around them

principle of proximity

we perceive objects close together as belonging to a group

principle of continuity

we perceive objects that are connected as part of the same group

principle of closure

we perceive open, incomplete, and unusual forms as closed, whole, and regular

principle of similarity

we tend to group objects together that are similar

motion attribute

when animation is used in a visualization, aim for motion in the range of 0.5 to 4 degrees/second of a visual angle

sans-serif

when there are shorter amounts of text: captions, text in charts, headings

"a graphic display has many purposes, but it achieves its highest value when it forces us to see what we are not expecting"

william cleveland

