Exam #1
What is a weak correlation?
0 to 0.4
What is a moderate correlation?
0.4 to 0.6
What is a strong correlation?
0.6 to 1
What range of values can r (the correlation coefficient) take?
-1 to 1
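A minimal sketch of how r can be computed and labeled with the weak / moderate / strong bands from the cards above. The function names `pearson_r` and `strength` are made up for illustration, and how the exact boundary values 0.4 and 0.6 are assigned is an assumption, since the cards do not say.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def strength(r):
    """Label the magnitude of r; boundary handling is an assumption."""
    magnitude = abs(r)
    if magnitude < 0.4:
        return "weak"
    if magnitude < 0.6:
        return "moderate"
    return "strong"

r = pearson_r([1, 2, 3], [2, 4, 6])  # perfectly linear, increasing
label = strength(r)
```

The sign of r gives the direction (positive or negative) and its absolute value gives the magnitude.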
What are the four main steps to the visualization process?
1. Define the data you have 2. What do you want to know about your data? 3. What visualization methods should you use? 4. What do you see using the visuals and does the data make sense?
What indicates a strong correlation?
0.8 and -0.8
What are some of the reasons we visualize data?
Answer questions (or discover them)
What is the most commonly used chart?
Bar Chart
What are the three types of Coordinate Systems?
Cartesian, Polar, Geographical
What is the Ordinal Scale?
Categories where order matters. Ex. horrible, bad, okay, good
What is scale?
Dictates where the shapes are placed and how objects are shaded
What are the types of correlation?
Direction, Magnitude, Other relationships
Quantitative data can be...
Discrete & continuous
What is a Categorical Scale?
Discrete placement in bins. Ex. A,B,C
What is the chart similar to a pie chart called?
Donut chart
Examples of "Out of the Box" visualization tools...
Excel, Google Charts, Microsoft Power BI, Tableau, Geospatial Tools
What are some of the reasons we visualize data?
Expand memory
What is the Logarithmic Scale?
Focuses on percent change: you multiply or divide by a constant factor to move up or down the scale. Ex. 1, 10, 100, ...
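A small sketch of why a logarithmic scale fits multiplicative change: values that grow by a factor of 10 are unevenly spaced on a linear axis but evenly spaced after taking log base 10. The sample values are illustrative only.

```python
import math

values = [1, 10, 100, 1000]

# On a linear axis the gaps between neighbours keep growing.
linear_gaps = [b - a for a, b in zip(values, values[1:])]

# On a log10 axis each tenfold step moves the same distance.
log_positions = [math.log10(v) for v in values]
log_gaps = [b - a for a, b in zip(log_positions, log_positions[1:])]

print(linear_gaps)  # gaps of 9, 90, 900
print(log_gaps)     # each gap is about 1
```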
Pros of Google Charts...
Free to create charts. Includes interactive, animated, and geospatial data graphics. Integrates well with the Google apps suite. Data can easily be accessed from different computers
What are the easiest online solutions to geospatial mapping tools?
Google, Yahoo, and Microsoft maps
What is Length?
How long the shapes are. The length of the bars in a bar graph provides the visual cue: the longer the bar, the larger the absolute value. Start the axis at zero, because people visually compare the distance from 0 to the end of the bar. If the axis does not start at 0, the chart misrepresents the information
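A quick arithmetic sketch of the zero-baseline point. The helper `drawn_ratio` and the numbers 50, 55, and 45 are hypothetical, chosen only to show how a truncated axis exaggerates a difference.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values b and a
    when the axis starts at `baseline` instead of zero."""
    return (b - baseline) / (a - baseline)

honest = drawn_ratio(50, 55)                    # axis starts at 0
truncated = drawn_ratio(50, 55, baseline=45)    # axis starts at 45

print(honest)     # about 1.1: the second bar looks 10% longer
print(truncated)  # 2.0: the second bar looks twice as long
```

The data differ by 10%, yet with the axis starting at 45 the second bar is drawn twice as long as the first.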
Continuous Data
Infinite number of possible intermediate values. Ex. 1.5 lbs
Cons of Tableau...
Initial data preparation is required; Tableau has launched Tableau Prep, a separate product, to prepare data. Expensive. In the free public version, any data you upload to the servers becomes publicly available
Pros of Tableau...
Integrates a wide range of data sources and file types. Careful thought is given to design and aesthetics. Allows for interactive, spatial, animated, and dashboard displays. Powerful community collaboration
What is Color Saturation?
Intensity of a color hue; the density of a given color. Ex. gradients such as light and dark red. Color can be used to show categories and to highlight certain aspects of a data visualization
Name the types of Scales...
Linear, Categorical, Percent, Logarithmic, Ordinal, Time, Numerical
What are some of the reasons we visualize data?
Make data accessible
What are some of the reasons we visualize data?
Make decisions and persuade others to make decisions
1. What data do you have?
May be primary or secondary data that we have collected. The time to analyze data is short compared to the time spent gathering it
Cons of Excel...
Not interactive. Requires customization to adhere to design standards. Not great aesthetically for presentations, and may not process large datasets (1 GB)
Why is Python called Python?
Named after the British comedy Series "Monty Python's Flying Circus". Van Rossum needed something short, unique, and slightly mysterious
Cons of Microsoft Power BI...
Not many options to configure visuals; problems with large data sets
Quantitative data
Numerical data that can be aggregated and measured
Scatter Plot
Often used to visualize the relationship between two variables.
What are some of the reasons we visualize data?
Persuade using evidence through narrative
What are the NINE visual cues?
Position, Length, Angle, Direction, Shapes, Area, Volume, Color Saturation, Color Hue
Discrete Data
Predefined at exact points, no "in between". Ex. 1 person
Examples of "Programming" visualizations tools...
R, Python, JavaScript
What is the Percent Scale?
Representing parts of a whole. Ex. 0%,25%
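A minimal sketch of turning raw counts into a percent scale, where every category is expressed as its share of the whole. The function name `to_percent` and the sample counts are illustrative assumptions.

```python
def to_percent(counts):
    """Convert category counts into percentages of the total."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}

shares = to_percent({"A": 5, "B": 3, "C": 2})
print(shares)
```

Because the shares are parts of one whole, they add up to 100, which is what a pie or donut chart relies on.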
Cons of Google Charts...
Requires customization to adhere to standard designs, can't process large data sets
What is Angle?
Rotation between vectors. Used in pie charts, commonly to represent parts of a whole. Donut charts do not use angles, since the center of the circle is cut out; arc lengths are used as the visual cue instead
Iconic Memory
Short term or working memory. People can keep up to 4 chunks of visual information
What chart is used for categories and time?
Stacked bar chart
Cons of JavaScript...
Steep learning curve. Requires skills in working with HTML and JSON
Data visualization
The graphical representation of information and data
How should you provide context?
Through familiar colors and images, informative titles, and familiar objects and concepts.
Pros of Microsoft Power BI...
Tightly integrated with other Microsoft tools (Excel, Azure, cloud services, SQL Server). Highly intuitive user interface. More affordable than Tableau. Can import data from a wide range of sources
What are some of the reasons we visualize data?
To find patterns and see data in context
What is a Time Scale?
Units of months, days, or hours.
What is a Numerical Scale?
Uses numbers
What is the Linear Scale?
Values are evenly spaced. You always add or subtract the same fixed amount to move up or down the scale. Ex. 1, 2, 3, 4, 5...
What are the data visualization components?
Visual Cues, Coordinate Systems, Scale, Context
Edward Tufte's definition of data visualization
Visualization is complex ideas communicated with clarity, precision, and efficiency
Nathan Yau's definition of data visualization
Visualization is often framed as a medium for storytelling. The numbers are the source material, and the graphs are how you describe the source
Long-term Memory
Visuals can more quickly help us recall things from our verbal memory
Pros of JavaScript...
Web-based scripting language. Freely available and allows users to create sophisticated web-based visualizations
What is Position?
Where in space the data is. Commonly used in scatter plots, where you compare values based on where they are placed relative to others in the coordinate system. Outliers and clustering are easy to notice. One disadvantage is that unlabeled data points can be hard to grasp right away. Ex. scatter plot
The goal of data visualization is to
aid our understanding of data by leveraging the human visual system's highly tuned ability to see patterns, spot trends, and identify outliers
Bubble Chart
Allows you to compare three variables at once: x, y, and an area variable. Bubbles should be sized based on area, not radius.
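A hedged sketch of what sizing by area means in practice: since a circle's area grows with the square of its radius, the radius should scale with the square root of the value. The helper `bubble_radius` and the `max_radius` of 30 units are assumptions for illustration.

```python
import math

def bubble_radius(value, max_value, max_radius=30.0):
    """Radius that makes the bubble's AREA proportional to `value`.
    The largest value gets `max_radius`."""
    return max_radius * math.sqrt(value / max_value)

big = bubble_radius(100, 100)   # largest value, full radius
small = bubble_radius(25, 100)  # a quarter of the value

# A quarter of the value gives half the radius, so a quarter of the area.
area_ratio = (math.pi * small ** 2) / (math.pi * big ** 2)
```

Scaling the radius directly by the value would instead make the smaller bubble cover only one sixteenth of the area, understating it.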
3. What visualization methods should you use?
Bar chart, pie chart, or something else. Choose a chart based on whether you want to show a comparison, relationship, distribution, or composition.
What charts are easy to use for non-technical audiences?
bar charts and pie charts
A Cartesian coordinate system is made up of...
bar charts & line graphs
Histograms are similar to...
bar graphs and continuous density plots
When creating a bar chart, you should be aware of what?
Bar height and width. Start the axis at zero. Bars can be vertical or horizontal
What is ArcGIS?
Built for desktop mapping. Has a user interface; no coding required. Used by professional cartographers and graphics departments
Smoothing and Estimation
can be useful for better understanding of trends and predictive purposes. Can help fit trend lines.
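One simple form of smoothing is a moving average, sketched below under the assumption of equally spaced observations. The function name `moving_average` and the window of 3 are illustrative choices, not something the card specifies.

```python
def moving_average(values, window=3):
    """Average each run of `window` consecutive values to damp noise."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

smoothed = moving_average([1, 2, 3, 4, 5], window=3)
print(smoothed)
```

A larger window gives a smoother trend line but hides short-lived changes.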
Pros of Python...
can handle large amounts of data without crashing. Useful for analyses and heavy computation. Clean and easy to read syntax
Moderate negative correlation points are...
clustered together with some space between them moving down in a rightward motion
Strong Negative correlation points are...
clustered very close together in a downward right position
When creating pie charts, you should be aware of what?
Color-blind viewers. Include percentages
Continuous Temporal Data
Constantly changing. Shown with a line graph (time-series chart), a step chart, or smoothing and estimation
Continuous Density Plots
Continuous instead of binned. Shows how all the data is distributed, like a statistical distribution curve
Cons of "R" programming tool...
Default chart outputs require design refinements, such as adding missing titles. A common workflow is to create graphs in R and then edit them in Adobe Illustrator or Inkscape. R is good for exploratory work but not as good for presenting (the explanatory aspect)
Temporal Data can be...
discrete and continuous
No correlation points...
do not follow a pattern; the points are simply scattered.
Negative correlation points move...
down and to the right
What is a Histogram?
encodes data using height/length as the visual cue. Uses bin sizes to represent data. Bins should be big enough to see the variability in data.
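A minimal sketch of the binning step behind a histogram: each value is assigned to a fixed-width bin and the count per bin becomes the bar height. The function `histogram_counts`, the bin width of 5, and the sample values are assumptions for illustration.

```python
def histogram_counts(values, bin_width, start=0):
    """Count how many values fall into each bin of width `bin_width`.
    Keys are the left edge of each bin; empty bins are omitted."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        left_edge = start + index * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

bins = histogram_counts([1, 2, 2, 7, 8, 12], bin_width=5)
print(bins)
```

Changing `bin_width` changes how much of the variability is visible, which is why bin size needs care.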
Step Chart
Data stays fairly constant for a while and then jumps. Good for tax rates and interest rates.
Categorical Data
For groups or categories. Mutually exclusive labels without any numerical value. Ex. name, gender, product types
Pros of "R" programming tool...
free open-source statistical programming language. Can write your own functions and packages to make graphics the way you want
A Geographic coordinate system is made up of...
geospatial graphs
2. What do you want to know about your data?
Get a sense from stakeholders of what questions to answer. The more specific the question, the better, and the more you get out of the analysis
Cons of Python...
A great starting point for data exploration, but the output is not very polished aesthetically, so you might need other software such as Adobe Illustrator or Inkscape for presenting
Visualizing Relationships between variables
How can you tell whether, as one thing goes up, another thing goes up or down, and whether the relationship is causal or merely correlative?
What is Area & Volume?
How much 2-D or 3-D space an object takes up. Bigger objects represent greater values. Make sure the scaling is correct. Keep in mind how many dimensions are present.
Gantt Chart
Mainly used for project planning. Bars span the columns corresponding to each task's time period
Correlation
means one thing tends to change in a certain way as another thing changes.
Aggregating Data From Different Sources
A negative number of employees may be a data-entry error. Consider sorting by firm or by number of employees
What is a tree map?
Not as common, but good for organizing hierarchies. Color and area are used as visual cues
Weak negative correlation points are...
not clustered at all. Very sporadically placed on the graph, but still moving in a downward-right direction.
Ordinal data
Similar to categorical data but has a clear order. Ex. level of education, satisfaction level, salary bands
Other relationships
outlier, clustering, non-linear
A Polar coordinate system is made up of...
pie charts
What visual cue does a scatter plot use?
position
Direction correlation
positive or negative correlation
Coefficient of correlation
quantifies how tightly coupled the values of two variables are with respect to each other.
Structuring Data
reviews are textual data and are based on opinion. There is no format to follow. You can quantify data through star reviews.
What is a Box Plot?
shows range, median, and quartiles of data. Uses position & height/length visual cues. Less specific than histograms or density plots.
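A short sketch of the five numbers a box plot draws, using the standard library's `statistics.quantiles`. The helper name `five_number_summary` is made up for illustration, and quartile values can differ slightly between tools because several interpolation methods exist; this uses Python's default.

```python
import statistics

def five_number_summary(values):
    """Minimum, quartiles, median, and maximum of a list of numbers."""
    # n=4 splits the data into quarters and returns the three cut points.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

The box spans q1 to q3, the line inside it marks the median, and the whiskers reach toward the minimum and maximum.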
What is Direction?
slope of a vector in space. commonly used in line graphs. Direction helps with noticing trends and with time. Slope is used to signal sharp changes
Cleaning Data
Some parts of the table do not use the same spelling format, e.g. for Texas. Maybe sort names alphabetically and delete duplicate data. There are missing observations. Maybe sort by recency
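A hedged sketch of the cleaning steps this card lists: unify inconsistent spellings, drop rows with missing observations, remove duplicates, and sort by name. The records, the `STATE_SPELLINGS` lookup, and the `clean` function are all hypothetical; real data would need its own rules, and dropping missing rows is only one possible choice.

```python
# Hypothetical raw records with inconsistent spellings of Texas.
records = [
    {"name": "Acme", "state": "TX"},
    {"name": "Acme", "state": "Texas"},
    {"name": "Birch", "state": "texas "},
    {"name": "Cedar", "state": None},  # missing observation
]

# Assumed lookup from known variants to one canonical spelling.
STATE_SPELLINGS = {"tx": "Texas", "texas": "Texas"}

def clean(rows):
    seen = set()
    cleaned = []
    for row in rows:
        state = row["state"]
        if state is None:
            continue  # drop rows with a missing state
        state = STATE_SPELLINGS.get(state.strip().lower(), state.strip())
        key = (row["name"], state)
        if key in seen:
            continue  # skip duplicates revealed by the unified spelling
        seen.add(key)
        cleaned.append({"name": row["name"], "state": state})
    return sorted(cleaned, key=lambda r: r["name"])

result = clean(records)
print(result)
```

Note that the two Acme rows only show up as duplicates after the spelling is unified, so the order of the steps matters.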
Magnitude correlation
strong or weak correlation
Pros of Excel...
supports processing of data, compatible with word and PowerPoint, relatively easy to learn, widely used
What is Shape?
Symbols as categories. Used to denote categories and objects. Shapes are readily recognized visually. Ex. Nathan's Hot Dog Eating Contest
What are the types of distribution?
symmetric, left skewed, right skewed
Exploratory Data
testing a hypothesis (visual confirmation) and mining for patterns, trends, and anomalies (visual exploration)
Positive correlation points move...
up and to the right
Scatter Plot Matrix
useful to see relationships among multiple variables. Allows comparison across multiple dimensions.
What is Color Hue?
usually referred to as color. refers to the different colors like blue and green. Be mindful of color blindness. For executive presentations maybe you can use colors and shapes simultaneously
Explanatory Data
usually simple everyday visualizations (line charts, bar charts, pies, and scatter plots) conveying a single message
Validation Data
valid values and ranges
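A minimal sketch of a range check, assuming rows are dictionaries and the valid range is known in advance. The function `find_invalid`, the field name `employees`, and the bounds are hypothetical.

```python
def find_invalid(rows, field, low, high):
    """Return the rows whose `field` falls outside [low, high]."""
    return [row for row in rows if not (low <= row[field] <= high)]

firms = [
    {"firm": "Acme", "employees": 120},
    {"firm": "Birch", "employees": -5},   # impossible, likely an entry error
    {"firm": "Cedar", "employees": 40},
]

bad_rows = find_invalid(firms, "employees", low=0, high=1_000_000)
print(bad_rows)
```

Flagged rows still need a human decision: correct them, drop them, or go back to the source.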
Discrete Temporal Data
Values from specific points or blocks in time. Shown with bar graphs, stacked bar graphs, Gantt charts, and points
Line graphs are good for...
visualizing the evolution of several quantities over time.
Mean and Median are used to refer to...
what is normal or average
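A small sketch of how the two measures can disagree when the data is skewed. The numbers are invented: four similar values and one extreme outlier.

```python
import statistics

values = [30, 32, 35, 38, 400]  # one extreme value on the right

mean_value = statistics.mean(values)      # pulled upward by the outlier
median_value = statistics.median(values)  # middle value, barely affected

print(mean_value, median_value)  # 107 and 35
```

In a right-skewed distribution like this one the mean sits well above the median, so the median is often the better description of what is typical.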
4. What do you see, and does it make sense?
Whether the results match what we hypothesized, or maybe the opposite. You may need to return to step two, since new questions arise