Data Analytics Exam 2 Review
ETL
Extract, Transform, Load
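A minimal sketch of the three ETL steps, assuming a hypothetical CSV export as the source and an in-memory SQLite database as the target. The table name, column names, and sample rows are illustrative and do not come from the course material.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an exported file.
RAW_CSV = "product,units,unit_price\nchair,2,50.00\ndesk,1,200.00\n"


def extract(text):
    """Extract: read raw rows (all values are still text) from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert text fields to numbers and derive a revenue field."""
    records = []
    for row in rows:
        units = int(row["units"])
        unit_price = float(row["unit_price"])
        records.append((row["product"], units, unit_price, units * unit_price))
    return records


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE sales (product TEXT, units INTEGER, unit_price REAL, revenue REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
total_revenue = conn.execute("SELECT SUM(revenue) FROM sales").fetchone()[0]
```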
Gestalt Principles Examples
Proximity, Similarity, Enclosure, Connection, Symmetry, Continuity, Closure, Figure and Ground
Data-to-ink ratio
Tufte principle: the ratio of data ink to total ink should be maximized. Non-data ink and redundant data ink (chart junk) should be removed; essentially, less is more. Remove backgrounds, redundant labels, borders, and bold effects; lighten labels; lighten or remove lines; label directly.
warehouse v. production
Warehouse: storage for older files. Production: day-to-day files. Configure your data by having Power BI workbooks work off a warehouse rather than production.
human element
critical thinking and judgement
(T/F)
dashboards are constructed from visualizations
bottom-up
data drives your research; you know the scope but you aren't sure what's in the data
Card
shows any number, such as a sum or an average
text tables
simple text table that shows data along two columns in tabular format. Don't have too many columns in a text table.
canvas
space where you design your dashboards
pre-attentive processing
tuned to detect a special set of visual attributes, which results in certain elements standing out - all without conscious thought
measures
typically numeric and considered a dependent variable; the value is a function of one or more dimensions, e.g. units, unit price, profit
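To illustrate how a measure depends on a dimension, here is a small sketch that totals a numeric field for each category value. The records and field names are hypothetical; "product" plays the role of the dimension and "units" the measure.

```python
from collections import defaultdict

# Hypothetical sales records: "product" is the dimension, "units" is the measure.
sales = [
    {"product": "chair", "units": 2},
    {"product": "desk", "units": 1},
    {"product": "chair", "units": 3},
]

# Aggregate the measure for each value of the dimension.
units_by_product = defaultdict(int)
for row in sales:
    units_by_product[row["product"]] += row["units"]
```

Changing the dimension (for example, grouping by region instead of product) would change the resulting values, which is why the measure is treated as the dependent variable.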
Tables
a grid with rows and columns, useful for comparing values of one or more categories
'NUMBER OF ITEM TYPES'
a measure that determines how many different products there are
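One way to think about such a measure is as a distinct count. The sketch below assumes that interpretation and uses hypothetical order lines; it is not the formula used in the course workbook.

```python
# Hypothetical order lines; the same product can appear more than once.
order_lines = [
    {"order_id": 1, "product": "chair"},
    {"order_id": 2, "product": "desk"},
    {"order_id": 3, "product": "chair"},
    {"order_id": 4, "product": "bookshelf"},
]

# A set keeps each product once, so its size is the number of different products.
number_of_item_types = len({line["product"] for line in order_lines})
```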
development of dashboards
a process that relies on different types of knowledge
instance
a specific object, e.g. Vincent Van Gogh is an instance of Painter; Honda Pilot is an instance of Car
drillthrough
connect two or more reports that have the same content
Tufte Principles: Departure
"Simple is Better" depends on your audience. An audience with a significant stake in the subject matter often demands precision with lots of detail.
color
- no color overload - be consistent and effective when using color - cater the color for your audience - understand color context
Analytics Lifecycle
1. Define Scope and Approach 2. Data Identification & Gathering 3. Data Integration and Mapping 4. Execution of Analytics 5. Visualization and Reporting 6. Present Findings & Enhance
Analytics Life Cycle
1. Request data 2. Receive data 3. Perform pre-load checks 4. Load data into data analysis platform 5. Perform post-load checks 6. Add custom fields 7. Perform Analytics 8. Export results/data
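The pre-load and post-load checks in steps 3 and 5 can be sketched as follows. The notes do not say what the checks are, so this assumes a record count and a control total, which is a common choice. The invoice data, the expected figures, and the table name are hypothetical.

```python
import sqlite3

# Hypothetical received data plus control figures supplied by the data owner.
received = [("INV-1", 120.50), ("INV-2", 79.50), ("INV-3", 300.00)]
expected_count, expected_total = 3, 500.00


def passes_checks(rows, count, total):
    """Return True when the record count and the control total both match."""
    actual_total = sum(amount for _, amount in rows)
    return len(rows) == count and abs(actual_total - total) < 0.005


# Step 3: pre-load check on the data as received.
pre_load_ok = passes_checks(received, expected_count, expected_total)

# Step 4: load into the analysis platform (an in-memory SQLite table here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_id TEXT, amount REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?)", received)

# Step 5: post-load check on what actually landed in the platform.
loaded = conn.execute("SELECT invoice_id, amount FROM invoices").fetchall()
post_load_ok = passes_checks(loaded, expected_count, expected_total)
```

Running the same check before and after loading helps show that nothing was dropped or altered during the load.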
Elements of Visualization
Data colors, Size, Data Label, Plot Area, Tooltip, Buttons, Text Box, Image, Shapes
"manage relationships"
where you can add or delete a link (relationship) between tables
Storytelling approach
Memorable, Relatable, Leads to action
(T/F) A good dashboard should make your point without the need for words or explanations
True
Edit Queries
allows the analyst to manipulate the data using a powerful calculation engine that can do arithmetic, string, date, aggregation, and other operations, with support for Boolean logic and conditional statements; e.g. add a new column to the data to enhance visualizations
Get Data
button used to get to data connectors such as text/CSV, Access database, Excel, etc.
single row functions
can be applied to one or more columns in a record; operate on a record-by-record basis; all records are uniformly affected by single row functions
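A brief sketch of the record-by-record idea: the same function is applied to every record and uses only that record's own columns. The records, field names, and the `add_fields` helper are hypothetical.

```python
# Hypothetical records.
records = [
    {"first": "vincent", "last": "van gogh", "units": 2, "unit_price": 50.0},
    {"first": "claude", "last": "monet", "units": 1, "unit_price": 200.0},
]


def add_fields(record):
    """Single row function: reads only the columns of the record it is given."""
    return {
        **record,
        "full_name": f'{record["first"]} {record["last"]}'.title(),
        "line_total": record["units"] * record["unit_price"],
    }


# Every record is affected in the same way.
result = [add_fields(record) for record in records]
```

This contrasts with an aggregate such as a sum, which needs to look across many records to produce one value.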
Append Query
combines tables together by stacking the rows of one table under the rows of another
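As a rough analogy in Python, appending can be pictured as stacking rows from tables that share the same columns. The monthly tables and the `append_tables` helper below are hypothetical and only mimic the idea; they are not how Power BI implements it.

```python
# Hypothetical monthly tables with identical columns.
january = [{"order_id": 1, "amount": 100}, {"order_id": 2, "amount": 250}]
february = [{"order_id": 3, "amount": 75}]


def append_tables(first, second):
    """Stack the rows of two tables, insisting that the columns match."""
    columns = set(first[0])
    if any(set(row) != columns for row in first + second):
        raise ValueError("tables must have the same columns to be appended")
    return first + second


appended = append_tables(january, february)
```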
fields tab
define how the data selected in the Fields panel should be used as part of a visualization
data model
define the relationships between tables
formatting tab
defines how the dashboard should look
Gestalt principles
describe how our mind organizes individual elements into groups to make sense of an entire visual
Pre-attentive attributes
emphasis, quantity, color
visual tools that aid in pre-attentive processing
emphasis, quantity, color
ERP Systems
enterprise resource planning, e.g. SAP, Oracle, Microsoft Dynamics ERP, Oracle NetSuite
Explanatory Data Visualization
explains what the audience needs to know; shows specific relationships in data, such as the link between causes and results; usually used for client presentations
emphasis
form, position, motion
foreign key
has values consistent with the primary key of another table; may have non-unique values across records
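The consistency requirement can be checked with a short sketch: every foreign key value should appear among the primary key values of the other table, while repeats are allowed. The two tables and their column names are hypothetical.

```python
# Hypothetical parent table: artist_id is its primary key.
artists = [
    {"artist_id": 1, "name": "Vincent Van Gogh"},
    {"artist_id": 2, "name": "Claude Monet"},
]

# Hypothetical child table: artist_id here is a foreign key.
paintings = [
    {"painting_id": 10, "artist_id": 1},
    {"painting_id": 11, "artist_id": 1},  # repeated foreign key value is allowed
    {"painting_id": 12, "artist_id": 3},  # no matching artist: inconsistent
]

primary_keys = {artist["artist_id"] for artist in artists}

# Rows whose foreign key has no match in the parent table.
orphans = [p for p in paintings if p["artist_id"] not in primary_keys]
```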
slicer/action/filters
help interlink multiple visualizations on a page
Tufte's Principles
highlight that "excellence in statistical graphics consists of complex ideas communicated with clarity, precision and efficiency"
analytics mindset
includes the ability to interpret and share the results of data analytics techniques with stakeholders
slicers
interactive visuals that enable the creation of focused data; can choose any combination of values; always interact with other visuals; used to dynamically select a more focused data set; use the Ctrl key to select multiple boxes
data transforming or cleansing
involves converting data from one format to another to load it into an analytics tool
primary key
is never blank or null, contains unique values across all records
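Both conditions can be tested directly. The sketch below uses a hypothetical helper, `is_valid_primary_key`, and made-up rows to show what passing and failing candidates look like.

```python
def is_valid_primary_key(rows, column):
    """Check that a column is never blank or null and is unique across rows."""
    values = [row.get(column) for row in rows]
    no_blanks = all(value is not None and value != "" for value in values)
    all_unique = len(set(values)) == len(values)
    return no_blanks and all_unique


# Hypothetical rows.
good = [{"id": 1}, {"id": 2}, {"id": 3}]
duplicated = [{"id": 1}, {"id": 1}]
with_null = [{"id": 1}, {"id": None}]

good_ok = is_valid_primary_key(good, "id")
duplicated_ok = is_valid_primary_key(duplicated, "id")
with_null_ok = is_valid_primary_key(with_null, "id")
```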
top-down
leverage the objective/hypothesis to define the analysis criteria; you know what you're looking for and you're going to find it
normalization
results in data being kept in different database tables to minimize storage cost and prevent duplication of records
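A small sketch of the idea, assuming a hypothetical flat table in which the customer name is repeated on every order. Splitting it into two tables stores each customer once and keeps only the key on the order rows.

```python
# Hypothetical flat table: customer details repeat on every order.
flat = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Acme", "amount": 100},
    {"order_id": 2, "customer_id": "C1", "customer_name": "Acme", "amount": 250},
    {"order_id": 3, "customer_id": "C2", "customer_name": "Globex", "amount": 75},
]

# Customers table: one entry per customer, keyed by customer_id (primary key).
customers = {row["customer_id"]: row["customer_name"] for row in flat}

# Orders table: keeps customer_id as a foreign key and drops the repeated name.
orders = [
    {"order_id": row["order_id"], "customer_id": row["customer_id"], "amount": row["amount"]}
    for row in flat
]
```

The original view can be rebuilt when needed by looking up each order's `customer_id` in `customers`.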
'NUMBER OF ARTISTS'
measure that can be used in visualizations only
"Any" Data Type
mixes text and numbers
unstructured data
new form of data that is currently much more difficult to analyze, can't prove that this came from an automatic system
Edit queries
opens the query editor. Apply Changes: saves the changes you made and exits the query editor
dimensions
qualitative, categorical data, considered an independent variable e.g. the product (chair, table, desk, bookshelf)
flat file
raw data in its simplest form
data visualization
representing (often large) amounts of data in visual form so as to bring out hidden meanings, trends, and other attributes
stacked bar charts
use horizontal bars to make comparisons between categories; show how different categories relate to one another using a numeric field, which is represented as a bar; use multiple fields to compare the categories
date masks
used to convert between date data stored as characters vs. the actual date data type
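Python's format codes work much like a date mask, so they make a convenient illustration. The sample value and the choice of masks are assumptions; the analytics tool used in the course may use a different mask syntax.

```python
from datetime import date, datetime

raw_value = "03/15/2024"  # date stored as characters (hypothetical sample)
input_mask = "%m/%d/%Y"   # describes how the characters are laid out

# Characters -> actual date type, guided by the mask.
parsed = datetime.strptime(raw_value, input_mask).date()

# Date type -> characters again, using a different mask.
as_text = parsed.strftime("%Y-%m-%d")
```

Once the value is a real date, date arithmetic and sorting behave correctly, which is not guaranteed while it is stored as text.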
fields panel
used to select the data that will become part of the dashboard
Exploratory Data Visualization
used to understand the data, develop and assess a hypothesis or question, or find a pattern in the data; allows the audience to explore data for further analyses; conducted for a problem that has not been clearly defined
report view
workspace/toolbox for designing dashboards; divided into 8 parts