Data Analytics / Tableau
R
Correlation analysis = strength of the linear relationship between x and y from -1 to 1
Fill Handle
The small square in the lower right corner of a selected cell
Table calculation: Relative
The table calculation will be computed relative to the layout of the table. They might move across or down the table. Rearranging dimensions in a way that changes the table will change the table calculation results (scope and direction)
Table calculation: Fixed
The table calculation will be computed using one or more dimensions. Rearranging those dimensions in the view will not change the computation of the table calculation (partitioning and addressing)
Polynomial trend line
use a polynomial model when the relationship between x and y is complicated and shows more of an S curve
Power trend line
use a power model when its not linear, but somewhere between exponential and logarithmic
Tree maps
use a series of nested rectangles to represent hierarchical relationships of parts to the whole useful when you have hierarchies and dimensions with a high number of distinct values (cardinality) to create: place a measure on size shelf, and place one (or more) dimension on the detail shelf
Exponential trend line
use an exponential model when each additional increase in x will result in a dramatic/greater increase in y
Nesting IF functions to test for more than one condition
use an or/and/nor function
Aggregate calculation
use if aggregations must be performed prior to other operations avoids averaging averages ie) the sum of: [current enrollment]/[max enrollment]
Level of Detail calculation
use if you want an aggregate to be fixed or exclude a value gives you more control on the level of granularity of your aggregation
Row level calculation
use when you are certain that you will use either the value as a dimension or an aggregation of the row-level values will make sense ie) sum of current enrollment / sum of max enrollment
Gantt chart
useful for showing if certain events overlap
Stacked bar charts
useful when you want to understand part-to-whole relationships
Measures
values that are aggregated (summed, averaged, counted, or max/min)
Dimensions
values that determine the level of detail at which measures are aggregated. Can be discrete, only can be continuous if its a numeric or date
Sparklines
visualizations that use multiple small line graphs that are designed to be read/compared quickly
LOD: Include
will also use dimensions in view/viz
LOD: Exclude
will also use dimensions in view/viz, but not ones in the LOD
How to improve extract performance
•Hide unused fields •Use only the data you need •A subset will render/respond faster than the whole •Use the 'Optimize' feature after creating calculated fields or hiding fields •Store extracts on Solid State Disks
Reasons to use an extract
•Need better performance than what a live connection would offer •Need portability of data •Using Legacy connections (JET driver) •Want to share a packaged workbook •For Tableau Reader or somebody without access to data source
When NOT to use an extract
•You have sensitive data - doesn't require a password •You need to manage security/access-attempts •Need to see real time changes in your source data. •Volume of data is impractical
Inner Joins
the default, most exclusive type of join that will include the least amount of results
Filled maps
"cloropleth maps"
Discrete fields
(blue) are distinct and separate from one another Can be a dimension or a measure
Continuous fields
(green) values that flow from first to last in a continuum and have an order that wouldn't make sense to change. Can be a dimension if it's numeric or a date, and can be a measure
What is a good time to use a Box and Whisker chart?
to add additional statistical contexts to distributions
Linear trend line
use a linear model when x increases y at a constant rate
How many lines should a line chart have?
2-4
Logarithmic trend line
use a logarithmic model when expecting the law of diminishing returns
R-squared
Regression analysis = how well the x variables can predict the y variables Coefficient of determination. Proportion of the variance in the response variable that is predicable from the explanatory variable
"trend lines" in tableau
Regression lines Analysis --> trend lines
Tall & Narrow vs Short & Wide
TN is good structure SW is poor structure - easier to read, but table calculations and averages across measures is tedious
Lollipop chart
a bar chart with a circle to show a data point relevant to the bar
Data Densification
a broad term that indicates that missing values or records are filled in
Bullet chart
a good way to visually compare a primary measure with a secondary measure (shows progress towards a goal)
Logical layer
a semantic layer made up of logical tables or objects that are related (made up of one or more physical tables)
Conditional Formatting
a useful feature in excel that dynamically assigns a color/symbol based on a set of values that are relative to each other
Freeze Panes
a useful feature to always keep a certain number of rows and/or columns visible on the excel worksheet
Pivot tables (in excel)
a useful way to aggregate well structured data to gain insight quickly, accurately, and dynamically value areas = fields in this area are summarized, max/min, for each record fitting the row/column ideally the values have significant variation
Dumbell chart
a variation of the circle plot- compares two values, emphasizing the distance between the two values
IF argument: Logical_test
any value/expression that can be evaluated as TRUE or FALSE
Date values
are continuous by default, necessary for forecasting
Date parts
are discrete by default, separated parts of a date
Named Ranges
by using these, you can make formulas much easier to understand and maintain
Understanding the scripts a tableau files is running against a production system is useful for...
debugging performance issues, determining a more efficient indexing structure, and maintaining security across control lists for accountability reporting
Direction
defines how the table calculations move within the scope options: down, across, down then across, across then down
Scope
defines the boundaries within which a given table calculation can reference other values options: table, pane, cells
Relationship
defines which field connects the tables together
The 4th argument of the VLOOKUP function
determines if the lookup value has to be an exact match or not, is NOT mandatory, and doesn't have to be a constant
Anscombe's Quartet
four datasets that have nearly identical summary statistics but appear very different when graphed each dataset has eleven (x,y) and they were constructed to illustrate the importance of graphing data before analyzing it and the effect of outliers on statistical properties
Multiplicative forecasting
good when seasonal variation increases over time constraints: can't use this when the measurement to be forecasted has one ore more values that are <= 0 or even just REALLY close to 0
Additive forecasting
good when seasonal variation is constant over time
Distribution bands
helps analyze outliers on a scatter plot
Extracts
in tableau, pulling some or all of the data from the original source and stores it in a special separate file that effectively makes a copy of the data for the tableau visualization to use
Cardinality
indicates how many records in one table could potentially relate to the records of another table
Referential integrity
indicates whether we expect all records to find a match or whether some records could potentially be unmatched
IF argument: Value_if_false
is the value returned if logical_test is FALSE
IF argument: Value_if_true
is the value returned if logical_test is TRUE
Physical layer
made up of the tables that come from the underlying data source that may be joined or unjoined with conventional joins or unions created from custom SQL statements
Table calculations
make it possible to compare and perform calculations on aggregate values across the rows of resulting tables are performed AFTER the initial data query for the tableau worksheet, by the client machine, and never the enterprise level hardware helps get data from sources that aren't structured that well
LOD: Fixed
only uses dimensions in the LOD calculation
F5
press _____ in tableau to bypass the cache and refresh the data source from the data
Forecasting
requires the x-axis to be a time measure (full date field) additive vs multiplicative
Waterfall chart
shows how parts successively build up to a whole
Slope chart
shows the change in values from one period/status to another
Marimekko chart
similar to a stacked bar chart, but additionally uses varying widths of bars to communicate additional info