Data Analytics / Tableau

Ace your homework & exams now with Quizwiz!

R

Correlation analysis = strength of the linear relationship between x and y from -1 to 1

Fill Handle

The small square in the lower right corner of a selected cell

Table calculation: Relative

The table calculation will be computed relative to the layout of the table. They might move across or down the table. Rearranging dimensions in a way that changes the table will change the table calculation results (scope and direction)

Table calculation: Fixed

The table calculation will be computed using one or more dimensions. Rearranging those dimensions in the view will not change the computation of the table calculation (partitioning and addressing)

Polynomial trend line

use a polynomial model when the relationship between x and y is complicated and shows more of an S curve

Power trend line

use a power model when its not linear, but somewhere between exponential and logarithmic

Tree maps

use a series of nested rectangles to represent hierarchical relationships of parts to the whole useful when you have hierarchies and dimensions with a high number of distinct values (cardinality) to create: place a measure on size shelf, and place one (or more) dimension on the detail shelf

Exponential trend line

use an exponential model when each additional increase in x will result in a dramatic/greater increase in y

Nesting IF functions to test for more than one condition

use an or/and/nor function

Aggregate calculation

use if aggregations must be performed prior to other operations avoids averaging averages ie) the sum of: [current enrollment]/[max enrollment]

Level of Detail calculation

use if you want an aggregate to be fixed or exclude a value gives you more control on the level of granularity of your aggregation

Row level calculation

use when you are certain that you will use either the value as a dimension or an aggregation of the row-level values will make sense ie) sum of current enrollment / sum of max enrollment

Gantt chart

useful for showing if certain events overlap

Stacked bar charts

useful when you want to understand part-to-whole relationships

Measures

values that are aggregated (summed, averaged, counted, or max/min)

Dimensions

values that determine the level of detail at which measures are aggregated. Can be discrete, only can be continuous if its a numeric or date

Sparklines

visualizations that use multiple small line graphs that are designed to be read/compared quickly

LOD: Include

will also use dimensions in view/viz

LOD: Exclude

will also use dimensions in view/viz, but not ones in the LOD

How to improve extract performance

•Hide unused fields •Use only the data you need •A subset will render/respond faster than the whole •Use the 'Optimize' feature after creating calculated fields or hiding fields •Store extracts on Solid State Disks

Reasons to use an extract

•Need better performance than what a live connection would offer •Need portability of data •Using Legacy connections (JET driver) •Want to share a packaged workbook •For Tableau Reader or somebody without access to data source

When NOT to use an extract

•You have sensitive data - doesn't require a password •You need to manage security/access-attempts •Need to see real time changes in your source data. •Volume of data is impractical

Inner Joins

the default, most exclusive type of join that will include the least amount of results

Filled maps

"cloropleth maps"

Discrete fields

(blue) are distinct and separate from one another Can be a dimension or a measure

Continuous fields

(green) values that flow from first to last in a continuum and have an order that wouldn't make sense to change. Can be a dimension if it's numeric or a date, and can be a measure

What is a good time to use a Box and Whisker chart?

to add additional statistical contexts to distributions

Linear trend line

use a linear model when x increases y at a constant rate

How many lines should a line chart have?

2-4

Logarithmic trend line

use a logarithmic model when expecting the law of diminishing returns

R-squared

Regression analysis = how well the x variables can predict the y variables Coefficient of determination. Proportion of the variance in the response variable that is predicable from the explanatory variable

"trend lines" in tableau

Regression lines Analysis --> trend lines

Tall & Narrow vs Short & Wide

TN is good structure SW is poor structure - easier to read, but table calculations and averages across measures is tedious

Lollipop chart

a bar chart with a circle to show a data point relevant to the bar

Data Densification

a broad term that indicates that missing values or records are filled in

Bullet chart

a good way to visually compare a primary measure with a secondary measure (shows progress towards a goal)

Logical layer

a semantic layer made up of logical tables or objects that are related (made up of one or more physical tables)

Conditional Formatting

a useful feature in excel that dynamically assigns a color/symbol based on a set of values that are relative to each other

Freeze Panes

a useful feature to always keep a certain number of rows and/or columns visible on the excel worksheet

Pivot tables (in excel)

a useful way to aggregate well structured data to gain insight quickly, accurately, and dynamically value areas = fields in this area are summarized, max/min, for each record fitting the row/column ideally the values have significant variation

Dumbell chart

a variation of the circle plot- compares two values, emphasizing the distance between the two values

IF argument: Logical_test

any value/expression that can be evaluated as TRUE or FALSE

Date values

are continuous by default, necessary for forecasting

Date parts

are discrete by default, separated parts of a date

Named Ranges

by using these, you can make formulas much easier to understand and maintain

Understanding the scripts a tableau files is running against a production system is useful for...

debugging performance issues, determining a more efficient indexing structure, and maintaining security across control lists for accountability reporting

Direction

defines how the table calculations move within the scope options: down, across, down then across, across then down

Scope

defines the boundaries within which a given table calculation can reference other values options: table, pane, cells

Relationship

defines which field connects the tables together

The 4th argument of the VLOOKUP function

determines if the lookup value has to be an exact match or not, is NOT mandatory, and doesn't have to be a constant

Anscombe's Quartet

four datasets that have nearly identical summary statistics but appear very different when graphed each dataset has eleven (x,y) and they were constructed to illustrate the importance of graphing data before analyzing it and the effect of outliers on statistical properties

Multiplicative forecasting

good when seasonal variation increases over time constraints: can't use this when the measurement to be forecasted has one ore more values that are <= 0 or even just REALLY close to 0

Additive forecasting

good when seasonal variation is constant over time

Distribution bands

helps analyze outliers on a scatter plot

Extracts

in tableau, pulling some or all of the data from the original source and stores it in a special separate file that effectively makes a copy of the data for the tableau visualization to use

Cardinality

indicates how many records in one table could potentially relate to the records of another table

Referential integrity

indicates whether we expect all records to find a match or whether some records could potentially be unmatched

IF argument: Value_if_false

is the value returned if logical_test is FALSE

IF argument: Value_if_true

is the value returned if logical_test is TRUE

Physical layer

made up of the tables that come from the underlying data source that may be joined or unjoined with conventional joins or unions created from custom SQL statements

Table calculations

make it possible to compare and perform calculations on aggregate values across the rows of resulting tables are performed AFTER the initial data query for the tableau worksheet, by the client machine, and never the enterprise level hardware helps get data from sources that aren't structured that well

LOD: Fixed

only uses dimensions in the LOD calculation

F5

press _____ in tableau to bypass the cache and refresh the data source from the data

Forecasting

requires the x-axis to be a time measure (full date field) additive vs multiplicative

Waterfall chart

shows how parts successively build up to a whole

Slope chart

shows the change in values from one period/status to another

Marimekko chart

similar to a stacked bar chart, but additionally uses varying widths of bars to communicate additional info


Related study sets

Nutrition Midterm Spring Semester

View Set

MCB 2004 Mastering Biology Chapter 14 Questions

View Set

RENAL TRANSPLANTATION IMMUNOLOGY

View Set

Chapter 50: Nursing Care of a Family when a Child has a Vision or Hearing Disorder

View Set

Security+ 1.4 Explain penetration testing concepts

View Set

Chapter 25: Vital Signs Practice Questions

View Set