Domain 1.3 - Prep Builder
Union information
- All files in a union created by a wildcard search need to be the same file, these files also must be in the same parent or child directory and have similar structures. when specifying a wildcard ,match, it also helps if your files have similar names. - It is not possible for a wildcard union to be applied to all files types, it is not applicable to Tableau extracts. - it is not possible for to create a union through a wildcard search in all versions of tableau prep builder 2018 and on - Unlimited unions of files available with wildcard, otherwise 10 is the limit - If new files are added to the directory, they will be automatically updated the next time that the flow is opened or run form the command line - Wildcard union can be updated from the tableau prep builder user interface by right clicking the input step with the wildcard union configured, and click refresh - Mismatch fields must be merged in a cleaning step - The union created through a wildcard search happens in the input step, not in the flow. A union in a flow uses multiple inputs. A wildcard union uses a single input made of multiple unioned files. Results must be viewed in a cleaning step.
Join information
- Common fields can have different names but must have similar values - Fields have different names they must be joined manually - A join can have multiple join clauses - you can join fields that were not used in the original join clause, or you can join a field from the original join clause to additional fields. - No all join clauses need to have the same operator
Review recorded changes
- Every cleaning operation that you apply in Tableau Prep Builder is recorded in the Changes pane, to the left of the profile pane. -Annotation icons are shown next to the cleaning operation in the Changes pane, in the profile pane for the relevant field, and above the step in the flow where the change was made. -You can undo any operation from within the Changes pane. To undo a change, hover over the operation and click the X, or right-click and select Remove. -Selecting Undo from the top menu bar will undo the last change made to the flow, including any edits made within the Changes pane. -Click any operation listed in the Changes pane to see the state of the data after that change. -Drag and drop any of your cleaning operations in the Changes pane to reorder the changes. -Any dependency issues with reordering cleaning operations will be flagged with an error indicator. Editing an operation - Hover over the field and select the Edit Field button to edit an operation from within the Changes menu. Edits can be made to the calculation as well as the field name from within the Edit Field menu.
Aggregation options
- Sum, Average, and Median - Count and Count Distinct - Minimum and Maximum - Standard Deviation and Standard Deviation Population - Variance and Variance Population - Percentile
Pronunciation
- Use Pronunciation when similar sounding values have been entered with different spellings. - algorithm indexes values phonetically by how they sound when spoken, uses the Metaphone3 algorithm to generate keys based on the value's English pronunciation.
Spelling
- Use Spelling when the spelling of similar values differs due to added or missing letters. - The algorithm computes an edit distance between two text values and then groups them together when the edit distance is less than a default threshold value. - The edit distance is the number of edit operations (insertion, deletion, or substitution) required to transform one value into the other. For example, the values "Portland" and "Portand" have an edit distance of 1, since you can perform one edit to delete the "l" in "Portland" to get "Portand." - matches invalid values to the closest valid values that differ by adding, removing, or substituting characters. - use sliders for Spelling and Pronunciation to adjust the strictness of the grouping parameters. - The Spelling option is available in any Tableau supported language.
Configure the join
- Use the settings tab of the join profile to edit the configuration until we get the results we want 1.) Edit join clause: Under Applied Join Clauses, click on either of the fields to open the dialog box. -You can select other fields from the list if you are not satisfied with the fields chosen by default. 2.) Manually configure the join when no fields are selected by default: Common fields contain common values, but they may not always have the same name. -Click Add under Applied Join Clauses. Then select the related fields 3.) select the operator for the join clause - By default, Tableau Prep Builder will use the equal to (=) operator. -can change the operator by using the drop-down menu between the fields. 4.) Add additional join clause: can join fields that were not used in the original join clause, or you can join a field from the original join clause to additional fields. -click the Add icon to open the dialog box and then select the common fields for the additional clause. 5.) Special join type: By default, Tableau Prep Builder will use an inner join type. You can change the join type from the default (inner join) to something else by clicking on areas in the graphic under Join Type to select or deselect the corresponding rows to include from each table. 6.) Review the summary of join: The distribution bars in the Summary of Join Results show: Rows included and excluded from each table
Aggregation information
- We can group by multiple fields - we can aggregate multiple fields - We can aggregate multiple fields and use different mathematical options - we can group by numeric fields such as unique ID
Resolve fields that do not exist in all tables
- When fields don't exist in one or more of the tables in a union, you will see null values for that field in the other tables. - You may choose to keep the fields with the resulting null values or remove the fields. You can do this in the union step or in a new step in the flow. Let's resolve the fields in the union step.
Review new fields and perform operations
- When you reviewed the results of the wildcard union in a cleaning step, you noticed two new fields, Table Names and File Paths. 1.) Toggle back to profile pane view to see the values in the profile cards for those fields. The Table Names field lists the tables used in the union, and the File Paths field lists the files used in the union. 2.) to rename double-click Table Names in the field header or select Rename Field on the More options menu
Filter values using the context menu
- With the Selected Values filter, you can pick and choose the values that you want to keep or exclude for a field, even values that aren't in your sample. - In the right pane, click the Keep Only or Exclude tab to select your action. - Enter search terms to search for values. - Click Add a value to add values that are in your data set but aren't included in your sample. - Click Done to apply your filter.
Create a join
- can create a join step anywhere in a flow, after connecting to and adding at least two tables to the flow. - need to have a common field - contain similar values, whether or not the field names match. 1.) Once similar fields are identified, add a join step in the flow plane - drag the cleaning step from one table to the cleaning step from the other table and release it when the Join option highlights.
Editing single values
- can edit it in the profile pane. - can also edit nulls to a more useful value if the data was unknown in data entry - When you change a field value, it changes that value for every row. - Consider starting by adding a cleaning step before editing values. Cleaning steps allow cleaning operations to be isolated, categorized, and explained by adding a step description.
Edit multiple values using Group and Replace
- group and replace multiple values using the context menu or using Manual Selection within Tableau Prep Builder. - the group and replace takes place by simply pressing Ctrl while selecting each value. Then right-click and select Group. Manual Selection: - clicking the field's More options menu, choosing Group and Replace, and then clicking Manual Selection. Use Manual Selection when you want full control of the values in each group.
Filter selected values
- open the More options (...) menu on the profile card where you want to apply the filter, select the Filter menu item, and then click Selected Values from the Filter menu.
Common Characters
- useful to fix capitalization, formatting issues, and typos with reversed letters or when similar values have been entered with the text in varying order. - The algorithm creates an alphabetical string formed by each unique letter in the text. - method is case insensitive and only applies to numbers and letters. - generating the key, each string is first transformed into lowercase letters and all special characters including white spaces, punctuation, and control characters are removed. - Common Characters uses the n-gram algorithm.
Create an aggregate step in the flow
--can create an aggregate step anywhere in a flow, after connecting to and adding at least one table to the flow. 1.) Add aggregation step Click the plus (+) icon next to the cleaning step and then click Aggregate. 2.) Note the pane sections: -The left pane shows a list of fields with options to add them to the other panes -The Grouped Fields pane to indicate which fields define the new granularity -The Aggregated Fields pane to indicate which fields to aggregate 3.) Aggregate fields: - Double-click the field name(s) to add the field(s) to the Aggregated Fields pane You can also use other methods to select the aggregated field(s), including: - Drag fields to and between the desired panes. - Click Add All or Remove All to add or remove fields in bulk. - Search for fields, if there are many fields in your data. 4.) Set the type of aggregation: In the Aggregated Fields pane, click the drop-down menu to see the different options for aggregating your data. - Alternatively, select the aggregation type and field at the same time -can click the aggregation type for the desired field in the left pane and select a new (or the same) aggregation level.
There are two ways to create a union in Tableau Prep Builder:
-Add tables to a union using a union step in a flow. -Automatically find files or database tables to union based on a wildcard search in an input step in a flow.
Validate your data by previewing in Tableau Desktop
-Click on the cleaning step where you applied cleaning operations and review those changes in the Changes pane. -Insert a step to see data before cleaning To validate the necessity of the change, you need to see the state of the data before cleaning operations were applied. To do so, you will need to insert a step. -Preview data before cleaning Right-click the Data Before Cleaning step. Then select Preview in Tableau Desktop. The flow is run through that step. Tableau Desktop will launch to a blank worksheet with the data loaded in the state it appears at that step in the flow (before cleaning operations were applied). -Build a view -Preview data after cleaning Return to Tableau Prep Builder. Now you would like to preview the data after cleaning operations were applied to confirm that the data type change Right-click the Data After Cleaning step. Then select Preview in Tableau Desktop. -Build a view to compare
Copy and Paste Operations
-Copy and Paste operations let you quickly apply changes made in one field to another. -Right-click a cleaning operation and select Copy. -Right-click and select Paste to duplicate the operation. -Drag and drop the new cleaning operation to a different field to apply the operation to that field. -You can copy an operation once and then paste and drag it multiple times to apply the same operation to multiple fields. -You can also copy a cleaning operation from the Changes pane in one step, and then paste it into the Changes pane of another step so you can reuse your work! *This option is not available for input steps with wildcard unions or for operations that impact multiple fields, such as calculated fields.
Examine the distribution of field values
-Field values can be organized into equal-sized ranges of values to help you better understand your data. -The distribution of data is the frequency, or count, of occurrence for data values. -Equal sized bins are used to group values into ranges. -Summary view of a field's profile card, Tableau uses a histogram to display the shape of a field's frequency distribution. Values are grouped into ranges, or bins. The longest bar indicates the most common range of values.
Options menu
-From the More options menu, select Duplicate Field, A new field is created with the same name and a modifier. For example, "Ship Date-1" -It's helpful to rename fields so they are meaningful to other people and for yourself. Often, you will want to rename fields that you have split. -Keep only fields:Hold Control and select the fields you want to keep. Click More options and select Keep Only -To undo a change simply type Ctrl-z, Ctrl-y, or hit the Undo button. -Changing data types same as tableau
Evaluate union results
-Select the union step in the flow and in the union pane review the metadata, such as the number of fields, field names, and data types of the fields. -The mismatched field pane shoes all of the mismatched color coded fields
Create the union *using files as the data sources
-These steps are similar when you create a union with wildcard search using database tables as the data source.
union with a wildcard search
-This method applies to files and database tables, and it works well when you have multiple, related tables with the same data source type. -wildcard unions can be used to automatically union new files or worksheets or tables that are created on a recurring basis, such as weekly or monthly. -If you have many similarly named tables with the same structure, you can use a wildcard search to union them in the input step. -keeps the flow tidy and simple by adding a single data input to the flow -there is no limit to the number of tables you can union with a wildcard search. *can only include up to 10 tables. otherwise limited to 10 tables -This feature applies only to Excel and .csv (text) files and database tables stored in Amazon Redshift, Microsoft SQL Server, MySQL, Oracle, and PostgresSQL.
Show profile plane (Most left)
-When selected, Tableau Prep shows the profile pane and the data grid. The profile pane view is the default and is useful for getting a sense of the data and the field value distributions.
Manually select fields to merge
-With the Airline field name selected in the Mismatched Fields section of the union pane summary, press the Control key. -Then select the Carrier field name in the list. -Right-click either of the field names. -Click Merge Fields. -With the fields selected, you can also click the Merge Fields button on the toolbar.
Data preparation in Tableau Prep
-process of getting your data cleaned, shaped, and combined so that it can be analyzed and shared. -to look for potential issues, examine the structure of your data in the profile pane and then identify nulls, outliers, and unexpected values.
Apply cleaning operations
-select a profile card, and then click an option in the toolbar or from the More options menu. You can also right-click the profile card to access the menu. -
Using the Common Characters method
1. ) Open the More options menu on the profile card where you wish to apply the group and replace operation, select Group and Replace, and then click Common Characters. 2.) Select a group marked by a paperclip to see its members. Modify the members according to your data needs using the slider to change the grouping parameters and manual selection. *The Common Characters algorithm does not consider pronunciation.
Create a union in a flow
1.) Connect to additional files. 2.)Create a union in the flow. a. Drag one of the new tables in the flow to the Calculations Jul 2019 week1 flights step. b. When the Union option appears, drag and release the table on Union 3.)Add other tables to the union. For both remaining tables, a. Drag the table to the union step in the flow, Union 1 in this example. b. When the Add option appears, drag and release the table on Add.
Filter data
Can exclude one value, keep a range of values, keep things that match a pattern, or filter a range of specific values. In Tableau Prep Builder, you usually use filtering to keep a subset of the data to perform more cleaning operations and exclude data that is not needed for the output.
Summary of join results
Click on the Excluded distribution bars in the summary. The unmatched rows for each table will display in the profile cards and the data grid.
Use drag and drop - options for merging
Click the profile card header of one of the fields you wish to merge, drag it to the profile card of one of the other fields, and drop when you see Drop to merge fields. Repeat this process for the other field.
Tableau prep workspace
Connections pane (A) where you connect to your data sources, and three coordinated areas that help you interact with and explore your data: Flow pane (B): A visual representation of your operation steps as you prepare your data. This is where you add steps to build your flow. Profile pane (C): A summary of each field in your data sample. See the shape of your data and quickly find outliers and null values. Data grid (D): The row level detail for your data.
Use data grid view - options for merging
Control + click the field headers, Then do one of the following: - Select Merge Fields on the toolbar. - Right-click one of the field headers to open the context menu and then select Merge. - Click the More options (...) icon on one of the field headers to open the context menu and then select Merge.
Use list view - options for merging
Control + click the field names, Then do one of the following: Select Merge Fields on the toolbar. - Right-click one of the field name fields to open the context menu and then select Merge. - Click the More options (...) icon next in one of the field name fields to open the context menu and then select Merge.
Use the toolbar - options for merging
Control + click the fields and then select Merge Fields on the toolbar. Tableau Prep Builder automatically renames the merged fields with the name of the first field you selected
use the context menu - options for merging
Control + click the fields. Right-click one of the field headers to open the context menu and then select Merge. Tableau Prep Builder automatically renames the merged fields with the name of the first field you selected
Aggregating non-numerical fields
Count and Count Distinct, Minimum and Maximum, Percentile
Group by selected field(s)
Double-click the field name(s) to select the field(s) to group You can also use other methods to select the aggregated field(s), including: - Drag fields to and between the desired panes. - Click Add All or Remove All to add or remove fields in bulk. - Search for fields, if there are many fields in your data. Set the level of grouping: - Click the drop-down menu to change the level of grouping. Alternatively, select the grouping level and field at the same time: -you can click the GROUP to select the grouping level for the desired field in the left pane. Tableau Prep Builder adds the field to the Grouped Fields pane and sets the grouping level at the same time.
Consider calculated fields
If one table in the union contains a calculated field, then the other tables will have null values in that field. You can create a new calculated field by copying and pasting the calculated field cleaning operation.
Understand the requirements
In order to successfully create a union with a wildcard search, the files must be in the same parent or child directory. -They must also have the same data source type and similar structures -specifying a wildcard match, it also helps if your files have similar names. -In order to successfully create a union using database tables, the tables must be in the same database and the database connection must support using a wildcard search to union. -Wildcard search will not search across schemas or across the database connection to find tables. -Same data source type
join profile
Join step: Click the join step in the flow plane to open the join profile The settings tab: Use this section of the join profile to view and configure the join clauses and join type as well as to view a summary of the join results. Join clauses plane: This pane allows you to see the values in each field in the join clause. Mismatched join clause values are displayed in red text. Join Results pane: Shows the distribution of values for each field in the joined data.
Which of the following correctly expresses a difference between joins and unions?
Joins require a shared field and unions require a similar structure
Determine which to use: joins versus unions
Joins: relates two or more tables into a single data structure. The tables must have a common field to define the relationship between the rows - Use a join when the data you want to analyze have a field in common but exist in separate tables or data sources - data sources do not need to be the same type - Other than the common field, the structures do not need to match. - Typically, a join adds fields (columns) to a data structure, making the data wider Union: appends rows from one table to another table. - require that the tables be similarly structured. -unions lengthen the data, making it taller. (add rows)
Group at different levels in one flow
Output at different grouping levels: - To preserve some data at different levels of detail, aggregate or output data in a branch of the flow and then output both levels of granularity. -Ex: we could output the data before and after aggregating it, so we have both daily and monthly data for different audiences or analyses.
Filter using a wildcard match
Scenario: you have a field with email addresses and you only want to see email addresses associated with a specific organization. Or perhaps you are trying to exclude values that match a certain pattern The wildcard match filter is perfect for these types of scenarios. - Click More options (...), Filter then Wildcard Match. - Type in your string, and select a match option.
child directory
See subdirectory. A directory or folder contained in another directory or folder. Also called a child directory or folder.
Which of the following options can you use to merge mismatched fields after a wildcard union?
Select multiple profile cards in the profile pane and merge fields
A note about nulls
Some methods of mishandling null values can create serious reporting errors. Converting null to 0 could be detrimental to a clean data set. A value of null could mean that nothing was sold, that the shipment didn't arrive, or that someone forgot to input the data into that field. Nulls may have different meanings; for example, a null in a middle name might mean that the person doesn't have one, or that we don't know what it is. -Null values can also be removed by selecting More Options (...), and selecting Filter, Null Values. - Select the Non-null values option to remove nulls. (Keeps non null values -use the filter dialog box to select one or more values to include or exclude from your data set.
Filtering values
String: Calculation, Selected Values, Wildcard Match, Null Values Number: Calculation, Selected Values, Range of Values, Null Values Date/Time: Calculation, Selected Values, Range of Dates, Relative Date, Null Values
Grouping levels
Tableau Prep Builder uses grouping levels to define the level of granularity of aggregated data. - There are many different grouping levels for date fields, which all fit into two categories: discrete date parts and continuous date values. -Discrete date parts change the granularity so your data is aggregated to the level of a specific part of a date, such as the Month - Continuous date values change the granularity so your data is aggregated to the level specified based on a continuous timeline. This means that Month Start will use every unique month and year combination in the data as a single row,
Show list view
Tableau Prep shows a list of just the field names, their data types, and if any data preparation changes have been made to them. -In this view, you can perform cleaning operations on multiple fields at the same time. -The list view is especially useful if your data set has a lot of fields, which would require a lot of scrolling to see in the profile pane, and you want to quickly examine the field names and data types.
Show data grid
Tableau Prep shows the data grid (without showing the profile pane). -provides a focused look at the row level details of your data -specially useful if you want to show only the rows for a value of a field selected in the profile pane.
View the Union pane
The Merge Fields button on the toolbar: When you select related fields in the profile pane, as in this example, the merge fields button is displayed on the toolbar, allowing you to merge fields Union pane: Listed data inputs for the union, also information about the resulting fields such as the number of resulting fields and which of the fields are mismatched. Fields are color coded Profile names card: The card highlights the data that came form that table Colors in the field profile card: Indicate the table the field is from. legend is under the inputs
Adding descriptions to cleaning operations
To add descriptions to changes, right-click the operation and select Add Description -Add descriptive content to the open dialog, then click Enter. -To make changes to the a description, right-click the operation and select Edit Description or Delete Description.
Aggregating Use cases
To match other data structures: Aggregate data when you want to combine it with another table, but the rows of one table are not at the same granularity as the rows in the other table. Improve performance: Aggregating the data will roll up rows of data together into fewer rows of data. When you reduce the size of your data output file, you will see faster query times, reduction of data read, and faster load times of visualizations. Less detail needed: if your analysis does not need that level of granularity.
Remove fields
To remove the field, select the field profile card, click the ellipses (...) to see the More options menu, and then click Remove. -You can remove multiple fields simultaneously by pressing the Control key before clicking the desired profile card headers. Then open the More options menu on one of the selected profile cards, and click Remove. -Alternatively, after selecting the desired profile cards, click Remove Fields on the toolbar.
Which of the following would be one reason to change the data type of the Order Date field from String to Date?
To see the data distribution.
Spaces
Trim Spaces: Removes spaces from the beginning and end of each field. -Remove Extra Spaces: Removes spaces at the beginning and end, double or more spaces are reduced to single spaces, single spaces in the midst of the string remain. -Remove All Spaces: Removes every space within a field. -You can also clean your data using a pivot step or a script step to apply R or Python scripts to your flow.
Unexpected values
Unexpected values may indicate inconsistent or inaccurate data that might need cleaning, for example, a date in the previous century when the data should only capture rows from the 2000's.
Edit incorrect values (Data entry errors of field name)
We can add a cleaning step after the join and edit the value there, or we can edit the value in-line in the Join Clauses pane. In the Join Clauses pane, right-click Reeds and select Edit Value on the context menu.
Merge recommended fields
When Tableau Prep Builder can determine potential matches for mismatched fields, the union pane summary will highlight those fields. To use the highlights to merge mismatched fields: 1.Select the field name in the Mismatched fields section of the union pane summary. 2. Hover over the highlighted field until you see a plus (+) sign. 3. Click the plus sign (+) to merge the fields.
Change data types to match
When fields with data types that don't match are going to be merged after a union, Tableau Prep Builder will first change the data types of the fields so they're compatible. When Tableau Prep Builder makes this change, the profile card for the merged field has a change data type annotation in the upper right hand corner of the profile card. If Tableau Prep Builder has assigned the incorrect data type to this field, separate the fields, change their data types to match, and then merge them again. To do this: -Right-click the change data type icon. -Click Separate Inputs with Different Types. -Now that the fields are separated, make the changes to the data type accordingly -Merge the fields again using any of the methods previously described.
You are interested in the percentage of flights that have originated in California that also had a destination in California. What is the quickest way to determine that? Select the best option.
When one value (such as the origin state of California) is selected in the profile pane, the related values in the profile pane are shown in blue. Hovering over a related value (such as the destination state of California) will display the percentage of related rows.
Highlight related and identical values
When you select a distribution bar or an individual value in the profile pane, two things happen: -The related values in the other profile cards are highlighted in blue. This highlighting shows the relationship distribution between the value you selected and the values in the other fields. -The data grid at the bottom of the workspace is filtered to show the row level detail for the selected data. -If you select a value in the data grid, all identical values for that field will also be highlighted in the data grid.
Filter using a calculation
Within a Clean step, select Calculation. The Add Filter dialog box opens. Enter the calculation, verify that it's valid, and click Save. - Calculation editor can also be accessed via the Filter Values button.
Merge fields in the profile pane
You can also merge fields using the profile pane in the following ways: -Click, drag, and drop the profile card for a field onto the profile card for another field. See the following animation for an example. - Use the toolbar: Press the Control key and then click the profile cards for the fields you want to merge. Then click Merge Fields that appears on the toolbar. See the following animation for an example. -Use the context menu: Press the Control key and then click the profile cards for the fields you want to merge, right-click one of the profile card headers, and click Merge Fields in the context menu.
standard deviation
a computed measure of how much scores vary around the mean score
parent directory
a directory that holds other directories
Directory
a location for storing files on your computer. Directories are found in a hierarchical file system, such as Linux, MS-DOS, OS/2, and Unix.
data validation
a necessary part of the iterative analytic process because it allows you to see if you have the needed fields to complete your analysis. also allows you to verify that your data preparation operations have been successfully applied.
When to use a Clean Step
added at any point to the flow. -can also make separate cleaning steps if it makes sense to group some types of cleaning in a different step. -The best place for a cleaning operation is within its own step. However, with Tableau Prep Builder, you can also choose to perform cleaning operations in most step types in your flow (such as joins and unions) when doing so makes the most sense.
distribution of field values
can help you identify if your data seems correct and complete. -shape of the distribution should make sense based on your interpretation of the data -For instance, if you are viewing standardized test scores, you would not expect to see negative values or a majority of observations in the lowest bins of scores. Rather, you would expect positive values only and for the scores to follow a standard distribution.
Tableau Prep profile pane
contains information about how your data is structured, including the number of fields (or columns) and the number of rows (or records). For each field, there is a profile card where you can see: -number of unique values in the data set. -The field's data type -The field's data role, if there is one, such as a URL, email address, or zip code or postal code, which can help you find invalid entries in your data set.
Detail view
each unique field value is listed.
The Tableau Prep sort options (Bars icon)
enable you to compare values in the following ways: -By count (or frequency) in descending or ascending order -By domain: the individual values sequentially in alphabetical, numerical, or chronological descending or ascending order
Common join issues
following will result in Tableau Prep Builder reading the values as a mismatch and excluding them from the join: -Inconsistent capitalization, punctuation, and use of abbreviations -Misspellings or typos -Extra spaces or separators
Automatic Split and Custom Split
functions work by separating string field values based on a delimiter or repeated pattern of values. -click the More options menu on a field to open the cleaning menu and select Split Values. -To automatically split a field, simply select Automatic Split from the Split Values menu. Tableau Prep Builder will instantly perform the split at a logical location. You do not have the option to decide where the split will be. -To have more control over where the split takes place, select Custom Split, and use the Custom Split editor to configure your split. -Split and Custom Split only work on String fields. If you want to split a Date or a Date & Time field, change the field's data type to a String.
Granularity
level of granularity describes what a row represents, or the level of detail in the data.
Tableau Prep search options (Mag glass incon)
look for values or parts of values in different parts of a field including: Contains Starts with Ends with Exact match Does not contain
Join clauses pane
pane allows you to see the values for each field in the join clause. Mismatched join clause values are displayed in red text. -To see only the mismatched values, select Show only mismatched values at the top of the pane. -
Performing quick cleaning string operations
select Clean to access the options: Make Uppercase and Make Lowercase can be implemented as quick and easy ways to clean your data. Numbers, spaces, and punctuation are not affected by the Make Uppercase and Make Lowercase functions.
Fix mismatched fields after a union in a flow
several ways to fix mismatched fields in a Union step: -Rename a field in the profile card. -Merge fields in the union pane summary. -Merge fields in the profile pane. -To make it easier to view and fix mismatched fields, check Show only mismatched fields in the union pane. Tableau Prep Builder then displays only the mismatched fields in the profile pane and data grid. -Rename a field in the profile card if the fields are the same but have different names -You can also merge mismatched fields in the Mismatched Fields section of the union pane summary.
Variance
standard deviation squared, a measure of dispersion that takes into account the spread of all data points in a data set.
Nulls
value is unknown or invalid.: A null is not the same as a value of 0 (zero), nor is it the same as an empty string.
Outliers
values that are unusual compared to other values in the data for that field, for example, a negative number in a field primarily made up of positive numbers.
Summary and Detail
view to look for unexpected values
Create the union *using files as the data sources
1.) Connect to the file in Tableau Prep Builder. An input step for the table is added to the flow pane. The data grid indicates that the field names are not recognized and that there are many null values. Use the data interpreter. Now the field names and values look as expected. 2.)Select Wildcard union in the Input pane. 3.) By default, Tableau Prep Builder plans to search for additional files in the same location as the first file. You could use the Search in field to browse to other locations. 4.) Choose the folders and files you wish to include in the search. - If you wanted to include files in the sub-folder of the parent directory, you would select Include subfolders. - Decide whether to include or exclude the files that match your search criteria. Selecting Exclude means that files that match the wildcard pattern will be excluded in the search, and that all files that do not match the wildcard pattern will be included instead. 5.) Now enter a wildcard search pattern to find files that have the specified characters in the file name, followed by the asterisk (*) wildcard character. If you wanted to return all files within the directory, you would leave this field blank. 6.) Choose whether to include sheets within the files you've included in the search. Selecting Include will include sheets matching the wildcard pattern in the search. Selecting Exclude means sheets that match the wildcard pattern will be excluded in the search, and that sheets that do not match the wildcard pattern will be included instead. 7.) Press Enter and review the results of the wildcard search. 8.) Apply the union. The input step icon and the Apply button indicate that the union has been applied. *If the data changes while you are working in a flow, you can update the wildcard union by right-clicking the input step with the wildcard union configured, and then clicking Refresh.
Using the Pronunciation method
1.) From the More options menu, select Group and Replace, and then select Pronunciation to open the Group and Replace-by Pronunciation dialog box. Recommended groups are represented by the paperclip next to the value. 2.) Select each group to view how the algorithm has categorized the group. Remove the check mark next to any group member to remove it, or select any unchecked member to add it. 3.) Algorithm results should always be reviewed and the resulting groups adjusted manually. They are a starting point for data cleaning, and rarely the end result needed to replace values.
Filter your data to a specific range of values and dates
1.) Open the More options (...) menu on the profile card where you want to apply the filter, select the Filter menu item, and then choose Range of Values or Range of Dates. 2.) Use the filter dialog box to specify the range.
Editing in the profile pane
1.) Right-click (or double-click) the value you would like to change and select Edit Value. 2.) Type the value you would like to change it to and press Enter. 3.) A paperclip annotation indicating the value has a group and replace operation performed appears in the following locations: - Paper clip icon is what details the changes - Next to the value in the profile card - In the header of the profile card - Above the step in the profile pane - Next to the corresponding change in the Changes pane
Review the results of the union
1.) To review the results of the union, add a cleaning step. 2.) Toggle to list view to see the field names more easily. Mismatched fields must be merged in a cleaning step. They cannot be merged in the input step where the wildcard union is applied.
Merge mismatched fields
1.Reorder the profile cards to make it easier to compare the values. In profile pane view, press the Control key and then select the profile card headers. Drag the card headers to the left side of the profile pane and release when you see a black line 2.) Now that you've confirmed that the fields share common values, you are ready to merge them. You first try to merge the fields by renaming one of the field names to match one of the others, a method you used to merge mismatched fields after unioning tables in a union step. However, using this method after a wildcard union will result in an error, regardless of the view where you attempt the change. The error dialog box states that a field with the same name already exists. 3.) You now need to explore other options for merging mismatched fields in the cleaning step after a wildcard union.
Which of the following is the first step of the workflow to fix mismatched fields after a wildcard union?
Add a cleaning step to evaluate the union results
Results of merging
After merging the mismatched fields using any of the methods described above, you see only one field. An annotation icon above the field name indicates that it is a merged field.
Keep using created views
After you have performed more data preparation operations in Tableau Prep Builder, preview your flow again in Tableau Desktop to create a new extract. Close the new blank Tableau Desktop that opens as a result of running the flow. In the open visualization you have already created, click the New Data Source button on the toolbar. Navigate to the Datasources folder of your Tableau Prep Builder repository and open the .hyper extract you just created. Go back to the worksheet and replace the data source. If you have visualizations that reference fields that have changed, those fields will be shown in red, and you will need to update the field references.
Aggregation
Aggregation refers to how multiple values in a single column of data are combined together. There are different mathematical methods to aggregate data, such as counting the values, determining the median (middle) value, or calculating the sum (total), etc.
Validate a cleaning operation
As you perform cleaning operations on your data, you can validate their success in the data grid. Select a change in the Changes pane to see its corresponding column in the data grid.