MASTER- Salesforce Data Architecture and Management Designer
True or False: The action you select for your Duplicate Rule applies to both Create and Edit actions.
False. You can choose to take a different action for Creates versus Edits.
How do you enable Duplicate Management for Person Accounts?
First enable Person Accounts, then define and activate the Matching and Duplicate Rules for Person Accounts.
Will Duplicate Rules run for all users or all records?
It depends on whether or not optional "Conditions" have been defined for the Duplicate Rule.
Describe Parallel Mode within the Bulk API.
It is enabled by default. It allows for faster loading of data by processing batches in parallel.
What is Multitenancy?
Multitenancy is a means of providing a single application to multiple organizations from a single hardware/software stack.
Duplicate Management uses Data.com technology. Because of this, is a Data.com license required?
No.
What is PK Chunking?
PK Chunking (or Primary Key Chunking) is a strategy for querying large data sets. PK Chunking is a feature of the Bulk API that splits a query into chunks of records with sequential record Ids (i.e. the Primary Keys). Ids are always indexed, so this is an efficient method for querying large data sets.
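The idea of splitting a query along sequential primary keys can be sketched in a few lines of Python. This is only an illustration: the function names `pk_chunks` and `chunk_filter` are hypothetical, integers stand in for real Salesforce record Ids, and the Bulk API computes the actual chunk boundaries on the server.

```python
def pk_chunks(sorted_ids, chunk_size=100_000):
    """Split an ordered list of primary keys into (first_id, last_id) pairs.

    Each pair marks the boundaries of one chunk of sequential keys.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    boundaries = []
    for start in range(0, len(sorted_ids), chunk_size):
        chunk = sorted_ids[start:start + chunk_size]
        boundaries.append((chunk[0], chunk[-1]))
    return boundaries


def chunk_filter(first_id, last_id):
    """Render the Id-range condition a single chunk's query would carry."""
    return f"Id >= '{first_id}' AND Id <= '{last_id}'"


# Ten stand-in keys split into chunks of four.
for first_id, last_id in pk_chunks(list(range(1, 11)), chunk_size=4):
    print(chunk_filter(first_id, last_id))
```

Because every chunk filters on an Id range, each sub-query can be driven by the Id index rather than a full scan.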
This is enabled for the Bulk API by default.
Parallel Mode
What are the Data Quality Analysis Dashboards?
Provided for free by Salesforce Labs via the AppExchange, Data Quality Analysis Dashboards leverage custom formula fields on many standard objects to record data quality and completeness. The formulas are then depicted via the dashboards to identify deficiencies in the data.
True or False: You can compare matches across objects (ex. Compare Contacts to Leads).
True
True or False: Picklist fields are available on Skinny Tables.
True.
True or False: Skinny Tables cannot contain fields from other objects.
True.
True or False: Text Area (Long) fields are available on Skinny Tables.
True.
When defining a Duplicate Rule, what options are available with respect to Record-Level Security?
1.) Enforce Sharing Rules 2.) Bypass Sharing Rules
For the Query Plan Tool, what are four Leading Operation Types?
1.) Index 2.) Sharing 3.) Table Scan 4.) Other (optimizations internal to Salesforce)
How many Duplicate Rules can exist per object?
5
What is a compound WHERE clause condition?
A WHERE clause that combines two or more filter conditions, e.g. WHERE x AND y
What actions are available for Duplicate Rules?
1.) Allow 2.) Block
When the Allow action is selected, you can choose to also enable...
1.) An Alert 2.) Reporting (which includes the record created/edited and all its potential duplicates).
What operations are likely to cause lock contention, and as a result, require data loads to be run in Serial Mode via the Bulk API?
1.) Creating New Users 2.) Updating User Roles 3.) Updating Territories 4.) Changing ownership for records with a Private sharing model
If a trigger causes a change on an object that the current user does not have access to edit...
... the change is not tracked because field history honors the permissions of the current user.
List several key attributes with respect to determining Data Quality.
1.) *Age* (when were the records last updated?) 2.) *Completeness* (i.e. make a list of the fields that are required for each business use, then run a report that shows the percentage of blanks for these fields). 3.) *Accuracy* (AppExchange apps for data quality can be used to help determine accuracy against a trusted source). 4.) *Consistency* (reports can show the variations for each true value for a given field). 5.) *Duplication* 6.) *Usage*
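The completeness check (percentage of blanks per required field) could be approximated outside Salesforce on an exported data set. The sketch below assumes records are plain Python dictionaries; the function name `blank_percentages` and the sample field names are hypothetical.

```python
def blank_percentages(records, required_fields):
    """Return {field: percent of records where the field is missing or blank}."""
    if not records:
        return {field: 0.0 for field in required_fields}
    result = {}
    for field in required_fields:
        blanks = sum(
            1 for record in records
            if record.get(field) is None or str(record.get(field)).strip() == ""
        )
        result[field] = 100.0 * blanks / len(records)
    return result


# Hypothetical exported Contact rows.
contacts = [
    {"Email": "a@example.com", "Phone": ""},
    {"Email": None, "Phone": "555-0100"},
    {"Email": "b@example.com"},
    {"Email": " ", "Phone": "555-0101"},
]
print(blank_percentages(contacts, ["Email", "Phone"]))
```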
List some differences with the Force.com Query Optimizer compared to traditional relational database optimizers.
1.) *Multitenant Statistics* - Because Salesforce is a multitenant environment, Salesforce keeps tenant-specific statistics to provide insight into each tenant's data distribution. 2.) *Composite Index Joins* - The Force.com Query Optimizer considers the selectivity of the single-column indexes alone, as well as the intersected selectivity that results from joining two single-column indexes. 3.) *Sharing Filters* - The Force.com Query Optimizer considers the selectivity of sharing filters alongside traditional filters (i.e. the WHERE clauses) to determine the lowest cost plan for query execution.
What are some other best practices for optimizing your data load performance?
1.) Introduce bypass logic for triggers, validation rules, workflow rules (but not at the cost of data integrity) 2.) Defer Sharing Calculations 3.) Minimize the number of fields loaded for each record. Foreign key, lookup relationships, and roll up summary fields are likely to increase processing times. 4.) Minimize the number of triggers where possible. Also, where possible, convert complex trigger code to Batch Apex that processes asynchronously after data is loaded
List some standards that are typically included in a Data Management Plan?
1.) Naming 2.) Formatting 3.) Workflow 4.) Quality 5.) Roles and Ownership 6.) Security and Permissions 7.) Monitoring
What are some exceptions to Index Selectivity that will result in an efficient index not being used (hint: non-optimized operators)?
1.) Negative filter operators (i.e. !=, NOT LIKE, EXCLUDES) 2.) Comparison operators paired with text fields 3.) Leading % wildcards 4.) References to non-deterministic formula fields (i.e. cross-object formula fields)
For which objects can Duplicate Management be established?
1.) Person Accounts 2.) Business Accounts 3.) Contacts 4.) Leads 5.) Custom Objects
For what fields does Salesforce automatically maintain indexes?
1.) RecordTypeId 2.) SystemModStamp 3.) CreatedDate 4.) Id 5.) Division 6.) Name 7.) Foreign Keys (Lookups and Master-Detail fields) 8.) Email (Leads and Contacts)
What are some techniques for dealing with problems related to Lookup Skew?
1.) Reduce record save time (i.e. increase save performance, optimize trigger/class code, reduce workflow, consider asynchronous operations, etc.) 2.) Consider a Picklist field instead of a Lookup field 3.) Distribute the skew 4.) Reduce the load (i.e. from automated processes and integrations running concurrently)
What features of Salesforce can be used to enforce Data Quality?
1.) Required fields 2.) Validation rules 3.) Workflow rules 4.) Page layouts 5.) Simple dashboards 6.) Data enrichment tools (Data.com) 7.) Duplicate Management 8.) Custom field types 9.) State and Country Picklists
In what order does Salesforce perform indexed searches?
1.) Searches the indexes for appropriate records 2.) Narrows down the results by access permissions, search limits, and other filters, creating a result set 3.) Once a result set reaches a predetermined size, all other records are discarded 4.) Finally, the result set is used to query the records from the database to retrieve the fields that a user sees
What data types can be External Ids?
1.) Text 2.) Email 3.) Auto-Number 4.) Number
Which data types cannot be indexed?
1.) Text Area (Long) 2.) Text Area (Rich) 3.) Multi-Select Picklist 4.) Non-Deterministic Formulas 5.) Encrypted Text
What are the main areas of the application that are impacted by differing (or suboptimal) architectures in implementations with large data volumes?
1.) The loading or updating of large numbers of records, either directly or with integrations. 2.) Extracting records using reports, list views, or queries.
What are two side-effects of the way that Salesforce stores customer application data?
1.) Traditional performance-tuning techniques will show little to no results. 2.) You cannot optimize the underlying SQL of the application because it is generated by the system, and not written by each tenant.
What does the Bulk API do when it encounters locks?
1.) Waits a few seconds for the lock to be released. 2.) If the lock is not released, the record is marked as failed. 3.) If there are problems acquiring locks for more than 100 records in the batch, the remainder of the batch is put back in the queue and will be tried again later. 4.) When a batch is reprocessed, records that are marked as failed will not be retried. Resubmit these in a separate batch to have them processed. 5.) The batch will be tried again up to 10 times before the batch is marked as failed. 6.) As some records may have succeeded, you should check the results of the data load to confirm success/error details.
When might Duplicate Rules NOT run?
1.) When records are created via the Quick Create 2.) On Lead Convert in an org where the "Use Apex Lead Convert" is disabled 3.) Records are restored using the Undelete button 4.) Records are added via Lightning Sync 5.) Records are manually merged 6.) Records are created via the Community Self-Registration 7.) A Self-Service user creates a record and the Duplicate Rule contains conditions based on the User object 8.) Duplicate Rule Conditions are set for lookup fields and records with no value for these fields are saved.
With respect to data loads, any batch job that takes longer than this amount of time is suspended and returned to the queue for later processing.
10 minutes.
What is the default size for a PK Chunk?
100,000 records
How many columns can a Skinny Table contain?
100.
How long might it take before the text in a searchable object's created or updated record is searchable?
15 minutes or more
For how long is Field History data retained?
18 months.
When loading data via batches, if more than N unprocessed requests/batches from a single organization are in the queue, additional batches from that organization will be delayed while batches from other organizations are processed.
2,000
What is the maximum size for a PK Chunk?
250,000 records
Duplicate Rules can be associated to N Matching Rules.
3, and they must be of different objects.
With Duplicate Management, what is a Matching Rule?
A rule that determines how duplicates are identified.
With Duplicate Management, what is a Duplicate Rule?
A rule that determines the behavior that occurs when a record being saved has been identified as a possible duplicate.
For what objects are Skinny Tables available?
Account, Contact, Opportunity, Lead, Case, and Custom Objects.
Standard duplicate rules for these objects are set up and activated by default.
Accounts, Contacts, and Leads
With respect to data loads, how can you optimize batch sizes?
All batches should run in under 10 minutes. Start with 5000 records per batch and adjust accordingly based on the processing time. If processing time is more than 5 minutes, reduce the batch size. If it takes only a few seconds, increase the batch size. And so on. If you get a timeout error, split your batches into smaller batches.
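The tuning loop described above can be sketched as a small helper. The halving and doubling steps and the 10-second cut-off for "a few seconds" are assumptions chosen for illustration; the 5-minute threshold comes from the card, and 10,000 records is the Bulk API's per-batch ceiling. The name `next_batch_size` is hypothetical.

```python
MAX_BATCH_SIZE = 10_000  # Bulk API limit on records per batch


def next_batch_size(current_size, elapsed_seconds,
                    slow_seconds=300, fast_seconds=10):
    """Suggest the next batch size from the last batch's processing time.

    slow_seconds: above this (5 minutes), shrink the batch.
    fast_seconds: below this (assumed 10 seconds), grow the batch.
    """
    if elapsed_seconds > slow_seconds:
        return max(1, current_size // 2)  # assumed step: halve
    if elapsed_seconds < fast_seconds:
        return min(MAX_BATCH_SIZE, current_size * 2)  # assumed step: double
    return current_size


# Start at 5,000 records and adjust after a slow (400-second) batch.
print(next_batch_size(5000, elapsed_seconds=400))
```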
When the Block action is selected, this is enabled by default.
An Alert.
What is the first step to improving data quality?
Develop a Data Management Plan, which typically includes standards for creating, processing, and maintaining data.
Why would Skinny Tables be needed?
Behind the scenes, for each object, Salesforce maintains separate tables for standard fields and custom fields. Normally, when a query or report contains both types of fields, a join would be needed between these two behind-the-scenes tables. A Skinny Table, which could contain standard and custom fields for an object, would eliminate the need for those joins.
How can you organize data load batches to avoid risks of lock contention?
By organizing the data by parent Id. Suppose that you are inserting AccountTeamMember records and you have references to the same Account Id within multiple batches. You risk lock timeouts as these multiple batches process (for example, in parallel) and attempt to lock the Account record at once. To avoid these lock contentions, organize your data by Account Id such that all AccountTeamMember records referencing the same Account Id are in the same batch.
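One way to prepare such batches before submitting them is sketched below, assuming the child rows are Python dictionaries carrying an `AccountId` key. The function name `batches_by_parent` and the default batch size are hypothetical.

```python
from collections import defaultdict


def batches_by_parent(records, parent_field="AccountId", max_batch_size=5000):
    """Pack records into batches so each parent Id appears in only one batch."""
    groups = defaultdict(list)
    for record in records:
        groups[record[parent_field]].append(record)

    batches, current = [], []
    for parent_id in sorted(groups):
        group = groups[parent_id]
        # Close the current batch if this whole group would not fit in it.
        if current and len(current) + len(group) > max_batch_size:
            batches.append(current)
            current = []
        # A group larger than max_batch_size is kept together anyway,
        # so that its parent is still locked by a single batch.
        current.extend(group)
    if current:
        batches.append(current)
    return batches


members = [{"AccountId": a, "UserId": i}
           for i, a in enumerate(["A", "B", "A", "C", "B"])]
for batch in batches_by_parent(members, max_batch_size=2):
    print([m["AccountId"] for m in batch])
```

Keeping each parent's children together means that parallel batches never compete for a lock on the same Account record.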
How do you create a Skinny Table for an object?
Contact Salesforce Support.
What is a best practice for querying a supported object's share table using PK Chunking?
Determining the chunks is more efficient in this case if the boundaries are defined on the parent object record Ids rather than the share table record Ids. So, for example, the following header could be used for a Bulk API query against the OpportunityShare object table using PK Chunking: Sforce-Enable-PKChunking: chunkSize=150000; parent=Opportunity
What is Index skew?
Similar to lookup skew: it occurs when a large number of records share the same value in an indexed field.
What must happen when you match across objects?
Establish field mapping between the two objects.
What custom field type is automatically indexed when created?
External Id
True or False: Formula fields are available on Skinny Tables.
False.
True or False: Lookup fields are available on Skinny Tables.
False.
True or False: Tracked fields are automatically translated.
False.
True or False: Deleted records cannot impact query performance.
False. Add isDeleted = False to your queries, or empty your recycle bin!
How does Salesforce store the application data for each organization?
In a few large database tables that are partitioned by tenant and serve as heap storage. The platform's runtime engine then materializes virtual tables based on the customer's metadata.
What is a symptom of Index Skew?
Index row lock (when two updates occur at the same time and the index, which needs to be rebuilt, is large).
What does Salesforce do when providing its CRM to a new customer?
Instead of providing a complete set of hardware/software resources to each organization, Salesforce inserts a layer of software between the single instance and each organization's deployment. This layer is invisible to the organizations, which only see their own data and schemas while Salesforce reorganizes the data behind the scenes to perform efficient operations.
What does the Force.com Query Optimizer do when a query has a compound WHERE clause?
It considers the selectivity of the single-column indexes alone, as well as the intersected selectivity that results from joining two single-column indexes.
What is the most efficient chunk size (recommended) for an organization?
It depends on a number of factors, such as the data volume and the filters of the query. Customers may need to experiment with different chunk sizes to determine what is optimal for their implementation. 100,000 records is the default size, and chunks can reach 250,000 records, but more records per chunk means less efficient performance.
With respect to Lookup Skew, what is an alternative to having a "catch-all" lookup value?
Leave the value blank, which will reduce/eliminate the skew.
Why is Lookup Skew bad?
Lookups are foreign key relationships between objects. When a record is inserted or updated, Salesforce locks the target records in each lookup field to ensure that data integrity is maintained. Locks can occur when you try to insert or update records in an LDV environment where lookup skew exists.
The Force.com Query Optimizer will use an index on a custom field if the filter:
Matches less than 10% of the first million records and less than 5% of the remaining records, up to a maximum of 333,333 records.
The Force.com Query Optimizer will use an index on a standard field if the filter:
Matches less than 30% of the first million records and less than 15% of the remaining records, up to a maximum of 1 million records.
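The threshold arithmetic can be worked through with a short helper. The standard-field tiers (30%, 15%, cap of 1 million) and the custom-field 10% tier and 333,333 cap come from the cards; the 5% custom-field tier for records beyond the first million is an assumption based on Salesforce's published thresholds. All function names are hypothetical.

```python
def selectivity_threshold(total_records, first_pct, rest_pct, cap):
    """Maximum number of matching records for an index to be considered."""
    first_million = min(total_records, 1_000_000)
    remainder = max(total_records - 1_000_000, 0)
    threshold = (first_million * first_pct + remainder * rest_pct) // 100
    return min(threshold, cap)


def standard_index_threshold(total_records):
    # 30% of the first million, 15% of the rest, capped at 1 million.
    return selectivity_threshold(total_records, 30, 15, 1_000_000)


def custom_index_threshold(total_records):
    # ASSUMPTION: 5% applies to records beyond the first million.
    return selectivity_threshold(total_records, 10, 5, 333_333)


def is_selective(matching_records, threshold):
    """A filter qualifies when it matches fewer records than the threshold."""
    return matching_records < threshold


# For an object with 2,000,000 records:
print(standard_index_threshold(2_000_000))  # 300,000 + 150,000 = 450000
print(custom_index_threshold(2_000_000))    # 100,000 + 50,000 = 150000
```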
How does Salesforce ensure that tenant-specific customizations do not breach the security of other tenants or affect their performance?
Salesforce uses a runtime engine that generates application components for each organization using the customer's metadata.
What is the format of the header to include in the Bulk API to enable PK Chunking?
Sforce-Enable-PKChunking
When using PK Chunking, how would you specify the chunk size in the header?
Sforce-Enable-PKChunking: chunkSize=100000;
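A client script might assemble this header as follows. This is a sketch, not part of any Salesforce SDK: the helper name `pk_chunking_header` is hypothetical, the 250,000 ceiling is the maximum chunk size noted elsewhere in this deck, and the plain `TRUE` value for default settings is assumed from the Bulk API documentation.

```python
MAX_CHUNK_SIZE = 250_000  # maximum PK chunk size


def pk_chunking_header(chunk_size=None, parent=None):
    """Build the Sforce-Enable-PKChunking request header as a dict."""
    if chunk_size is None and parent is None:
        # ASSUMPTION: "TRUE" enables PK Chunking with default settings.
        return {"Sforce-Enable-PKChunking": "TRUE"}
    parts = []
    if chunk_size is not None:
        if not 1 <= chunk_size <= MAX_CHUNK_SIZE:
            raise ValueError(f"chunk_size must be between 1 and {MAX_CHUNK_SIZE}")
        parts.append(f"chunkSize={chunk_size}")
    if parent is not None:
        # Used when querying a share table, e.g. parent=Opportunity.
        parts.append(f"parent={parent}")
    return {"Sforce-Enable-PKChunking": "; ".join(parts)}


print(pk_chunking_header(chunk_size=150_000, parent="Opportunity"))
```

The returned dictionary can be merged into the headers of the request that creates the Bulk API query job.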
Describe considerations with respect to Skinny Tables and Sandboxes.
Skinny Tables are copied to Full Copy Sandboxes, but not to other Sandboxes. If needed in other Sandboxes, contact Salesforce Support.
What are Skinny Tables?
Skinny tables are tables created by Salesforce that contain frequently used fields in order to avoid joins and increase performance when running reports and queries.
What should you do if you change a little less than 25% of your data set for a LDV object and you notice slower query or report performance?
Submit a case to Salesforce Premier Support to see if a manual statistics recalculation for select objects in your org can return operation to peak performance.
What should you always do to prepare for a data load?
Test in a Sandbox environment first.
What is the Data.com Assessment App?
The Data.com Assessment App helps customers understand the overall health of their data. The app can be used to analyze Account, Contact, and Lead records in order to gain insights on data completeness and quality.
What is the Force.com Query Optimizer?
The Force.com Query Optimizer works behind the scenes to determine the best path to the data being requested based on the filters in the query. It will determine the best index from which to drive the query, the best table from which to drive the query if no good index is available, and more.
What is the Query Plan Tool, and where is it located?
The Query Plan Tool is a tool to help optimize and speed up queries over large volumes. The Query Plan Tool can be found/enabled within the Developer Console.
Describe indexes and tables.
The Salesforce architecture makes the underlying data tables for custom fields unsuitable for indexing. Therefore, Salesforce creates an Index Table that contains a copy of the data, along with information about the data types. By default, Index Tables do not include records that are null (with empty values), however you can work with Salesforce to include these if needed.
For the Query Plan Tool, what is the Cardinality?
The approximate # of records returned by the plan.
For the Query Plan Tool, what is the Cost?
The cost of the query compared to the Force.com Query Optimizer's selectivity threshold.
For the Query Plan Tool, what is sObject Cardinality?
The estimated total size/volume/rows of the sObject table
If an index is not available for a field in a filter condition...
The only alternative is to scan the entire table/object, even when the filter condition uses an optimizable operator with a selective value.
For the Query Plan Tool, what is the Leading Operation Type?
The primary operation type that Salesforce will use to optimize the query.
For the Query Plan Tool, when the Cost is above 1, it means that...
The query will not be selective.
For the Query Plan Tool, what is sObject Type?
The sObject (i.e. Account)
With respect to Field History Tracking, what happens to fields with more than 255 characters?
Their changes are tracked as edited, but the old/new values are not recorded.
What are the trade-offs with respect to Parallel Mode?
There is risk of lock contention. Serial mode is an alternative to Parallel mode in order to avoid lock contentions.
What is better? Performing Upserts, or performing Inserts followed by Updates?
Upserts are more costly than performing Inserts and Updates separately. Avoid Upserts when it comes to large data volumes.
When loading data with parent references into Salesforce, what is more efficient? Using an External Id or a Salesforce Id?
Using an External Id has additional overhead in that it performs a kind of "lookup" to find the record, whereas this additional overhead does not occur (or is bypassed) when using the native Salesforce Id.
What is Lookup Skew?
When a very large number of records point to the same record in the lookup object.
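Skew of this kind can be spotted on an exported data set by counting how many child records point at each lookup value. The sketch below assumes records are Python dictionaries; the function name `skewed_lookup_values`, the field name `Region__c`, and the 10,000-record default threshold are illustrative assumptions rather than platform limits.

```python
from collections import Counter


def skewed_lookup_values(records, lookup_field, threshold=10_000):
    """Return {lookup value: child count} for values above the threshold.

    Records with a blank lookup are ignored, since they reference no parent.
    """
    counts = Counter(
        record[lookup_field] for record in records
        if record.get(lookup_field) is not None
    )
    return {value: n for value, n in counts.items() if n > threshold}


# Tiny stand-in data set with a low threshold for demonstration.
rows = ([{"Region__c": "R1"}] * 5 + [{"Region__c": "R2"}] * 2
        + [{"Region__c": None}] * 4)
print(skewed_lookup_values(rows, "Region__c", threshold=3))
```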
When should you use Serial Mode versus Parallel Mode?
When there is risk of lock contention and you cannot reorganize the batches to avoid these locks.
When should you use a Picklist field instead of a Lookup field?
When you have a relatively low number of values.
When would you use PK Chunking?
When you need to query or extract 10s or 100s of millions of records, for example, when you need to initially query an entire data set to set up a replicated database, or if you need to query a set of data as part of an archival strategy where the record count could be in the millions.
When should you use Parallel Mode versus Serial Mode?
Whenever possible, as it is a best practice.
The platform automatically recalculates optimizer statistics in the background when...
Your data set changes by 25% or more.