dbt Certification Practice Exam Questions


B: check_cols

A company wants to use the check strategy to detect changes in their product inventory. Which configuration must be specified to use the check strategy in dbt? A: updated_at B: check_cols C: unique_key D: None of the above
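
For reference, a minimal snapshot using the check strategy might look like the sketch below; the snapshot, source, and column names are hypothetical, and check_cols (together with unique_key) is the configuration the check strategy requires.

```sql
-- snapshots/inventory_snapshot.sql (illustrative names; a sketch of the check strategy)
{% snapshot inventory_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='product_id',
        strategy='check',
        check_cols=['quantity', 'warehouse_location', 'unit_price']
    )
}}

select * from {{ source('inventory', 'products') }}

{% endsnapshot %}
```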

C: source_status:stale Explanation: To analyze data sources that have become stale compared to their previous state, you should use the 'source_status:stale' selector in your dbt command. This selector will enable the data analyst to focus on the data sources that have become less fresh compared to their previous state.

A data analyst in your team wants to analyze data sources that have become stale compared to their previous state. Which selector should you use in your dbt command to meet this requirement? A: source_status:fresher B: source_status:fresher+ C: source_status:stale D: source_status:stale+

A: The timestamp strategy Explanation: The timestamp strategy provides better handling of changes to the source data, including additions and deletions of columns, compared to the check_cols strategy. Using the timestamp strategy is considered a best practice for snapshot configurations.

A data analyst wants to create a snapshot configuration to track changes in a customer table that contains sensitive information. They want to ensure that the snapshot configuration provides the best handling of changes to the source data, including the addition and deletion of columns. Which strategy should they use when working with snapshot tables in dbt? A: The timestamp strategy B: The check_cols strategy C: The unique key strategy D: The hard delete strategy
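
A comparable sketch using the timestamp strategy, assuming the source table carries a reliable updated_at column (all names here are hypothetical):

```sql
-- snapshots/customers_snapshot.sql (illustrative names; a sketch of the timestamp strategy)
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

select * from {{ source('crm', 'customers') }}

{% endsnapshot %}
```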

B: Snapshot the data in its raw form and use downstream models to clean up the data.

A data analyst wants to use snapshots in their dbt project to track changes in a dataset. What is the best practice for snapshotting the data in dbt? A: Snapshot the data after it has been cleaned and transformed by downstream models. B: Snapshot the data in its raw form and use downstream models to clean up the data. C: Snapshot the data after it has been cleaned and transformed, but before it has been joined with other tables. D: C and A

C: The unique_key is not truly unique in the data.

A data engineer is working on an incremental model in dbt using the merge strategy. However, the incremental model run is failing with a unique key violation error. What could be the reason for this error? A: The unique_key is not defined in the model. B: The unique_key is not used in the merge strategy. C: The unique_key is not truly unique in the data. D: The merge strategy is not appropriate for the data.
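
A minimal sketch of an incremental model using the merge strategy with a unique_key is shown below; the model, source, and column names are hypothetical, and the unique_key column must genuinely be unique for the merge to behave as expected.

```sql
-- models/fct_orders.sql (illustrative; assumes an updated_at column on the upstream model)
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='order_id'
    )
}}

select *
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- only process rows newer than what is already in the target table
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```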

C: Use a dbt seed to load the department data into the database Explanation: To load a CSV file with reference data into a database in a dbt project, the correct approach is to use a dbt seed. This allows the data team to load the department data into the database and use it in their transformation logic.

A data team is setting up a new dbt project to transform their company's employee data. They have a CSV file with a list of department codes and descriptions that they want to load into their database as reference data. Which of the following is the correct approach for this scenario? A: Use a dbt snapshot to load the department data into the database B: Write a custom Python script to load the department data into the database C: Use a dbt seed to load the department data into the database D: Write a set of SQL queries to insert the department data into the database
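
As a sketch, the department reference data would live as a CSV in the seeds directory and could optionally be configured in dbt_project.yml; the project, file, and column names below are hypothetical.

```yaml
# seeds/department_codes.csv would contain rows such as:
#   department_code,department_name
#   ENG,Engineering
#   HR,Human Resources
#
# dbt_project.yml (optional seed configuration; names are illustrative)
seeds:
  my_project:
    department_codes:
      +column_types:
        department_code: varchar(10)
```

Running dbt seed then loads the CSV into the warehouse, after which models can reference it with {{ ref('department_codes') }}.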

C: Use a surrogate key to condense many columns into a single column

A data team wants to reduce the number of columns that need to be checked in their database when using the check strategy in dbt. What is the recommended technique they can use to achieve this? A: Include all columns in the check_cols configuration B: Exclude columns from the check_cols configuration C: Use a surrogate key to condense many columns into a single column D: None of the above

B: dbt snapshots can be used to keep a historical record of changes to patient data, including medical history and medication information. By using snapshots, the healthcare company can implement type-2 Slowly Changing Dimensions (SCDs) to track changes to rows in mutable source tables over time, making it easier to track the evolution of patient data over time.

A healthcare company wants to keep a historical record of changes to their patient data, such as changes to their medical history and medication information. Which feature of dbt is best suited for this scenario? A: dbt historical cannot be used for tracking changes in patient data as it violates HIPAA regulations. B: dbt snapshots can be used to keep a historical record of changes to patient data, including medical history and medication information. C: dbt data lineage can only be used for tracking changes to dimension tables in data warehousing and cannot be used in healthcare settings. D: dbt incremental can be used for tracking changes to patient data but only if the patient has given their explicit consent.

B: Run the snapshots as a different user or role in the warehouse

A team is working on a dbt project where they are using snapshots to track changes to their product inventory. They want to prevent accidental deletion of their snapshots. What is the recommended approach for setting privileges on the snapshot tables? A: Run the snapshots as the same user or role as other models in the warehouse B: Run the snapshots as a different user or role in the warehouse C: Don't set any privileges on the snapshots D: Give all users full access to the snapshots

A: The unique_key is not set up correctly in the configuration.

Alex is a data engineer working on a dbt project that uses the merge strategy with a defined unique_key. He notices that rows matching the unique_key are not being replaced with new data as expected. Which of the following reasons could explain this issue? A: The unique_key is not set up correctly in the configuration. B: The merge strategy is not suitable for handling unique keys. C: The unique_key must be combined with another strategy for it to work correctly. D: The dbt_project.yml file is missing or improperly configured.

D: Both sets of hooks are executed, and the SQL statements defined in both will be executed in the order they were defined.

Alex is working on a dbt project and has defined several hooks for the models in both the dbt_project.yml file and in the config block of a specific model. He wants to know the order in which the hooks will be executed. How does dbt execute hooks defined in both the dbt_project.yml file and in the config block of a specific model? A: Only the hooks defined in the config block of a specific model are executed. B: Only the hooks defined in the dbt_project.yml file are executed. C: Hooks are executed randomly, regardless of where they are defined. D: Both sets of hooks are executed, and the SQL statements defined in both will be executed in the order they were defined.
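
As a sketch, a post-hook can be defined both at the project level and in a model's config block; the SQL from both will run after the model builds. The grant statements and names below are hypothetical.

```yaml
# dbt_project.yml (project-level hook; names are illustrative)
models:
  my_project:
    +post-hook: "grant select on {{ this }} to role reporter"
```

```sql
-- models/orders.sql (model-level hook defined in the config block)
{{ config(
    post_hook="grant select on {{ this }} to role analyst"
) }}

select * from {{ ref('stg_orders') }}
```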

B: Web-based UI, hosted environment, and differentiated features

Anna is considering using dbt for her data transformation project and wants to understand the differences between dbt Core and dbt Cloud. What are some of the benefits of using dbt Cloud over dbt Core? A: Lower cost and more extensive customization options B: Web-based UI, hosted environment, and differentiated features C: Access to a larger open-source community and more extensive documentation D: Higher performance and greater compatibility with various data warehouses

B: The tarball URL and the subfolder name where the package source code is installed. Explanation: To install packages using tarball URLs hosted internally, you need to specify the URL of the tarball and the subfolder name where the package source code is installed within. This information will allow dbt to fetch and install the package from the internal location, ensuring that the correct version and source code are used in your project.

As a data engineer working on a dbt project, you need to install a package from a tarball URL hosted internally within your organization. What information do you need to provide in order to install this package? A: The tarball URL and the package version. B: The tarball URL and the subfolder name where the package source code is installed. C: The tarball URL and the branch name of the package repository. D: The tarball URL and the name of the data warehouse used in the project.
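
A hedged sketch of a packages.yml entry for an internally hosted tarball; the URL and package name are hypothetical, and the name must match the subfolder the package source code is installed within.

```yaml
# packages.yml (URL and name are illustrative)
packages:
  - tarball: "https://artifacts.mycompany.internal/dbt/my_internal_package.tar.gz"
    name: "my_internal_package"  # subfolder where the package source code is installed
```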

B: Using environment variables to load credentials.

As a data engineer, you are concerned about securely storing credentials for your dbt project. What method is considered more secure for storing credentials in dbt? A: Storing credentials directly in the profiles.yml file. B: Using environment variables to load credentials. C: Saving credentials in a separate plain-text file within the project directory. D: Storing credentials in the dbt_project.yml file.
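
For illustration, credentials can be read from environment variables in profiles.yml with the env_var() function; the profile below is a sketch for a Snowflake connection with hypothetical names, and the DBT_ENV_SECRET prefix marks the variables as secrets so dbt scrubs them from logs.

```yaml
# profiles.yml (illustrative values; fields vary by adapter)
my_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: my_account
      user: "{{ env_var('DBT_ENV_SECRET_USER') }}"
      password: "{{ env_var('DBT_ENV_SECRET_PASSWORD') }}"
      database: analytics
      warehouse: transforming
      schema: dbt_dev
      threads: 4
```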

B: By default, dbt expects model files to be located in the models subdirectory of your project. You can change this by updating the model-paths configuration in your dbt_project.yml file.

By default, where does dbt expect model files to be located in a dbt project, and how can you change this? A: By default, dbt expects model files to be located in the project root directory. You cannot change this. B: By default, dbt expects model files to be located in the models subdirectory of your project. You can change this by updating the model-paths configuration in your dbt_project.yml file. C: By default, dbt expects model files to be located in a subdirectory named after the target database. You can change this by updating the model-paths configuration in your dbt_project.yml file. D: By default, dbt expects model files to be located in a subdirectory named after the target schema. You can change this by updating the model-paths configuration in your dbt_project.yml file.

B: No, you do not need to create the target schema before running dbt. dbt will check if the schema exists when it runs, and create it if it does not exist.

Do you need to create a target schema before running dbt? A: Yes, you need to create the target schema before running dbt. B: No, you do not need to create the target schema before running dbt. dbt will check if the schema exists when it runs, and create it if it does not exist. C: You only need to create the target schema if you are running dbt on-premises, but not if you are using dbt Cloud. D: It depends on the version of dbt you are using.

select * from raw.ticket_tailor.issued_tickets

Examine this YAML file and SQL query, what will this SQL query compile to?

C & D: Create 1 job, run source freshness and dbt build sequentially

Given the following models and their dependencies, the requirement is to check freshness before running any model. Choose 2 steps to achieve the requirement: A: Create 2 jobs B: Run source freshness --select source:source_1 and dbt build source:source_1+ sequentially in 1 job. Run dbt build source:source_2+ in another job C: Run source freshness and dbt build sequentially D: Create 1 job

A & B: It allows for easy dependency tracing, It allows for source freshness reporting

How can using dbt sources improve data analysis workflows? Select all answers that apply: A: It allows for easy dependency tracing B: It allows for source freshness reporting C: It allows for better data visualization D: It allows for easier data cleansing

B: Use the --select flag with the source_status selector Explanation: To reference the source freshness results in a subsequent dbt command, you can use the --select flag followed by the source_status selector (e.g., source_status:fresher+). This allows you to apply selectors based on the freshness of the data sources.

How can you reference the source freshness results in a subsequent dbt command? A: Use the --source-status flag B: Use the --select flag with the source_status selector C: Use the --refresh flag D: Use the --state flag with the DBT_ARTIFACT_STATE_PATH environment variable
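
A sketch of the two-step flow, assuming the artifacts from the previous invocation are kept at an illustrative path:

```shell
# 1. Capture freshness results (writes target/sources.json)
dbt source freshness

# 2. Later, select only models whose sources are fresher than in the previous state
dbt build --select source_status:fresher+ --state path/to/previous/artifacts
```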

C: You can specify the schema in the dbt_project.yml file using the schema configuration block, or in the model file using a config block.

How can you specify a schema other than the target schema in your profiles.yml for building models in dbt, and where can you specify this information? A: You can specify the schema in the dbt_project.yml file using the model-paths configuration. B: You can specify the schema in the model file using the ref function. C: You can specify the schema in the dbt_project.yml file using the schema configuration block, or in the model file using a config block. D: You can specify the schema in the model file using the schema keyword.

C: Cast the column to the correct type in your model using SQL syntax.

How can you specify column types in dbt? A: Use the type configuration block in the dbt_project.yml file to specify the column types for each model. B: Use the dtype function to specify column types in the SQL select statement. C: Cast the column to the correct type in your model using SQL syntax. D: Use the column_type configuration block in the model file to specify the column types for each column.

A: By providing a dedicated page in the auto-generated documentation site with context relevant to the outputs. Explanation: Exposures in dbt help data consumers understand and use the outputs of a project by providing a dedicated page in the auto-generated documentation site with context relevant to the outputs. This allows data consumers to easily access information about the outputs and how they can be used in downstream applications, dashboards, or data science pipelines.

How do exposures help data consumers understand and use the outputs of a dbt project? A: By providing a dedicated page in the auto-generated documentation site with context relevant to the outputs. B: By creating a separate schema for each user to maintain separate development and production environments. C: By enabling you to test and run resources that feed into the exposure. D: By defining and describing the upstream use of the project.

C: The 'alias' configuration

Jack is working on a dbt project and wants to change the default identifier for a relation that is created when a model is executed. What model configuration option should he use to achieve this? A: The 'name' configuration B: The 'identifier' configuration C: The 'alias' configuration D: The 'relation_name' configuration

C: Check strategy Explanation: Use the check strategy for tables that do not have a reliable updated_at column. This strategy works by comparing a list of columns between their current and historical values. If any of these columns have changed, then dbt will invalidate the old record and record the new one. If the column values are identical, then dbt will not take any action.

Maria is working on a dbt project and needs to create snapshots for a table that does not have a reliable updated_at column. Which snapshot strategy should she use to compare a list of columns between their current and historical values? A: Timestamp strategy B: Merge strategy C: Check strategy D: Incremental strategy

D: Delete the target/partial_parse.msgpack file in the project. Explanation: In dbt Cloud, partial parsing of a project can lead to issues. If you find that your dbt project is not compiling to the values you have set, deleting the target/partial_parse.msgpack file in your project can help. Doing so will force dbt to recompile your entire project and may help resolve any issues caused by partial parsing.

Sara is a data analyst working with a dbt Cloud project. She has noticed that her dbt project is not compiling to the values she has set. What can she do to resolve this issue? A: Change the variable values to their default settings. B: Rename the environment variables in the dbt Cloud UI. C: Delete the dbt Cloud project and create a new one. D: Delete the target/partial_parse.msgpack file in the project.

C: DBT_ENV_SECRET

Sarah is a data analyst who works with a dbt Cloud project. She needs to define an environment variable to store the database password. Which prefix should she use for the variable name to define it as an environment secret variable in dbt Cloud? A: DBT B: DBT_ENV C: DBT_ENV_SECRET D: DBT_SECRET

C: Table and Incremental Explanation: Python models in dbt cannot be materialized as view or ephemeral. They can only be materialized as table (the default materialization) or incremental.

Sarah is working on a dbt project that includes Python models. Which materialization options are available for these Python models? A: View and Ephemeral B: Table and Ephemeral C: Table and Incremental D: View and Incremental

dbt run --select 1+my_model

Select "my_model" and its first-degree parents

A & C: It allows for easy dependency tracing, It allows for version control of data sources

Select the best reason for using dbt sources rather than referencing raw database tables choose all answers that apply: A: It allows for easy dependency tracing B: It improves performance of SQL queries C: It allows for version control of data sources D: It improves data visualization capabilities

B: dbt run --select tag:customer_analytics+ --exclude customer_lifetime_value

The marketing team later requests that you exclude a specific model, customer_lifetime_value, while still executing the other customer_analytics tagged models and their dependencies. How would you modify your dbt command to accommodate this request? A: dbt run --select tag:customer_analytics+ --select -customer_lifetime_value B: dbt run --select tag:customer_analytics+ --exclude customer_lifetime_value C: dbt run --select tag:customer_analytics+ --select -+customer_lifetime_value D: dbt run --select tag:customer_analytics+ -customer_lifetime_value

C: It can only accept basic data types such as strings, booleans, and numbers.

What are the limitations of the dbt.config() method in terms of the types of arguments that can be passed to it? A: It can accept any type of data structure or function. B: It can only accept strings and numbers. C: It can only accept basic data types such as strings, booleans, and numbers. D: It can accept any data type as long as it is defined in a YAML file.

B: dbt init dbt_project_name

What command should you use to initiate a new dbt project? A: dbt new dbt_project_name B: dbt init dbt_project_name C: dbt create dbt_project_name D: dbt start dbt_project_name

A: dbt returns the error generated by the database or data warehouse, and downstream models are skipped.

What happens when dbt encounters an error in the SQL code of a model during execution? A: dbt returns the error generated by the database or data warehouse, and downstream models are skipped. B: dbt continues running the remaining models and ignores the model with the error. C: dbt automatically fixes the error and reruns the model. D: dbt stops execution immediately and does not provide any error message.

A: Using common table expressions (CTEs)

What is a best practice in SQL for separating logic that cleans up data from logic that transforms data? A: Using common table expressions (CTEs) B: Using the ref function to build models on top of other models C: Keeping all logic in a single query D: Using the dbt tool to separate logic out into separate models

A: --target

What is the command used to specify a target other than the default when using dbt? A: --target B: --prod C: --dev D: --target_schema

B: To evaluate the freshness of data sources and apply selectors based on the result Explanation: The source_status method is used to evaluate the freshness of data sources by comparing the current state with the previous state. It allows users to apply selectors based on the freshness of the sources, which can be useful in ensuring that the data used in the transformation is up-to-date.

What is the main purpose of the source_status method in dbt? A: To calculate the total execution time of dbt commands B: To evaluate the freshness of data sources and apply selectors based on the result C: To optimize query performance D: To manage user permissions and access control

A: A dbt_project.yml project configuration file Explanation: The minimum requirement for a dbt project is the presence of a dbt_project.yml project configuration file. This file provides the necessary configuration for dbt to run commands and connect to the appropriate database. Options B, C, and D are not required for a dbt project to function.

What is the minimum requirement for a dbt project? A: A dbt_project.yml project configuration file B: A models directory with transformation logic C: A connection to a database D: A set of predefined macros

D: In the dbt_project.yml file or on the command line Explanation: Variables can be defined in the dbt_project.yml file or on the command line. When defined in the dbt_project.yml file, variables can be set globally for the entire project, and can be referenced in any model, analysis, test, or hook. When defined on the command line, variables can be set on a per-run basis and can override variables defined in the dbt_project.yml file. This can be useful for setting variables that are specific to a particular run or environment.

Where can variables be defined in a dbt project? A: In the data directory B: In the models directory C: In the macros directory D: In the dbt_project.yml file or on the command line
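
As a sketch, a project variable can be declared in dbt_project.yml, read in a model with var(), and overridden at run time with --vars; the variable name, default value, and the Snowflake-flavored convert_timezone function are all illustrative.

```yaml
# dbt_project.yml (illustrative variable)
vars:
  event_timezone: "America/New_York"
```

```sql
-- models/stg_events.sql (reads the variable, with a fallback default)
select
    convert_timezone('UTC', '{{ var("event_timezone", "UTC") }}', event_at) as event_at_local
from {{ ref('raw_events') }}
```

An override for a single run might look like: dbt run --vars '{"event_timezone": "Europe/London"}'.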

dbt run --select model_d+

Which command will build only the models that were not built on the recent run? A: dbt run --select model_b+ B: dbt run --select model_c+ C: dbt run --select model_d D: dbt run --select model_d+

Macros

Which feature of Jinja allows you to reuse code across multiple dbt models?

E: The doc function can be used to reference the name of a doc macro to be used in a description

Which is true about the doc function in dbt? A: The doc function can be used to reference the name of markdown file to be used in a description B: You can define a doc block in a SQL file and reference it in a description C: The doc function is used to create dependencies between models D: You can define a doc block within a YAML file and reference it in a description E: The doc function can be used to reference the name of a doc macro to be used in a description
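
A sketch of the doc workflow: the docs block is defined in a markdown file and referenced from a description with the doc() function; the block, model, and column names are hypothetical.

```markdown
<!-- models/docs.md -->
{% docs order_status %}
One of: placed, shipped, completed, or returned.
{% enddocs %}
```

```yaml
# models/schema.yml
version: 2

models:
  - name: orders
    columns:
      - name: status
        description: "{{ doc('order_status') }}"
```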

B: dbt.config()

Which method is used to set the configurations within a Python model in dbt? A: dbt.setup() B: dbt.config() C: dbt.define() D: dbt.run()

C: By building up reusable, modular data models

Which of the following best describes how dbt optimizes your workflow? A: By creating a new database with each dbt run B: By processing raw data in real-time C: By building up reusable, modular data models D: By storing data in a more efficient format like avro and parquet

C: dbt source freshness Explanation: The dbt source freshness command produces a sources.json artifact that contains information about the execution times and max_loaded_at dates for dbt sources. This artifact can be used to reference the source freshness results in subsequent dbt invocations.

Which of the following dbt commands produces a sources.json artifact that can be referenced in subsequent dbt invocations? A: dbt source new_source B: dbt test freshness C: dbt source freshness D: dbt run freshness

D: Configuring time zones or avoiding hardcoded table names

Which of the following is a valid use case for using variables in dbt models? A: Generating reports and visualizations from SQL queries B: Storing configuration data for use in dbt projects C: Defining data transformations in a graphical user interface D: Configuring time zones or avoiding hardcoded table names

A, B, C, D, E, F: All are true

Which of the following statements about dbt are true? Choose all that apply: A: dbt helps you get more work done B: dbt produces higher quality results C: dbt allows you to modularize and centralize your analytics code D: dbt provides guardrails typically found in software engineering workflows E: dbt allows you to collaborate, version, test, and document your queries before deploying them to production F: dbt provides monitoring and visibility

B: Data models are saved as sql files.

Which of the following statements are true about developing in dbt? Options: A: Data models are saved as tf(transformation) files. B: Data models are saved as sql files. C: Data models are saved as yaml files. D: Data models are saved as json files.

D: All of the above

Which of the following statements are true about the ref function in dbt? A: The ref function allows you to build models on top of other models B: The ref function allows you to modularize your models C: The ref function allows you to make your models reuseable D: All of the above

B: The Python model is stored in a .py file in the models/ folder.

Which of the following statements is true about the Python model in dbt? A: The Python model is stored in a database table. B: The Python model is stored in a .py file in the models/ folder. C: The Python model is a class compiled by dbt Core. D: The Python model defines a function named dbt().

C: Files containing business-specific logic, such as a list of country codes or employee IDs Explanation: According to dbt documentation, seeds are version controlled and maintainable, and are best suited to files containing business-specific logic, such as a list of country codes or employee IDs. Seeds should not be used to load raw data, as this should be loaded using an ETL tool or SQL script, and large CSVs are not performant when loaded using dbt's seed functionality. Similarly, files containing sensitive information should not be stored in seeds, but instead should be secured and protected using appropriate security measures.

Which of the following types of files are best suited for loading using dbt's seed functionality, according to dbt documentation? A: Large CSV exports from a production database B: Raw data files in JSON or XML format C: Files containing business-specific logic, such as a list of country codes or employee IDs D: Files containing sensitive information, such as PII or passwords

B: Use the dbt cloud console to create and run the job daily at 8 AM

You are a data analyst working at a company that utilizes dbt cloud. You need to schedule a job to run daily at 8 AM that will compile and test your company's data models. Which method should you use to accomplish this task? A: Use the dbt run command in dbt core and set the schedule according to your organization's best practices B: Use the dbt cloud console to create and run the job daily at 8 AM C: Use the dbt test command in dbt core and set the schedule according to your organization's best practices D: Use the dbt compile command in dbt core and set the schedule according to your organization's best practices

B: Add an alias configuration to the model's configuration with a custom value Explanation: To provide a more descriptive name for the relation in the database, you can use the alias model configuration. This allows you to override the default relation name, which is based on the model's filename.

You are a data engineer working on a dbt project. You are in charge of creating models that will be used by different teams in your organization. You have created a model named "customer_orders.sql" and you want to provide a more descriptive name for the relation in the database. Which of the following is a valid way to do this? A: Change the filename of the model to a more descriptive name B: Add an alias configuration to the model's configuration with a custom value C: Use the generate_schema_name macro to set a custom name for the relation D: Use the rename_relation function in the model code to change the name of the relation

C: Table and incremental Explanation: In dbt, Python models have two materialization options: table and incremental. Incremental models in Python support the same incremental strategies as SQL models, but the specific strategies depend on the database adapter being used. It is not possible to use view or ephemeral materialization for Python models, nor can Python be used for non-model resources like tests and snapshots. Therefore, for a Python model that requires incremental updates, you can use either the table or incremental materialization. The incoming data must be filtered to only include new rows. The insert_overwrite strategy for incremental models is not yet supported for BigQuery/Dataproc, but the merge incremental strategy is supported.

You are a data platform engineer working with a Python model that requires incremental updates. What materialization options do you have for Python models in dbt? A: View and incremental B: Table and view C: Table and incremental D: Ephemeral and table

D: Ephemeral Explanation: The ephemeral materialization in dbt allows you to write reusable logic without directly building the model into the database. Instead, the code from this model will be used as a common table expression in dependent models. This materialization option is best suited for very lightweight transformations that don't need to be directly queried and are used in only one or two downstream models. If you want to write a model in dbt that doesn't directly build into the database, you should use the ephemeral materialization.

You are a data scientist working on a data pipeline and you want to write a model in dbt that doesn't directly build into the database. Which materialization option should you use for this purpose? A: Table B: View C: Incremental D: Ephemeral

C: Singular test Explanation: Singular tests in dbt are a way to make one-off assertions about specific resources in your dbt project such as models, sources, seeds, and snapshots. These tests are defined in .sql files located in the tests directory specified in the test-paths configuration. To make a one-off assertion about a specific resource in your project, you should create a singular test. Singular tests are easy to create and can help ensure the integrity of your dbt project.

You are a dbt dev working on a dbt project. You want to make a one-off assertion about a specific resource in your project. Which type of test should you create? A: Generic test B: Built-in test C: Singular test D: Parameterized test
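
A sketch of a singular test: a .sql file in the tests directory whose query returns the failing rows; the file, model, and column names are hypothetical.

```sql
-- tests/assert_no_negative_order_totals.sql
-- The test fails if this query returns any rows.
select
    order_id,
    total_amount
from {{ ref('fct_orders') }}
where total_amount < 0
```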

D: Ephemeral Explanation: The ephemeral materialization in dbt allows you to write reusable logic without directly building the model into the database. Instead, the code from this model will be used as a common table expression in dependent models. This type of materialization has advantages such as keeping the data warehouse clean by reducing clutter and allowing for lighter-weight transformations. Therefore, for a lightweight transformation that is only used in one downstream model and doesn't need to be directly queried, you should use the ephemeral materialization. However, it also has some limitations, such as not being able to select directly from the model, and some operations not being able to reference it.

You are a dbt developer working on a lightweight transformation that is only used in one downstream model and doesn't need to be directly queried. Which materialization should you use for this model? A: View B: Table C: Incremental D: Ephemeral

D: Confirm the code's functionality and prevent code regressions.

You are an Analytics engineer working on a dbt project. Which of the following is a benefit of defining tests in dbt? A: Speed up the development process. B: Allow for more flexibility in the code. C: Improve the user interface of the project. D: Confirm the code's functionality and prevent code regressions.

A: dbt run --select path:models/daily_reports --exclude tag:nightly --exclude config.materialized:table

You have been asked to run a specific set of models related to the daily financial reports, but you need to exclude models tagged as "nightly" and those with a materialized configuration set to "table." What command would you use to run the models contained in the "daily_reports" directory, excluding those with the "nightly" tag and materialized as "table"? A: dbt run --select path:models/daily_reports --exclude tag:nightly --exclude config.materialized:table B: dbt run --select path:models/daily_reports --select -tag:nightly,config.materialized:table C: dbt run --select path:models/daily_reports,tag:nightly,config.materialized:table D: dbt run --select path:models/daily_reports --select -tag:nightly --select -config.materialized:table

Exposures make it possible to define and describe a downstream use of your dbt project, such as in a dashboard, application, or data science pipeline. By defining exposures, you can then:
- run, test, and list resources that feed into your exposure
- populate a dedicated page in the auto-generated documentation site with context relevant to data consumers

dbt Exposures
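
A sketch of an exposure definition; the exposure name, URL, owner, and upstream models are hypothetical.

```yaml
# models/exposures.yml (illustrative names)
version: 2

exposures:
  - name: weekly_revenue_dashboard
    type: dashboard
    maturity: high
    url: https://bi.example.internal/dashboards/revenue
    description: "Weekly revenue dashboard used by the finance team."
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
    owner:
      name: Finance Analytics
      email: analytics@example.com
```

Everything feeding the exposure can then be built and tested with, for example, dbt build --select +exposure:weekly_revenue_dashboard.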

D: All of the above

In order for the migration of legacy SQL code to dbt to work without any problems, which of the following needs to be taken into consideration? A: The legacy SQL dialect B: Stored procedures or functions C: The data warehouse D: All of the above

B: The "incremental_predicates" feature in dbt optimizes the incremental build process for large data volumes by using specified SQL expressions to determine changed data and update only those records, resulting in improved performance and reduced processing time for data transformations. Explanation: The "incremental_predicates" feature in dbt is used to improve performance when working with large data volumes. It allows for a list of SQL expressions to be specified that will be used to optimize the incremental build process. By specifying these expressions, dbt can better determine which data has changed and only update those records instead of reprocessing the entire dataset. This can significantly improve performance and reduce the time it takes to run data transformations on large datasets.

Imagine you are a data engineer working for a company that deals with massive amounts of data. You have been tasked with optimizing the data transformation process to reduce the time it takes to process the data. You have been using dbt as your primary tool for data transformation, but you have noticed that the process is taking too long when working with large data volumes. You have heard about a feature in dbt called "incremental_predicates" that can help improve performance in such scenarios. How does this dbt feature accomplish this? A: The "incremental_predicates" feature in dbt optimizes the incremental build process for large data volumes by using Tree based algorithms to determine changed data and update only those records, resulting in improved performance and reduced processing time for data transformations. B: The "incremental_predicates" feature in dbt optimizes the incremental build process for large data volumes by using specified SQL expressions to determine changed data and update only those records, resulting in improved performance and reduced processing time for data transformations. C: The "incremental_predicates" feature in dbt optimizes the incremental build process for large data volumes by using specified Jinja control structures to determine changed data and update only those records, resulting in improved performance and reduced processing time for data transformations. D: dbt has no incremental_predicates feature, dbt uses pre-hook and post-hooks to determine changed data and update only those records, resulting in improved performance and reduced processing time for data transformations.
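
A sketch of an incremental model using incremental_predicates; the predicate SQL is warehouse-specific (DBT_INTERNAL_DEST is the alias dbt uses for the existing table in a merge), and the model, column, and date logic are hypothetical.

```sql
-- models/fct_events.sql (illustrative; predicate syntax varies by warehouse)
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='event_id',
        incremental_predicates=[
            "DBT_INTERNAL_DEST.event_date > dateadd(day, -7, current_date)"
        ]
    )
}}

select * from {{ ref('stg_events') }}

{% if is_incremental() %}
  where event_date > (select max(event_date) from {{ this }})
{% endif %}
```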

D: The output of each dataframe operation not being immediately calculated, but only computed when explicitly asked for. Explanation: In a dbt Python model, the concept of "lazy evaluation" refers to the output of each dataframe operation not being immediately calculated. Instead, the operations are only computed when you explicitly ask for the final result of the data. In development, you can preview the data using methods like .show() or .head(). When you run a Python model, the full result of the final DataFrame will be saved as a table in your data warehouse.

In a dbt Python model, what is the concept of "lazy evaluation"? A: The ability to preview data using methods like .show() or .head() in development. B: The ability to execute Python code remotely on a data platform. C: The ability to create a series of meaningful transformations using CTEs. D: The output of each dataframe operation not being immediately calculated, but only computed when explicitly asked for.

B: Use the DBT_PROFILES_DIR environment variable to change the default location of the profiles.yml file. Explanation: There are multiple ways to direct dbt to a different location for your profiles.yml file, but the most appropriate method is to use the DBT_PROFILES_DIR environment variable to change the default location. Specifying this environment variable overrides the directory that dbt looks for your profiles.yml file in. You can specify this by running: export DBT_PROFILES_DIR=path/to/directory.

In a dbt project, how can you change the default location of the profiles.yml file to a different directory? A: Modify the dbt_project.yml file to include a reference to the new directory. B: Use the DBT_PROFILES_DIR environment variable to change the default location of the profiles.yml file. C: Include a --profiles-dir option in the dbt_project.yml file. D: Move the profiles.yml file to the new directory and update the file path in each model's configuration.

A: source_status:fresher Explanation: To build a model using data sources that are no older than their previous state, you should use the 'source_status:fresher' selector in your dbt command. This selector will ensure that the transformation only uses data sources that are either fresher or equal to their previous state.

In a dbt project, you need to build a model using data sources that are no older than their previous state. Which selector should you use to achieve this? A: source_status:fresher B: source_status:fresher+ C: source_status:equal D: source_status:stale

A: source_status:equal Explanation: To select data sources that are at least as fresh as their previous state but not fresher, you should use the 'source_status:equal' selector in your dbt command. This selector will ensure that the transformation only uses data sources that have the same freshness as their previous state.

In a dbt project, you want to select data sources that are as fresh as their previous state, but not fresher. Which selector should you use in your dbt command? A: source_status:equal B: source_status:fresher C: source_status:stale D: source_status:fresher+

B: custom_calendar_dimension_list Explanation: To group your metrics by a custom dimension like "is_weekend" in dbt metrics, you need to set the variable "custom_calendar_dimension_list" in the "dbt_project.yml" file. This variable allows you to define custom calendar dimensions that can be used to group your metrics for reporting and analysis.

In dbt metrics, if you want to group your metrics by a custom dimension like "is_weekend", what variable must be set in the "dbt_project.yml" file? A: is_weekend_dimension_list B: custom_calendar_dimension_list C: dimension_group_list D: metric_group_list

C: In dbt_project.yml, in a dedicated .yml file within the models/ directory, and within the model's .py file using the dbt.config() method. Explanation: The three ways to configure dbt Python models in dbt are: 1) in dbt_project.yml, where you can configure many models at once, 2) in a dedicated .yml file, within the models/ directory, and 3) within the model's .py file, using the dbt.config() method. Therefore, option C is correct, while the other options are incorrect.

In dbt, what are the three ways to configure dbt Python models? A: In a SQL file, in a .py file, and in the dbt_project.yml file. B: In a dedicated .yml file, within the models/ directory, and in a SQL file. C: In dbt_project.yml, in a dedicated .yml file within the models/ directory, and within the model's .py file using the dbt.config() method. D: In a dedicated .py file, within the models/ directory, and within the model's .yml file using the dbt.config() method.

A: Use the "dev" target as the default Explanation:Dbt Supports Multiple Targets Within A Profile, Which Encourages The Use Of Separate Development And Production Environments. When Developing In Dbt Locally It Is Always Good Practice To Use The Dev Target Usually Set As The Default. A Separate "Prod" Target Can Also Be Created For Production Environments. Users Can Also Use The "--Target" Option When Issuing A Dbt Command To Use A Target Other Than The Default

In dbt, what is the recommended practice when developing locally? A: Use the "dev" target as the default B: Use the "prod" target as the default C: Use the "--target" option when issuing a DBT command D: Use the "default" target when issuing a DBT command
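
A sketch of a profile with separate dev and prod targets (Postgres-flavored, with hypothetical hosts and schemas); dev is the default, and the production target can be selected with dbt run --target prod.

```yaml
# profiles.yml (illustrative values)
my_project:
  target: dev            # default target used for local development
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: dev_user
      password: "{{ env_var('DBT_ENV_SECRET_DEV_PASSWORD') }}"
      dbname: analytics
      schema: dbt_dev
      threads: 4
    prod:
      type: postgres
      host: prod-db.example.internal
      port: 5432
      user: prod_user
      password: "{{ env_var('DBT_ENV_SECRET_PROD_PASSWORD') }}"
      dbname: analytics
      schema: analytics
      threads: 8
```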

C: dbt.config()

In dbt, which method is used to configure a Python model within the model's .py file? A: dbt.configure() B: dbt.setup() C: dbt.config() D: dbt.define()

C: dbt.ref()

In dbt, which of the following methods returns a DataFrame pointing to a upstream model? A: dbt.run() B: dbt.source() C: dbt.ref() D: dbt.snapshot()
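
A minimal Python model sketch showing dbt.config() and dbt.ref(); the upstream model name is hypothetical, and the exact DataFrame API (Snowpark, PySpark, or pandas) depends on the data platform.

```python
# models/customer_orders_py.py (illustrative; DataFrame methods depend on the platform)
def model(dbt, session):
    # Python models can only be materialized as table or incremental
    dbt.config(materialized="table")

    # dbt.ref() returns a DataFrame pointing to an upstream model
    orders_df = dbt.ref("stg_orders")

    # transformations are evaluated lazily and computed when dbt materializes the result
    return orders_df
```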

A: Once a source has been defined in the schema.yml file, you can reference it in a model using the {{ source() }} function. This function allows you to select data from the source in the same way you would select data from a table in an SQL query.

One of your team members is new to using dbt and is unsure how to reference a source in a model. How would you explain this process to them? A: Once a source has been defined in the schema.yml file, you can reference it in a model using the {{ source() }} function. This function allows you to select data from the source in the same way you would select data from a table in an SQL query. B: Once a source has been defined in the schema.yml file, you can reference it in a model using the {{ model() }} function. This function allows you to select data from the source in the same way you would select data from a table in an SQL query. C: Once a source has been defined in the schema.yml file, you can reference it in a model using the {{ db() }} function. This function allows you to select data from the source in the same way you would select data from a table in an SQL query. D: Once a source has been defined in the schema.yml file, you can reference it in a model using the {{ join() }} function. This function allows you to combine data from different sources in the same way you would combine data from different tables in an SQL query.
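
A sketch of the two pieces: the source definition in a YAML file and the {{ source() }} reference in a model; the source, schema, and table names are hypothetical.

```yaml
# models/staging/sources.yml (illustrative names)
version: 2

sources:
  - name: jaffle_shop
    schema: raw_jaffle_shop   # actual schema name in the database
    tables:
      - name: orders
```

```sql
-- models/staging/stg_orders.sql
select * from {{ source('jaffle_shop', 'orders') }}
```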

C: Variables defined with --vars, package-scoped variable declaration in dbt_project.yml file, global variable declaration in the dbt_project.yml file, and the variable's default argument. Explanation: In dbt, the order of precedence for variable declaration is: variables defined with --vars command line argument, package-scoped variable declaration in dbt_project.yml file, global variable declaration in the dbt_project.yml file, and the variable's default argument. If dbt is unable to find a definition for a variable, a compilation error is raised.

What is the order of precedence for variable declaration in dbt? A: Variables defined with --vars, global variable declaration in the dbt_project.yml file, package-scoped variable declaration in dbt_project.yml file, and the variable's default argument. B: Package-scoped variable declaration in dbt_project.yml file, variables defined with --vars, global variable declaration in the dbt_project.yml file, and the variable's default argument. C: Variables defined with --vars, package-scoped variable declaration in dbt_project.yml file, global variable declaration in the dbt_project.yml file, and the variable's default argument. D: Global variable declaration in the dbt_project.yml file, variables defined with --vars, package-scoped variable declaration in dbt_project.yml file, and the variable's default argument.

C: To evaluate the freshness of the data sources

What is the primary function of the dbt source freshness command? A: To generate a list of all available data sources B: To update the data sources to their latest versions C: To evaluate the freshness of the data sources D: To create a new source for the dbt project

B: To override variables for a run of dbt

What is the purpose of the --vars command line option in dbt? A: To define variables in a dbt project B: To override variables for a run of dbt C: To configure timezones and avoid hardcoded table names D: To generate reports and visualizations from SQL queries

C: To organize analytical SQL queries in a dbt project. Explanation: These queries are not run by dbt when you initiate the dbt run command, but can be compiled using the dbt compile command. This feature provides a way to version control your non-model SQL queries and keep them organized alongside your other dbt project components. Option A is incorrect because SQL queries for use in dbt models are stored in the models folder. Option B is incorrect because SQL queries for use in dbt tests are stored in the tests folder. Option D is incorrect because SQL queries for use in dbt seeds are stored in the seeds folder.

What is the purpose of the dbt analysis folder? A: To store SQL queries for use in dbt models. B: To store SQL queries for use in dbt tests. C: To organize analytical SQL queries in a dbt project. D: To store SQL queries for use in dbt seeds.

C: Jinja, the templating language used by dbt SQL models, provides access to the project's context, which is not available to Python models. Explanation: The reason why dbt Python models have limited access to the context of the project is that they do not use Jinja, a templating language that is used by dbt SQL models to render compiled code and provides access to the project's context. Instead, the context is made available from the dbt class and passed in as an argument to the model() function.

What is the reason why dbt Python models have limited access to the context of the project? A: Python models do not support passing context as an argument to the model() function. B: Python models use a different templating language that does not have access to the project's context. C: Jinja, the templating language used by dbt SQL models, provides access to the project's context, which is not available to Python models. D: The context is not relevant or necessary for Python models to perform their functions.

D: dbt will start building only one model, and finish it, before moving onto the next model, but the run time of the project will increase. Explanation: When you specify threads: 1 in your dbt project configuration, you restrict dbt to a single thread. dbt will then run only one operation at a time, i.e. build one model, wait for it to finish, and only then move on to the next one, which increases the overall run time of the project.

What will happen if you specify threads: 1 when running dbt? A: dbt will start building only one model, and finish it, before moving onto the next model. B: dbt will start building all models at once, without finishing any of them. C: dbt will start building all models in a sequential order D: dbt will start building only one model, and finish it, before moving onto the next model, but the run time of the project will increase.

B: Running data integrity tests against individual models Explanation: In dbt, when using sources, it is best practice to run data integrity tests against the sources rather than against each individual model.

When implementing data integrity checks in dbt, which of the following methods should be avoided? A: Running data integrity tests against the data warehouse B: Running data integrity tests against individual models C: Automating data integrity checks D: Performing manual checks on the data E: None of the above

C: It executes them and returns the number of records that failed the test.

When using dbt to test a model, what does dbt do with the queries it constructs for each test? A: It saves them to the YAML file for future reference. B: It discards them after the test is run. C: It executes them and returns the number of records that failed the test. D: It compiles them into a single executable query.

A&B: Use the --select flag followed by the name of the model, Use the -s flag followed by the name of the model dbt run --select Model_Name dbt run -s Model_Name

When working with dbt, you want to run a specific model. How can you accomplish this? A: Use the --select flag followed by the name of the model B: Use the -s flag followed by the name of the model C: Use the --run flag followed by the name of the model D: Use the -r flag followed by the name of the model

Development Environment

When you've created a pull request and your teammate is reviewing your commits, is this situation in the development environment or the deployment environment?

B: Use the schema property to define the actual names as per the database, and use your name: property for the name that makes sense and is easily readable

You are reviewing a dbt project that has a source schema defined in the schema.yml file. The source schema's name in the database looks machine-generated and is not easily readable, and you want to use a more sensible name in dbt. How can you implement this and still ensure dbt compiles using the actual source schema name? A: Use DDL and DML statements to change the schema name to one that can be easily read and makes sense B: Use the schema property to define the actual names as per the database, and use your name: property for the name that makes sense and is easily readable C: Use the 'database' property to define the actual names as per the database, and use your name: property for the name that makes sense and is easily readable D: Use the 'table' property to define the actual names as per the database, and use your name: property for the name that makes sense and is easily readable

A: Use the debug command within the dbt project Explanation: Simply run dbt debug from within the dbt project to test your connection.

You are running dbt and want to check the connection to your warehouse. What should you do? A: Use the debug command within the dbt project B: Use the test command within the dbt project C: Use the run command within the dbt project D: Use the check command within the dbt project

B: source_status:fresher Explanation: To exclude data sources that have become stale compared to their previous state, you should use the 'source_status:fresher' selector in your dbt command. This selector will ensure that the transformation only uses data sources that are either fresher or equal to their previous state, effectively excluding the stale ones.

You are working on a dbt project and need to exclude data sources that have become stale compared to their previous state. Which selector should you use in your dbt command? A: source_status:exclude_stale B: source_status:fresher C: source_status:stale D: source_status:fresher+

B: source_status:fresher+ Explanation: To select only the data sources with freshness greater than the previous state, you should use the 'source_status:fresher+' selector in your dbt command. This selector will ensure that the transformation only uses data sources that are fresher than their previous state.

You are working on a dbt project and want to ensure that only the data sources with freshness greater than the previous state are used for the transformation. Which selector should you use in your dbt command? A: source_status:fresher B: source_status:fresher+ C: source_status:stale D: source_status:stale+

A: Selector A uses the 'intersection' clause to select models that meet both conditions: being part of the 'core' package and having a 'marketing' tag. This fulfills the requirement of selecting models with a tag of "marketing" that are part of the "core" package.

You are working on a dbt project and want to select models that have a tag of "marketing" and are part of the "core" package. Which of the following YAML selectors will achieve this?

B & D: There is an ambiguous model name in the schema and/ or The target schema is not defined in the project configuration file Explanation: One possible reason for a model not being created in the database is that there is an ambiguous model name in the schema. This can happen if the same model name is used for two or more models in different custom schemas. To resolve this issue, you can change the schema or alias configurations of the models to avoid ambiguity. if the target schema is not defined in the project configuration file, dbt might not know where to create the model in the database. This could lead to the model not being created as expected.

You are working on a dbt project where you have defined custom schemas for some models. During a dbt run, you notice that one of the models is not being created in the database. What is a possible reason for this issue? A: The generate_alias_name macro has been overridden B: There is an ambiguous model name in the schema C: The model does not have an alias defined D: The target schema is not defined in the project configuration file

B: The specified package versions are not installed on the system. Explanation: If some models are still using the wrong package versions, it is possible that the specified package versions are not installed on the system. It is important to ensure that the specified packages and their versions are installed and accessible to the Python runtime on your data platform.

You are working on a dbt python project that involves data processing and transformation. You have specified specific package versions in the dbt_project.yml file under the "packages" configuration block. However, when you run dbt, you notice that some models are still using the wrong package versions. Which of the following could be the reason for this? A: The specified package versions are not compatible with the data platform being used. B: The specified package versions are not installed on the system. C: The package versions are not being read from the dbt_project.yml file. D: The models are not referencing the correct package names.

B: Model aliasing can be used to override the default relation identifier for a model.

You are working on a large dbt project with multiple models. You have been asked to improve the readability of the database schema by using model aliasing. Which of the following statements about model aliasing is true? A: Model aliasing cannot be used in combination with custom schemas. B: Model aliasing can be used to override the default relation identifier for a model. C: Model aliasing is only useful for small projects with a few models. D: Model aliasing cannot be used in a dbt project that uses package dependencies.

C: A full-refresh of both the incremental model and any related models has not been executed. Explanation: When using an incremental model in dbt with the "on_schema_change: ignore" setting, any changes made to the columns in the incremental model will not be reflected in the target table until a full-refresh of both the incremental model and any related models is executed. This means that if you add a column to the incremental model, it will not appear in the target table until a full-refresh is executed. Therefore, the most likely cause of this issue is that a full-refresh of both the incremental model and any related models has not been executed.

You have an incremental model in dbt with the default setting "on_schema_change: ignore", but when running the dbt command, the new column you added is not appearing in the target table. What could be the cause of this issue? A: The database does not support adding new columns in this way. B: The on_schema_change parameter is not set correctly in the configuration. C: A full-refresh of both the incremental model and any related models has not been executed. D: There is an issue with the SQL used in the model.

C: The table being used to calculate the metric does not contain accurate data. Explanation: When a metric is created in dbt, it takes all the information in a table and simplifies it into a single number that gives you a quick understanding of the data. If the resulting number does not match the expected value, the most likely reason is that the table being used to calculate the metric does not contain accurate data. This can be due to incorrect data being loaded into the table, data being filtered incorrectly, or other issues with the underlying data.

You have created a metric in your dbt project that uses one dimension - time. However, when you run the metric, the resulting number does not match the expected value. What is the most likely reason for this issue? A: The metric is not defined correctly in the dbt project. B: The time dimension is not being properly considered in the metric calculation. C: The table being used to calculate the metric does not contain accurate data. D: The metric is using too many dimensions to accurately represent the data.

B: The dimensions being used in the metric calculation are not relevant to the data being analyzed. Explanation: When a metric is created in dbt that uses multiple dimensions, it takes all the information in the table and simplifies it into a single number that gives you a quick understanding of the data.

You have created a metric that uses multiple dimensions, but when you run the metric, the resulting number is not providing useful information. What is the most likely reason for this issue? A: The metric is not defined correctly in the dbt project. B: The dimensions being used in the metric calculation are not relevant to the data being analyzed. C: The table being used to calculate the metric does not contain enough data to make the calculation meaningful. D: The metric is being calculated using an incorrect formula.

A: The exposure has not been defined correctly in the dbt schema.yml file.

You have defined an exposure called "my_exposure" in your dbt project, but when you try to run the dbt run command with the -s flag and the exposure name, you receive an error message that says "Could not find exposure 'my_exposure'". What is the most likely reason for this error? A: The exposure has not been defined correctly in the dbt schema.yml file. B: The exposure has not been added to the dbt_project.yml file. C: The exposure has not been added to the auto-generated documentation site. D: The exposure has not been tested and run in the dbt project.

C: dbt run -s stg_covid_cases

You have this DAG showing the 3 models, how do you ensure you run just the stg_covid_cases model? A: dbt run --s stg_covid_cases B: dbt run --select model=+stg_covid_cases+ C: dbt run -s stg_covid_cases D: dbt run -select stg_covid_cases

C: Update the ref functions to remove the cycle. Explanation: Cyclic errors occur in dbt when there is a circular dependency between two or more models, causing an infinite loop during the build process. This happens when a model depends on another model, which in turn depends on the first model, creating a cycle in the dependency graph. It can also occur with indirect dependencies involving more than two models. To fix this, update the ref functions that build the dependencies between the models so that the cycle is broken.

You initiated a dbt run as seen in the diagram, How do you fix the cycle error encountered? A: Run the dbt self-heal command. B: Rerun the model using dbt run --full-refresh C: Update the ref functions to remove the cycle. D: Rerun the model using dbt run --vars '{"cyclic": "false"}'

A, B & C: Create a Schema YAML file for that specific model in the models directory, Add the test type you want into the YAML file, Run the dbt test command, and confirm that all your tests passed

You want to add tests to a model in a project. You want to ensure that your models are working correctly by testing individual columns in the models. What should you do? A: Create a Schema YAML file for that specific model in the models directory B: Add the test type you want into the YAML file. C: Run the dbt test command, and confirm that all your tests passed. D: Run the dbt test command, and confirm that all your tests failed.
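
A sketch of generic column tests defined in a schema YAML file; the model, columns, and accepted values are hypothetical.

```yaml
# models/schema.yml (illustrative)
version: 2

models:
  - name: stg_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'inactive']
```

Running dbt test --select stg_customers then executes just these tests and reports whether they passed.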

B: snapshot

You want to capture the current state of a source table in your dbt project at a specific point in time. Which dbt config should you use? A: seed B: snapshot C: archive D: model

C: vars

You want to define a custom project variable in the dbt_project.yml file that can be used for data compilation. What configuration option should you use? A: macro-paths B: test-paths C: vars D: analysis-paths

B: Selector B uses the 'intersection' clause to select models that are part of the 'sales' package and have a materialization setting of 'incremental'. Then, it uses the 'exclude' clause to remove models with a 'deprecated' tag from the selection. This meets the requirements of the question.

You want to select models in the "sales" package that have a materialization setting of "incremental" but exclude those tagged as "deprecated". Which of the following YAML selectors will achieve this?

