BIS 375 Exam

Use the CO2 Emissions project to answer this question. In the univariate analysis on the merged and prepared dataset, the median meat supply in kg per capita is lowest for which year?

2009

Use the CO2 Emissions project to answer this question. How many records are in the original Urbanization_GDP_and_Population source dataset?

41,859

Use the CO2 Emissions project to answer this question. In the dataset that aggregates across the years 2008-2012, what is the median GDP per capita?

7,875.2

Consider the CO2 Emissions project. Which of the following visual recipes could you use to restrict the datasets in the project to the years 2008-2012? Select all that apply.

A Sample/Filter recipe; the pre-filter step of a Join recipe

The Formula processor in the Prepare recipe can be used to create a new column from a complex expression. What language is used to write these expressions?

A spreadsheet-like formula language

DSS is built for collaboration across the data team. When returning to a project, which of the following tabs on the project homepage would be the best place to visit for a high-level overview of user contributions to a project?

Activity

In order to join more than two datasets with only visual recipes, which of the following solutions is correct and why?

Although only two datasets can be added in the Join recipe creation dialog, more datasets can be added on the Join step.

Use the CO2 Emissions project to answer this question. Which country has the 5th highest CO2 emissions per capita for the year 2010?

Bahrain

Use the CO2 Emissions project to answer this question. Which country has the lowest average percentage of urbanization during the 2008-2012 time period? (only considering countries with a non-null/non-zero average % urbanization)

Burundi

Which of the following make up the schema of a DSS dataset? Select all that apply.

Column sort preference; Column storage type; Column name

Computation in Dataiku DSS can take four main forms. Which of the following computation forms can be either in memory or streamed? (Choose two.)

Container Execution through Docker and Kubernetes Clusters; DSS Engine

Which of these is not one of the available card types in a statistics worksheet?

Covariance matrix

Which of the following will be returned from queries in the Global Search toolbar? Select all that apply.

DSS items from projects on the instance to which the user has access; DSS items from the currently open or most recently opened project

Which of the following are true of Dataiku DSS as a server-based solution? (Choose three.)

Dataiku DSS can be installed on-premises or on cloud servers. Dataiku DSS can flexibly connect to many different infrastructure and storage systems. Dataiku DSS makes it possible for users to collaborate in real-time through a web browser.

The Lab is a place to experiment before committing an analysis to your data pipeline in the Flow. Which of the following are examples of how you can incorporate Lab work into the Flow? Select all that apply.

Deploy a chart created in a Visual Analysis as output. Deploy a trained model for scoring. Deploy a notebook as a code recipe.

When a user creates a data pipeline, they accomplish it in which node?

Design

A colleague expresses interest in a DSS project you're working on. Among the following options, how can you successfully share the project with them?

Ensure that both of you share membership in at least one common security group. Then they can find the project by name in the project list.

Consider the CO2 Emissions project. The raw data for this project was provided in annual (yearly) increments. Imagine, however, that the raw data was documented in smaller increments, such as by month, and that it was your job to aggregate the monthly data into yearly data. For example, imagine the CO2 and oil dataset is recorded by month. For each observation of a country, there is a column storing the month (such as January) and a column storing the year (such as 2012). What visual recipe(s) could be used to aggregate the data into yearly totals?

Group
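The aggregation behind this answer can be sketched in plain Python. This is only an illustration of what grouping by country and year with a sum does, not how the Group recipe is implemented; the rows, column order, and values are made up.

```python
from collections import defaultdict

# Hypothetical monthly records: (country, year, month, co2). Values are invented.
monthly_rows = [
    ("France", 2012, "January", 30.0),
    ("France", 2012, "February", 28.0),
    ("France", 2011, "December", 31.0),
    ("Chile", 2012, "January", 7.0),
    ("Chile", 2012, "February", 6.5),
]

# Group on the (country, year) key and sum the monthly values.
# The month column is ignored, so it disappears from the output.
yearly_totals = defaultdict(float)
for country, year, _month, co2 in monthly_rows:
    yearly_totals[(country, year)] += co2

for (country, year), total in sorted(yearly_totals.items()):
    print(country, year, total)
```

The group keys (country and year) stay as columns, and every other column has to be given an aggregation such as a sum.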

Dataiku DSS provides the ability to create different data visualizations using the drag and drop chart interface. Which of the following chart types is most suitable for observing the distribution of a column divided into equal-width bins?

Histogram
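As a rough illustration of "equal-width bins", the sketch below counts values per bin in plain Python. The function name and the sample values are hypothetical, and this is not the DSS charting code.

```python
def equal_width_histogram(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Made-up sample: bins are [1, 4), [4, 7), [7, 10].
counts = equal_width_histogram([1, 2, 2, 3, 5, 8, 9, 10], n_bins=3)
print(counts)
```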

Use the Predict CO2 Emissions project to solve this problem. Many of the input datasets include data for all recorded years. However, datasets often originate as separate files and need to be combined. For example, instead of a single dataset, you are given Meat_and_Egg_Production_1980s, Meat_and_Egg_Production_1990s, etc., and want to combine the files into a single dataset, which visual recipe would be most suitable?

Stack
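Combining per-decade files that share the same columns is a row-wise append (what a Stack recipe does), as opposed to a key-based join. A minimal sketch in plain Python, with invented rows and a hypothetical `meat_prod_tonnes` layout:

```python
# Two hypothetical decade files with identical columns (values are made up).
decade_1980s = [
    {"country": "Kenya", "year": 1985, "meat_prod_tonnes": 100},
    {"country": "Peru", "year": 1988, "meat_prod_tonnes": 250},
]
decade_1990s = [
    {"country": "Kenya", "year": 1995, "meat_prod_tonnes": 140},
    {"country": "Peru", "year": 1998, "meat_prod_tonnes": 310},
]

# Stacking only makes sense when the schemas line up, so check column names first.
columns = set(decade_1980s[0])
assert all(set(row) == columns for row in decade_1980s + decade_1990s)

# Append the rows of one dataset below the other: more rows, same columns.
stacked = decade_1980s + decade_1990s
print(len(stacked), sorted(columns))
```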

You are working with a dataset of retail orders and want to enrich that dataset with information about the customers who placed each order. Which of the following recipes can you use to accomplish this?

Join
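Enriching orders with customer attributes is a key-based lookup: each order keeps its row and gains columns from the matching customer. The sketch below shows a left join in plain Python; the column names (`customer_id`, `name`, `city`) and the data are assumptions for illustration only.

```python
# Hypothetical customers, keyed by customer_id.
customers = {
    101: {"name": "Ada", "city": "Lyon"},
    102: {"name": "Bo", "city": "Oslo"},
}

# Hypothetical orders; customer 999 has no matching customer record.
orders = [
    {"order_id": 1, "customer_id": 101, "amount": 25.0},
    {"order_id": 2, "customer_id": 102, "amount": 40.0},
    {"order_id": 3, "customer_id": 999, "amount": 15.0},
]

# Left join: keep every order, fill customer columns with None when unmatched.
no_match = {"name": None, "city": None}
enriched = [{**order, **customers.get(order["customer_id"], no_match)} for order in orders]

for row in enriched:
    print(row)
```

The number of rows stays the same and the number of columns grows, which is the opposite of appending rows from another file.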

Use the CO2 Emissions project to answer this question. Which of the original columns do you need to divide by country population in order to produce a per capita measurement? Select all that apply.

Oil production (Eternad & Luciana) (terawatt-hours); Food Balance Sheets: Eggs - Production (FAO (2017)) (tonnes); meat_prod_tonnes
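A per capita measurement is simply the country-level total divided by the population. The helper below is a hypothetical sketch with invented numbers; it also shows one way to treat missing or zero populations, which is an assumption rather than something the project prescribes.

```python
def per_capita(total, population):
    """Return total / population, or None when either input is unusable."""
    if total is None or not population:
        return None  # covers a missing total and a missing or zero population
    return total / population


# Made-up example: 500,000 tonnes produced in a country of 1,000,000 people.
print(per_capita(500_000, 1_000_000))
print(per_capita(500_000, 0))
```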

Your dataset contains new data that is added daily. Ideally, you want to build only the new data each time the recipes in the Flow run instead of building the whole dataset. One way to accomplish this would be to _____ the dataset.

Partition

What is the name of the Dataiku DSS feature that you use to organize items such as datasets, recipes, models, discussions, and dashboards?

Project

Which of the following cannot be changed once it is created in Dataiku DSS?

Project ID

The New York Taxi Fares web application allows users to enter pickup and dropoff locations, the time of day, and number of passengers. The app then queries a Dataiku API node to get a predicted fare amount. This is an example of what type of scoring approach?

Real-time scoring

The push-down computation of DSS helps to shift heavy lifting from the DSS server to external compute resources. These external compute resources include which of the following? (Choose three.)

Remote containers; In-Cluster; In-Database

Which of the following are true of sampling in DSS? (Choose three.)

Sampling enables users to work interactively with huge datasets. When preparing your dataset or building a chart, DSS uses a sample of the dataset, by default. Sampling settings are configurable.

Consider a dataset with a column, product_id. Most of the values in the column contain only whole numbers, while some values contain both whole numbers and letters. Dataiku DSS has assigned the meaning of "integer" to the column which means the values that contain letters are considered invalid. Since you know the values to be valid, what is the best course of action?

Set the storage type of the column to String.

Which of the following are true of dashboard permissions?

Setting a dashboard to "public" makes it accessible from the Dataiku DSS homepage of users who have the correct permissions. A "private" dashboard is private to those who have the correct permissions.

You can perform univariate, bivariate, and principal components analysis in which tab of a dataset?

Statistics

Which of the following represent ways to find outliers in the values in the column of a dataset? (Choose three.)

Statistics tab; Analyze window; Schema

Consider the CO2 Emissions project. The Urbanization_GDP_and_Population dataset includes columns for population and per capita GDP. Which of the following recipe options would allow you to create a new column in the output dataset that represents the total GDP for each country (row)? Select all that apply.

The Formula processor in a Prepare recipe; a Code recipe
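Whichever option is used, the row-level computation is total GDP = population × GDP per capita. The sketch below shows that calculation in plain Python; the column names `population` and `gdp_per_capita` and the values are assumptions, and a real code recipe would read and write DSS datasets rather than a literal list.

```python
# Hypothetical rows; column names and values are invented for illustration.
rows = [
    {"country": "A", "population": 2_000_000, "gdp_per_capita": 1500.0},
    {"country": "B", "population": 500_000, "gdp_per_capita": 40000.0},
]

# Add a new column per row: total GDP = population * GDP per capita.
for row in rows:
    row["gdp_total"] = row["population"] * row["gdp_per_capita"]

for row in rows:
    print(row["country"], row["gdp_total"])
```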

By default, when browsing a dataset in its Explore tab, what are you viewing?

The first 10,000 records

When in the Explore tab of a dataset with default settings, actions like sorting and filtering return fast results. What property of DSS makes this possible?

These actions are performed on only a sample of the dataset.
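To see why working on a sample keeps interactive actions fast, the sketch below takes the first 10,000 records from a much larger stream and filters only those. It is a plain Python illustration with a hypothetical helper name, not the DSS sampling engine.

```python
from itertools import islice


def first_n_sample(records, n=10_000):
    """Materialize only the first n records of a (possibly huge) iterable."""
    return list(islice(records, n))


# A lazily generated "dataset" of one million made-up records.
dataset = ({"id": i, "value": i % 7} for i in range(1_000_000))

sample = first_n_sample(dataset)

# Filtering touches 10,000 records instead of 1,000,000.
filtered = [r for r in sample if r["value"] == 0]
print(len(sample), len(filtered))
```

Results computed this way describe the sample only, so counts and extremes can differ from those of the full dataset.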

What are 'Formulas' in a Prepare recipe used for?

Writing spreadsheet-like functions
