Tableau Final

Which of the following types of points is NOT used to categorize a chosen data point in DBSCAN clustering?

Centroid sample

Seasonal Autoregressive Integrated Moving Average (SARIMA) can accommodate which of the following models?

- AR Model - MA Model - ARMA Model - ARIMA Model - Seasonal Model

Which of the following ensemble models are built through a sequentially coordinated method of sampling records?

- AdaBoosting - Gradient Boosting

Which of the following statements about Addressing and Partitioning is correct?

- Addressing is similar to direction in relative reference for table calculation - Removing addressing will break the Table Calculation - Rearranging addressing does not change computing

Which of the following statements about the AdaBoosting classifier are correct?

- At each round for a base classifier, create a bootstrap sample based on the weights - At each round for a base classifier, records wrongly classified will have their weights increased
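The reweighting step described above can be sketched in a few lines. This is a minimal illustration of one common AdaBoost formulation, not necessarily the exact variant used in the course; the function name `update_weights` and the four-record example are assumptions made for illustration.

```python
import math

def update_weights(weights, correct):
    """One AdaBoost-style reweighting round (illustrative sketch)."""
    # Weighted error of the base classifier under the current weights.
    err = sum(w for w, ok in zip(weights, correct) if not ok) / sum(weights)
    # Importance of this base classifier; assumes 0 < err < 0.5.
    alpha = 0.5 * math.log((1 - err) / err)
    # Increase the weights of misclassified records, decrease the others.
    new = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new], alpha

weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]   # the last record was misclassified
new_weights, alpha = update_weights(weights, correct)
```

After this round the misclassified record carries half of the total weight, so it is more likely to be drawn into the next weighted bootstrap sample.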

Which of the following ensemble models are independent of base classifiers (i.e., use any type of classifiers as base classifiers)?

- Bagging - AdaBoosting

Which of the following statements are correct regarding benefits of ensemble model?

- Better performance than a single model - Improved robustness over a single model

Which of the following statements about advantages of DBSCAN over K-Means is correct?

- Capture clusters of complex shapes based on density - Do not require the number of clusters a priori - Insensitive to the ordering of the points in data - Very useful as outliers detector

Direction (relative)

- Define how the Table Calculation moves within the scope - Refer to the order in which table cells are calculated - Across, Down, Across then Down, Down then Across

Scope (relative)

- Define the boundaries within which a given Table Calculation can reference other values - Determine when the table calculation will reset to a beginning value - Table, Pane, Cell

What determines the performance of table calculation

- Depends on the cache of whatever machine Tableau is running on - Row- and aggregate-level calculations utilize enterprise-level hardware with a live connection

Seasonal Autoregressive Integrated Moving Average (SARIMA)

- Extended ARIMA model to account for seasonal components - Represented as SARIMA(p, d, q)(P, D, Q)m ("m" is the seasonal period) - Suitable for time series with trend and/or seasonal components

Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)

- Extended SARIMA model to model exogenous variables (=covariates) - Parallel input sequences of observations at the same time steps - Suitable for time series with trend and/or seasonal components and exogenous variables

Data Densification

- Fill in sparse data with missing values - Place in the source with joins, unions, or queries - Place in Tableau after aggregate data is returned from the source

Which of the following layers does NOT include neurons?

- Hidden layer - Output layer - Input layer

Which of the following statements about ensemble models is correct?

- It aggregates the predictions of the base classifiers - Its base classifiers need to be independent for better performance - It combines multiple diverse base classifiers to predict an outcome

How to promote the diversity of base classifiers?

- May use different training examples through sampling - May use many different modeling algorithms - Introduce randomness into learning procedures - May use different features through sampling

Autoregression (AR)

- Model the next step as a linear function of the observations at prior time steps - Represented as AR(p) - Suitable for time series without trend and seasonal variation

Autoregressive Integrated Moving Average (ARIMA)

- Model the next step in the sequence as a linear function of the differenced observations and residual errors at prior time steps - Combine AR, MA and Integration (I) - A differencing pre-processing step to make the sequence stationary - Represented as ARIMA(p, d, q) - Suitable for time series with trend and without seasonal components

Autoregressive Moving Average (ARMA)

- Model the next step in the sequence as a linear function of the observations and residual errors at prior time steps - Combine AR and MA models and represented as ARMA(p, q) - Suitable for time series without trend and seasonal variation

Moving Average (MA)

- Model the next step in the sequence as a linear function of the residual errors from a mean process at prior time steps - Represented as MA(q) - Suitable for time series without trend and seasonal variation
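The differencing step of ARIMA and the "linear function of prior observations" in AR(p) can be illustrated without a forecasting library. The sketch below is only an illustration: the coefficients are hypothetical values chosen by hand, not fitted from data, and the function names are assumptions.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing (the 'I' in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def ar_forecast(series, coeffs, intercept=0.0):
    """One-step AR(p) forecast: intercept + sum(coeffs[k] * series[-1 - k])."""
    p = len(coeffs)
    recent = series[-p:][::-1]   # most recent observation first
    return intercept + sum(c * x for c, x in zip(coeffs, recent))

trend = [3, 5, 7, 9, 11]                 # series with a linear trend
diffed = difference(trend, d=1)          # the trend is removed by differencing
next_diff = ar_forecast(diffed, coeffs=[0.5, 0.5])   # hypothetical AR(2) coefficients
next_value = trend[-1] + next_diff       # undo the differencing
```

Differencing once turns the trending series into a constant one, which is why ARIMA suits series with trend while AR, MA, and ARMA assume no trend.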

Which of the following ensemble models are using different features to promote diversity of base classifiers?

- Random Forest - Gradient Boosting

EPS

- Specify how close points should be to be in the cluster - If the distance between points <= eps, they are neighbors

Min Points

- Specify the min number of points to form a dense region - If minPoints = 5, need at least 5 points to form a dense region
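To see how eps and minPoints interact, here is a small pure-Python sketch of the DBSCAN idea. It is a simplified teaching version under stated assumptions (Euclidean distance, a point counts as its own neighbor, noise labeled -1), not the scikit-learn implementation; the names `dbscan` and `region_query` are illustrative.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    # Indices of all points within eps of point i (including i itself).
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_points):
    labels = [None] * len(points)        # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_points:
            labels[i] = NOISE            # may later be relabeled as a border point
            continue
        cluster += 1                     # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins the cluster, no expansion
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_points:   # j is also a core point
                queue.extend(j_neighbors)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_points=3)
```

With these settings the two dense squares become clusters 0 and 1, and the isolated point at (50, 50) is labeled -1 because it has too few neighbors within eps.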

Relative Table Calculations

- The Table Calculation is computed relative to the layout of the table - Rearranging dimensions that changes the table will change the Table Calculation results - Use the same relative scope and direction, even if you rearrange the view

Fixed Table Calculations

- The Table Calculation is computed using one or more dimensions - Rearranging those dimensions will not change the computation of the Table Calculation - The scope and direction remain fixed to one or more dimensions, no matter where they are moved within the view

Which of the following statements about statistical forecasting for time series analysis is correct?

- The default method is Exponential Smoothing Model with Seasonality - In Exponential smoothing model, more recent values are given greater weight - Exponential smoothing model iteratively forecasts future values from weighted averages of past values
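The weighting of recent values can be seen in a minimal sketch of simple exponential smoothing (level only, no trend or seasonality). This is a hand-rolled illustration, not Tableau's forecasting engine; the function name and sample data are assumptions.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after the last observation.

    Each step blends the new observation (weight alpha) with the
    previous level (weight 1 - alpha), so older values decay geometrically.
    """
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

series = [10, 20, 30]
low_alpha = simple_exponential_smoothing(series, alpha=0.1)
high_alpha = simple_exponential_smoothing(series, alpha=0.9)
```

A higher alpha pulls the forecast closer to the most recent observation, which matches the "more recent values are given greater weight" statement.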

Which of the following statements about the Bagging classifier are correct?

- Train a base classifier on each bootstrap sample from the train data - Combine trained multiple base classifiers for predictions on the test data - Improve performance in almost all cases
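The bootstrap-then-vote procedure can be sketched as follows. The base classifier here is a hypothetical one-dimensional threshold "stump" invented for this example (it assumes class 1 has the larger values); any classifier could take its place.

```python
import random
from collections import Counter

def train_stump(sample):
    """Hypothetical base classifier: threshold halfway between the class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:               # degenerate sample: predict the only class seen
        only = 0 if xs0 else 1
        return lambda x: only
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: int(x > threshold)

def bagging_fit(train, n_models, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(train) records with replacement.
        sample = [rng.choice(train) for _ in train]
        models.append(train_stump(sample))
    return models

def bagging_predict(models, x):
    # Each base classifier votes with equal weight; the majority wins.
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

train = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
models = bagging_fit(train, n_models=11)
```

Calling `bagging_predict(models, 0)` or `bagging_predict(models, 10)` then returns the majority vote of the eleven stumps.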

Which of the following cluster index values refers to the outlier data points?

-1

While referring to the short expression for a calculated field, find all correct statements about Pearson Correlation Analysis.

__________("
import numpy as np
return np.corrcoef(_arg1,_arg2)[0,1]
", SUM([Sales]), SUM([Profit]))

1. The same output will be obtained for np.corrcoef(_arg1,_arg2)[0,1] and np.corrcoef(_arg2,_arg1)[0,1]
2. The return output type should be set to SCRIPT_REAL
3. The returned correlation value will be between 0 and 1

1 and 2
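Why statement 3 is wrong can be checked with a hand-written Pearson correlation. This sketch avoids NumPy so it runs anywhere; the sample figures are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

sales = [100, 200, 300, 400]
profit = [40, 30, 20, 10]      # falls as sales rise
r = pearson(sales, profit)
r_swapped = pearson(profit, sales)
```

Swapping the arguments gives the same value, and a perfectly inverse relationship yields a correlation of about -1, so the range is -1 to 1 rather than 0 to 1.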

________________ algorithm, one of the most common algorithms for training neural networks, calculates the degree of error, if any, and adjusts the weights that are associated with the inputs for that neuron, working backward from output neurons to input neurons?

Backpropagation

In an election, N candidates are competing against each other and people are voting for either of the candidates. Voters don't communicate with each other while casting their votes with the same weight. Which of the following ensemble method works similar to above-discussed election procedure?

Bagging

Which of the following charts use Circle Mark to highlight values and Line Mark to measure the gap between two groups of data points?

Barbell Chart

Imagine that you create a chart to show the percentage of Sales for each Category out of the total Sales. After you filter out one Category, the percentage of Sales of two remaining Categories always add up to 100%. Why does this happen?

Because regular filters on the source are applied before Table Calculations
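The order of operations can be mimicked in plain Python. The category names and sales figures below are hypothetical; the point is only the difference between filtering before and after the percent-of-total step.

```python
sales = {"Furniture": 200.0, "Office Supplies": 300.0, "Technology": 500.0}

def percent_of_total(rows):
    total = sum(rows.values())
    return {k: v / total for k, v in rows.items()}

# Regular filter first, then the table calculation:
# shares are computed against the filtered total, so they sum to 100%.
filtered = {k: v for k, v in sales.items() if k != "Technology"}
filter_then_calc = percent_of_total(filtered)

# Calculation first, then hide a category (what a late filter achieves):
# shares stay relative to the full total.
calc_then_hide = {k: v for k, v in percent_of_total(sales).items()
                  if k != "Technology"}
```

In the first case the two remaining shares are 40% and 60%; in the second they remain 20% and 30% of the unfiltered total.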

Which of the following charts is the best to show changes in Rank of a value over a time dimension and a place dimension?

Bump Chart

Decision tree algorithm is called a ______________ box algorithm because its learned tree structure can be represented as if-then rules to improve human readability and interpretation?

White

How to delete a column from a data frame

X=data.drop(['ID','Class'],axis=1)

How to train a classifier model

clf.fit(X,y)

How to read .csv into Python in tabpy

data=pd.read_csv(r'filepath.csv')

A field on a shelf in the view or a calculated field using Table Calculation functions will have a __________ symbol:

delta

How to predict the class label with the trained classifier

pred = clf.predict(X_pred)

A clustering algorithm prefers outputs in which data points in the same cluster are very _____________ ?

similar

What is the required parameter to K-Means clustering algorithm?

the number of clusters

How to set dependent variable (y) and which variables you should select

y=data['Class']

_______________ are data points that are closer to each other than the eps parameter and hence are put into the same cluster

Core samples

Which of the following parameters reflects seasonality factor in Simple Exponential Smoothing (SES) model?

Gamma

_________________ ensemble model is based on Boosting, not Bagging, to add a tree classifier at a time, so that the next classifier is trained to improve the already trained ensemble.

Gradient Boosting

Which of the following charts is the best to show scientific research on medical or environmental studies due to its Line chart characterized by a sharp increase after a relatively flat period?

Hockey Stick Chart

How to compute the accuracy value

IF ATTR([Class]) = [DT Prediction] THEN "Correct" ELSE "Incorrect" END (I think??)
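The calculated field above labels each record as correct or incorrect; accuracy is then the share of records labeled correct. A minimal Python sketch of that final step, with made-up labels:

```python
def accuracy(actual, predicted):
    """Fraction of records whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

actual = ["A", "B", "A", "B"]
predicted = ["A", "B", "B", "B"]   # one of four predictions is wrong
score = accuracy(actual, predicted)
```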

Which of the following clustering algorithms belongs to centroid clustering algorithm?

K-Means

Which of the following clustering algorithms is readily available in Tableau through Analytics Pane?

K-Means

An exploratory data analysis to group data points into clusters is _____________?

K-Means Clustering

Which of the following charts is the best to show the performance over KPI using/adding custom shapes?

KPI Chart

A _____________ is the bottom node, from which no further branches grow?

Leaf node

Generally, an ensemble method works better, if the individual base models have ____________?

Less correlation among predictions

The maximum number of clusters in Clustering analysis with N records and M attributes for each record is ________?

N

The _______________ algorithm is a computational model inspired by the behavior of neurons and the brain's electrical signals, processing, and output?

Neural Network

Which of the following statements is NOT applicable to Decision tree algorithm?

None of the above

Which of the following charts is the best to identify the key segments of your customer base (e.g., top 20% of customers) that are most important for your business success?

Pareto Chart

What kind of Table Calculation should be applied to measure values in Pareto Chart?

Percent of Total
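In practice a Pareto chart usually plots the cumulative share, which combines a running total with percent of total. A short sketch of that calculation, using made-up values:

```python
from itertools import accumulate

def pareto_cumulative_share(values):
    """Sort descending, then express the running total as a share of the grand total."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]

shares = pareto_cumulative_share([10, 50, 20, 20])
```

Here the largest customer alone accounts for half of the total, and the cumulative share reaches 100% at the last item.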

Differences of table calculation from other calculations

Performed after the initial query

_________________ is a new API (Application Programming Interface) that enables evaluation of Python code within a Tableau workbook.

TabPy

________ calculations make it possible to compare and perform calculations on aggregate values across rows of the resulting table

Table

The neural network model is called a black box algorithm because its final structure is hidden and hence it is very difficult for humans to understand the patterns (True/False)?

True

If you want to compute the 3 month moving average of SUM(Quantity) in two previous months and the current month, which of the following expression should you use?

WINDOW_AVG(SUM([Quantity]), -2, 0)
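The offsets -2 and 0 mean "from two rows back through the current row". The sketch below imitates that window in Python, assuming the window is simply clipped at the start of the partition so the first rows average over fewer values; the function name is illustrative.

```python
def window_avg(values, start, end):
    """Average of values[i + start .. i + end] for each row i, clipped to the table edges."""
    out = []
    for i in range(len(values)):
        lo = max(0, i + start)
        hi = min(len(values) - 1, i + end)
        window = values[lo:hi + 1]
        out.append(sum(window) / len(window))
    return out

quantity = [10, 20, 30, 40]          # monthly SUM([Quantity]), made-up values
moving_avg = window_avg(quantity, -2, 0)
```

The last entry averages 20, 30, and 40, that is, the current month and the two before it.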

Visualize and analyze the cumulative impact of sequential positive or negative drivers

Waterfall Chart

While referring to the short expression for a calculated field, find all correct statements about Linear Regression Analysis.

___________('
import numpy as np
from sklearn import linear_model
clf = linear_model.LinearRegression()
x=np.transpose(np.array([_arg1]))
y=np.array(_arg2)
clf.fit(x,y)
return clf.predict(x).tolist()
', SUM([Profit]), SUM([Sales]))

1. The return output type should be set to SCRIPT_INT
2. The above calculated field will execute without errors
3. You need to change the last two lines of the code into SUM([Sales]), SUM([Profit])) to reflect causality between two measures

2 and 3

What are the required parameters to DBSCAN clustering algorithm? 1. the number of clusters 2. the number of features 3. how close points should be to be in the cluster 4. the number of points to form a cluster

3 and 4 only

Which of the following statements is NOT one of K-Means clustering process? 1. Initialize the centroids of clusters randomly along variables 2. Assign each data point to the nearest cluster 3. Recalculate the centroids of clusters 4. For a chosen data point, determine if it is a noise sample that is not a core point nor a border point

4

Which of the following statements about the eps parameter are NOT correct? 1. eps parameter specifies how close points should be to be in the cluster 2. If the distance between points <= eps, they are considered neighbors 3. Too small a value of eps will result in no points being core samples 4. Too large a value of eps will lead to all points being labeled as noise 5. minPoints parameter specifies the maximum number of points to form a dense region

4 and 5 only

________________ acts as a mathematical gate between the input and its output and transforms/normalizes the output of each neuron into [0,1] or [-1,1]

Activation function
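Two common activation functions match the ranges mentioned above: the sigmoid maps any input into (0, 1) and tanh maps it into (-1, 1). A minimal sketch using only the standard library:

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent activation: squashes any real input into (-1, 1)."""
    return math.tanh(z)

inputs = [-5.0, 0.0, 5.0]
sigmoid_out = [sigmoid(z) for z in inputs]
tanh_out = [tanh(z) for z in inputs]
```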

Which of the following statements about Time Series Analysis are correct? 1. Autocorrelation refers to the association of two observations at prior time points 2. Seasonal variation refers to the recurring patterns of observations over short time periods 3. Trend refers to the increasing or decreasing mean value of the series 4. Stationarity refers to constant mean (& variance) value of the series over a long time period

All of the above

Which of the following statements about table calculations are NOT correct? 1. Regular filters on the source are applied after Table Calculations 2. Fields in a Table Calculation are only as a row level 3. Its performance depends on enterprise-level hardware with a live connection 4. It is not appropriate for tables

All of the above

Which of the following clustering algorithms belongs to density clustering algorithm?

DBSCAN

The ____________________ algorithm builds a prediction model by creating a series of splits at nodes after evaluating input features in terms of how cleanly it divides the data across the class states of target variable?

Decision Tree

Which of the following algorithms is not an example of an ensemble method?

Decision Tree

Which of the following Table Calculations is NOT used in Sankey Diagram?

Difference

Which of the following charts is a bar chart with two marks: One for some dimension members pointing up or right and another for other dimension members pointing in the opposite direction (down or left, respectively)?

Diverging Bar Chart

Which of the following metrics is LEAST appropriate to evaluate the K-Means clustering output?

Prediction accuracy

Late Filtering

Problem: The percent of total will always add up to 100% because it is the percentage of the filtered total. Fix: use LOOKUP(ATTR([__________]), 0)

Which of the following charts is a Bar chart plotted in polar coordinates instead of a Cartesian plane?

Radial Bar Chart

1. Choose T, the number of trees to grow
2. Choose m < M (M is the total number of features), the number of features used to calculate the best split at each node (typically 20%)
For each tree:
3. Choose a training set by sampling N times (N is the number of training examples) with replacement from the training set
- At each node, randomly choose m features and calculate the best split
- Fully grown and not pruned
The above algorithm is the best description of the _______________ algorithm?

Random Forest
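The two sources of randomness in this algorithm, bootstrap rows per tree and a random feature subset per split, can be sketched without building actual trees. The function below only produces the sampling plan; its name is hypothetical, and for brevity it draws one feature subset per tree as an example of the draw that would be repeated at every node.

```python
import random

def forest_sampling_plan(n_records, n_features, n_trees, m, seed=0):
    """Return, for each tree, its bootstrap row indices and one sample of m features."""
    rng = random.Random(seed)
    plan = []
    for _ in range(n_trees):
        # N draws with replacement from the N training records.
        rows = [rng.randrange(n_records) for _ in range(n_records)]
        # m distinct features out of M; a real forest repeats this draw at each node.
        features = rng.sample(range(n_features), m)
        plan.append((rows, features))
    return plan

plan = forest_sampling_plan(n_records=8, n_features=10, n_trees=3, m=2)
```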

__________________ refers to the shaded areas behind the marks in the view and is usable for any continuous axis in the view?

Reference Band

Result in several gradient shaded bands at various intervals across the numeric axis. Distribution can be defined by percentages, percentiles, quantiles, or standard deviation

Reference Distribution

A _____________ is the topmost node in a tree structure in a decision tree algorithm?

Root node

What kind of Table Calculation should be applied to measure values in Hockey Stick Chart?

Running Total

Which of the following choices is the best for the underlined portion of the following calculated field?

_______________("
MyList = []
for x in _arg1:
    MyList.append(x>0)
return MyList
", SUM([Profit]))

SCRIPT_BOOL

Which of the following output types is the most appropriate for a calculated field that returns the outcome of clustering algorithms?

SCRIPT_REAL

Which of the following charts is the best to show flows between entities and their quantities in proportion to one another and magnitudes of flows with the width of flow arrows?

Sankey Diagram

_________________ is a repeating, predictable variation in value over months, quarters, days, or hours.

Seasonality

Which of the following statements is NOT one of the training steps of neural network algorithms?

Select the best splitting attribute at the root node

Alpha (Base factor)

The higher, the more weights to the most recent observations

Gamma (Seasonality factor)

The higher, the more weights to the recent seasonal components

Beta (Trend factor)

The higher, the more weights to the recent trends

Compare train and test accuracy: Which is higher and which one should you use to report

Train is higher, test is what you should report

