Tableau Final
Which of the following types of points is NOT used to categorize a chosen data point in DBSCAN clustering?
Centroid sample
Seasonal Autoregressive Integrated Moving Average (SARIMA) can accommodate which of the following models?
- AR Model - MA Model - ARMA Model - ARIMA Model - Seasonal Model
Which of the following ensemble models are built through sequentially coordinated sampling of records?
- AdaBoosting - Gradient Boosting
Which of the following statements about Addressing and Partitioning is correct?
- Addressing is similar to direction in relative reference for table calculation - Removing addressing will break the Table Calculation - Rearranging addressing does not change computing
Which of the following statements about the AdaBoosting classifier are correct?
- At each round for a base classifier, create a bootstrap sample based on the weights - At each round for a base classifier, records wrongly classified will have their weights increased
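The weight update in the card above can be sketched in a few lines of NumPy. This is a minimal sketch of one round of the classic discrete AdaBoost update, assuming binary labels coded as +1/-1; the arrays `y_true` and `y_pred` are made-up illustration data, not taken from the course material.

```python
import numpy as np

# Hypothetical labels and base-classifier predictions for one boosting round
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])   # the record at index 1 is misclassified

weights = np.full(len(y_true), 1 / len(y_true))   # start with uniform weights
missed = y_true != y_pred

# Weighted error of this base classifier and its vote strength (alpha)
error = weights[missed].sum() / weights.sum()
alpha = 0.5 * np.log((1 - error) / error)

# Raise the weights of misclassified records, lower the others, renormalize
new_weights = weights * np.exp(np.where(missed, alpha, -alpha))
new_weights /= new_weights.sum()

# Bootstrap sample for the next round, drawn according to the new weights
rng = np.random.default_rng(0)
next_sample = rng.choice(len(y_true), size=len(y_true), replace=True, p=new_weights)
```

After the update, the misclassified record carries a larger share of the total weight, so it is more likely to be drawn into the next bootstrap sample.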
Which of the following ensemble models are independent of base classifiers (i.e., use any type of classifiers as base classifiers)?
- Bagging - AdaBoosting
Which of the following statements are correct regarding benefits of ensemble model?
- Better performance than a single model - Improved robustness over a single model
Which of the following statements about advantages of DBSCAN over K-Means is correct?
- Capture clusters of complex shapes based on density - Do not require the number of clusters a priori - Insensitive to the ordering of the points in data - Very useful as outliers detector
Direction (relative)
- Define how the Table Calculation moves within the scope - Refer to the order in which table cells are calculated - Across, Down, Across then Down, Down then Across
Scope (relative)
- Define the boundaries within which a given Table Calculation can reference other values - Determine when the table calculation will reset to a beginning value - Table, Pane, Cell
What determines the performance of table calculation
- Depends on the cache of whatever machine Tableau is running - Row- and aggregate-level calculations utilize enterprise-level hardware with a live connection
Seasonal Autoregressive Integrated Moving Average (SARIMA)
- Extended ARIMA model to account for seasonal components - Represented as SARIMA(p, d, q)(P, D, Q)m ("m" for seasonal period) - Suitable for time series with trend and/or seasonal components
Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)
- Extended SARIMA model to model exogenous variables (=covariates) - Parallel input sequences of observations at the same time steps - Suitable for time series with trend and/or seasonal components and exogenous variables
Data Densification
- Fill in sparse data with missing values - Place in the source with joins, unions, or queries - Place in Tableau after aggregate data is returned from the source
Which of the following layers does NOT include neurons?
- Hidden layer - Output layer - Input layer
Which of the following statements about ensemble models is correct?
- It aggregates the predictions of each base classifier - Its base classifiers need to be independent for better performance - It combines multiple diverse base classifiers to predict an outcome
How to promote the diversity of base classifiers?
- May use different training examples through sampling - May use many different modeling algorithms - Introduce randomness into learning procedures - May use different features through sampling
Autoregression (AR)
- Model the next step as a linear function of the observations at prior time steps - Represented as AR(p) - Suitable for time series without trend and seasonal variation
Autoregressive Integrated Moving Average (ARIMA)
- Model the next step in the sequence as a linear function of the differenced observations and residual errors at prior time steps - Combine AR, MA and Integration (I) - A differencing pre-processing step to make the sequence stationary - Represented as ARIMA(p, d, q) - Suitable for time series with trend and without seasonal components
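The differencing step (the "I" in ARIMA) is easy to see on a toy series. This is a minimal sketch with a made-up, perfectly linear series, assuming first-order differencing (d = 1); real data would also carry noise.

```python
import numpy as np

# Hypothetical series with a linear trend: the value rises by 2 at each step
t = np.arange(10)
series = 5 + 2 * t

# First-order differencing (d = 1): y'_t = y_t - y_{t-1}
diff1 = np.diff(series, n=1)
```

The differenced series is constant, so the trend is gone and the mean no longer changes over time, which is what the AR and MA parts of the model assume.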
Autoregressive Moving Average (ARMA)
- Model the next step in the sequence as a linear function of the observations and residual errors at prior time steps - Combine AR and MA models and represented as ARMA(p, q) - Suitable for time series without trend and seasonal variation
Moving Average (MA)
- Model the next step in the sequence as a linear function of the residual errors from a mean process at prior time steps - Represented as MA(q) - Suitable for time series without trend and seasonal variation
Which of the following ensemble models are using different features to promote diversity of base classifiers?
- Random Forest - Gradient Boosting
EPS
- Specify how close points should be to be in the cluster - If the distance between points <= eps, they are neighbors
Min Points
- Specify the min number of points to form a dense region - If minPoints = 5, need at least 5 points to form a dense region
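The two DBSCAN parameters can be tried out with scikit-learn. This is a minimal sketch on made-up 2-D points; note that scikit-learn calls the second parameter `min_samples` and counts the point itself as one of its own neighbors.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical 2-D points: two tight groups plus one far-away point
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [20.0, 20.0],
])

db = DBSCAN(eps=0.5, min_samples=3).fit(X)
labels = db.labels_                   # cluster index per point; -1 marks noise
core_idx = db.core_sample_indices_    # indices of the core samples
```

The number of clusters is never passed in; it falls out of `eps` and `min_samples`, and the isolated point receives the label -1.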
Relative Table Calculations
- The Table Calculation is computed relative to the layout of the table - Rearranging dimensions that changes the table will change the Table Calculation results - Use the same relative scope and direction, even if you rearrange the view
Fixed Table Calculations
- The Table Calculation is computed using one or more dimensions - Rearranging those dimensions will not change the computation of the Table Calculation - The scope and direction remain fixed to one or more dimensions, no matter where they are moved within the view
Which of the following statements about statistical forecasting for time series analysis is correct?
- The default method is Exponential Smoothing Model with Seasonality - In Exponential smoothing model, more recent values are given greater weight - Exponential smoothing model iteratively forecasts future values from weighted averages of past values
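The weighted-average idea can be sketched in plain Python. This is only simple exponential smoothing (no trend or seasonality terms) on made-up numbers; it is not Tableau's actual forecasting implementation, and the function name is hypothetical.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed series s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [values[0]]            # initialize with the first observation
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [10.0, 12.0, 11.0, 15.0]      # hypothetical observations
smoothed = simple_exponential_smoothing(sales, alpha=0.5)
forecast = smoothed[-1]               # one-step-ahead forecast
```

Because each step multiplies the older history by (1 - alpha), the weight of an observation decays geometrically with its age, so the most recent values dominate the forecast.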
Which of the following statements about the Bagging classifier are correct?
- Train a base classifier on each bootstrap sample from the train data - Combine trained multiple base classifiers for predictions on the test data - Improve performance in almost all cases
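The bagging steps above map onto scikit-learn's `BaggingClassifier`. This is a minimal sketch on synthetic data, assuming the default base classifier (a decision tree); the dataset and parameter values are for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, for illustration only
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 10 base classifiers (decision trees by default) is trained
# on its own bootstrap sample of the training data
bag = BaggingClassifier(n_estimators=10, bootstrap=True, random_state=0)
bag.fit(X_train, y_train)

pred = bag.predict(X_test)    # predictions aggregated over the base classifiers
```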
Which of the following clustering index refers to the outlier data points?
-1
While referring to the short expression for a calculated field, find all correct statements about Pearson Correlation Analysis?
__________("import numpy as np
return np.corrcoef(_arg1,_arg2)[0,1]",SUM([Sales]),SUM([Profit]))
1. The same output will be obtained for np.corrcoef(_arg1,_arg2)[0,1] and np.corrcoef(_arg2,_arg1)[0,1]
2. The return output type should be set to SCRIPT_REAL
3. The returned correlation value will be between 0 and 1
1 and 2
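The statements in the card above can be checked outside Tableau with NumPy alone. This is a minimal sketch with made-up values standing in for the two aggregated measures.

```python
import numpy as np

sales = np.array([10.0, 20.0, 30.0, 40.0])    # hypothetical values
profit = np.array([8.0, 6.0, 4.0, 2.0])       # falls as sales rise

r_ab = np.corrcoef(sales, profit)[0, 1]
r_ba = np.corrcoef(profit, sales)[0, 1]
```

The argument order does not matter, and the result here is negative: Pearson correlation ranges from -1 to 1, which is why statement 3 is not among the correct ones.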
________________ algorithm, one of the most common algorithms for training neural networks, calculates the degree of error, if any, and adjusts the weights that are associated with the inputs for that neuron, working backward from output neurons to input neurons?
Backpropagation
In an election, N candidates are competing against each other and people are voting for one of the candidates. Voters don't communicate with each other while casting their votes, and every vote has the same weight. Which of the following ensemble methods works similarly to the election procedure described above?
Bagging
Which of the following charts use Circle Mark to highlight values and Line Mark to measure the gap between two groups of data points?
Barbell Chart
Imagine that you create a chart to show the percentage of Sales for each Category out of the total Sales. After you filter out one Category, the percentage of Sales of two remaining Categories always add up to 100%. Why does this happen?
Because regular filters on the source are applied before Table Calculations
Which of the following charts is the best to show changes in Rank of a value over a time dimension and a place dimension?
Bump Chart
Decision tree algorithm is called a ______________ box algorithm because its learned tree structure can be represented as if-then rules to improve human readability and interpretation?
White
How to delete a column from a data frame
X=data.drop(['ID','Class'],axis=1)
How to train a classifier model
clf.fit(X,y)
How to read .csv into Python in tabpy
data=pd.read_csv(r'filepath.csv')
A field on a shelf in the view or a calculated field using Table Calculation functions will have a __________ symbol:
delta
How to predict the class label with the trained classifier
pred = clf.predict(X_pred)
A clustering algorithm prefers clustering outputs in which data points in the same cluster are very _____________ ?
similar
What is the required parameter to K-Means clustering algorithm?
the number of clusters
How to set dependent variable (y) and which variables you should select
y=data['Class']
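The one-line pandas and scikit-learn answers in the surrounding cards fit together as a single workflow. This is a minimal sketch: the small data frame stands in for the CSV file, and the column names `Feature1` and `Feature2` and the choice of a decision tree are assumptions made for illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data standing in for pd.read_csv(r'filepath.csv')
data = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Feature1": [0.1, 0.2, 0.15, 0.9, 0.8, 0.85],
    "Feature2": [1.0, 1.1, 0.9, 3.0, 3.2, 2.9],
    "Class": ["A", "A", "A", "B", "B", "B"],
})

X = data.drop(["ID", "Class"], axis=1)   # predictors: everything except ID and target
y = data["Class"]                        # dependent variable

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)                            # train the classifier

# New records must have the same columns as X
X_pred = pd.DataFrame({"Feature1": [0.12, 0.88], "Feature2": [1.05, 3.1]})
pred = clf.predict(X_pred)               # predicted class labels
```

The identifier column is dropped because it carries no predictive information, and the target column is dropped so the model cannot see the answer it is supposed to learn.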
_______________ are data points that are closer to each other than eps parameter and hence are put into the same cluster
Core samples
Which of the following parameters reflects seasonality factor in Simple Exponential Smoothing (SES) model?
Gamma
_________________ ensemble model is based on Boosting, not Bagging, to add a tree classifier at a time, so that the next classifier is trained to improve the already trained ensemble.
Gradient Boosting
Which of the following charts is the best to show scientific research on medical or environmental studies due to its Line chart characterized by a sharp increase after a relatively flat period?
Hockey Stick Chart
How to compute the accuracy value
IF ATTR([Class]) = [DT Prediction] THEN "Correct" ELSE "Incorrect" END (I think??)
Which of the following clustering algorithms belongs to centroid clustering algorithm?
K-Means
Which of the following clustering algorithms is readily available in Tableau through Analytics Pane?
K-Means
An exploratory data analysis to group data points into clusters is _____________?
K-Means Clustering
Which of the following charts is the best to show the performance over KPI using/adding custom shapes?
KPI Chart
A _____________ is the bottom node, from which no further branches grow?
Leaf node
Generally, an ensemble method works better, if the individual base models have ____________?
Less correlation among predictions
The maximum number of clusters in Clustering analysis with N records and M attributes for each record is ________?
N
The _______________ algorithm is a computational model inspired by the behavior of neurons and the electrical signals, processing, and output of the brain?
Neural Network
Which of the following statements is NOT applicable to Decision tree algorithm?
None of the above
Which of the following charts is the best to identify the key segments of your customer base (e.g., top 20% of customers) that are most important for your business success?
Pareto Chart
What kind of Table Calculation should be applied to measure values in Pareto Chart?
Percent of Total
Differences of table calculation from other calculations
Performed after the initial query
_________________ is a new API (Application Programming Interface) that enables evaluation of Python code within a Tableau workbook.
TabPy
________ calculations make it possible to compare and perform calculations on aggregate values across rows of the resulting table
Table
A neural network model is called a black box algorithm because its final structure is hidden, and hence it is very difficult for humans to understand the patterns (True/False)?
True
If you want to compute the 3 month moving average of SUM(Quantity) in two previous months and the current month, which of the following expression should you use?
WINDOW_AVG(SUM([Quantity]), -2, 0)
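The same window can be reproduced in pandas for comparison. This is a rough sketch on made-up monthly totals, assuming that WINDOW_AVG averages only the rows that exist at the start of the partition; `min_periods=1` is what mimics that assumed behavior.

```python
import pandas as pd

# Hypothetical monthly SUM(Quantity) values
quantity = pd.Series([10, 20, 30, 40], index=["Jan", "Feb", "Mar", "Apr"])

# Window covers the two previous rows and the current row (offsets -2..0);
# min_periods=1 averages whatever rows exist at the start of the series
moving_avg = quantity.rolling(window=3, min_periods=1).mean()
```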
Visualize and analyze the cumulative impact of sequential positive or negative drivers
Waterfall Chart
While referring to the short expression for a calculated field, find all correct statements about Linear Regression Analysis?
___________('import numpy as np
from sklearn import linear_model
clf = linear_model.LinearRegression()
x=np.transpose(np.array([_arg1]))
y=np.array(_arg2)
clf.fit(x,y)
return clf.predict(x).tolist()',SUM([Profit]),SUM([Sales]))
1. The return output type should be set to SCRIPT_INT
2. The above calculated field will execute without errors
3. You need to change the last two lines of the code into SUM([Sales]), SUM([Profit])) to reflect causality between two measures
2 and 3
What are the required parameters to DBSCAN clustering algorithm? 1. the number of clusters 2. the number of features 3. how close points should be to be in the cluster 4. the number of points to form a cluster
3 and 4 only
Which of the following statements is NOT part of the K-Means clustering process? 1. Initialize the centroids of clusters randomly along variables 2. Assign each data point to the nearest cluster 3. Recalculate the centroids of clusters 4. For a chosen data point, determine if it is a noise sample that is neither a core point nor a border point
4
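Steps 1 to 3 of the card above are the whole K-Means loop. This is a minimal NumPy sketch on made-up 2-D points, assuming the initial centroids are picked from the data points and a fixed number of iterations replaces a convergence check.

```python
import numpy as np

# Hypothetical 2-D data: two well-separated groups
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
k = 2                                   # number of clusters, required up front

rng = np.random.default_rng(0)
centroids = X[rng.choice(len(X), size=k, replace=False)]   # step 1: random init

for _ in range(10):
    # Step 2: assign each point to the nearest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Step 3: recalculate each centroid as the mean of its assigned points
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
```

Nothing in the loop ever labels a point as noise; every point is always assigned to some cluster, which is why statement 4 belongs to DBSCAN instead.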
Which of the following statements about the eps parameter are NOT correct? 1. eps parameter specifies how close points should be to be in the cluster 2. If the distance between points <= eps, they are considered neighbors 3. Too small a value of eps will result in no points being core samples 4. Too large a value of eps will lead to all points being labeled as noise 5. minPoints parameter specifies the maximum number of points to form a dense region
4 and 5 only
________________ acts as a mathematical gate between the input and its output and transforms/normalizes the output of each neuron into [0,1] or [-1,1]
Activation function
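Two common activation functions show the two output ranges named in the card above. This is a minimal sketch; the choice of sigmoid and tanh as examples is an assumption, since the card does not name specific functions.

```python
import numpy as np

def sigmoid(z):
    """Squash any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])     # hypothetical weighted sums from a neuron
sig_out = sigmoid(z)               # values in (0, 1)
tanh_out = np.tanh(z)              # values in (-1, 1)
```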
Which of the following statements about Time Series Analysis are correct? 1. Autocorrelation refers to the association of two observations at prior time points 2. Seasonal variation refers to the recurring patterns of observations over short time periods 3. Trend refers to the increasing or decreasing mean value of the series 4. Stationarity refers to constant mean (& variance) value of the series over a long time period
All of the above
Which of the following statements about table calculations are NOT correct? 1. Regular filters on the source are applied after Table Calculations 2. Fields in a Table Calculation are only at the row level 3. Its performance depends on enterprise-level hardware with a live connection 4. It is not appropriate for tables
All of the above
Which of the following clustering algorithms belongs to density clustering algorithm?
DBSCAN
The ____________________ algorithm builds a prediction model by creating a series of splits at nodes after evaluating input features in terms of how cleanly it divides the data across the class states of target variable?
Decision Tree
Which of the following algorithms is not an example of an ensemble method?
Decision Tree
Which of the following Table Calculations is NOT used in Sankey Diagram?
Difference
Which of the following charts is a bar chart with two marks: One for some dimension members pointing up or right and another for other dimension members pointing in the opposite direction (down or left, respectively)?
Diverging Bar Chart
Which of the following metrics is LEAST appropriate to evaluate the K-Means clustering output?
Prediction accuracy
Late Filtering
Problem: The percent of total always adds up to 100% because it is the percentage of the filtered total. Fix: use LOOKUP(ATTR([__________]), 0)
Which of the following charts is a Bar chart plotted in polar coordinates instead of a Cartesian plane?
Radial Bar Chart
1. Choose T, number of trees to grow
2. Choose m < M (M is the number of total features), number of features to calculate the best split at each node (typically 20%)
For each tree
3. Choose a training set by choosing N times (N is the number of training examples) with replacement from the training set
At each node, randomly choose m features and calculate the best split
Fully grown and not pruned
The above algorithm is the best description of the _______________ algorithm?
Random Forest
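The steps in the card above line up with the parameters of scikit-learn's `RandomForestClassifier`. This is a minimal sketch on synthetic data; the values of `T` and `m` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, for illustration only: N = 200 records, M = 10 features
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

T = 25      # number of trees to grow
m = 2       # features considered at each split (20% of M = 10)

forest = RandomForestClassifier(
    n_estimators=T,      # step 1
    max_features=m,      # step 2
    bootstrap=True,      # step 3: sample N records with replacement per tree
    max_depth=None,      # trees are fully grown, not pruned
    random_state=0,
)
forest.fit(X, y)
```

Sampling both the records (per tree) and the features (per split) is what keeps the individual trees diverse.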
__________________ refers to the shaded areas behind the marks in the view and is usable for any continuous axis in the view?
Reference Band
Result in several gradient shaded bands at various intervals across the numeric axis. Distribution can be defined by percentages, percentiles, quantiles, or standard deviation
Reference Distribution
A _____________ is the topmost node in a tree structure in a decision tree algorithm?
Root node
What kind of Table Calculation should be applied to measure values in Hockey Stick Chart?
Running Total
Which of the following choices is the best for the underlined portion of the following calculated field?
_______________("
MyList = []
for x in _arg1:
    MyList.append(x>0)
return MyList
", SUM([Profit]) )
SCRIPT_BOOL
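Running the script body as an ordinary function shows why a Boolean output type fits. This is a minimal sketch; the wrapper name `profit_is_positive` and the input values are hypothetical.

```python
def profit_is_positive(_arg1):
    """Same logic as the script body: one Boolean per input value."""
    MyList = []
    for x in _arg1:
        MyList.append(x > 0)
    return MyList

flags = profit_is_positive([120.5, -30.0, 0.0])   # hypothetical SUM([Profit]) values
```

Each element of the returned list is `True` or `False`, one per row passed in.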
Which of the following output type is the most appropriate for a calculated field that returns the outcome of clustering algorithms?
SCRIPT_REAL
Which of the following charts is the best to show flows between entities and their quantities in proportion to one another and magnitudes of flows with the width of flow arrows?
Sankey Diagram
_________________ is a repeating, predictable variation in value over a fixed period, such as monthly, quarterly, daily, or hourly.
Seasonality
Which of the following statements is NOT one of the training steps of neural network algorithms?
Select the best splitting attribute at the root node
Alpha (Base factor)
The higher the value, the more weight is given to the most recent observations
Gamma (Seasonality factor)
The higher the value, the more weight is given to the recent seasonal components
Beta (Trend factor)
The higher the value, the more weight is given to the recent trends
Compare train and test accuracy: which is higher, and which one should you report?
Train accuracy is higher; test accuracy is the one you should report
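The gap between the two accuracies is easy to reproduce. This is a minimal sketch on synthetic, deliberately noisy data with an unpruned decision tree; the dataset and split sizes are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% of labels flipped to add noise
X, y = make_classification(n_samples=300, n_features=8, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = accuracy_score(y_train, clf.predict(X_train))   # optimistic: data already seen
test_acc = accuracy_score(y_test, clf.predict(X_test))      # the number to report
```

A fully grown tree memorizes its training records, so the training accuracy says little about how the model behaves on unseen data.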