USU Data 5400 Final Exam Review


Random Forest

Choose T, the number of trees to grow. Choose m < M (M is the total number of features), the number of features used to calculate the best split at each node (typically 20%). For each tree: choose a training set by sampling N times (N is the number of training examples) with replacement from the training set; at each node, randomly choose m features and calculate the best split; grow the tree fully and do not prune it. The above algorithm is the best description of the _______________ algorithm?
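
The two sampling steps in the card above can be sketched in plain Python. This is only an illustration under assumptions: the helper names `bootstrap_sample` and `random_feature_subset` and the toy sizes are hypothetical, and no trees are actually grown.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the per-tree training set)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, m, rng):
    """Pick m distinct feature indices out of M to consider at one node."""
    return sorted(rng.sample(range(n_features), m))


rng = random.Random(0)
rows = list(range(10))        # N = 10 toy training examples (row ids only)
M = 10                        # total number of features
m = max(1, round(0.2 * M))    # about 20% of the features, as in the card

sample = bootstrap_sample(rows, rng)
features = random_feature_subset(M, m, rng)
```

Because rows are drawn with replacement, some rows usually appear more than once in `sample` and others not at all, which is part of what makes the trees differ from one another.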

2 and 3 only

While referring to the short expression for a calculated field, find all correct statements about Linear Regression Analysis.
___________('
import numpy as np
from sklearn import linear_model
clf = linear_model.LinearRegression()
x = np.transpose(np.array([_arg1]))
y = np.array(_arg2)
clf.fit(x, y)
return clf.predict(x).tolist()
',
SUM([Profit]),
SUM([Sales]))
1. The return output type should be set to SCRIPT_INT
2. The above calculated field will execute without errors
3. You need to change the last two lines of the code into SUM([Sales]), SUM([Profit])) to reflect causality between two measures
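
To see why the order of the two arguments matters in the card above, here is a minimal sketch of a one-variable least-squares fit in plain Python (not the scikit-learn call from the card). The function name `simple_linear_fit` and the toy numbers are assumptions for illustration only.

```python
def simple_linear_fit(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


first = [1.0, 2.0, 3.0, 4.0]     # plays the role of _arg1 (the predictor)
second = [2.0, 4.0, 6.0, 8.0]    # plays the role of _arg2 (the target)

slope_a, intercept_a = simple_linear_fit(first, second)
slope_b, intercept_b = simple_linear_fit(second, first)   # arguments swapped
```

Swapping the arguments swaps predictor and target, so the fitted line changes; which measure is passed first decides which one is being predicted.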

1 and 2 only

While referring to the short expression for a calculated field, find all correct statements about Pearson Correlation Analysis.
__________("
import numpy as np
return np.corrcoef(_arg1,_arg2)[0,1]
", SUM([Sales]), SUM([Profit]))
1. The same output will be obtained for np.corrcoef(_arg1,_arg2)[0,1] and np.corrcoef(_arg2,_arg1)[0,1]
2. The return output type should be set to SCRIPT_REAL
3. The returned correlation value will be between 0 and 1
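
A small plain-Python sketch of the Pearson coefficient may help with statements 1 and 3 in the card above. The function name `pearson` and the toy data are assumptions; the card itself uses `np.corrcoef`.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


sales = [10.0, 20.0, 30.0, 40.0]
profit = [8.0, 5.0, 4.0, 1.0]    # falls as sales rise

r_xy = pearson(sales, profit)
r_yx = pearson(profit, sales)
```

The two results agree because the formula is symmetric in its arguments, and the value here is negative, which shows that the coefficient ranges over [-1, 1] rather than [0, 1].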

Core samples

_______________ are data points that are closer to each other than the eps parameter and hence are put into the same cluster

Activation function

________________ acts as a mathematical gate between the input and its output and transforms/normalizes the output of each neuron into [0,1] or [-1,1]
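
As an illustration of the two output ranges named in the card above, the sketch below uses the logistic sigmoid and the hyperbolic tangent. These two functions are chosen here as common examples; the card does not name specific functions.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: squashes any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


inputs = [-5.0, 0.0, 5.0]
sigmoid_outputs = [sigmoid(z) for z in inputs]      # values between 0 and 1
tanh_outputs = [math.tanh(z) for z in inputs]       # values between -1 and 1
```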

Backpropagation

________________ algorithm, one of the most common algorithms for training neural networks, calculates the degree of error, if any, and adjusts the weights that are associated with the inputs for that neuron, working backward from output neurons to input neurons?

Table

_________________ calculations make it possible to compare and perform calculations on aggregate values across rows of the resulting table

TabPy

_________________ is a new API (Application Programming Interface) that enables evaluation of Python code within a Tableau workbook.

Seasonality

_________________ is a repeating, predictable variation in value over months, quarters, days, or hours.

Gradient Boosting

__________________ ensemble model is based on Boosting, not Bagging: it adds one tree classifier at a time, so that the next classifier is trained to improve the already trained ensemble.

Reference Band

__________________ refers to the shaded areas behind the marks in the view and is usable for any continuous axis in the view?

similar

A clustering algorithm prefers clustering outputs in which the data points in the same cluster are very _____________ ?

Less correlation among predictions

Generally, an ensemble method works better if the individual base models have ____________?

All of the above

How to promote the diversity of base classifiers?
- May use different features through sampling
- May use different training examples through sampling
- All of the above
- Introduce randomness into learning procedures
- May use many different modeling algorithms

WINDOW_AVG(SUM([Quantity]), -2, 0)

If you want to compute the 3-month moving average of SUM(Quantity) over the two previous months and the current month, which of the following expressions should you use?
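
The window in the card above (offsets -2 to 0 relative to the current row) can be sketched in plain Python. The helper `window_avg` is a hypothetical stand-in, not Tableau's implementation; it assumes that a window reaching past the start of the table is averaged over the rows that exist.

```python
def window_avg(values, start, end):
    """Average over offsets start..end relative to each row, clipped to the table."""
    result = []
    for i in range(len(values)):
        lo = max(0, i + start)
        hi = min(len(values) - 1, i + end)
        window = values[lo:hi + 1]
        result.append(sum(window) / len(window))
    return result


monthly_quantity = [10, 20, 30, 40]             # toy SUM([Quantity]) per month
moving_avg = window_avg(monthly_quantity, -2, 0)
# moving_avg == [10.0, 15.0, 20.0, 30.0]
```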

Because regular filters on the source are applied before Table Calculations

Imagine that you create a chart to show the percentage of Sales for each Category out of the total Sales. After you filter out one Category, the percentages of Sales of the two remaining Categories always add up to 100%. Why does this happen?

Bagging

In an election, N candidates are competing against each other and people vote for one of the candidates. Voters don't communicate with each other while casting their votes, and every vote has the same weight. Which of the following ensemble methods works similarly to the election procedure described above?
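
The election analogy in the card above corresponds to an unweighted majority vote over independent predictions. A minimal sketch, with the function name `majority_vote` and the ballots assumed for illustration:

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted most often; every vote has the same weight."""
    return Counter(predictions).most_common(1)[0][0]


votes = ["A", "B", "A", "C", "A"]    # one prediction per independent base model
winner = majority_vote(votes)
# winner == "A"
```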

True

A neural network model is called a black box algorithm because its final structure is hidden and hence it is very difficult for humans to understand the patterns (True/False)?

Neural Network

The _______________ algorithm is a computational model inspired by the behavior of neurons and the electrical signals, processing, and output from the brain?

Decision Tree

The ____________________ algorithm builds a prediction model by creating a series of splits at nodes after evaluating input features in terms of how cleanly each one divides the data across the class states of the target variable?

3 and 4 only

What are the required parameters of the DBSCAN clustering algorithm? 1. the number of clusters 2. the number of features 3. how close points should be to be in the cluster 4. the number of points to form a cluster
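
The role of the two parameters in the card above can be seen in a small one-dimensional DBSCAN sketch. This is a simplified illustration under assumptions (the function name `dbscan_1d`, the toy points, and counting a point as its own neighbor are choices made here), not a library implementation.

```python
def dbscan_1d(points, eps, min_samples):
    """Label 1-D points with cluster ids 0, 1, ... and -1 for noise."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if abs(points[i] - points[j]) <= eps]
        for i in range(n)
    ]
    # A core sample has at least min_samples points (itself included) within eps.
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue    # border point: joins the cluster but does not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


labels = dbscan_1d([1.0, 1.2, 1.4, 5.0, 5.1, 5.3, 9.0], eps=0.5, min_samples=3)
# labels == [0, 0, 0, 1, 1, 1, -1]
```

Note that the number of clusters is an output here, not an input, and the isolated point keeps the label -1.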

the number of clusters

What is the required parameter of the K-Means clustering algorithm?

Running Total

What kind of Table Calculation should be applied to measure values in a Hockey Stick Chart?

Percent of Total

What kind of Table Calculation should be applied to measure values in a Pareto Chart?
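
The two table calculations named in the cards above, Running Total and Percent of Total, can be sketched in plain Python. The sales figures are hypothetical, and combining the two into a cumulative share is an illustration rather than a description of how Tableau computes it.

```python
from itertools import accumulate

sales_by_customer = [500, 300, 100, 60, 40]    # hypothetical, sorted descending

running_total = list(accumulate(sales_by_customer))
grand_total = sum(sales_by_customer)
percent_of_total = [s / grand_total for s in sales_by_customer]
cumulative_share = [rt / grand_total for rt in running_total]
# running_total == [500, 800, 900, 960, 1000]
```

In this toy data the first two customers already account for 80% of sales, which is the kind of key segment a Pareto chart is meant to reveal.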

Difference

Which of the following Table Calculations is NOT used in a Sankey Diagram?

Decision Tree

Which of the following algorithms is not an example of an ensemble method?

Radial Bar Chart

Which of the following charts is a Bar chart plotted in polar coordinates instead of a Cartesian plane?

Diverging Bar Chart

Which of the following charts is a bar chart with two marks: One for some dimension members pointing up or right and another for other dimension members pointing in the opposite direction (down or left, respectively)?

Pareto Chart

Which of the following charts is the best to identify the key segments of your customer base (e.g., top 20% of customers) that are most important for your business success?

Bump Chart

Which of the following charts is the best to show changes in Rank of a value over a time dimension and a place dimension?

Sankey Diagram

Which of the following charts is the best to show flows between entities and their quantities in proportion to one another and magnitudes of flows with the width of flow arrows?

Hockey Stick Chart

Which of the following charts is the best to show scientific research on medical or environmental studies due to its Line chart characterized by a sharp increase after a relatively flat period?

KPI Chart

Which of the following charts is the best to show the performance over KPI using/adding custom shapes?

Barbell Chart

Which of the following charts uses a Circle Mark to highlight values and a Line Mark to measure the gap between two groups of data points?

SCRIPT_BOOL

Which of the following choices is the best for the underlined portion of the following calculated field?
_______________("
MyList = []
for x in _arg1:
    MyList.append(x > 0)
return MyList
", SUM([Profit]))
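
To check what the script in the card above returns, its body can be wrapped in an ordinary Python function and run outside Tableau. The wrapper name `profit_is_positive` and the sample values are assumptions made for this sketch.

```python
def profit_is_positive(_arg1):
    """Same logic as the script body: one boolean per input value."""
    MyList = []
    for x in _arg1:
        MyList.append(x > 0)
    return MyList


flags = profit_is_positive([120.5, -30.0, 0.0])
# flags == [True, False, False]
```

Every element of the returned list is a boolean, which is why a boolean return type fits this calculated field.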

K-Means

Which of the following clustering algorithms is a centroid-based clustering algorithm?

DBSCAN

Which of the following clustering algorithms is a density-based clustering algorithm?

K-Means

Which of the following clustering algorithms is readily available in Tableau through the Analytics Pane?

-1

Which of the following cluster index values refers to the outlier data points?

2 and 4 only

Which of the following ensemble models are built through a sequentially coordinated sampling method of records? 1. Bagging 2. AdaBoosting 3. Random Forest 4. Gradient Boosting

1 and 2 only

Which of the following ensemble models are independent of base classifiers (i.e., use any type of classifiers as base classifiers)? 1. Bagging 2. AdaBoosting 3. Random Forest 4. Gradient Boosting

3 and 4 only

Which of the following ensemble models use different features to promote diversity of base classifiers? 1. Bagging 2. AdaBoosting 3. Random Forest 4. Gradient Boosting

None of the above

Which of the following layers does NOT include neurons?
- Input and Hidden layer
- Hidden layer
- None of the above
- Output layer
- Input layer

Prediction accuracy

Which of the following metrics is LEAST appropriate to evaluate the K-Means clustering output?

SCRIPT_REAL

Which of the following output types is the most appropriate for a calculated field that returns the outcome of clustering algorithms?

Gamma

Which of the following parameters reflects the seasonality factor in the Simple Exponential Smoothing (SES) model?

1 and 2 only

Which of the following statements about the AdaBoosting classifier are correct? 1. At each round for a base classifier, create a bootstrap sample based on the weights 2. At each round for a base classifier, records wrongly classified will have their weights increased 3. Robust in noisy settings 4. Use majority voting or averaging of predictions from base classifiers to predict the class label
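
Statement 2 in the card above (misclassified records gain weight) can be illustrated with one round of reweighting. The sketch assumes the standard discrete AdaBoost update, which the card does not spell out; the function name `adaboost_reweight` and the toy weights are hypothetical.

```python
import math


def adaboost_reweight(weights, is_correct):
    """One reweighting round; assumes the weighted error is strictly between 0 and 1."""
    error = sum(w for w, ok in zip(weights, is_correct) if not ok)
    alpha = 0.5 * math.log((1 - error) / error)    # influence of this base classifier
    updated = [
        w * math.exp(-alpha if ok else alpha)      # shrink correct, grow wrong
        for w, ok in zip(weights, is_correct)
    ]
    total = sum(updated)
    return [w / total for w in updated], alpha


weights = [0.25, 0.25, 0.25, 0.25]
new_weights, alpha = adaboost_reweight(weights, [True, True, True, False])
```

After the round, the single misclassified record carries half of the total weight, so the next base classifier is pushed to get it right.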

1, 2 and 4 only

Which of the following statements about the Bagging classifier are correct? 1. Train a base classifier on each bootstrap sample from the train data 2. Combine trained multiple base classifiers for predictions on the test data 3. Use weighted voting of predictions from base classifiers to predict the class label 4. Improve performance in almost all cases

Select the best splitting attribute at the root node

Which of the following statements is NOT one of the training steps of neural network algorithms?

Useful to control how Table Calculations are computed with relative reference to fields in the view

Which of the following statements about Addressing and Partitioning is NOT correct?

None of the above

Which of the following statements about Scope and Direction is NOT correct?
- None of the above
- Scope determines when the table calculation will reset to a beginning value
- Scope defines the boundaries within which a given Table Calculation can reference other values
- Direction refers to the order in which table cells are calculated
- Direction defines how the Table Calculation moves within the scope

1, 2, 3 and 4 only

Which of the following statements about Time Series Analysis are correct? 1. Autocorrelation refers to the association of two observations at prior time points 2. Seasonal variation refers to the recurring patterns of observations over short time periods 3. Trend refers to the increasing or decreasing mean value of the series 4. Stationarity refers to constant mean (& variance) value of the series over a long time period

More scalable for large data than K-Means

Which of the following statements about advantages of DBSCAN over K-Means is NOT correct?

each base classifier does not need to perform better than random guessing

Which of the following statements about ensemble models is NOT correct?
- It aggregates the prediction of each base classifier
- Its base classifiers need to be independent for better performance
- It combines multiple diverse base classifiers to predict an outcome
- Each base classifier does not need to perform better than random guessing

4 and 5 only

Which of the following statements about the eps parameter are NOT correct? 1. The eps parameter specifies how close points should be to be in the cluster 2. If the distance between points <= eps, they are considered neighbors 3. Too small a value of eps will result in no points being core samples 4. Too large a value of eps will lead to all points being labeled as noise 5. The minPoints parameter specifies the maximum number of points to form a dense region
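
The effect of a very small or very large eps, as discussed in the card above, can be checked by counting core samples on toy data. The helper `count_core_samples` and the points are assumptions for illustration; a point is counted as its own neighbor here.

```python
def count_core_samples(points, eps, min_samples):
    """Count points that have at least min_samples points (itself included) within eps."""
    count = 0
    for p in points:
        neighborhood = sum(1 for q in points if abs(p - q) <= eps)
        if neighborhood >= min_samples:
            count += 1
    return count


points = [1.0, 1.2, 1.4, 5.0, 5.1, 5.3, 9.0]
tiny = count_core_samples(points, eps=0.01, min_samples=3)     # nothing is dense enough
huge = count_core_samples(points, eps=100.0, min_samples=3)    # every point is a core sample
```

With a tiny eps no point qualifies as a core sample, so everything ends up as noise; with a huge eps every point qualifies and the data collapses into a single cluster rather than being labeled as noise.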

Exponential smoothing model utilizes only two parameters: Alpha and Beta

Which of the following statements about statistical forecasting for time series analysis is NOT correct?
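
For context on the smoothing parameters, here is a minimal sketch of simple exponential smoothing, which uses a single factor alpha; variants that also model trend and seasonality add further factors, conventionally called beta and gamma. The function name and the toy series are assumptions.

```python
def simple_exponential_smoothing(series, alpha):
    """Smoothed level: s[t] = alpha * x[t] + (1 - alpha) * s[t - 1], with s[0] = x[0]."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


levels = simple_exponential_smoothing([10.0, 20.0, 30.0], alpha=0.5)
# levels == [10.0, 15.0, 22.5]
```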

All of the above

Which of the following statements about table calculations are NOT correct? 1. Regular filters on the source are applied after Table Calculations 2. Fields in a Table Calculation are only at a row level 3. Its performance depends on enterprise-level hardware with a live connection 4. It is not appropriate for tables

1 and 2 only

Which of the following statements are correct regarding the benefits of an ensemble model? 1. Better performance than a single model 2. Greater robustness than a single model 3. Better interpretability than a single model 4. Faster implementation than a single model

None of the above

Which of the following statements is NOT applicable to the Decision Tree algorithm?
- None of the above
- Fast training
- Easy to understand the patterns
- Robust and reliable
- Accurate prediction

4

Which of the following statements is NOT a step of the K-Means clustering process? 1. Initialize the centroids of clusters randomly along variables 2. Assign each data point to the nearest cluster 3. Recalculate the centroids of clusters 4. For a chosen data point, determine if it is a noise sample that is neither a core point nor a border point
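
Steps 1 to 3 in the card above can be sketched as a small one-dimensional K-Means loop. This is an illustration under assumptions: the function name `kmeans_1d`, the fixed iteration count, the seed, and the toy points are all choices made here, and an empty cluster simply keeps its previous centroid.

```python
import random


def kmeans_1d(points, k, iterations=10, seed=0):
    """Return the k centroids, sorted, after a fixed number of update rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # step 1: random initial centroids
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                           # step 2: assign to nearest centroid
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        centroids = [                              # step 3: recalculate the centroids
            sum(members) / len(members) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return sorted(centroids)


centroids = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.2, 8.8], k=2)
```

Unlike DBSCAN, nothing in this loop marks a point as noise: every point is always assigned to some cluster.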

Centroid sample

Which of the following types of points is NOT used to categorize a chosen data point in DBSCAN clustering?

Leaf node

A _____________ is the bottom node, from which no further branches grow?

Root node

A _____________ is the topmost node in a tree structure in a decision tree algorithm?

delta

A field on a shelf in the view or a calculated field using Table Calculation functions will have a __________ symbol:

K-Means Clustering

An exploratory data analysis to group data points into clusters is _____________?

White

Decision tree algorithm is called a ______________ box algorithm because its learned tree structure can be represented as if-then rules to improve human readability and interpretation?

All of the above

Seasonal Autoregressive Integrated Moving Average (SARIMA) can accommodate which of the following models? 1. AR Model 2. MA Model 3. ARMA Model 4. ARIMA Model 5. Seasonal Model

N

The maximum number of clusters in Clustering analysis with N records and M attributes for each record is ________?

