CIS_5450_FINAL_EXAM

19 - The number of weights for each layer is equal to A) the bias + the number of features or units in the prior layer, plus the number of activation units in the current layer B) the number of features or units in the prior layer, times the number of activation units in the current layer C) the bias + the number of features or units in the prior layer, times the number of activation units in the current layer D) the maximum number of units in a layer

C) the bias + the number of features or units in the prior layer, times the number of activation units in the current layer
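
As a rough illustration of the answer above, the count for one fully connected layer can be sketched as follows. The function name `layer_weight_count` is a hypothetical helper, and the sketch assumes exactly one bias input per layer.

```python
def layer_weight_count(prior_units: int, current_units: int) -> int:
    """Weights in one fully connected layer, assuming a single bias input.

    Each unit in the current layer has one weight per unit (or feature)
    in the prior layer, plus one weight for the bias.
    """
    return (prior_units + 1) * current_units


# 3 input features feeding 4 activation units: (3 + 1) * 4 weights.
print(layer_weight_count(3, 4))
```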

15 - k-Means in Spark SQL A) Iterates exactly twice B) Requires the data to be in a relational database system like Oracle or MySQL C) iterates and assigns clusters to centroids D) Iterates over two stages, assigning instances to clusters and recomputing cluster centroids

Iterates over two stages, assigning instances to clusters and recomputing cluster centroids
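
The two stages can be sketched in plain Python for one-dimensional points. This is a minimal illustration, not the Spark SQL implementation; `kmeans_1d`, its parameters, and the sample data are assumptions made for the example.

```python
def kmeans_1d(points, centroids, max_iters=100):
    """Minimal 1-D k-means: alternate assignment and centroid updates."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iters):
        # Stage 1: assign each instance to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Stage 2: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        # Stop condition: the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


final_centroids, final_clusters = kmeans_1d(
    [1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centroids=[0.0, 5.0]
)
print(final_centroids, final_clusters)
```

The loop also stops after `max_iters` passes, so either convergence or a fixed iteration count ends the run.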

17 - Ridge regression adds a regularization term equal to the A) L1 norm of the weights B) Square root of the weights C) L2 (squared) norm of the weights

L2 (squared) norm of the weights

17 - The standard error function minimized by linear regression is the A) Mean Squared Error B) Mean Total Error C) Variance D) Log Likelihood Error

Mean Squared Error

15 - Single linkage: A) Merges clusters based on triadic closure B) Merges clusters whose *nearest* points are closest, out of all pairs of clusters C) Merges clusters whose *most distant* points are closest, out of all pairs of clusters

Merges clusters whose *nearest* points are closest, out of all pairs of clusters
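
A small sketch of the distance that single linkage compares, next to complete linkage for contrast. The function names and the one-dimensional sample clusters are assumptions for illustration.

```python
def single_linkage(a, b):
    """Distance between the *nearest* pair of points, one from each cluster."""
    return min(abs(x - y) for x in a for y in b)


def complete_linkage(a, b):
    """Distance between the *most distant* pair of points, one from each cluster."""
    return max(abs(x - y) for x in a for y in b)


cluster_a = [1, 2]
cluster_b = [5, 9]
print(single_linkage(cluster_a, cluster_b))    # nearest pair is (2, 5)
print(complete_linkage(cluster_a, cluster_b))  # most distant pair is (1, 9)
```

Agglomerative clustering would merge the pair of clusters with the smallest such distance at each step.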

20 - Does a larger deep network always perform better than a smaller one on image classification? A) No B) I don't know C) Yes

No

15 - k-Means in MLLib A) Requires that we put the features in a vector within a dataframe B) Can work directly on any dataframe C) Requires a RowMatrix as input

Requires that we put the features in a vector within a dataframe

11A - In the web, the links that matter for ranking are considered to be those: A) between sites B) within sites C) between or within sites

between sites

15 - Agglomerative clustering is: A) bottom-up B) not a real clustering method C) top-down

bottom-up

21 - A continuous query produces a A) data stream B) dataframe C) single result D) table

data stream

18 - Which method listed below is scale invariant? A) decision tree B) PCA C) ridge regression D) k-means clustering

decision tree

19 - Which of the following is NOT a commonly used activation function? A) hyperbolic tangent (tanh) B) dot product C) sigmoid (logistic) D) ReLU

dot product

21 - Which of the following do stream processing systems typically guarantee? A) messages do not experience congestion B) every message gets delivered at least once C) every message gets delivered more than once D) every message gets delivered at most once

every message gets delivered at least once

14 - To pick the number of dimensions, we look at the A) explained variance ratio B) KL divergence C) dimensionality reduction constant D) average value

explained variance ratio

12 - "Fancy indexing" refers to A) generating a B+ Tree index B) transposing rows and columns in an array C) extracting rows from an array given a list of indices D) creating a dataframe index

extracting rows from an array given a list of indices
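
A short NumPy sketch of fancy indexing. The array contents and the variable names are made up for the example.

```python
import numpy as np

arr = np.array([[10, 11], [20, 21], [30, 31], [40, 41]])

# A list of row indices selects those rows, in the order given.
# Indices may repeat, and the result is a copy rather than a view.
rows = [3, 0, 0]
picked = arr[rows]
print(picked)
```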

17 - Weight assignments in logistic regression can be computed with a closed-form solution A) False B) True

False

19 - In a multilayer network, we compute the output using A) learning parameter eta B) feed-forward from layer to layer C) logistic regression D) stochastic gradient descent

feed-forward from layer to layer

15 - The number of clusters k is set by: A) finding an elbow in the distortion or inertia B) maximizing the distance between points C) finding a knee in the distortion or inertia

finding an elbow in the distortion or inertia

16 - The two stages of training and using a classifier in SciKit-Learn are called, respectively: A) fit_transform() and accuracy_score() B) fit() and predict() C) fit() and transform() D) train() and classify()

fit() and predict()

13 - The "grammar of graphics" is implemented in: A) lightning B) ggplot and ggplot2 C) matplotlib and pyplot D) D3.js

ggplot and ggplot2

13 - A "facet grid" in seaborn is similar to what operation in Pandas? A) filter B) pivot C) join D) group by

group by

16 - Decision trees are built using A) optimal choices of split points B) random choices of split points C) heuristic decisions about split points

heuristic decisions about split points

18 - As we overfit a classifier, its variance is A) high B) no different from underfitting a classifier C) always zero D) low

high

12 - The t-test is useful for testing A) whether a probability distribution is Gaussian B) how far the mean of a probability distribution differs from a reference, e.g., 0 C) any given hypothesis D) the standard deviation of a probability distribution

how far the mean of a probability distribution differs from a reference, e.g., 0

21 - Apache Spark Streaming processes tuples A) as they arrive B) in micro-batches C) after the stream ends D) all at once

in micro-batches

16 - Which of the following is NOT a legitimate technique to reduce overfitting in a decision tree? A) increasing the number of positive examples to exceed the negative examples B) pruning the tree C) pre-processing using PCA D) not splitting if the number of samples is below a threshold

increasing the number of positive examples to exceed the negative examples

11 - What properties of PageRank are important for web search applications: A) independence of the query and guaranteed convergence B) speed C) self-importance D) ability to take query semantics into account

independence of the query and guaranteed convergence

21 - A data stream can be viewed as: A) an infinite data frame with a finite subset visible B) a finite table from time 0 to time 100 C) an infinite data frame requiring infinite memory

an infinite data frame with a finite subset visible

11A - Link analysis for the web defines a node's influence in terms of: A) influence of a node's neighbors B) direct neighbors C) connecting paths

influence of a node's neighbors

17 - Linear regression learns a function that maps from A) input features to continuous output using a linear function B) input features to Boolean output using a linear function C) input features to output features D) input features to Boolean output using a sigmoid function

input features to continuous output using a linear function

13 - Streamlit works by A) interacting with the browser as HTML controls are manipulated B) sending data to Javascript for visualization C) inverting an index D) streaming its data

interacting with the browser as HTML controls are manipulated

20 - Which of the following is NOT a characteristic of a convolutional neural network? A) it has shared weights B) it has sparsely connected layers C) it ONLY uses sigmoid activation functions D) it has many compositional layers

it ONLY uses sigmoid activation functions

19 - Training a model in PyTorch requires all of the stages listed below EXCEPT A) model training B) k-fold cross-validation C) model initialization

k-fold cross-validation

13 - The "grammar of graphics" counts data and its transformations into a visual image as A) layers B) coordinates C) scales D) facets

layers

14 - The first component in PCA is based on the vector that A) minimizes variance B) is orthogonal to the original features C) minimizes entropy D) maximizes variance

maximizes variance

12 - Matrix bulk operations include: A) crop, invert B) join, group-by, select C) extract primes D) multiply, add, transpose, determinant, slice

multiply, add, transpose, determinant, slice

16 - Supervised learning is typically... A) never used after unsupervised learning B) never used before unsupervised learning C) always used

never used after unsupervised learning

14 - For machine learning matrices in Spark, do we need to consider which rows are on which machine? A) yes B) it depends on the classifier C) it depends on the values D) no

no

19 - In PyTorch, calling loss.backward() will populate the gradients of all A) objects of type torch.nn.Parameter used in the computation of loss B) All Parameters that the optimizer was made aware of C) Examples in your training data D) All Modules used in the computation of loss

objects of type torch.nn.Parameter used in the computation of loss

19 - One-vs-rest classification for a single-layer network means A) the outputs are indeterminate if we train the network B) only one output should be 1 at any point in time C) multiple outputs should be active and we use majority vote

only one output should be 1 at any point in time

18 - If the learning curve shows a big gap between training and test accuracy, the issue is likely to be...: A) underfitting B) overfitting or model complexity C) that we've plotted the curve incorrectly

overfitting or model complexity

17 - Gradient descent involves taking the A) partial derivative of the cost function with respect to a given weight B) partial derivative of the feature with respect to the weight C) partial derivative of the cost function with respect to the feature D) maximum possible step size

partial derivative of the cost function with respect to a given weight
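
To make the answer concrete, here is a minimal sketch of gradient descent for a one-weight model y = w * x with mean squared error as the cost. The helper names, the learning rate value, and the data are assumptions for illustration.

```python
def mse_gradient(w, xs, ys):
    """Partial derivative of the MSE cost with respect to the weight w.

    cost(w) = (1/n) * sum((w*x - y)^2), so d cost / d w = (2/n) * sum((w*x - y) * x).
    """
    n = len(xs)
    return (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))


def fit_weight(xs, ys, eta=0.05, steps=200):
    """Step against the gradient, scaled by the learning rate eta."""
    w = 0.0
    for _ in range(steps):
        w -= eta * mse_gradient(w, xs, ys)
    return w


# Data generated by y = 2x, so the fitted weight should approach 2.
w_fit = fit_weight([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(w_fit)
```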

18 - A model that is too low in complexity will generally A) perform poorly only with training data B) perform poorly only with test data C) perform poorly with both training and test data

perform poorly with both training and test data

17 - The SciKit LogisticRegression classifier can be used in a A) pipeline B) dimensionality reduction stage C) clustering algorithm

pipeline

18 - If our classifier only returns positive under conditions with very high probability -- returning negative in all other cases, it will do well with: A) precision B) recall C) sensitivity

precision
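
A small sketch of why a very strict threshold favors precision over recall. The scores, labels, and helper names below are invented for the example.

```python
def classify(scores, threshold):
    """Return 1 (positive) only when the score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 0, 0]
scores = [0.95, 0.6, 0.4, 0.7, 0.1]

strict = precision_recall(y_true, classify(scores, 0.9))   # few, confident positives
lenient = precision_recall(y_true, classify(scores, 0.5))  # more positives, more mistakes
print(strict, lenient)
```

With the strict threshold, every returned positive is correct, but most true positives are missed.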

21 - Which of the following is not a good use case for data streaming? A) triggering alerts based on health biomarkers B) monitoring web server traffic C) querying a data warehouse D) tracking memes on Twitter

querying a data warehouse

16 - To derive training and test sets in Apache Spark MLLib, we can use A) randomSplit() B) split_training_data() C) test_train_split()

randomSplit()

18 - Undersampling is a method that: A) removes rows from the larger class B) combines smaller classes to form one larger class C) adds rows to the smaller class D) removes rows from the smaller class

removes rows from the larger class

20 - To perform domain adaptation of a pretrained network, we: A) reset all of the weights to 0 B) replace the first few layers and retrain C) replace the last few layers and retrain D) reset all of the weights to 1

replace the last few layers and retrain

21 - Windows in Spark Streaming are created as A) Python collections B) sharded dataframes C) separate dataframe fields, whose values represent the start and stop D) split dataframes

separate dataframe fields, whose values represent the start and stop

18 - When we do value imputation, which method will always fill the same value into each null, for a given feature? A) clustering B) simple imputation using mean C) k-nearest-neighbors imputation using mean D) k-nearest-neighbors imputation using median

simple imputation using mean
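
A minimal sketch of simple mean imputation, using `None` to stand in for a null. The function name and sample column are assumptions.

```python
def impute_mean(column):
    """Fill every null in one feature with the mean of its observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


filled = impute_mean([1.0, None, 3.0, None])
print(filled)
```

Both nulls receive the same value, whereas a k-nearest-neighbors imputer could fill each one differently depending on the neighboring rows.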

11 - The PageRank of a node, as it "flows" out from a node, is: A) split uniformly across all out-links B) assigned in full to each out-link C) split across out-links in order of importance D) not considered

split uniformly across all out-links

18 - For Principal Components Analysis (PCA), we should generally use which method for scaling: A) min-max scaling B) the original scale C) standardization

standardization

14 - Which algorithm focuses on preserving neighbor relationships? A) PCA B) t-SNE C) PIK D) SVD

t-SNE

17 - Lasso regression A) tends to push the weights towards zero B) equally affects all weights C) works best when there are many large parameters of around the same value

tends to push the weights towards zero

12 - With Spark Matrices, we need to shard by: A) the coordinates or tiles in a matrix B) the join key C) the values in a given column D) the values in a given row

the coordinates or tiles in a matrix

14 - (If we are not using SVD), PCA uses the eigenvectors and eigenvalues of A) the covariance matrix B) the weight matrix C) the variance matrix D) the weight transfer matrix

the covariance matrix

17 - The derivative of the log loss function in logistic regression is similar to A) the derivative of the SSE loss function B) the derivative of the gradient C) the derivative of the quadratic loss function D) the derivative of the MSE loss function

the derivative of the MSE loss function

16 - Entropy represents: A) the number of bits needed to encode a sequence of samples or data items B) the repetitiveness in the data C) chaos

the number of bits needed to encode a sequence of samples or data items

12 - We generally distinguish between a 2D array and a matrix in Numpy based on A) whether we initialize using np.array or np.matrix B) the index coordinates C) whether there are an equal number of rows and columns D) the operations we apply to the array

the operations we apply to the array

11 - The decay factor alpha, as defined in the slides, can be considered to be: A) the proportion of the time the user traverses a link from the page B) the proportion of pages that are important C) the proportion of PageRank that is important D) the proportion of time the user randomly jumps at uniform to another page

the proportion of the time the user traverses a link from the page

21 - In streaming, which is typically fixed, and which is dynamic? A) both are typically fixed B) the data is fixed, the query is dynamic C) both are typically dynamic D) the query is fixed, the data is dynamic

the query is fixed, the data is dynamic

16 - Information gain is computed as: A) the growth in entropy after a split point B) the reduction in entropy after a split point

the reduction in entropy after a split point
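
Entropy and information gain can be sketched as below. The function names and the toy labels are assumptions; the child entropies are weighted by the fraction of samples in each child.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Reduction in entropy from splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


parent = ["a", "a", "b", "b"]
gain = information_gain(parent, ["a", "a"], ["b", "b"])  # a perfect split
print(entropy(parent), gain)
```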

21 - When data stream elements have timestamps, what is the timestamp of the result of a join? A) the timestamp of the *older* input that was joined B) the average timestamp of the joined inputs C) the current system time D) the timestamp of the *newer* input that was joined

the timestamp of the *newer* input that was joined

12 - With Spark DataFrames, we shard by: A) the index of a row B) the index of a column C) the values of a given join or grouping key D) the coordinates of a matrix

the values of a given join or grouping key

18 - The learning curve plots a classifier's A) training samples vs feature values B) validation accuracy vs hyperparameter accuracy C) hyperparameter accuracy and training accuracy vs training samples D) training accuracy and validation accuracy vs samples

training accuracy and validation accuracy vs samples

18 - We are looking to achieve balance across classes within our: A) training data B) test data C) validation data

training data

16 - In supervised machine learning, we learn the function over (which) data and evaluate its efficacy over (which) data? A) test, validation, respectively B) test, training, respectively C) training, test, respectively

training, test, respectively

19 - The ROC curve allows us to see A) the probability of flipping a coin B) area under the curve vs accuracy C) true positive vs false positive rates D) returns over crypto investment

true positive vs false positive rates

21 - Which type of window is NON-overlapping? A) tumbling window B) convolutional window C) sliding window D) striding window

tumbling window
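
The difference between tumbling and sliding windows can be sketched with integer timestamps. This is plain Python rather than any streaming API; the function names are hypothetical, and windows are assumed to start at non-negative multiples of the slide.

```python
def tumbling_window(timestamp, size):
    """The single [start, end) window of the given size containing timestamp."""
    start = (timestamp // size) * size
    return (start, start + size)


def sliding_windows(timestamp, size, slide):
    """All [start, end) windows of the given size, one starting every slide units,
    that contain timestamp."""
    windows = []
    start = (timestamp // slide) * slide
    while start > timestamp - size:
        if start >= 0:
            windows.append((start, start + size))
        start -= slide
    return sorted(windows)


print(tumbling_window(7, 5))      # one window only: no overlap
print(sliding_windows(7, 10, 5))  # overlapping windows share the element
```

Each window is represented by its start and stop values, which is also how the result of a windowed aggregation can be read back.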

15 - k-Means clustering iterates A) k times B) 2k times C) k^2 times D) until a stop condition

until a stop condition

17 - Logistic regression does NOT: A) use mean-squared error as its cost function B) use a piecewise cost function C) use a thresholded sigmoid function to make classification D) predict the probability of instance membership in a class

use mean-squared error as its cost function

20 - AlexNet was notable because it did all of the following EXCEPT: A) used a deeper network than prior methods B) used a GPU unlike other prior methods C) used a sigmoid function unlike prior methods D) had a much higher score than prior methods

used a sigmoid function unlike prior methods

13 - When might we want to normalize our data? A) when all of the data has similar value ranges B) when the overall trends are hard to spot due to differences in scale C) when we want to make certain results look bigger than others D) when a log-scale plot is best

when the overall trends are hard to spot due to differences in scale

16 - A random forest creates different decision trees (or stumps) over subsets of the training data drawn A) with replacement B) without replacement

with replacement

22 - An operation that is easier to integrate into Apache Storm than Spark Streaming is A) an incrementally trained model B) a sharded join C) a selection operation D) a windowed aggregation operation

an incrementally trained model

17 - If we are worried about multicollinearity with logistic regression, we can initially...: A) use decision trees B) remove rows such that collinearity is reduced C) apply PCA D) apply k-means clustering

apply PCA

11 - PageRank encapsulates a random walk in that: 1) as a "random walker" visits a node, they randomly choose a link to follow, and PageRank measures the proportion of time at each node 2) PageRank captures the importance of randomness in a measure of importance 3) the random walker jumps from a node to any other random node with equal probability 4) PageRank captures the randomness inherent in the Internet

as a "random walker" visits a node, they randomly choose a link to follow, and PageRank measures the proportion of time at each node

13 - Interactive visualization typically requires A) graphical Python B) pens and papers C) a PDF reader D) a browser and web server

a browser and web server

14 - To store values, a sparse matrix uses A) an array of lists B) a dictionary C) an array of arrays D) a list of arrays

a dictionary

19 - The weight update during training is based on A) a learning rate, eta B) a random number generator, rng C) a mean squared error, MSE D) a logistic update rate, mu

a learning rate, eta

12 - One-hot encoding refers to: A) a scheme for encoding everything in 1's and 0's B) a scheme in which each categorical attribute is represented by an int value in a column C) a scheme for encoding categorical attributes as strings D) a scheme in which each categorical attribute gets its own Boolean column

a scheme in which each categorical attribute gets its own Boolean column
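
A minimal pure-Python sketch of one-hot encoding, where each category value becomes its own Boolean column. The function name and the sample values are assumptions; in practice a library routine would normally be used.

```python
def one_hot(values):
    """Map each value to a dict with one Boolean column per distinct category."""
    categories = sorted(set(values))
    return [{cat: (v == cat) for cat in categories} for v in values]


encoded = one_hot(["red", "blue", "red"])
for row in encoded:
    print(row)
```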

16 - A decision tree partitions the data using A) a tree of split points used to partition the instances B) a tree of tuples C) tree-structured data

a tree of split points used to partition the instances

13 - Line plots should only be used for: A) categorical data B) Boolean data C) String data D) continuous data

continuous data

11 - The PageRank weight transfer matrix initializes each M[i,j] to be: A) 1/N_j if j points to i, where N_j is the number of out-edges from j B) 1/N_j if i points to j, where N_j is the number of out-edges from j C) 1/N_i if j points to i, where N_i is the number of out-edges from i D) 1/n

1/N_j if j points to i, where N_j is the number of out-edges from j
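
The matrix definition above, combined with a decay factor alpha, can be sketched as follows. The graph, the function names, the value alpha = 0.85, and the iteration count are assumptions; the sketch also assumes no duplicate edges and no nodes without out-edges.

```python
def transfer_matrix(edges, n):
    """M[i][j] = 1/N_j if j points to i, where N_j is the out-degree of j."""
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1
    M = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        M[dst][src] = 1.0 / out_degree[src]
    return M


def pagerank(M, alpha=0.85, iters=50):
    """Power iteration: follow a link with probability alpha, else jump uniformly."""
    n = len(M)
    rank = [1.0 / n] * n
    for _ in range(iters):
        rank = [
            alpha * sum(M[i][j] * rank[j] for j in range(n)) + (1 - alpha) / n
            for i in range(n)
        ]
    return rank


# Edges are (source, destination): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
M = transfer_matrix([(0, 1), (0, 2), (1, 2), (2, 0)], n=3)
ranks = pagerank(M)
print(M)
print(ranks)
```

Node 0 has two out-links, so its rank is split uniformly: half flows to node 1 and half to node 2.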

13 - How many dimensions does a typical heatmap represent? A) 4 B) 2 C) 3 D) 1

3

13 - A common way to mislead with visualizations is: A) Use non-representative scales B) Visualize a quantity that does not match the title C) All of these D) Invert the scale

All of these

21 - Which of the following is a stream processing system? A) Apache Marvel B) Apache Flink C) Amazon Neptune D) Oracle Rdb

Apache Flink

12 - If we test many hypotheses, we may wish to use the A) false discovery test B) p-hacking defense C) Bonferroni correction D) pi test

Bonferroni correction
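
A small sketch of the Bonferroni correction: the significance level is divided by the number of hypotheses tested. The function name, the p-values, and alpha = 0.05 are assumptions for the example.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value as significant at the corrected level alpha / m."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Three tests, so each p-value is compared against 0.05 / 3.
flags = bonferroni_significant([0.001, 0.02, 0.04])
print(flags)
```

Without the correction, all three p-values would fall below 0.05.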

18 - If we have a huge and diverse training set with a small number of missing values, which is generally preferable? A) Imputing values using k-nearest-neighbors B) Imputing values using the mean C) Dropping entries with nulls D) Imputing values using the mode

Dropping entries with nulls

17 - Gradient descent is the only way to find the optimal weights for linear regression A) False B) True C) Maybe

False

14 - Unsupervised machine learning A) Develops machine learning models without needing human input B) Finds structure in the values of the features C) Predicts class membership D) requires TensorFlow

Finds structure in the values of the features

15 - k-Means++ changes the: A) Initial assignment of centroids B) Number of points clustered C) Stop condition

Initial assignment of centroids

19 - Which activation function is piecewise? A) sigmoid B) tanh C) ReLU D) dot product

ReLU
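
For comparison, a minimal sketch of ReLU next to the sigmoid. ReLU is defined by two pieces (zero for non-positive inputs, the identity otherwise), while the sigmoid is a single smooth formula. The function names are the conventional ones, written here by hand.

```python
import math


def relu(x):
    """Piecewise: 0 for x <= 0, x otherwise."""
    return x if x > 0 else 0.0


def sigmoid(x):
    """Smooth logistic function with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


print(relu(-2.0), relu(3.5), sigmoid(0.0))
```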

20 - In PyTorch, to create a multilayer network, we add layers to an object of type: A) CNN B) Cascade C) Multilayer D) Sequential

Sequential

14 - Before applying the SciKit-Learn PCA algorithm, it is a good idea to run fit_transform from the A) machine learning classifier B) StandardScaler C) Pipeline D) LinearRegression module

StandardScaler

15 - What is not a good stop condition? A) A fixed number of iterations occurs B) Cluster distortion goes below a threshold C) The clusters converge D) The cluster coefficient is changed

The cluster coefficient is changed

17 - The notion of a mini-batch gradient descent is general enough to capture both stochastic gradient descent and traditional gradient descent. A) I don't know B) False C) True

True

16 - Classification involves finding a function between input features and an output y that is A) numeric B) encoded as a string C) categorical D) continuous

categorical

