40 Questions to test a Data Scientist on Dimensionality Reduction Techniques


Which of the following techniques would perform better for reducing dimensions of a data set? A. Removing columns which have too many missing values B. Removing columns which have high variance in data C. Removing columns with dissimilar data trends D. None of these

(A) If a column has too many missing values (say, 99%), then we can remove such a column.

In which of the following cases will LDA fail? A. If the discriminatory information is not in the mean but in the variance of the data B. If the discriminatory information is in the mean but not in the variance of the data C. If the discriminatory information is in the mean and variance of the data D. None of these

(A) LDA separates classes by the differences in their means; if the classes differ only in their variances and not in their means, LDA cannot find a discriminative direction.

If we project the original data points x1 = (−1, −1), x2 = (0, 0), x3 = (1, 1) onto the 1-d subspace spanned by the principal component v = [√2/2, √2/2]^T, what are their coordinates in the 1-d subspace? A. (−√2), (0), (√2) B. (√2), (0), (√2) C. (√2), (0), (−√2) D. (−√2), (0), (−√2)

(A) The coordinates of the three points after projection are z1 = x1^T v = [−1, −1][√2/2, √2/2]^T = −√2, z2 = x2^T v = 0, z3 = x3^T v = √2.

For the projections you just obtained, (−√2), (0), (√2): if we map them back to the original 2-d space and treat them as reconstructions of the original data points, what is the reconstruction error? A. 0% B. 10% C. 30% D. 40%

(A) The reconstruction error is 0, since all three points lie exactly on the direction of the first principal component. You can also compute the reconstructions x̂i = zi · v directly: x̂1 = −√2 · [√2/2, √2/2]^T = [−1, −1]^T, x̂2 = 0 · [√2/2, √2/2]^T = [0, 0]^T, x̂3 = √2 · [√2/2, √2/2]^T = [1, 1]^T, which are exactly x1, x2, x3.
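The projection and reconstruction above can be checked numerically. A minimal NumPy sketch, assuming the three data points (−1, −1), (0, 0), (1, 1) implied by the worked answer:

```python
import numpy as np

# Data points implied by the worked answer: x1, x2, x3 as rows.
X = np.array([[-1.0, -1.0],
              [ 0.0,  0.0],
              [ 1.0,  1.0]])
v = np.array([np.sqrt(2) / 2, np.sqrt(2) / 2])  # first principal component (unit length)

z = X @ v                          # 1-d coordinates: -sqrt(2), 0, sqrt(2)
X_hat = np.outer(z, v)             # reconstruction z_i * v back in 2-d
error = np.linalg.norm(X - X_hat)  # 0: the points lie exactly on v
```

Since the points sit exactly on the principal direction, the reconstruction `X_hat` equals `X` and the error is zero.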

Imagine you are dealing with a 10-class classification problem and you want to know at most how many discriminant vectors can be produced by LDA. What is the correct answer? A. 20 B. 9 C. 21 D. 11 E. 10

(B) LDA produces at most c − 1 discriminant vectors, where c is the number of classes, so for 10 classes the maximum is 9.
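This limit can be demonstrated with scikit-learn's LDA on synthetic data (a sketch; the data here is made up for illustration):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy data: 10 classes, 10 samples per class, 20 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = np.repeat(np.arange(10), 10)

# For c = 10 classes, LDA yields at most c - 1 = 9 discriminant vectors.
lda = LinearDiscriminantAnalysis(n_components=9).fit(X, y)
X_lda = lda.transform(X)   # shape (100, 9)
```

Requesting `n_components=10` here would raise a ValueError, since scikit-learn enforces the min(n_features, n_classes − 1) bound.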

What will happen when eigenvalues are roughly equal? A. PCA will perform outstandingly B. PCA will perform badly C. Can't Say D. None of the above

(B) When all eigenvalues are roughly equal, the variance is spread almost uniformly across all directions, so no direction is preferable to any other and the choice of principal components becomes arbitrary.

Which of the following algorithms cannot be used for reducing the dimensionality of data? A. t-SNE B. PCA C. LDA D. None of these

(D) All of the algorithms listed are examples of dimensionality reduction algorithms.

In the t-SNE algorithm, which of the following hyperparameters can be tuned? A. Number of dimensions B. Smooth measure of effective number of neighbours C. Maximum number of iterations D. All of the above

(D) All of the hyperparameters in the options can be tuned.
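In scikit-learn's implementation these three hyperparameters map onto `TSNE` constructor arguments. A minimal sketch on made-up data:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))    # toy data: 200 samples, 10 features

tsne = TSNE(
    n_components=2,    # A: number of output dimensions
    perplexity=30.0,   # B: smooth measure of the effective number of neighbours
    random_state=0,
)
# C: the maximum number of optimisation iterations is also tunable
# (`max_iter` in recent scikit-learn versions, `n_iter` in older ones).
X_2d = tsne.fit_transform(X)      # shape (200, 2)
```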

Which of the following statements is correct for t-SNE and PCA? A. t-SNE is linear whereas PCA is nonlinear B. t-SNE and PCA are both linear C. t-SNE and PCA are both nonlinear D. t-SNE is nonlinear whereas PCA is linear

(D) t-SNE is a nonlinear dimensionality reduction technique, whereas PCA is a linear one.

Which of the following option(s) is / are true? 1. You need to initialize parameters in PCA 2. You don't need to initialize parameters in PCA 3. PCA can be trapped into local minima problem 4. PCA can't be trapped into local minima problem A. 1 and 3 B. 1 and 4 C. 2 and 3 D. 2 and 4

(D) PCA is a deterministic algorithm: it has no parameters to initialize and, unlike many machine learning algorithms, it does not suffer from the local minima problem.

Imagine you have 1000 input features and 1 target feature in a machine learning problem. You have to select the 100 most important features based on the relationship between the input features and the target feature. Do you think this is an example of dimensionality reduction? A. Yes B. No

(A) Yes. Selecting a subset of important features (feature selection) is a form of dimensionality reduction.

[ True or False ] It is not necessary to have a target variable for applying dimensionality reduction algorithms. A. TRUE B. FALSE

(A) True. Most dimensionality reduction algorithms, such as PCA and t-SNE, are unsupervised and need no target variable; LDA is an example of a supervised dimensionality reduction algorithm.

Which of the following comparison(s) are true about PCA and LDA? 1. Both LDA and PCA are linear transformation techniques 2. LDA is supervised whereas PCA is unsupervised 3. PCA maximizes the variance of the data, whereas LDA maximizes the separation between different classes A. 1 and 2 B. 2 and 3 C. 1 and 3 D. Only 3 E. 1, 2 and 3

(E) All three comparisons are correct.

The most popularly used dimensionality reduction algorithm is Principal Component Analysis (PCA). Which of the following is/are true about PCA? 1. PCA is an unsupervised method 2. It searches for the directions that data have the largest variance 3. Maximum number of principal components <= number of features 4. All principal components are orthogonal to each other A. 1 and 2 B. 1 and 3 C. 2 and 3 D. 1, 2 and 3 E. 1,2 and 4 F. All of the above

(F) All four statements are true of PCA.
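Statements 3 and 4 can be verified directly with scikit-learn: fitting PCA on a made-up 5-feature dataset yields at most 5 components, and the component (loading) matrix has orthonormal rows. A sketch:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # correlated toy features

pca = PCA().fit(X)

# 3: the number of principal components is bounded by the number of features.
assert pca.components_.shape == (5, 5)

# 4: the components are mutually orthogonal (V V^T = I).
V = pca.components_
assert np.allclose(V @ V.T, np.eye(5), atol=1e-10)

# 2: components are ordered by the variance they capture, largest first.
assert np.all(np.diff(pca.explained_variance_) <= 1e-12)
```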

PCA works better if there is? 1. A linear structure in the data 2. If the data lies on a curved surface and not on a flat surface 3. If variables are scaled in the same unit A. 1 and 2 B. 2 and 3 C. 1 and 3 D. 1, 2 and 3

(C) PCA works best when the data has a linear structure and the variables are scaled in the same unit; it performs poorly when the data lies on a curved surface.

Which of the following gives the difference(s) between logistic regression and LDA? 1. If the classes are well separated, the parameter estimates for logistic regression can be unstable. 2. If the sample size is small and the distribution of features is normal for each class, linear discriminant analysis is more stable than logistic regression. A. 1 B. 2 C. 1 and 2 D. None of these

(C) Both statements describe genuine differences between logistic regression and LDA.

Which of the following options are correct when you are applying PCA on an image dataset? 1. It can be used to effectively detect deformable objects. 2. It is invariant to affine transforms. 3. It can be used for lossy image compression. 4. It is not invariant to shadows. A. 1 and 2 B. 2 and 3 C. 3 and 4 D. 1 and 4

(C) PCA can be used for lossy image compression, and it is not invariant to shadows; it handles deformable objects poorly and is not invariant to affine transforms.

Which of the following statements is true about t-SNE in comparison to PCA? A. When the data is huge (in size), t-SNE may fail to produce better results. B. t-SNE always produces better results regardless of the size of the data C. PCA always performs better than t-SNE for smaller data D. None of these

(A) When the data is huge, t-SNE may fail to produce good results because of its high computational cost.

Which of the following option is true? A. LDA explicitly attempts to model the difference between the classes of data. PCA on the other hand does not take into account any difference in class. B. Both attempt to model the difference between the classes of data. C. PCA explicitly attempts to model the difference between the classes of data. LDA on the other hand does not take into account any difference in class. D. Both don't attempt to model the difference between the classes of data.

(A) LDA explicitly models the difference between the classes, whereas PCA ignores class labels entirely.

[ True or False ] Dimensionality reduction algorithms are one of the possible ways to reduce the computation time required to build a model. A. TRUE B. FALSE

(A) Reducing the dimensionality of the data means the model takes less time to train.

[ True or False ] PCA can be used for projecting and visualizing data in lower dimensions. A. TRUE B. FALSE

(A) Sometimes it is very useful to plot the data in lower dimensions. We can take the first 2 principal components and then visualize the data using a scatter plot.
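A minimal scikit-learn sketch of this idea, using the bundled Iris dataset as an example:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)            # 150 samples, 4 features
X_2d = PCA(n_components=2).fit_transform(X)  # keep the first 2 principal components

# X_2d can now be shown in a scatter plot, e.g. with matplotlib:
# plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
```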

I have 4 variables in the dataset: A, B, C and D. I have performed the following actions: Step 1: Using the above variables, I created two more variables, namely E = A + 3 * B and F = B + 5 * C + D. Step 2: Then, using only the variables E and F, I built a Random Forest model. Could the steps performed above represent a dimensionality reduction method? A. True B. False

(A) Yes, because Step 1 represents the data in 2 dimensions instead of the original 4.
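Step 1 is just a pair of linear combinations of the original columns. A sketch with made-up data:

```python
import numpy as np

# Hypothetical data: 100 observations of the 4 variables A, B, C, D.
rng = np.random.default_rng(0)
A, B, C, D = rng.normal(size=(4, 100))

# Step 1: two derived variables replace the original four.
E = A + 3 * B
F = B + 5 * C + D
X_reduced = np.column_stack([E, F])   # shape (100, 2): 4-d data reduced to 2-d
```

A model trained on `X_reduced` sees only 2 features, which is exactly what a linear dimensionality reduction does.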

[True or False] t-SNE learns non-parametric mapping. A. TRUE B. FALSE

(A) t-SNE learns a non-parametric mapping, which means it does not learn an explicit function that maps data from the input space to the low-dimensional map.

Imagine you are dealing with text data. To represent the words you are using word embeddings (Word2vec). In the word embedding, you end up with 1000 dimensions. Now you want to reduce the dimensionality of this high-dimensional data such that similar words stay close together in nearest-neighbour space. In such a case, which of the following algorithms are you most likely to choose? A. t-SNE B. PCA C. LDA D. None of these

(A) t-SNE stands for t-Distributed Stochastic Neighbor Embedding; it preserves nearest-neighbour structure while reducing the dimensionality of the data.

Which of the following statement is true for a t-SNE cost function? A. It is asymmetric in nature. B. It is symmetric in nature. C. It is same as the cost function for SNE.

(B) The cost function of SNE is asymmetric, which makes it difficult to optimize with gradient descent. A symmetric cost function is one of the major differences between SNE and t-SNE.

Suppose we are using dimensionality reduction as a pre-processing technique, i.e., instead of using all the features, we reduce the data to k dimensions with PCA and then use these PCA projections as our features. Which of the following statements is correct? A. Higher 'k' means more regularization B. Higher 'k' means less regularization C. Can't say

(B) A higher k leads to less smoothing, since we preserve more of the characteristics of the data; hence less regularization.

Under which condition SVD and PCA produce the same projection result? A. When data has zero median B. When data has zero mean C. Both are always same D. None of these

(B) When the data has a zero mean vector; otherwise you have to center the data before taking the SVD.
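This equivalence is easy to demonstrate numerically: the SVD of the centered data gives the same projections as scikit-learn's PCA (up to the sign of each component). A sketch on made-up data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) + 5.0   # toy data with a non-zero mean

Xc = X - X.mean(axis=0)              # center first
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
proj_svd = Xc @ Vt.T                 # projection via SVD of the centered data

proj_pca = PCA(n_components=4).fit_transform(X)  # PCA centers internally

# Same result up to a sign flip of each component.
assert np.allclose(np.abs(proj_svd), np.abs(proj_pca), atol=1e-8)
```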

Xi and Xj are two distinct points in the higher-dimensional representation, whereas Yi and Yj are the representations of Xi and Xj in a lower dimension. 1. The similarity of datapoint Xi to datapoint Xj is the conditional probability p(j|i). 2. The similarity of datapoint Yi to datapoint Yj is the conditional probability q(j|i). Which of the following must be true for a perfect representation of Xi and Xj in the lower-dimensional space? A. p(j|i) = 0 and q(j|i) = 1 B. p(j|i) < q(j|i) C. p(j|i) = q(j|i) D. p(j|i) > q(j|i)

(C) The conditional probabilities must be equal, because the similarity between the points must remain unchanged between the higher and lower dimensions for the representation to be perfect.

In which of the following scenarios is t-SNE better to use than PCA for dimensionality reduction while working on a local machine with minimal computational power? A. Dataset with 1 Million entries and 300 features B. Dataset with 100000 entries and 310 features C. Dataset with 10,000 entries and 8 features D. Dataset with 10,000 entries and 200 features

(C) t-SNE has quadratic time and space complexity in the number of samples, making it very heavy in terms of system resources; on a machine with minimal computational power it is practical only for the smallest dataset.

Which of the following can be the first 2 principal components after applying PCA? 1. (0.5, 0.5, 0.5, 0.5) and (0.71, 0.71, 0, 0) 2. (0.5, 0.5, 0.5, 0.5) and (0, 0, -0.71, -0.71) 3. (0.5, 0.5, 0.5, 0.5) and (0.5, 0.5, -0.5, -0.5) 4. (0.5, 0.5, 0.5, 0.5) and (-0.5, -0.5, 0.5, 0.5) A. 1 and 2 B. 1 and 3 C. 2 and 4 D. 3 and 4

(D) For the first two choices, the two loading vectors are not orthogonal.
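The orthogonality test behind this answer is a simple dot product. A NumPy sketch checking all four candidate pairs:

```python
import numpy as np

candidates = {
    1: ([0.5, 0.5, 0.5, 0.5], [0.71, 0.71, 0.0, 0.0]),
    2: ([0.5, 0.5, 0.5, 0.5], [0.0, 0.0, -0.71, -0.71]),
    3: ([0.5, 0.5, 0.5, 0.5], [0.5, 0.5, -0.5, -0.5]),
    4: ([0.5, 0.5, 0.5, 0.5], [-0.5, -0.5, 0.5, 0.5]),
}

# A pair can be the first two principal components only if the dot product is 0.
orthogonal = {k: abs(np.dot(a, b)) < 1e-9 for k, (a, b) in candidates.items()}
# orthogonal -> {1: False, 2: False, 3: True, 4: True}, so the answer is D.
```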

What happens when you get features in lower dimensions using PCA? 1. The features will still have interpretability 2. The features will lose interpretability 3. The features must carry all information present in data 4. The features may not carry all information present in data A. 1 and 3 B. 1 and 4 C. 2 and 3 D. 2 and 4

(D) When you reduce the features to lower dimensions, you usually lose some of the information in the data, and the resulting features are generally no longer interpretable.

