43- Combined final


Q35. Which of the following sequences is correct for a K-Means algorithm using Forgy method of initialization? 1. Specify the number of clusters 2. Assign cluster centroids randomly 3. Assign each data point to the nearest cluster centroid 4. Re-assign each point to nearest cluster centroids 5. Re-compute cluster centroids Options: A. 1, 2, 3, 5, 4 B. 1, 3, 2, 4, 5 C. 2, 1, 3, 4, 5 D. None of these

Solution: (A) The methods used for initialization in K-Means are Forgy and Random Partition. The Forgy method randomly chooses k observations from the data set and uses these as the initial means. The Random Partition method first randomly assigns a cluster to each observation and then proceeds to the update step, computing the initial mean of each cluster as the centroid of its randomly assigned points.
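To make the two initialization schemes concrete, here is a minimal NumPy sketch (the toy data and the helper names forgy_init / random_partition_init are illustrative assumptions, not from the quiz):

```python
import numpy as np

def forgy_init(X, k, rng):
    # Forgy: pick k observations at random and use them as initial means.
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx]

def random_partition_init(X, k, rng):
    # Random Partition: randomly label every point, then take each
    # cluster's mean as its initial centroid (the update step comes first).
    labels = rng.integers(0, k, size=len(X))
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # toy data
print(forgy_init(X, 3, rng))
print(random_partition_init(X, 3, rng))  # assumes no cluster comes up empty
```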

Q7. Is it possible that the assignment of observations to clusters does not change between successive iterations in K-Means? A. Yes B. No C. Can't say D. None of these

Solution: (A) When the K-Means algorithm has reached a local or global minimum, it will not alter the assignment of data points to clusters between two successive iterations.

3) I have 4 variables in the dataset, namely A, B, C & D. I have performed the following actions: Step 1: Using the above variables, I have created two more variables, namely E = A + 3 * B and F = B + 5 * C + D. Step 2: Then, using only the variables E and F, I have built a Random Forest model. Could the steps performed above represent a dimensionality reduction method? A. True B. False

Solution: (A) Yes, because Step 1 represents the original data in a lower-dimensional space using only the 2 derived variables E and F.

13) [True or False] t-SNE learns non-parametric mapping. A. TRUE B. FALSE

Solution: (A) t-SNE learns a non-parametric mapping, which means that it does not learn an explicit function that maps data from the input space to the map. For more information, see https://lvdmaaten.github.io/tsne/
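A small scikit-learn illustration of the non-parametric point (synthetic data assumed): TSNE exposes only fit_transform, so there is no learned function that could be applied to unseen points.

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.RandomState(0).normal(size=(200, 10))  # toy data
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # (200, 2)
# There is no usable TSNE.transform for new data: embedding new points
# means rerunning the optimization on the full dataset.
```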

Question 12 Imagine you are dealing with text data. To represent the words you are using word embeddings (Word2vec), and you end up with 1000 dimensions. Now, you want to reduce the dimensionality of this high-dimensional data such that similar words remain close together in the nearest-neighbor space. In such a case, which of the following algorithms are you most likely to choose? A. t-SNE B. PCA C. LDA D. None of these

Solution: (A) t-SNE stands for t-Distributed Stochastic Neighbor Embedding, which considers the nearest neighbors when reducing the data.

Q37. Which of the following is/are not true about the centroid-based K-Means clustering algorithm and the distribution-based expectation-maximization clustering algorithm: 1. Both start with random initializations 2. Both are iterative algorithms 3. Both have strong assumptions that the data points must fulfill 4. Both are sensitive to outliers 5. Expectation maximization algorithm is a special case of K-Means 6. Both require prior knowledge of the no. of desired clusters 7. The results produced by both are non-reproducible. Options: A. 1 only B. 5 only C. 1 and 3 D. 6 and 7 E. 4, 6 and 7 F. None of the above

Solution: (B) All of the above statements are true except the 5th: it is the other way around, K-Means is a special case of the EM algorithm in which only the centroids of the cluster distributions are calculated at each iteration.

Q5. What is the minimum no. of variables/ features required to perform clustering? A. 0 B. 1 C. 2 D. 3

Solution: (B) At least a single variable is required to perform clustering analysis. Clustering analysis with a single variable can be visualized with the help of a histogram.
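As a quick illustration on made-up one-dimensional data, one can cluster a single variable and overlay the fitted cluster centers on a histogram:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Two well-separated 1-D groups (assumed toy data).
x = np.concatenate([np.random.normal(0, 1, 300),
                    np.random.normal(6, 1, 300)]).reshape(-1, 1)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(x)

plt.hist(x, bins=40)
for c in km.cluster_centers_.ravel():
    plt.axvline(c, color="red")  # cluster centers over the histogram
plt.show()
```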

11) Which of the following statements is true for the t-SNE cost function? A. It is asymmetric in nature. B. It is symmetric in nature. C. It is the same as the cost function for SNE.

Solution: (B) The cost function of SNE is asymmetric, which makes it difficult to optimize using gradient descent. A symmetric cost function is one of the major differences between SNE and t-SNE.

B) Web site developers can increase Web site search rankings.

34) Search engine optimization (SEO) is a means by which A) Web site developers can negotiate better deals for paid ads. B) Web site developers can increase Web site search rankings. C) Web site developers index their Web sites for search engines. D) Web site developers optimize the artistic features of their Web sites.

C) off-site and on-site Web analytics

35) What are the two main types of Web analytics? A) old-school and new-school Web analytics B) Bing and Google Web analytics C) off-site and on-site Web analytics D) data-based and subjective Web analytics

C) Web site visitors download few of your offered PDFs and videos.

36) Web site usability may be rated poor if A) the average number of page views on your Web site is large. B) the time spent on your Web site is long. C) Web site visitors download few of your offered PDFs and videos. D) users fail to click on all pages equally.

D) how well visitors understand your products.

37) Understanding which keywords your users enter to reach your Web site through a search engine can help you understand A) the hardware your Web site is running on. B) the type of Web browser being used by your Web site visitors. C) most of your Web site visitors' wants and needs. D) how well visitors understand your products.

B) Visitors who begin a purchase on most Web sites must complete it.

38) Which of the following statements about Web site conversion statistics is FALSE? A) Web site visitors can be classed as either new or returning. B) Visitors who begin a purchase on most Web sites must complete it. C) The conversion rate is the number of people who take action divided by the number of visitors. D) Analyzing exit rates can tell you why visitors left your Web site.

C) They have different costs to own and operate.

39) What is one major way in which Web-based social media differs from traditional publishing media? A) Most Web-based media are operated by the government and large firms. B) They use different languages of publication. C) They have different costs to own and operate. D) Web-based media have a narrower range of quality.

C) It examines the content of online conversations.

40) What does advanced analytics for social media do? A) It helps identify your followers. B) It identifies links between groups. C) It examines the content of online conversations. D) It identifies the biggest sources of influence online.

19) In which of the following cases will LDA fail? A. If the discriminatory information is not in the mean but in the variance of the data B. If the discriminatory information is in the mean but not in the variance of the data C. If the discriminatory information is in the mean and variance of the data D. None of these

Solution: (A) LDA separates classes by the difference in their means. If the class means coincide and the discriminatory information lies only in the variances, LDA has no mean separation to exploit and will fail.

Q10. Which of the following algorithm is most sensitive to outliers? A. K-means clustering algorithm B. K-medians clustering algorithm C. K-modes clustering algorithm D. K-medoids clustering algorithm

Solution: (A) Out of all the options, K-Means clustering algorithm is most sensitive to outliers as it uses the mean of cluster data points to find the cluster center.

Q30. Which of the following methods is used for finding the optimal number of clusters in the K-Means algorithm? A. Elbow method B. Manhattan method C. Euclidean method D. All of the above E. None of these

Solution: (A) Out of the given options, only the elbow method is used for finding the optimal number of clusters. The elbow method looks at the percentage of variance explained as a function of the number of clusters: one should choose a number of clusters such that adding another cluster doesn't give much better modeling of the data.
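A hedged sketch of the elbow method on synthetic blobs: plot K-Means inertia (the within-cluster sum of squares, a proxy for unexplained variance) against k and look for the bend.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)  # toy data
ks = range(1, 11)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]
plt.plot(list(ks), inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia (within-cluster SSE)")
plt.show()  # the "elbow" of the curve suggests the number of clusters
```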

5) [ True or False ] Dimensionality reduction algorithms are one of the possible ways to reduce the computation time required to build a model. A. TRUE B. FALSE

Solution: (A) Reducing the dimensionality of the data reduces the time needed to train a model.

Q4. Which of the following is the most appropriate strategy for data cleaning before performing clustering analysis, given a less-than-desirable number of data points: 1. Capping and flooring of variables 2. Removal of outliers Options: A. 1 only B. 2 only C. 1 and 2 D. None of the above

Solution: (A) Removal of outliers is not recommended if the data points are few in number. In this scenario, capping and flooring of variables is the most appropriate strategy.

7) [ True or False ] PCA can be used for projecting and visualizing data in lower dimensions. A. TRUE B. FALSE

Solution: (A) Sometimes it is very useful to plot the data in lower dimensions. We can take the first 2 principal components and then visualize the data using a scatter plot.
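A minimal sketch of that idea, using the Iris dataset as an assumed example:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
Z = PCA(n_components=2).fit_transform(X)  # 4 features -> first 2 PCs
plt.scatter(Z[:, 0], Z[:, 1], c=y)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```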

Q39. Which of the following gives the lower and upper bounds of the F-score? A. [0,1] B. (0,1) C. [-1,1] D. None of the above

Solution: (A) The lowest and highest possible values of the F-score are 0 and 1, with 1 representing that every data point is assigned to the correct cluster and 0 representing that the precision and/or recall of the clustering analysis is 0. In clustering analysis, a high F-score is desired.

Q20 Given, six points with the following attributes: https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114548/q20.jpg Which of the following clustering representations and dendrogram depicts the use of MAX or Complete link proximity function in hierarchical clustering: A. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114557/q20_a.png B. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114607/q20_b.png C. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114616/q20_c.png D. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114626/q20_d.png

Solution: (B) For the complete link or MAX version of hierarchical clustering, the proximity of two clusters is defined to be the maximum of the distance between any two points in the different clusters. Here too, points 3 and 6 are merged first. However, {3, 6} is then merged with {4} instead of {2, 5}, because dist({3, 6}, {4}) = max(dist(3, 4), dist(6, 4)) = max(0.1513, 0.2216) = 0.2216, which is smaller than dist({3, 6}, {2, 5}) = max(dist(3, 2), dist(6, 2), dist(3, 5), dist(6, 5)) = max(0.1483, 0.2540, 0.2843, 0.3921) = 0.3921 and dist({3, 6}, {1}) = max(dist(3, 1), dist(6, 1)) = max(0.2218, 0.2347) = 0.2347.
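The MAX computations above can be re-checked mechanically; the dictionary below contains exactly the pairwise distances quoted in the solution:

```python
# Pairwise distances quoted in the solution (point pair -> distance).
d = {(3, 4): 0.1513, (6, 4): 0.2216,
     (3, 2): 0.1483, (6, 2): 0.2540, (3, 5): 0.2843, (6, 5): 0.3921,
     (3, 1): 0.2218, (6, 1): 0.2347}

def complete_link(c1, c2):
    # Complete link: cluster proximity = max over all cross-cluster pairs.
    return max(d[(i, j)] for i in c1 for j in c2)

print(complete_link([3, 6], [4]))     # 0.2216
print(complete_link([3, 6], [2, 5]))  # 0.3921
print(complete_link([3, 6], [1]))     # 0.2347
```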

9) Suppose we are using dimensionality reduction as a pre-processing technique, i.e., instead of using all the features, we reduce the data to k dimensions with PCA, and then use these PCA projections as our features. Which of the following statements is correct? A. Higher 'k' means more regularization B. Higher 'k' means less regularization C. Can't Say

Solution: (B) Higher k leads to less smoothing, as we are able to preserve more characteristics of the data; hence less regularization.

Question Context 26 The below snapshot shows the scatter plot of two features (X1 and X2) with the class information (Red, Blue). You can also see the direction of PCA and LDA. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/03/06064252/Image_cont_26.jpg 26) Which of the following method would result into better class prediction? A. Building a classification algorithm with PCA (A principal component in direction of PCA) B. Building a classification algorithm with LDA C. Can't say D. None of these

Solution: (B) If our goal is to classify these points, PCA projection does more harm than good: the majority of blue and red points would land on top of each other along the first principal component, so PCA would confuse the classifier.

Q6. For two runs of K-Means clustering, is it expected to get the same clustering results? A. Yes B. No

Solution: (B) The K-Means clustering algorithm converges to local minima, which may coincide with the global minimum in some cases, but not always. Therefore, it's advised to run the K-Means algorithm multiple times before drawing inferences about the clusters. However, note that it's possible to get the same clustering results from K-Means by setting the same seed value for each run; that simply makes the algorithm choose the same set of random numbers each time.
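In scikit-learn terms (a sketch on assumed synthetic data): n_init controls how many random initializations are tried, and fixing random_state makes two runs reproduce each other exactly.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
a = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
b = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print((a.labels_ == b.labels_).all())  # True: same seed, same clustering
```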

24) Imagine, you are given the following scatterplot between height and weight. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/03/06064637/Image_24.jpg Select the angle which will capture maximum variability along a single axis? A. ~ 0 degree B. ~ 45 degree C. ~ 60 degree D. ~ 90 degree

Solution: (B) The ~45 degree direction captures the largest possible variance in the data.

Q14. In the figure below, if you draw a horizontal line at y = 2, what will be the number of clusters formed? https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05113621/Image2.jpg A. 1 B. 2 C. 3 D. 4

Solution: (B) Since the number of vertical lines in the dendrogram intersecting the red horizontal line at y = 2 is 2, two clusters will be formed.

Q15. What is the most appropriate no. of clusters for the data points represented by the following dendrogram: https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05113858/image3.png A. 2 B. 4 C. 6 D. 8

Solution: (B) The no. of clusters that best depicts the different groups can be chosen by observing the dendrogram. The best choice is the no. of vertical lines in the dendrogram cut by a horizontal line that can traverse the maximum distance vertically without intersecting a cluster. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05113935/q15sol.png In the above example, the best choice of no. of clusters is 4, as the red horizontal line in the dendrogram covers the maximum vertical distance AB.
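A hedged SciPy sketch of reading a dendrogram this way (synthetic data assumed): build the linkage matrix, then cut it with fcluster at a chosen height.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.RandomState(0).normal(size=(20, 2))  # toy data
Z = linkage(X, method="complete")                  # linkage matrix
labels = fcluster(Z, t=2.0, criterion="distance")  # cut at height y = 2
print(np.unique(labels))  # one integer id per cluster below the cut
```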

40) What is the optimum number of principal components in the below figure? https://cdn.analyticsvidhya.com/wp-content/uploads/2017/03/06061851/Image_40.jpg A. 7 B. 30 C. 40 D. Can't Say

Solution: (B) We can see in the above figure that about 30 components capture nearly all of the variance in the data; adding components beyond that yields almost no additional variance. Hence option 'B' is the right answer.
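A sketch of how such a curve is produced (random data assumed): plot the cumulative explained variance ratio against the number of components and pick the knee.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = np.random.RandomState(0).normal(size=(500, 50))  # toy data
pca = PCA().fit(X)
cum = np.cumsum(pca.explained_variance_ratio_)
plt.plot(np.arange(1, len(cum) + 1), cum, marker=".")
plt.xlabel("number of principal components")
plt.ylabel("cumulative explained variance")
plt.show()
```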

21) What will happen when eigenvalues are roughly equal? A. PCA will perform outstandingly B. PCA will perform badly C. Can't Say D. None of the above

Solution: (B) When all the eigenvalues are roughly equal, no direction captures substantially more variance than any other, so there is no meaningful way to order or select principal components; PCA performs badly.

Q33. What should be the best choice for number of clusters based on the following results: https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114826/q33.png A. 5 B. 6 C. 14 D. Greater than 14

Solution: (B) Based on the above results, the best choice of number of clusters using elbow method is 6.

Q24. Which of the following is/are valid iterative strategy for treating missing values before clustering analysis? A. Imputation with mean B. Nearest Neighbor assignment C. Imputation with Expectation Maximization algorithm D. All of the above

Solution: (C) All of the mentioned techniques are valid for treating missing values before clustering analysis but only imputation with EM algorithm is iterative in its functioning.

Q25. The K-Means algorithm has some limitations. One of its limitations is that it makes hard assignments of points to clusters (a point either completely belongs to a cluster or does not belong at all). Note: a soft assignment can be considered as a probability of being assigned to each cluster, say for K = 3 and some point xn: p1 = 0.7, p2 = 0.2, p3 = 0.1. Which of the following algorithm(s) allows soft assignments? 1. Gaussian mixture models 2. Fuzzy K-means Options: A. 1 only B. 2 only C. 1 and 2 D. None of these

Solution: (C) Both Gaussian mixture models and Fuzzy K-Means allow soft assignments.
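A minimal sketch of a soft assignment (synthetic data assumed): a Gaussian mixture returns one probability per cluster for every point, in contrast to K-Means' hard labels.

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
print(gmm.predict_proba(X[:3]))  # each row sums to 1, e.g. ~[0.7, 0.2, 0.1]
```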

Q32. Which of the following can be applied to get good results for the K-Means algorithm (i.e. results corresponding to the global minimum)? 1. Try to run the algorithm for different centroid initializations 2. Adjust the number of iterations 3. Find out the optimal number of clusters Options: A. 2 and 3 B. 1 and 3 C. 1 and 2 D. All of above

Solution: (D) All of these are standard practices that are used in order to obtain good clustering results.

Q31. What is true about K-Means clustering? 1. K-means is extremely sensitive to cluster center initializations 2. Bad initialization can lead to poor convergence speed 3. Bad initialization can lead to bad overall clustering Options: A. 1 and 3 B. 1 and 2 C. 2 and 3 D. 1, 2 and 3

Solution: (D) All three of the given statements are true. K-means is extremely sensitive to cluster center initialization, and bad initialization can lead to poor convergence speed as well as bad overall clustering.

Q38. Which of the following is/are not true about the DBSCAN clustering algorithm: 1. For data points to be in a cluster, they must be within a distance threshold of a core point 2. It has strong assumptions for the distribution of data points in dataspace 3. It has substantially high time complexity of order O(n³) 4. It does not require prior knowledge of the no. of desired clusters 5. It is robust to outliers Options: A. 1 only B. 2 only C. 4 only D. 2 and 3 E. 1 and 5 F. 1, 3 and 5

Solution: (D) DBSCAN can form a cluster of any arbitrary shape and does not have strong assumptions for the distribution of data points in the dataspace. DBSCAN has a low time complexity of order O(n log n) only.

Q40. Following are the results observed for clustering 6000 data points into 3 clusters: A, B and C: https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114846/q35.png What is the F1-Score with respect to cluster B? A. 3 B. 4 C. 5 D. 6

Solution: (D) Here, with respect to cluster B: True Positives, TP = 1200; True Negatives, TN = 600 + 1600 = 2200; False Positives, FP = 1000 + 200 = 1200; False Negatives, FN = 400 + 400 = 800. Therefore, Precision = TP / (TP + FP) = 0.5 and Recall = TP / (TP + FN) = 0.6. Hence, F1 = 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.545.
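The same arithmetic, spelled out in Python with the counts from the solution:

```python
tp = 1200        # true positives for cluster B
tn = 600 + 1600  # 2200
fp = 1000 + 200  # 1200
fn = 400 + 400   # 800

precision = tp / (tp + fp)                          # 0.5
recall = tp / (tp + fn)                             # 0.6
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, round(f1, 3))              # 0.5 0.6 0.545
```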

Q16. In which of the following cases will K-Means clustering fail to give good results? 1. Data points with outliers 2. Data points with different densities 3. Data points with round shapes 4. Data points with non-convex shapes Options: A. 1 and 2 B. 2 and 3 C. 2 and 4 D. 1, 2 and 4 E. 1, 2, 3 and 4

Solution: (D) K-Means clustering algorithm fails to give good results when the data contains outliers, the density spread of data points across the data space is different and the data points follow non-convex shapes. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114142/q16sol.png

14) Which of the following statement is correct for t-SNE and PCA? A. t-SNE is linear whereas PCA is non-linear B. t-SNE and PCA both are linear C. t-SNE and PCA both are nonlinear D. t-SNE is nonlinear whereas PCA is linear

Solution: (D) Option D is correct. Read the explanation from this link. https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/

Q9. Which of the following clustering algorithms suffers from the problem of convergence at local optima? 1. K-Means clustering algorithm 2. Agglomerative clustering algorithm 3. Expectation-Maximization clustering algorithm 4. Diverse clustering algorithm Options: A. 1 only B. 2 and 3 C. 2 and 4 D. 1 and 3 E. 1, 2 and 4 F. All of the above

Solution: (D) Out of the options given, only the K-Means clustering algorithm and the EM clustering algorithm have the drawback of converging at local minima.

25) Which of the following option(s) is / are true? 1. You need to initialize parameters in PCA 2. You don't need to initialize parameters in PCA 3. PCA can be trapped into local minima problem 4. PCA can't be trapped into local minima problem A. 1 and 3 B. 1 and 4 C. 2 and 3 D. 2 and 4

Solution: (D) PCA is a deterministic algorithm which has no parameters to initialize, and it doesn't have the local minima problem that most machine learning algorithms have.

Q22. Given six points with the following attributes: https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114723/q22.jpg Which of the following clustering representations and dendrogram depicts the use of Ward's method proximity function in hierarchical clustering: A. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114741/q22_a.png B. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114750/q22_b.png C. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114759/q22_c.png D. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114808/q22_d.png

Solution: (D) Ward's method is a centroid method. Centroid methods calculate the proximity between two clusters as the distance between the centroids of the clusters. For Ward's method, the proximity between two clusters is defined as the increase in the squared error that results when the two clusters are merged. The figure shows the results of applying Ward's method to the sample data set of six points; the resulting clustering is somewhat different from those produced by MIN, MAX, and group average.

23) What happens when you get features in lower dimensions using PCA? 1. The features will still have interpretability 2. The features will lose interpretability 3. The features must carry all information present in data 4. The features may not carry all information present in data A. 1 and 3 B. 1 and 4 C. 2 and 3 D. 2 and 4

Solution: (D) When you get features in lower dimensions, you will usually lose some of the information in the data, and you won't be able to interpret the lower-dimensional features.

20) Which of the following comparison(s) are true about PCA and LDA? 1. Both LDA and PCA are linear transformation techniques 2. LDA is supervised whereas PCA is unsupervised 3. PCA maximizes the variance of the data, whereas LDA maximizes the separation between different classes A. 1 and 2 B. 2 and 3 C. 1 and 3 D. Only 3 E. 1, 2 and 3

Solution: (E) All of the options are correct

Q13. What could be the possible reason(s) for producing two different dendrograms using an agglomerative clustering algorithm for the same dataset? A. Proximity function used B. No. of data points used C. No. of variables used D. B and C only E. All of the above

Solution: (E) A change in any of the proximity function, the no. of data points, or the no. of variables will lead to different clustering results and hence a different dendrogram.

Q1. Movie Recommendation systems are an example of: 1. Classification 2. Clustering 3. Reinforcement Learning 4. Regression Options: A. 1 Only B. 2 Only C. 1 and 2 D. 1 and 3 E. 2 and 3 F. 1, 2 and 3 G. 1, 2, 3 and 4

Solution: (E) Generally, movie recommendation systems cluster the users in a finite number of similar groups based on their previous activities and profile. Then, at a fundamental level, people in the same cluster are made similar recommendations. In some scenarios, this can also be approached as a classification problem for assigning the most appropriate movie class to the user of a specific group of users. Also, a movie recommendation system can be viewed as a reinforcement learning problem where it learns by its previous recommendations and improves the future recommendations.

Q2. Sentiment Analysis is an example of: 1. Regression 2. Classification 3. Clustering 4. Reinforcement Learning Options: A. 1 Only B. 1 and 2 C. 1 and 3 D. 1, 2 and 3 E. 1, 2 and 4 F. 1, 2, 3 and 4

Solution: (E) Sentiment analysis at the fundamental level is the task of classifying the sentiments represented in an image, text or speech into a set of defined sentiment classes like happy, sad, excited, positive, negative, etc. It can also be viewed as a regression problem for assigning a sentiment score of say 1 to 10 for a corresponding image, text or speech. Another way of looking at sentiment analysis is to consider it using a reinforcement learning perspective where the algorithm constantly learns from the accuracy of past sentiment analysis performed to improve the future performance.

8) The most popularly used dimensionality reduction algorithm is Principal Component Analysis (PCA). Which of the following is/are true about PCA? 1. PCA is an unsupervised method 2. It searches for the directions in which the data has the largest variance 3. Maximum number of principal components <= number of features 4. All principal components are orthogonal to each other A. 1 and 2 B. 1 and 3 C. 2 and 3 D. 1, 2 and 3 E. 1, 2 and 4 F. All of the above

Solution: (F) All four statements are true of PCA.

Q12. How can Clustering (Unsupervised Learning) be used to improve the accuracy of Linear Regression model (Supervised Learning): 1. Creating different models for different cluster groups. 2. Creating an input feature for cluster ids as an ordinal variable. 3. Creating an input feature for cluster centroids as a continuous variable. 4. Creating an input feature for cluster size as a continuous variable. Options: A. 1 only B. 1 and 2 C. 1 and 4 D. 3 only E. 2 and 4 F. All of the above

Solution: (F) Creating an input feature for cluster ids as ordinal variable or creating an input feature for cluster centroids as a continuous variable might not convey any relevant information to the regression model for multidimensional data. But for clustering in a single dimension, all of the given methods are expected to convey meaningful information to the regression model. For example, to cluster people in two groups based on their hair length, storing clustering ID as ordinal variable and cluster centroids as continuous variables will convey meaningful information.
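A hedged sketch of ideas 1 and 3 on assumed synthetic data: fit K-Means, then append each point's distance to every centroid (a continuous, centroid-derived stand-in for idea 3) before fitting the regression.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
# km.transform gives each point's distance to every centroid.
X_aug = np.hstack([X, km.transform(X)])
print(LinearRegression().fit(X_aug, y).score(X_aug, y))
```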

Q29. Feature scaling is an important step before applying the K-Means algorithm. What is the reason behind this? A. In distance calculation it will give the same weight to all features B. You always get the same clusters whether or not you use feature scaling C. In Manhattan distance it is an important step but in Euclidean distance it is not D. None of these

Solution: (A) Feature scaling ensures that all the features get the same weight in the clustering analysis. Consider a scenario of clustering people based on their weights (in kg) with range 55-110 and heights (in inches) with range 5.6 to 6.4. In this case, the clusters produced without scaling can be very misleading, as the range of weight is much higher than that of height. Therefore, it's necessary to bring both variables to the same scale so that they have equal weightage on the clustering result.
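A sketch of the weight/height example (randomly generated values in the stated ranges): without scaling, the weight range dominates the Euclidean distance, and standardizing typically changes the clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
weight = rng.uniform(55, 110, 100)   # range 55-110, as in the example
height = rng.uniform(5.6, 6.4, 100)  # range 5.6-6.4, as in the example
X = np.column_stack([weight, height])

raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
std = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))
# Label ids are arbitrary, so this is only an informal comparison: many
# points typically change cluster once both features are standardized.
print((raw != std).sum())
```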

D) parsing the documents.

33) Breaking up a Web page into its components to identify worthy words/terms and indexing them using a set of rules is called A) preprocessing the documents. B) document analysis. C) creating the term-by-document matrix. D) parsing the documents.

A) categorizing a block of text in a sentence.

22) In text mining, tokenizing is the process of A) categorizing a block of text in a sentence. B) reducing multiple words to their base or root. C) transforming the term-by-document matrix to a manageable size. D) creating new branches or stems of recorded paragraphs.

A) dividing up a text into individual words in English.

23) All of the following are challenges associated with natural language processing EXCEPT A) dividing up a text into individual words in English. B) understanding the context in which something is said. C) distinguishing between words that have more than one meaning. D) recognizing typographical or grammatical errors in texts

D) all of these

24) Natural language processing (NLP) is associated with which of the following areas? A) text mining B) artificial intelligence C) computational linguistics D) all of these

B) The customer service I got for my TV was laughable.

26) In sentiment analysis, which of the following is an implicit opinion? A) The hotel we stayed in was terrible. B) The customer service I got for my TV was laughable. C) The cruise we went on last summer was a disaster. D) Our new mayor is great for the city.

A) They examine customer sentiment at the aggregate level.

28) What do voice of the market (VOM) applications of sentiment analysis do? A) They examine customer sentiment at the aggregate level. B) They examine employee sentiment in the organization. C) They examine the stock market for trends. D) They examine the "market of ideas" in politics.

C) use an English lexicon appropriate to the project at your discretion.

29) Sentiment analysis projects require a lexicon for use. If a project in English is undertaken, you must generally make sure to A) use only the single, approved English lexicon. B) use any general English lexicon. C) use an English lexicon appropriate to the project at your discretion. D) create an English lexicon for the project.

A) a catalog of words, their synonyms, and their meanings

30) In text analysis, what is a lexicon? A) a catalog of words, their synonyms, and their meanings B) a catalog of customers, their words, and phrases C) a catalog of letters, words, phrases, and sentences D) a catalog of customers, products, words, and phrases

B) small- to medium-sized documents

31) What types of documents are BEST suited to semantic labeling and aggregation to determine sentiment orientation? A) medium- to large-sized documents B) small- to medium-sized documents C) large-sized documents D) collections of documents

B) analyzing the unstructured content of Web pages

32) What does Web content mining involve? A) analyzing the universal resource locator in Web pages B) analyzing the unstructured content of Web pages C) analyzing the pattern of visits to a Web site D) analyzing the PageRank and other metadata of a Web page

Q27. Assume you want to cluster 7 observations into 3 clusters using the K-Means clustering algorithm. After the first iteration, clusters C1, C2, C3 have the following observations: C1: {(2,2), (4,4), (6,6)} C2: {(0,4), (4,0)} C3: {(5,5), (9,9)} What will be the Manhattan distance of observation (9, 9) from cluster centroid C1 in the second iteration? A. 10 B. 5*sqrt(2) C. 13*sqrt(2) D. None of these

Solution: (A) Manhattan distance between centroid C1, i.e. (4, 4), and (9, 9) = |9-4| + |9-4| = 10. (See the NumPy check after Q26's solution below.)

16) Which of the following statements is true about t-SNE in comparison to PCA? A. When the data is huge (in size), t-SNE may fail to produce better results. B. t-SNE always produces better results regardless of the size of the data C. PCA always performs better than t-SNE for smaller size data. D. None of these

Solution: (A) t-SNE's quadratic time and memory complexity makes it impractical for very large datasets, where it may fail to produce better results.

1) Imagine you have 1000 input features and 1 target feature in a machine learning problem. You have to select the 100 most important features based on the relationship between the input features and the target feature. Do you think this is an example of dimensionality reduction? A. Yes B. No

Solution: (A) Selecting a subset of the features based on their relationship with the target (feature selection) is one form of dimensionality reduction.

Q18. Which of the following are true? 1. Clustering analysis is negatively affected by multicollinearity of features 2. Clustering analysis is negatively affected by heteroscedasticity Options: A. 1 only B. 2 only C. 1 and 2 D. None of them

Solution: (A) Clustering analysis is not negatively affected by heteroscedasticity, but the results are negatively impacted by multicollinearity of the features/variables used in clustering, as a correlated feature/variable carries more weight in the distance calculation than desired.

Q3. Can decision trees be used for performing clustering? A. True B. False

Solution: (A) Decision trees can also be used to form clusters in the data, but clustering often generates natural clusters and is not dependent on any objective function.

Q26. Assume you want to cluster 7 observations into 3 clusters using the K-Means clustering algorithm. After the first iteration, clusters C1, C2, C3 have the following observations: C1: {(2,2), (4,4), (6,6)} C2: {(0,4), (4,0)} C3: {(5,5), (9,9)} What will be the cluster centroids if you want to proceed to the second iteration? A. C1: (4,4), C2: (2,2), C3: (7,7) B. C1: (6,6), C2: (4,4), C3: (9,9) C. C1: (2,2), C2: (0,0), C3: (5,5) D. None of these

Solution: (A) Finding centroid for data points in cluster C1 = ((2+4+6)/3, (2+4+6)/3) = (4, 4) Finding centroid for data points in cluster C2 = ((0+4)/2, (4+0)/2) = (2, 2) Finding centroid for data points in cluster C3 = ((5+9)/2, (5+9)/2) = (7, 7) Hence, C1: (4,4), C2: (2,2), C3: (7,7)
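Both Q26 and Q27 can be verified with a few lines of NumPy: compute the new centroids as coordinate-wise means, then the Manhattan distance from (9, 9) to C1's centroid.

```python
import numpy as np

C1 = np.array([[2, 2], [4, 4], [6, 6]])
C2 = np.array([[0, 4], [4, 0]])
C3 = np.array([[5, 5], [9, 9]])

centroids = [c.mean(axis=0) for c in (C1, C2, C3)]
print(centroids)  # (4, 4), (2, 2), (7, 7) -- option A of Q26

# Manhattan distance from (9, 9) to centroid C1 = (4, 4), as in Q27:
print(np.abs(np.array([9, 9]) - centroids[0]).sum())  # 10.0
```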

Q19. Given, six points with the following attributes: https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114502/Q19qu.jpg Which of the following clustering representations and dendrogram depicts the use of MIN or Single link proximity function in hierarchical clustering: A. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114511/q19_a.png B. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114521/q19_b.png C. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114530/q19_c.png D. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114539/q19_d.png

Solution: (A) For the single link or MIN version of hierarchical clustering, the proximity of two clusters is defined to be the minimum of the distance between any two points in the different clusters. For instance, from the table, we see that the distance between points 3 and 6 is 0.11, and that is the height at which they are joined into one cluster in the dendrogram. As another example, the distance between clusters {3, 6} and {2, 5} is given by dist({3, 6}, {2, 5}) = min(dist(3, 2), dist(6, 2), dist(3, 5), dist(6, 5)) = min(0.1483, 0.2540, 0.2843, 0.3921) = 0.1483.

4) Which of the following techniques would perform better for reducing dimensions of a data set? A. Removing columns which have too many missing values B. Removing columns which have high variance in data C. Removing columns with dissimilar data trends D. None of these

Solution: (A) If a column has too many missing values (say 99%), then we can remove such a column, since it carries very little information.

Q28. If two variables, V1 and V2, are used for clustering, which of the following are true for K-Means clustering with k = 3? 1. If V1 and V2 have a correlation of 1, the cluster centroids will be in a straight line 2. If V1 and V2 have a correlation of 0, the cluster centroids will be in a straight line Options: A. 1 only B. 2 only C. 1 and 2 D. None of the above

Solution: (A) If the correlation between the variables V1 and V2 is 1, then all the data points will be in a straight line. Hence, all the three cluster centroids will form a straight line as well.

2) [ True or False ] It is not necessary to have a target variable for applying dimensionality reduction algorithms. A. TRUE B. FALSE

Solution: (A) A target variable is not necessary: unsupervised techniques such as PCA require no target. (LDA, by contrast, is an example of a supervised dimensionality reduction algorithm.)

Q21 Given, six points with the following attributes: https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114636/q21.jpg Which of the following clustering representations and dendrogram depicts the use of Group average proximity function in hierarchical clustering: A. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114646/q21_a.png B. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114655/q21_b.png C. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114704/q21_c.png D. https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114713/q21_d.png

Solution: (C) For the group average version of hierarchical clustering, the proximity of two clusters is defined to be the average of the pairwise proximities between all pairs of points in the different clusters. This is an intermediate approach between MIN and MAX. It is expressed by the following equation: https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114732/q22.png Here are the distances between some of the clusters: dist({3, 6, 4}, {1}) = (0.2218 + 0.3688 + 0.2347)/(3 ∗ 1) = 0.2751; dist({2, 5}, {1}) = (0.2357 + 0.3421)/(2 ∗ 1) = 0.2889; dist({3, 6, 4}, {2, 5}) = (0.1483 + 0.2843 + 0.2540 + 0.3921 + 0.2042 + 0.2932)/(3 ∗ 2) ≈ 0.2627. Because dist({3, 6, 4}, {2, 5}) is smaller than dist({3, 6, 4}, {1}) and dist({2, 5}, {1}), these two clusters are merged at the fourth stage.
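The group-average arithmetic can be re-checked mechanically; the dictionary below contains exactly the pairwise distances quoted in the solution:

```python
# Pairwise distances quoted in the solution (point pair -> distance).
d = {(3, 1): 0.2218, (6, 1): 0.3688, (4, 1): 0.2347,
     (2, 1): 0.2357, (5, 1): 0.3421,
     (3, 2): 0.1483, (3, 5): 0.2843, (6, 2): 0.2540,
     (6, 5): 0.3921, (4, 2): 0.2042, (4, 5): 0.2932}

def group_average(c1, c2):
    # Group average: mean of all pairwise cross-cluster distances.
    pairs = [d[(i, j)] for i in c1 for j in c2]
    return sum(pairs) / len(pairs)

print(round(group_average([3, 6, 4], [1]), 4))     # 0.2751
print(round(group_average([2, 5], [1]), 4))        # 0.2889
print(round(group_average([3, 6, 4], [2, 5]), 4))  # 0.2627
```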

Q34. What should be the best choice for number of clusters based on the following results: https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114837/q34.jpg A. 2 B. 4 C. 6 D. 8

Solution: (C) Generally, a higher average silhouette coefficient indicates better clustering quality. In this plot, the average silhouette coefficient is highest at k = 2. However, the SSE of that clustering solution (k = 2) is too large. At k = 6, the SSE is much lower, and the average silhouette coefficient at k = 6 is also very high, just below that of k = 2. Thus, the best choice is k = 6.

Q36. If you are using multinomial mixture models with the expectation-maximization algorithm for clustering a set of data points into two clusters, which of the following assumptions is important? A. All the data points follow two Gaussian distributions B. All the data points follow n Gaussian distributions (n > 2) C. All the data points follow two multinomial distributions D. All the data points follow n multinomial distributions (n > 2)

Solution: (C) In the EM algorithm for clustering, it is essential to choose the number of mixture components equal to the desired number of clusters, and the component distributions must be of the same type as the distributions the data points are assumed to be generated from; here, two multinomial distributions.

22) PCA works better if there is: 1. A linear structure in the data 2. Data lying on a curved surface and not on a flat surface 3. Variables scaled in the same unit A. 1 and 2 B. 2 and 3 C. 1 and 3 D. 1, 2 and 3

Solution: (C) PCA is a linear method that finds directions of maximum variance, so it benefits from a linear structure in the data (1) and from variables scaled in the same unit (3); it handles data lying on a curved surface (2) poorly.

27) Which of the following options are correct, when you are applying PCA on an image dataset? 1. It can be used to effectively detect deformable objects. 2. It is invariant to affine transforms. 3. It can be used for lossy image compression. 4. It is not invariant to shadows. A. 1 and 2 B. 2 and 3 C. 3 and 4 D. 1 and 4

Solution: (C) PCA can be used for lossy image compression (3), and it is not invariant to shadows or other illumination changes (4); it neither detects deformable objects effectively nor is it invariant to affine transforms.

Q23. What should be the best choice of no. of clusters based on the following results: https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05114817/q23.png A. 1 B. 2 C. 3 D. 4

Solution: (C) The silhouette coefficient is a measure of how similar an object is to its own cluster compared to other clusters. Number of clusters for which silhouette coefficient is highest represents the best choice of the number of clusters.

10) In which of the following scenarios is t-SNE better to use than PCA for dimensionality reduction while working on a local machine with minimal computational power? A. Dataset with 1 Million entries and 300 features B. Dataset with 100000 entries and 310 features C. Dataset with 10,000 entries and 8 features D. Dataset with 10,000 entries and 200 features

Solution: (C) t-SNE has quadratic time and space complexity. Thus it is a very heavy algorithm in terms of system resource utilization.

Q11. After performing K-Means Clustering analysis on a dataset, you observed the following dendrogram. Which of the following conclusion can be drawn from the dendrogram? https://cdn.analyticsvidhya.com/wp-content/uploads/2017/02/05113106/Image1.png A. There were 28 data points in clustering analysis B. The best no. of clusters for the analyzed data points is 4 C. The proximity function used is Average-link clustering D. The above dendrogram interpretation is not possible for K-Means clustering analysis

Solution: (D) A dendrogram is not possible for K-Means clustering analysis. However, one can create a clustergram based on a K-Means clustering analysis.

Q8. Which of the following can act as possible termination conditions in K-Means? 1. For a fixed number of iterations. 2. Assignment of observations to clusters does not change between iterations (except for cases with a bad local minimum). 3. Centroids do not change between successive iterations. 4. Terminate when RSS falls below a threshold. Options: A. 1, 3 and 4 B. 1, 2 and 3 C. 1, 2 and 4 D. All of the above

Solution: (D) All four conditions can be used as possible termination conditions in K-Means clustering: 1. This condition limits the runtime of the clustering algorithm, but in some cases the quality of the clustering will be poor because of an insufficient number of iterations. 2. Except for cases with a bad local minimum, this produces a good clustering, but runtimes may be unacceptably long. 3. This also ensures that the algorithm has converged at a minimum. 4. Terminating when RSS falls below a threshold ensures that the clustering is of a desired quality; practically, it's good practice to combine this with a bound on the number of iterations to guarantee termination.
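As a hedged sketch (not a production implementation; empty-cluster handling is omitted), here is a bare K-Means loop wiring in all four termination tests:

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, rss_threshold=None, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # Forgy init
    labels = None
    for _ in range(max_iter):                      # condition 1: iteration cap
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and (new_labels == labels).all():
            break                                  # condition 2: assignments stable
        labels = new_labels
        rss = (dists[np.arange(len(X)), labels] ** 2).sum()
        if rss_threshold is not None and rss < rss_threshold:
            break                                  # condition 4: RSS below threshold
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.linalg.norm(new_centroids - centroids) < tol:
            break                                  # condition 3: centroids stable
        centroids = new_centroids
    return labels, centroids

labels, centers = kmeans(np.random.default_rng(1).normal(size=(100, 2)), k=3)
```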

6) Which of the following algorithms cannot be used for reducing the dimensionality of data? A. t-SNE B. PCA C. LDA D. None of these

Solution: (D) All of these algorithms are examples of dimensionality reduction algorithms.

15) In the t-SNE algorithm, which of the following hyperparameters can be tuned? A. Number of dimensions B. Smooth measure of effective number of neighbours C. Maximum number of iterations D. All of the above

Solution: (D) All of the hyperparameters in the options can be tuned.

Q17. Which of the following metrics, do we have for finding dissimilarity between two clusters in hierarchical clustering? 1. Single-link 2. Complete-link 3. Average-link Options: A. 1 and 2 B. 1 and 3 C. 2 and 3 D. 1, 2 and 3

Solution: (D) All of the three methods i.e. single link, complete link and average link can be used for finding dissimilarity between two clusters in hierarchical clustering.

