INF 141 / CS 121 Information Retrieval Quiz 1 S17
Which of the following statements is *true* with regard to stemming and lemmatization? - Both try to reduce morphological word variants to a single version - Lemmatization is easier to perform and does not require morphological analysis - Lemmatization reduces morphological variations of words to a common stem - Stemming removes inflection to arrive at the base dictionary form of the word
Both try to reduce morphological word variants to a single version
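The distinction behind this answer can be illustrated with a toy sketch. The suffix list and the lemma table below are hypothetical and chosen only for illustration; real stemmers (e.g. Porter) and lemmatizers use far richer rules and dictionaries. Both functions map word variants to a single form, but only the lemmatizer aims for a dictionary form.

```python
# Toy illustration only: SUFFIXES and LEMMAS are hypothetical, not a real algorithm.
SUFFIXES = ("ing", "ed", "es", "s")
LEMMAS = {"ran": "run", "running": "run", "studies": "study"}


def toy_stem(word):
    """Crudely strip one known suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in a (tiny, assumed) dictionary of base forms."""
    return LEMMAS.get(word, word)


for w in ("running", "studies", "ran"):
    print(w, toy_stem(w), toy_lemmatize(w))
```

Under these assumptions, "studies" stems to "studi" (not a dictionary word) but lemmatizes to "study", and the irregular form "ran" is only resolved by the dictionary lookup.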
Which of the following black hat techniques relies on redirecting visitors (HTTP 302) to a different site? - Keyword Stuffing - Cloaking - Doorway Pages - Clicker bots
Doorway Pages
Which of the following web page features is *not* a legitimate approach to search engine optimization (SEO)? - Age - Having relevant content, well written and organized. - Implementing web standards and good practices - Duplicating important content to make sure users find it.
Duplicating important content to make sure users find it.
Which of the following sentences is *true* regarding web search engines? - Recall refers to the relevance of the first few results - Precision refers to the number of relevant results that are presented - In general web IR, precision is more important than recall - Web search engines consider the web as a graph, where hype
In general web IR, precision is more important than recall
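For reference, the standard set-based definitions are precision = relevant retrieved / retrieved and recall = relevant retrieved / relevant. A minimal sketch follows; the function name and the document IDs are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 results returned, 5 relevant documents exist, 2 overlap.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5", "d6", "d7"])
print(p, r)  # 0.5 0.4
```

On the web, users rarely look past the first results and the set of all relevant pages is effectively unknowable, which is one common argument for why precision matters more than recall there.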
Which of the following is *not* a mapping rule for token canonicalization? - Removing characters such as hyphens, periods and accents. - Reducing all letters to lower case (case-folding) - Collapsing alternate spellings (colour -> color) - Keeping synonyms as different classes to include more diverse tokens
Keeping synonyms as different classes to include more diverse tokens
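The three valid mapping rules listed in the question can be sketched as one small function. This is a minimal sketch, not a complete normalizer; the `SPELLING_VARIANTS` table is a hypothetical one-entry stand-in for a real spelling dictionary.

```python
import unicodedata

# Hypothetical table of alternate spellings; a real system would use a larger one.
SPELLING_VARIANTS = {"colour": "color"}


def canonicalize(token):
    """Map a token to an assumed canonical form (equivalence class)."""
    # Rule: case-folding.
    token = token.casefold()
    # Rule: remove accents by decomposing characters and dropping combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    token = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Rule: remove hyphens and periods.
    token = token.replace("-", "").replace(".", "")
    # Rule: collapse alternate spellings.
    return SPELLING_VARIANTS.get(token, token)


for t in ("U.S.A.", "Naïve", "anti-discriminatory", "Colour"):
    print(t, "->", canonicalize(t))
```

Each rule merges variants into one class so that a query for one form matches documents containing the others.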
Which of the following web crawling issues is related to client-side scripting? - Impolite crawler, which hits the same website too often. - Crawler traps - Data noise - Missed content
Missed content
Which of the following is *not* a characteristic of the World Wide Web (WWW)? - Large and dynamic corpus - There is information to avoid (spam, misleading and false information) - Mostly stable, long-living content - High linkage
Mostly stable, long-living content
Which of the following is *not* a property of popular web search engines? - Including spell check - Linking to resources (maps, images, etc.) by guessing what the user is looking for - Suggesting alternative searches - Not interpreting syntactic cues, such as math equations
Not interpreting syntactic cues, such as math equations
Consider the following sentence: "I dance the Macarena song on the dance floor." Which of the following statements is *false*? - Using assignment #1 methodology, the sentence contains 7 tokens - The sentence has 8 bigrams (i.e., 8 2-grams) - Dance (verb) and dance (adjective) are considered the same token - The sentence has 9 words. Therefore, we can find 9^3 trigrams (i.e., 729 3-grams)
The sentence has 9 words. Therefore, we can find 9^3 trigrams (i.e., 729 3-grams)
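A sequence of N words yields N - n + 1 contiguous n-grams, so 9 words give 8 bigrams and 7 trigrams, not 9^3. The counts can be checked with a short sketch. The tokenizer here (lowercase, split on non-alphanumeric characters) is an assumption; the source does not spell out the assignment #1 methodology.

```python
import re


def tokenize(text):
    # Assumed tokenizer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams as tuples (len(tokens) - n + 1 of them)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "I dance the Macarena song on the dance floor."
tokens = tokenize(sentence)

print(len(tokens))             # 9 words
print(len(set(tokens)))        # 7 distinct tokens ("the" and "dance" repeat)
print(len(ngrams(tokens, 2)))  # 8 bigrams
print(len(ngrams(tokens, 3)))  # 7 trigrams
```

Under this assumed tokenizer, the 7 in the first option matches the number of distinct tokens, since both occurrences of "dance" collapse into the same token.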
Which of the following is *not* a characteristic of Information Retrieval (IR)? - Classic IR originated in the scientific domain and library records. - Finding material of an unstructured nature - Satisfies an information need from within large collections - Web IR mainly focuses on static content
Web IR mainly focuses on static content