INF 141 / CS 121 Information Retrieval Quiz 1 S17

Which of the following statements is *true* with regard to stemming and lemmatization? - Both try to reduce morphological word variants to a single version - Lemmatization is easier to perform and does not require morphological analysis - Lemmatization reduces morphological variations of words to a common stem - Stemming removes inflection to arrive at the base dictionary form of the word.

Both try to reduce morphological word variants to a single version

Which of the following black hat techniques focuses on web re-directing (HTTP 302) to a different site? - Keyword Stuffing - Cloaking - Doorway Pages - Clicker bots

Doorway Pages

Which of the following web page features is *not* a legitimate approach to search engine optimization (SEO)? - Age - Having relevant content, well written and organized. - Implementing web standards and good practices - Duplicating important content to make sure users find it.

Duplicating important content to make sure users find it.

Which of the following sentences is *true* regarding web search engines? - Recall refers to the relevance of the first few results - Precision refers to the number of relevant results that are presented - In general web IR, precision is more important than recall - Web search engines consider the web as a graph, where hype

In general web IR, precision is more important than recall
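For reference, the standard set-based definitions of precision and recall that the options above paraphrase can be sketched as follows. The document IDs and relevance judgments are invented for illustration and do not come from the quiz.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical example: 4 results returned, 6 relevant documents exist.
retrieved_docs = {"d1", "d2", "d3", "d4"}
relevant_docs = {"d1", "d3", "d5", "d6", "d7", "d8"}

p = precision(retrieved_docs, relevant_docs)  # 2 of 4 retrieved are relevant
r = recall(retrieved_docs, relevant_docs)     # 2 of 6 relevant were retrieved
print(p, r)
```

On the web a user rarely looks past the first page of results, so a high share of relevant hits among those shown (precision) usually matters more than finding every relevant page (recall).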

Which of the following is *not* a mapping rule for token canonicalization? - Removing characters such as hyphens, periods and accents. - Reducing all letters to lower case (case-folding) - Collapsing alternate spellings (colour -> color) - Keeping synonyms as different classes to include more diverse tokens

Keeping synonyms as different classes to include more diverse tokens
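The three legitimate mapping rules listed in the question can be sketched in a few lines. This is a minimal illustration, not the course's reference implementation: `SPELLING_MAP` and `canonicalize` are hypothetical names, and the one-entry spelling table stands in for a real variant dictionary.

```python
import unicodedata

# Hypothetical table of alternate spellings; a real system would use a larger one.
SPELLING_MAP = {"colour": "color"}


def canonicalize(token):
    """Map a raw token to a canonical form using the three rules above."""
    # Rule: case-folding.
    token = token.lower()
    # Rule: strip accents by decomposing and dropping combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    token = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Rule: remove hyphens and periods.
    token = token.replace("-", "").replace(".", "")
    # Rule: collapse alternate spellings.
    return SPELLING_MAP.get(token, token)


print(canonicalize("U.S.A."))      # usa
print(canonicalize("Café"))        # cafe
print(canonicalize("Colour"))      # color
print(canonicalize("co-operate"))  # cooperate
```

Each rule merges variants into one class, which is why keeping synonyms apart is the odd one out.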

Which of the following web crawling issues is related to client-side scripting? - Impolite crawler, which hits the same website too often. - Crawler traps - Data noise - Missed content

Missed content

Which of the following is *not* a characteristic of the World Wide Web (WWW)? - Large and dynamic corpus - There is information to avoid (spam, misleading and false information) - Mostly stable, long-living content - High linkage

Mostly stable, long-living content

Which of the following is *not* a property of popular web search engines? - Including spell check - Linking to resources (maps, images, etc.) by guessing what the user is looking for - Suggesting alternative searches - Not interpreting syntactic cues, such as math equations

Not interpreting syntactic cues, such as math equations

Consider the following sentence: "I dance the Macarena song on the dance floor." Which of the following statements is *false*? - Using assignment #1 methodology, the sentence contains 7 tokens - The sentence has 8 bigrams (i.e., 8 2-grams) - Dance (verb) and dance (adjective) are considered the same token - The sentence has 9 words. Therefore, we can find 9^3 trigrams (i.e., 729 3-grams)

The sentence has 9 words. Therefore, we can find 9^3 trigrams (i.e., 729 3-grams)
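The counts in this question can be checked with a short sketch. The assignment #1 methodology is not described here, so the tokenizer below is an assumption: it lowercases the text, keeps alphanumeric runs, and counts distinct tokens. `tokenize` and `ngrams` are illustrative names.

```python
import re


def tokenize(text):
    """Assumed tokenizer: lowercase, keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """All contiguous n-grams; a sequence of k tokens yields k - n + 1."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "I dance the Macarena song on the dance floor."
words = tokenize(sentence)
distinct_tokens = set(words)
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)

print(len(words))            # 9 words
print(len(distinct_tokens))  # 7 distinct tokens ("the" and "dance" repeat)
print(len(bigrams))          # 8 bigrams
print(len(trigrams))         # 7 trigrams
```

A sentence of 9 words has 9 - 3 + 1 = 7 trigrams. The figure 9^3 = 729 would be the number of possible ordered triples drawn from 9 items, not the number that occur in the sentence.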

Which of the following is *not* a characteristic of Information Retrieval (IR)? - Classic IR originated in the scientific domain and library records. - Finding material of an unstructured nature - Satisfies an information need from within large collections - Web IR mainly focuses on static content

Web IR mainly focuses on static content

