NLP

POS Tagging

Assigning the correct part of speech (POS) to each word in a corpus. Steps: tokenization, then POS tagging

N-grams Brown Corpus

Brown Corpus: a famous corpus of about 1 million words

Use and Tool

Closed class words: can be enumerated, are used frequently, and typically appear on stop lists

Word classes

Nouns, verbs

Tokenizer

Recognizes words and their root words, e.g. playing -> play

Syntactic ambiguity

The same sequence of words can have different meanings, e.g. "Flying planes can be dangerous". Meaning also affects structure, e.g. "I washed the shirt with soap" (one reading groups "shirt" and "soap" together)

N-Grams

Sentences are sequences of words; rules govern how the words can be ordered.

N-grams Continued 2

Simplest model: any word can follow any other word. Frequency-based model: how often does a word occur?

Morphology

Single words are made up of morphemes: stems + affixes. Example: test -> tests, testing

N-grams In Information Systems

Spell check, text analysis in business

Assigning structure to text

Step 1: POS tagging: rule-based, stochastic, and transformation-based learning
Step 2: CFG (context-free grammar): rules on how to combine chunks
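As a rough illustration of the rule-based approach to Step 1, here is a minimal sketch of a toy tagger. The lexicon, the suffix rules, the tag names, and the function name `tag` are all made up for this example; a real tagger relies on far larger resources.

```python
# Toy rule-based POS tagger: look the word up in a small lexicon of
# closed class words, then fall back to suffix rules, then to NOUN.
# LEXICON and SUFFIX_RULES are hypothetical and far from complete.
LEXICON = {"the": "DET", "a": "DET", "and": "CONJ", "in": "PREP", "she": "PRON"}
SUFFIX_RULES = [("ing", "VERB"), ("ed", "VERB"), ("ly", "ADV")]


def tag(tokens):
    """Return a list of (token, tag) pairs."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in LEXICON:
            pos = LEXICON[word]
        else:
            pos = "NOUN"  # default guess when no rule applies
            for suffix, suffix_pos in SUFFIX_RULES:
                if word.endswith(suffix):
                    pos = suffix_pos
                    break
        tagged.append((token, pos))
    return tagged


print(tag(["She", "quickly", "painted", "the", "fence"]))
```

Suffix rules like these mis-tag many words (for example, "bed" would be tagged as a verb), which is one reason stochastic and transformation-based taggers exist.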

Words NLP

Text = sequence of characters; the task is to recognize the words in the sequence

Morphology

Two classes of morphemes: stem = main meaning ("cat"); affix = additional meaning (the "-s" in "cats"). Parsing = assigning structure

Parsing

assigning a structure to a text that fits the rules

Conjunction

closed class: and, or

Preposition

closed class, comes before a noun or noun phrase (in, by)

Pronoun

closed class, refers to a person/entity: I, she, he

Syntax

combining words according to a set of rules

Semantics

connecting linguistic elements to non-linguistic knowledge of the world (meaning of words)

Words lemma

describes the set of lexical forms with the same stem: cats, cat

Words tokenizer

extracts tokens using rules to split up a sequence of characters. Basic rule: split on white space
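A minimal sketch of two tokenizers, assuming plain English text. The function names are illustrative only. The first applies the basic white space rule; the second adds one extra rule that splits punctuation off as its own token.

```python
import re


def whitespace_tokenize(text):
    """Basic rule: split on runs of white space."""
    return text.split()


def regex_tokenize(text):
    """Words (letters, digits, apostrophes) or single punctuation marks."""
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)


print(whitespace_tokenize("The cat sat."))  # ['The', 'cat', 'sat.']
print(regex_tokenize("The cat sat."))       # ['The', 'cat', 'sat', '.']
```

Note how the white space rule leaves the period attached to "sat", which is why practical tokenizers need more than one rule.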

Regular Expressions

formula to describe/specify a string
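For instance, a short sketch using Python's standard `re` module. The pattern and the helper name `is_ing_word` are chosen only for this example: the pattern specifies strings made of one or more lowercase letters followed by "ing".

```python
import re

# One or more lowercase letters, then the literal suffix "ing".
ING_WORD = re.compile(r"[a-z]+ing")


def is_ing_word(word):
    """True if the whole word matches the pattern."""
    return ING_WORD.fullmatch(word) is not None


print(is_ing_word("testing"))  # True
print(is_ing_word("tests"))    # False

# The same pattern with word boundaries, used to search inside a text.
print(re.findall(r"\b[a-z]+ing\b", "flying planes and testing code"))
# ['flying', 'testing']
```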

Morphology stemming

go back to base morpheme, elected -> elect
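A naive suffix-stripping sketch of the idea. This is not the Porter Stemmer; the suffix list and the minimum stem length of three characters are assumptions made for this example.

```python
# Hypothetical suffix list, checked in order.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for word in ["elected", "tests", "testing", "playing", "sing"]:
    print(word, "->", naive_stem(word))
# elected -> elect
# tests -> test
# testing -> test
# playing -> play
# sing -> sing
```

A stemmer this simple over-strips or under-strips many words, so it only illustrates the principle of going back to a base form.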

Parts of Speech

lexical tagset = POS. Provides information about a word, its neighbors, and its pronunciation

Pragmatics

meaning in context; language as a means of communication needs three elements: (1) the linguistic expression, (2) what the expression refers to, (3) the context. Syntax: 1; semantics: 1-2; pragmatics: 1-2-3

Word tokens

total number of word occurrences in a text; every appearance of a word counts

Word types

number of distinct words
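The difference between word tokens and word types can be seen in a short sketch. The sample sentence and the function name are made up for this example, and tokenization is plain white space splitting.

```python
def count_tokens_and_types(text):
    """Return (number of word tokens, number of word types)."""
    tokens = text.split()  # every occurrence counts
    types = set(tokens)    # each distinct word counts once
    return len(tokens), len(types)


sample = "the cat saw the dog and the dog saw the cat"
print(count_tokens_and_types(sample))  # (11, 5)
```

Here "the" contributes four tokens but only one type.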

Adjective

open class word, properties and qualities (old, blue)

Adverb

open class word: modifies something (very, extremely)

Verbs

open class words (draw). Auxiliaries: closed class words marking tense and mood (can, to do)

Morphological parsing

rewriting a word as its base components. Porter Stemmer: the most famous; gets the stems of words: relational -> relate

Morpheme

smallest meaning-bearing unit

Basic structure of syntax

subject - predicate

Natural Language Processing (NLP)

Driven by the increasing availability of text. Technical uses: UI, information retrieval, extraction. Domains: business (Google advertising), computer science (voice recognition)

Trigram Model

Counts for every sequence of three consecutive words; predictions are based on the previous two words
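Written as a formula, the usual maximum likelihood estimate built from such counts is shown below. The notation is an assumption of this sketch: C(...) is the number of times a word sequence occurs in the corpus, and w_1, w_2 are the two previous words.

```latex
P(w_3 \mid w_1, w_2) \approx \frac{C(w_1\, w_2\, w_3)}{C(w_1\, w_2)}
```

For example, if "the cat" occurs 10 times and "the cat sat" occurs 4 times, the estimate for "sat" following "the cat" is 4/10 = 0.4.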

Bigrams Model

Counts for every consecutive word pair; predictions are based on the previous word
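A minimal sketch of a bigram model built from raw counts. The function names and the sample sentence are made up for this example, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Map each word to a Counter of the words that follow it."""
    following = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1
    return following


def predict_next(following, word):
    """Most frequent follower of `word`, or None if it was never followed."""
    if word not in following:
        return None
    return following[word].most_common(1)[0][0]


tokens = "the cat sat on the mat and the cat slept".split()
model = train_bigrams(tokens)
print(model["the"]["cat"], model["the"]["mat"])  # 2 1
print(predict_next(model, "the"))                # cat
```

Because "the cat" occurs twice and "the mat" once, the model predicts "cat" after "the".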

Closed class word

Determiners: that, there, the. Article = subclass of determiner: the, a

Tools and Resources for NLP

For example, the Python NLTK library

Sequence of words

N-gram, useful to build language models for word prediction
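As a sketch, n-grams can be extracted from a token list with a sliding window. The function name `ngrams` is chosen for this example only.

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


words = ["flying", "planes", "can", "be", "dangerous"]
print(ngrams(words, 2))
# [('flying', 'planes'), ('planes', 'can'), ('can', 'be'), ('be', 'dangerous')]
print(ngrams(words, 3))
# [('flying', 'planes', 'can'), ('planes', 'can', 'be'), ('can', 'be', 'dangerous')]
```

Counting how often each tuple occurs in a corpus gives the raw material for a word prediction model.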

Nouns

Open class words. Proper nouns: names (e.g. of a place). Common nouns: mass nouns (group) and count nouns (countable, e.g. pear)

