English linguistics: Lexicology
Lexical words
"Content words": words whose meanings are clearly related to some part or aspect of the world around us, i.e. they have non-grammatical (lexical) meaning.
• Main carriers of information
• Very large number of members, and the number is growing all the time (e.g. birdish) -> open classes
• Have derivational morphology: nouns, lexical verbs, adjectives and adverbs can be formed by adding suffixes (e.g. teach-er, class-ify)
• The words that remain if a sentence is compressed into a newspaper headline (e.g. Queen meets Pope)
• The words that are generally stressed most in speech
Grammatical words
"Function words": words which only convey grammatical meaning; they represent the structural devices or categories the English language makes use of.
• Determiners, pronouns, prepositions, conjunctions, auxiliary verbs (!), numerals (!), discourse markers (oh, well, yes, right, wow, gee)
• Three function words are unique and do not fit into any of these classes ... (Biber et al.)
- Existential there
- The negator not
- The infinitive marker to
• Signal grammatical function and grammatical structure: they indicate meaning relationships and help us to interpret units containing lexical words by showing how the units are related to each other
• Closed classes: very limited and fixed membership
- Limited number of members
- New ones cannot easily be added (the classes are extended very slowly, over centuries)
Problem of 'a word is a minimal free form'
"Minimal": compounds can be subdivided into smaller free forms. "Free": articles, conjunctions etc. do not normally stand alone.
Lexical sophistication
"Lexical frequency profile": the proportion of 'advanced' words in a text, i.e. rare (infrequent) types / lexical tokens.
N-gram/cluster analysis
'Lexical bundles' (Biber et al. 1999), 'recurrent sequences of words' (Altenberg 1998), n-grams:
- Sequences of contiguous words that most commonly co-occur in a register
- Frequency threshold (a sequence must occur a set number of times)
-> No free slot: clusters, bundles
-> One or more free slots: collocational frameworks
Extracted with the n-gram method: automatic extraction of sequences of 2, 3, 4... words
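The n-gram method can be sketched in a few lines of Python. This is only an illustration: the function names, the toy sentence and the threshold of 2 are assumptions, and real studies usually normalise the threshold (e.g. per million words) and often add a dispersion requirement.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def recurrent_ngrams(tokens, n, min_freq):
    """Keep only the n-grams that reach the frequency threshold."""
    counts = Counter(extract_ngrams(tokens, n))
    return {gram: freq for gram, freq in counts.items() if freq >= min_freq}


# Toy example (hypothetical data): 3-grams occurring at least twice.
tokens = "i do not know why i do not know what i do not want".split()
bundles = recurrent_ngrams(tokens, n=3, min_freq=2)
print(bundles)
```

On this toy input only "i do not" (3 occurrences) and "do not know" (2 occurrences) pass the threshold; every other 3-gram occurs once and is discarded.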
'words are linguistic units characterized by internal stability and uninterruptability'
(Lyons 1968) = no rearrangement of parts, no interruption by other material, high degree of cohesion
Restricted collocations
- combinations of two lexical items - which make an isolable semantic contribution - belong to different word classes and show a restricted range
Type/token ratio
(Number of word types / number of word tokens) x 100. The higher the type/token ratio, the more lexical variation.
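A minimal sketch of the calculation, assuming the text has already been tokenised and lowercased (the function name and the example sentence are illustrative only):

```python
def type_token_ratio(tokens):
    """(number of word types / number of word tokens) x 100."""
    if not tokens:
        raise ValueError("cannot compute a ratio for an empty text")
    return len(set(tokens)) / len(tokens) * 100


tokens = "the cat sat on the mat and the dog sat too".split()
ttr = type_token_ratio(tokens)
print(round(ttr, 1))  # 8 types / 11 tokens -> 72.7
```

Note that the ratio drops as texts get longer, because frequent words keep repeating, so it is only comparable across texts of similar length.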
Lemma
- Abstract (grammatical) entity which includes all the inflected forms of a word
- Words as vocabulary items, i.e. given separate entries in dictionaries (meaning)
-> Convention: lemma in capital letters, word forms in italics, e.g. GO = go, goes, going, gone, went
How were recent developments in the English language brought to light in the NGSL?
378 lexical innovations.
Identified how? Items in the top 3,000 of both BE06 and EnTenTen12 (lexical overlap) but not in the BNC or LOB top 3,000.
Three categories:
- New words (neologisms): Internet, website, online, email (not a very large category)
- New meanings/functions of old words: user, via, network, client, mobile, file, web
- Old words with recent prominence: medium, phone, key, technology, guy, kid, environment, computer, movie, definitely
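The identification step amounts to set operations on frequency lists. The sketch below uses tiny made-up lists in place of the real top-3,000 lists, and it assumes "not in the BNC or LOB" means absent from both; the function name is hypothetical.

```python
def lexical_innovations(be06_top, ententen_top, bnc_top, lob_top):
    """Items shared by both recent lists but absent from both older lists."""
    recent_overlap = set(be06_top) & set(ententen_top)
    older = set(bnc_top) | set(lob_top)
    return recent_overlap - older


# Toy stand-ins for the top-3,000 lists (hypothetical data).
be06 = ["the", "house", "website", "email", "podcast"]
ententen12 = ["the", "house", "website", "email", "blog"]
bnc = ["the", "house", "servant"]
lob = ["the", "house", "cart"]

innovations = lexical_innovations(be06, ententen12, bnc, lob)
print(sorted(innovations))  # ['email', 'website']
```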
'a word is a minimal free form'
= not possible to subdivide, can stand alone
Orthographic word/Word form
A word is any sequence of letters (and a limited number of other characteristics such as hyphen and apostrophe) bounded on either side by a space or punctuation mark (Carter 1987)
Average Reduced Frequency
Absolute frequency + its distribution in the corpus
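One common formulation of ARF (usually attributed to Savický and Hlaváčová; stated here as an assumption, not taken from these notes) divides the corpus of N tokens into f segments of length v = N / f and sums min(d, v) / v over the distances d between consecutive occurrences, treating the corpus as circular. A sketch under that assumption:

```python
def average_reduced_frequency(positions, corpus_size):
    """ARF from the token positions of a word in a corpus of corpus_size tokens."""
    f = len(positions)
    if f == 0:
        return 0.0
    v = corpus_size / f  # average distance if the word were evenly spread
    pos = sorted(positions)
    # Distances between consecutive occurrences; the last one wraps around.
    distances = [pos[i] - pos[i - 1] for i in range(1, f)]
    distances.append(pos[0] + corpus_size - pos[-1])
    return sum(min(d, v) for d in distances) / v


# Same absolute frequency (4) in a 100-token corpus, different distribution.
even = average_reduced_frequency([0, 25, 50, 75], 100)     # 4.0
clumped = average_reduced_frequency([10, 11, 12, 13], 100)  # about 1.12
print(even, clumped)
```

An evenly spread word keeps its absolute frequency, while a word clumped in one place is reduced towards 1.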
Limitations of NGSL
- Automatic word class identification (POS tagging) is not 100% accurate
- Primarily British English and written language
- Single-word list; no multi-word lexical items
- Polysemy (!): a list of the most frequent words, but in which meanings? E.g. to run: 19 meanings in Macmillan English Dictionary Online; only some of these meanings are very common
Frequency based approach to phraseology
Broad/distributional: a much larger set of units, identified on the basis of quantitative criteria.
Quantitative approach: bottom-up, corpus-driven; identification of word partnerships
- Co-occurrence: statistically defined units (e.g. MI, t-score) (rob a bank, privatize a bank)
- Recurrence (I don't know why, I thought that, there is a, can I have a, the fact that the)
Inductive approach: wide set of units. It has opened up a 'huge area of syntagmatic prospection' (Sinclair 2004) -> 'idiom principle' (= language has many semi-preconstructed phrases) -> the phrase, not the word, is the primary carrier of meaning
- Free combinations are allowed (unpredictable for learners); idioms are infrequent, and the most frequent multi-word units tend to be semantically compositional (collocations, recurrent phrases)
- Recurring compounds are allowed
- Grammatical collocations/colligations (association between a lexical word and its frequent grammatical environment) are a major category; hard for learners
Headword
Dictionary entry
Problem of implementation of native speakers' intuitive formulaicity judgements
Fatigue; lapses of concentration. How should formulaic language be defined for native-speaker judges? Linguistic expert judges or layperson native speakers?
Irreversible bi- and trinomials
Fixed sequences of 2 or 3 word forms of same POS category, linked by conjunction 'and'/'or' (bed and breakfast; left, right and centre)
To what extent does the lexical core in GSL and NGSL differ?
GSL: 4,114 lemmas. NGSL: 2,497 lemmas (378 lexical innovations). The NGSL has about 40% fewer types/lemmas than the GSL. Because the GSL uses word families, words that are less frequent in the NGSL rank higher in the GSL. Still, 80% overlap.
Attitudinal formulae
Gambits/Phrasemes signaling attitude towards utterances and interlocutors. (in fact, I think that)
GSL
General Service List (Michael West) = a general vocabulary wordlist. Basis of the defining vocabulary in learner's dictionaries. 2,000 word families (words + inflectional and derivational forms).
Proverbs
General ideas, metaphorical, often abbreviated complete sentences. (When in Rome; A bird in the hand is worth two in the bush)
Simple definition of lexical variation
It is the measure of how frequently a writer/speaker makes use of one and the same word type.
Lexical density
It is the ratio between the number of lexical/content words and the total number of word tokens. Number of lexical words / tokens
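A rough sketch of the calculation. The small function-word set is a stand-in assumption; in practice lexical words are normally identified with a POS tagger rather than a hand-made list.

```python
# Hypothetical, deliberately incomplete list of function words.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "to", "in", "on", "and", "or", "not",
    "is", "are", "was", "it", "he", "she", "they", "there", "that",
}


def lexical_density(tokens, function_words=FUNCTION_WORDS):
    """Number of lexical (content) words divided by the number of tokens."""
    if not tokens:
        raise ValueError("cannot compute a ratio for an empty text")
    lexical = [t for t in tokens if t.lower() not in function_words]
    return len(lexical) / len(tokens)


sentence = "the queen meets the pope in rome".split()
density = lexical_density(sentence)          # 4 lexical words / 7 tokens
headline = lexical_density("queen meets pope".split())  # 3 / 3
print(round(density, 2), headline)
```

The headline version scores 1.0 because only content words are left.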
How was the NGSL compiled?
Lemmas: words + inflectional forms (NOT derivational forms).
Compiled on the basis of four corpora; explores the stability of a core general vocabulary across a range of written and spoken contexts:
- Lancaster-Oslo-Bergen Corpus (LOB)
- British National Corpus (BNC)
- Corpus of British English (BE06)
- EnTenTen12
Pairwise comparisons.
NB: proper nouns and erroneous entries manually discarded; spelling standardized (British norm).
Lexical collocations
Lexemes in specific syntactic pattern, where the one is the base for its independent meaning, the second is a collocator dependent on the base. Both make semantic contribution, but don't share status. (heavy rain, closely linked).
Lexical variation by genre
NEWS > FICTION > ACADEMIC > CONVERSATION
Traditional approach to phraseology.
Narrow: a specific subset of linguistically defined units; provided useful criteria to categorise multiword units.
Overemphasis on some categories of multiword units (e.g. idioms, phrasal verbs, proverbs): most idiomatic = most 'core' (Gläser 1998)
- Free combinations: watch a film / a cricket match / TV; blue/red/yellow jumper/skirt/T-shirt. Only governed by semantic co-occurrence restrictions
- Restricted collocation: heavy rain (restricted collocability and figurative or specialised meaning of one of the elements); make a comment (delexical verb)
- Figurative idiom: do a U-turn (= completely change your plans/ideas), blow your own trumpet (= talk a lot about your own achievements). Have a figurative meaning but also preserve a literal interpretation (It's illegal to do a U-turn on a motorway)
- Pure idiom: blow the gaff (= reveal a plot or secret). Semantically non-compositional and fixed
What is the NGSL?
New General Service List (2013): c. 2,000 items (lemmas)
Lexical density by genre
News > Academic prose > Fiction > Conversation
Commonplaces
Non-metaphoric complete sentences expressing tautologies, truisms, sayings. (enough is enough, it's a small world)
Problem of 'words are linguistic units characterized by internal stability and uninterruptability'
Phrasal verbs: to look up a word; he looked it up. Idioms: some idioms do allow some degree of flexibility (a drop in the ocean/bucket).
Idioms
Phrasemes constructed around a verbal nucleus. Semantically non-compositional, fixed structures. (spill the beans, let the cat out of the bag)
Advantage of NGSL
Purely quantitative criteria (frequency, dispersion, stability across language corpora; ARF), hence objective and replicable. NB: the lexical overlap across the corpora (71%) is evidence of a core vocabulary.
Compilation of the GSL
Quantitative measure: word frequency
- Manually collected word frequency data (!)
Three qualitative criteria:
- Ease of learning (similarity of word form; but frequency?)
- Principles of necessity and cover (make it possible to express all necessary ideas)
- Stylistic and emotional neutrality (! learners: communicating ideas, not emotions ...)
Speech act formulae
Relatively inflexible phrasemes, used for performing certain functions (greetings, compliments etc). (good morning, you're welcome)
Grammatical collocation
Restricted combination of a lexical and a grammatical word: "ADJ/VERB/NOUN + PREP" (depend on, cope with).
Problem of "word = orthographic word"
Restricted to writing, arbitrary spelling, contracted forms, compounds, phrasal verbs, idioms (+automatic word counts)
Slogans
Short directive phrases made popular by their repeated use in politics or advertising. (make love, not war)
Similes
Stereotyped comparisons. "as ADJ as (DET) NOUN" and "VERB like a NOUN". (as old as the hills, to swear like a trooper)
Simple definition of 'phraseology'
Study of the structure, meaning and use of word combinations. The phenomenon itself (word combinations) may also be called phraseology, so the term names both the field and its object of investigation. About 50% of words (80% in speech, 30% in writing) occur as part of recurring combinations.
Lexicology
Study of the lexicon
Criticism of the GSL
The GSL is dated (1953; revised version of a list compiled in 1936!)
- E.g. servant, cart, preserve food: still in general use, a necessary idea today?
- television and computer
Inconsistencies in the selection of words
- E.g. elephant is in, tiger is not...
The selection principles are too subjective
- E.g. ease of learning? Necessity principle? Emotions (frequency?)?
The notion of word family is problematic
- Based on the idea that words from the same family are transparent (e.g. develop and development)
- But the semantic link is not always obvious (e.g. to train vs. trainers [shoes], please and unpleasantly)
- Users' morphological skills should not be overrated (cf. research), both receptively and especially productively (esp. beginners)
Psycholinguistic approach to phraseology.
Units that are stored holistically in the mind. Wray (2002): 'a sequence (...) stored and retrieved whole from memory at the time of use'
Defining vocabulary
Vocabulary used to describe headwords in dictionaries - based on core vocabulary
Lexeme
What all the word forms associated with it have in common
Linguistic approach to phraseology
Word combinations that display varying degrees of syntactic/lexical and semantic unity
Co-occurrence analysis
Words that co-occur within a given span, usually four words to the left and four words to the right of the node. Discontinuous combinations of 2 words. Statistical measures: Mutual Information (MI), t-score, other measures (e.g. logDice) -> co-occurrences
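Counting collocates inside the span can be sketched as follows. The function name and the toy sentence are assumptions; the default span of 4 follows the "four left, four right" convention mentioned above.

```python
from collections import Counter


def collocates(tokens, node, span=4):
    """Count the words found within `span` tokens left and right of each node."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = tokens[max(0, i - span):i]
        right = tokens[i + 1:i + 1 + span]
        counts.update(left + right)
    return counts


tokens = "they tried to rob a bank and then rob a second bank".split()
counts = collocates(tokens, node="bank")
print(counts.most_common(3))
```

Here "rob" is counted three times around "bank" even though it is never directly adjacent to it, which is what makes the combinations discontinuous. The raw counts would then be passed to a statistical measure such as MI or the t-score.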
'Back to black: what goes up can go down'
Words, like 'computer', may vary in use (and meaning) over the course of history. 'Computer' started out meaning 'person doing calculations', then named a machine used by scientists, then one used and owned by everyone, and is now on the decline.
Semantic criterion of 'word'
a word expresses a unified semantic concept (travel agency, kick the bucket)
Morphological criterion of 'word'
a word has internal cohesion and is indivisible by other units; a word may be modified only externally by the addition of suffixes and prefixes
Phonological word
a word occurs between potential pauses in speaking (BUT... a word spoken in isolation has one and only one primary stress) (NB: other languages) ('er')
native speakers' intuitive formulaicity judgements
any sequences of two or more words that are perceived to be more constrained than usual in their co-occurrence
Tokens
every occurrence of a 'word', whether it is a repetition of the 'same' one that has occurred before or not.
Traditional approach to phraseology as a continuum:
From most transparent and variable to most opaque and fixed (Cowie 1998).
Semantic transparency/opacity
- Non-compositional meaning = more than or different from the sum of its parts
Specialised meaning of one of the constituents
- Figurative, delexical meaning
Degree of fixedness
- The degree to which it is possible to substitute an item in a combination of words with alternative synonyms or near-synonyms from the same word class
- Syntactic variation (passivisation, pluralisation, ...)
Types
the different words in a text.
"The traditional approach to phraseology and the frequency-based approach to phraseology should be used conjointly"
the frequency-based approach generates raw material that needs to be filtered and categorized linguistically to suit theoretical and/or applied objectives
Macrostructure
the list of all the words (headwords) described in the dictionary
the t-score tends to highlight collocations involving ___ words, while the MI score (MI stands for ___ ) tends to highlight collocations with ___ words.
the t-score tends to highlight collocations involving HIGH FREQUENCY words, while the MI score (MI stands for MUTUAL INFORMATION) tends to highlight collocations with LOW FREQUENCY words.
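The contrast can be checked numerically. The sketch below uses the simplest textbook variants, MI = log2(O / E) and t = (O - E) / sqrt(O) with E = f(node) x f(collocate) / N; corpus tools often adjust E for the span size, and all counts here are invented for illustration.

```python
import math


def expected(f_node, f_coll, corpus_size):
    """Expected co-occurrence frequency if the two words were independent."""
    return f_node * f_coll / corpus_size


def mi_score(observed, f_node, f_coll, corpus_size):
    return math.log2(observed / expected(f_node, f_coll, corpus_size))


def t_score(observed, f_node, f_coll, corpus_size):
    e = expected(f_node, f_coll, corpus_size)
    return (observed - e) / math.sqrt(observed)


N = 1_000_000  # hypothetical corpus size
# Pair A: high-frequency collocate (50,000 occurrences), 200 co-occurrences.
mi_a = mi_score(200, 1000, 50_000, N)  # log2(200 / 50) = 2.0
t_a = t_score(200, 1000, 50_000, N)    # about 10.6
# Pair B: low-frequency collocate (20 occurrences), 16 co-occurrences.
mi_b = mi_score(16, 1000, 20, N)       # log2(16 / 0.02), about 9.6
t_b = t_score(16, 1000, 20, N)         # about 4.0
print(mi_a, round(t_a, 2), round(mi_b, 2), round(t_b, 2))
```

The t-score ranks pair A first and MI ranks pair B first, which is the pattern described in the card above.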
Lexicon
total stock of words in a language
Textual phraseme
typically used to structure and organize the content (i.e. referential information) of a text or any type of discourse
Referential phraseme
used to convey a content message: they refer to objects, phenomena or real-life facts (lexical and grammatical collocations, idioms, similes, bi-/trinomials, phrasal verbs)
Communicative phraseme
used to express feelings or beliefs towards a propositional content or to explicitly address interlocutors, either to focus their attention, include them as discourse participants or influence them
Grammatical criterion of 'word'
words fall into particular classes: nouns, adjectives, determiners (the)
Syntactic criterion of 'word'
• Smallest unit that can be manipulated by syntax
• Can be independently relocated to a new position in the sentence by a syntactic rule
• May be used alone as a single utterance