WL4102 Language and IT
XML
Extensible Markup Language
Elements have attributes; the text between an element's tags is
PCDATA (parsed character data)
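A minimal Python sketch of how an XML element's attributes and character data can be read separately, using the standard library's `xml.etree.ElementTree`. The sample document, its element names (`entry`, `form`) and its attribute names (`id`, `lang`) are invented for illustration and are not taken from the course material.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration
sample = '<entry id="e1" lang="en"><form>dictionary</form></entry>'

root = ET.fromstring(sample)

# Attributes live inside the opening tag of the element
attributes = root.attrib

# Character data is the text between the opening and closing tags
form_text = root.find("form").text

print(root.tag, attributes, form_text)
```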
TEI
Text Encoding Initiative (a set of elements for encoding text)
Markdown
a very simple form of markup
Tag
abbreviation denoting a part of speech
Latin script features
alphabetical, bicameral (upper and lowercase), case sensitive, left-to-right, word separation (spaces), hyphenation, mid-baseline (descending, e.g. p; ascending, e.g. k), native digits (numerals)
open file format
a format whose specification is published, so anyone can read and implement it
CSS does what?
controls the presentation (layout and styling) of the page (e.g. where the search box sits, that the header is here and blue)
GDEX System
automatically identifies examples "good enough" to use in a dictionary
CSS
cascading style sheets
UTF-8
character encoding (charset) for Unicode text
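As a small illustration of what a character encoding does, the Python sketch below turns a string into UTF-8 bytes and back. The sample word is an arbitrary choice; the accented letter is written as an escape so that it is unambiguously a single code point.

```python
# "café" with a precomposed e-acute (U+00E9); the word is an arbitrary example
text = "caf\u00e9"

# Encoding maps characters to bytes; ASCII letters take 1 byte, e-acute takes 2
encoded = text.encode("utf-8")

# Decoding with the same charset recovers the original string
decoded = encoded.decode("utf-8")

print(len(text), len(encoded), decoded == text)
```

The string has 4 characters but 5 bytes, which is why a file must be read with the same charset it was written in.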
Lexicon
collection of words
CLI
command-line interface
machine readable
data in a format that can be processed by a computer
index.html
default file a web server serves for a directory
Hypertext is made of ________
elements (e.g. <head>)
HTML
hypertext markup language (head (information about the page), body (content of the page))
ICO
icon file
Favicon
icon on tab denoting the page
Token
individual occurrence of a word form in a text
KWIC
key word in context
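A KWIC display can be sketched in a few lines of Python. The function name `kwic`, the `width` parameter and the sample sentence are all hypothetical, chosen only to show the idea; real concordancers work on much larger corpora and handle punctuation and tokenisation more carefully.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Take up to `width` tokens on each side of the hit
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented sample text, split on whitespace as a naive tokenisation
sample_tokens = "the cat sat on the mat and the dog sat by the door".split()
concordance = kwic(sample_tokens, "sat", width=2)

# Right-align the left context so the keywords line up in a column
for left, kw, right in concordance:
    print(f"{left:>12} [{kw}] {right}")
```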
site map
list of the pages (and directories) of a website
LaTeX
markup language for typesetting documents, typically output as PDF
Corpus
organised collection of linguistic data
POS
parts of speech
Markup language
set of rules (what elements to use and how the elements behave)
Scale
the corpus must scale up according to usage (specific versus generalised); larger corpora generally give more reliable results
Lemma
the dictionary form of the word
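To make the idea of a lemma concrete, the sketch below groups inflected forms under their dictionary form and counts them. The lookup table `LEMMAS` and the helper `lemmatise` are hypothetical and hand-made for this example; real lemmatisers rely on far larger lexical resources and on part-of-speech information.

```python
from collections import Counter

# Hypothetical hand-made lookup table mapping inflected forms to lemmas
LEMMAS = {"ran": "run", "runs": "run", "running": "run", "mice": "mouse"}

def lemmatise(token):
    # Fall back to the lowercased token when the form is not in the table
    return LEMMAS.get(token.lower(), token.lower())

# Invented sample sentence, split on whitespace
tokens = "The mice ran and the mouse runs".split()

# Count how many tokens belong to each lemma
lemma_counts = Counter(lemmatise(t) for t in tokens)

print(len(tokens), dict(lemma_counts))
```

Here seven tokens collapse into four lemmas, which is the kind of grouping a dictionary headword list depends on.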
Lexicography
the making of dictionaries
Sense relations
the relations of meaning between words, as expressed in synonymy, hyponymy, and antonymy
Parse
to analyse a sentence and mark its grammatical components
Wordnets
lexical databases (e.g. WordNet) used to explore sense relations
Lossless
when a file is saved (or compressed) it keeps all of the original data