A web interface to the 300k-word CIG1 and 570k-word CIG2 corpora


Format of the lexicon

(Replicated from the Child Language Databases website. The original page offers downloadable lexicon files, but those links are dead.)

A lexicon has been created which lists the word forms which the children use, together with their categories (parts of speech) and lexemes (dictionary entry form). The conventions of CHILDES are used:

word-form{[scat category]}lexeme
afal{[scat en]}"afal"
fale{[scat en]}"afal"
wedi{[scat ag]}"wedi" \
{[scat ar]}"wedi"
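
The entry format above can be read mechanically. The following is a minimal sketch in Python, not part of the original distribution: the names `parse_lexicon`, `READING` and `FEATURE` are hypothetical, and it assumes that a trailing backslash always continues an entry onto the next line, as in the wedi example.

```python
import re

# One reading looks like {[scat en]}"afal": features in braces, lexeme in quotes.
READING = re.compile(r'\{([^}]*)\}"([^"]*)"')
# One feature looks like [scat en]: a name, whitespace, then a value.
FEATURE = re.compile(r'\[(\w+)\s+([^\]]+)\]')


def parse_lexicon(text):
    """Return {word_form: [(features, lexeme), ...]} for lexicon text."""
    # Assumption: a backslash at the end of a line continues the entry.
    joined = re.sub(r'\\\s*\n\s*', ' ', text)
    lexicon = {}
    for line in joined.splitlines():
        line = line.strip()
        if '{' not in line:
            continue
        form = line[:line.index('{')].strip()
        readings = [(dict(FEATURE.findall(feats)), lexeme)
                    for feats, lexeme in READING.findall(line)]
        if readings:  # skip lines that do not contain a quoted lexeme
            lexicon.setdefault(form, []).extend(readings)
    return lexicon


sample = '''afal{[scat en]}"afal"
fale{[scat en]}"afal"
wedi{[scat ag]}"wedi" \\
{[scat ar]}"wedi"
'''
lexicon = parse_lexicon(sample)
```

Here `lexicon['wedi']` holds two readings, one per category, which is how a form with more than one category appears after parsing.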

For nouns, number and gender are also indicated, as follows:

word-form{[scat category]}lexeme
afal{[scat en] [rhif un] [cen g]}"afal"
fale{[scat en] [rhif ll]}"afal"

Details of the coding are given under en in the description of the Categories below.
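
As an illustration of the noun features, the sketch below pulls rhif and cen out of a single entry and expands the abbreviations. The function name `noun_features` and the two lookup tables are hypothetical, and the value gb is passed through unchanged because the page does not gloss it.

```python
import re

# One feature looks like [rhif un]: a name, whitespace, then a value.
FEATURE = re.compile(r'\[(\w+)\s+([^\]]+)\]')

# Expansions taken from the description of the en category.
NUMBER = {'un': 'singular', 'll': 'plural'}
GENDER = {'g': 'masculine', 'b': 'feminine', 'gb': 'gb'}  # gb is left unglossed


def noun_features(entry):
    """Return number and gender for a noun entry, or None for other categories."""
    feats = dict(FEATURE.findall(entry))
    if feats.get('scat') != 'en':
        return None
    return {
        'number': NUMBER.get(feats.get('rhif')),  # None if rhif is absent
        'gender': GENDER.get(feats.get('cen')),   # None if cen is absent
    }


afal = noun_features('afal{[scat en] [rhif un] [cen g]}"afal"')
wedi = noun_features('wedi{[scat ag]}"wedi"')
```

A missing feature yields `None` rather than an error, since not every noun entry necessarily carries both features.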

Categories in the lexicon

The following categories and codes are used:

??  multi-category form which is ambiguous in context
a1  pro-form place adjuncts like there, fama (here, yonder)
ab  conjuncts and disjuncts like hefyd (also), felly (therefore)
ad  other adjuncts
ag  aspect markers yn (progressive), wedi (perfect)
as  adverbs allan (out), ymlaen (onwards), i+ffwrdd (away), i+lawr (down), etc.
at  adverbs beginning with tu - tu+allan (outside), tu+ol (behind), etc.
b4  Welsh finite verb with English inflection
bd  English verbs in -ed, -en or equivalent e.g. crashed, drunk
be  verbnoun forms (compare English plain infinitive) including auxiliaries but not bod (be)
bf  finite-verb forms (including the imperative forms) except bod (be)
bg  English verbs in -ing
bp  English plain infinitive forms
cd  co-ordinating conjunctions like a (and), neu (or), ond (but)
ce  verbnoun (compare English plain infinitive) of bod (be)
cf  finite forms of bod (be)
cm  mwy (more) as a comparative particle before adjectives, as in mwy addas (more appropriate), llai oer (less cold)
cn  greetings and farewells like helo (hello)
cy  subordinating conjunctions like achos (because)
d1  preverbal particle, tag: oni (namely) 't, 'n', yn', ynd, etc.
d2  preverbal particle, positive: y1, fe1, mi1
d3  preverbal particle, negative: d, t, na, na5
eb  standard exclamations like aa (ah), oo (oh)
en  nouns - features on nouns are:
    rhif (number) = un(igol) (singular) or ll(uosog) (plural)
    cen(edl) (gender) = g(wrywaidd) (masculine) or b(enywaidd) (feminine) or gb
er  the post-modifying words arall (other) and eraill (others)
es  eisiau (wants, needs) - a nominal form
f1  answer word, positive: ie, ia, do (yes)
f2  answer word, negative: nage, nace, naci, naddo (no)
g1  nominal wh- words - beth (what), pwy (who)
g2  adverbial wh- words - pryd (when), pam (why), sut (how)
g3  the wh- word pa (which)
g4  compounds involving wh- words like beth+bynnag (whatever), pryd+bynnag (whenever)
g5  the wh- word faint (how much/many)
ga  grammatically invariant answer words ie (yes), nage (no), do (yes), naddo (no)
gc  the comparative particle na (than)
gd  demonstrative words dyna (there/that is), dyma (here/this is), dacw (yonder is)
gg  intensifiers like rhy (too), go (fairly), mor (so)
gm  quantifiers like digon (enough), llawer (much/many), mwy (more)
gr  preverbal particles like mi, fe, ni and focussing particles like mai, ai
gt  the predicatival particle yn
gy  particle onid e: yntefe, tefe, and also 'de, 'te, ynte, etc.; the latter may be ynteu sometimes
ll  pro-form adjuncts yna (there), yma (here), acw (yonder)
ly  letters of the alphabet
mo  words indicating epistemic modality: efallai (perhaps), hwyrach (perhaps)
ne  the negator dim (no/not) both as quantifier and adverb
on  onomatopoeic-type forms
pa  politeness expressions like pardon, plis, sori
pe  determiners like y (the)
pi  forms of piau, used to indicate ownership
qq  for obscure forms
r1  personal pronouns like ti (you), fo (he/him)
r2  demonstrative pronouns like hwn
r3  indefinite pronouns like rhywun (someone)
r4  negative pronouns like neb (no-one)
r5  reflexive pronouns like fy+hun (myself)
r6  reciprocal pronouns like ei+gilydd (each other)
r7  conjunctive pronouns like finnau (me too)
r8  prefixed (possessive) pronouns like fy (my), ei (his/her)
r9  the 'alternative' pronoun llall (other), lleill (others)
rd  rhaid (must, necessity)
rp  universal pronouns like pawb (everyone)
rq  indefinite phrases like beth+'na (thingie), lle+'na, be+ti'+'n+galw (what do you call it)
sg  standard verbal pauses like ymm (uhm)
sy  standard paralinguistic forms like hy+hy (uh-uh), mm+mm (uhm-uhm)
ya  manner-adverbial particle yn e.g. yn gyflym (quickly)
z1  fronting particle, interrogative: efe
z2  fronting particle, declarative: na2, mai, taw

Forms not in the lexicon

Nonsense words
    Suffixed with @gl in the data files.
    Example: chic+chics+tics@gl
Noises
    Suffixed with @sn in the data files.
    Example: iii@sn
English words
    Single English words in Welsh sentences and in isolation are included. Strings of English words as phrases or sentences are excluded. In the data files, they are enclosed in <...>, which is followed by [% Saesneg]. Words from other languages are treated in the same way.
    Example: welish i <big christmas tree> [% Saesneg]
Words in songs etc.
    In the data files, they are enclosed in <...>, which is followed by [% ca:n].
    Example: <dau gi bach yn mynd i 'r coed> [% ca:n]
Proper names
    Begin with a capital letter in the data files.
Unfinished words
    Begin with & in the data files.
    Example: &ffl
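
The markers listed above are regular enough to detect automatically before looking a token up in the lexicon. The sketch below is one possible approach, not the project's own tooling: `classify_token`, `strip_marked_spans` and the returned labels are hypothetical names, and it assumes that a capital letter only ever marks a proper name and that tokens are separated by whitespace.

```python
import re

# A marked span: <...> followed by a label such as [% Saesneg] or [% ca:n].
SPAN = re.compile(r'<([^>]*)>\s*\[%\s*([^\]]+)\]')


def classify_token(token):
    """Label a single token according to the markers used in the data files."""
    if token.endswith('@gl'):
        return 'nonsense'
    if token.endswith('@sn'):
        return 'noise'
    if token.startswith('&'):
        return 'unfinished'
    if token[:1].isupper():
        return 'proper name'  # assumption: capitals mark proper names only
    return 'lookup'           # anything else is expected to be in the lexicon


def strip_marked_spans(utterance):
    """Return (text without marked spans, [(label, span_text), ...])."""
    spans = [(label.strip(), text) for text, label in SPAN.findall(utterance)]
    rest = SPAN.sub(' ', utterance)
    return ' '.join(rest.split()), spans


rest, spans = strip_marked_spans('welish i <big christmas tree> [% Saesneg]')
labels = [classify_token(t) for t in ['chic+chics+tics@gl', 'iii@sn', '&ffl', 'afal']]
```

Removing the marked spans first keeps English phrases and song lines out of the lexicon lookup, while single English words in a Welsh sentence still fall through to `'lookup'`.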