A web interface to the 300k-word CIG1 and 570k-word CIG2 corpora


CIG2 documentation

(Replicated from the Child Language Databases website.)

CIG2 is a database of the Welsh of children between three and seven years of age. The database was created by a project which was funded by the Economic and Social Research Council, and which was located in the Department of Education, University of Wales Aberystwyth.


Two projects are responsible for this database:

1. Economic and Social Research Council Project 1999-2000

This electronic version is the direct result of a project, The Aberystwyth Child Language Databases, which was funded by the Economic and Social Research Council (R000237978) with a grant of £60,611. The project was located in the Department of Education, University of Wales Aberystwyth, and ran for 12 months from 1st July 1999 until 30th June 2000.

The director of this project was Bob Morris Jones, and the recordings were transcribed in CHILDES format by Merris Griffiths and Mared Roberts, Research Officers.

Because of dialectal differences, it was advantageous to have one researcher from a northern dialect area and the other from a southern dialect area.

This project had the following objectives, all of which have been achieved and are described elsewhere in these pages:

Then the data and the lexicon would be publicly available at the CHILDES site for work in:

Because of the regularity of the CHILDES transcriptional system, however, researchers would also be free to write their own computer programs for their own aims. A programming language such as Icon is suitable for this purpose.
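As a rough illustration of the kind of program a researcher might write, the sketch below counts utterances per speaker. It is written in Python rather than Icon, and it relies only on the general CHILDES convention that main (speaker) lines begin with `*`, dependent tiers with `%`, and headers with `@`. The function name, the speaker codes AAA and BBB, and the sample lines are invented for the example and are not taken from the CIG2 files.

```python
from collections import Counter


def utterances_per_speaker(lines):
    """Count main-tier lines per speaker code in CHILDES-style text."""
    counts = Counter()
    for line in lines:
        # Main tiers look like "*CODE:<tab>utterance"; lines starting with
        # "@" (headers) or "%" (dependent tiers) are skipped.
        if line.startswith("*") and ":" in line:
            speaker = line[1:line.index(":")]
            counts[speaker] += 1
    return counts


# Invented sample, for illustration only (not corpus data).
sample = (
    "@Begin\n"
    "*AAA:\tgair gair gair .\n"
    "%com:\tillustrative comment\n"
    "*BBB:\tgair .\n"
    "*AAA:\tgair gair .\n"
    "@End\n"
)

counts = utterances_per_speaker(sample.splitlines())
for speaker, total in sorted(counts.items()):
    print(speaker, total)
```

A real analysis would also need to handle utterances that continue over several lines, which this sketch counts only once, on their first line.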

2. Welsh Office Project 1974-1977

The audio recordings were originally collected, with the permission of the parents, the schools, and the Local Education Authority, as part of a project which was funded by the Welsh Office, Concept and Language Development, under the direction of Professor CJ Dodson between 1974 and 1977. This project was administered by Bob Morris Jones, and staffed at different times by Brec'hed Piette, Hefin Jones, John Jones, Wyn James, Christine James, and Nesta Dodson.

Brec'hed Piette was mainly responsible for designing concept tests which were administered to many of the children who were recorded. The results have been coded on to plain text computer files at the University of Wales Aberystwyth.

A questionnaire was distributed through the schools to the parents which asked for details about the backgrounds of the children. The background details overall included information about:


Age    Files   Hours   Children   Filenames          File size
3      25      12.5    42         c3001 - c3025      418Kb
4      31      15.5    63         c4001 - c4031      498Kb
5      39      19.5    77         c5001 - c5039      859Kb
5a     44      22      87         c5a001 - c5a044    855Kb
6      48      24      96         c6001 - c6048      1000Kb
7      52      26      104        c7001 - c7052      1140Kb
Total  239     119.5   469                           4.66Mb

There are two cohorts: children from three to five, and children from five to seven. The first digit in the names of the files which make up the database gives the age of the children. The file names of the five year olds of the older cohort are distinguished by the letter 'a' after the first digit. The remaining digits number the files within the age group.
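The naming scheme can be sketched as a small Python parser. This is a minimal sketch, assuming names of the form shown in the table (a leading `c`, one age digit, an optional `a`, and a three-digit file number) with no file extension. The names `FileInfo` and `parse_filename` and the cohort labels "younger" and "older" are invented for the example.

```python
import re
from typing import NamedTuple


class FileInfo(NamedTuple):
    age: int      # age of the children in years (3 to 7)
    cohort: str   # "younger" (3 to 5) or "older" (5 to 7); labels are illustrative
    number: int   # file number within the age group


# Assumed shape: "c", one age digit, optional "a", three-digit file number.
_NAME = re.compile(r"^c([3-7])(a?)(\d{3})$")


def parse_filename(name: str) -> FileInfo:
    """Split a name such as 'c5a044' into age, cohort and file number."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised file name: {name!r}")
    age = int(match.group(1))
    has_a = match.group(2) == "a"
    if has_a and age != 5:
        # Only the five year olds of the older cohort carry the letter 'a'.
        raise ValueError(f"unexpected 'a' marker in {name!r}")
    cohort = "older" if has_a or age >= 6 else "younger"
    return FileInfo(age, cohort, int(match.group(3)))


print(parse_filename("c3001"))
print(parse_filename("c5a044"))
```

Grouping files by the `cohort` field then separates the two sets of five year olds, which share the same age digit.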

The basis of the data is a collection of audio recordings of some half to three quarters of an hour each. A large number of children between three and seven years of age were recorded in schools throughout Wales.

In each of the two cohorts (3-5 and 5-7 years of age), the children were recorded once a year for three years, but losses and additions occurred as some children missed a recording session and others came into the original project for the first time.

The recording sessions were undertaken in the children's schools. A standard play situation was used: a large box full of sand which also contained various toys, a wheel into which sand could be poured to turn it, and containers which could hold sand. There were some exceptional occasions early in the life of the original project when other play situations were used (a farm or building set).

A researcher supervised the recording session in situ, but the aim was to obtain spontaneous interactions between the children, and no standard elicitation techniques were used. The investigator only spoke with the children: (a) for normal social reasons, (b) to engage with shy or quiet children, or to encourage a flagging conversation, and (c) to discourage unruly behaviour.


The database has been placed in the public domain for use in academic research. Every researcher is welcome to use the data. Please fully acknowledge the roles of the University of Wales Aberystwyth and the Economic and Social Research Council in the creation of the database.