Output details

28 - Modern Languages and Linguistics

Queen's University Belfast

Output 19 of 82 in the submission
Title

French Oral Narrative Corpus

Type
H - Website content
Year
2013
Number of additional authors
0
Additional information

This output contains both website content (H) and a research dataset (S): the website contextualises the dataset and provides access to the files for viewing, downloading and searching. The dataset is a digitised linguistic corpus of French oral narrative developed by Carruthers in partnership with the Conservatoire Contemporain de Littérature Orale and the Oxford Text Archive. The corpus contains 87 narratives recounted by 18 different storytellers in authentic settings, representing a range of story types. The narratives have been transcribed and annotated by Carruthers, following Text Encoding Initiative (TEI) guidelines, for a range of linguistic features of particular interest for oral narrative, notably speech and thought presentation, left- and right-detachment, subject-verb inversion and loss/retention of negative 'ne'. The dataset contains 609 files (87 x 7): for each story, there is an audio recording, a fully annotated XML version with TEI P5 encoding, a version with TEI P4 encoding, an encoded PDF version, an encoded HTML version, a stripped PDF version and a stripped HTML version. The XML files have headers containing full metadata on the setting, storyteller and story, as well as the annotation taxonomy. The website offers access to all the files for viewing and download and contains a search engine for the XML files, so that advanced searches on the annotated structures can be carried out. There is also a simple lexical search tool. The website discusses (i) the methodology Carruthers employed in developing the corpus, including sampling, data collection, transcription and her design and implementation of the annotation taxonomy; and (ii) the linguistic and digital research contexts in which the corpus is situated. The corpus is also held at the Oxford Text Archive; the website links to that copy and to a number of other resources. A date-stamped electronic copy (14 and 15/10/13) has been submitted to REF.

Interdisciplinary
-
Cross-referral requested
-
Research group
None
Proposed double-weighted
Yes
Double-weighted statement

This output consists of both website content and a dataset (an annotated linguistic corpus). The corpus involved collection, transcription, detailed linguistic analysis and annotation of a considerable body of oral material (87 narratives/142,000 words). The researcher designed a TEI tagset of 137 tags for a range of complex linguistic structures (involving many problematic and ambiguous categories) and deployed this to annotate the corpus. Digitisation and annotation required extensive consultation on TEI methodology at the Oxford Text Archive (six visits). The source data and metadata were complex and difficult to access, requiring six substantial fieldwork/archival/consultation visits to the Conservatoire Contemporain de Littérature Orale (CLIO).

Reserve for a double-weighted output
No
Non-English
No
English abstract
-