Output details
29 - English Language and Literature
University of Surrey
Compara: Free online parallel corpus
The COMPARA corpus is an electronic resource that was developed for educational and research purposes, with funding from the European Union (FEDER) and the Portuguese government. The corpus contains 3 million words of English and Portuguese fiction, aligned with their respective Portuguese and English translations. This parallel and bidirectional structure allows users to look up words and phrases in one language and retrieve human translations in the other language. Part-of-speech annotation has been added to the corpus to allow users to search not just for orthographic words, but also for grammatical categories and collocates. A unique feature of the corpus is the manual post-editing of source- and target-language segments, which enables users to examine changes in sentence structure that have occurred in the process of translation. The Web interface to COMPARA was designed to cater for the needs of experienced corpus users as well as those of people who have never used corpora before. It can be used by lexicographers, translation scholars and research students to analyse large quantities of translation data. The corpus makes an innovative contribution to strengthening the field of corpus-based translation studies, which has developed into a major research paradigm in Translation Studies. The corpus enables the systematic analysis of translation choices and strategies, which lead to linguistic and stylistic variation in translation, as well as the analysis of explicitation and other potentially universal features of translated texts. Although the COMPARA project ended on 31 December 2008, we have ensured it remains available free of charge at http://www.linguateca.pt/COMPARA/. The corpus has been averaging around 9,000 page loads per month, and several research articles and MA and PhD dissertations have made use of COMPARA.