Output details
11 - Computer Science and Informatics
University of Brighton
An investigation into the validity of some metrics for automatically evaluating natural language generation systems
This paper presents an empirical investigation into the validity of corpus-based evaluation metrics, such as BLEU, for evaluating Natural Language Generation (NLG) systems. It has helped shape the NLG community’s perspective on the use of corpus-based evaluation metrics. Its experimental design for human-ratings-based evaluation of NLG systems has since been adapted and used by other NLG researchers, for example in the Generation Challenges series of NLG system competitions. Computational Linguistics is a leading journal in the field and ranks highly on international journal rankings, e.g. A* on the Australian ERA/CORE list.