Output details
11 - Computer Science and Informatics
University of St Andrews
The imagination of crowds: Conversational AAC language modeling using crowdsourcing and large data sources
EMNLP is a top NLP conference (acceptance rate: 23%). The paper introduces a novel methodology for collecting data for statistical language models. A long-standing problem in augmentative and alternative communication (AAC) is a lack of representative data. This paper shows that such data can be created by crowdsourcing, and that this surrogate data predicts AAC-like text better than previously used texts. We expanded the surrogate data using cross-entropy difference selection on social media and show 5-11% keystroke savings, nearly an order of magnitude better than recent approaches. The paper was featured in New Scientist (February 26, 2012; pp. 24-25; http://www.newscientist.com/article/mg21328536.600-crowdsourcing-improves-predictivetexting.html).
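
To illustrate the idea behind cross-entropy difference selection, the following is a minimal sketch and not the paper's implementation. It scores each candidate sentence by its per-word cross-entropy under an in-domain model minus its cross-entropy under a background model, and keeps sentences whose score falls below a threshold. The add-one smoothed unigram models, the `<unk>` token, the threshold of 0.0, and all function names are assumptions made for illustration; a real system would likely use higher-order n-gram models and tune the threshold on held-out data.

```python
import math
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary words


def train_unigram(sentences, vocab):
    """Add-one smoothed unigram model over a fixed vocabulary (assumed model)."""
    counts = Counter(
        w if w in vocab else UNK for s in sentences for w in s.split()
    )
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def cross_entropy(sentence, model):
    """Per-word cross-entropy of a sentence in bits."""
    words = sentence.split()
    return -sum(math.log2(model.get(w, model[UNK])) for w in words) / len(words)


def select_by_cross_entropy_difference(in_domain, background, candidates,
                                       threshold=0.0):
    """Keep candidates that look more like in-domain than background text.

    Returns the kept sentences, best (lowest score) first.
    """
    vocab = {w for s in in_domain + background for w in s.split()} | {UNK}
    in_model = train_unigram(in_domain, vocab)
    bg_model = train_unigram(background, vocab)
    scored = [
        (cross_entropy(s, in_model) - cross_entropy(s, bg_model), s)
        for s in candidates
        if s.split()  # skip empty sentences
    ]
    return [s for score, s in sorted(scored) if score < threshold]
```

In this sketch the in-domain set would correspond to the crowdsourced surrogate data and the candidate pool to social media text, with the background model trained on the pool itself or a sample of it.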