Output details
11 - Computer Science and Informatics
University of Sheffield
Storing the Web in Memory: Space Efficient Language Models with Constant Time Retrieval.
<22> Efficient storage of language models derived from large datasets is critical to enabling new methods to impact practical technologies such as machine translation and speech recognition. This paper proposes a novel approach to n-gram data storage that uses randomised methods to achieve impressive memory efficiency while maintaining constant-time look-up, thereby allowing the storage and exploitation of massive datasets (e.g. the entire Google Web-1T n-gram dataset). The method remains the state of the art for space efficiency in n-gram data storage. EMNLP has an ERA ranking of A, and the paper has been cited 16 times (Google Scholar).