REF 2021

Output details

11 - Computer Science and Informatics

University of Sheffield

Output title

Storing the Web in Memory: Space Efficient Language Models with Constant Time Retrieval.

Type
E - Conference contribution
DOI
-
Name of conference/published proceedings
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Volume number
-
Issue number
-
First page of article
262
ISSN of proceedings
-
Year of publication
2010
URL
-
Number of additional authors
1
Additional information

Efficient storage of language models derived from large datasets is a critical issue in enabling new methods to impact practical technologies such as machine translation and speech recognition. This paper proposes a novel approach to n-gram data storage, using randomised methods to achieve substantial memory savings while maintaining constant-time look-up, thereby allowing storage and exploitation of massive datasets (e.g. the entire Google Web-1T n-gram dataset). The method remains the state of the art for space efficiency in n-gram data storage. EMNLP has an ERA ranking of A, and the paper has been cited 16 times (Google Scholar).
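The randomised approach described above can be illustrated with a minimal sketch. This is not the paper's implementation — the class name, fingerprint width, and probing scheme here are illustrative assumptions — but it shows the core idea such methods share: store only a short hash fingerprint and a value per slot, discarding the n-gram strings themselves, so look-up is constant expected time and memory is independent of key length, at the cost of a small, tunable false-positive probability.

```python
import hashlib

class FingerprintNgramStore:
    """Key-free n-gram value store (illustrative sketch).

    Each slot holds only a short fingerprint and a value; the n-gram
    string itself is never stored. A query can therefore return a wrong
    value with probability roughly 2**-fingerprint_bits per occupied
    slot probed -- the price paid for discarding the keys.
    """

    def __init__(self, num_slots, fingerprint_bits=16):
        self.num_slots = num_slots
        self.fp_mask = (1 << fingerprint_bits) - 1
        self.fingerprints = [None] * num_slots  # None marks an empty slot
        self.values = [None] * num_slots

    def _hashes(self, ngram):
        # Derive both the home slot and the fingerprint from one digest.
        h = hashlib.sha256(ngram.encode("utf-8")).digest()
        slot = int.from_bytes(h[:8], "big") % self.num_slots
        fp = int.from_bytes(h[8:12], "big") & self.fp_mask
        return slot, fp

    def insert(self, ngram, value):
        slot, fp = self._hashes(ngram)
        # Linear probing to the first empty slot.
        for i in range(self.num_slots):
            j = (slot + i) % self.num_slots
            if self.fingerprints[j] is None:
                self.fingerprints[j] = fp
                self.values[j] = value
                return
        raise RuntimeError("store is full")

    def lookup(self, ngram):
        slot, fp = self._hashes(ngram)
        # Follow the probe sequence; an empty slot proves absence,
        # a fingerprint match returns a value (correct w.h.p.).
        for i in range(self.num_slots):
            j = (slot + i) % self.num_slots
            if self.fingerprints[j] is None:
                return None
            if self.fingerprints[j] == fp:
                return self.values[j]
        return None
```

At low load factors the expected probe count is constant, and per-entry memory is just the fingerprint plus the (possibly quantised) value — which is how schemes of this family fit web-scale n-gram counts in RAM.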

Interdisciplinary
-
Cross-referral requested
-
Research group
None
Citation count
4
Proposed double-weighted
No
Double-weighted statement
-
Reserve for a double-weighted output
No
Non-English
No
English abstract
-