
Output details

11 - Computer Science and Informatics

University of Sheffield



Output title

Storing the Web in Memory: Space Efficient Language Models with Constant Time Retrieval.

Type

E - Conference contribution

DOI

Name of conference/published proceedings

Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Volume number

Issue number

First page of article

262

ISSN of proceedings

Year of publication

2010

URL

Number of additional authors

Additional information

Efficient storage of language models derived from large datasets is a critical issue in enabling new methods to have an impact on practical technologies such as machine translation and speech recognition. This paper proposes a novel approach to n-gram data storage that uses randomised methods to achieve impressive memory efficiency whilst maintaining fast, constant-time look-up. This allows massive datasets (e.g. the entire Google Web 1T n-gram dataset) to be stored and exploited. The method remains the state of the art for space efficiency in n-gram data storage. EMNLP has an ERA ranking of A, and the paper has been cited 16 times (Google Scholar).
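The statement describes the approach only at a high level. The following is a hedged Python illustration of the general idea behind randomised n-gram storage. It drops the n-gram strings, keeps short hash fingerprints, and accepts a small, tunable false-positive rate in exchange for space. It is not the paper's method. It substitutes a textbook two-level (FKS-style) perfect hash, and the names `FingerprintStore`, `_h`, `_FP_SEED` and `fp_bits` are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Hypothetical seed reserved for fingerprints; bucket seeds are small integers.
_FP_SEED = 2**127


def _h(text: str, seed: int) -> int:
    """Deterministic 64-bit hash of `text`, parameterised by an integer seed."""
    salt = seed.to_bytes(16, "little")  # BLAKE2b accepts up to 16 bytes of salt
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


class FingerprintStore:
    """Static n-gram -> count map that keeps fingerprints, never the n-gram strings.

    Two-level (FKS-style) perfect hash: n first-level buckets, and for a bucket
    holding b keys a collision-free second-level table of b*b slots, so a lookup
    needs a constant number of hash evaluations (at most three).
    """

    def __init__(self, counts: Dict[str, int], fp_bits: int = 12, max_tries: int = 1000):
        self.fp_bits = fp_bits
        self.n_buckets = max(1, len(counts))
        buckets: List[List[str]] = [[] for _ in range(self.n_buckets)]
        for ngram in counts:
            buckets[_h(ngram, 0) % self.n_buckets].append(ngram)

        # Per bucket: (seed, slots); each slot holds (fingerprint, count) or None.
        self.tables: List[Tuple[int, List[Optional[Tuple[int, int]]]]] = []
        for bucket in buckets:
            size = len(bucket) ** 2
            if size == 0:
                self.tables.append((0, []))
                continue
            for seed in range(1, max_tries + 1):
                slots: List[Optional[Tuple[int, int]]] = [None] * size
                for ngram in bucket:
                    i = _h(ngram, seed) % size
                    if slots[i] is not None:
                        break  # collision: try the next seed
                    slots[i] = (self._fingerprint(ngram), counts[ngram])
                else:
                    # No collision for any key in this bucket: keep this seed.
                    self.tables.append((seed, slots))
                    break
            else:
                raise RuntimeError("no collision-free seed found; raise max_tries")

    def _fingerprint(self, ngram: str) -> int:
        return _h(ngram, _FP_SEED) & ((1 << self.fp_bits) - 1)

    def get(self, ngram: str) -> Optional[int]:
        """Return the stored count, or None if the n-gram is judged absent.

        Stored n-grams always get their exact count back; an unseen n-gram is
        misreported only if its fingerprint happens to match the slot it lands in.
        """
        seed, slots = self.tables[_h(ngram, 0) % self.n_buckets]
        if not slots:
            return None
        entry = slots[_h(ngram, seed) % len(slots)]
        if entry is None or entry[0] != self._fingerprint(ngram):
            return None
        return entry[1]

    def total_slots(self) -> int:
        """Second-level slots allocated; just under 2n in expectation for n n-grams."""
        return sum(len(slots) for _, slots in self.tables)


if __name__ == "__main__":
    counts = {"the cat sat": 12, "cat sat on": 7, "sat on the": 9, "on the mat": 5}
    store = FingerprintStore(counts, fp_bits=16)
    print(store.get("the cat sat"))  # 12
    print(store.get("mat the on"))   # None with high probability (clash chance about 2**-16)
```

Stored n-grams never produce false negatives. An unseen n-gram is reported as present only when it lands on an occupied slot whose fingerprint happens to match, which occurs with probability roughly 2^-fp_bits or less per query. Widening `fp_bits` therefore trades memory for accuracy. A practical store would also pack fingerprints and counts into bit arrays rather than Python tuples.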

Interdisciplinary

Cross-referral requested

Research group

None

Citation count

Proposed double-weighted

Double-weighted statement

Reserve for a double-weighted output

Non-English

English abstract