
Output details

11 - Computer Science and Informatics

University of Edinburgh

Output 169 of 401 in the submission
Output title

Genre distinctions for discourse in the Penn TreeBank

Type
E - Conference contribution
DOI
-
Name of conference/published proceedings
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2
Volume number
-
Issue number
-
First page of article
674
ISSN of proceedings
-
Year of publication
2009
Number of additional authors
0
Additional information

Originality: The first demonstration that the widely used 1-million-word Penn TreeBank corpus is not simply a collection of news reports but comprises documents from a range of genres, from financial reports to film reviews to errata and verse, each showing very different lexical, syntactic and organizational properties.

Significance: Subsequent work that has used the Penn TreeBank corpus for training parsers, domain adaptation, assessing discourse coherence and similar tasks now acknowledges this fact and conditions its claims on the particular genre involved.

Rigour: Uses the standard methodology of corpus linguistics and tools from computational linguistics.

Interdisciplinary
-
Cross-referral requested
-
Research group
D - Institute for Language, Cognition & Computation
Citation count
-
Proposed double-weighted
No
Double-weighted statement
-
Reserve for a double-weighted output
No
Non-English
No
English abstract
-