Output details

36 - Communication, Cultural and Media Studies, Library and Information Management

University of Liverpool

Output 6 of 26 in the submission
Name of software

Cheshire Information Retrieval System

Type
G - Software
Name of software house
University of Liverpool
Year
2013
Number of additional authors
1
Additional information

Cheshire3 is an information retrieval system researched, developed, and published as an independent software product at the University of Liverpool during 2008-2013. It is a complete redesign and reimplementation of the previous Cheshire digital library system, using an object model that enables Grid-based information retrieval over the data grid. The research was funded by the EU, JISC, the NSF, and Research Councils.


It is designed to meet the following needs:

• Effective real-time retrieval from petabyte-scale collections.


• Efficient retrieval from large scale, globally distributed, heterogeneous collections.


• Integration of data and text mining with storage, curation and retrieval.


The system implements a “meta-search” capability to build combined indexes “harvested” from distributed sources. Statistical ranking methods are used to rank servers or databases by the probability that they contain relevant information for a given user query. This capability can be recursively executed, which radically improves retrieval performance and efficiency.
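The server-ranking idea described above can be illustrated with a minimal sketch. It is not Cheshire3's implementation or API: the function names, the toy collections, and the add-one smoothed scoring formula are all assumptions made for illustration. Each collection is "harvested" into a summary of document frequencies, and collections are then ranked by a simple estimate of how likely they are to contain the query terms.

```python
import math
from collections import Counter


def build_summary(docs):
    """Harvest a term -> document-frequency summary for one collection."""
    summary = Counter()
    for text in docs:
        # Count each term once per document (document frequency).
        summary.update(set(text.lower().split()))
    return summary, len(docs)


def score_collection(query, summary, n_docs):
    """Sum of log smoothed probabilities that a document contains each term.

    The add-one smoothing is an illustrative assumption, chosen so that
    unseen terms do not produce log(0).
    """
    score = 0.0
    for term in query.lower().split():
        p = (summary.get(term, 0) + 1) / (n_docs + 2)
        score += math.log(p)
    return score


def rank_collections(query, collections):
    """Return (collection name, score) pairs, best candidate first."""
    summaries = {name: build_summary(docs) for name, docs in collections.items()}
    scored = [
        (name, score_collection(query, *summaries[name])) for name in summaries
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy collections standing in for distributed sources.
collections = {
    "archives": ["parish records of cheshire", "estate papers and parish maps"],
    "science": ["protein folding data", "climate model output data"],
}
ranking = rank_collections("parish records", collections)
```

In this sketch only the highest-ranked collections would be searched in full, which is where the efficiency gain of a metasearch approach comes from.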

Cheshire3 incorporates innovations emerging from the research programme, including:

• Updated ‘metasearch’ techniques, formally evaluated using real-time searching of large-scale document datasets.


• Research advances in indexing and term extraction for automatic classification of database records, using blind relevance feedback during augmented second stage search.


• Data and text mining, digital library tools.


• Integration of computational and data grids, including the integrated Rule Oriented Data System, to provide scalable retrieval.


• A staged logistic regression algorithm, and other algorithms, which form the basis for searching, merging result sets, and ranking documents by relevance.


• Blind relevance feedback to enable cross-language and patent retrieval, as assessed in formal evaluations (the Cross Language Evaluation Forum and the INEX Evaluation Initiative).
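Blind relevance feedback, mentioned in the list above, can be sketched in a few lines. This is a generic illustration under stated assumptions, not Cheshire3's algorithm: the term-count ranker, the `top_k` and `n_new_terms` parameters, and the toy documents are all hypothetical. The top-ranked documents from a first search are assumed to be relevant, their most frequent terms are added to the query, and the expanded query is run as a second-stage search.

```python
from collections import Counter


def search(query_terms, docs):
    """Toy first-stage ranker: score by number of query-term occurrences."""
    scored = []
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        score = sum(tokens.count(term) for term in query_terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties broken by document id for determinism.
    return [doc_id for doc_id, _ in sorted(scored, key=lambda p: (-p[1], p[0]))]


def expand_query(query_terms, docs, top_k=2, n_new_terms=2):
    """Add the most frequent terms from the top_k results to the query."""
    top_docs = search(query_terms, docs)[:top_k]
    counts = Counter()
    for doc_id in top_docs:
        counts.update(
            t for t in docs[doc_id].lower().split() if t not in query_terms
        )
    ranked_terms = sorted(counts.items(), key=lambda p: (-p[1], p[0]))
    return list(query_terms) + [t for t, _ in ranked_terms[:n_new_terms]]


# Hypothetical toy documents.
docs = {
    "d1": "patent claims for engine valve",
    "d2": "engine valve timing patent",
    "d3": "valve timing chain",
    "d4": "garden flowers",
}
first = search(["patent"], docs)
expanded = expand_query(["patent"], docs)
second = search(expanded, docs)
```

Here the second-stage search also retrieves `d3`, which shares vocabulary with the top results but does not contain the original query term.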

Cheshire3 is used internationally in production services for searching archives of science data, and in multiple digital library services, including the Archives Hub and the British Library.

Interdisciplinary
-
Cross-referral requested
-
Research group
None
Proposed double-weighted
Yes
Double-weighted statement

• The research and development for the Cheshire3 digital library from 2008-2013 is represented in the production of 154,890 lines of code in Python, representing the work of 252 person-months at the University of Liverpool.

• The functionality has been engineered in response to the research and performance needs of 49 partners through three EU integrated projects, one EU STREP project, and one JISC/NSF-funded project, along with two national services, all running simultaneously.

• Ongoing innovation has been required since the research was initiated in 2008 to meet the emerging demands in both scale and complexity of big data.

Reserve for a double-weighted output
No
Non-English
No
English abstract
-