Output details
11 - Computer Science and Informatics
University of Sheffield
GATECloud.net: a platform for large-scale, open-source text processing on the cloud
<15>GATE Cloud is the first user-extendable cloud-based text mining platform, employing parallel and distributed computation for Big Data text processing. This paper summarises results from a JISC/EPSRC project (EP/I034092/1) which won the best paper award at the UK eScience All Hands Meeting in 2011. Here we discuss infrastructural facilities (load balancing, efficient data upload and storage, deployment to virtual machines, security, fault tolerance), quantify the scalability profile of the distributed computation, and evaluate the system in use at Public Health England (Amanda Semper <Amanda.Semper@phe.gov.uk>). In 2011-12, Cunningham held an ANR Chaire d’Excellence at the Internet Memory Foundation, applying this work to multi-terabyte web crawls.