Indiana University
To design a prototype system that demonstrates non-consumptive, computational access to a restricted full-text corpus
Access to some datasets is justifiably restricted for legal, ethical, or business reasons. The existence of such datasets presents an opportunity for the smart application of technology that permits aggregate statistical or computational research on the data without violating the constraints that prevent full access. This grant to researcher Beth Plale at Indiana University supports a collaborative project with the HathiTrust, holder of over 8.5 million digitized print works, to address the immense technical and theoretical issues involved in designing digital methods for mining data from in-copyright materials while respecting the current legal restrictions governing access to such works. Plale's team will develop a secure computing environment that will enable researchers to bring their own algorithms and tools to bear on HathiTrust's full-text digitized corpus, while at the same time limiting the ability of that software (or researchers) to access the works in a way that runs afoul of copyright law.