Harvard University
To improve the ability to curate and verify replication datasets within the Dataverse data archiving platform through a suite of software containerization and metadata tools, and to support the development of a new data curation service at the Harvard Dataverse
This grant funds a series of four projects by Mercè Crosas, Chief Data Science and Technology Officer at Harvard’s Institute for Quantitative Social Science, to expand and improve the software-handling capabilities of the Dataverse open source data repository platform. First, Crosas will integrate Dataverse with Encapsulator, an open source tool that allows the creation of a computational “time capsule” that preserves the exact computational environment used to conduct a piece of data analysis. Second, Crosas will create links between Dataverse and Code Ocean, a computational reproducibility platform that was spun out of Cornell Technion’s incubator program. Third, Crosas will develop a set of metadata versioning and exploration tools that will increase incentives for data curation by returning richer usage statistics to data providers and publishers. Finally, Crosas will model and pilot a fee-based curation service that would allow the sustainable scaling of data and code curation in Dataverse. Like all other development and organizational innovation within the Dataverse community, this work will be freely available and useful to the dozens of other institutions running the software to power their own data archives.