The Miami Foundation Inc
To grow Dat, a system for real-time replication, transformation, and versioning of large tabular data sets, into a vibrant, healthy open source project
Github, the collaborative software development and versioning platform, has become so essential to the software development ecosystem that scientists have begun experimenting with using it for the collaborative versioning and sharing of datasets. Though the potential value is immense, Github was designed to handle software code containing thousands of lines per file, not tabular datasets containing millions of entries. Large datasets of the kind regularly used by scientists grinds the system to a halt. Moreover, tabular data, unlike textual software code, might exist in any one of myriad data formats ranging from comma?separated to Excel to SQL. Funds from this grant provide support for the development of a solution to this problem, a Git-esque platform called “Dat.” Created by open source developer Max Ogden, Dat borrows heavily from Github’s syntax and mechanics, but is optimized for large-scale tabular data and has been programmed to be able to translate seamlessly between the wide variety of formats commonly used to store data. Grant funds will support the hiring of two developers: one focused on core development and one focused on providing interfaces useful to researchers and on ensuring the system’s interoperability with existing scientific data repositories. Additional funds will support outreach and partnership building with stakeholders in the scientific community.