Raw data needs preparation to be useful for research. In some cases, what is needed is cleanup and normalization; in others, tagging or categorizing dataset elements. Depending on the domain and kind of data, computers can do much of the necessary work, but some tasks, due to fuzziness or complexity in the data, are currently beyond the bounds of computation. Much data preparation requires human eyes, human minds, human judgment, and human labor, a daunting demand when the size of many modern scientific datasets is measured in terabytes.
The Zooniverse project, an international effort initially based at Oxford University and now housed primarily at the Adler Planetarium in Chicago, offers a straightforward solution to this problem: divide the work into very granular tasks, gather a large crowd of science enthusiasts, and let them loose on the data. "Galaxy Zoo," the first Zooniverse initiative, asked participants to view images of galaxies collected by the Sloan Digital Sky Survey and to categorize their shapes, successfully engaging 130,000 participants who performed over 100 million distinct classifications. Subsequent projects have expanded the Zooniverse strategy into other scientific domains, asking volunteers, in one case, to help reconstruct historical climate data by entering records from the digitized images of ship logbooks.
Funds from this two-year grant will support the extension of the Zooniverse platform into new mechanics beyond image classification (for example, sound classification of whale songs, or tagging of species from video feeds); outreach efforts to identify scientific datasets that might be usefully improved by tapping Zooniverse volunteers; and activities to engage the large and growing community of citizen scientists who participate in Zooniverse projects.