Many researchers fail to capture, log, and version their work as it moves through the research process, from data collection through multiple stages of cleaning and preparation to analysis. In part, this failure stems from the difficulty of logging changes to data as it passes from one software platform or set of scripts to another. Each tool might be ideal for a particular part of the research process, but no common platform ties them together and tracks the provenance of data as it moves from one system to the next.
For decades, experimental scientists have captured their research activities in lab notebooks. What is needed is a revitalization of that old idea: a lab notebook for the modern era of computational science. This two-year grant funds the development of just such an electronic lab notebook environment, called the IPython Notebook. Built on Python and supporting R and other programming languages widely used in the data science community, the IPython Notebook is an early prototype computational platform that allows researchers to run a wide variety of high-powered data cleaning, modeling, and analysis algorithms inside a common computational environment. Grant funds will help IPython developers make the leap from early adoption to mainstream usage, focusing particularly on the development and scaling of features in three key areas: interactive exploration of data, collaborative authoring, and dissemination and sharing. Additional grant funds cover the salary of a full-time outreach coordinator, who will give presentations and tutorials at universities and professional society meetings, and support the development of a set of live "notebooks" for use in introductory statistics classes, to better introduce students to the platform.