University of Texas, Austin
To raise the visibility of and improve incentives for software work as a contribution in the scientific literature
The writing of scientific software is an increasingly important part of modern scientific practice. Properly rewarding such activity requires the wide adoption of new citation practices in which authors formally recognize the software they use in their work. Yet a change in citation practices would leave untouched the scientific literature produced to date, which is filled with explicit or implicit mentions of software in the body, footnotes, figures, or acknowledgments sections of articles. Funds from this grant support a project by James Howison of the University of Texas, Austin, School of Information, to develop means to identify software citations in the current corpus of scientific papers. Howison will assemble a team that includes technologists Heather Piwowar and Jason Priem, compile a gold-standard dataset of software references in the scientific literature, and then develop a machine learning system trained on that dataset to recognize software references in scientific articles. The team will then deploy, test, and refine this system in three different prototypes.