University of Michigan
To design, develop, and implement data linkage tools that connect software mentions to research papers, repositories, and grant sources
The Institute for Research on Innovation in Science (IRIS) maintains a dataset linking university grant and HR/procurement data with scholarly publications. That data has become essential infrastructure for the growing research community focused on measuring scientific productivity and the return on public and private investments in science. This grant funds efforts by Jinseok Kim and Jason Owen-Smith to expand the database by adding linkages to research software, creating a new resource for beginning to quantify the role software plays in scientific productivity. Unlike publications and patents, software does not necessarily have well-curated author lists, and citations to software codebases are not necessarily structured well enough for data mining. On the other hand, versioning platforms like GitHub hold much more granular data on individuals' specific contributions to codebases over time, which could enable very detailed analyses of who does what kinds of software work.
To enable research on individual contributions to software as products of research, the IRIS team will identify relevant software repositories and link contributor usernames to the faculty, students, and staff represented in university records. Kim will identify software referenced in a corpus of papers, then develop algorithmic ways to match the names of software projects with active GitHub repositories. Next, he will use a set of name disambiguation methods to link contributors to those repositories with people already represented in the IRIS data, in the process linking those repositories and contributions to funding and other IRIS entities.
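As a rough illustration of the two matching steps described above (software names to repositories, then contributors to people in university records), the sketch below uses plain string similarity from Python's standard library. It is not the IRIS team's method, which this summary does not specify. The function names, the sample names, and the 0.85 threshold are all hypothetical, and real name disambiguation would likely draw on more evidence than spelling alone, such as affiliations, email addresses, or co-contributors.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(query, candidates, threshold=0.85):
    """Return the candidate most similar to `query`, or None if no
    candidate reaches `threshold` (a hypothetical cutoff)."""
    scored = [(similarity(query, c), c) for c in candidates]
    if not scored:
        return None
    score, candidate = max(scored)
    return candidate if score >= threshold else None


# Step 1 (illustrative): a software name mentioned in a paper
# is matched against candidate repository names.
repo = best_match("Scikit-Learn", ["scikit-learn", "scikit-image", "numpy"])

# Step 2 (illustrative): a contributor's display name is matched
# against names in a made-up list of university records.
person = best_match("Jane Q. Doe", ["Jane Doe", "John Doe", "Janet Dole"])

print(repo, "|", person)
```

Returning None below the threshold leaves uncertain cases unlinked rather than forcing a match, which is usually the safer default when the links feed later analyses of funding and productivity.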