Stanford University
To model the supply of and demand for research data, including decisions about hoarding, sharing, and privacy protection
This grant funds a project by Stanford economist Liran Einav to study the factors influencing researchers' decisions about when and how to share data. Topics to be studied include researchers' incentives for data sharing, the perceived privacy risk if data are shared, the impact of data-sharing decisions on scientific progress, and the relevant policy implications. On the privacy issue specifically, Einav will develop and train a machine learning algorithm to estimate the vulnerability of different datasets to privacy violations. Such a tool is potentially useful in and of itself, since it would bring more objectivity to risk assessment than current processes, in which researchers and review boards try to estimate these risks subjectively. Using the algorithmic risk assessment and many other field-specific variables as inputs, Einav and his team will then develop econometric models of researchers' data-sharing and data-hoarding decisions. The regression coefficients calculated from this estimation should permit causal insights about the relative importance of different factors in those decisions and generate predictions of the effect of various policy changes on scientific progress. This research therefore stands to inform and empower the many existing efforts to encourage better data practices among academics.