My lens on this project has shifted over time: looking at DataONE page views, examining issues with data citation metrics, and identifying where references to the use of existing datasets might occur in the research process. The data life cycle has seemed a useful place to concentrate and bring some issues into sharper focus, but there are also broader impacts: changing behavior among scientists, supporting reproducible science, and encouraging public trust. There are economic benefits in allowing scientists to work on their research rather than on managing long-term data preservation, as well as savings in not having to redo research studies. Awareness of data issues and the availability of shared datasets can potentially change research questions and allow studies across datasets and across disciplines, opening up the potential for “big data” studies. Properly archived datasets are assets whose benefits may extend over a very long period of time.
There are ways of measuring some of these broader impacts. Freedman, Cockburn, and Simcoe (2015) suggest that data repositories can save the United States $14 billion a year by improving the reproducibility of results. Surveys could potentially show changes in scientists’ data-sharing behavior and improvements in data management practices. Public surveys could measure how public trust in science is affected by data repositories.
In the final few weeks, I’ll be focusing on specific measures and suggestions within the data life cycle, and trying to place these metrics in the context of the much broader impacts described above.
Freedman, L. P., Cockburn, I. M., & Simcoe, T. S. (2015). The Economics of Reproducibility in Preclinical Research. PLOS Biology, 13(6), e1002165. doi:10.1371/journal.pbio.1002165