Week 8: Analysis, figures and data citation best practices

In our eighth week, we worked with project mentors to refine our systematic review database and analysis. The main question for our internship was whether a selection of data-aggregation studies could be repeated using data from repositories available in DataONE. During the preceding weeks we extracted dozens of pieces of data from each manuscript we included in the systematic review. This week, we directly compared the data sources aggregated in our selection of 80 papers to repositories on DataONE and to repositories listed in the re3data registry of research data repositories (https://www.re3data.org/).

During our weekly check-in, Giancarlo presented a series of statistical models with accompanying figures (one is shown below). We found that certain types of data tended to be more accessible. For example, data stored as databases were, in general, more accessible than spatial data. We also had an interesting discussion on the best ways to visually present our results in the manuscript. A fun part of the internship has been collaborating on everything from project design to writing the manuscript.

Plot of predicted click score values as a function of data source age (n = 313 sources). Click score is an ordered measure of data availability from 1 (easily accessible) to 4 (not accessible). Data source age was calculated from the year of publication for any citation (e.g., article or report) associated with the data source. This plot indicates a higher probability that older data sources were more difficult to access (click score 3 or 4).
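For readers curious how an ordered outcome like click score can be turned into predicted probabilities, here is a minimal sketch of a proportional-odds (cumulative logit) model. This post does not say which model family we used, so treat this as one common option rather than our actual analysis. The slope and cutpoints are hypothetical values chosen for illustration only; they are not fitted to our 313 sources.

```python
import math


def category_probabilities(age, slope, cutpoints):
    """Return [P(score=1), ..., P(score=K)] under a proportional-odds model.

    The model assumes P(score <= k) = logistic(cutpoints[k-1] - slope * age),
    with cutpoints in increasing order.
    """
    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Cumulative probabilities for scores 1..K-1, then 1.0 for the last score.
    cumulative = [logistic(c - slope * age) for c in cutpoints] + [1.0]

    # Differences of consecutive cumulative values give per-category probabilities.
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


# Hypothetical parameters (assumptions for illustration, not fitted values).
SLOPE = 0.05                 # a positive slope shifts older sources toward higher scores
CUTPOINTS = [0.0, 1.0, 2.0]  # three cutpoints separate the four click scores


def prob_hard_to_access(age):
    """Probability of click score 3 or 4 for a data source of the given age."""
    probs = category_probabilities(age, SLOPE, CUTPOINTS)
    return probs[2] + probs[3]


if __name__ == "__main__":
    for age in (5, 20, 40):
        print(age, round(prob_hard_to_access(age), 3))
```

With a positive slope, the probability of a click score of 3 or 4 rises with age, which is the pattern the plot shows. In a real analysis the slope and cutpoints would be estimated from the data rather than set by hand.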

We also continued work on the best practices documents. Our main product was a series of best practices for data citation in data aggregation research. Our mentor Megan is providing edits to this document. We also created two other figures to accompany the best practices. The first (shown below) summarizes the best practices in one concise figure. The second graphically documents the steps we took to conduct the systematic review.

A concise summary of some best practices for data aggregation research.
