Week 3 – Database refinement and data extraction

Continuing with our collaboration and joint post from last week, our main goal for week #3 was the extraction of data source information from papers employing data syntheses. We anticipated the need to refine our database (e.g., fields and categories) along the way.

Giancarlo reviewed the abstracts of papers identified from a Web of Science (WoS) search (N = 366) and assigned each paper to a category based on its scope and methods. First, a number of papers synthesizing data sources were identified by WoS but were deemed peripheral to (or completely outside) the ecological sciences. It was interesting to see the various ways “ecology” or “ecological” are used in other disciplines! Among papers that were firmly in the ecological sciences, a number did not synthesize data but rather reviewed the current science on a given topic. Other papers went further and conducted meta-analyses of suites of other papers, but did not synthesize data per se. This process resulted in a set of over 100 potential data syntheses. While extraction of data source information for WoS papers is in the early stages, the initial search and subsequent exclusion criteria have captured a diverse array of studies from various systems across the world.

Rob took on the task of extracting data from synthesis research conducted by NCEAS between 2015 and 2017 (N = 162). From this first week of data extraction, we found that many of the NCEAS articles we started with qualified for the systematic review based on our inclusion criteria. Similar to the screening process outlined above for WoS articles, some of the articles that we excluded from NCEAS were meta-analyses (N = 6) or narrative reviews (N = 3). One surprise during NCEAS data extraction has been the variation in how many data sources are analyzed in a single manuscript. Some articles brought in a single outside data source, while one paper synthesized data from over 70 sources!

During the process of extracting data source information, we both identified common or novel approaches to data citation and description. Many were stylistic or dictated by journal guidelines. For example, data sources may sometimes be briefly listed in methods sections, but more extensive data descriptions and links to the online data source may only appear in a supplemental file. Other papers highlighted challenges researchers sometimes face when attempting to replicate an analysis, for example when a data repository goes offline. These varying approaches and challenges have helped us (1) develop a document detailing some of the best practices for accurate citation and (2) refine our database, with the help of our mentor Megan, to accommodate the diversity of data source types and our ability to access them.

We’re looking forward to the completion of our database in the coming week and the opportunity to investigate patterns of data source citation practices, repository choices, and accessibility.
