This week involved a lot of data extraction and cleaning: the necessary evils before proceeding to some more exciting citation analysis! At the beginning of the week I extracted all the citations from Library 1 (which is composed of the citations authored by those affiliated with DataONE). This database will be the private-facing collection of all the contents of Library 1 along with citation metrics, including cited-by data (articles that cite the article of interest), web views, etc. Once the database was constructed, I started populating it with citation metrics and continued my work creating Library 3 (all the publications citing articles in Library 1). As I discussed last week, this started with searching each title in Web of Science (WoS), exporting the cited-by records from WoS as .ris files, and importing those files into Library 3 in Zotero.
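The .ris files exported from WoS are just tagged plain text, which Zotero parses on import. For anyone curious what that format looks like under the hood, here is a minimal sketch of a parser, assuming standard RIS tagged lines (`TY  - JOUR`, `TI  - …`, ending each record with `ER`); the sample records are made up for illustration.

```python
def parse_ris(text):
    """Parse RIS-formatted text into a list of dicts keyed by RIS tag (TI, DO, ...)."""
    records, current = [], {}
    for line in text.splitlines():
        # RIS lines look like "TI  - Some title": 2-char tag, "  - ", value
        if len(line) >= 5 and line[2:5] == "  -":
            tag, value = line[:2], line[6:].strip()
            if tag == "ER":  # end-of-record marker
                records.append(current)
                current = {}
            else:
                # keep the first occurrence; real RIS allows repeated tags (e.g. AU)
                current.setdefault(tag, value)
    return records

# Hypothetical cited-by export with two records
sample = """TY  - JOUR
TI  - An article citing a DataONE paper
DO  - 10.1000/example.doi
ER  -
TY  - JOUR
TI  - Another citing article
ER  -
"""
records = parse_ris(sample)
print(len(records))          # 2
print(records[0]["TI"])      # An article citing a DataONE paper
```

Zotero (and Scopus, below) handle all of this automatically; the sketch is only to show why .ris works as a portable exchange format between the databases.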
This week I searched Scopus for all the articles not found in WoS. To do this, I first had to add missing digital object identifier (DOI) information to the database. Of the 235 current items, 156 did not have a DOI. In some cases this was due to the item type (e.g., software, books, conference presentations); in others the DOI was simply not included in the item's metadata. Through a manual Google search for each title, I was able to supply 71 of the missing DOIs. I then searched Scopus for every DOI that lacked cited-by information from WoS, entered the results into the database, exported all the citing articles from Scopus as an .ris file, and imported them into Library 3. Unfortunately, I did not find a way to reliably search multiple DOIs at once in Scopus, so I had to enter each one manually.
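The bookkeeping step here (finding which items lack a DOI) is easy to script. A sketch, using hypothetical item records; as a possible alternative to manual Googling, it also builds a Crossref bibliographic-query URL per missing title (Crossref's public REST API, no key required), though each suggested match would still need eyeballing.

```python
from urllib.parse import quote_plus

# Hypothetical library items; a real Zotero export would carry many more fields
items = [
    {"title": "A paper with a DOI", "doi": "10.1000/xyz123"},
    {"title": "A conference presentation", "doi": ""},
    {"title": "A software package", "doi": None},
]

# Items with no DOI in their metadata
missing = [it for it in items if not it.get("doi")]
print(len(missing))  # 2

# Crossref lookup URL for each missing title (top match only: rows=1)
lookup_urls = [
    "https://api.crossref.org/works?rows=1&query.bibliographic=" + quote_plus(it["title"])
    for it in missing
]
print(lookup_urls[0])
```

Since Crossref only mints DOIs it knows about, items like software or presentations would still come back empty, matching what I saw for those item types.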
The Google title search for missing DOIs also yielded additional interesting usage information from publishers and database vendors (e.g., EBSCO, PLOS ONE) for many of the articles, such as web views, PDF downloads, and cited-by counts not found in WoS or Scopus. These additional metrics, along with cited-by numbers not captured from WoS or Scopus, were added to the database when available. Many of the bigger database vendors and publishers have partnerships with Altmetric, PlumX, or Dimensions, handy tools that measure scholarly impact in a more Web 2.0 fashion using metrics like tweets, Facebook mentions, web mentions, etc. If these were listed on an item’s synopsis page, I included them in the database as well. For all items with a DOI, I also used Altmetric’s bookmarklet to generate an Altmetric score. This score is a calculation based on all the recorded web activity surrounding an item’s DOI.
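Because the bookmarklet works one item at a time, the same lookups could in principle be batched: Altmetric exposes the score behind the bookmarklet through its free, rate-limited public endpoint `https://api.altmetric.com/v1/doi/<doi>`. A small sketch that builds those request URLs (the DOIs below are placeholders):

```python
def altmetric_url(doi):
    """Return the Altmetric public-API URL for a given DOI."""
    return "https://api.altmetric.com/v1/doi/" + doi.strip()

# Placeholder DOIs standing in for the items in the database
dois = ["10.1000/example1", "10.1000/example2"]
urls = [altmetric_url(d) for d in dois]
print(urls[0])
# Fetching each URL (e.g. with urllib.request.urlopen) returns JSON; its
# "score" field is the same number the bookmarklet reports. DOIs with no
# recorded web activity return HTTP 404 rather than a score of zero.
```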
All in all, my week 4 endeavors proved more manually intensive than I had anticipated, pushing my bibliometrics tasks back to week 5. Hopefully I’ll have some interesting preliminary insights to share with you all next week!