Most of today was spent tracking Biological Magnetic Resonance Data Bank (BMRB) ID numbers in Google Scholar. It was really hard to filter out the “chatter” in the search results and to judge, from the short snippet Google Scholar shows, which articles actually cited a dataset. I collected 77 articles that potentially cited the datasets. However, after retrieving the full text of the 73 articles I had access to, I found that 64 of them did not actually cite the dataset. This is the major downfall of using a plain number of four or more digits as each dataset's repository-specific ID: it becomes really difficult to search the literature for citations of that particular dataset. Still, I did find 9 articles that reused the datasets, so the day was not a total waste. The Google Scholar search terms used and the number of relevant hits can be seen in this Google Spreadsheet. The citations for the articles can be found in this Mendeley group.
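Part of the full-text screening described above could in principle be scripted. Below is a minimal sketch, assuming the article text is already available as a plain string; the function name `cites_bmrb_entry` and the 100-character window are hypothetical choices for illustration, not how the screening was actually done. It only accepts an ID that stands alone as a number (so 15123 does not match inside 151234) and that appears near a mention of the BMRB, which is exactly the context a bare numeric ID lacks in a Google Scholar snippet.

```python
import re


def cites_bmrb_entry(full_text: str, entry_id: str, window: int = 100) -> bool:
    """Heuristic check: does entry_id appear as a standalone number
    within `window` characters of a BMRB mention? (Hypothetical helper.)"""
    # Match the ID only when it is not part of a longer run of digits
    id_pattern = re.compile(r"(?<!\d)" + re.escape(entry_id) + r"(?!\d)")
    # Accept either the acronym or the full repository name
    repo_pattern = re.compile(
        r"BMRB|Biological Magnetic Resonance Data Bank", re.IGNORECASE
    )
    for match in id_pattern.finditer(full_text):
        # Look at the text surrounding this occurrence of the ID
        start = max(0, match.start() - window)
        end = match.end() + window
        if repo_pattern.search(full_text[start:end]):
            return True
    return False


# Usage: a true hit and a typical piece of numeric "chatter"
print(cites_bmrb_entry("Shifts were deposited in the BMRB under accession 15123.", "15123"))
print(cites_bmrb_entry("The fragment spans residues 15123 to 15200.", "15123"))
```

A check like this would still need a human pass: an article can mention the BMRB and, nearby, an unrelated number that happens to equal the ID.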
I also spent some time fine-tuning the search string for Protein Data Bank (PDB) ID numbers so that it would return all the articles citing a PDB entry while excluding many of the irrelevant results. I ended up with the search string: (PDB OR “Protein Data Bank”) AND “1TL6” -heinonline -patents, where 1TL6 is the dataset ID. The -heinonline term ruled out many unrelated government documents, and the -patents term ruled out patents and patent applications. Search terms for the PDB can be found in this Google Spreadsheet.
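Since the same search string is reused for every PDB ID, it can be generated rather than retyped. Here is a minimal sketch, assuming you simply want a link to open in a browser for each ID; the helper names are hypothetical. It relies on Google Scholar's search page taking the query in its `q` parameter, and it uses straight quotes, since curly quotes copied from a web page may not be treated as phrase delimiters.

```python
from urllib.parse import urlencode


def pdb_scholar_query(pdb_id: str) -> str:
    # Repository acronym or full name, the quoted ID, and the two exclusions
    return f'(PDB OR "Protein Data Bank") AND "{pdb_id}" -heinonline -patents'


def scholar_url(query: str) -> str:
    # urlencode percent-encodes quotes/parentheses and turns spaces into "+"
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})


# Usage: print one search link per dataset ID
for pdb_id in ["1TL6"]:
    print(scholar_url(pdb_scholar_query(pdb_id)))
```

Keeping the template in one function means a later tweak (say, another exclusion term) applies to every ID at once.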