This week I began by going through the paper by Chapin et al., 2006 (doi:10.1007/s10021-005-0105-7) and pulled out terms to search for on the DataONE site. I then queried those terms using the R dataone package and query code. The query code had to be altered slightly because the default number of search results returned is 10; we added “origin$columns = 1000” so that all matching items in the database would be retrieved. I compiled a CSV of those query results and annotated it with information such as the query search term, whether or not each result was accurate, and whether the Arctic Data Center website has attribute data for that dataset. From these query searches I am finding additional terminology to search for within the database. The query results turned up several interesting findings, including multiple datasets that have metadata files but no actual data, datasets that are available through the NCAR/UCAR site but not through DataONE, and related datasets that come up for search terms implying they contain those data when in fact they do not. In addition to the compiled query results CSV, I have also created a summary statistics CSV that outlines the details of each search. For example, the search term “carbon flux” returned 21 results. Of those datasets, 3 did not contain carbon flux related data and 2 had no data files and could not be checked. That leaves 16 relevant datasets out of the 19 that could be checked, a precision score of 84%. Precision scores will give us a quantitative measure of how well dataset search queries perform.
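The precision figure for “carbon flux” can be reproduced with a small calculation. Below is a minimal sketch in Python. It assumes precision is computed only over the results that could actually be checked, so metadata-only records are excluded from the denominator. The class and field names are hypothetical and are not the columns of the actual summary statistics CSV.

```python
from dataclasses import dataclass


@dataclass
class QuerySummary:
    """One row of a (hypothetical) per-search-term summary table."""
    term: str
    total_results: int   # everything the search returned
    not_relevant: int    # checked, but lacking the data the term implies
    no_data_files: int   # metadata-only records that could not be checked

    @property
    def checkable(self) -> int:
        # Only results with data files can be judged relevant or not
        return self.total_results - self.no_data_files

    @property
    def relevant(self) -> int:
        return self.checkable - self.not_relevant

    @property
    def precision(self) -> float:
        # Precision = relevant / checkable; uncheckable results are excluded
        if self.checkable == 0:
            raise ValueError("no checkable results for this term")
        return self.relevant / self.checkable


carbon_flux = QuerySummary("carbon flux", total_results=21, not_relevant=3, no_data_files=2)
print(f"{carbon_flux.term}: {carbon_flux.relevant}/{carbon_flux.checkable} = {carbon_flux.precision:.0%}")
# prints "carbon flux: 16/19 = 84%"
```

Excluding the uncheckable records is a design choice. If they were instead counted as not relevant, the same search would score 16/21, or about 76%.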
Another goal this week was to compare the terms in ECSO (The Ecosystem Ontology) with those in Chapin et al., noting whether each term is already present or needs to be created. Unfortunately, we are having issues importing the ECSO ontology into Protégé 5. After several hours of troubleshooting the import, I found one potential issue with the existing ontology: its lightweight imports of terms from other existing ontologies, such as ENVO and CHEBI from the obolibrary. These vocabularies do not appear to be properly linked. The paper on MIREOT by Courtot et al. (2009) details the importance of these “shallow” imports and some best practices that can facilitate their functionality. Stay tuned for more on this issue next week!
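Checking whether each paper term already exists in the ontology is essentially a set-membership test on normalized labels. The Python sketch below assumes the ECSO class labels have already been extracted into a plain list, which is a step not covered here. Both term lists are made-up placeholders, not actual ECSO content or the full Chapin et al. term list.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse internal whitespace so near-identical labels match."""
    return " ".join(term.lower().split())


def compare_terms(paper_terms, ontology_labels):
    """Split paper terms into those already in the ontology and those that need creating."""
    known = {normalize(label) for label in ontology_labels}
    present, missing = [], []
    for term in paper_terms:
        (present if normalize(term) in known else missing).append(term)
    return present, missing


# Hypothetical placeholder lists for illustration only
paper_terms = ["carbon flux", "Net Ecosystem Production", "permafrost thaw"]
ecso_labels = ["Carbon Flux", "net ecosystem  production"]

present, missing = compare_terms(paper_terms, ecso_labels)
print("present:", present)
print("missing:", missing)
```

Exact label matching like this will miss synonyms and differently worded classes, so any term it reports as missing would still need a manual check in Protégé before a new class is created.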