Week 3 – Provenance exploration

Hi all,

In week 3, the goal was to learn more about Whole Tale and experiment with it, to get used to provenance tools for R scripts (e.g., recordr), and to find some use cases for capturing provenance over data packages with those tools.

As an extension of last week's work, I used an example tale to think about how to integrate Whole Tale with DataONE. For example, one could use and update a tale to reproduce the original and/or updated work behind a scientific synthesis. To reproduce an updated version, we also need to store the updated datasets. Thus, I have been writing a script (still in progress) that can be part of a tale and automatically handles this process (e.g., from creating a package to publishing it to DataONE from within a tale).

Moreover, I received help finding some use cases that can be used to capture provenance. With those use cases, I have mainly looked into two tools, “Recordr” and “provRmd”, which capture provenance for R scripts and R Markdown (Rmd) files, respectively. I worked through a simple use case with Recordr: setting up the tool, capturing provenance, and seeing what provenance information it delivers (these tools support so-called retrospective provenance, which describes what steps were executed to generate the output artifacts). Capturing provenance through provRmd for Rmd scripts is still in progress, as the information about the tool is limited. However, this is important because Rmd scripts provide more complex use cases (e.g., multiple computational scripts generating multiple artifacts).
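
To make the idea of retrospective provenance a bit more concrete, here is a minimal sketch in Python. It is not Recordr or provRmd and does not use their APIs; the names record_run, summarize, and demo are hypothetical and only illustrate the kind of record such a tool might produce: which step ran, which files it used, and which artifacts it generated, each with a checksum.

```python
import hashlib
import json
import tempfile
import time
import uuid
from pathlib import Path


def sha256(path):
    """Return the SHA-256 checksum of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_run(step_name, func, inputs, outputs):
    """Run func(inputs, outputs) and return a retrospective provenance record.

    Hypothetical helper: the record describes what was actually executed,
    which files were read, and which files were generated.
    """
    started = time.time()
    func(inputs, outputs)
    return {
        "run_id": str(uuid.uuid4()),
        "step": step_name,
        "started": started,
        "ended": time.time(),
        "used": [{"file": Path(p).name, "sha256": sha256(p)} for p in inputs],
        "generated": [{"file": Path(p).name, "sha256": sha256(p)} for p in outputs],
    }


def summarize(inputs, outputs):
    """Toy analysis step: write the mean of the numbers in the input file."""
    values = [float(v) for v in Path(inputs[0]).read_text().split()]
    Path(outputs[0]).write_text(f"{sum(values) / len(values)}\n")


def demo():
    """Run the toy step on a temporary dataset and return (record, result)."""
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw.txt"
        out = Path(tmp) / "mean.txt"
        raw.write_text("1 2 3 4\n")
        record = record_run("summarize", summarize, [raw], [out])
        result = out.read_text().strip()
    return record, result


if __name__ == "__main__":
    record, result = demo()
    print(json.dumps(record, indent=2))
```

In this sketch the inputs and outputs are declared by hand; as far as I understand, the real tools observe file reads and writes while the script runs, so the sketch shows only the shape of the resulting information, not how it is collected.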

For next week, I will work on more complex use cases, either by finding other use cases that can be used with “Recordr” or by figuring out how to use “provRmd”. The main goal is to see what insight we can get from the provenance information (why this provenance is useful and important), and how much provenance can be captured over these complex datasets. Besides, I may be able to look into other types of provenance (e.g., prospective provenance, which describes what is required to generate an artifact) through some use cases.

I think that is all for this week's updates.
Hope you all have a nice weekend.

