Week 4 – Use case exploration

Hi all,

In week 4, I worked with an actual scientific dataset and wrote some scripts that cover the whole process: getting a use case (downloading it from DataONE), reproducing and updating the objects in it, and publishing the updated package to DataONE.

The use case is an analysis of hydrocarbon samples that were collected in the Gulf of Alaska (GOA) after the Exxon Valdez oil spill (EVOS). The dataset contains source data, program scripts, and output images, and provides metadata for each object. It also includes provenance information, i.e., how the input sources are used to generate the outputs. With this use case, I 1) learned how to reproduce (intermediate) outputs / artifacts with the given scripts and information (this step requires changing the objects in the package), 2) added some new objects, e.g., a retrospective visualization created by hand from the given information (using YesWorkflow), and 3) published the updated package to DataONE as a new package. Step 3) requires some additional substeps, but I created a single R script that handles all of them at once.

To use YesWorkflow, as mentioned above, I read some references and learned how to use it to generate both retrospective and prospective provenance (as introduced above with the link to the output). Additionally, I tried to automate the entire process inside a tale in Whole Tale by importing the dataset directly from DataONE. An important issue I faced is that changes cannot be committed inside the tale (permission is not given), which means that all the steps above have to be done locally and the results then loaded into the tale (I have not confirmed this yet).
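
To make the prospective side a bit more concrete: YesWorkflow reads annotations such as @begin, @end, @in, and @out from the comments of a script. The following is only a minimal Python sketch of that idea. It is not YesWorkflow itself and not the R script from this use case. The annotated script, the step name plot_hydrocarbons, the port names, and the helper extract_blocks are all made up for illustration.

```python
import re

# Hypothetical annotated script (kept as text, never executed here).
# The step and port names are invented for this example.
ANNOTATED_SCRIPT = """
# @begin plot_hydrocarbons
# @in samples_csv
# @out concentration_plot
samples = load("samples.csv")
plot(samples)
# @end plot_hydrocarbons
"""

# Matches comment lines of the form "# @tag name" for the four tags used here.
TAG = re.compile(r"^\s*#\s*@(begin|end|in|out)\s+(\S+)")


def extract_blocks(script_text):
    """Collect the inputs and outputs declared inside each @begin/@end block."""
    blocks = {}
    open_blocks = []  # stack of block names that are currently open
    for line in script_text.splitlines():
        match = TAG.match(line)
        if match is None:
            continue  # ordinary code or an unrelated comment
        tag, name = match.groups()
        if tag == "begin":
            blocks[name] = {"in": [], "out": []}
            open_blocks.append(name)
        elif tag == "end":
            if not open_blocks or open_blocks[-1] != name:
                raise ValueError(f"@end {name} does not match an open @begin")
            open_blocks.pop()
        elif open_blocks:
            # @in / @out belong to the innermost open block
            blocks[open_blocks[-1]][tag].append(name)
    if open_blocks:
        raise ValueError(f"missing @end for: {open_blocks}")
    return blocks


if __name__ == "__main__":
    for step, ports in extract_blocks(ANNOTATED_SCRIPT).items():
        print(step, "reads", ports["in"], "and writes", ports["out"])
```

The real tool supports more annotations and draws the workflow graph from them; this sketch only shows that the declared inputs and outputs can be recovered from the comments alone, without running the script.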

Since this week was spent going through the whole process, I will now dive into steps 1) and 2) above. They require a closer look at, e.g., what information exists or is missing, and how we can capture provenance from the package itself, without using the information available in DataONE (to make it more independent), in the sense of a single execution. Furthermore, I also need to study the meaning of “reproducible” in depth again, because reproducing does not mean recreating those objects.

I think that is all for this week. I hope you all have a nice weekend.
