Prospective and Retrospective Provenance Queries: Getting to Know YesWorkflow

The first week of my internship at DataONE started with a meeting with my mentors, Prof. Bertram Ludaescher, Timothy McPhillips, and Paolo Missier, and my co-intern Hui Lyu. We discussed the objectives of the internship and came up with a list of tasks for the following weeks. Overall, the objective is to reimplement the existing query capabilities of YesWorkflow using more mainstream approaches, namely RDF and SPARQL. To this end, there are two major tasks: first, exploring RDF and figuring out how to use it to represent YesWorkflow (YW) outputs; and second, examining whether SPARQL can be used to query provenance data and, if so, how.

The objective for the first week was to get to know YesWorkflow and to become familiar with the product implementation process. To achieve this, I experimented with the YesWorkflow demos, following the guidelines and materials available on the YesWorkflow GitHub. I reproduced two demos: the YesWorkflow prototype and IDCC 2017. After reproducing the demos, my co-intern and I came up with a flowchart that conceptualizes the features YesWorkflow provides and how to implement them. I also spent time reading relevant papers (about YesWorkflow, the PROV Data Model, and ProvONE) to get an overall picture of what has been done in the project so far. Finally, I learned how to use Virtuoso, a state-of-the-art SPARQL query engine, which will be useful when we move on to using SPARQL for provenance querying.

It has been a great start, and it’s a pleasure for me to join the team. I am looking forward to the next 8 weeks of the internship!

Best,

Linh
