Prospective and Retrospective Provenance Queries: Week 2 – SPARQL Provenance Query

Hi all,

This is Linh Hoang from Project 3. This week I have been exploring SPARQL querying capabilities in YesWorkflow (YW). The objective is to test how well SPARQL can query YW outputs, which are represented in RDF. First, I installed Virtuoso on my machine and wrote a manual covering the installation and how to start querying in Virtuoso. I think this documentation is important: if we decide to use Virtuoso as our querying tool, other people on the team should be able to repeat the same process later. Next, since our advisor (Prof. Tim McPhillips) provided a sample RDF file, I started writing and running SPARQL queries against it in Virtuoso. The sample queries range from simple to more complicated. We want to find out how far SPARQL can go in querying our RDF outputs, and to compare it with Prolog and Datalog (the two query mechanisms previously used in YesWorkflow) to see whether it is more convenient or less so. Based on the query results, we have also started thinking about how the current RDF format could be improved to address the query limitations.
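
As a rough illustration of that simple-to-complicated range, here is a minimal Python sketch. It is not the actual sample file, vocabulary, or queries used in Virtuoso: the `ex:` names, the `TRIPLES` data, and the helper functions `match`, `select`, and `upstream` are all made up for this example. It shows a simple query (one conjunction of triple patterns, which is what a basic SPARQL `SELECT` does) and a harder one (following dependencies transitively, the kind of recursive question that Prolog and Datalog express naturally).

```python
# Hypothetical triples standing in for a YW RDF export.
# The "ex:" names are illustrative only, not the real YesWorkflow vocabulary.
TRIPLES = {
    ("ex:clean_data", "rdf:type", "ex:Program"),
    ("ex:plot_data", "rdf:type", "ex:Program"),
    ("ex:clean_data", "ex:hasInput", "ex:raw_file"),
    ("ex:clean_data", "ex:hasOutput", "ex:clean_file"),
    ("ex:plot_data", "ex:hasInput", "ex:clean_file"),
    ("ex:plot_data", "ex:hasOutput", "ex:figure"),
}

# SPARQL form of the simple query evaluated by SIMPLE_PATTERNS below
# (shown for reference only; this script does not send it anywhere).
SIMPLE_SPARQL = """
PREFIX ex: <http://example.org/>
SELECT ?program ?output
WHERE { ?program a ex:Program ; ex:hasOutput ?output . }
"""

SIMPLE_PATTERNS = [
    ("?program", "rdf:type", "ex:Program"),
    ("?program", "ex:hasOutput", "?output"),
]


def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple; return extended bindings or None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def select(patterns, triples):
    """Evaluate a conjunction of triple patterns and return all variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return solutions


def upstream(data, triples):
    """Return every data item that `data` depends on, directly or transitively."""
    found, frontier = set(), [data]
    while frontier:
        current = frontier.pop()
        # One hop: the program that produced `current`, and that program's inputs.
        hop = [("?p", "ex:hasOutput", current), ("?p", "ex:hasInput", "?in")]
        for solution in select(hop, triples):
            if solution["?in"] not in found:
                found.add(solution["?in"])
                frontier.append(solution["?in"])
    return found


if __name__ == "__main__":
    for row in select(SIMPLE_PATTERNS, TRIPLES):
        print(row["?program"], "->", row["?output"])
    print(sorted(upstream("ex:figure", TRIPLES)))
```

In SPARQL 1.1 the transitive question could be written with a property path such as `ex:figure (^ex:hasOutput/ex:hasInput)+ ?in`. How naturally queries like this can be expressed depends on how the RDF is modeled, which is one reason the RDF format itself is worth revisiting.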

Besides that, I also spent time getting to know the ProvONE data model, a model for scientific workflow provenance, and tried to map the components of the ProvONE model to those of the YesWorkflow model. This mapping should help us revise our current RDF later to improve its querying capabilities.

This is just a high-level summary. Please let me know if you need more details about the work. Thank you for reading!


