Scientific workflow systems are increasingly used to automate scientific computations as well as data analysis and visualization pipelines. An important feature of these systems is their ability to record, and subsequently query and visualize, provenance information. Provenance includes the processing history and lineage of data and can be used, for example, to validate or invalidate outputs, debug workflows, and document authorship and attribution chains, thereby facilitating “reproducible science”. We aim to develop (1) a provenance repository system for publishing and sharing data provenance collected from runs of several scientific workflow systems (Kepler, Taverna, VisTrails), together with (2) a provenance trace publication system that allows scientists to interactively and graphically select relevant fragments of a provenance trace for publishing. The selection may be driven by the need to protect private information and may therefore involve hiding, abstracting, or anonymizing irrelevant or sensitive parts of the trace.


The following people are contributing to this project:

  • Saumen Dey – Student Intern (University of California, Davis)
  • Michael Agun – Student Intern (Gonzaga University)
  • Bertram Ludaescher – Primary Mentor (University of California, Davis)
  • Paolo Missier – Mentor (Newcastle University)
  • Shawn Bowers (Gonzaga University)

The project started on June 1, 2011, followed by a kick-off meeting on June 8 (co-located with a Provenance Working Group meeting on June 7 at UC Davis). The project will continue through the first week of August 2011; follow-up work and publications are planned through the Provenance Working Group.

