Professor Bertram and I had several meetings this week. During the first meeting, we clarified several terms I had read about last week. The main question we discussed was: what is provenance? From OPM to W3C PROV to PROV ONE, the term has taken on different meanings. OPM and W3C PROV focus mainly on retrospective provenance, a record of what actually happened during an execution. PROV ONE extends this to also capture prospective provenance, the specification of the workflow itself, i.e. the "recipe" (Lim et al., 2010). For the rest of the week, I went deeper into using a provenance tool called Galaxy and started researching PROV ONE.
Galaxy is a web-based platform for computational biomedical research. Its goal is to let users without a computational background analyze data, especially biological data. In addition, Galaxy records the operations scientists perform on their data in a history, and that history can be turned into a workflow. Later, users only need to select suitable datasets and re-run the workflow on them to repeat or adapt the analysis.
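To make the history-to-workflow idea concrete, here is a minimal Python sketch of it. It does not use Galaxy's real API. The tool names (`filter_min`, `scale`), the `History` class, and `run_workflow` are all hypothetical stand-ins. The sketch records each tool invocation together with its parameters. It then turns the recorded steps into a reusable workflow and re-runs that workflow on new data, optionally overriding a parameter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical "tools": each takes a list of numbers plus keyword parameters.
TOOLS: Dict[str, Callable[..., List[float]]] = {
    "filter_min": lambda data, threshold: [x for x in data if x >= threshold],
    "scale": lambda data, factor: [x * factor for x in data],
}

Step = Tuple[str, dict]  # (tool name, parameters)


@dataclass
class History:
    """Records every tool invocation, loosely mirroring a Galaxy history."""
    steps: List[Step] = field(default_factory=list)

    def run(self, tool: str, data: List[float], **params) -> List[float]:
        self.steps.append((tool, params))  # capture the operation and its parameters
        return TOOLS[tool](data, **params)

    def extract_workflow(self) -> List[Step]:
        # The workflow is simply the ordered list of recorded steps.
        return list(self.steps)


def run_workflow(workflow: List[Step], data: List[float],
                 overrides: Optional[Dict[int, dict]] = None) -> List[float]:
    """Re-run recorded steps on new data; overrides maps step index -> new parameters."""
    overrides = overrides or {}
    for i, (tool, params) in enumerate(workflow):
        merged = {**params, **overrides.get(i, {})}  # recorded params, then overrides
        data = TOOLS[tool](data, **merged)
    return data


if __name__ == "__main__":
    history = History()
    out = history.run("filter_min", [1.0, 5.0, 10.0], threshold=4.0)
    out = history.run("scale", out, factor=2.0)
    print(out)  # [10.0, 20.0]
    workflow = history.extract_workflow()
    print(run_workflow(workflow, [3.0, 4.0, 8.0]))  # [8.0, 16.0]
    print(run_workflow(workflow, [3.0, 4.0, 8.0], overrides={1: {"factor": 10.0}}))  # [40.0, 80.0]
```

The last call shows how the same recorded analysis can be re-run on a different input with one parameter changed. The recorded history itself stays untouched.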
After working through the step-by-step hands-on exercises and the official tutorials, I found it is not hard to master skills such as how to upload and get data, how to process data with multiple tools, how to create a workflow from a history, and how to execute the “recipe” on new datasets. Several highlights of Galaxy are worth pointing out for further discussion.
- Abundant Data Sources
Major biomedical databases are accessible from within the platform, so scientists and other users can conveniently obtain the datasets they need for further research.
- Extensible Tools
Compared with programming, point-and-click analysis platforms usually offer only a limited set of tools. Galaxy is somewhat different. Instead of treating tools as fixed steps of data analysis, it treats them more like software: each tool has its own version, and anyone can contribute new tools or improve existing ones for better performance. This adds flexibility to the platform. As of May 30, 2019, Galaxy had 6891 valid tools.
- Sharing
After creating workflows from their histories, users can share or publish them publicly, or grant access rights to specific people by email.
- Reproducibility
Galaxy also supports reproducibility well. By running a workflow on datasets and waiting a few minutes, users can get exactly the results they expect. Furthermore, parameters can be changed when creating a workflow, so users can revise them when re-running the same analysis on different input datasets.
Currently, detailed execution traces are collected by various provenance tools, that is, workflow management systems (WfMSs) such as Taverna, Kepler, VisTrails, Galaxy, eScience Central, and Pegasus. Earlier research produced several standard models for capturing and publishing the provenance of artifacts, and the two main ones are OPM (Open Provenance Model) and W3C PROV. However, these two models have limitations: they are general-purpose models for describing resources, for example on the Web, and lack constructs for describing scientific workflows and their data products. As a result, different WfMSs adopt different models to specify workflows and record their executions. This makes it even harder for scientists to compare execution traces and reproduce scientific results. PROV ONE was developed in the context of the DataONE project to address these problems.
PROV ONE is defined as an extension of the W3C-recommended standard PROV. It aims to capture the most relevant information about the computational processes of scientific workflows, and it provides extension points to accommodate the specifics of particular scientific workflow systems (Cao et al.).
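The difference between retrospective and prospective provenance can be sketched as plain subject-predicate-object triples. The `prov:` terms (`prov:used`, `prov:wasGeneratedBy`, `prov:qualifiedAssociation`, `prov:hadPlan`) come from W3C PROV-O. The `provone:` terms are meant to follow the PROV ONE vocabulary but should be checked against its specification. Every `ex:` identifier and both helper functions are hypothetical. The prospective part describes a program and its ports; the retrospective part records one run of it. The `prov:hadPlan` link ties the run back to the program it executed.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Prospective provenance: the "recipe" (a program and its ports), PROV ONE-style terms.
PROSPECTIVE: List[Triple] = [
    ("ex:align", "rdf:type", "provone:Program"),
    ("ex:align", "provone:hasInPort", "ex:align_in"),
    ("ex:align", "provone:hasOutPort", "ex:align_out"),
    ("ex:align_in", "rdf:type", "provone:Port"),
    ("ex:align_out", "rdf:type", "provone:Port"),
]

# Retrospective provenance: what happened in one run, using W3C PROV relations.
RETROSPECTIVE: List[Triple] = [
    ("ex:run1", "rdf:type", "provone:Execution"),
    ("ex:run1", "prov:used", "ex:reads.fastq"),
    ("ex:result.bam", "prov:wasGeneratedBy", "ex:run1"),
    # Qualified association linking the run to the plan (program) it executed.
    ("ex:run1", "prov:qualifiedAssociation", "ex:assoc1"),
    ("ex:assoc1", "prov:hadPlan", "ex:align"),
]


def objects(triples: List[Triple], subject: str, predicate: str) -> Set[str]:
    """Return every object o such that (subject, predicate, o) is in triples."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def explain(triples: List[Triple], entity: str) -> Dict[str, Set[str]]:
    """Trace an output back to its run, the run's inputs, and the program behind it."""
    runs = objects(triples, entity, "prov:wasGeneratedBy")
    inputs = {i for r in runs for i in objects(triples, r, "prov:used")}
    assocs = {a for r in runs for a in objects(triples, r, "prov:qualifiedAssociation")}
    plans = {pl for a in assocs for pl in objects(triples, a, "prov:hadPlan")}
    in_ports = {port for pl in plans for port in objects(triples, pl, "provone:hasInPort")}
    return {"runs": runs, "inputs": inputs, "plans": plans, "in_ports": in_ports}


if __name__ == "__main__":
    graph = PROSPECTIVE + RETROSPECTIVE
    print(explain(graph, "ex:result.bam"))
```

The retrospective triples alone can answer "which run produced this file, and from what input". The prospective triples add "what the run was supposed to do", such as which ports the program declares. Answering both in one query is the kind of integration PROV ONE aims to provide.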
I am still working on learning more details about this model…
Hope you all have a good weekend.
Lim, C., Lu, S., Chebotko, A., & Fotouhi, F. (2010, July). Prospective and retrospective provenance collection in scientific workflow environments. In 2010 IEEE International Conference on Services Computing (pp. 449-456). IEEE.
Cao, Y., Jones, C., Cuevas-Vicenttín, V., Jones, M. B., Ludäscher, B., McPhillips, T., … & Walker, L. ProvONE: extending PROV to support the DataONE scientific community.