Week 4 – Around the world…

This week’s report actually covers three partial weeks, since I attended a conference in Germany and, due to a visa issue, stayed in China for two weeks afterwards. So I really did travel around the world. The conference was ProvenanceWeek 2014, which took place in Cologne, Germany. The two top workshops in the provenance domain, IPAW and TAPP, were co-located there and took place during ProvenanceWeek. I presented a poster titled “Improving workflow design using abstract provenance graphs”. At the same time, I learned a lot from other researchers and even found some tools and papers that are related to, and can facilitate, the summer project, namely capturing provenance in IPython notebooks and PROV to IPython Notebooks. The first tool captures function provenance in IPython notebooks and visualizes it; the second generates an IPython notebook from the provenance information recorded by a workflow system.


Many members of the ProvWG attended the conference, and I met with my mentors there. After the conference, I spent some time with my family in China while still working on the project, though the “Digital Great Wall” hit my productivity a little bit.

Anyway, this week I studied and was able to run the Prolog rules for querying the provenance produced by the noWorkflow system. The information captured for querying only covers function activations and file accesses, but using CurationWF_v1 as an example, we can already ask questions like whether a certain actor is called after another actor, or whether a certain file has been changed after an activation. I also identified some other questions I find interesting and implemented some of them. We’ll consider whether we want to capture more provenance information if some questions we want to ask cannot be answered.
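To make these questions concrete, here is a minimal Prolog sketch of the kind of rules involved. The facts below are only a simplified stand-in for what noWorkflow exports (its real activation and access facts carry a different, richer argument list), and the actor and file names are made up for illustration.

```prolog
% Simplified stand-in facts, in the spirit of noWorkflow's exported provenance.
% The real exported schema differs; the names below are hypothetical examples.
% activation(Id, Name, Start, Finish, CallerId)
activation(1, read_data,     10, 20, 0).
activation(2, clean_records, 30, 45, 0).
% access(Id, FileName, Mode, HashBefore, HashAfter, Timestamp, ActivationId)
access(1, 'input.csv',  r, h0, h0, 12, 1).
access(2, 'output.csv', w, h1, h2, 40, 2).

% Actor B was called after actor A finished.
called_after(A, B) :-
    activation(_, A, _, FinishA, _),
    activation(_, B, StartB, _, _),
    StartB > FinishA.

% File F was changed (its content hash differs) after actor A finished.
changed_after(F, A) :-
    activation(_, A, _, FinishA, _),
    access(_, F, _, Before, After, T, _),
    Before \== After,
    T > FinishA.
```

Against the toy facts above, queries like `called_after(read_data, clean_records)` and `changed_after('output.csv', read_data)` both succeed, which is exactly the style of question we want to ask about CurationWF_v1.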

On the other front, translating Matlab to Python, I took part of the simplified script and translated it into a Python script. The Python script I have so far covers only part of the original, i.e., reading a file and processing it in a simple way, but it is good enough for studying the structure of the provenance information, and I’ll continue making it more complete. At the same time, the input dataset is large enough to cause significant delays during testing, so I’m running some tests to see what dataset size is suitable for testing.
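As a rough illustration of the shape of that partial script, here is a minimal sketch. The file names, fields, and the processing step are hypothetical stand-ins rather than the actual CurationWF_v1 logic, and the `limit` parameter is just one possible way to cut the input down to a size suitable for testing.

```python
# Sketch only: a stand-in for the kind of read-and-process step being
# translated from Matlab. Paths, fields, and the filtering step are
# hypothetical; the real CurationWF_v1 script does more than this.
import csv

def read_records(path, limit=None):
    """Read at most `limit` rows so tests can run on a smaller slice of the data."""
    rows = []
    with open(path, newline='') as f:
        for i, row in enumerate(csv.DictReader(f)):
            if limit is not None and i >= limit:
                break
            rows.append(row)
    return rows

def process(rows):
    """Trivially simple processing step: drop rows with an empty 'value' field."""
    return [r for r in rows if r.get('value')]

if __name__ == '__main__':
    # Running this under noWorkflow (e.g. `now run <script>.py`) records the
    # function activations and file accesses that the Prolog rules above query.
    records = read_records('input.csv', limit=1000)
    print(len(process(records)))
```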
