Week 3 – Capturing provenance

The goal of this week is to implement the use cases identified last week, run the NoWorkflow system on them, and analyze the results.

Implementing the first use case (the MsTMIP script in Matlab) is a little challenging because 1) making an exact mapping from Matlab to Python is difficult and time-consuming, and some constructs are not compatible at all (e.g. NaN in Matlab), and 2) I think the script has some small problems, or I didn’t understand the script correctly. Another concern is that the dataset this script deals with is relatively large and highly repetitive, so analyzing its provenance is also a little challenging.
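To illustrate the NaN issue, here is a minimal sketch. It assumes the Matlab script relies on NaN-skipping functions such as nanmean; that is a guess, not something taken from the MsTMIP script. The helper name nan_mean is made up for this example. Plain Python has no direct equivalent, so NaN entries have to be filtered by hand:

```python
import math


def nan_mean(values):
    """Mean of the non-NaN entries (hypothetical helper, not from the MsTMIP script).

    Returns NaN when no valid entries remain.
    """
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


data = [1.0, math.nan, 3.0]

# NaN never compares equal to itself, so equality tests cannot detect it.
print(math.nan == math.nan)   # False

# A plain sum propagates NaN, so the NaN entries must be dropped explicitly.
print(math.isnan(sum(data)))  # True
print(nan_mean(data))         # 2.0
```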

The second use case (CurationWF) is implemented (or being implemented) at three levels in order to understand the structure of the provenance information: v1, in which all the actors are dummies (except for “reader”) with no actual logic inside; v2, in which all the actors are implemented as they should be but without any remote service access; and v3, the full implementation. After running the NoWorkflow system on the scripts, the result can be exported as Prolog facts (e.g. function activations and file accesses). Alternatively, all the information is captured in a SQLite database and is available for querying.
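To give an idea of what querying could look like, here is a rough sketch using Python's sqlite3 module. The table and column names are hypothetical: they simply mirror the fields of the exported activation fact and are not the actual NoWorkflow database schema. The rows are a few of the CurationWF_v1 activations, loaded into an in-memory database. The query lists the functions called directly by the top-level script, with their durations:

```python
import sqlite3

# Hypothetical table that mirrors activation(id, name, start, finish, caller_activation_id).
# This is NOT the real NoWorkflow schema, only an illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activation ("
    "id INTEGER PRIMARY KEY, name TEXT, start REAL, finish REAL, caller_id INTEGER)"
)

# A few rows taken from the CurationWF_v1 run (nil becomes NULL/None).
rows = [
    (1, "/home/tianhong/data/iPython/curationWF_v1/CurationWF_v1.py",
     1401815318.552702, 1401815318.554696, None),
    (2, "CSVReader", 1401815318.552924, 1401815318.553718, 1),
    (3, "open", 1401815318.553038, 1401815318.553404, 2),
    (8, "GeoRefValidator", 1401815318.554007, 1401815318.554020, 1),
]
conn.executemany("INSERT INTO activation VALUES (?, ?, ?, ?, ?)", rows)

# Functions called directly by the top-level script (the activation with no caller).
query = """
SELECT child.name, child.finish - child.start AS duration
FROM activation AS child
JOIN activation AS parent ON child.caller_id = parent.id
WHERE parent.caller_id IS NULL
ORDER BY child.start
"""
top_level_calls = conn.execute(query).fetchall()

for name, duration in top_level_calls:
    print(f"{name}: about {duration * 1e6:.0f} microseconds")
```

The durations are only approximate, since timestamps of this size leave little floating-point precision at the microsecond level.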

Example provenance of CurationWF_v1:

% FACT: activation(id, name, start, finish, caller_activation_id).

activation(1, '/home/tianhong/data/iPython/curationWF_v1/CurationWF_v1.py', 1401815318.552702, 1401815318.554696, nil).
activation(2, 'CSVReader', 1401815318.552924, 1401815318.553718, 1).
activation(3, 'open', 1401815318.553038, 1401815318.553404, 2).
activation(4, 'reader', 1401815318.553550, 1401815318.553566, 2).
activation(5, 'list.append', 1401815318.553609, 1401815318.553618, 2).
activation(6, 'list.append', 1401815318.553668, 1401815318.553678, 2).
activation(7, 'list.append', 1401815318.553699, 1401815318.553707, 2).
activation(8, 'GeoRefValidator', 1401815318.554007, 1401815318.554020, 1).
activation(9, 'DateValidator', 1401815318.554285, 1401815318.554297, 1).
activation(10, 'CSVWriter', 1401815318.554563, 1401815318.554669, 1).

% FACT: access(id, name, mode, content_hash_before, content_hash_after, timestamp, activation_id).

access(1, '2011.csv', 'rb', 'e8c55d72da1ee331a818ad2ca41e763537ae0b91', 'e8c55d72da1ee331a818ad2ca41e763537ae0b91', 1401815318.553052, 3).
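The caller_activation_id field makes it possible to rebuild the call tree from the exported facts. Here is a minimal sketch that parses a few of the activation facts with a regular expression and prints the tree by indentation. The function names parse_activations and tree_lines are made up for this example, and the pattern only covers the activation format shown here, not a general Prolog parser:

```python
import re

# A subset of the exported activation facts from the CurationWF_v1 run.
FACTS = """
activation(1, '/home/tianhong/data/iPython/curationWF_v1/CurationWF_v1.py', 1401815318.552702, 1401815318.554696, nil).
activation(2, 'CSVReader', 1401815318.552924, 1401815318.553718, 1).
activation(3, 'open', 1401815318.553038, 1401815318.553404, 2).
activation(8, 'GeoRefValidator', 1401815318.554007, 1401815318.554020, 1).
"""

# Accepts straight or typographic single quotes around the name.
ACTIVATION = re.compile(
    r"activation\((\d+),\s*['‘’](.*?)['‘’],\s*([\d.]+),\s*([\d.]+),\s*(\d+|nil)\)\."
)


def parse_activations(text):
    """Return {id: (name, start, finish, caller_id)}; caller_id is None for nil."""
    result = {}
    for match in ACTIVATION.finditer(text):
        act_id, name, start, finish, caller = match.groups()
        caller_id = None if caller == "nil" else int(caller)
        result[int(act_id)] = (name, float(start), float(finish), caller_id)
    return result


def tree_lines(activations, parent=None, depth=0):
    """Return the call tree as indented lines, visiting activations in id order."""
    lines = []
    for act_id, (name, _start, _finish, caller_id) in sorted(activations.items()):
        if caller_id == parent:
            lines.append("  " * depth + name)
            lines.extend(tree_lines(activations, act_id, depth + 1))
    return lines


activations = parse_activations(FACTS)
print("\n".join(tree_lines(activations)))
```

For this subset, the script is the root, CSVReader and GeoRefValidator sit one level below it, and open is nested under CSVReader.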
