Week 6 – The new tool – DataONE Notebooks

This week I spent some time on the new tool I mentioned last week. However, we ran into difficulties and, sadly, may need to change our target, since we cannot run the tool.

The tool is called Prov-O-Matic. It is intended to capture the provenance of iPython notebooks, and we planned to use it to capture provenance information for our own iPython notebooks. There are a couple of reasons why we could not run the tool as expected:

1. The tool is not well documented; in particular, no solutions are given for the errors you may run into. There is also not much documentation on how to work with iPython extensions.

2. As the author states in the readme file, the tool is still quite experimental, so I suspect some problems haven’t been solved yet. From reading the source code, I think the implementation can be improved: certain portions of the code are commented out but still remain in the “release”, there are lots of “print” statements, and, most importantly, some parts of the code have no comments, which makes them very difficult to understand.

3. I tried to contact the author but haven’t gotten a response yet.

The fatal problem that put an end to our attempt is an error I cannot find a solution for. “load_ipython_extension(InteractiveShell)” is the function iPython calls after importing an extension, and an “InteractiveShell” instance is passed as its only argument. However, one line in the tool’s source code calls the “event” method of the “InteractiveShell” class, which doesn’t exist. So I suspect that either the code is implemented incorrectly, or the version or some configuration of iPython doesn’t match what the code expects.

Though we probably won’t be able to run the tool in the near future, we can at least get some insight into what the tool does to capture provenance and how it’s implemented. The following is what I found after looking at the source code:

The provenance information is captured as a graph in which each node is a function or a variable, and the information is organized in a function-oriented way. For each node, the name, description, input, output and dependency (which node it depends on) are captured, and that information is then converted to the PROV model. Another tool called “Prov-O-Viz” can be integrated with this tool to visualize the resulting PROV-formatted provenance.

The tool uses certain APIs and functions to interact with the iPython notebook, and I think we can use those to capture some of the provenance information of our own notebooks. Or we can go a step further and implement our own iPython extension to capture provenance information of iPython notebooks. Though this would overlap somewhat with the existing tool, we could add new functionality or simplify it to meet our requirements. Either way, I think we need to discuss this.
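To make the graph structure described above more concrete, here is a rough sketch of how such nodes could be represented and turned into PROV-style statements. This is not Prov-O-Matic’s actual code: the names “ProvNode” and “to_prov_statements” are made up, and treating functions as PROV activities and variables as PROV entities is my own assumption about the mapping. The output is a simplified list of triples, not a full PROV serialization.

```python
from dataclasses import dataclass, field


@dataclass
class ProvNode:
    """One node of the provenance graph (hypothetical structure)."""
    name: str
    kind: str  # "function" or "variable"
    description: str = ""
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    depends_on: list = field(default_factory=list)


def to_prov_statements(nodes):
    """Convert nodes to simplified (subject, predicate, object) triples.

    Assumption: functions map to prov:Activity, variables to prov:Entity.
    """
    statements = []
    for node in nodes:
        if node.kind == "function":
            statements.append((node.name, "rdf:type", "prov:Activity"))
            for var in node.inputs:
                statements.append((node.name, "prov:used", var))
            for var in node.outputs:
                statements.append((var, "prov:wasGeneratedBy", node.name))
            for dep in node.depends_on:
                statements.append((node.name, "prov:wasInformedBy", dep))
        else:
            statements.append((node.name, "rdf:type", "prov:Entity"))
            for dep in node.depends_on:
                statements.append((node.name, "prov:wasDerivedFrom", dep))
    return statements


# Example: a function "clean" reads the variable "raw" and produces "tidy".
example_nodes = [
    ProvNode("raw", "variable", "data as loaded"),
    ProvNode("clean", "function", "drop missing rows",
             inputs=["raw"], outputs=["tidy"]),
    ProvNode("tidy", "variable", "cleaned data", depends_on=["raw"]),
]
for statement in to_prov_statements(example_nodes):
    print(statement)
```

If we write our own extension, a structure like this could be filled in from the notebook cells and later handed to a visualization tool.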
