Week 6 report – DataONE Notebooks

Hi All,

Last week, we started with a Postgres-based system for evaluating RPQs (regular path queries). This week, we are trying to apply it to an example. Yaxing, one of our group members, provided a couple of conceptual workflows to query. They come from the actual “scientific” world, and we picked example 3, model driver processing, as a starting point. To encode the information in a uniform way, we chose JSON (since XML is a little “fat” in size). We then use a Python module to read the JSON file and convert it into a CSV file that we can feed into our system.
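To give an idea of the JSON-to-CSV step, here is a minimal sketch. The JSON layout (a top-level "edges" list whose entries have "source", "label" and "target" fields), the CSV column names, and the function name json_to_edge_csv are all assumptions made for illustration; our actual encoding of the workflow may look different.

```python
import csv
import io
import json


def json_to_edge_csv(json_text):
    """Turn a JSON workflow description into CSV text, one edge per row.

    Assumed (hypothetical) JSON layout:
    {"edges": [{"source": ..., "label": ..., "target": ...}, ...]}
    """
    doc = json.loads(json_text)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Header row; the column names are illustrative only.
    writer.writerow(["source", "label", "target"])
    for edge in doc["edges"]:
        writer.writerow([edge["source"], edge["label"], edge["target"]])
    return buf.getvalue()


# Toy input, not taken from the real workflow.
example = '{"edges": [{"source": "a", "label": "used", "target": "b"}]}'
print(json_to_edge_csv(example))
```

A flat table of (source, label, target) rows like this is convenient to bulk-load into a relational table, which is why a CSV intermediate fits a Postgres-based setup.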

Yaxing also provided a couple of queries he wants to ask. I think the most interesting one is “if there is a new version of GPCP data, what process steps need to be rerun and who shall be notified to do the rerun?”. The RPQ corresponding to the first one is “Query.reference.(used.wasGeneratedBy)*” (after adding a self-loop to GPCP). We are able to show the result highlighted in the original graph as an output of the system.
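To illustrate what the starred part of such a query computes, here is a small in-memory sketch of evaluating a pattern of the form (used.wasGeneratedBy)* from a start node. It is not our Postgres implementation; the function name reachable_by_star, the toy node names, and the edge directions are assumptions chosen only for the example.

```python
from collections import defaultdict


def reachable_by_star(edges, start, labels):
    """Nodes reachable from `start` by zero or more repetitions of the
    label sequence `labels`, e.g. ("used", "wasGeneratedBy").

    `edges` is an iterable of (source, label, target) triples.
    """
    out = defaultdict(set)
    for src, label, dst in edges:
        out[(src, label)].add(dst)

    seen = {start}        # zero repetitions: the start node itself matches
    frontier = [start]
    while frontier:
        node = frontier.pop()
        # Follow one full repetition of the label sequence from `node`.
        current = {node}
        for label in labels:
            nxt = set()
            for n in current:
                nxt |= out.get((n, label), set())
            current = nxt
        # Newly reached nodes become starting points for the next repetition.
        for n in current:
            if n not in seen:
                seen.add(n)
                frontier.append(n)
    return seen


# Toy graph with hypothetical names (d* = data, p* = process step).
toy_edges = [
    ("d0", "used", "p1"),
    ("p1", "wasGeneratedBy", "d1"),
    ("d1", "used", "p2"),
    ("p2", "wasGeneratedBy", "d2"),
    ("d9", "used", "p9"),  # unrelated branch, should not be reached
]
print(sorted(reachable_by_star(toy_edges, "d0", ("used", "wasGeneratedBy"))))
```

Because the star allows zero repetitions, the start node is always part of the answer. The process steps in between (p1 and p2 here) are visited along the way but are not returned by this particular pattern.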

We also made a couple of improvements to the output graph. With the help of different kinds of labels, we can distinguish different types of nodes by color and get a nicer layout.

Next week, we’ll focus on implementing RPQs in Datalog, and possibly CRPQs (conjunctive RPQs). The queries Yaxing provided seem quite trivial, so we’ll come up with some more challenging queries.

Best,
Tianhong
