After looking through the final list of queries that have to be implemented in the PBase project, I realized that some of them depend on properties of the nodes (actors and data entities), e.g. type, version, …, and that some of them refer to multiple runs of a workflow. After a discussion with my mentors and some other members of ProvWG, we decided that the immediate next steps are: (1) the Java code that converts provenance traces from PROV-XML to Geoff/Cypher creation commands has to be modified to export all node properties into the Neo4j repository; (2) since some of the properties may be null in the traces, more traces have to be imported into the repository to make sure we have non-null values for each property; (3) in some cases, generating some properties and values at random can be helpful for initial testing of the queries; (4) in order to distinguish traces of different workflows, and different runs of the same workflow, from each other, wfID and runID have to be added as new properties to all nodes in the graph.
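To make steps (1), (3) and (4) more concrete, here is a minimal sketch of how a node's properties might be turned into a Cypher creation command. It is not the actual PBase converter: the class NodeExporter, its method names, the "rand-" placeholder scheme and the sample values are all hypothetical, and the properties are assumed to have already been read from a PROV-XML trace into a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical helper, for illustration only; not the PBase converter itself. */
class NodeExporter {

    /** Wraps a value in single quotes, escaping backslashes and single quotes. */
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /**
     * Builds a Cypher CREATE command that exports every property of a node
     * and tags the node with the workflow and run it belongs to.
     */
    static String toCreateStatement(Map<String, String> props,
                                    String wfID, String runID, Random rng) {
        Map<String, String> all = new LinkedHashMap<>(props);
        all.put("wfID", wfID);   // step (4): identifies the workflow
        all.put("runID", runID); // step (4): identifies the run

        StringBuilder sb = new StringBuilder("CREATE (n {");
        boolean first = true;
        for (Map.Entry<String, String> e : all.entrySet()) {
            String value = e.getValue();
            if (value == null) {
                // Step (3), assumption: a random placeholder is good enough
                // for early query tests and is replaced by real data later.
                value = "rand-" + rng.nextInt(1000);
            }
            if (!first) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append(": ").append(quote(value));
            first = false;
        }
        return sb.append("})").toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("type", "actor");
        props.put("version", null); // missing in the trace
        System.out.println(toCreateStatement(props, "wf1", "run1", new Random()));
    }
}
```

Passing the Random instance in as a parameter lets a test supply a seeded generator, so the generated placeholder values are reproducible.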
I looked through the Java code for running Cypher queries in the Neo4j tutorial, and after some small modifications the code can now be used to execute arbitrary queries on a local database. I also checked out some of the other available sources of provenance traces: ProvGen (for extracting traces) and the VisTrails workflow tool. To stay consistent and avoid getting lost in converting traces of different formats, we decided to use only VisTrails traces for now. I am still working on importing more node properties into the Neo4j database by modifying the Java code that converts traces from PROV-XML to Geoff/Cypher format.
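Once wfID and runID are stored on every node, the queries passed to that code could be restricted to a single run, or grouped by run. The sketch below only assembles the query strings; the class RunQueries and its methods are hypothetical, and the MATCH syntax assumes a Neo4j version that supports it (2.0 or later).

```java
/** Hypothetical query builder, for illustration only. */
class RunQueries {

    /** Wraps a value in single quotes, escaping backslashes and single quotes. */
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Cypher query returning all nodes that belong to one run of one workflow. */
    static String nodesOfRun(String wfID, String runID) {
        return "MATCH (n) WHERE n.wfID = " + quote(wfID)
                + " AND n.runID = " + quote(runID) + " RETURN n";
    }

    /** Cypher query counting the nodes of each run of one workflow. */
    static String nodeCountPerRun(String wfID) {
        return "MATCH (n) WHERE n.wfID = " + quote(wfID)
                + " RETURN n.runID, count(n)";
    }

    public static void main(String[] args) {
        System.out.println(nodesOfRun("wf1", "run1"));
        System.out.println(nodeCountPerRun("wf1"));
    }
}
```

Inlining quoted literals keeps the sketch independent of any particular driver; with a real Neo4j API, query parameters would usually be preferable to string concatenation.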