{"id":1345,"date":"2013-06-28T22:40:00","date_gmt":"2013-06-28T22:40:00","guid":{"rendered":"https:\/\/notebooks.dataone.org\/?p=1345"},"modified":"2013-07-10T23:37:08","modified_gmt":"2013-07-10T23:37:08","slug":"week-2-provwg-meeting-and-plans-for-next-actions-on-pbase","status":"publish","type":"post","link":"https:\/\/notebooks.dataone.org\/pbase\/week-2-provwg-meeting-and-plans-for-next-actions-on-pbase\/","title":{"rendered":"Week 2: ProvWG Meeting and Plans for Next Actions on PBase"},"content":{"rendered":"

This week I attended the ProvWG meeting at NYU-Poly. This was a two-day meeting focused mainly on PBase and D-PROV [MDB+13], a recent ProvWG project whose ideas may be relevant to the PBase project.<\/p>\n

D-PROV is an extension to the\u00a0W3C PROV\u00a0<\/a>provenance model\u00a0aimed at representing process structure. The discussion centered on three questions: which limitations of the existing PROV model motivate a new version of it, what kinds of questions users want to ask of provenance traces, and how to map those questions onto the model. After discussing different classes of users (and their respective use cases), the group concluded that PROV is not sufficient to answer some of these questions and discussed how D-PROV should be modeled to address them.<\/p>\n

The second half of the meeting focused more specifically on the PBase project. Its main goal is to determine, given a repository of traces represented as directed acyclic graphs (DAGs), which queries over that repository are useful from a provenance standpoint and how to answer them efficiently and scalably. (A DAG can be written as a set of \u3008X, l, Y\u3009 triples, each stating that node X is connected to node Y via an edge labeled l.) In other words, PBase stores provenance traces in a new format in order to support richer queries.<\/p>\n
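
As a rough illustration of the triple representation, the sketch below stores a small trace as (X, l, Y) tuples and answers one lineage-style question: which nodes can be reached from a given node? The trace contents and the helper names build_index and lineage are made up for this example and are not part of PBase. The edge labels borrow the PROV relation names used and wasGeneratedBy.

```python
from collections import defaultdict

# Hypothetical trace: each triple (x, label, y) is an edge x --label--> y.
# Node names are illustrative only and do not come from PBase.
TRIPLES = [
    ('plot.png', 'wasGeneratedBy', 'render'),
    ('render', 'used', 'clean.csv'),
    ('clean.csv', 'wasGeneratedBy', 'clean'),
    ('clean', 'used', 'raw.csv'),
]


def build_index(triples):
    # Map each source node to its outgoing (label, target) pairs.
    index = defaultdict(list)
    for x, label, y in triples:
        index[x].append((label, y))
    return index


def lineage(triples, start):
    # Depth-first traversal: collect every node reachable from start.
    index = build_index(triples)
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for _label, target in index.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen
```

Calling lineage(TRIPLES, 'plot.png') walks back through both processing steps to the raw input. A plain traversal like this is fine for a toy trace; answering such queries at scale is exactly the problem PBase is meant to address.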

The figure below shows a simple architecture for PBase.\u00a0Based on this figure, the main phases of the PBase project are as follows:<\/p>\n

\"PBase<\/a>
PBase Architecture<\/figcaption><\/figure>\n

Phase 1: Translate.<\/strong> Take provenance data packages as input and store them in a database of our choice.\u00a0Several options were suggested for the PBase database: MongoDB, PostgreSQL, SPARQL, RPQ\/2, and Neo4j (a graph database). There were also several suggestions for the triple store: Jena, Sesame, 4store, Virtuoso, and AllegroGraph.\u00a0The main criteria for selecting a triple store and database for PBase are compatibility with the existing tools in ProvWG, simplicity, and performance.<\/p>\n

Neo4j<\/a>, an open source, fully transactional, enterprise-grade NoSQL (Not only SQL) graph database, was selected as the PBase storage because its simplicity makes some graph queries quite straightforward to answer. Neo4j stores property graphs: nodes, directed relationships between nodes (which together form paths in the graph), properties on each node (key-value pairs), and indexes for look-ups.<\/p>\n

“A Graph – records data in -> Nodes – which have -> Properties”<\/p>\n

“A Graph Database – manages a -> Graph and – also manages related -> Indexes”<\/p>\n

Neo4j is known to be \u201cwhiteboard friendly\u201d, meaning that if you can draw a design as boxes on a whiteboard, you can store it in Neo4j [RWC12].<\/p>\n
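
To make the property graph model above concrete, here is a minimal in-memory sketch in plain Python. It is not the Neo4j API: the PropertyGraph class and its methods are hypothetical names invented for this example. It only mirrors the four ingredients listed above: nodes, directed relationships, key-value properties, and an index for look-ups.

```python
class PropertyGraph:
    # Toy model of a property graph; an assumption-laden sketch, not Neo4j.

    def __init__(self):
        self.nodes = {}          # node id -> dict of properties
        self.relationships = []  # (start id, relationship type, end id)
        self.index = {}          # (key, value) -> set of node ids

    def add_node(self, node_id, **properties):
        # Store the node and index every property for exact-match look-ups.
        self.nodes[node_id] = properties
        for key, value in properties.items():
            self.index.setdefault((key, value), set()).add(node_id)

    def relate(self, start, rel_type, end):
        # Record a directed, typed relationship from start to end.
        self.relationships.append((start, rel_type, end))

    def find(self, key, value):
        # Index look-up: ids of all nodes whose property key equals value.
        return self.index.get((key, value), set())

    def outgoing(self, node_id):
        # All (type, end id) pairs for relationships leaving node_id.
        return [(t, e) for s, t, e in self.relationships if s == node_id]


# Illustrative data only: one data node and one process node.
g = PropertyGraph()
g.add_node(1, name='raw.csv', kind='data')
g.add_node(2, name='clean', kind='process')
g.relate(2, 'used', 1)
```

Here g.find('kind', 'process') returns the id of the process node through the index, and g.outgoing(2) lists the relationship pointing at the data it used. A real graph database adds persistence, transactions, and a query language on top of this basic shape.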

Phase 2: Provenance Queries.<\/strong> Develop a set of provenance test queries that PBase should be able to answer.<\/p>\n

Several languages and interfaces interoperate with Neo4j: Java code, REST, Cypher, a Ruby console, and others. We use Cypher for this project because, as the Neo4j tutorial for Cypher<\/a> says,<\/p>\n