{"id":1345,"date":"2013-06-28T22:40:00","date_gmt":"2013-06-28T22:40:00","guid":{"rendered":"https:\/\/notebooks.dataone.org\/?p=1345"},"modified":"2013-07-10T23:37:08","modified_gmt":"2013-07-10T23:37:08","slug":"week-2-provwg-meeting-and-plans-for-next-actions-on-pbase","status":"publish","type":"post","link":"https:\/\/notebooks.dataone.org\/pbase\/week-2-provwg-meeting-and-plans-for-next-actions-on-pbase\/","title":{"rendered":"Week 2: ProvWG Meeting and Plans for Next Actions on PBase"},"content":{"rendered":"<p>This week I attended the ProvWG meeting at NYU-Poly. This was a two-day meeting focused mainly on PBase and D-PROV [MDB+13], a recent ProvWG project, some of whose ideas may be relevant to the PBase project.<\/p>\n<p>D-PROV is an extension to the\u00a0<a href=\"http:\/\/www.w3.org\/TR\/prov-dm\/\" target=\"_blank\">W3C PROV\u00a0<\/a>provenance model\u00a0aimed at representing process structure. The discussion centered on three questions: which limitations of the existing PROV model motivate an extension, what kinds of questions users want to ask about provenance traces, and how to map those questions onto the model. After discussing different classes of users (and their respective use cases), the group concluded that PROV is not sufficient to answer some of these questions and discussed how D-PROV should be modeled to address them.<\/p>\n<p>The second half of the meeting focused more specifically on the PBase project. Given a repository of traces represented as data graphs (DAGs), the main goal of this project is to determine which queries over the repository are useful for provenance and how to answer them in an efficient and scalable manner (a DAG can be written as \u3008X, l, Y\u3009 triples, each stating that node X is connected to node Y via an edge labeled l). 
In other words, PBase stores provenance traces in a new format in order to support richer queries.<\/p>\n<p>The figure below shows a simple architecture for PBase.\u00a0Based on this figure, the main phases of the PBase project are as follows:<\/p>\n<figure id=\"attachment_1557\" aria-describedby=\"caption-attachment-1557\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/notebooks.dataone.org\/wp-content\/uploads\/2013\/06\/11.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-1557\" alt=\"PBase Architecture\" src=\"https:\/\/notebooks.dataone.org\/wp-content\/uploads\/2013\/06\/11-300x163.jpg\" width=\"300\" height=\"163\" srcset=\"https:\/\/notebooks.dataone.org\/wp-content\/uploads\/2013\/06\/11-300x163.jpg 300w, https:\/\/notebooks.dataone.org\/wp-content\/uploads\/2013\/06\/11.jpg 620w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-1557\" class=\"wp-caption-text\">PBase Architecture<\/figcaption><\/figure>\n<p><strong>Phase 1: Translate.<\/strong> Take provenance data packages as input and load them into a database of our choice.\u00a0Several options were suggested for the PBase database: MongoDB, PostgreSQL, SPARQL, RPQ\/2, and Neo4j (a graph database). There were also several suggestions for the triple store: Jena, Sesame, 4store, Virtuoso, and AllegroGraph.\u00a0The main criteria in selecting a triple store and database for PBase are compatibility with the existing tools in ProvWG, simplicity, and performance.<\/p>\n<p><a href=\"http:\/\/www.neo4j.org\/\">Neo4j<\/a>, an open-source, fully transactional, enterprise-grade NoSQL (Not only SQL) graph database, was selected as the PBase storage because its simplicity makes answering some graph queries quite straightforward. 
Neo4j stores property graphs with nodes that form paths in the graph, directed relationships between them, properties for each node (key-value pairs), and indexes for look-ups.<\/p>\n<p>&#8220;A Graph &#8211; records data in -&gt; Nodes &#8211; which have -&gt; Properties&#8221;<\/p>\n<p>&#8220;A Graph Database &#8211; manages a -&gt; Graph and &#8211; also manages related -&gt; Indexes&#8221;<\/p>\n<p>Neo4j is known to be \u201cwhiteboard friendly,\u201d meaning that if you can draw a design as boxes on a whiteboard, you can store it in Neo4j [RWC12].<\/p>\n<p><strong>Phase 2: Provenance Queries.<\/strong> Develop a set of provenance test queries that PBase should be able to answer.<\/p>\n<p>There are several ways to interact with Neo4j: Java code, REST, Cypher, a Ruby console, and others. We use Cypher for this project because, according to the Neo4j <a href=\"http:\/\/www.neo4j.org\/learn\/cypher\">tutorial for Cypher<\/a>,<\/p>\n<ul>\n<li>it is human readable and expressive<\/li>\n<li>MATCHes patterns in the graph<\/li>\n<li>is about the what not how<\/li>\n<\/ul>\n<p><strong>Phase 3: Test.\u00a0<\/strong>There are a number of provenance trace repositories that can be used in the test phase of the project. The main suggestions are VisTrails provenance traces [CFK+13],\u00a0<a href=\"https:\/\/sites.google.com\/site\/provbench\/provbench-at-bigprov-13\">ProvBench<\/a>, &#8230;.<\/p>\n<p><strong>Phase 4: Visualization of the Output. <\/strong>For this phase, we can use <a href=\"http:\/\/www.graphviz.org\/\" target=\"_blank\">Graphviz<\/a>, <a href=\"http:\/\/www.neo4j.org\/develop\/visualize\" target=\"_blank\">Neo4j graph visualization<\/a>, and other graph visualization tools.<\/p>\n<p>Overall, the meeting was a good experience: after talking with ProvWG members, I have a clearer idea of the PBase project goals and of how I am going to contribute to the project. 
I will stay in contact with my mentors and other ProvWG members by attending the weekly ProvWG video conference meetings.\u00a0I left New York City on Tuesday evening with a bad cold as a souvenir, but I was lucky enough not to miss my connecting flight to Sacramento after my flight from New York City was delayed.<\/p>\n<p>For the second half of this week I was on leave from the PBase project to attend some events in San Jose.\u00a0I took an early train from Sacramento and headed directly to the Fairmont San Jose, where the USENIX <b><a href=\"https:\/\/www.usenix.org\/conference\/wiac13\" target=\"_blank\">WiAC&#8217;13<\/a>\u00a0<\/b>conference\u00a0was held. This is an annual conference whose main goal is to discuss some of the challenges women face in the professional computing world, as well as to network and share ideas.\u00a0For the rest of my stay in San Jose, I attended <a href=\"http:\/\/www.truststc.org\/wise\/\">WISE<\/a> (a series of mentoring workshops and talks on privacy and security) at San Jose State University.<\/p>\n<p>In addition, this week I tried to familiarize myself with Neo4j and <a href=\"http:\/\/www.neo4j.org\/learn\/cypher\">Cypher<\/a> by going through some online tutorials and sample queries.<\/p>\n<p>Next week I am going to collect some workflow (Wf) traces, work on methods for translating them into\u00a0Neo4j format, and set up the development environment.<\/p>\n<p><strong>References.<\/strong><\/p>\n<p>[MDB+13] Missier, Paolo, Saumen Dey, Khalid Belhajjame, Victor Cuevas-Vicenttin, and Bertram Ludaescher. &#8220;D-PROV: extending the PROV provenance model with workflow structure.&#8221; (2013).<\/p>\n<p>[CFK+13] Chirigati, Fernando, Juliana Freire, David Koop, and Cl\u00e1udio Silva. &#8220;<a href=\"http:\/\/www.edbt.org\/Proceedings\/2013-Genova\/papers\/workshops\/a47-chirigati.pdf\">VisTrails provenance traces for benchmarking<\/a>.&#8221; In\u00a0<i>Proceedings of the Joint EDBT\/ICDT 2013 Workshops<\/i>, pp. 323-324. 
ACM, 2013.<\/p>\n<div id=\"gs_cit2\">[RWC12] Redmond, Eric, Jim R. Wilson, and Jacquelyn Carter. <i>Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement<\/i>. Pragmatic Bookshelf, 2012.<\/div>\n","protected":false},"excerpt":{"rendered":"<p>This week I attended the ProvWG meeting at NYU-Poly. This was a two-day meeting focused mainly on PBase and D-PROV [MDB+13], a recent ProvWG project, some of whose ideas may be relevant to the PBase project. D-PROV is an extension to the\u00a0W3C PROV\u00a0provenance model\u00a0aimed at representing <a class=\"more-link\" href=\"https:\/\/notebooks.dataone.org\/pbase\/week-2-provwg-meeting-and-plans-for-next-actions-on-pbase\/\">Continue reading <span class=\"screen-reader-text\">  Week 2: ProvWG Meeting and Plans for Next Actions on PBase<\/span><span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":40,"featured_media":1557,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19],"tags":[],"_links":{"self":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts\/1345"}],"collection":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/users\/40"}],"replies":[{"embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/comments?post=1345"}],"version-history":[{"count":59,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts\/1345\/revisions"}],"predecessor-version":[{"id":1598,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts\/1345\/revisions\/1598"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/media\/1557"}],"wp:attachment":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/media?parent=1345"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/categories?post=1345"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/tags?post=1345"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}