{"id":3368,"date":"2019-05-24T20:41:31","date_gmt":"2019-05-24T20:41:31","guid":{"rendered":"https:\/\/notebooks.dataone.org\/?p=3368"},"modified":"2019-05-24T20:41:31","modified_gmt":"2019-05-24T20:41:31","slug":"week-1-exploring-dataone-and-provenance","status":"publish","type":"post","link":"https:\/\/notebooks.dataone.org\/prov-self\/week-1-exploring-dataone-and-provenance\/","title":{"rendered":"Week 1 \u2013 Exploring DataONE and Provenance"},"content":{"rendered":"\n
Hello everyone, this is Yilin, the intern working on Project 2: Provenance for Self or Others? A Study with Hands-on Experiments<\/em>. I am very glad to be working with all of you at DataONE, and I hope we all have a great experience this summer. Research has always fascinated me, and pursuing a Ph.D. has become my next goal, so if you have any research-related questions, feel free to discuss them with me. As for the project, a lot of work has been done this week. The meeting with my mentor, Bertram Lud\u00e4scher, went really well: we clarified the objectives of the project as well as the steps we might take in the following weeks.<\/p>\n\n\n\n Big picture for project 2<\/strong><\/p>\n\n\n\n Project 2 is divided into two parts. The first part is “an environment scan” of current research on data provenance. Its goal is to answer the main question: \u201chow do people use data provenance, and what kinds of data provenance tools are used in academic disciplines?\u201d The outcome of this part is an annotated bibliography. The second part consists of hands-on research and programming, with a report as its output.<\/p>\n\n\n\n What is provenance?<\/strong><\/p>\n\n\n\n Provenance is a fairly new concept, yet people encounter it almost every day. Its definition differs from field to field. The Merriam-Webster dictionary defines provenance as \u201cORIGIN, SOURCE\u201d or \u201cthe history of ownership of a valued object or work of art or literature\u201d. 
Regarding the OPM (Open Provenance Model), Moreau et al. (2008) explain that “Provenance is well understood in the context of art or digital libraries, where it respectively refers to the documented history of an art object, or the documentation of processes in a digital object\u2019s life cycle.\u201d The W3C, in turn, defines provenance as a record that describes the entities and processes involved in producing and delivering or otherwise influencing a certain resource. In the newer field of \u201cblockchain\u201d, provenance takes on its own particular meaning. Data provenance combined with blockchain works more like a \u201cData Identity\u201d, showing when the data was created, who collected it, which operations were performed on it, and so on. Nobody can change the information in a \u201cData Identity\u201d, so future researchers can easily trace the data to assess its authenticity and reproduce results. <\/p>\n\n\n\n Types of provenance<\/strong><\/p>\n\n\n\n Herschel et al. (2017) explain provenance and classify it into four main types: provenance meta-data, information system provenance, workflow provenance, and data provenance. The types can be told apart as follows. Meta-data itself can be regarded as provenance, and so can the operations related to it. General meta-data tends to assign meaning to the data, while provenance meta-data focuses on the process by which the data was derived. Under the W3C definition, a property such as the size of a file is not provenance, while its date of creation is. When we limit the context of provenance to information systems, we speak of information system provenance. 
Furthermore, by restricting the type of production process to so-called workflows, which help scientists conceptualize and manage each step of an analysis, provenance becomes workflow provenance. Finally, when scientists use provenance to track the processing of individual data items, it is called data provenance.<\/p>\n\n\n\n Application of provenance<\/strong><\/p>\n\n\n\n Provenance is now widely used in many fields, and its applications can be summarized based on the paper by Herschel et al. (2017). The table below shows this summary.<\/p>\n\n\n\n