This is my first week as a DataONE summer intern. I am glad to have met my mentors, Bertram Ludaescher, Timothy McPhillips and Paolo Missier. I am also happy to be working on the project with another intern, Linh Hoang, who is a PhD student in the iSchool at UIUC. She is warm-hearted, and I have already picked up some research tips from her.

Above all, the scope of this internship project has been narrowed to focus on YesWorkflow (YW) only. The objective is to represent YW annotations, YW workflow models, and YW-reconstructed retrospective provenance information in RDF, and to develop SPARQL queries similar to the existing Prolog/Datalog queries. During this week, I explored both the YW-prototype and YW-idcc-17 GitHub repositories, especially the simulate_data_collection example. I installed the YW toolkit and its dependencies on my PC and successfully ran the simple example task by following the README file. I then ran the simulate_data_collection example but came across some issues. With the help of my mentors and Linh, I solved most of them, but a few remain unresolved. Overall, Linh and I became familiar with the whole process and the command structure of the YW toolkit, and we drew a conceptual flow chart of the toolkit to understand it better.

In addition, I read some published articles relevant to this project and online tutorials about PROV, ProvONE, RDF and SPARQL. Building on what I learned in my mentor's Data Cleaning class, I now understand the basic concepts, including prospective provenance, retrospective provenance, programs (@begin, @end), ports (@in, @out, @param), special recon annotations (@uri, @log), and others (@desc, @as, @return). I also learned how PROV and ProvONE relate to each other in describing the provenance of scientific workflows.
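To make these annotations concrete for myself, here is a minimal sketch of how a small Python script could be marked up with YW comments. The script, the function name convert_temperatures and the port names are all made up for illustration; they do not come from the YW repositories, and I have not run this markup through the YW toolkit.

```python
# @begin convert_temperatures @desc Convert Celsius readings to Fahrenheit.
# @in celsius_readings @desc Temperatures in degrees Celsius.
# @param offset @desc Calibration offset added to every reading.
# @out fahrenheit_readings @desc Temperatures in degrees Fahrenheit.

def convert_temperatures(celsius_readings, offset=0.0):
    # @begin apply_offset
    # @in celsius_readings
    # @param offset
    # @out calibrated @as calibrated_readings
    calibrated = [c + offset for c in celsius_readings]
    # @end apply_offset

    # @begin to_fahrenheit
    # @in calibrated @as calibrated_readings
    # @out fahrenheit_readings
    fahrenheit_readings = [c * 9.0 / 5.0 + 32.0 for c in calibrated]
    # @end to_fahrenheit

    return fahrenheit_readings

# @end convert_temperatures
```

The annotations live entirely in comments, so the script runs exactly as it would without them. Each @begin/@end pair marks a program block, the nested blocks are its sub-steps, and the @as alias is what connects the output of apply_offset to the input of to_fahrenheit.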

I believe the core task is to map the labelled elements in YW to the ProvONE standard, expressed in RDF. In the following weeks, I will start matching ProvONE classes and associations to YW graph nodes and directed edges. After that, I will explore extract_facts, model_facts and recon_facts in detail.
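As a first rough idea of what such a mapping might look like, the sketch below reads YW annotations from a script and emits triples that use ProvONE terms. Everything here is my own assumption rather than the project's design: the example.org base URI, the function name yw_to_triples, the choice to treat @param as an input port, and the ProvONE namespace URI and term names (Program, Port, hasInPort, hasOutPort, hasSubProgram), which should be checked against the ProvONE specification. It uses only the Python standard library and ignores @as, @uri and the other qualifiers.

```python
import re

# Assumed namespace URIs; verify against the ProvONE specification before use.
PROVONE = "http://purl.dataone.org/provone/2015/01/15/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/yw/"  # hypothetical base URI for this sketch

# Matches the block and port keywords followed by their name.
ANNOTATION = re.compile(r"@(begin|end|in|out|param)\s+(\S+)")


def yw_to_triples(source):
    """Return (subject, predicate, object) triples for the YW blocks and ports in source."""
    triples = []
    stack = []  # URIs of the currently open @begin blocks, innermost last
    for keyword, name in ANNOTATION.findall(source):
        if keyword == "begin":
            program = EX + name
            triples.append((program, RDF_TYPE, PROVONE + "Program"))
            if stack:
                # A nested block becomes a sub-program of the enclosing block.
                triples.append((stack[-1], PROVONE + "hasSubProgram", program))
            stack.append(program)
        elif keyword == "end":
            if stack:
                stack.pop()
        elif stack:
            # @in and @param are treated as input ports, @out as an output port.
            port = stack[-1] + "/" + name
            predicate = "hasOutPort" if keyword == "out" else "hasInPort"
            triples.append((port, RDF_TYPE, PROVONE + "Port"))
            triples.append((stack[-1], PROVONE + predicate, port))
    return triples


SAMPLE = """
# @begin clean_data
# @in raw_file
# @begin trim_whitespace
# @in raw_file
# @out trimmed
# @end trim_whitespace
# @out clean_file
# @end clean_data
"""

# Print the triples in an N-Triples-like form.
for s, p, o in yw_to_triples(SAMPLE):
    print(f"<{s}> <{p}> <{o}> .")
```

Triples in this form could later be loaded into an RDF store and queried with SPARQL. The channels that link an output port to an input port are left out here and would need their own mapping.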

