Week 3: Reproducibility of Script-Based Workflows

This week I continued implementing queries, starting with some generic ones. For example: for each sink node Y, compute all source nodes X that are in the lineage of Y; or, similarly, for each output node, compute all input nodes on which the output depends. I then practiced more sophisticated queries, such as “Are there outputs that do NOT depend on ALL inputs?”, and queries for lowest common ancestor (LCA) problems: for two given data products X and Y, determine the node(s) that are the “lowest” common ancestors of X and Y (the “most recent common ancestors”).
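To make these queries concrete, here is a minimal Python sketch over a toy dependency graph. It is only an illustration under my own assumptions: the edge list, the node names (in1, in2, a, b, c, out1, out2, out3) and all function names are hypothetical, and it does not reflect how any particular provenance tool stores or queries its graph. An edge (u, v) means "v depends on u", and "ancestors" means strict ancestors, so a node is not counted as its own ancestor.

```python
from collections import defaultdict

def build_parents(edges):
    """Map each node to the set of nodes it directly depends on."""
    parents = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        parents[dst].add(src)
        nodes.update((src, dst))
    return parents, nodes

def ancestors(node, parents):
    """All nodes reachable by walking dependency edges upstream from node."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def sources_and_sinks(edges):
    """Sources have no incoming edge; sinks have no outgoing edge."""
    parents, nodes = build_parents(edges)
    sources = {n for n in nodes if not parents.get(n)}
    sinks = nodes - {src for src, _ in edges}
    return sources, sinks

def lineage_sources(edges):
    """For each sink Y, the source nodes X in the lineage of Y."""
    parents, _ = build_parents(edges)
    sources, sinks = sources_and_sinks(edges)
    return {y: ancestors(y, parents) & sources for y in sinks}

def outputs_missing_some_input(edges):
    """Sinks that do NOT depend on ALL sources."""
    sources, _ = sources_and_sinks(edges)
    return {y for y, xs in lineage_sources(edges).items() if xs != sources}

def lowest_common_ancestors(x, y, edges):
    """Common ancestors of x and y that are not ancestors of another common ancestor."""
    parents, _ = build_parents(edges)
    common = ancestors(x, parents) & ancestors(y, parents)
    return {c for c in common
            if not any(c in ancestors(d, parents) for d in common if d != c)}

# Hypothetical toy graph: two inputs, three outputs.
edges = [
    ("in1", "a"), ("in2", "a"), ("in2", "b"),
    ("a", "c"), ("c", "out1"), ("c", "out3"),
    ("b", "out2"),
]

lineage = lineage_sources(edges)
partial_outputs = outputs_missing_some_input(edges)
lca = lowest_common_ancestors("out1", "out3", edges)
```

In this toy graph, out2 is reached only from in2, so it is the one output that does not depend on all inputs, while out1 and out3 share c as their single lowest common ancestor. Because the graph is a DAG rather than a tree, the LCA query returns a set: two data products can have several lowest common ancestors.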

I also explored noWorkflow, which uses abstract syntax tree (AST) analysis, reflection, and profiling to collect provenance without requiring a version control system, so that scientists do not have to rely on naming conventions to keep files produced by previous executions. I tried some examples in noWorkflow and then started investigating some queries in this model.
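To give a feel for what AST analysis of a script involves, here is a small sketch using Python's standard `ast` module. This is not noWorkflow's code or API: the sample script, the class name `DefinitionAndCallCollector`, and the choice to record only function definitions and direct calls by name are my own assumptions for illustration.

```python
import ast

# Hypothetical script to analyze; it is only parsed, never executed.
SCRIPT = '''
def load(path):
    return open(path).read()

def clean(text):
    return text.strip()

data = clean(load("input.txt"))
'''

class DefinitionAndCallCollector(ast.NodeVisitor):
    """Record function definitions and calls made through a plain name."""

    def __init__(self):
        self.functions = {}  # function name -> list of parameter names
        self.calls = []      # names of functions called directly by name

    def visit_FunctionDef(self, node):
        self.functions[node.name] = [arg.arg for arg in node.args.args]
        self.generic_visit(node)  # keep walking into the function body

    def visit_Call(self, node):
        # Method calls such as text.strip() have an Attribute as func
        # and are skipped here; only calls like load(...) are recorded.
        if isinstance(node.func, ast.Name):
            self.calls.append(node.func.id)
        self.generic_visit(node)  # nested calls appear among the children

collector = DefinitionAndCallCollector()
collector.visit(ast.parse(SCRIPT))
```

After the visit, `collector.functions` holds the definitions found in the script and `collector.calls` the functions invoked by name. Static information like this describes what a script could do; recording what actually happened in a run needs the runtime side (reflection and profiling) as well.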

Next week, I will discuss Yin-Yang in more detail. It combines YW (YesWorkflow, which allows the user to define the workflow model implicit in a script) and NW (noWorkflow, which captures retrospective provenance from Python scripts).
