Sunday: Readings – DataONE Notebooks

I found a document in my DataONE folder listing around thirty articles that I had not downloaded; I have got them now. Among them was an article mentioned in the Groth paper that we read for last week, Analyzing the Gap Between Workflows and their Descriptions. Groth et al. quoted: “but [workflows] have an underlying parallel structure that can typically be described with a high-level pattern of a few components. [8]” In general, this paper merely talks about how to approach non-natural language descriptions in order to take workflows to the grid.

So I went on to another one, Scientific Workflow Management and the Kepler System, co-authored by one of my mentors, Bertram Ludäscher. It identifies three main types of workflows: knowledge discovery workflows, automative/reengineering workflows, and high-performance data/computing workflows. I don’t think this is enough of a description – for instance, what does knowledge discovery entail? Does it just mean that it runs some stats? Or does it visualise the output for you? Or does it relay information? Or does it display information online for your peers and broadcast results? Are the results even all delivered at the same time? Are they understandable by a human actor? That leaves a lot of questions open.

The paper also identifies requirements for good workflows:
• Access
• Service composition
• Scalability
• Detached execution
• Reliability
• User-interaction
• Smart re-runs
• Smart semantic links
• Data provenance
• Intuitive GUI
• Workflow granularities

Hmm. Alright. But those aren’t the only requirements…