May 30: Pegasus.

Today I read a bit of the Workflows for E-Science book I found in the library (one of the few books around on workflows at all). I also managed to somehow delete the contents of the DataONE folder on my computer, but that’s not an issue because I had everything backed up online.

I then did some more background research, watching the YouTube video on data curation and workflows that Bertram suggested. I looked at the website for it too, but there’s little to take from it beyond noting that human components can be added into a workflow. I’m increasingly unsure about the ways we’re thinking of defining workflows – does a workflow have human interaction? what level of recursion is involved? how deep is the embedding? is it a shim? – I feel that all of these are too binary, and I’d like to see a more overarching process used to define workflows, especially ones that might have all of these features but don’t have to. But I’m pretty sure that’s Karthik’s work – he’s looking at workflow systems, while I’m merely trying to come up with a way of categorising them.

After doing some lovely productive work, I decided to see if I couldn’t get to the bottom of installing Pegasus. My bash skills are not up to scratch, it seems. Condor has to be installed alongside Pegasus, because Pegasus is meant for running on multiple clusters. Condor took me ages, partly because I got so fed up after an hour and a half of messing around in the terminal that I took a break for an early dinner. I only managed it after I had cooled down a bit, and with the help of a friend at one point.

Once I had installed Pegasus, I noticed that it was the first workflow system (WS) that was unlike the others – the diagrams on the website are just that, diagrams. The actual work is all done in code. I ran a couple of example workflows. The whole process is incredibly opaque, which doesn’t necessarily mean it’s more powerful than any other WS. I then read all of the documentation on the site – there’s some pretty interesting stuff going on, but nothing I can comment on at the moment. I’ll have to look into the articles written about it. But it looks like Pegasus, especially when it is used with Kepler, is the most powerful WS out there, which means it is going to have the most data that I personally can use. So, all in all, a productive day. Tomorrow – research, plan for the meeting, and run through the past blog posts for any leads I didn’t follow.