June 2: Research – DataONE Notebooks

The past two days have seen a fair amount of normal life cutting into my work time, which is why I haven’t posted.

What I did manage to do is get an abstract ready for the Open Knowledge Foundation Conference, which I hope to send off in the next couple of hours. Hopefully, I’ll be able to go there and present it. It’d be a great way to take a look at what I’ve been doing from a different perspective, and to really clarify where we are and what I know. Our plan is to decide next week what we’re going to be looking at, and then really start attacking a large number of workflows in the coming month. So, all that is good.

I also managed to talk to Bertram and Karthik for a good long time today, as they weren’t on the call two days ago. I have a better idea of where I’m going, which is good.

My to-do list for this week and weekend:

Become familiar with SPARQL and with mining myExperiment for data.
Grab 6-10 workflows (Kepler, Taverna, and others) to look at complexity.
Look over these, run them myself, and see how they work. Then, we’ll go over them next week to see if we can come up with anything interesting regarding their classification and complexity. I’ll try to grab ones with images, as those will be easier to go over in the group call.
We’ll then pick out a dataset, and spend the next month analysing it. This will be the main source of information for the publication. So, what we need to do is decide what we’re going to be looking at based on the example workflows, and then go from there.
It would be useful to come up with a template for workflows, the deviations from which will help classify ones in the future. For instance, what about the number of branches? Or the level of recursion? This will be similar to wetland ecology, which uses classifications for streams. Also, qualitative differences: is there a difference between workflows that see little use and those that see more use? Or is it only publication and documentation?
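
To make the template idea a bit more concrete, here is a rough sketch of how two of those measures might be computed. Everything in it is an assumption on my part rather than anything Kepler or Taverna provides: the `Workflow` class is a made-up stand-in for a parsed workflow, "number of branches" is taken to mean steps that feed more than one downstream step at the top level, and "level of recursion" is read as how deeply sub-workflows are nested.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Workflow:
    """Hypothetical stand-in for a parsed workflow (not a Kepler or Taverna type)."""
    name: str
    # step name -> names of the steps that consume its output
    links: Dict[str, List[str]] = field(default_factory=dict)
    # workflows embedded as single steps inside this one
    subworkflows: List["Workflow"] = field(default_factory=list)


def branch_count(wf: Workflow) -> int:
    """Count top-level steps whose output feeds more than one downstream step."""
    return sum(1 for targets in wf.links.values() if len(targets) > 1)


def nesting_depth(wf: Workflow) -> int:
    """Return 1 for a flat workflow, plus 1 per level of nested sub-workflow."""
    return 1 + max((nesting_depth(s) for s in wf.subworkflows), default=0)


def profile(wf: Workflow) -> Dict[str, int]:
    """Collect the measures into one record per workflow."""
    return {"branches": branch_count(wf), "nesting_depth": nesting_depth(wf)}


def deviation(wf: Workflow, template: Dict[str, int]) -> Dict[str, int]:
    """Difference between a workflow's profile and a chosen template profile."""
    return {key: value - template[key] for key, value in profile(wf).items()}


# Made-up example: a small pipeline with one fork and one nested sub-workflow.
inner = Workflow("align", links={"load": ["match"], "match": []})
example = Workflow(
    "pipeline",
    links={"read": ["clean", "plot"], "clean": ["align"], "plot": [], "align": []},
    subworkflows=[inner],
)
# Assumed template: a straight-line workflow with no nesting.
linear_template = {"branches": 0, "nesting_depth": 1}

print(profile(example))
print(deviation(example, linear_template))
```

The template here is just a dictionary of expected values, so other measures could be added as extra keys once we have looked at the example workflows and decided which ones actually matter.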

Talk to you tomorrow.