Wednesday: Scrape Scrape – DataONE Notebooks

Today I wanted to see whether I could pull more data off the site automatically, since that might be useful.

It was.

scrapescrape.py = the new code for scraping myExperiment.

I figured out that, from the processors, I could get the number of embedded workflows in each Taverna workflow. I could also get their names and descriptions. Furthermore, I could get the names of each beanshell, input, output, etc. I could also mine for the types of processors being used and the number of each (say, stringconstants vs. beanshells). So, I’m currently mining all of that.
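To give a rough idea of what this kind of mining looks like, here is a minimal sketch. It is not the code in scrapescrape.py. It assumes a made-up, simplified XML layout in which each processor element has a name attribute and a single child whose tag names its type. Real Taverna files use their own schema and namespaces, so the tag names, the sample document, and the summarize_workflow function are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified workflow document (an assumption for illustration).
# Real Taverna workflow files use their own schema and namespaces.
SAMPLE_WORKFLOW = """
<workflow>
  <description>Toy example</description>
  <source name="input_sequence"/>
  <sink name="report"/>
  <processor name="format_id"><beanshell/></processor>
  <processor name="separator"><stringconstant>,</stringconstant></processor>
  <processor name="clean_up"><beanshell/></processor>
  <processor name="nested_lookup">
    <workflow><description>Inner</description></workflow>
  </processor>
</workflow>
"""


def summarize_workflow(xml_text):
    """Count processors by type and collect names (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    type_counts = Counter()
    names_by_type = {}

    # Only look at top-level processors; the first child tag is the type.
    for processor in root.findall("processor"):
        children = list(processor)
        kind = children[0].tag if children else "unknown"
        type_counts[kind] += 1
        names_by_type.setdefault(kind, []).append(processor.get("name"))

    return {
        "inputs": [s.get("name") for s in root.findall("source")],
        "outputs": [s.get("name") for s in root.findall("sink")],
        "type_counts": dict(type_counts),
        "names_by_type": names_by_type,
        # A processor wrapping a <workflow> counts as an embedded workflow.
        "embedded_workflows": type_counts["workflow"],
    }


summary = summarize_workflow(SAMPLE_WORKFLOW)
print(summary["type_counts"])
print(summary["embedded_workflows"])
```

Keeping the counts in a Counter keyed by type makes the stringconstants vs. beanshells comparison a simple lookup, and the same loop collects the names along the way.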

On top of this, I figured that I could also mine the number of versions. So, that’s being done.

I think some of these will potentially have very interesting things to say about the project. In any event, I just saved myself from going through and counting sections manually, which makes me happy. Not to mention having coded far more than I normally can in a day.