{"id":116,"date":"2011-07-13T17:29:39","date_gmt":"2011-07-13T17:29:39","guid":{"rendered":"http:\/\/notebooks.dataone.org\/workflows\/?p=116"},"modified":"2013-05-15T15:23:01","modified_gmt":"2013-05-15T15:23:01","slug":"wednesday-scrape-scrape","status":"publish","type":"post","link":"https:\/\/notebooks.dataone.org\/data-analysis\/wednesday-scrape-scrape\/","title":{"rendered":"Wednesday: Scrape Scrape"},"content":{"rendered":"
Today I wanted to see if it would be possible to get more data off of the site automatically, which might be useful.<\/p>\n
It was.<\/p>\n
scrapescrape.py<\/a> = The new code for my scraping of myExperiment.<\/p>\n I figured out that I could, in processors, get the number of embedded workflows in each Taverna workflow. I could also get their names and descriptions. Furthermore, I could get the names of each beanshell, input, output, etc. I could also mine for the types of beanshells being used, and how many of each are used individually (say, string constants vs. beanshells). So, I’m currently mining all of that.<\/p>\n On top of this, I figured that I could also mine the number of versions. So, that’s being done.<\/p>\n I think some of these will have potentially very interesting things to say about the project. In any event, I just saved myself going through and counting sections manually, which makes me happy. Not to mention having coded far more than I normally can in a day.<\/p>\n","protected":false},"excerpt":{"rendered":" Today I wanted to see if it would be possible to get more data off of the site automatically, which might be useful. It was. scrapescrape.py = The new code for my scraping of myExperiment. I figured out that I could, in processors, get the number of embedded workflows in Continue reading Wednesday: Scrape Scrape<\/span>
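The kind of counting described in the post (tallying processor types such as beanshells vs. string constants per workflow) could be sketched in Python roughly as below. This is not the actual scrapescrape.py: the XML fragment and element names here are invented for illustration, and the real Taverna t2flow format is namespaced and considerably more complex.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented fragment in the spirit of a Taverna workflow definition:
# each processor has a name and an activity class. The real t2flow
# schema differs; this is only a minimal stand-in.
SAMPLE = """
<workflow>
  <processors>
    <processor><name>fetch</name><class>BeanshellActivity</class></processor>
    <processor><name>greeting</name><class>StringConstantActivity</class></processor>
    <processor><name>parse</name><class>BeanshellActivity</class></processor>
  </processors>
</workflow>
"""

def count_processor_types(xml_text):
    """Tally how many processors of each activity class a workflow uses."""
    root = ET.fromstring(xml_text)
    return Counter(p.findtext("class") for p in root.iter("processor"))

counts = count_processor_types(SAMPLE)
print(counts)  # Counter({'BeanshellActivity': 2, 'StringConstantActivity': 1})
```

In practice the per-workflow XML would be fetched from the site rather than embedded inline, and the same iteration pattern would also collect the names and descriptions the post mentions.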