{"id":116,"date":"2011-07-13T17:29:39","date_gmt":"2011-07-13T17:29:39","guid":{"rendered":"http:\/\/notebooks.dataone.org\/workflows\/?p=116"},"modified":"2013-05-15T15:23:01","modified_gmt":"2013-05-15T15:23:01","slug":"wednesday-scrape-scrape","status":"publish","type":"post","link":"https:\/\/notebooks.dataone.org\/data-analysis\/wednesday-scrape-scrape\/","title":{"rendered":"Wednesday: Scrape Scrape"},"content":{"rendered":"<p>Today I wanted to see if it would be possible to get more data off the site automatically, which might be useful.<\/p>\n<p>It was.<\/p>\n<p><a href=\"https:\/\/github.com\/RichardLitt\/Understanding-Workflows\/blob\/master\/scrapescrape.py\">scrapescrape.py<\/a>: the new code for scraping myExperiment.<\/p>\n<p>I figured out that I could, in processors, get the number of embedded workflows in each Taverna workflow. I could also get their names and descriptions. Furthermore, I could get the names of each beanshell, input, output, etc. I could also mine for the types of beanshells being used, and how many of each type are used (say, stringconstants vs. beanshells). So, I&#8217;m currently mining all of that.<\/p>\n<p>On top of this, I figured that I could also mine the number of versions. So, that&#8217;s being done.<\/p>\n<p>I think some of these could say very interesting things about the project. In any event, I just saved myself from going through and counting sections manually, which makes me happy. Not to mention having coded far more than I normally can in a day.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today I wanted to see if it would be possible to get more data off the site automatically, which might be useful. It was. scrapescrape.py: the new code for scraping myExperiment. 
I figured out that I could, in processors, get the number of embedded workflows in <a class=\"more-link\" href=\"https:\/\/notebooks.dataone.org\/data-analysis\/wednesday-scrape-scrape\/\">Continue reading <span class=\"screen-reader-text\">  Wednesday: Scrape Scrape<\/span><span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":20,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[109],"tags":[],"_links":{"self":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts\/116"}],"collection":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/users\/20"}],"replies":[{"embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/comments?post=116"}],"version-history":[{"count":5,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts\/116\/revisions"}],"predecessor-version":[{"id":1104,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts\/116\/revisions\/1104"}],"wp:attachment":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/media?parent=116"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/categories?post=116"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/tags?post=116"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}