Open Source Code – Screen Scraper

After three hours of madly cycling around the city looking for the keys to my friend’s flat (I had lent that friend my computer without first copying off the code I needed to upload to the SQL server), I was finally able to upload the Python code that Steve and I (but mostly Steve) wrote for the myExperiment screen mining process.

workflow-screen-scraper.py: this code uses Beautiful Soup to drag everything that can be dragged off the front end of a workflow page on myExperiment into a .csv file. This includes:

  • Workflow URL
  • .svg URL
  • Title
  • Date uploaded
  • Date updated (if any)
  • User profile
  • Workflow system
  • Description
  • Tags
  • Number of views
  • Number of downloads
  • # of credits
  • # of attributions
  • # of tags
  • # of favourites
  • # of ratings
  • # of reviews
  • # of comments
  • # of authors
  • # of titles
  • # of descriptions
  • # of inputs
  • # of processors
  • # of outputs
  • # of beanshells
  • # of datalinks
  • # of coordinations

In that order. It does so at one request per second, so that it doesn’t destroy the server. As I’m going through this, I realise that I can get more information: the types of outputs, the types of beanshells. I’ll upload an edited version of the code when I get the chance to edit it and run it again. But here it is for now. As it says in the code, this is as free as rock pythons in the Everglades. Have at it. (Just don’t publish before we do.)
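For anyone who wants the general shape before opening the file, here is a rough sketch of the "one request per second, one CSV row per workflow" loop. It is not the actual script: the function name scrape_to_csv, the four-column FIELDS list, the example.org URLs, and the fake fetch and parse helpers are all made up for illustration. In the real thing, the parse step is where Beautiful Soup would pull the fields out of the page, and the column list would be the full one above.

```python
import csv
import io
import time

# Hypothetical subset of the columns listed above, for illustration only.
FIELDS = ["workflow_url", "title", "views", "downloads"]


def scrape_to_csv(urls, fetch, parse, out_file, delay=1.0, sleep=time.sleep):
    """Fetch each URL, parse it into a dict, and write one CSV row per workflow.

    fetch(url) returns the page HTML; parse(html) returns a dict of fields.
    Both are passed in, so this sketch assumes nothing about the site's markup.
    """
    writer = csv.DictWriter(out_file, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for url in urls:
        row = parse(fetch(url))
        row["workflow_url"] = url
        writer.writerow(row)
        count += 1
        sleep(delay)  # wait between requests so the server is not hammered
    return count


# Stand-ins for the real network call and the Beautiful Soup parsing step.
def fake_fetch(url):
    return "<html><title>Example workflow</title></html>"


def fake_parse(html):
    return {"title": "Example workflow", "views": 10, "downloads": 2}


buffer = io.StringIO()
pauses = []  # records the requested delays instead of actually sleeping
n = scrape_to_csv(
    ["http://example.org/workflows/1", "http://example.org/workflows/2"],
    fake_fetch,
    fake_parse,
    buffer,
    sleep=pauses.append,
)
```

Passing the sleep function in as an argument is only there so the demo runs instantly; left at its default, the loop really does pause a second after every request.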