Web Scraping with Python Libraries

The previous two notebook entries established some methods of text processing. The goal is to build a file system populated with HTML-formatted text documents, so that the directory can then be crawled and scraped. A common way to do this is with Python and Beautiful Soup. Beautiful Soup is a Python…
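Crawling such a directory means walking every HTML file and pulling out its text. The sketch below shows one way to do that. It is a dependency-free stand-in: it uses the standard library's `html.parser` instead of Beautiful Soup so that it runs without extra installs. With Beautiful Soup installed, the body of `extract_text` could be replaced by `BeautifulSoup(html, "html.parser").get_text(" ", strip=True)`. The helper names `TextExtractor`, `extract_text` and `crawl_html_dir` are hypothetical and do not come from the original entries.

```python
import os
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside script/style blocks.
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the text of an HTML string as one space-joined line (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)


def crawl_html_dir(root):
    """Walk `root` and map each .html/.htm file's relative path to its extracted text."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith((".html", ".htm")):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    results[os.path.relpath(path, root)] = extract_text(fh.read())
    return results
```

For example, `crawl_html_dir("corpus")` (where `corpus` is a placeholder for your document directory) returns a dictionary that can be looped over to count words or search the text of each file. Non-HTML files in the tree are ignored.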

Harvesting @DataONEorg Twitter Mentions via Topsy

The previous notebook entry concerned mentions of @DataONEorg on Twitter. I established the following. The oldest tweet is from two years ago, dated July 29, 2012, and is accessible here: http://topsy.com/s?q=%40DataONEorg&window=a&type=tweet&sort=date&offset=990 The very first retweet of @DataONEorg was on March 15, 2011. This was 5 months after…
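The Topsy search URL above carries the query and paging as query-string parameters (`q`, `window`, `type`, `sort`, `offset`). The minimal sketch below rebuilds that URL and lists the result pages up to it. It assumes that results are paged by `offset` in steps of 10. That step size is a guess, because only `offset=990` appears in the entry. The function names `topsy_search_url` and `paged_urls` are hypothetical. The code only builds URLs and does not fetch anything.

```python
from urllib.parse import urlencode

TOPSY_SEARCH = "http://topsy.com/s"  # base URL taken from the entry above


def topsy_search_url(query, offset=0):
    """Build a search URL with the same parameters as the one in the entry."""
    params = {
        "q": query,        # urlencode percent-encodes '@' as %40
        "window": "a",
        "type": "tweet",
        "sort": "date",
        "offset": offset,
    }
    return TOPSY_SEARCH + "?" + urlencode(params)


def paged_urls(query, last_offset, page_size=10):
    """Yield URLs for offsets 0, page_size, ... up to last_offset inclusive.

    page_size=10 is an assumption, not something stated in the entry.
    """
    for offset in range(0, last_offset + 1, page_size):
        yield topsy_search_url(query, offset)
```

Calling `topsy_search_url("@DataONEorg", 990)` reproduces the URL quoted in the entry. `paged_urls("@DataONEorg", 990)` enumerates every page before it, which is the sequence a harvester would step through to reach the oldest results.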