Text Processing Methods for Data Extraction (PDF to HTML conversion)

Tried for Mac: VeryPDF “PDF to Any Converter.” Did not like it. PDF to HTML was not good. PDF to Excel was OK, but one complaint is that the documents are placed into a new folder. Might be useful: http://sourceforge.net/projects/pdftohtml/. From the first freeware/trialware software I tried, I’m definitely dog-earing …

Consolidating Year 1 – Year 4 @DataONEorg Tweets

I am continuing quality control efforts today. Looking at checksums for the files, some of the 147 appear to be the same. This concerns me due to the possibility of human error (my error) in creating the files, since I scraped tweets manually with a browser extension, rather than …
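The checksum comparison described in this entry could be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: the folder layout, the `*.txt` pattern, the choice of SHA-256, and the function names `file_sha256` and `find_duplicates` are all assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder, pattern="*.txt"):
    """Group files in `folder` by checksum; return groups with more than one file.

    `folder` and `pattern` are hypothetical; adjust them to wherever the
    scraped tweet files were saved.
    """
    groups = defaultdict(list)
    for path in sorted(Path(folder).glob(pattern)):
        if path.is_file():
            groups[file_sha256(path)].append(path.name)
    return [names for names in groups.values() if len(names) > 1]
```

Each returned group lists file names whose contents are byte-for-byte identical, which are the candidates to re-check for a copy or save error.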

Continue Scraping, Introduce Quality Control with Hashes

Continuation and completion of harvesting with quality control / assurance exploration using hashes and checksum software. Start 97 – 77. 97 contains year 3 and offset 450. Start at 12:05. Save text file Topsy-97-77. End at 12:21. New file Topsy-76-56. 56 ends at Y3040. Expand to …

Scraping @DataONEorg Tweets Off the Web with Browser Extensions

An earlier method I tried, using the Google Chrome browser extension “Scraper,” was unable to harvest tweets mentioning @DataONEorg. Scraper is a simple data mining extension for Google Chrome™ that is useful for online research when you need to quickly analyze data in spreadsheet form. Reviewing some of the software …

Harvesting @DataONEorg Twitter Mentions via Topsy

The previous notebook entry concerned mentions of @DataONEorg on Twitter. I established the following: The oldest tweet is from 2 years ago. It is dated July 29, 2012. This tweet is accessible from here: http://topsy.com/s?q=%40DataONEorg&window=a&type=tweet&sort=date&offset=990 The very first re-tweet of @DataONEorg was March 15, 2011. This was 5 months after …
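The search URL quoted in this entry pages through results with an `offset` parameter. A small sketch of how such URLs could be generated is below. It only builds strings and makes no network requests; the step of 10 results per page and the helper names `topsy_url` and `page_urls` are assumptions, not something stated in the entry.

```python
from urllib.parse import urlencode

BASE_URL = "http://topsy.com/s"


def topsy_url(offset, query="@DataONEorg"):
    """Build a search URL in the same shape as the one quoted in the entry."""
    params = {
        "q": query,
        "window": "a",
        "type": "tweet",
        "sort": "date",
        "offset": offset,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def page_urls(last_offset, step=10):
    """List URLs for offsets 0..last_offset; `step` is an assumed page size."""
    return [topsy_url(offset) for offset in range(0, last_offset + 1, step)]
```

Calling `topsy_url(990)` reproduces the URL given above for the oldest tweet.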

DataONE Community Engagement via Twitter

DataONE has been around since 2010. It’s an NSF project so it’s continually evaluated for performance. One metric could be reach and engagement on social media, as a measure of awareness about DataONE. Since I’ve looked at open science sentiment analysis before, I volunteered to poke around a bit on …

#OpenScience Sentiment Analysis via Twitter Data

In an earlier post I mentioned that I would like to look at positive sentiments such as “I like @figshare” or “I prefer @figshare” or “I use @figshare” across Twitter. A quick Web search on Google for “archive of past tweets” (without quotes) brought my attention to this September 4, 2013 article on Mashable: …
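As a rough illustration of the phrase search described in this entry, the three example phrases could be matched against tweet text that has already been collected. This is only a sketch: the pattern covers just the phrases quoted above, and the function name `positive_mentions` and the sample tweets are made up for the example.

```python
import re

# Matches "I like @figshare", "I prefer @figshare", "I use @figshare",
# ignoring case and allowing extra whitespace between words.
POSITIVE_PATTERN = re.compile(
    r"\bI\s+(like|prefer|use)\s+@figshare\b", re.IGNORECASE
)


def positive_mentions(tweets):
    """Return the tweets that contain one of the positive phrases."""
    return [text for text in tweets if POSITIVE_PATTERN.search(text)]


# Invented sample tweets, for illustration only.
sample_tweets = [
    "I like @figshare for sharing posters",
    "Uploading data to @figshare today",
    "i use @figshare every week",
]
matches = positive_mentions(sample_tweets)
```

Simple phrase matching like this misses negations and paraphrases, so it is a starting filter rather than a sentiment classifier.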

Who Follows Whom? Exploring Open Science Social Networks

A few items to comment on. First, Twitter sent a notification to my personal e-mail suggesting certain other organizations to follow, apparently based on my interest in figshare – the title of the e-mail was “Suggestions based on figshare.” Twitter’s “Similar to figshare” suggestions include: PLOS (@PLOS) Open …

Exploring #OpenScience Communities via Twitter

Continuing some initial explorations of data sharing practices among users of an online repository such as figshare. As an active (or more accurately, formerly active) Twitter user (@mountainsol), I recall finding that some of my information was stored by a service called “favstar.” Twitter has the ability …

#OpenScience Social Networks: Facebook and Google+

Continuing to look at figshare today. Facebook has some limited data available concerning users. It is not possible to see “who likes figshare” on Facebook because that information is private, unlike Twitter, which allows any user to view the “followers” and “friends” of a Twitter user. The limited data that …