Note: the following content is posted in the DataONE Data Science Open Notebook as part of an ongoing research effort concerning “Open Notebook Science.”
From: Jessel, Tanner Monroe
Sent: Tuesday, November 13, 2012 1:36 PM
To: Mitchell, Chad Matthew
Subject: RE: document info
I am still working on setting up my computer in Hoskins.
The first thing will be to install Gephi.
Then I'll install an OCR / text extraction tool to pull the names out of these docs and convert them to a spreadsheet.
There is also a tool that will pull proper names out of uploaded text, if you want to go that route. It's a kind of data mining.
The problem is that it won't distinguish between the names we want and a proper name in a cited paper (say, someone who has already died but is nonetheless influential, like Charles Darwin).
We should talk about how to contact vieglass about getting the list of registered users from docs.dataone.org.