DataONE Notebooks

2014 Summer Internship Program

Posted on May 8, 2014 by Amber Budden

The 2014 DataONE Summer Internship research activity will launch shortly. Check back for updates at the end of May or beginning of June.

Web Scraping with Python Libraries

Posted on April 16, 2014 by Tanner Jessel

The previous two notebook entries established some methods of text processing. The goal is to create a file system populated with HTML-formatted text documents, so that the directory can be crawled and scraped. A common way to do this is with Python and Beautiful Soup. Beautiful Soup is a Python …
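A minimal sketch of crawling such a directory, assuming the converted documents are `.html` files somewhere under a single root folder. To keep it free of third-party dependencies, it substitutes Python's standard-library `html.parser` for Beautiful Soup; the names `TextExtractor` and `scrape_directory` are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the <title> and the visible text of one HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style contents
        if self._in_title:
            self.title += data
        elif data.strip():
            self.chunks.append(data.strip())


def scrape_directory(root):
    """Walk root recursively; return {relative path: (title, text)} for each .html file."""
    root = Path(root)
    results = {}
    for path in sorted(root.rglob("*.html")):
        parser = TextExtractor()
        parser.feed(path.read_text(encoding="utf-8", errors="replace"))
        parser.close()  # flush any buffered text
        results[str(path.relative_to(root))] = (parser.title.strip(), " ".join(parser.chunks))
    return results
```

With Beautiful Soup installed, `BeautifulSoup(html, "html.parser").get_text()` could replace the hand-written parser class while the directory walk stays the same.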

Text Processing Methods, Continued (PDF to HTML Conversion)

Posted on April 14, 2014 by Tanner Jessel

I am continuing the evaluation of text processing tools that I began in an earlier open notebook post on the same topic. I also considered opening my PDF documents in Word and then re-saving them as HTML. That workflow might standardize the formatting to something less …

Text Processing Methods for Data Extraction (PDF to HTML conversion)

Posted on April 10, 2014 by Tanner Jessel

Tried for Mac: VeryPDF “PDF to Any Converter”. I did not like it. PDF to HTML conversion was not good. PDF to Excel was OK, but one complaint is that the documents are placed into a new folder. Might be useful: http://sourceforge.net/projects/pdftohtml/. From the first freeware/trialware software I tried, I’m definitely dog-earing …
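The pdftohtml tool linked above is a command-line program, so a folder of PDFs can be converted in one batch. A minimal sketch, assuming the `pdftohtml` build shipped with Poppler is on `PATH`; `-noframes` (single HTML page instead of a frameset) and `-i` (skip images) are flags of that build, while `pdftohtml_command` and `convert_directory` are hypothetical helpers. By default each HTML file is written next to its PDF rather than into a separate folder.

```python
import shutil
import subprocess
from pathlib import Path


def pdftohtml_command(pdf_path, out_dir=None):
    """Build the pdftohtml command line for one PDF (assumes Poppler's pdftohtml)."""
    pdf_path = Path(pdf_path)
    out_dir = Path(out_dir) if out_dir else pdf_path.parent  # default: beside the PDF
    out_base = out_dir / pdf_path.stem  # pdftohtml derives output file names from this base
    return ["pdftohtml", "-noframes", "-i", str(pdf_path), str(out_base)]


def convert_directory(root, dry_run=False):
    """Convert every *.pdf directly under root; with dry_run, only return the commands."""
    commands = [pdftohtml_command(p) for p in sorted(Path(root).glob("*.pdf"))]
    if dry_run:
        return commands
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop on the first failed conversion
    return commands
```

Running with `dry_run=True` first is a cheap way to check which files would be touched before converting anything.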

Data Management for Research Output with OpenWetWare Wiki

Posted on April 2, 2014 by Tanner Jessel

Although I want to be a professional data manager and have extensive training in data management, in practice I have found it pretty tough to do, even for a small data analysis project like the Figshare users’ survey. I did the data analysis for that project on another computer; I was in …

Early adopters of open research output: a study of the motivations and opinions of Figshare.com users (Poster)

Posted on February 28, 2014 by Tanner Jessel

This 3.8 MB poster (which exceeds the maximum file size that can be uploaded to this WordPress open research notebook) was presented at the University of Tennessee’s College of Communication and Information Research Symposium. It is a first public look at a research effort that has been discussed in other …