Text Processing – DataONE Notebooks

Text Processing Methods, Continued (PDF to HTML Conversion)

Posted on April 14, 2014 by Tanner Jessel

I am continuing my evaluation of some text processing tools, which I began in an earlier open notebook post on the same topic. I also had an idea: perhaps I should open my PDF documents in Word, then re-save them as HTML. That workflow might standardize the formatting to something less …

Text Processing Methods for Data Extraction (PDF to HTML conversion)

Posted on April 10, 2014 by Tanner Jessel

Tried for Mac: VeryPDF “PDF to Any Converter”. Did not like it. PDF to HTML was not good. PDF to Excel was OK, but one complaint is that the documents are placed into a new folder. Might be useful: http://sourceforge.net/projects/pdftohtml/. From the first freeware/trialware software I tried, I’m definitely dog-earing …
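
For anyone who wants to try the pdftohtml route from a script rather than a GUI converter, here is a minimal sketch of a batch conversion in Python. It is not something from my own workflow: it assumes a `pdftohtml` command-line build that accepts the `-noframes` (single HTML file) and `-i` (ignore images) options, as the poppler-utils version does, and the function names and folder names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftohtml_command(pdf_path, out_dir):
    """Return the argument list for converting one PDF to a single HTML file."""
    pdf_path = Path(pdf_path)
    # pdftohtml takes an output base name; reuse the PDF's file stem so
    # "report.pdf" becomes "<out_dir>/report" (the tool adds the extension).
    out_base = Path(out_dir) / pdf_path.stem
    # Assumed flags: -noframes = one HTML file, -i = skip image extraction.
    return ["pdftohtml", "-noframes", "-i", str(pdf_path), str(out_base)]


def convert_folder(pdf_dir, out_dir):
    """Convert every *.pdf in pdf_dir and return the list of PDFs processed."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    converted = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(build_pdftohtml_command(pdf, out_dir), check=True)
        converted.append(pdf)
    return converted
```

Calling `convert_folder("pdfs", "html_out")` would write the output into one folder of my choosing, which sidesteps the "documents are placed into a new folder" complaint above. Whether the resulting HTML is any cleaner than what the GUI tools produce is something that would still need checking per document.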