Text Processing Methods for Data Extraction (PDF to HTML conversion)

Tried for mac: VeryPDF “PDF to Any Converter” Did not like it. ย PDF to HTML was not good. ย PDF to Excel was ok, but one complaint is the documents are placed into a new folder Might be useful: http://sourceforge.net/projects/pdftohtml/. From the first freeware/trial ware software I tried, I’m definitely dog-earing Continue reading Text Processing Methods for Data Extraction (PDF to HTML conversion)

Continue Scraping, Introduce Quality Control with Hashes

Continuation and completion of harvesting with quality control / assurance exploration using hashes and checksum software. 5 months agoReplyRetweetFavorite1 more Start 97 – 77 97 contains year 3 and offset 450 Start at 12:05 Save text file Topsy-97-77 End at 12:21 New File Topsy-76-56 56 ends at Y3040 Expand to Continue reading Continue Scraping, Introduce Quality Control with Hashes