About Tanner Jessel

I am a graduate research assistant funded by DataONE and pursuing a Master's in Information Sciences with an Interdisciplinary Graduate Minor in Computational Science. I assist scholarly research efforts supporting the Sociocultural, Usability and Assessment, and Member Nodes working groups within DataONE. I am based at the Center for Information and Communication Studies at the University of Tennessee School of Information Science in Knoxville, Tennessee.

Web Scraping with Python Libraries

The previous two notebook entries established some methods of text processing. The point is to create a file system populated with HTML-formatted text documents. This permits the directory to be crawled and scraped. A common way to do this is with Python and Beautiful Soup. Beautiful Soup is a Python ...
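The crawl-and-scrape idea in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it swaps in the standard library's `html.parser` in place of Beautiful Soup so that it runs without third-party installs, and the names `TextExtractor`, `extract_text`, and `scrape_directory` are hypothetical.

```python
import os
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML document, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def scrape_directory(root):
    """Walk `root` and map each .html/.htm file (relative path) to its text."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith((".html", ".htm")):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as handle:
                    results[os.path.relpath(path, root)] = extract_text(handle.read())
    return results
```

Calling `scrape_directory` on the folder of converted documents would return one text string per HTML file. With Beautiful Soup installed, `extract_text` could presumably be replaced by a call to the soup object's `get_text()` method.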

Text Processing Methods, Continued (PDF to HTML Conversion)

I am continuing evaluation of some text processing tools that I began in an earlier open notebook post on the same topic. I also had an idea that perhaps I should open my PDF documents in Word, then re-save them as HTML. That workflow might standardize the formatting to something less ...

Text Processing Methods for Data Extraction (PDF to HTML conversion)

Tried for Mac: VeryPDF “PDF to Any Converter”. Did not like it. PDF to HTML was not good. PDF to Excel was OK, but one complaint is that the documents are placed into a new folder. Might be useful: http://sourceforge.net/projects/pdftohtml/. From the first freeware/trialware software I tried, I’m definitely dog-earing ...

Data Management for Research Output with OpenWetWare Wiki

Although I want to be a professional data manager and have extensive training in data management, in practice I have realized it’s pretty tough to do, even for a small data analysis project like the Figshare users’ survey. I did data analysis for that on another computer; I was in ...

Early adopters of open research output: a study of the motivations and opinions of Figshare.com users (Poster)

This 3.8 MB poster (which actually exceeds the file size that may be uploaded onto this WordPress open research notebook) was presented at The University of Tennessee’s College of Communication and Information Research Symposium. It is a first public look at some research effort that has been discussed in other ...

Consolidating Year 1 – Year 4 @DataONEorg Tweets

I am continuing quality control efforts today. From looking at checksums for the files, some of the 147 appear to be the same. This concerns me due to the possibility of human error (my error) in creating the files, since I scraped tweets manually with a browser extension, rather than ...
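A checksum comparison of this kind could be sketched as below. This is an illustration under assumptions, not the tool used in the notebook: it assumes the harvested files sit in a single directory, uses SHA-256 from Python's `hashlib`, and the function names `file_sha256` and `find_duplicates` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directory):
    """Group file names in `directory` that share an identical checksum."""
    by_digest = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            by_digest[file_sha256(path)].append(name)
    # Only groups with more than one member indicate byte-identical files.
    return [names for names in by_digest.values() if len(names) > 1]
```

Each returned group lists files whose contents are byte-for-byte identical, which would flag a page that was saved twice under different names. Files that overlap only partially would not be caught this way.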

Continue Scraping, Introduce Quality Control with Hashes

Continuation and completion of harvesting with quality control / assurance exploration using hashes and checksum software. Start 97 – 77 97 contains year 3 and offset 450 Start at 12:05 Save text file Topsy-97-77 End at 12:21 New File Topsy-76-56 56 ends at Y3040 Expand to ...

Scraping @DataONEorg Tweets Off the Web with Browser Extensions

An earlier method I tried was unable to harvest tweets mentioning @DataONEorg using the Google Chrome browser extension “Scraper.” Scraper is a simple data mining extension for Google Chrome™ that is useful for online research when you need to quickly analyze data in spreadsheet form. Reviewing some of the software ...

Harvesting @DataONEorg Twitter Mentions via Topsy

The previous notebook entry concerned mentions of @DataONEorg on Twitter. I established the following: The oldest tweet is from 2 years ago. It is dated July 29, 2012. This tweet is accessible from here: http://topsy.com/s?q=%40DataONEorg&window=a&type=tweet&sort=date&offset=990 The very first re-tweet of @DataONEorg was March 15, 2011. This was 5 months after ...
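The Topsy link above pages through results with an `offset` query parameter. As a rough sketch of how such page URLs could be generated rather than edited by hand, the snippet below rewrites that parameter with `urllib.parse`. It only builds strings and makes no network request. The helper names and the step size of 10 in the usage note are assumptions for illustration, not documented Topsy behavior.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# URL taken from the excerpt above.
TOPSY_URL = (
    "http://topsy.com/s?q=%40DataONEorg&window=a&type=tweet&sort=date&offset=990"
)


def with_offset(url, offset):
    """Return `url` with its `offset` query parameter set to `offset`."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)  # decodes %40 to "@"
    query["offset"] = [str(offset)]
    # urlencode re-encodes "@" as %40, so other parameters are unchanged.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def page_urls(url, start, stop, step):
    """Yield one URL per results page; `step` is an assumed page size."""
    for offset in range(start, stop, step):
        yield with_offset(url, offset)
```

For example, `list(page_urls(TOPSY_URL, 0, 1000, 10))` would produce 100 URLs ending at `offset=990`, if each page really holds 10 results.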

DataONE Community Engagement via Twitter

DataONE has been around since 2010. It’s an NSF project, so it’s continually evaluated for performance. One metric could be reach and engagement on social media, as a measure of awareness about DataONE. Since I’ve looked at open science sentiment analysis before, I volunteered to poke around a bit on ...