#OpenScience Sentiment Analysis via Twitter Data

In an earlier post I mentioned that I would like to look at positive sentiments such as “I like @figshare” or “I prefer @figshare” or “I use @figshare” across Twitter.

A quick Web search on Google for “archive of past tweets” (without quotes) brought my attention to this September 4, 2013 article on Mashable:

You Can Now Search for Any Tweet in History

I think it would then make sense to map the sentiments via a crosswalk to the existing survey responses on use of @Figshare by early data management adopters.

It is also possible that there are some favorable or positive features that were not captured in the survey responses.

Navigated to the Topsy site – http://topsy.com/

There is a free trial of Topsy Pro Analytics.  I might use that after I get an idea of how Topsy’s basic features work.  I understand there are more advanced analytical abilities for “Pro” users.

I searched for “@figshare” under the “influencers” search option.

Figshare-Twitter-Analysis

Similar to the recommendations Twitter sent to my personal e-mail when I initially followed @figshare, I see @datadryad and @markhahnel.

The other top users are unfamiliar to me, but it is important to bear in mind that the “free” version of Topsy (probably) only has analytics for the past 30 days.

I’ll now try a tweet version of the search, directing my browser to http://topsy.com/tweets

I’ve typed in “I like @figshare” (without quotes)

It appears true that the data is limited – in fact, it appears to be limited to “Past 25 Days.”  I’m preserving a screen capture of that.

The first three results explicitly say “I like @figshare,” with the terms in exactly that order. After the first three, however, some results do not have that order, although they do contain the word “like.”
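The difference between those two kinds of result can be sketched in a few lines of Python. This is only an illustration of "all terms present" versus "exact phrase" matching; it is not Topsy's actual ranking or matching logic, which I have not seen, and the two sample tweets are hypothetical ones written for the example.

```python
import re

def tokens(text):
    """Lowercase a tweet and split it into words, keeping @mentions and #hashtags."""
    return re.findall(r"[@#]?\w+", text.lower())

def contains_all_terms(tweet, query):
    """True if every query term appears somewhere in the tweet, in any order."""
    tweet_tokens = set(tokens(tweet))
    return all(term in tweet_tokens for term in tokens(query))

def contains_exact_phrase(tweet, query):
    """True if the query terms appear in the tweet consecutively and in order."""
    t, q = tokens(tweet), tokens(query)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

# Hypothetical tweets, written only to illustrate the two behaviours.
in_order = "I like @figshare for sharing posters"
out_of_order = "Looks like I should try @figshare"

for tweet in (in_order, out_of_order):
    print(tweet,
          contains_all_terms(tweet, "I like @figshare"),
          contains_exact_phrase(tweet, "I like @figshare"))
```

Both tweets pass the "all terms" test, but only the first passes the exact-phrase test, which is the pattern the results page seems to show.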

I’m given the option to view “I like @figshare” results on “Topsy Analytics.”

http://topsy.com/analytics?q1=I%20like%20%40figshare&via=Topsy

This gives me a view of “tweets per day” for the period September 11 – October 11.  I am taking a screen capture of that. Note that the data output is copyright 2013 Topsy Labs, Inc. I am hopeful that my research constitutes “fair use.”

Figshare-Topsy-Analysis

The maximum number of tweets per day, presumably counting any tweet that contains “I,” “like,” and “@figshare,” is 2.

Topsy promises me I can “get the full picture with trends, geo, sentiment, and more” if I upgrade.

For fun, I removed “I Like” and just left @figshare in the search.

I’m impressed – there are 1,327 replies to “@figshare” for the same period.  I’m also saving that screen output as figshare-only.png.

Figshare-only

For fun I added “@dataone” – which has 52 tweets during the same time period.

Now I’d like to add “#openscience” hashtag.

This greatly impresses me.  There are 3,173 tweets with the hashtag #openscience during the same time period.

I’m interested that there were over 125 tweets concerning @figshare from September 15 to the 18th.  I had speculated in an earlier post that this “spike” might be related to the RDA conference in Lisbon, Portugal, during that same time period.

The #openscience hashtag seems to be consistently higher than the @figshare mention – but also experiences a spike – over 350 uses – during the 09/15 to 09/18 time period. I’m saving this screen capture as figshare-openscience-compare.png.

figshare-openscience-compare

I’m now removing the “#openscience” hashtag to just look at @figshare. I’m curious about the spike. Unfortunately there’s no way to look at the actual tweets using the basic, non-paid version of Topsy.

I just clicked “Advanced search.” It appears that there are some options with “Operators” – not quite Boolean.

Importantly, there is the option to search for an exact phrase, by using the familiar operator of quotations surrounding the phrase.

As I already demonstrated, it is possible to query using a hashtag.

OR is another operator that might be useful.

And there is the option to use “site”

This might be useful for looking at references to the actual domain, figshare.com, or perhaps a “short” URL version.

For example, one of the more recent @figshare tweets concerns a presentation made by a researcher in environmental science.

The link is below:

http://figshare.com/articles/Coexistence_the_maintenance_of_biodiversity_and_its_consequences_for_ecosystem_functioning/805172

Note: figshare suggests “cites coming soon.”

Consistent with what appears to be figshare’s social networking priorities – I’m given the options to “share” via facebook, “tweet” (obviously via twitter), or “+1” on Google+.  I also have the option to embed.

What I’m interested in here are the short links.

The most obvious short link is the DOI link I am given, which sadly I cannot cut and paste with much ease. Here’s how it turned out (no modification to the formatting, just a direct cut and paste from the figshare site using Firefox into my DataONE open notebook browser editing pane):

                                                                                                Coexistence, the maintenance of biodiversity and its consequences for ecosystem functioning. Sean Tuck.                                                                 figshare.
http://dx.doi.org/10.6084/m9.figshare.805172
Retrieved  21:10, Oct 11, 2013 (GMT)

So let me just type it out:

http://dx.doi.org/10.6084/m9.figshare.805172

I’ve confirmed this link works and that I’ve typed it correctly.

I now clicked “share” – note that I’m already logged into my personal facebook. The link that has been given to me is: https://www.facebook.com/sharer/sharer.php?u=http%3A%2F%2Fshar.es%2FE7XMb&t=Coexistence%2C+the+maintenance+of+biodiversity+and+its+consequences+for+ecosystem+functioning#
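The sharer link is easier to read once its query string is decoded. The sketch below uses only Python's standard `urllib.parse` and the link exactly as copied above; the variable names are my own. It shows that the `u` parameter carries a shar.es short link rather than the figshare.com or DOI address.

```python
from urllib.parse import urlsplit, parse_qs

share_link = (
    "https://www.facebook.com/sharer/sharer.php"
    "?u=http%3A%2F%2Fshar.es%2FE7XMb"
    "&t=Coexistence%2C+the+maintenance+of+biodiversity"
    "+and+its+consequences+for+ecosystem+functioning#"
)

# parse_qs percent-decodes each value and turns "+" into a space.
params = parse_qs(urlsplit(share_link).query)

shared_url = params["u"][0]    # the address Facebook is asked to share
shared_title = params["t"][0]  # the title passed along with it

print(shared_url)
print(shared_title)
```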

There is also a QR code posted as my image thumbnail.

I’ve saved a screencapture as figshare-fb-share.png.

figshare-fb-share

When I shared it, it then had 1 share – but the link was unchanged.

I just deleted it from my timeline. This does not appear to impact the total shares – but any time I reload the page, the item’s view count increases.

Now I am not logged in to Twitter – but when I click “tweet” I now have this:

Coexistence, the maintenance of biodiversity and its consequences for ecosystem functioning http://shar.es/E73PL via @figshare

So the key here for me is that the URL is indeed shortened – and it’s notable that “via @figshare” is appended. I’m not signing in to complete the tweet, but I might as well take a screen capture.  I’m saving the screen capture as figshare-tweet-share.png.

figshare-tweet-share

Next one to look at is “google plus.”  I’m not logged in to my Google account at the moment. To share via Google Plus requires a login – I’m not just publicly “plus one-ing” the item.

I’m now logged in – and it seems to have automatically completed the +1. Doing this publicly recommended the item to my circles “Public” and “Friends.” I’m told “I publicly recommended this as Tanner Jessel.” I have the option to add a comment.

The link that is shared via Google Plus is apparently the DOI version – http://dx.doi.org/10.6084/m9.figshare.805172

The title and the QR code are visible. It’s more difficult to take a screen capture because you must “hover” the mouse over the g+ icon, but I saved a screen capture as figshare-gplus-share.png.

figshare-gplus-share

I’ll now look at “embed.” In very light print, I just noticed the following text:

*The embed functionality can only be used for non commercial purposes. In order to maintain its sustainability, all mass use of content by commercial or not for profit companies must be done in agreement with figshare.

This might be an interesting divergence in use from other hosting / sharing platforms for things like presentations – slideshare for example, compared to figshare.

Note that slideshare has an affiliation with linkedin, see article “Linkedin Acquires Professional Content Sharing Platform for $119 M.” Figshare is affiliated with a publishing company, Digital Science.

I’ve clicked on the “embed” icon and now have a screen offering customization and the code to pull out to embed.  Might as well copy it and paste it here.  I’ve saved a screen capture as figshare-embed-share.png.

figshare-embed-share
<iframe src="http://wl.figshare.com/articles/805172/embed?show_title=1" width="568" height="502" frameborder="0"></iframe>
I’ve inserted the code into the plain text editor of my open notebook – it should appear in the space below the two horizontal rules immediately below:


While I’m observing this – I also want to point out that there are options to export to Ref. Manager, Endnote, and Mendeley. I think it’s worth revisiting that at a later point.

However: the point of the preceding foray into how sharing via figshare might  affect the URL should not be lost, particularly concerning how it is related to twitter data:

Sharing natively via the “tweet” feature will produce and publish to the twitterverse a short URL on the “shar.es” domain – NOT a custom “figshare” specific domain (in the way that “goo.gl” is Google’s “vanity” web shortener).

The shar.es function DOES append “via @figshare” and unless the user deletes the “credit” (which seems unlikely if the person sharing is a data sharing early adopter), then this should provide a reasonable way of tracking shares of figshare content via Twitter – and hopefully insights into what exactly is being shared when considered en masse.
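As a rough sketch of that tracking idea, a tweet's text could be checked for the “via @figshare” credit and, if it is there, the short link pulled out. The function name and the regular expressions are my own assumptions, based only on the single pre-filled tweet shown earlier; real tweets may well vary in ways this does not cover.

```python
import re

# The "via @figshare" credit, case-insensitive, not followed by more handle characters.
VIA_FIGSHARE = re.compile(r"\bvia\s+@figshare\b", re.IGNORECASE)
# shar.es short links in the shape seen in the pre-filled tweet.
SHORT_LINK = re.compile(r"https?://shar\.es/\w+")

def figshare_share(tweet_text):
    """Return the shar.es link from a tweet credited 'via @figshare'.

    Returns None if either the credit or the short link is missing.
    """
    if not VIA_FIGSHARE.search(tweet_text):
        return None
    match = SHORT_LINK.search(tweet_text)
    return match.group(0) if match else None

tweet = ("Coexistence, the maintenance of biodiversity and its consequences "
         "for ecosystem functioning http://shar.es/E73PL via @figshare")
print(figshare_share(tweet))
```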

It is also important for me to know the various ways that a URL referencing figshare might show up in the literature search, which is the second part of my analysis that will also require a more methodical approach to be a true “meta analysis of early adopter implementation of figshare” or something in that vein.
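To make the URL variants concrete, here is a small sketch that recognizes the three figshare address shapes that have appeared in this post (article page, DOI link, embed iframe) and reduces each to the same article number. The patterns are inferred from this one item only, so other figshare URL forms may exist that it misses, and a shar.es short link cannot be matched without following its redirect, which this sketch does not attempt.

```python
import re

# URL shapes taken from this post; treat them as assumptions, not a complete list.
PATTERNS = [
    re.compile(r"dx\.doi\.org/10\.6084/m9\.figshare\.(\d+)"),  # DOI link
    re.compile(r"wl\.figshare\.com/articles/(\d+)/embed"),      # embed iframe source
    re.compile(r"figshare\.com/articles/[^/\s]+/(\d+)"),        # article page
]

def figshare_article_id(url):
    """Return the figshare article number found in a URL, or None."""
    for pattern in PATTERNS:
        match = pattern.search(url)
        if match:
            return match.group(1)
    return None

urls = [
    "http://figshare.com/articles/Coexistence_the_maintenance_of_biodiversity"
    "_and_its_consequences_for_ecosystem_functioning/805172",
    "http://dx.doi.org/10.6084/m9.figshare.805172",
    "http://wl.figshare.com/articles/805172/embed?show_title=1",
    "http://shar.es/E73PL",
]
for url in urls:
    print(figshare_article_id(url), url)
```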

I will continue that exploration in a subsequent post.

For the moment – I’ll return to using the quotations to modify my initial search – I like @figshare becomes “I like @figshare”

I also realized I can search for “all time”

Perhaps because it is limited as a free service – I get 10 results with no total count of the data.  There is also no export feature.

However there are 10 results per page, and 20 pages, for an expected 20 tweets in all time with the exact phrase “I like @figshare.”

It’s important to consider that some of these are RTs (retweets) expressing the same information. However, here’s an example from Twitter user Jaime Headden (@jaimeheadden):

@rmounce @figshare If all I needed was a repository to share tables, notes, figures, raw data, then it works. But it’s not the paper itself.

This is difficult to pull apart – I just hovered my mouse over the “retweet” option to get the tweet number for the original tweet – and since I have no physical notebook at my desk I just wrote it on a receipt!

250984507371565057

Should be the number.

I want to embed the tweet or link to the status.

Instructions here:

https://dev.twitter.com/docs/embedded-tweets

Apparently wordpress will do this for me – I just need to paste the tweet URL. Did I transcribe the tweet’s number correctly? Yes.  Confirmed the link works.

https://twitter.com/jaimeheadden/statuses/250984507371565057

Ok maybe pulling out the status is not that hard:

https://twitter.com/ethanwhite/status/235552921775923200

Getting the hang of pulling out individual statuses…

https://twitter.com/PhilippKwon/status/235506844137844737
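Since transcribing an 18-digit status number by hand is error-prone, a small helper could split a status URL into its parts or rebuild one from a screen name and number. This is a sketch based only on the three URLs above, which use both `/status/` and `/statuses/`; the function names are hypothetical and nothing here contacts Twitter.

```python
import re

# Matches both the /status/ and /statuses/ forms seen above.
STATUS_URL = re.compile(r"https?://twitter\.com/(\w+)/status(?:es)?/(\d+)")

def parse_status_url(url):
    """Split a tweet URL into (screen_name, status_id); None if it does not match."""
    match = STATUS_URL.fullmatch(url.strip())
    return match.groups() if match else None

def build_status_url(screen_name, status_id):
    """Rebuild the tweet URL from a screen name and a status number."""
    return f"https://twitter.com/{screen_name}/status/{status_id}"

print(parse_status_url("https://twitter.com/jaimeheadden/statuses/250984507371565057"))
print(build_status_url("ethanwhite", "235552921775923200"))
```

The status number is kept as a string so that it is copied around character for character.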

Here’s another potential search – but only a few potential comments:

“I love @figshare” – 4 comments from all time with that exact phrase.

“@figshare is good” – no tweets found

“@figshare is great” – 1 tweet found for all time.

“@figshare is ideal” – no tweets found

“@figshare is perfect”

Topsy might be tired of me searching these phrases – got hung up.

It might be worth coming up with some potential affirmative phrases and then systematically going through and executing them. I’ve done this before as a sort of “poor man’s hack” by scripting a URL search, possibly using Topsy’s syntax:

http://topsy.com/s?q=%22%40figshare%20is%20perfect%22&type=tweet

I just did this – http://topsy.com/s?q=%22via%20%40figshare%22&window=a&type=tweet

That’s the “via @figshare” search.
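The idea of working through a list of affirmative phrases could be scripted along these lines. The URL shape is copied from the two search URLs above; reading `window=a` as “all time” is my assumption from how it appears there, and the helper only builds the addresses, it does not request them.

```python
from urllib.parse import quote

BASE = "http://topsy.com/s"

def topsy_search_url(phrase, all_time=False):
    """Build an exact-phrase tweet search URL in the shape of the ones above."""
    # Wrap the phrase in double quotes, then percent-encode every reserved character.
    query = quote(f'"{phrase}"', safe="")
    # Assumption: window=a selects the "all time" range.
    window = "&window=a" if all_time else ""
    return f"{BASE}?q={query}{window}&type=tweet"

phrases = ["@figshare is good", "@figshare is great",
           "@figshare is ideal", "@figshare is perfect"]
for phrase in phrases:
    print(topsy_search_url(phrase))
```

Calling `topsy_search_url("via @figshare", all_time=True)` reproduces the “via @figshare” address used above.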

For all time, there are definitely 100 results.

Clicking “next 10 pages” once shows there are definitely 200, and it is probably pointless to page through any more by hand. Yet we can substitute something very high for “offset=20” to see what happens:

http://topsy.com/s?q=%22via%20%40figshare%22&window=a&type=tweet&offset=20

http://topsy.com/s?q=%22via%20%40figshare%22&window=a&type=tweet&offset=90

Still going strong at <http://topsy.com/s?q=%22via%20%40figshare%22&window=a&type=tweet&offset=200> – and the data only goes to 5 months ago.
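Stepping the offset by hand could be replaced by generating the page addresses. This sketch assumes that `offset` counts results rather than pages and that each page holds 10 results, which matches what I saw above but is not documented anywhere I have checked; the function name is my own.

```python
def paged_urls(base_url, pages, page_size=10):
    """Yield one URL per results page by appending an offset parameter."""
    for page in range(pages):
        offset = page * page_size
        # The first page is the plain search URL with no offset.
        yield base_url if offset == 0 else f"{base_url}&offset={offset}"

base = "http://topsy.com/s?q=%22via%20%40figshare%22&window=a&type=tweet"
urls = list(paged_urls(base, pages=3))
for url in urls:
    print(url)
```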

So, with access to the full dataset, sentiment analysis or simply citation quickly becomes a true “Data Science” topic.  It will require some methodical looking around, compared to the poking around I’ve done so far.

About Tanner Jessel

I am a graduate research assistant funded by DataONE and pursuing a Masters in Information Sciences with an Interdisciplinary Graduate Minor in Computational Science. I assist scholarly research efforts supporting the Sociocultural, Usability and Assessment, and Member Nodes working groups within DataONE. I am based at the Center for Information and Communication Studies at the University of Tennessee School of Information Science in Knoxville, Tennessee.
