Although I want to be a professional data manager and have extensive training in data management, in practice I have realized it’s pretty tough to do, even for a small data analysis project like the Figshare users’ survey.
I did the data analysis for that on another computer. I was in a rush and did not bother to organize things as I created the output. I did not create metadata or document how I did things and what the output was. So even if I wanted to bother with my other computer, I don’t trust my workflows or the output because I can’t review what I did. And if I can’t evaluate my workflow or output easily, then another person would have even greater trouble.
I would also like to curate the output using Figshare, to better understand the service.
For those reasons I am attempting to reproduce my own analysis of the original dataset and pair it with more complete documentation. So here goes.
Original document was obtained via e-mail on 9/29/13.
E-mail was from survey creator Ben Birch.
Documents attached for download were:
- Figshare Users Survey.docx
- FIGSHAREUSERS_9_11_13.sav
Created a new document folder, within the folder “DataONE”
Changed the default download in Safari to this new folder.
Clicked both to download.
Confirmed files present.
Changed file name “Figshare%20Users%20Survey.docx” to “2014-09-figshare-user-survey.docx”
.sav file is a “PASW Statistics data document” and 380 KB.
Opened “FIGSHAREUSERS_9_11_13.sav” using JMP Pro 10 for Mac.
JMP® Pro 10.0.2
Created folder “figshare-article-data-analysis” within “Documents/DataONE/2013-09-29-Figshare-Data” folder.
Exported the .sav dataset as a text file with a tab as the end-of-field delimiter and <LF> as the end-of-line character (tabs because the open-ended text contains commas), named “FIGSHARE_9_11_13_tab-dl”, to the “Documents/DataONE/2013-09-29-Figshare-Data/figshare-article-data-analysis” folder.
Operation results in file: FIGSHARE_9_11_13_tab-dl.dat
File path: /Documents/DataONE/2013-09-29-Figshare-Data/figshare-article-data-analysis/FIGSHARE_9_11_13_tab-dl.dat
This is a 35 KB file.
Also saved as a .jmp file (FIGSHARE_9_11_13_jmp); this is a 95 KB file.
I think I would like to generate files row-by-row. So I will also generate an Excel spreadsheet, because this is the easiest way I know to copy out the entire first row at once (JMP gives me the option to export as Excel).
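For future reference, here is a minimal sketch of the same export choice in Python: tab as the field delimiter and a bare LF as the line ending, so that commas inside free-text answers need no special handling. This is not what JMP does internally; the function names and the sample rows are hypothetical and only illustrate the idea.

```python
import csv
import io


def write_tab_delimited(rows, handle):
    """Write rows with a tab between fields and a bare LF after each line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


def read_tab_delimited(handle):
    """Read tab-delimited rows back as lists of strings."""
    return list(csv.reader(handle, delimiter="\t"))


# Hypothetical sample rows, not taken from the survey data.
sample_rows = [
    ["respondent", "comment"],
    ["1", "Easy to use, but slow"],
]

buffer = io.StringIO()
write_tab_delimited(sample_rows, buffer)

# Read the text back to confirm the comma stayed inside its field.
buffer.seek(0)
round_trip = read_tab_delimited(buffer)
```

Because the comma is not the delimiter, the comment field is written without any quoting and comes back as a single field.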
Interesting: I got this message – This data set is larger than the maximum allowable Excel 4.0 worksheet size (256 columns by 16,384 rows). It will be truncated.
Let’s see how it is truncated!
The file name is: FIGSHARE_9_11_13_xls.xls
I copied out the data from row 1 and pasted them into TextWrangler to format them as a list. I did a find-and-replace on every tab, replacing each with a return (as in, a new line), to get each column heading onto its own line. TextWrangler replaced 255 instances of a tab with a return.
I saved the resultant file as <user-survey-data-headings.txt>
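The find-and-replace step above could also be scripted. This is a minimal sketch, assuming the first row is available as one tab-separated string; the function names and the example headings are hypothetical, not the real survey headings. It also shows the arithmetic behind the count: replacing 255 tabs produces 256 lines, which matches the 256-column limit in the Excel warning.

```python
def split_header_row(header_line):
    """Return one heading per tab-separated field in the first row."""
    return header_line.rstrip("\n").split("\t")


def headings_as_lines(header_line):
    """Return the headings as text with one heading per line."""
    return "\n".join(split_header_row(header_line)) + "\n"


# Hypothetical headings, used only to illustrate the transformation.
example = "Q1_role\tQ2_discipline\tQ3_country\n"
headings = split_header_row(example)

# A row with 255 tabs always splits into 256 fields.
wide_row = "\t".join(["h"] * 256)
tab_count = wide_row.count("\t")
field_count = len(split_header_row(wide_row))
```

Writing the result of `headings_as_lines` to a text file would give the same kind of list as the one saved from TextWrangler.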
I think I could make a spreadsheet to keep track of the figures I create, filenames, and perhaps the DOI if I end up uploading a particular figure to figshare. Should I upload every figure separately? I am curious if I could create a file set.
I feel like a wiki is better suited for this type of effort than the WordPress open notebook. DataONE has an OpenWetWare wiki. I have added my name to <http://openwetware.org/wiki/DataONE:People>.
I created a new notebook within the DataONE project here:
This is coming along well. I am now ready to upload images.
It is possible to use <http://commons.wikimedia.org/wiki/Special:UploadWizard> although I am not sure that “Wikimedia Commons” is really the right place for storing and sharing research output.
I also don’t know if it is possible to “embed” content from Figshare into a wiki entry, but that is worth looking at.
Also not sure if I want to make a separate page for results – for example <http://openwetware.org/wiki/DataONE:Open_Research_Output_Early_Adopters_Study/Results>
This is useful: http://en.wikipedia.org/wiki/Wikipedia:Tutorial/Editing
This has some help for formatting the wiki:
Uploading a file to OpenWetWare is covered here:
Preferred file types: pdf, png, svg, swf, gif, jpg, jpeg, eps, ogg, doc, xls, ppt, sxc, pdf, py, zip, txt, xlsx, pptx, docx, tif, ogv, oga.
Prohibited file types: html, htm, js, jsb, mhtml, mht, php, phtml, php3, php4, php5, phps, shtml, jhtml, pl, py, cgi, exe, scr, dll, msi, vbs, bat, com, pif, cmd, vxd, cpl.
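To check a file before uploading, the two lists above can be turned into a small lookup. This is a minimal sketch; the function name is hypothetical. Note that "py" appears in both lists. I don't know how OpenWetWare resolves that, so the sketch assumes the prohibited list takes priority.

```python
import os

# Extensions copied from the two lists above.
PREFERRED = {
    "pdf", "png", "svg", "swf", "gif", "jpg", "jpeg", "eps", "ogg", "doc",
    "xls", "ppt", "sxc", "py", "zip", "txt", "xlsx", "pptx", "docx", "tif",
    "ogv", "oga",
}
PROHIBITED = {
    "html", "htm", "js", "jsb", "mhtml", "mht", "php", "phtml", "php3",
    "php4", "php5", "phps", "shtml", "jhtml", "pl", "py", "cgi", "exe",
    "scr", "dll", "msi", "vbs", "bat", "com", "pif", "cmd", "vxd", "cpl",
}


def upload_status(filename):
    """Classify a file name as 'prohibited', 'preferred', or 'unlisted'."""
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    # Assumption: a prohibited extension wins when it is in both lists.
    if extension in PROHIBITED:
        return "prohibited"
    if extension in PREFERRED:
        return "preferred"
    return "unlisted"


# Extensions that appear in both lists.
overlap = PREFERRED & PROHIBITED
```

For example, `upload_status("FIGSHARE_9_11_13_tab-dl.dat")` returns "unlisted", since .dat is in neither list.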
Also note the ability to create a “Wikibase” – http://www.mediawiki.org/wiki/Extension:Wikibase
I think for now I will create the spreadsheet to track what data I create. (Again, I get why R is used to do this for large datasets.)
Reference earlier notebook content:
I saved the resultant file as <user-survey-data-headings.txt>
I used the data there to populate a new Google Drive spreadsheet, “Figshare-User-Survey-Figures”
For now I’m using these titles:
Question will group them, heading will map to item.
The spreadsheet is done – there are 21 questions with pre-determined responses, and 1 open-ended question, so I expect to create about 20 images. However, I should point out there are 111 rows, so I think that means about 90 possible answers in the whole survey.
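A tracking table like this could also be generated from the heading list instead of typed by hand. This is a minimal sketch only: the column names ("question", "heading", "figure_file", "doi") and the example headings are hypothetical and are not the titles in my actual spreadsheet. The figure name follows the Q[number]-response pattern, with .png assumed as the extension.

```python
import csv
import io

# Hypothetical column titles for the tracking table.
FIELDS = ["question", "heading", "figure_file", "doi"]


def build_tracking_rows(headings_by_question):
    """Create one tracking row per column heading, grouped by question."""
    rows = []
    for question, headings in headings_by_question.items():
        for heading in headings:
            rows.append({
                "question": question,
                "heading": heading,
                "figure_file": f"{question}-response.png",
                "doi": "",  # left blank until a figure is uploaded
            })
    return rows


def to_csv_text(rows):
    """Serialize the tracking rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical grouping of headings under questions.
example_rows = build_tracking_rows({"Q1": ["Q1_a", "Q1_b"], "Q2": ["Q2_a"]})
example_csv = to_csv_text(example_rows)
```

The resulting CSV text could be imported into a Google Drive spreadsheet, with the DOI column filled in later.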
Since I’m using JMP, basically I am going down my list, doing a distribution analysis on the relevant column heading, saving as Q[number]-response with embedded table, then exporting as text and again as .svg.
Note: the SVG files did not work in OpenWetWare, so I switched to .png files.
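For comparison, the per-column step can be approximated outside JMP. This is a minimal sketch of a plain frequency count over one column of tab-delimited text. It is not JMP's distribution analysis, and the function name and sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def column_distribution(tsv_text, column):
    """Return (value, count, proportion) tuples for one column, most common first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(row[column] for row in reader)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]


# Hypothetical sample data, not taken from the survey.
sample = "Q1\tQ2\nyes\ta\nno\tb\nyes\ta\n"
q1_distribution = column_distribution(sample, "Q1")
```

Looping this over the heading list would produce one table per question, which is the part of the routine I currently repeat by hand.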
Because I have submitted the article for publication, I’m not sure how much of the figures and output prepared for the manuscript can be published online – in fact my academic advisor cautioned me about posting things to the blog. So I find myself in a situation where I have data, but want to hold off on sharing it until publication.
I could upload the .dat file to Figshare (FIGSHARE_9_11_13_tab-dl.dat). Also, I did not technically create the dataset; I just did the analysis of the data. Still, in the future someone might ask me for the data. It would be easier to share the data if it were stored online.
However, I’m finding my motivation to do so is low after having completed the manuscript and submitted it for review.
In the future, I think I’ll use Figshare to manage data along the way, and keep it private if needed until later.