OKCon: Recap – DataONE Notebooks

From June 29th through July 1st I attended the Open Knowledge Conference in Berlin, as part of this project. The Open Knowledge Foundation is a small non-profit dedicated to open data, open science, open government, and virtually every other noun phrase beginning with the word ‘open’. The OKCon is one of their main activities, and this year over 400 people from across the world (although mostly from Western Europe) attended the event. Rather than give a rundown of the variety of speakers, which has been done better both here and here, or rehash the extensive programme, I’ll go over the various talks I went to, and what I gained by attending.

I myself presented twice – once as part of DataONE, and once on my own, suggesting that workflows might be useful in linguistics and the other social sciences. The abstract (which I’ve already posted here) for this project’s presentation is here, and in case you’re interested, here are the slides.

-§-

The day before the conference started, I attended the Open Science Workshop, where I worked with a couple of others on outlining a tool that might be used for collaboratively providing quality assurance and control for open databases and Excel sheets. The notes that came out of this can be seen here. We didn’t have the time to code or present a working model of it, but it’s not a bad idea, and I very much suspect that something similar will be rolled out soon by Google or other developers of collaborative web tools. That night I also polished up and sent off my first publication, although that isn’t connected to this project.

-§-

The first day I attended the Open Science Panel (somehow missing out on Richard Stallman, who was probably the biggest name at the conference), which covered a lot of the key issues in science at the moment, such as publication, releasing data sets, and legal rights. After lunch, I helped chair Workshop I; the talk by Cameron Neylon on Open Research was particularly well presented and very interesting, but not as much as the next one by Konrad Förstner, who presented an idea for collaborative platforms that streamline workflows. He expressed a desire for an infrastructure that would allow better reuse and sharing of workflows – similar to what myExperiment is doing with Taverna-based workflows (tagging them with information based on their complexity, number of inputs, and so on), and similar to what the Workflows for Ever (Wf4Ever) group are working on. A couple of talks later, Björn Brembs gave a fantastic talk on what is wrong with the money flow and journal hierarchies in current publishing – I highly suggest even a cursory look at the slides, which, while picture-heavy, do get across what he was saying.

After this I went to the Open Linguistics workshop, which was mostly about open NLP and LOD and went over my head quite a bit. Sebastian Nordhoff gave a good exposé of his work at the MPI Leipzig on Glottolog, an open bibliography coming out soon that looks to be the largest in the world for linguistic data. I presented my own work, as well – very little work has been done on standardising workflows in Linguistics (there might be some in shell scripts, but I’ve not run across any), and this is something that I would like to work on over the next couple of years. Afterwards, I went home and polished up my slides for the next day. Incidentally, I also graduated with an MA (Hons) in Linguistics from the University of Edinburgh on this day, although I didn’t attend my graduation, being in Berlin and not Edinburgh.

-§-

On July 1st I attended the most interactive talk, on Open publishing in Poland, where drawing on sheets of paper in small groups was encouraged. There was then a presentation on Knowledge for All, another bibliographic system being developed (this one with help from the Canadian government). Throughout the conference, there were many comments about the legality of mining bibliographic references. On the whole the dialogue went like this:

Q: So, how are you going to get around the legal issue?

A: Well, we don’t think that we’ll be sued for this. No one has been.

Q: They have, actually. Hire some lawyers.

I saw this happen at least three times. What is needed is a database of Open Knowledge-relevant lawsuits.

I then presented my talk, which, as far as I am aware, was completely legal in that regard. There were a few good comments – mostly requests for clarification, but also some good ones from Caspar Addyman, another social scientist I met in the Open Science Workshop, and from Konrad Förstner about the applicability of workflows, and how to follow up on this research.

After this, I went to another great talk by Guo Xu on the need for a repository not only of open data normally locked away sideways (literally) in .pdfs, but also of regression analysis data from various sources, as studies often conflict. Guo is one of the guys in charge of the Open Economics workgroup (which I am now a member of), which aims to make governmental and economic data transparent and easily mined by researchers. After this, there was an interesting talk on Oracc.org, a new site dedicated to presenting cuneiform in annotated formats for public use (much like the Perseus project). Ross Mounce’s talk on open palaeontology was for me the most exciting, as it proposed that palaeontological data really needs to be put into the public sphere to encourage research, and that any data gathered under public funding belongs in the public domain. I say it was exciting for me because Ross’s research looks into phylogenetics, particularly involving morphology, and I have spent a considerable amount of effort battling this issue (particularly involving postcranial data in primates) in my private research this year.

After coffee, I attended a talk by Ulrich Herb on the legal limits of Open Access, which was interesting but largely over my head (although he is from Saarbrücken, where I’ll be living next year, so I hope to learn more about this in the fall). Afterwards, there were two final talks: one on openshakespeare.org, about the first gatherings of data on the future of editing Shakespeare and how people take to open annotations, and one on open banking and ways of implementing completely open banks.

After this, I went out with the Open Economics group to discuss different possibilities for future work, before heading home the next day. Overall, it was a very interesting conference, even if there were few bioinformatics folk and the majority of the information passed around was labelled ‘Open’ and not ‘Kepler’. There were several people there in this field, however, and I think that on the whole it was a very valuable experience.