Week Four Update

Week Four Goals

The goals for week four of LOD4DataONE were to move forward with extracting DataONE data from KNB and ORNL DAAC and exposing it in RDF with links to related URIs.

Results

I focused first on KNB data and, given the path of this week’s research, I have not yet approached the ORNL DAAC data. KNB has a large, well-defined XML language for describing data, so I needed to decide how far to go in mapping it to RDF. Furthermore, KNB offers different data and metadata access tools, namely OAI-PMH and Metacat. I considered using OAI-PMH to establish a consistent mechanism for extracting metadata, as I did with the Dryad data. Unfortunately, I could not get very much data that way, so I used the Metacat API to access the data I had selected.

As I extracted data, I focused on particular fields for expressing the records in RDF, but the real challenge was getting these to work under one RDF browser. Simple changes would either not be seen in a particular browser or would break it. I do believe, though, that the Open Data Explorer has a good interpretation of the data; the RDF records are viewed with respect to how they relate to the What, Who, Where, and When tabs. For example, Where shows a map and displays specific locations for the records that express map data. Other browsers have similar qualities, but I have not identified the specific RDF predicates or resources (i.e., the vocabulary) they are looking for. I will be contacting the RDF browser providers directly to get some direction on this.
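
For the OAI-PMH route mentioned above, a harvesting request is just a URL with a few protocol-defined arguments. The sketch below builds a `ListRecords` request; the base URL in the usage note is a placeholder, not KNB's actual endpoint, and the function name is my own for illustration.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is whatever endpoint the repository publishes (assumed here).
    When continuing a partial list, OAI-PMH expects the resumptionToken to be
    the only argument besides the verb, so metadataPrefix is left out.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)
```

Calling `list_records_url("http://example.org/oai")` gives the first page of Dublin Core records; each later page is requested by passing the token returned in the previous response.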

As a result, I have rebuilt the RDF, and I am now focused on building RDF that specifically captures the Who, What, Where and When in the data. The FOAF ontology, for example, describes the Who. Zitgist has an RDF template that reads FOAF records and displays a FOAF view. Zitgist seems to have a Map template as well, but my efforts to expose the spatial data with the GML ontology or the WGS84 vocabulary of the W3C Semantic Web Interest Group have caused errors in the Zitgist browser. The statements do not fail in the Tabulator RDF Browser, but they have no effect on the map view.
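
To make the Who, What, Where and When idea concrete, here is a minimal sketch of how one record could be written out as N-Triples using Dublin Core terms, FOAF and the W3C WGS84 vocabulary. The record fields, the example URIs and the choice of `dcterms:spatial` to attach the location are assumptions for illustration; this is not the mapping LOD4DataONE actually uses, and I have not checked which of these predicates each browser picks up.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def uri(value):
    """Wrap an absolute URI in N-Triples angle brackets."""
    return "<" + value + ">"


def literal(value, datatype=None):
    """Quote a literal, escaping backslash, quote, LF and CR."""
    escaped = (str(value).replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    quoted = '"' + escaped + '"'
    if datatype:
        return quoted + "^^" + uri(datatype)
    return quoted


def record_to_ntriples(record):
    """Serialize a hypothetical record dict as N-Triples lines."""
    dataset = uri(record["uri"])
    creator = uri(record["uri"] + "#creator")  # assumed URI pattern
    place = uri(record["uri"] + "#place")      # assumed URI pattern
    triples = [
        # What
        (dataset, uri(DCTERMS + "title"), literal(record["title"])),
        # When
        (dataset, uri(DCTERMS + "created"), literal(record["date"], XSD + "date")),
        # Who
        (dataset, uri(DCTERMS + "creator"), creator),
        (creator, uri(RDF_TYPE), uri(FOAF + "Person")),
        (creator, uri(FOAF + "name"), literal(record["creator"])),
        # Where
        (dataset, uri(DCTERMS + "spatial"), place),
        (place, uri(GEO + "lat"), literal(record["lat"])),
        (place, uri(GEO + "long"), literal(record["long"])),
    ]
    return "\n".join("{} {} {} .".format(s, p, o) for s, p, o in triples)
```

A record passed in as a dict with `uri`, `title`, `date`, `creator`, `lat` and `long` keys yields eight statements, which can be loaded into a browser to see which tabs respond.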

So far, I have learned that in order to participate in a linked open data cloud, DataONE will either have to provide a server for dereferencing URIs and negotiating content or form an alliance with tools that convert non-RDF data into RDF (e.g., OpenLink). In addition, the DataONE repositories will need to facilitate linking data to specific dereferenceable URIs, ensuring that the proper links are defined when the data is published. It also seems that, in order to serve their broader community, they will need to provide some mechanism for browsing DataONE data in a useful context, for example, a tab focused on gene data within the Open Data Explorer (ODE). Finally, the RDF must enable some level of integration with the cloud. It seems, though, that this can be demonstrated by loading RDF from multiple sources into a browser and using integrated views such as a Map or Timeline. I expect to show this with next week’s efforts.
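
The "dereferencing URIs and negotiating content" requirement can be sketched as follows: the server reads the client's `Accept` header, picks the best representation it can offer, and redirects to the matching document. This is a simplified sketch under my own assumptions (the supported media types, the file-extension scheme and the function names are hypothetical, and the matching ignores the finer precedence rules of HTTP), not a description of any existing DataONE service.

```python
# Representations this hypothetical server can produce, in preference order.
EXTENSIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "text/turtle": ".ttl",
}


def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs."""
    prefs = []
    for part in header.split(","):
        fields = part.strip().split(";")
        media_type = fields[0].strip().lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    return prefs


def choose_representation(accept_header, supported=tuple(EXTENSIONS)):
    """Return the supported media type with the highest q, or None."""
    best_type, best_q = None, 0.0
    for media_type, q in parse_accept(accept_header or "*/*"):
        for candidate in supported:
            matches = (
                media_type == candidate
                or media_type == "*/*"
                or (media_type.endswith("/*")
                    and candidate.startswith(media_type[:-1]))
            )
            if matches and q > best_q:
                best_type, best_q = candidate, q
    return best_type


def dereference(resource_path, accept_header):
    """Return (status, location): 303 to a document, or 406 if none fits."""
    chosen = choose_representation(accept_header)
    if chosen is None:
        return 406, None
    return 303, resource_path + EXTENSIONS[chosen]
```

With this in place, a web browser asking for `text/html` and an RDF browser asking for `application/rdf+xml` would be sent to different documents for the same resource URI.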

Observations

  1. It seems possible for the DataONE data providers to make simple changes by exposing RDFa within their web pages. In this case, there are already spongers that can access the data. The question to answer is which RDFa and which links best expose and integrate the fields of a record with the cloud.
  2. Just as browsers will have an impact on the usefulness of RDF data, data publishers such as KNB, Dryad and ORNL DAAC will have an impact on the linkability of data. Making it easier for authors to select appropriate dereferenceable URIs that produce RDF content, and relating those URIs to fields as part of the curation process, will facilitate cloud integration.
  3. In my efforts to extract KNB data using the Metacat API, I was amused to find an InsufficientKarmaException … I am trying to avoid that one 🙂
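
As an illustration of the first observation, the sketch below renders a record as an HTML fragment carrying RDFa attributes. It uses RDFa 1.1 style `prefix`, `about`, `property` and `typeof` attributes; the record fields, the example URI and the choice of Dublin Core and FOAF terms are assumptions, and which terms actually integrate best with the cloud is exactly the open question.

```python
from html import escape


def record_to_rdfa(record):
    """Render a hypothetical record dict as an HTML fragment with RDFa."""
    template = (
        '<div prefix="dcterms: http://purl.org/dc/terms/ '
        'foaf: http://xmlns.com/foaf/0.1/" about="{uri}">\n'
        '  <h2 property="dcterms:title">{title}</h2>\n'
        '  <p>Creator: <span property="dcterms:creator" typeof="foaf:Person">'
        '<span property="foaf:name">{creator}</span></span></p>\n'
        '</div>'
    )
    # Escape every value so record text cannot break the markup.
    return template.format(
        uri=escape(record["uri"]),
        title=escape(record["title"]),
        creator=escape(record["creator"]),
    )
```

A page that already lists the title and creator of a data set would only need these extra attributes on its existing elements for a sponger to extract the statements.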
