Week Three Update – DataONE Notebooks

Week three goals

The third week of LOD4DataONE focused on understanding links within the datasets and browsing them. I am currently using the RDF browser at zitgist.com. The work still centers on the Dryad data because it allowed me to push forward with understanding how to load and link data.

Results

Overall, the process seems straightforward. I am using OAI-PMH services to access Dublin Core structured metadata for a published dataset. This metadata is mapped to RDF using a DCType class I have created (http://rio.cs.utep.edu/ciserver/ciprojects/sdata/DryadTypes.owl) along with some other RDF definitions. I create RDF for all records and define URIs for them. The result is that each record's URI can be referenced from other RDF triples, and each record describes its content in RDF. You can imagine this to be the task of a DataONE server that would take a URI and resolve the content by looking in the DataONE datastores to generate the resulting RDF. A DataONE server would do this dynamically; I am doing it statically by putting all the RDF data in a single file (http://rio.cs.utep.edu/ciserver/ciprojects/sdata/DryadData.owl). The Java code that does this is located on GitHub at:
https://github.com/hlapp/LOD4DataONE

The function that mimics resolving the RDF content, by creating the DryadData.owl file that is published on my server, is:
D1DryadOAIPMHMapper:harvestMetadataToRDF()
The goal is to automate this more, in particular taking advantage of links and exposing relevant dataset data.
mainC was provided as an example and to show what I am uploading.
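
To make the mapping step more concrete, here is a minimal sketch of the idea. It is not the code from the repository: the class name, method names, record URI and field values are all hypothetical, and it writes plain N-Triples by hand rather than using an RDF library. It assumes a Dublin Core record has already been parsed into a map of element names to values, and it treats any value that looks like an http(s) URI as a link rather than a literal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: turns one parsed Dublin Core record into N-Triples lines. */
class DublinCoreToRdfSketch {

    // Namespace of the Dublin Core Metadata Element Set.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Escapes backslashes, quotes and line breaks for an N-Triples literal. */
    static String escapeLiteral(String value) {
        return value.replace("\\", "\\\\")
                    .replace("\"", "\\\"")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r");
    }

    /**
     * Builds one triple per field value. The record URI is the subject, the
     * Dublin Core element is the predicate, and the value is the object.
     * Assumption: a value that is a single http(s) token is emitted as a URI
     * reference so that other triples can link to it; anything else is a literal.
     */
    static List<String> toNTriples(String recordUri, Map<String, List<String>> dcFields) {
        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, List<String>> field : dcFields.entrySet()) {
            String predicate = "<" + DC_NS + field.getKey() + ">";
            for (String value : field.getValue()) {
                String object = value.matches("https?://\\S+")
                        ? "<" + value + ">"
                        : "\"" + escapeLiteral(value) + "\"";
                triples.add("<" + recordUri + "> " + predicate + " " + object + " .");
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        // Made-up record, for illustration only.
        Map<String, List<String>> record = new LinkedHashMap<>();
        record.put("title", List.of("Example dataset title"));
        record.put("relation", List.of("http://example.org/dryad/record/2"));
        for (String triple : toNTriples("http://example.org/dryad/record/1", record)) {
            System.out.println(triple);
        }
    }
}
```

Writing out every record this way into one file corresponds to the static approach described above; a server doing it dynamically would run the same kind of mapping for a single record each time its URI is requested.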

I have provided a short PowerPoint slideshow at http://rio.cs.utep.edu/ciserver/ciprojects/udata/LOD4DataONEWeek3EX.ppsx. This presentation gives a short overview of what has been done so far in this research. Please let me know if this is a useful way of explaining this work. I often feel that these lengthy text entries are never as effective as a demo.

I have two follow-up steps for next week: align this with the KNB and ORNL-DAAC datasets, and identify as many relevant dereferenceable URIs that generate RDF content as possible. It will be interesting to understand how to take advantage of those sites that are related to the LOD Cloud.

Observations

1) Although there are a lot of RDF-based tools, and the count continues to grow, the Semantic Web environment is still quite fragile. Throughout this week I experienced services being down, webpages no longer being available, and simple modifications to a file rendering a tool useless. These are issues with the Web as well, but for a semantically automated environment to be successful, there definitely needs to be more error detection, error recovery and redundancy to avoid failures.
2) One important aspect of the LOD environment is the browsers. It will be interesting to see how browsers take advantage of RDF to make it useful. For more specific usefulness, as I noticed in the OpenLink Data Explorer I was using last week, there is more dependence on specific vocabularies; understanding and conforming to these vocabularies can complicate using and understanding the browsers.
3) Finding linkable URIs for which the Web server returns browsable RDF data is not as easy as I expected. One solution is to depend on RDFizer components within RDF browsers to sponge out as much information as possible. This is rather superficial because links and specific vocabularies cannot be leveraged.
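
For the third observation, one way to check whether a candidate URI is dereferenceable to RDF is to request it with an Accept header asking for RDF and look at the Content-Type that comes back. The sketch below is only an illustration of that check: the class and method names are hypothetical, the list of media types is my own short selection of common RDF serializations, and a real harvester would need more careful error handling.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: checks whether a URI answers an RDF request with RDF. */
class RdfDereferenceCheckSketch {

    // A short, non-exhaustive selection of RDF media types.
    static final List<String> RDF_TYPES = Arrays.asList(
            "application/rdf+xml", "text/turtle", "application/n-triples",
            "text/n3", "application/ld+json");

    /** True if the Content-Type header value names one of the RDF media types. */
    static boolean isRdfContentType(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" before comparing.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return RDF_TYPES.contains(mediaType);
    }

    /** Sends a GET that prefers RDF and reports whether RDF came back. */
    static boolean returnsRdf(String uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(uri).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/rdf+xml, text/turtle;q=0.9");
        conn.setInstanceFollowRedirects(true);
        conn.setConnectTimeout(5000); // milliseconds
        conn.setReadTimeout(5000);    // milliseconds
        try {
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK
                    && isRdfContentType(conn.getContentType());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java RdfDereferenceCheckSketch <uri> [<uri> ...]");
            return;
        }
        for (String uri : args) {
            System.out.println(uri + " -> " + (returnsRdf(uri) ? "RDF" : "no RDF"));
        }
    }
}
```

A URI that fails this check may still be usable through an RDFizer, but as noted above, the result is then limited to whatever the RDFizer can extract.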

Any input would be great, in particular on the datatypes and vocabularies for the exposed RDF data.
