{"id":860,"date":"2011-06-18T08:23:10","date_gmt":"2011-06-18T14:23:10","guid":{"rendered":"http:\/\/notebooks.dataone.org\/lod4dataone\/?p=59"},"modified":"2013-05-17T20:43:28","modified_gmt":"2013-05-17T20:43:28","slug":"week-two-update","status":"publish","type":"post","link":"https:\/\/notebooks.dataone.org\/linked-data\/week-two-update\/","title":{"rendered":"Week Two Update"},"content":{"rendered":"
This second week of LOD4DataONE was focused on writing code to extract data from datasets in the three repositories, creating RDF, and loading the RDF triples into an RDF explorer. Initially I was focused on applying this process to all three repositories, but realistically I could only work the process through to the end with the Dryad datasets. That is no real surprise: all three repositories work very differently with respect to getting data and metadata out. I started focusing on using the OAI-PMH interface to extract metadata and then using site-specific tools to extract the data and RDFize it (create RDF triples); a sketch of this harvesting step appears at the end of the results below. A sample RDF file for the Dryad data can be found at http://rio.cs.utep.edu/ciserver/ciprojects/udata/dryadRDFWeek2.rdf. I chose this location for the data because I have APIs to upload the content automatically, making it easier to modify, view, and share the results. I was able to open the RDF data file using the OpenLink Data Explorer (ODE) add-on in Firefox, which uses URIBurner as I had originally planned. ODE uses Virtuoso, a multi-model data server that works with relational, RDF, and XML data. Given a URL, it will attempt to generate RDF via a tool called a "sponger"; the sponger does not require that a page be expressed in RDF, because it tries to make sense of the page to create RDF triples. The week ended with comparing the RDF I created to the RDF created by the RDFizer in ODE, and with assessing the overall process against a use case, which can be found on the Use Cases page for this research effort.

Results

The RDFizer in ODE was able to pull a lot of information out of the Dryad metadata pages, and I could see relevant data in the Categories. In total the ODE sponger, the internal ODE tool that extracts all it can from URL pages, extracted 1085 triples. When I ran certain scenarios, e.g., a search for "hunting", I was able to find several relevant records. At first glance this might be acceptable, but I am not sure what exactly is in there from the default Dryad metadata page. Considering data misuse, it seems important to provide relevant data triples and to ensure that the ODE sponger does not expose unexpected data as RDF.

My RDFizing of the Dryad data generates an initial set of 95 triples. There were issues with 2 of the 6 datasets, so I focused on extracting the other 4 successfully to complete this week's goals. When loading this RDF in ODE, ODE does not seem to be able to read it. Further testing showed that the RDF I generated had no errors in the W3C RDF Validator, and I was able to query the data in a separate SPARQL tool, Twinkle SPARQL Query, which uses a different RDF triplestore. In the end, by using the online Virtuoso Query Tool to load the RDF file, I determined that Virtuoso can make no use of the RDF I have generated: I cannot even run a simple query such as select * where { ?a ?b ?c . }, which should return all triples. The second sketch below reruns this check locally.
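Since the post does not include the extraction code itself, here is a minimal sketch of the harvesting step in Python. The OAI-PMH endpoint URL, the choice of rdflib and ElementTree, and the flat Dublin Core mapping are all illustrative assumptions, not the actual LOD4DataONE code.

```python
# A sketch of the harvest-then-RDFize step, assuming Dryad's OAI-PMH endpoint
# serves Dublin Core records. The endpoint URL, the use of rdflib, and the
# flat dc:* mapping are illustrative assumptions, not this project's code.
from urllib.request import urlopen
import xml.etree.ElementTree as ET

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

# Hypothetical endpoint; Dryad's OAI-PMH base URL may have changed since 2011.
OAI_URL = ("http://www.datadryad.org/oai/request"
           "?verb=ListRecords&metadataPrefix=oai_dc")

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DCE = "{http://purl.org/dc/elements/1.1/}"

tree = ET.parse(urlopen(OAI_URL))
g = Graph()

for record in tree.iter(OAI + "record"):
    ident = record.findtext(OAI + "header/" + OAI + "identifier")
    if not ident:
        continue
    subj = URIRef(ident)  # OAI identifiers are URIs, so they can be subjects
    # Turn each Dublin Core element (dc:title, dc:creator, ...) into a triple.
    for elem in record.iter():
        if elem.tag.startswith(DCE) and elem.text and elem.text.strip():
            g.add((subj, DC[elem.tag[len(DCE):]], Literal(elem.text.strip())))

g.serialize("dryad_dc.rdf", format="xml")  # RDF/XML, like the file linked above
print(len(g), "triples written")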
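To sanity-check these results outside of Virtuoso, both the catch-all query and the "hunting" scenario can be rerun locally. A minimal sketch, assuming Python's rdflib in place of the Twinkle and Virtuoso tools actually used:

```python
# A sketch of the cross-check, assuming rdflib in place of Twinkle/Virtuoso.
from rdflib import Graph

g = Graph()
# The week-two RDF dump linked above; the URL may no longer resolve.
g.parse("http://rio.cs.utep.edu/ciserver/ciprojects/udata/dryadRDFWeek2.rdf",
        format="xml")
print(len(g), "triples loaded")  # the post reports 95 at this stage

# The simple query Virtuoso refused to answer: return every triple.
for row in g.query("SELECT * WHERE { ?a ?b ?c . }"):
    print(row.a, row.b, row.c)

# The "hunting" scenario, as a case-insensitive search over literal values.
hits = g.query("""
    SELECT ?a ?b ?c
    WHERE { ?a ?b ?c . FILTER regex(str(?c), "hunting", "i") }
""")
print(len(hits), "matches for 'hunting'")
```

If rdflib parses the file and answers both queries, that is consistent with the W3C RDF Validator and Twinkle results above, and it would point the problem at the Virtuoso/ODE side rather than at the generated RDF.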
Observations from week two: