Ontology Coverage – DataONE Notebooks

Journal paper

Posted on July 20, 2013 by digiuseppe

This week I am writing our journal paper for the ESIN journal. I also wrote an abstract for the AGU conference. So far I’m about halfway through the journal paper, figures included. The abstract for the AGU is complete. While I won’t describe the entire paper (as it should …

Final Results

Posted on July 12, 2013 by digiuseppe

This week I finished the subtopic matching software and, after some testing, ran it. Basically, my goal was to answer the question: does coverage decrease significantly if you remove “topically” unrelated documents from the corpus? Surprisingly, I found that it does. While the SWEET ontologies are small and attempt to …
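The before/after comparison in this post can be sketched in Python. This is not the subtopic matching software itself: it assumes coverage means the fraction of ontology terms that appear anywhere in the corpus, and it uses a hypothetical overlap threshold (`min_hits`) as a stand-in for deciding which documents are topically related. All names and toy data here are invented for illustration.

```python
def coverage(ontology_terms, documents):
    """Fraction of ontology terms found anywhere in a list of token lists.

    Assumption: terms and documents are already normalized single tokens.
    """
    terms = set(ontology_terms)
    if not terms:
        return 0.0
    vocabulary = {token for doc in documents for token in doc}
    return len(terms & vocabulary) / len(terms)


def topical_subset(ontology_terms, documents, min_hits=2):
    """Keep documents sharing at least `min_hits` distinct ontology terms.

    Hypothetical stand-in for subtopic matching: a simple overlap threshold.
    """
    terms = set(ontology_terms)
    return [doc for doc in documents if len(terms & set(doc)) >= min_hits]


def coverage_drop(ontology_terms, documents, min_hits=2):
    """Return (full-corpus coverage, coverage on the topical subset only)."""
    subset = topical_subset(ontology_terms, documents, min_hits)
    return coverage(ontology_terms, documents), coverage(ontology_terms, subset)
```

Because this coverage measure only asks whether a term appears at all, dropping documents can never raise it; the interesting question is how large the drop is.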

Subtopics

Posted on July 5, 2013 by digiuseppe

At the end of last week I was trying to find a related ontology to enable subtopic matching. However, after some failed attempts, it became clear that until we built an ontology from the whole corpus rather than just part of it (remember, a few weeks ago I talked about memory …

SWEET Ontologies and Coverage

Posted on June 28, 2013 by digiuseppe

This week I finished the coverage analyzer tool and ran tests on over 200 popular ontologies (i.e., the SWEET ontologies)! Since the coverage tool is the main part of the project and I have this data more than a week ahead of schedule, I’m quite pleased. The coverage tool …
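Running one corpus against many ontologies, as in this post, amounts to scoring each ontology and sorting the scores. The sketch below assumes a simple coverage definition (fraction of an ontology's single-token, pre-normalized terms that occur in the corpus); the ontology names and terms are made up and are not actual SWEET files.

```python
def coverage(ontology_terms, corpus_tokens):
    """Fraction of ontology terms that occur at least once in the corpus.

    Assumes both inputs are already normalized (lower-cased, stemmed) and that
    every term is a single token, so a set intersection decides what is covered.
    """
    terms = set(ontology_terms)
    if not terms:
        return 0.0
    return len(terms & set(corpus_tokens)) / len(terms)


def rank_ontologies(ontologies, corpus_tokens):
    """Score every ontology against one corpus, highest coverage first."""
    scores = [(name, coverage(terms, corpus_tokens)) for name, terms in ontologies.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy ontologies and corpus, invented for illustration.
ontologies = {
    "ocean": ["ocean", "salin", "current"],
    "ice": ["glacier", "firn"],
}
corpus = ["ocean", "salin", "glacier", "temperatur"]
for name, score in rank_ontologies(ontologies, corpus):
    print(f"{name}: {score:.2f}")
# ocean: 0.67
# ice: 0.50
```

Multi-word ontology terms would need phrase matching rather than a token set; that is left out to keep the sketch short.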

Ontology Generation and Coverage

Posted on June 21, 2013 by digiuseppe

This week I finished the code and test cases for the automatic ontology generation I started last week, and began work on the coverage algorithm. The finished ontology generator comes with a README and can add individual words or merge two existing ontologies. In this way, it should …
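The two operations mentioned here, adding a single word and merging two ontologies, can be illustrated with a deliberately tiny data structure. This is a hypothetical sketch, not the generator’s actual design: it assumes an ontology is just a mapping from each term to an optional parent term, and the class and method names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class FlatOntology:
    """Hypothetical minimal ontology: each term maps to an optional parent term."""

    parents: dict = field(default_factory=dict)

    def add_word(self, term, parent=None):
        """Add one term; a missing parent never clobbers a known one."""
        if parent is not None:
            self.parents.setdefault(parent, None)  # register the parent as a term
            self.parents[term] = parent
        else:
            self.parents.setdefault(term, None)

    def merge(self, other):
        """Return a new ontology holding the terms of both inputs.

        Parents from this ontology take precedence, except that a missing
        (None) parent here never overwrites a parent known from `other`.
        """
        merged = FlatOntology(dict(other.parents))
        for term, parent in self.parents.items():
            if parent is not None or term not in merged.parents:
                merged.parents[term] = parent
        return merged


a = FlatOntology()
a.add_word("ocean")
a.add_word("salin", parent="ocean")
b = FlatOntology()
b.add_word("glacier", parent="cryospher")
b.add_word("ocean", parent="hydrospher")
merged = a.merge(b)
```

`merge` returns a new object and leaves both inputs unchanged, which keeps the originals usable for separate coverage runs.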

Part of Speech Tagger, ontology generation

Posted on June 14, 2013 by digiuseppe

This week my main goal was to get a Part of Speech (PoS) tagger up and running. After some searching and testing, I decided to use the Natural Language Toolkit (NLTK.org). While it has to be installed (as opposed to running from a JAR or Python egg), it runs quickly …
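One way a PoS tagger can feed ontology generation is by keeping only nouns as candidate terms. That noun-only rule is an assumption made for this sketch, not something the post states. The function works on `(word, tag)` pairs using Penn Treebank noun tags; the tagged sentence below is written by hand for illustration.

```python
# Penn Treebank noun tags (singular/plural, common/proper).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def candidate_terms(tagged_tokens):
    """Return nouns from (word, tag) pairs, lower-cased, in first-seen order."""
    terms = {}  # dict keys keep insertion order and drop duplicates
    for word, tag in tagged_tokens:
        if tag in NOUN_TAGS:
            terms.setdefault(word.lower(), None)
    return list(terms)


# Hand-tagged example sentence (illustrative, not real tagger output).
tagged = [
    ("Sea", "NN"), ("surface", "NN"), ("temperature", "NN"), ("rose", "VBD"),
    ("in", "IN"), ("the", "DT"), ("Pacific", "NNP"), ("sea", "NN"),
]
print(candidate_terms(tagged))
```

With NLTK installed (plus its tokenizer and tagger data packages), `nltk.pos_tag(nltk.word_tokenize(text))` returns a list of `(token, tag)` tuples in this same shape, using Penn Treebank tags by default.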

Normalizing data

Posted on June 7, 2013 by digiuseppe

This week I spent most of my time normalizing my two data sets (my corpus and ontology). The normalizing process for the corpus was fairly straightforward and involved a few steps: 1) remove punctuation, 2) force lower case, 3) remove stop words, 4) stem each word, 5) remove …
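The first four normalization steps listed here can be sketched with the Python standard library. Two assumptions keep it self-contained: the stop-word list is a tiny hypothetical one, and `crude_stem` is a naive suffix stripper standing in for a real stemmer such as Porter’s. The fifth step is not named in full, so it is left out.

```python
import string

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a
# fuller list such as the one distributed with NLTK.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "were"}

# Naive suffixes for a crude stand-in stemmer (not the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Apply steps 1-4: strip punctuation, lower-case, drop stop words, stem."""
    # 1) remove punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    # 2) force lower case and split on whitespace
    tokens = text.lower().split()
    # 3) remove stop words
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 4) stem each word
    return [crude_stem(t) for t in tokens]


print(normalize("The rivers were sampled, and temperatures recorded!"))
# ['river', 'sampl', 'temperatur', 'record']
```

Applying the same `normalize` to both the corpus and the ontology labels matters: coverage comparisons only line up when both sides are stemmed and lower-cased the same way.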

Gathering dataset

Posted on May 31, 2013 by digiuseppe

This week my focus was on meeting with my mentors, understanding my specific project requirements, and gathering my datasets. My initial work centered on drafting a week-by-week plan for building a meaningful, generalizable ontology coverage tool for OWL ontologies. I spent some time looking over existing scripts and testing them …

Welcome to the Intern Notebooks

Posted on May 30, 2013 by Amber Budden

The DataONE summer 2013 internship program is underway and very soon, you can expect to see exciting developments blogged in place of this generic message.