Gathering dataset – DataONE Notebooks

This week I focused on meeting with my mentors, understanding my specific project requirements, and gathering my datasets. My first task was to draft a week-by-week plan for building a meaningful, generalizable coverage tool for OWL ontologies.

I spent some time looking over existing scripts and testing them to make sure they produced the results I expected. Using these scripts, I acquired the corpus I will be using in my research this summer. I also wrote some small scripts that parse existing documents, which should make it easier to create ontologies for further testing.

My second major step was gathering the ontologies. My mentors provided a link to some likely candidates, and we agreed to use the OWL API. Since I was completely unfamiliar with this API, I spent some time working through tutorials and example code to understand it. At this point, I have written code that can read in existing ontologies, add various features to those ontologies, and check whether a given feature exists within an ontology. I have also written scripts that download a series of ontologies, so that I have a sufficient dataset for my research for the rest of the summer.
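To make the read / check / add cycle concrete, here is a minimal sketch in Python. It is not the OWL API code described above: it uses only the standard-library XML parser, assumes the ontology is serialized as RDF/XML, and only recognizes classes declared as `owl:Class` elements with an `rdf:about` attribute. The toy ontology and the helper names (`load_ontology`, `has_class`, `add_class`) are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Standard OWL and RDF namespace IRIs.
OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical toy ontology, used only for this example.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://example.org/toy"/>
  <owl:Class rdf:about="http://example.org/toy#Ocean"/>
</rdf:RDF>"""


def load_ontology(text):
    """Parse an RDF/XML string and return the root element."""
    return ET.fromstring(text)


def has_class(root, iri):
    """Return True if an owl:Class with the given rdf:about IRI is declared."""
    for cls in root.iter(f"{{{OWL}}}Class"):
        if cls.get(f"{{{RDF}}}about") == iri:
            return True
    return False


def add_class(root, iri):
    """Declare a new owl:Class under the root unless it already exists."""
    if not has_class(root, iri):
        ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": iri})


root = load_ontology(SAMPLE)
add_class(root, "http://example.org/toy#River")
print(has_class(root, "http://example.org/toy#River"))
```

A real ontology library handles far more than this (other serializations, imports, anonymous classes, reasoning), which is why the project itself relies on the OWL API rather than raw XML parsing.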

I currently have both a corpus and all the SWEET ontologies. With my current scripts, it would be fairly straightforward to acquire a larger dataset (e.g., more corpora and ontologies) if that proves necessary.
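A batch download script of this kind can be quite small. The sketch below is an assumption about its general shape, not the actual script: the base URL and file names are supplied by the caller (none are real SWEET locations), and the `fetch` parameter is a hypothetical hook that lets the network call be swapped out for testing.

```python
import os
import urllib.request


def fetch_url(url):
    """Download a URL and return the response body as bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def download_all(base_url, names, out_dir, fetch=fetch_url):
    """Save each named file from base_url into out_dir.

    Files already present in out_dir are skipped, so the script can be
    re-run to extend a dataset without downloading everything again.
    Returns the list of local paths, in the same order as names.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for name in names:
        path = os.path.join(out_dir, name)
        if not os.path.exists(path):
            data = fetch(base_url.rstrip("/") + "/" + name)
            with open(path, "wb") as fh:
                fh.write(data)
        saved.append(path)
    return saved
```

Growing the dataset then amounts to passing a longer list of file names, or calling `download_all` again with a different base URL.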
