This week I finished the code and test cases for the automatic ontology generation I started last week, and began work on the coverage algorithm.
The finished ontology generator comes with a README and supports adding individual words or merging two existing ontologies, which should make it easy to expand an existing ontology. However, because of the space complexity of generating an ontology from a corpus with the OWL API, a machine with a large amount of memory is required. For example, a document with 10,000 unique words requires creating somewhere between 80,000 and 160,000 unique Java objects (roughly 8 to 16 per word), many of which contain their own internal data structures.
The coverage analyzer uses the algorithm introduced by Yao et al. in the paper “Benchmarking ontologies: bigger or better?”. The basic idea is to check whether each class, each subclass relationship, and each equivalent-class relationship in the ontology under test also exists in the ontology generated from the corpus, and then to add a weighted score for each hit. The weights make it possible to specify that one type of component is more valuable than another. For example, giving subclass relationships a higher weight than equivalent-class relationships biases the score toward hierarchical relationships over lateral ones. The authors call this type of coverage measure a “Breadth” score.
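The weighted hit counting described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the analyzer's actual code: the `SimpleOntology` container, the string key format, and the example weights are all hypothetical, and the sketch returns the raw weighted sum because no normalization is described here.

```java
import java.util.Set;

class BreadthScoreSketch {

    /** Hypothetical container: an ontology reduced to three sets of string keys. */
    static class SimpleOntology {
        final Set<String> classes;
        final Set<String> subclassKeys;
        final Set<String> equivalentKeys;

        SimpleOntology(Set<String> classes, Set<String> subclassKeys, Set<String> equivalentKeys) {
            this.classes = classes;
            this.subclassKeys = subclassKeys;
            this.equivalentKeys = equivalentKeys;
        }
    }

    /** Subclass relationships are directional, so the key keeps the order. */
    static String subclassKey(String child, String parent) {
        return child + "<" + parent;
    }

    /** Equivalence is symmetric, so the two names are sorted before joining. */
    static String equivalentKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "=" + b : b + "=" + a;
    }

    /** Adds `weight` once for every element of `underTest` that also occurs in `corpus`. */
    static double weightedHits(Set<String> underTest, Set<String> corpus, double weight) {
        double total = 0.0;
        for (String element : underTest) {
            if (corpus.contains(element)) {
                total += weight;
            }
        }
        return total;
    }

    /** Raw weighted sum over classes, subclass relationships, and equivalences. */
    static double breadthScore(SimpleOntology underTest, SimpleOntology corpus,
                               double classWeight, double subclassWeight, double equivalentWeight) {
        return weightedHits(underTest.classes, corpus.classes, classWeight)
             + weightedHits(underTest.subclassKeys, corpus.subclassKeys, subclassWeight)
             + weightedHits(underTest.equivalentKeys, corpus.equivalentKeys, equivalentWeight);
    }

    /** Small hand-checkable example with made-up data and weights. */
    static double exampleScore() {
        SimpleOntology corpus = new SimpleOntology(
            Set.of("Animal", "Dog", "Cat", "Canine"),
            Set.of(subclassKey("Dog", "Animal"), subclassKey("Cat", "Animal")),
            Set.of(equivalentKey("Canine", "Dog")));
        SimpleOntology underTest = new SimpleOntology(
            Set.of("Animal", "Dog", "Bird"),
            Set.of(subclassKey("Dog", "Animal"), subclassKey("Bird", "Animal")),
            Set.of(equivalentKey("Dog", "Canine")));
        // Subclass hits weigh more than equivalence hits, favoring hierarchy.
        return breadthScore(underTest, corpus, 1.0, 2.0, 0.5);
    }

    public static void main(String[] args) {
        System.out.println(exampleScore()); // prints 4.5
    }
}
```

In this example two classes hit (2 x 1.0), one subclass relationship hits (1 x 2.0), and one equivalence hits (1 x 0.5), which gives 4.5. "Bird" and its subclass relationship are absent from the corpus ontology and contribute nothing.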
So far, the basic code implementing this algorithm is finished, and I have created some simple test cases. It's important here to have tests small enough to work through by hand, so that the algorithm's output can be verified manually.
While generating the ontology from the corpus was time consuming and memory intensive, I believe that this coverage evaluation process should be fast and efficient. I have implemented it to make a constant-time check (i.e., O(1)) for each element being evaluated, so the total running time grows only linearly with the number of elements in the ontology under test. Thus, even for large ontologies under test, a Breadth score should be computed fairly quickly.
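How the constant-time check is implemented is not described here. One common way to get it, shown below purely as an assumption, is to load the corpus ontology's entries into a hash-based set once, so that each later membership test is O(1) on average. The class name, method names, and synthetic "ClassN" entries are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ConstantTimeLookupSketch {

    /** One-time cost: index every corpus entry in a hash set. */
    static Set<String> buildIndex(List<String> corpusEntries) {
        return new HashSet<>(corpusEntries);
    }

    /** Each contains() call is O(1) on average, so this loop is linear in underTest.size(). */
    static int countHits(List<String> underTest, Set<String> index) {
        int hits = 0;
        for (String entry : underTest) {
            if (index.contains(entry)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Synthetic corpus entries Class0 .. Class99999.
        List<String> corpus = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            corpus.add("Class" + i);
        }
        Set<String> index = buildIndex(corpus);

        // Entries Class99000 .. Class100999: the first 1,000 exist in the corpus.
        List<String> underTest = new ArrayList<>();
        for (int i = 99_000; i < 101_000; i++) {
            underTest.add("Class" + i);
        }
        System.out.println(countHits(underTest, index)); // prints 1000
    }
}
```

The design trade-off is the usual one for hashing: the index costs extra memory up front, but the per-element check no longer depends on the size of the corpus ontology.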