My plan for this week was to implement algorithms for matching abbreviations to their full forms and for identifying units. However, before adding new algorithms, it is important to evaluate the performance of the existing ones so that we have a baseline against which to judge whether additional components increase or decrease performance. Hence, I decided to accomplish this task first. For evaluation purposes, out of a total of 1215 training records, we have a set of 830 manually annotated ontologies. These 830 ontologies were treated as the “source”, and a predefined set of 10 ontologies were used as the “target”. Since we only have URLs for the source ontologies, I wrote a script that takes each URL, passes it as a parameter to a Python program that converts EML to OWL, and saves the resulting OWL file. For the target ontologies, I manually saved the OWL files. The source and target ontologies were then fed to the ontology matching algorithm, and the matches, along with their similarity scores, were recorded. Currently, I am running the algorithm on the entire dataset. Once this is done, we can analyze the kinds of matches the algorithm finds and infer what augmentations need to be made to it. I plan to compare the results against the expected outcome (present in the manual annotations) to determine precision, recall, and f-score values.
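The planned comparison against the manual annotations could be sketched roughly as below. Representing each match as a (source term, target term) pair is my assumption, not necessarily how the matching algorithm records its output:

```python
# Sketch of the planned precision/recall/f-score evaluation.
# Assumption: predicted matches and gold annotations can both be
# represented as sets of (source_term, target_term) pairs.

def evaluate_matches(predicted, gold):
    """Compare predicted matches against manually annotated gold matches."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```

A similarity-score threshold would likely be applied before building the predicted set, so the metrics could also be swept across thresholds to pick a good operating point.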
Apart from this, I have also been exploring the Linkipedia code. In particular, I tried adding and removing some of the attributes that are sent as parameters and evaluated their effect. It turned out that including definitions instead of descriptions increases precision, recall, and f-score by a fair amount. Next week, I plan to analyze the results from ontology matching and to implement the algorithm for matching abbreviations to their full forms. I will simultaneously work on improving the accuracy of Linkipedia.
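One possible first pass at the abbreviation-matching algorithm (a hypothetical heuristic of mine, not the approach decided on) is to check whether the abbreviation's letters line up with the initial letters of the candidate full form:

```python
# Hypothetical initial-letter heuristic for abbreviation matching.
# This is only a sketch; the actual algorithm is still to be designed.

def matches_abbreviation(abbrev, full_form):
    """Return True if abbrev equals the initials of full_form (case-insensitive)."""
    initials = "".join(word[0] for word in full_form.split() if word)
    return abbrev.lower() == initials.lower()
```

Note the limits of such a heuristic: "EML" matches "Ecological Metadata Language", but "OWL" fails against "Web Ontology Language" because the acronym reorders the initials, so a real implementation would need fuzzier alignment.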