{"id":2858,"date":"2016-07-22T00:16:46","date_gmt":"2016-07-22T00:16:46","guid":{"rendered":"https:\/\/notebooks.dataone.org\/?p=2858"},"modified":"2016-07-22T00:21:38","modified_gmt":"2016-07-22T00:21:38","slug":"week-9-documentation-and-finishing-up-with-evaluation","status":"publish","type":"post","link":"https:\/\/notebooks.dataone.org\/semantic-evolution\/week-9-documentation-and-finishing-up-with-evaluation\/","title":{"rendered":"Week 9: Documentation and finishing up with evaluation"},"content":{"rendered":"
For the past few weeks, I have been working on evaluating the ontology matching algorithm on our dataset. After completing the evaluation code and running it, I obtained 0 values for precision, recall, and f-score. Manual inspection of the source ontologies revealed that the "class name" values were uninformative, so I decided to use the "label" for making comparisons. This week, I started out by modifying the AML code so that, for the source ontologies, the "label" is used as the unique identifier of a class instead of the "class name". After making this change in all the required places, I found that AML was overwriting the contents of all classes that share the same "label". I undid all the changes I had made and looked for another way to deal with the issue. The simplest solution I came up with was to use a regular expression to identify the classes whose "class name" matches the naming format that the eml2owl python script assigns to classes from the source ontologies. For the classes identified as belonging to a source ontology, I gave more weight to the "label" than to the "class name"; a sketch of the idea follows below. After making these changes and running the evaluation program again, the precision and recall were no longer 0, but very few matches were found.
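To illustrate the idea, here is a minimal Python sketch (the actual change was made in AML's Java code; the regular expression and the weights below are placeholders, not the real eml2owl naming format or the tuned values):

```python
import re

# Hypothetical pattern for the auto-generated class names that the
# eml2owl script assigns; the real naming format may differ.
SOURCE_NAME_PATTERN = re.compile(r"^class_\d+$")

def is_source_class(class_name: str) -> bool:
    """True if the class name matches the eml2owl naming format,
    i.e. the class comes from a source ontology."""
    return bool(SOURCE_NAME_PATTERN.match(class_name))

def match_score(name_sim: float, label_sim: float, class_name: str) -> float:
    """Combine name and label similarities, weighting the label more
    heavily for source-ontology classes, whose auto-generated names
    carry no meaning. The weights here are illustrative."""
    if is_source_class(class_name):
        return 0.2 * name_sim + 0.8 * label_sim
    return 0.5 * name_sim + 0.5 * label_sim
```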
I also found something interesting: of all the classes present in the source ontologies, all of those that are subclasses of "oboe:measurementType" (the only classes we are interested in for this analysis) have the same label. Hence, even after switching from "class name" to "label", every class from a source ontology was being matched to a single class in the target ontology. Later, my mentor mentioned that there was a bug in the eml2owl script and provided me with an updated script as well as an updated merged.nt file. To make use of these files for evaluation, I had to again go through the process of generating owl files from the eml files for the source ontologies and creating a merged.owl file from the merged.nt file (a sketch of one way to do this conversion follows below). After doing this, I ran the AML program on the source and target ontologies. I could immediately tell that the performance was much better, because the number of matches was higher than what I had obtained previously. Once the program finishes running on the entire dataset, I will be able to obtain the final values for precision, recall, and f-score.
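For the merged.nt to merged.owl step, a minimal sketch using rdflib (not necessarily the exact tool I used) is to load the N-Triples file and re-serialize it as RDF/XML:

```python
from rdflib import Graph

# Load the updated N-Triples dump and re-serialize it as RDF/XML,
# which AML can then load as merged.owl.
g = Graph()
g.parse("merged.nt", format="nt")
g.serialize(destination="merged.owl", format="xml")
print(f"Converted {len(g)} triples to merged.owl")
```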
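The evaluation metrics themselves are the standard ones; a minimal sketch, assuming the produced alignment and the reference alignment are both available as sets of (source class, target class) pairs:

```python
def evaluate(found: set, reference: set) -> tuple:
    """Precision, recall, and f-score of a produced alignment
    against a reference alignment. Both alignments are sets of
    (source_class, target_class) pairs."""
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```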
Since this is the last week of my internship, I have also started preparing a document that explains all the code, how to use it, and the improvements that could be made to achieve better results. I hope it will come in handy for anyone who wants to work further on ontology matching using AML. I have had a wonderful experience working on this project over the past 9 weeks, and I would like to thank my mentors, Prof. Deborah and Jim, for their constant guidance and constructive feedback.