Week 3: Initial efforts for improving annotation accuracy

After exploring the Linkipedia code thoroughly last week, I was able to identify the task performed by each module and the format of the input each one receives. Specifically, I found that the Python code takes in a name and a description, runs the Stanford dependency parser on the description to extract terms that are linked by specific dependency types, finds synonyms of those terms using WordNet, and then sends each term (the originals and their synonyms) along with the description to Linkipedia. I started by making some changes to the Python code. Currently, it accepts only 3 dependency types. I added more dependency types to the list and evaluated the accuracy of the system before and after the change. The addition increased precision by a small amount, but recall decreased significantly.
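The filtering and evaluation steps described above can be sketched roughly as follows. This is only an illustration, not the actual Linkipedia client code: the function names, the triple format, and the relation labels in `ACCEPTED_RELATIONS` are all assumptions, since the three dependency types currently accepted are not listed here.

```python
# Hypothetical set of accepted dependency relations (assumed for illustration;
# the real list used by the Python code is not given in this post).
ACCEPTED_RELATIONS = {"amod", "compound", "dobj"}


def extract_terms(triples, accepted=ACCEPTED_RELATIONS):
    """Collect the words of every (head, relation, dependent) triple whose
    relation is in `accepted`, preserving first-seen order without duplicates."""
    terms = []
    for head, relation, dependent in triples:
        if relation in accepted:
            for word in (head, dependent):
                if word not in terms:
                    terms.append(word)
    return terms


def precision_recall(predicted, gold):
    """Compute precision and recall of predicted annotations against a gold set.
    Empty inputs yield 0.0 instead of raising ZeroDivisionError."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy parser output, made up for this example.
triples = [
    ("temperature", "amod", "mean"),
    ("measured", "nsubj", "temperature"),
    ("sensor", "compound", "soil"),
]
baseline_terms = extract_terms(triples)
extended_terms = extract_terms(triples, ACCEPTED_RELATIONS | {"nsubj"})
```

Widening the accepted set changes which terms are sent to Linkipedia, so precision and recall have to be measured again on the same gold annotations after each change.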

Similarly, in the "annotate" endpoint of Linkipedia, I added conditions that treat two terms as a match when either one is contained in the other. However, while testing the performance of the algorithm after this addition, I started getting a "divide by zero" error. The error remained even after I switched back to the unaltered version of the code. Moreover, I set up virtual machines on two other computers, obtained the Linkipedia code from GitHub, and installed everything again, but the error persists. After debugging, I found that the problem lies in querying the graph produced from the ontologies. I still need to explore further to resolve the issue. The weird thing, though, is that the same code was working properly on my own machine before.
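The containment condition can be sketched as below. This is a minimal Python illustration under my own assumptions (case-insensitive comparison, whitespace trimmed, plain substring containment); the function name `terms_match` is hypothetical and the actual change lives in the Linkipedia "annotate" endpoint, not in this snippet.

```python
def terms_match(a, b):
    """Return True when either term is contained in the other.
    Comparison is case-insensitive and ignores surrounding whitespace
    (both are assumptions made for this sketch)."""
    a_norm = a.strip().lower()
    b_norm = b.strip().lower()
    # An empty string is contained in every string, so reject it explicitly.
    if not a_norm or not b_norm:
        return False
    return a_norm in b_norm or b_norm in a_norm


print(terms_match("soil temperature", "Temperature"))
print(terms_match("soil", "water"))
```

One caveat with plain substring containment is that it also matches inside words (for example "air" inside "hair"), so a token-level check may be a safer variant.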

Once this issue is resolved, I plan to extract noun phrases and use the Stanford dependency parser on the description to identify potential candidates. I also plan to begin setting up the tools required for ontology matching.
