This week I explored the existing Linkipedia code. Linkipedia exposes several endpoints that are used to run it, and I traced the workflow through each of them. The main objectives of this analysis were:
- to understand how each module of the system works
- to identify the specific module that generates candidate entity mentions, so that it can be refined.
Currently, the “annotate” endpoint segments the query into sentences, extracts n-grams (n = 1 to 5) from those sentences, and queries the index for matches against the extracted terms. To this list, I plan to add normalized versions of the existing terms, along with word pairs that are found to depend on each other in a dependency parse tree. Although the Python code for testing Linkipedia already includes a parsing component, it is unclear whether Linkipedia receives the original query or the terms obtained after parsing. Hence, the first task for next week is to determine what kind of input the Python code sends to Linkipedia's “annotate” endpoint. Depending on the answer, I will apply different approaches to refine the existing list of terms and evaluate the system's performance under each approach.
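To make the candidate-generation step concrete, here is a minimal sketch of the n-gram extraction described above, plus a hypothetical normalization step. The tokenizer (whitespace split) and the normalization rule (lowercasing and stripping punctuation) are my assumptions for illustration, not necessarily what Linkipedia actually does.

```python
def extract_ngrams(sentence, max_n=5):
    """Collect all n-grams (n = 1..max_n) from a sentence, mirroring the
    candidate-generation step of the "annotate" endpoint."""
    tokens = sentence.split()  # assumed tokenizer; Linkipedia's may differ
    ngrams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams


def normalize(term):
    """Hypothetical normalization: lowercase and strip edge punctuation.
    The actual normalization scheme is still to be decided."""
    return term.lower().strip(".,;:!?")


if __name__ == "__main__":
    candidates = extract_ngrams("Barack Obama visited Paris")
    # Adding normalized variants alongside the original terms:
    candidates += [normalize(t) for t in candidates]
    print(candidates)
```

The dependency-pair candidates would be produced by a separate parsing step and merged into the same list before the index lookup.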