I have put together a list of literature reviews and the tools that I am using. Please see the attached:
I am running the experiment on the topic-model-based approach. It takes considerably longer than the TF-IDF experiment: training 300 topics with 1000 iterations on the whole corpus from the 4 archives (DAAC, Dryad, KNB, and Treebase) could take a day. I will post further findings here over the weekend as well.
Update: The experiment on the Topic Model (TM) based approach is done. Please see the slides below for the results and discussion.