Hello guys,
This week was full of programming, and several interesting points came out of the analysis.
1 Tag Analysis
The “tags” column is of particular interest to us because it contains keywords related to provenance research, for example “reproducibility”. Since the objective of this project is to find out how provenance tools are currently used in academia, this column is a good place to start. It contains two kinds of tags: manually added ones, each with a meaning defined by the Galaxy Project group, and ones generated automatically by Zotero.
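As a minimal sketch of how this column can be read, assume the library has been exported to a CSV file with one paper per row and the tags of each paper stored in a “tags” cell, separated by “; ”. The file layout, the separator, the helper name `load_tag_sets` and the sample rows are all hypothetical, not the actual export. Each paper's tags are loaded into a set so that later steps can count and combine them:

```python
import csv
import io

# Hypothetical sample export: one paper per row, tags joined by "; ".
SAMPLE_CSV = """title,tags
Paper A,+Methods; Reproducibility; Workflow
Paper B,>Huttenhower; metagenomics
Paper C,
"""

def load_tag_sets(csv_file, column="tags", sep=";"):
    """Return one set of tags per paper, dropping blanks and surrounding spaces."""
    reader = csv.DictReader(csv_file)
    papers = []
    for row in reader:
        raw = row.get(column) or ""  # papers without tags become an empty set
        papers.append({t.strip() for t in raw.split(sep) if t.strip()})
    return papers

papers = load_tag_sets(io.StringIO(SAMPLE_CSV))
print(len(papers), "papers loaded")
```

Using sets means a tag is counted at most once per paper, which is what the counts in the rest of this post need.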
1.1 All Tags
According to the Galaxy Zotero group's definition, tags fall into two groups: some are added manually by the Galaxy project group and the others are generated automatically by Zotero. The analysis is therefore divided into two parts. Among the manually added tags, those starting with “+” are Galaxy-specific tags, each with its own definition, while those starting with “>” are named after public Galaxy platforms.
| Category | Subcategory | Number of unique tags |
| --- | --- | --- |
| Manually added | Galaxy specific tags (“+”) | 20 |
| Manually added | Public platform tags (“>”) | 168 |
| Automatically generated by Zotero | – | 6381 |
| Total | – | 6569 |
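Counts like these can be obtained by collecting every distinct tag and classifying it by its first character. The sketch below assumes, as the numbers suggest (20 + 168 + 6381 = 6569), that every tag without a “+” or “>” prefix was generated by Zotero; the helper names and sample tag sets are hypothetical:

```python
from collections import Counter

def tag_category(tag):
    """Classify a tag by prefix (assumption: un-prefixed tags are Zotero-generated)."""
    if tag.startswith("+"):
        return "Galaxy specific (+)"
    if tag.startswith(">"):
        return "Public platform (>)"
    return "Automatically generated"

def count_unique_tags(papers):
    """Count each distinct tag once, grouped by category, plus the overall total."""
    unique_tags = set().union(*papers)
    counts = Counter(tag_category(t) for t in unique_tags)
    counts["Total"] = len(unique_tags)
    return counts

# Hypothetical per-paper tag sets (in practice, one set per row of the export).
papers = [
    {"+Methods", "Reproducibility", "Workflow"},
    {">Huttenhower", "Workflow"},
    {"+Methods", ">RepeatExplorer"},
]
for category, n in count_unique_tags(papers).items():
    print(category, n)
```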
1.2 Manually Added Tags Analysis
1.2.1 Analysis of Galaxy Specific Tags
1.2.2 Analysis of Public Platform Tags
Among the public platform tags, three platforms are used frequently: “>Huttenhower”, “>RepeatExplorer” and “>workflow4metabolomics”.
Huttenhower: Metagenomic and functional genomic analyses, intended for research and academic use.
RepeatExplorer: Graph-based clustering and characterization of repetitive sequences, and detection of transposable element protein coding domains.
Workflow4metabolomics: A collaborative portal dedicated to metabolomics data processing, analysis and annotation.
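One way to find the most frequently used platforms is to count, across all papers, how many papers carry each “>” tag and keep the top entries. This is a minimal sketch with a hypothetical helper name and made-up sample data, not the exact script used for the analysis:

```python
from collections import Counter

def top_platform_tags(papers, n=3):
    """Return the n most frequent public platform tags (prefix ">"), counted per paper."""
    counts = Counter(tag for tags in papers for tag in tags if tag.startswith(">"))
    return counts.most_common(n)

# Hypothetical per-paper tag sets.
papers = [
    {">Huttenhower", "Workflow"},
    {">Huttenhower", ">RepeatExplorer"},
    {">workflow4metabolomics"},
    {">Huttenhower"},
    {">RepeatExplorer", "Reproducibility"},
]
print(top_platform_tags(papers, n=2))
# → [('>Huttenhower', 3), ('>RepeatExplorer', 2)]
```

Because each paper's tags are a set, a platform is counted at most once per paper, so the result reads as "number of papers mentioning this platform".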
1.3 Automatically Generated Tags Analysis
We were happy to find some provenance-related keywords among them: “Reproducibility” and “Workflow”.
2 Paper Reading
| Tag(s) | Number of papers |
| --- | --- |
| “reproducibility” | 316 |
| “workflow” | 117 |
| “+Methods” and “Reproducibility” (both) | 5 |
| “Reproducibility” and “Workflow” (both) | 7 |
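Selecting the papers for a tag combination amounts to a subset test: a paper qualifies if its tag set contains every chosen tag. The sketch below, with a hypothetical helper name and sample data, assumes exact, case-sensitive matching, so “reproducibility” and “Reproducibility” count as different tags; whether the Galaxy library intends them as one tag is something we have not checked:

```python
def papers_with_all_tags(papers, chosen):
    """Return indices of papers whose tag set contains every chosen tag (exact match)."""
    chosen = set(chosen)
    return [i for i, tags in enumerate(papers) if chosen <= tags]

# Hypothetical per-paper tag sets.
papers = [
    {"+Methods", "Reproducibility", "Workflow"},
    {"Reproducibility", "Workflow"},
    {"+Methods", "reproducibility"},  # lowercase variant: not matched below
    {"Workflow"},
]
print(papers_with_all_tags(papers, ["+Methods", "Reproducibility"]))  # → [0]
print(papers_with_all_tags(papers, ["Reproducibility", "Workflow"]))  # → [0, 1]
```

Lower-casing all tags before the comparison would merge such case variants, at the cost of possibly joining tags the curators meant to keep apart.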
The next step for our research is to read the papers that carry a combination of certain tags. We also need to figure out how the Galaxy group collected and tagged the papers, so that our own project is reproducible.
Have a nice weekend.
