This was the last week of my internship, and the major tasks were to write the documentation for the tool and to create a poster for the DataONE-AHM.
- Wrapper for execution: We created a shell script that wraps execution of the tool and takes the file as input. The tool needs 3 jars to run, so we wrapped the invocation in a script that is easily callable from the command line.
- README: Wrote the README to match how the tool works. It includes a detailed explanation of the output and of how to add new file formats and custom fields. While working on the README we found an area of improvement for the tool, so we created a config.Properties file that allows new file formats to be added without any major modifications to the code.
- config.Properties: We created a configuration file of key-value pairs in which each key is a DataONE file format ID and each value holds the XPath values for that format. This lets the tool look up the XPath values automatically once the file type is detected. Previously, the file formats and their XPaths were hard-coded.
- Poster: Created a poster which summarises the internship work. There are two deliverables for the internship: one is the libmagic work for identifying DataONE file formats, and the other is the DataONE metadata parser tool for file identification and content extraction.
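
To illustrate the config.Properties idea, here is a minimal sketch of how a format ID could be resolved to an XPath with `java.util.Properties`. It is not the tool's actual code: the class name `XPathConfig`, the method `xpathFor`, and the sample entries are all assumptions made for this example, and the real keys and XPaths in config.Properties may differ.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper: maps a detected file format ID to its configured XPath.
public class XPathConfig {
    private final Properties props = new Properties();

    // Reads key-value pairs in standard .properties syntax from the given source.
    public XPathConfig(Reader source) throws IOException {
        props.load(source);
    }

    // Returns the XPath configured for the format ID, or null if there is no entry.
    public String xpathFor(String formatId) {
        return props.getProperty(formatId);
    }

    public static void main(String[] args) throws IOException {
        // Illustrative entries only; they stand in for the contents of config.Properties.
        String sample =
            "FGDC-STD-001-1998=/metadata/idinfo/citation/citeinfo/title\n"
          + "example-format-1.0=/record/title\n";

        XPathConfig config = new XPathConfig(new StringReader(sample));

        // A known format ID resolves to its XPath; an unknown one yields null.
        System.out.println(config.xpathFor("FGDC-STD-001-1998"));
        System.out.println(config.xpathFor("unknown-format"));
    }
}
```

With this layout, supporting a new format means adding one line to the file rather than changing code. One thing to watch for: in .properties syntax a `:` or `=` inside a key must be escaped with a backslash, which matters for format IDs written as namespace-style URIs.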
Thank you all for making this a great, fun learning experience. I really appreciate my mentor Dave's advice, help, and experience, which have been tremendously helpful throughout the internship.
Thanks again, and have a great weekend!
Resource links: Github-file_identification, Github-DataONE Parser, Project Plan, Poster