Hi All,
My name is Pratik Shrivastava and I’m the intern working on Project 4: Extending Libmagic for Identification of Science Resources. The goal of this project is to extend the capabilities of the Linux file command (or its equivalents on OS X and Windows) to allow automatic identification of common science metadata and data formats.
This is my first week, and I’ll be posting weekly project updates on this blog. This week I met my mentor, Dave Vieglais (virtually), and we set up a Project Plan for achieving the goals of the project. Daily progress and updates are also posted on the hpad, and we have a daily call to go over them. I spent most of this week understanding how the file command and libmagic work: what magic numbers are and how the file command uses them to identify a file’s type. I also tried my hand at this by creating toy examples. Dave also shared some example files, which need to be identified with the file command by creating magic files for them.
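To make the idea of magic numbers concrete, here is a minimal Python sketch of the kind of check the file command performs: compare the bytes at a known offset against a known signature. The signature table and the function name `identify_by_magic` are my own illustration and cover only three well-known formats. This is not libmagic's implementation or its magic database.

```python
# Illustrative only: a tiny hand-written signature table,
# not libmagic's actual magic database.
SIGNATURES = [
    # (offset, magic bytes, description)
    (0, b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (0, b"%PDF-", "PDF document"),
    (0, b"\x1f\x8b", "gzip compressed data"),
]


def identify_by_magic(data: bytes) -> str:
    """Return a description for the first signature that matches, else 'data'."""
    for offset, magic, description in SIGNATURES:
        # Compare the bytes at the given offset with the expected signature.
        if data[offset:offset + len(magic)] == magic:
            return description
    return "data"
```

Calling `identify_by_magic(open(path, "rb").read(16))` on a file would then report one of the three descriptions, or fall back to a generic "data" when nothing matches.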
For testing purposes, we also created a Python script that identifies the file type using custom magic files with the help of the magic library in Python. So far, our custom magic files correctly identify 7 of the 12 example files. Below are the file formats that we are trying to identify.
- http://ns.dataone.org/metadata/schema/onedcx/v1.0
- http://datadryad.org/profile/v3.1
- FGDC-STD-001-1998
- FGDC-STD-001.1999
- http://purl.org/ornl/schema/mercury/terms/v1.0
- http://www.isotc211.org/2005/gmd
- http://www.isotc211.org/2005/gmd-pangaea
- http://www.isotc211.org/2005/gmd-noaa
- eml://ecoinformatics.org/eml-2.1.1
- eml://ecoinformatics.org/eml-2.1.0
- eml://ecoinformatics.org/eml-2.0.1
- eml://ecoinformatics.org/eml-2.0.0
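As a rough illustration of the matching problem for the list above, here is a standard-library Python sketch that searches the start of a file for one of these identifiers. It is not our magic files or our test script. It assumes the identifier string appears verbatim near the start of the file, which may well not hold for every format in the list. The function name `identify_format` and the 4096-byte search window are hypothetical choices.

```python
from typing import Optional

# Format identifiers copied from the list above.
FORMAT_IDS = [
    "http://ns.dataone.org/metadata/schema/onedcx/v1.0",
    "http://datadryad.org/profile/v3.1",
    "FGDC-STD-001-1998",
    "FGDC-STD-001.1999",
    "http://purl.org/ornl/schema/mercury/terms/v1.0",
    "http://www.isotc211.org/2005/gmd",
    "http://www.isotc211.org/2005/gmd-pangaea",
    "http://www.isotc211.org/2005/gmd-noaa",
    "eml://ecoinformatics.org/eml-2.1.1",
    "eml://ecoinformatics.org/eml-2.1.0",
    "eml://ecoinformatics.org/eml-2.0.1",
    "eml://ecoinformatics.org/eml-2.0.0",
]


def identify_format(data: bytes, search_window: int = 4096) -> Optional[str]:
    """Return the first identifier found in the leading bytes, or None.

    Assumption: the identifier occurs verbatim in the first
    `search_window` bytes of the document.
    """
    head = data[:search_window]
    # Try longer identifiers first, so ".../gmd-noaa" is not
    # reported as its prefix ".../gmd".
    for format_id in sorted(FORMAT_IDS, key=len, reverse=True):
        if format_id.encode("ascii") in head:
            return format_id
    return None
```

Checking the longest identifiers first matters because some entries are prefixes of others. A magic file has to deal with the same ambiguity when ordering its rules.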
In the coming weeks, I’ll continue creating magic files for the rest of the examples and set up the environment for packaging our magic files, which will make installation and usage easier for users. We will also develop unit tests in Python, using the unittest library for functional testing.
That’s all for now, see you all next week!
Have a great weekend!
Resource links: GitHub, Project Plan