Week 6: Parser, Metadata Mapper Using Apache Tika

Hi all, this post follows up on my earlier posts for Project 4: Extending Libmagic for Identification of Science Resources. Having reset our goals for the rest of the project in the previous week, the goal now is to extract metadata from different file formats using Apache Tika. Since we want ...

Week 5: Finalizing data extraction, writing, and analysis

In week 5, Rob and I shifted gears from data extraction to data analysis. Our final database has a total of 80 articles: 40 identified from the list of NCEAS-authored publications and 40 identified from our Web of Science search. In total, we extracted over 500 rows of data (i.e. ...

Week 6 – Improvements of Transparency and Reproducibility

Hi all, in week 6 I mainly improved the scripts for the cases I introduced last week. As a reminder, we defined two cases: 1) transparency (understanding what has happened and validating the information given) and 2) reproducing (intermediate) outputs. For transparency, one improvement is ...

Week 5: Parser in Apache Tika for DataONE file Format.

Hi all, this post follows up on my earlier posts for Project 4: Extending Libmagic for Identification of Science Resources. This week, we shared our progress with other developers by giving a short demo. We showed how the file command and Apache Tika work for custom detection of ...

Week 4 – Data extraction

Following the refinements to our database of data sources and the lessons of last week, we dove further into the pool of data-synthesis articles we had previously identified from NCEAS and Web of Science. Data extraction is (probably) the part of a systematic review that takes the most effort. It is ...

Week 3 – Database refinement and data extraction

Continuing with our collaboration and joint post from last week, our main goal for week #3 was to extract data source information from papers employing data syntheses. We anticipated the need to refine our database (e.g., fields and categories) along the way. In reviewing the abstracts of papers identified ...

Week 4: Creating Parser in Apache Tika for onedcx file format

Hi all, this post continues my earlier posts for Project 4: Extending Libmagic for Identification of Science Resources. Continuing from last week, we explored the Apache server's functionality for detecting custom MIME types for the DataONE file format. The httpd.conf file of the server is ...

Week 5 – Transparency and Reproducibility

Hi all, this is my week 5. Since I have had an actual scientific use case, I have distinguished and generated two different cases: 1) transparency (understanding what has happened and validating the information given) and 2) reproducing (intermediate) outputs. 1) Transparency: this case mainly shows how to capture ...

Week 3: Custom mimetypes/magic file for the DataONE file formats for identification using Apache Tika/Apache web server

Hi all, this post continues my earlier posts for Project 4: Extending Libmagic for Identification of Science Resources. Last week we were able to create the magic file for the file command, and its repository admins accepted and committed the changes in the ...