Week 6: Parser, Metadata Mapper Using Apache Tika

Hi all, this post follows up on my earlier posts for Project 4: Extending Libmagic for Identification of Science Resources. After resetting our goals for the rest of the project last week, the goal is to extract metadata from different file formats using Apache Tika. Since we want…

Week 5: Parser in Apache Tika for DataONE file Format.

Hi all, this post follows up on my earlier posts for Project 4: Extending Libmagic for Identification of Science Resources. This week, we shared our progress with other developers in a short demo. We showed how the file command and Apache Tika work for custom detection of…

Week 4 – Data extraction

Following our refinements to our database of data sources and the lessons of last week, we dove further into the pool of data-synthesis articles we identified previously from NCEAS and Web of Science. Data extraction is (probably) the part of a systematic review that takes the most effort. It is…

Week 4: Creating Parser in Apache Tika for onedcx file format

Hi all, this post continues my earlier posts for Project 4: Extending Libmagic for Identification of Science Resources. Continuing from last week, we explored the Apache server functionality for detecting the custom MIME types for the DataONE file format. The httpd.conf file of the server is…

Week 3: Custom mimetypes/magic file for the DataONE file formats for identification using Apache Tika/Apache web server

Hi all, this post continues my earlier posts for Project 4: Extending Libmagic for Identification of Science Resources. Last week we created the magic file for the file command, and its repository admins accepted and committed the changes in the…

Week 2 – Revising database, outline methods, begin data extraction

The majority of this week’s work was very collaborative, so today’s blog post is also a collaboration. Our main goals for Week #2 of our internship revolved around fine-tuning our database for the systematic review. Last week, Rob’s blog post highlighted one way to reduce bias in a systematic…

Week 2: Created, tested and Committed magic file with the Libmagic library.

Hi all, this is the second week of my internship, and below are the tasks completed this week. Adding the patterns for the rest of the file formats to the DataONE magic file: continuing the work from last week, we were able to create additional patterns for…

Week 1 – Search terms, identifying articles and literature review

Hi, my name is Rob Crystal-Ornelas and I’m one of the interns for Project #2: Supporting Synthesis Science with DataONE. I’m looking forward to the experience of working on a systematic review with a team of researchers, including my co-intern this summer, Giancarlo Sadoti. This summer, we’ll be working on…

Week1 – What is file command, libmagic library and how they work.

Hi all, my name is Pratik Shrivastava and I’m the intern working on Project 4: Extending Libmagic for Identification of Science Resources. The goal of this project is to extend the capabilities of the Linux (or equivalents on OS X and Windows) file command to allow automatic identification of…

Outreach Products from the Internship – DataONE Messaging Week 9

The time has come to say goodbye… My experience from this summer has been invaluable. Since one of my goals for the summer was to develop marketing materials that I could use as examples in future job applications, I’d say the journey was a success. My four outputs from the…