Week 2 – Use cases and more – DataONE Notebooks

Last week we made a plan, and this week's task focuses on the use cases. The following use cases have been identified:

1. An example simulation data processing script from the EVA WG. The script works on a set of simulation datasets, runs several diagnostics on the data based on different models, and visualizes the results. A simpler version of the script has only one model, no externally defined functions, and a simpler code structure. One challenge in studying this use case is that the original EVA WG script is written in Matlab, so it has to be re-implemented in Python. It turns out that translating the simpler version is not that time-consuming, but testing and debugging are more challenging: the dataset is large, and Matlab and Python are different enough that it is a little difficult to make the two scripts compatible.

2. A simple curation workflow. Curation workflows are built to automate data curation pipelines. They consist of several data processing steps (actors) and the connections among them, which indicate how data flows. For this use case, the simple workflow has four actors: CSVReader, GeoRefValidator, DateValidator, and CSVWriter.
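To make the second use case more concrete, here is a minimal sketch of how the four actors could be chained in plain Python. It is not the actual workflow implementation. The column names (`latitude`, `longitude`, `date`), the ISO date format, the `georef_ok`/`date_ok` flag columns, and the sample data are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime


def csv_reader(text):
    """CSVReader actor: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def georef_validator(records):
    """GeoRefValidator actor: flag rows with missing or out-of-range coordinates."""
    for rec in records:
        try:
            lat = float(rec["latitude"])    # hypothetical column name
            lon = float(rec["longitude"])   # hypothetical column name
            ok = -90 <= lat <= 90 and -180 <= lon <= 180
        except (KeyError, TypeError, ValueError):
            ok = False
        yield {**rec, "georef_ok": str(ok)}


def date_validator(records):
    """DateValidator actor: flag rows whose date is not YYYY-MM-DD (assumed format)."""
    for rec in records:
        try:
            datetime.strptime(rec.get("date") or "", "%Y-%m-%d")
            ok = True
        except ValueError:
            ok = False
        yield {**rec, "date_ok": str(ok)}


def csv_writer(records):
    """CSVWriter actor: serialize the annotated records back to CSV text."""
    records = list(records)
    out = io.StringIO()
    if records:
        writer = csv.DictWriter(out, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
    return out.getvalue()


def run_workflow(text):
    """Connect the actors so data flows reader -> validators -> writer."""
    return csv_writer(date_validator(georef_validator(csv_reader(text))))


# Made-up sample data: row 2 has a bad latitude, row 3 a bad date format.
SAMPLE = (
    "id,latitude,longitude,date\n"
    "1,34.4,-119.8,2014-06-02\n"
    "2,123.0,-119.8,2014-06-02\n"
    "3,34.4,-119.8,06/02/2014\n"
)

if __name__ == "__main__":
    print(run_workflow(SAMPLE))
```

Each actor here is a generator that consumes the previous actor's output, so the connections between actors are just nested function calls. The validators annotate records instead of dropping them, which keeps the curation decisions visible in the output file.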

Based on my experience with the IPython notebook over the past weeks, it is a good tool for actually writing a notebook that combines source code with its output, much like software documentation. An example notebook showing how to process a NetCDF file in Python is here; it clearly demonstrates some key features and examples along with actual results. So I think the IPython notebook is really easy to use, and the learning curve is gentle. However, the IPython notebook is not built to be a programming tool, especially for large-scale coding. The notebook UI is simple but lacks some programming support such as auto-completion, error detection, and indexing. I guess the tool is still under development, and I hope there will eventually be a feature like "switching to pro mode".
