Improving Existing Modules Based on Scenario Studies

As planned, I started working on the Daymet scenario this week. Development so far has focused on the basic modules of the model-data comparison package, and most of the functions are fairly general. Now that we have started building sample workflows for the scenarios, more and more specific needs are emerging.

For the Daymet scenario, in addition to basic statistics and temporal aggregation, climate scientists usually also want a long-term summary of a climate variable. The current Vistrails/UV-CDAT does not have any modules for this purpose, so I developed a new module that calculates long-term summaries. As with temporal aggregation, users only need to specify a temporal granularity (year, season, month, etc.), and the module computes the long-term mean.
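To make the idea concrete, here is a minimal sketch of a long-term mean in plain NumPy. It is not the actual module: the function name `long_term_mean`, the `months` argument, and the season labels are assumptions made for illustration, and the "year" granularity is interpreted here as a single mean over all time steps.

```python
import numpy as np

# Hypothetical mapping from calendar month to meteorological season.
SEASON_OF_MONTH = {12: "DJF", 1: "DJF", 2: "DJF",
                   3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJA", 7: "JJA", 8: "JJA",
                   9: "SON", 10: "SON", 11: "SON"}


def long_term_mean(values, months, granularity="month"):
    """Average all time steps that share the same month or season.

    values: array whose first axis is time.
    months: calendar month (1-12) of each time step.
    """
    values = np.asarray(values, dtype=float)
    months = [int(m) for m in months]
    if granularity == "month":
        labels = months
    elif granularity == "season":
        labels = [SEASON_OF_MONTH[m] for m in months]
    elif granularity == "year":
        # Assumption: one group covering the whole record.
        labels = ["annual"] * len(months)
    else:
        raise ValueError("unknown granularity: %r" % granularity)
    label_array = np.array(labels)
    # One mean per group, taken over the time axis only.
    return {label: values[label_array == label].mean(axis=0)
            for label in sorted(set(labels))}


# Two years of made-up monthly values.
months = list(range(1, 13)) * 2
values = list(range(1, 13)) + list(range(3, 15))
monthly = long_term_mean(values, months, "month")
seasonal = long_term_mean(values, months, "season")
annual = long_term_mean(values, months, "year")
```

With monthly granularity the result has one entry per calendar month (for example, the mean of all Januaries in the record), and with seasonal granularity one entry per season.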

The statistics module in the current package is general-purpose: it can compute statistics along any axis or combination of axes. However, climate scientists who are not familiar with NetCDF files may often have difficulty choosing the right axis. To make this easier, I added two new modules: one computes statistics along the temporal axis (e.g. temporal mean, temporal sum) and the other computes statistics along the spatial axes (e.g. the mean over an ecological region).
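The relationship between the general module and the two convenience modules can be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, the data is assumed to be laid out as (time, lat, lon), and an ecological region is represented by a boolean (lat, lon) mask.

```python
import numpy as np

# Statistics supported by this sketch.
STAT_FUNCS = {"mean": np.mean, "sum": np.sum, "min": np.min, "max": np.max}


def axis_statistic(data, axes, stat="mean"):
    """General-purpose version: the caller must know which axes to reduce."""
    return STAT_FUNCS[stat](np.asarray(data), axis=axes)


def temporal_statistic(data, stat="mean"):
    """Reduce along the time axis (axis 0 under the assumed layout)."""
    return axis_statistic(data, axes=0, stat=stat)


def spatial_statistic(data, stat="mean", region_mask=None):
    """Reduce over lat/lon, optionally restricted to a region mask."""
    data = np.asarray(data)
    if region_mask is None:
        return axis_statistic(data, axes=(1, 2), stat=stat)
    # Select only the cells inside the region: result is (time, n_cells).
    cells = data[:, np.asarray(region_mask, dtype=bool)]
    return axis_statistic(cells, axes=1, stat=stat)


# Made-up data: 2 time steps on a 3 x 4 grid.
data = np.arange(24).reshape(2, 3, 4)
mask = np.zeros((3, 4), dtype=bool)
mask[0, :2] = True  # hypothetical "region" of two grid cells

time_mean = temporal_statistic(data, "mean")          # shape (3, 4)
area_mean = spatial_statistic(data, "mean")           # shape (2,)
region_mean = spatial_statistic(data, "mean", mask)   # shape (2,)
```

The wrappers hide the axis choice, so the user only decides whether the statistic is temporal or spatial.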

Besides the new modules, I also improved existing modules based on needs that came up in the scenario studies. When I started building a workflow that links different modules together for the Daymet scenario (Daymet data download → mosaic → regrid → temporal aggregation / long-term mean), I found that the “mosaic” step became the bottleneck because Daymet data are quite large. I spent quite some time improving the mosaic module to solve this issue. Calculating the mosaic result location by location turned out to be very slow; reading and mosaicking the data in blocks is much faster. The drawback is that it is hard to average values in overlapping areas with block-based processing. This is not a big problem, however: input data usually do not overlap, and even when they do, writing the values from one of the inputs into the final mosaic file is also a valid solution. This trade-off speeds up the mosaicking quite a bit.
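The block-based approach can be sketched as below. This is a simplified in-memory illustration, not the actual mosaic module: the function name `mosaic_blocks`, the (row offset, column offset, tile) input format, and the "last tile wins" rule for overlaps are all assumptions. Real Daymet tiles would be read from files block by block.

```python
import numpy as np


def mosaic_blocks(tiles, out_shape, fill_value=np.nan):
    """Copy each tile into the output grid as a whole block.

    tiles: iterable of (row_offset, col_offset, 2-D array); every tile is
    assumed to fit inside out_shape. Where tiles overlap, the value from
    the tile written last is kept instead of an average.
    """
    out = np.full(out_shape, fill_value, dtype=float)
    for row0, col0, tile in tiles:
        tile = np.asarray(tile, dtype=float)
        rows, cols = tile.shape
        # View into the output that covers this tile's footprint.
        target = out[row0:row0 + rows, col0:col0 + cols]
        # Skip missing values so they do not erase data already written.
        valid = ~np.isnan(tile)
        target[valid] = tile[valid]
    return out


# Two made-up tiles that overlap in one column (column 2).
tile_a = np.full((2, 3), 1.0)
tile_b = np.full((2, 3), 2.0)
mosaic = mosaic_blocks([(0, 0, tile_a), (0, 2, tile_b)], out_shape=(2, 5))
```

Each tile is written with a single array assignment rather than a loop over individual locations, which is where the speed-up comes from; the cost is that the overlap column simply takes the second tile's values.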

The scenario studies not only serve as a test of the package we have developed but also help reveal gaps between the current package and real-world needs. With more detailed scenario studies, I believe we will make more improvements to the package.

Another task I have been working on this week is building help documentation for our modules. Although we will create some predefined workflows, it is still very important to provide convenient help documentation so that climate scientists can easily build their own workflows from the basic building blocks. I used Python's documentation functionality to create documentation both inside Vistrails/UV-CDAT and as a separate HTML file. Users can view the documentation for a module while they are using it, or open the HTML document in a web browser.
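As a rough illustration of how one source of documentation can feed both places, the sketch below assumes the help text lives in each module's Python docstring and builds a simple HTML page from it. The class `TemporalMean` and the function `module_help_html` are hypothetical, and the real package may generate its HTML differently.

```python
import html
import inspect


class TemporalMean:
    """Compute the mean of a variable along its time axis.

    Input: a (time, lat, lon) array. Output: a (lat, lon) array.
    """


def module_help_html(modules):
    """Build one HTML page from the docstrings of the given classes."""
    parts = ["<html><body>"]
    for module in modules:
        # inspect.getdoc returns the cleaned docstring, or None if missing.
        doc = inspect.getdoc(module) or "No documentation available."
        parts.append("<h2>%s</h2>" % html.escape(module.__name__))
        parts.append("<pre>%s</pre>" % html.escape(doc))
    parts.append("</body></html>")
    return "\n".join(parts)


page = module_help_html([TemporalMean])
```

Because the same docstring is available at runtime, a tool can show it next to the module while the page above is written to a file for viewing in a browser.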
