Metadata? I thought you were in charge of that.

CC-BY-NC-SA by Frank Maurer via flickr

Ecologists, as a group, seek out adventure, natural wonder, and, let’s face it, sometimes hardship. Rather than being deterred by remote locations and inhospitable environments, they are inspired. Speak to an ecologist, and you’re likely to find out very quickly about the particular location in which they conduct their research and hear amazing stories about the difficulties they endure to collect their data.

A rarer breed of ecologist chooses (at least sometimes) to forgo the adventure, wonder, and frustration of collecting field data for an entirely different (and likely less obvious) set of challenges and exhilarating successes; these brave and adventurous ecologists choose to work with data that have already been collected by other researchers. Members of a research team studying climate change effects on the lakes of the world are great examples of this type of intrepid ecologist.

Buoyed by optimism and excitement about what they would find when analyzing data from lakes all over the world, the researchers set out to collect as many lake data sets as they could. Because they already knew about some of the challenges involved in working with existing data, they made a plan for how to organize and manage all of the data they were collecting. Each member of the team was in charge of contacting a few other researchers who might have data that could be used for the project. When researchers offered to share their data and a new data set was received, the team used a dedicated Dropbox™ folder to share and archive the data.

The project was shaping up quite nicely, with a good number of data sets collected and ready for analysis. The team was scheduled to give a conference presentation on the project, excited to have the chance to talk about the work with their colleagues and, as is common, relying on the conference deadline to motivate them to move forward with the analysis. It was during the last-minute preparation for the conference that the first twinges of anxiety began to surface. The team member who was performing the analyses began to notice that many of the data sets were missing metadata. Metadata, or ‘data about data’, include information about how, by whom, and for what purposes the data were created and to what exactly the data values refer. Without such metadata, it is impossible to understand exactly how a data set can be used. For the analysis of the lake data sets, the researchers needed information about exactly where the lakes are located (e.g., longitude and latitude), and the depth and surface area of each lake from which data were collected.

As anxiety about the missing metadata began to turn to panic under the weight of the presentation deadline, the ecologist contacted his collaborators and requested that they help out with some emergency ‘googling’ to see how much of the metadata they could find online. They were able to compile enough metadata to successfully go through with the presentation, but unfortunately some of the lake data had to be left out of the analysis because they were unable to find all of the information they needed. Although the presentation went well, the team was secretly a little disappointed that they hadn’t been able to include all of the data they had collected. When they returned from the conference, they set to work contacting the data contributors to request the missing metadata so that all of the available data could be included in future analyses.

How did the research team get into this situation? They had been proactive about setting up a procedure and platform for collecting, sharing, and managing the data for the study. How did the absence of crucial metadata slip past them? What could they have done to avoid this oversight?

Please share your ideas about what they could have done differently in the comments section below. Click here to read more about what the researchers did to retrieve the missing metadata and how they plan to do things differently in the future.

It turns out that no one on the project had specifically been assigned to manage the project metadata, and everyone assumed that someone else was doing it. Even though there was an agreed-upon data sharing platform and delegation of responsibility for collection of different data sets, planning for metadata collection and organization was overlooked.

One way to avoid this situation would have been to designate a specific team member who would manage the project metadata. Not only would this have ensured that someone would be worrying about the metadata earlier in the data collection process, it would have likely meant that original requests for data would have included requests for relevant metadata.
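As a rough illustration of what a designated metadata manager might run each time a new data set arrives, here is a minimal sketch of a completeness check. The field names, the record layout, and the example lakes are all assumptions made for illustration; the story only tells us that location, depth, and surface area were needed.

```python
# Hypothetical field names: the story only says that location (latitude and
# longitude), depth, and surface area were needed for the analysis.
REQUIRED_FIELDS = ("latitude", "longitude", "depth_m", "surface_area_km2")


def missing_metadata(record):
    """Return the required fields that are absent or empty in one lake's record."""
    return [field for field in REQUIRED_FIELDS if record.get(field) in (None, "")]


def audit(records):
    """Map each lake name to its missing fields, skipping complete records."""
    report = {}
    for name, record in records.items():
        gaps = missing_metadata(record)
        if gaps:
            report[name] = gaps
    return report


# Made-up example records, not data from the project.
example = {
    "Lake A": {"latitude": 46.5, "longitude": -81.0,
               "depth_m": 12.0, "surface_area_km2": 3.2},
    "Lake B": {"latitude": 60.1, "longitude": 25.3},
}

print(audit(example))
```

Running a check like this when each data set is received, rather than just before a deadline, would flag gaps while the contributor is still in the conversation.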

And what of the intrepid research team? The researchers are still in the process of trying to contact some of the data contributors to complete the metadata for a few remaining lakes. Although they are confident that they will eventually be able to recover the missing metadata for all of the data sets they had previously collected, backtracking to do metadata collection has slowed progress on the project. Overlooking metadata collection at the time of data collection has had tangible consequences beyond the scramble in the lead-up to the conference presentation. The extra time spent re-contacting data contributors has delayed analysis and preparation of a manuscript by at least a month.

Although needing to go back to recover missing metadata has made the data collection phase of this project more time-consuming than it would have been had data and metadata been requested at the same time, the experience has helped the team learn to anticipate the need for planning metadata management for their next projects. Many first-time field researchers assume that they’ll still remember the details of their data much later, so they don’t organize and keep track of their data in a way that allows them to look back and know exactly what everything means. In the case of these researchers, the miscalculation was not related to thinking that their memories are stronger than they really are; rather, they neglected to anticipate the types of metadata they would need and to make explicit plans for how they would collect and manage these metadata. Intrepid and adventurous they are, but backtracking to collect metadata is an activity they plan to avoid in the future. They have learned a lot about what it means to think ahead when it comes to planning for metadata. Now on to the next data re-use adventure!


Story contributed by Dr. Derek Gray with additional information from Kara Woo.
