A First Foray into Data Sharing

CC-BY-NC by Johanna Madjedi via flickr

Acquiring and reusing data can seem intimidating for even the most experienced researchers. For graduate student Roger Dole, it was a new adventure altogether. Roger had spent the last two years of his studies specializing in global butterfly migration. With hopes of finding a paper topic that would really jumpstart his career in entomology, he sifted through journals and pored over every applicable article he could get his hands on. Finally, after much searching and many late nights cocooned in a secluded corner of the library, Roger got the break he had been waiting for – a collection of articles on butterfly swarms in Mexico by Lydia Ames. Although this was the first time Roger had heard of Lydia or her work, the analyses described in the articles referred to data that appeared to include information about exactly which species were observed in the swarms. This was exactly what he had been looking for – he had to contact her! From his research, Roger gathered that Lydia had moved on to other projects since publishing the articles over a decade previously. But suppose she still had the dataset tucked away somewhere and would be willing to share? He had to find out.

Roger dared not let his excitement show or his hopes swell. While he very much wanted to include the dataset in his analyses, he knew it would likely be a slow process. It seemed to Roger that all the stories he had heard involving efforts to acquire and re-use data had ended in one of two ways: they either failed to elicit a response from the data creator or ended in bitter disagreements about credit allocation. Fully aware that his best efforts might not yield the data that so interested him, Roger prepared himself for the possibility of rejection. Low expectations, he had found, had a way of making disappointment more bearable.

Roger drafted a polite email to Lydia. He decided to keep it brief in deference to Lydia’s likely busy schedule and focused on introducing himself, explaining his research objectives and why he was interested in her dataset. The email concluded with the big question – would she be willing to send the dataset and grant him permission to use it?

Concerned that he had asked too much of a perfect stranger, Roger busied himself with other tasks in preparation for an agonizing wait. But in less than a week, he received a reply. Lydia was surprised at Roger’s interest in her long-ago findings, but she was perfectly willing to give him access to her dataset so that he could incorporate it into the analyses for his paper. What’s more, she was ready to send him the dataset at once!

Delighted with the outcome and shocked at how quickly Lydia was able to provide the dataset and relevant metadata, Roger wondered how he had come to be so lucky. A nearly forgotten dataset, perfectly preserved and with the metadata intact? The stories that filled his head with data sharing nightmares had left him unprepared for what he had thought an unlikely outcome. What was Lydia’s secret to taming a supposedly terrifying process and making it look so effortless? And were there other researchers who practiced such craft?

As it turned out, the secret to Lydia’s success was remarkably simple, yet effective. During her data collection, she did something that perhaps too few researchers consider until after the opportunity has passed: she came up with a data management plan and carried it through to the end of her research. Rather than rely on handwritten lab notebooks or other physical data storage, Lydia organized her hard-won data into electronic files. Each file was partnered with notes to jog the memory and other metadata that helped make sense of how it all fit together. With all of the necessary pieces in place, the data waited ready for a time when Lydia (or some other lucky researcher!) might need it again.

If Lydia had been less deliberate in her data collection and organization, she might have wished away Roger’s request or insisted she didn’t have time to sort through a dataset overlaid with ten years of figurative dust. But with the desired information just a few mouse clicks away, she was both willing and able to lend her data to a researcher who would put it to good use.

As is often the case when data are reused, Lydia’s butterfly data needed to be transformed before they would be useful for Roger’s study. Lydia had recorded how many individuals of each species she had observed in each swarm in a way that was useful for her behavioral analyses. Roger, on the other hand, wasn’t interested in how many individuals of each species were present, but simply whether or not each species was observed, and in which locations and during which times of the year. Over a series of emails, Roger described what he needed and Lydia graciously provided it. By communicating clearly about which data he had on file and suggesting which figures would be most useful for his analyses, Roger made his requests easy for Lydia to meet. The student’s outstanding communication skills and obvious gratitude impressed Lydia and made her wonder how many more scientists could benefit from the data she had so far kept to herself. She continued sending Roger material, and in the process even created a derived dataset that would prove more relevant to his work. This task (which could have materialized as a major headache) was reduced to a minor chore under the ordered structure of Lydia’s data management plan. With only a few hours of work, Lydia was able to prepare a condensed dataset and send it on its way to a very appreciative Roger Dole.
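The kind of transformation described above, collapsing per-swarm counts into a simple record of which species were seen where and when, can be sketched in a few lines of Python. This is only an illustration: the story does not say how Lydia's files were laid out or what tools she used, so the column layout, the placeholder species and site names, and the `to_presence` function are all assumptions made up for the example.

```python
# Hypothetical abundance records, one row per species per swarm:
# (swarm_id, location, month, species, individuals_counted)
# All values are invented placeholders, not data from the story.
counts = [
    ("swarm-01", "Site A", "March", "Species X", 120),
    ("swarm-01", "Site A", "March", "Species Y", 0),
    ("swarm-02", "Site A", "March", "Species X", 45),
    ("swarm-03", "Site B", "October", "Species Y", 8),
]


def to_presence(records):
    """Reduce abundance counts to presence records.

    Returns a sorted list of (species, location, month) tuples, one for
    each combination in which at least one individual was counted.
    """
    observed = set()
    for _swarm_id, location, month, species, individuals in records:
        if individuals > 0:  # a count of zero means the species was not seen
            observed.add((species, location, month))
    return sorted(observed)


for species, location, month in to_presence(counts):
    print(species, location, month)
```

Because the original counts are kept untouched and the presence records are derived from them, a condensed dataset like this can be regenerated at any time, which is part of what makes a well-organized source dataset so easy to share.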

Two years later, when preparing an article based in part on analysis of the shared dataset, Roger insisted on naming Lydia as a co-author. Encouraged by his overwhelmingly positive experience with data sharing, Roger had contacted other researchers and combined data from several contributors – all credited in the final publication. Lydia also found the exchange very rewarding and decided to finally submit her data, made useful again in the hands of another researcher, to a repository where it could be more easily discovered and accessed by others.

Roger and Lydia’s experience may seem remarkable to those who have lived through the confusion and disappointment of failed data sharing. Thankfully, as norms and knowledge of data management and sharing continue to evolve, this happy ending is becoming a more familiar one.
