The Long and Winding Road to Public Data

CC-BY-NC-SA by DJOtaku via flickr

Dr. Watson was accustomed to seeing dead things. As a wildlife ecologist, he had made a career out of investigating animals and their untimely demise under the rumbling engines of motor vehicles. Animal road mortalities had a reputation for being difficult to track because the majority of incidents went unreported, so Watson had to come up with creative alternatives for obtaining data. To answer the questions he was most interested in, Watson needed to know exactly where these accidents were happening. The limited data he could access through the police department, however, rarely contained this level of detail. Instead, he had learned to make use of contacts at local agencies and share data back and forth as needed. Watson had developed a reputation for being a friendly, trustworthy collaborator, and his colleagues were happy to help where they could.

Manuel, a graduate student at the university where Watson taught, also had a fascination with the ecological impact of road traffic. Manuel was planning a thesis on road mortality patterns in deer, and was anxious to find more data… but he was stuck. He had neither the resources for collecting new data nor the connections or institutional know-how to locate existing sources. He asked Watson if he knew of any datasets from New England that might contain the data he needed. Though he listened patiently to Manuel and sympathized with his dilemma, Watson informed him that the data simply did not exist.

But Manuel was not to be put off. In his native country, data on animal collision fatalities were plentiful – drivers were required by law to report all accidents involving wildlife. Each year, the body of data grew (especially during times of increased animal activity, like mating seasons). Surely someone, somewhere in New England, recognized the value of that data and was maintaining the stats he needed for his work? If so, couldn’t Dr. Watson help locate it? After all, what did Dr. Watson have to lose?

Seeing that Manuel would not be deterred, Dr. Watson agreed to ask around and determine if such a dataset could possibly exist… but he was more than skeptical. Deer were frequently the victims of car accidents all over the northeast; their grisly remains littered roadsides across the region, and it was highly unlikely that anyone was keeping a tally of what was merely an unfortunate fact of life. “I’ll make some inquiries,” Watson promised, “but let’s not get our hopes up.” Manuel nodded and made his exit, though Watson was sure he would hear from him again soon.

Who would he turn to for help? Dr. Watson had of course made many contacts over the years, so he had a few candidates in mind. He would email those who might have an interest in or know of such a dataset (if it even existed). They would also have to be well-connected within their various agencies; ten years on the job had taught him that many of the state organizations operated as silos, wrapped up in their own affairs and cut off from one another’s efforts. He would need to cast a wide net – several, in fact – if he hoped to get a glimpse of this mythological dataset.

After clicking through his email contact list a few times, Dr. Watson concluded his search with a grand total of five names. Four people were managers of their agencies, and the remaining person was a researcher with an outstanding history of cross-collaboration.

Once composed, the message was short, casual, and to-the-point. Watson described what he was looking for and what his graduate student wanted with the data. His requests were minimal: that they contact him if anything turned up, and that they forward the request on to others in their network. When they arrived, the responses were precisely what Watson had imagined. All were friendly and willing to support the search, but none offered even a glimmer of hope for the increasingly lost cause.

“I would love to see a dataset like this,” the lone researcher replied, “but I just don’t think anyone is working on it right now.” Each respondent wished him well and promised to write back with any news, but that was all. Even Watson’s friend at the Department of Transportation turned up empty-handed. The man had spent years as the point person for a multi-agency research project on reducing animal-vehicle collisions; if he couldn’t point Watson to the dataset, it simply was not to be found. Watson had done what he could, and he knew it was time to give up the ghost hunt. Manuel would be disappointed, but he would understand.

Watson sat down at his desk to write a consolatory email to Manuel. “Well, we gave it our best shot,” he began weakly. No sooner had he clicked SEND than a message appeared in his inbox – from the size of it, something big. It was from a woman named Charlotte from some obscure division at the Department of Transportation. She had heard of his inquiry from a friend of a colleague of a colleague some ways up the grapevine, and she thought she might be able to help. While they had never met, Charlotte knew of Watson by reputation. What’s more, she was an alumna of the very university where Watson worked!

When he opened the impossibly large attachment, Watson let out a whoop of excitement. It was an Excel spreadsheet with over 27,000 geo-referenced deer road mortalities – the very thing Manuel was looking for.

Watson had gone to the Department of Transportation. He had searched through its online files and poked around its various divisions. All of his searching had convinced him that this dataset, the one sitting in his very inbox, did not exist and never had. So what made this seemingly doomed expedition an unexpected success? Watson was tempted to attribute his and Manuel’s good fortune to serendipity, but knew that luck alone had not delivered the dataset into their hands. He suspected he could learn a thing or two from Manuel’s dogged optimism.

And yet, the whole situation had a foul air about it which had little to do with deer carcasses. When Watson really thought about the circumstances of the data acquisition, it seemed a little silly that so much serendipity, persistence and luck were needed to unearth The Spreadsheet. After all, shouldn’t all data collected using public funds be openly shared to begin with? Why should such a useful resource be locked away in agency fortresses, waiting for the day that a determined graduate student and his advisor finally sniffed it out of obscurity? Research policies were evolving to embrace public data sharing and open access, but not fast enough for other aspiring researchers like Manuel. In the meantime, many of them resigned themselves to more conventional data collection projects after finally accepting that the data they needed did not exist – a hard truth, perhaps, but one that hopefully would not last.
