Exploration of Search Logs, Metadata Quality and Data Discovery: Week 6

For week six of my internship, I've been diving into the spatial search history of DataONE. The DataONE search interface includes a map that lets users restrict search results to a spatial area. We were curious about which areas of the Earth are the most common targets of DataONE searches. Of the approximately 1.6 million search events in the logs, about 63,000 (roughly 4%) included explicit spatial restrictions. These searches occur at different scale levels, depending on how far the map interface is zoomed in during the search.
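One way to approximate a search's scale level is from the width of its bounding box. The sketch below is a minimal illustration, assuming each spatial search is logged as a bounding box in decimal degrees. `scale_bucket` is a hypothetical helper, and it uses the standard Web Mercator tiling rule (one tile at zoom z spans 360 / 2**z degrees of longitude). It is not necessarily how the levels in this post were defined.

```python
import math


def bbox_width_deg(west: float, east: float) -> float:
    """Longitudinal width of a bounding box in degrees.

    A box whose east edge is less than its west edge is assumed to
    cross the 180th meridian, so 360 degrees are added.
    """
    width = east - west
    if width < 0:
        width += 360.0
    return min(width, 360.0)


def scale_bucket(west: float, east: float, max_bucket: int = 20) -> int:
    """Hypothetical zoom-like bucket for a search box.

    Returns the deepest zoom z whose single Web Mercator tile
    (360 / 2**z degrees wide) is still at least as wide as the box.
    """
    width = bbox_width_deg(west, east)
    if width == 0:
        return max_bucket  # degenerate, point-like box: finest bucket
    return max(0, min(max_bucket, math.floor(math.log2(360.0 / width))))


print(scale_bucket(-180, 180))  # 0: the whole world
print(scale_bucket(-125, -65))  # 2: a 60-degree-wide box
print(scale_bucket(170, -170))  # 4: a 20-degree box crossing the antimeridian
```

This is only a relative measure. The zoom level a user actually sees also depends on the pixel width of the map viewport.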

I extracted the geographic data from the query logs and combined it with the ID of the session in which each query occurred and the query's date and time. This means we could follow a session and see how its spatial search was refined step by step, although I haven't done that analysis yet. What I have done is produce some basic maps showing the hotspots where users search for data in DataONE.
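The extraction and session pairing could be sketched roughly as follows. This is a hypothetical example, not the project's actual code. It assumes each log record is a dict with `session_id`, an ISO 8601 `timestamp`, and a URL-style `query` string carrying `north`, `south`, `east` and `west` parameters. All of these field and parameter names are assumptions, and the real DataONE log format may differ.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple, Optional
from urllib.parse import parse_qs


class SpatialSearch(NamedTuple):
    session_id: str
    time: datetime
    north: float
    south: float
    east: float
    west: float


def parse_bbox(record: dict) -> Optional[SpatialSearch]:
    """Return a SpatialSearch if the (hypothetical) query has all four bounds, else None."""
    params = parse_qs(record["query"])
    try:
        n, s, e, w = (float(params[k][0]) for k in ("north", "south", "east", "west"))
    except (KeyError, ValueError):
        return None  # not a spatially restricted search, or malformed bounds
    return SpatialSearch(record["session_id"],
                         datetime.fromisoformat(record["timestamp"]), n, s, e, w)


def sessions_in_order(records) -> dict:
    """Group spatial searches by session, sorting each session's searches by time."""
    by_session = defaultdict(list)
    for rec in records:
        search = parse_bbox(rec)
        if search is not None:
            by_session[search.session_id].append(search)
    for searches in by_session.values():
        searches.sort(key=lambda s: s.time)
    return dict(by_session)
```

Each session's list, in time order, would show how a user narrowed or moved the search box, which is the refinement analysis mentioned above.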

First, a global-scale map. The highest concentration of geographic search activity (the red-shaded boxes) is in North America, with elevated activity in South America, Western Europe, and Southeast Asia. The grid cells appear to vary in size, but they are actually all the same size; the apparent variation is a distortion caused by the map projection.
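A hotspot map like this one amounts to counting searches per grid cell. Below is a minimal sketch, assuming equal-angle cells of `cell_deg` degrees, with each search assigned to the cell that contains the center of its bounding box. The helper names are hypothetical, and center assignment is only one possible choice.

```python
import math
from collections import Counter


def cell_of(lat: float, lon: float, cell_deg: float) -> tuple:
    """(row, col) index of the equal-angle grid cell containing a point."""
    lon = (lon + 180.0) % 360.0 - 180.0          # normalize to [-180, 180)
    lat = min(max(lat, -90.0), 90.0 - 1e-9)      # keep the north pole in the top row
    row = math.floor((lat + 90.0) / cell_deg)
    col = math.floor((lon + 180.0) / cell_deg)
    return row, col


def bbox_center(north: float, south: float, east: float, west: float) -> tuple:
    """Center of a bounding box, handling boxes that cross the antimeridian."""
    width = east - west
    if width < 0:
        width += 360.0
    return (north + south) / 2.0, west + width / 2.0


def hotspot_counts(bboxes, cell_deg: float) -> Counter:
    """Count searches per cell; bboxes are (north, south, east, west) tuples."""
    counts = Counter()
    for north, south, east, west in bboxes:
        lat, lon = bbox_center(north, south, east, west)
        counts[cell_of(lat, lon, cell_deg)] += 1
    return counts
```

`hotspot_counts(...).most_common(10)` would list the busiest cells. Counting every cell a box overlaps, rather than only its center cell, would weight large boxes more heavily. Which approach is better depends on how the maps are meant to be read.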

Level 2 Map

Next, here is a map showing finer-grained data overlaid on the global map. Searches in North America aren't uniformly distributed across the continent; there are hotspots in the east and west. More search activity also becomes visible in Africa and Australia, along with more detail in Europe and the Arctic.

Level 3 Map

Finally, a closer view of the continental United States and Central America. The grid cells at this level are too small to show clearly on a global-scale map, but they work well at the continental scale, and the hotspots in North America become more clearly defined. The West Coast in general, and particularly the area between San Francisco and Los Angeles, shows high activity. The Denver and Chicago areas also have high activity, and the corridor from Boston through New York to Washington stands out as well.
Level 4 CONUS Map
We can create similar maps for other areas of interest, but I'm leaving them out of this post for brevity. Beyond this level, the grid cells are better suited to county- or city-scale display, so I won't go into that level of detail.
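A quick way to judge what an equal-angle cell covers on the ground is to convert its size in degrees to kilometers. The sketch below assumes a spherical Earth with a mean radius of 6,371 km. The 0.5-degree cell in the usage note is an arbitrary example, not the cell size used for the maps above. It also shows why equal-degree cells are not equal-area: their east-west extent shrinks with the cosine of latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth approximation
KM_PER_DEG = 2 * math.pi * EARTH_RADIUS_KM / 360.0  # about 111.2 km


def cell_ground_size_km(cell_deg: float, lat: float) -> tuple:
    """Approximate (north-south, east-west) extent in km of a cell centered at `lat`."""
    north_south = cell_deg * KM_PER_DEG
    east_west = cell_deg * KM_PER_DEG * math.cos(math.radians(lat))
    return north_south, east_west


ns, ew = cell_ground_size_km(0.5, 40.0)
print(f"{ns:.1f} km x {ew:.1f} km")  # 55.6 km x 42.6 km
```

At 40 degrees north, roughly the latitude of Denver, a hypothetical half-degree cell is already about the size of a large county. This is consistent with finer levels being better suited to county- or city-scale maps.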

That’s it for this week. As always, take a look at the GitHub repository for the project as well as the hpad for more technical details about the analyses shown here. Also, if you haven’t seen it already, take a look at Megan Mach’s update for this week. She’s taken some of the data and graphics I’ve been working on and turned them into some nice (and prettier!) materials for her messaging internship.
