Documenting data quality mechanisms from citizen science project websites – Week 6

Building on the academic papers collected from Scopus and the citizen science project list, my first task this week was to organize the paper coding results into a table. For each unique citizen science project, the table records which data quality mechanisms the project adopted and what details of those mechanisms are mentioned in one or more academic papers. For example, for eBird there are 6 papers reporting 10 different data quality mechanisms in total. As eBird is one of the best-known pioneering citizen science projects, it has more papers, and more data quality mechanisms reported in them, than any other project in the papers we coded. It should be noted, however, that the mechanisms reported in the papers for a project are not necessarily a comprehensive list of what that project does. Projects may have done more to assure and control their data quality than has been written down and published.
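To make the table structure concrete, here is a minimal sketch of how coded rows could be grouped by project. The row layout (project, paper, mechanism, detail), the function name, and all sample values are assumptions for illustration only; they are placeholders, not our actual coding results.

```python
from collections import defaultdict

# Hypothetical coded rows: (project, paper_id, mechanism, detail).
# All values are invented placeholders, not real coding results.
coded_rows = [
    ("Project A", "paper_1", "participant training", "placeholder detail 1"),
    ("Project A", "paper_2", "expert review", "placeholder detail 2"),
    ("Project A", "paper_2", "participant training", "placeholder detail 3"),
    ("Project B", "paper_3", "participant training", "placeholder detail 4"),
]


def build_project_table(rows):
    """Group coded rows so each project maps to its papers and mechanisms."""
    table = defaultdict(lambda: {"papers": set(), "mechanisms": defaultdict(list)})
    for project, paper_id, mechanism, detail in rows:
        entry = table[project]
        entry["papers"].add(paper_id)
        # Keep the paper alongside each detail so every claim stays traceable.
        entry["mechanisms"][mechanism].append((paper_id, detail))
    return table


table = build_project_table(coded_rows)
for project in sorted(table):
    entry = table[project]
    print(project, "-", len(entry["papers"]), "papers,",
          len(entry["mechanisms"]), "mechanisms")
```

Keeping the paper identifier next to each detail means a count such as "6 papers reporting 10 mechanisms" can be recomputed from the rows rather than maintained by hand.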

Second, having documented data quality mechanisms by reviewing published academic papers, I wanted to see how we could document mechanisms for the many projects that may not have a published paper about their data quality. I randomized the order of the projects in the citizen science project list and analyzed the first twenty by carefully reviewing their website content and trying their data collection tasks where I could. I found that although project websites provide some information about the data quality mechanisms these projects adopt, that information is very limited. There is more information on “before data collection” mechanisms than on “during and after data collection” mechanisms. This seems to make sense: the primary motivation of the project scientists, managers, or staff in building a website is to attract more members of the public to contribute, rather than to show how they deal with the data afterward. The most frequently mentioned mechanism I have observed so far is participant training, which is not surprising. Some projects provide aggregated results as feedback to volunteers on their websites, which shows that they use some statistical methods and visualization technologies to handle the data. I infer that if a project uses visualization technologies, it very likely also adopts mechanisms such as data normalization or data mining (this needs to be discussed with my teammates and supported by more evidence).
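The selection step (randomize the list, then take the first twenty) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the seed value, and the placeholder project names are hypothetical, and the actual randomization may have been done differently, for example in a spreadsheet.

```python
import random


def pick_projects_for_review(project_names, n=20, seed=None):
    """Shuffle a copy of the project list and return the first n entries."""
    rng = random.Random(seed)       # a fixed seed makes the draw repeatable
    shuffled = list(project_names)  # copy, so the original order is preserved
    rng.shuffle(shuffled)
    return shuffled[:n]


# Placeholder names standing in for the real citizen science project list.
project_list = [f"project_{i:03d}" for i in range(1, 101)]
selected = pick_projects_for_review(project_list, n=20, seed=6)
print(len(selected), "projects selected for website review")
```

Recording the seed would let a teammate reproduce the same twenty projects when checking the website coding.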

As mentioned above, the information about data quality mechanisms that we can get from academic papers and project websites has limitations. To document citizen science projects' data quality mechanisms more comprehensively, we could consider sending a survey to, or interviewing, a few project coordinators. I therefore developed a survey draft.

Next week, I will continue documenting data quality mechanisms for projects in the citizen science project list but, as we discussed in our weekly project meeting, I will focus on three specific domains: water quality, biodiversity, and geography.
