Documenting data quality mechanisms from citizen science project websites (2) – Week 7

This week I continued documenting data quality mechanisms by visiting and trying out the websites of the first 100 randomly chosen projects from the citizen science project list I compiled earlier. Here are the initial findings. First, the distribution of project disciplines is similar to what we found among the papers retrieved from Scopus (see my post for week 3). More than half of the projects (N = 52) are related to biology and biodiversity conservation, ten projects focus on water quality monitoring, and the rest are about astronomy, chemistry, language, or archaeology, or are platforms that help people build citizen science projects. I did not find a project focusing on geography. However, many biodiversity projects include geographic information as an important part of their data.

Second, in terms of data quality mechanisms, water quality monitoring projects seem to have domain-specific preferences. All ten water quality projects adopt participant training, and quality assurance plans appear only among the water quality monitoring projects. This partially answers the question, and supports the guess, that I raised at the end of my week 4 blog post. For projects in other disciplines, however, I have not observed any obvious preference for particular data quality mechanisms so far.

Third, when trying out the project websites myself, I observed that the level of technology adopted varies a lot from project to project. For data submission, for example, it ranges from purely offline submission, to email submission, to a simple online submission form, to advanced online submission with a smart filter. There seems to be a clear trend toward greater use of computer technologies to assure and control data quality.

Fourth, one water quality project specifically emphasizes the difference between data collection (i.e., collecting water samples offline) and data entry (i.e., digitizing the water quality samples and entering the data into the database). It points out the importance of maintaining the quality of the database by limiting data entry to “a very few volunteers” (http://www.brodheadwatershed.org/SWVolunteerpage.html). This inspired me to rethink the relationship between “collect” and “assure” in the data life cycle when humans act as sensors and data collectors. Should data entry belong to “collect” or “assure”?

This week we also started to think about one or more potential papers, based on the very first draft written by our mentor. We will mainly work on this draft next week, while I continue documenting data quality mechanisms from citizen science project websites.
