Finished the first round of coding – Week 4

My major accomplishment this week is that I finished the first round of coding. Fifty-six papers were coded.

I wrote in my blog last week that most citizen science projects mentioned in the papers I coded in the previous two weeks (the keywords we used to search for papers were “citizen science” and “data quality”) are closely related to natural resources and the life sciences. Those citizen science projects are relatively diverse in topic (e.g., birds, plants, marine animals, water quality), project size, funding source, and so on. The papers I coded this week (the keywords we used to search for papers were “volunteered geographic information” and “data quality”) gave me a very different impression. Because “volunteered geographic information” is a keyword from geography, the scope of the papers is narrowed to the fields of geography and geographic information systems, so all the projects mentioned in the papers focus on solving geography-related problems. The most frequently studied project in those papers is OpenStreetMap. Unlike the diverse citizen science projects I saw in the previous two weeks, about half of the papers I coded this week are about OpenStreetMap. OpenStreetMap is not a pure citizen science project; it is more like a Wikipedia-style platform.

Another thing I noticed is that, unlike the papers I coded in previous weeks, which discussed diverse data quality mechanisms, the papers I coded this week focus on data quality assessment, given that the authors have already obtained very rich data from OpenStreetMap or other comparable sources. Most of them do not care much about how the raw geographic data are collected or how to increase data quality at the point of collection; they care about how to get more useful and credible results from those raw data. Maybe this is because, on the one hand, professional standards defining what counts as quality geographic data (e.g., ISO) have existed for a long time; on the other hand, the raw data are collected by digital sensors, which have high credibility. A large pool of human contributors (i.e., crowdsourcing) is considered by GIS researchers as a way of evaluating the accuracy of the raw data or of other contributors’ data, rather than as a way of collecting first-hand scientific geographic data.

From coding these papers, I felt that concern about the quality of data contributed by crowds is widely shared among researchers across academic fields, but how they deal with this concern, in other words, the methods they adopt for assuring, increasing, or controlling data quality, and the stage of the data lifecycle at which they apply them, could differ significantly. Are there domain-specific data quality concerns and data quality mechanisms? This is a question I want to explore next week.

