Some analysis from the coding and new paper collection from Google Scholar – Week 5

By last week, I had completed the first round of coding for the meta-analysis, and this week I reviewed that earlier coding. Over the last two weeks, our team members have discussed the criteria for accepting papers and the mechanisms for data quality assurance. While reviewing the previous coding results, I focused on applying the new coding scheme that came out of those discussions.

According to the coding results, data quality assurance mechanisms were most frequently used in biology, ecology, computer science, and geographic information science. Biology has 9 papers categorized under the discipline, ecology has 8, computer science has 7, and geographic information science has 8. Most of the papers from biology, ecology, and ornithology were found with the keyword “citizen science”, and most of the papers from geographic information science were found with the keyword “volunteered geographic information science”. Papers from computer science and design science were retrieved with both keywords.

The most dominant mechanism for data quality assurance was participant training. Researchers in 17 papers trained participants to increase data quality. Participant training was mostly used in citizen science; only two papers in the field of volunteered geographic information (VGI) trained participants, although our paper collection may not be extensive enough to generalize. Data normalization was the second most frequent mechanism. In 13 papers, data collected by different participants or with different methods was adjusted to fit the same scale, or unusual reports were filtered out, using standard and advanced statistical techniques. Data normalization showed no bias in usage: it was used evenly in citizen science and VGI. Expert review was also common, with 10 papers drawing on experts’ knowledge and experience. These are only a few examples of statistics from the coding results. More extensive analysis will be done within the next few weeks, and the results will be shared.
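To make the data normalization mechanism more concrete, here is a minimal sketch in Python. It assumes a simple z-score rescaling and a cutoff of 2 standard deviations for flagging unusual reports; the coded papers use a range of techniques, so this is only one illustrative possibility. The function names and the example counts are hypothetical and do not come from any of the papers.

```python
from statistics import mean, pstdev


def zscores(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so none deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def flag_unusual(values, threshold=2.0):
    """Mark each value whose z-score magnitude exceeds the threshold (an assumed cutoff)."""
    return [abs(z) > threshold for z in zscores(values)]


# Hypothetical counts of one species reported by six volunteers at the same site.
counts = [10, 12, 11, 9, 10, 60]
flags = flag_unusual(counts)
unusual_reports = [c for c, is_unusual in zip(counts, flags) if is_unusual]
print(unusual_reports)
```

With these made-up numbers, only the report of 60 is far enough from the others to be flagged. A real project would probably set the cutoff based on the species, site, and season.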

I have also collected a new list of papers from Google Scholar. I tested existing code and programs that extract citation information from Google Scholar and found a program called “Publish or Perish”. This program performed the best in terms of usability and collected the largest amount of data. The program is available for download. The quality of citation information from Scopus and Web of Science (WOS) is known to be higher than that from Google Scholar. However, we could collect only 67 papers from Scopus. With Publish or Perish, we could collect 1,000 papers with the keywords “citizen science” and “data quality”, and 550 papers with the keywords “volunteered geographic information” and “data quality”. We may need to clean the data, as the list may include some redundant papers. Still, this extensive coverage of citation information will give us a far better understanding of the terrain of citizen science and its efforts to assure data quality. Next week, we will review the list from Google Scholar to see how we can clean the data, and we will investigate how to integrate the results from Google Scholar with the results from Scopus.
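One possible way to approach the cleaning and integration step is sketched below in Python. It assumes each exported record is a dictionary with a `title` and a `year` field (hypothetical field names, not the actual export format of either tool) and treats two records as the same paper when their normalized titles and years match. We have not settled on a method yet, so this is only a starting point.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def merge_records(*sources):
    """Merge (source_name, records) pairs, keeping one record per (title, year).

    Sources listed first win, so the higher-quality source should come first.
    """
    merged = {}
    for source_name, records in sources:
        for rec in records:
            key = (normalize_title(rec["title"]), rec.get("year"))
            if key not in merged:
                merged[key] = {**rec, "sources": [source_name]}
            elif source_name not in merged[key]["sources"]:
                merged[key]["sources"].append(source_name)
    return list(merged.values())


# Hypothetical records for illustration only; these are not real search results.
scopus = [{"title": "Data Quality in Citizen Science", "year": 2012}]
scholar = [
    {"title": "Data quality in citizen science.", "year": 2012},
    {"title": "Data quality in citizen science.", "year": 2012},
    {"title": "Assessing VGI Data Quality", "year": 2013},
]

merged = merge_records(("scopus", scopus), ("scholar", scholar))
for rec in merged:
    print(rec["title"], rec["year"], rec["sources"])
```

Listing Scopus first keeps its version of a record whenever both sources contain the same paper. Exact matching on title and year will miss duplicates with differing years or retitled preprints, so some manual review would still be needed.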

