This week, I reviewed how the eBird project has adopted and developed data quality mechanisms over its history. To do this, I collected 25 papers and read them closely. Of these, 16 were relevant and described quality assurance mechanisms in the eBird project. I then coded those papers using the coding scheme that our project team has developed so far.
After coding the eBird-related papers, I found four distinct trends in how the eBird project has applied quality assurance mechanisms. First, the point in time at which eBird applies its quality mechanisms has shifted. In the beginning, the project mainly applied them while data were being collected. For instance, if anomalous data were entered, a quality filter flagged the record so that experts could review it carefully. Now, however, eBird focuses more on processing after collection. For instance, the team developed the SpatioTemporal Exploratory Model (STEM) to make predictions at unsampled locations and times.
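To make the data-entry filter described above more concrete, here is a minimal sketch in Python. It is not eBird's actual implementation: the `Observation` record, the `EXPECTED_MAX` table, and the rule "flag any count above a regional, monthly limit" are all assumptions chosen for illustration, and the species limits are made-up values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One reported sighting (hypothetical record layout)."""
    species: str
    region: str
    month: int
    count: int


# Hypothetical limits: (region, month, species) -> largest count accepted
# without expert review. The values are invented for this example.
EXPECTED_MAX = {
    ("NY", 5, "American Robin"): 50,
    ("NY", 5, "Snowy Owl"): 0,
}


def needs_review(obs, limits=EXPECTED_MAX):
    """Return True if the report exceeds the limit for its region and month.

    Species with no entry in the table get a limit of 0, so any report
    of an unlisted species is flagged.
    """
    limit = limits.get((obs.region, obs.month, obs.species), 0)
    return obs.count > limit


def split_for_review(observations, limits=EXPECTED_MAX):
    """Separate reports into (accepted, flagged_for_expert_review)."""
    accepted, flagged = [], []
    for obs in observations:
        (flagged if needs_review(obs, limits) else accepted).append(obs)
    return accepted, flagged
```

In this sketch the filter only routes unusual records to a reviewer; it does not reject them, which matches the role of the experts described above.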
Second, in the early phase of the project, more human resources, such as regional experts, were used to filter out unusual reports. These regional experts remain a valuable foundation for the quality mechanisms, but more machine-based approaches are used these days. For instance, STEM, a kind of machine learning algorithm, improves the predictive performance of eBird by guiding the sampling process and combining density information with information-theoretic measures.
Third, in the early days the eBird researchers relied more on domain knowledge about birds to filter out unusual reports, whereas they now actively use spatiotemporal statistics in the quality mechanisms. Fourth, the project no longer relies only on homogeneous data sources; it also uses heterogeneous sources to make predictions for areas and times that have not been sampled. For instance, land use information from remotely sensed satellite images, which is not a direct observation by citizen scientists, can help determine whether a specific area is suitable for a given bird species.
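As a rough illustration of how a non-observational source such as land cover might inform suitability, the sketch below computes, for one species, the share of checklists in each land-cover class that reported it. This is a deliberately simple stand-in and not the STEM model; the function name, the input layout, and the class labels are assumptions made for this example.

```python
from collections import defaultdict


def detection_rate_by_land_cover(checklists):
    """Estimate how often a species is reported in each land-cover class.

    `checklists` is an iterable of (land_cover_class, detected) pairs, where
    `detected` is True if the species appeared on that checklist. The
    land-cover class is assumed to come from satellite imagery, not from
    the observer.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for land_cover, detected in checklists:
        totals[land_cover] += 1
        if detected:
            hits[land_cover] += 1
    # Fraction of checklists in each class that reported the species.
    return {lc: hits[lc] / totals[lc] for lc in totals}
```

A class with a high rate suggests suitable habitat, so a report from a class with a rate near zero could be treated as a candidate for closer review.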
Our team has started drafting a paper, and next week we will keep working on ideas for making the most of our experiences in the DataONE summer internship.