For my second week of my internship at DataONE, I continued the meta-analysis by calibrating a sample Excel worksheet for the analysis and reading papers. Communicating through email threads, our team shared a variety of ideas for improving the worksheet. For my part, I read 25 papers and scored each on whether it should be included in our meta-analysis and on which mechanisms were used to improve data quality in the study. For further discussion, we left extra notes recording our reasoning whenever a decision was unclear. This reading and preliminary evaluation will be a great asset for reaching agreement on coding papers for inclusion or exclusion. For instance, whether to include simple review papers that propose no new idea or application is a critical decision we will need to make in the near future. The same dataset or project has sometimes been reviewed by many different authors, so whether to allow duplicated material will be another important issue in our research.
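One common way to check how well two coders agree on inclusion/exclusion decisions is Cohen's kappa, which corrects raw agreement for chance. The sketch below is a minimal, self-contained Python illustration with entirely hypothetical codes (1 = include, 0 = exclude) for eight papers; it is not data from our actual worksheet.

```python
# Inter-rater agreement for inclusion/exclusion coding.
# All data below are hypothetical, for illustration only.
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' binary include (1) / exclude (0) codes."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n  # proportion rater A coded "include"
    p_b = sum(rater_b) / n  # proportion rater B coded "include"
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    return (observed - expected) / (1 - expected)

# Hypothetical codes for eight papers from two team members
a = [1, 1, 0, 1, 0, 0, 1, 0]
b = [1, 0, 0, 1, 0, 1, 1, 0]
print(round(cohens_kappa(a, b), 2))  # → 0.5
```

A kappa around 0.5 would signal only moderate agreement, which is exactly the situation where the extra decision notes we kept become useful for reconciling coding rules.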
I also kept exploring tools for scraping citation information from Google Scholar. As I mentioned to our group, Google sometimes blocks the IP address of the machine on which I am running my Python code. I have not yet worked out the full pipeline for extracting citation information, and since we cannot control Google's blocking, the resulting citation lists may be incomplete. I also started reading the manual for R, a programming environment for statistical data analysis and graphics, and running sample code, since we will use R to analyze and graph our meta-analysis results. I think our group is off to a good start, and the conversations with our members have always been motivating and productive.
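One standard way to reduce the chance of an IP block when scraping is to space out requests and back off exponentially after failures. The sketch below is a minimal Python illustration of that idea, not the actual pipeline I am using; the fetching function is generic and assumes nothing about Google Scholar's page structure.

```python
# Polite, rate-limited fetching: sketch of exponential backoff with jitter.
import random
import time
import urllib.error
import urllib.request

def backoff_delay(attempt, base=2.0, cap=60.0):
    """Delay before retry number `attempt`: 2, 4, 8, ... seconds,
    capped at `cap`, with random jitter so retries do not synchronize."""
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)

def fetch_politely(url, max_attempts=5):
    """Fetch a page, sleeping progressively longer after each HTTP error."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError:
            time.sleep(backoff_delay(attempt))
    return None  # give up; an incomplete list beats a long-lasting block
```

Even with pacing like this, a block can still occur, which is why I noted above that our citation lists may end up incomplete.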
In case it’s relevant, you might have a look at a data quality evaluation framework that I’ve been working on, to allow consumers to gather up others’ perspectives on the datasets that they’re interested in. It exposes those evaluations as Linked Data (though, the datasets themselves need not be Linked Data…).
HTH, https://github.com/timrdf/DataFAQs/wiki.
Regards,
Tim