I have done a cursory exploration of some potential starting points for this project. Here are my notes, which are a bit scattered. Let me know if you have questions.
Ideas on where to start:
- Literature search on other estimates of data
- Definitions of data
- NEON: how they define data, estimate how much data they will produce, etc.
- Think about the timeline: how much will the amount of data change over time? What was it like in the past?
- Individual researchers: how much data do they generate? How much of the data is published versus how much is generated?
- Search for databases, database capacity
- Google search for definitions of data
- Comp sci textbooks?
- Data management (or database management) textbooks?
- Ecology/environmental biology textbooks?
- From libraries? (Dartmouth example)
Websites to check out:
- http://www.lesk.com/mlesk/ksg97/ksg.html
- http://news.bbc.co.uk/2/hi/technology/3227467.stm
- http://www.optitek.com/
- http://itknowledgeexchange.techtarget.com/whatis/how-much-digital-data-is-there-in-the-world-soon-to-pass-the-zettabyte-mark/
- Study at Berkeley; estimated amount of new info created each year
- Wiki entry on exabyte
- Something I found on Google
Papers to check out:
- [Bell 1994]. Alan Bell; IBM Academy Digital Library Workshop (Sept 12-13, 1994).
- [Census 1995]. United States Census Bureau; Statistical Abstract of the United States, Government Printing Office (1995).
- [Fargion 1996]. G. S. Fargion, R. Harberts, and J. G. Masek; “An Emerging Technology Becomes an Opportunity for EOS,” from the online file; see the URL: http://ecsinfo.hitc.com/cdwg/datamining/overview.html.
- [Landauer 1986]. T. K. Landauer; “How much do people remember? Some estimates of the quantity of learned information in long-term memory,” Cognitive Science, 10 (4) pp. 477-493 (Oct-Dec 1986).
- [Louis 1996]. Steve Louis; “Cooperative High-Performance Storage in the Accelerated Strategic Computing Initiative,” 5th NASA Goddard Conference on Mass Storage Systems and Technologies (Sept. 17-19, 1996). As reported by Ron Van Meter, http://www.isi.edu/~rdv/conferences/goddard96.html.
- [Markoff 1997]. John Markoff; “When Big Brother is a Librarian,” The New York Times pp. 3, sec. 4 (March 9, 1997).
- [Mauldin 1995]. Matt Mauldin, “Measuring the Web with Lycos,” Third International World-Wide Web Conference, April 1995.
- [Mills 1996]. Mike Mills; “Photo Opportunity,” Washington Post pp. H01 (January 28, 1996)
- [Radding 1990]. Alan Radding; “Putting data in its proper place,” Computerworld pp. 61 (August 13, 1990).
- [Tenopir 1997]. Carol Tenopir, and Jeff Barry; “The Data Dealers,” Library Journal pp. 28-36 (May 15, 1997).
- [UNESCO 1995]. UNESCO Statistical Yearbook, Bernan Press (1995).
- [Wells 1938]. H. G. Wells; World Brain, Methuen (1938).
- [Hilbert 2011]. Martin Hilbert and Priscila López; “The World’s Technological Capacity to Store, Communicate, and Compute Information,” Science pp. 60-65 (1 April 2011). Published online 10 February 2011 [DOI:10.1126/science.1200970].
These papers are all on the server for you to access in a folder called “Reprints”. The file names follow the pattern [last name of first author][last two digits of publication year], e.g., Bollier10.pdf.
- [1] D. Bollier. The Promise and Peril of Big Data. Technical report, The Aspen Institute, 2010.
- [2] S. Carlson. Lost in a sea of science data. The Chronicle of Higher Education, 52(42):A35, June 23, 2006.
- [3] C. Doctorow. Big data: Welcome to the petacentre. Nature, 455(7209):16–21, Sept. 2008. PMID: 18769411.
- [4] P. B. Heidorn. Shedding light on the dark data in the long tail of science. Library Trends, 57(2):280–299, Fall 2008.
- [5] D. Howe, M. Costanzo, P. Fey, T. Gojobori, L. Hannick, W. Hide, D. P. Hill, R. Kania, M. Schaeffer, S. S. Pierre, S. Twigger, O. White, and S. Y. Rhee. Big data: The future of biocuration. Nature, 455(7209):47–50, 2008.
- [6] C. Lynch. Big data: How do your data grow? Nature, 455(7209):28–29, 2008.
- [7] O. J. Reichman, M. B. Jones, and M. P. Schildhauer. Challenges and Opportunities of Open Data in Ecology. Science, 331(6018):703–705, February 2011.
- [8] V. S. Smith. Data publication: towards a database of everything. BMC Research Notes, 2(1):113, 2009.
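
If it helps when looking for a file in the “Reprints” folder, the naming pattern described above can be sketched in a few lines of Python. This is only an illustration: the function name `reprint_filename` is made up here, and zero-padding the year (e.g. “08”) and dropping punctuation from names are assumptions. Only Bollier10.pdf is confirmed by the notes.

```python
import re


def reprint_filename(first_author_last_name: str, year: int) -> str:
    """Build a file name such as 'Bollier10.pdf' from a last name and a year."""
    # Assumption: keep letters only, so spaces or punctuation in a name are dropped.
    name = re.sub(r"[^A-Za-z]", "", first_author_last_name)
    # Assumption: the two-digit year is zero-padded (2008 -> "08").
    return f"{name}{year % 100:02d}.pdf"


print(reprint_filename("Bollier", 2010))  # matches the example given in the notes
print(reprint_filename("Heidorn", 2008))  # inferred from the pattern, not confirmed
```

If a file does not turn up under the inferred name, browse the folder directly, since the exact conventions beyond the one example are a guess.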
