Understanding how scientists analyze data
Scientists use a wide variety of tools and techniques to manage and analyze data. However, to our knowledge no one has taken a systematic look at how scientists do their work. In this project, we will examine a large number of the scientific workflows that have been constructed. We will develop a way of categorizing workflows based on their complexity, types of processing steps employed, and other factors. The goal is to develop new and significant understanding of the scientific process and how it is being enabled by science workflows.
In particular, the research will examine the use, complexity, and user base of the workflow systems Taverna and Kepler, drawing on openly available data and a literature review. Much of the work uses open repositories of existing workflows, such as those found on the Kepler and Taverna websites and on other sites such as myExperiment.
For more information on what workflows are and why and how scientists use them, consult this page on the Taverna site.
This work will be undertaken by several established scientists in the fields of bioinformatics and ecological science. The following researchers are working on this project:
- William Michener – Primary Mentor (University of New Mexico)
- Bertram Ludäscher – Mentor (University of California, Davis)
- Rebecca Koskela – Mentor (University of New Mexico)
- Karthik Ram – Mentor (University of California, Berkeley)
In addition, one full-time intern is taking part in the project: Richard Littauer, a graduate of the University of Edinburgh, going on to the Universität des Saarlandes next year. He is responsible for much of the research and for the content and upkeep of this site.
The program began on May 23rd and will continue until July 30th. Work will be presented at the DataONE All-Hands-Meeting at the University of New Mexico on October 18-20.
The Data Observation Network for Earth (DataONE) is a virtual organization, supported by the U.S. National Science Foundation, dedicated to providing open, persistent, robust, and secure access to biodiversity and environmental data. For more information, consult the DataONE site.
DataONE is predicated on openness and universal access. Software is developed under one of several open source licenses, and copyrightable content produced during the course of the project will be made available under a Creative Commons (CC-BY 3.0) license.
The Summer Internships are supported by the National Science Foundation: “INTEROP: Creation of an International Virtual Data Center for the Biodiversity, Ecological and Environmental Sciences” (NSF Award 0753138) and “DataNet Full Proposal: DataNetONE (Observation Network for Earth)” (NSF Award 0830944).
For More Information
For information about this project, please direct emails to richard [dot] littauer [at] gmail [dot] com.