On 2016-01-28, Raul Palma presented “Aspects of Reproducibility in Earth Science” at the Dagstuhl Seminar 16041 “Reproducibility of Data-Oriented Experiments in e-Science”:

The “Earth Science Research and Information Lifecycle” can be regarded as a continuous, iterative and ongoing process used by scientists for conducting, validating and disseminating scientific knowledge. It can undergo an unlimited number of iterations that lead to the development of new and innovative ideas, concepts, techniques and technologies, which ultimately benefit both science and society. The life cycle can be briefly summarized into four main phases that involve multiple categories of stakeholders:

  1. scientists access information and (usually) share results;
  2. shared results and information are analysed, and interpretative models are generated and discussed with other colleagues;
  3. discussions lead to novel ideas and concepts which might need validation through further experimentation or data acquisition;
  4. new results are validated and shared so that other scientists can access them and start the process again.

This presentation introduces the ongoing work of the EU project EVEREST, which aims at establishing a Virtual Research Environment (VRE) e-infrastructure for Earth Science. The VRE is being validated in four communities: sea monitoring, natural hazards, land monitoring and supersites. It applies Research Object concepts and technologies as the means for sharing information and establishing more effective collaboration in the VRE. Regarding reproducibility, these communities have a slightly different vision than disciplines such as the experimental sciences, which often aim at testing a hypothesis. For instance, the Supersites community can be described as a historical science that is mostly based on past observations.

For such a community, the main goals involve measuring geophysical parameters in the natural environment, deriving information on the effects of the phenomena, modelling this information to generate space/time representations, and providing these representations to risk management and other relevant stakeholders. Only as a complement may scientists use this information to develop theories or confirm hypotheses.

Hence, in such communities, reproducibility is mainly concerned with the execution of common or community-agreed workflows for data analysis and modelling, and for testing algorithms and data products. Nevertheless, several limitations still hinder reproducibility in these communities: they do not yet use formalised (computational) workflows, the necessary data is not always available or known, and workflows usually require considerable human intervention, among others. These are some of the challenges currently being addressed in EVEREST.

Abstract by Raul Palma, distributed under Creative Commons Attribution 3.0 license, typographically adapted from Dagstuhl Report 16041 section 4.5 (https://doi.org/10.4230/DagRep.6.1.108) by Stian Soiland-Reyes.