2016-01-28 Aspects of Reproducibility in Earth Science

On 2016-01-28, Raul Palma presented Aspects of Reproducibility in Earth Science at the Dagstuhl Seminar 16041 Reproducibility of Data-Oriented Experiments in e-Science:

**Aspects of Reproducibility in Earth Science** from Raul Palma

The “_Earth Science Research and Information Lifecycle_” can be regarded as a continuous, iterative and ongoing process used by scientists for conducting, validating and disseminating scientific knowledge. It can undergo an unlimited number of iterations that lead to the development of new and innovative ideas, concepts, techniques and technologies, which ultimately benefit both science and society. The life cycle can be briefly summarized into four main phases that involve multiple categories of stakeholders:

1. scientists access information and (usually) share results;
2. shared results and information are analysed and interpretative models are generated and discussed with other colleagues;
3. discussions lead to novel ideas and concepts which might need validation through further experimentation or data acquisition;
4. new results are validated and shared so that other scientists can access them and start the process again.

This presentation introduces the ongoing work of the [EU project EVEREST](http://ever-est.eu/) that aims at establishing a _Virtual Research Environment_ (VRE) e-infrastructure for Earth Science. The VRE is being validated in four communities: sea monitoring, natural hazards, land monitoring and supersites, and is applying the Research Objects concepts and technologies as the means for sharing information and establishing more effective collaboration in the VRE. Regarding reproducibility in their domain, these communities have a slightly different vision from that of other disciplines, such as experimental science, which often aims at testing a hypothesis. For instance, the Supersites community can be described as a historical science that is mostly based on past observations.
For such a community, the main goals involve measuring geophysical _parameters_ in the natural environment, deriving information on the _effects_ of the phenomena, _modelling_ this information to generate _space/time_ representations and providing these representations to _risk management_ and other relevant stakeholders; only as a complement may scientists use this information to develop _theories_ or confirm _hypotheses_. Hence, in such communities, **reproducibility** is mainly concerned with the _execution_ of common or community-agreed _workflows_ for data analysis and modelling, and with _testing_ algorithms and data products. Nevertheless, there are still several limitations to achieving reproducibility in these communities: they are not yet using formalised (computational) workflows, the necessary data are not always available or known, workflows usually require considerable human intervention, etc. These are some of the challenges currently being addressed in EVEREST.

_Abstract by Raul Palma, distributed under Creative Commons Attribution 3.0 license, typographically adapted from Dagstuhl Report 16041 section 4.5 (https://doi.org/10.4230/DagRep.6.1.108) by Stian Soiland-Reyes._

Tags: Dagstuhl