On 2017-09-27 I met with Naomi Penfold and Giuliano Maciocci from the journal eLife, Oliver Buchtala and Michael Aufreiter working on Substance, and Nokome Bentley doing Stencila.

Our topic was the eLife/Substance/Stencila collaboration for a Reproducible Document Stack, and how that can relate to Research Objects and other Scholarly HTML initiatives.

This blog post gives a quick summary of our cross-presentations.

Stencila is an interactive editor for creating academic manuscripts using a combination of Markdown and live code (e.g. R). It is not dissimilar to Jupyter Notebook, but I guess a key difference is that the goal here is a paper-like manuscript rather than a collection of code – in Stencila the code is hidden by default, but can be revealed to explain how a figure was made. This YouTube video explains it better than me:

Substance’s JavaScript library is used by Stencila’s editor, and Substance developers are now looking at representing the embedded code as part of JATS XML, the markup standard commonly used by publishers when publishing research manuscripts, e.g. to create HTML, PDF, ePubs and searchable metadata:

There could be a series of such .xml files, as well as script source files, data and graphics in their native formats like .py, .csv and .png.

Now the question naturally arises – how do you move these things around? Their initial idea is that of an Open Container for Reproducible Portable Publications (RPP). Here there could be many overlaps with our Research Object models and other ongoing research on scholarly communication, so we had a fruitful telcon with the eLife/Substance/Stencila folks to look at reusing existing standards and formats like PROV and our RO manifest.

I then presented my slides on Research Objects:

I started with an overview of ongoing efforts in scholarly publication, like Scholarly HTML, RASH (https://doi.org/10.7717/peerj-cs.132), Dokieli (https://doi.org/10.1007/978-3-319-60131-1_33), Linked Research, a recent hefty thread in the Scholarly HTML Community Group, and even ePub, which is being reworked as part of W3C’s Web Publications for the Open Web Platform.

The second half of my presentation showed how Research Objects combine existing Web standards like JSON-LD, schema.org (and the ELIXIR-backed Bioschemas), W3C PROV, the W3C Web Annotation Data Model and the OAI-ORE model for aggregations. We formalized this in the Research Object ontologies (https://doi.org/10.1016/j.websem.2015.01.003) and specified how to serialize a Research Object to an archive file, either as an RO Bundle ZIP file (https://doi.org/10.5281/zenodo.12586) or as an RO BagIt archive (https://doi.org/10.1109/BigData.2016.7840618).
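
To make the RO Bundle idea a bit more concrete, here is a minimal sketch in Python of how a manuscript, a script and its data could be packaged. It follows my reading of the RO Bundle specification (a `mimetype` entry stored first and uncompressed, plus a JSON-LD manifest at `.ro/manifest.json` listing the aggregated resources). The file names and the helper functions `build_manifest` and `write_bundle` are made up for illustration, and this is not what eLife, Substance or Stencila have implemented.

```python
import io
import json
import zipfile

# Media type and manifest location as I understand them from the RO Bundle spec.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"


def build_manifest(paths):
    """Return a minimal JSON-LD manifest aggregating the given bundle paths."""
    return {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(paths)],
    }


def write_bundle(files):
    """Pack a dict of {path: bytes} into an RO Bundle-style ZIP, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        # The mimetype entry goes first and is stored without compression,
        # so that tools can sniff the container type from the first bytes.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for path, data in files.items():
            bundle.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
        manifest = json.dumps(build_manifest(files), indent=2)
        bundle.writestr(MANIFEST_PATH, manifest, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


# Hypothetical content: a JATS manuscript, the script behind a figure, and its data.
example_files = {
    "manuscript.xml": b"<article/>",
    "figure1.py": b"print('plot')\n",
    "data.csv": b"x,y\n1,2\n",
}
example_bundle = write_bundle(example_files)
```

A real manifest would also carry provenance (for instance who created each resource and when) and annotations, which is where PROV and the Web Annotation Data Model come in; the sketch only shows the aggregation part.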

We agreed that there are many benefits to working together, and so we intend to meet up again to flesh out technical details like the manifest, while the eLife collaboration gathers more requirements about what information publishers want and what is achievable to capture from authors.