Workshop on Research Objects 2019


Peer Review of RO-18

Review 1

Quality of Writing

Is the text easy to follow? Are core concepts defined or referenced? Is it clear what the author’s contribution is?

Research Object / Zenodo

- URL for a Research Object or Zenodo record provided?
- Guidelines followed?
- Open format (e.g. HTML)?
- Sufficient metadata, e.g. links to software?
- Some form of Data Package provided?

Add text below if you need to clarify your score.


Overall evaluation

Please provide a brief review, including a justification for your scores. Both score and review text are required.


The abstract describes an interesting real-world solution for the distribution of datasets that are managed via manifests. I think this poster will provide some interesting discussion topics, especially given that its design has been influenced by scientists in cell science. I think good figures and a running example will be useful for helping others understand the solution.

As the abstract is quite well structured and has 6 references, it would be good if these also appeared as “Related identifiers” in the Zenodo record, or were at least hyperlinked from the PDF.

A link to the Dataverse software would also fit as a Related Identifier.

The Zenodo record does not specify ORCID identifiers for any of the authors.

Overall evaluation

This abstract describes how Dataverse has been extended to support the RDA data packaging recommendations for archiving, along with recommended standards like BagIt. Structured metadata is exemplified with the use of OAI-ORE combined with other vocabularies.
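To make the BagIt side of this concrete, here is a minimal sketch of building and verifying a bag. The file names and payload are hypothetical illustrations; the layout (`bagit.txt`, `data/` payload directory, `manifest-sha256.txt` with per-file checksums) follows the BagIt specification, and a real Dataverse archival bag would additionally carry the OAI-ORE map described in the abstract.

```python
import hashlib
import tempfile
from pathlib import Path

# Build a minimal BagIt-style bag (hypothetical content for illustration).
bag = Path(tempfile.mkdtemp())
(bag / "data").mkdir()
payload = bag / "data" / "table.csv"
payload.write_text("file,checksum\n")
(bag / "bagit.txt").write_text(
    "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"
)
digest = hashlib.sha256(payload.read_bytes()).hexdigest()
(bag / "manifest-sha256.txt").write_text(f"{digest}  data/table.csv\n")

def bag_is_valid(bag_dir: Path) -> bool:
    """Re-hash every payload file listed in the manifest and compare."""
    for line in (bag_dir / "manifest-sha256.txt").read_text().splitlines():
        expected, rel = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel).read_bytes()).hexdigest()
        if actual != expected:
            return False
    return True

print(bag_is_valid(bag))  # → True
```

The fixity check above is exactly the role the manifest’s checksums play in archival transfer: a receiver can validate the bag without any out-of-band information.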

Beyond checksums, the text does not describe much about where the metadata comes from — presumably from Dataverse registration fields? Thoughts on how this could be used or expanded would be interesting, particularly whether Dataverse could also import such metadata from uploaded data packages (e.g. Dataverse-to-archive-to-Dataverse transfer).

I think this poster would be a strong fit for the RO2019 workshop, in particular for data packaging, infrastructure and metadata. It would be nice if the authors would also be able to demo the software during the unconference session.

Review 2

Quality of Writing

Research Object / Zenodo

- [x] URL for a Research Object or Zenodo record provided?
- [x] Guidelines followed?
- [ ] Open format (e.g. HTML)?
- [ ] Sufficient metadata, e.g. links to software?
- [ ] Some form of Data Package provided?

The abstract is technically not Open Access, as it is uploaded to Zenodo with a custom BSD-like license that “prohibits redistribution and use for commercial purposes without further permission”. This is the same license as the described Quilt3Distribute source code, which I could not find on the list of OSI-approved open source licenses.

The authors should re-submit the abstract under an Open Access license like CC-BY-4.0 that does not restrict commercial use of the text, and clarify within the text that the software is not Open Source (although the source code is available).

Overall evaluation

This abstract describes how datasets that are typically tabular can also be described in tabular form, using local file name references to the “actual” datasets. (This kind of manifest is also found in ISA-Tab and in SEEK data registration using RightField.)

I am not sure if tabular here means CSV files or spreadsheets in formats like .xlsx, presumably the latter due to the mention of dealing with the distinction between integer and string values.

There are not many details provided about the Quilt Package manifest format produced, how packaging and data transfer are done in practice, which metadata fields are commonly used, or whether any external vocabularies are re-used. This should be shown on the poster.

The Quilt software and its existing practical use does sound like an interesting piece to demo at the Research Object workshop, as it deals with producing data packages in a “researcher friendly” way.

The abstract does not describe how the manifests/metadata are subsequently consumed or queried — are they mainly for humans, or are they also consumed programmatically across all datasets?