RO-Crate Profiles
While RO-Crates can be considered general-purpose containers of arbitrary data and open-ended metadata, in practical use within a particular domain, application or framework, it will be beneficial to further constrain RO-Crate to a specific profile: a set of conventions, types and properties that one minimally can require and expect to be present in that subset of RO-Crates.
Defining and conforming to such a profile enables reliable programmatic consumption of an RO-Crate’s content, as well as consistent creation. For example, a form in a user interface can firmly suggest the required types and properties, and a rendering of an RO-Crate can more easily build rich UI components if it can reliably assume, for instance, that a Person always has an affiliation to an Organization which has a url. Such a restriction may not be appropriate for all types of RO-Crates.
An RO-Crate profile can also lock down serialization expectations, for instance by requiring a particular version of RO-Crate, a particular JSON-LD context, or a particular packaging such as .zip or BagIt.
For more examples, see the section Profiles of RO-Crate in use in the recent RO-Crate paper.
Workflow RO-Crate profile
Workflow RO-Crate is a profile of RO-Crate 1.1, which requires at least one data entity that is a ComputationalWorkflow. This workflow must also be indicated with mainEntity from the Root Dataset. The Workflow RO-Crate profile further recommends how to accompany the native workflow definition with an abstract CWL description and a diagram, and defines the meaning of the named subfolders test and examples.
Profiles can also provide recommendations; for instance, Workflow RO-Crate has a list of recognized workflow languages and licenses. Serialization of Workflow RO-Crates is restricted to ZIP archives with a filename ending in .crate.zip.
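The two hard requirements above (a mainEntity that is a ComputationalWorkflow, and a .crate.zip filename) can be sketched as a small check. This is an illustrative sketch only, not the WorkflowHub implementation: the function name check_workflow_crate is hypothetical, and it assumes the metadata has already been parsed into a Python dict with a flat @graph in which the root dataset has @id "./".

```python
from typing import Any


def as_list(value: Any) -> list:
    """Normalise a JSON-LD value that may be missing, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_workflow_crate(metadata: dict, filename: str) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if not filename.endswith(".crate.zip"):
        problems.append("archive name should end with .crate.zip")

    entities = {e["@id"]: e for e in metadata.get("@graph", [])}
    root = entities.get("./")  # assumption: root dataset uses @id "./"
    if root is None:
        return problems + ["no root dataset with @id './'"]

    main_refs = as_list(root.get("mainEntity"))
    if not main_refs:
        problems.append("root dataset has no mainEntity")
    for ref in main_refs:
        ref_id = ref.get("@id") if isinstance(ref, dict) else ref
        workflow = entities.get(ref_id, {})
        if "ComputationalWorkflow" not in as_list(workflow.get("@type")):
            problems.append(f"{ref_id} is not a ComputationalWorkflow")
    return problems
```

An empty result only means these two conventions hold; the recommendations of the profile (diagram, CWL description, subfolders) are not inspected here.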
This profile of RO-Crate is used by the WorkflowHub both for downloadable archives of workflow entries and their metadata, and for manual and programmatic upload of workflows, such as those generated by galaxy2cwl. The associated service Life Monitor uses the profile as a submission format for workflow test suites, and the workflow execution service WfExS accesses Workflow RO-Crates through the hub’s implementation of the Tool Registry Service API to pick up the workflow to be executed.
In summary the Workflow RO-Crate profile covers as of 2021-03-09:
- Dataset (RO-Crate Root)
  - hasPart:
    - File,SoftwareSourceCode,ComputationalWorkflow (required)
    - File,SoftwareSourceCode,HowTo
    - "README.md": CreativeWork ?
    - "test": Dataset
    - "examples": Dataset
  - mainEntity:
    - File,SoftwareSourceCode,ComputationalWorkflow (required)
    - File,SoftwareSourceCode,HowTo
  - license: (Text|CreativeWork) ?
    - "AFL-3.0"
    - "Apache-2.0"
    - "BSD-3-Clause"
    - …
  - name: Text
  - description: Text
  - author: (Person|Organization) ??
  - keywords: Text
- File,SoftwareSourceCode,ComputationalWorkflow
  - programmingLanguage: ComputerLanguage (required)
  - subjectOf: File,SoftwareSourceCode,HowTo
  - image: File,ImageObject
- File,SoftwareSourceCode,HowTo
- File,ImageObject
- ComputerLanguage recommends instances:
  - "@id": "#cwl"
  - "@id": "#galaxy"
  - "@id": "#knime"
  - "@id": "#nextflow"
  - "@id": "#snakemake"
An experimental Profile Crate has been created in preparation for formalized RO-Crate profiles (draft for next release of specification).
Workflow Testing RO-Crate profile
Workflow Testing RO-Crate is a specialization of Workflow RO-Crate used by Life Monitor to support the submission of test suites for computational workflows.
This profile is an RO-Crate extension that employs additional terms from the ro-terms test namespace:
[
  "https://w3id.org/ro/crate/1.1/context",
  {
    "TestSuite": "https://w3id.org/ro/terms/test#TestSuite",
    "TestInstance": "https://w3id.org/ro/terms/test#TestInstance",
    "TestService": "https://w3id.org/ro/terms/test#TestService",
    "TestDefinition": "https://w3id.org/ro/terms/test#TestDefinition",
    "PlanemoEngine": "https://w3id.org/ro/terms/test#PlanemoEngine",
    "JenkinsService": "https://w3id.org/ro/terms/test#JenkinsService",
    "TravisService": "https://w3id.org/ro/terms/test#TravisService",
    "GithubService": "https://w3id.org/ro/terms/test#GithubService",
    "instance": "https://w3id.org/ro/terms/test#instance",
    "runsOn": "https://w3id.org/ro/terms/test#runsOn",
    "resource": "https://w3id.org/ro/terms/test#resource",
    "definition": "https://w3id.org/ro/terms/test#definition",
    "engineVersion": "https://w3id.org/ro/terms/test#engineVersion"
  }
]
The most recent version of the context is at https://github.com/ResearchObject/ro-terms/tree/master/test.
A Workflow Testing RO-Crate is essentially a Workflow RO-Crate with additional specification on how to structure test suites and refer to them from the root dataset:
{
  "@id": "./",
  "@type": "Dataset",
  "mentions": [
    {
      "@id": "#test1"
    },
    {
      "@id": "#test2"
    }
  ],
  ...
}
More details are available from the spec page.
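As a rough illustration of how a consumer might follow these references, the sketch below collects the entities that the root dataset points to with mentions and keeps those typed as TestSuite. It assumes the metadata is already parsed into a dict with a flat @graph, that the root dataset has @id "./", and that the mentioned test suites carry the TestSuite type from the context above; the function name find_test_suites is hypothetical.

```python
def find_test_suites(metadata: dict) -> list[dict]:
    """Return TestSuite entities referenced via `mentions` from the root dataset."""
    entities = {e["@id"]: e for e in metadata.get("@graph", [])}
    root = entities.get("./", {})  # assumption: root dataset uses @id "./"

    mentions = root.get("mentions", [])
    if isinstance(mentions, dict):  # a single reference is not wrapped in a list
        mentions = [mentions]

    suites = []
    for ref in mentions:
        entity = entities.get(ref.get("@id"))
        if entity is None:
            continue  # dangling reference; a stricter checker might report it
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if "TestSuite" in types:
            suites.append(entity)
    return suites
```

Other mentioned entities are silently skipped, since mentions is a general-purpose schema.org property and may point at things that are not test suites.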
Workflow Run Crate
The Workflow Run Crate working group has developed three profiles to capture the provenance of the execution of computational workflows and scripts. Join the RO-Crate community to help contribute!
- Process Run Crate can be used to describe the execution of one or more tools that contribute to the same computation;
- Workflow Run Crate is similar to Process Run Crate, but assumes that the coordinated execution of the tools is driven by a computational workflow;
- Provenance Run Crate extends Workflow Run Crate with guidelines for describing the internal details of each step of the workflow.
Common Provenance Model RO-Crate profile
The Common Provenance Model (CPM) provides a baseline for distributed provenance chains based on W3C PROV. The Common Provenance Model RO-Crate profile specifies how to identify and handle CPM-compliant provenance files and CPM-compliant meta-provenance files in an RO-Crate. It is compatible with the Workflow Run Crate profile.
Describo profiles
Describo makes available all of Schema.org by default but users can also choose a pre-defined profile including an RO-Crate v1.1 profile, which informs the user interface which entity types and properties should be offered and requested when a user is editing an RO-Crate.
The RO-Crate profile covers the recommendations in the RO-Crate 1.1 specification, adding UI details such as value type, labels and a description for each field.
In addition, a Domain-specific profile can be created as JSON and used by Describo to customize the selection of types and properties, including adding additional schema.org types, third-party vocabularies and inline ad-hoc term definitions.
In summary the default profile covers:
- Dataset (e.g. RO-Crate Root)
  - name: Text (required)
  - description: Text
  - license: CreativeWork
  - datePublished: Date
  - publisher: Publisher (required; ad-hoc type?)
  - hasPart: (File|Dataset|Workflow|RepositoryCollection|RepositoryObject)
RELIANCE RO-Crate profile
RELIANCE RO-Crates are a specialization of RO-Crate for packaging data cubes that enable access to Earth observation data, along with the necessary and related artifacts such as documentation, images and related infrastructures.
The RELIANCE project uses RELIANCE RO-Crates as an exchange format to package data cubes in Earth Science, used as import/export by the ROHub portal.
References:
- Metadata models and Research Objects for Earth Observation Data Cubes at EOSC Symposium 2021, 2021-06-18. [video recording]
- RELIANCE deliverable D5.1 RO Model Adapted to EOSC
PARADISEC profile
The PARADISEC Describo profile is built into Describo and is the basis for the PARADISEC RO-Crates exposed in the Modern PARADISEC demonstrator to annotate and expose digital cultural heritage records.
In summary the PARADISEC profile covers as of Describo 0.13.0:
- Collection
  - additionalType: Value
  - name: Text (required)
  - description: Text (required)
  - contributor: Person (required)
  - comments: Text (ad-hoc term??)
  - dateCreated: Date
  - dateModified: Date
  - depositFormReceived: Date (ad-hoc term)
  - license: License (ad-hoc type)
  - media: Text (ad-hoc term)
  - orthographicNotes: Text (ad-hoc term)
  - private: Text (ad-hoc term)
  - contentLanguages: Language (PARADISEC term)
  - subjectLanguages: Language (ad-hoc term??)
  - contentLocation: Place
- Item (unknown type)
  - additionalType: Value
  - name: Text (required)
  - description: Text (required)
  - contributor: Person (required)
  - dateCreated: Date
  - dateModified: Date
  - adminComment: Text (ad-hoc term)
  - dialect: Text (ad-hoc term)
  - digitisedOn: Date
  - discourseType: Text (ad-hoc term)
  - external: Text (ad-hoc term)
  - bornDigital: Text (ad-hoc term)
  - citeAs: Text (ad-hoc term)
  - ingestNotes: Text (ad-hoc term)
  - languageAsGiven: Text (ad-hoc term)
  - license: License
  - hasPart: File
  - contentLanguages: Language (ad-hoc term??)
  - subjectLanguages: Language (ad-hoc term??)
  - private: Text (ad-hoc term)
  - originatedOn: Date (ad-hoc term)
  - originatedOnNarrative: Text
  - receivedOn: Text
  - publisher: Organization
  - contentLocation: Place
- GeoBox (ad-hoc type)
  - name: Text
  - box: Text
- License (ad-hoc type)
  - name: (selection of 4 values)
  - description: Text
This is a good example of how a specific profile can guide a user interface and extend RO-Crate with additional terms.
Note that some RO-Crates in the Modern PARADISEC demonstrator have evolved from this profile to conform with RO-Crate 1.1’s repository content types.
ARC RO-Crate profile
A profile of RO-Crate for Annotated Research Contexts (ARC), developed by DataPLANT. An ARC consists of ISA metadata describing the experimental setup and computational workflows given in CWL. The current profile requires the crate to follow the ISA Investigation profile on the top level. In the future, the ARC profile will be extended to not only cover the ISA part of an ARC, but also computational workflows, following the existing profiles for this kind of data.
How such an RO-Crate can be generated from an ARC is described in the arc-to-rocrate repository, which also contains scripts to perform the conversion.
ISA Profile
A profile of RO-Crate for experimental data in plant sciences that is described by metadata following the ISA model.
Such datasets consist of three types of data entities: Investigation, Study and Assay.
The profile adds requirements on the crate such that the data folders match the Investigation, Study and Assay objects of the ISA model.
The profile here describes the top-level Investigation object (a dataset) and the contained datasets following the Study and Assay profiles.
Profiles for other included types can be found in the full version.
ISA Investigation Profile
An Investigation object describes the top-level metadata of a scientific investigation, e.g. descriptions of the context, the title, authors and publications (see ISA model for details).
It SHOULD contain further datasets that follow the Study profile.
- Dataset
  - identifier: Text or URL (required)
  - headline: Text (required)
  - description: Text (required)
  - additionalType: Text (required)
  - creator: Person (recommended)
  - mentions: DefinedTermSet (recommended)
  - dateCreated: Date or DateTime (optional)
  - datePublished: Date or DateTime (optional)
  - citation: ScholarlyArticle (optional)
  - disambiguatingDescription: Text (optional)
  - hasPart: Dataset (optional)
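A property listing like this one translates fairly directly into a presence check. The sketch below is only an illustration under stated assumptions: it takes a single Investigation dataset entity as a Python dict, looks only at whether each property is present (not at its value type), and uses a hypothetical function name, check_investigation. The property names are copied from the required and recommended entries of the Investigation listing.

```python
# Property names taken from the ISA Investigation listing.
REQUIRED = ("identifier", "headline", "description", "additionalType")
RECOMMENDED = ("creator", "mentions")


def check_investigation(dataset: dict) -> dict[str, list[str]]:
    """Report which required and recommended properties are absent."""
    return {
        "missing_required": [p for p in REQUIRED if p not in dataset],
        "missing_recommended": [p for p in RECOMMENDED if p not in dataset],
    }
```

A caller could treat a non-empty missing_required list as an error and a non-empty missing_recommended list as a warning, mirroring the distinction the profile draws.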
ISA Study Profile
A Study contains information on the subject under study, its characteristics and any treatments applied (see ISA model for details).
It contextualizes further datasets that follow the Assay profile.
- Dataset
  - identifier: Text or URL (required)
  - headline: Text (required)
  - additionalType: Text (required)
  - hasPart: Dataset or File (recommended)
  - about: LabProcess (recommended)
  - description: Text (recommended)
  - dateCreated: Date or DateTime (recommended)
  - dateModified: Date or DateTime (recommended)
  - datePublished: Date or DateTime (optional)
  - citation: ScholarlyArticle (optional)
  - comment: Comment (optional)
ISA Assay Profile
An Assay contains information about a test performed either on material taken from a subject or on a whole initial subject (see ISA model for details).
- Dataset
  - additionalType: Text or URL (required)
  - creator: Person (required)
  - identifier: Text or URL (required)
  - headline: Text (required)
  - about: LabProcess (required)
  - measurementMethod: URL or DefinedTerm (required)
  - measurementTechnique: URL or DefinedTerm (required)
  - hasPart: File (recommended)
  - description: Text (recommended)
  - variableMeasured: Text or PropertyValue (recommended)
  - dateModified: Date or DateTime (recommended)
  - dateCreated: Date or DateTime (optional)
  - citation: ScholarlyArticle (optional)
  - comment: Comment (optional)
Electronic Lab Notebook (ELN)
The ELN file format has been defined as an archive format to capture Electronic Laboratory Notebooks (ELN). An archive is a ZIP file with the .eln extension, containing a single root folder which is an RO-Crate. The ELN specification is based on the RO-Crate specification and is exported by lab notebook software including eLabFTW. ELN archives can be previewed in Dataverse.
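The structural rule described here (a ZIP archive with a single root folder that is an RO-Crate) can be checked with the Python standard library. The following is a minimal sketch rather than an implementation of the ELN specification: the function name eln_root_folder is hypothetical, and it assumes that "is an RO-Crate" can be approximated by the presence of ro-crate-metadata.json, the metadata file name used by RO-Crate 1.1, directly inside the root folder.

```python
import io
import zipfile


def eln_root_folder(archive) -> str:
    """Return the single root folder of an .eln archive, or raise ValueError.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

    # Top-level folder names, plus any files sitting outside a folder.
    roots = {name.split("/", 1)[0] for name in names if "/" in name}
    loose = [name for name in names if "/" not in name]
    if loose or len(roots) != 1:
        raise ValueError("expected exactly one root folder")

    root = roots.pop()
    # Assumption: the root folder counts as an RO-Crate if it holds this file.
    if f"{root}/ro-crate-metadata.json" not in names:
        raise ValueError("root folder has no ro-crate-metadata.json")
    return root


# Usage with an in-memory archive standing in for a real .eln file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("notebook/ro-crate-metadata.json", "{}")
    zf.writestr("notebook/data.csv", "a,b\n")
buffer.seek(0)
print(eln_root_folder(buffer))
```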
RO-Crates-and-Excel
RO-Crates-and-Excel is a profile describing the data of a Microsoft Excel spreadsheet within an RO-Crate, along with a converter tool.
This profile has been expressed for Describo and adds keys like number_of_rows, stdev and average_length.
Note that this profile has not currently been updated to use JSON-LD terms.
Making an RO-Crate profile
As a starting point, an RO-Crate profile can be written down in structured human language, as exemplified by Workflow RO-Crate.
Consistent use of the key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL as described in RFC 2119 makes explicit what is a strict requirement of the profile, what are best practice recommendations, and what are open-ended extensions.
New profiles can use any https://schema.org/ terms, but are likely to need to define or import additional terms that need to be mapped in the @context.
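One way to map additional terms is to follow the pattern used by the Workflow Testing RO-Crate context: an array holding the RO-Crate context URL followed by an object of term-to-IRI mappings. The sketch below builds such an array; the helper name build_context and the term sampleDepth with its example.org IRI are made up for illustration and are not part of any profile.

```python
import json

RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"


def build_context(extra_terms: dict[str, str]) -> list:
    """Combine the RO-Crate 1.1 context with profile-specific term mappings."""
    for term, iri in extra_terms.items():
        # Each additional term should resolve to an absolute IRI.
        if not iri.startswith(("http://", "https://")):
            raise ValueError(f"{term!r} must map to an absolute IRI, got {iri!r}")
    return [RO_CRATE_CONTEXT, dict(extra_terms)]


# Hypothetical term, for illustration only.
context = build_context({"sampleDepth": "https://example.org/terms#sampleDepth"})
print(json.dumps({"@context": context}, indent=2))
```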
A more formal definition of the profile can take multiple forms, depending on the serialization requirements for the RO-Crate, and how open-ended or restricted the profile is intended to be:
- JSON Schema requiring a restricted JSON form of RO-Crate JSON-LD. May include restricted JSON forms for expressing selected data and contextual entities in a certain way.
- RDF Shapes expressed in ShEx or SHACL to check graph patterns, for example that author must be of @type: Person and have an affiliation to a @type: Organization that has a url to a valid URL.
- Hard-coded validator, e.g. checking that expected folders like test/ exist or that a file really is of its declared media type.
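To make the third option concrete, here is a sketch of a hard-coded validator for the author pattern mentioned in the RDF Shapes item. It deliberately does not use ShEx or SHACL; it walks the parsed JSON directly. The sketch assumes a flat @graph where references are {"@id": ...} objects and where url is a plain string; all function names are hypothetical.

```python
from urllib.parse import urlparse


def _as_list(value):
    """Normalise a JSON-LD value that may be missing, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _resolve(ref, entities):
    """Look up a {"@id": ...} reference; unknown references give an empty dict."""
    if isinstance(ref, dict):
        return entities.get(ref.get("@id"), {})
    return {}


def _is_valid_url(value) -> bool:
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_authors(metadata: dict) -> list[str]:
    """Check: author is a Person, affiliated to an Organization with a valid url."""
    entities = {e["@id"]: e for e in metadata.get("@graph", [])}
    problems = []
    for entity in entities.values():
        for ref in _as_list(entity.get("author")):
            author = _resolve(ref, entities)
            if "Person" not in _as_list(author.get("@type")):
                problems.append(f"{entity['@id']}: author is not a Person")
                continue
            orgs = [_resolve(r, entities) for r in _as_list(author.get("affiliation"))]
            orgs = [o for o in orgs if "Organization" in _as_list(o.get("@type"))]
            if not orgs:
                problems.append(f"{author['@id']}: no affiliation to an Organization")
            for org in orgs:
                if not _is_valid_url(org.get("url")):
                    problems.append(f"{org['@id']}: missing or invalid url")
    return problems
```

A shape-based approach would express the same constraint declaratively and work on the RDF graph, whereas this version depends on the JSON layout being as assumed.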
Implementors making validators may also want to first check that the RO-Crate metadata file is:
- Valid JSON
- Has expected/supported JSON-LD @context
- Valid JSON-LD Compacted form
- Valid JSON-LD
- Valid RDF triples
- Correct use of schema.org types/properties
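The first two checks in this list need nothing beyond a JSON parser, so they make a cheap first stage before any JSON-LD or RDF processing. The sketch below covers only those two checks plus a basic structural test; the later ones would need a JSON-LD processor and are not shown. The function name precheck and the set of supported contexts are assumptions for illustration, as is the expectation of a top-level @graph array.

```python
import json

# Assumption: this validator only supports the RO-Crate 1.1 context.
SUPPORTED_CONTEXTS = {"https://w3id.org/ro/crate/1.1/context"}


def precheck(text: str) -> dict:
    """Parse the metadata file text and run cheap structural checks."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"not valid JSON: {err}") from err
    if not isinstance(doc, dict):
        raise ValueError("top level must be a JSON object")

    # @context may be a single URL or an array mixing URLs and term maps.
    context = doc.get("@context")
    contexts = context if isinstance(context, list) else [context]
    if not any(isinstance(c, str) and c in SUPPORTED_CONTEXTS for c in contexts):
        raise ValueError("unsupported or missing @context")

    if not isinstance(doc.get("@graph"), list):
        raise ValueError("expected a top-level @graph array")
    return doc
```

Raising early keeps the later, more expensive stages from having to handle malformed input.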
Depending on requirements it may be beneficial to combine these approaches - for instance a hard-coded validator can rely on structural RO-Crate JSON checks before inspecting a particular data item in detail.
Formalizing RO-Crate profiles
Currently an RO-Crate profile takes the form of informal documentation for humans rather than a machine-readable definition.
The next version of the RO-Crate specification, 1.2, will formalize RO-Crate profiles as a Profile Crate - an RO-Crate that defines the profile. See the Workflow Profile Crate for an example.