RO-Crate Profiles

Table of contents

  1. Workflow RO-Crate profile
    1. Workflow Testing RO-Crate profile
  2. Describo profiles
  3. Paradisec profile
  4. Making an RO-Crate profile

While RO-Crates can be considered general-purpose containers of arbitrary data and open-ended metadata, practical use within a particular domain, application or framework benefits from further constraining RO-Crate to a specific profile: a set of conventions, types and properties that one can, at a minimum, require and expect to be present in that subset of RO-Crates.

Defining and conforming to such a profile enables reliable programmatic consumption of an RO-Crate’s content, as well as consistent creation. For example, a form in a user interface can firmly suggest the required types and properties. Likewise, a rendering of an RO-Crate can more easily build rich UI components if it can reliably assume, for instance, that a Person always has an affiliation to an Organization which has a url, a restriction that may not be appropriate for all types of RO-Crates.

The RO-Crate profile can also lock down serialization expectations, for instance by requiring a particular version of RO-Crate, a particular JSON-LD context, or particular packaging such as .zip or BagIt.

Workflow RO-Crate profile

Workflow RO-Crate is a profile of RO-Crate 1.1 which requires at least one data entity that is a ComputationalWorkflow. This workflow must also be indicated with mainEntity from the Root Dataset. The Workflow RO-Crate profile further recommends how to accompany the native workflow definition with an abstract CWL description and a diagram, and defines the meaning of the named subfolders test and examples.

Profiles can also provide recommendations; for instance, Workflow RO-Crate has a list of recognized workflow languages and licenses. Serialization of Workflow RO-Crates is restricted to ZIP archives with a filename ending in .crate.zip.
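The core requirement described above can be sketched as a small Python check. This is only an illustration, not the official validator: the function names `find_main_workflow` and `has_crate_zip_name` are hypothetical, and the sketch assumes the metadata has already been parsed from a flattened `ro-crate-metadata.json` with an `@graph` array.

```python
def as_list(value):
    """JSON-LD values may be a single item or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_main_workflow(metadata):
    """Return the entity referenced by the root dataset's mainEntity,
    but only if it is typed as a ComputationalWorkflow; otherwise None.
    (Hypothetical helper, not part of any RO-Crate library.)"""
    entities = {e["@id"]: e for e in metadata.get("@graph", []) if "@id" in e}
    # The metadata file descriptor points at the root dataset via "about".
    descriptor = entities.get("ro-crate-metadata.json", {})
    about = descriptor.get("about")
    root = entities.get(about.get("@id")) if isinstance(about, dict) else None
    if root is None:
        return None
    main = root.get("mainEntity")
    workflow = entities.get(main.get("@id")) if isinstance(main, dict) else None
    if workflow is None:
        return None
    if "ComputationalWorkflow" not in as_list(workflow.get("@type")):
        return None
    return workflow


def has_crate_zip_name(filename):
    """Check the packaging convention: filename ends with .crate.zip."""
    return filename.endswith(".crate.zip")


# Illustrative metadata; the workflow file name is made up for this example.
example = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset",
         "mainEntity": {"@id": "workflow.ga"},
         "hasPart": [{"@id": "workflow.ga"}]},
        {"@id": "workflow.ga",
         "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
         "name": "Example workflow"},
    ],
}

workflow = find_main_workflow(example)
print(workflow["name"] if workflow else "no main workflow found")
```

A check like this covers only the minimal requirement; the profile's recommendations (recognized languages, licenses, subfolders) would need further rules.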

This profile of RO-Crate is used by the WorkflowHub both for downloadable archives of workflow entries and their metadata, and for manual and programmatic upload of workflows, such as those generated by galaxy2cwl. The associated service Life Monitor uses the profile as a submission format for workflow test suites, and the workflow execution service WfExS accesses Workflow RO-Crates through the hub’s implementation of the Tool Registry Service API to pick up the workflow to be executed.

In summary the Workflow RO-Crate profile covers as of 2021-03-09:

Workflow Testing RO-Crate profile

Workflow Testing RO-Crate is a specialization of Workflow RO-Crate used by Life Monitor to support the submission of test suites for computational workflows.

This profile is an RO-Crate extension that employs additional terms from the ro-terms test namespace https://github.com/ResearchObject/ro-terms:

[
    "https://w3id.org/ro/crate/1.1/context",
    {
        "TestSuite": "https://w3id.org/ro/terms/test#TestSuite",
        "TestInstance": "https://w3id.org/ro/terms/test#TestInstance",
        "TestService": "https://w3id.org/ro/terms/test#TestService",
        "TestDefinition": "https://w3id.org/ro/terms/test#TestDefinition",
        "PlanemoEngine": "https://w3id.org/ro/terms/test#PlanemoEngine",
        "JenkinsService": "https://w3id.org/ro/terms/test#JenkinsService",
        "TravisService": "https://w3id.org/ro/terms/test#TravisService",
        "instance": "https://w3id.org/ro/terms/test#instance",
        "runsOn": "https://w3id.org/ro/terms/test#runsOn",
        "resource": "https://w3id.org/ro/terms/test#resource",
        "definition": "https://w3id.org/ro/terms/test#definition",
        "engineVersion": "https://w3id.org/ro/terms/test#engineVersion"
    }
]

A Workflow Testing RO-Crate is essentially a Workflow RO-Crate with additional specification on how to structure test suites and refer to them from the "test" Dataset:

{
    "@id": "test/",
    "@type": "Dataset",
    "about": [
        {
            "@id": "#test1"
        },
        {
            "@id": "#test2"
        }
    ]
}
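As a rough illustration of how a consumer might follow these references, the Python sketch below collects the entities that the "test/" Dataset points to through about. The function name `list_test_suites` and the example suite names are hypothetical; the sketch assumes a flattened `@graph` and makes no claim about how Life Monitor itself reads crates.

```python
def list_test_suites(metadata):
    """Return the entities referenced by the "test/" Dataset's "about".
    References that are not described in the graph are skipped.
    (Hypothetical helper for illustration only.)"""
    entities = {e["@id"]: e for e in metadata.get("@graph", []) if "@id" in e}
    test_dir = entities.get("test/", {})
    refs = test_dir.get("about", [])
    if isinstance(refs, dict):  # a single reference instead of a list
        refs = [refs]
    return [entities[r["@id"]] for r in refs
            if isinstance(r, dict) and r.get("@id") in entities]


# Illustrative graph fragment; suite names are invented for this example.
example = {
    "@graph": [
        {"@id": "test/", "@type": "Dataset",
         "about": [{"@id": "#test1"}, {"@id": "#test2"}]},
        {"@id": "#test1", "@type": "TestSuite", "name": "first suite"},
        {"@id": "#test2", "@type": "TestSuite", "name": "second suite"},
    ]
}

for suite in list_test_suites(example):
    print(suite["@id"], suite["name"])
```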

More details are available from the spec page.

Describo profiles

Describo has a pre-defined default profile based on the RO-Crate specifications, which tells the user interface which entity types and properties should be offered and requested when a user is editing an RO-Crate.

The default profile covers the recommendations in the RO-Crate 1.1 specification, adding UI details such as value type, label and description for each field.

In addition, a domain-specific profile can be created as JSON and used by Describo to customize the selection of types and properties, including adding further schema.org types, third-party vocabularies and inline ad-hoc term definitions.

In summary the default profile covers as of Describo 0.13.0:

Paradisec profile

The PARADISEC Describo profile is built in to Describo and is the basis for the PARADISEC RO-Crates exposed in the Modern PARADISEC demonstrator to annotate and expose digital cultural heritage records.

In summary the PARADISEC profile covers as of Describo 0.13.0:

  • Collection
  • Item (unknown type)
    • additionalType: Value
    • name: Text (required)
    • description: Text (required)
    • contributor: Person (required)
    • dateCreated: Date
    • dateModified: Date
    • adminComment: Text (ad-hoc term)
    • dialect: Text (ad-hoc term)
    • digitisedOn: Date
    • discourseType: Text (ad-hoc term)
    • external: Text (ad-hoc term)
    • bornDigital: Text (ad-hoc term)
    • citeAs: Text (ad-hoc term)
    • ingestNotes: Text (ad-hoc term)
    • languageAsGiven: Text (ad-hoc term)
    • license: License
    • hasPart: File
    • contentLanguages: Language (ad-hoc term??)
    • subjectLanguages: Language (ad-hoc term??)
    • private: Text (ad-hoc term)
    • originatedOn: Date (ad-hoc term)
    • originatedOnNarrative: Text
    • receivedOn: Text
    • publisher: Organization
    • contentLocation: Place
  • GeoBox (ad-hoc type)
    • name: Text
    • box: Text
  • License (ad-hoc type)
    • name: (selection of 4 values)
    • description: Text

This is a good example of how a specific profile can guide a user interface and extend RO-Crate with additional terms.

Note that some RO-Crates in the Modern PARADISEC demonstrator have evolved from this profile to conform with RO-Crate 1.1’s repository content types.

Making an RO-Crate profile

As a starting point, an RO-Crate profile can be written down in structured human language, as exemplified by Workflow RO-Crate.

Consistent use of the key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL as described in RFC 2119 makes explicit what is a strict requirement of the profile, what are best practice recommendations, and what are open-ended extensions.

A more formal definition of the profile can take multiple forms, depending on the serialization requirements for the RO-Crate, and how open-ended or restricted the profile is intended to be:

  • JSON Schema requiring a restricted JSON form of RO-Crate JSON-LD. May include restricted JSON forms for expressing selected data and contextual entities in a certain way.
  • RDF Shapes expressed in ShEx or SHACL to check graph patterns, for example that an author must be of @type: Person and have an affiliation to a @type: Organization whose url is a valid URL.
  • Hard-coded validator, e.g. checking that expected folders like test/ exist or that a file really is of its declared media type.
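To make the third option concrete, here is a minimal hard-coded sketch in Python that checks the same author pattern used as the shapes example above. It is plain dictionary traversal over a flattened `@graph`, not ShEx or SHACL, and the helper names (`check_authors`, `is_http_url`) are hypothetical. As a simplification it only accepts url values given as plain strings with an http or https scheme.

```python
from urllib.parse import urlparse


def as_list(value):
    """JSON-LD values may be a single item or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def has_type(entity, type_name):
    return type_name in as_list(entity.get("@type"))


def is_http_url(value):
    """Simplified notion of a 'valid URL': an http(s) string with a host."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_authors(metadata):
    """Return a list of human-readable problems; empty if the pattern holds."""
    entities = {e["@id"]: e for e in metadata.get("@graph", []) if "@id" in e}
    problems = []
    for entity in entities.values():
        for ref in as_list(entity.get("author")):
            author = entities.get(ref.get("@id")) if isinstance(ref, dict) else None
            if author is None or not has_type(author, "Person"):
                problems.append(f"{entity['@id']}: author is not a described Person")
                continue
            orgs = [entities.get(r.get("@id"))
                    for r in as_list(author.get("affiliation"))
                    if isinstance(r, dict)]
            if not any(o is not None
                       and has_type(o, "Organization")
                       and is_http_url(o.get("url"))
                       for o in orgs):
                problems.append(
                    f"{author['@id']}: no affiliation to an Organization with a valid url")
    return problems
```

A shapes-based approach expresses the same rule declaratively and works on any RDF serialization, whereas this sketch depends on the flattened JSON form.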

Implementors making validators may also want to first check that the RO-Crate metadata file:

  • is valid JSON
  • has an expected/supported JSON-LD @context
  • is in valid JSON-LD compacted form
  • is valid JSON-LD
  • yields valid RDF triples
  • makes correct use of schema.org types/properties
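The first two of these checks need nothing beyond the Python standard library, as the sketch below suggests. The function name `preliminary_checks` is hypothetical, and the expected context is assumed to be the RO-Crate 1.1 context URL that appears earlier on this page. The `@graph` test is an extra structural assumption, and the JSON-LD, RDF and schema.org checks would require a JSON-LD or RDF library and are not attempted here.

```python
import json

# Assumed expected context: the RO-Crate 1.1 context used elsewhere on this page.
EXPECTED_CONTEXT = "https://w3id.org/ro/crate/1.1/context"


def preliminary_checks(text):
    """Run cheap structural checks on the raw metadata file content.
    Returns a list of problems; an empty list means these checks passed."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(doc, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    context = doc.get("@context")
    # @context may be a single value or a list mixing URLs and inline term maps.
    contexts = context if isinstance(context, list) else [context]
    if EXPECTED_CONTEXT not in contexts:
        problems.append("expected JSON-LD @context not found")
    if not isinstance(doc.get("@graph"), list):
        problems.append("no @graph array found")
    return problems
```

Running the cheap checks first lets a validator report a clear error early, before more expensive JSON-LD or RDF processing is attempted.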

Depending on requirements it may be beneficial to combine these approaches - for instance a hard-coded validator can rely on structural RO-Crate JSON checks before inspecting a particular data item in detail.