
RO-Crate Structure

Table of contents

  1. Types of RO-Crate
  2. RO-Crate Metadata Document (ro-crate-metadata.json)
  3. Attached RO-Crate Package
    1. Payload files and directories
    2. RO-Crate Website (ro-crate-preview.html and ro-crate-preview_files/) for Packages
    3. Self-describing and self-contained
  4. Detached RO-Crate Package

Types of RO-Crate

An RO-Crate Metadata Document is used to package data in one of two ways:

  1. An Attached RO-Crate Package that defines an on-disk collection of data:
    • It is defined within a file-system-like service as a directory (known as the RO-Crate Root) with the RO-Crate Metadata Document saved in a file-like entity with a file name of ro-crate-metadata.json.
    • References to files and directories in the RO-Crate Metadata Document are present in the RO-Crate or available online as Web-based Data Entities.
    • Typically, software processing an Attached RO-Crate Package would be passed a path to a directory or a zip file.
  2. A Detached RO-Crate Package:
    • Is defined by a stand-alone RO-Crate Metadata Document which may be stored in a file or distributed via an API.
    • References to files and directories in the RO-Crate Metadata Document are all Web-based Data Entities.
    • If stored in a file, known as a Detached RO-Crate Metadata File, the filename SHOULD be ${prefix}-ro-crate-metadata.json rather than ro-crate-metadata.json, where ${prefix} is a human-readable version of the dataset’s ID or name. This signals that the presence of the file on disk does not indicate an Attached RO-Crate Package.
    • Typically, software processing a Detached RO-Crate Package would be passed a path to a file, an absolute URI, or a JSON string or object, without a directory context.
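The file-naming conventions above suggest a simple way for a tool to decide how to treat a file-system path. The following is a minimal sketch, not part of the specification: the function name `classify_crate_path` is hypothetical, ZIP files, URIs and JSON strings are not handled, and the detached case relies only on the SHOULD-level `${prefix}-ro-crate-metadata.json` naming.

```python
from pathlib import Path

METADATA_FILE = "ro-crate-metadata.json"


def classify_crate_path(path):
    """Return 'attached', 'detached', or None for a file-system path.

    Hypothetical helper: it only inspects names, not the JSON content.
    """
    p = Path(path)
    if p.is_dir():
        # A directory is an RO-Crate Root if it holds ro-crate-metadata.json.
        return "attached" if (p / METADATA_FILE).is_file() else None
    if p.is_file():
        if p.name == METADATA_FILE:
            # The file marks its parent directory as an RO-Crate Root.
            return "attached"
        if p.name.endswith("-" + METADATA_FILE):
            # Matches the ${prefix}-ro-crate-metadata.json convention.
            return "detached"
    return None
```

A caller could use the result to choose between resolving relative references against the directory (attached) and treating every data reference as a Web-based Data Entity (detached).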

RO-Crate Metadata Documents MAY also be processed in non-packaging contexts by tools such as website generators or crate visualizers, where data entities are not processed, or in applications which use RO-Crate conventions for representing context (such as a schema definition using Schema.org conventions).

In all crates the metadata is completed with contextual entities that further describe the relationships and context of the data to form a Research Object.

RO-Crate Metadata Document (ro-crate-metadata.json)

JSON-LD is a structured form of JSON that can represent a Linked Data graph.

The graph MUST describe:

  1. The RO-Crate Metadata Descriptor
  2. The Root Data Entity
  3. Zero or more Data Entities
  4. Zero or more Contextual Entities

Any referenced contextual entities SHOULD also be described in the RO-Crate Metadata Document with the same identifier. Similarly any contextual entity in the RO-Crate Metadata Document SHOULD be linked to from at least one of the other entities using the same identifier.
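The two SHOULD-level recommendations above can be checked mechanically. The sketch below assumes a flattened `@graph` in which a reference is a JSON object containing only an `@id` key; the function names are hypothetical. It does not distinguish data entities from contextual entities, so its results are candidates for review rather than violations. In particular, the RO-Crate Metadata Descriptor is normally not referenced by any other entity and will appear in the "unlinked" set.

```python
def iter_references(value):
    """Yield every @id referenced from a JSON-LD property value."""
    if isinstance(value, dict):
        if set(value) == {"@id"}:
            # An object with only an @id key is treated as a reference.
            yield value["@id"]
        else:
            for nested in value.values():
                yield from iter_references(nested)
    elif isinstance(value, list):
        for item in value:
            yield from iter_references(item)


def check_links(document):
    """Return (dangling, unlinked) sets of identifiers.

    dangling: referenced somewhere but not described in @graph.
    unlinked: described in @graph but never referenced.
    """
    graph = document.get("@graph", [])
    described = {entity["@id"] for entity in graph if "@id" in entity}
    referenced = set()
    for entity in graph:
        for value in entity.values():
            referenced.update(iter_references(value))
    return referenced - described, described - referenced
```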

The appendix RO-Crate JSON-LD details the general structure of the JSON-LD that is expected in the RO-Crate Metadata Document. In short, the rest of this specification describes the different types of entities that can be added as {} objects to the RO-Crate JSON-LD @graph array below:

{ "@context": "https://w3id.org/ro/crate/1.2-DRAFT/context",
  "@graph": [

  ]
}
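As an illustration of adding entities as objects to the `@graph` array, the following sketch builds the empty skeleton above in Python and appends an entity to it. The helper names `new_document` and `add_entity` are hypothetical, and the duplicate-identifier check is an assumption of this sketch rather than a rule stated in this section.

```python
import json

CONTEXT = "https://w3id.org/ro/crate/1.2-DRAFT/context"


def new_document():
    """Return an RO-Crate JSON-LD skeleton with an empty @graph."""
    return {"@context": CONTEXT, "@graph": []}


def add_entity(document, entity):
    """Append an entity object to @graph, rejecting missing or duplicate @id."""
    if "@id" not in entity:
        raise ValueError("entity needs an @id")
    if any(existing.get("@id") == entity["@id"] for existing in document["@graph"]):
        raise ValueError("duplicate @id: " + entity["@id"])
    document["@graph"].append(entity)
    return entity


if __name__ == "__main__":
    doc = new_document()
    add_entity(doc, {"@id": "ro-crate-preview.html",
                     "@type": "CreativeWork",
                     "about": {"@id": "./"}})
    print(json.dumps(doc, indent=2))
```

A complete RO-Crate Metadata Document would also need the RO-Crate Metadata Descriptor and the Root Data Entity, which this sketch does not create.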

Attached RO-Crate Package

An Attached RO-Crate Package is used to contain and describe an optional payload of files and directories as well as web-based data entities along with their contextual information.

When processing an Attached RO-Crate Package the RO-Crate Metadata Document MUST be present in the RO-Crate Root and it MUST be named ro-crate-metadata.json.

An Attached RO-Crate Package can be stored and published in multiple ways depending on its use, for example as a directory on a file system, as a ZIP or other archive file, or hosted on the web.

A valuable feature of the Attached RO-Crate Package approach is that the metadata is preserved when a crate is transferred between these types of storage/publication systems.

The file path structure an Attached RO-Crate Package MUST follow is:

<RO-Crate root directory>/
|   ro-crate-metadata.json    # RO-Crate Metadata File containing the RO-Crate Metadata Document MUST be present 
|   ro-crate-preview.html     # RO-Crate Website homepage MAY be present
|   ro-crate-preview_files/   # MAY be present
|    | [other RO-Crate Website files]
|   [payload files and directories]  # 0 or more

The name of the RO-Crate root directory is not defined, but a root directory is identifiable by the presence of the RO-Crate Metadata File, ro-crate-metadata.json. For instance, if an RO-Crate is archived in a ZIP-file, the ZIP root directory is an RO-Crate root directory if it contains ro-crate-metadata.json.
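The ZIP case can be sketched with Python's standard `zipfile` module. The function name `zip_root_is_crate_root` is hypothetical; the check only looks for a member named exactly `ro-crate-metadata.json` at the top level of the archive and does not search nested directories.

```python
import zipfile


def zip_root_is_crate_root(zip_source):
    """Return True if the ZIP root directory contains ro-crate-metadata.json.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Member names use forward slashes; a root-level file has no slash.
        return "ro-crate-metadata.json" in archive.namelist()
```

An archive that wraps the crate in a single top-level folder would return False here, because the metadata file would then sit in that folder rather than in the ZIP root.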

The payload files and directories MAY be described within the RO-Crate Metadata File as Data Entities. Additional Web-based Data Entities MAY also be described, but are not considered part of the payload.

The @id of the Root Data Entity in an Attached RO-Crate Package MUST be either ./ or be a URI, such as a DOI URL or other persistent URL which is considered to be the main identifier of the Attached RO-Crate Package.
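A rough check of this rule might look as follows. This is a sketch under stated assumptions: the function name is hypothetical, and "is a URI" is approximated as "has a URI scheme" using `urllib.parse.urlparse`, which is looser than full URI validation (for example, a Windows path such as `C:\data` would be reported as having the scheme `c`).

```python
from urllib.parse import urlparse


def is_valid_attached_root_id(root_id):
    """Return True if root_id is './' or looks like an absolute URI."""
    if root_id == "./":
        return True
    # Approximation: any value with a scheme component is accepted.
    return bool(urlparse(root_id).scheme)
```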

Payload files and directories

These are the actual files and directories that make up the payload of the dataset being described in an Attached RO-Crate Package.

The base RO-Crate specification makes no assumptions about the presence of any specific files or folders beyond the reserved RO-Crate files described above.

Payload files may appear directly in the RO-Crate Root alongside the RO-Crate Metadata File, and/or appear in sub-directories of the RO-Crate Root. Each file and directory MAY be represented as a Data Entity in the RO-Crate Metadata File.

An RO-Crate may also contain Web-based Data Entities that are not present as part of the payload and are referenced using absolute URIs. These may require additional preservation measures.

RO-Crate Website (ro-crate-preview.html and ro-crate-preview_files/) for Packages

In addition to the machine-oriented RO-Crate Metadata Document, an Attached RO-Crate Package MAY include a human-readable HTML rendering of the same information, known as the RO-Crate Website. If present, the RO-Crate Website MUST be a file named ro-crate-preview.html in the root directory, which MAY serve as the entry point to other web-resources, which MUST be in ro-crate-preview_files/ in the root directory.

If present in the root directory of an Attached RO-Crate Package as ro-crate-preview.html (or otherwise served in a Detached RO-Crate Package), the RO-Crate Website MUST:

  • Be a valid HTML 5 document
  • Be useful to users of the RO-Crate - this will vary by community and intended use, but in general the aim is to assist users in reusing data by explaining what it is, how it was created, how it can be used and how to cite it. One simple approach to this is to expose all the metadata in the RO-Crate Metadata Document.

ro-crate-preview.html SHOULD:

  • Display at least the metadata relating to the Root Data Entity as static HTML without the need for scripting. It MAY contain extra features enabled by JavaScript.
  • When a Data Entity or Contextual Entity is referenced by its ID:
    • If it has a name property, provide a link to its HTML version.
    • If it does not have a name (e.g. a GeoCoordinates location), show it embedded in the HTML for the entity.
    • For external URI values, provide a link.
  • For keys that resolve in the RO-Crate JSON-LD Context to a URI, indicate this (the simplest way is to link the key to its definition).
  • If there are additional resources necessary to render the preview (e.g. CSS, JSON, HTML), link to them in a subdirectory ro-crate-preview_files/

The RO-Crate Website is not considered a part of the RO-Crate payload in an Attached RO-Crate Package, but serves as a way to make metadata available in a user-appropriate format. The ro-crate-preview.html file and the ro-crate-preview_files/ directory and any contents SHOULD NOT be included in the hasPart property of the Root Dataset or any other Dataset entity within an RO-Crate.
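A tool could flag crates that list RO-Crate Website files under hasPart. The sketch below assumes a flattened `@graph` whose hasPart values are `{"@id": ...}` references; the function name is hypothetical, and it checks hasPart on every entity rather than only on Dataset entities.

```python
PREVIEW_FILE = "ro-crate-preview.html"
PREVIEW_DIR = "ro-crate-preview_files/"


def website_parts_in_has_part(document):
    """Return (entity @id, part @id) pairs where hasPart lists Website files."""
    offending = []
    for entity in document.get("@graph", []):
        parts = entity.get("hasPart", [])
        if not isinstance(parts, list):
            # A single value is allowed in JSON-LD; treat it as a one-item list.
            parts = [parts]
        for part in parts:
            if not isinstance(part, dict):
                continue
            part_id = part.get("@id", "")
            # Compare without a leading "./" so both spellings are caught.
            relative = part_id[2:] if part_id.startswith("./") else part_id
            if relative == PREVIEW_FILE or relative.startswith(PREVIEW_DIR):
                offending.append((entity.get("@id"), part_id))
    return offending
```

Because the rule is a SHOULD NOT, a non-empty result is better reported as a warning than as a hard failure.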

Metadata about parts of the RO-Crate Website MAY be included in an RO-Crate as in the following example. Metadata such as an author property, dateCreated or other provenance can be included, including details about the software that created them, as described in Software used to create files.

{
      "@id": "ro-crate-preview.html",
      "@type": "CreativeWork",
      "about": {"@id": "./"}
}

{
      "@id": "https://www.npmjs.com/package/ro-crate-html-js",
      "@type": "SoftwareApplication",
      "url": "https://www.npmjs.com/package/ro-crate-html-js",
      "name": "ro-crate-html-js",
      "version": "1.4.19"
}

{
      "@id": "#ro-crate-preview-generation",
      "@type": "CreateAction",
      "name": "Create HTML summary",
      "endTime": "2022-10-19T17:01:07+10:00",
      "instrument": {
        "@id": "https://www.npmjs.com/package/ro-crate-html-js"
      },
      "object": {
        "@id": "ro-crate-metadata.json"
      },
      "result": {
        "@id": "ro-crate-preview.html"
      }
}

Self-describing and self-contained

Attached RO-Crate Packages SHOULD be self-describing and self-contained.

A minimal Attached RO-Crate Package is a directory containing a single RO-Crate Metadata Document stored as an RO-Crate Metadata File ro-crate-metadata.json.

At the basic level, an Attached RO-Crate Package is a collection of files and resources represented as a Schema.org Dataset, that together form a meaningful unit for the purposes of communication, citation, distribution, preservation, etc. The RO-Crate Metadata Document describes the RO-Crate, and MUST be stored in the RO-Crate Root.

While RO-Crate is well suited to describing a Dataset as files and relevant metadata that are contained by the RO-Crate in the sense of living within the same root directory, RO-Crates can also reference external resources which are stored or accessed separately, via absolute URIs. This is particularly recommended where some resources cannot be co-hosted for practical or legal reasons, or if the RO-Crate itself is primarily web-based.

It is important to note that the RO-Crate Metadata Document is not necessarily an exhaustive manifest or inventory; that is, it is not required to list or describe all files in the package. Rather, it is focused on providing a sufficient amount of metadata to understand and use the content, and is designed to be compatible with existing and future approaches that do have full inventories / manifests and integrity checks (e.g. by using checksums), such as BagIt and Oxford Common File Layout (OCFL) Objects.

The intention is that RO-Crates can work well with a variety of archive file formats, e.g. tar, zip, etc., and approaches to capturing file manifests and file fixity, such as BagIt, OCFL and git (see also appendix Combining with other packaging schemes). An RO-Crate can also be hosted on the web or mainly refer to web resources, although extra care to ensure persistence and consistency should be taken for archiving such RO-Crates.

Detached RO-Crate Package

A Detached RO-Crate Package is an RO-Crate, defined in an RO-Crate Metadata Document without a defined root directory, where the RO-Crate Metadata Document content is accessed independently (e.g. as part of a programmatic API).

Unlike an Attached RO-Crate Package, a Detached RO-Crate Package is not processed in a file-system context and thus does not carry a data payload in the same sense, but may reference data deposited separately, or purely reference contextual entities.

In a Detached RO-Crate Package the root data entity SHOULD have an @id which is an absolute URL if it is available online. If it is not yet available online, or never will be, then @id MAY be any valid URI, including ./.

Any data entities in a Detached RO-Crate Package MUST be Web-based Data Entities.

A Detached RO-Crate Package may still use #-based local identifiers for contextual entities.

The concept of an RO-Crate Website is undefined for a Detached RO-Crate Package.

To distribute a Detached RO-Crate Package and optionally to provide an RO-Crate Website, any Detached RO-Crate Package can be saved in a directory (and zipped or otherwise bundled) and will function as an Attached RO-Crate Package with no payload data. See the appendix on converting from Detached to Attached RO-Crate Package for further guidance on this.
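The basic step of saving a Detached RO-Crate Package into a directory can be sketched as below. The function name `save_as_attached` is hypothetical, and the sketch only writes the document under the required file name; it does not apply any further adjustments that the appendix on converting from Detached to Attached RO-Crate Package may call for.

```python
import json
from pathlib import Path


def save_as_attached(document, directory):
    """Write a metadata document as ro-crate-metadata.json in directory.

    Returns the path of the written RO-Crate Metadata File.
    """
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    target = root / "ro-crate-metadata.json"
    target.write_text(json.dumps(document, indent=2), encoding="utf-8")
    return target
```

The resulting directory can then be zipped or otherwise bundled, and an RO-Crate Website can be added next to the metadata file.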