RO-Crate Structure

Table of contents

  1. RO-Crate Metadata File (ro-crate-metadata.json)
  2. RO-Crate Website (ro-crate-preview.html and ro-crate-preview_files/)
  3. Payload files and directories
  4. Self-describing and self-contained

The structure an RO-Crate MUST follow is:

<RO-Crate root directory>/
|   ro-crate-metadata.json    # RO-Crate Metadata File MUST be present 
|   ro-crate-preview.html     # RO-Crate Website homepage MAY be present
|   ro-crate-preview_files/   # MAY be present
|    | [other RO-Crate Website files]
|   [payload files and directories]  # 0 or more

The name of the RO-Crate root directory is not defined, but a root directory is identifiable by the presence of the RO-Crate Metadata File, ro-crate-metadata.json. For instance, if an RO-Crate is archived in a ZIP file, the ZIP root directory is an RO-Crate root directory if it contains ro-crate-metadata.json.
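
As a minimal sketch of the identification rule above, the following Python checks whether a directory, or the root of a ZIP archive, contains the RO-Crate Metadata File. The function names `is_crate_root` and `zip_root_is_crate` are hypothetical helpers introduced for illustration only; they are not part of any RO-Crate library.

```python
import zipfile
from pathlib import Path

METADATA_FILE = "ro-crate-metadata.json"


def is_crate_root(directory: Path) -> bool:
    """Return True if the directory directly contains the RO-Crate Metadata File."""
    return (directory / METADATA_FILE).is_file()


def zip_root_is_crate(zip_path: Path) -> bool:
    """Return True if the ZIP root directory contains the RO-Crate Metadata File.

    Entries in nested directories (e.g. "sub/ro-crate-metadata.json") do not
    count, because only the ZIP root is being tested here.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return METADATA_FILE in archive.namelist()
```

A ZIP that wraps the crate in a single top-level folder would return False here; in that case the wrapped folder, once extracted, would be the candidate root directory.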

Data Entities in the RO-Crate MUST either be payload files/directories present within the RO-Crate root directory or its subdirectories, or be Web-based Data Entities.
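
The either/or requirement above can be illustrated with a small classifier for a Data Entity's @id. This is a sketch under stated assumptions: `classify_data_entity` is a hypothetical helper, only http and https URIs are treated as Web-based (other absolute URI schemes are reported as unresolved), and relative identifiers are assumed to be URL-encoded paths relative to the RO-Crate root. It does not guard against paths that escape the root directory.

```python
from pathlib import Path
from urllib.parse import unquote, urlsplit


def classify_data_entity(entity_id: str, crate_root: Path) -> str:
    """Classify an @id as 'web', 'payload' or 'unresolved'."""
    parts = urlsplit(entity_id)
    # Assumption: only http(s) URIs are treated as Web-based Data Entities.
    if parts.scheme in ("http", "https"):
        return "web"
    # A relative identifier should point at a file or directory in the crate.
    if not parts.scheme and (crate_root / unquote(entity_id)).exists():
        return "payload"
    return "unresolved"
```

An "unresolved" result would indicate an identifier that neither exists under the root directory nor is recognised as a web resource by this sketch.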

RO-Crate Metadata File (ro-crate-metadata.json)

  • In new RO-Crates the RO-Crate Metadata File MUST be named ro-crate-metadata.json and appear in the RO-Crate Root.
  • The RO-Crate Metadata File MUST contain RO-Crate JSON-LD, that is, a valid JSON-LD 1.0 document in flattened and compacted form.
  • The RO-Crate JSON-LD SHOULD use the RO-Crate JSON-LD Context https://w3id.org/ro/crate/1.1/context by reference.
  • If an RO-Crate conforming to version 1.0 or earlier contains a file named ro-crate-metadata.jsonld instead of ro-crate-metadata.json, then processing software should treat this as the RO-Crate Metadata File. If the crate is updated, the file SHOULD be renamed to ro-crate-metadata.json and the RO-Crate Metadata File Descriptor SHOULD be updated to reference it, with an up-to-date conformsTo property naming an appropriate version of this specification.
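
The naming rules above suggest a lookup order for processing software: prefer ro-crate-metadata.json and fall back to the legacy ro-crate-metadata.jsonld. The following is a minimal sketch; `find_metadata_file` is a hypothetical name, and the sketch does not verify that a crate using the legacy name actually conforms to version 1.0 or earlier.

```python
from pathlib import Path
from typing import Optional

# Current name first, then the legacy name used by version 1.0 and earlier.
CANDIDATE_NAMES = ("ro-crate-metadata.json", "ro-crate-metadata.jsonld")


def find_metadata_file(crate_root: Path) -> Optional[Path]:
    """Return the RO-Crate Metadata File in crate_root, or None if absent."""
    for name in CANDIDATE_NAMES:
        candidate = crate_root / name
        if candidate.is_file():
            return candidate
    return None
```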

JSON-LD is a structured form of JSON that can represent a Linked Data graph.

A valid RO-Crate JSON-LD graph MUST describe:

  1. The RO-Crate Metadata File Descriptor
  2. The Root Data Entity
  3. Zero or more Data Entities
  4. Zero or more Contextual Entities

It is RECOMMENDED that any referenced contextual entities are also described in the RO-Crate Metadata File with the same identifier. Similarly it is RECOMMENDED that any contextual entity in the RO-Crate Metadata file is linked to from at least one of the other entities using the same identifier.
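
Both recommendations can be checked mechanically on the flattened @graph array. The sketch below, with the hypothetical helper names `iter_references` and `check_links`, collects every {"@id": ...} reference and reports identifiers that are referenced but not described, and entities that are described but never referenced. It assumes each entity has an @id and does not distinguish contextual entities from other entities, so the caller would need to filter the results (the RO-Crate Metadata File Descriptor, for instance, is normally not referenced by anything).

```python
from typing import Any, Iterator


def iter_references(value: Any) -> Iterator[str]:
    """Yield every @id found in a property value (an object or a list)."""
    if isinstance(value, dict):
        if "@id" in value:
            yield value["@id"]
    elif isinstance(value, list):
        for item in value:
            yield from iter_references(item)


def check_links(graph: list[dict]) -> tuple[set[str], set[str]]:
    """Return (referenced but not described, described but not referenced)."""
    described = {entity["@id"] for entity in graph}
    referenced: set[str] = set()
    for entity in graph:
        for key, value in entity.items():
            if key != "@id":
                referenced.update(iter_references(value))
    return referenced - described, described - referenced
```

Because these are recommendations rather than requirements, a non-empty result is a hint for the crate author and not necessarily an error.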

The appendix RO-Crate JSON-LD details the general structure of the JSON-LD that is expected in the RO-Crate Metadata File. In short, the rest of this specification describes the different types of entities that can be added as {} objects to the RO-Crate JSON-LD @graph array below:

{ "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [

  ]
}
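
To show how the empty @graph might be populated, the following sketch builds a small RO-Crate JSON-LD document in Python containing the two entities every crate must describe. The `about` property and the "./" identifier for the Root Data Entity follow the conventions described in other sections of the specification rather than this one; the function name `minimal_crate` and the example name are illustrative assumptions, and further properties expected on the Root Data Entity are omitted for brevity.

```python
import json

CONTEXT = "https://w3id.org/ro/crate/1.1/context"


def minimal_crate(name: str) -> dict:
    """Build RO-Crate JSON-LD with a descriptor and a Root Data Entity."""
    return {
        "@context": CONTEXT,
        "@graph": [
            {
                # RO-Crate Metadata File Descriptor
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root Data Entity, represented as a Schema.org Dataset
                "@id": "./",
                "@type": "Dataset",
                "name": name,
            },
        ],
    }


print(json.dumps(minimal_crate("Example crate"), indent=2))
```

Writing the printed JSON to ro-crate-metadata.json in an otherwise empty directory would give a starting point for a crate, to which Data Entities and Contextual Entities can then be appended.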

RO-Crate Website (ro-crate-preview.html and ro-crate-preview_files/)

In addition to the machine-oriented RO-Crate Metadata File, the RO-Crate MAY include a human-readable HTML rendering of the same information, known as the RO-Crate Website.

If present in the root directory, ro-crate-preview.html MUST:

  • Be a valid HTML 5 document
  • Be useful to users of the RO-Crate. This will vary by community and intended use, but in general the aim is to assist users in reusing data by explaining what it is, how it was created, how it can be used and how to cite it. One simple approach to this is to expose all the metadata in the RO-Crate Metadata File.
  • Contain a copy of the RO-Crate JSON-LD in a script element of the head element of the HTML, for example:
      <script type="application/ld+json">
      {
          "@context": "https://w3id.org/ro/crate/1.1/context",
          "@graph": [ ...]
      }
      </script>
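
Because the JSON-LD is embedded in a script element, a consumer can recover it from the preview page alone. The sketch below uses Python's standard html.parser module; `JsonLdExtractor` and `extract_jsonld` are hypothetical names. It assumes the page contains exactly one script element of type application/ld+json and does not check that the element sits inside head.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of <script type="application/ld+json"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def jsonld(self) -> dict:
        # Raises json.JSONDecodeError if no JSON-LD script element was found.
        return json.loads("".join(self._chunks))


def extract_jsonld(html: str) -> dict:
    """Parse an HTML string and return the embedded RO-Crate JSON-LD."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.jsonld()
```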
    

ro-crate-preview.html SHOULD:

  • Display at least the metadata relating to the Root Data Entity as static HTML without the need for scripting. It MAY contain extra features enabled by JavaScript.
  • When a Data Entity or Contextual Entity is referenced by its ID:
    • If it has a name property, provide a link to its HTML version.
    • If it does not have a name (e.g. a GeoCoordinates location), show it embedded in the HTML for the entity.
    • For external URI values, provide a link.
  • For keys that resolve in the RO-Crate JSON-LD Context to a URI, indicate this (the simplest way is to link the key to its definition).
  • If there is sufficient metadata, contain a prominent “Cite-as” text with a natural language data citation (see for example the FORCE11 Data Citation Principles).
  • If there are additional resources necessary to render the preview (e.g. CSS, JSON, HTML), link to them in a subdirectory ro-crate-preview_files/

Payload files and directories

These are the actual files and directories that make up the dataset being described.

The base RO-Crate specification makes no assumptions about the presence of any specific files or folders beyond the reserved RO-Crate files described above. Payload files may appear directly in the RO-Crate Root alongside the RO-Crate Metadata File, and/or appear in sub-directories of the RO-Crate Root. Each file and directory MAY be represented as Data Entities in the RO-Crate Metadata File.
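
Since describing payload files is optional, a tool may want to report which files on disk have no corresponding entity. The following is a sketch only: `undescribed_payload` is a hypothetical helper, it assumes entity identifiers are URL-encoded paths relative to the RO-Crate Root (without a leading "./"), and it skips the reserved RO-Crate files named in the structure above.

```python
from pathlib import Path
from urllib.parse import unquote

RESERVED_FILES = {"ro-crate-metadata.json", "ro-crate-preview.html"}
RESERVED_DIR = "ro-crate-preview_files"


def undescribed_payload(crate_root: Path, graph: list[dict]) -> list[str]:
    """List payload files under crate_root that have no entity in the graph."""
    described = {unquote(entity["@id"]) for entity in graph}
    missing = []
    for path in sorted(crate_root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(crate_root)
        name = relative.as_posix()
        # Skip the reserved RO-Crate files and the RO-Crate Website directory.
        if name in RESERVED_FILES or relative.parts[0] == RESERVED_DIR:
            continue
        if name not in described:
            missing.append(name)
    return missing
```

A non-empty result is not a conformance problem, because files MAY be left undescribed; it is simply a list of candidates for further description.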

Self-describing and self-contained

RO-Crates SHOULD be self-describing and self-contained.

A minimal RO-Crate is a directory containing a single RO-Crate Metadata File ro-crate-metadata.json.

At the basic level, an RO-Crate is a collection of files and resources represented as a Schema.org Dataset, that together form a meaningful unit for the purposes of communication, citation, distribution, preservation, etc. The RO-Crate Metadata File describes the RO-Crate, and MUST be stored in the RO-Crate Root.

While RO-Crate caters well for describing a Dataset as files and relevant metadata that are contained by the RO-Crate, in the sense of living within the same root directory, RO-Crates can also reference external resources which are stored or accessed separately, via absolute URIs. This is particularly recommended where some resources cannot be co-hosted for practical or legal reasons, or if the RO-Crate itself is primarily web-based.

It is important to note that the RO-Crate Metadata File is not an exhaustive manifest or inventory; that is, it does not necessarily list or describe all files in the package. Rather, it is focused on providing a sufficient amount of metadata to understand and use the content, and is designed to be compatible with existing and future approaches that do have full inventories/manifests and integrity checks (e.g. by using checksums), such as BagIt and Oxford Common File Layout (OCFL) Objects.

The intention is that RO-Crates can work well with a variety of archive file formats, e.g. tar, zip, etc., and approaches to capturing file manifests and file fixity, such as BagIt, OCFL and git (see also appendix Combining with other packaging schemes). An RO-Crate can also be hosted on the web or mainly refer to web resources, although extra care to ensure persistence and consistency should be taken for archiving such RO-Crates.