RO-Crate Structure
Table of contents
- Types of RO-Crate
- Attached RO-Crate
- Detached RO-Crate
- RO-Crate Metadata Document (ro-crate-metadata.json)
- RO-Crate Website (ro-crate-preview.html and ro-crate-preview_files/)
- Payload files and directories (Attached RO-Crates)
- Self-describing and self-contained (Attached RO-Crates)
Types of RO-Crate
There are two classes of RO-Crate detailed below:
- Attached RO-Crate
- A crate that is present in a file-system context which has an RO-Crate Root directory and can carry an explicit payload of local data entities as regular files (combined with Web-based Data Entities where needed). This type of RO-Crate can be suitable for packaging a data payload for long-term preservation, transfer and publishing, as the RO-Crate Metadata Document is stored in an RO-Crate Metadata File alongside the crate’s payload. See further definition of attached RO-Crate below.
- Detached RO-Crate
- A crate without a defined payload directory. In this kind of crate, all data references are absolute. This approach may be suitable for use with dynamic web service APIs and repositories that can’t preserve file paths. As the data of these crates can only be Web-based Data Entities, the payload is implicit and must be preserved/transferred/archived independent of the RO-Crate Metadata Document. See further definition of detached RO-Crate below.
In both types of crates the metadata is completed with contextual entities that further describe the relationships and context of the data to form a Research Object.
Attached RO-Crate
An Attached RO-Crate is used to contain and describe a payload of files and directories, along with their contextual information.
An Attached RO-Crate can be stored and published in multiple ways depending on its use:
- On a typical hierarchical file system (e.g. /files/shared/crates/my-crate-01/)
- Exposed as a Web resource within a folder structure (e.g. https://www.researchobject.org/2021-packaging-research-artefacts-with-ro-crate/)
- Packaged within a ZIP file, BagIt archive or OCFL structure
- Archived as a set of named files in other ways (e.g. Zenodo deposit)
A valuable feature of the Attached RO-Crate approach is that the metadata is preserved when a crate is transferred between these types of storage/publication systems.
The file path structure an Attached RO-Crate MUST follow is:
<RO-Crate root directory>/
| ro-crate-metadata.json # RO-Crate Metadata File containing the RO-Crate Metadata Document MUST be present
| ro-crate-preview.html # RO-Crate Website homepage MAY be present
| ro-crate-preview_files/ # MAY be present
| | [other RO-Crate Website files]
| [payload files and directories] # 0 or more
The name of the RO-Crate root directory is not defined, but a root directory is identifiable by the presence of the RO-Crate Metadata File, ro-crate-metadata.json. For instance, if an RO-Crate is archived in a ZIP-file, the ZIP root directory is an RO-Crate root directory if it contains ro-crate-metadata.json.
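The identification rule above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names `is_crate_root` and `zip_root_is_crate_root` are hypothetical, and only the Python standard library is used.

```python
from pathlib import Path
import zipfile

METADATA_FILE = "ro-crate-metadata.json"


def is_crate_root(directory):
    """Return True if the directory directly contains the RO-Crate Metadata File."""
    return (Path(directory) / METADATA_FILE).is_file()


def zip_root_is_crate_root(zip_source):
    """Return True if the root of a ZIP archive contains the RO-Crate Metadata File.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Member names are relative to the ZIP root, so a file stored
        # directly in the root has no directory prefix.
        return METADATA_FILE in archive.namelist()
```

A ZIP file whose only copy of ro-crate-metadata.json sits in a subdirectory is not recognised by this check, because the ZIP root itself would then not be an RO-Crate root directory.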
The payload directory (and its child directories) contains files and directories that SHOULD be described within the RO-Crate Metadata File as Data Entities. Additional Web-based Data Entities MAY also be described, but are not considered part of the payload.
Detached RO-Crate
A Detached RO-Crate is an RO-Crate without a defined root directory, where the RO-Crate Metadata Document and/or RO-Crate Website content is accessed independently (e.g. as part of a programmatic API).
These crates cannot carry their own data payload, but may reference data deposited separately, or purely reference contextual entities.
Any data entities in a Detached RO-Crate MUST be Web-based Data Entities.
Using relative URI references like example/data.txt in a Detached RO-Crate is NOT RECOMMENDED as this is considered ambiguous and fragile.
A Detached RO-Crate can be identified by the root data entity having an @id different from ./ in the JSON.
Finding the Root Data Entity can be harder for consumers of detached crates, particularly if the platform serving the RO-Crate Metadata Document is unable to ensure the URI path ends with …/ro-crate-metadata.json.
Note that a detached RO-Crate may still use #-based local identifiers for contextual entities.
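The check on the root data entity's @id could look like the following Python sketch. It rests on assumptions that this section does not spell out: the RO-Crate Metadata Descriptor is located by a conformsTo value starting with `https://w3id.org/ro/crate/` (an assumed prefix) rather than by its @id, since the @id may not end in ro-crate-metadata.json for detached crates, and references are assumed to be `{"@id": ...}` objects as in flattened JSON-LD. The helper names are hypothetical.

```python
SPEC_PREFIX = "https://w3id.org/ro/crate/"  # assumed prefix of specification version URIs


def as_list(value):
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_root_entity(crate):
    """Follow the metadata descriptor's `about` property to the root data entity."""
    entities = {e["@id"]: e for e in crate.get("@graph", []) if "@id" in e}
    for entity in entities.values():
        conforms = [c.get("@id", "") for c in as_list(entity.get("conformsTo"))
                    if isinstance(c, dict)]
        about = [a for a in as_list(entity.get("about")) if isinstance(a, dict)]
        # Heuristic: the descriptor declares conformance to the specification
        # and points at the root data entity via `about`.
        if about and any(c.startswith(SPEC_PREFIX) for c in conforms):
            return entities.get(about[0].get("@id"))
    return None


def is_detached(crate):
    """Return True if the root data entity's @id is anything other than "./"."""
    root = find_root_entity(crate)
    if root is None:
        raise ValueError("no root data entity found")
    return root["@id"] != "./"
```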
RO-Crate Metadata Document (ro-crate-metadata.json)
- In an Attached RO-Crate the RO-Crate Metadata Document MUST be present, named ro-crate-metadata.json and appear in the RO-Crate Root
  - If an RO-Crate conforming to version 1.0 or earlier contains a file named ro-crate-metadata.jsonld but not ro-crate-metadata.json, then processing software should treat this as the RO-Crate Metadata File. If the crate is updated, the file SHOULD be renamed to ro-crate-metadata.json and the RO-Crate Metadata Descriptor SHOULD be updated to reference it, with an up to date conformsTo property naming an appropriate version of this specification.
- In a Detached RO-Crate the RO-Crate Metadata Document is a JSON-LD document served over an API or loaded via other means.
- The RO-Crate Metadata Document MUST contain RO-Crate JSON-LD; a valid JSON-LD 1.0 document in flattened and compacted form
- The RO-Crate JSON-LD SHOULD use the RO-Crate JSON-LD Context https://w3id.org/ro/crate/1.2-DRAFT/context by reference.
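The lookup of the RO-Crate Metadata File in an Attached RO-Crate, including the fallback for crates conforming to version 1.0 or earlier, might be sketched as follows. The function name `find_metadata_file` is hypothetical, and returning `None` when neither file exists is a choice made for this example.

```python
from pathlib import Path


def find_metadata_file(crate_root):
    """Locate the RO-Crate Metadata File in an RO-Crate Root directory.

    Prefers ro-crate-metadata.json and only falls back to the legacy
    ro-crate-metadata.jsonld name when the current name is absent.
    """
    root = Path(crate_root)
    current = root / "ro-crate-metadata.json"
    legacy = root / "ro-crate-metadata.jsonld"
    if current.is_file():
        return current
    if legacy.is_file():
        return legacy
    return None
```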
JSON-LD is a structured form of JSON that can represent a Linked Data graph.
A valid RO-Crate JSON-LD graph MUST describe:
- The RO-Crate Metadata Descriptor
- The Root Data Entity
- Zero or more Data Entities
- Zero or more Contextual Entities
It is RECOMMENDED that any referenced contextual entities are also described in the RO-Crate Metadata Document with the same identifier. Similarly it is RECOMMENDED that any contextual entity in the RO-Crate Metadata Document is linked to from at least one of the other entities using the same identifier.
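The two recommendations above can be checked mechanically. The following Python sketch assumes a flattened `@graph` in which every reference is a `{"@id": ...}` object; the names `iter_references` and `check_links` are hypothetical. It reports identifiers that are referenced but not described, and entities that are described but never referenced.

```python
def iter_references(value):
    """Yield every @id referenced from a property value."""
    if isinstance(value, dict):
        if "@id" in value:
            yield value["@id"]
    elif isinstance(value, list):
        for item in value:
            yield from iter_references(item)


def check_links(graph):
    """Return (dangling, unlinked) sets of identifiers for an @graph list.

    dangling: referenced somewhere, but no entity with that @id is described.
    unlinked: described, but not referenced from any entity.
    """
    described = {entity["@id"] for entity in graph if "@id" in entity}
    referenced = set()
    for entity in graph:
        for key, value in entity.items():
            if key != "@id":
                referenced.update(iter_references(value))
    return referenced - described, described - referenced
```

Both results are hints rather than errors, since the text only makes recommendations. The RO-Crate Metadata Descriptor will normally appear in the unlinked set, because other entities do not usually refer to it, and external URIs that are intentionally left undescribed will appear in the dangling set.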
The appendix RO-Crate JSON-LD details the general structure of the JSON-LD that is expected in the RO-Crate Metadata Document. In short, the rest of this specification describes the different types of entities that can be added as {} objects to the RO-Crate JSON-LD @graph array below:
{ "@context": "https://w3id.org/ro/crate/1.2-DRAFT/context",
"@graph": [
]
}
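As an illustration of filling the @graph array, the Python sketch below builds a skeleton document holding the two entities every crate must describe, then appends further entities. The helper names `new_crate` and `add_entity` are hypothetical. The shape of the RO-Crate Metadata Descriptor, in particular the conformsTo URI `https://w3id.org/ro/crate/1.2-DRAFT`, is an assumption inferred from the context URL; the sections defining the descriptor and the Root Data Entity are authoritative, and a real Root Data Entity needs more properties than shown here.

```python
import json

CONTEXT = "https://w3id.org/ro/crate/1.2-DRAFT/context"


def new_crate(root_id="./"):
    """Return a skeleton RO-Crate JSON-LD document as a Python dict."""
    return {
        "@context": CONTEXT,
        "@graph": [
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                # Assumed specification URI, derived from the context URL.
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2-DRAFT"},
                "about": {"@id": root_id},
            },
            {"@id": root_id, "@type": "Dataset"},
        ],
    }


def add_entity(crate, entity):
    """Append an entity object to @graph, refusing duplicate identifiers."""
    if any(existing["@id"] == entity["@id"] for existing in crate["@graph"]):
        raise ValueError("duplicate @id: " + entity["@id"])
    crate["@graph"].append(entity)
    return crate


def to_json(crate):
    """Serialise the document as indented JSON text."""
    return json.dumps(crate, indent=2)
```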
RO-Crate Website (ro-crate-preview.html and ro-crate-preview_files/)
In addition to the machine-oriented RO-Crate Metadata Document, the RO-Crate MAY include a human-readable HTML rendering of the same information, known as the RO-Crate Website. If present, the RO-Crate Website MUST be a file named ro-crate-preview.html in the root directory, which MAY serve as the entry point to other web-resources, which MUST be in ro-crate-preview_files/ in the root directory.
If present in the root directory of an Attached RO-Crate as ro-crate-preview.html (or otherwise served in a Detached RO-Crate), the RO-Crate Website MUST:
- Be a valid HTML 5 document
- Be useful to users of the RO-Crate - this will vary by community and intended use, but in general the aim is to assist users in reusing data by explaining what it is, how it was created, how it can be used and how to cite it. One simple approach to this is to expose all the metadata in the RO-Crate Metadata Document.
- Contain a copy of the RO-Crate JSON-LD in a script element of the head element of the HTML, for example:
<script type="application/ld+json">
{
  "@context": "https://w3id.org/ro/crate/1.2-DRAFT/context",
  "@graph": [ ...]
}
</script>
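A consumer could recover the embedded RO-Crate JSON-LD from the RO-Crate Website roughly as follows. This sketch uses only Python's standard `html.parser` module; the class name `JsonLdExtractor` and the function `extract_crate_jsonld` are hypothetical, and the code only looks at `application/ld+json` script elements inside the head element.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect JSON-LD documents from script elements inside <head>."""

    def __init__(self):
        super().__init__()
        self._in_head = False
        self._capturing = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self._in_head = True
        elif tag == "script" and self._in_head:
            if dict(attrs).get("type") == "application/ld+json":
                self._capturing = True
                self._chunks = []

    def handle_data(self, data):
        # Script content may arrive in several pieces, so collect them all.
        if self._capturing:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self.documents.append(json.loads("".join(self._chunks)))
            self._capturing = False
        elif tag == "head":
            self._in_head = False


def extract_crate_jsonld(html_text):
    """Return the list of JSON-LD documents embedded in the head of an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.documents
```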
ro-crate-preview.html SHOULD:
- Display at least the metadata relating to the Root Data Entity as static HTML without the need for scripting. It MAY contain extra features enabled by JavaScript.
- When a Data Entity or Contextual Entity is referenced by its ID:
  - If it has a name property, provide a link to its HTML version.
  - If it does not have a name (e.g. a GeoCoordinates location), show it embedded in the HTML for the entity.
- For external URI values, provide a link.
- For keys that resolve in the RO-Crate JSON-LD Context to a URI, indicate this (the simplest way is to link the key to its definition).
- If there are additional resources necessary to render the preview (e.g. CSS, JSON, HTML), link to them in a subdirectory ro-crate-preview_files/
The RO-Crate Website is not considered a part of the RO-Crate, but serves as a way to make metadata available in a user-appropriate format. The ro-crate-preview.html file and the ro-crate-preview_files/ directory and any contents SHOULD NOT be included in the hasPart property of the Root Dataset or any other Dataset entity within an RO-Crate.
Metadata about parts of the RO-Crate Website MAY be included in an RO-Crate as in the following example. Metadata such as an author property, dateCreated or other provenance can be included, including details about the software that created them, as described in Software used to create files.
{
"@id": "ro-crate-preview.html",
"@type": "CreativeWork",
"about": {"@id": "./"}
},
{
"@id": "https://www.npmjs.com/package/ro-crate-html-js",
"@type": "SoftwareApplication",
"url": "https://www.npmjs.com/package/ro-crate-html-js",
"name": "ro-crate-html-js",
"version": "1.4.19"
},
{
"@id": "#ro-crate-preview-generation",
"@type": "CreateAction",
"name": "Create HTML summary",
"endTime": "2022-10-19T17:01:07+10:00",
"instrument": {
"@id": "https://www.npmjs.com/package/ro-crate-html-js"
},
"object": {
"@id": "ro-crate-metadata.json"
},
"result": {
"@id": "ro-crate-preview.html"
}
}
In a Detached RO-Crate it is undefined how to find the RO-Crate Website from the RO-Crate Metadata Document or vice versa; it is RECOMMENDED to describe both as contextual entities.
Payload files and directories (Attached RO-Crates)
These are the actual files and directories that make up the payload of the dataset being described in an Attached RO-Crate.
The base RO-Crate specification makes no assumptions about the presence of any specific files or folders beyond the reserved RO-Crate files described above.
Payload files may appear directly in the RO-Crate Root alongside the RO-Crate Metadata File, and/or appear in sub-directories of the RO-Crate Root. Each file and directory MAY be represented as Data Entities in the RO-Crate Metadata File.
An RO-Crate may also contain Web-based Data Entities that are not present as part of the payload and are referenced using absolute URIs. These may require additional preservation measures.
An RO-Crate packaged with BagIt may reference external files which are not present in the RO-Crate Root hierarchy until the BagIt has been completed. This method can be used for files that are large, require authentication or are otherwise inconvenient to transfer with the RO-Crate, but which should nevertheless still be considered part of the payload.
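Because describing each payload file is optional, a tool may want to report which files have no matching Data Entity. The Python sketch below walks an RO-Crate Root and lists such files, skipping the reserved RO-Crate files. It assumes that Data Entity identifiers for local files are plain relative paths from the root with `/` separators (no percent-encoding), and the function name `undescribed_payload` is hypothetical. Files inside a directory that is described only as a whole are still reported.

```python
import json
from pathlib import Path

RESERVED_FILES = {"ro-crate-metadata.json", "ro-crate-preview.html"}
RESERVED_DIR = "ro-crate-preview_files"


def undescribed_payload(crate_root):
    """List payload files that have no entity with a matching @id."""
    root = Path(crate_root)
    with open(root / "ro-crate-metadata.json", encoding="utf-8") as handle:
        graph = json.load(handle)["@graph"]
    described = {entity["@id"] for entity in graph if "@id" in entity}

    missing = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        # Skip the reserved RO-Crate files, which are not part of the payload.
        if relative.parts[0] == RESERVED_DIR or relative.as_posix() in RESERVED_FILES:
            continue
        if relative.as_posix() not in described:
            missing.append(relative.as_posix())
    return missing
```

An empty result means every payload file is described; a non-empty result is not an error, since the metadata is not required to cover every file.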
Self-describing and self-contained (Attached RO-Crates)
RO-Crates SHOULD be self-describing and self-contained.
A minimal Attached RO-Crate is a directory containing a single RO-Crate Metadata Document stored as an RO-Crate Metadata File ro-crate-metadata.json.
At the basic level, an Attached RO-Crate is a collection of files and resources represented as a Schema.org Dataset, that together form a meaningful unit for the purposes of communication, citation, distribution, preservation, etc. The RO-Crate Metadata Document describes the RO-Crate, and MUST be stored in the RO-Crate Root.
While RO-Crate caters well for describing a Dataset as files and relevant metadata that are contained by the RO-Crate in the sense of living within the same root directory, RO-Crates can also reference external resources which are stored or accessed separately, via absolute URIs. This is particularly recommended where some resources cannot be co-hosted for practical or legal reasons, or if the RO-Crate itself is primarily web-based.
It is important to note that the RO-Crate Metadata Document is not an exhaustive manifest or inventory, that is, it does not necessarily list or describe all files in the package. Rather it is focused on providing a sufficient amount of metadata to understand and use the content, and is designed to be compatible with existing and future approaches that do have full inventories / manifests and integrity checks, e.g. by using checksums, such as BagIt and Oxford Common File Layout (OCFL) Objects.
The intention is that RO-Crates can work well with a variety of archive file formats, e.g. tar, zip, etc., and approaches to capturing file manifests and file fixity, such as BagIt, OCFL and git (see also appendix Combining with other packaging schemes). An RO-Crate can also be hosted on the web or mainly refer to web resources, although extra care to ensure persistence and consistency should be taken for archiving such RO-Crates.