APPENDIX: Handling relative URI references
Table of contents
- Converting from Attached to Detached RO-Crate
- Converting from Detached to Attached RO-Crate
- Handling relative URI references when using JSON-LD/RDF tools
- Flattening JSON-LD from nested JSON
- Expanding/parsing JSON-LD keeping relative referencing
- Establishing absolute URI for RO-Crate Root
- Finding RO-Crate Root in RDF triple stores
- Parsing as RDF with a different RO-Crate Root
- Establishing a base URI inside a ZIP file
- Relativizing absolute URIs within RO-Crate Root
In an Attached RO-Crate, the RO-Crate Metadata File uses relative URI references to identify files and directories contained within the RO-Crate Root and its children. As described in section Describing entities in JSON-LD, relative URI references are also frequently used for identifying Contextual entities.
Converting from Attached to Detached RO-Crate
An Attached RO-Crate can be published on the Web by placing its RO-Crate Root directory on a static file-based Web server (e.g. Nginx, Apache HTTPd, GitHub Pages). The use of relative URI references in the RO-Crate Metadata File ensures identifiers of data entities work as they should.
Sometimes it is desired to make a Detached RO-Crate, e.g. for depositing or integrating the RO-Crate Metadata File into a knowledge graph or repository that is unable to preserve data files using their existing pathnames. In this case one needs to:
- Decide on new Web locations for individual data files and update their absolute URI in @id
- Observe the preservation considerations for Web-based Data Entities
- Ensure all nested directories not browsable on the Web are represented as Dataset with its content listed with hasPart or distribution (see section Directories on the web). Change their relative @id to become absolute, e.g. using ARCP.
- Rewrite the JSON-LD with absolute URIs for data entities
If the RO-Crate is already published on the Web, with directory browsing enabled for nested directories, then these steps can be achieved using JSON-LD tooling.
For example, as the RO-Crate Metadata file https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json along with the RO-Crate Root is published on the Web (using GitHub Pages), we can generate a random UUID (e.g. d6be5c9b-132a-4a93-9837-3e02e06c08e6) and use JSON-LD flattening from this context:
{ "@context": [
{"@base": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json"},
"https://w3id.org/ro/crate/1.1/context"
]
}
to this context:
{ "@context": [
{"@base": "arcp://uuid,d6be5c9b-132a-4a93-9837-3e02e06c08e6/"},
"https://w3id.org/ro/crate/1.1/context"
]
}
None of the existing resources will have an @id starting with this fresh base URI, therefore all URIs will be made absolute. The resulting {@base: ..} is harmless, but can be removed from the output JSON-LD.
Example output (abbreviated):
{
"@context": [
{"@base": "arcp://uuid,d6be5c9b-132a-4a93-9837-3e02e06c08e6/"},
"https://w3id.org/ro/crate/1.1/context"
],
"@graph": [
{
"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
"about": {"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/"},
"creator": {"@id": "https://orcid.org/0000-0001-9842-9718"}
},
{
"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/",
"@type": "Dataset",
"hasPart": [
{ "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/index.html"},
{ "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/example/"}
],
"name": "Workflow RO-Crate profile"
}
]
}
Notice how identifiers like ro-crate-metadata.json, ./, index.html and example/ have been translated to absolute URIs.
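The effect of this rebasing on identifiers can be approximated without a JSON-LD processor. The following is only a sketch and is not JSON-LD flattening: it resolves every @id in a @graph against an assumed base URI using Python's standard urllib.parse.urljoin. The function name absolutize_ids is hypothetical, and the sample graph is a shortened stand-in for the real metadata file.

```python
from urllib.parse import urljoin

def absolutize_ids(node, base):
    """Return a copy of a JSON-LD fragment with every @id resolved against base."""
    if isinstance(node, dict):
        return {
            key: urljoin(base, value) if key == "@id" and isinstance(value, str)
            else absolutize_ids(value, base)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [absolutize_ids(item, base) for item in node]
    return node  # literals (strings, numbers, booleans, None) are left untouched

# Base URI taken from the example above; the graph is abbreviated.
base = "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json"
graph = [
    {"@id": "ro-crate-metadata.json", "about": {"@id": "./"}},
    {"@id": "./", "hasPart": [{"@id": "index.html"}, {"@id": "example/"}]},
]
absolute = absolutize_ids(graph, base)
```

The sketch should be applied to the @graph only, not to the @context, since term definitions in a context may also carry @id keys that must not be rebased.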
The above JSON-LD processing will also expand any #-based local identifiers of contextual entities:
{
"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json#include-ComputationalWorkflow",
"@type": "Recommendation",
"category": "MUST",
"name": "Include Main Workflow",
"itemReviewed": {
"@id": "https://bioschemas.org/ComputationalWorkflow"
}
}
In this approach, the Detached RO-Crate can be resolved to the corresponding Attached RO-Crate by following the @id of the Root Dataset or the Root Metadata File entity.
If the new Detached RO-Crate is not meant as a snapshot of the corresponding Attached RO-Crate, then such contextual entities should be assigned new @id values, e.g. by generating random UUIDs like urn:uuid:e47e41d9-f924-4c07-bc90-97e7ed34fe35. Such transformations are typically not catered for by traditional JSON-LD tooling and require additional implementation.
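As an illustration of the additional implementation this may require, the sketch below assigns a fresh urn:uuid to every #-based identifier and rewrites all references to it. It assumes it is run on the flattened @graph of the Attached RO-Crate, before rebasing, while local identifiers still begin with #. The function name reassign_local_ids and the sample entities are hypothetical.

```python
import uuid

def reassign_local_ids(graph):
    """Give every '#'-based @id in a flattened @graph a fresh urn:uuid identifier."""
    # Map each local identifier to a newly generated random (version 4) UUID URN.
    mapping = {
        entity["@id"]: f"urn:uuid:{uuid.uuid4()}"
        for entity in graph
        if entity.get("@id", "").startswith("#")
    }

    def rewrite(node):
        if isinstance(node, dict):
            return {
                key: mapping.get(value, value)
                if key == "@id" and isinstance(value, str)
                else rewrite(value)
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [rewrite(item) for item in node]
        return node

    return rewrite(graph), mapping

# Hypothetical sample: one contextual entity referenced from the root.
sample = [
    {"@id": "./", "mentions": {"@id": "#include-ComputationalWorkflow"}},
    {"@id": "#include-ComputationalWorkflow", "name": "Include Main Workflow"},
]
new_graph, mapping = reassign_local_ids(sample)
```

Returning the mapping alongside the rewritten graph lets a caller record which new identifier replaced which local one.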
Converting from Detached to Attached RO-Crate
Converting a Detached Crate to an Attached Crate can mean multiple things depending on intentions, and may imply an elaborate process.
First, check if the Root Dataset already has a distribution download listed, in which case that can be retrieved as the corresponding Attached Crate.
To archive a snapshot of a Detached Crate’s metadata, keeping all data entities web-based:
- Create a new folder as the RO-Crate Root and save the RO-Crate Metadata Document as the RO-Crate Metadata File according to the Attached RO-Crate structure
- Copy the absolute @id to become an identifier according to the recommendations for Root Data Entity identifier
- Change the @id of the root dataset to ./ and update all references to it, including from the Metadata Descriptor
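The last two steps could be sketched as below. This is a minimal illustration that assumes a flattened @graph and stores the former @id as a plain string in identifier; the exact form of identifier should follow the recommendations referenced above. The function name attach_root and the example.org URIs are hypothetical.

```python
def attach_root(graph, old_root_id):
    """Rename the Root Dataset to './' and keep its former absolute @id as identifier."""
    def rewrite(node):
        if isinstance(node, dict):
            return {
                key: "./" if key == "@id" and value == old_root_id else rewrite(value)
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [rewrite(item) for item in node]
        return node

    result = rewrite(graph)
    for entity in result:
        # Only the top-level entity describing the root receives the identifier.
        if entity.get("@id") == "./":
            entity.setdefault("identifier", old_root_id)
    return result

# Hypothetical Detached Crate with an absolute root identifier.
detached = [
    {"@id": "ro-crate-metadata.json", "about": {"@id": "https://example.org/crate/"}},
    {"@id": "https://example.org/crate/", "@type": "Dataset"},
]
attached = attach_root(detached, "https://example.org/crate/")
```

Using setdefault means an identifier that is already present on the Root Dataset is left as it is.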
If the new Attached Crate is intended as a fork that will evolve independently of the Detached Crate, then:
- Delete the identifier, add the previous @id as isBasedOn
- Delete/update datePublished and publisher
- Add yourself as author or contributor to the Root Dataset
- Add records of changes to the Crate
To additionally save Web-based Data entities to become part of the Attached Crate, a possible algorithm is:
- For each data entity whose @type includes Dataset:
  - If it has a distribution download, retrieve and unpack that according to its encodingFormat, using its new folder name as the new local path name.
  - If not, create a corresponding folder in the RO-Crate Root, possibly generating the local path name based on name or path elements of the @id URI
  - Replace the @id of the dataset and all its references with the relative URI based on the path from the RO-Crate Root, encoding file paths as necessary.
  - Recurse this algorithm to process each data entity from this dataset’s hasPart
- For each data entity whose @type includes File:
  - Decide based on @id URI elements, contentSize, encodingFormat and (possibly implied) license if this file is acceptable to archive
  - Retrieve the file and check the contentSize matches, if specified
  - Store the file with a file path generated in a way consistent with the Datasets, ideally added to the folder of the first Dataset that directly has this data entity as its hasPart
  - Add the previous @id that the file was downloaded from as contentUrl according to Embedded data entities that are also on the Web
  - Replace the @id of the File with the relative URI based on the path from the RO-Crate Root, encoding file paths as necessary.
As this procedure can be error-prone (e.g. a Web-based entity may not be accessible or may require authentication), the implementation should consider the new Attached Crate as a fork and update identifier and isBasedOn as specified above.
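One step of the algorithm above, generating a local relative @id from the path elements of a Web-based @id, could look like the following sketch. It only uses the last path segment, percent-encodes the result, and does not handle name collisions or retrieval. The function name local_id_for and its parameters are hypothetical.

```python
import posixpath
from urllib.parse import quote, unquote, urlsplit

def local_id_for(url, parent="", is_dataset=False):
    """Derive a relative @id below the RO-Crate Root from the last path segment of url."""
    # Decode the URL path and drop a trailing slash so directories have a basename.
    path = unquote(urlsplit(url).path).rstrip("/")
    name = posixpath.basename(path) or "unnamed"
    # Re-encode characters such as spaces; '/' is kept as the path separator.
    relative = quote(posixpath.join(parent, name))
    return relative + "/" if is_dataset else relative

print(local_id_for("http://example.com/files/data%201.txt"))
print(local_id_for("http://example.com/files/subfolder/", is_dataset=True))
print(local_id_for("http://example.com/a/b.txt", parent="subfolder/"))
```

The parent argument is intended to carry the relative @id of the Dataset that lists the entity in its hasPart, so that files end up in the corresponding folder.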
If you are archiving an Attached RO-Crate that is already on the Web, then first establish the absolute URI for the root, and retrieve all payload files that are considered URI path-wise to be part of the RO-Crate Root, creating corresponding local paths. In this scenario the above algorithm can be simplified and the rewriting of identifiers can be avoided if they are already relative URIs.
Handling relative URI references when using JSON-LD/RDF tools
When using JSON-LD tooling and RDF libraries to consume or generate RO-Crates, extra care should be taken to ensure these URI references are handled correctly.
For this, a couple of scenarios are sketched below with recommendations for consistent handling:
Flattening JSON-LD from nested JSON
If performing JSON-LD flattening to generate a valid RO-Crate Metadata File for an Attached RO-Crate, add @base: null to the input JSON-LD @context array to avoid expanding relative URI references. The flattening @context SHOULD NOT need @base: null.
For example, this JSON-LD is in compacted form, which may be beneficial for processing, but is not yet a valid RO-Crate Metadata File as it has not been flattened into a @graph array.
{
"@context": [
"https://w3id.org/ro/crate/1.2-DRAFT/context"
],
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"description": "RO-Crate Metadata Descriptor (this file)",
"conformsTo": {"@id": "https://w3id.org/ro/crate/1.2-DRAFT"},
"about": {
"@id": "./",
"@type": "Dataset",
"name": "Example RO-Crate",
"description": "The RO-Crate Root Data Entity",
"hasPart": [
{ "@id": "data1.txt",
"@type": "File",
"description": "One of hopefully many Data Entities"
},
{ "@id": "subfolder/",
"@type": "Dataset"
}
]
}
}
Performing JSON-LD flattening with:
{ "@context":
"https://w3id.org/ro/crate/1.2-DRAFT/context"
}
Results in a valid RO-Crate JSON-LD (actual order in @graph may differ):
{
"@context": "https://w3id.org/ro/crate/1.2-DRAFT/context",
"@graph": [
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.2-DRAFT"
},
"about": {
"@id": "./"
},
"description": "RO-Crate Metadata Descriptor (this file)"
},
{
"@id": "./",
"@type": "Dataset",
"description": "The RO-Crate Root Data Entity",
"hasPart": [
{
"@id": "data1.txt"
},
{
"@id": "subfolder/"
}
],
"name": "Example RO-Crate"
},
{
"@id": "data1.txt",
"@type": "File",
"description": "One of hopefully many Data Entities"
},
{
"@id": "subfolder/",
"@type": "Dataset"
}
]
}
The saved RO-Crate JSON-LD SHOULD NOT include {@base: null} in its @context.
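Removing a leftover @base entry before saving can be done with plain JSON manipulation. The sketch below assumes the @context is a string, an object or an array of these; the function name strip_base is hypothetical.

```python
def strip_base(context):
    """Drop any {"@base": ...} entry from a JSON-LD @context before saving."""
    if isinstance(context, dict):
        context = [context]
    if not isinstance(context, list):
        return context  # a single context URI string: nothing to strip
    cleaned = []
    for entry in context:
        if isinstance(entry, dict):
            entry = {key: value for key, value in entry.items() if key != "@base"}
            if not entry:
                continue  # the entry only contained @base
        cleaned.append(entry)
    # Collapse a one-element array back to the bare entry.
    return cleaned[0] if len(cleaned) == 1 else cleaned

saved_context = strip_base(
    ["https://w3id.org/ro/crate/1.2-DRAFT/context", {"@base": None}]
)
```

Other term definitions that share an object with @base are kept, so extension terms in the context survive.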
Expanding/parsing JSON-LD keeping relative referencing
JSON-LD Expansion can be used to resolve terms from the @context to absolute URIs, e.g. http://schema.org/description. This may be needed to parse extended properties or for combinations with other Linked Data.
This algorithm would normally also expand @id fields based on the current base URI of the RO-Crate Metadata File, but this may be a temporary location like file:///tmp/rocrate54/ro-crate-metadata.json, meaning @id: subfolder/ becomes file:///tmp/rocrate54/subfolder/ after JSON-LD expansion.
To avoid making local identifiers absolute, before expanding, augment the JSON-LD @context to ensure it is an array that includes {"@base": null}.
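Augmenting the @context in this way needs no JSON-LD tooling. A minimal sketch, assuming the document has been loaded with Python's json module (where JSON null is None); the function name with_null_base is hypothetical:

```python
import copy
import json

def with_null_base(document):
    """Return a copy of a JSON-LD document whose @context array includes {"@base": None}."""
    result = copy.deepcopy(document)  # leave the caller's document unchanged
    context = result.get("@context", [])
    if not isinstance(context, list):
        context = [context]  # a bare string or object becomes a one-element array
    if {"@base": None} not in context:
        context.append({"@base": None})
    result["@context"] = context
    return result

doc = {"@context": "https://w3id.org/ro/crate/1.2-DRAFT/context", "@graph": []}
augmented = with_null_base(doc)
print(json.dumps(augmented["@context"]))
```

The augmented document is only meant as input to the expansion step; the original document is not modified.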
For example, expanding this JSON-LD:
{
"@context": [
"https://w3id.org/ro/crate/1.2-DRAFT/context",
{"@base": null}
],
"@graph": [
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.2-DRAFT"
},
"about": {
"@id": "./"
},
"description": "RO-Crate Metadata Descriptor (this file)"
},
{
"@id": "./",
"@type": "Dataset",
"description": "The RO-Crate Root Data Entity",
"hasPart": [
{
"@id": "data1.txt"
},
{
"@id": "subfolder/"
}
],
"name": "Example RO-Crate"
}
]
}
Results in an expanded form without @context, using absolute URIs for properties and types, but retaining relative URI references for entities within the RO-Crate Root:
[
{
"@id": "ro-crate-metadata.json",
"@type": [
"http://schema.org/CreativeWork"
],
"http://schema.org/about": [
{
"@id": "./"
}
],
"http://purl.org/dc/terms/conformsTo": [
{
"@id": "https://w3id.org/ro/crate/1.2-DRAFT"
}
],
"http://schema.org/description": [
{
"@value": "RO-Crate Metadata Descriptor (this file)"
}
]
},
{
"@id": "./",
"@type": [
"http://schema.org/Dataset"
],
"http://schema.org/description": [
{
"@value": "The RO-Crate Root Data Entity"
}
],
"http://schema.org/hasPart": [
{
"@id": "data1.txt"
},
{
"@id": "subfolder/"
}
],
"http://schema.org/name": [
{
"@value": "Example RO-Crate"
}
]
}
]
Note that @base: null will not relativize existing absolute URIs that happen to be contained by the RO-Crate Root (see section Relativizing absolute URIs within RO-Crate Root).
Most RDF parsers supporting JSON-LD will perform this kind of expansion before generating triples, but not all RDF stores or serializations support relative URI references. Consider using an alternative @base as detailed in sections below.
Establishing absolute URI for RO-Crate Root
When loading RO-Crate JSON-LD as RDF, or combining the crate’s Linked Data into a larger JSON-LD, it is important to ensure the correct base URI is used to resolve URI references that are relative to the RO-Crate Root.
When retrieving an RO-Crate over the web, servers might have performed HTTP redirections so that the base URI is different from what was requested. It is RECOMMENDED to follow section Establishing a Base URI of RFC3986 before resolving relative links from the RO-Crate Metadata File.
For instance, consider this HTTP redirection from a permalink (simplified):
GET https://w3id.org/ro/crate/1.0/crate HTTP/1.1
HTTP/1.1 301 Moved Permanently
Location: https://www.researchobject.org/ro-crate/1.0/ro-crate-metadata.jsonld
GET https://www.researchobject.org/ro-crate/1.0/ro-crate-metadata.jsonld HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/ld+json
{
"@context": "https://w3id.org/ro/crate/1.0/context",
"@graph": [
{
"@id": "ro-crate-metadata.jsonld",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.0"
},
"about": {
"@id": "./"
},
"license": {
"@id": "https://creativecommons.org/publicdomain/zero/1.0/"
}
},
{
"@id": "./",
"@type": "Dataset",
"hasPart": [
{
"@id": "index.html"
}
]
}
]
}
Following redirection we see that:
- Base URI of the RO-Crate Metadata File becomes https://www.researchobject.org/ro-crate/1.0/ro-crate-metadata.jsonld
- The absolute URI for index.html resolves to https://www.researchobject.org/ro-crate/1.0/index.html
  - ..rather than https://w3id.org/ro/crate/1.0/index.html which would not redirect correctly
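The difference between the two base URIs can be reproduced with Python's standard urllib.parse.urljoin. This sketch hard-codes the final URL from the example above so that it runs offline; the function name resolve_in_crate is hypothetical.

```python
from urllib.parse import urljoin

def resolve_in_crate(final_url, reference):
    """Resolve a relative @id against the URL the metadata file was finally retrieved from."""
    return urljoin(final_url, reference)

requested = "https://w3id.org/ro/crate/1.0/crate"
# URL after following the redirect. With the standard library it could be read
# from the response, e.g. urllib.request.urlopen(requested).url (needs network).
final_url = "https://www.researchobject.org/ro-crate/1.0/ro-crate-metadata.jsonld"

print(resolve_in_crate(final_url, "index.html"))  # resolved against the redirect target
print(resolve_in_crate(requested, "index.html"))  # resolved against the permalink
```

Only the first result points at a file that is actually part of the RO-Crate Root.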
This example also uses RO-Crate 1.0, where the RO-Crate Metadata File is called ro-crate-metadata.jsonld instead of ro-crate-metadata.json. Note that the recommended algorithm to find the Root Data Entity is agnostic to the actual filename.
Finding RO-Crate Root in RDF triple stores
When parsing RO-Crate JSON-LD as RDF, where the RDF framework performs resolution to absolute URIs, it may be difficult to find the RO-Crate Root in the parsed triples.
The algorithm proposed in section Root Data Entity allows finding the RDF resource describing ro-crate-metadata.json, independent of its parsed base URI. We can adapt this for RDF triples, so that crates conforming to this specification can be found with a SPARQL query:
PREFIX schema: <http://schema.org/>
SELECT ?crate ?metadatafile
WHERE {
?crate a schema:Dataset .
?metadatafile schema:about ?crate .
filter(contains(str(?metadatafile), "ro-crate-metadata.json"))
}
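For comparison, a filename-agnostic lookup can also be done directly on the JSON-LD @graph. The sketch below is modelled on the algorithm referenced above but is not a copy of it: it assumes the RO-Crate Metadata Descriptor is the entity whose conformsTo starts with https://w3id.org/ro/crate/ and that its about is a single reference. The function name find_root is hypothetical.

```python
RO_CRATE_PREFIX = "https://w3id.org/ro/crate/"

def find_root(graph):
    """Return (metadata descriptor @id, Root Data Entity @id) from a flattened @graph."""
    for entity in graph:
        conforms = entity.get("conformsTo", [])
        if isinstance(conforms, dict):
            conforms = [conforms]  # a single reference is treated like a one-item list
        about = entity.get("about")
        for item in conforms:
            if (
                isinstance(item, dict)
                and item.get("@id", "").startswith(RO_CRATE_PREFIX)
                and isinstance(about, dict)
            ):
                return entity["@id"], about["@id"]
    raise ValueError("no RO-Crate Metadata Descriptor found")

graph = [
    {"@id": "./", "@type": "Dataset"},
    {
        "@id": "ro-crate-metadata.jsonld",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.0"},
        "about": {"@id": "./"},
    },
]
descriptor_id, root_id = find_root(graph)
```

Because the lookup keys on conformsTo rather than on the file name, it works for both ro-crate-metadata.json and the older ro-crate-metadata.jsonld.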
Parsing as RDF with a different RO-Crate Root
When parsing an RO-Crate Metadata File into RDF triples, for instance uploading it to a graph store like Apache Jena’s Fuseki, it is important to ensure a consistent base URI:
- Some RDF stores and RDF formats don’t support relative URI references in triples (see RDF 1.1 note on IRIs)
- The RO-Crate Root may depend on where the RO-Crate Metadata File was parsed from, e.g. <file:///tmp/ro-crate-metadata.json> (file) or <http://localhost:3030/test/ro-crate-metadata.json> (web upload)
- Parsing multiple RO-Crates into the same RDF graph, using the same base URI, may merge them into the same RO-Crate
- ro-crate-metadata.json may not be recognized as JSON-LD and must be renamed to ro-crate-metadata.jsonld
- Web servers hosting ro-crate-metadata.json may not send the JSON-LD Content-Type
- If the base URI is not correct it may be difficult to find the corresponding file and directory paths from an RDF query returning absolute URIs
If the RDF library can parse the RO-Crate JSON-LD directly by retrieving from a http/https URI of the RO-Crate Metadata File it should calculate the correct base URI as detailed in section Establishing absolute URI for RO-Crate Root and you should not need to override the base URI as detailed here.
If a web-based URI for the RO-Crate root is known, then this can be supplied as a base URI. Most RDF tools support a --base option or similar. If this is not possible, then the @context of the RO-Crate JSON-LD can be modified by ensuring the @context is an array that sets the desired @base:
{
"@context": [
"https://w3id.org/ro/crate/1.2-DRAFT/context",
{"@base": "http://example.com/crate255/"}
],
"@graph": [
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.2-DRAFT"
},
"about": {
"@id": "./"
}
},
{
"@id": "./",
"@type": "Dataset",
"hasPart": [
{
"@id": "data1.txt"
},
{
"@id": "subfolder/"
}
],
"name": "Example RO-Crate"
},
{
"@id": "data1.txt",
"@type": "File",
"description": "One of hopefully many Data Entities"
},
{
"@id": "subfolder/",
"@type": "Dataset"
}
]
}
Parsing this will generate triples like below using http://example.com/crate255/ as the RO-Crate Root (shortened):
<http://example.com/crate255/ro-crate-metadata.json>
<http://purl.org/dc/terms/conformsTo>
<https://w3id.org/ro/crate/1.2-DRAFT> .
<http://example.com/crate255/ro-crate-metadata.json>
<http://schema.org/about>
<http://example.com/crate255/> .
<http://example.com/crate255/>
<http://schema.org/name>
"Example RO-Crate" .
<http://example.com/crate255/>
<http://schema.org/hasPart>
<http://example.com/crate255/data1.txt> .
<http://example.com/crate255/>
<http://schema.org/hasPart>
<http://example.com/crate255/subfolder/> .
<http://example.com/crate255/data1.txt>
<http://schema.org/description>
"One of hopefully many Data Entities" .
Generating RO-Crate JSON-LD from such triples can be done by first finding the RO-Crate Root and then using it as the base URI to relativize absolute URIs within the RO-Crate Root.
Establishing a base URI inside a ZIP file
An RO-Crate may have been packaged as a ZIP file or similar archive. Such RO-Crates may be unpacked to a temporary file path, which should not determine their identifiers.
When parsing such crates it is recommended to use the Archive and Package (arcp) URI scheme to establish a temporary/location-based UUID or hash-based (SHA256) base URI.
For instance, given a randomly generated UUID b7749d0b-0e47-5fc4-999d-f154abe68065 we can use arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/ as the @base:
{
"@context": [
"https://w3id.org/ro/crate/1.2-DRAFT/context",
{"@base": "arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/"}
],
"@graph": [
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.2-DRAFT"
},
"about": {
"@id": "./"
}
},
{
"@id": "./",
"@type": "Dataset",
"description": "The RO-Crate Root Data Entity",
"hasPart": [
{
"@id": "data1.txt"
},
{
"@id": "subfolder/"
}
],
"name": "Example RO-Crate"
},
{
"@id": "data1.txt",
"@type": "File",
"description": "One of hopefully many Data Entities"
},
{
"@id": "subfolder/",
"@type": "Dataset"
}
]
}
Parsing this as RDF will generate triples including:
<arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/ro-crate-metadata.json>
<http://schema.org/about>
<arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/> .
<arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/>
<http://schema.org/hasPart>
<arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/data1.txt> .
Here consumers can assume / is the RO-Crate Root and generating relative URIs can safely be achieved by search-replace as the arcp URI is unique. Saving RO-Crate JSON-LD from the triples can be done by using the arcp URI to relativize absolute URIs within RO-Crate Root.
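Generating such a base and adding it to the @context could be sketched as follows. This only covers the UUID form of arcp URIs shown above, using a random UUID from Python's uuid module; the function name with_arcp_base is hypothetical.

```python
import copy
import uuid

def with_arcp_base(document):
    """Return (copy of document with a fresh arcp @base in its @context, the base URI)."""
    base = f"arcp://uuid,{uuid.uuid4()}/"
    result = copy.deepcopy(document)  # leave the caller's document unchanged
    context = result.get("@context", [])
    if not isinstance(context, list):
        context = [context]
    # Drop earlier entries that only set @base, so the new base is the only one.
    context = [
        entry for entry in context
        if not (isinstance(entry, dict) and set(entry) == {"@base"})
    ]
    context.append({"@base": base})
    result["@context"] = context
    return result, base

doc = {"@context": "https://w3id.org/ro/crate/1.2-DRAFT/context", "@graph": []}
rebased, base = with_arcp_base(doc)
```

Returning the base URI as well allows it to be reused later when relativizing identifiers back for saving.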
BagIt: The arcp specification suggests how BagIt identifiers can be used to calculate the base URI. See also section Combining with other packaging schemes - note that in this approach the RO-Crate Root will be the payload folder /data/ under the calculated arcp base URI.
Relativizing absolute URIs within RO-Crate Root
Some applications may prefer working with absolute URIs, e.g. in a joint graph store or web-based repository, but should relativize URIs within the RO-Crate Root before generating the RO-Crate Metadata File.
Assuming a repository at example.com has JSON-LD with absolute URIs:
{
"@context": "https://w3id.org/ro/crate/1.2-DRAFT/context",
"@graph": [
{
"@id": "http://example.com/crate415/ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.2-DRAFT"
},
"about": {
"@id": "http://example.com/crate415/"
}
},
{
"@id": "http://example.com/crate415/",
"@type": "Dataset",
"description": "The RO-Crate Root Data Entity",
"hasPart": [
{
"@id": "http://example.com/crate415/data1.txt"
},
{
"@id": "http://example.com/crate415/subfolder/"
}
],
"name": "Example RO-Crate"
}
]
}
Then performing JSON-LD flattening with this @context:
{ "@context": [
{"@base": "http://example.com/crate415/"},
"https://w3id.org/ro/crate/1.2-DRAFT/context"
]
}
Will output RO-Crate JSON-LD with relative URIs:
{
"@context": [
{
"@base": "http://example.com/crate415/"
},
"https://w3id.org/ro/crate/1.2-DRAFT/context"
],
"@graph": [
{
"@id": "./",
"@type": "Dataset",
"description": "The RO-Crate Root Data Entity",
"hasPart": [
{
"@id": "data1.txt"
},
{
"@id": "subfolder/"
}
],
"name": "Example RO-Crate"
},
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.2-DRAFT"
},
"about": {
"@id": "./"
}
}
]
}
This method would also relativize URIs outside the RO-Crate Root that are on the same host, e.g. http://example.com/crate255/other.txt would become ../crate255/other.txt - this can particularly be a challenge with local file:/// URIs.
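Where this side effect is unwanted, one alternative is to relativize by prefix only, so that URIs outside the RO-Crate Root stay absolute. The sketch below is plain string handling, not JSON-LD flattening, and assumes the root URI ends with /. The function name relativize_ids is hypothetical.

```python
def relativize_ids(node, root):
    """Rewrite @id values under root as relative references; leave all others absolute."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "@id" and isinstance(value, str) and value.startswith(root):
                # The root itself has nothing left after the prefix and becomes "./".
                out[key] = value[len(root):] or "./"
            else:
                out[key] = relativize_ids(value, root)
        return out
    if isinstance(node, list):
        return [relativize_ids(item, root) for item in node]
    return node

root = "http://example.com/crate415/"
graph = [
    {
        "@id": root,
        "hasPart": [
            {"@id": root + "data1.txt"},
            {"@id": "http://example.com/crate255/other.txt"},
        ],
    }
]
relative = relativize_ids(graph, root)
```

The trailing slash on the root matters: without it, a sibling such as http://example.com/crate4150/ would wrongly match the prefix.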