APPENDIX: Handling relative URI references
Table of contents
- Converting from Attached to Detached RO-Crate Package
- Converting from Detached to Attached RO-Crate Package
- Handling relative URI references when using JSON-LD/RDF tools
- Flattening JSON-LD from nested JSON
- Expanding/parsing JSON-LD keeping relative referencing
- Establishing absolute URI for RO-Crate Root
- Finding RO-Crate Root in RDF triple stores
- Parsing as RDF with a different RO-Crate Root
- Establishing a base URI inside a ZIP file
- Relativizing absolute URIs within RO-Crate Root
In an Attached RO-Crate Package, the RO-Crate Metadata File uses relative URI references to identify files and directories contained within the RO-Crate Root and its children. As described in section Describing entities in JSON-LD, relative URI references are also frequently used for identifying Contextual entities.
Converting from Attached to Detached RO-Crate Package
An Attached RO-Crate Package can be published on the Web by placing its RO-Crate Root directory on a static file-based Web server (e.g. Nginx, Apache HTTPd, GitHub Pages). The use of relative URI references in the RO-Crate Metadata File ensures identifiers of data entities work as they should.
Sometimes it is desired to make a Detached RO-Crate Package, e.g. for depositing or integrating the RO-Crate Metadata File into a knowledge graph or repository that is unable to preserve data files using their existing pathnames. In this case one needs to:
- Decide on new Web locations for individual data files and update their absolute URI in @id
- Observe the preservation considerations for Web-based Data Entities
- Ensure all nested directories not browsable on the Web are represented as Dataset with its content listed with hasPart or distribution (see section Directories on the web). Change their relative @id to become absolute, e.g. using ARCP.
- Rewrite the JSON-LD with absolute URIs for data entities
If the RO-Crate is already published on the Web, with directory browsing enabled for nested directories, then these steps can be achieved using JSON-LD tooling.
For example, as the RO-Crate Metadata File https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json along with the RO-Crate Root is published on the Web (using GitHub Pages), we can generate a random UUID (e.g. d6be5c9b-132a-4a93-9837-3e02e06c08e6) and use JSON-LD flattening from this context:
{ "@context": [
{"@base": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json"},
"https://w3id.org/ro/crate/1.1/context"
]
}
to this context:
{ "@context": [
{"@base": "arcp://uuid,d6be5c9b-132a-4a93-9837-3e02e06c08e6/"},
"https://w3id.org/ro/crate/1.1/context"
]
}
None of the existing resources will have an @id starting with this fresh base URI, therefore all URIs will be made absolute. The resulting {@base: ..} is harmless, but can be removed from the output JSON-LD.
Example output (abbreviated):
{
"@context": [
{"@base": "arcp://uuid,d6be5c9b-132a-4a93-9837-3e02e06c08e6/"},
"https://w3id.org/ro/crate/1.1/context"
],
"@graph": [
{
"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
"about": {"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/"},
"creator": {"@id": "https://orcid.org/0000-0001-9842-9718"}
},
{
"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/",
"@type": "Dataset",
"hasPart": [
{ "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/index.html"},
{ "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/example/"}
],
"name": "Workflow RO-Crate profile"
},
...
]
}
Notice how identifiers like ro-crate-metadata.json, ./, index.html and example/ have been translated to absolute URIs.
The above JSON-LD processing will also expand any #-based local identifiers of contextual entities:
{
"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json#include-ComputationalWorkflow",
"@type": "Recommendation",
"category": "MUST",
"name": "Include Main Workflow",
"itemReviewed": {
"@id": "https://bioschemas.org/ComputationalWorkflow"
}
}
In this approach, the Detached RO-Crate Package can be resolved to the corresponding Attached RO-Crate Package by following the @id of the Root Data Set or the Root Metadata File entity.
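Where JSON-LD tooling is not at hand, the effect of this rewriting on an already flattened crate can be approximated in plain Python. The following is only a minimal sketch, not the JSON-LD flattening algorithm itself: the helper name `absolutize_ids` is hypothetical, the abbreviated crate is made up for illustration, and the sketch relies on `urllib.parse.urljoin`, which resolves relative references for `http`/`https` base URIs (it does not do so for schemes it does not know, such as arcp).

```python
from urllib.parse import urljoin


def absolutize_ids(node, base):
    """Return a copy of `node` with every relative "@id" resolved against `base`."""
    if isinstance(node, list):
        return [absolutize_ids(item, base) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if key == "@context":
            result[key] = value  # the context is left untouched
        elif key == "@id" and isinstance(value, str) and not value.startswith("_:"):
            # Absolute URIs pass through unchanged; blank node ids are skipped.
            result[key] = urljoin(base, value)
        else:
            result[key] = absolutize_ids(value, base)
    return result


base = "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json"
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "about": {"@id": "./"}},
        {"@id": "./", "hasPart": [{"@id": "index.html"}, {"@id": "example/"}]},
        {"@id": "#include-ComputationalWorkflow", "@type": "Recommendation"},
    ],
}
detached = absolutize_ids(crate, base)
print(detached["@graph"][1]["hasPart"][0]["@id"])
```

The input dictionary is not modified, so the relative form remains available for writing the Attached RO-Crate Package.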
If the new Detached RO-Crate Package is not meant as a snapshot of the corresponding Attached RO-Crate Package, then such contextual entities should be assigned a new @id, e.g. by generating random UUIDs like urn:uuid:e47e41d9-f924-4c07-bc90-97e7ed34fe35. Such transformations are typically not catered for by traditional JSON-LD tooling and require additional implementation.
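As an illustration of such additional implementation, the sketch below mints a fresh `urn:uuid:` identifier for every entity whose `@id` starts with a given prefix and updates references to it. The function name `mint_new_ids` and the sample graph (including its use of `mentions`) are assumptions made for this example.

```python
import uuid


def mint_new_ids(graph, old_prefix):
    """Give every entity whose @id starts with `old_prefix` a random urn:uuid identifier."""
    mapping = {
        entity["@id"]: uuid.uuid4().urn
        for entity in graph
        if entity.get("@id", "").startswith(old_prefix)
    }

    def rewrite(node):
        if isinstance(node, list):
            return [rewrite(item) for item in node]
        if isinstance(node, dict):
            return {
                key: mapping.get(value, value)
                if key == "@id" and isinstance(value, str)
                else rewrite(value)
                for key, value in node.items()
            }
        return node

    return rewrite(graph), mapping


metadata = "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json"
graph = [
    {"@id": metadata + "#include-ComputationalWorkflow", "@type": "Recommendation"},
    {
        "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/",
        "mentions": {"@id": metadata + "#include-ComputationalWorkflow"},
    },
]
new_graph, mapping = mint_new_ids(graph, metadata + "#")
print(new_graph[0]["@id"])
```

The returned mapping records which old identifier became which UUID, which may be useful as provenance.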
Converting from Detached to Attached RO-Crate Package
Converting a Detached Crate to an Attached Crate can mean multiple things depending on intentions, and may imply an elaborate process.
First, check if the Root Data Entity already has a distribution download listed, in which case that can be retrieved as the corresponding Attached Crate.
To archive a snapshot of a Detached Crate’s metadata, keeping all data entities web-based:
- Create a new folder as the RO-Crate Root, save the RO-Crate Metadata Document as the RO-Crate Metadata File according to Attached RO-Crate Package structure
- Copy the absolute @id to become an identifier according to recommendations for Root Data Entity identifier
- Optional: Change the @id of the Root Data Entity to ./ and update all references to it, including from the Metadata Descriptor
If the new Attached Crate is intended as a fork that will evolve independently of the Detached Crate, then:
- Delete the identifier, add the previous @id as isBasedOn
- Delete/update datePublished and publisher
- Add yourself as author or contributor to the Root Data Entity
- Add records of changes to the Crate
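A minimal sketch of the first three fork steps, applied to a Root Data Entity held as a Python dictionary, could look like this. The function name `mark_as_fork` is hypothetical, the second ORCID below is only a placeholder for the person creating the fork, and recording changes to the crate is deliberately left out.

```python
def mark_as_fork(root, previous_id, author_id):
    """Return a copy of the Root Data Entity updated for an independent fork."""
    fork = dict(root)
    fork.pop("identifier", None)              # the old identifier no longer applies
    fork["isBasedOn"] = {"@id": previous_id}  # keep a link to the Detached Crate
    fork.pop("datePublished", None)           # to be set again when the fork is published
    fork.pop("publisher", None)
    authors = fork.get("author", [])
    if not isinstance(authors, list):
        authors = [authors]
    if {"@id": author_id} not in authors:
        authors = authors + [{"@id": author_id}]
    fork["author"] = authors
    return fork


root = {
    "@id": "./",
    "@type": "Dataset",
    "identifier": "http://example.com/crate415/",
    "datePublished": "2024-01-01",
    "publisher": {"@id": "http://example.com/"},
    "author": {"@id": "https://orcid.org/0000-0001-9842-9718"},
}
fork = mark_as_fork(root, "http://example.com/crate415/", "https://orcid.org/0000-0002-1825-0097")
print(fork["isBasedOn"], len(fork["author"]))
```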
To additionally save Web-based Data entities to become part of the Attached Crate, a possible algorithm is:
- For each data entity which @type includes Dataset:
  - If it has a distribution download, retrieve and unpack that according to its encodingFormat, using its new folder name as the new local path name.
  - If not, create a corresponding folder in the RO-Crate Root, possibly generating the local path name based on name or path elements of @id URI
  - Replace the @id of the dataset and all its references with the relative URI based on the path from the RO-Crate Root, encoding file paths as necessary.
  - Recurse this algorithm to process each data entity from this dataset’s hasPart
- For each data entity which @type includes File:
  - Decide based on @id URI elements, contentSize, encodingFormat and (possibly implied) licence if this file is acceptable to archive
  - Retrieve the file and check the contentSize matches, if specified
  - Store the file.
    - If the file has a localPath property, use that relative to the RO-Crate Root.
    - If not, calculate a path from the folder of the first Dataset that directly has this data entity as its hasPart.
  - Add the previous @id downloaded from as contentUrl according to Data entities in an Attached RO-Crate that are also on the Web
  - Replace the @id of the File with the relative URI based on the path from the RO-Crate Root, encoding file paths as necessary.
As this procedure can be error-prone (e.g. a Web-based entity may not be accessible or may require authentication), the implementation should consider the new Attached RO-Crate Package as a fork and update identifier and isBasedOn as specified above.
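The path and identifier bookkeeping for a single File entity in the algorithm above might be sketched like this. Nothing is downloaded here: retrieval, licence and size checks are left to the caller. The function name `plan_local_file`, the `dataset_folder` parameter (the folder of the containing Dataset, relative to the RO-Crate Root), the `unnamed` fallback and the example URL are all assumptions of this sketch.

```python
import posixpath
from urllib.parse import quote, unquote, urlsplit


def plan_local_file(entity, dataset_folder=""):
    """Return (local_path, updated_entity) for a Web-based File entity."""
    url = entity["@id"]
    if "localPath" in entity:
        local_path = entity["localPath"]
    else:
        # Use the last path element of the URL, decoded, as the file name.
        name = posixpath.basename(unquote(urlsplit(url).path))
        if name in ("", ".", ".."):
            name = "unnamed"  # hypothetical fallback for URLs without a file name
        local_path = posixpath.join(dataset_folder, name)
    updated = dict(entity)
    updated["contentUrl"] = url          # remember where the file was downloaded from
    updated["@id"] = quote(local_path)   # relative URI, percent-encoded as necessary
    return local_path, updated


entity = {"@id": "http://example.com/downloads/survey%20responses.csv", "@type": "File"}
local_path, updated = plan_local_file(entity, "subfolder/")
print(local_path)
print(updated["@id"])
```

Keeping the decoded local path separate from the encoded `@id` avoids confusing file system names with URI references.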
Handling relative URI references when using JSON-LD/RDF tools
When using JSON-LD tooling and RDF libraries to consume or generate RO-Crates, extra care should be taken to ensure these URI references are handled correctly.
For this, a couple of scenarios are sketched below with recommendations for consistent handling:
Flattening JSON-LD from nested JSON
If performing JSON-LD flattening to generate a valid RO-Crate Metadata File for an Attached RO-Crate Package, add @base: null to the input JSON-LD @context array to avoid expanding relative URI references. The flattening @context SHOULD NOT need @base: null.
For example, this JSON-LD is in compacted form, which may be beneficial for processing, but it is not yet a valid RO-Crate Metadata File as it has not been flattened into a @graph array.
{
"@context": [
"https://w3id.org/ro/crate/1.2-DRAFT/context"
],
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"description": "RO-Crate Metadata Descriptor (this file)",
"conformsTo": {"@id": "https://w3id.org/ro/crate/1.2-DRAFT"},
"about": {
"@id": "./",
"@type": "Dataset",
"name": "Example RO-Crate",
"description": "The RO-Crate Root Data Entity",
"hasPart": [
{ "@id": "data1.txt",
"@type": "File",
"description": "One of hopefully many Data Entities"
},
{ "@id": "subfolder/",
"@type": "Dataset"
}
]
}
}
Performing JSON-LD flattening with:
{ "@context":
"https://w3id.org/ro/crate/1.2-DRAFT/context"
}
Results in a valid RO-Crate JSON-LD (actual order in @graph may differ):
{
"@context": "https://w3id.org/ro/crate/1.2-DRAFT/context",
"@graph": [
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.2-DRAFT"
},
"about": {
"@id": "./"
},
"description": "RO-Crate Metadata Descriptor (this file)"
},
{
"@id": "./",
"@type": "Dataset",
"description": "The RO-Crate Root Data Entity",
"hasPart": [
{
"@id": "data1.txt"
},
{
"@id": "subfolder/"
}
],
"name": "Example RO-Crate"
},
{
"@id": "data1.txt",
"@type": "File",
"description": "One of hopefully many Data Entities"
},
{
"@id": "subfolder/",
"@type": "Dataset"
}
]
}
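To show what flattening does to the nested structure, here is a deliberately simplified sketch in plain Python. It is not the JSON-LD flattening algorithm: it does not merge repeated properties, label blank nodes, or interpret the `@context`, and the function name `flatten` is only illustrative. Because it never touches `@id` values, relative URI references are preserved.

```python
def flatten(doc):
    """Hoist nested entities (dicts with an @id) into a flat @graph list."""
    graph = {}

    def visit(node):
        if isinstance(node, list):
            return [visit(item) for item in node]
        if not isinstance(node, dict):
            return node
        if "@id" not in node:
            return {key: visit(value) for key, value in node.items()}
        entity = graph.setdefault(node["@id"], {"@id": node["@id"]})
        for key, value in node.items():
            if key not in ("@id", "@context"):
                entity[key] = visit(value)
        return {"@id": node["@id"]}  # leave only a reference behind

    visit(doc)
    # Entities that were only ever referenced (nothing but @id) are not listed.
    entities = [entity for entity in graph.values() if len(entity) > 1]
    return {"@context": doc["@context"], "@graph": entities}


nested = {
    "@context": ["https://w3id.org/ro/crate/1.2-DRAFT/context"],
    "@id": "ro-crate-metadata.json",
    "@type": "CreativeWork",
    "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2-DRAFT"},
    "about": {
        "@id": "./",
        "@type": "Dataset",
        "name": "Example RO-Crate",
        "hasPart": [
            {"@id": "data1.txt", "@type": "File"},
            {"@id": "subfolder/", "@type": "Dataset"},
        ],
    },
}
flat = flatten(nested)
print([entity["@id"] for entity in flat["@graph"]])
```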
The saved RO-Crate JSON-LD SHOULD NOT include {@base: null} in its @context.
Expanding/parsing JSON-LD keeping relative referencing
JSON-LD Expansion can be used to resolve terms from the @context to absolute URIs, e.g. http://schema.org/description. This may be needed to parse extended properties or for combinations with other Linked Data.
This algorithm would normally also expand @id fields based on the current base URI of the RO-Crate Metadata File, but this may be a temporary location like file:///tmp/rocrate54/ro-crate-metadata.json, meaning @id: subfolder/ becomes file:///tmp/rocrate54/subfolder/ after JSON-LD expansion.
To avoid making local identifiers absolute, augment the JSON-LD @context before expanding to ensure it is an array that includes {"@base": null}.
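Augmenting the `@context` in this way is a small transformation that can be done before handing the document to a JSON-LD processor. A possible sketch, where the helper name `with_base` is hypothetical:

```python
import copy
import json


def with_base(doc, base):
    """Return a copy of `doc` whose @context is an array ending with {"@base": base}."""
    doc = copy.deepcopy(doc)
    context = doc.get("@context", [])
    if not isinstance(context, list):
        context = [context]
    # Drop earlier entries that only set @base, so the new value is unambiguous.
    context = [c for c in context if not (isinstance(c, dict) and set(c) == {"@base"})]
    context.append({"@base": base})
    doc["@context"] = context
    return doc


crate = {"@context": "https://w3id.org/ro/crate/1.2-DRAFT/context", "@graph": []}
prepared = with_base(crate, None)  # Python None is serialized as JSON null
print(json.dumps(prepared["@context"]))
```

Passing a URI string instead of `None` sets an explicit base in the same manner.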
For example, expanding this JSON-LD:
{
"@context": [
"https://w3id.org/ro/crate/1.2-DRAFT/context",
{"@base": null}
],
"@graph": [
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.2-DRAFT"
},
"about": {
"@id": "./"
},
"description": "RO-Crate Metadata Descriptor (this file)"
},
{
"@id": "./",
"@type": "Dataset",
"description": "The RO-Crate Root Data Entity",
"hasPart": [
{
"@id": "data1.txt"
},
{
"@id": "subfolder/"
}
],
"name": "Example RO-Crate"
}
]
}
This results in an expanded form without @context, using absolute URIs for properties and types, while retaining relative URI references for entities within the RO-Crate Root:
[
{
"@id": "ro-crate-metadata.json",
"@type": [
"http://schema.org/CreativeWork"
],
"http://schema.org/about": [
{
"@id": "./"
}
],
"http://purl.org/dc/terms/conformsTo": [
{
"@id": "https://w3id.org/ro/crate/1.2-DRAFT"
}
],
"http://schema.org/description": [
{
"@value": "RO-Crate Metadata Descriptor (this file)"
}
]
},
{
"@id": "./",
"@type": [
"http://schema.org/Dataset"
],
"http://schema.org/description": [
{
"@value": "The RO-Crate Root Data Entity"
}
],
"http://schema.org/hasPart": [
{
"@id": "data1.txt"
},
{
"@id": "subfolder/"
}
],
"http://schema.org/name": [
{
"@value": "Example RO-Crate"
}
]
}
]
@base: null will not relativize existing absolute URIs that happen to be contained by the RO-Crate Root (see section Relativizing absolute URIs within RO-Crate Root).
... @base as detailed in sections below.
Establishing absolute URI for RO-Crate Root
When loading RO-Crate JSON-LD as RDF, or combining the crate’s Linked Data into a larger JSON-LD, it is important to ensure the correct base URI is used to resolve URI references that are relative to the RO-Crate Root.
For instance, consider this HTTP redirection from a permalink (simplified):
GET https://w3id.org/ro/crate/1.0/crate HTTP/1.1
HTTP/1.1 301 Moved Permanently
Location: https://www.researchobject.org/ro-crate/1.0/ro-crate-metadata.jsonld
GET https://www.researchobject.org/ro-crate/1.0/ro-crate-metadata.jsonld HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/ld+json
{
"@context": "https://w3id.org/ro/crate/1.0/context",
"@graph": [
{
"@id": "ro-crate-metadata.jsonld",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.0"
},
"about": {
"@id": "./"
},
"license": {
"@id": "https://creativecommons.org/publicdomain/zero/1.0/"
}
},
{
"@id": "./",
"@type": "Dataset",
"hasPart": [
{
"@id": "index.html"
}
]
}
]
}
Following redirection we see that:
- Base URI of the RO-Crate Metadata File becomes https://www.researchobject.org/ro-crate/1.0/ro-crate-metadata.jsonld
- The absolute URI for index.html resolves to https://www.researchobject.org/ro-crate/1.0/index.html
  - rather than https://w3id.org/ro/crate/1.0/index.html, which would not redirect correctly
This example also uses RO-Crate 1.0, where the RO-Crate Metadata File is called ro-crate-metadata.jsonld instead of ro-crate-metadata.json. Note that the recommended algorithm to find the Root Data Entity is agnostic to the actual filename.
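In Python, the base URI after redirection can be taken from the final URL reported by the HTTP client. The sketch below uses only the standard library; the function names `fetch_with_base` and `resolve` are illustrative, and the network call is defined but not executed here.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_with_base(url):
    """Fetch `url`, following redirects, and return (final_url, body)."""
    with urlopen(url) as response:
        # geturl() reports the URL actually retrieved, after any redirects.
        return response.geturl(), response.read()


def resolve(base_uri, reference):
    """Resolve a relative URI reference from the crate against the base URI."""
    return urljoin(base_uri, reference)


# The final URL from the redirection example above:
final_url = "https://www.researchobject.org/ro-crate/1.0/ro-crate-metadata.jsonld"
print(resolve(final_url, "index.html"))
print(resolve(final_url, "./"))
# Resolving against the permalink instead gives the URI that would not redirect correctly:
print(resolve("https://w3id.org/ro/crate/1.0/crate", "index.html"))
```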
Finding RO-Crate Root in RDF triple stores
When parsing RO-Crate JSON-LD as RDF, where the RDF framework performs resolution to absolute URIs, it may be difficult to find the RO-Crate Root in the parsed triples.
The algorithm proposed in section Root Data Entity allows finding the RDF resource describing ro-crate-metadata.json, independent of its parsed base URI. We can adapt this for RDF triples, so that crates conforming to this specification can be found with a SPARQL query:
PREFIX schema: <http://schema.org/>
SELECT ?crate ?metadatafile
WHERE {
?crate a schema:Dataset .
?metadatafile schema:about ?crate .
filter(contains(str(?metadatafile), "ro-crate-metadata.json"))
}
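For readers without a SPARQL endpoint at hand, the same pattern can be expressed over triples held as plain Python tuples. This sketch mirrors the query above; the function name `find_crates` and the sample triples are assumptions for illustration.

```python
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def find_crates(triples):
    """Return (crate, metadatafile) pairs, mirroring the SPARQL query above."""
    datasets = {s for s, p, o in triples if p == RDF_TYPE and o == SCHEMA + "Dataset"}
    return [
        (o, s)
        for s, p, o in triples
        if p == SCHEMA + "about" and o in datasets and "ro-crate-metadata.json" in s
    ]


triples = [
    ("http://example.com/crate255/", RDF_TYPE, SCHEMA + "Dataset"),
    ("http://example.com/crate255/subfolder/", RDF_TYPE, SCHEMA + "Dataset"),
    ("http://example.com/crate255/ro-crate-metadata.json", SCHEMA + "about",
     "http://example.com/crate255/"),
    ("http://example.com/crate255/", SCHEMA + "hasPart",
     "http://example.com/crate255/subfolder/"),
]
print(find_crates(triples))
```

As with the `contains` filter in the query, the substring test also matches the older file name ro-crate-metadata.jsonld.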
Parsing as RDF with a different RO-Crate Root
When parsing an RO-Crate Metadata File into RDF triples, for instance uploading it to a graph store like Apache Jena’s Fuseki, it is important to ensure a consistent base URI:
- Some RDF stores and RDF formats don’t support relative URI references in triples (see RDF 1.1 note on IRIs)
- The RO-Crate Root may depend on where the RO-Crate Metadata File was parsed from, e.g. <file:///tmp/ro-crate-metadata.json> (file) or <http://localhost:3030/test/ro-crate-metadata.json> (web upload)
- Parsing multiple RO-Crates into the same RDF graph, using the same base URI, may merge them into the same RO-Crate
- ro-crate-metadata.json may not be recognized as JSON-LD and must be renamed to ro-crate-metadata.jsonld
- Web servers hosting ro-crate-metadata.json may not send the JSON-LD Content-Type
- If the base URI is not correct it may be difficult to find the corresponding file and directory paths from an RDF query returning absolute URIs
When an RDF tool parses directly from the http/https URI of the RO-Crate Metadata File, it should calculate the correct base URI as detailed in section Establishing absolute URI for RO-Crate Root, and you should not need to override the base URI as detailed here.
If a web-based URI for the RO-Crate Root is known, then this can be supplied as a base URI. Most RDF tools support a --base option or similar. If this is not possible, then the @context of the RO-Crate JSON-LD can be modified by ensuring the @context is an array that sets the desired @base:
{
"@context": [
"https://w3id.org/ro/crate/1.2-DRAFT/context",
{"@base": "http://example.com/crate255/"}
],
"@graph": [
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.2-DRAFT"
},
"about": {
"@id": "./"
}
},
{
"@id": "./",
"@type": "Dataset",
"name": "Example RO-Crate"
},
{
"@id": "data1.txt",
"@type": "File",
"description": "One of hopefully many Data Entities"
},
{
"@id": "subfolder/",
"@type": "Dataset"
}
]
}
Parsing this will generate triples like below using http://example.com/crate255/ as the RO-Crate Root (shortened):
<http://example.com/crate255/ro-crate-metadata.json>
<http://purl.org/dc/terms/conformsTo>
<https://w3id.org/ro/crate/1.2-DRAFT> .
<http://example.com/crate255/ro-crate-metadata.json>
<http://schema.org/about>
<http://example.com/crate255/> .
<http://example.com/crate255/>
<http://schema.org/name>
"Example RO-Crate" .
<http://example.com/crate255/>
<http://schema.org/hasPart>
<http://example.com/crate255/data1.txt> .
<http://example.com/crate255/>
<http://schema.org/hasPart>
<http://example.com/crate255/subfolder/> .
<http://example.com/crate255/data1.txt>
<http://schema.org/description>
"One of hopefully many Data Entities" .
Generating RO-Crate JSON-LD from such triples can be done by first finding the RO-Crate Root and then using it as the base URI to relativize absolute URIs within RO-Crate Root.
Establishing a base URI inside a ZIP file
An RO-Crate may have been packaged as a ZIP file or similar archive. Such RO-Crates may exist at a temporary file path, which should not determine their identifiers.
When parsing such crates it is recommended to use the Archive and Package (arcp) URI scheme to establish a temporary/location-based UUID or hash-based (SHA256) base URI.
For instance, given a randomly generated UUID b7749d0b-0e47-5fc4-999d-f154abe68065 we can use arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/ as the @base:
{
"@context": [
"https://w3id.org/ro/crate/1.2-DRAFT/context",
{"@base": "arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/"}
],
"@graph": [
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.2-DRAFT"
},
"about": {
"@id": "./"
}
},
{
"@id": "./",
"@type": "Dataset",
"description": "The RO-Crate Root Data Entity",
"hasPart": [
{
"@id": "data1.txt"
},
{
"@id": "subfolder/"
}
],
"name": "Example RO-Crate"
},
{
"@id": "data1.txt",
"@type": "File",
"description": "One of hopefully many Data Entities"
},
{
"@id": "subfolder/",
"@type": "Dataset"
}
]
}
Parsing this as RDF will generate triples including:
<arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/ro-crate-metadata.json>
<http://schema.org/about>
<arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/> .
<arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/>
<http://schema.org/hasPart>
<arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/data1.txt> .
Here consumers can assume / is the RO-Crate Root, and generating relative URIs can safely be achieved by search-replace as the arcp URI is unique. Saving RO-Crate JSON-LD from the triples can be done by using the arcp URI to relativize absolute URIs within RO-Crate Root.
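The arcp base URIs mentioned in this section can be generated with the Python standard library. This is a sketch under assumptions: the function names are hypothetical, the location-based variant assumes a name-based (version 5) UUID in the URL namespace, and the hash-based variant assumes the `ni,sha-256;` authority form with an unpadded base64url digest, following the arcp draft as understood here. The archive location in the example is a placeholder.

```python
import base64
import hashlib
import uuid


def arcp_random():
    """Temporary base URI from a random (version 4) UUID."""
    return f"arcp://uuid,{uuid.uuid4()}/"


def arcp_location(url):
    """Location-based base URI: a name-based UUID derived from where the archive was found."""
    return f"arcp://uuid,{uuid.uuid5(uuid.NAMESPACE_URL, url)}/"


def arcp_hash(data):
    """Hash-based base URI from the SHA-256 digest of the archive bytes."""
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest())
    return f"arcp://ni,sha-256;{digest.rstrip(b'=').decode('ascii')}/"


location_base = arcp_location("http://example.com/crate.zip")  # placeholder location
print(arcp_random())
print(location_base)
print(arcp_hash(b"bytes of the ZIP file would go here"))
```

The location-based and hash-based forms are reproducible for the same input, while the random form yields a different base URI on every call. Note that `urllib.parse.urljoin` does not resolve relative references against arcp URIs, as it only does so for schemes it knows.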
... /data/ under the calculated arcp base URI.
Relativizing absolute URIs within RO-Crate Root
Some applications may prefer working with absolute URIs, e.g. in a joint graph store or web-based repository, but should relativize URIs within the RO-Crate Root before generating the RO-Crate Metadata File.
Assuming a repository at example.com has JSON-LD with absolute URIs:
{
"@context": "https://w3id.org/ro/crate/1.2-DRAFT/context",
"@graph": [
{
"@id": "http://example.com/crate415/ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.2-DRAFT"
},
"about": {
"@id": "http://example.com/crate415/"
}
},
{
"@id": "http://example.com/crate415/",
"@type": "Dataset",
"description": "The RO-Crate Root Data Entity",
"hasPart": [
{
"@id": "http://example.com/crate415/data1.txt"
},
{
"@id": "http://example.com/crate415/subfolder/"
}
],
"name": "Example RO-Crate"
}
]
}
Then performing JSON-LD flattening with this @context:
{ "@context": [
{"@base": "http://example.com/crate415/"},
"https://w3id.org/ro/crate/1.2-DRAFT/context"
]
}
Will output RO-Crate JSON-LD with relative URIs:
{
"@context": [
{
"@base": "http://example.com/crate415/"
},
"https://w3id.org/ro/crate/1.2-DRAFT/context"
],
"@graph": [
{
"@id": "./",
"@type": "Dataset",
"description": "The RO-Crate Root Data Entity",
"hasPart": [
{
"@id": "data1.txt"
},
{
"@id": "subfolder/"
}
],
"name": "Example RO-Crate"
},
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.2-DRAFT"
},
"about": {
"@id": "./"
}
}
]
}
Note that this method would also relativize URIs outside the RO-Crate Root, e.g. http://example.com/crate255/other.txt would become ../crate255/other.txt; this can particularly be a challenge with local file:/// URIs.
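Where relativizing anything outside the RO-Crate Root is undesirable, a plain prefix check is a possible alternative to JSON-LD flattening. The sketch below only rewrites an `@id` that starts with the root URI and leaves every other URI absolute; the function name `relativize_ids` is hypothetical, and the root URI is assumed to end with a slash.

```python
def relativize_ids(node, root):
    """Strip the `root` prefix from each @id inside the RO-Crate Root; keep others absolute."""
    if not root.endswith("/"):
        raise ValueError("the RO-Crate Root URI is expected to end with '/'")
    if isinstance(node, list):
        return [relativize_ids(item, root) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if key == "@context":
            result[key] = value  # the context is left untouched
        elif key == "@id" and isinstance(value, str) and value.startswith(root):
            result[key] = value[len(root):] or "./"  # the root itself becomes "./"
        else:
            result[key] = relativize_ids(value, root)
    return result


root = "http://example.com/crate415/"
graph = [
    {"@id": root + "ro-crate-metadata.json", "about": {"@id": root}},
    {"@id": root, "hasPart": [
        {"@id": root + "data1.txt"},
        {"@id": root + "subfolder/"},
        {"@id": "http://example.com/crate255/other.txt"},
    ]},
]
relative = relativize_ids(graph, root)
print([part["@id"] for part in relative[1]["hasPart"]])
```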