Skip to content Skip to footer

APPENDIX: Handling relative URI references

Table of contents

  1. Converting from Attached to Detached RO-Crate Package
  2. Converting from Detached to Attached RO-Crate Package
  3. Handling relative URI references when using JSON-LD/RDF tools
  4. Flattening JSON-LD from nested JSON
  5. Expanding/parsing JSON-LD keeping relative referencing
  6. Establishing absolute URI for RO-Crate Root
  7. Finding RO-Crate Root in RDF triple stores
  8. Parsing as RDF with a different RO-Crate Root
  9. Establishing a base URI inside a ZIP file
  10. Relativizing absolute URIs within RO-Crate Root

In an Attached RO-Crate Package, the RO-Crate Metadata File use relative URI references to identify files and directories contained within the RO-Crate Root and its children. As described in section Describing entities in JSON-LD, relative URI references are also frequently used for identifying Contextual entities.

Converting from Attached to Detached RO-Crate Package

An Attached RO-Crate Package can be published on the Web by placing its RO-Crate Root directory on a static file-based Web server (e.g. Nginx, Apache HTTPd, GitHub Pages). The use of relative URI references in the RO-Crate Metadata File ensures identifiers of data entities work as they should.

Sometimes it is desired to make a Detached RO-Crate Package, e.g. for depositing or integrating the RO-Crate Metadata File into a knowledge graph or repository that is unable to preserve data files using their existing pathnames. In this case one needs to:

  1. Decide on new Web locations for individual data files and update their absolute URI in @id
  2. Observe the preservation considerations for Web-based Data Entities
  3. Ensure all nested directories not browsable on the Web are represented as Dataset with its content listed with hasPart or distribution (see section Directories on the web). Change their relative @id to become absolute, e.g. using ARCP.
  4. Rewrite the JSON-LD with absolute URIs for data entities

If the RO-Crate is already published on the Web, with directory browsing enabled for nested directories, then these steps can be achieved using JSON-LD tooling.

For example, as the RO-Crate Metadata file https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json along with the RO-Crate Root is published on the Web (using GitHub Pages), we can generate a random UUID (e.g. d6be5c9b-132a-4a93-9837-3e02e06c08e6) and use JSON-LD flattening from this context:

{ "@context": [
   {"@base": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json"},
   "https://w3id.org/ro/crate/1.1/context"
  ]
}

to this context:

{ "@context": [
   {"@base": "arcp://uuid,d6be5c9b-132a-4a93-9837-3e02e06c08e6/"},
   "https://w3id.org/ro/crate/1.1/context"
  ]  
}

None of the existing resources will have a @id starting with this fresh base URI, therefore all URIs will be made absolute. The resulting {@base: ..} is harmless, but can be removed from the output JSON-LD.

Example output (abbreviated):

{
  "@context": [
    {"@base": "arcp://uuid,d6be5c9b-132a-4a93-9837-3e02e06c08e6/"},
    "https://w3id.org/ro/crate/1.1/context"
  ],
  "@graph": [
    {
      "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/"},
      "creator": {"@id": "https://orcid.org/0000-0001-9842-9718"}
    },
    {
      "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/",
      "@type": "Dataset",
      "hasPart": [
        { "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/index.html"},
        { "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/example/"},
      ],
      "name": "Workflow RO-Crate profile"
    },
    ...
  ]
}

Notice how identifiers like ro-crate-metadata.json, ./, index.html and example/ have been translated to absolute URIs.

The above JSON-LD processing will also expand any #-based local identifiers of contextual entities:

  {
      "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json#include-ComputationalWorkflow",
      "@type": "Recommendation",
      "category": "MUST",
      "name": "Include Main Workflow",
      "itemReviewed": {
        "@id": "https://bioschemas.org/ComputationalWorkflow"
      }
    }

In this approach, the Detached RO-Crate Package can be resolved to the corresponding Attached RO-Crate Package by following the @id of the Root Data Set or the Root Metadata File entity.

If the new Detached RO-Crate Package is not meant as a snapshot of the corresponding Attached RO-Crate Package, then such contextual entities should be assigned new @id, e.g. by generating random UUIDs like urn:uuid:e47e41d9-f924-4c07-bc90-97e7ed34fe35. Such tranformations are typically not catered for by traditional JSON-LD tooling and require additional implementation.

Converting from Detached to Attached RO-Crate Package

Converting a Detached Crate to an Attached Crate can mean multiple things depending on intentions, and may imply an elaborate process.

First, check if the Root Data Entity already have a distribution download listed, in which case that can be retrieved as the corresponding Attached Crate.

To archive a snapshot of an Detached Crate’s metadata, keeping all data entities web-based:

  • Crate a new folder as the RO-Crate Root, save the RO-Crate Metadata Document as the RO-Crate Metadata File according to Attached RO-Crate Package structure
  • Copy the absolute @id to become an identifier according to recommendations for Root Data Entity identifier
  • Optional: Change the @id of the Root Data Entity to ./ and update all references to it, including from the Metadata Descriptor

If the new Attached Crate is intended as a fork that will evolve independently of the Detached Crate, then:

  • Delete the identifier, add the previous @id as isBasedOn
  • Delete/update datePublished and publisher
  • Add yourself as author or contributor to the Root Data Entity
  • Add records of changes to the Crate

To additionally save Web-based Data entities to become part of the Detached Crate, a possible algorithm is:

  • For each data entity which @type include Dataset:
    • If it has a distribution download, retrieve and unpack that according to its encodingFormat, using its new folder name as the new local path name.
    • If not, create a corresponding folder in the RO-Crate Root, possibly generating the local path name based on name or path elements of @id URI
    • Replace the @id of the dataset and all its references with the relative URI based on the path from the RO-Crate Root, encoding file paths as necessary.
    • Recurse this algorithm to process each data entity from this dataset’s hasPart
  • For each data entity which @type include File:
    • Decide based on @id URI elements, contentSize encodingFormat and (possibly implied) licence if this file is acceptable to archive
    • Retrieve the file and check the contentSize matches, if specified
    • Store the file.
      • If the file has a localPath property, use that relative to the RO-Crate Root.
      • If not, calculate a path from the folder of the first Dataset that directly has this data entity as its hasPart.
    • Add the previous @id downloaded from as contentUrl according to Data entities in an Attached RO-Crate that are also on the Web
    • Replace the @id of the File with the relative URI based on the path from the RO-Crate Root, encoding file paths as necessary.

As this procedure can be error-prone (e.g. a Web-based entity may not be accessible or may require authentication), the implementation should consider the new Attached RO-Crate Pacakge as a fork and update identifier and isDefinedBy as specified above.

Handling relative URI references when using JSON-LD/RDF tools

When using JSON-LD tooling and RDF libraries to consume or generate RO-Crates, extra care should be taken to ensure these URI references are handled correctly.

For this, a couple of scenarios are sketched below with recommendations for consistent handling:

Flattening JSON-LD from nested JSON

If performing JSON-LD flattening to generate a valid RO-Crate Metadata File for a Attached RO-Crate Package, add @base: null to the input JSON-LD @context array to avoid expanding relative URI references. The flattening @context SHOULD NOT need @base: null.

Example, this JSON-LD is in compacted form which may be beneficial for processing, but is not yet valid RO-Crate Metadata File as it has not been flattened into a @graph array.

{ 
  "@context": [
      "https://w3id.org/ro/crate/1.2-DRAFT/context"
  ],
  "@id": "ro-crate-metadata.json",
  "@type": "CreativeWork",
  "description": "RO-Crate Metadata Descriptor (this file)",
  "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2-DRAFT"},
  "about": {
    "@id": "./",
    "@type": "Dataset",
    "name": "Example RO-Crate",
    "description": "The RO-Crate Root Data Entity",
    "hasPart": [
      { "@id": "data1.txt",
        "@type": "File",
        "description": "One of hopefully many Data Entities",
      },
      { "@id": "subfolder/",
        "@type": "Dataset"
      }
    ]
  }
}

Performing JSON-LD flattening with:

{ "@context": 
     "https://w3id.org/ro/crate/1.2-DRAFT/context"
}

Results in a valid RO-Crate JSON-LD (actual order in @graph may differ):

{
  "@context": "https://w3id.org/ro/crate/1.2-DRAFT/context",
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.2-DRAFT"
      },
      "about": {
        "@id": "./"
      },
      "description": "RO-Crate Metadata Descriptor (this file)"
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "description": "The RO-Crate Root Data Entity",
      "hasPart": [
        {
          "@id": "data1.txt"
        },
        {
          "@id": "subfolder/"
        }
      ],
      "name": "Example RO-Crate"
    },
    {
      "@id": "data1.txt",
      "@type": "File",
      "description": "One of hopefully many Data Entities"
    },
    {
      "@id": "subfolder/",
      "@type": "Dataset"
    }
  ]
}

Expanding/parsing JSON-LD keeping relative referencing

JSON-LD Expansion can be used to resolve terms from the @context to absolute URIs, e.g. http://schema.org/description. This may be needed to parse extended properties or for combinations with other Linked Data.

This algorithm would normally also expand @id fields based on the current base URI of the RO-Crate Metadata File, but this may be a temporary location like file:///tmp/rocrate54/ro-crate-metadata.json, meaning @id: subfolder/ becomes file:///tmp/rocrate54/subfolder/ after JSON-LD expansion.

To avoid absoluting local identifiers, before expanding, augment the JSON-LD @context to ensure it is an array that includes {"@base": null}.

For example, expanding this JSON-LD:

{
  "@context": [
    "https://w3id.org/ro/crate/1.2-DRAFT/context",
    {"@base": null}
  ],
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.2-DRAFT"
      },
      "about": {
        "@id": "./"
      },
      "description": "RO-Crate Metadata Descriptor (this file)"
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "description": "The RO-Crate Root Data Entity",
      "hasPart": [
        {
          "@id": "data1.txt"
        },
        {
          "@id": "subfolder/"
        }
      ],
      "name": "Example RO-Crate"
    }
  ]
}

Results in a expanded form without @context, using absolute URIs for properties and types, but retains relative URI references for entities within the RO-Crate Root:

[
  {
    "@id": "ro-crate-metadata.json",
    "@type": [
      "http://schema.org/CreativeWork"
    ],
    "http://schema.org/about": [
      {
        "@id": "./"
      }
    ],
    "http://purl.org/dc/terms/conformsTo": [
      {
        "@id": "https://w3id.org/ro/crate/1.2-DRAFT"
      }
    ],
    "http://schema.org/description": [
      {
        "@value": "RO-Crate Metadata Descriptor (this file)"
      }
    ]
  },
  {
    "@id": "./",
    "@type": [
      "http://schema.org/Dataset"
    ],
    "http://schema.org/description": [
      {
        "@value": "The RO-Crate Root Data Entity"
      }
    ],
    "http://schema.org/hasPart": [
      {
        "@id": "data1.txt"
      },
      {
        "@id": "subfolder/"
      }
    ],
    "http://schema.org/name": [
      {
        "@value": "Example RO-Crate"
      }
    ]
  }
]

Establishing absolute URI for RO-Crate Root

When loading RO-Crate JSON-LD as RDF, or combining the crate’s Linked Data into a larger JSON-LD, it is important to ensure correct base URI to resolve URI references that are relative to the RO-Crate Root.

For instance, consider this HTTP redirection from a permalink (simplified):

GET https://w3id.org/ro/crate/1.0/crate HTTP/1.1

HTTP/1.1 301 Moved Permanently
Location: https://www.researchobject.org/ro-crate/1.0/ro-crate-metadata.jsonld
GET https://www.researchobject.org/ro-crate/1.0/ro-crate-metadata.jsonld HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/ld+json

{
    "@context": "https://w3id.org/ro/crate/1.0/context",
    "@graph": [
      {
        "@id": "ro-crate-metadata.jsonld",
        "@type": "CreativeWork",
        "conformsTo": {
          "@id": "https://w3id.org/ro/crate/1.0"
        },
        "about": {
          "@id": "./"
        },
        "license": {
          "@id": "https://creativecommons.org/publicdomain/zero/1.0/"
        }
      },
      {
        "@id": "./",
        "@type": "Dataset",
        "hasPart": [
          {
            "@id": "index.html"
          }
      }
    ]
}

Following redirection we see that:

  • Base URI of the RO-Crate Metadata File becomes https://www.researchobject.org/ro-crate/1.0/ro-crate-metadata.jsonld
  • The absolute URI for index.html resolves to https://www.researchobject.org/ro-crate/1.0/index.html
    • ..rather than https://w3id.org/ro/crate/1.0/index.html which would not redirect correctly

This example also use RO-Crate 1.0, where the RO-Crate Metadata File is called ro-crate-metadata.jsonld instead of ro-crate-metadata.json. Note that the recommended algorithm to find the Root Data Entity is agnostic to the actual filename.

Finding RO-Crate Root in RDF triple stores

When parsing RO-Crate JSON-LD as RDF, where the RDF framework performs resolution to absolute URIs, it may be difficult to find the RO-Crate Root in the parsed triples.

The algorithm proposed in section Root Data Entity allows finding the RDF resource describing ro-crate-metadata.json, independent of its parsed base URI. We can adopt this for RDF triples, thus finding crates conforming to this specification can be queried with SPARQL:

PREFIX schema:  <http://schema.org/>

SELECT ?crate ?metadatafile
WHERE {
  ?crate        a                  schema:Dataset .
  ?metadatafile schema:about       ?crate .
  filter(contains(str(?metadatafile), "ro-crate-metadata.json"))
}

Parsing as RDF with a different RO-Crate Root

When parsing a RO-Crate Metadata File into RDF triples, for instance uploading it to a graph store like Apache Jena’s Fuseki, it is important to ensure consistent base URI:

  • Some RDF stores and RDF formats don’t support relative URI references in triples (see RDF 1.1 note on IRIs)
  • The RO-Crate Root may depend on where the RO-Crate Metadata File was parsed from, e.g. <file:///tmp/ro-crate-metadata.json> (file) or <http://localhost:3030/test/ro-crate-metadata.json> (web upload)
  • Parsing multiple RO-Crates into the same RDF graph, using same base URI, may merge them into the same RO-Crate
  • ro-crate-metadata.json may not be recognized as JSON-LD and must be renamed to ro-crate-metadata.jsonld
  • Web servers hosting ro-crate-metadata.json may not send the JSON-LD Content-Type
  • If base URI is not correct it may be difficult to find the corresponding file and directory paths from an RDF query returning absolute URIs

If a web-based URI for the RO-Crate root is known, then this can be supplied as a base URI. Most RDF tools support a --base option or similar. If this is not possible, then the @context of the RO-Crate JSON-LD can be modified by ensuring the @context is an array that sets the desired @base:

{
  "@context": [
    "https://w3id.org/ro/crate/1.2-DRAFT/context",
    {"@base": "http://example.com/crate255/"}
  ],
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.2-DRAFT"
      },
      "about": {
        "@id": "./"
      }
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Example RO-Crate"
    },
    {
      "@id": "data1.txt",
      "@type": "File",
      "description": "One of hopefully many Data Entities"
    },
    {
      "@id": "subfolder/",
      "@type": "Dataset"
    }
  ]
}

Parsing this will generate triples like below using http://example.com/crate255/ as the RO-Crate Root (shortened):

<http://example.com/crate255/ro-crate-metadata.json> 
  <http://purl.org/dc/terms/conformsTo> 
  <https://w3id.org/ro/crate/1.2-DRAFT> .

<http://example.com/crate255/ro-crate-metadata.json> 
  <http://schema.org/about> 
  <http://example.com/crate255/> .

<http://example.com/crate255/> 
  <http://schema.org/name> 
  "Example RO-Crate" .

<http://example.com/crate255/>
  <http://schema.org/hasPart>
  <http://example.com/crate255/data1.txt> .

<http://example.com/crate255/>
  <http://schema.org/hasPart>
  <http://example.com/crate255/subfolder/> .

<http://example.com/crate255/data1.txt>
 <http://schema.org/description> 
 "One of hopefully many Data Entities" .

Generating a RO-Crate JSON-LD from such triples can be done by first finding the RO-Crate Root and then use it as base URI to relativize absolute URIs within RO-Crate Root.

Establishing a base URI inside a ZIP file

An RO-Crate may have been packaged as a ZIP file or similar archive. RO-Crates may exist in a temporary file path which should not determine its identifiers.

When parsing such crates it is recommended to use the Archive and Package (arcp) URI scheme to establish a temporary/location-based UUID or hash-based (SHA256) base URI.

For instance, given a randomly generated UUID b7749d0b-0e47-5fc4-999d-f154abe68065 we can use arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/ as the @base:

{
  "@context": [
    "https://w3id.org/ro/crate/1.2-DRAFT/context",
    {"@base": "arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/"}
  ],
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.2-DRAFT"
      },
      "about": {
        "@id": "./"
      }
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "description": "The RO-Crate Root Data Entity",
      "hasPart": [
        {
          "@id": "data1.txt"
        },
        {
          "@id": "subfolder/"
        }
      ],
      "name": "Example RO-Crate"
    },
    {
      "@id": "data1.txt",
      "@type": "File",
      "description": "One of hopefully many Data Entities"
    },
    {
      "@id": "subfolder/",
      "@type": "Dataset"
    }
  ]
}

Parsing this as RDF will generate triples including:

<arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/ro-crate-metadata.json> 
  <http://schema.org/about> 
  <arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/> .

<arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/> 
  <http://schema.org/hasPart> 
  <arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/data1.txt> .

Here consumers can assume / is the RO-Crate Root and generating relative URIs can safely be achieved by search-replace as the arcp URI is unique. Saving RO-Crate JSON-LD from the triples can be done by using the arcp URI to relativize absolute URIs within RO-Crate Root.

Relativizing absolute URIs within RO-Crate Root

Some applications may prefer working with absolute URIs, e.g. in a joint graph store or web-based repository, but should relativize URIs within the RO-Crate Root before generating the RO-Crate Metadata File.

Assuming a repository at example.com has JSON-LD with absolute URIs:

{
  "@context": "https://w3id.org/ro/crate/1.2-DRAFT",
  "@graph": [
    {
      "@id": "http://example.com/crate415/ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.2-DRAFT"
      },
      "about": {
        "@id": "http://example.com/crate415/"
      },
    },
    {
      "@id": "http://example.com/crate415/",
      "@type": "Dataset",
      "description": "The RO-Crate Root Data Entity",
      "hasPart": [
        {
          "@id": "http://example.com/crate415/data1.txt"
        },
        {
          "@id": "http://example.com/crate415/subfolder/"
        }
      ],
      "name": "Example RO-Crate"
    }
  ]
}

Then performing JSON-LD flattening with this @context:

{ "@context": [
    {"@base": "http://example.com/crate415/"},
     "https://w3id.org/ro/crate/1.2-DRAFT"
  ]
}

Will output RO-Crate JSON-LD with relative URIs:

{
  "@context": [
    {
      "@base": "http://example.com/crate415/"
    },
    "https://w3id.org/ro/crate/1.2-DRAFT"
  ],
  "@graph": [
    {
      "@id": "./",
      "@type": "Dataset",
      "description": "The RO-Crate Root Data Entity",
      "hasPart": [
        {
          "@id": "data1.txt"
        },
        {
          "@id": "subfolder/"
        }
      ],
      "name": "Example RO-Crate"
    },
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.2-DRAFT"
      },
      "about": {
        "@id": "./"
      }
    }
  ]
}