Skip to main content Link Menu Expand (external link) Document Search Copy Copied

APPENDIX: Handling relative URI references

Table of contents

  1. Converting from Attached to Detached RO-Crate
  2. Converting from Detached to Attached RO-Crate
  3. Handling relative URI references when using JSON-LD/RDF tools
  4. Flattening JSON-LD from nested JSON
  5. Expanding/parsing JSON-LD keeping relative referencing
  6. Establishing absolute URI for RO-Crate Root
  7. Finding RO-Crate Root in RDF triple stores
  8. Parsing as RDF with a different RO-Crate Root
  9. Establishing a base URI inside a ZIP file
  10. Relativizing absolute URIs within RO-Crate Root

In an Attached RO-Crate, the RO-Crate Metadata File use relative URI references to identify files and directories contained within the RO-Crate Root and its children. As described in section Describing entities in JSON-LD, relative URI references are also frequently used for identifying Contextual entities.

Converting from Attached to Detached RO-Crate

An Attached RO-Crate can be published on the Web by placing its RO-Crate Root directory on a static file-based Web server (e.g. Nginx, Apache HTTPd, GitHub Pages). The use of relative URI references in the RO-Crate Metadata File ensures identifiers of data entities work as they should.

Sometimes it is desired to make a Detached RO-Crate, e.g. for depositing or integrating the RO-Crate Metadata File into a knowledge graph or repository that is unable to preserve data files using their existing pathnames. In this case one needs to:

  1. Decide on new Web locations for individual data files and update their absolute URI in @id
  2. Observe the preservation considerations for Web-based Data Entities
  3. Ensure all nested directories not browsable on the Web are represented as Dataset with its content listed with hasPart or distribution (see section Directories on the web). Change their relative @id to become absolute, e.g. using ARCP.
  4. Rewrite the JSON-LD with absolute URIs for data entities

If the RO-Crate is already published on the Web, with directory browsing enabled for nested directories, then these steps can be achieved using JSON-LD tooling.

For example, as the RO-Crate Metadata file https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json along with the RO-Crate Root is published on the Web (using GitHub Pages), we can generate a random UUID (e.g. d6be5c9b-132a-4a93-9837-3e02e06c08e6) and use JSON-LD flattening from this context:

{ "@context": [
   {"@base": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json"},
   "https://w3id.org/ro/crate/1.1/context"
  ]
}

to this context:

{ "@context": [
   {"@base": "arcp://uuid,d6be5c9b-132a-4a93-9837-3e02e06c08e6/"},
   "https://w3id.org/ro/crate/1.1/context"
  ]  
}

None of the existing resources will have a @id starting with this fresh base URI, therefore all URIs will be made absolute. The resulting {@base: ..} is harmless, but can be removed from the output JSON-LD.

Example output (abbreviated):

{
  "@context": [
    {"@base": "arcp://uuid,d6be5c9b-132a-4a93-9837-3e02e06c08e6/"},
    "https://w3id.org/ro/crate/1.1/context"
  ],
  "@graph": [
    {
      "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/"},
      "creator": {"@id": "https://orcid.org/0000-0001-9842-9718"}
    },
    {
      "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/",
      "@type": "Dataset",
      "hasPart": [
        { "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/index.html"},
        { "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/example/"},
      ],
      "name": "Workflow RO-Crate profile"
    }

Notice how identifiers like ro-crate-metadata.json, ./, index.html and example/ have been translated to absolute URIs.

The above JSON-LD processing will also expand any #-based local identifiers of contextual entities:

  {
      "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json#include-ComputationalWorkflow",
      "@type": "Recommendation",
      "category": "MUST",
      "name": "Include Main Workflow",
      "itemReviewed": {
        "@id": "https://bioschemas.org/ComputationalWorkflow"
      }
    }

In this approach, the Detached RO-Crate can be resolved to the corresponding Attached RO-Crate by following the @id of the Root Data Set or the Root Metadata File entity.

If the new Detached RO-Crate is not meant as a snapshot of the corresponding Attached RO-Crate, then such contextual entities should be assigned new @id, e.g. by generating random UUIDs like urn:uuid:e47e41d9-f924-4c07-bc90-97e7ed34fe35. Such tranformations are typically not catered for by traditional JSON-LD tooling and require additional implementation.

Converting from Detached to Attached RO-Crate

Converting a Detached Crate to an Attached Crate can mean multiple things depending on intentions, and may imply an elaborate process.

First, check if the Root Dataset already have a distribution download listed, in which case that can be retrieved as the corresponding Attached Crate.

To archive a snapshot of an Detached Crate’s metadata, keeping all data entities web-based:

  • Crate a new folder as the RO-Crate Root, save the RO-Crate Metadata Document as the RO-Crate Metadata File according to Attached RO-Crate structure
  • Copy the absolute @id to become an identifier according to recommendations for Root Data Entity identifier
  • Change the @id of the root dataset to ./ and update all references to it, including from the Metadata Descriptor

If the new Attached Crate is intended as a fork that will evolve independently of the Detached Crate, then:

  • Delete the identifier, add the previous @id as isBasedOn
  • Delete/update datePublished and publisher
  • Add yourself as author or contributor to the Root Dataset
  • Add records of changes to the Crate

To additionally save Web-based Data entities to become part of the Detached Crate, a possible algorithm is:

  • For each data entity which @type include Dataset:
    • If it has a distribution download, retrieve and unpack that according to its encodingFormat, using its new folder name as the new local path name.
    • If not, create a corresponding folder in the RO-Crate Root, possibly generating the local path name based on name or path elements of @id URI
    • Replace the @id of the dataset and all its references with the relative URI based on the path from the RO-Crate Root, encoding file paths as necessary.
    • Recurse this algorithm to process each data entity from this dataset’s hasPart
  • For each data entity which @type include File:
    • Decide based on @id URI elements, contentSize encodingFormat and (possibly implied) licence if this file is acceptable to archive
    • Retrieve the file and check the contentSize matches, if specified
    • Store the file with a file path generated in a way consistent with the Datasets, ideally added to the folder of the first Dataset that directly has this data entity as its hasPart
    • Add the previous @id downloaded from as contentUrl according to Embedded data entnties that are also on the Web
    • Replace the @id of the File with the relative URI based on the path from the RO-Crate Root, encoding file paths as necessary.

As this procedure can be error-prone (e.g. a Web-based entity may not be accessible or may require authentication), the implementation should consider the new Attached Crate as a fork and update identifier and isDefinedBy as specified above.

If you are archiving an attached RO-Crate that is already on the Web, then first establish the absolute URI for the root, and retrieve all payload files that are considered URI path-wise to be part the RO-Crate Root, creating corresponding local paths. In this scenario the above algorithm can be simplified and the rewriting of identifiers can be avoided if they are already relative URIs.

Handling relative URI references when using JSON-LD/RDF tools

When using JSON-LD tooling and RDF libraries to consume or generate RO-Crates, extra care should be taken to ensure these URI references are handled correctly.

For this, a couple of scenarios are sketched below with recommendations for consistent handling:

Flattening JSON-LD from nested JSON

If performing JSON-LD flattening to generate a valid RO-Crate Metadata File for a Attached RO-Crate, add @base: null to the input JSON-LD @context array to avoid expanding relative URI references. The flattening @context SHOULD NOT need @base: null.

Example, this JSON-LD is in compacted form which may be beneficial for processing, but is not yet valid RO-Crate Metadata File as it has not been flattened into a @graph array.

{ 
  "@context": [
      "https://w3id.org/ro/crate/1.2-DRAFT/context"
  ],
  "@id": "ro-crate-metadata.json",
  "@type": "CreativeWork",
  "description": "RO-Crate Metadata Descriptor (this file)",
  "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2-DRAFT"},
  "about": {
    "@id": "./",
    "@type": "Dataset",
    "name": "Example RO-Crate",
    "description": "The RO-Crate Root Data Entity",
    "hasPart": [
      { "@id": "data1.txt",
        "@type": "File",
        "description": "One of hopefully many Data Entities",
      },
      { "@id": "subfolder/",
        "@type": "Dataset"
      }
    ]
  }
}

Performing JSON-LD flattening with:

{ "@context": 
     "https://w3id.org/ro/crate/1.2-DRAFT/context"
}

Results in a valid RO-Crate JSON-LD (actual order in @graph may differ):

{
  "@context": "https://w3id.org/ro/crate/1.2-DRAFT/context",
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.2-DRAFT"
      },
      "about": {
        "@id": "./"
      },
      "description": "RO-Crate Metadata Descriptor (this file)"
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "description": "The RO-Crate Root Data Entity",
      "hasPart": [
        {
          "@id": "data1.txt"
        },
        {
          "@id": "subfolder/"
        }
      ],
      "name": "Example RO-Crate"
    },
    {
      "@id": "data1.txt",
      "@type": "File",
      "description": "One of hopefully many Data Entities"
    },
    {
      "@id": "subfolder/",
      "@type": "Dataset"
    }
  ]
}

The saved RO-Crate JSON-LD SHOULD NOT include {@base: null} in its @context.

Expanding/parsing JSON-LD keeping relative referencing

JSON-LD Expansion can be used to resolve terms from the @context to absolute URIs, e.g. http://schema.org/description. This may be needed to parse extended properties or for combinations with other Linked Data.

This algorithm would normally also expand @id fields based on the current base URI of the RO-Crate Metadata File, but this may be a temporary location like file:///tmp/rocrate54/ro-crate-metadata.json, meaning @id: subfolder/ becomes file:///tmp/rocrate54/subfolder/ after JSON-LD expansion.

To avoid absoluting local identifiers, before expanding, augment the JSON-LD @context to ensure it is an array that includes {"@base": null}.

For example, expanding this JSON-LD:

{
  "@context": [
    "https://w3id.org/ro/crate/1.2-DRAFT/context",
    {"@base": null}
  ],
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.2-DRAFT"
      },
      "about": {
        "@id": "./"
      },
      "description": "RO-Crate Metadata Descriptor (this file)"
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "description": "The RO-Crate Root Data Entity",
      "hasPart": [
        {
          "@id": "data1.txt"
        },
        {
          "@id": "subfolder/"
        }
      ],
      "name": "Example RO-Crate"
    }
  ]
}

Results in a expanded form without @context, using absolute URIs for properties and types, but retains relative URI references for entities within the RO-Crate Root:

[
  {
    "@id": "ro-crate-metadata.json",
    "@type": [
      "http://schema.org/CreativeWork"
    ],
    "http://schema.org/about": [
      {
        "@id": "./"
      }
    ],
    "http://purl.org/dc/terms/conformsTo": [
      {
        "@id": "https://w3id.org/ro/crate/1.2-DRAFT"
      }
    ],
    "http://schema.org/description": [
      {
        "@value": "RO-Crate Metadata Descriptor (this file)"
      }
    ]
  },
  {
    "@id": "./",
    "@type": [
      "http://schema.org/Dataset"
    ],
    "http://schema.org/description": [
      {
        "@value": "The RO-Crate Root Data Entity"
      }
    ],
    "http://schema.org/hasPart": [
      {
        "@id": "data1.txt"
      },
      {
        "@id": "subfolder/"
      }
    ],
    "http://schema.org/name": [
      {
        "@value": "Example RO-Crate"
      }
    ]
  }
]

@base: null will not relativize existing absolute URIs that happen to be contained by the RO-Crate Root (see section Relativizing absolute URIs within RO-Crate Root).

Most RDF parsers supporting JSON-LD will perform this kind of expansion before generating triples, but not all RDF stores or serializations support relative URI references. Consider using an alternative @base as detailed in sections below.

Establishing absolute URI for RO-Crate Root

When loading RO-Crate JSON-LD as RDF, or combining the crate’s Linked Data into a larger JSON-LD, it is important to ensure correct base URI to resolve URI references that are relative to the RO-Crate Root.

When retrieving an RO-Crate over the web, servers might have performed HTTP redirections so that the base URI is different from what was requested. It is RECOMMENDED to follow section Establishing a Base URI of RFC3986 before resolving relative links from the RO-Crate Metadata File.

For instance, consider this HTTP redirection from a permalink (simplified):

GET https://w3id.org/ro/crate/1.0/crate HTTP/1.1

HTTP/1.1 301 Moved Permanently
Location: https://www.researchobject.org/ro-crate/1.0/ro-crate-metadata.jsonld
GET https://www.researchobject.org/ro-crate/1.0/ro-crate-metadata.jsonld HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/ld+json

{
    "@context": "https://w3id.org/ro/crate/1.0/context",
    "@graph": [
      {
        "@id": "ro-crate-metadata.jsonld",
        "@type": "CreativeWork",
        "conformsTo": {
          "@id": "https://w3id.org/ro/crate/1.0"
        },
        "about": {
          "@id": "./"
        },
        "license": {
          "@id": "https://creativecommons.org/publicdomain/zero/1.0/"
        }
      },
      {
        "@id": "./",
        "@type": "Dataset",
        "hasPart": [
          {
            "@id": "index.html"
          }
      }
    ]
}

Following redirection we see that:

  • Base URI of the RO-Crate Metadata File becomes https://www.researchobject.org/ro-crate/1.0/ro-crate-metadata.jsonld
  • The absolute URI for index.html resolves to https://www.researchobject.org/ro-crate/1.0/index.html
    • ..rather than https://w3id.org/ro/crate/1.0/index.html which would not redirect correctly

This example also use RO-Crate 1.0, where the RO-Crate Metadata File is called ro-crate-metadata.jsonld instead of ro-crate-metadata.json. Note that the recommended algorithm to find the Root Data Entity is agnostic to the actual filename.

Finding RO-Crate Root in RDF triple stores

When parsing RO-Crate JSON-LD as RDF, where the RDF framework performs resolution to absolute URIs, it may be difficult to find the RO-Crate Root in the parsed triples.

The algoritm proposed in section Root Data Entity allows finding the RDF resource describing ro-crate-metadata.json, independent of its parsed base URI. We can adopt this for RDF triples, thus finding crates conforming to this specification can be queried with SPARQL:

PREFIX schema:  <http://schema.org/>

SELECT ?crate ?metadatafile
WHERE {
  ?crate        a                  schema:Dataset .
  ?metadatafile schema:about       ?crate .
  filter(contains(str(?metadatafile), "ro-crate-metadata.json"))
}

Parsing as RDF with a different RO-Crate Root

When parsing a RO-Crate Metadata File into RDF triples, for instance uploading it to a graph store like Apache Jena’s Fuseki, it is important to ensure consistent base URI:

  • Some RDF stores and RDF formats don’t support relative URI references in triples (see RDF 1.1 note on IRIs)
  • The RO-Crate Root may depend on where the RO-Crate Metadata File was parsed from, e.g. <file:///tmp/ro-crate-metadata.json> (file) or <http://localhost:3030/test/ro-crate-metadata.json> (web upload)
  • Parsing multiple RO-Crates into the same RDF graph, using same base URI, may merge them into the same RO-Crate
  • ro-crate-metadata.json may not be recognized as JSON-LD and must be renamed to ro-crate-metadata.jsonld
  • Web servers hosting ro-crate-metadata.json may not send the JSON-LD Content-Type
  • If base URI is not correct it may be difficult to find the corresponding file and directory paths from an RDF query returning absolute URIs

If the RDF library can parse the RO-Crate JSON-LD directly by retrieving from a http/https URI of the RO-Crate Metadata File it should calculate the correct base URI as detailed in section Establishing absolute URI for RO-Crate Root and you should not need to override the base URI as detailed here.

If a web-based URI for the RO-Crate root is known, then this can be supplied as a base URI. Most RDF tools support a --base option or similar. If this is not possible, then the @context of the RO-Crate JSON-LD can be modified by ensuring the @context is an array that sets the desired @base:

{
  "@context": [
    "https://w3id.org/ro/crate/1.2-DRAFT/context",
    {"@base": "http://example.com/crate255/"}
  ],
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.2-DRAFT"
      },
      "about": {
        "@id": "./"
      }
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Example RO-Crate"
    },
    {
      "@id": "data1.txt",
      "@type": "File",
      "description": "One of hopefully many Data Entities"
    },
    {
      "@id": "subfolder/",
      "@type": "Dataset"
    }
  ]
}

Parsing this will generate triples like below using http://example.com/crate255/ as the RO-Crate Root (shortened):

<http://example.com/crate255/ro-crate-metadata.json> 
  <http://purl.org/dc/terms/conformsTo> 
  <https://w3id.org/ro/crate/1.2-DRAFT> .

<http://example.com/crate255/ro-crate-metadata.json> 
  <http://schema.org/about> 
  <http://example.com/crate255/> .

<http://example.com/crate255/> 
  <http://schema.org/name> 
  "Example RO-Crate" .

<http://example.com/crate255/>
  <http://schema.org/hasPart>
  <http://example.com/crate255/data1.txt> .

<http://example.com/crate255/>
  <http://schema.org/hasPart>
  <http://example.com/crate255/subfolder/> .

<http://example.com/crate255/data1.txt>
 <http://schema.org/description> 
 "One of hopefully many Data Entities" .

Generating a RO-Crate JSON-LD from such triples can be done by first finding the RO-Crate Root and then use it as base URI to relativize absolute URIs within RO-Crate Root.

Establishing a base URI inside a ZIP file

An RO-Crate may have been packaged as a ZIP file or similar archive. RO-Crates may exist in a temporary file path which should not determine its identifiers.

When parsing such crates it is recommended to use the Archive and Package (arcp) URI scheme to establish a temporary/location-based UUID or hash-based (SHA256) base URI.

For instance, given a randomly generated UUID b7749d0b-0e47-5fc4-999d-f154abe68065 we can use arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/ as the @base:

{
  "@context": [
    "https://w3id.org/ro/crate/1.2-DRAFT/context",
    {"@base": "arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/"}
  ],
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.2-DRAFT"
      },
      "about": {
        "@id": "./"
      }
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "description": "The RO-Crate Root Data Entity",
      "hasPart": [
        {
          "@id": "data1.txt"
        },
        {
          "@id": "subfolder/"
        }
      ],
      "name": "Example RO-Crate"
    },
    {
      "@id": "data1.txt",
      "@type": "File",
      "description": "One of hopefully many Data Entities"
    },
    {
      "@id": "subfolder/",
      "@type": "Dataset"
    }
  ]
}

Parsing this as RDF will generate triples including:

<arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/ro-crate-metadata.json> 
  <http://schema.org/about> 
  <arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/> .

<arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/> 
  <http://schema.org/hasPart> 
  <arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/data1.txt> .

Here consumers can assume / is the RO-Crate Root and generating relative URIs can safely be achieved by search-replace as the arcp URI is unique. Saving RO-Crate JSON-LD from the triples can be done by using the arcp URI to relativize absolute URIs within RO-Crate Root.

Bagit: The arcp specification suggests how BagIt identifiers can be used to calculate the base URI. See also section Combining with other packaging schemes - note that in this approach the RO-Crate Root will be the payload folder /data/ under the calculated arcp base URI.

Relativizing absolute URIs within RO-Crate Root

Some applications may prefer working with absolute URIs, e.g. in a joint graph store or web-based repository, but should relativize URIs within the RO-Crate Root before generating the RO-Crate Metadata File.

Assuming a repository at example.com has JSON-LD with absolute URIs:

{
  "@context": "https://w3id.org/ro/crate/1.2-DRAFT",
  "@graph": [
    {
      "@id": "http://example.com/crate415/ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.2-DRAFT"
      },
      "about": {
        "@id": "http://example.com/crate415/"
      },
    },
    {
      "@id": "http://example.com/crate415/",
      "@type": "Dataset",
      "description": "The RO-Crate Root Data Entity",
      "hasPart": [
        {
          "@id": "http://example.com/crate415/data1.txt"
        },
        {
          "@id": "http://example.com/crate415/subfolder/"
        }
      ],
      "name": "Example RO-Crate"
    }
  ]
}

Then performing JSON-LD flattening with this @context:

{ "@context": [
    {"@base": "http://example.com/crate415/"},
     "https://w3id.org/ro/crate/1.2-DRAFT"
  ]
}

Will output RO-Crate JSON-LD with relative URIs:

{
  "@context": [
    {
      "@base": "http://example.com/crate415/"
    },
    "https://w3id.org/ro/crate/1.2-DRAFT"
  ],
  "@graph": [
    {
      "@id": "./",
      "@type": "Dataset",
      "description": "The RO-Crate Root Data Entity",
      "hasPart": [
        {
          "@id": "data1.txt"
        },
        {
          "@id": "subfolder/"
        }
      ],
      "name": "Example RO-Crate"
    },
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.2-DRAFT"
      },
      "about": {
        "@id": "./"
      }
    }
  ]
}

This method would also relativize URIs outside the RO-Crate Root that are on the same host, e.g. http://example.com/crate255/other.txt would become ../create255/other.txt - this can particularly be a challenge with local file:/// URIs. `