The RO-Crate specification uses http://schema.org/ as the base vocabulary for describing entities, with a handful of additions from other vocabularies. These are brought into a particular RO-Crate metadata document using the JSON-LD key @context.
The main principle when extending RO-Crate is to identify potential matching types and properties in schema.org and re-use these, allowing for a liberal interpretation. The schema.org type hierarchy may be helpful for exploring the existing types.
This is particularly the case when defining a new profile for a set of RO-Crates. For instance, the Workflow Run Crate profiles use multiple Action subclasses to describe workflow execution. Another advantage of re-using schema.org is that the @context does not need to be modified, as most schema.org terms are already part of the RO-Crate context.
Note: The RO-Crate context is based on a released version of schema.org. For instance, https://w3id.org/ro/crate/1.2/context is based on schema.org v22.0, meaning that terms added to schema.org after that release need to be defined explicitly in @context.
In some cases, your crates can benefit from additional vocabularies which might offer extra precision when describing specific domains. Extending RO-Crate can be done with ad-hoc terms or by using an existing RDF vocabulary. Such vocabularies should be noted in the RO-Crate profile as a DefinedTermSet entity. This page suggests how such vocabularies can be identified.
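As a rough illustration of what noting a vocabulary as a DefinedTermSet entity could look like, the sketch below builds a minimal entity in Python. The helper name defined_term_set, the use of the PROV-O namespace IRI as the @id, and the choice of including only a name are assumptions made for this example; the RO-Crate profile specification should be checked for the properties it actually expects.

```python
import json


def defined_term_set(vocab_iri: str, name: str) -> dict:
    """Build a minimal DefinedTermSet entity (illustrative shape only)."""
    return {
        "@id": vocab_iri,            # assumption: the vocabulary namespace IRI identifies the set
        "@type": "DefinedTermSet",
        "name": name,
    }


# Hypothetical usage: describe the PROV-O vocabulary in a profile crate.
entity = defined_term_set("http://www.w3.org/ns/prov#", "PROV-O")
print(json.dumps(entity, indent=2))
```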
Vocabulary Sources
Below are some links that may help in finding a vocabulary or ontology that is RO-Crate compatible. Some vocabularies are indexed in multiple databases.
- W3C publishes standards, drafts and notes relating to linked data. Some of these are themselves vocabularies:
- DBO (DBpedia Ontology) is a general purpose ontology for describing Wikipedia data
- awesome-ontology links to other useful databases and ontologies
Life Sciences
- BioPortal indexes some 17 million classes relating to the life sciences
- EBISPOT (EMBL-EBI Samples, Phenotypes, and Ontologies Team)
- EFO (Experimental Factor Ontology) describes experimental variables
- DUO (Data Use Ontology) can be used to describe data use and patient consent
- OBO (Open Biological and Biomedical Ontology Foundry)
- OBI (Ontology for Biomedical Investigations) describes research projects in the life sciences
- GO (Gene Ontology) describes the function of genes
- CL (Cell Ontology) describes cell types
- Bioschemas extends the schema.org model to include concepts in the life sciences
General Advice
IRI Requirement
Not all vocabularies are RO-Crate compatible. At a minimum, an RO-Crate compatible vocabulary needs to have an IRI for each class and property you want to use. For example, the REMBI schema for biological imaging defines some useful types and properties. However, none of these have IRIs, so they cannot be used in RO-Crate as they stand.
On the other hand, something like EDAM-BIOIMAGING can be readily used in RO-Crate because, for example, the Image class has the IRI http://edamontology.org/data_Image.
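The IRI requirement can be checked mechanically once you have listed the terms you intend to use. The following is a minimal sketch, assuming the candidate terms have been collected by hand into a dictionary from term name to IRI (or None when no IRI is published). The function names and the term HypotheticalTermWithoutIri are invented for illustration, and the check only tests that a value parses as an absolute IRI, not that it resolves.

```python
from urllib.parse import urlsplit


def is_absolute_iri(value) -> bool:
    """Return True if `value` is a string with a scheme and something after it."""
    if not isinstance(value, str):
        return False
    parts = urlsplit(value)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


def terms_missing_iris(terms: dict) -> list:
    """List the term names whose value is not an absolute IRI, sorted by name."""
    return sorted(name for name, iri in terms.items() if not is_absolute_iri(iri))


candidate_terms = {
    "Image": "http://edamontology.org/data_Image",
    "HypotheticalTermWithoutIri": None,  # placeholder for a term published without an IRI
}
missing = terms_missing_iris(candidate_terms)
print(missing)
```

Any names reported by terms_missing_iris would need one of the remedies described under Adapting Vocabularies before they can go into a crate. Note that compact forms such as prefix:name also pass this check, so they should be expanded first.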
Adapting Vocabularies
If a vocabulary you want to use doesn’t have IRIs, you can try to solve this in a few ways:
- Request canonical IRIs from the author of the vocabulary
- Assign IRIs yourself, and publish your IRIs for others to re-use. Some of the guidance in the specification can be helpful here.
- Choose a different vocabulary with IRIs that captures a similar domain
Making JSON-LD Contexts
If you only want to use one term from a vocabulary, you can add a context entry that maps the term name (e.g. the class’s name) to its IRI.
For example, to use the wasInformedBy property from the PROV-O vocabulary, go to the documentation on that property, find the IRI, and then add it to the context:
{
  "@context": [
    "https://w3id.org/ro/crate/1.2/context",
    {
      "wasInformedBy": "http://www.w3.org/ns/prov#wasInformedBy"
    }
  ]
}
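If crates are produced by a script, the same context extension can be applied programmatically. The sketch below assumes the metadata document has already been loaded as a Python dictionary; add_context_term is a hypothetical helper written for this example, not part of any RO-Crate library.

```python
import json

RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.2/context"


def add_context_term(metadata: dict, term: str, iri: str) -> dict:
    """Return a copy of `metadata` whose @context also maps `term` to `iri`."""
    context = metadata.get("@context", RO_CRATE_CONTEXT)
    # Normalise to a list so term mappings can sit beside the base context.
    entries = list(context) if isinstance(context, list) else [context]
    if isinstance(entries[-1], dict):
        # Extend the existing mapping object without mutating the input.
        entries[-1] = {**entries[-1], term: iri}
    else:
        entries.append({term: iri})
    return {**metadata, "@context": entries}


crate = {"@context": RO_CRATE_CONTEXT, "@graph": []}
extended = add_context_term(
    crate, "wasInformedBy", "http://www.w3.org/ns/prov#wasInformedBy"
)
print(json.dumps(extended["@context"], indent=2))
```

Calling the helper again with another term adds it to the same mapping object, so the context stays a two-element list.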
Some vocabularies publish JSON-LD contexts that you can copy into your crates in order to “import” the vocabulary. For example, the OME imaging vocabulary has an official context available here.
OWL
If you want to use an entire OWL vocabulary, consider using the owl2jsonld tool, for example:
wget https://github.com/stain/owl2jsonld/releases/download/0.2.1/owl2jsonld-0.2.1-standalone.jar
java -jar owl2jsonld-0.2.1-standalone.jar https://github.com/EBISPOT/efo/releases/download/current/efo.owl
Unfortunately this may output uninformative term names like “OBA_0002423”. It is recommended that you instead use the class or property name in this case, e.g.
{
  "@context": [
    "https://w3id.org/ro/crate/1.2/context",
    {
      "CholesterolEsterificationRate": "http://purl.obolibrary.org/obo/OBA_0002423"
    }
  ]
}
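When a generated context contains many such identifiers, the renaming can be scripted. This is a minimal sketch, assuming you already have the generated term-to-IRI mapping and a separate lookup from IRI to a human-readable label (for instance, extracted from the ontology's label annotations). Both function names are hypothetical, and the label text used here is inferred from the term name in the example above rather than taken from the ontology.

```python
import re


def camel_case(label: str) -> str:
    """Turn a label such as 'some class name' into 'SomeClassName'."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    return "".join(word[:1].upper() + word[1:] for word in words)


def rename_terms(context: dict, labels: dict) -> dict:
    """Replace opaque term names with names derived from labels, where known."""
    renamed = {}
    for term, iri in context.items():
        label = labels.get(iri)
        # Keep the original term name when no label is available for the IRI.
        renamed[camel_case(label) if label else term] = iri
    return renamed


iri = "http://purl.obolibrary.org/obo/OBA_0002423"
generated = {"OBA_0002423": iri}
labels = {iri: "cholesterol esterification rate"}  # assumed label text
readable = rename_terms(generated, labels)
print(readable)
```

Two IRIs whose labels collapse to the same name would overwrite one another in this sketch, so a real script should detect and resolve such collisions.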
LinkML
You can generate a JSON-LD context from any LinkML schema using the linkml Python package:
pip install linkml
linkml generate jsonld-context some-schema.yml
Searching the Web
Different domains describe their vocabularies differently. You might have more luck by varying these search terms: “vocabulary”, “ontology”, “schema”, “data model”. Then, to make sure the vocabulary uses IRIs and is therefore RO-Crate compatible, you might want to add “RDF”, “OWL” or “IRI” to your search.