Process Run Crate
- Version: 0.5
- Permalink: https://w3id.org/ro/wfrun/process/0.5
- Authors: Workflow Run RO-Crate working group
- License: Apache License, version 2.0 (SPDX: Apache-2.0)
- Example conforming crate: ro-crate-metadata.json ro-crate-preview.html
- Profile Crate: ro-crate-metadata.json ro-crate-preview.html
- Extends:
- JSON-LD context: https://w3id.org/ro/terms/workflow-run/context
- Vocabulary terms: https://w3id.org/ro/terms/workflow-run#
This profile uses terminology from the RO-Crate 1.1 specification, and extends it with additional terms from the workflow-run ro-terms namespace.
Overview
This profile is used to describe the execution of an implicit workflow, indicating that one or more computational tools have been executed, typically generating some result files that are represented as data entities in the RO-Crate.
By “implicit workflow” we mean that the composition of these tools may have been done by hand (a user executes one tool following another) or by some script that has not yet been included as part of the crate (for instance because it is an embedded part of a larger application).
This profile requires the indication of Software used to create files, namely a SoftwareApplication (the tool) and a CreateAction (the execution of said tool).
The following diagram shows the relationships between provenance-related entities. Note the distinction between prospective provenance (plans for activities, e.g., an application) and retrospective provenance (what actually happened, e.g. the execution of an application).
Example Metadata File (ro-crate-metadata.json)
{ "@context": [
"https://w3id.org/ro/crate/1.1/context",
"https://w3id.org/ro/terms/workflow-run/context"
],
"@graph": [
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
"about": {"@id": "./"}
},
{
"@id": "./",
"@type": "Dataset",
"conformsTo": {"@id": "https://w3id.org/ro/wfrun/process/0.5"},
"hasPart": [
{"@id": "pics/2017-06-11%2012.56.14.jpg"},
{"@id": "pics/sepia_fence.jpg"}
],
"mentions": {"@id": "#SepiaConversion_1"},
"name": "My Pictures"
},
{ "@id": "https://w3id.org/ro/wfrun/process/0.5",
"@type": "CreativeWork",
"name": "Process Run Crate",
"version": "0.5"
},
{
"@id": "https://www.imagemagick.org/",
"@type": "SoftwareApplication",
"url": "https://www.imagemagick.org/",
"name": "ImageMagick",
"softwareVersion": "6.9.7-4"
},
{
"@id": "#SepiaConversion_1",
"@type": "CreateAction",
"name": "Convert dog image to sepia",
"description": "convert -sepia-tone 80% test_data/sample/pics/2017-06-11\\ 12.56.14.jpg test_data/sample/pics/sepia_fence.jpg",
"endTime": "2018-09-19T17:01:07+10:00",
"instrument": {"@id": "https://www.imagemagick.org/"},
"object": {"@id": "pics/2017-06-11%2012.56.14.jpg"},
"result": {"@id": "pics/sepia_fence.jpg"},
"agent": {"@id": "https://orcid.org/0000-0001-9842-9718"}
},
{
"@id": "pics/2017-06-11%2012.56.14.jpg",
"@type": "File",
"description": "Original image",
"encodingFormat": "image/jpeg",
"name": "2017-06-11 12.56.14.jpg (input)"
},
{
"@id": "pics/sepia_fence.jpg",
"@type": "File",
"description": "The converted picture, now sepia-colored",
"encodingFormat": "image/jpeg",
"name": "sepia_fence (output)"
},
{
"@id": "https://orcid.org/0000-0001-9842-9718",
"@type": "Person",
"name": "Stian Soiland-Reyes"
}
]
}
Note that the command line shown in the action’s description is not directly re-executable, as file paths are not required to match the RO-Crate locations. For a more structural and reproducible description of tool executions, see Workflow Run Crate.
Requirements
Property | Required? | Description |
---|---|---|
Dataset (the root data entity, e.g. "@id": "./") | ||
conformsTo | MUST | MUST reference a CreativeWork entity with an @id URI that is consistent with the versioned Permalink of this document, e.g. {"@id": "https://w3id.org/ro/wfrun/process/0.5"} |
SoftwareApplication | ||
@type | MUST | SHOULD include SoftwareApplication, SoftwareSourceCode or ComputationalWorkflow |
@id | MUST | SHOULD be an absolute URI, but MAY be a relative URI to a data entity in the crate (e.g. "bin/simulation4") or a local identifier for tools that are not otherwise described on the web (e.g. "#statistical-analysis") |
name | SHOULD | A human readable name for the tool in general (not just how it was used here) |
url | SHOULD | Homepage, documentation or source for the tool |
version | SHOULD | The version string for the software application. In the case of a SoftwareApplication, this MAY be provided via the more specific softwareVersion. SoftwareApplication entities SHOULD NOT specify both version and softwareVersion; if both are present, consumers SHOULD prioritize softwareVersion. To make version comparison easier for consumers, it is RECOMMENDED to specify a machine-readable version string if available (see for instance Python's PEP 440). |
CreateAction | ||
@type | MUST | SHOULD be CreateAction to indicate that this tool created the result data entities. MAY be ActivateAction if the provenance does not include any result. MAY be UpdateAction if the tool modified an existing data entity or database in-place. |
@id | MUST | A unique identifier for the execution, e.g. "urn:uuid:50ec5c76-1f7a-4130-8ef6-846756b228c1", "#f99a8e6c". MAY be an absolute URI, e.g. http://example.com/runs/846756b228c1. The use of randomly generated UUIDs (version 4) is RECOMMENDED. SHOULD be listed under mentions of the root data entity. |
name | SHOULD | Short human-readable description of the execution. |
description | SHOULD | Details of the execution, for instance command line arguments or settings. This field is for information only, no particular structure is to be assumed. |
endTime | SHOULD | The time the process ended, i.e. when the last of the entities in result has been created. SHOULD be a DateTime in ISO 8601 format. |
startTime | MAY | The time the process started, i.e. the earliest time the process may have accessed an entity in object. SHOULD be a DateTime in ISO 8601 format. |
instrument | MUST | Identifier of the executed tool. |
agent | SHOULD | Identifier of a Person or Organization contextual entity that started/executed this tool. |
object | MAY | The identifier of one or more entities of the RO-Crate that were consumed by this action, e.g. input files or reference datasets. |
result | SHOULD | The identifier of one or more entities that were created or modified by this action, e.g. output files. |
actionStatus | MAY | SHOULD be CompletedActionStatus if the process completed successfully or FailedActionStatus if it failed to complete. In the latter case, consumers should be prepared for the absence of any dependent actions in the metadata. If this attribute is not specified, consumers should assume that the process completed successfully. |
error | MAY | Additional information on the cause of the failure, such as an error message from the application, if available. SHOULD NOT be specified unless actionStatus is set to FailedActionStatus. |
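As an illustration of the CreateAction rows above, the following minimal Python sketch assembles an action entity with a randomly generated UUID as its @id and an ISO 8601 endTime. The helper name make_create_action and the file names in the usage lines are hypothetical and are not part of this profile or of any library.

```python
import uuid
from datetime import datetime, timedelta, timezone


def make_create_action(tool_id, name, inputs=(), outputs=(), end_time=None):
    """Build a CreateAction entity as a plain dict (hypothetical helper)."""
    if end_time is None:
        end_time = datetime.now(timezone.utc)
    action = {
        # uuid4() yields a randomly generated (version 4) UUID
        "@id": f"urn:uuid:{uuid.uuid4()}",
        "@type": "CreateAction",
        "name": name,
        # isoformat() on a timezone-aware datetime gives an ISO 8601 string
        "endTime": end_time.isoformat(),
        "instrument": {"@id": tool_id},
    }
    if inputs:
        action["object"] = [{"@id": i} for i in inputs]
    if outputs:
        action["result"] = [{"@id": o} for o in outputs]
    return action


# Usage with illustrative values (the file names are assumptions)
example = make_create_action(
    "https://www.imagemagick.org/",
    "Convert image to sepia",
    inputs=["pics/in.jpg"],
    outputs=["pics/out.jpg"],
    end_time=datetime(2018, 9, 19, 17, 1, 7, tzinfo=timezone(timedelta(hours=10))),
)
```

The resulting dict would still need to be added to the @graph and referenced from the root data entity via mentions.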
Entities referenced by an action’s object or result SHOULD be of type File (an RO-Crate alias for MediaObject) for files, Dataset for directories and Collection for multi-file datasets, but MAY be a CreativeWork for other types of data (e.g. an online database); they MAY be of type PropertyValue to capture numbers/strings that are not stored as files.
Data entities involved in an application’s input and output SHOULD have an @id that reflects the original file or directory name as processed by the application, but MAY be renamed to avoid clashes with other entities in the crate. In this case, they SHOULD refer to the original name via alternateName. This is particularly important to support reproducibility in cases where an application expects to find input in specific locations and with specific names (see the MIRAX example in Representing multi-file objects).
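Some of the MUST-level rows of the Requirements table can be checked mechanically. The Python sketch below is one possible approach, not an official validator: the function name check_must_requirements is hypothetical, it assumes the @graph has already been loaded as a list of dicts, and it covers only the root conformsTo reference and the presence of @id and instrument on actions.

```python
ACTION_TYPES = {"CreateAction", "ActivateAction", "UpdateAction"}


def as_list(value):
    """Normalize a JSON-LD value that may be absent, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_must_requirements(graph, profile_prefix="https://w3id.org/ro/wfrun/process/"):
    """Return a list of problems found (hypothetical, partial checker)."""
    by_id = {e["@id"]: e for e in graph if "@id" in e}
    problems = []

    # Locate the root data entity through the metadata descriptor's "about"
    descriptor = by_id.get("ro-crate-metadata.json", {})
    root = by_id.get(descriptor.get("about", {}).get("@id"), {})

    # The root must reference a CreativeWork whose @id matches the profile permalink
    profiles = [
        by_id.get(ref.get("@id"), {})
        for ref in as_list(root.get("conformsTo"))
        if ref.get("@id", "").startswith(profile_prefix)
    ]
    if not any("CreativeWork" in as_list(p.get("@type")) for p in profiles):
        problems.append("root conformsTo does not reference a profile CreativeWork")

    # Every action must have an @id and an instrument
    for entity in graph:
        if ACTION_TYPES & set(as_list(entity.get("@type"))):
            if "@id" not in entity:
                problems.append("action without @id")
            if not as_list(entity.get("instrument")):
                problems.append(f"{entity.get('@id')}: missing instrument")
    return problems
```

An empty list means that none of the checked requirements were violated; it does not imply full conformance to the profile.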
Multiple processes
A process crate can be used to indicate one single execution as a single CreateAction, or a series of processes that generate different data entities. These actions MAY form an implicit workflow by following the links between entities that appear as result in an action and as object in the following one, but a process crate is not required to ensure such consistency (e.g. there may be an intermediate action that has not been recorded).
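The result-to-object links described above can be recovered with a simple pass over the graph. The following Python sketch is a hedged illustration: link_actions is a hypothetical helper, it only looks at entities typed CreateAction, and it assumes that object and result hold {"@id": ...} references.

```python
def as_list(value):
    """Normalize a JSON-LD value that may be absent, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def link_actions(graph):
    """Return (producer, consumer, entity) triples for the implicit workflow."""
    actions = [e for e in graph if "CreateAction" in as_list(e.get("@type"))]

    # Index each entity @id by the actions that list it as a result
    producers = {}
    for action in actions:
        for ref in as_list(action.get("result")):
            producers.setdefault(ref["@id"], []).append(action["@id"])

    # An action consuming that entity as object follows its producer
    links = []
    for action in actions:
        for ref in as_list(action.get("object")):
            for producer in producers.get(ref["@id"], []):
                if producer != action["@id"]:
                    links.append((producer, action["@id"], ref["@id"]))
    return links
```

Since a process crate is not required to record every intermediate action, an empty or partial list of links is a legitimate outcome and should not be treated as an error.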
Referencing configuration files
Some applications support the modification of their behavior via configuration files. Typically, these are not part of the input interface, but are searched for by the application among a set of possible predefined file system paths. In the case of applications that support a configuration file, the specific configuration file used during a run SHOULD be added to the object attribute of the corresponding CreateAction, especially if its settings are different from the default ones.
{
"@id": "#SepiaConversion_1",
"@type": "CreateAction",
"name": "Convert dog image to sepia",
"description": "convert -sepia-tone 80% test_data/sample/pics/2017-06-11\\ 12.56.14.jpg test_data/sample/pics/sepia_fence.jpg",
"endTime": "2018-09-19T17:01:07+10:00",
"instrument": {"@id": "https://www.imagemagick.org/"},
"object": [
{"@id": "pics/2017-06-11%2012.56.14.jpg"},
{"@id": "SepiaConversion_1/colors.xml"}
],
"result": {"@id": "pics/sepia_fence.jpg"},
"agent": {"@id": "https://orcid.org/0000-0001-9842-9718"}
},
{
"@id": "SepiaConversion_1/colors.xml",
"@type": "File",
"description": "ImageMagick color names configuration",
"encodingFormat": "text/xml",
"name": "colors"
}
Representing multi-file objects
In some formats, the data belonging to a digital entity is stored in more than one file. For instance, the Mirax2-Fluorescence-2 image is stored as the following set of files:
Mirax2-Fluorescence-2.mrxs
Mirax2-Fluorescence-2/Index.dat
Mirax2-Fluorescence-2/Slidedat.ini
Mirax2-Fluorescence-2/Data0000.dat
Mirax2-Fluorescence-2/Data0001.dat
...
Mirax2-Fluorescence-2/Data0023.dat
An application that reads this format needs to be pointed to the .mrxs file, and expects to find a directory containing the other files in the same location as the .mrxs file, with the same name minus the extension. Thus, even though an application that processes MIRAX files would probably take only the .mrxs file as argument, the other ones must be present in the expected location and with the expected names (in CWL, this kind of relationship is expressed via secondaryFiles). In this case, the object SHOULD be represented by a contextual entity of type Collection listing all files under hasPart, with a mainEntity referencing the main file. The collection SHOULD be referenced from the root data entity via mentions.
{
"@id": "./",
"@type": "Dataset",
"hasPart": [
{"@id": "Mirax2-Fluorescence-2.mrxs"},
{"@id": "Mirax2-Fluorescence-2/"},
{"@id": "Mirax2-Fluorescence-2.png"}
],
"mentions": [
{"@id": "https://openslide.cs.cmu.edu/download/openslide-testdata/Mirax/Mirax2-Fluorescence-2.zip"},
{"@id": "#conversion_1"}
]
},
{
"@id": "https://openslide.org/",
"@type": "SoftwareApplication",
"url": "https://openslide.org/",
"name": "OpenSlide",
"version": "3.4.1"
},
{
"@id": "#conversion_1",
"@type": "CreateAction",
"name": "Convert image to PNG",
"endTime": "2018-09-19T17:01:07+10:00",
"instrument": {"@id": "https://openslide.org/"},
"object": {"@id": "https://openslide.cs.cmu.edu/download/openslide-testdata/Mirax/Mirax2-Fluorescence-2.zip"},
"result": {"@id": "Mirax2-Fluorescence-2.png"}
},
{
"@id": "https://openslide.cs.cmu.edu/download/openslide-testdata/Mirax/Mirax2-Fluorescence-2.zip",
"@type": "Collection",
"mainEntity": {"@id": "Mirax2-Fluorescence-2.mrxs"},
"hasPart": [
{"@id": "Mirax2-Fluorescence-2.mrxs"},
{"@id": "Mirax2-Fluorescence-2/"}
]
},
{
"@id": "Mirax2-Fluorescence-2.mrxs",
"@type": "File"
},
{
"@id": "Mirax2-Fluorescence-2/",
"@type": "Dataset"
},
{
"@id": "Mirax2-Fluorescence-2.png",
"@type": "File"
}
If the collection does not have a web presence, its @id can be an arbitrary internal one, possibly randomly generated (as for any other contextual entity):
{
"@id": "#af0253d688f3409a2c6d24bf6b35df7c4e271292",
"@type": "Collection",
"mainEntity": {"@id": "Mirax2-Fluorescence-2.mrxs"},
"hasPart": [
{"@id": "Mirax2-Fluorescence-2.mrxs"},
{"@id": "Mirax2-Fluorescence-2/"}
]
}
The use case shown here is an example of a situation where it’s important to refer to the original names in case any renamings took place, as described in Requirements:
{
"@id": "#af0253d688f3409a2c6d24bf6b35df7c4e271292",
"@type": "Collection",
"mainEntity": {"@id": "f62aa607a75508ac5fc6a22e9c0e39ef58a2c852"},
"hasPart": [
{"@id": "f62aa607a75508ac5fc6a22e9c0e39ef58a2c852"},
{"@id": "c7398fbf741b851e80ae731d60cbee9258ff81f3/"}
]
},
{
"@id": "f62aa607a75508ac5fc6a22e9c0e39ef58a2c852",
"@type": "File",
"alternateName": "Mirax2-Fluorescence-2.mrxs"
},
{
"@id": "c7398fbf741b851e80ae731d60cbee9258ff81f3/",
"@type": "Dataset",
"alternateName": "Mirax2-Fluorescence-2/",
"hasPart": [
{"@id": "c7398fbf741b851e80ae731d60cbee9258ff81f3/46c443af080a36000c9298b49b675eb240eeb41c"},
...
]
},
{
"@id": "c7398fbf741b851e80ae731d60cbee9258ff81f3/46c443af080a36000c9298b49b675eb240eeb41c",
"@type": "File",
"alternateName": "Mirax2-Fluorescence-2/Index.dat"
},
...
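A consumer that wants to restore the layout expected by the application can read the original names back from alternateName. The Python sketch below is illustrative only: original_names is a hypothetical helper, and it assumes that alternateName holds a single string, as in the example above.

```python
def original_names(graph):
    """Map each renamed entity @id to the original name in alternateName."""
    return {
        entity["@id"]: entity["alternateName"]
        for entity in graph
        if "@id" in entity and "alternateName" in entity
    }


# Usage with two of the entities from the example above
entities = [
    {
        "@id": "f62aa607a75508ac5fc6a22e9c0e39ef58a2c852",
        "@type": "File",
        "alternateName": "Mirax2-Fluorescence-2.mrxs",
    },
    {
        "@id": "c7398fbf741b851e80ae731d60cbee9258ff81f3/",
        "@type": "Dataset",
        "alternateName": "Mirax2-Fluorescence-2/",
    },
]
renames = original_names(entities)
```

Each key is a path inside the crate and each value is the name under which the application would expect to find it, so the mapping could drive a copy or symlink step before re-running the tool.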
Representing environment variable settings
The behavior of some applications may be modified by setting appropriate environment variables. These are different from ordinary application inputs in that they are part of the environment in which the process runs, rather than parameters supplied through a command line or a graphical interface. To represent the fact that an environment variable was set to a certain value during the execution of an action, use the environment property from the workflow-run ro-terms namespace, making it point to a PropertyValue that describes the setting:
{
"@context": [
"https://w3id.org/ro/crate/1.1/context",
"https://w3id.org/ro/terms/workflow-run/context"
],
"@graph": [
...
{
"@id": "#SepiaConversion_1",
"@type": "CreateAction",
"instrument": {"@id": "https://www.imagemagick.org/"},
"object": {"@id": "pics/2017-06-11%2012.56.14.jpg"},
"result": {"@id": "pics/sepia_fence.jpg"},
"environment": [
{"@id": "#height-limit-pv"},
{"@id": "#width-limit-pv"}
]
},
{
"@id": "#width-limit-pv",
"@type": "PropertyValue",
"name": "MAGICK_WIDTH_LIMIT",
"value": "4096"
},
{
"@id": "#height-limit-pv",
"@type": "PropertyValue",
"name": "MAGICK_HEIGHT_LIMIT",
"value": "3072"
}
]
}
Note that we added the workflow-run context to the @context entry in order to bring in the definition of environment.
Environment variable settings SHOULD be listed if they are different from the default ones (usually unset) and affected the results of the action.
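On the consumer side, the recorded settings can be turned back into a name-to-value mapping. The following Python sketch shows one way to do this; action_environment is a hypothetical helper that assumes every environment reference resolves to a PropertyValue entity in the same graph.

```python
def as_list(value):
    """Normalize a JSON-LD value that may be absent, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def action_environment(graph, action_id):
    """Collect the environment variables recorded for one action."""
    by_id = {e["@id"]: e for e in graph if "@id" in e}
    action = by_id[action_id]
    env = {}
    for ref in as_list(action.get("environment")):
        setting = by_id[ref["@id"]]
        # Environment variable values are strings at the OS level
        env[setting["name"]] = str(setting["value"])
    return env
```

The returned dict could, for instance, be merged with os.environ and passed as the env argument of subprocess.run when attempting to re-run the tool.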
Representing container images
An application may use one or more container images (e.g. Docker container images) to perform its duty. An action MAY indicate that a container image was used during the execution via the containerImage property, defined in the workflow-run ro-terms namespace.
{
"@id": "#cb04c897-eb92-4c53-8a38-bcc1a16fd650",
"@type": "CreateAction",
"instrument": {"@id": "bam2fastq.cwl"},
...
"containerImage": {"@id": "#samtools-image"}
},
{
"@id": "#samtools-image",
"@type": "ContainerImage",
"additionalType": {"@id": "https://w3id.org/ro/terms/workflow-run#DockerImage"},
"registry": "docker.io",
"name": "biocontainers/samtools",
"tag": "v1.9-4-deb_cv1",
"sha256": "da61624fda230e94867c9429ca1112e1e77c24e500b52dfc84eaf2f5820b4a2a"
}
The ContainerImage type (note the leading uppercase “C”) and most of the properties shown above are also defined in the workflow-run namespace. The additionalType describes the specific image type (e.g., DockerImage, SIFImage); the registry is the service that hosts the image (e.g., “docker.io”, “quay.io”); the name is the identifier of the image within the registry; tag describes the image tag and sha256 its sha256 checksum. A ContainerImage entity SHOULD list at least the additionalType, registry and name properties.
Alternatively, the containerImage could point to a URL. For instance:
{
"@id": "#cb04c897-eb92-4c53-8a38-bcc1a16fd650",
"@type": "CreateAction",
"instrument": {"@id": "bam2fastq.cwl"},
...
"containerImage": {"@id": "https://example.com/samtools.sif"}
}
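For a ContainerImage entity that lists registry, name and tag as described above, a consumer may want a single image reference string. The Python sketch below composes one in the common registry/name:tag form used by Docker-style tools; image_reference is a hypothetical helper, and whether the recorded sha256 can be used as a pull digest is not stated by this profile, so it is deliberately left out.

```python
def image_reference(image):
    """Compose a registry/name:tag reference from a ContainerImage entity."""
    reference = image["name"]
    if image.get("registry"):
        reference = f"{image['registry']}/{reference}"
    if image.get("tag"):
        reference = f"{reference}:{image['tag']}"
    return reference


# Usage with the entity from the example above
samtools_image = {
    "@id": "#samtools-image",
    "@type": "ContainerImage",
    "registry": "docker.io",
    "name": "biocontainers/samtools",
    "tag": "v1.9-4-deb_cv1",
}
reference = image_reference(samtools_image)
```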
Specifying software dependencies
Software dependencies MAY be specified using softwareRequirements, pointing to a SoftwareApplication:
{
"@id": "script.py",
"@type": "SoftwareApplication",
"name": "Analysis Script",
"version": "0.1",
"softwareRequirements": {"@id": "https://pypi.org/project/numpy/1.26.2/"}
},
{
"@id": "https://pypi.org/project/numpy/1.26.2/",
"@type": "SoftwareApplication",
"name": "NumPy",
"version": "1.26.2"
}
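Since a dependency is itself a SoftwareApplication, it may declare its own softwareRequirements. The Python sketch below walks these links transitively; collect_requirements is a hypothetical helper, and it assumes dependencies are given as {"@id": ...} references (plain strings are treated as identifiers).

```python
def as_list(value):
    """Normalize a JSON-LD value that may be absent, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def collect_requirements(graph, tool_id):
    """Return the @ids of all direct and indirect dependencies of a tool."""
    by_id = {e["@id"]: e for e in graph if "@id" in e}
    visited = {tool_id}
    found = []
    stack = [tool_id]
    while stack:
        current = by_id.get(stack.pop(), {})
        for ref in as_list(current.get("softwareRequirements")):
            dep_id = ref["@id"] if isinstance(ref, dict) else ref
            # The visited set guards against circular requirements
            if dep_id not in visited:
                visited.add(dep_id)
                found.append(dep_id)
                stack.append(dep_id)
    return found
```

Dependencies that are referenced but not described in the graph are still reported, since only their identifier is needed.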