Workflow Run Crate
- Version: 0.3
- Permalink: https://w3id.org/ro/wfrun/workflow/0.3
- Authors: Workflow Run RO-Crate working group
This profile uses terminology from the RO-Crate 1.1 specification.
Overview
This profile is used to describe the execution of a computational tool that has orchestrated the execution of other tools. Such a tool is represented as a workflow that can be executed using a Workflow Management System (WMS), or workflow engine (e.g. cwltool).
Workflow Run Crate is a combination of Process Run Crate and Workflow RO-Crate. In particular, the RO-Crate MUST have a ComputationalWorkflow mainEntity described according to the Workflow RO-Crate specification (main workflow), and CreateAction instances corresponding to its execution (thus having the main workflow as instrument) MUST be described as specified in Process Run Crate and this profile. Details regarding the execution of individual workflow steps can be described with the Provenance Run Crate profile.
Workflows can have multiple input and output parameter slots that have to be mapped to actual files, directories or other values (e.g., a string or a number) before they can be executed. It is OPTIONAL to define such entities for a ComputationalWorkflow. If included, parameter definitions MUST be provided as FormalParameter entities and referenced from the ComputationalWorkflow via input and output (see the Bioschemas ComputationalWorkflow profile).
A data entity or PropertyValue that realizes a FormalParameter definition SHOULD refer to it via exampleOfWork; additionally, if the data entity or PropertyValue is an illustrative example of the parameter, the latter MAY refer back to the former using the reverse property workExample. This links the input of a ComputationalWorkflow to the object of a CreateAction, and the output of a ComputationalWorkflow to the result of a CreateAction. An object item that does not match a slot in the workflow’s input interface (e.g., a configuration file read from a predefined path) MUST NOT refer to a FormalParameter of the ComputationalWorkflow via exampleOfWork. A FormalParameter that maps to a PropertyValue SHOULD have a subclass of DataType (e.g., Integer), or PropertyValue in the case of dictionary-like structured types, as its additionalType. See CWL parameter mapping for an example. To support reproducibility, the name field of a FormalParameter instance SHOULD match the name of the corresponding workflow parameter slot.
Additional properties described in the Bioschemas FormalParameter profile (e.g., defaultValue) MAY be used to provide additional information, but strict conformance is not required. A FormalParameter definition that strictly conforms to the Bioschemas profile SHOULD reference the relevant versioned URL via conformsTo.
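The linkage between parameters and the entities that realize them can be checked mechanically. The following Python snippet is a minimal sketch and not part of the profile: the function name `unlinked_realizations` and the small in-memory graph are hypothetical, and the check covers only one case, namely a FormalParameter that names a workExample whose target does not point back via exampleOfWork.

```python
def as_list(value):
    """Normalize a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def unlinked_realizations(graph):
    """Return (parameter id, entity id) pairs where a FormalParameter lists a
    workExample but the target entity lacks the matching exampleOfWork."""
    by_id = {entity["@id"]: entity for entity in graph}
    problems = []
    for entity in graph:
        if "FormalParameter" not in as_list(entity.get("@type")):
            continue
        for ref in as_list(entity.get("workExample")):
            target = by_id.get(ref["@id"], {})
            back_refs = {r["@id"] for r in as_list(target.get("exampleOfWork"))}
            if entity["@id"] not in back_refs:
                problems.append((entity["@id"], ref["@id"]))
    return problems


# Hypothetical, abridged graph used only for illustration.
graph = [
    {"@id": "#simple_input", "@type": "FormalParameter",
     "workExample": {"@id": "inputs/abcdef.txt"}},
    {"@id": "#verbose-param", "@type": "FormalParameter",
     "workExample": {"@id": "#verbose-pv"}},
    {"@id": "inputs/abcdef.txt", "@type": "File",
     "exampleOfWork": {"@id": "#simple_input"}},
    # This PropertyValue omits exampleOfWork, so it is reported.
    {"@id": "#verbose-pv", "@type": "PropertyValue", "value": "True"},
]
problems = unlinked_realizations(graph)
print(problems)
```

Since exampleOfWork is a SHOULD rather than a MUST, a tool built along these lines would more reasonably emit warnings than reject the crate.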
The following diagram shows the relationships between provenance-related entities. Note the distinction between prospective provenance (plans for activities, e.g. a workflow) and retrospective provenance (what actually happened, e.g. the execution of a workflow).
Example Metadata File (ro-crate-metadata.json)
{ "@context": "https://w3id.org/ro/crate/1.1/context",
"@graph": [
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"about": {"@id": "./"},
"conformsTo": [
{"@id": "https://w3id.org/ro/crate/1.1"},
{"@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"}
]
},
{
"@id": "./",
"@type": "Dataset",
"conformsTo": [
{"@id": "https://w3id.org/ro/wfrun/process/0.1"},
{"@id": "https://w3id.org/ro/wfrun/workflow/0.3"},
{"@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"}
],
"hasPart": [
{"@id": "Galaxy-Workflow-Hello_World.ga"},
{"@id": "inputs/abcdef.txt"},
{"@id": "outputs/Select_first_on_data_1_2.txt"},
{"@id": "outputs/tac_on_data_360_1.txt"}
],
"license": {"@id": "http://spdx.org/licenses/CC0-1.0"},
"mainEntity": {"@id": "Galaxy-Workflow-Hello_World.ga"},
"mentions": {"@id": "#wfrun-5a5970ab-4375-444d-9a87-a764a66e3a47"}
},
{ "@id": "https://w3id.org/ro/wfrun/process/0.1",
"@type": "CreativeWork",
"name": "Process Run Crate",
"version": "0.1"
},
{ "@id": "https://w3id.org/ro/wfrun/workflow/0.3",
"@type": "CreativeWork",
"name": "Workflow Run Crate",
"version": "0.3"
},
{ "@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0",
"@type": "CreativeWork",
"name": "Workflow RO-Crate",
"version": "1.0"
},
{
"@id": "Galaxy-Workflow-Hello_World.ga",
"@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
"name": "Hello World (Galaxy Workflow)",
"author": {"@id": "https://orcid.org/0000-0001-9842-9718"},
"creator": {"@id": "https://orcid.org/0000-0001-9842-9718"},
"programmingLanguage": {"@id": "https://w3id.org/workflowhub/workflow-ro-crate#galaxy"},
"input": [
{"@id": "#simple_input"},
{"@id": "#verbose-param"}
],
"output": [
{"@id": "#reversed"},
{"@id": "#last_lines"}
]
},
{
"@id": "#simple_input",
"@type": "FormalParameter",
"additionalType": "File",
"conformsTo": {"@id": "https://bioschemas.org/profiles/FormalParameter/1.0-RELEASE"},
"description": "A simple set of lines in a text file",
"encodingFormat": [
"text/plain",
{"@id": "http://edamontology.org/format_2330"}
],
"workExample": {"@id": "inputs/abcdef.txt"},
"name": "simple_input",
"valueRequired": "True"
},
{
"@id": "#verbose-param",
"@type": "FormalParameter",
"additionalType": "Boolean",
"conformsTo": {"@id": "https://bioschemas.org/profiles/FormalParameter/1.0-RELEASE"},
"description": "Increase logging output",
"workExample": {"@id": "#verbose-pv"},
"name": "verbose",
"valueRequired": "False"
},
{
"@id": "#reversed",
"@type": "FormalParameter",
"additionalType": "File",
"conformsTo": {"@id": "https://bioschemas.org/profiles/FormalParameter/1.0-RELEASE"},
"description": "All the lines, reversed",
"encodingFormat": [
"text/plain",
{"@id": "http://edamontology.org/format_2330"}
],
"name": "reversed",
"workExample": {"@id": "outputs/tac_on_data_360_1.txt"}
},
{
"@id": "#last_lines",
"@type": "FormalParameter",
"additionalType": "File",
"conformsTo": {"@id": "https://bioschemas.org/profiles/FormalParameter/1.0-RELEASE"},
"description": "The last lines of workflow input are the first lines of the reversed input",
"encodingFormat": [
"text/plain",
{"@id": "http://edamontology.org/format_2330"}
],
"name": "last_lines",
"workExample": {"@id": "outputs/Select_first_on_data_1_2.txt"}
},
{
"@id": "https://orcid.org/0000-0001-9842-9718",
"@type": "Person",
"name": "Stian Soiland-Reyes"
},
{
"@id": "https://w3id.org/workflowhub/workflow-ro-crate#galaxy",
"@type": "ComputerLanguage",
"identifier": "https://galaxyproject.org/",
"name": "Galaxy",
"url": "https://galaxyproject.org/"
},
{
"@id": "#wfrun-5a5970ab-4375-444d-9a87-a764a66e3a47",
"@type": "CreateAction",
"name": "Galaxy workflow run 5a5970ab-4375-444d-9a87-a764a66e3a47",
"endTime": "2018-09-19T17:01:07+10:00",
"instrument": {"@id": "Galaxy-Workflow-Hello_World.ga"},
"subjectOf": {"@id": "https://usegalaxy.eu/u/5dbf7f05329e49c98b31243b5f35045c/p/invocation-report-a3a1d27edb703e5c"},
"object": [
{"@id": "inputs/abcdef.txt"},
{"@id": "#verbose-pv"}
],
"result": [
{"@id": "outputs/Select_first_on_data_1_2.txt"},
{"@id": "outputs/tac_on_data_360_1.txt"}
]
},
{
"@id": "inputs/abcdef.txt",
"@type": "File",
"description": "Example input, a simple text file",
"encodingFormat": "text/plain",
"exampleOfWork": {"@id": "#simple_input"}
},
{
"@id": "#verbose-pv",
"@type": "PropertyValue",
"exampleOfWork": {"@id": "#verbose-param"},
"name": "verbose",
"value": "True"
},
{
"@id": "outputs/Select_first_on_data_1_2.txt",
"@type": "File",
"name": "Select_first_on_data_1_2 (output)",
"description": "Example output of the last (aka first of reversed) lines",
"encodingFormat": "text/plain",
"exampleOfWork": {"@id": "#last_lines"}
},
{
"@id": "outputs/tac_on_data_360_1.txt",
"@type": "File",
"name": "tac_on_data_360_1 (output)",
"description": "Example output of the reversed lines",
"encodingFormat": "text/plain",
"exampleOfWork": {"@id": "#reversed"}
},
{
"@id": "https://usegalaxy.eu/u/5dbf7f05329e49c98b31243b5f35045c/p/invocation-report-a3a1d27edb703e5c",
"@type": "CreativeWork",
"encodingFormat": "text/html",
"datePublished": "2021-11-18T02:02:00Z",
"name": "Workflow Execution Summary of Hello World"
}
]
}
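A consumer of such a crate typically starts by locating the runs of the main workflow, that is, the CreateAction entities whose instrument is the crate's mainEntity. The following Python snippet is a minimal sketch of that lookup: the function name `main_workflow_runs` is hypothetical, the metadata is an abridged copy of the example above parsed from a string, and the code assumes the metadata descriptor has the identifier `ro-crate-metadata.json`, as in RO-Crate 1.1.

```python
import json


def as_list(value):
    """Normalize a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def main_workflow_runs(metadata):
    """Return the CreateAction entities whose instrument is the mainEntity."""
    graph = metadata["@graph"]
    by_id = {entity["@id"]: entity for entity in graph}
    # The metadata descriptor points at the root data entity via "about".
    root = by_id[by_id["ro-crate-metadata.json"]["about"]["@id"]]
    main_id = root["mainEntity"]["@id"]
    runs = []
    for entity in graph:
        if "CreateAction" not in as_list(entity.get("@type")):
            continue
        instruments = {ref["@id"] for ref in as_list(entity.get("instrument"))}
        if main_id in instruments:
            runs.append(entity)
    return runs


# Abridged copy of the example metadata, for illustration only.
metadata = json.loads("""
{
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset",
     "mainEntity": {"@id": "Galaxy-Workflow-Hello_World.ga"}},
    {"@id": "Galaxy-Workflow-Hello_World.ga",
     "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
    {"@id": "#wfrun-5a5970ab-4375-444d-9a87-a764a66e3a47",
     "@type": "CreateAction",
     "instrument": {"@id": "Galaxy-Workflow-Hello_World.ga"},
     "object": [{"@id": "inputs/abcdef.txt"}, {"@id": "#verbose-pv"}],
     "result": [{"@id": "outputs/Select_first_on_data_1_2.txt"},
                {"@id": "outputs/tac_on_data_360_1.txt"}]}
  ]
}
""")

runs = main_workflow_runs(metadata)
for run in runs:
    inputs = [ref["@id"] for ref in as_list(run.get("object"))]
    outputs = [ref["@id"] for ref in as_list(run.get("result"))]
    print(run["@id"], inputs, outputs)
```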
Adding engine-specific traces
Some engines are able to generate contextual information about workflow runs in the form of logs, reports, etc. These are not workflow outputs, but rather additional files automatically generated by the engine, either by default or when activated via a configuration parameter or command line flag. It is RECOMMENDED to add any such files to the RO-Crate; the corresponding entities SHOULD refer to the relevant Action instance via about:
{
"@id": "#action-1",
"@type": "CreateAction",
...
},
{
"@id": "trace-20230120-40360336.txt",
"@type": "File",
"name": "Nextflow trace for action-1",
"conformsTo": {"@id": "https://www.nextflow.io/docs/latest/tracing.html#trace-report"},
"encodingFormat": "text/tab-separated-values",
"about": {"@id": "#action-1"}
},
{
"@id": "https://www.nextflow.io/docs/latest/tracing.html#trace-report",
"@type": "CreativeWork",
"name": "Nextflow trace report CSV profile"
}
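Going the other way, a consumer can collect the engine-generated files attached to a given action by following about. The following Python snippet is a minimal sketch: the function name `entities_about` is hypothetical, and the graph is an abridged version of the fragment above. It accepts both `{"@id": ...}` references and plain string identifiers, since either form may be encountered in practice.

```python
def as_list(value):
    """Normalize a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def entities_about(graph, action_id):
    """Return the ids of entities whose `about` refers to the given action."""
    found = []
    for entity in graph:
        for ref in as_list(entity.get("about")):
            # Tolerate both {"@id": ...} objects and plain string identifiers.
            ref_id = ref["@id"] if isinstance(ref, dict) else ref
            if ref_id == action_id:
                found.append(entity["@id"])
    return found


# Abridged graph for illustration only.
graph = [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "about": {"@id": "./"}},
    {"@id": "#action-1", "@type": "CreateAction"},
    {"@id": "trace-20230120-40360336.txt", "@type": "File",
     "about": {"@id": "#action-1"}},
]
traces = entities_about(graph, "#action-1")
print(traces)
```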
Environment variables as formal parameters
The Process Run Crate profile specifies how to represent environment variable settings that affected the execution of a particular action via environment. A workflow, in turn, MAY indicate that it is affected by a certain environment variable by using the same environment property and having it point to a FormalParameter whose name is equal to the variable’s name. If an action corresponding to an execution of the workflow sets that variable, the PropertyValue SHOULD point to the FormalParameter via exampleOfWork:
{
"@id": "run_blast.cwl",
"@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
...
"environment": [
{"@id": "run_blast.cwl#batch_size"}
]
},
{
"@id": "run_blast.cwl#batch_size",
"@type": "FormalParameter",
"additionalType": "Integer",
"name": "BATCH_SIZE"
},
{
"@id": "#cb04c897-eb92-4c53-8a38-bcc1a16fd650",
"@type": "CreateAction",
"instrument": {"@id": "run_blast.cwl"},
...
"environment": [
{"@id": "#batch_size-pv"}
]
},
{
"@id": "#batch_size-pv",
"@type": "PropertyValue",
"exampleOfWork": {"@id": "run_blast.cwl#batch_size"},
"name": "BATCH_SIZE",
"value": "100"
}
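The relationship between a workflow's declared environment variables and the values set by a run can be verified with a short script. The following Python snippet is a minimal sketch: the function name `missing_env_links` and the action identifier `#run-1` are hypothetical, and the code assumes that each action has a single instrument. It reports PropertyValue entities that set a variable declared by the workflow without pointing to the corresponding FormalParameter.

```python
def as_list(value):
    """Normalize a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def missing_env_links(graph, action_id):
    """Return (PropertyValue id, FormalParameter id) pairs where the action
    sets a declared variable without linking to it via exampleOfWork."""
    by_id = {entity["@id"]: entity for entity in graph}
    action = by_id[action_id]
    workflow = by_id[action["instrument"]["@id"]]
    # Map variable name -> id of the FormalParameter declared by the workflow.
    declared = {}
    for ref in as_list(workflow.get("environment")):
        param = by_id[ref["@id"]]
        declared[param["name"]] = param["@id"]
    missing = []
    for ref in as_list(action.get("environment")):
        pv = by_id[ref["@id"]]
        param_id = declared.get(pv.get("name"))
        if param_id is None:
            continue  # variable not declared by the workflow
        links = {r["@id"] for r in as_list(pv.get("exampleOfWork"))}
        if param_id not in links:
            missing.append((pv["@id"], param_id))
    return missing


# Abridged graph for illustration only; "#run-1" is a made-up identifier.
graph = [
    {"@id": "run_blast.cwl", "@type": "ComputationalWorkflow",
     "environment": [{"@id": "run_blast.cwl#batch_size"}]},
    {"@id": "run_blast.cwl#batch_size", "@type": "FormalParameter",
     "additionalType": "Integer", "name": "BATCH_SIZE"},
    {"@id": "#run-1", "@type": "CreateAction",
     "instrument": {"@id": "run_blast.cwl"},
     "environment": [{"@id": "#batch_size-pv"}]},
    {"@id": "#batch_size-pv", "@type": "PropertyValue",
     "exampleOfWork": {"@id": "run_blast.cwl#batch_size"},
     "name": "BATCH_SIZE", "value": "100"},
]
linked = missing_env_links(graph, "#run-1")

# Same graph with the exampleOfWork link stripped from every entity.
stripped = [{k: v for k, v in e.items() if k != "exampleOfWork"} for e in graph]
unlinked = missing_env_links(stripped, "#run-1")
print(linked, unlinked)
```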
Requirements
This profile inherits the requirements of Process Run Crate and Workflow RO-Crate. Additional specifications are listed below.
Property | Required? | Description |
---|---|---|
Dataset (the root data entity, e.g. "@id": "./") | | |
conformsTo | MUST | Array MUST reference a CreativeWork entity with an @id URI that is consistent with the versioned Permalink of this document, and SHOULD also reference versioned permalinks for Process Run Crate and Workflow RO-Crate. |
PropertyValue or data entity that realizes a FormalParameter | | |
exampleOfWork | SHOULD | Identifier of the FormalParameter instance realized by this entity. |
FormalParameter | | |
name | SHOULD | SHOULD match the name of the corresponding workflow parameter slot, e.g. n_lines |
description | MAY | A description of the parameter's purpose, e.g. Number of lines |
workExample | MAY | Identifier of the data entity or PropertyValue instance that realizes this parameter. The data entity or PropertyValue instance SHOULD refer to this parameter via exampleOfWork. |
additionalType | MUST | SHOULD include: File, Dataset or Collection if it maps to a file, directory or multi-file dataset, respectively; PropertyValue if it maps to a dictionary-like structured value (e.g. a CWL record); DataType or one of its subtypes (e.g. Integer) if it maps to a non-structured value. |
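The conformsTo requirement on the root data entity lends itself to an automated check. The following Python snippet is a minimal sketch: the function name `root_conforms` is hypothetical, and it interprets "consistent with the versioned Permalink" as exact string equality with the permalink of this document, which is an assumption rather than something the profile states. It also assumes the metadata descriptor has the identifier `ro-crate-metadata.json`.

```python
WORKFLOW_RUN_PERMALINK = "https://w3id.org/ro/wfrun/workflow/0.3"


def as_list(value):
    """Normalize a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def root_conforms(metadata, profile_id=WORKFLOW_RUN_PERMALINK):
    """Return True if the root data entity's conformsTo references a
    CreativeWork entity whose @id equals the given profile permalink."""
    by_id = {entity["@id"]: entity for entity in metadata["@graph"]}
    root = by_id[by_id["ro-crate-metadata.json"]["about"]["@id"]]
    for ref in as_list(root.get("conformsTo")):
        target = by_id.get(ref["@id"])
        if target is None or ref["@id"] != profile_id:
            continue
        if "CreativeWork" in as_list(target.get("@type")):
            return True
    return False


def make_metadata(profile_id):
    """Build a tiny crate (illustration only) declaring the given profile."""
    return {"@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset",
         "conformsTo": [{"@id": profile_id}]},
        {"@id": profile_id, "@type": "CreativeWork",
         "name": "Workflow Run Crate"},
    ]}


current = root_conforms(make_metadata(WORKFLOW_RUN_PERMALINK))
older = root_conforms(make_metadata("https://w3id.org/ro/wfrun/workflow/0.1"))
print(current, older)
```

A crate that declares a different version of the profile is not necessarily invalid; it simply does not claim conformance to this version, so a stricter or looser comparison may be appropriate depending on the use case.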