
Workflow Run RO-Crate

RO-Crate profiles to capture the provenance of workflow runs

Provenance Run Crate

This profile uses terminology from the RO-Crate 1.1 specification, and extends it with additional terms from the workflow-run ro-terms namespace.

Overview

This profile extends Workflow Run Crate with specifications to describe internal details of the workflow run, such as step executions and intermediate outputs.

A Provenance Run Crate MUST record the details of tool executions orchestrated by the workflow through additional CreateAction entities, each of which MUST refer to an entity representing the tool itself via instrument as specified in Process Run Crate. Entities representing the tools MAY reference formal parameter definitions via input and output (and environment, in the case of environment variables) as specified in Workflow Run Crate. The workflow MUST refer to the orchestrated tools via hasPart (the usage of hasPart for this purpose follows the Bioschemas ComputationalWorkflow profile).
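As an illustration only, the Python sketch below shows how a consumer might use these links to find the tool runs orchestrated by a workflow. It treats any CreateAction whose instrument appears in the workflow's hasPart as a tool run. The helper names (as_list, ids, orchestrated_tool_runs) are hypothetical. The graph is a trimmed copy of the revsort example later in this section, with shortened placeholder action ids.

```python
# Minimal sketch: find the tool runs orchestrated by a workflow.
# Helper names are hypothetical; they are not part of any RO-Crate library.


def as_list(value):
    """Normalize a JSON-LD property value (single item or array) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ids(value):
    """Return the @id strings of the references in a property value."""
    return [ref["@id"] for ref in as_list(value)]


def orchestrated_tool_runs(graph, workflow_id):
    """Return ids of CreateActions whose instrument is a tool listed in the workflow's hasPart."""
    by_id = {entity["@id"]: entity for entity in graph}
    tools = set(ids(by_id[workflow_id].get("hasPart")))
    return [
        entity["@id"]
        for entity in graph
        if "CreateAction" in as_list(entity.get("@type"))
        and set(ids(entity.get("instrument"))) & tools
    ]


# Trimmed graph modelled on the revsort example; action ids are shortened placeholders.
graph = [
    {
        "@id": "packed.cwl",
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow", "HowTo"],
        "hasPart": [{"@id": "packed.cwl#revtool.cwl"}, {"@id": "packed.cwl#sorttool.cwl"}],
    },
    # The workflow run itself uses packed.cwl as instrument, so it is not a tool run.
    {"@id": "#wf-run", "@type": "CreateAction", "instrument": {"@id": "packed.cwl"}},
    {"@id": "#rev-run", "@type": "CreateAction", "instrument": {"@id": "packed.cwl#revtool.cwl"}},
    {"@id": "#sort-run", "@type": "CreateAction", "instrument": {"@id": "packed.cwl#sorttool.cwl"}},
]

print(orchestrated_tool_runs(graph, "packed.cwl"))  # ['#rev-run', '#sort-run']
```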

The crate SHOULD also record step executions via ControlAction instances, each of which MUST reference: a HowToStep instance representing the step via instrument; the CreateAction representing the corresponding tool run via object. The workflow MUST reference any HowToStep instances that represent its steps via step. Each HowToStep instance MUST reference the entity that represents its corresponding tool via workExample, and MAY indicate its position in the execution order via position. In addition to File, SoftwareSourceCode and ComputationalWorkflow, a workflow that points to step metadata via step MUST have a type of HowTo.
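The step-execution rules above can be checked mechanically. The following Python sketch is a hypothetical, non-normative checker that covers a single workflow level and does not recurse into subworkflows. It verifies that a workflow using step is typed HowTo, that each ControlAction's instrument is a HowToStep, and that every CreateAction in its object runs the tool that the step points to via workExample. The function names and the shortened ids are assumptions made for illustration.

```python
# Minimal sketch of a checker for the step-execution rules; function names are hypothetical.


def as_list(value):
    """Normalize a JSON-LD property value (single item or array) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ids(value):
    """Return the @id strings of the references in a property value."""
    return [ref["@id"] for ref in as_list(value)]


def check_step_executions(graph, workflow_id):
    """Return human-readable violations of the ControlAction/HowToStep rules (empty if none)."""
    by_id = {entity["@id"]: entity for entity in graph}
    workflow = by_id[workflow_id]
    problems = []
    # A workflow that lists steps via "step" must also be typed HowTo.
    if ids(workflow.get("step")) and "HowTo" not in as_list(workflow.get("@type")):
        problems.append(f"{workflow_id}: uses 'step' but is not typed HowTo")
    for action in graph:
        if "ControlAction" not in as_list(action.get("@type")):
            continue
        step_ids = ids(action.get("instrument"))
        step = by_id.get(step_ids[0], {}) if len(step_ids) == 1 else {}
        if "HowToStep" not in as_list(step.get("@type")):
            problems.append(f"{action['@id']}: instrument is not a single HowToStep")
            continue
        # Each tool run in "object" must use the tool the step references via workExample.
        tool_ids = ids(step.get("workExample"))
        for run_id in ids(action.get("object")):
            run = by_id.get(run_id, {})
            if "CreateAction" not in as_list(run.get("@type")):
                problems.append(f"{action['@id']}: object {run_id} is not a CreateAction")
            elif ids(run.get("instrument")) != tool_ids:
                problems.append(f"{action['@id']}: {run_id} does not run the step's tool")
    return problems


graph = [
    {"@id": "packed.cwl", "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow", "HowTo"],
     "step": [{"@id": "packed.cwl#main/rev"}]},
    {"@id": "packed.cwl#main/rev", "@type": "HowToStep", "workExample": {"@id": "packed.cwl#revtool.cwl"}},
    {"@id": "#rev-run", "@type": "CreateAction", "instrument": {"@id": "packed.cwl#revtool.cwl"}},
    {"@id": "#sort-run", "@type": "CreateAction", "instrument": {"@id": "packed.cwl#sorttool.cwl"}},
    {"@id": "#ctl-rev", "@type": "ControlAction",
     "instrument": {"@id": "packed.cwl#main/rev"}, "object": {"@id": "#rev-run"}},
]
print(check_step_executions(graph, "packed.cwl"))  # []

# A ControlAction whose tool run does not match the step's workExample is flagged.
bad = graph + [{"@id": "#ctl-bad", "@type": "ControlAction",
                "instrument": {"@id": "packed.cwl#main/rev"}, "object": {"@id": "#sort-run"}}]
print(check_step_executions(bad, "packed.cwl"))
```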

The crate MAY also include an OrganizeAction representing the execution of the workflow engine (e.g. cwltool), which MUST point to: an entity representing the workflow engine (e.g. a SoftwareApplication) via instrument; the CreateAction that represents the workflow run via result; the ControlAction instances representing the step executions via object.
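A consumer could follow the OrganizeAction links roughly as in the Python sketch below, which is a hypothetical helper and not a reference implementation. It expects exactly one instrument and one result, and it keeps only the ControlAction entries from object, because the section on engine configuration files allows other entities (such as a configuration file) in that list too. The ids are shortened placeholders. The engine name is taken from the example.

```python
# Minimal sketch: summarize an OrganizeAction (engine run); function names are hypothetical.


def as_list(value):
    """Normalize a JSON-LD property value (single item or array) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ids(value):
    """Return the @id strings of the references in a property value."""
    return [ref["@id"] for ref in as_list(value)]


def summarize_engine_run(graph, organize_id):
    """Check the OrganizeAction's links and return a short summary dict."""
    by_id = {entity["@id"]: entity for entity in graph}

    def types(entity_id):
        return as_list(by_id.get(entity_id, {}).get("@type"))

    if "OrganizeAction" not in types(organize_id):
        raise ValueError(f"{organize_id} is not an OrganizeAction")
    action = by_id[organize_id]
    # Exactly one engine (instrument) and one workflow run (result) are expected;
    # tuple unpacking raises ValueError otherwise.
    (engine_id,) = ids(action.get("instrument"))
    (run_id,) = ids(action.get("result"))
    if "CreateAction" not in types(run_id):
        raise ValueError(f"result {run_id} is not a CreateAction")
    # Keep only the ControlActions; "object" may also list other entities.
    steps = [i for i in ids(action.get("object")) if "ControlAction" in types(i)]
    return {
        "engine": by_id.get(engine_id, {}).get("name"),
        "workflow_run": run_id,
        "step_executions": steps,
    }


graph = [
    {"@id": "#engine", "@type": "SoftwareApplication", "name": "cwltool 1.0.20181012180214"},
    {"@id": "#engine-run", "@type": "OrganizeAction", "instrument": {"@id": "#engine"},
     "object": [{"@id": "#ctl-rev"}, {"@id": "#ctl-sort"}], "result": {"@id": "#wf-run"}},
    {"@id": "#wf-run", "@type": "CreateAction", "instrument": {"@id": "packed.cwl"}},
    {"@id": "#ctl-rev", "@type": "ControlAction"},
    {"@id": "#ctl-sort", "@type": "ControlAction"},
]
print(summarize_engine_run(graph, "#engine-run"))
```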

The tool that implements a step can in turn be a workflow (nested workflow or subworkflow): in this case, it MUST be represented as a ComputationalWorkflow, and all of the above directions apply to it recursively. If the subworkflow is described in a section of the main workflow (e.g. as in packed CWL workflows), rather than in a file of its own, it SHOULD be added to the crate as a contextual entity: in this case, its type list MUST NOT include File.
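Because the rules apply recursively to subworkflows, a consumer can walk the hasPart tree depth-first, as in the hypothetical Python sketch below. The second function encodes one possible heuristic, which is an assumption of this sketch and not part of the profile: an id containing a "#" fragment indicates a subworkflow described inside another file (e.g. packed CWL), and such an entity must not be typed File. The example ids (main.cwl#sub and similar) are invented for illustration.

```python
# Minimal sketch: walk nested workflows recursively via hasPart; names and ids are hypothetical.


def as_list(value):
    """Normalize a JSON-LD property value (single item or array) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ids(value):
    """Return the @id strings of the references in a property value."""
    return [ref["@id"] for ref in as_list(value)]


def tool_tree(graph, workflow_id):
    """Return (depth, tool_id) pairs for every tool reachable from the workflow, depth-first."""
    by_id = {entity["@id"]: entity for entity in graph}
    found = []

    def visit(wf_id, depth, ancestors):
        for tool_id in ids(by_id[wf_id].get("hasPart")):
            found.append((depth, tool_id))
            tool_types = as_list(by_id.get(tool_id, {}).get("@type"))
            # Subworkflows are ComputationalWorkflows: apply the same traversal recursively.
            # "ancestors" guards against cycles in malformed crates.
            if "ComputationalWorkflow" in tool_types and tool_id not in ancestors:
                visit(tool_id, depth + 1, ancestors | {tool_id})

    visit(workflow_id, 0, {workflow_id})
    return found


def packed_subworkflows_typed_as_file(graph):
    """Heuristic: flag workflows identified by a '#' fragment that are (wrongly) typed File."""
    return [
        entity["@id"]
        for entity in graph
        if "#" in entity["@id"]
        and {"ComputationalWorkflow", "File"} <= set(as_list(entity.get("@type")))
    ]


graph = [
    {"@id": "main.cwl", "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
     "hasPart": [{"@id": "main.cwl#sub"}, {"@id": "main.cwl#tool_a"}]},
    {"@id": "main.cwl#sub", "@type": ["SoftwareSourceCode", "ComputationalWorkflow"],
     "hasPart": [{"@id": "main.cwl#tool_b"}]},
    {"@id": "main.cwl#tool_a", "@type": "SoftwareApplication"},
    {"@id": "main.cwl#tool_b", "@type": "SoftwareApplication"},
]

for depth, tool_id in tool_tree(graph, "main.cwl"):
    print("  " * depth + tool_id)
print(packed_subworkflows_typed_as_file(graph))  # []
```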

The following diagram shows the relationships between all provenance-related entities. Note the distinction between prospective provenance (plans for activities, e.g. a workflow) and retrospective provenance (what actually happened, e.g. the execution of a workflow).

Entity-relationship diagram

Example Metadata File (ro-crate-metadata.json)

{ "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "about": {"@id": "./"},
        "conformsTo": [
            {"@id": "https://w3id.org/ro/crate/1.1"},
            {"@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"}
        ]
    },
    {
        "@id": "./",
        "@type": "Dataset",
        "conformsTo": [
            {"@id": "https://w3id.org/ro/wfrun/process/0.1"},
            {"@id": "https://w3id.org/ro/wfrun/workflow/0.1"},
            {"@id": "https://w3id.org/ro/wfrun/provenance/0.1"},
            {"@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"}
        ],
        "hasPart": [
            {"@id": "packed.cwl"},
            {"@id": "327fc7aedf4f6b69a42a7c8b808dc5a7aff61376"},
            {"@id": "b9214658cc453331b62c2282b772a5c063dbd284"},
            {"@id": "97fe1b50b4582cebc7d853796ebd62e3e163aa3f"}
        ],
        "mainEntity": {"@id": "packed.cwl"},
        "mentions": [
            {"@id": "#4154dad3-00cc-4e35-bb8f-a2de5cd7dc49"}
        ]
    },
    {   "@id": "https://w3id.org/ro/wfrun/process/0.1",
        "@type": "CreativeWork",
        "name": "Process Run Crate",
        "version": "0.1"
    },
    {   "@id": "https://w3id.org/ro/wfrun/workflow/0.1",
        "@type": "CreativeWork",
        "name": "Workflow Run Crate",
        "version": "0.1"
    },
    {   "@id": "https://w3id.org/ro/wfrun/provenance/0.1",
        "@type": "CreativeWork",
        "name": "Provenance Run Crate",
        "version": "0.1"
    },
    {   "@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0",
        "@type": "CreativeWork",
        "name": "Workflow RO-Crate",
        "version": "1.0"
    },
    {
        "@id": "packed.cwl",
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow", "HowTo"],
        "hasPart": [
            {"@id": "packed.cwl#revtool.cwl"},
            {"@id": "packed.cwl#sorttool.cwl"}
        ],
        "input": [
            {"@id": "packed.cwl#main/input"},
            {"@id": "packed.cwl#main/reverse_sort"}
        ],
        "name": "packed.cwl",
        "output": [
            {"@id": "packed.cwl#main/output"}
        ],
        "programmingLanguage": {"@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl"},
        "step": [
            {"@id": "packed.cwl#main/rev"},
            {"@id": "packed.cwl#main/sorted"}
        ]
    },
    {
        "@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl",
        "@type": "ComputerLanguage",
        "alternateName": "CWL",
        "identifier": {"@id": "https://w3id.org/cwl/v1.0/"},
        "name": "Common Workflow Language",
        "url": {"@id": "https://www.commonwl.org/"},
        "version": "v1.0"
    },
    {
        "@id": "packed.cwl#main/input",
        "@type": "FormalParameter",
        "additionalType": "File",
        "defaultValue": "file:///home/stain/src/cwltool/tests/wf/hello.txt",
        "encodingFormat": "https://www.iana.org/assignments/media-types/text/plain",
        "name": "main/input"
    },
    {
        "@id": "packed.cwl#main/reverse_sort",
        "@type": "FormalParameter",
        "additionalType": "Boolean",
        "defaultValue": "True",
        "name": "main/reverse_sort"
    },
    {
        "@id": "packed.cwl#main/output",
        "@type": "FormalParameter",
        "additionalType": "File",
        "name": "main/output"
    },
    {
        "@id": "packed.cwl#main/rev",
        "@type": "HowToStep",
        "position": "0",
        "workExample": {"@id": "packed.cwl#revtool.cwl"}
    },
    {
        "@id": "packed.cwl#revtool.cwl",
        "@type": "SoftwareApplication",
        "description": "Reverse each line using the `rev` command",
        "input": [
            {"@id": "packed.cwl#revtool.cwl/input"}
        ],
        "name": "revtool.cwl",
        "output": [
            {"@id": "packed.cwl#revtool.cwl/output"}
        ]
    },
    {
        "@id": "packed.cwl#revtool.cwl/input",
        "@type": "FormalParameter",
        "additionalType": "File",
        "name": "revtool.cwl/input"
    },
    {
        "@id": "packed.cwl#revtool.cwl/output",
        "@type": "FormalParameter",
        "additionalType": "File",
        "name": "revtool.cwl/output"
    },
    {
        "@id": "packed.cwl#main/sorted",
        "@type": "HowToStep",
        "position": "1",
        "workExample": {"@id": "packed.cwl#sorttool.cwl"}
    },
    {
        "@id": "packed.cwl#sorttool.cwl",
        "@type": "SoftwareApplication",
        "description": "Sort lines using the `sort` command",
        "input": [
            {"@id": "packed.cwl#sorttool.cwl/reverse"},
            {"@id": "packed.cwl#sorttool.cwl/input"}
        ],
        "name": "sorttool.cwl",
        "output": [
            {"@id": "packed.cwl#sorttool.cwl/output"}
        ]
    },
    {
        "@id": "packed.cwl#sorttool.cwl/reverse",
        "@type": "FormalParameter",
        "additionalType": "Boolean",
        "name": "sorttool.cwl/reverse"
    },
    {
        "@id": "packed.cwl#sorttool.cwl/input",
        "@type": "FormalParameter",
        "additionalType": "File",
        "name": "sorttool.cwl/input"
    },
    {
        "@id": "packed.cwl#sorttool.cwl/output",
        "@type": "FormalParameter",
        "additionalType": "File",
        "name": "sorttool.cwl/output"
    },
    {
        "@id": "#a73fd902-8d14-48c9-835b-a5ba2f9149fd",
        "@type": "SoftwareApplication",
        "name": "cwltool 1.0.20181012180214"
    },
    {
        "@id": "#d6ab3175-88f5-4b6a-b028-1b13e6d1a158",
        "@type": "OrganizeAction",
        "agent": {"@id": "https://orcid.org/0000-0001-9842-9718"},
        "instrument": {"@id": "#a73fd902-8d14-48c9-835b-a5ba2f9149fd"},
        "name": "Run of cwltool 1.0.20181012180214",
        "object": [
            {"@id": "#4f7f887f-1b9b-4417-9beb-58618a125cc5"},
            {"@id": "#793b3df4-cbb7-4d17-94d4-0edb18566ed3"}
        ],
        "result": {"@id": "#4154dad3-00cc-4e35-bb8f-a2de5cd7dc49"},
        "startTime": "2018-10-25T15:46:35.210973"
    },
    {
        "@id": "https://orcid.org/0000-0001-9842-9718",
        "@type": "Person",
        "name": "Stian Soiland-Reyes"
    },
    {
        "@id": "#4154dad3-00cc-4e35-bb8f-a2de5cd7dc49",
        "@type": "CreateAction",
        "endTime": "2018-10-25T15:46:43.020168",
        "instrument": {"@id": "packed.cwl"},
        "name": "Run of workflow/packed.cwl#main",
        "object": [
            {"@id": "327fc7aedf4f6b69a42a7c8b808dc5a7aff61376"},
            {"@id": "#pv-main/reverse_sort"}
        ],
        "result": [
            {"@id": "b9214658cc453331b62c2282b772a5c063dbd284"}
        ],
        "startTime": "2018-10-25T15:46:35.211153"
    },
    {
        "@id": "327fc7aedf4f6b69a42a7c8b808dc5a7aff61376",
        "@type": "File",
        "exampleOfWork": [
            {"@id": "packed.cwl#main/input"},
            {"@id": "packed.cwl#revtool.cwl/input"}
        ]
    },
    {
        "@id": "#pv-main/reverse_sort",
        "@type": "PropertyValue",
        "exampleOfWork": {"@id": "packed.cwl#main/reverse_sort"},
        "name": "main/reverse_sort",
        "value": "True"
    },
    {
        "@id": "b9214658cc453331b62c2282b772a5c063dbd284",
        "@type": "File",
        "exampleOfWork": [
            {"@id": "packed.cwl#main/output"},
            {"@id": "packed.cwl#sorttool.cwl/output"}
        ]
    },
    {
        "@id": "#6933cce1-f8f0-4032-8848-e0fc9166e92f",
        "@type": "CreateAction",
        "endTime": "2018-10-25T15:46:36.967359",
        "instrument": {"@id": "packed.cwl#revtool.cwl"},
        "name": "Run of workflow/packed.cwl#main/rev",
        "object": [
            {"@id": "327fc7aedf4f6b69a42a7c8b808dc5a7aff61376"}
        ],
        "result": [
            {"@id": "97fe1b50b4582cebc7d853796ebd62e3e163aa3f"}
        ],
        "startTime": "2018-10-25T15:46:35.314101"
    },
    {
        "@id": "#4f7f887f-1b9b-4417-9beb-58618a125cc5",
        "@type": "ControlAction",
        "instrument": {"@id": "packed.cwl#main/rev"},
        "name": "orchestrate revtool.cwl",
        "object": {"@id": "#6933cce1-f8f0-4032-8848-e0fc9166e92f"}
    },
    {
        "@id": "97fe1b50b4582cebc7d853796ebd62e3e163aa3f",
        "@type": "File",
        "exampleOfWork": [
            {"@id": "packed.cwl#revtool.cwl/output"},
            {"@id": "packed.cwl#sorttool.cwl/input"}
        ]
    },
    {
        "@id": "#9eac64b2-c2c8-401f-9af8-7cfb0e998107",
        "@type": "CreateAction",
        "endTime": "2018-10-25T15:46:38.069110",
        "instrument": {"@id": "packed.cwl#sorttool.cwl"},
        "name": "Run of workflow/packed.cwl#main/sorted",
        "object": [
            {"@id": "97fe1b50b4582cebc7d853796ebd62e3e163aa3f"},
            {"@id": "#pv-main/sorted/reverse"}
        ],
        "result": [
            {"@id": "b9214658cc453331b62c2282b772a5c063dbd284"}
        ],
        "startTime": "2018-10-25T15:46:36.975235"
    },
    {
        "@id": "#793b3df4-cbb7-4d17-94d4-0edb18566ed3",
        "@type": "ControlAction",
        "instrument": {"@id": "packed.cwl#main/sorted"},
        "name": "orchestrate sorttool.cwl",
        "object": {"@id": "#9eac64b2-c2c8-401f-9af8-7cfb0e998107"}
    },
    {
        "@id": "#pv-main/sorted/reverse",
        "@type": "PropertyValue",
        "exampleOfWork": {"@id": "packed.cwl#sorttool.cwl/reverse"},
        "name": "main/sorted/reverse",
        "value": "True"
    }
  ]
}
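To show how the example's entities fit together, the Python sketch below rebuilds a step-level execution trace: ControlAction → HowToStep (position) → CreateAction (start and end times). The graph is a trimmed copy of the example above with the same ids and timestamps. The step_trace function is hypothetical. Since the example serializes position as a string, the sketch converts it with int().

```python
# Minimal sketch: list step executions from the example, ordered by step position.
from datetime import datetime


def as_list(value):
    """Normalize a JSON-LD property value (single item or array) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ids(value):
    """Return the @id strings of the references in a property value."""
    return [ref["@id"] for ref in as_list(value)]


def step_trace(graph):
    """Return (position, step_id, tool_id, seconds) for each step execution, sorted by position."""
    by_id = {entity["@id"]: entity for entity in graph}
    rows = []
    for action in graph:
        if "ControlAction" not in as_list(action.get("@type")):
            continue
        (step_id,) = ids(action.get("instrument"))
        step = by_id[step_id]
        for run_id in ids(action.get("object")):
            run = by_id[run_id]
            duration = datetime.fromisoformat(run["endTime"]) - datetime.fromisoformat(run["startTime"])
            # position may be serialized as a string (as in the example), so convert it.
            rows.append((int(step["position"]), step_id, ids(run["instrument"])[0],
                         duration.total_seconds()))
    return sorted(rows)


graph = [
    {"@id": "packed.cwl#main/rev", "@type": "HowToStep", "position": "0",
     "workExample": {"@id": "packed.cwl#revtool.cwl"}},
    {"@id": "packed.cwl#main/sorted", "@type": "HowToStep", "position": "1",
     "workExample": {"@id": "packed.cwl#sorttool.cwl"}},
    {"@id": "#6933cce1-f8f0-4032-8848-e0fc9166e92f", "@type": "CreateAction",
     "instrument": {"@id": "packed.cwl#revtool.cwl"},
     "startTime": "2018-10-25T15:46:35.314101", "endTime": "2018-10-25T15:46:36.967359"},
    {"@id": "#9eac64b2-c2c8-401f-9af8-7cfb0e998107", "@type": "CreateAction",
     "instrument": {"@id": "packed.cwl#sorttool.cwl"},
     "startTime": "2018-10-25T15:46:36.975235", "endTime": "2018-10-25T15:46:38.069110"},
    # Listed "sorted" first on purpose: the trace is ordered by position, not graph order.
    {"@id": "#793b3df4-cbb7-4d17-94d4-0edb18566ed3", "@type": "ControlAction",
     "instrument": {"@id": "packed.cwl#main/sorted"},
     "object": {"@id": "#9eac64b2-c2c8-401f-9af8-7cfb0e998107"}},
    {"@id": "#4f7f887f-1b9b-4417-9beb-58618a125cc5", "@type": "ControlAction",
     "instrument": {"@id": "packed.cwl#main/rev"},
     "object": {"@id": "#6933cce1-f8f0-4032-8848-e0fc9166e92f"}},
]

for position, step_id, tool_id, seconds in step_trace(graph):
    print(f"{position} {step_id} -> {tool_id} ({seconds:.3f} s)")
```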

Representing parameter connections

In most workflows, the outputs of one or more steps are needed as input for subsequent steps: this creates a connection between the corresponding parameters of the tools that implement those steps. For instance, consider the “revsort” workflow represented in the above example:

revsort workflow diagram

In this workflow, the output of the rev step is used as input by the sorted step, creating a connection between the output parameter of revtool.cwl and the input parameter of sorttool.cwl. A connection can also occur between tool parameters and workflow parameters: looking again at the above example, the reverse_sort workflow parameter is connected to the reverse parameter of sorttool.cwl.

A Provenance Run Crate MAY describe parameter connections using the ParameterConnection type from the workflow-run ro-terms namespace. References to ParameterConnection instances, made via the connection property, SHOULD follow the CWL convention: connections to workflow output parameters are referenced by the workflow, while all other connections are referenced by the receiving step:

{
    "@id": "packed.cwl",
    "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow", "HowTo"],
    "connection": [
        {"@id": "#150ffba3-9dc2-4b14-8a6b-3f826f70e41b"}
    ],
    ...
},
{
    "@id": "#150ffba3-9dc2-4b14-8a6b-3f826f70e41b",
    "@type": "ParameterConnection",
    "sourceParameter": {"@id": "packed.cwl#sorttool.cwl/output"},
    "targetParameter": {"@id": "packed.cwl#main/output"}
},
{
    "@id": "packed.cwl#main/sorted",
    "@type": "HowToStep",
    "connection": [
        {"@id": "#548ab27a-3abf-4035-b3dd-f2989762d5c0"},
        {"@id": "#ed883346-fb32-43dd-b965-18aa5cac9350"}
    ],
    "workExample": {"@id": "packed.cwl#sorttool.cwl"}
},
{
    "@id": "#548ab27a-3abf-4035-b3dd-f2989762d5c0",
    "@type": "ParameterConnection",
    "sourceParameter": {"@id": "packed.cwl#revtool.cwl/output"},
    "targetParameter": {"@id": "packed.cwl#sorttool.cwl/input"}
},
{
    "@id": "#ed883346-fb32-43dd-b965-18aa5cac9350",
    "@type": "ParameterConnection",
    "sourceParameter": {"@id": "packed.cwl#main/reverse_sort"},
    "targetParameter": {"@id": "packed.cwl#sorttool.cwl/reverse"}
}
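The following Python sketch, an illustration under stated assumptions, collects the connections above as (source, target, holder) edges and tests whether they follow the CWL referencing convention. It mirrors the fragment with shortened connection ids (#c1 to #c3) and adds the workflow's output from the main example. The function names are hypothetical.

```python
# Minimal sketch: collect ParameterConnection edges and check the CWL referencing convention.


def as_list(value):
    """Normalize a JSON-LD property value (single item or array) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ids(value):
    """Return the @id strings of the references in a property value."""
    return [ref["@id"] for ref in as_list(value)]


def connection_edges(graph, workflow_id):
    """Return (source, target, holder) triples for connections held by the workflow or its steps."""
    by_id = {entity["@id"]: entity for entity in graph}
    edges = []
    for holder in [workflow_id] + ids(by_id[workflow_id].get("step")):
        for conn_id in ids(by_id[holder].get("connection")):
            conn = by_id[conn_id]
            (source,) = ids(conn["sourceParameter"])
            (target,) = ids(conn["targetParameter"])
            edges.append((source, target, holder))
    return edges


def follows_cwl_convention(graph, workflow_id):
    """True if connections into workflow outputs are held by the workflow and all others by steps."""
    by_id = {entity["@id"]: entity for entity in graph}
    outputs = set(ids(by_id[workflow_id].get("output")))
    return all(
        (holder == workflow_id) == (target in outputs)
        for _, target, holder in connection_edges(graph, workflow_id)
    )


graph = [
    {"@id": "packed.cwl", "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow", "HowTo"],
     "output": [{"@id": "packed.cwl#main/output"}],
     "step": [{"@id": "packed.cwl#main/rev"}, {"@id": "packed.cwl#main/sorted"}],
     "connection": [{"@id": "#c1"}]},
    {"@id": "#c1", "@type": "ParameterConnection",
     "sourceParameter": {"@id": "packed.cwl#sorttool.cwl/output"},
     "targetParameter": {"@id": "packed.cwl#main/output"}},
    {"@id": "packed.cwl#main/rev", "@type": "HowToStep"},
    {"@id": "packed.cwl#main/sorted", "@type": "HowToStep",
     "connection": [{"@id": "#c2"}, {"@id": "#c3"}]},
    {"@id": "#c2", "@type": "ParameterConnection",
     "sourceParameter": {"@id": "packed.cwl#revtool.cwl/output"},
     "targetParameter": {"@id": "packed.cwl#sorttool.cwl/input"}},
    {"@id": "#c3", "@type": "ParameterConnection",
     "sourceParameter": {"@id": "packed.cwl#main/reverse_sort"},
     "targetParameter": {"@id": "packed.cwl#sorttool.cwl/reverse"}},
]

for source, target, holder in connection_edges(graph, "packed.cwl"):
    print(f"{source} -> {target}  (referenced by {holder})")
print(follows_cwl_convention(graph, "packed.cwl"))
```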

Note that the workflow-run terms are not part of the standard RO-Crate context, so they have to be added to the crate’s @context to be used:

{
    "@context": [
        "https://w3id.org/ro/crate/1.1/context",
        "https://w3id.org/ro/terms/workflow-run"
    ],
    "@graph": [...]
}
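A crate generator could add the workflow-run context programmatically, as in the hypothetical Python helper below. It normalizes a single-string @context to an array, appends the workflow-run terms only if they are missing (so repeated calls are harmless), and works on a deep copy. Falling back to the RO-Crate 1.1 context when @context is absent is an assumption of this sketch.

```python
# Minimal sketch: ensure a crate's @context includes the workflow-run terms.
import copy

RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
WORKFLOW_RUN_CONTEXT = "https://w3id.org/ro/terms/workflow-run"


def with_workflow_run_context(crate):
    """Return a copy of the crate whose @context lists the workflow-run terms (idempotent)."""
    result = copy.deepcopy(crate)
    # Assumption: a crate without @context is treated as using the RO-Crate 1.1 context.
    context = result.get("@context", RO_CRATE_CONTEXT)
    # @context may be a single string or an array; normalize to an array.
    contexts = context if isinstance(context, list) else [context]
    if WORKFLOW_RUN_CONTEXT not in contexts:
        contexts.append(WORKFLOW_RUN_CONTEXT)
    result["@context"] = contexts
    return result


crate = {"@context": RO_CRATE_CONTEXT, "@graph": []}
updated = with_workflow_run_context(crate)
print(updated["@context"])
```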

Engine configuration files

Some workflow engines can be configured through a configuration file. In this case, the configuration file used for the engine run SHOULD be listed in the object property of the corresponding OrganizeAction, alongside the ControlAction instances.

{
    "@id": "#e55c4723-7814-4cef-b3b6-96c1dbf1ae9b",
    "@type": "SoftwareApplication",
    "name": "StreamFlow 0.2.0.dev2"
},
{
    "@id": "#7ff2f0b6-0294-4da5-9ecc-5846b8aa4e66",
    "@type": "OrganizeAction",
    "instrument": {"@id": "#e55c4723-7814-4cef-b3b6-96c1dbf1ae9b"},
    "name": "Run of StreamFlow 0.2.0.dev2",
    "object": [
        {"@id": "7ff2f0b6-0294-4da5-9ecc-5846b8aa4e66/streamflow.yml"},
        {"@id": "#a203c665-668d-4488-bf57-5b2eedf77905"},
        ...
    ],
    "result": {"@id": "#9984d778-7cd8-49ea-984d-7c58a0404f85"}
},
{
    "@id": "7ff2f0b6-0294-4da5-9ecc-5846b8aa4e66/streamflow.yml",
    "@type": "File",
    "name": "StreamFlow configuration file",
    "encodingFormat": "application/yaml"
},
{
    "@id": "#a203c665-668d-4488-bf57-5b2eedf77905",
    "@type": "ControlAction",
    "instrument": {"@id": "predictions.cwl#extract-tissue-low"},
    "object": {"@id": "#465dafb2-66cf-4af9-a6cf-b8fea0e8acc9"}
},
...

See also the section on referencing configuration files of executed tools.
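Since the OrganizeAction's object list can then mix configuration files and step executions, a consumer may need to separate them. The Python sketch below is a hypothetical approach, not one defined by the profile: it treats every File entity in object as an engine configuration file and every ControlAction as a step execution. The graph is a trimmed copy of the StreamFlow fragment above, without its elided entries.

```python
# Minimal sketch: split an OrganizeAction's "object" into config files and step executions.
# The rule "any File in object is a configuration file" is an assumption of this sketch.


def as_list(value):
    """Normalize a JSON-LD property value (single item or array) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ids(value):
    """Return the @id strings of the references in a property value."""
    return [ref["@id"] for ref in as_list(value)]


def partition_engine_objects(graph, organize_id):
    """Return (config_file_ids, control_action_ids) from the OrganizeAction's object list."""
    by_id = {entity["@id"]: entity for entity in graph}
    config_files, step_runs = [], []
    for object_id in ids(by_id[organize_id].get("object")):
        types = as_list(by_id.get(object_id, {}).get("@type"))
        if "ControlAction" in types:
            step_runs.append(object_id)
        elif "File" in types:
            config_files.append(object_id)
    return config_files, step_runs


graph = [
    {"@id": "#e55c4723-7814-4cef-b3b6-96c1dbf1ae9b", "@type": "SoftwareApplication",
     "name": "StreamFlow 0.2.0.dev2"},
    {"@id": "#7ff2f0b6-0294-4da5-9ecc-5846b8aa4e66", "@type": "OrganizeAction",
     "instrument": {"@id": "#e55c4723-7814-4cef-b3b6-96c1dbf1ae9b"},
     "object": [{"@id": "7ff2f0b6-0294-4da5-9ecc-5846b8aa4e66/streamflow.yml"},
                {"@id": "#a203c665-668d-4488-bf57-5b2eedf77905"}]},
    {"@id": "7ff2f0b6-0294-4da5-9ecc-5846b8aa4e66/streamflow.yml", "@type": "File",
     "encodingFormat": "application/yaml"},
    {"@id": "#a203c665-668d-4488-bf57-5b2eedf77905", "@type": "ControlAction"},
]

configs, steps = partition_engine_objects(graph, "#7ff2f0b6-0294-4da5-9ecc-5846b8aa4e66")
print(configs)
print(steps)
```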

Tool wrapper dependencies

In some workflow systems (e.g., CWL, Galaxy), tools are typically wrappers around an executable written in a scripting language. This MAY be represented by listing the wrapped tool and its dependencies, as described in Specifying software dependencies. The wrapped tool can be highlighted via mainEntity:

{
    "@id": "data_analysis_tool.cwl",
    "@type": "SoftwareApplication",
    "softwareRequirements": [
        {"@id": "scripts/data_analysis_script.py"},
        {"@id": "https://pypi.org/project/numpy/1.26.2/"}
    ],
    "mainEntity": {"@id": "scripts/data_analysis_script.py"}
},
{
    "@id": "scripts/data_analysis_script.py",
    "@type": "SoftwareApplication",
    "version": "0.1"
},
{
    "@id": "https://pypi.org/project/numpy/1.26.2/",
    "@type": "SoftwareApplication",
    "name": "NumPy",
    "version": "1.26.2"
}
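A consumer could read such a wrapper description as in the Python sketch below. It assumes that mainEntity identifies the wrapped tool and that the remaining softwareRequirements entries are its dependencies. The wrapper_summary function is hypothetical. The graph copies the fragment above.

```python
# Minimal sketch: separate a tool wrapper's main (wrapped) tool from its other dependencies.


def as_list(value):
    """Normalize a JSON-LD property value (single item or array) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ids(value):
    """Return the @id strings of the references in a property value."""
    return [ref["@id"] for ref in as_list(value)]


def wrapper_summary(graph, tool_id):
    """Return (wrapped_tool_id, [(name, version), ...]) for the wrapper's other requirements."""
    by_id = {entity["@id"]: entity for entity in graph}
    tool = by_id[tool_id]
    main = ids(tool.get("mainEntity"))
    wrapped = main[0] if main else None
    others = []
    for req_id in ids(tool.get("softwareRequirements")):
        if req_id == wrapped:
            continue  # the wrapped tool is reported separately
        req = by_id.get(req_id, {})
        others.append((req.get("name", req_id), req.get("version")))
    return wrapped, others


graph = [
    {"@id": "data_analysis_tool.cwl", "@type": "SoftwareApplication",
     "softwareRequirements": [{"@id": "scripts/data_analysis_script.py"},
                              {"@id": "https://pypi.org/project/numpy/1.26.2/"}],
     "mainEntity": {"@id": "scripts/data_analysis_script.py"}},
    {"@id": "scripts/data_analysis_script.py", "@type": "SoftwareApplication", "version": "0.1"},
    {"@id": "https://pypi.org/project/numpy/1.26.2/", "@type": "SoftwareApplication",
     "name": "NumPy", "version": "1.26.2"},
]

print(wrapper_summary(graph, "data_analysis_tool.cwl"))
```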

Requirements

The requirements of this profile are those of Workflow Run Crate plus the ones listed below.

Dataset (the root data entity, e.g. "@id": "./")

Property | Required? | Description
conformsTo | MUST | Array that MUST reference a CreativeWork entity with an @id URI consistent with the versioned permalink of this document, and SHOULD also reference the versioned permalinks of Process Run Crate, Workflow Run Crate and Workflow RO-Crate.

ComputationalWorkflow

Property | Required? | Description
@type | MUST | MUST include File, SoftwareSourceCode and ComputationalWorkflow. If the step property is used, MUST also include HowTo. In the case of a subworkflow added to the crate as a contextual entity, MUST NOT include File.
hasPart | MUST | Identifiers of the tools (including subworkflows) orchestrated by this workflow, represented as specified in the Process Run Crate requirements under "SoftwareApplication". The referenced tools MAY also include formal parameter definitions via input and output as specified in Workflow Run Crate. In the case of subworkflows, the type MUST include ComputationalWorkflow.
step | SHOULD | Identifiers of the HowToStep instances representing this workflow's steps. If this property is used, the workflow MUST include HowTo among its types.

HowToStep

Property | Required? | Description
workExample | MUST | Identifier of the tool (or subworkflow) that implements this step.
position | MAY | An integer indicating the step's position in the execution order. In general, a workflow may have more than one valid execution order. For instance, if step C needs outputs from steps A and B, but A and B don't need each other's output, both A-B-C and B-A-C are valid execution orders (A and B might even be executed in parallel). For this reason, the only requirement is that for each step pair (S1, S2), if S2 needs outputs from S1, S2's position MUST be greater than S1's.

ControlAction

Property | Required? | Description
instrument | MUST | Identifier of the HowToStep whose execution is represented by this action.
object | MUST | Identifier(s) of the CreateAction(s) describing the tool execution(s) corresponding to this step execution.
actionStatus | MAY | SHOULD be CompletedActionStatus if the step completed successfully or FailedActionStatus if it failed to complete. In the latter case, consumers should be prepared for the absence of any dependent actions (i.e., CreateAction instances corresponding to tool executions). Conversely, a successful step does not imply that all of its tool executions succeeded: a step can be successful even if some of its associated tool executions failed (e.g., in fault-tolerant engines). If this property is not specified, consumers should assume that the step completed successfully.
error | MAY | Additional information on the cause of the failure, if available. SHOULD NOT be specified unless actionStatus is set to FailedActionStatus.

OrganizeAction

Property | Required? | Description
instrument | MUST | Identifier of the entity (e.g. a SoftwareApplication) that represents the workflow engine (e.g. cwltool).
object | MUST | Identifiers of the ControlAction instances representing the step executions.
result | MUST | Identifier of the CreateAction representing the workflow execution.
actionStatus | MAY | SHOULD be CompletedActionStatus if the engine execution was successful or FailedActionStatus if it failed. In the latter case, consumers should be prepared for the absence of any dependent actions (i.e., CreateAction instances corresponding to workflow and tool executions, and ControlAction instances corresponding to step executions). If this property is not specified, consumers should assume that the execution was successful.
error | MAY | Additional information on the cause of the failure, if available. SHOULD NOT be specified unless actionStatus is set to FailedActionStatus.
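The position requirement for HowToStep only constrains pairs of dependent steps, so it can be checked pairwise. The Python sketch below is a hypothetical checker. It takes a dependency map (step → steps whose outputs it needs) as input. Deriving that map, for example from ParameterConnection instances, is left out and would be an additional assumption. Positions may be integers or numeric strings, as in the example crate.

```python
# Minimal sketch: check the HowToStep position rule against an explicit dependency map.
# The input format (positions dict, depends_on dict of sets) is a hypothetical choice.


def position_violations(positions, depends_on):
    """Return sorted (S1, S2) pairs where S2 needs outputs from S1 but S2's position is not greater."""
    violations = []
    for s2, upstream in depends_on.items():
        for s1 in upstream:
            # Positions may be integers or numeric strings (the example uses "0" and "1").
            if int(positions[s2]) <= int(positions[s1]):
                violations.append((s1, s2))
    return sorted(violations)


# C needs outputs from A and B; A and B are independent of each other.
deps = {"C": {"A", "B"}}
print(position_violations({"A": 0, "B": 1, "C": 2}, deps))  # [] (order A-B-C)
print(position_violations({"A": 1, "B": 0, "C": 2}, deps))  # [] (order B-A-C)
print(position_violations({"A": 0, "B": 2, "C": 1}, deps))  # [('B', 'C')]

# revsort: the sorted step needs the output of the rev step.
print(position_violations({"rev": "0", "sorted": "1"}, {"sorted": {"rev"}}))  # []
```

Because only dependent pairs are compared, independent steps such as A and B may share a position, which matches the note that they might run in parallel.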