Workflow Run RO-Crate

RO-Crate profiles to capture the provenance of workflow runs

Previous requirement gathering exercises

WfExS possible RO-Crate profiles
2021-02 Workflow run RO-crate draft
2020-10-28 Workflow Run RO-Crate discussion
Profile for recording workflow runs (conceptual ideas from RO-Crate paper)

Key concept

Extend, nest or reference a Workflow Crate for the workflow that has been executed
Use Provenance of software run to detail that a workflow run has occurred
Recommend a CWLProv-like structure of RO-Crate folders for inputs, outputs and intermediate files?
Optional detailed workflow run provenance in separate PROV files
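
The key concept above can be sketched as a small set of JSON-LD entities. The Python snippet below is a minimal illustration, not a normative example: it assumes the run is described by a schema.org `CreateAction` whose `instrument` points to the executed workflow, and every identifier (`#run-1`, `workflow.cwl`, `inputs/sample.txt`, `outputs/result.txt`) as well as the helper name `make_run_entities` is hypothetical.

```python
import json


def make_run_entities(workflow_id, input_ids, output_ids, start, end):
    """Return hypothetical RO-Crate entities describing one workflow run."""
    # The workflow that was executed (typed as in a Workflow Crate).
    workflow = {
        "@id": workflow_id,
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
    }
    # The run itself: an action that used the workflow as its instrument.
    action = {
        "@id": "#run-1",  # hypothetical identifier
        "@type": "CreateAction",
        "instrument": {"@id": workflow_id},
        "object": [{"@id": i} for i in input_ids],   # inputs of the run
        "result": [{"@id": o} for o in output_ids],  # outputs of the run
        "startTime": start,
        "endTime": end,
    }
    # Data entities for the files referenced by the action.
    files = [{"@id": f, "@type": "File"} for f in input_ids + output_ids]
    return [workflow, action] + files


entities = make_run_entities(
    "workflow.cwl",
    ["inputs/sample.txt"],
    ["outputs/result.txt"],
    "2023-01-01T10:00:00+00:00",
    "2023-01-01T10:42:30+00:00",
)
print(json.dumps(entities, indent=2))
```

In a real crate these entities would sit in the `@graph` of `ro-crate-metadata.json` next to the root dataset; the sketch leaves that wrapper out to keep the focus on the run action.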

Competency Questions / User stories

| id | CQ description | Existing/new terms | Rationale | Profile¹ | Issue # |
|---|---|---|---|---|---|
| CQ1 | What container images (e.g., Docker) were used by the run? | Overload image? The type of the target entity can be `File` if the image is a tarball from `docker save` | To archive images before they disappear, so the workflow can be run at a later time | 1, 3 | 9 |
| CQ2 | How much memory/CPU/disk was used in the run? | memory, disk, cpu, architecture, gpu (possibly memoryRequirements, storageRequirements) | To find the right hardware for running the workflow | 1, 2, 3 | 10 |
| CQ3 | What are the configuration files used in a workflow execution step? | ChooseAction? Though maybe the crate generator should just merge the params with the other ones if it can parse the config file. To link to the config file as a black box instead, we probably need a new property | For reproducibility purposes: the values/settings inside config files can have a big impact on the output | 1, 3 | 11 |
| CQ4 | What is the environment/container file used in a specific workflow execution step? | Similar to the configuration file problem. Needs env dump support from the workflow engine | Knowing the environment helps with debugging and reproducing the setup | 1, 3 | 12 |
| CQ5 | How long does this workflow component take to run? | totalTime? Allowed on HowTo and HowToDirection but not on HowToStep. Can also get the actual duration from endTime - startTime on the action | If a workflow step is computationally expensive, I may need to get an estimate for impatient users, or show a warning | 1, 3 | 13 |
| CQ6 | How long does this workflow take to run? | totalTime. Can also get the actual duration from endTime - startTime on the action | Same as CQ5, but for the full workflow | 2, 3 | 14 |
| CQ7 | Was the execution successful? | Set actionStatus to FailedActionStatus or CompletedActionStatus; can also provide error | Needed to know whether or not to retrieve the results | 1, 2, 3 | 15 |
| CQ8 | What are the inputs and outputs of the overall workflow? | object and result on the workflow run action | High-level representation of the workflow execution | 2, 3 | 16 |
| CQ9 | What is the source code version of the component executed in a workflow step? | softwareVersion, though getting the version of the actual tool (e.g., `grep`) that was called by the wrapper might not be easy | Knowing which release/software version was used (reproducibility) | 1, 3 | 17 |
| CQ10 | What is the script used to wrap up a software component? | We’re mapping tool wrappers (e.g., `foo.cwl`) to SoftwareApplication. Wrappers at lower levels can also be `SoftwareApplication`, but we need to draw the line somewhere | Many executables are complicated and need an additional script to wrap them up or simplify them. For example, a “run.sh” script that exposes a simpler set of parameters and fixes another set. | 3 | 18 |
| CQ11 | How were workflow parameters used in tool runs? | We’re linking tool params directly (with connectedTo), but that’s inaccurate since those links only exist within a workflow. | Knowing how workflow parameters were passed to individual tools, to find out how they affected the outputs | 3 | 25 |

NOTE: SPARQL queries corresponding to the Competency Questions are available in the sparql directory.
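
As a complement to the SPARQL queries, some of the questions above can also be answered directly from a crate's JSON-LD. The sketch below covers CQ6 (duration as `endTime - startTime`), CQ7 (`actionStatus`) and CQ8 (`object` and `result` on the run action). The sample action, its identifiers and the helper names are hypothetical; the code assumes ISO 8601 timestamps with an explicit UTC offset and a run described by a `CreateAction`.

```python
from datetime import datetime

# Hypothetical run action, as it might appear in a crate's @graph.
run = {
    "@id": "#run-1",
    "@type": "CreateAction",
    "startTime": "2023-01-01T10:00:00+00:00",
    "endTime": "2023-01-01T10:42:30+00:00",
    "actionStatus": {"@id": "http://schema.org/CompletedActionStatus"},
    "object": [{"@id": "inputs/sample.txt"}],
    "result": [{"@id": "outputs/result.txt"}],
}


def run_duration_seconds(action):
    """CQ5/CQ6: actual duration computed as endTime - startTime."""
    start = datetime.fromisoformat(action["startTime"])
    end = datetime.fromisoformat(action["endTime"])
    return (end - start).total_seconds()


def was_successful(action):
    """CQ7: True only if actionStatus is CompletedActionStatus."""
    status = action.get("actionStatus", "")
    # The status may be a reference ({"@id": ...}) or a plain string.
    status_id = status.get("@id", "") if isinstance(status, dict) else status
    return status_id.endswith("CompletedActionStatus")


def inputs_and_outputs(action):
    """CQ8: identifiers listed under object (inputs) and result (outputs)."""
    inputs = [e["@id"] for e in action.get("object", [])]
    outputs = [e["@id"] for e in action.get("result", [])]
    return inputs, outputs


print(run_duration_seconds(run))  # 2550.0
print(was_successful(run))        # True
print(inputs_and_outputs(run))
```

A missing `actionStatus` is treated as "not successful" here; that is a choice made for this sketch, not something the profiles prescribe.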

¹ 1: Process Run Crate; 2: Workflow Run Crate; 3: Provenance Run Crate.