Workflow Run RO-Crate

RO-Crate profiles to capture the provenance of workflow runs

Previous requirement gathering exercises

Key concept

Competency Questions / User stories

| id | CQ description | Existing/new terms | Rationale | Profile¹ | Issue # |
|---|---|---|---|---|---|
| CQ1 | What container images (e.g., Docker) were used by the run? | Overload `image`? The type of the target entity can be `File` if the image is a tarball from `docker save` | To archive images before they disappear, so the workflow can still be run later | 1, 3 | 9 |
| CQ2 | How much memory/CPU/disk was used in the run? | `memory`, `disk`, `cpu`, `architecture`, `gpu` (possibly `memoryRequirements`, `storageRequirements`) | To find the right hardware for running the workflow | 1, 2, 3 | 10 |
| CQ3 | What are the configuration files used in a workflow execution step? | `ChooseAction`? Though maybe the crate generator should just merge the params with the other ones if it can parse the config file. To link to the config file as a black box instead, we probably need a new property | For reproducibility purposes: the values/settings inside config files can have a big impact on the output | 1, 3 | 11 |
| CQ4 | What is the environment/container file used in a specific workflow execution step? | Similar to the configuration file problem. Needs env dump support from the workflow engine | Knowing the environment helps with debugging and reproducing the setup | 1, 3 | 12 |
| CQ5 | How long does this workflow component take to run? (estimate) | `totalTime`? Allowed on `HowTo` and `HowToDirection` but not on `HowToStep`. Can also get the actual duration from `endTime - startTime` on the action | If a workflow step is computationally expensive, I may need to get an estimate for impatient users, or show a warning | 1, 3 | 13 |
| CQ6 | How long does this workflow take to run? | `totalTime`. Can also get the actual duration from `endTime - startTime` on the action | Same as CQ5, but for the full workflow | 2, 3 | 14 |
| CQ7 | Was the execution successful? | Set `actionStatus` to `FailedActionStatus` or `CompletedActionStatus`; can also provide `error` | Needed to know whether or not to retrieve the results | 1, 2, 3 | 15 |
| CQ8 | What are the inputs and outputs of the overall workflow? (I don’t care about the intermediate results) | `object` and `result` on the workflow run action | High-level representation of the workflow execution | 2, 3 | 16 |
| CQ9 | What is the source code version of the component executed in a workflow step? Is it a script? An executable? | `softwareVersion`, though getting the version of the actual tool (e.g., `grep`) that was called by the wrapper might not be easy | Knowing which release/software version was used (reproducibility) | 1, 3 | 17 |
| CQ10 | What is the script used to wrap up a software component? | We’re mapping tool wrappers (e.g., `foo.cwl`) to `SoftwareApplication`. Wrappers at lower levels can also be `SoftwareApplication`, but we need to draw the line somewhere | Many executables are complicated and need an additional script to wrap them up or simplify them. For example, a “run.sh” script that exposes a simpler set of parameters and fixes another set | 3 | 18 |
| CQ11 | How were workflow parameters used in tool runs? | We’re linking tool params directly (with `connectedTo`), but that’s inaccurate since those links only exist within a workflow | Knowing how workflow parameters were passed to individual tools, to find out how they affected the outputs | 3 | 25 |
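
As a rough illustration of how CQ5 to CQ8 could be answered from the terms listed above, here is a minimal Python sketch. It assumes the run is described by a single action entity in JSON-LD form carrying `startTime`, `endTime`, `actionStatus`, `object` and `result`. The entity itself (its `@id`, the `CreateAction` type, file names and timestamps) and the helper function names are hypothetical and written by hand for this example; they are not taken from a real crate or from any library.

```python
from datetime import datetime

# Hypothetical, hand-written action entity. All identifiers and values are
# illustrative assumptions, not output from a real workflow engine.
run_action = {
    "@id": "#run-1",
    "@type": "CreateAction",
    "startTime": "2023-01-10T09:00:00+00:00",
    "endTime": "2023-01-10T09:42:30+00:00",
    "actionStatus": {"@id": "http://schema.org/CompletedActionStatus"},
    "object": [{"@id": "input.fastq"}],
    "result": [{"@id": "output.bam"}],
}


def duration_seconds(action):
    """CQ5/CQ6: actual duration, computed as endTime - startTime."""
    start = datetime.fromisoformat(action["startTime"])
    end = datetime.fromisoformat(action["endTime"])
    return (end - start).total_seconds()


def was_successful(action):
    """CQ7: True only if actionStatus points to CompletedActionStatus."""
    status = action.get("actionStatus", "")
    # Accept both {"@id": "..."} references and plain strings.
    status_id = status.get("@id", "") if isinstance(status, dict) else str(status)
    return status_id.endswith("CompletedActionStatus")


def inputs_and_outputs(action):
    """CQ8: identifiers of the overall inputs (object) and outputs (result)."""
    def ids(value):
        if value is None:
            return []
        items = value if isinstance(value, list) else [value]
        # Each item is either an {"@id": "..."} reference or a bare string.
        return [item["@id"] if isinstance(item, dict) else item for item in items]

    return ids(action.get("object")), ids(action.get("result"))


if __name__ == "__main__":
    print(duration_seconds(run_action))
    print(was_successful(run_action))
    print(inputs_and_outputs(run_action))
```

The sketch only reads values that are already recorded on the action. An estimated duration (`totalTime`) would have to come from the workflow description rather than from a run.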