Previous requirement gathering exercises
- WfExS possible RO-Crate profiles
- 2021-02 Workflow run RO-crate draft
- 2020-10-28 Workflow Run RO-Crate discussion
- Profile for recording workflow runs (conceptual ideas from RO-Crate paper)
Key concept
- Extend, nest or reference a Workflow Crate for the workflow that has been executed
- Use Provenance of software run to detail that a workflow run has occurred
- Recommend a CWLProv-like structure of RO-Crate folders for inputs, outputs and intermediate files?
- Optional detailed workflow run provenance in separate PROV files
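As a minimal sketch of the first two points above, the following Python snippet builds RO-Crate-style JSON-LD in which a `CreateAction` records that a workflow run occurred and points at the executed workflow through `instrument`. The file names, the `#run-1` identifier and the helper `build_run_crate_graph` are hypothetical and chosen only for illustration. The layout is one plausible reading of the ideas listed here, not a finalised profile.

```python
import json


def build_run_crate_graph(workflow_id, inputs, outputs):
    """Return a simplified RO-Crate JSON-LD document describing one run.

    All identifiers are illustrative assumptions, not fixed by any profile.
    """
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "about": {"@id": "./"},
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
    }
    # The action is the provenance record that a run has occurred.
    action = {
        "@id": "#run-1",  # hypothetical local identifier
        "@type": "CreateAction",
        "instrument": {"@id": workflow_id},  # the workflow that was executed
        "object": [{"@id": i} for i in inputs],  # inputs of the run
        "result": [{"@id": o} for o in outputs],  # outputs of the run
    }
    workflow = {
        "@id": workflow_id,
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
    }
    files = [{"@id": f, "@type": "File"} for f in [*inputs, *outputs]]
    root = {
        "@id": "./",
        "@type": "Dataset",
        "mainEntity": {"@id": workflow_id},
        "hasPart": [{"@id": workflow_id}] + [{"@id": f["@id"]} for f in files],
        "mentions": [{"@id": action["@id"]}],
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, workflow, action, *files],
    }


crate = build_run_crate_graph("workflow.cwl", ["input.txt"], ["output.txt"])
print(json.dumps(crate, indent=2))
```

Here the workflow is described inside the same crate ("extend"); nesting or referencing a separate Workflow Crate would instead make `instrument` point at an entity in, or the identifier of, that other crate.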
Competency Questions / User stories
id | CQ description | Existing/new terms | Rationale | Profile¹ | Issue # |
---|---|---|---|---|---|
CQ1 | What container images (e.g., Docker) were used by the run? | Overload image? The type of the target entity can be File if the image is a tarball from docker save | To archive images before they disappear, so that the workflow can still be run later | 1, 3 | |
CQ2 | How much memory/CPU/disk was used in the run? | memory, disk, cpu, architecture, gpu (possibly memoryRequirements, storageRequirements) | To find the right hardware for running the workflow | 1, 2, 3 | |
CQ3 | What are the configuration files used in a workflow execution step? | ChooseAction? Though maybe the crate generator should just merge the params with the other ones if it can parse the config file. To link to the config file as a black box instead, we probably need a new property | For reproducibility purposes: the values/settings inside config files can have a big impact on the output | 1, 3 | |
CQ4 | What is the environment/container file used in a specific workflow execution step? | Similar to the configuration file problem. Need env dump support from workflow engine | Knowing the environment helps debugging and reproducing the setup | 1, 3 | 12 |
CQ5 | How long does this workflow component take to run? | totalTime? Allowed on HowTo and HowToDirection but not on HowToStep. Can also get actual duration from endTime - startTime on the action | If a workflow step is computationally expensive, I may need to get an estimate for impatient users, or show a warning | 1, 3 | |
CQ6 | How long does this workflow take to run? | totalTime. Can also get actual duration from endTime - startTime on the action | Same as CQ5, but with the full workflow | 2, 3 | |
CQ7 | Was the execution successful? | Set actionStatus to FailedActionStatus or CompletedActionStatus; can also provide error | Needed to know whether or not to retrieve the results | 1, 2, 3 | |
CQ8 | What are the inputs and outputs of the overall workflow? | object and result on the workflow run action | High level representation of the workflow execution | 2, 3 | |
CQ9 | What is the source code version of the component executed in a workflow step? | softwareVersion, though getting the version of the actual tool (e.g., grep) that was called by the wrapper might not be easy | Knowing which release/software version was used (reproducibility) | 1, 3 | |
CQ10 | What is the script used to wrap up a software component? | We’re mapping tool wrappers (e.g., foo.cwl) to SoftwareApplication. Wrappers at lower levels can also be SoftwareApplication, but we need to draw the line somewhere | Many executables are complicated, and need an additional script to wrap them up or simplify. For example a “run.sh” script that exposes a simpler set of parameters and fixes another set. | 3 | |
CQ11 | How were workflow parameters used in tool runs? | We’re linking tool params directly (with connectedTo), but that’s inaccurate since those links only exist within a workflow. | Knowing how workflow parameters were passed to individual tools to find out how they affected the outputs | 3 | |
NOTE: SPARQL queries corresponding to the Competency Questions are available in the sparql directory.
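For readers who prefer plain code over SPARQL, here is a small Python sketch of how CQ6, CQ7 and CQ8 could be answered from a single run action, using the terms listed in the table (`startTime`, `endTime`, `actionStatus`, `object`, `result`). The sample action, its timestamps and file names, and the helper function names are all hypothetical. It assumes the action has already been loaded from the crate metadata as a Python dictionary.

```python
from datetime import datetime

# Hypothetical workflow run action, as it might appear in a crate's @graph.
RUN_ACTION = {
    "@id": "#run-1",
    "@type": "CreateAction",
    "startTime": "2023-01-01T10:00:00+00:00",
    "endTime": "2023-01-01T10:42:30+00:00",
    "actionStatus": {"@id": "http://schema.org/CompletedActionStatus"},
    "object": [{"@id": "input.txt"}],
    "result": [{"@id": "output.txt"}],
}


def run_duration_seconds(action):
    """CQ5/CQ6: actual duration, computed as endTime - startTime."""
    start = datetime.fromisoformat(action["startTime"])
    end = datetime.fromisoformat(action["endTime"])
    return (end - start).total_seconds()


def was_successful(action):
    """CQ7: True if actionStatus is CompletedActionStatus."""
    status = action.get("actionStatus", "")
    # The status may be given as a reference object or as a plain string.
    status_id = status.get("@id", "") if isinstance(status, dict) else status
    return status_id.endswith("CompletedActionStatus")


def input_and_output_ids(action):
    """CQ8: identifiers of the run's inputs (object) and outputs (result)."""
    def ids(key):
        value = action.get(key, [])
        if isinstance(value, dict):  # a single reference instead of a list
            value = [value]
        return [entry["@id"] for entry in value]

    return ids("object"), ids("result")


print(run_duration_seconds(RUN_ACTION))
print(was_successful(RUN_ACTION))
print(input_and_output_ids(RUN_ACTION))
```

The same functions apply to a step-level action for CQ5; only the action passed in changes.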
¹ Profiles: 1: Process Run Crate; 2: Workflow Run Crate; 3: Provenance Run Crate.