proceed.model.Pipeline
- class proceed.model.Pipeline(version='v0.0.11', description=None, args=<factory>, prototype=None, steps=<factory>)
Specifies top-level pipeline configuration and processing steps.
Most Pipeline attributes are optional, but steps is required in order to actually run anything.
- Parameters:
- version: str = 'v0.0.11'
Which version of the Proceed Pipeline specification itself.
You don’t need to set the version. It might be used by Proceed itself to check for version compatibility.
- description: str = None
Any description to save along with the pipeline.
The pipeline description is not used during execution. It’s provided as a convenience to support user documentation, notes-to-self, audits, etc.
Unlike code comments or YAML comments, the description is saved as part of the ExecutionRecord.
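For example, here’s a sketch of a pipeline that carries a note-to-self (the description text itself is made up for illustration):

```python
from proceed.model import Pipeline

# The description is free text; it travels with the pipeline into the ExecutionRecord.
pipeline = Pipeline(description="Nightly processing for the imaging study; see lab notebook p. 42.")
```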
- args: dict[str, str]
Expected arguments that apply to the pipeline specification at runtime.
This is a key-value mapping from arg names to default values. The keys and values are both strings.
Pipeline args allow for controlled, explicit configuration of the pipeline at runtime. This is intended to make pipelines somewhat dynamic and portable without losing track of the dynamic values that were actually used.
Before pipeline execution, given and default arg values will be merged, then applied to the pipeline’s prototype and steps. This means occurrences of arg names prefixed with $ or surrounded with ${ ... } will be replaced with the corresponding arg value (see YAML examples below).
After execution, the arg values that were used, as well as the amended prototype and steps, will be saved in the ExecutionRecord. This should reduce guesswork about what was actually executed.
Here are two examples of how you might use args:
- Host-specific data_dir: Your laptop might have a folder with data files in it, and your collaborators might have similar folders on their laptops. You could write a pipeline that expects a data_dir arg, allowing the host-specific data_dir to be supplied at runtime. This way everyone could use the exact same pipeline specification.
- Daily data_file: You might have a pipeline that runs once a day, to process that day’s data. You could write a pipeline that expects a data_file arg, allowing the name of each day’s data file to be supplied at runtime. The same pipeline specification could be reused each day.
Here’s an example YAML pipeline spec that declares two expected args and a step that makes use of the args as $data_dir (prefix style) and ${data_file} (surrounding style).

```yaml
args:
  data_dir: /default/data_dir
  data_file: default_file
steps:
  - name: args example
    image: ubuntu
    command: ["echo", "Working on: $data_dir/${data_file}.txt"]
```
Here’s an example YAML ExecutionRecord for that same pipeline. In this example, let’s assume that a custom data_dir was supplied at runtime as data_dir=/custom/data_dir. Let’s assume the data_file was left with its default value.

```yaml
original:
  args:
    data_dir: /default/data_dir
    data_file: default_file
  steps:
    - name: args example
      image: ubuntu
      command: ["echo", "Working on: $data_dir/${data_file}.txt"]
amended:
  args:
    data_dir: /custom/data_dir
    data_file: default_file
  steps:
    - name: args example
      image: ubuntu
      command: ["echo", "Working on: /custom/data_dir/default_file.txt"]
```
The original is just the same as the original pipeline spec.
The amended args are all the exact values used at execution time, whether custom or default. The amended steps have had $ and ${ ... } placeholders replaced by concrete values. It’s the amended steps that are actually executed.
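For intuition, the $name prefix style and the ${name} surrounding style match the placeholder syntax of Python’s built-in string.Template. Here’s a minimal sketch of the substitution semantics, as an illustration only, not Proceed’s actual implementation:

```python
from string import Template

# Merged arg values: the custom data_dir plus the default data_file.
merged_args = {"data_dir": "/custom/data_dir", "data_file": "default_file"}

# Both placeholder styles are replaced with the corresponding arg values.
command = Template("Working on: $data_dir/${data_file}.txt")
print(command.substitute(merged_args))
# -> Working on: /custom/data_dir/default_file.txt
```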
- prototype: Step = None
A Step or partial Step with attributes that apply to all steps in the pipeline.
The prototype can have the same attributes as any Step. You can use prototype attributes to “factor out” attribute values that pipeline steps have in common.
Before pipeline execution, attributes provided for the prototype will be applied to each step and used if the step doesn’t already have its own value for the same attribute.
After execution, the amended steps will be saved in the ExecutionRecord. This should reduce guesswork about what was actually executed.
Here’s an example YAML pipeline spec with a prototype that specifies the Step.image once, to be used by all steps.

```yaml
prototype:
  image: ubuntu
steps:
  - name: prototype example one
    command: ["echo", "one"]
  - name: prototype example two
    command: ["echo", "two"]
```
Here’s an example YAML ExecutionRecord for that same pipeline.

```yaml
original:
  prototype:
    image: ubuntu
  steps:
    - name: prototype example one
      command: ["echo", "one"]
    - name: prototype example two
      command: ["echo", "two"]
amended:
  prototype:
    image: ubuntu
  steps:
    - name: prototype example one
      image: ubuntu
      command: ["echo", "one"]
    - name: prototype example two
      image: ubuntu
      command: ["echo", "two"]
```
The original is just the same as the original pipeline spec.
The amended steps have their Step.image filled in from the prototype. It’s the amended steps that are actually executed.
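In other words, the merge is step-wins: a step keeps any value it already has, and only missing attributes are filled in from the prototype. Here’s a rough sketch of that semantics with plain dicts standing in for a prototype and a partial step (an illustration, not Proceed’s actual code):

```python
# Hypothetical dicts standing in for a prototype and a partial step.
prototype = {"image": "ubuntu"}
step = {"name": "prototype example one", "command": ["echo", "one"]}

# Later keys win, so the step’s own values take precedence over the prototype’s.
amended = {**prototype, **step}
print(amended)
# -> {'image': 'ubuntu', 'name': 'prototype example one', 'command': ['echo', 'one']}
```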
- steps: list[Step]
The list of Step to execute.
The steps are the most important part of a pipeline! These determine what will actually be executed.
Before pipeline execution, all steps will be amended with args and the prototype.
Execution will proceed by running each step in the order given. A step might be skipped, based on Step.match_done. If a step stops with a nonzero StepResult.exit_code, the pipeline execution will stop at that point.
After execution, the ExecutionRecord will contain a list of StepResult, one for each of the steps executed.
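Putting the ordering, skipping, and early-stop rules together, the control flow looks roughly like the sketch below. The StepResult class and the is_done and run_step helpers here are hypothetical stand-ins, not Proceed’s actual internals:

```python
from dataclasses import dataclass
import subprocess

@dataclass
class StepResult:
    name: str
    exit_code: int

def is_done(step) -> bool:
    # Hypothetical stand-in: a real check might match files against Step.match_done.
    return False

def run_step(step) -> StepResult:
    # Hypothetical stand-in: Proceed actually runs steps in containers.
    completed = subprocess.run(step["command"])
    return StepResult(step["name"], completed.returncode)

def execute(steps):
    """Illustrative sketch of the execution order described above."""
    results = []
    for step in steps:                 # run each step in the order given
        if is_done(step):              # skip a step whose work is already done
            continue
        result = run_step(step)
        results.append(result)         # one StepResult per step executed
        if result.exit_code != 0:      # a nonzero exit code stops the pipeline
            break
    return results
```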