proceed.model.Pipeline¶
- class proceed.model.Pipeline(version='v0.0.6', description=None, args=<factory>, prototype=None, steps=<factory>)¶
Specifies top-level pipeline configuration and processing steps.
Most Pipeline attributes are optional, but steps is required in order to actually run anything.
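For orientation, here is a minimal sketch of constructing a Pipeline in Python. It assumes Step lives alongside Pipeline in proceed.model and accepts the name, image, and command attributes seen in the YAML examples below; treat it as an illustration rather than definitive usage.

    from proceed.model import Pipeline, Step

    # A minimal pipeline: one step that echoes a message.
    pipeline = Pipeline(
        steps=[
            Step(name="hello", image="ubuntu", command=["echo", "hello"]),
        ]
    )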
- Parameters:
- version: str = 'v0.0.6'¶
Which version of the Proceed Pipeline specification itself.
You don’t need to set the version. It might be used by Proceed itself to check for version compatibility.
- description: str = None¶
Any description to save along with the pipeline.
The pipeline description is not used during execution. It’s provided as a convenience to support user documentation, notes-to-self, audits, etc.
Unlike code comments or YAML comments, the description is saved as part of the ExecutionRecord.
- args: dict[str, str]¶
Expected arguments that apply to the pipeline specification at runtime.
This is a key-value mapping from arg names to default values. The keys and values are both strings.
Pipeline args allow for controlled, explicit configuration of the pipeline at runtime. This is intended to make pipelines somewhat dynamic and portable without losing track of the dynamic values that were actually used.
Before pipeline execution, given and default args values will be merged, then applied to the pipeline’s prototype and steps. This means occurrences of arg names prefixed with $ or surrounded with ${ ... } will be replaced with the corresponding arg value (see the YAML examples and substitution sketch below).
After execution, the arg values that were used, as well as the amended prototype and steps, will be saved in the ExecutionRecord. This should reduce guesswork about what was actually executed.
Here are two examples of how you might use args:
- Host-specific data_dir
  Your laptop might have a folder with data files in it, and your collaborators might have similar folders on their laptops. You could write a pipeline that expects a data_dir arg, allowing the host-specific data_dir to be supplied at runtime. This way everyone could use the exact same pipeline specification.
- Daily data_file
  You might have a pipeline that runs once a day, to process that day’s data. You could write a pipeline that expects a data_file arg, allowing the name of each day’s data file to be supplied at runtime. The same exact pipeline specification could be reused each day.
Here’s an example YAML pipeline spec that declares two expected args and a step that makes use of the args as $data_dir (prefix style) and ${data_file} (surrounding style).

    args:
      data_dir: /default/data_dir
      data_file: default_file
    steps:
      - name: args example
        image: ubuntu
        command: ["echo", "Working on: $data_dir/${data_file}.txt"]
Here’s an example YAML ExecutionRecord for that same pipeline. In this example, let’s assume that a custom data_dir was supplied at runtime as data_dir=/custom/data_dir. Let’s assume the data_file was left with its default value.

    original:
      args:
        data_dir: /default/data_dir
        data_file: default_file
      steps:
        - name: args example
          image: ubuntu
          command: ["echo", "Working on: $data_dir/${data_file}.txt"]
    amended:
      args:
        data_dir: /custom/data_dir
        data_file: default_file
      steps:
        - name: args example
          image: ubuntu
          command: ["echo", "Working on: /custom/data_dir/default_file.txt"]
The original is just the same as the original pipeline spec.
The amended args are all the exact values used at execution time, whether custom or default. The amended steps have had $ and ${ ... } placeholders replaced by concrete values. It’s the amended steps that are actually executed.
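To make the substitution rule concrete, here is a minimal sketch of how the $name and ${name} replacement could behave. This illustrates the semantics described above, not Proceed’s actual implementation; the apply_args helper is a hypothetical name.

    import re

    def apply_args(text: str, args: dict[str, str]) -> str:
        """Replace $name and ${name} placeholders with values from args (sketch)."""
        for name, value in args.items():
            # Surrounding style: ${name}
            text = text.replace("${" + name + "}", value)
            # Prefix style: $name, not followed by further word characters
            text = re.sub(r"\$" + re.escape(name) + r"(?!\w)", lambda m: value, text)
        return text

    # Args merged from defaults and runtime values, as in the example above:
    args = {"data_dir": "/custom/data_dir", "data_file": "default_file"}
    print(apply_args("Working on: $data_dir/${data_file}.txt", args))
    # Working on: /custom/data_dir/default_file.txt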
- prototype: Step = None¶
A Step or partial Step with attributes that apply to all steps in the pipeline.
The prototype can have the same attributes as any Step. You can use prototype attributes to “factor out” attribute values that pipeline steps have in common.
Before pipeline execution, attributes provided for the prototype will be applied to each step and used if the step doesn’t already have its own value for the same attribute.
After execution, the amended steps will be saved in the ExecutionRecord. This should reduce guesswork about what was actually executed.
Here’s an example YAML pipeline spec with a prototype that specifies the Step.image once, to be used by all steps.

    prototype:
      image: ubuntu
    steps:
      - name: prototype example one
        command: ["echo", "one"]
      - name: prototype example two
        command: ["echo", "two"]
Here’s an example YAML ExecutionRecord for that same pipeline.

    original:
      prototype:
        image: ubuntu
      steps:
        - name: prototype example one
          command: ["echo", "one"]
        - name: prototype example two
          command: ["echo", "two"]
    amended:
      prototype:
        image: ubuntu
      steps:
        - name: prototype example one
          image: ubuntu
          command: ["echo", "one"]
        - name: prototype example two
          image: ubuntu
          command: ["echo", "two"]
The original is just the same as the original pipeline spec.
The amended steps have their Step.image filled in from the prototype. It’s the amended steps that are actually executed.
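The merge rule described above, where a prototype value is used only when a step doesn’t set its own, can be sketched briefly. This is an assumed illustration using a reduced stand-in Step, not Proceed’s actual code; amend_with_prototype is a hypothetical name.

    from dataclasses import dataclass, fields

    @dataclass
    class Step:
        # Reduced stand-in for proceed.model.Step, for illustration only.
        name: str = None
        image: str = None
        command: list = None

    def amend_with_prototype(step: Step, prototype: Step) -> Step:
        # Fill in each unset step attribute from the prototype.
        for f in fields(Step):
            if getattr(step, f.name) is None:
                setattr(step, f.name, getattr(prototype, f.name))
        return step

    prototype = Step(image="ubuntu")
    step = Step(name="prototype example one", command=["echo", "one"])
    print(amend_with_prototype(step, prototype).image)  # ubuntu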
- steps: list[Step]¶
The list of Step objects to execute.
The steps are the most important part of a pipeline! These determine what will actually be executed.
Before pipeline execution, all steps will be amended with args and the prototype.
Execution will proceed by running each step in the order given. A step might be skipped, based on Step.match_done. If a step stops with a nonzero StepResult.exit_code, the pipeline execution will stop at that point (see the sketch below).
After execution, the ExecutionRecord will contain a list of StepResult, one for each of the steps executed.
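That control flow can be summarized in a short sketch. The run_step and is_done helpers are hypothetical stand-ins rather than Proceed’s actual API; the loop only illustrates the ordering, skipping, and early-stop behavior described above.

    def execute_pipeline(steps, run_step, is_done):
        # Run steps in order; skip a step when its "done" condition already
        # matches (analogous to Step.match_done); stop on a nonzero exit code.
        step_results = []
        for step in steps:
            if is_done(step):
                continue
            result = run_step(step)  # yields an object with exit_code, like StepResult
            step_results.append(result)
            if result.exit_code != 0:
                break  # a failing step stops pipeline execution here
        return step_results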