proceed.model.Pipeline

class proceed.model.Pipeline(version='v0.0.6', description=None, args=<factory>, prototype=None, steps=<factory>)

Specifies top-level pipeline configuration and processing steps.

Most Pipeline attributes are optional, but steps is required in order to actually run anything.

Parameters:
  • version (str)

  • description (str)

  • args (dict[str, str])

  • prototype (Step)

  • steps (list[Step])
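
Here’s a minimal sketch of a YAML pipeline spec that relies on defaults for everything except steps (the step name, image, and command are just placeholders):

steps:
  - name: minimal example
    image: ubuntu
    command: ["echo", "hello pipeline"]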

version: str = 'v0.0.6'

Which version of the Proceed Pipeline specification this is.

You don’t need to set the version. It might be used by Proceed itself to check for version compatibility.
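
If you do want to pin the spec version explicitly, it can be included as a top-level key, as in this sketch (the value shown just restates the current default):

version: v0.0.6
steps:
  - name: version example
    image: ubuntu
    command: ["echo", "pinned spec version"]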

description: str = None

Any description to save along with the pipeline.

The pipeline description is not used during execution. It’s provided as a convenience to support user documentation, notes-to-self, audits, etc.

Unlike code comments or YAML comments, the description is saved as part of the ExecutionRecord.
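
For example, a spec might carry a short note-to-self as its description (the wording here is only an illustration):

description: Example pipeline for smoke testing, safe to delete.
steps:
  - name: description example
    image: ubuntu
    command: ["echo", "described pipeline"]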

args: dict[str, str]

Expected arguments that apply to the pipeline specification at runtime.

This is a key-value mapping from arg names to default values. The keys and values are both strings.

Pipeline args allow for controlled, explicit configuration of the pipeline at runtime. This is intended to make pipelines somewhat dynamic and portable without losing track of the dynamic values that were actually used.

Before pipeline execution, given and default arg values will be merged and then applied to the pipeline’s prototype and steps. This means occurrences of arg names prefixed with $ or surrounded with ${ ... } will be replaced with the corresponding arg value (see the YAML examples below).

After execution, the arg values that were used as well as the amended prototype and steps will be saved in the ExecutionRecord. This should reduce guesswork about what was actually executed.

Here are two examples of how you might use args:

Host-specific data_dir

Your laptop might have a folder with data files in it, and your collaborators might have similar folders on their laptops. You could write a pipeline that expects a data_dir arg, allowing the host-specific data_dir to be supplied at runtime. This way everyone could use the exact same pipeline specification.

Daily data_file

You might have a pipeline that runs once a day, to process that day’s data. You could write a pipeline that expects a data_file arg, allowing the name of each day’s data file to be supplied at runtime. The same exact pipeline specification could be reused each day.

Here’s an example YAML pipeline spec that declares two expected args and a step that makes use of the args as $data_dir (prefix style) and ${data_file} (surrounding style).

args:
  data_dir: /default/data_dir
  data_file: default_file
steps:
  - name: args example
    image: ubuntu
    command: ["echo", "Working on: $data_dir/${data_file}.txt"]

Here’s an example YAML ExecutionRecord for that same pipeline. Let’s assume a custom data_dir was supplied at runtime as data_dir=/custom/data_dir, and that data_file was left with its default value.

original:
  args:
    data_dir: /default/data_dir
    data_file: default_file
  steps:
    - name: args example
      image: ubuntu
      command: ["echo", "Working on: $data_dir/${data_file}.txt"]
amended:
  args:
    data_dir: /custom/data_dir
    data_file: default_file
  steps:
    - name: args example
      image: ubuntu
      command: ["echo", "Working on: /custom/data_dir/default_file.txt"]

The original section is the pipeline spec exactly as it was written.

The amended args are all the exact values used at execution time, whether custom or default. The amended steps have had $ and ${ ... } placeholders replaced by concrete values. It’s the amended steps that are actually executed.

prototype: Step = None

A Step or partial Step with attributes that apply to all steps in the pipeline.

The prototype can have the same attributes as any Step. You can use prototype attributes to “factor out” attribute values that pipeline steps have in common.

Before pipeline execution, attributes provided for the prototype will be applied to each step and used if the step doesn’t already have its own value for the same attribute.

After execution, the amended steps will be saved in the ExecutionRecord. This should reduce guesswork about what was actually executed.

Here’s an example YAML pipeline spec with a prototype that specifies the Step.image once, to be used by all steps.

prototype:
  image: ubuntu
steps:
  - name: prototype example one
    command: ["echo", "one"]
  - name: prototype example two
    command: ["echo", "two"]

Here’s an example YAML ExecutionRecord for that same pipeline.

original:
  prototype:
    image: ubuntu
  steps:
    - name: prototype example one
      command: ["echo", "one"]
    - name: prototype example two
      command: ["echo", "two"]
amended:
  prototype:
    image: ubuntu
  steps:
    - name: prototype example one
      image: ubuntu
      command: ["echo", "one"]
    - name: prototype example two
      image: ubuntu
      command: ["echo", "two"]

The original section is the pipeline spec exactly as it was written.

The amended steps have their Step.image filled in from the prototype. It’s the amended steps that are actually executed.
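
Since prototype values only fill in attributes a step hasn’t set itself, a step can still override the prototype. Here’s a sketch of a spec where the second step supplies its own Step.image (the image names are placeholders):

prototype:
  image: ubuntu
steps:
  - name: uses the prototype image
    command: ["echo", "one"]
  - name: uses its own image
    image: alpine
    command: ["echo", "two"]

In the amended steps, the first step would receive image: ubuntu from the prototype, while the second step would keep image: alpine.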

steps: list[Step]

The list of Step objects to execute.

The steps are the most important part of a pipeline! These determine what will actually be executed.

Before pipeline execution, all steps will be amended with args and the prototype.

Execution will proceed by running each step in the order given. A step might be skipped, based on Step.match_done. If a step stops with a nonzero StepResult.exit_code, the pipeline execution will stop at that point.
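
As a sketch, a spec like the following would run its two steps in the order given, and would not reach the second step if the first exited with a nonzero code (the images and commands are placeholders):

steps:
  - name: prepare
    image: ubuntu
    command: ["echo", "preparing"]
  - name: process
    image: ubuntu
    command: ["echo", "processing"]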

After execution, the ExecutionRecord will contain a list of StepResult, one for each of the steps executed.