proceed.model.Step

class proceed.model.Step(name=None, description=None, image=None, command=<factory>, volumes=<factory>, working_dir=None, progress_file=None, match_done=<factory>, match_in=<factory>, match_out=<factory>, match_summary=<factory>, environment=<factory>, gpus=None, user=None, network_mode=None, mac_address=None, shm_size=None, privileged=False, X11=False)

Specifies a container-based processing step.

Most Step attributes are optional, but name is required in order to distinguish steps from each other, and image is required in order to actually run anything.

Parameters:

name (str)
description (str)
image (str)
command (list[str])
volumes (dict[str, str | dict[str, str]])
working_dir (str)
progress_file (str)
match_done (list[str])
match_in (list[str])
match_out (list[str])
match_summary (list[str])
environment (dict[str, str])
gpus (str | bool)
user (str | int)
network_mode (str)
mac_address (str)
shm_size (str)
privileged (bool)
X11 (bool)

name: str = None

Any name for the step, unique within a Pipeline (required).

description: str = None

Any description to save along with the step.

The step description is not used during pipeline execution. It’s provided as a convenience to support user documentation, notes-to-self, audits, etc.

Unlike code comments or YAML comments, the description is saved as part of the ExecutionRecord.

image: str = None

The tag or id of the container image to run from (required).

The image is the most important part of each step! It provides the step’s executables, dependencies, and basic environment.

The image may be a human-readable tag of the form group/name:version (like on Docker Hub) or a unique id (like the IMAGE ID output of docker images).

steps:
  - name: human readable example
    image: mathworks/matlab:r2022b
  - name: image id example
    image: d209dd14c3c4
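
To make the two accepted forms concrete, here is a small Python sketch that tells them apart. It is a heuristic under stated assumptions, not Proceed's own logic: `looks_like_image_id` and `split_tag` are hypothetical helpers, an image id is assumed to be 12 or 64 lowercase hex characters (optionally prefixed with `sha256:`), and digest references of the form `name@sha256:<digest>` are not handled.

```python
from __future__ import annotations

import re

# Assumption: image ids are hex strings, either the short 12-character form
# shown by `docker images` or the full 64-character form, optionally "sha256:"-prefixed.
_IMAGE_ID = re.compile(r"(?:sha256:)?(?:[0-9a-f]{12}|[0-9a-f]{64})")


def looks_like_image_id(image: str) -> bool:
    """Hypothetical helper: True if `image` looks like an image id, not a tag."""
    return _IMAGE_ID.fullmatch(image) is not None


def split_tag(image: str) -> tuple[str, str | None]:
    """Hypothetical helper: split 'group/name:version' into ('group/name', 'version').

    Only a colon after the last '/' starts the version, so a registry port
    like 'localhost:5000/name' is not mistaken for a version.
    """
    last_slash = image.rfind("/")
    colon = image.rfind(":")
    if colon > last_slash:
        return image[:colon], image[colon + 1:]
    return image, None  # no explicit version in the tag


print(looks_like_image_id("d209dd14c3c4"))   # True
print(split_tag("mathworks/matlab:r2022b"))  # ('mathworks/matlab', 'r2022b')
```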

command: list[str]

The command to run inside the container.

The step command is passed to the entrypoint executable of the image. To use the image’s default cmd instead, omit command.

The command should be given as a list of string arguments. The list form makes it clear which argument is which and avoids confusion around spaces and quotes.

steps:
  - name: command example
    image: ubuntu
    command: ["echo", "hello world"]
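
The ambiguity that the list form avoids can be seen with Python's standard `shlex` module. This sketch only illustrates shell-style splitting and quoting; it does not show how Proceed hands the command to Docker.

```python
import shlex

# The list form from the YAML example: one string per argument.
command = ["echo", "hello world"]

# As a single string, quotes are needed to keep "hello world" together.
assert shlex.split('echo "hello world"') == command

# Without quotes, the same words split into three separate arguments.
assert shlex.split("echo hello world") == ["echo", "hello", "world"]

# shlex.join converts the list back to a string, quoting only where needed.
print(shlex.join(command))  # echo 'hello world'
```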

volumes: dict[str, str | dict[str, str]]

Host directories to make available inside the step’s container.

This is a key-value mapping from host absolute paths to container absolute paths. The keys are strings (host absolute paths). The values are strings (container absolute paths) or detailed key-value mappings.

steps:
  - name: volumes example
    volumes:
      /host/simple: /simple
      /host/read-only: {bind: /read-only, mode: ro}
      /host/read-write: {bind: /read-write, mode: rw}

The detailed style lets you specify the container path to bind as well as the read/write permissions.

bind: the container absolute path to bind (where the host dir will show up inside the container)
mode: the read/write permission to give the container: rw for read plus write (the default), ro for read only
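
One way to see how the two styles relate is to expand every simple entry into the detailed style. The Python sketch below does that with a hypothetical `normalize_volumes` helper, filling in rw as the default mode as described above. The resulting shape resembles the `volumes` dictionary accepted by the Docker SDK for Python, but whether Proceed normalizes volumes this way internally is an assumption.

```python
from __future__ import annotations


def normalize_volumes(
    volumes: dict[str, str | dict[str, str]],
) -> dict[str, dict[str, str]]:
    """Hypothetical helper: expand simple-style entries to {bind, mode}."""
    normalized = {}
    for host_path, spec in volumes.items():
        if isinstance(spec, str):
            # Simple style: the value is only the container path.
            normalized[host_path] = {"bind": spec, "mode": "rw"}
        else:
            # Detailed style: keep bind, and default mode to rw when omitted.
            normalized[host_path] = {"bind": spec["bind"], "mode": spec.get("mode", "rw")}
    return normalized


example = {
    "/host/simple": "/simple",
    "/host/read-only": {"bind": "/read-only", "mode": "ro"},
    "/host/read-write": {"bind": "/read-write", "mode": "rw"},
}
print(normalize_volumes(example)["/host/simple"])  # {'bind': '/simple', 'mode': 'rw'}
```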

working_dir: str = None

A working directory path within the container: the initial shell pwd, or the value of Python os.getcwd().

progress_file: str = None

File to create when the step starts, and rename to <progress_file>.done when the step succeeds.

This is an optional marker file that Proceed can use to indicate progress through the step and to decide whether the step is already complete and can be skipped.

Step.progress_file should be a file path on the host – unlike Step.match_done, Step.match_in, and Step.match_out, which are patterns to match within Step.volumes.

Proceed will create Step.progress_file when starting to execute a step. If the step completes with a nonzero exit code, Proceed will append an error message to the file. If the step completes with a zero exit code, Proceed will append a success message to the file and rename the file, adding the suffix .done.

When <progress_file>.done already exists the step will be skipped. This is intended as a convenience to avoid redundant processing. To make a step run unconditionally, omit Step.progress_file and match_done.

For example, say Step.progress_file is given as progress.txt. When beginning the step, Proceed will create progress.txt. When the step ends, Proceed will append a success or error message to progress.txt, and rename the file to progress.txt.done. Next time the step runs, if progress.txt.done still exists, the step will be skipped.
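
The lifecycle in this example can be sketched in a few lines of Python. Everything here is hypothetical: `done_path`, `should_skip`, and `run_with_progress` are illustrative names, `run_step` stands in for actually running the container, and the exact success and error messages Proceed writes are not reproduced.

```python
from pathlib import Path
import tempfile


def done_path(progress_file: Path) -> Path:
    """The renamed marker: <progress_file>.done."""
    return progress_file.with_name(progress_file.name + ".done")


def should_skip(progress_file: Path) -> bool:
    # The step is skipped when <progress_file>.done already exists.
    return done_path(progress_file).exists()


def run_with_progress(progress_file: Path, run_step) -> int:
    """Hypothetical sketch of the progress_file lifecycle."""
    progress_file.write_text("step started\n")  # create when the step starts
    exit_code = run_step()
    with progress_file.open("a") as f:  # append a success or error message
        if exit_code == 0:
            f.write("step succeeded\n")
        else:
            f.write(f"step failed with exit code {exit_code}\n")
    if exit_code == 0:
        progress_file.rename(done_path(progress_file))  # mark as done
    return exit_code


with tempfile.TemporaryDirectory() as tmp:
    progress = Path(tmp) / "progress.txt"
    run_with_progress(progress, lambda: 1)  # failure: progress.txt stays put
    print(progress.exists(), should_skip(progress))  # True False
    run_with_progress(progress, lambda: 0)  # success: renamed to progress.txt.done
    print(progress.exists(), should_skip(progress))  # False True
```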

match_done: list[str]

File matching patterns to search for, before deciding to run the step.

This is a list of glob patterns to search for before running the step. Each of the step’s volumes will be searched with the same list of patterns.

If any matches are found, these files will be noted in the ExecutionRecord, along with their content digests, and the step will be skipped. This is intended as a convenience to avoid redundant processing. To make a step run unconditionally, omit Step.progress_file and match_done.

steps:
  - name: match done example
    match_done:
      - one/specific/file.txt
      - any/text/*.txt
      - any/text/any/subdir/**/*.txt
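
A minimal sketch of the skip decision, assuming the patterns are matched with Python's `pathlib` glob semantics (where `**` spans any number of subdirectories) against the host side of each volume, that is, the keys of Step.volumes. The helpers `find_matches` and `already_done` are hypothetical, and Proceed's own matching may differ in details.

```python
from __future__ import annotations

from pathlib import Path
import tempfile


def find_matches(volume_dirs: list[str], patterns: list[str]) -> list[Path]:
    """Search every volume directory with the same list of glob patterns."""
    matches = set()
    for volume in volume_dirs:
        for pattern in patterns:
            matches.update(p for p in Path(volume).glob(pattern) if p.is_file())
    return sorted(matches)


def already_done(volume_dirs: list[str], match_done: list[str]) -> bool:
    # Any match at all means the step would be skipped.
    return bool(find_matches(volume_dirs, match_done))


with tempfile.TemporaryDirectory() as volume:
    patterns = ["any/text/any/subdir/**/*.txt"]
    print(already_done([volume], patterns))  # False
    nested = Path(volume, "any/text/any/subdir/a/b")
    nested.mkdir(parents=True)
    (nested / "result.txt").write_text("done")
    print(already_done([volume], patterns))  # True
```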

match_in: list[str]

File matching patterns to search for, before running the step.

This is a list of glob patterns to search for before running the step. Each of the step’s volumes will be searched with the same list of patterns.

Any matches found will be noted in the ExecutionRecord. match_in is intended to support audits by accounting for the input files that went into a step, along with their content digests. Unlike match_done, match_in does not affect step execution.

steps:
  - name: match in example
    match_in:
      - one/specific/file.txt
      - any/text/*.txt
      - any/text/any/subdir/**/*.txt
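
To illustrate what accounting for inputs with content digests might involve, this sketch records a digest for every file matched by match_in. SHA-256 and the flat `{path: digest}` layout are assumptions chosen for the example, and `digest_file` and `record_matches` are hypothetical helpers; the actual ExecutionRecord format is not shown here. The same approach would apply to match_out, run after the step instead of before it.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
import tempfile


def digest_file(path: Path) -> str:
    """Hash the file in chunks so large inputs are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def record_matches(volume_dirs: list[str], patterns: list[str]) -> dict[str, str]:
    """Hypothetical record: matched file path -> content digest."""
    record = {}
    for volume in volume_dirs:
        for pattern in patterns:
            for path in Path(volume).glob(pattern):
                if path.is_file():
                    record[str(path)] = digest_file(path)
    return record


with tempfile.TemporaryDirectory() as volume:
    Path(volume, "any/text").mkdir(parents=True)
    Path(volume, "any/text/input.txt").write_text("hello")
    for path, digest in record_matches([volume], ["any/text/*.txt"]).items():
        print(Path(path).name, digest[:15])  # input.txt sha256:2cf24dba
```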

match_out: list[str]

File matching patterns to search for, after running the step.

This is a list of glob patterns to search for after running the step. Each of the step’s volumes will be searched with the same list of patterns.

Any matches found will be noted in the ExecutionRecord. match_out is intended to support audits by accounting for the output files that came from a step, along with their content digests. Unlike match_done, match_out does not affect step execution.

steps:
  - name: match out example
    match_out:
      - one/specific/file.txt
      - any/text/*.txt
      - any/text/any/subdir/**/*.txt

match_summary: list[str]

File matching patterns to search for, after running the step, to include when summarizing results.

This is a list of glob patterns to search for after running the step. Each of the step’s volumes will be searched with the same list of patterns.

Any matches found will be noted in the ExecutionRecord. match_summary is intended to enrich pipeline execution summaries with custom columns. See StepResult.files_summary for how matched files are treated. Unlike match_done, match_summary does not affect step execution.

steps:
  - name: match summary example
    match_summary:
      - one/specific/file.txt
      - any/text/*.txt
      - any/text/any/subdir/**/*.txt

environment: dict[str, str]

Environment variables to set inside the step’s container.

This is a key-value mapping from environment variable names to values. The keys and values are both strings.

steps:
  - name: environment example
    environment:
      MLM_LICENSE_FILE: /license.lic
      foo: bar
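
For comparison with plain Docker, the environment mapping plays the same role as repeated `--env NAME=value` options to `docker run`. The sketch below builds those equivalent options from a mapping; `env_flags` is a hypothetical helper for illustration, and it does not imply that Proceed invokes the Docker CLI.

```python
def env_flags(environment: dict[str, str]) -> list[str]:
    """Hypothetical helper: turn an environment mapping into docker run --env options."""
    flags = []
    for name, value in environment.items():
        # Keys and values are both expected to be strings.
        if not isinstance(name, str) or not isinstance(value, str):
            raise TypeError(f"environment entries must be strings: {name!r}: {value!r}")
        flags.extend(["--env", f"{name}={value}"])
    return flags


print(env_flags({"MLM_LICENSE_FILE": "/license.lic", "foo": "bar"}))
# ['--env', 'MLM_LICENSE_FILE=/license.lic', '--env', 'foo=bar']
```

Note that a YAML value like `foo: 1` parses as an integer, so quoting it (`foo: "1"`) keeps it a string.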

gpus: str | bool = None

Whether or not to request GPU device support.

When gpus is True or another truthy value, Proceed requests GPU device support, similar to the Docker run --gpus all resource request. Note: the empty string "" will be treated as False.

steps:
  - name: gpus example
    gpus: true
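
The truthiness rule can be sketched in Python, where `bool("")` is False. `gpu_flags` is a hypothetical helper that maps the setting onto the equivalent `docker run --gpus all` option; it only illustrates the rule above.

```python
from __future__ import annotations


def gpu_flags(gpus: str | bool | None) -> list[str]:
    """Hypothetical helper: any truthy value requests all GPUs."""
    # None, False, and the empty string "" are all falsy, so no GPUs are requested.
    return ["--gpus", "all"] if gpus else []


print(gpu_flags(True))  # ['--gpus', 'all']
print(gpu_flags(""))    # []
```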

user: str | int = None

User (and group) to run as in the container, instead of container default (usually root).

When user is omitted or None the container will run with the default user and group specified in the image. This is usually root, or sometimes an image-specific user and group.

When user is provided it must be a string user name or int uid, with group/gid optional, as follows:

self or self:group: The special user name self means run with the uid of the current user on the Docker host. Optionally, this can be followed by a group name or gid, as in self:group. When this group is a string name, it must exist on the Docker host and will be converted to a host gid.
user or user:group: Other string user names and group names are used as-is and must exist within the image / container.
uid or uid:gid: Integer uids and gids don’t have to exist within the image / container. It’s probably helpful if they exist on the Docker host.

steps:
  - name: default/root user example

steps:
  - name: host current user and group example
    user: self

steps:
  - name: existing container user example
    user: container-user

steps:
  - name: integer uid and gid example
    user: 1234:5678
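
The resolution rules above can be sketched in Python. `resolve_user` is hypothetical and returns a Docker-style `user[:group]` string; it relies on the POSIX-only `os.getuid` and `grp` module, so it will not run on Windows, and Proceed's own handling may differ.

```python
from __future__ import annotations

import grp
import os


def resolve_user(user: str | int | None) -> str | None:
    """Hypothetical sketch: turn Step.user into a Docker-style 'user[:group]' string."""
    if user is None:
        return None  # keep the image's default user and group
    if isinstance(user, int):
        return str(user)  # a bare uid is used as-is
    name, _, group = user.partition(":")
    if name == "self":
        # "self" means the uid of the current user on the Docker host.
        name = str(os.getuid())
        if group and not group.isdigit():
            # A group name after "self:" must exist on the host; convert it to a gid.
            group = str(grp.getgrnam(group).gr_gid)
    # Other names, uids, and gids are passed through unchanged.
    return f"{name}:{group}" if group else name


print(resolve_user("container-user"))  # container-user
print(resolve_user("1234:5678"))       # 1234:5678
```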

network_mode: str = None

How to configure the container’s network environment.

When provided, this should be one of the following network modes:

bridge: create an isolated network environment for the container (default)
none: disable networking for the container
container:<name|id>: reuse the network of another container, by name or id
host: make the container’s network environment just like the host’s
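
A small validator makes the accepted forms explicit. `check_network_mode` is a hypothetical helper that accepts only the modes listed above and treats an omitted value as bridge; Docker itself also accepts other values, such as the names of user-defined networks, which this sketch rejects.

```python
from __future__ import annotations


def check_network_mode(mode: str | None) -> str:
    """Hypothetical validator for the network modes listed above."""
    if mode is None:
        return "bridge"  # documented default
    if mode in ("bridge", "none", "host"):
        return mode
    prefix = "container:"
    if mode.startswith(prefix) and len(mode) > len(prefix):
        return mode  # reuse another container's network, by name or id
    raise ValueError(f"unsupported network_mode: {mode!r}")


print(check_network_mode(None))               # bridge
print(check_network_mode("container:my-db"))  # container:my-db
```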

mac_address: str = None

Arbitrary MAC address to set in the container.

Perhaps surprisingly, containers can have arbitrary MAC “hardware” addresses.

steps:
  - name: mac address example
    mac_address: aa:bb:cc:dd:ee:ff
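
Even an arbitrary address has to be well formed. The sketch below checks the colon-separated form used in the example and, as a general IEEE 802 convention rather than a Proceed rule, whether the address is a locally administered unicast address, which avoids clashing with vendor-assigned hardware addresses. Both helper names are hypothetical.

```python
import re

_MAC = re.compile(r"[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5}")


def is_mac_address(address: str) -> bool:
    """True for six colon-separated hex octets, like aa:bb:cc:dd:ee:ff."""
    return _MAC.fullmatch(address) is not None


def is_local_unicast(address: str) -> bool:
    """Check the two low bits of the first octet (IEEE 802 conventions)."""
    first = int(address.split(":")[0], 16)
    locally_administered = bool(first & 0b10)
    multicast = bool(first & 0b01)
    return locally_administered and not multicast


print(is_mac_address("aa:bb:cc:dd:ee:ff"), is_local_unicast("aa:bb:cc:dd:ee:ff"))  # True True
```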

shm_size: str = None

Maximum size of /dev/shm, the shared-memory, in-memory file system.

Docker defaults /dev/shm to 64 megabytes. Steps that need more can use shm_size to increase this limit. Integer values will be treated as bytes, for example 1000. Values with a unit suffix will use larger units, for example 10b, 10k, 10m, or 10g.

steps:
  - name: more-shm
    shm_size: 2g
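
The size rules can be sketched as a small parser. `shm_size_bytes` is hypothetical, and it assumes 1024-based multipliers (so 1k is 1024 bytes), as Docker's memory-size options use; under that assumption Docker's 64 megabyte default is 67108864 bytes.

```python
from __future__ import annotations

# Assumption: binary (1024-based) multipliers for the k, m, and g suffixes.
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def shm_size_bytes(size: int | str) -> int:
    """Hypothetical parser: 1000 -> bytes, '10k' / '10m' / '2g' -> scaled bytes."""
    if isinstance(size, int):
        return size  # integers are already bytes
    text = size.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # a bare number given as a string


print(shm_size_bytes("2g"))  # 2147483648
print(shm_size_bytes(1000))  # 1000
```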

privileged: bool = False

Whether the step’s container should run with elevated privileges and device access.

This defaults to False. Please only set privileged to True temporarily, for troubleshooting!

steps:
  - name: elevated-privileged
    privileged: True

X11: bool = False

Whether to set up the container as an X11 client app with DISPLAY access.

This defaults to False, assuming most steps are noninteractive. Set X11 to True to set up the container as an X11 GUI client app with DISPLAY access. This will modify the container environment in a few ways:

DISPLAY: Proceed will set the DISPLAY environment variable in the step container to match the host environment.
/tmp/.X11-unix: If the /tmp/.X11-unix directory exists on the host, Proceed will add it to the step’s Step.volumes. This lets the step container access local Unix sockets for connecting to a local X server.
Step.network_mode host: Proceed will set the step’s network mode to host. This lets the step container access TCP sockets for connecting to a remote/proxied X server as with ssh -X or ssh -Y.
XAUTHORITY: Proceed will set up the XAUTHORITY environment variable and .Xauthority cookie file based on the host environment. If the XAUTHORITY variable is set in the host environment, Proceed will use this file path to locate the cookie file. Otherwise, Proceed will use the default cookie file path, which is the current host user’s $HOME/.Xauthority. If the cookie file exists on the host, Proceed will add it to the step’s Step.volumes. Proceed will bind the cookie file to a fixed, known path in the container, like /var/.Xauthority, and will set the XAUTHORITY environment variable in the container to the same known path. Using a fixed path for the cookie file should avoid any dependency on the container user or HOME configuration (or lack thereof). All of this lets the step container authenticate with a remote/proxied X server as with ssh -X or ssh -Y.

steps:
  - name: x11-gui-client
    X11: True
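
The four adjustments above can be summarized in a Python sketch that computes the extra environment, volumes, and network mode for an X11 step. `x11_settings` is hypothetical; /var/.Xauthority is taken from the example path above, while binding /tmp/.X11-unix to the same path inside the container and mounting the cookie file read-only are assumptions made for the illustration.

```python
from __future__ import annotations

import os
from pathlib import Path

CONTAINER_XAUTHORITY = "/var/.Xauthority"  # example fixed path from the text above


def x11_settings(host_env: dict[str, str] | None = None) -> dict:
    """Hypothetical sketch of the X11 container adjustments."""
    env = dict(os.environ if host_env is None else host_env)
    environment: dict[str, str] = {}
    volumes: dict[str, dict[str, str]] = {}

    # DISPLAY: copy the host's value into the container.
    if "DISPLAY" in env:
        environment["DISPLAY"] = env["DISPLAY"]

    # /tmp/.X11-unix: share local X server sockets, if present on the host.
    if Path("/tmp/.X11-unix").is_dir():
        volumes["/tmp/.X11-unix"] = {"bind": "/tmp/.X11-unix", "mode": "rw"}

    # XAUTHORITY: the host's variable if set, else $HOME/.Xauthority.
    cookie = env.get("XAUTHORITY") or os.path.join(
        env.get("HOME") or str(Path.home()), ".Xauthority"
    )
    if os.path.isfile(cookie):
        volumes[cookie] = {"bind": CONTAINER_XAUTHORITY, "mode": "ro"}
        environment["XAUTHORITY"] = CONTAINER_XAUTHORITY

    # network_mode host: reach TCP-forwarded displays, as with ssh -X or ssh -Y.
    return {"environment": environment, "volumes": volumes, "network_mode": "host"}


settings = x11_settings({"DISPLAY": ":0", "HOME": "/nonexistent"})
print(settings["environment"], settings["network_mode"])  # {'DISPLAY': ':0'} host
```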
