Tutorial: Hello World
Here is a “hello world” example for Proceed. It will show you how to use YAML to declare a pipeline that has one step. This should give you a feel for the syntax of Proceed’s input pipeline spec and output execution record.
pipeline spec
Create a new file called hello_world.yaml with the following content:
steps:
  - name: step one
    image: ubuntu
    command: echo "hello world"
This is about as simple as a Proceed pipeline can get. It has a single step that prints “hello world” in an Ubuntu container.
The Proceed API docs have more details about what can go in the proceed.model.Pipeline spec.
pipeline execution
Execute the pipeline using the proceed command:
$ proceed run hello_world.yaml
Proceed logs to stdout what it intends to do next, what happened, and when. If all goes well you won’t need to know all of that. But if something unexpected happens or if you are revisiting things at a later time, this level of detail should help.
2023-03-21 14:48:56,598 [INFO] Proceed 0.0.1
2023-03-21 14:48:56,598 [INFO] Using output directory: proceed_out/hello_world/20230321T184856UTC
2023-03-21 14:48:56,598 [INFO] Parsing pipeline specification from: hello_world.yaml
2023-03-21 14:48:56,600 [INFO] Running pipeline with args: {}
2023-03-21 14:48:56,600 [INFO] Starting pipeline run.
2023-03-21 14:48:56,600 [INFO] Step 'step one': starting.
2023-03-21 14:48:56,600 [INFO] Step 'step one': found 0 input files.
2023-03-21 14:48:57,129 [INFO] Step 'step one': waiting for process to complete.
2023-03-21 14:48:57,137 [INFO] Step 'step one': hello world
2023-03-21 14:48:57,417 [INFO] Step 'step one': process completed with exit code 0
2023-03-21 14:48:57,441 [INFO] Step 'step one': found 0 output files.
2023-03-21 14:48:57,441 [INFO] Step 'step one': finished.
2023-03-21 14:48:57,451 [INFO] Finished pipeline run.
2023-03-21 14:48:57,451 [INFO] Writing execution record to: proceed_out/hello_world/20230321T184856UTC/execution_record.yaml
2023-03-21 14:48:57,456 [INFO] Completed 1 steps successfully.
2023-03-21 14:48:57,457 [INFO] OK.
In this simple example, the key result is the “hello world” part in the middle:
2023-03-21 14:48:57,137 [INFO] Step 'step one': hello world
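If you ever want to pick lines like this out of the log with a script, the layout shown above (timestamp, level in brackets, message) is straightforward to parse. The following is a minimal sketch and not part of Proceed: the regular expression and the names LogLine and parse_log_line are assumptions inferred only from the sample lines in this tutorial.

```python
import re
from typing import NamedTuple, Optional


class LogLine(NamedTuple):
    timestamp: str
    level: str
    message: str


# Pattern inferred from the sample log in this tutorial (an assumption,
# not a documented format): "<date> <time>,<millis> [<LEVEL>] <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<level>[A-Z]+)\] (?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[LogLine]:
    """Split one log line into timestamp, level, and message, or return None."""
    match = LOG_PATTERN.match(line.rstrip("\n"))
    if match is None:
        return None
    return LogLine(match["timestamp"], match["level"], match["message"])


sample = "2023-03-21 14:48:57,137 [INFO] Step 'step one': hello world"
parsed = parse_log_line(sample)
print(parsed.message)
```

Lines that do not match the assumed layout, such as raw container output, come back as None rather than raising an error.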
auditable outputs
In addition to the main log, Proceed writes several files into a working subdirectory. These are intended to capture exactly what happened and to make the pipeline execution auditable.
proceed_out/
│
├─ hello_world/
│ │
│ ├─ 20230321T184856UTC/
│ │ │
│ │ ├─ proceed.log
│ │ ├─ step_one.log
│ │ ├─ execution_record.yaml
The subdirectories are named like this:
proceed_out/
│
├─ <name of the pipeline file>/
│ │
│ ├─ <execution datetime>/
│ │ │
│ │ ├─ *.log
│ │ ├─ execution_record.yaml
This default scheme should keep the outputs reasonably organized and should prevent collisions between executions.
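To make the naming scheme concrete, here is a small sketch that rebuilds the example path from a pipeline file name and a start time. It is an illustration only: the function name default_output_dir is hypothetical, and the timestamp layout is inferred from the directory name in this tutorial rather than taken from Proceed's source.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def default_output_dir(spec_file: str, when: datetime, root: str = "proceed_out") -> Path:
    """Build <root>/<pipeline file name without extension>/<UTC datetime>."""
    group = Path(spec_file).stem
    # Layout copied from the example directory name, e.g. 20230321T184856UTC.
    run_id = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SUTC")
    return Path(root) / group / run_id


# The console log shows 14:48:56 local time; the directory name uses UTC.
# The -4 hour offset is assumed here so the two values line up.
local_start = datetime(2023, 3, 21, 14, 48, 56, tzinfo=timezone(timedelta(hours=-4)))
print(default_output_dir("hello_world.yaml", local_start).as_posix())
```

Because the start time is included down to the second, two runs of the same pipeline file would land in different directories unless they start within the same second.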
You can customize the output scheme if you want. See proceed --help for the options --results-dir, --results-group, and --results-id.
proceed.log
As shown above, Proceed writes its runtime log to stdout.
It also writes a copy of the same log to the working subdirectory in proceed.log.
$ cat proceed_out/hello_world/20230321T184856UTC/proceed.log
2023-03-21 14:48:56,598 [INFO] Proceed 0.0.1
# ... a copy of the console log above ...
2023-03-21 14:48:57,457 [INFO] OK.
step logs
Proceed also writes the runtime log of each step to its own, separate file.
This includes the stdout and stderr of the step’s container process.
You can see the same output copied into the main proceed.log.
But the individual step logs are focused on their own steps and omit prefixes like [INFO].
$ cat proceed_out/hello_world/20230321T184856UTC/step_one.log
hello world
execution record
In addition to these log files, Proceed saves an execution record for each run. This is an auditable record of facts like:
the pipeline spec that was used
results for each step like image id, exit code, timing, and checksums of input and output files
overall timing
$ cat proceed_out/hello_world/20230321T184856UTC/execution_record.yaml
original:
  version: 0.0.1
  steps:
    - {name: step one, image: ubuntu, command: echo "hello world"}
amended:
  version: 0.0.1
  steps:
    - {name: step one, image: ubuntu, command: echo "hello world"}
timing: {start: '2023-03-21T18:48:56.600323+00:00', finish: '2023-03-21T18:48:57.451028+00:00', duration: 0.850705}
step_results:
  - name: step one
    image_id: sha256:08d22c0ceb150ddeb2237c5fa3129c0183f3cc6f5eeb2e7aa4016da3ad02140a
    exit_code: 0
    log_file: proceed_out/hello_world/20230321T184856UTC/step_one.log
    timing: {start: '2023-03-21T18:48:56.600597+00:00', finish: '2023-03-21T18:48:57.441764+00:00', duration: 0.841167}
    skipped: false
Here is some explanation of this “hello world” execution record.
original
This is the input pipeline spec, as parsed from hello_world.yaml. The YAML formatting may differ somewhat from the input spec, but the content will be equivalent.
amended
This is a version of the original, potentially altered at runtime by proceed.model.Pipeline.args and a proceed.model.Pipeline.prototype. The amended version is what actually gets executed, so it’s worth recording this explicitly. In this example, the original and amended versions are the same.
timing
This records UTC datetimes when the pipeline started and finished, and the duration in seconds.
step_results
This is a list of proceed.model.StepResult, one for each of the input proceed.model.Pipeline.steps. These step results will contain many of the interesting, auditable facts like unique image id, process exit code, and checksums of input and output files. See the linked API docs for more details.
This example is about as simple as a Proceed execution record gets.
The API docs for proceed.model.ExecutionRecord lead to more examples of what can be included.
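Because the execution record is plain YAML, it can also be checked by a script after a run. The sketch below is a hedged illustration rather than a Proceed API: the helper names failed_steps and elapsed_seconds are hypothetical, the record is a hand-copied subset of the example above written as a Python dict, and treating any non-skipped step without exit code 0 as a failure is an assumption. In practice you would load the file with a YAML parser of your choice.

```python
from datetime import datetime

# A hand-copied subset of the execution record shown in this tutorial.
record = {
    "timing": {
        "start": "2023-03-21T18:48:56.600323+00:00",
        "finish": "2023-03-21T18:48:57.451028+00:00",
        "duration": 0.850705,
    },
    "step_results": [
        {"name": "step one", "exit_code": 0, "skipped": False},
    ],
}


def failed_steps(record: dict) -> list:
    """Return names of steps that were not skipped and did not record exit code 0."""
    return [
        result["name"]
        for result in record.get("step_results", [])
        if not result.get("skipped", False) and result.get("exit_code") != 0
    ]


def elapsed_seconds(timing: dict) -> float:
    """Recompute the duration from the recorded start and finish datetimes."""
    start = datetime.fromisoformat(timing["start"])
    finish = datetime.fromisoformat(timing["finish"])
    return (finish - start).total_seconds()


print(failed_steps(record))
print(round(elapsed_seconds(record["timing"]), 6))
```

Recomputing the duration from the start and finish values is a quick consistency check on a record you are auditing.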