Tutorial: Hello World
=====================

Here is a "hello world" example for Proceed.
It will show you how to use YAML to declare a pipeline that has one step.
This should give you a feel for the syntax of Proceed's input pipeline spec and output execution record.

pipeline spec
-------------

Create a new file called ``hello_world.yaml`` with the following content:

.. code-block:: yaml

  steps:
    - name: step one
      image: ubuntu
      command: echo "hello world"

This is about as simple as a Proceed pipeline can get.
It has a single step that prints "hello world" in an Ubuntu container.
The Proceed API docs have more details about what can go in the :class:`proceed.model.Pipeline` spec.

pipeline execution
------------------

Execute the pipeline using the ``proceed`` command:

.. code-block:: shell

  $ proceed run hello_world.yaml

Proceed logs to stdout what it intends to do next, what happened, and when.
If all goes well you won't need to know all of that.
But if something unexpected happens, or if you revisit things later, this level of detail should help.

.. code-block:: shell

  2023-03-21 14:48:56,598 [INFO] Proceed 0.0.1
  2023-03-21 14:48:56,598 [INFO] Using output directory: proceed_out/hello_world/20230321T184856UTC
  2023-03-21 14:48:56,598 [INFO] Parsing pipeline specification from: hello_world.yaml
  2023-03-21 14:48:56,600 [INFO] Running pipeline with args: {}
  2023-03-21 14:48:56,600 [INFO] Starting pipeline run.
  2023-03-21 14:48:56,600 [INFO] Step 'step one': starting.
  2023-03-21 14:48:56,600 [INFO] Step 'step one': found 0 input files.
  2023-03-21 14:48:57,129 [INFO] Step 'step one': waiting for process to complete.
  2023-03-21 14:48:57,137 [INFO] Step 'step one': hello world
  2023-03-21 14:48:57,417 [INFO] Step 'step one': process completed with exit code 0
  2023-03-21 14:48:57,441 [INFO] Step 'step one': found 0 output files.
  2023-03-21 14:48:57,441 [INFO] Step 'step one': finished.
  2023-03-21 14:48:57,451 [INFO] Finished pipeline run.
  2023-03-21 14:48:57,451 [INFO] Writing execution record to: proceed_out/hello_world/20230321T184856UTC/execution_record.yaml
  2023-03-21 14:48:57,456 [INFO] Completed 1 steps successfully.
  2023-03-21 14:48:57,457 [INFO] OK.

In this simple example, the key result is the "hello world" part in the middle:

.. code-block:: shell

  2023-03-21 14:48:57,137 [INFO] Step 'step one': hello world

auditable outputs
-----------------

In addition to the main log, Proceed writes several files into a working subdirectory.
These are intended to capture exactly what happened and to make the pipeline execution auditable.

.. code-block:: shell

  proceed_out/
  │
  ├─ hello_world/
  │  │
  │  ├─ 20230321T184856UTC/
  │  │  │
  │  │  ├─ proceed.log
  │  │  ├─ step_one.log
  │  │  ├─ execution_record.yaml

The subdirectories are named like this:

.. code-block:: shell

  proceed_out/
  │
  ├─ <results group>/
  │  │
  │  ├─ <results id>/
  │  │  │
  │  │  ├─ *.log
  │  │  ├─ execution_record.yaml

This default scheme should keep the outputs reasonably organized and should prevent collisions between executions.
In the example above, the group is named after the pipeline spec file and the id is the UTC time when the run started.
You can customize the output scheme if you want.
See ``proceed --help`` for the options ``--results-dir``, ``--results-group``, and ``--results-id``.

proceed.log
...........

As shown above, Proceed writes its runtime log to stdout.
It also writes a copy of the same log to the working subdirectory in ``proceed.log``.

.. code-block:: shell

  $ cat proceed_out/hello_world/20230321T184856UTC/proceed.log

  2023-03-21 11:35:44,951 [INFO] Proceed 0.0.1
  # ... a copy of the console log above ...
  2023-03-21 11:35:45,815 [INFO] OK.

step logs
.........

Proceed also writes the runtime log of each step to its own separate file.
This includes the stdout and stderr of the step's container process.
You can see the same output copied into the main ``proceed.log``.
But the individual step logs are focused on their own steps and omit prefixes like ``[INFO]``.

.. code-block:: shell

  $ cat proceed_out/hello_world/20230321T184856UTC/step_one.log

  hello world

execution record
................

In addition to these log files, Proceed saves an execution record for each run.
This is an auditable record of facts like:

- the pipeline spec that was used
- results for each step like image id, exit code, timing, and checksums of input and output files
- overall timing

.. code-block:: shell

  $ cat proceed_out/hello_world/20230321T184856UTC/execution_record.yaml

.. code-block:: yaml

  original:
    version: 0.0.1
    steps:
    - {name: step one, image: ubuntu, command: echo "hello world"}
  amended:
    version: 0.0.1
    steps:
    - {name: step one, image: ubuntu, command: echo "hello world"}
  timing: {start: '2023-03-21T18:48:56.600323+00:00', finish: '2023-03-21T18:48:57.451028+00:00', duration: 0.850705}
  step_results:
  - name: step one
    image_id: sha256:08d22c0ceb150ddeb2237c5fa3129c0183f3cc6f5eeb2e7aa4016da3ad02140a
    exit_code: 0
    log_file: proceed_out/hello_world/20230321T184856UTC/step_one.log
    timing: {start: '2023-03-21T18:48:56.600597+00:00', finish: '2023-03-21T18:48:57.441764+00:00', duration: 0.841167}
    skipped: false

Here is some explanation of this "hello world" execution record.

``original``
  This is the input pipeline spec, as parsed from ``hello_world.yaml``.
  The YAML formatting may differ somewhat from the input spec, but the content will be equivalent.

``amended``
  This is a version of the original, potentially altered at runtime by :attr:`proceed.model.Pipeline.args` and a :attr:`proceed.model.Pipeline.prototype`.
  The ``amended`` version is what actually gets executed, so it's worth recording this explicitly.
  In this example, the ``original`` and ``amended`` versions are the same.

``timing``
  This records UTC datetimes when the pipeline started and finished, and the duration in seconds.

``step_results``
  This is a list of :class:`proceed.model.StepResult`, one for each of the input :attr:`proceed.model.Pipeline.steps`.
  These step results will contain many of the interesting, auditable facts like unique image id, process exit code, and checksums of input and output files.
  See the linked API docs for more details.

This example is about as simple as a Proceed execution record gets.
The API docs for :class:`proceed.model.ExecutionRecord` lead to more examples of what can be included.
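
To give a feel for what "auditable" can mean in practice, here is a minimal sketch of a check you might run against an execution record.
It is not part of Proceed: the ``elapsed_seconds`` and ``audit`` helpers are hypothetical names made up for this illustration, and the record is written out as a plain Python dict with values copied from the example above, rather than loaded from ``execution_record.yaml`` with a YAML parser.
The sketch confirms that each recorded ``duration`` agrees with its ``start`` and ``finish`` datetimes, and that each step has ``exit_code`` 0.

.. code-block:: python

  from datetime import datetime

  # Values copied by hand from the "hello world" execution record above.
  # Only the fields this sketch checks are included.
  record = {
      "timing": {
          "start": "2023-03-21T18:48:56.600323+00:00",
          "finish": "2023-03-21T18:48:57.451028+00:00",
          "duration": 0.850705,
      },
      "step_results": [
          {
              "name": "step one",
              "exit_code": 0,
              "timing": {
                  "start": "2023-03-21T18:48:56.600597+00:00",
                  "finish": "2023-03-21T18:48:57.441764+00:00",
                  "duration": 0.841167,
              },
          }
      ],
  }


  def elapsed_seconds(timing):
      """Return finish minus start, in seconds, for one timing entry."""
      start = datetime.fromisoformat(timing["start"])
      finish = datetime.fromisoformat(timing["finish"])
      return (finish - start).total_seconds()


  def audit(record, tolerance=1e-6):
      """Hypothetical helper: return a list of problems, empty if none."""
      problems = []
      overall = record["timing"]
      if abs(elapsed_seconds(overall) - overall["duration"]) > tolerance:
          problems.append("pipeline: duration does not match start and finish")
      for step in record["step_results"]:
          name = step["name"]
          if step["exit_code"] != 0:
              problems.append(f"{name}: nonzero exit code {step['exit_code']}")
          timing = step["timing"]
          if abs(elapsed_seconds(timing) - timing["duration"]) > tolerance:
              problems.append(f"{name}: duration does not match start and finish")
      return problems


  # An empty list means the recorded facts are internally consistent.
  print(audit(record))

For the record shown in this tutorial the list comes back empty.
A real audit would likely also compare image ids and file checksums, which this sketch leaves out.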