
feat: add retry option and count to github action #953

@tobiasdemendonca

Description


Our CI relies on the submit GitHub action. It is currently quite flaky, and most failures come from the following kind of error, in which MAAS fails to deploy a machine:

  This job is waiting on a node to become available.
  **********************************************
  * Starting testflinger setup phase on <name> *
  **********************************************
  Cleaning up container if it exists...
  **************************************************
  * Starting testflinger provision phase on <name> *
  **************************************************
  2026-03-01 00:32:16,961 <name> INFO: DEVICE CONNECTOR: Running pre-provision hook
  2026-03-01 00:32:16,962 <name> INFO: DEVICE CONNECTOR: MAAS: Reading node's block device information
  2026-03-01 00:32:19,157 <name> INFO: DEVICE CONNECTOR: BEGIN provision
  2026-03-01 00:32:19,157 <name> INFO: DEVICE CONNECTOR: Provisioning device
  2026-03-01 00:32:19,157 <name> INFO: DEVICE CONNECTOR: MAAS: Fixing EFI boot order before provisioning
  2026-03-01 00:33:07,545 <name> INFO: DEVICE CONNECTOR: MAAS: Fixing EFI boot order before provisioning
  2026-03-01 00:33:55,931 <name> INFO: DEVICE CONNECTOR: MAAS: Releasing node <name>
  2026-03-01 00:34:11,260 <name> INFO: DEVICE CONNECTOR: MAAS: Successfully released node <name>
  2026-03-01 00:34:13,891 <name> WARNING: DEVICE CONNECTOR: MAAS: 'default_disks' and/or 'disks' unspecified; setting default storage layout to flat
  2026-03-01 00:34:16,445 <name> INFO: DEVICE CONNECTOR: MAAS: Acquiring node
  2026-03-01 00:34:19,817 <name> INFO: DEVICE CONNECTOR: MAAS: Starting node <name> with distro noble
  2026-03-01 00:34:39,785 <name> INFO: DEVICE CONNECTOR: MAAS: Timeout value: 60 minutes.
  2026-03-01 00:35:39,846 <name> INFO: DEVICE CONNECTOR: MAAS: 1 minutes passed since deployment.
  2026-03-01 00:36:42,533 <name> INFO: DEVICE CONNECTOR: MAAS: 2 minutes passed since deployment.
  2026-03-01 00:37:45,244 <name> INFO: DEVICE CONNECTOR: MAAS: 3 minutes passed since deployment.
  2026-03-01 00:38:47,763 <name> INFO: DEVICE CONNECTOR: MAAS: 4 minutes passed since deployment.
  2026-03-01 00:38:50,301 <name> ERROR: DEVICE CONNECTOR: MAAS: MAAS reports Failed Deployment
  2026-03-01 00:38:50,301 <name> ERROR: DEVICE CONNECTOR: Provisioning failed: Provisioning failed because MAAS got unexpected or deployment failure status signal.
  2026-03-01 00:38:50,301 <name> INFO: DEVICE CONNECTOR: END provision
  2026-03-01 00:38:50,301 <name> ERROR: DEVICE CONNECTOR: Provisioning failed because MAAS got unexpected or deployment failure status signal.

To reduce flakiness, a retry option for this action could be useful, ideally one that also lets users specify the number of retries.
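A minimal sketch of the requested semantics, in Python. It assumes a hypothetical `submit_job` callable that performs one Testflinger submission and returns `True` if the job succeeded, and a hypothetical user-supplied `retries` count. Neither exists in the current action; they only illustrate how a retry count could behave.

```python
from typing import Callable, Tuple


def submit_with_retries(submit_job: Callable[[], bool], retries: int = 0) -> Tuple[bool, int]:
    """Call submit_job until it succeeds or the retry budget is exhausted.

    submit_job: hypothetical callable standing in for one job submission;
        returns True on success (e.g. MAAS deployed and the job passed).
    retries: hypothetical user option; extra attempts after the first one,
        so 0 means a single attempt and 2 means up to three attempts.
    Returns (succeeded, attempts_made).
    """
    if retries < 0:
        raise ValueError("retries must be non-negative")
    for attempt in range(1, retries + 2):
        if submit_job():
            return True, attempt
    # Every attempt failed; report the total number of attempts made.
    return False, retries + 1


if __name__ == "__main__":
    # Simulate two failed MAAS deployments followed by a successful one.
    outcomes = iter([False, False, True])
    print(submit_with_retries(lambda: next(outcomes), retries=2))  # prints (True, 3)
```

Counting `retries` as attempts on top of the first one means a default of 0 would keep the action's current single-attempt behaviour.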

Action version used:
canonical/testflinger/.github/actions/submit@987fd7ab1467065217a9c6496eb2c3d85a250509
