Error when trying to upgrade consul version #22797

@samuelvgo

Overview of the Issue

Upgrading from 1.12.9 to any of the higher versions we tried leaves the cluster unable to start: the agent fails while restoring the Raft snapshot.
We tried upgrading to 1.13.9, 1.14.11, and 1.15.10, making the necessary configuration adjustments before each attempt.

Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: starting restore from snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: snapshot restore progress: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 read-bytes=53 percent-complete="0.02%"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: failed to restore snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 error="object missing primary index"
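
To make the failing line easier to compare across upgrade attempts, the key=value fields can be pulled out with a small script. This is a minimal sketch, not part of Consul: the function name `parse_restore_failure` is made up for illustration, and it assumes the snapshot ID is laid out as term, index, and a millisecond Unix timestamp, which is consistent with the `last-term` and `last-index` fields on the same line.

```python
import re
from datetime import datetime, timezone

# The failing line exactly as quoted above.
LINE = (
    'Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: '
    'failed to restore snapshot: id=2-16482438-1758289018472 '
    'last-index=16482438 last-term=2 size-in-bytes=291125 '
    'error="object missing primary index"'
)

# key=value pairs; values are either double-quoted or run to the next space.
FIELD = re.compile(r'([\w-]+)=("[^"]*"|\S+)')


def parse_restore_failure(line):
    """Return the snapshot fields of a 'failed to restore snapshot' line, else None."""
    if "failed to restore snapshot" not in line:
        return None
    fields = {key: value.strip('"') for key, value in FIELD.findall(line)}
    # Assumption: the ID is "<term>-<index>-<unix time in milliseconds>".
    term, index, millis = (int(part) for part in fields["id"].split("-"))
    created = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    created = created.replace(microsecond=(millis % 1000) * 1000)
    return {
        "id": fields["id"],
        "term": term,
        "index": index,
        "created": created,
        "size_bytes": int(fields["size-in-bytes"]),
        "error": fields["error"],
    }


if __name__ == "__main__":
    print(parse_restore_failure(LINE))
```

If the ID layout assumption holds, the timestamp component decodes to 2025-09-19 13:36:58 UTC, which would mean the snapshot was written a little under an hour before the failed start, while the node was presumably still running the old version.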

Reproduction Steps

Steps to reproduce this issue:

  1. Start with a single-node cluster running 1.12.9 (single node because this is not our production environment).
  2. Adjust the config files under the /etc/consul directory to match the version you want to upgrade to.
  3. Run apt install consul=<version>.
  4. See the error in the Log Fragments section below.
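
Before repeating these steps, it may help to record which snapshots exist on disk so the one named in the error can be matched to a directory. The sketch below is an assumption-laden helper, not a Consul tool: the function name `list_raft_snapshots` is hypothetical, and it assumes the usual on-disk layout in which each snapshot lives in `<data_dir>/raft/snapshots/<id>/` next to a `meta.json` file with `Index`, `Term`, and `Size` keys. It only reads files and does not modify anything.

```python
import json
from pathlib import Path


def list_raft_snapshots(data_dir):
    """List snapshot directories under data_dir that contain a readable meta.json.

    Assumed layout: <data_dir>/raft/snapshots/<snapshot id>/meta.json
    """
    snapshot_root = Path(data_dir) / "raft" / "snapshots"
    found = []
    if not snapshot_root.is_dir():
        return found
    for entry in sorted(snapshot_root.iterdir()):
        meta_file = entry / "meta.json"
        if not meta_file.is_file():
            continue
        try:
            meta = json.loads(meta_file.read_text())
        except (OSError, json.JSONDecodeError):
            # Skip unreadable or malformed metadata instead of failing.
            continue
        found.append(
            {
                "name": entry.name,
                "index": meta.get("Index"),
                "term": meta.get("Term"),
                "size": meta.get("Size"),
            }
        )
    return found
```

With the `data_dir` from the configuration in this report, the call would be `list_raft_snapshots("/var/lib/consul")`, run as a user that can read that directory. Copying the whole data directory aside before each upgrade attempt is a sensible precaution on a single-node setup.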

Consul info for both Client and Server

Client info
consul info
Error querying agent: Get "http://127.0.0.1:8500/v1/agent/self": dial tcp 127.0.0.1:8500: connect: connection refused
Agent configuration (JSON):
{
  "log_level": "TRACE",
  "enable_syslog": true,
  "log_file": "/var/log/consul/consul.log",
  "syslog_facility": "LOCAL0",
  "log_rotate_duration": "24h",
  "log_rotate_max_files": 3,
  "enable_script_checks": false,
  "server_name": "integration-dt-ci",
  "datacenter": "integration",
  "primary_datacenter": "integration",
  "bind_addr": "172.30.8.179",
  "client_addr": "127.0.0.1",
  "data_dir":"/var/lib/consul",
  "tls": {
    "defaults": {
      "key_file": "/etc/consul/ssl/integration-dt-ci.key",
      "cert_file": "/etc/consul/ssl/integration-dt-ci.pem",
      "ca_file": "/etc/consul/ssl/ca.pem",
      "verify_incoming": true,
      "verify_outgoing": true
    },
    "internal_rpc": {
      "verify_server_hostname": true
    },
    "grpc": {
      "verify_incoming": false,
      "use_auto_cert": true
    }
  },
  "enable_central_service_config": true,
  "enable_local_script_checks": true,
  "ui_config": {
    "enabled": true,
    "metrics_provider": "prometheus",
    "metrics_proxy": {
       "base_url": "http://prometheus.service.consul:9090"
    }
  },
  "connect": {
      "enabled": true
  },
  "addresses": {
      "http": "{{ GetAllInterfaces | include \"flags\" \"loopback\" | join \"address\" \" \" }} {{ GetInterfaceIP \"nomad\" }}"
  },
  "ports": {
    "grpc": 8502,
    "grpc_tls": 8503
  },
  "acl": {
    "enabled": false,
    "default_policy": "deny",
    "down_policy": "extend-cache",
    "enable_token_persistence": true,
    "enable_token_replication": true,
    "tokens": {
      "agent": ""
    }
  },
  "limits": {
    "http_max_conns_per_client": 2000
  }
}
Server info
consul info
Error querying agent: Get "http://127.0.0.1:8500/v1/agent/self": dial tcp 127.0.0.1:8500: connect: connection refused
Server agent HCL config

Operating system and Environment details

VM on GCP running Debian 10, managed via Terraform and Puppet

Log Fragments

Sep 19 14:31:39 integration-dt-ci systemd[1]: Starting consul agent...
Sep 19 14:31:40 integration-dt-ci bash[5167]: /bin/bash: connect: Connection refused
Sep 19 14:31:40 integration-dt-ci bash[5167]: /bin/bash: /dev/tcp/localhost/8502: Connection refused
Sep 19 14:31:40 integration-dt-ci consul[5166]: ==> Starting Consul agent...
Sep 19 14:31:40 integration-dt-ci consul[5166]:               Version: '1.15.10'
Sep 19 14:31:40 integration-dt-ci consul[5166]:            Build Date: '2024-02-13 18:30:20 +0000 UTC'
Sep 19 14:31:40 integration-dt-ci consul[5166]:               Node ID: '93a0a9fe-bf84-eb66-4165-c1453a578c54'
Sep 19 14:31:40 integration-dt-ci consul[5166]:             Node name: 'integration-dt-ci'
Sep 19 14:31:40 integration-dt-ci consul[5166]:            Datacenter: 'integration' (Segment: '<all>')
Sep 19 14:31:40 integration-dt-ci consul[5166]:                Server: true (Bootstrap: true)
Sep 19 14:31:40 integration-dt-ci consul[5166]:           Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: 8502, gRPC-TLS: 8503, DNS: 8600)
Sep 19 14:31:40 integration-dt-ci consul[5166]:          Cluster Addr: 172.30.8.179 (LAN: 9301, WAN: 8302)
Sep 19 14:31:40 integration-dt-ci consul[5166]:     Gossip Encryption: false
Sep 19 14:31:40 integration-dt-ci consul[5166]:      Auto-Encrypt-TLS: false
Sep 19 14:31:40 integration-dt-ci consul[5166]:      Reporting Enabled: false
Sep 19 14:31:40 integration-dt-ci consul[5166]:             HTTPS TLS: Verify Incoming: true, Verify Outgoing: true, Min Version: TLSv1_2
Sep 19 14:31:40 integration-dt-ci consul[5166]:              gRPC TLS: Verify Incoming: false, Min Version: TLSv1_2
Sep 19 14:31:40 integration-dt-ci consul[5166]:      Internal RPC TLS: Verify Incoming: true, Verify Outgoing: true (Verify Hostname: true), Min Version: TLSv1_2
Sep 19 14:31:40 integration-dt-ci consul[5166]: ==> Log data will now stream in as it occurs:
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.517Z [WARN]  agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.517Z [WARN]  agent: bootstrap = true: do not enable unless necessary
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent.tlsutil: Update: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent.tlsutil: OutgoingRPCWrapper: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent.tlsutil: OutgoingALPNRPCWrapper: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent: [core][Channel #1] Channel created
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent: [core][Channel #1] original dial target is: "consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent: [core][Channel #1] parsed dial target is: {Scheme:consul Authority:integration.93a0a9fe-bf84-eb66-4165-c1453a578c54 URL:{Scheme:consul Opaque: User: Host:integration.93a0a9fe-bf84-eb66-4165-c1453a578c54 Path:/server.integration RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent: [core][Channel #1] Channel authority set to "server.integration"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [TRACE] agent: [core][Channel #1] Resolver state updated: {
Sep 19 14:31:40 integration-dt-ci consul[5166]:   "Addresses": null,
Sep 19 14:31:40 integration-dt-ci consul[5166]:   "ServiceConfig": null,
Sep 19 14:31:40 integration-dt-ci consul[5166]:   "Attributes": null
Sep 19 14:31:40 integration-dt-ci consul[5166]: } ()
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [TRACE] agent: [core][Channel #1] Channel switches to new LB policy "consul-internal"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [TRACE] agent.grpc.balancer: creating balancer: target=consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [DEBUG] agent.grpc.balancer: switching server: target=consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration from=<none> to=<none>
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [TRACE] agent: [core][Channel #1] Channel Connectivity change to TRANSIENT_FAILURE
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.611Z [WARN]  agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.611Z [WARN]  agent.auto_config: bootstrap = true: do not enable unless necessary
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.619Z [TRACE] agent.tlsutil: Update: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.620Z [TRACE] agent.tlsutil: IncomingGRPConfig: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.620Z [TRACE] agent: [core][Server #2] Server created
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.622Z [TRACE] agent.tlsutil: OutgoingRPCWrapper: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.635Z [INFO]  agent.server.raft: starting restore from snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.641Z [INFO]  agent.server.raft: snapshot restore progress: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 read-bytes=53 percent-complete="0.02%"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.641Z [ERROR] agent.server.raft: failed to restore snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 error="object missing primary index"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.641Z [INFO]  agent.server: shutting down server
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.643Z [ERROR] agent: Error starting agent: error="Failed to start Consul server: Failed to start Raft: failed to load any existing snapshots"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.643Z [INFO]  agent: Exit code: code=1
Sep 19 14:31:40 integration-dt-ci systemd[1]: consul.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: bootstrap = true: do not enable unless necessary
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: Update: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: OutgoingRPCWrapper: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: OutgoingALPNRPCWrapper: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Channel created
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] original dial target is: "consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] parsed dial target is: {Scheme:consul Authority:integration.93a0a9fe-bf84-eb66-4165-c1453a578c54 URL:{Scheme:consul Opaque: User: Host:integration.93a0a9fe-bf84-eb66-4165-c1453a578c54 Path:/server.integration RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Channel authority set to "server.integration"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Resolver state updated: {
                                                  "Addresses": null,
                                                  "ServiceConfig": null,
                                                  "Attributes": null
                                                } ()
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Channel switches to new LB policy "consul-internal"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.grpc.balancer: creating balancer: target=consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.grpc.balancer: switching server: target=consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration from=<none> to=<none>
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Channel Connectivity change to TRANSIENT_FAILURE
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.auto_config: bootstrap = true: do not enable unless necessary
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: Update: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: IncomingGRPConfig: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Server #2] Server created
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: OutgoingRPCWrapper: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: starting restore from snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: snapshot restore progress: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 read-bytes=53 percent-complete="0.02%"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: failed to restore snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 error="object missing primary index"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server: shutting down server
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: Error starting agent: error="Failed to start Consul server: Failed to start Raft: failed to load any existing snapshots"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: Exit code: code=1
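
The journal above contains every message twice, once with a timestamp and level tag and once without, and most of it is TRACE output. A small filter makes the relevant lines stand out. This is an illustrative sketch only; the function name `lines_at_or_above` is made up, and it relies on nothing more than the `[LEVEL]` tags visible in the log.

```python
import re

# Level tags as they appear in the log, lowest to highest severity.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR"]
LEVEL_TAG = re.compile(r"\[(" + "|".join(LEVELS) + r")\]")


def lines_at_or_above(lines, minimum="WARN"):
    """Keep lines whose first level tag is at or above `minimum`.

    Lines without a level tag (the untagged copies of each message)
    are dropped, which also removes the duplication.
    """
    threshold = LEVELS.index(minimum)
    kept = []
    for line in lines:
        match = LEVEL_TAG.search(line)
        if match and LEVELS.index(match.group(1)) >= threshold:
            kept.append(line)
    return kept


if __name__ == "__main__":
    import sys

    # Example: journalctl -u consul | python3 filter_levels.py
    for line in lines_at_or_above(sys.stdin.read().splitlines(), "ERROR"):
        print(line)
```

Filtering at ERROR leaves only the failed snapshot restore and the resulting "Error starting agent" message.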
