Open
Description
Overview of the Issue
After upgrading from 1.12.9 to any higher version, the server fails to start because Raft cannot restore the existing snapshot.
I tried upgrading to 1.13.9, 1.14.11, and 1.15.10, making all the necessary configuration adjustments beforehand. Each attempt fails the same way:
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: starting restore from snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: snapshot restore progress: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 read-bytes=53 percent-complete="0.02%"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: failed to restore snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 error="object missing primary index"
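To pull just the failure reason out of a longer journal, a small filter along these lines may help. This is only a sketch: `restore_errors` is an illustrative helper name, not part of Consul, and it assumes the `error="..."` field format seen in the lines above.

```shell
#!/usr/bin/env bash
# restore_errors (hypothetical helper): read log lines on stdin and print the
# value of the error="..." field for every "failed to restore snapshot" line.
restore_errors() {
  sed -n 's/.*failed to restore snapshot:.* error="\([^"]*\)".*/\1/p'
}
```

It could be used as `journalctl -u consul --no-pager | restore_errors | sort | uniq -c` to see whether every restore attempt fails with the same message.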
Reproduction Steps
Steps to reproduce this issue:
- Start with a single-node cluster running 1.12.9 (ours is a single node because it is not a production environment).
- Adjust the config files under the /etc/consul directory to match the version you want to upgrade to.
- Run apt install consul=version, where version is the target version.
- See the error in the Log Fragments section below.
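The steps above can be sketched as a small script. This is an approximation under stated assumptions, not the exact procedure used: the `run` and `reproduce` helpers, the `DRY_RUN` and `TARGET_VERSION` variables, and the `consul validate`, `systemctl restart`, and `journalctl` steps are additions for illustration. The unit name `consul` is taken from the `consul.service` line in the logs, and the exact package version string depends on the apt repository in use.

```shell
#!/usr/bin/env bash
# run (hypothetical helper): print the command when DRY_RUN=1 (the default),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# reproduce (hypothetical helper): upgrade the single-node server and show its log.
# The config files under /etc/consul are assumed to be adjusted by hand beforehand.
reproduce() {
  local version="${TARGET_VERSION:-1.15.10}"
  run apt install -y "consul=${version}"       # install the target version
  run consul validate /etc/consul              # assumption: sanity-check the adjusted config
  run systemctl restart consul                 # restart the agent on the new binary
  run journalctl -u consul --no-pager -n 200   # look for "failed to restore snapshot"
}
```

Calling `reproduce` with the defaults only prints the commands. Setting `DRY_RUN=0 TARGET_VERSION=1.13.9` before the call would execute them against the chosen version.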
Consul info for both Client and Server
Client info
consul info
Error querying agent: Get "http://127.0.0.1:8500/v1/agent/self": dial tcp 127.0.0.1:8500: connect: connection refused
{
  "log_level": "TRACE",
  "enable_syslog": true,
  "log_file": "/var/log/consul/consul.log",
  "syslog_facility": "LOCAL0",
  "log_rotate_duration": "24h",
  "log_rotate_max_files": 3,
  "enable_script_checks": false,
  "server_name": "integration-dt-ci",
  "datacenter": "integration",
  "primary_datacenter": "integration",
  "bind_addr": "172.30.8.179",
  "client_addr": "127.0.0.1",
  "data_dir": "/var/lib/consul",
  "tls": {
    "defaults": {
      "key_file": "/etc/consul/ssl/integration-dt-ci.key",
      "cert_file": "/etc/consul/ssl/integration-dt-ci.pem",
      "ca_file": "/etc/consul/ssl/ca.pem",
      "verify_incoming": true,
      "verify_outgoing": true
    },
    "internal_rpc": {
      "verify_server_hostname": true
    },
    "grpc": {
      "verify_incoming": false,
      "use_auto_cert": true
    }
  },
  "enable_central_service_config": true,
  "enable_local_script_checks": true,
  "ui_config": {
    "enabled": true,
    "metrics_provider": "prometheus",
    "metrics_proxy": {
      "base_url": "http://prometheus.service.consul:9090"
    }
  },
  "connect": {
    "enabled": true
  },
  "addresses": {
    "http": "{{ GetAllInterfaces | include \"flags\" \"loopback\" | join \"address\" \" \" }} {{ GetInterfaceIP \"nomad\" }}"
  },
  "ports": {
    "grpc": 8502,
    "grpc_tls": 8503
  },
  "acl": {
    "enabled": false,
    "default_policy": "deny",
    "down_policy": "extend-cache",
    "enable_token_persistence": true,
    "enable_token_replication": true,
    "tokens": {
      "agent": ""
    }
  },
  "limits": {
    "http_max_conns_per_client": 2000
  }
}
Server info
consul info
Error querying agent: Get "http://127.0.0.1:8500/v1/agent/self": dial tcp 127.0.0.1:8500: connect: connection refused
Server agent HCL config
Operating system and Environment details
A VM on GCP running Debian 10, managed via Terraform and Puppet.
Log Fragments
Sep 19 14:31:39 integration-dt-ci systemd[1]: Starting consul agent...
Sep 19 14:31:40 integration-dt-ci bash[5167]: /bin/bash: connect: Connection refused
Sep 19 14:31:40 integration-dt-ci bash[5167]: /bin/bash: /dev/tcp/localhost/8502: Connection refused
Sep 19 14:31:40 integration-dt-ci consul[5166]: ==> Starting Consul agent...
Sep 19 14:31:40 integration-dt-ci consul[5166]: Version: '1.15.10'
Sep 19 14:31:40 integration-dt-ci consul[5166]: Build Date: '2024-02-13 18:30:20 +0000 UTC'
Sep 19 14:31:40 integration-dt-ci consul[5166]: Node ID: '93a0a9fe-bf84-eb66-4165-c1453a578c54'
Sep 19 14:31:40 integration-dt-ci consul[5166]: Node name: 'integration-dt-ci'
Sep 19 14:31:40 integration-dt-ci consul[5166]: Datacenter: 'integration' (Segment: '<all>')
Sep 19 14:31:40 integration-dt-ci consul[5166]: Server: true (Bootstrap: true)
Sep 19 14:31:40 integration-dt-ci consul[5166]: Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: 8502, gRPC-TLS: 8503, DNS: 8600)
Sep 19 14:31:40 integration-dt-ci consul[5166]: Cluster Addr: 172.30.8.179 (LAN: 9301, WAN: 8302)
Sep 19 14:31:40 integration-dt-ci consul[5166]: Gossip Encryption: false
Sep 19 14:31:40 integration-dt-ci consul[5166]: Auto-Encrypt-TLS: false
Sep 19 14:31:40 integration-dt-ci consul[5166]: Reporting Enabled: false
Sep 19 14:31:40 integration-dt-ci consul[5166]: HTTPS TLS: Verify Incoming: true, Verify Outgoing: true, Min Version: TLSv1_2
Sep 19 14:31:40 integration-dt-ci consul[5166]: gRPC TLS: Verify Incoming: false, Min Version: TLSv1_2
Sep 19 14:31:40 integration-dt-ci consul[5166]: Internal RPC TLS: Verify Incoming: true, Verify Outgoing: true (Verify Hostname: true), Min Version: TLSv1_2
Sep 19 14:31:40 integration-dt-ci consul[5166]: ==> Log data will now stream in as it occurs:
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.517Z [WARN] agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.517Z [WARN] agent: bootstrap = true: do not enable unless necessary
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent.tlsutil: Update: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent.tlsutil: OutgoingRPCWrapper: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent.tlsutil: OutgoingALPNRPCWrapper: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent: [core][Channel #1] Channel created
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent: [core][Channel #1] original dial target is: "consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent: [core][Channel #1] parsed dial target is: {Scheme:consul Authority:integration.93a0a9fe-bf84-eb66-4165-c1453a578c54 URL:{Scheme:consul Opaque: User: Host:integration.93a0a9fe-bf84-eb66-4165-c1453a578c54 Path:/server.integration RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent: [core][Channel #1] Channel authority set to "server.integration"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [TRACE] agent: [core][Channel #1] Resolver state updated: {
Sep 19 14:31:40 integration-dt-ci consul[5166]: "Addresses": null,
Sep 19 14:31:40 integration-dt-ci consul[5166]: "ServiceConfig": null,
Sep 19 14:31:40 integration-dt-ci consul[5166]: "Attributes": null
Sep 19 14:31:40 integration-dt-ci consul[5166]: } ()
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [TRACE] agent: [core][Channel #1] Channel switches to new LB policy "consul-internal"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [TRACE] agent.grpc.balancer: creating balancer: target=consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [DEBUG] agent.grpc.balancer: switching server: target=consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration from=<none> to=<none>
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [TRACE] agent: [core][Channel #1] Channel Connectivity change to TRANSIENT_FAILURE
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.611Z [WARN] agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.611Z [WARN] agent.auto_config: bootstrap = true: do not enable unless necessary
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.619Z [TRACE] agent.tlsutil: Update: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.620Z [TRACE] agent.tlsutil: IncomingGRPConfig: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.620Z [TRACE] agent: [core][Server #2] Server created
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.622Z [TRACE] agent.tlsutil: OutgoingRPCWrapper: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.635Z [INFO] agent.server.raft: starting restore from snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.641Z [INFO] agent.server.raft: snapshot restore progress: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 read-bytes=53 percent-complete="0.02%"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.641Z [ERROR] agent.server.raft: failed to restore snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 error="object missing primary index"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.641Z [INFO] agent.server: shutting down server
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.643Z [ERROR] agent: Error starting agent: error="Failed to start Consul server: Failed to start Raft: failed to load any existing snapshots"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.643Z [INFO] agent: Exit code: code=1
Sep 19 14:31:40 integration-dt-ci systemd[1]: consul.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: bootstrap = true: do not enable unless necessary
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: Update: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: OutgoingRPCWrapper: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: OutgoingALPNRPCWrapper: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Channel created
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] original dial target is: "consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] parsed dial target is: {Scheme:consul Authority:integration.93a0a9fe-bf84-eb66-4165-c1453a578c54 URL:{Scheme:consul Opaque: User: Host:integration.93a0a9fe-bf84-eb66-4165-c1453a578c54 Path:/server.integration RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Channel authority set to "server.integration"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Resolver state updated: {
"Addresses": null,
"ServiceConfig": null,
"Attributes": null
} ()
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Channel switches to new LB policy "consul-internal"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.grpc.balancer: creating balancer: target=consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.grpc.balancer: switching server: target=consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration from=<none> to=<none>
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Channel Connectivity change to TRANSIENT_FAILURE
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.auto_config: bootstrap = true: do not enable unless necessary
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: Update: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: IncomingGRPConfig: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Server #2] Server created
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: OutgoingRPCWrapper: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: starting restore from snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: snapshot restore progress: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 read-bytes=53 percent-complete="0.02%"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: failed to restore snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 error="object missing primary index"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server: shutting down server
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: Error starting agent: error="Failed to start Consul server: Failed to start Raft: failed to load any existing snapshots"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: Exit code: code=1
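To inspect the snapshot that fails to restore, the id from the log can be mapped to a directory on disk. The sketch below assumes the usual layout in which Raft snapshots live under `<data_dir>/raft/snapshots/<id>/`, with `data_dir` defaulting to the `/var/lib/consul` value from the config above. `snapshot_dirs` is an illustrative helper name, not a Consul command.

```shell
#!/usr/bin/env bash
# snapshot_dirs (hypothetical helper): read log lines on stdin and print the
# assumed on-disk directory of each snapshot that failed to restore.
# $1 is the Consul data_dir (defaults to /var/lib/consul).
snapshot_dirs() {
  local data_dir="${1:-/var/lib/consul}"
  sed -n 's/.*failed to restore snapshot: id=\([^ ]*\) .*/\1/p' |
    sort -u |
    while IFS= read -r id; do
      printf '%s/raft/snapshots/%s\n' "$data_dir" "$id"
    done
}
```

For example, `journalctl -u consul --no-pager | snapshot_dirs` lists each affected directory once, even though the journal reports the failure twice (once with and once without the timestamp prefix). Copying that directory elsewhere before retrying an upgrade keeps the original snapshot available for analysis.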